«Urban Problems and Spatial Methods, Volume 17, Number 1 • 2015, U.S. Department of Housing and Urban Development | Office of Policy Development and ...»
and data for this study were adopted from Morckel (2013) who predicted residential abandonment in Columbus, Ohio, using neighborhood-level factors.1 The present study includes information on 120,109 properties in 382 Columbus neighborhoods, with neighborhoods defined as census block groups. The dependent variable is whether a house was identified by city code enforcement as being physically abandoned in 2011, and the independent variables are property values, property sales or transfers, arsons, demolitions, upkeep, property age, tax delinquency, and mortgage foreclosures in 2010. These variables are measured two different ways (at the house and neighborhood levels) to again demonstrate the importance of scale. Exhibit 1 provides additional information on the variables’ data sources, measurements, and abbreviations in the forthcoming models.
NA = not applicable.
a A property demolished in 2010 cannot predict abandonment in 2011.
Notes: “H” represents house-level variables. “N” represents neighborhood-level variables.
1 Not all variables from Morckel (2013) were used; only those variables for which house- and neighborhood-level data were available were included.
Methods
Unlike traditional regression models, multilevel models enable researchers to predict the probability of a house being abandoned in a particular neighborhood, while taking into account house- and neighborhood-level characteristics. Unfortunately, “…social scientists have tended to utilize traditional individual-level statistical tools for their data, even if their data and hypotheses are multilevel in nature” (Luke, 2004: 6). Using traditional methods is problematic with nested data (houses are located within neighborhoods), because not accounting for nesting can result in data dependency and correlated residuals, ultimately biasing regression estimates (Field, 2009). Therefore, it is better for regression analyses that use nested data to take on a multilevel form like the one that follows, for which the j subscripts indicate that a different level-one model is estimated for each of the j level-two units (that is, neighborhoods; Luke, 2004). The example is logistic because a house is either abandoned or not.
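The model equation referred to above is not reproduced in this text. As a sketch only, a generic two-level logistic model that matches the symbols used in the surrounding discussion (fixed effects γ, random effects u, level-two predictors W) can be written as follows. The single level-one predictor X, the single level-two predictor W, and the β notation for the level-one coefficients are assumptions made for illustration; the study's model contains several predictors at each level.

```latex
\begin{aligned}
\text{Level 1:}\quad & \ln\!\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_{0j} + \beta_{1j} X_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_{j} + u_{0j} \\
                     & \beta_{1j} = \gamma_{10} + \gamma_{11} W_{j} + u_{1j}
\end{aligned}
```

Here p_ij is the probability that house i in neighborhood j is abandoned. In this notation, γ01 is a main effect of the neighborhood-level predictor on the intercept, and γ11 is a cross-level interaction.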
This model differs from a traditional model in that it contains fixed effects (γs) and random effects (us). It is called a random intercepts and slopes (or mixed) model because both the level-one intercepts and slopes are allowed to randomly vary across neighborhoods and are modeled using level-two predictors (Ws). This model form was chosen because previous studies indicated that neighborhoods have different probabilities of abandonment (Morckel, 2014; Morckel, 2013), and it seems plausible that neighborhood-level effects differ by house-level characteristics.
Traditional models are created by entering all predictors into the model at one time or in blocks and removing those that are not statistically significant. But, because of the presence of random effects and potential cross-level interactions, multilevel models require a more complex model-building process. The process outlined here is a step-up approach to logistic modeling, similar to the one advocated by Luke (2004). First, an empty model is created with no predictors at either level. The purpose of this model is to estimate the overall probability of abandonment for the sample and to provide information about the proportion of total variability in abandonment that is attributable to neighborhood factors. This measurement is known as the intraclass correlation coefficient, or ICC (calculated as τ00 / [τ00 + 3.29], where τ00 is the variance component of the intercept u0j, and 3.29 is the variance of the logistic distribution). If variability within neighborhoods is low, but variability between neighborhoods is high, the ICC will be high (Field, 2009).
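The two quantities taken from the empty model can be checked with a few lines of Python. This is a minimal sketch, not the author's code: the function names are illustrative, and the input values are the empty-model estimates reported in the Results section (intercept of -4.469 and τ00 of 3.844). The constant 3.29 is computed as π²/3.

```python
import math

# Variance of the standard logistic distribution, about 3.29.
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def inverse_logit(log_odds: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def intraclass_correlation(tau00: float) -> float:
    """ICC for a two-level logistic model: tau00 / (tau00 + pi^2 / 3)."""
    return tau00 / (tau00 + LOGISTIC_VARIANCE)


# Empty-model estimates reported in the Results section.
intercept = -4.469  # gamma_00, the log-odds of abandonment
tau00 = 3.844       # variance of the random intercept u_0j

print(f"average probability of abandonment: {inverse_logit(intercept):.3f}")
print(f"ICC: {intraclass_correlation(tau00):.3f}")
```

Rounded to three decimals, these values agree with the 0.011 and 0.539 reported in the Results section.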
Next, all level-one variables are entered into the model and their intercepts and slopes are allowed to vary.2 Statistically insignificant variables are removed if their variances are also not statistically significant. If the ICC is high and residual variability still exists in the intercept (p < 0.05), the next step is to enter level-two variables as predictors of the level-one intercept. Doing so creates an intercept-as-outcome model, with main effects of the new level-two variables. Any variables that are not statistically significant are removed, provided they are not of interest in later cross-level interactions. Finally, for random effects with unmodeled variability (u terms with p < 0.05), level-two variables are added as predictors of these effects, creating cross-level interactions and a slopes-as-outcomes model. Once again, effects that are not statistically significant are removed if doing so reduces the deviance, with higher order terms removed first.
Other differences are notable between multilevel models and traditional regression models. Unlike traditional regression models with continuous dependent variables, parameter estimation in logistic, multilevel regression is based on principles of maximum likelihood and involves iterative estimation methods (O’Connell et al., 2008).3 In addition, instead of pseudo R² statistics, model fit is assessed using deviance statistics (-2 log likelihood values) and information criterion values like AIC (Akaike Information Criterion). For brevity’s sake, only the deviance will be used in this article. The deviance represents how poorly a model fits the data (that is, how far it “deviates” from a perfect model). If the deviance is reduced by a competing nested model (tested using a χ² difference test), the competing model is preferred (O’Connell et al., 2008). If nested models do not statistically differ, the model with fewer parameters is preferred for parsimony reasons.
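The deviance comparison can be illustrated with a short sketch that uses only the Python standard library. The deviances are the ones reported in the Results section. The degrees of freedom (the number of parameters added at each step) are not stated in the text, so `HYPOTHETICAL_DF` is a placeholder assumption and the printed p-values are illustrative only. The helper `chi2_sf` is an illustrative name; it evaluates the chi-square upper-tail probability for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(z))  # exact result for df = 1
    else:
        k, q = 2, math.exp(-z)             # exact result for df = 2
    while k < df:
        a = k / 2.0
        # Work on the log scale so large statistics underflow to 0.0 safely.
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        k += 2
    return q


def deviance_difference_test(dev_simpler: float, dev_complex: float, df_diff: int):
    """Return the chi-square statistic and p-value for two nested models."""
    statistic = dev_simpler - dev_complex
    return statistic, chi2_sf(statistic, df_diff)


# Deviances reported in the Results section, in model-building order.
deviances = [
    ("empty model", 256400.161),
    ("level-one predictors", 252111.064),
    ("intercept-as-outcome", 251655.988),
    ("slopes-as-outcomes", 251593.033),
]

HYPOTHETICAL_DF = 8  # assumption: the text does not report the df of each test

for (name_a, dev_a), (name_b, dev_b) in zip(deviances, deviances[1:]):
    stat, p_value = deviance_difference_test(dev_a, dev_b, HYPOTHETICAL_DF)
    print(f"{name_a} vs. {name_b}: chi-square = {stat:.3f}, p = {p_value:.3g}")
```

With the actual degrees of freedom substituted for the placeholder, the same function gives the test described above.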
Results
This section briefly demonstrates how the author arrived at the final model; it then presents the results of this model. The empty model indicates that the average probability of abandonment across all houses is 1.1 percent (e^-4.469 / [1 + e^-4.469] = 0.011). The ICC for the model is 0.539 (3.844 / [3.844 + 3.29]), meaning that neighborhood-level factors account for 53.9 percent of the variability in housing abandonment. The second model added the level-one independent variables, all of which were statistically significant (p < 0.05 for all). The deviance was also reduced compared with the empty model (χ² = 256,400.161 – 252,111.064 = 4,289.097; p < 0.05). Because residual variability was still in the intercept (p < 0.001) and the ICC was high, level-two variables were added as predictors of the level-one intercept. This model reduced the deviance (χ² = 252,111.064 – 251,655.988 = 455.076; p < 0.05). Arson (N_Arson) was the only neighborhood-level variable that did not predict the intercept (p = 0.289); however, it was retained to avoid specification errors with later testing of cross-level interactions. The model with arsons retained had unexplained variability in the intercept (τ00 = 0.365; p < 0.05), the slope for house-level tax delinquency (p < 0.001 for H_Tax), and the slope for house-level mortgage foreclosures (p < 0.001 for H_Mfc). Because of this remaining variability, all level-two variables were entered as predictors of the tax and mortgage foreclosures slopes (H_Tax and H_Mfc). Although this model reduced the deviance (χ² = 251,655.988 – 251,593.033 = 62.955; p < 0.05), arsons at the neighborhood level were still not statistically significant, nor were most of the new level-two variables. Therefore, nonstatistically significant variables were removed one at a time, starting with the higher order effects and ending with neighborhood-level arsons, until the most parsimonious model was achieved.
2 Because this study is exploratory and has a large sample size, the author permitted all slopes to vary. Estimation becomes more difficult with additional random effects; therefore, determining which slopes to vary should be based on the research questions and theory.
3 Like those of O’Connell et al. (2008), the analyses presented in this article use full penalized maximum likelihood estimation for the coefficients and Laplace estimation for the deviances. A detailed discussion of estimation methods is beyond the scope of this article.
Exhibit 2 shows the final model. It does not explain all the variability in the probability of abandonment (τ00 = 0.440; p < 0.05), or all the variability in the slopes for house-level tax delinquency and mortgage foreclosures (u6 = 0.033; u7 = 0.505; p < 0.05 for both). The model, however, is a significant improvement over the empty model, the model with only level-one predictors, and the intercept-as-outcome model. Thus, the model is the so-called final model because it is the best model obtainable with the present dataset. Other variables could be added in future studies to help explain remaining variability. In particular, it seems likely that the socioeconomic characteristics of owners or residents would be relevant, because the ability to afford a property might influence the decision to abandon. Because it is difficult to obtain personal data at the house level, this article does not examine socioeconomic factors.
Model Interpretation. Because the intercept (γ00) is the expected log-odds when all the predictor variables are zero, the negative coefficient for γ00 indicates that the probability of a house being abandoned when none of the characteristics (foreclosures, tax delinquency, poor property conditions, and so on) are present is virtually zero.4 As for house-level variables, the model indicates that mortgage foreclosures are the strongest predictor of abandonment. A house that experiences a mortgage foreclosure is 13 times more likely to be abandoned than a house that does not, holding all other variables constant (Odds Ratio [abbreviated OR hereafter] = 13.627; p < 0.001).
The effect of house-level mortgage foreclosures is also one of the most complex, given the presence of three cross-level interaction effects. The interactions indicate that the effect of a mortgage foreclosure is tempered by a neighborhood’s age and tax delinquency status, but it is amplified by neighborhood abandonment. More specifically, the odds of abandonment for a house that experiences mortgage foreclosure decrease by 0.7 percent for every 1-percent increase in the number of homes built before 1945 (OR = 0.993; 100 × [0.993 – 1] = -0.7 percent). Even when the house is in a neighborhood where all houses were built before 1945, a house experiencing a mortgage foreclosure is still nearly 7 times more likely to be abandoned (-0.007 × 100 = -0.700; 2.612 – 0.700 = 1.912; e^1.912 = 6.767).
A similar relationship holds between house-level mortgage foreclosures and neighborhood-level tax delinquency. The odds of abandonment for a house that experiences mortgage foreclosure decrease by 5.6 percent for every 1-percent increase in the number of homes that are tax delinquent (OR = 0.944; 100 × [0.944 – 1] = -5.6 percent). This effect is opposite that of neighborhood abandonment (N_Aband), which increases the odds (OR = 1.031). If a mortgage foreclosure occurs in a neighborhood where 10 percent of the homes are abandoned, for example, the odds ratio rises from about 13.6 to about 18.4 (0.030 × 10 = 0.300; 2.612 + 0.300 = 2.912; e^2.912 = 18.394).
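The conditional odds ratios worked out in the two preceding paragraphs can be reproduced with a short Python sketch. The coefficients are the rounded values quoted in the text (2.612 for house-level mortgage foreclosure, -0.007 for its interaction with N_Age, and 0.030 for its interaction with N_Aband), so results can differ from the published odds ratios in the third decimal. The function and variable names are illustrative, and the interaction with neighborhood tax delinquency is left out because the text reports only its odds ratio.

```python
import math


def conditional_odds_ratio(main_effect, interactions, neighborhood):
    """Odds ratio for a house-level predictor at given neighborhood values.

    main_effect: log-odds coefficient of the house-level predictor.
    interactions: cross-level interaction coefficients keyed by variable name.
    neighborhood: neighborhood-level values (in percent) keyed by the same
        names; variables that are not supplied are treated as zero.
    """
    log_odds = main_effect + sum(
        coef * neighborhood.get(name, 0.0) for name, coef in interactions.items()
    )
    return math.exp(log_odds)


# Rounded coefficients quoted in the text for mortgage foreclosure (H_Mfc).
H_MFC_MAIN = 2.612
H_MFC_INTERACTIONS = {"N_Age": -0.007, "N_Aband": 0.030}

# All neighborhood-level moderators at zero.
baseline = conditional_odds_ratio(H_MFC_MAIN, H_MFC_INTERACTIONS, {})
# N_Age = 100: every home in the neighborhood was built before 1945.
all_pre_1945 = conditional_odds_ratio(
    H_MFC_MAIN, H_MFC_INTERACTIONS, {"N_Age": 100}
)
# N_Aband = 10: 10 percent of homes in the neighborhood are abandoned.
ten_pct_abandoned = conditional_odds_ratio(
    H_MFC_MAIN, H_MFC_INTERACTIONS, {"N_Aband": 10}
)

print(f"baseline OR: {baseline:.3f}")
print(f"OR with N_Age = 100: {all_pre_1945:.3f}")
print(f"OR with N_Aband = 10: {ten_pct_abandoned:.3f}")
```

Because the interaction terms enter on the log-odds scale, each moderator multiplies the baseline odds ratio rather than adding to it.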
4 This interpretation is true because zero has meaning for the independent variables. If a variable like square footage appeared in the model, it would have to be centered because zero has no practical meaning; a house cannot have zero square feet.
The next strongest effect is house-level tax delinquency. A house that is tax delinquent is nearly 10 times more likely to be abandoned than one that is not, holding all other variables constant (OR = 9.946). As indicated by the statistically significant interaction effects, however, this relationship is slightly tempered by neighborhood-level age (N_Age), tax delinquency (N_Tax), and neighborhood abandonment (N_Aband). All three variables have negative regression coefficients and odds ratios less than, but close to, 1 (OR = 0.997 for N_Age; OR = 0.982 for N_Tax; OR = 0.983 for N_Aband).
The remaining house-level effects, which do not have cross-level interactions, are as follows: a house in poor condition is 6 times more likely to be abandoned than a house that is not (OR = 6.477); an arsoned house is 5.4 times more likely to be abandoned than a house that has not been arsoned (OR = 5.388); a house that has sold or transferred in the past year is 2.3 times more likely to be abandoned than one that has not sold or transferred (OR = 2.313); a house with a value that is less than the citywide median is 1.6 times more likely to be abandoned than a house above the median (OR = 1.642); and finally, a house built before 1945 is 1.6 times more likely to be abandoned than a newer house (OR = 1.621).
Of the neighborhood-level effects, demolitions have the greatest impact on the probability of abandonment (OR = 1.276). A 1-percent increase in the number of demolitions in a neighborhood results in a 27.6-percent increase in the odds of a house being abandoned (100 [1.276 – 1] = 27.6).
The next strongest neighborhood effect is neighborhood abandonment, with a 1-percent increase resulting in a 12-percent increase in the house-level odds of abandonment (OR = 1.12). Similarly, a 1-percent increase in mortgage foreclosures increases the odds by 12 percent (OR = 1.12).
Interestingly, a 1-percent increase in the number of homes in poor condition decreases the odds by 9.6 percent (OR = 0.904; 100 × [0.904 – 1] = -9.6), while a 1-percent increase in the number of homes not sold or transferred in the neighborhood increases the odds by 8.3 percent (OR = 1.083).
Finally, a 1-percent increase in the number of homes that are tax delinquent, valued at less than the median housing values, or built before 1945 increases the odds by less than 3 percent each (OR = 1.029 for N_Tax; OR = 1.017 for N_Value; OR = 1.009 for N_Age).
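The percent changes quoted for the neighborhood-level effects all follow from the same conversion, 100 × (OR – 1). The sketch below applies it to the odds ratios reported in the text. The dictionary keys are descriptive labels chosen for this example, not the variable abbreviations from Exhibit 1.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds for a one-unit increase in a predictor."""
    return 100.0 * (odds_ratio - 1.0)


# Neighborhood-level odds ratios as reported in the text.
neighborhood_odds_ratios = {
    "demolitions": 1.276,
    "abandonment": 1.12,
    "mortgage foreclosures": 1.12,
    "homes in poor condition": 0.904,
    "sales or transfers": 1.083,
    "tax delinquency": 1.029,
    "below-median value": 1.017,
    "built before 1945": 1.009,
}

for label, ratio in neighborhood_odds_ratios.items():
    print(f"{label}: {percent_change_in_odds(ratio):+.1f} percent")
```

An odds ratio below 1, such as 0.904 for homes in poor condition, yields a negative percent change.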