# «Moving to Opportunity Volume 14, Number 2 • 2012 U.S. Department of Housing and Urban Development | Office of Policy Development and Research ...»

The loans CoreLogic assigns to its subprime database are either serviced by institutions that specialize in servicing subprime loans or identified as subprime by the servicing institution. Despite the recent demise of most subprime-specializing institutions, the subprime database continues to track active subprime loan performance because the servicing of these loans has largely transferred to other institutions that contribute to the database. In contrast to CoreLogic’s more commonly used, loan-level subprime securities database, the subprime-servicing database provides information on loans retained in bank portfolios as well as those in securities.

Although adverse neighborhood effects generally are associated with properties in later stages of foreclosure and REO, we favor including all loans 60 or more days past due in our analysis of delinquency patterns, for several reasons. First, foreclosure moratoria and loan modification programs have artificially slowed the transition through foreclosure into REO, so that our measure may be a better indicator of actual “facts on the ground.” Second, our measure is somewhat forward looking, because most loans in early stages of delinquency as of the analysis date will move into later stages of foreclosure and REO, given the relatively low cure rates associated with the mortgage crisis. Third, early stages of delinquency are relevant when considering effective policy responses. Moreover, because the 60-plus-days-past-due measure is dominated by longer term delinquent loans that are in foreclosure or REO, and because neighborhoods with lower delinquency rates in general will also have higher cure rates, we would not expect classifications based on longer term delinquency to be much different from those arising from our cluster analysis.

Although CoreLogic takes steps to eliminate duplication, some loans in the data obtained from Fannie Mae and Freddie Mac may also be reported by the servicers of these loans. In some ZIP Codes, we observe excess counts and adjust them as well.

246 Refereed Papers Geographic Patterns of Serious Mortgage Delinquency: Cross-MSA Comparisons

Thus, we develop estimates of active loan counts, prime and subprime, as of October 2008, by ZIP Code. We aggregate the estimated prime and subprime delinquency rates and active loan counts to obtain estimates of overall mortgage delinquency rates by ZIP Code.

We use additional data sources to obtain explanatory variables for the regression analysis of metropolitan-area delinquency characteristics. We use 2005, 2006, and 2007 HMDA data to construct variables descriptive of the mortgage market in a metropolitan area, such as share of home purchase loans by occupancy type (owner vs. nonowner occupied). We rely on Economy.com for data describing local economic and housing market conditions from 2005 through 2008, including annual house price appreciation rates, annual changes in housing starts, affordability index, and unemployment rates by MSA.

## Estimating Active Loan Counts by ZIP Code

As discussed previously, we adjust the active loan counts from the CoreLogic servicing data by comparing 2005 and 2006 origination counts in the CoreLogic data with origination counts from HMDA data. Because the CoreLogic data provide the state, county, and ZIP Code associated with a mortgage, whereas HMDA data indicate the state, county, and census tract, not the ZIP Code, we first map state, county, and census tract into ZIP Code(s).11 We apply separate adjustments to prime and subprime loan counts, associating high-cost mortgages in HMDA data (those with a reported above-prime rate spread) with subprime.

Let $n_j$ denote the number of originations reported in the CoreLogic subprime servicing data, and let $N_j$ denote the number of subprime (high-cost) originations in HMDA data, for ZIP Code $j$ in 2005 through 2006. Our adjustment factor is then the ratio $\alpha_j = N_j / n_j$, which exceeds 1 where the servicing data undercount originations. We multiply the 2008 active loan count in the CoreLogic subprime servicing data by $\alpha_j$ to obtain the estimated active subprime loan count for ZIP Code $j$. We apply the analogous procedure to estimate active prime loan counts. ZIP Codes with fewer than 50 estimated total (prime plus subprime) active loans are excluded from the study.12

Note that this procedure assumes that the within-ZIP delinquency rates observed for subprime loans included in the CoreLogic subprime servicing data are representative of the aggregate (observed and unobserved) within-ZIP delinquency rate; we make the same assumption regarding the prime data. Likewise, this procedure assumes that the servicing databases are representative with respect to within-ZIP proportions of 2005-to-2006 originations that remain active in 2008. Although assessing the accuracy of these assumptions is not possible, the fact that we are holding constant both geographic (ZIP Code) location and risk category (prime versus subprime) provides some assurance that the observed quantities will be reasonable approximations. At the least, correcting for the undercounts is preferable to not doing so.
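The adjustment can be illustrated with a minimal Python sketch. The function names, argument names, and sample figures are hypothetical. The sketch assumes the factor scales the CoreLogic active count up by the ratio of HMDA to CoreLogic originations, which is the direction needed to correct an undercount.

```python
def estimate_active_loans(active_2008, corelogic_orig, hmda_orig):
    """Scale a CoreLogic active loan count by the origination ratio.

    Assumption: the factor is HMDA originations divided by CoreLogic
    originations, so an undercount in CoreLogic scales the count up.
    """
    if corelogic_orig <= 0:
        raise ValueError("need at least one CoreLogic origination to form the ratio")
    alpha = hmda_orig / corelogic_orig
    return alpha * active_2008


def keep_zip(est_prime, est_subprime, min_active=50):
    """ZIP Codes with fewer than 50 estimated total active loans are excluded."""
    return est_prime + est_subprime >= min_active


# Hypothetical ZIP Code: the servicing data capture 80 percent of originations.
subprime = estimate_active_loans(active_2008=120, corelogic_orig=200, hmda_orig=250)
prime = estimate_active_loans(active_2008=400, corelogic_orig=800, hmda_orig=1000)
print(subprime, prime, keep_zip(prime, subprime))
```

Prime and subprime counts are adjusted separately, as in the text, and the 50-loan threshold is applied to their sum.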

Where a census tract traversed more than one ZIP Code, we allocated the mortgages across the ZIP Codes in proportion to the loan counts observed in Freddie Mac internal data.
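A minimal Python sketch of this proportional allocation follows. The function name, the ZIP Code labels, and the counts are hypothetical; the reference counts stand in for the internal loan counts mentioned above.

```python
def allocate_tract_loans(tract_loans, zip_loan_counts):
    """Split a tract's mortgage count across the ZIP Codes it traverses.

    zip_loan_counts maps each ZIP Code to its reference loan count
    (hypothetical values); shares are proportional to those counts.
    """
    total = sum(zip_loan_counts.values())
    if total <= 0:
        raise ValueError("reference loan counts must sum to a positive number")
    return {z: tract_loans * c / total for z, c in zip_loan_counts.items()}


# A tract with 100 mortgages that spans two ZIP Codes.
shares = allocate_tract_loans(100, {"ZIP_A": 30, "ZIP_B": 10})
print(shares)
```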

We also exclude ZIP Codes where $\alpha_j$ is implausibly large or small. In addition, we apply consistency checks for the prime active counts using Freddie Mac internal data. For instance, if the estimated active prime loan count for a ZIP Code is less than the number of active loans in Freddie Mac data, we use the active loan count and delinquency rate from Freddie Mac data instead.

## MSA Selection

As defined by the Office of Management and Budget, the United States had 371 MSAs as of December 2006.13 To limit the scope of this study to major cities and to ensure the statistical relevance of the measures calculated at the ZIP Code level, we select the 88 MSAs with at least 50 ZIP Codes or 100,000 active mortgages in our data. We include three additional, marginally smaller MSAs (Knoxville, Tennessee; Boise, Idaho; and Sioux Falls, South Dakota) to achieve better geographic representation. In Appendix A, we provide the complete list of selected MSAs and the number of ZIP Codes and active mortgages in each.

Large MSAs usually contain several cities along with the suburban areas around the cities. For simplicity, we abbreviate the full name of an individual MSA in the following text by referring to the major city in the MSA. For example, we refer to the New York-Northern New Jersey-Long Island MSA as “New York.” Note that we include as part of an MSA any ZIP Codes that extend beyond the MSA boundary into adjacent non-MSA areas.

## Geospatial Characterization

In this article, we address how delinquent loans, as of October 2008, in individual MSAs were distributed in relation to neighborhood delinquency rate, and whether any generalized patterns emerge across MSAs. Using the ZIP Code-level data described previously, we calculate eight MSA distributional statistics to quantify the patterns in a standardized way. These distributional statistics become the basis of cross-MSA comparisons and analysis.

Note that the focus is the distribution of delinquent loans in relation to neighborhood delinquency rate, not the distribution of the overall population of mortgage borrowers, homeowners, or households in relation to neighborhood delinquency rate. Although these distributions will tend to be similar, we view the former as more relevant for policy analysis addressing the mortgage crisis. For example, the share of a city’s delinquent mortgages contained in high-delinquency neighborhoods is a more important consideration for judging the relevance of the neighborhood dimension than the share of the city’s population located in these neighborhoods.

From a policy perspective, characterizing the shape of the distribution is of interest; for example, knowing whether neighborhoods with extremely high delinquency rates form a long tail may be important. Initially, we attempted to fit metropolitan-area delinquency distributions to two-parameter lognormal or beta functional forms. In many cases, however, the data do not conform to these distributions and require greater flexibility in fitting the mean, standard deviation, and shape characteristics (skewness and kurtosis) of the distributions. Therefore, we calculate four descriptive statistics characterizing how the delinquent mortgages in an MSA are distributed in relation to the neighborhood delinquency rate: mean, standard deviation, skewness, and kurtosis.

These moments characterize the delinquent loan distributions across individual ZIP Codes but have no spatial component. The extent to which high-delinquency neighborhoods are spatially isolated, dispersed, or clustered is also of interest from a policy perspective. For example, clustering may imply that delinquency problems are contained (or containable) within a limited geographical area and likely require neighborhood-specific responses. Therefore, we also calculate four gradient and spatial autocorrelation measures, which indicate spatial aspects of the neighborhood delinquency distribution.

We calculate the mean as the mean neighborhood delinquency rate for all the delinquent loans in the MSA. Because our data are at the ZIP Code level, we represent neighborhood by ZIP Code and calculate the mean as the weighted average ZIP Code delinquency rate, weighting by number of delinquent loans in the ZIP Code. Note that this is not equivalent to the overall measured delinquency rate for the MSA, which we would obtain by weighting by number of active loans.
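The difference between the two weightings can be seen in a small Python sketch. The function names and the two-ZIP-Code MSA are hypothetical and serve only to illustrate the arithmetic.

```python
def delinquent_weighted_mean(zips):
    """Mean ZIP Code delinquency rate, weighted by delinquent loans.

    zips is a list of (active_loans, delinquent_loans) pairs, one per ZIP Code.
    """
    total_delinquent = sum(d for _, d in zips)
    if total_delinquent <= 0:
        raise ValueError("need at least one delinquent loan")
    return sum(d * (d / a) for a, d in zips) / total_delinquent


def overall_rate(zips):
    """Overall MSA delinquency rate, equivalent to weighting by active loans."""
    return sum(d for _, d in zips) / sum(a for a, _ in zips)


# Hypothetical MSA: two ZIP Codes with delinquency rates of 5 and 20 percent.
msa = [(1000, 50), (1000, 200)]
print(delinquent_weighted_mean(msa))  # weights the 20 percent ZIP Code more heavily
print(overall_rate(msa))
```

In this example the delinquent-loan-weighted mean is 0.17, whereas the overall delinquency rate is 0.125, because most delinquent loans sit in the higher-rate ZIP Code.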

We use the same weighting concept to calculate standard deviation, skewness, and kurtosis. Note that the standard deviation from this calculation is small because each loan in the same ZIP Code is assigned the same delinquency rate. Therefore, the deviation among delinquent loans in the same ZIP Code is 0; the measure captures only the deviation among the ZIP Codes.
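As a rough illustration, the same weighting can be carried through to the higher moments. The Python sketch below uses plain population-style weighted moments, which is an assumption; the statistical package cited for these statistics may apply different small-sample corrections. The function name and data are hypothetical.

```python
import math


def weighted_moments(rates, weights):
    """Weighted mean, standard deviation, skewness, and excess kurtosis.

    rates are ZIP Code delinquency rates; weights are delinquent loan counts.
    Assumption: population-style moments with no small-sample correction.
    """
    total = sum(weights)
    mean = sum(w * x for x, w in zip(rates, weights)) / total

    def central(k):
        # k-th weighted central moment
        return sum(w * (x - mean) ** k for x, w in zip(rates, weights)) / total

    var = central(2)
    if var == 0:
        raise ValueError("all ZIP Codes have the same rate; shape moments are undefined")
    sd = math.sqrt(var)
    skew = central(3) / sd ** 3
    excess_kurtosis = central(4) / var ** 2 - 3.0
    return mean, sd, skew, excess_kurtosis


# Hypothetical MSA: rates of 5 and 20 percent with 50 and 200 delinquent loans.
mean, sd, skew, kurt = weighted_moments([0.05, 0.20], [50, 200])
print(mean, sd, skew, kurt)
```

Because every delinquent loan in a ZIP Code carries that ZIP Code's rate, all of the dispersion in this calculation comes from differences between ZIP Codes.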

Gradient. We calculate two measures of gradient, each the greatest rate of change in delinquency rate between the ZIP Code with the highest (peak) delinquency rate and neighboring ZIP Codes.15 When we restrict attention to ZIP Codes directly adjacent to the peak-delinquency ZIP Code, we obtain what we call the “first-layer gradient.” We obtain the “second-layer gradient” by focusing on those ZIP Codes adjacent to the directly adjacent ZIP Codes (those that touch the boundaries of the first layer). Specifically,

$$\text{FirstLayerGradient} = \max_{i = 1, \ldots, n} \frac{D_i - D_{\mathrm{Max}}}{D_{\mathrm{Max}}}, \qquad (3)$$

where $D_{\mathrm{Max}}$ is the highest ZIP Code delinquency rate in the MSA, $D_i$ is the delinquency rate of the $n$ ZIP Codes adjacent to the ZIP Code with the highest delinquency rate, and $D_j$ is the delinquency rate of the $k$ ZIP Codes adjacent to the $n$ first-layer ZIP Codes.
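A minimal Python sketch of the layer construction follows, under stated assumptions. It takes the magnitude of each relative difference so that a steeper drop from the peak gives a larger value, and it applies the same formula, relative to the peak rate, to the second layer. Both choices are assumptions, as are the function name, the adjacency structure, and the sample rates.

```python
def layer_gradients(rates, adjacency):
    """Return (first_layer_gradient, second_layer_gradient) for one MSA.

    rates maps ZIP Code -> delinquency rate; adjacency maps ZIP Code -> set of
    ZIP Codes sharing a boundary with it. Assumption: each gradient is the
    largest relative difference from the peak rate, taken in magnitude.
    """
    peak = max(rates, key=rates.get)
    d_max = rates[peak]
    first = set(adjacency[peak]) - {peak}
    second = set()
    for z in first:
        second |= set(adjacency[z])
    second -= first | {peak}  # keep only ZIP Codes new to the second layer

    def gradient(layer):
        if not layer:
            return None
        return max(abs(rates[z] - d_max) / d_max for z in layer)

    return gradient(first), gradient(second)


# Hypothetical MSA with four ZIP Codes; A is the peak.
rates = {"A": 0.20, "B": 0.10, "C": 0.15, "D": 0.05}
adjacency = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
first_layer, second_layer = layer_gradients(rates, adjacency)
print(first_layer, second_layer)
```

Here the first-layer value is driven by ZIP Code B (half the peak rate) and the second-layer value by ZIP Code D.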

See “The Univariate Procedure—Descriptive Statistics” from SAS 9.1.3 Online Documentation (The SAS Institute, 2003) at http://support.sas.com/onlinedoc/913/docMainpage.jsp.

In calculus, the gradient of a scalar field is a vector field whose vectors point in the direction of the greatest rate of increase, with magnitude equal to that rate of change.

A steep gradient suggests that high-delinquency neighborhoods are more isolated or extreme. An MSA with flat first- and second-layer gradients is likely to have a broad region of high-delinquency neighborhoods. An MSA without any high-delinquency-rate areas will have low gradient measures.16

Spatial Autocorrelation. Spatial autocorrelation refers to the degree to which observations from nearby locations (in our context, nearby ZIP Codes) are more likely to have similar magnitude (similar delinquency rate) than by chance alone (Fortin, Dale, and ver Hoef, 2002). We calculate two spatial autocorrelation measures: Moran’s I and Geary’s C.17

Geary’s C varies from 0 for perfect positive autocorrelation to about 2 for strong negative autocorrelation. If spatial autocorrelation is absent, its expected value equals 1.

A low value of Geary’s C corresponds to a high value of Moran’s I, both indicating a high degree of spatial autocorrelation. Moran’s I is a global indicator, whereas Geary’s C is more sensitive to local differences across neighborhood pairs. In general, Moran’s I and Geary’s C will agree on the existence of spatial autocorrelation, but not necessarily on the magnitude.
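The relationship between the two statistics can be illustrated with a short Python sketch of their standard textbook definitions. The binary contiguity weights, the function names, and the four-ZIP-Code example are assumptions; the weighting scheme actually used for the MSA calculations is not stated here.

```python
def morans_i(values, weights):
    """Moran's I for values with a spatial weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den


def gearys_c(values, weights):
    """Geary's C for values with a spatial weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    w_sum = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    den = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * w_sum)) * num / den


# Hypothetical MSA: four ZIP Codes in a row, each adjacent to its neighbors.
rates = [0.02, 0.04, 0.10, 0.12]
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(rates, w), gearys_c(rates, w))
```

In this example, similar rates sit next to each other, so Moran's I is positive (about 0.41) and Geary's C falls below 1 (about 0.32), the pattern of agreement described above.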

Exhibit 1 shows the summary statistics for the eight analysis variables. The mean value across the 91 MSAs of the MSA mean variable is about 0.08, and the mean skewness is about 1.6, consistent with substantial positive skewness for most MSAs.

The gradient measures apply only to the neighborhoods surrounding the ZIP Code with the highest delinquency rate. If a large MSA has multiple pockets of high-delinquency areas, the gradient measures will describe only one of them. Also, the ZIP Code size may affect the gradient measure, as does the delinquency rate differential across neighborhoods; for instance, larger ZIP Codes may mask substantial within-ZIP variation. Nevertheless, the results of our cluster analysis that follows suggest that the gradient measure is an effective tool for identifying metropolitan areas where high-delinquency neighborhoods tend to be more isolated.

Much of our discussion of these spatial autocorrelation measures is drawn from Fortin, Dale, and ver Hoef (2002) and Lembo (2008).

The mean values of the spatial autocorrelation measures (0.14 for Moran’s I and 0.95 for Geary’s C) suggest that spatial autocorrelation in each city, in general, is not high. These values may be somewhat misleading, however, because we define neighborhoods rather broadly, at the ZIP Code level.