The key feature of MTO that made it a test of the effects of neighborhood was restricting the use of the experimental group vouchers to low-poverty neighborhoods. As noted previously, without this restriction, MTO would have been simply a test of the addition of counseling to the traditional Section 8 voucher—a relatively unremarkable, if possibly useful, test much like many other experiments. With the locational restriction, MTO engineered a substantial difference in social environment for statistically matched groups of families—a circumstance that would never occur naturally.

The feature that allowed MTO to test both the effect of neighborhood and the relative effects of Section 8 rent vouchers and public housing was the random three-way assignment to the experimental group with locationally restricted vouchers, the Section 8 group with traditional vouchers, and a control group that remained in public housing. This design enabled the researchers to answer three distinct questions: (1) What is the effect of living in private housing in a low-poverty neighborhood relative to living in public housing in an area of concentrated poverty? (2) What is the effect of living in private housing in a low-poverty neighborhood relative to living in private housing in a substantially higher poverty neighborhood? (3) How effective are Section 8 vouchers, relative to public housing, in improving the lives of low-income people?

The Effects Analyzed and the Methods Used To Measure Them

The hypothesis that residential environment shapes the lives of low-income people implies that neighborhood effects may be felt across virtually every domain of life. A project that sets out to measure the effects of residential environment must therefore measure a very large number of potential effects of intervention. MTO measured the effects of neighborhood in six broad domains: mobility, housing, neighborhood, and social networks; physical health; mental health; economic self-sufficiency; risky and criminal behavior; and educational achievement.

To measure outcomes across all these domains, the study tapped a wide range of data sources: personal interviews with family members, interviewer observations, census data, audio recordings and physical biomarkers (discussed further in the following sections), administrative data on earnings and arrest histories, data from several national databases on the characteristics of schools attended by MTO youth, and study-administered achievement tests in math and reading.

This problem of compliance with treatment is an issue in most random-assignment studies and, from the outset, was treated as a serious problem in this one. Most personal outcomes are measured with considerable noise—variation not systematically related to measurable factors—and small numbers of lease ups in the treatment groups would threaten the chances that analysts could detect any statistical effect of neighborhood. Abt allocated the families recruited into the sample among the two treatment groups and the control group to minimize the variance of the treatment-control comparisons, using initial lease-up assumptions that were periodically updated.

In all cases, the outcome measures used by the study were state of the art. Survey measures were taken from, or designed to be comparable with, those used in large-scale national surveys. For example, the MTO educational assessments were those used in the fifth- and eighth-grade followup waves of the Department of Education’s Early Childhood Longitudinal Study, Kindergarten Cohort.

MTO took survey-based measures of physical and mental health largely from the National Health Interview Survey, the Behavioral Risk Factor Surveillance System, the National Survey on Drug Use & Health, the World Health Organization’s Composite International Diagnostic Interview, and other widely used survey batteries. The demonstration based measures of risky and criminal behavior on those used in the National Longitudinal Survey of Youth. Use of these established measures not only ensures that the MTO outcomes are based on well-tested interview scales, but it also allows for direct comparison of the MTO sample with national populations.

Perhaps the most innovative feature of the data collection was the use of biomarkers to assess physical health. As in the preceding interim analysis, survey interviewers measured respondents’ height and weight during the home visit. In addition, for the long-term study, interviewers measured blood pressure and waist circumference (a better measure of obesity-related health risks than previous height- and weight-based measures) and collected blood spots from finger pricks. These blood samples enabled the researchers to detect the presence of uncontrolled diabetes and to measure high-sensitivity C-reactive protein levels, an important predictor of cardiovascular disease.

The MTO Followup Period

Given the large sample and broad scope of investigation, the MTO followup data collection was of unprecedented length.12 MTO enrolled families from 1994 to 1998. Abt conducted the interim impacts evaluation (Orr et al., 2003) followup survey in 2001, 4 to 7 years after random assignment.13 Orr et al.’s (2003) analysis, based on this survey and a wide array of other data sources, provided what in most demonstrations would have been considered a long-term followup; few experiments are able to follow sample members this long.

12. For context, Greenberg and Shroder (2004) reported longer term followups in the following cases: National Supported Work Demonstration, 8 years, sample of 6,600, administrative data only; New York Nurse Home Visitation, 15 years, sample of 400, survey and administrative data; Perry Preschool, 27 years, sample of 123, survey only; Carolina Abecedarian, 18 years, sample of 111, survey only. David Greenberg found these examples. Grinstein-Weiss et al. (2011) reported on an Individual Development Accounts experiment, 10 years, sample of 1,100, survey only.

13. Other federal agencies and private foundations contributed to the interim data collection and analysis, responding to proposals primarily written by Jeffrey Kling.

HUD recognized, however, that the effects of neighborhood might not only be pervasive but also take some time to develop. Changing the behavior of families from impoverished areas might, for example, take prolonged exposure to a better residential environment. Teenagers who had lived most of their lives in lower poverty environments might also behave very differently than teenagers who had spent their early years in a high-poverty area before moving to a low-poverty neighborhood.

For these reasons, one could not confidently conclude that the lack of a statistically significant effect on a given outcome 4 to 7 years after random assignment meant that the effect on that outcome would never be significant. On the other hand, strong interest persisted in determining how long some of the positive effects found in the interim impacts evaluation—for example, the improvements in adult mental health and obesity and the positive effects on girls’ risky behavior and criminal activity—would persist. These considerations led HUD, with support from other agencies and foundations, to fund long-term followup data collection and analysis 10 to 15 years after random assignment. The National Bureau of Economic Research (NBER) conducted a final impacts evaluation (Sanbonmatsu et al., 2011) survey in 2009 and 2010, again supplemented by an array of other data sources.

Many low-income households are difficult to interview, either because of high mobility or for other reasons. Both the Abt and the NBER surveys employed a two-stage sampling process to ensure that the people actually reached were representative of the full sample. For example, in the main phase of the final survey, the Michigan Survey Research Center (the survey subcontractor) first obtained interviews with approximately 75 percent of the sample using standard respondent incentives and standard intensity of search by the staff. The researchers then randomly chose 35 percent of the remaining sample and substantially increased the respondent incentives and intensity of staff search for that subsample. Sanbonmatsu et al. (2011) then “weighted up” the respondents from the subsample to obtain the effective response rate; thus, if the first-stage rate was 0.75 and the second-stage rate was 0.6, the effective response rate would be 0.75 + (0.25 x 0.6) = 0.9. The effective response rates obtained in the final impacts evaluation survey were 90 percent for adults and 87 percent for youth (Groves et al., 2004; Sanbonmatsu et al., 2011).
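The response-rate arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure Sanbonmatsu et al. (2011) actually coded. The function names are hypothetical, and the inverse-probability form of the subsample weight (1 divided by the 35 percent sampling fraction) is an assumption about what "weighting up" means here; the source gives only the effective-rate formula.

```python
def effective_response_rate(first_stage_rate: float, second_stage_rate: float) -> float:
    """Effective response rate for a two-stage followup.

    first_stage_rate: share of the full sample interviewed in the main phase.
    second_stage_rate: share of the intensively pursued subsample interviewed.
    """
    remaining_share = 1.0 - first_stage_rate
    return first_stage_rate + remaining_share * second_stage_rate


def subsample_weight(subsample_fraction: float) -> float:
    """Weight for each second-stage respondent.

    ASSUMPTION: a plain inverse-probability weight, so each respondent drawn
    into the subsample stands in for 1 / subsample_fraction nonrespondents.
    """
    return 1.0 / subsample_fraction


# The worked example from the text: 0.75 + (0.25 x 0.6)
rate = effective_response_rate(0.75, 0.60)
print(f"effective response rate: {rate:.2f}")

# With the 35 percent subsample described in the text (assumed weighting form)
weight = subsample_weight(0.35)
print(f"weight per second-stage respondent: {weight:.2f}")
```

Under that weighting assumption, each second-stage respondent counts for a little under three members of the hard-to-reach group, which is how a modest number of intensive interviews can represent the whole remaining sample.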

If one considers the length of followup in association with the other elements of the study, whether or not enumerated previously (the rigor of the research design, the size of the sample of more than 15,000 individuals, the broad array of outcome measures collected, the remarkably high survey response rates, and the difficulty and importance of the research and policy questions), MTO emerges as a landmark study.14

14. The continuity of support for the experiment is the more remarkable because it occurred over six administrations. HUD Assistant Secretaries for Policy Development and Research John Weicher, Michael Stegman, Susan Wachter, Al Trevino, Dennis Shea, Darlene Williams, and Raphael Bostic all provided support for the project.

Has the Problem of Concentrated Poverty Changed?

According to the Census Bureau, in 1990, 13.5 percent of the U.S. population lived below the poverty line. In 2010, the figure was 15.1 percent. It is mathematically possible for the concentration of poverty to fall while the poverty rate rises, but the popular perception is that the nation became more, not less, economically segregated over those two decades. That perception has a strong factual basis. “After declining in the 1990s, the population in extreme-poverty neighborhoods— where at least 40 percent of individuals live below the poverty line—rose by one-third from 2000 to 2005–09” (Kneebone, Nadeau, and Berube, 2011: 1).

Certain changes, however, might affect the way that policymakers view the urgency of the problem and their policy levers for affecting it. The crack cocaine epidemic has passed and, with it, some of the surge in homicides, although the danger of violent death lingers in many low-income neighborhoods. As of September 30, 2011, the HOPE VI Program had effected the demolition of 96,797 public housing units, including nearly all of the most notoriously ungovernable and deteriorating projects.

Perhaps these projects contributed to the assumption that the concentration of poverty was anchored in place. Exhibit 3 maps the percentage of poverty by census tract in five metropolitan areas: Boston (3a), an MTO site, and Denver (3b), Houston (3c), Minneapolis (3d), and Nashville (3e), which were not MTO sites, both in 1990 and over a 5-year period from 2003 through 2007.15 In this exhibit, a tract with less than 10 percent of the population living in poverty is shown as white, a tract with more than 40 percent of the population living in poverty is black, and tracts in between are in graduated shades of gray. The number of tracts in low poverty shrank in all five metropolitan areas. Many of the high-poverty areas became gray, but the total landmass of high-poverty tracts did not decline. The concentration of poverty has often shifted from one place to another. Boston exhibits a reconcentration of poverty in the near southwest. Concentrated poverty has moved with marked centrifugal force in Houston. Some of Denver’s concentrations have leapfrogged other neighborhoods in the shift to the east and northeast of the city center. The deep poverty south of the center of Minneapolis has shifted to other locales. Only in Nashville does the pattern of neighborhood poverty appear stable.16

If the concentration of poverty is a moving target, lumbering federal policy based on rapidly obsolescing data will have a hard time dealing with it. Orr et al. (2003) repeatedly noted that the variability of the poverty rate over time tends to diminish the strength of the mobility treatment. For example, “Because many [of the experimental group movers] moved to neighborhoods where the poverty rate was increasing between 1990 and 2000, we estimate that only about half of their destinations had poverty rates below 10 percent at the time of the move…” (Orr et al., 2003: viii), whereas, even among members of the control group who stayed in their origin project, about 21 percent were no longer living in tracts with more than 40 percent poverty.

15. Ron Wilson, a social science analyst in HUD PD&R, created these exhibits. With the elimination of the long-form decennial census, reliable poverty statistics at the census tract level depend on 5-year averages of American Community Survey data.

16. “The population in extreme-poverty neighborhoods rose more than twice as fast in suburbs as in cities from 2000 to 2005–09” (Kneebone, Nadeau, and Berube, 2011: 1).

Exhibit 3 legend: percent in poverty by census tract, shaded in bands of 0–10, 11–20, 21–30, 31–40, and 41–100. Sources: for 1990, the 1990 Census; for 2007, American Community Survey 5-year estimates.

If the movement of poverty concentrations can undercut the case for a demonstration, the attenuation would likely be larger in a program that was to scale. Over the course of 5 years, the demonstration moved 813 families in the experimental group to low-poverty neighborhoods. Polikoff (2004) describes what a national program to scale would look like:

Suppose 50,000 housing choice vouchers were made available annually, were earmarked for use by black families living in urban ghettos, and could be used only in non-ghetto locations—say, census tracts with less than 10 percent poverty and not minority impacted.

Suppose that the vouchers were allocated to our 125 largest metropolitan areas. Suppose also that to avoid “threatening” any receiving community, no more than a specified number of families (an arbitrary number—say, ten, or a small fraction of occupied housing units) could move into any city, town or village in a year.

