«Stephanie Riegg Cellini George Washington University School of Public Policy and Public Administration 805 21st Street, NW Washington, DC 20052 (202) ...»
Note that the polynomial is usually a quadratic or cubic term—anything higher may absorb all variation. Also, whether one uses the specification in equation (11), or uses the limited sample approach, the coefficient on the variable of interest should be the same in both cases. However, the standard errors may be smaller in the polynomial approach since the sample size is larger.
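The comparison between the two specifications can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the data-generating process, the cutoff of 3.0, the bandwidth of 0.2, and all variable names are hypothetical, and the limited-sample version includes a linear term in the running variable, which may differ from the specification used in the text.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 20_000
cutoff = 3.0                                  # hypothetical GPA threshold
gpa = rng.uniform(2.0, 4.0, n)                # running variable
r = gpa - cutoff                              # running variable centered at the cutoff
grant = (r >= 0).astype(float)                # treatment switches on at the cutoff
true_effect = 0.10                            # assumed jump in the outcome at the cutoff

# Hypothetical enrollment index: smooth in GPA, plus a jump at the cutoff.
enroll = 0.5 + 0.2 * r - 0.05 * r**2 + true_effect * grant + rng.normal(0.0, 0.1, n)

# (a) Polynomial approach: full sample, quadratic in the running variable.
X_poly = np.column_stack([np.ones(n), grant, r, r**2])
rd_poly = ols(X_poly, enroll)[1]

# (b) Limited-sample approach: only observations close to the cutoff.
near = np.abs(r) <= 0.2
X_near = np.column_stack([np.ones(near.sum()), grant[near], r[near]])
rd_near = ols(X_near, enroll[near])[1]

print(f"polynomial, full sample: {rd_poly:.3f}")
print(f"limited sample:          {rd_near:.3f}")
```

Both estimates should land close to the assumed jump. The limited-sample estimate rests on far fewer observations, which is the source of the precision difference noted above.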
Footnote: See, for example, Black (2002), Rothstein (2006), and Kane, Riegg, and Staiger (2006), who use school assignment boundaries to isolate the impact of school quality on housing prices.
There are two important considerations to keep in mind when implementing a regression discontinuity design. First, the cutoff must be exogenously determined. In Kane’s case, the GPA cutoff for the grant changed every year based on the number of applicants, not based on characteristics of the applicants themselves.
A second but related point is that only the variable of interest should change at the cutoff.
In Kane’s case, the cutoff was not known a priori by students, which meant that motivated students could not work extra hard to push themselves over the threshold. If this had been the case, then estimates of the causal effect of the grants would have been biased, as this unobservable motivation would have varied along the same cutoff. Graphical analyses can be particularly useful in showing that only the treatment varies at the cutoff. Providing a visual comparison of the existence of the discontinuity in the variable of interest to the lack of a discontinuity in other observable characteristics lends credibility to the case that the discontinuity can isolate causal effects. 22
22 For more on this topic and regression discontinuity methods generally, see Imbens and Lemieux (2007), Lee and Card (2006), and Shadish, Cook, and Campbell (2002).
3.7. Instrumental Variables
Though widely used in assessing the returns to schooling (Angrist and Krueger 1991, Card 1995, Kane and Rouse 1993, Staiger and Stock 1997), instrumental variables (IV) methods have rarely been applied to studies of financial aid and student access (Alon 2005, Seneca and Taussig 1987). The goal of the IV approach is to find a convincing variable that can be used as an “instrument” for the endogenous variable of concern, in this case financial aid policy. The instrument allows one to identify and isolate a source of variation in the endogenous regressor that is not affected by omitted variables. To borrow an excellent intuitive explanation from Angrist and Krueger (2001), IV estimation solves the omitted variable problem by using only part of the variability in the endogenous variable (a part that is uncorrelated with the omitted variables) to estimate the relationship between the endogenous regressor and the dependent variable.
There are two conditions that must be met to implement IV estimation successfully.
First, the instrument must be correlated (after controlling for exogenous variables) with the endogenous variable. This is known as the strength of the instrument and is denoted mathematically as $E(F_s, Z_s \mid X_s) \neq 0$, where $Z_s$ is the instrumental variable and $F_s$ is the financial aid policy variable.
Second, the instrument must be exogenous to all other factors that might affect enrollment. That is, it must be uncorrelated with the error term $\varepsilon_s$; mathematically, $E(\varepsilon_s, Z_s) = 0$.
This is referred to as the validity of the instrument and is extremely important for causal inference. To see this, consider the equation for the IV estimator, where $Z_s$ is an instrument for $F_s$.
Putting the strength and validity conditions together in the context of our financial aid examples, the instrument must affect a state’s adoption of a financial aid policy (or a student’s ability to get it), but have no independent effect on enrollment. Put another way, the instrument must affect enrollment only through the state’s adoption of a financial aid policy. For example, the instrument cannot also affect the adoption of higher tuition or an increase in the number of colleges, since it would then also affect enrollment through those variables.
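The validity requirement can be made concrete with the textbook single-instrument IV estimator. The expression below is a sketch rather than the author's own equation: it assumes a simple enrollment equation $\mathit{Enroll}_s = \alpha + \beta F_s + \varepsilon_s$ with no additional covariates, and the symbols $\alpha$ and $\beta$ are introduced here purely for illustration.

```latex
\hat{\beta}_{IV}
  = \frac{\widehat{\mathrm{Cov}}(Z_s, \mathit{Enroll}_s)}{\widehat{\mathrm{Cov}}(Z_s, F_s)}
  \;\xrightarrow{\,p\,}\;
  \beta + \frac{\mathrm{Cov}(Z_s, \varepsilon_s)}{\mathrm{Cov}(Z_s, F_s)}
```

Under these assumptions, the second term vanishes only when the instrument is uncorrelated with the error term (validity). A denominator close to zero (a weak instrument) magnifies even a small violation of validity.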
Once a potential instrument has been found, IV estimation can be implemented using substitution or two-stage least squares (2SLS). Using 2SLS as an example, in the first stage (equation 14), one estimates a linear projection of the endogenous variable $F_s$ on the instrument $Z_s$ and the exogenous variables $X_s$, where again $Z_s$ is an instrument for the endogenous variable. From this equation, one calculates predicted values, $\hat{F}_s$. These are used in place of $F_s$ in the second-stage regression equation for enrollment.
Using the predicted values of $F_s$ essentially leaves behind the residuals from the first stage, thereby eliminating the part of the variation in $F_s$ that is correlated with the omitted determinants of $\mathit{Enroll}_s$.
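The two stages can be sketched on simulated data. Everything in the example is an assumption made for illustration: the variable names, the coefficient values, and the simplified specification without covariates, in which the first stage is $F_s = \pi_0 + \pi_1 Z_s + \nu_s$ and the second stage is $\mathit{Enroll}_s = \beta_0 + \beta_1 \hat{F}_s + \varepsilon_s$ (the symbols $\pi$ and $\beta$ are not from the source).

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 50_000
ability = rng.normal(size=n)       # omitted variable, unobserved by the researcher
z = rng.normal(size=n)             # hypothetical instrument: moves aid, not enrollment directly
aid = 0.8 * z + 0.6 * ability + rng.normal(size=n)    # endogenous regressor
true_effect = 0.5
enroll = 1.0 + true_effect * aid + 1.0 * ability + rng.normal(size=n)

const = np.ones(n)

# Naive OLS: biased because ability is omitted and correlated with aid.
beta_ols = ols(np.column_stack([const, aid]), enroll)[1]

# First stage: linear projection of the endogenous variable on the instrument.
Z = np.column_stack([const, z])
pi_hat = ols(Z, aid)
aid_hat = Z @ pi_hat               # predicted values; first-stage residuals are left behind

# Second stage: regress the outcome on the predicted values.
beta_2sls = ols(np.column_stack([const, aid_hat]), enroll)[1]

print(f"OLS estimate:  {beta_ols:.3f}")
print(f"2SLS estimate: {beta_2sls:.3f}")
```

The manual second stage gives the correct point estimate, but the standard errors that an ordinary regression routine would report for it are not valid, which is one reason dedicated IV routines are normally used in practice.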
While many statistical software packages make IV estimation straightforward to carry out, the most significant challenge of this approach is finding convincing instruments. One can test the strength of the instrument using the first stage estimation. If equation (14) produces a strong, statistically significant relationship between the instrument and the endogenous variable, the instrument can be considered strong. If the instrument is only weakly correlated with the endogenous variable, however, then additional problems of bias and inconsistency may be introduced. 23 In contrast to an instrument’s strength, the validity of an instrument cannot be tested directly, and must be argued by the researcher. As such, finding plausibly valid instruments is difficult. In an effort to control for unobservable innate ability, researchers estimating the returns to schooling have used instruments derived from natural experiments, such as quarter of birth in conjunction with compulsory schooling laws (Angrist and Krueger 1991) and proximity to college (Card 1995), but even these instruments are not completely free of criticism on the grounds of validity. In financial aid research, finding convincing instruments may be just as challenging.
23 See Bound, Jaeger, and Baker (1995) for more on this issue.
Still, as pointed out by Angrist and Krueger (2001), a promising new direction for education research is the use of instrumental variables in random assignment experiments with incomplete compliance. As described above, compliance issues can compromise causal inference in pure random assignment evaluations if some individuals assigned to the treatment group refuse treatment or some assigned to the control group receive it. However, using IV estimation, one can still effectively estimate the causal effects of the treatment in question. The individual’s actual treatment status would be considered the endogenous regressor and the assigned group (treatment or control) would serve as the instrument. The instrument would be strong, because assignment status is likely to be highly correlated with actual treatment, and it would be valid since it was assigned at random. Instrumental variables estimation would then yield an accurate estimate of the causal effect of the treatment on the population that complied with the random assignment. This approach has been used in the education literature assessing the impact of class size (Krueger 1999), hours of study (Powers and Swinton 1984), and voucher programs (Howell, Wolf, Campbell, and Peterson 2002) on students’ achievement test scores. To date, however, it has not been used in financial aid research. 24
24 For more details on instrumental variables methods see Angrist and Krueger (2001), Angrist, Imbens, and Rubin (1996), Meyer (1995), Staiger and Stock (1997), and Wooldridge (2002).
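The logic of using assignment as an instrument for actual treatment can be sketched with simulated data. The setup is entirely hypothetical: a 70 percent compliance rate, non-compliers who choose treatment based on unobserved motivation, and a constant treatment effect. With a binary instrument and no covariates, the IV estimate reduces to the ratio of the difference in mean outcomes to the difference in take-up rates between the assigned groups.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
assigned = rng.integers(0, 2, size=n)          # random assignment: the instrument
complier = rng.random(n) < 0.7                 # hypothetical: 70% follow their assignment
motivation = rng.normal(size=n)                # unobserved by the researcher

# Compliers take the treatment they were assigned; everyone else self-selects
# on motivation regardless of assignment.
treated = np.where(complier, assigned, (motivation > 0).astype(int))

true_effect = 2.0
score = 50 + true_effect * treated + 3.0 * motivation + rng.normal(size=n)

# Naive comparison by actual treatment status mixes in self-selection.
naive = score[treated == 1].mean() - score[treated == 0].mean()

# IV estimate: effect of assignment on the outcome, scaled by the effect of
# assignment on take-up.
itt = score[assigned == 1].mean() - score[assigned == 0].mean()
take_up = treated[assigned == 1].mean() - treated[assigned == 0].mean()
iv = itt / take_up

print(f"naive difference: {naive:.2f}")
print(f"IV estimate:      {iv:.2f}")
```

In this simulated setting the naive comparison overstates the effect because motivated non-compliers are concentrated in the treated group, while the IV estimate recovers the effect for those who complied with their assignment.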
4. Discussion and Conclusions
The discussion above has outlined several methods for addressing the problem of omitted variable bias in financial aid research. While random assignment is undoubtedly the most methodologically sound and mathematically simple approach to reducing this bias, the ethical considerations and resources needed to carry out such evaluations make this method impractical in many situations. On the other hand, the Department of Education has made a strong effort to support large-scale random assignment evaluations in education in the past few years, making this approach more feasible in education research.
For smaller-scale projects, proxy variable, fixed effects, and difference-in-differences approaches are becoming quite common. Indeed, these approaches have replaced basic multivariate regression as the new standard for education research in the economics literature.
New applications and extensions of these approaches, such as triple differences, may yield high returns in financial aid research. The use of instrumental variables techniques, by contrast, has declined in recent years, since finding strong and valid instruments has proven to be an almost insurmountable obstacle. Even so, the approach has proved useful in the context of random assignment evaluations with incomplete compliance. Finally, regression discontinuity holds great promise for the future of non-experimental financial aid research, as the identifying assumptions are straightforward and the approach is relatively simple to implement.
Future research on the causal effects of financial aid should go further in applying these new methods. Often, the variables needed to apply these techniques already exist in commonly used and publicly-available data sets. In other cases, a bit of resourcefulness in data collection or manipulation is all that is required to identify and exploit a suitable proxy, discontinuity, or instrument. Researchers should also consider employing and comparing the results of multiple methods in order to assess the magnitude of omitted variable bias and the extent to which various methods can control for it. For example, comparing multivariate regression estimates with fixed effects estimates can identify the magnitude of omitted variable bias from factors that are constant within groups. Similarly, comparing fixed effects estimates with those derived from regression discontinuity methods can indicate the size of the bias from remaining unobservables that vary within groups.
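The first comparison suggested above can be sketched on simulated data. The example is hypothetical throughout: the group structure, the coefficient values, and the variable names are assumptions, and the fixed effects estimate is computed by demeaning each variable within groups.

```python
import numpy as np

rng = np.random.default_rng(3)
n_groups, n_per = 200, 50
group = np.repeat(np.arange(n_groups), n_per)         # e.g., a state or school identifier
group_effect = rng.normal(size=n_groups)[group]       # unobserved, constant within group

aid = 0.7 * group_effect + rng.normal(size=group.size)
true_effect = 0.5
enroll = true_effect * aid + 1.0 * group_effect + rng.normal(size=group.size)

def slope(x, y):
    """Bivariate regression slope of y on x (with an intercept)."""
    x = x - x.mean()
    y = y - y.mean()
    return (x @ y) / (x @ x)

def demean_by_group(v):
    """Subtract each group's mean from its observations."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

beta_ols = slope(aid, enroll)                                    # ignores group effects
beta_fe = slope(demean_by_group(aid), demean_by_group(enroll))   # within-group estimate
implied_bias = beta_ols - beta_fe

print(f"pooled regression: {beta_ols:.3f}")
print(f"fixed effects:     {beta_fe:.3f}")
print(f"implied bias:      {implied_bias:.3f}")
```

The gap between the two estimates reflects only the bias from factors that are constant within groups. Any unobservables that vary within groups would remain in both estimates.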
It is worth noting, however, that quantitative causal inference is by no means the only valid method of inquiry in financial aid or educational research generally. Indeed, descriptive quantitative and qualitative analyses are equally important in establishing patterns of correlation, developing theory, and directing our attention to areas where further research is warranted. They are also vitally important in understanding the nature of the forces that drive the causal effects of financial aid on college-going. These forces—whether they are economic, familial, psychological, social, cultural, or political—are essential determinants of college-going and financial aid receipt in their own right. In fact, these are the unobservable factors that create the problem of self-selection and omitted variable bias in the first place.
In this article I am concerned only with suggesting ways to mitigate or eliminate the effects of some or all of these forces, whether we can identify them or not, in order to make inferences about the independent effects of financial aid on enrollment. I argue simply that omitted variable bias must be explored and addressed before we make claims as to whether policies, programs, or treatment interventions cause an effect or are simply correlated with it; the distinction is paramount.
I assess and explore the strengths and weaknesses of some of the most common methodological solutions to the problem of omitted variable bias and provide practical guidance for researchers interested in implementing these approaches. Each question we ask and each data set we employ will present unique opportunities and challenges, requiring careful thought and a nuanced understanding of the issues before determining the most appropriate research strategy. And while the methods presented above all have disadvantages, each also holds promise for the future of causal inference in financial aid research. The possibilities are limited only by the creativity of the researcher.
References
Alon, S. (2005). Model Mis-Specification in Assessing the Impact of Financial Aid on Academic Outcomes. Research in Higher Education, 46(1), 109-125.
Angrist, J.D., Imbens, G.W., & Rubin, D.B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91(434), 444-455.
Angrist, J.D. & Krueger, A.B. (1991). Does Compulsory School Attendance Affect Schooling and Earnings? Quarterly Journal of Economics, 106(4), 979-1014.
Angrist, J.D. & Krueger, A.B. (1999). Empirical Strategies in Labor Economics. In O. Ashenfelter & D. Card (Eds.) Handbook of Labor Economics, Volume 3 (pp. 1277-1366). London: Elsevier Science.
Angrist, J.D. & Krueger, A.B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69-85.
Angrist, J.D. (1998). Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants. Econometrica, 66(2), 249-288.
Athey, S. & Imbens, G. (2006). Identification and Inference in Non-linear Difference-in-Differences Models. Econometrica, 74(2), 431-497.
Becker, G.S. (1964). Human Capital: A Theoretical and Empirical Analysis with Special Reference to Education. New York: Columbia University Press.
Becker, W.E. (2003). Omitted Variables and Sample Selection Problems in Studies of College-Going Decisions. Conference paper for presentation at the Mini-Conference on Evaluation Methods and Practices Appropriate for Faith-Based and Other Providers of Social Services. October 3, 2003.
Berliner, D.C. (2002). Educational Research: The Hardest Science of All. Educational Researcher, 31(8), 18-20.
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How Much Should We Trust Differences-in-Differences Estimates? Quarterly Journal of Economics, 119(1), 249-275.
Bettinger, E. (2004). How Financial Aid Affects Persistence. National Bureau of Economic Research Working Paper No. 10242. Cambridge, MA.
Bound, J., Jaeger, D.A., & Baker, R.M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak. Journal of the American Statistical Association, 90(430), 443-450.