Causal Inference and Omitted Variable Bias in Financial Aid Research: Assessing Solutions
Stephanie Riegg Cellini
George Washington University
School of Public Policy and Public Administration
805 21st Street, NW
Washington, DC 20052
Note: For the published version of this paper, please see and cite: Cellini, Stephanie Riegg, “Causal Inference and Omitted Variable Bias in Financial Aid Research: Assessing Solutions,” Review of Higher Education, 31(3), Spring 2008: 329-354.
I gratefully acknowledge comments and advice from Latika Chaudhary, Dylan Conger, Amaury Nora, Ed St. John, two anonymous referees, and participants in the Lumina Conference on Barriers to Financial Aid and Access. I thank Phillip Stegner for excellent research assistance.
Abstract

This article highlights the problem of omitted variable bias in research on the causal effect of financial aid on college-going. I first describe the problem of self-selection and the resulting bias from omitted variables. I then explore the strengths and weaknesses of random assignment, multivariate regression, proxy variables, fixed effects, difference-in-differences, regression discontinuity, and instrumental variables techniques in addressing the problem. I focus on the intuition, assumptions, and applications of each method in the context of the same research question, providing practical guidance for researchers interested in implementing these approaches.
1. Introduction

In the United States, the real cost of a college education has climbed almost 30 percent in the past ten years and shows no sign of stabilizing in the near future.[1] In this environment, student financial aid is perhaps the most important policy tool for maintaining and increasing access to postsecondary education among low- and middle-income students. With more than $13 billion spent on the federal Pell Grant program alone in 2004-05,[2] not to mention the many other federal, state, local, and private aid programs, an accurate assessment of the causal impact of financial aid on college-going is a crucial consideration for college administrators and policymakers at all levels of government.

[1] Author's tabulations of data from the U.S. Department of Education (2006a, Table 312).
[2] U.S. Department of Education (2006b).

Researchers in the social sciences have analyzed the price sensitivity of college students from many different angles using a wide variety of methods. This article addresses an important challenge in the quantitative literature on this topic: determining the causal effect of financial aid on student access to college. While simple ordinary least squares estimates of the impact of aid on college-going can reveal a correlation between financial aid policies and enrollment, these estimates are likely to suffer from omitted variable bias due to self-selection, potentially overestimating or underestimating the causal impact of these policies on enrollment.
I draw on recent quantitative studies of the impact of student financial aid on college-going to illustrate the problem of omitted variable bias and highlight the most promising solutions. In so doing, I seek to bridge the economics and education literatures. While economists have spent several decades developing and implementing solutions to the problem of omitted variable bias in education,[3] researchers in the field of higher education have only more recently begun to acknowledge the problem,[4] and few have adopted convincing alternative estimation strategies to address it.[5] In this article, I first discuss the problem of omitted variables and the bias they create. I then explain and assess the most common techniques used to control for this problem in the economics and education literature on financial aid: random assignment, multivariate regression, proxy variables, fixed effects, difference-in-differences, regression discontinuity, and instrumental variables. In contrast to traditional econometrics texts, my focus is on the intuition, assumptions, and applications of each method.[6] The approach is novel in that I examine each method in the context of the same question: Does financial aid increase college enrollment? While this strategy allows for straightforward comparisons between methods, it is worth noting that the same methods may be applied to any number of other research questions that involve similar problems of self-selection and omitted variable bias. I conclude with recommendations for future research, highlighting the methodological approaches that hold the most promise for overcoming omitted variable bias and more accurately measuring the causal impact of student financial aid on access to education.

[3] For a review of the economics literature, see Card (1999) or Angrist and Krueger (1999).
[4] See, for example, Alon (2005), Becker (2003), Curs and Singell (2002), DesJardins, Ahlburg, and McCall (2006), Dowd and Coury (2006), and Dowd (forthcoming).
[5] Though I draw on many studies of financial aid policy in both the economics and higher education literature, this article is not intended to be a comprehensive review of the literature. Rather, the goal is to use selected articles to illustrate salient features of the methods used to address omitted variable bias. For a more detailed review of this literature, see Dowd (forthcoming).
[6] For more detailed explanations of these methods, see, for example, Angrist and Krueger (1999), Becker (2003), Greene (2000), Meyer (1995), Shadish, Cook, and Campbell (2002), and Wooldridge (2002).
2. The Problem

Despite an extensive literature on the impact of financial aid and tuition on college enrollment, few studies can truly estimate the causal effect of a particular policy. The reason is what economists generally refer to as endogeneity: the idea that changes in the variable of interest come from within the system or model under study, rather than from outside factors. Two potential sources of endogeneity create bias in traditional linear ordinary least squares estimates of the causal impact of financial aid policy: simultaneity and omitted variable bias.
Simultaneity, or reverse causality, occurs when the independent variable of interest and the outcome are determined jointly, or at the same time, making the direction of causality unclear.
DesJardins, Ahlburg, and McCall (2006) and Curs and Singell (2002) provide excellent discussions of the problem of simultaneity in application, enrollment, and financial aid receipt.
This article addresses the second and equally important source of endogeneity in financial aid research—omitted variable bias.
The problem of omitted variable bias arises because states that implement certain financial aid policies and students who receive financial aid are typically self-selected. The idea of self-selection is that individuals, states, or other entities make conscious choices about whether or not to adopt a policy, apply for aid, go to college, and so forth. The basis for these decisions may be related to characteristics that we can observe, but it may also involve characteristics that we cannot observe, a problem economists refer to as unobserved heterogeneity.
In studies that use states as the unit of observation, whether or not a state adopts a particular financial aid policy is likely to be related to other policies and characteristics of that state, for example, how many low-income students are in the state or how many colleges are within its borders. Similarly, in studies that focus on students, variation in the receipt of financial aid will undoubtedly be driven by factors such as financial need, academic achievement, and the student’s knowledge of aid programs. In both cases, financial aid policy and receipt are considered endogenous variables because changes in these variables are driven by some of the same factors that also influence enrollment—the outcome of interest in the model.
In both cases, if any factors that are correlated with both financial aid and enrollment are left out of a model, simple cross-sectional ordinary least squares (OLS) estimates will be biased.
To look more closely at the first situation, where we take states as the unit of analysis, consider the following equation of interest:

$$Enroll_s = \alpha + \beta F_s + \varepsilon_s$$

where $Enroll_s$ is the number of students enrolled in four-year colleges in state $s$, and $F_s$ is a binary variable (also known as an indicator or dummy variable) that equals one if the state has a particular financial aid policy in place (e.g., a merit aid program) and zero otherwise.[7] In this oversimplified model, the error term $\varepsilon_s$ includes every other observable or unobservable factor that affects enrollment in the state, as well as a component of random noise.[8]

[7] For simplicity and without loss of generality, I use financial aid as the endogenous regressor (typically denoted X in textbooks) and enrollment as the outcome of interest (typically denoted Y) throughout.
[8] One could think of the error term as $\varepsilon_s = \lambda X_s + \eta_s$, where $X_s$ is a vector of all state characteristics affecting enrollment and $\eta_s$ is noise. However, for simplicity, I consider $\varepsilon_s$ alone in the examples that follow.
If one were to perform a univariate OLS regression (or simply calculate a correlation coefficient), the estimate of the impact of financial aid on enrollment would be determined as follows:

$$\hat{\beta} = \frac{\mathrm{cov}(Enroll_s, F_s)}{\mathrm{var}(F_s)} = \beta + \frac{\mathrm{cov}(\varepsilon_s, F_s)}{\mathrm{var}(F_s)}$$

If this last term, $\mathrm{cov}(\varepsilon_s, F_s)/\mathrm{var}(F_s)$, is not equal to zero (that is, if there is some correlation between the error term and the policy), the estimator will be biased. We will either over- or underestimate the effect of the policy, depending on the sign of the final term. Intuitively, if any other factors left out of this model influence both the adoption of the financial aid policy and enrollment in the state, our estimate of the impact of the policy will be biased.
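The bias term can be derived in two steps: substitute the state-level equation for $Enroll_s$ into the OLS formula, then use the linearity of covariance and the fact that the constant $\alpha$ does not covary with $F_s$. Because the OLS slope is itself a ratio of sample moments, the identity holds exactly in any sample as well as in the population:

$$
\begin{aligned}
\hat{\beta} &= \frac{\mathrm{cov}(Enroll_s, F_s)}{\mathrm{var}(F_s)}
 = \frac{\mathrm{cov}(\alpha + \beta F_s + \varepsilon_s,\ F_s)}{\mathrm{var}(F_s)} \\
&= \frac{\beta\,\mathrm{var}(F_s) + \mathrm{cov}(\varepsilon_s, F_s)}{\mathrm{var}(F_s)}
 = \beta + \frac{\mathrm{cov}(\varepsilon_s, F_s)}{\mathrm{var}(F_s)}.
\end{aligned}
$$

If, as an illustrative assumption, the error term is written as $\varepsilon_s = \lambda X_s + \eta_s$ with a single omitted characteristic $X_s$ and noise satisfying $\mathrm{cov}(\eta_s, F_s) = 0$, the bias takes the familiar omitted variable form

$$\hat{\beta} = \beta + \lambda\,\frac{\mathrm{cov}(X_s, F_s)}{\mathrm{var}(F_s)},$$

so the direction of the bias is the sign of $\lambda$ (the omitted factor's effect on enrollment) times the sign of $\mathrm{cov}(X_s, F_s)$ (its association with the policy).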
If, for example, states with low per-capita income have both low enrollment rates and strong financial aid policies, the omission of the per-capita income variable will make $\mathrm{cov}(\varepsilon_s, F_s)$ negative, causing us to underestimate the causal impact of financial aid on enrollment.
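The per-capita income example can be checked with a small simulation. This is a sketch under a hypothetical data-generating process, not an analysis of real data: income is standardized, poorer units are more likely to adopt the aid policy, income enters the error term, and enrollment is in arbitrary units. It uses many hypothetical units rather than 50 states only so that sampling noise does not mask the bias; the helper names (`cov`, `simulate_states`) and all coefficients are illustrative assumptions.

```python
import random

def cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def simulate_states(n=20_000, beta=2.0, seed=0):
    """Hypothetical DGP: low per-capita income makes adopting the aid policy F
    more likely AND lowers enrollment directly (income is omitted from the model)."""
    rng = random.Random(seed)
    policy, enroll, errors = [], [], []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)                          # standardized, unobserved by the analyst
        f = 1.0 if income + rng.gauss(0.0, 1.0) < 0 else 0.0  # poorer units tend to adopt F
        eps = 1.0 * income + rng.gauss(0.0, 1.0)              # income lives in the error term
        policy.append(f)
        enroll.append(10.0 + beta * f + eps)                  # Enroll = alpha + beta*F + eps
        errors.append(eps)
    return policy, enroll, errors

BETA = 2.0
policy, enroll, errors = simulate_states(beta=BETA)
naive = cov(enroll, policy) / cov(policy, policy)   # univariate OLS slope
bias = cov(errors, policy) / cov(policy, policy)    # cov(eps, F) / var(F)
print(f"true beta = {BETA}, OLS estimate = {naive:.2f}, beta + bias = {BETA + bias:.2f}")
```

Because the OLS slope equals $\beta$ plus $\mathrm{cov}(\varepsilon_s, F_s)/\mathrm{var}(F_s)$ exactly in the sample, the last two printed numbers coincide, and under these made-up parameters both should fall well below the true value of 2 even though the policy genuinely raises enrollment.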
An analogous argument can be made when studying individuals and their probability of enrolling when facing a certain financial aid offer. To see this in the student-level model, consider the following linear probability specification:[9]

$$Enroll_i = \alpha + \beta R_i + \varepsilon_i$$

where $i$ indexes individual students and $Enroll_i$ now represents the binary choice of a student to enroll in college ($Enroll_i$ equals zero if the student does not enroll and one if she does). $R_i$ equals one if the student received a particular form of financial aid (e.g., a Pell Grant) and zero otherwise.

[9] Probit and logit models are more prevalent in the literature, but I use a linear probability model for simplicity. See Becker (2003) for an excellent description of the implications of omitted variable bias in linear vs. non-linear models.

In this case, the factors in $\varepsilon_i$ that influence both a student's enrollment decision and her financial aid award, such as parental income, academic achievement in high school, and [...], will bias OLS estimates of $\beta$. To take a typical example, if students who receive need-based financial aid are less likely to enroll in college for any number of reasons that are not captured in this model (for example, lower than average income, low levels of parental support, or little knowledge of postsecondary options), $\hat{\beta}$ will underestimate the impact of financial aid on enrollment, potentially attributing false negative effects on enrollment to the aid program. A similar but opposite bias is likely to occur in the case of merit-based scholarships. If students who receive merit-based scholarships also tend to be more likely to enroll in college in the absence of the aid (due to [...]), $\hat{\beta}$ will overestimate the impact of the aid on enrollment, potentially finding an overly positive impact of aid, as these students would have been more likely to attend college even without it.
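The two directions of bias in the student-level model can be sketched the same way. In this hypothetical simulation, an unobserved "advantage" index (a stand-in for income, parental support, and knowledge of options) raises the probability of enrolling; need-based aid goes disproportionately to less advantaged students and merit aid to more advantaged ones, while the true effect of either award is set to 0.10. The function `naive_lpm_slope` and every parameter value are assumptions made for illustration only.

```python
import random

def naive_lpm_slope(aid_type, n=20_000, beta=0.10, seed=1):
    """Difference in enrollment rates between aid recipients and non-recipients,
    which equals the univariate OLS slope when R_i is binary.

    Hypothetical data: 'advantage' is unobserved (it sits in eps_i). Need-based
    aid goes mostly to low-advantage students, merit aid to high-advantage ones.
    """
    rng = random.Random(seed)
    sign = -1.0 if aid_type == "need" else 1.0
    with_aid, without_aid = [], []
    for _ in range(n):
        advantage = rng.gauss(0.0, 1.0)
        r = 1 if sign * (advantage + rng.gauss(0.0, 1.0)) > 0 else 0
        # Linear probability of enrolling; the true causal effect of aid is beta.
        p = min(1.0, max(0.0, 0.5 + 0.1 * advantage + beta * r))
        enrolled = 1 if rng.random() < p else 0
        (with_aid if r else without_aid).append(enrolled)
    return sum(with_aid) / len(with_aid) - sum(without_aid) / len(without_aid)

need_based = naive_lpm_slope("need")
merit_based = naive_lpm_slope("merit")
print(f"true effect 0.10 | need-based OLS {need_based:.3f} | merit-based OLS {merit_based:.3f}")
```

Under these made-up parameters, the need-based estimate should land well below 0.10 (near zero) and the merit-based estimate well above it, even though both awards have the same true effect.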
Though the examples are oversimplified—univariate OLS is rarely if ever used to draw causal inferences—they illustrate in the simplest terms the problem of omitted variable bias. In the sections that follow, I focus my attention on possible solutions to this problem.
3.1. Random Assignment

In the social sciences, random assignment has become the gold standard for establishing causality. In this approach, observations (in this case, states or students) are randomly assigned, through a process akin to a coin toss, to either the “treatment” or the “control” group. In the case of students, each member of the treatment group might receive a grant to attend the college of her choice, while members of the control group would not. Because financial aid receipt is then completely unrelated to any characteristics of the student (including characteristics related to her enrollment decision), $\mathrm{cov}(\varepsilon_i, R_i) = 0$ and the bias is eliminated from univariate OLS estimates. One need only compare the mean enrollment rates of the treatment and control groups to obtain an unbiased estimate of the impact of the grant, using a t-test to assess whether the difference is statistically significant. The main identifying assumption of this approach is that, given a large enough sample, there should be no mean differences in characteristics between the group selected to receive the grant and the group that was not. As a robustness check, one can also use t-tests to compare the means of all observable characteristics of the two groups before treatment to check that the identifying assumption is plausible.
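A minimal sketch of this analysis, assuming a hypothetical experiment in which each student's grant is assigned by a simulated coin toss and enrollment depends on an illustrative pre-treatment covariate (family income) plus a true grant effect of 0.10. Welch's unequal-variance t statistic is used here as one common form of the two-sample t-test; `welch_t`, `run_experiment`, and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for mean(a) - mean(b)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def run_experiment(n=10_000, beta=0.10, seed=2):
    rng = random.Random(seed)
    treat_y, ctrl_y, treat_x, ctrl_x = [], [], [], []
    for _ in range(n):
        income = rng.gauss(0.0, 1.0)      # hypothetical pre-treatment characteristic
        grant = rng.random() < 0.5        # coin toss, independent of income
        p = min(1.0, max(0.0, 0.5 + 0.1 * income + beta * grant))
        enroll = 1 if rng.random() < p else 0
        if grant:
            treat_y.append(enroll)
            treat_x.append(income)
        else:
            ctrl_y.append(enroll)
            ctrl_x.append(income)
    effect = mean(treat_y) - mean(ctrl_y)   # difference in mean enrollment rates
    return effect, welch_t(treat_y, ctrl_y), welch_t(treat_x, ctrl_x)

effect, t_effect, t_balance = run_experiment()
print(f"estimated effect {effect:.3f} (t = {t_effect:.1f}); balance t on income = {t_balance:.2f}")
```

The difference in means estimates the grant effect directly; the second t statistic is the pre-treatment balance check, which should typically be small in absolute value when assignment is truly random.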
This last point suggests that random assignment may not be feasible when looking at state-level financial aid policy. With just 50 states, or 25 in each experimental group, the sample size would likely be too small to ensure that there were no mean differences between the two groups. Random assignment may be possible with smaller geographic areas if, for example, certain counties were assigned to a particular financial aid policy while others were not. Random assignment of colleges would be another potentially fruitful approach for financial aid research.
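How much imbalance chance alone produces in a 25/25 split can be gauged by simulation. This sketch assumes a single standardized characteristic drawn independently for each unit and reports the average absolute treatment-control difference in that characteristic over repeated random assignments, for 50 units and for a hypothetical 3,000 smaller units (a rough stand-in for counties). The function `avg_abs_imbalance` and its parameters are illustrative assumptions.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def avg_abs_imbalance(n_units, reps=500, seed=3):
    """Average |mean(X | treated) - mean(X | control)| over repeated random
    50/50 assignments of n_units, where X is a hypothetical standardized
    characteristic (e.g. per-capita income) drawn independently for each unit."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        rng.shuffle(x)  # random assignment: the first half is "treated"
        half = n_units // 2
        gaps.append(abs(mean(x[:half]) - mean(x[half:])))
    return mean(gaps)

small = avg_abs_imbalance(50)      # 50 states, 25 per group
large = avg_abs_imbalance(3_000)   # hypothetical 3,000 smaller units
print(f"50 units: {small:.3f} SD   3,000 units: {large:.3f} SD")
```

Under these assumptions the typical gap with 50 units is around 0.2 standard deviations, several times larger than with 3,000 units, because the standard error of a difference in means shrinks with the square root of group size.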