Stephanie Riegg Cellini
George Washington University
School of Public Policy and Public Administration
805 21st Street, NW
Washington, DC 20052
(202) ...
When using proxy variables, as in the case of more straightforward multivariate regression analysis, it is a good idea to add proxies sequentially and compare coefficients on the variable of interest. Since reasonable proxies reduce omitted variable bias, if estimates do not change with the addition of these variables, it is plausible to believe that any remaining unobservables also have little effect on the estimates. Neumark and Rothstein’s (2003) research on school-to-work programs has made extensive use of this method, and in financial aid research, St. John (1990) includes test scores as proxies for ability and postsecondary plans as proxies for aspirations. In another example from the higher education literature, Seneca and Taussig (1987) use faculty compensation as a proxy for academic prestige of a university.
The main constraint with this method is finding data sets that contain a rich array of potential proxies and convincing readers that the proxied variables are the only unobservables of concern. Moreover, exogeneity of the proxy variable is again important to consider, as the proxies must be determined by factors unrelated to the student’s enrollment and financial aid choices to avoid introducing additional bias. Often this can be accomplished by using variables measured in years prior to the choice, such as test scores or aspirations reported by students in middle or high school (as in St. John 1990), before a student has the option to enroll in college. 16
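The sequential-proxy check described above can be sketched on simulated data. Everything in this sketch is hypothetical (the variable names, coefficients, and sample size are invented for illustration), and plain least squares in NumPy stands in for a full regression package:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data: "ability" is unobserved; a pre-enrollment test score proxies for it.
ability = rng.normal(size=n)                       # unobservable
aid = 0.5 * ability + rng.normal(size=n)           # aid offers correlated with ability
test_score = ability + 0.3 * rng.normal(size=n)    # noisy proxy measured before enrollment
enroll = 1.0 * aid + 2.0 * ability + rng.normal(size=n)  # true aid effect is 1.0

def ols_coef(y, cols):
    """OLS via least squares; returns the coefficient on the first regressor."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

b_naive = ols_coef(enroll, [aid])              # omits ability: biased upward
b_proxy = ols_coef(enroll, [aid, test_score])  # the proxy absorbs most of the bias
print(round(b_naive, 2), round(b_proxy, 2))
```

The point of the exercise is the comparison: adding the proxy moves the aid coefficient sharply toward the truth, which suggests the omitted variable mattered; if the coefficient had barely changed, one might argue that remaining unobservables are similarly unimportant.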
3.4. Fixed Effects

Fixed effects methods hold considerable promise in research on financial aid, and many recent studies in both the education and economics literatures have implemented this approach (Ehrenberg, Zhang, and Levin 2006; Card and Lemieux 2000; Heller 1999; Kane 1994, 1995, 2004). The essential element for fixed effects and difference-in-differences estimates (described in the next section) is access to longitudinal or panel data—or multiple observations for each unit of analysis. Most often, this type of data tracks cross-sectional units (in our case states, students, or institutions) over time. Examples include High School and Beyond, the Beginning Postsecondary Students study, the Integrated Postsecondary Education Data System, and the National Longitudinal Survey of Youth. But fixed effects methods can also be employed as long as there are multiple observations within larger units such as states, colleges, or families—making this method feasible even with cross-sectional data sets such as the National Postsecondary Student Aid Study or the Current Population Survey.
While basic OLS estimation compares different states or students to each other, the fixed effects approach essentially compares a unit of analysis to itself. The model is identified by changes or differences within units. For this reason, the fixed effects estimator is sometimes referred to as a within estimator. Because the estimates are derived from variation within units, the approach eliminates bias from any characteristics—either observed or unobserved—that are fixed within units.
16. For more details on proxy variables, see Stahlecker and Trenkler (1993) and Wooldridge (2002).
In the most typical example, if we have data on each state for several years, fixed effects estimation will control for all time-invariant characteristics of the state—both unobservable and observable—that might bias cross-sectional OLS estimates. These include many, but not all of the characteristics that make Wisconsin different from California, such as the size and structure of the public university system or state high school graduation requirements. However, time-varying unobservables may remain, an issue I return to in the following sections.
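The within estimator can be sketched directly: demean each variable by state and run OLS on the deviations. The data below are simulated and the coefficients are hypothetical; the point is that the state means, and with them every time-invariant state characteristic, drop out of the transformed regression:

```python
import numpy as np

rng = np.random.default_rng(5)
S, T = 50, 10                          # hypothetical panel: 50 states over 10 years
state = np.repeat(np.arange(S), T)

state_effect = rng.normal(size=S)[state]            # fixed features of a state's system
aid = 0.8 * state_effect + rng.normal(size=S * T)   # aid correlated with state traits
enroll = 1.5 * aid + state_effect + rng.normal(size=S * T)  # true aid effect is 1.5

def demean_by(v, groups):
    """Subtract each group's mean: the 'within' transformation."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

aid_w = demean_by(aid, state)
enroll_w = demean_by(enroll, state)

# OLS on within-state deviations; the state effect has been differenced away.
b_within = (aid_w @ enroll_w) / (aid_w @ aid_w)
print(round(b_within, 2))
```

Because the state means are removed before estimation, the correlation between aid and the time-invariant state effect no longer biases the coefficient.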
To implement the fixed effects approach, one simply adds dummy variables for each state (or individual) to the model as follows:

E_st = β0 + β1 F_st + X_st γ + d_s + ε_st   (8)

where the vector of state dummy variables is denoted d_s. In this case, one can also add time fixed effects, or dummy variables for each year, denoted d_t, as in equation (9) below:

E_st = β0 + β1 F_st + X_st γ + d_s + d_t + ε_st   (9)
These time or year fixed effects will absorb any time trends that are common to all states—for example, inflation or changes in federal laws that affect all states the same way.
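The dummy-variable implementation can be sketched as follows, again on simulated data with invented coefficients. The block estimates the model twice, once by pooled OLS and once with state and year dummies, to show the bias that the fixed effects remove:

```python
import numpy as np

rng = np.random.default_rng(1)
S, T = 50, 10                          # hypothetical panel: 50 states over 10 years
state = np.repeat(np.arange(S), T)
year = np.tile(np.arange(T), S)

state_effect = rng.normal(size=S)[state]            # time-invariant state unobservable
aid = 0.8 * state_effect + rng.normal(size=S * T)   # aid correlated with state traits
enroll = 1.5 * aid + state_effect + 0.2 * year + rng.normal(size=S * T)

def ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

n = S * T
# Pooled OLS omits the state effect and overstates the aid coefficient.
b_naive = ols(enroll, np.column_stack([np.ones(n), aid]))[1]

# Fixed effects: dummies for each state and each year (one of each dropped).
D_state = (state[:, None] == np.arange(1, S)).astype(float)
D_year = (year[:, None] == np.arange(1, T)).astype(float)
b_fe = ols(enroll, np.column_stack([np.ones(n), aid, D_state, D_year]))[1]
print(round(b_naive, 2), round(b_fe, 2))
```

The state dummies absorb the time-invariant state characteristics and the year dummies absorb the common trend, so the fixed effects coefficient lands near the true value of 1.5 while the pooled estimate does not.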
In a good example of this approach, Heller (1999) implements a fixed effects model to look at the impact of tuition and state need-based grants on enrollment rates. In addition to state-level fixed effects, his analysis also includes year fixed effects interacted with regional fixed effects. That is, he creates a dummy variable for each region of the country (e.g., South, West) and multiplies that by an indicator for the year to create new dummy variables. This allows any trends over time to vary by region.
Because fixed effects estimates rely on variation within units, this approach only uses units that experience changes or differences in the variable of interest—all others are dropped from the analysis. If, for example, F_st represents the value of state need-based grants to students in a particular state and year, as in Heller's work, states that offer students the same amount every year will contribute nothing to the estimate, possibly reducing the effective sample size. F_st must vary over time within states for its coefficient to be identified. Moreover, any control variables in X_st that do not vary over time within units will also be unidentified; by eliminating time-invariant unobservables, fixed effects also eliminate the time-invariant observables. This is one reason why Heller includes very few control variables in his model. In fact, the only variable he adds is the unemployment rate, since this is one of the few that varies over time within a state.
Again, in order to establish causality, it is important to ensure that the variation over time in the variable of interest is exogenous. That is, fluctuations in the amount of financial aid offered to students each year could be determined by the idiosyncratic state budgetary process or by a formula based on national data, but they should not be determined by factors correlated with enrollment in the state.
Finally, it is worth returning to an earlier point—that fixed effects can also be used at other levels of aggregation, depending on data availability. For example, rather than using state or individual fixed effects, Ehrenberg, Zhang, and Levin (2006) use college-level fixed effects to look at the impact of institution-level National Merit Scholarships on the enrollment of low-income students. Generally, the more precise or homogeneous the groups are in fixed effects estimation, the less omitted variable bias will remain in the estimates. Using the state policy example, state-level fixed effects are more meaningful than region fixed effects, since states in the same region of the country can have very different systems of higher education. Similarly, in studies of individuals, family-level fixed effects are more effective at reducing omitted variable bias than institution-level fixed effects, since children in the same family share more characteristics than students in the same school. 17 Individual-level fixed effects would theoretically provide the best control for omitted variable bias, but this is rarely possible, since one would need to observe the same individual's enrollment response both with and without an offer of financial aid to identify effects. Because of this, fixed effects methods are typically considered only a partial correction for omitted variable bias. 18
3.5. Difference-in-Differences

The difference-in-differences approach goes one step further than the fixed effects approach by adding an additional level of variation. In financial aid research, this approach typically uses a binary variable to represent a policy change. The variable F_s again takes on a value of zero for states that do not implement a given policy and one if they do, but it is now multiplied by T_t, an indicator for the time period that takes on a value of zero in years before the policy change and a one after. The model is estimated as follows:

E_st = β0 + β1 (F_s × T_t) + β2 F_s + β3 T_t + X_st γ + ε_st
This approach introduces two levels of variation. The "first difference" is the change within states over time. The "second difference" is across states, between those that adopted the policy and those that did not. That is, the model compares the within-state changes over time of states that adopted the policy with those of states that did not.
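The two differences can be computed by hand and recovered from a regression with an interaction term. The numbers below are simulated (a hypothetical policy effect of 4 on top of a common trend of 3), not taken from any study:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30                                   # hypothetical number of states per group
effect = 4.0                             # true policy effect (assumed for the simulation)

pre_t = 50 + rng.normal(0, 2, n)                      # adopting states, before
post_t = pre_t + 3.0 + effect + rng.normal(0, 1, n)   # common trend of +3, plus the policy
pre_c = 48 + rng.normal(0, 2, n)                      # non-adopting states, before
post_c = pre_c + 3.0 + rng.normal(0, 1, n)            # common trend only

# "First difference" within states over time, "second difference" across adopters:
did = (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())

# The same estimate from a regression with a treated-by-post interaction:
y = np.concatenate([pre_t, post_t, pre_c, post_c])
treated = np.concatenate([np.ones(2 * n), np.zeros(2 * n)])
post = np.tile(np.concatenate([np.zeros(n), np.ones(n)]), 2)
X = np.column_stack([np.ones(4 * n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(did, 2), round(beta[3], 2))
```

The common trend of 3 and the level difference between the two groups both cancel, leaving the interaction coefficient as the policy effect; the by-hand and regression versions agree exactly in this saturated specification.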
It is worth noting that the difference-in-differences approach and the regression discontinuity approach (described in the next section) rely on "natural experiments" or "quasi-experiments" for identification. In the absence of a true random assignment experiment, these two approaches exploit seemingly random variation that occurs in non-experimental (or observational) settings, such as in nature, government policy, or institutional design. In the case of difference-in-differences, the model above mimics random assignment by comparing enrollment in "treatment" states before and after the policy change to changes in enrollment over the same time period in the "control" states that did not implement the policy. The key identifying assumption is that the policy change is exogenous—passage of the policy must not be related to enrollment in the state. Typically, this cannot be proven directly but must be argued by the researcher. If the policy or law is determined at the federal level, this assumption is typically uncontroversial, but it becomes less plausible at the local level.

17. See Cellini (2006) and Currie and Thomas (1995) for examples of family fixed effects strategies in education.

18. For more technical details on fixed effects methods, see Chamberlain (1980, 1984) and Wooldridge (2002, 2005).
Another important consideration is that the difference-in-differences approach introduces serial correlation. That is, the observations for each person or state are correlated with the other observations for that state in the previous year. To take an example from Cornwell, Mustard, and Sridhar (2006), the state of Georgia implemented a merit-based financial aid program in 1993. It would therefore be assigned a zero in 1990, 1991, and 1992 before introducing the program, and a one in 1993, 1994, and 1995. If, as we suspect, enrollment goes up in 1993 and stays high in the following years in response, the errors will be correlated over time. Fortunately, there are several solutions to this problem, the simplest of which is clustering standard errors. 19

Research on financial aid has made extensive use of the difference-in-differences approach (Conley and Taber 2005; Cornwell, Mustard, and Sridhar 2006; Dynarski 2000, 2003; Kane 2003; Linsenmeier, Rosen, and Rouse 2003), and there are likely to be many more creative applications of this strategy in the coming years. A promising new direction for difference-in-differences is to add additional levels of variation for a difference-in-difference-in-differences (or more) estimator. This could be achieved in financial aid research by identifying groups of states or students who might be more strongly affected by a policy than others, such as low-income students or states with high poverty rates. The difference-in-differences and triple-differences approaches are fairly straightforward to implement and, as in the case of fixed effects, can control for all time-invariant unobservables, though time-varying unobservables may remain.

19. For more details on difference-in-differences methods and solutions to this problem, see Athey and Imbens (2006); Bertrand, Duflo, and Mullainathan (2004); Conley and Taber (2005); Lee and Kang (2006); and Meyer (1995).
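Clustering standard errors can be sketched with the usual sandwich (cluster-robust) formula, summing the score contributions within each state. The simulation below is hypothetical: a regressor that is constant within state, paired with errors that persist within state, which is exactly the configuration where conventional standard errors are too small:

```python
import numpy as np

rng = np.random.default_rng(3)
S, T = 40, 8                            # hypothetical panel: 40 states, 8 years
state = np.repeat(np.arange(S), T)

# Policy-style regressor constant within state; errors persist within state.
x = rng.normal(size=S)[state]
u = rng.normal(size=S)[state] + 0.5 * rng.normal(size=S * T)
y = 1.0 * x + u

X = np.column_stack([np.ones(S * T), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ (X.T @ y)
resid = y - X @ beta

# Conventional iid standard errors ignore the within-state correlation.
sigma2 = resid @ resid / (S * T - X.shape[1])
se_iid = np.sqrt(np.diag(sigma2 * XtX_inv))

# Cluster-robust sandwich: sum score outer products cluster by cluster.
meat = np.zeros((2, 2))
for s in range(S):
    Xg, ug = X[state == s], resid[state == s]
    g = Xg.T @ ug
    meat += np.outer(g, g)
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print(round(se_iid[1], 3), round(se_cluster[1], 3))
```

In this setup the clustered standard error is several times larger than the conventional one, reflecting the fact that the effective sample size is closer to the number of states than to the number of state-year observations.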
3.6. Regression Discontinuity

In this quasi-experimental approach, identification rests on an exogenous discontinuity, or cutoff. The key assumption is that students' or states' unobservable characteristics vary smoothly across this cutoff, while the cutoff creates a sharp difference in one variable that can be used to identify the causal effect of a program or policy. 20 Interestingly, this approach was first developed for use in models of financial aid and college access by Thistlethwaite and Campbell (1960) but has only recently become widespread. Kane (2003), Van der Klaauw (2002), and Bettinger (2004) all draw on this strategy to identify the effects of financial aid policies.
To take an example from Kane (2003), California's CalGrant financial aid program used a GPA cutoff of 3.15 to determine which students would be eligible for grants. Students just below the cutoff did not receive grants, but students just above it did. Within a narrow range, Kane argues, these students are likely to be quite similar—there is no reason to believe that students with 3.14 GPAs are systematically different (on observable or unobservable dimensions) from those with 3.15 GPAs who received grants. The only mean difference should be their grant receipt. The causal impact of the CalGrant program can then be identified by the difference in the outcomes of those students who just barely made the cutoff versus those who did not.

20. In the discussion that follows, I focus on "sharp" regression discontinuity designs, where the cutoff is deterministic. However, so-called "fuzzy" regression discontinuity designs are also feasible in cases of incomplete compliance.
Institutionally derived, arbitrary, or formulaic cutoffs are good candidates for regression discontinuity research designs, as are physical or geographic borders or boundaries. While physical boundaries have not yet been used in financial aid policy research, elementary school district boundaries have been widely used to study the effects of school quality on housing prices. 21

There are two ways to implement the regression discontinuity approach. The first is simply to limit one's sample to observations very close to the discontinuity on either side (e.g., if the cutoff is 3.15, then limit the sample to students with GPAs between 3.12 and 3.17). The equation would look like equation (6), where R_i represents a dummy variable that equals one if the student made the GPA cutoff and therefore received the grant.
Alternatively, if the sample near the cutoff is relatively small, one can use the entire sample of students, but the specification must then include a polynomial in GPA to account for the smoothness of the underlying function, as follows:

E_i = β0 + β1 R_i + f(GPA_i) + X_i γ + ε_i

where f(·) denotes a polynomial in GPA.
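Both implementations can be sketched on simulated data. The setup below borrows only the 3.15 cutoff from the CalGrant example; the outcome, sample size, and true effect of 5 are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
gpa = rng.uniform(2.0, 4.0, n)
grant = (gpa >= 3.15).astype(float)     # sharp eligibility cutoff
true_effect = 5.0
# The outcome varies smoothly in GPA; the grant shifts it discontinuously at 3.15.
enroll = 20 + 8 * gpa + true_effect * grant + rng.normal(0, 3, n)

# Approach 1: compare means in a narrow window on either side of the cutoff.
window = np.abs(gpa - 3.15) < 0.05
rd_window = (enroll[window & (grant == 1)].mean()
             - enroll[window & (grant == 0)].mean())

# Approach 2: full sample, with a polynomial in GPA plus the eligibility dummy.
g = gpa - 3.15                           # center the running variable at the cutoff
X = np.column_stack([np.ones(n), grant, g, g**2, g**3])
beta, *_ = np.linalg.lstsq(X, enroll, rcond=None)
rd_poly = beta[1]
print(round(rd_window, 2), round(rd_poly, 2))
```

The window estimate trades sample size for comparability, and still carries a small bias from the slope in GPA inside the window; the polynomial specification uses every observation and absorbs the smooth part of the relationship, isolating the jump at the cutoff.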