IDENTIFICATION AND ESTIMATION OF CAUSAL EFFECTS WITH TIME-VARYING TREATMENTS AND TIME-VARYING OUTCOMES*



  • Earlier versions of this paper were presented at the 2006 Annual Meeting of the Robert Wood Johnson Health & Society Scholars Program and the 2005 Winter Conference of the American Sociological Association Methodology Section. We thank Ross Stolzenberg for serving as the editor for this manuscript. We also thank David Harding and two anonymous reviewers for helpful comments and suggestions. Brand received support from the Robert Wood Johnson Foundation Health & Society Scholars at the University of Michigan and the Carolina Population Center NICHD training grant at the University of North Carolina–Chapel Hill. This research uses data from the Wisconsin Longitudinal Study (WLS) of the University of Wisconsin–Madison. Since 1991, the WLS has been supported principally by the National Institute on Aging (AG-9775 and AG-21079), with additional support from the Vilas Estate Trust, the National Science Foundation, the Spencer Foundation, and the Graduate School of the University of Wisconsin–Madison. A public use file of data from the Wisconsin Longitudinal Study is available from the Data and Program Library Service, University of Wisconsin–Madison, 1180 Observatory Drive, Madison, Wisconsin 53706 and at http://dpls.dacc.wisc.edu/WLS/wlsarch.htm. The ideas expressed herein are those of the authors. Direct all correspondence to Jennie E. Brand, University of North Carolina–Chapel Hill, Carolina Population Center, 123 West Franklin Street, Chapel Hill, NC 27514; email: jebrand@email.unc.edu.

Abstract

We develop an approach to identifying and estimating causal effects in longitudinal settings with time-varying treatments and time-varying outcomes. The classic potential outcome approach to causal inference generally involves two time periods: units of analysis are exposed to one of two possible values of the causal variable, treatment or control, at a given point in time, and values for an outcome are assessed some time subsequent to exposure. In this paper, we develop a potential outcome approach for longitudinal situations in which both exposure to treatment and the effects of treatment are time-varying. In this longitudinal setting, the research interest centers not on just two potential outcomes but on a whole matrix of potential outcomes, requiring a complicated conceptualization of many potential counterfactuals. Motivated by sociological applications, we develop a simplification scheme: a weighted composite causal effect that allows identification and estimation of effects with a number of possible solutions. Our approach is illustrated via an analysis of the effects of disability on subsequent employment status using panel data from the Wisconsin Longitudinal Study.

Despite the ongoing philosophical debate regarding whether any relationship can be deemed causal, a significant share of quantitative research in sociology attempts to establish causal effects. Regression coefficients, while often not explicitly termed causal effects, are generally interpreted as indicating how much the dependent variable would increase or decrease under an intervention in which the value of a particular independent variable is changed by one unit while the values of the other independent variables are held constant (Blalock 1961:17). Even a properly specified regression model, however, does not justify interpreting a coefficient as a causal effect rather than a partial association unless explicit attention is paid to the conditions under which estimates should or should not be interpreted as causal effects. Freedman (1987), for example, offers a sharp criticism of the regression approach as commonly practiced in sociology.

All statements about causality can be understood as counterfactual statements (Lewis 1973). The potential outcome approach to causal inference extends the conceptual apparatus of randomized experiments to the analysis of nonexperimental data, with the goal of explicitly estimating causal effects of particular “treatments” of interest. This approach has early roots in experimental designs (Neyman 1935) and economic theory (Roy 1951), but it has been extended and formalized for observational studies in statistics (Holland 1986; Rosenbaum and Rubin 1984, 1983; Rubin 1974) and in economics (Heckman 2005; Heckman, Ichimura, and Todd 1997, 1998; Manski 1995). The potential outcome approach has recently gained attention in sociological research (Brand and Halaby 2006; Harding 2003; Winship and Morgan 1999; Winship and Sobel 2004).

According to the potential outcome causal model, a “treatment” is defined as an intervention that can, at least in principle, be given to or withheld from a unit under study. Given n units (i = 1, …, n), each unit has a response that would have been observed had it received the treatment, $y_i^t$, and a response that would have been observed had it received the control, $y_i^c$. The effect caused by the treatment in place of the control is a comparison of $y_i^t$ and $y_i^c$. If both $y_i^t$ and $y_i^c$ could be observed for each unit, the causal effect could be calculated directly. However, each unit receives only one treatment, so only $y_i^t$ or $y_i^c$ is observed for each unit. The estimation of a causal effect therefore requires an inference about the response that would have been observed for a unit under a treatment condition it did not actually receive. Moreover, the existing literature on causal inference invokes the stable unit treatment value assumption (SUTVA) (Rubin 1978), which means that the potential outcomes for one unit are unaffected by the assignment mechanism and the assignment conditions of other units. It is as if potential outcomes were fixed attributes of the unit, with the observed assignment condition merely revealing one of them to the researcher.

In the classic potential outcome approach, units of analysis are exposed to one of two possible values of the causal variable, treatment or control, at a given point in time, and values for an outcome are assessed some time subsequent to exposure.¹ There is no time variation implicated in this setup, beyond the fact that the outcome is measured after exposure to the treatment. Robins and his associates (e.g., Robins, Hernan, and Brumback 2000) have extended the potential outcome approach to the time-varying case. Their emphasis is on correcting biases in epidemiological research that arise from endogenous time-varying covariates.

In this paper, we utilize the conceptual apparatus of the potential outcome framework, with its explicit attention to the comparisons needed in order to make causal claims. However, we examine a more general framework for longitudinal studies and consider the analysis of causal effects in which both exposure to treatment and the effects of treatment are time-varying. In this generalized setup, treatment of a unit can potentially take place at any point in time, and the effect of treatment on an outcome can vary over time subsequent to treatment. We restrict our attention to the situation where treatment is dichotomous (yes or no), nonrepeatable, and nonreversible.² That is, a unit can receive a treatment only once, and the treatment status stays “on” once a unit receives a treatment. Another way to visualize this is to imagine that each unit carries an indicator of being treated or not over time. The indicator can be turned “on” but not “off” once it is turned on. We are interested in the causal effects of whether and when the indicator is turned on.

Our restriction to nonrepeatable and nonreversible treatments in this paper makes our case qualitatively different from situations in which fixed-effects models are applied to longitudinal data. Fixed-effects models are powerful statistical tools for causal inference because they control for unobserved but time-invariant characteristics that may be confounders affecting both the causal variable and the outcome variable in observational studies (Allison 1994; Allison and Christakis 2006; Angrist and Krueger 2000; Winship and Morgan 1999). However, fixed-effects models capitalize on the possibility that a treatment condition can be reversed. For a dichotomous treatment, a fixed-effects model effectively uses information only from units that change treatment status over time, that is, by comparing units that switch the treatment indicator from “on” to “off” with those that switch it from “off” to “on.” As shown by Chamberlain (1984), the comparison of these two-way transitions in longitudinal data affords the researcher particular leverage with which to net out unobserved but fixed attributes (see also Powers and Xie 2000, chap. 5). Since our setup does not permit units to transition from the “on” state to the “off” state, our conceptual framework is incongruent with the fixed-effects model.³

Even for this restricted case, we need to consider a matrix of potential outcomes. Consequently, the causal framework for this setting requires a complicated conceptualization of many potential counterfactuals. As we show below, consideration of time-varying treatments and time-varying outcomes gives rise to a large number of possible contrasts for potential outcome comparisons. Indeed, the number of such contrasts can become unmanageably large with even a moderate number of time points. Motivated by substantive considerations in sociological research, we propose a simplifying solution for the analysis of causal effects with time-varying treatments and time-varying outcomes.

The rest of the paper is organized as follows: (1) We provide notation for individual-level treatment effects in four scenarios: (a) classic potential outcome setup with two periods, (b) single-time treatment and time-varying outcomes, (c) time-varying treatments and single-time outcome, and (d) time-varying treatments and time-varying outcomes. (2) We define population-level mean treatment effects, including estimation under ignorability and comparison units utilized in the aforementioned settings. (3) We develop a composite causal effect, in which we decompose the expected value of the outcome for the comparison units with a “forward-looking sequential” approach. This approach involves a weighted combination of comparison units where the weights correspond to when the units are treated or not treated in the observation period. (4) We illustrate our approach with an empirical example demonstrating the causal effect of disability on unemployment using panel data from the Wisconsin Longitudinal Study (WLS). (5) We discuss a few possibilities of parametric modeling and nonparametric smoothing strategies. (6) We end the paper with concluding remarks.

1. NOTATION FOR INDIVIDUAL-LEVEL TREATMENT EFFECTS

The occurrence of a life event, such as disability, can be conceptualized as a “treatment” for which we wish to establish an effect.⁴ The estimation of a treatment effect on an outcome (such as unemployment) hinges on a counterfactual; that is, inferences must be made about an outcome that would have been observed for a treated unit had that unit not been treated. The potential outcome approach formalizes this counterfactual view of causal inference and explicitly recognizes that each observational unit can be conceptualized as potentially having different values of the dependent variable that correspond to different conditions of the causal variable (Rosenbaum and Rubin 1983; Rubin 1974). Below, we develop notation for four different scenarios.

1.1. Classical Two-Period Setup

We first consider the conventional case where an effect is evaluated without attention to the timing of the treatment, beyond the fact that the outcome is measured subsequent to the occurrence of the treatment. Let y be an outcome, and let d be a variable scored d = 1 for a treated unit and d > 1 for a unit that was not treated. The conventional notation is to let d = 0 for a control unit; however, letting d > 1 will prove useful as we develop the more general, time-varying case. Letting d > 1 also makes substantive sense: we know only that a unit was not treated in this study, not that it was never treated. Let $y_i^s$ be the potential values of the outcome variable for unit i, with superscript s representing treatment status with two possibilities: d = 1 or d > 1. That is, $y_i^{d=1}$ is the outcome value if i is treated, and $y_i^{d>1}$ is the outcome value if i is not treated. Note that the notations $y_i^{d=1}$ and $y_i^{d>1}$ correspond to the more commonly used notations $y_i^t$ and $y_i^c$ (Winship and Morgan 1999), which we also used earlier in the paper.

For unit i, the treatment effect is defined as the difference between the two potential outcomes in the treatment and control states:

$$\Delta_i = y_i^{d=1} - y_i^{d>1} \tag{1}$$

Of the two potential outcomes, however, only one is actually observed, depending on the treatment that unit i actually receives. For example, for a person who is treated, $y_i^{d=1}$ is observed, while the value that would have been observed had that person not been treated, $y_i^{d>1}$, is unobserved. Similarly, for a person who was not treated, $y_i^{d>1}$ is observed but $y_i^{d=1}$ is not.
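The following Python sketch, using invented numbers for four hypothetical units, shows why equation (1) cannot be computed from real data: a simulation can write down both potential outcomes, but an observer sees only the one that matches each unit's actual treatment status. The names `y_treated`, `y_untreated`, and `treated` are illustrative and are not part of the original analysis.

```python
# Hypothetical potential outcomes for four units (illustrative numbers only).
# y_treated[i] plays the role of y_i^{d=1}; y_untreated[i] plays y_i^{d>1}.
y_treated = [3.0, 5.0, 2.0, 4.0]
y_untreated = [2.0, 5.0, 1.0, 1.0]
treated = [True, False, True, False]  # actual status: d = 1 (True) or d > 1 (False)

# Individual-level effects, equation (1): computable here only because we
# invented both columns of potential outcomes.
effects = [yt - yu for yt, yu in zip(y_treated, y_untreated)]

# What a researcher actually sees: one potential outcome per unit, the other missing.
observed = [
    (yt, None) if d else (None, yu)
    for yt, yu, d in zip(y_treated, y_untreated, treated)
]

print(effects)   # [1.0, 0.0, 1.0, 3.0]
print(observed)  # [(3.0, None), (None, 5.0), (2.0, None), (None, 1.0)]
```

Every pair in `observed` contains a `None`, which is the missing-data structure that the estimation strategies in Section 2 must work around.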

Let us now examine the time component of this conventional potential outcome framework: a unit is assigned to treatment or control at a given point in time (period 1), and values for an outcome are assessed at some fixed time subsequent to the assignment, say the end of period 1. This conventional two-period case is depicted in Table 1, which cross-classifies the treatment period and the outcome measurement period. There is no time variation implicated in this setup, beyond the fact that the outcome is measured after treatment assignment (although we also refer to the time of outcome measurement as period 1).

Table 1. 
Time-Invariant Treatment, Time-Invariant Outcome: Two Potential Outcomes

1.2. Single-Time Treatment and Time-Varying Outcomes

We can easily generalize this two-period setup into one in which the treatment condition is introduced only at one time (period 1), but outcomes are assessed at multiple subsequent time points. For example, we might wish to know the effect of a parental divorce on a child's educational attainment at age 20 and at age 25, or the effect of a job displacement on a worker's subsequent earnings at multiple time periods after experiencing the event. To address such causal questions, we extend the earlier setup by allowing the outcome variable to vary with time, as depicted in Table 2. Time is treated as discrete in our setup (t = 0, 1, …, T); it may correspond to historical period or to age.

Table 2. 
Time-Invariant Treatment, Time-Varying Outcome: y as a Matrix of Potential Outcomes

In this setup, y is a [2 × (T + 1)] matrix of potential outcomes, with two possible treatment conditions. Outcomes can be measured in period 0 (i.e., baseline measurement), period 1, and so on up to period T, the final period under study. Note that the restriction to time-invariant treatment rules out the possibility that some units are treated in periods 2 through T: if a unit is not treated in period 1, it remains untreated through the end of observation (T). The causal question then focuses on the comparison of a pair of potential outcomes at any post-baseline time (i.e., any column of Table 2 after period 0). Thus, for a study of T observation periods, there are T counterfactual comparisons. We rewrite equation (1) to incorporate time-varying outcomes as follows:

$$\Delta_{iv} = y_{iv}^{d=1} - y_{iv}^{d>1} \tag{2}$$

where the subscript v = 1, …, T indicates the outcome measurement period.
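To make the Table 2 layout concrete, here is a minimal Python sketch for a single hypothetical unit with T = 4. All outcome values are invented, and the two rows are set equal at baseline (v = 0) purely for illustration. The sketch stores the 2 × (T + 1) matrix as two lists and forms one contrast from equation (2) for each post-baseline period.

```python
# Hypothetical potential outcomes for one unit i, periods v = 0..T (invented values).
T = 4
y_d1 = [10.0, 8.0, 7.0, 7.5, 8.0]        # y_{iv}^{d=1}: treated in period 1
y_dgt1 = [10.0, 10.0, 10.5, 10.0, 10.0]  # y_{iv}^{d>1}: not treated in the window
assert len(y_d1) == len(y_dgt1) == T + 1  # the 2 x (T + 1) matrix of Table 2

# Equation (2): one contrast per post-baseline period v = 1..T, i.e. T contrasts.
delta = {v: y_d1[v] - y_dgt1[v] for v in range(1, T + 1)}
print(delta)  # {1: -2.0, 2: -3.5, 3: -2.5, 4: -2.0}
```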

1.3. Time-Varying Treatments and Single-Time Outcome

There are many situations in sociological research in which we are interested in more than two treatment periods. For example, suppose that we want to know the effect of a disability on subsequent employment status. The previous two scenarios would restrict us to evaluating the effect of a disability at time 1 on employment status at subsequent times. However, an individual could become disabled at many different and substantively interesting time points over the life course. Another sociological example is the effect of a parental divorce on a child's educational attainment. When we define the individual-level effect of a divorce on high school completion, we are faced with a time-varying treatment (i.e., a parental divorce can occur at many points in time throughout childhood) and a single-time outcome (i.e., educational attainment as of age 20). A time-varying setup would allow for consideration of the different points at which the individual experiences an event.

Table 3 illustrates the scenario in which we have time-varying treatments and a single-time outcome. Note that this table shows a vector of potential outcomes for y. Because treatment is not repeatable, it can occur in a single period d (d = 1, 2, …, T). We denote units not treated during the T observed periods by d > T. Clearly, this setup is more complicated than the first scenario, illustrated in Table 1, in which there are just two potential outcomes for a single outcome measurement.

Table 3. 
Time-Varying Treatment, Time-Invariant Outcome: y is a Vector of Potential Outcomes

Our first task is to define the causal effect of interest. As discussed earlier, a causal effect entails the comparison of potential outcomes associated with two possible treatment conditions. If loss of a job at time t is one treatment condition, the causal effect will depend upon one's definition of the reference counterfactual treatment condition. One possibility, which is a common practice, is to treat the untreated status (designated by d > T) as the reference counterfactual. Under this conceptualization, the causal effects associated with treatments at T different time points correspond to T versions of equation (1), with treatments specified by times of treatment:

$$\Delta_{tiT} = y_{iT}^{d=t} - y_{iT}^{d>T} \tag{3}$$

with t = 1, …, T.

However, this practice precludes many other interesting causal questions. For example, we may be interested in the causal effect of being treated at one time (say t) versus being treated at another time (say t'). For many sociological questions, the appropriate comparison is not whether or not an individual is treated but when treatment occurs. For example, events such as leaving school and entering sexual union are likely to happen to most people. For these events, a scientifically interesting question is not to compare the condition of experiencing the event to the condition of not experiencing the event, but to evaluate outcomes associated with different time points at which the event occurs. That is, we may be interested in the following quantities:

$$y_{iT}^{d=t} - y_{iT}^{d=t'} \tag{4}$$

where t ≠ t', t ≤ T, and t' ≤ T.

This means that we can compare any two elements in the single column of Table 3. With time-varying treatments, the number of possible pairwise contrasts thus increases rapidly. Letting T represent the number of possible treatment periods, there are T + 1 treatment conditions (including d > T), so the number of possible pairwise comparisons is [T(T + 1)/2]. If there is one possible treatment period, then there is only one comparison, reducing our setup to the conventional case comparing the treated with the untreated. If there are two possible treatment periods, there are three possible pairwise comparisons: d = 1 with d > T, d = 2 with d > T, and d = 1 with d = 2. They answer the following different questions: (1) What is the causal effect of treatment at time 1 versus no treatment at all? (2) What is the causal effect of treatment at time 2 versus no treatment at all? (3) What is the causal effect of treatment at time 2 versus treatment at time 1? If there are six possible treatment periods, there are 21 possible pairwise comparisons.
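The growth in contrasts can be checked by brute force. This Python sketch lists the T + 1 treatment conditions of Table 3 as strings (a labeling chosen here only for readability) and enumerates every unordered pair with `itertools.combinations`, confirming the T(T + 1)/2 count for the cases discussed in the text.

```python
from itertools import combinations


def treatment_conditions(T):
    """Treatment conditions of Table 3: treated in period 1..T, or untreated (d > T)."""
    return [f"d={t}" for t in range(1, T + 1)] + [f"d>{T}"]


def pairwise_contrasts(T):
    """All unordered pairs of conditions, i.e. the contrasts of equations (3) and (4)."""
    return list(combinations(treatment_conditions(T), 2))


for T in (1, 2, 6):
    n = len(pairwise_contrasts(T))
    assert n == T * (T + 1) // 2
    print(T, n)  # prints "1 1", then "2 3", then "6 21"

print(pairwise_contrasts(2))  # [('d=1', 'd=2'), ('d=1', 'd>2'), ('d=2', 'd>2')]
```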

Furthermore, it is unclear that a comparison of two potential outcomes associated with specific treatment conditions, as expressed in equations (3) and (4), is always substantively interpretable. The problem is rooted in the fact that actual social processes are cumulative and, in this sense, path-dependent. At any given point (i.e., conditional on past experience), we are interested in the consequences of experiencing a treatment. Potential outcomes associated with treatments at earlier times are no longer relevant and should not serve as reference counterfactuals, because those treatment conditions are no longer available for the unit to experience. In our setup, a unit at risk of experiencing a treatment at time t has not experienced the event up to t. If a unit remains untreated at time t, which is the only alternative to treatment at time t, it could experience treatment at any time subsequent to t. Given that we do not know which potential outcome associated with a future treatment condition should serve as the reference, we seek a way to simplify the problem and focus only on treatment information at t when assessing the treatment effect at t. This calls for a way to incorporate future treatment paths into a composite reference at the present.

Let us consider the effect of divorce on health as an example, treating divorce as an absorbing state. A person may get a divorce at time t. When we evaluate the causal effect of getting a divorce at time t, we take for granted that the person has remained married until t. It is thus not sensible to ask the causal question of the effect of divorce at time t versus divorce at an earlier period before t. Rather, an appropriate question to ask is the causal effect of being divorced at time t versus not being divorced at time t. If a person remains married at time t, he or she may be divorced at time t+ 1, or at t+ 2, and so on. Thus, we focus on causal questions that center on whether or not an event occurs at a particular time, with the reference being a composite incorporating future counterfactuals. In constructing a composite reference, we remain agnostic about future events and collapse all future paths when assessing the treatment effect at a particular time. We call this a “forward-looking approach.”

Using this approach, we define the composite treatment effect at t on an outcome measured at T, denoted by $\Delta^*_{tiT}$, as

$$\Delta^*_{tiT} = y_{iT}^{d=t} - y_{iT}^{*d>t} \tag{5}$$

where $y_{iT}^{d=t}$ is the value of the outcome that would be observed if unit i were treated in period d = t (t = 1, …, T), and $y_{iT}^{*d>t}$ is the value of the composite outcome for the same unit had it not been treated up to t. Note that in our original setup with SUTVA, potential outcomes are associated with particular times of treatment (shown in Table 3). In this setup, there is no room for a counterfactual outcome associated with not experiencing an event at t. Thus, the reference for comparison in equation (5), $y_{iT}^{*d>t}$, is a composite of counterfactuals rather than a true counterfactual. For this reason, we add a superscript asterisk to denote specifically that this quantity is a composite. For the special case of t = T, we follow convention and treat the potential outcome of the untreated state, $y_{iT}^{d>T}$, as a true counterfactual. That is, if t = T, we simply set $y_{iT}^{*d>t} = y_{iT}^{d>T}$, omitting the asterisk.

For simplicity, we consider only linear combinations when constructing the composite. Thus, we can define $y_{iT}^{*d>t}$ as

$$y_{iT}^{*d>t} = \sum_{k=t+1}^{T} w_k\, y_{iT}^{d=k} + w_{>T}\, y_{iT}^{d>T} \tag{6}$$

where the w's are weights, with the following normalization constraints:

$$\sum_{k=t+1}^{T} w_k + w_{>T} = 1, \qquad t = 1, \ldots, T \tag{7}$$

As long as SUTVA is assumed for all counterfactual outcomes, a composite formed as a linear combination of them in the form of equation (6) also satisfies SUTVA. That is, while $y_{iT}^{*d>t}$ is not a counterfactual in our setup, it can be treated like one.
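A minimal Python sketch of equations (5) through (7) for one hypothetical unit follows. It assumes the unit's potential outcomes at T are known (which they never are in practice). Treatment periods are keys 1..T, and the hypothetical key `"never"` stands for d > T. The weights are arbitrary illustrative numbers. The paper's own scheme ties the weights to when comparison units are treated or not treated, but this sketch only checks that the weights cover the conditions still possible after t and sum to 1.

```python
def composite_reference(y_T, t, weights):
    """Composite reference y*_{iT}^{d>t} as a weighted sum, following equation (6).

    y_T:     potential outcomes at T for one unit, keyed by treatment period
             1..T, with the key "never" standing for the untreated state d > T.
    t:       treatment period being evaluated.
    weights: one weight per condition still possible after t
             (periods t+1..T and "never"); they must sum to 1, as in (7).
    """
    T = max(k for k in y_T if k != "never")
    if set(weights) != set(range(t + 1, T + 1)) | {"never"}:
        raise ValueError("weights must cover exactly the conditions after t")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * y_T[k] for k, w in weights.items())


def composite_effect(y_T, t, weights):
    """Composite effect of treatment at t on the outcome at T, equation (5).

    For t = T the only admissible weights are {"never": 1.0}, so the
    reference reduces to y_{iT}^{d>T}, matching the special case in the text.
    """
    return y_T[t] - composite_reference(y_T, t, weights)


# Hypothetical unit with T = 3 and invented potential outcomes at T.
y_T = {1: 4.0, 2: 5.0, 3: 6.0, "never": 8.0}
w = {2: 0.25, 3: 0.25, "never": 0.5}  # illustrative weights only
print(composite_effect(y_T, 1, w))               # -2.75
print(composite_effect(y_T, 3, {"never": 1.0}))  # -2.0
```

Because the composite is just a fixed linear combination, the weight-validation checks carry most of the logic. Substantive choices about the weights enter only through the `weights` argument.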

1.4. Time-Varying Treatments and Time-Varying Outcomes

Generalizing the setup further, we now consider the situation in which we have a time-varying treatment and a time-varying outcome. Table 4 illustrates this case, where y is a matrix of potential outcomes. The matrix is a square with (T+ 1) rows and (T+ 1) columns. Treatment can occur in period 1, period 2, and so on to period T, or not at all in the observation period. Outcomes can be measured in period 0 (i.e., baseline measurement), period 1, and so on to period T. We do not include an outcome measurement beyond time T.

Table 4. 
Time-Varying Treatment, Time-Varying Outcome: y as a Matrix of Potential Outcomes

The causal questions have a dynamic dimension, such that each particular causal effect of interest entails a different counterfactual. The matrix is divided by the main diagonal, with diagonal and lower off-diagonal cells bracketed into boxes, which may be thought of as “black boxes” whose future is unknown at the time of the corresponding outcome measurement. The upper off-diagonal cells refer to potential outcomes associated with specific treatment conditions and measured outcomes, and the lower off-diagonal cells refer to outcomes only for untreated states. Since a potential outcome measured at time v cannot be indexed by treatment occurring after v, we define $y_{iv}^{d>v}$ as the potential outcome at time v when the unit is not treated by time v.

Let us now illustrate our forward-looking approach in Table 5. First, consider the second column in the table. Determining the effect of treatment in period 1 on an outcome measured immediately thereafter (i.e., at the end of period 1) involves a comparison of $y_{i1}^{d=1}$ with the outcome measured at time 1 for the individual's untreated state in period 1, $y_{i1}^{d>1}$. This is a potential outcome at time 1 for a unit that will either be treated in some later period or never be treated. Similarly, consider an example from the third column in Table 5. Determining the effect of treatment in period 2 on an outcome measured at the end of period 2 involves a comparison of $y_{i2}^{d=2}$ with $y_{i2}^{d>2}$ for the same individual. However, we may wish to make comparisons when outcomes are measured later than the time of treatment. For example, we may want to know the effect of treatment at time 1 on the outcome at T − 1. This involves comparing $y_{i(T-1)}^{d=1}$ with an array of other elements in the T − 1 column, summarized as $y_{i(T-1)}^{*d>1}$.

Table 5. 
Time-Varying Treatment, Time-Varying Outcome: Examples

In general, we can define the composite effect of treatment at t on an outcome measured at v, denoted by $\Delta^*_{tiv}$, as

$$\Delta^*_{tiv} = y_{iv}^{d=t} - y_{iv}^{*d>t} \tag{8}$$

where v ≥ t. We define $y_{iv}^{*d>t} = y_{iv}^{d>t}$ if v = t. When v > t, $y_{iv}^{*d>t}$ is a composite counterfactual reference, analogous to equation (6):

$$y_{iv}^{*d>t} = \sum_{k=t+1}^{v} w_k\, y_{iv}^{d=k} + w_{>v}\, y_{iv}^{d>v} \tag{9}$$

with normalization constraints that all weights sum to 1:

$$\sum_{k=t+1}^{v} w_k + w_{>v} = 1 \tag{10}$$
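As a worked instance of equations (8) through (10), take T = 3. The weight symbols below are our own shorthand; in general the weights differ for each pair (t, v), a dependence suppressed here:

$$\Delta^*_{1i3} = y_{i3}^{d=1} - \left( w_2\, y_{i3}^{d=2} + w_3\, y_{i3}^{d=3} + w_{>3}\, y_{i3}^{d>3} \right), \qquad w_2 + w_3 + w_{>3} = 1$$

$$\Delta^*_{1i2} = y_{i2}^{d=1} - \left( w_2\, y_{i2}^{d=2} + w_{>2}\, y_{i2}^{d>2} \right), \qquad w_2 + w_{>2} = 1$$

$$\Delta^*_{2i2} = y_{i2}^{d=2} - y_{i2}^{d>2}$$

The last line is a diagonal cell (v = t), where the reference is the untreated state itself rather than a composite.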

Our key formula, equation (8), shows that in a setting with time-varying treatments and time-varying outcomes, a composite treatment effect is indexed by two time dimensions, the time of treatment (t) and the time of outcome (v), as long as v ≥ t. That is, the composite treatment effect can be defined for all upper-diagonal cells in Table 4, as illustrated by the examples in Table 5. Counting only the cells with v > t, where the reference is a genuine composite, there are altogether [(T − 1)T/2] possibilities; the T diagonal cells with v = t use the untreated state $y_{it}^{d>t}$ directly. For example, we may want to know the effect of being treated in period 1 on an outcome measured at T − 2. This entails a comparison of $y_{i(T-2)}^{d=1}$ with the potential outcomes at T − 2 for all states not treated by period 1. As Table 5 shows, the composite counterfactual reference turns out to involve all the other elements in the column labeled (T − 2) for the time of the outcome. Suppose instead that we want to know the effect of treatment in the first period on an outcome measured at T − 1. Here we compare $y_{i(T-1)}^{d=1}$ with a composite that involves $y_{i(T-1)}^{d=2}$, $y_{i(T-1)}^{d=3}$, …, $y_{i(T-1)}^{d=T-2}$, $y_{i(T-1)}^{d=T-1}$, and $y_{i(T-1)}^{d>T-1}$.
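The bookkeeping behind Tables 4 and 5 can be sketched in Python. For each treatment period t and outcome period v ≥ t, the function below lists which cells of column v enter the reference for $\Delta^*_{tiv}$: treatment at k = t + 1, …, v, plus the untreated-through-v state. The string labels are a hypothetical encoding chosen for readability. With T = 5, the sketch reproduces the T(T − 1)/2 count of genuine composites and the (T − 1)-column example from the text.

```python
def composite_cells(T):
    """Map each (t, v) with v >= t to the conditions entering its reference.

    For a cell (t, v), the reference draws on column v of Table 4:
    treated at k = t+1..v, plus not treated through v (d > v).
    """
    cells = {}
    for t in range(1, T + 1):
        for v in range(t, T + 1):
            members = [f"d={k}" for k in range(t + 1, v + 1)] + [f"d>{v}"]
            cells[(t, v)] = members
    return cells


T = 5
cells = composite_cells(T)
strict = [key for key in cells if key[1] > key[0]]  # genuine composites (v > t)
print(len(cells), len(strict))  # 15 10
print(cells[(1, T - 1)])        # ['d=2', 'd=3', 'd=4', 'd>4']
print(cells[(3, 3)])            # ['d>3']  (diagonal: untreated state only)
```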

The question regarding which composite treatment effects to focus on in a research setting is a substantive one. At what point in the life cycle or in what temporal period, for example, does a disability “hurt” the most? While the WLS does not have detailed data on job characteristics between 1975 and 1993, it does have a detailed record of employment status for those years. Suppose that a person is disabled at age 38 and we observe his or her employment status at age 43. We want to compare that person's employment status at age 43 to his or her employment status at age 43 had he or she not been disabled at age 38.⁵ We could ask many similar life-cycle or temporal-period causal questions: What is the effect of disability for someone disabled at age 38 on employment status at age 50? Or, what is the effect of being disabled in 1980 on employment status measured in 1990? Our approach lends itself to addressing such questions by making the appropriate comparisons explicit.

2. ESTIMATION OF POPULATION-LEVEL MEAN TREATMENT EFFECTS

The fundamental problem of causal inference is that the individual treatment effect is unobservable because one of the quantities needed to calculate it is necessarily missing (Holland 1986). At a given point in time an individual may be exposed to one of two values of the causal variable, treatment or control, but not both. In this section, we first provide the conventional discussion of estimation under ignorability, followed by a discussion of the comparison units utilized to estimate treatment effects in the time-invariant versus the time-varying treatment setting. We then discuss a weighted composite estimand for the estimation of mean treatment effects.

2.1. Estimation Under Ignorability

Although an individual-level causal effect is unobservable, average treatment effects over a population or subpopulation can be identified under the assumption that treatment assignment satisfies some form of ignorability, exogeneity, or “unconfoundedness,” that is, after controlling for a set of observed covariates. The ignorability assumption requires that the likelihood of treatment be independent of the potential outcomes associated with different treatment conditions (Angrist and Krueger 2000; Heckman, LaLonde, and Smith 2000; Imbens 2004; Rosenbaum and Rubin 1983). Let us define the time-invariant average treatment effect by taking the expectation of equation (1):

$$E(\Delta) = E(y^{d=1}) - E(y^{d>1}) \tag{11}$$

Neither component of this treatment effect has a direct sample analogue unless there is universal treatment or treatment is randomly determined (Heckman 1997). In other words, this quantity cannot be estimated without assumptions, because the potential outcomes $y^{d=1}$ and $y^{d>1}$ may be correlated with d. To see this, note that $E(y^{d=1})$ pertains to the whole population of units, both those actually assigned to treatment and those actually assigned to control; the same holds for $E(y^{d>1})$. Hence, $E(y^{d=1})$ is not necessarily equal to $E(y^{d=1} \mid d=1)$; only the latter expectation is observable, because it conditions on observed treatment status. The two are equal only if $y^{d=1}$ is mean-independent of d, that is, only if

$$E(y^{d=1} \mid d=1) = E(y^{d=1} \mid d>1) = E(y^{d=1}) \tag{12}$$

where the second and third terms are unobservable. The same argument applies to $E(y^{d>1})$. It is equal to the observable $E(y^{d>1} \mid d>1)$ only if $y^{d>1}$ is mean-independent of d, that is, only if

$$E(y^{d>1} \mid d>1) = E(y^{d>1} \mid d=1) = E(y^{d>1}) \tag{13}$$

where the second and third terms are unobservable.
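The role of conditions (12) and (13) can be seen from a standard decomposition, offered here as a supplementary sketch rather than as part of the original derivation. By the law of total expectation,

$$E(y^{d=1}) = P(d=1)\,E(y^{d=1} \mid d=1) + P(d>1)\,E(y^{d=1} \mid d>1),$$

so when 0 < P(d = 1) < 1, $E(y^{d=1})$ equals the observable $E(y^{d=1} \mid d=1)$ only if the two conditional means coincide. Likewise, writing y for the observed outcome, the naive contrast of observed group means decomposes as

$$E(y \mid d=1) - E(y \mid d>1) = \underbrace{E(y^{d=1} - y^{d>1} \mid d=1)}_{\text{effect on the treated}} + \underbrace{E(y^{d>1} \mid d=1) - E(y^{d>1} \mid d>1)}_{\text{selection bias}},$$

and the second term vanishes exactly when the first equality in (13) holds.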

Randomization is one way to address this problem, to make sure equations (12) and (13) hold, so that the average treatment effect may be estimated from observed data. In a randomized experiment, the treatment and control samples are randomly drawn from the same population. Therefore, randomization ensures the following independence condition:

$$(y^{d=1}, y^{d>1}) \perp d \tag{14}$$

This says that the potential outcomes associated with treatment and control conditions are independent of assignment status. This is, in the language of Rubin (1974), “ignorable treatment assignment.” Since the treated and control groups do not systematically differ from each other, randomized treatment guarantees that the difference-in-means estimator of the treatment effect is unbiased and consistent. In other words, with random assignment,

$$E(\Delta) = E(y^{d=1} \mid d=1) - E(y^{d>1} \mid d>1) \tag{15}$$

where the terms on the right can be estimated by the respective observed sample means of y for the treated and the control groups.
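A small simulation illustrates equation (15). The data-generating process is entirely hypothetical: normally distributed untreated outcomes, a constant treatment effect of 0.5, and coin-flip assignment. Under random assignment, the observed difference in means lands close to the true average effect. The sketch uses NumPy's `default_rng` with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(seed=2007)  # arbitrary seed for reproducibility
n = 200_000

# Hypothetical data-generating process: both potential outcomes exist for everyone.
y_untreated = rng.normal(loc=0.0, scale=1.0, size=n)  # y^{d>1}
y_treated = y_untreated + 0.5                          # y^{d=1}; true ATE = 0.5

# Random assignment: d is independent of (y^{d=1}, y^{d>1}), as in (14).
treated = rng.random(n) < 0.5
y_obs = np.where(treated, y_treated, y_untreated)

# Difference-in-means estimator, the right-hand side of (15).
ate_hat = y_obs[treated].mean() - y_obs[~treated].mean()
print(round(float(ate_hat), 2))
```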

In observational studies, ignorable treatment assignment is seldom plausible, which means that equations (14) and (15) are unlikely to hold. Hence, comparing the sample means of the treated and control groups will likely yield a biased estimator of the average treatment effect, because the potential outcomes will not be mean-independent of d. The typical recourse in this situation is to posit that the potential outcomes are mean-independent of treatment status d after conditioning on a set of observable exogenous covariates, say X, that capture pretreatment characteristics of the units and that may determine selection into the treatment and control groups. Thus, if we measure all the systematic factors that determine whether a unit is treated, or if, given the measured covariates, any unmeasured factors no longer predict treatment assignment, then conditioning on these variables is like randomizing and renders d mean-independent of the potential outcomes.

Let X denote a vector of observed exogenous pretreatment covariates. Ignorable treatment assignment is satisfied conditionally:

$$(y_{d=1},\, y_{d>1}) \perp d \mid X \tag{16}$$

The mean independence assumption implies that

$$E(y_{d>1} \mid d=1, X) = E(y_{d>1} \mid d>1, X) = E(y \mid d>1, X) \tag{17}$$

and

$$E(y_{d=1} \mid d>1, X) = E(y_{d=1} \mid d=1, X) = E(y \mid d=1, X) \tag{18}$$

Notice that the first equality signs in (17) and (18) establish a relationship that is analogous to those given in (14) and (15), conditional on the observed covariates. Equality (17) states that for units actually treated, their conditional average outcome had they not been treated would have been just like the conditional average outcome observed for the control group of untreated units. This implies that the observed sample mean for the control group is representative of what the mean outcome for the treated units would have been (i.e., their potential outcome) had they not been treated. Equality (18) is analogous and has a similar implication.

A second assumption in addition to (16) is needed to exactly parallel the case of randomization:

$$0 < P(d=1 \mid X) < 1 \tag{19}$$

where P(d= 1 | X) is the probability of assignment to the treatment group given the set of observed pretreatment covariates. This assumption, sometimes labeled “overlap” (Imbens 2004), states that there is the possibility of both a nontreated analogue for each treated unit and a treated analogue for each nontreated unit. If a subgroup (as defined by X) belongs entirely to either the treated group or the control group, the overlap assumption is violated, with P(d= 1 | X) equal to 1 or 0. When this occurs, it is infeasible to estimate both potential outcomes for the subgroup.
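When X is discrete, overlap can be checked directly. The following sketch, using hypothetical strata and data, estimates P(d = 1 | X) within each stratum and flags strata where it equals 0 or 1; the function name `overlap_report` is illustrative only.

```python
from collections import defaultdict


def overlap_report(units):
    """Estimate P(d = 1 | X) within each stratum of a discrete covariate X.

    `units` is an iterable of (x, treated) pairs, where `treated` is True for
    units with d = 1 and False for units in the control condition (d > 1).
    Returns {x: (share_treated, overlap_ok)}. Overlap fails when the share is
    exactly 0 or 1, because one potential outcome is then never observed.
    """
    counts = defaultdict(lambda: [0, 0])      # x -> [n_treated, n_total]
    for x, treated in units:
        counts[x][0] += int(treated)
        counts[x][1] += 1
    report = {}
    for x, (n_treated, n_total) in counts.items():
        share = n_treated / n_total
        report[x] = (share, 0.0 < share < 1.0)
    return report


# Hypothetical strata: stratum "c" contains only treated units.
data = [("a", True), ("a", False), ("b", False), ("b", False), ("b", True),
        ("c", True), ("c", True)]
for x, (share, ok) in sorted(overlap_report(data).items()):
    print(x, round(share, 2), "overlap holds" if ok else "overlap violated")
```

For stratum "c" no control analogue exists, so the within-stratum contrast in equation (20) cannot be formed there.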

Under assumptions (16) and (19), the average treatment effect conditional on X can be written as

$$E(y_{d=1} - y_{d>1} \mid X) = E(y \mid d=1, X) - E(y \mid d>1, X) \tag{20}$$

where both terms can be estimated from observed data. In our discussion of time-varying treatment effects, we will assume ignorability given a set of observable covariates X. To avoid complications of endogenous covariates in a longitudinal setting (Barber, Murphy, and Verbitsky 2004), we limit ourselves only to pretreatment covariates that do not vary with time.
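A small simulation shows how conditioning on X repairs the bias of the naive comparison. The sketch assumes a hypothetical setup in which a binary covariate X raises both the chance of treatment and the baseline outcome. It computes the observed within-X gaps of equation (20) and averages them with weights P(X = x). In this setup the naive gap is about 2.8 in expectation, while the true effect is 1.0. All names here are illustrative.

```python
import random
import statistics


def simulate_confounded(n=200_000, effect=1.0, seed=7):
    """Hypothetical observational data: a binary covariate X raises both the
    chance of treatment (d = 1) and the outcome, so treatment is ignorable
    only conditional on X. Overlap holds because 0 < P(d = 1 | X) < 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        treated = rng.random() < (0.8 if x == 1 else 0.2)   # selection on X only
        y_ctrl = 3.0 * x + rng.gauss(0.0, 1.0)               # X shifts the baseline
        y = y_ctrl + effect if treated else y_ctrl
        rows.append((x, treated, y))
    return rows


def mean_gap(rows):
    """Observed E(y | d = 1) - E(y | d > 1) within the given rows."""
    treated = [y for _, d, y in rows if d]
    control = [y for _, d, y in rows if not d]
    return statistics.fmean(treated) - statistics.fmean(control)


def stratified_ate(rows):
    """Average the within-X gaps of equation (20), weighting by P(X = x)."""
    total = len(rows)
    ate = 0.0
    for x in sorted({row[0] for row in rows}):
        stratum = [row for row in rows if row[0] == x]
        ate += len(stratum) / total * mean_gap(stratum)
    return ate


rows = simulate_confounded()
print(round(mean_gap(rows), 2), round(stratified_ate(rows), 2))
```

Stratification works here only because X is discrete and the selection runs entirely through X. With many or continuous covariates, one would typically model the conditional expectations or the propensity score instead.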

2.2. Comparison Units in a Time-Varying Setting

One practical implication of the preceding discussion is that, in order to estimate causal effects of a treatment, the researcher needs to find appropriate comparison units (or “control groups”) that are observationally equivalent to the treated units. For the classic two-period setup, untreated units (after appropriate covariate controls) constitute a natural comparison group so that the average treatment effect is estimated by the difference expressed in equation (20). When the timing of a treatment is taken into consideration, however, it is no longer clear what should be the appropriate comparison units. Depending on the causal question asked, the comparison group changes. In this setup, the research question may center on the causal effect of the timing of treatment. The untreated group is just a special case in which the event has not occurred by the end of the observation period. In other words, we can think of the untreated group as units for which the timing of treatment is censored (Smith and Maddala 1983).

Consider again Table 3. Any potential outcome could serve as a comparison for any other potential outcome. As argued before, the number of pairwise comparisons, T(T+1)/2, can become unmanageably large even with a moderate number of time points. Our forward-looking approach leads us to a simplifying solution, one that focuses the researcher's attention on the time of treatment, as if the units in question were momentarily frozen at time t and then randomized into treatment versus nontreatment. This solution has two important implications for defining the appropriate comparison units. First, units that have received treatment in the past (before t) no longer serve as comparison units. Second, units that are not treated at t may be treated at a later time or may remain untreated until the end of the study.

More concretely, this simplifying solution yields a composite estimand that combines all possible future outcomes of units not yet treated at t into a single (d > t) comparison group. We take the expectation of equation (8), conditional on X:

$$E(\delta^{v}_{i,d=t} \mid X) = E(y^{v}_{i,d=t} \mid X) - E(y^{*v}_{i,d>t} \mid X) \tag{21}$$

where $y^{*v}_{i,d>t}$, the composite counterfactual reference, was defined earlier in equation (9). The ignorability assumption means that, conditional on X, the following is true:

image((22a))
image((22b))

Thus, we can use observed data, which can yield the second terms of equations (22a) and (22b), to estimate the population average composite treatment effect defined by equation (21).

This approach forces the researcher to focus on the time of treatment and also greatly reduces the number of potential comparisons. For example, if the outcome is measured at T, the number of comparisons falls from T(T+1)/2 to T. With six possible treatment periods, we have six composite comparisons instead of 21 pairwise comparisons: d = 1 with $y^{*}_{d>1}$, d = 2 with $y^{*}_{d>2}$, d = 3 with $y^{*}_{d>3}$, d = 4 with $y^{*}_{d>4}$, d = 5 with $y^{*}_{d>5}$, and d = 6 with $y^{*}_{d>6}$. As shown earlier in equation (9), the information set for the composite reference group for a treatment effect at t depends on the time at which the outcome is evaluated (denoted by v). The further v lies beyond t, the more treatment-specific future paths are observed.
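The counting argument can be made concrete. Assuming T treatment periods plus the never-treated condition, which gives T + 1 potential outcomes, the sketch below enumerates the pairwise contrasts and lists the T forward-looking composite contrasts. The labels are illustrative strings, not the paper's notation.

```python
from itertools import combinations


def pairwise_contrasts(T):
    """All pairs among the T + 1 potential outcomes: treated at 1..T, or d > T."""
    outcomes = [f"d={t}" for t in range(1, T + 1)] + [f"d>{T}"]
    return list(combinations(outcomes, 2))


def composite_contrasts(T):
    """One forward-looking contrast per treatment period: d = t versus y*_{d>t}."""
    return [(f"d={t}", f"y*_(d>{t})") for t in range(1, T + 1)]


T = 6
print(len(pairwise_contrasts(T)), T * (T + 1) // 2)   # 21 21
print(len(composite_contrasts(T)))                    # 6
for contrast in composite_contrasts(T):
    print(contrast)
```

The pairwise count grows quadratically in T, whereas the composite count grows linearly.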

The statistical literature on causal inference with observational data has been built largely on the ignorability assumption, which may be unrealistic: its premise is that observational data can be made analogous to experimental data through statistical controls. For the classic two-period case, the ignorability assumption is analogous to a single random assignment to treatment or control. For our time-varying treatment case, we need to assume sequential ignorability to mimic sequential randomization: at each discrete treatment period t, it is as if the subjects still at risk were randomly assigned to treatment or nontreatment. Those assigned not to be treated at t remain at risk of being assigned to treatment later. However, we do not impose a priori the fractions assigned to treatment at different time points. As we will show later, these fractions serve as the appropriate weights in forming composites. In this paper, we follow the approach of Xie and Wu (2005) and use the fractions from observed data.

Hence, instead of looking for a set of comparison units that are untreated by the end of a study, we call for comparison units that are untreated at time t. Under ignorability, observed values of units untreated at time t give us the necessary information about the expected value of the individual-level composite counterfactual $y^{*}_{d>t}$. We call our approach forward-looking because units that are treated in the future, but not in the past, are part of the comparison group.6 For example, if we are interested in d = 2, we compare units treated at d = 2 with units treated in any subsequent treatment period (d = 3, d = 4, d = 5, and so on) and with units not treated in the observation period (d > T).

Consider again Table 4. Information is pooled across cells to yield estimates of causal effects. The untreated states in the boxes are later separated into actual paths; however, at each point at which the outcome for the treated is measured, the paths that lie beyond that point are not yet known. Therefore, for estimation purposes, these states collapse into one undifferentiated untreated state at time t. As time passes after t, however, the states in a box are sorted into future treatment paths, and outcomes are observed for each of those paths.

Whereas units treated at time t serve as a comparison group for units treated before time t, these units should not be included in a comparison group for units treated later than t. Thus, we argue that the comparison group for counterfactual reasoning with time-varying, nonrepeatable treatments should be forward-looking. Consequently, while pairwise comparisons are symmetrical, composite comparisons are asymmetrical. Consider two causal questions: (1) What is the causal effect of treatment that occurs at d = 1? (2) What is the causal effect of treatment that occurs at d = 2? The first causal question involves the comparison between units treated at d = 1 and units not treated at d = 1. The second causal question is sensible only for units not treated prior to t = 2. That composite comparisons involve asymmetry reflects an asymmetrical, cumulative social process.
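The asymmetry can be seen by building the treated and comparison groups for each treatment period. The sketch below assumes a hypothetical mapping from unit identifiers to the period of a nonrepeatable treatment, with `None` for units untreated through the last period. The helper `forward_looking_groups` is illustrative only.

```python
def forward_looking_groups(onset, t):
    """Split units into the treated group and the forward-looking comparison
    group for a treatment at period t.

    `onset` maps unit id -> period of (nonrepeatable) treatment, or None if the
    unit is untreated through the last period T. Units treated before t are
    excluded from both groups; units treated after t, or never, are controls.
    """
    treated = sorted(u for u, d in onset.items() if d == t)
    controls = sorted(u for u, d in onset.items() if d is None or d > t)
    return treated, controls


# Hypothetical onset periods for five units observed over T = 3 periods.
onset = {"A": 1, "B": 2, "C": 3, "D": None, "E": 2}
print(forward_looking_groups(onset, 1))   # (['A'], ['B', 'C', 'D', 'E'])
print(forward_looking_groups(onset, 2))   # (['B', 'E'], ['C', 'D'])
```

Unit B is a control for A (treated at 1), but A is never a control for B (treated at 2), because A left the risk set before period 2.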

An example that would benefit from our conceptualization, and a subject that has received considerable attention in the sociological literature, is the effect of parental divorce on children's educational attainment (see Seltzer [1994] for a review of the literature). If we want to estimate the effect of divorce on high school completion (McLanahan and Sandefur 1994), we may want to consider a time-varying treatment (i.e., parental divorce can occur at many points throughout childhood) and a fixed outcome (i.e., educational attainment as of age 20).7 There is general agreement that timing is an important component of the effects of parental divorce on children's achievement; children who are younger when their parents divorce may be more seriously disadvantaged than those who are older at the time of disruption. It may also be, however, that some of the loss of economic, parental, and community resources is recouped as time passes, so that children who are younger when the event occurs may have reduced their disadvantage by the time the outcome is measured (Hanson, McLanahan, and Thomson 1998). Our approach is well suited to specifying carefully the comparisons needed to estimate the effects of divorce on achievement for children whose parents divorce at different points throughout childhood.

Another example is the effect of job displacement on subsequent earnings.8 Using the time-invariant approach, we evaluate the effect of displacement at time 1 on earnings at time 2. The simple pairwise comparison can tell us the average earnings that displaced workers would have earned had they not been displaced. The time-invariant setup does not, however, fully reflect the complexity of longitudinal data structures or the reality of a worker's lived experience. A worker could be displaced from a job at any point at which he or she was at risk of displacement. As a result, those who never receive treatment are a selected subset of those who are assigned not to receive treatment at time t, and this selection process is difficult to model or control statistically. Imagine an experiment in which persons are assigned at random to receive or not receive treatment at time t and, among those assigned not to receive treatment at time t, some will and some will not receive it at t + 1, t + 2, and so on, up until time T.

Sometimes, data limitations constrain the outcome to be time-invariant. Brand (2006) examines panel data from the Wisconsin Longitudinal Study and considers workers who were displaced between 1975 and 1992, that is, between the ages of approximately 35 and 53. The WLS collected data on characteristics of respondents' jobs in 1992. Suppose that a worker in the WLS is displaced at age 38 and we observe his or her earnings in 1992, at age 53. We want to know what that worker's earnings at age 53 would have been had he or she not been displaced at age 38. We can ask numerous similar questions: What is the effect of displacement at age 40 on earnings at age 53? What is the effect of being displaced at age 50 on earnings at age 53? Again, our approach motivates a careful consideration of the comparisons needed for each causal question. Additionally, where the data allow, as would be the case with the Panel Study of Income Dynamics, earnings could be measured at multiple time points after displacement: 1 year after, 5 years after, and so on.

One other example is the effect of disability on subsequent employment status. The time-invariant setup only allows individuals to be treated or not treated, that is, to experience a disabling event or not by a fixed point before the outcome is measured. The time-varying setup allows for different points at which individuals experience the event, as well as for the assessment of outcomes at multiple points throughout the life course. Charles (2003) uses longitudinal data from the Panel Study of Income Dynamics (PSID) to examine how the effects of disability on earnings over time depend on the point in the life cycle at which the onset of impairment occurs. Charles asks, hypothetically, what the effect of being disabled at age 25 is on earnings at age 50, and how that effect differs from the effect of being disabled at age 40 on earnings at age 50.9 Our approach is well suited to such a question because it depicts the appropriate comparisons explicitly. Moreover, Charles' inquiry involves a fixed outcome; we might further investigate the effects on earnings at different points in time after the onset of disability.

3. COMPOSITE CAUSAL EFFECT FOR TIME-VARYING TREATMENTS

For simplicity, we drop the notation of conditioning on X, although this is implicit throughout the remainder of the paper. From equation (21), we define the average treatment effect of a time-varying treatment on a time-varying outcome as

$$\delta^{v}_{d=t} = E(y^{v}_{d=t}) - E(y^{*v}_{d>t}) \tag{23}$$

where $E(y^{v}_{d=t})$ is the expected value of the outcome that would be observed for units treated at d = t. Again, we note that $v \geq t$. When v = t, we define $E(y^{*v}_{d>t}) = E(y^{v}_{d>t})$, and equation (23) reduces to a two-group comparison, as in equation (11). When v > t, $E(y^{*v}_{d>t})$ is the expected value of the forward-looking composite outcome for units not treated up to and including period t. $E(y^{*v}_{d>t})$ is decomposable into a combination of group-specific expectations associated with subsequent treatment conditions. For a unit that was not treated at d = t, we specify the counterfactual outcome to follow the principle of forward-looking sequential expectation: a weighted combination of the units treated later and the units not treated at all by v. Under the ignorability assumption of equations (22a) and (22b), we can use observed data to estimate both quantities in equation (23), for v = t as well as for v > t.

We explicate the general formula for $\delta^{v}_{d=t}$ by first discussing three specific cases. First, consider the case when d = t = T. The average effect is defined as

$$\delta^{T}_{d=T} = E(y^{T}_{d=T}) - E(y^{T}_{d>T}) \tag{24}$$

The outcome can only be assessed at the last period, with v=T. Figure 1 is a “forward tree” depicting the situation in which t=T–2, t=T–1, and t=T. If a unit is not treated at T, that unit has only one possible alternative, to go untreated in the observation period. In other words, because T is the last possible treatment period, units cannot be treated after T. As a result, equation (23) is reducible to the two-period case as in equation (24)—the simple difference between the expected value of the outcome for units treated at T and the expected value of the outcome for units not treated.

Figure 1.

Forward tree (from d=T−2).

Second, consider the situation when t = T−1. As depicted in Figure 1, units that were not treated at T−1 could be either treated at T or not treated in the observation period. In this case, the outcome can be measured at two time points, v = T−1 or v = T, but here we consider only v = T for illustration. Because there are two possible paths for units that were not treated by T−1, there are two components to $E(y^{*T}_{d>T-1})$:

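$$E(y^{*T}_{d>T-1}) = P(d=T \mid d \geq T)\, E(y^{T}_{d=T}) + \left[1 - P(d=T \mid d \geq T)\right] E(y^{T}_{d>T}) \tag{25}$$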

where $P(d=T \mid d \geq T)$ is the probability of being treated at t = T given that units were not treated by t = T−1, $E(y^{T}_{d=T})$ is the expected value of the outcome for units treated at t = T, and $E(y^{T}_{d>T})$ is the expected value for units not treated in the observation period.

Third, consider the situation when t = T−2. The outcome can be assessed at v = T−2, T−1, or T; again, we consider only v = T here. As depicted in Figure 1, there are three possible paths for units that were not treated at T−2: treated at T−1, treated at T, or not treated. Again, we decompose $E(y^{*T}_{d>T-2})$ into its components:

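$$E(y^{*T}_{d>T-2}) = P(d=T-1 \mid d \geq T-1)\, E(y^{T}_{d=T-1}) + \left[1 - P(d=T-1 \mid d \geq T-1)\right] E(y^{*T}_{d>T-1}) \tag{26}$$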

We need to further decompose part of the second component, $E(y^{*T}_{d>T-1})$, by equation (25). To simplify notation, let $p(t) = P(d=t \mid d \geq t)$ and $q(t) = 1 - p(t)$. Then,

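$$E(y^{*T}_{d>T-2}) = p(T-1)\, E(y^{T}_{d=T-1}) + q(T-1)\, p(T)\, E(y^{T}_{d=T}) + q(T-1)\, q(T)\, E(y^{T}_{d>T}) \tag{27}$$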

Equation (27) shows that the “controls” for treatment in period t = T−2 consist of three components, namely the three possible forward-looking paths (treated at T−1, treated at T, or not treated), each weighted by the appropriate transition probabilities. These weights cumulate the transition probabilities between the treatment period and the period of decomposition. For example, the third component in (27) contains the product of q(T−1) and q(T).
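The nesting of equations (25) and (26) can be checked against the expanded three-component form of equation (27) with exact rational arithmetic. The sketch below assumes hypothetical hazards p(h) and group means for T = 3, so that t = T − 2 = 1 and v = T = 3. The recursive helper `composite_recursive` is illustrative and not the authors' code.

```python
from fractions import Fraction


def composite_recursive(t, v, p, mean_treated, mean_untreated):
    """E(y*^v_{d>t}) via the nested decomposition of equations (25)-(26).

    A unit untreated through t is either treated at t + 1 (probability
    p(t+1)) or still untreated, in which case the same decomposition is
    applied from t + 1. At v, remaining units contribute E(y^v_{d>v}).
    """
    if t >= v:                      # no later periods left: only "untreated through v"
        return mean_untreated
    nxt = t + 1
    return (p[nxt] * mean_treated[nxt]
            + (1 - p[nxt]) * composite_recursive(nxt, v, p, mean_treated, mean_untreated))


# Hypothetical inputs with T = 3, so t = T - 2 = 1 and v = T = 3.
p = {2: Fraction(1, 10), 3: Fraction(1, 5)}             # p(T-1), p(T)
mean_treated = {2: Fraction(1, 2), 3: Fraction(2, 5)}   # E(y^T_{d=T-1}), E(y^T_{d=T})
mean_untreated = Fraction(1, 10)                        # E(y^T_{d>T})

nested = composite_recursive(1, 3, p, mean_treated, mean_untreated)
q = {h: 1 - ph for h, ph in p.items()}
expanded = (p[2] * mean_treated[2]                  # treated at T-1
            + q[2] * p[3] * mean_treated[3]         # untreated at T-1, treated at T
            + q[2] * q[3] * mean_untreated)         # untreated through T
print(nested, expanded)   # 97/500 97/500
```

Using `Fraction` rules out floating-point rounding as the reason the two forms agree.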

We now present a general formula. The $E(y^{*v}_{d>t})$ term in equation (23) is decomposable into additive components corresponding to counterfactuals for treatment in each period from t + 1 to v, plus a component corresponding to the counterfactual of remaining untreated through v. Each “treated” component contains the expected value associated with being treated in a period t′, t < t′ ≤ v, weighted by the product of the q() terms (of not being treated) for the periods between t and t′ and by p(t′) (of being treated at t′). For the condition of remaining untreated through v, the weight is the product of the q() terms for periods t + 1 through v. Thus, we derive the following formula:

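$$E(y^{*v}_{d>t}) = \sum_{t'=t+1}^{v} \left[\prod_{h=t+1}^{t'-1} q(h)\right] p(t')\, E(y^{v}_{d=t'}) + \left[\prod_{h=t+1}^{v} q(h)\right] E(y^{v}_{d>v}) \tag{28}$$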

where v ranges over {t + 1, …, T}, and the product of q(h) terms runs over the periods h with t < h < t′; when t′ = t + 1 this product is empty and equals 1. The p() and q() weights in equation (28) are assigned according to how likely it is that units are treated or not treated in each possible treatment period, that is, the probabilities of being in each cell. In general, the weights are based on marginal probabilities estimated from observed data, as in Xie and Wu (2005). This approach allows the weights to be determined by social processes that have occurred naturally.
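The general decomposition can be written as a short function that returns the weights and the resulting composite effect. This is a sketch under stated assumptions: the hazards p(h) and group means below are hypothetical, and `composite_weights` / `composite_vs_pairwise` are illustrative names. The pairwise contrast shown for comparison uses only the group untreated through v as the reference, which for v = T is the never-treated group.

```python
def composite_weights(t, v, p):
    """Weights of the general formula for the composite reference E(y*^v_{d>t}).

    p maps period h -> p(h) = P(d = h | d >= h). Returns (period, weight)
    pairs: one for each later treatment period t' (t < t' <= v), weighted by
    q(t+1) * ... * q(t'-1) * p(t'), and a final (None, weight) entry for
    remaining untreated through v, weighted by q(t+1) * ... * q(v).
    """
    weights = []
    still_untreated = 1.0                 # running product of q(h); empty product = 1
    for t_prime in range(t + 1, v + 1):
        weights.append((t_prime, still_untreated * p[t_prime]))
        still_untreated *= 1.0 - p[t_prime]
    weights.append((None, still_untreated))
    return weights


def composite_vs_pairwise(t, v, p, mean_treated, mean_untreated):
    """Return (composite effect, pairwise effect) for treatment at t, outcome at v.

    mean_treated[h] stands for E(y^v_{d=h}); mean_untreated for E(y^v_{d>v}).
    """
    reference = sum(
        w * (mean_untreated if period is None else mean_treated[period])
        for period, w in composite_weights(t, v, p)
    )
    return mean_treated[t] - reference, mean_treated[t] - mean_untreated


# Hypothetical numbers: later-treated units have higher outcomes than never-treated ones.
p = {2: 0.1, 3: 0.2}
mean_treated = {1: 0.5, 2: 0.5, 3: 0.4}
composite, pairwise = composite_vs_pairwise(1, 3, p, mean_treated, mean_untreated=0.1)
print(round(composite, 3), round(pairwise, 3))   # 0.306 0.4
```

The weights sum to one. Because later-treated units have higher outcomes here, the composite reference lies above the never-treated mean and the composite effect is smaller than the pairwise one, which is the same pattern reported in the empirical example below. When v = t the loop is empty and the reference reduces to E(y^v_{d>t}).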

4. AN EMPIRICAL EXAMPLE

We demonstrate our approach by taking up our previously mentioned example of the effect of the onset of a disability on subsequent employment status, using data from the Wisconsin Longitudinal Study (WLS).10 WLS data provide both yearly employment status and disability status for a large sample that is broadly representative of non-Hispanic white high school graduates over their life course. Our analysis sample consists of 6739 individuals for whom we have data on employment status between ages 35 and 65 (or between 1975 and 2005) and disability status and timing. Of those 6739 individuals, 1575 were disabled at some point between ages 35 and 65.

As a first step, we estimate the effect of a disability that occurred between ages 35 and 65 on the probability of being unemployed at age 65 using a simple pairwise comparison.11 We adopt a linear probability model of the following form: 12

image(29)

We find that persons who were disabled between ages 35 and 65 have a probability of unemployment at age 65 that is higher by 0.077 (p < 0.001); in other words, disabled persons are about 8 percentage points more likely to be unemployed than they would have been had they not been disabled.
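With a single binary treatment indicator and no covariates, the least-squares coefficient in a linear probability model equals the difference between the two groups' proportions with y = 1. The coefficient is therefore read in percentage points. The sketch below checks this identity on a tiny hypothetical data set. It does not reproduce the specification of equation (29), which may include covariates.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def proportion(y, d, group):
    """Share with y = 1 among units whose treatment indicator equals `group`."""
    values = [b for a, b in zip(d, y) if a == group]
    return sum(values) / len(values)


# Hypothetical data: d = 1 if disabled in the window, y = 1 if unemployed at the end.
d = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
slope = ols_slope(d, y)
gap = proportion(y, d, 1) - proportion(y, d, 0)
print(round(slope, 4), round(gap, 4))   # 0.3333 0.3333
```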

Disability can occur at various points in time over the course of an individual's life. We might hypothesize that the likelihood of unemployment differs depending on when a person experiences the onset of a disability. We observe a 30-year life history in the WLS. For simplicity, and to reduce the influence of possible recall bias, we divide this long interval into six 5-year intervals.13 Figure 2 is a flowchart of disability transitions in the WLS, where the numbers in parentheses indicate sample sizes at each transition. We begin with a sample of 6739 nondisabled individuals, who can either be disabled at age 35–39 or not disabled; those still nondisabled can either be disabled at age 40–44 or not disabled; those nondisabled by age 44 can either be disabled at age 45–49 or not disabled; and so on. Each transition is associated with a marginal probability weight, p() of being treated or q() of not being treated in that particular period. For example, among the nondisabled at age 35, the p(1) weight (treated at age 35–39) is equal to 0.007 and the q(1) weight (not treated at age 35–39) is equal to 0.993.14
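The p() and q() weights can be computed from risk-set counts like those in a transition flowchart. The sketch below uses hypothetical counts for three periods (not the WLS figures in Figure 2). It assumes that treatment is nonrepeatable, so treated units leave the risk set. The helper `hazard_weights` is illustrative.

```python
def hazard_weights(n_start, newly_treated):
    """p(t) = P(d = t | d >= t) and q(t) = 1 - p(t) from risk-set counts.

    n_start       -- untreated units entering period 1
    newly_treated -- counts first treated in periods 1, 2, ..., T
    Because treatment is nonrepeatable, treated units leave the risk set.
    Returns (p, q, n_never), where n_never is the count untreated through T.
    """
    at_risk = n_start
    p, q = {}, {}
    for t, n_new in enumerate(newly_treated, start=1):
        p[t] = n_new / at_risk
        q[t] = 1.0 - p[t]
        at_risk -= n_new
    return p, q, at_risk


# Hypothetical counts for three periods (not the WLS figures in Figure 2).
p, q, n_never = hazard_weights(1000, [10, 20, 49])
print(n_never)                                   # 921
print({t: round(pt, 4) for t, pt in p.items()})
```

As a consistency check, the product q(1)q(2)q(3) times the starting count reproduces the number never treated.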

Figure 2.

Flow chart of disability transitions in the Wisconsin Longitudinal Study.

We now consider the case in which we have a vector of potential outcomes, as depicted in Table 3: there are six possible time periods in which individuals may have been disabled, plus the possibility that persons are not disabled in any of the six periods. Employment status is measured in the last period (i.e., at age 65). Consider the effect of being disabled between ages 40 and 44 on the probability of being unemployed at age 65, approximately 20 years after the onset of a disability. If we compare those disabled at ages 40–44 to those not disabled in the observation period (i.e., not disabled between ages 35 and 65), a pairwise comparison, we find an increase in the probability of unemployment of 0.215 (p < 0.001). If, however, we compare those disabled at ages 40–44 to those not disabled through age 40–44, whose future is unknown at that time, we have five potential paths: persons could have been disabled at age 45–49 (period 3), at age 50–54 (period 4), at age 55–59 (period 5), or at age 60–64 (period 6), or not disabled up until age 65. We use our composite causal effect estimand and estimate the treatment effect as follows:15

image(30)

The composite approach indicates that being disabled at age 40–44 increases the probability of unemployment at age 65 by 0.20 (20 percentage points), compared with 0.215 (about 22 percentage points) under the pairwise approach. Therefore, a simple pairwise comparison overstates the effect of being disabled at ages 40–44. The reason can be seen from the expected values in (30): not being disabled at age 40–44 does not preclude being disabled at a later age, and being disabled at a later age is associated with a greater probability of unemployment relative to never being disabled. If we ignore those potential future pathways, we overstate the effect of being disabled in an earlier period.

Not only can disability occur at various points in time over the course of an individual's life, its effects can be assessed at various points in time after its occurrence. Suppose now that we are interested in the effect of being disabled at age 40–44 on employment status at age 55, approximately 10 years after the onset of a disability. The counterfactual paths include being disabled at age 45–49, being disabled at age 50–54, or not being disabled within the observation window (i.e., up until age 55), as depicted in Figure 2. We therefore compare the outcome for those disabled at age 40–44 to all possible future paths: those disabled in the periods before the outcome measurement are sorted into treatment paths, while we remain agnostic about the occurrence of disability after age 55. Using our composite causal effect formula, this time we have three components, or potential paths: disabled at age 45–49 (period 3), disabled at age 50–54 (period 4), or not disabled by age 55. We calculate the treatment effect as follows:

image(31)

The composite approach indicates that being disabled at age 40–44 increases the probability of unemployment at age 55 by 0.13 (13 percentage points); in contrast, the pairwise approach indicates an increase of 0.138 (about 14 percentage points). In this case, the pairwise comparison would overstate the effect of being disabled by about 1 percentage point.

Table 6(a) provides the effects of being disabled in each of the six possible treatment periods on subsequent outcomes using the conventional pairwise approach; Table 6(b) provides the corresponding effects using our composite approach. In most cases, the pairwise approach overstates the effect of a disability on subsequent employment status. Of course, life-course factors shape employment status over time, so the mean level of unemployment rises with age for both disabled and nondisabled persons. However, when we compare the employment status at age 55, or at age 65, of those disabled at age 40–44 only with that of those never disabled, we overlook the very different future paths that persons disabled at that age might have followed in the absence of a disability. Those potential pathways include being disabled in later periods, which is associated with a greater probability of unemployment relative to never being disabled.

Table 6.
The Effects of Disability on Employment Status over the Life Course: Wisconsin Longitudinal Study

(a) Pairwise Comparisons
Rows: treatment period (age at onset of disability). Columns: outcome measurement period (v = age 40, 45, …, 65).

Treatment period | Age 40 | Age 45 | Age 50 | Age 55 | Age 60 | Age 65
d = age 35–39 | 0.184*** (3.44) | 0.08 (1.61) | 0.219*** (4.93) | 0.141** (2.80) | 0.083 (1.24) | 0.079 (1.08)
d = age 40–44 | | 0.059 (1.48) | 0.105** (2.95) | 0.138* (2.56) | 0.138* (2.56) | 0.215*** (3.67)
d = age 45–49 | | | 0.151*** (5.54) | 0.195*** (6.27) | 0.22*** (5.33) | 0.162*** (3.63)
d = age 50–54 | | | | 0.129*** (6.23) | 0.18*** (6.55) | 0.123*** (4.14)
d = age 55–59 | | | | | 0.053* (2.37) | 0.09*** (3.72)
d = age 60–64 | | | | | | 0.005 (0.23)

(b) Composite Comparisons
Rows: treatment period (age at onset of disability). Columns: outcome measurement period (v = age 40, 45, …, 65).

Treatment period | Age 40 | Age 45 | Age 50 | Age 55 | Age 60 | Age 65
d = age 35–39 | 0.187*** (3.53) | 0.079 (1.58) | 0.214*** (4.74) | 0.131* (2.54) | 0.073 (1.08) | 0.061 (0.84)
d = age 40–44 | | 0.059 (1.46) | 0.101** (2.80) | 0.13** (3.14) | 0.13* (2.39) | 0.2*** (3.42)
d = age 45–49 | | | 0.151*** (5.51) | 0.19*** (6.07) | 0.215*** (5.22) | 0.15*** (3.37)
d = age 50–54 | | | | 0.131*** (6.37) | 0.184*** (6.77) | 0.116*** (3.92)
d = age 55–59 | | | | | 0.062** (2.80) | 0.089*** (3.72)
d = age 60–64 | | | | | | 0.005 (0.23)

Note: Numbers in parentheses are t-ratios.

5. ADDITIONAL MODELING STRATEGIES

In our example of the effect of disability on subsequent employment status, we used a simple, descriptive method to illustrate the usefulness of our proposed framework. Other modeling strategies can make better use of available data or better answer scientific questions. We may impose structure (1) to handle sparse data across cells of the potential outcome matrix, (2) to test theoretically derived hypotheses with certain structural constraints, and (3) to condition on observable covariates. Which modeling strategy to implement is a substantive question. In this section, we provide some examples of possible modeling strategies for illustrative purposes. Consider now the effects of job displacement on earnings. Several studies have used the Panel Study of Income Dynamics (PSID) to assess the effect of job displacement on earnings (see Fallick [1996] for a review). Figure 3 depicts a simple model of the effects of displacement on subsequent earnings. For workers who were never displaced (d > T), earnings might follow a steady upward trajectory.16 For units treated at d = 2 in our hypothetical model, y increases until the event occurs, drops, and then recovers. We may hypothesize that workers enjoy an upward earnings trajectory prior to a job displacement, experience a large drop in earnings immediately after the displacement event, and then recover modestly in the years following displacement.17 A discontinuous change trajectory, with the break located at the time of treatment, can capture shifts in elevation and/or slope. It might also be true that the effect of treatment differs across the life course or by historical period; for instance, older workers might experience a steeper initial decline and slower recovery than workers displaced at earlier career stages.

In Figure 3, for units treated at d = T−2, representing workers displaced later in the life course than those displaced at d = 2, the drop in earnings is larger and the recovery after treatment is slower. We could adopt a multilevel approach to the discontinuity model depicted in Figure 3:

image(32)

where v is historical time, d = 1 if unit i has been treated by time v and 0 otherwise, and $E_v$ is the elapsed time since treatment. Under these definitions, the growth function for unit i has intercept β0 and slope β1 before treatment. At the time of treatment, unit i experiences an instantaneous increment β2. Posttreatment, the unit has intercept β0 + β2 and slope β1 + β3. The β coefficients can be specified as randomly varying around a mean and/or modeled as functions of measurable characteristics of the person.
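A pooled least-squares sketch of such a discontinuity model follows. It assumes the specification y_iv = β0 + β1·v + β2·post_iv + β3·elapsed_iv + ε_iv, which is consistent with the coefficient interpretations given above, and it assumes that the treatment indicator stays at 1 from the period of treatment onward. The data, coefficients, and the helper `discontinuity_design` are hypothetical. The multilevel (random-coefficient) part is omitted here and would need a mixed-model routine, for example statsmodels' `MixedLM`.

```python
import numpy as np


def discontinuity_design(v, onset):
    """Design matrix for y_iv = b0 + b1*v + b2*post_iv + b3*elapsed_iv.

    v      -- measurement times (float array)
    onset  -- treatment time for each observation (np.inf if never treated)
    post_iv is assumed to be 1 from the period of treatment onward, and
    elapsed_iv is the time since treatment (0 before treatment).
    """
    post = (v >= onset).astype(float)
    elapsed = np.where(post == 1.0, v - onset, 0.0)
    return np.column_stack([np.ones_like(v), v, post, elapsed])


rng = np.random.default_rng(0)
true_beta = np.array([10.0, 0.5, -3.0, 0.2])   # hypothetical: rise, drop of 3, then recovery
times = np.arange(10.0)
n_units = 500
onset_by_unit = rng.choice(np.array([2.0, 5.0, 7.0, np.inf]), size=n_units)

v = np.tile(times, n_units)                      # each unit observed at times 0..9
onset = np.repeat(onset_by_unit, times.size)     # aligned with v, unit by unit
X = discontinuity_design(v, onset)
y = X @ true_beta + rng.normal(0.0, 0.5, size=v.size)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

The pretreatment slope is read from b1, the jump at treatment from b2, and the posttreatment slope from b1 + b3. Variation in treatment timing, together with never-treated units, separates the time trend from the treatment terms.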

Figure 3.

Modeling a discontinuity hypothesis.

If the model above did not meet our theoretical needs, we might use a different approach. For example, we might hypothesize that the effect of parents' divorce on children's educational achievement lends itself to a spline approach. Splines impose continuity restrictions at the join points, so that the line can change direction without an abrupt jump in the line itself. In a spline regression model, a turning point in the outcome is represented by a spline knot that joins the pretreatment regression line with the posttreatment regression line.18 We might fit a linear-quadratic spline regression to capture the possibility that the decline in achievement following parents' divorce diminishes over time. Or, if the response function is unknown, nonparametric regression can be used to explore its shape. Two common smoothing methods are moving-average filtering and locally weighted scatterplot smoothing (“loess”) (Cleveland and Devlin 1988). For both methods, each smoothed value is determined by neighboring data points within a specified span. The loess method fits either a first- or second-order model to the cases in the neighborhood, with each point in the neighborhood weighted according to its Euclidean distance from the point being smoothed.
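The two ideas above can be sketched with a truncated-power basis and a simple moving-average filter. The knot location, coefficients, and data below are hypothetical, and the helper names are illustrative. The linear-quadratic spline adds linear and quadratic terms in time since the knot, so the fitted curve is continuous at the event but can bend afterward. For loess itself, statsmodels provides a `lowess` smoother (in `statsmodels.nonparametric.smoothers_lowess`), which is not used here.

```python
import numpy as np


def linear_quadratic_spline(v, knot):
    """Basis for y = b0 + b1*v + b2*(v - knot)_+ + b3*(v - knot)_+**2.

    Both extra terms are zero at the knot, so the fitted curve is continuous
    there (no jump); only its slope and curvature change after the event.
    """
    after = np.maximum(v - knot, 0.0)
    return np.column_stack([np.ones_like(v), v, after, after ** 2])


def moving_average(y, span):
    """Simple moving-average filter with an odd window; the ends are dropped."""
    return np.convolve(y, np.ones(span) / span, mode="valid")


v = np.linspace(0.0, 20.0, 81)                  # e.g., child's age in quarter-year steps
knot = 8.0                                      # hypothetical age at parents' divorce
true_coef = np.array([2.0, 0.3, -0.9, 0.02])    # hypothetical: a decline that flattens out
B = linear_quadratic_spline(v, knot)
y = B @ true_coef                               # noiseless, so the fit should be exact

coef, *_ = np.linalg.lstsq(B, y, rcond=None)
smooth = moving_average(y, span=5)
print(np.round(coef, 3), smooth.size)
```

After the knot, the slope is 0.3 − 0.9 + 0.04·(v − 8). The outcome therefore keeps declining over the observed range, but the decline diminishes over time, which is the pattern described in the text.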

6. CONCLUSION

For statistical analyses, it is essential to begin by understanding the quantities to be estimated (Rubin 2005). This is particularly critical when dealing with causal inference. Assumptions are always needed; it is imperative that they be explicated and justified so that the basis of a study's conclusions can be understood. Making the assumptions explicit also allows them to be scrutinized and investigated and, consequently, improved. Increasingly, social scientists are recognizing that the potential outcome framework brings greater clarity, enabling precise definitions of causal estimands of interest and evaluation of methods traditionally used to draw causal inferences (Sobel 2000).

In this paper, we utilize the conceptual apparatus of the potential outcome, counterfactual approach to causal inference and develop a more general causal framework for longitudinal studies. We consider causal effects in which both exposure to treatment and the effects of treatment are time-varying. We compare the situation in which we have two potential outcomes to the situation in which we have a vector of potential outcomes (i.e., for a time-varying treatment and a fixed outcome), and the situation in which we have a matrix of potential outcomes (i.e., for a time-varying treatment and a time-varying outcome). The matrix of potential outcomes requires a complicated conceptualization of many potential counterfactuals. The causal question has a dynamic dimension, motivating integration of information over future outcomes.

Researchers repeatedly make decisions about the composition of control groups. Laying out the potential pathways an individual might follow shows that including units treated in later periods in the control group is a sensible approach in a time-varying setting. With time-varying treatments and time-varying outcomes, the number of potential contrasts increases rapidly with the time that elapses before outcomes are assessed, as units in the earlier comparison group sort into future treatment paths with associated outcomes. In contrast to the symmetrical pairwise approach, we develop an asymmetrical composite comparison group: we decompose the expected value of the outcome for the controls using a forward-looking sequential approach. The composite is a weighted combination of units treated later and units not treated at all during the observation period. In a time-varying setting, our approach relies on an analog of ignorability for observational data that corresponds to sequential randomization for experimental data.
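
A deliberately simplified sketch of the composite comparison group follows. It is not the paper's estimator: it ignores covariates, uses a single outcome measured at one later period, and the function name and data are hypothetical. Following the convention of note 6, never-treated units are coded with a treatment period greater than T, so every comparison unit satisfies d > t, and units treated before t are automatically excluded. The sketch shows that the composite control mean is a weighted combination of the later-treated and never-treated group means, with weights equal to each group's share of the comparison units.

```python
from collections import defaultdict


def composite_effect(treat_time, outcome, t):
    """Contrast units first treated at period t with a composite comparison group.

    treat_time[i] is the period in which unit i is first treated, with
    never-treated units coded T + 1 (any value greater than T), so that every
    comparison unit satisfies d > t. Units with d < t are no longer at risk
    and are excluded. Returns (effect, weights), where weights[d] is the
    share of the comparison group that follows future path d; the weights
    sum to 1.
    """
    treated = [y for d, y in zip(treat_time, outcome) if d == t]
    by_path = defaultdict(list)
    for d, y in zip(treat_time, outcome):
        if d > t:  # later-treated or never-treated units
            by_path[d].append(y)
    n_controls = sum(len(ys) for ys in by_path.values())
    weights = {d: len(ys) / n_controls for d, ys in sorted(by_path.items())}
    # Weighted combination of the future-path group means.
    control_mean = sum(weights[d] * sum(ys) / len(ys) for d, ys in by_path.items())
    return sum(treated) / len(treated) - control_mean, weights


# Hypothetical data with T = 3: period of first treatment (4 = never treated)
# and a binary outcome (e.g., employed) measured at a later period.
d = [1, 1, 2, 2, 3, 4, 4, 4]
y = [0, 0, 0, 1, 1, 1, 1, 0]
effect, weights = composite_effect(d, y, t=1)
print(round(effect, 3), weights)
```

With no covariates, this weighted combination equals the pooled mean of all units with d > t; the decomposition by future path becomes informative once the group means or weights are modeled separately.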

At a superficial level, our approach resembles Robins's method of weighting by the inverse of the propensity score of treatment, which is also used for causal inference in longitudinal settings (Barber, Murphy, and Verbitsky 2004; Robins, Hernan, and Brumback 2000). However, two important differences set our approach apart from Robins's. First, our weighting method is asymmetric: all units at risk of experiencing the event serve as controls, regardless of their future treatment paths, whereas previously treated cases are never used as comparisons for those treated later. This asymmetry is sensible for understanding the social consequences of nonrepeatable treatments but much less so for repeatable treatments, such as medication or health behavior. Second, our weighting scheme is cumulative over all future treatment paths. We propose this approach because we are more interested in the causal effects of a treatment at a particular time than in those of a generic treatment regardless of timing. Thus, we essentially view treatments at different times as qualitatively different treatments, whereas Robins and his associates view them as essentially interchangeable.
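
For contrast, here is a minimal sketch of the unstabilized inverse-probability-of-treatment weight used in marginal structural models (Robins, Hernan, and Brumback 2000), specialized to a nonrepeatable treatment. We assume the per-period hazards of first treatment, conditional on covariate history, have already been estimated (for example, with a discrete-time logit); the function name and values are hypothetical. Each unit is weighted by the inverse probability of its own observed treatment path, rather than being sorted into an asymmetric, forward-looking comparison group.

```python
def iptw_weight(hazards, d):
    """Unstabilized inverse-probability-of-treatment weight for one unit
    under a nonrepeatable (absorbing) treatment.

    hazards[k] is the unit's estimated probability of first treatment in
    period k + 1, given that it is still untreated and given its covariate
    history. d is the period of first treatment (1-indexed); d = T + 1
    means never treated. Once treated, the path is fixed, so later periods
    contribute a factor of 1.
    """
    T = len(hazards)
    if not 1 <= d <= T + 1:
        raise ValueError("d must be between 1 and T + 1")
    prob = 1.0
    for k in range(min(d - 1, T)):  # periods survived untreated
        prob *= 1.0 - hazards[k]
    if d <= T:
        prob *= hazards[d - 1]  # probability of first treatment in period d
    return 1.0 / prob


# Hypothetical hazards for one unit over T = 3 periods.
h = [0.1, 0.2, 0.5]
print([round(iptw_weight(h, d), 3) for d in (1, 2, 3, 4)])
```
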

We have discussed several examples of social research that may benefit from our approach, including the effects of parental divorce, job displacement, and disability on subsequent educational attainment, occupation, and earnings. We briefly illustrated our approach with an analysis of the effects of disability on employment status using 30 years of panel data from the Wisconsin Longitudinal Study. Disability is an inherently time-varying event and subsequent labor force participation is an inherently time-varying outcome. Our analysis of the causal effects of disability benefited from our longitudinal approach. We also discussed additional modeling strategies, including interrupted time series regression, spline regression, and loess smoothing.

A methodological extension of this approach would be to allow events to be repeatable. Extending our conceptualization to repeatable events is considerably more complicated. While many treatments can be conceptualized as nonrepeatable by treating the initial occurrence of the event as distinctive, such as an initial displacement, the initial onset of a disability, or parents' initial divorce, allowing events to repeat is a substantively important extension. For example, in the case of job displacement, Stevens (1997) finds that much of the persistence in earnings losses among displaced workers can be explained by additional job losses in the years following an initial displacement. Accommodating repeatable events would require additional simplifying assumptions. We leave this task to future work.

Footnotes

  • 1

    Efforts are under way to generalize the setting of two treatment conditions to multiple treatment conditions and continuous treatments; see Imai and Van Dyk (2004) and Imbens and Hirano (2004).

  • 2

    Restricting attention to nonrepeatable, nonreversible events avoids considerable complication in the time-varying potential outcome conceptualization. We plan, however, to consider multiple treatments in a subsequent paper; we discuss this further in our concluding remarks.

  • 3

    We thank an anonymous reviewer for pointing this out to us.

  • 4

    The U.S. Department of Labor defines disability as visible and nonvisible physical and mental impairments. Disability is generally defined in the literature, however, as a physical impairment that limits the kind or amount of work that an individual can do.

  • 5

    Because the WLS is a single cohort, this is akin to asking what effect a disability with onset in 1978 has on an individual's employment status in 1983.

  • 6

    Comparing the responses of units treated at d = t with those of units treated at d > t reveals the usefulness of coding units never treated in the observation period as d > T rather than d = 0: the notation is greatly simplified when all control units correspond to periods later than the treated period. This would not be possible if never-treated control units were coded d = 0. See Yunfei, Propert, and Rosenbaum (2001) for a discussion of the importance of matching units only on past data rather than future data; Yunfei et al. (2001) thus also use a forward-looking approach.

  • 7

    McLanahan and Sandefur (1994) use several longitudinal data sets to address this question, including the National Longitudinal Survey of Youth (NLSY), the Panel Study of Income Dynamics (PSID), and the High School and Beyond Study (HSB).

  • 8

    Job displacement is generally defined as involuntary job loss due to downsizing or restructuring, plant closing or relocation, or layoff. Displacement does not result from a worker quitting or being fired.

  • 9

    Several theories could be advanced to address this question. Charles (2003) hypothesizes that individuals who become disabled at age 25 should have higher earnings than those disabled later in life, because they have more years over which to adjust to disability status and acquire “disability capital,” and more incentive to do so. His analysis confirms this hypothesis: being older at onset causes the losses from disability to be larger and the recovery to be smaller.

  • 10

    The Wisconsin Longitudinal Study is a panel study of a cohort of 10,317 Wisconsin high school seniors in 1957. Follow-up data were collected in 1964, 1975, 1992–1993, and 2003–2005. In the early 1990s and the 2000s, when WLS respondents were approximately 53 and 64 years old, respectively, retrospective work histories were collected, providing 30 years of data on employment status. In addition, in 2003–2005, respondents were asked whether they had a physical or mental condition that limited the kind or amount of work they could do for pay, and when that condition began.

  • 11

    For simplicity, our models include no covariates other than a dichotomous indicator of treatment status. In other models, we control for sex, a continuous measure of educational attainment as of age 35, and employment status at baseline; the results do not differ substantively from those of models without these controls.

  • 12

    Logit or probit models are more commonly used in sociology than the linear probability model because, unless restrictions are placed on β, the estimated coefficients of a linear probability model can imply probabilities outside the interval [0, 1]. Nevertheless, we prefer the linear probability model for two reasons. First, it gives direct sample analogs to causal estimands, which are usually defined as differences in expectations, as in equation (23); see Angrist (2001) for a discussion. Second, when the only regressor is a dichotomous treatment indicator, as in our example, the linear probability model is saturated: its fitted values are simply the two group means, so it imposes no functional-form restriction on the regression function.

  • 13

    While the long panel of the WLS provides a somewhat exceptional setting for demonstrating the usefulness of our approach, we contend that the approach is equally well suited to much shorter time spans. In fact, it can be applied whenever units have a potential pathway to future treatment.

  • 14

    Note that p(1) + q(1) = 1.

  • 15

    We are centrally concerned with identification issues in this paper (Manski 1995). For simplicity, we ignore statistical inference issues and treat the point estimates from the sample as if they were true population parameters.

  • 16

    For simplicity, we hypothesize a linear model with logged earnings as the outcome variable.

  • 17

    Using the PSID, Ruhm (1991) finds that earnings losses of displaced workers persist for many years subsequent to the displacement event.

  • 18

    Spline regression models are more flexible than polynomial regression models, and they are generally less prone to the severe multicollinearity that can arise among polynomial terms (Marsh and Cormier 2002).
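
The second point in note 12 can be checked directly: with a single dichotomous regressor, the ordinary least squares line passes through the two group means, so the intercept equals the outcome proportion among untreated units and the slope equals the difference in proportions. The sketch below uses hypothetical data and a hypothetical helper name.

```python
def ols_line(x, y):
    """Intercept and slope of a simple ordinary least squares regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: treatment indicator (1 = disabled) and employment (1 = employed).
treated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
employed = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
intercept, slope = ols_line(treated, employed)
print(round(intercept, 4), round(slope, 4))  # proportion among controls; difference in proportions
```
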
