Confounding and Regression Adjustment in Difference-in-Differences

Difference-in-differences (diff-in-diff) is a study design that compares outcomes of two groups (treated and comparison) at two time points (pre- and post-treatment) and is widely used in evaluating new policy implementations. For instance, diff-in-diff has been used to estimate the effect of increasing the minimum wage on employment rates and to assess the Affordable Care Act's effect on health outcomes. Although diff-in-diff appears simple, potential pitfalls lurk. In this paper, we discuss one such complication: time-varying confounding. We provide rigorous definitions for confounders in diff-in-diff studies and explore regression strategies to adjust for confounding. In simulations, we show how and when regression adjustment can ameliorate confounding for both time-invariant and time-varying covariates. We compare our regression approach to models commonly fit in the applied literature, which often fail to address the time-varying nature of confounding in diff-in-diff.

What is known on this topic?
• Difference-in-differences studies can estimate causal effects of treatment if strong causal assumptions are met.
• Confounding in difference-in-differences arises because covariates evolve over time differently in the treated and control groups or because the effects of covariates on outcomes vary over time.
• Time-varying confounding can bias estimates from difference-in-differences designs by violating the causal assumptions.
What does this study add?
• Regression and matching techniques to address confounding by observed covariates must be coherent with the underlying causal model to produce unbiased estimates.
• Postulating a causal model of the evolution of covariates in treated and control groups over time and those covariates' relationships to outcomes over time is a crucial prerequisite for any difference-in-differences study.

| INTRODUCTION
Difference-in-differences (diff-in-diff) studies are frequently used to evaluate new policies and programs. For example, hundreds of studies have estimated the effects of expanded Medicaid eligibility through the Affordable Care Act (ACA) in the United States, and many of these used diff-in-diff. Following the Supreme Court ruling on the ACA, 1 each state chose whether to expand its threshold for Medicaid eligibility, which created groups of treated states and comparison (untreated) states and enabled the application of diff-in-diff. 2 These studies have informed ongoing policy debates about the future of the ACA and state Medicaid waivers.
Diff-in-diff relies on strong and unverifiable assumptions. The key assumption for diff-in-diff is that the outcomes of the treated and comparison groups would have evolved similarly in the absence of treatment. Unlike cross-sectional studies, diff-in-diff does not require the treated and comparison groups to be balanced on covariates.
Thus, a covariate that differs by treatment group and is associated with the outcome is not necessarily a confounder in diff-in-diff. Only covariates that differ by treatment group and are associated with outcome trends are confounders in diff-in-diff.
In applied literature, many diff-in-diff studies are run on autopilot: plot the data, test for parallel trends before the intervention, and fit a regression model that includes an interaction between time and treatment, perhaps with some adjustment for covariates. Rarely are the mechanisms of confounding considered. In this paper, we discuss how diff-in-diff requires a different understanding of confounding and regression adjustment than other study designs. We show how covariates, both time-invariant and time-varying, affect the causal assumptions and inform analysis choices. Using simulations, we demonstrate how to adjust for confounders using regression and matching. We focus on common diff-in-diff models with a single start date for a binary treatment and no unobserved treatment effect heterogeneity. To applied researchers, we offer strategies to estimate unbiased causal effects by combining subject matter expertise with thoughtful modeling.

| Parallel trends
In cross-sectional studies, the definition of a confounder comes from the assumption that potential outcomes are independent of treatment. Colloquially, we say that a confounder is a covariate related to both treatment and outcome, and we must condition on all confounders to ensure independence between treatment and potential outcomes. VanderWeele and Shpitser noted the lack of rigor in the definition of a confounder. 3 In this spirit, we examine confounding in diff-in-diff.
First, we define time-varying and time-invariant covariates and time-varying effects of covariates. A time-varying covariate is one that changes over time for a unit, whereas a time-invariant covariate does not change over time for a unit. For example, a person's weight is time-varying, while their place of birth is time-invariant. Whether a covariate has a time-varying effect on an outcome is a separate question from whether the covariate itself varies over time: when a covariate affects the outcome differently at different times, we say it has a time-varying effect on the outcome.
In diff-in-diff, our target estimand is the average effect of treatment on the treated (ATT) at some time $t^* \ge T_0$, where $T_0$ is the time at which the intervention is introduced to the treatment group:
$$ \mathrm{ATT}(t^*) = E\left[ Y^1(t^*) - Y^0(t^*) \mid D = 1 \right]. \tag{1} $$
In this expression, $D = 1$ indicates the treated group and $Y^d(t)$ is the potential outcome at time $t$ under treatment $d$. Note that Equation (1) contains the posttreatment untreated outcome in the treated group, $Y^0(t^*)$, which we can never observe. However, with some additional assumptions, we can rewrite the target estimand in a form that contains only observables, a process known as identification.
Below, we describe assumptions that allow us to identify the ATT.
First, we assume no anticipation effects; that is, potential outcomes are not affected by future treatment. It follows that the observed and potential outcomes are the same at pretreatment times: $Y(t) = Y^0(t) = Y^1(t)$ for $t < T_0$. Second, we assume that we observe the potential outcome corresponding to the treatment actually received:
$$ Y(t) = D\,Y^1(t) + (1 - D)\,Y^0(t). $$
Third, we make the so-called "parallel trends" assumption, which we define first in the simple setting of one pretreatment time ($t = 0$) and one posttreatment time ($t = 1$):
$$ E\left[ Y^0(1) - Y^0(0) \mid D = 1 \right] = E\left[ Y^0(1) - Y^0(0) \mid D = 0 \right]. \tag{2} $$
Under parallel trends, we assume that the change in the average untreated potential outcomes from pre- to posttreatment is the same in the treated and comparison groups. Since the untreated potential outcome in the posttreatment period, $Y^0(1)$, is not observable in the treated group, this assumption is untestable.
This definition of parallel trends with two time points is nearly universal in the diff-in-diff literature. 4 However, many applications consider more than two time points, so we extend the assumption accordingly. In the strictest version of parallel trends, every pair of time points satisfies Equation (2). That is,
$$ E\left[ Y^0(t^*) - Y^0(t_0) \mid D = 1 \right] = E\left[ Y^0(t^*) - Y^0(t_0) \mid D = 0 \right] $$
for $t^* \ne t_0$. While we can relax this, many researchers have this version in mind when testing for parallel trends in the preintervention periods, contending that evidence of parallel trends before treatment strengthens the plausibility of parallel trends over the whole study period. 5 Given these assumptions, we can now rewrite the ATT in a form involving only observable quantities 6:
$$ \mathrm{ATT}(t^*) = \left\{ E[Y(t^*) \mid D = 1] - E[Y(t_0) \mid D = 1] \right\} - \left\{ E[Y(t^*) \mid D = 0] - E[Y(t_0) \mid D = 0] \right\}, $$
for any pretreatment time $t_0 < T_0$. To estimate this quantity, we can select from a variety of techniques, ranging from simple nonparametric estimators based on sample means to more sophisticated regression models.
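As an illustration of the simplest sample-means estimator, the sketch below replaces each expectation in the identification formula with a group-by-time sample mean. It is a minimal sketch under stated assumptions: the function name did_from_means, the long-format arrays, and the simulated data (a group-specific level of 2, a common linear time trend, and a true effect of 1) are hypothetical and chosen so that parallel trends holds by construction.

```python
import numpy as np

def did_from_means(y, d, t, t_star, t0):
    """Plug-in ATT(t*): each expectation in the identification formula
    is replaced by the corresponding group-by-time sample mean."""
    def m(group, time):
        return y[(d == group) & (t == time)].mean()
    return (m(1, t_star) - m(1, t0)) - (m(0, t_star) - m(0, t0))

# Hypothetical long-format panel: n units observed at T periods, treatment starts at T0.
rng = np.random.default_rng(0)
n, T, T0, gamma = 4000, 4, 2, 1.0
d = np.repeat(rng.binomial(1, 0.5, size=n), T)   # treated-group indicator per row
t = np.tile(np.arange(T), n)                      # time index per row
# Untreated outcomes: groups differ in level (by 2) but share the same time trend,
# so parallel trends holds even though the groups are not balanced.
y0 = 2.0 * d + 0.5 * t + rng.normal(size=n * T)
y = y0 + gamma * d * (t >= T0)                    # constant additive effect after T0

print(did_from_means(y, d, t, t_star=3, t0=1))    # should be close to gamma = 1
```

The constant level difference between groups does not bias the estimate because it is differenced away, which is exactly why diff-in-diff does not require covariate balance.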
We start by specifying a model for the untreated potential outcomes. Following convention in the diff-in-diff literature, 7 we write the untreated potential outcome of the ith unit as
$$ Y_i^0(t) = \alpha_0 + \zeta_t + \alpha_1 d_i + \lambda_t x_{it} + \epsilon_{it}, \tag{4} $$
where $\zeta_t$ are time fixed effects, $d_i$ is an indicator for the treated group, $x_{it}$ is a covariate that can vary across units $i$ and time $t$, and $\epsilon_{it}$ is a mean-zero error. The coefficients are an intercept, $\alpha_0$; a constant difference between treated and comparison groups, $\alpha_1$; and the effect of the covariate on the outcome at time $t$, $\lambda_t$.
So far, we have only considered untreated potential outcomes. Next, we write the data-generating model for the treated potential outcomes by assuming a constant, additive effect of treatment:
$$ Y_i^1(t) = Y_i^0(t) + \gamma, \qquad t \ge T_0. $$
With these data-generating models, we can establish conditions under which the covariate can confound the treatment effect $\gamma$. Putting this all together, a confounder in diff-in-diff is a variable that differs between the groups and has a time-varying effect on the outcome, or whose difference between the groups varies over time. The parallel trends assumption ensures that group-invariant time trends or time-invariant level differences between the groups are not problematic. However, time-varying differences between groups, due to covariates with an evolving relationship to the outcome or differential evolution in the groups, can cause confounding bias.
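The condition above follows from a short calculation. This sketch assumes the untreated-outcome model in Equation (4) (intercept $\alpha_0$, time effects $\zeta_t$, group difference $\alpha_1$, covariate effect $\lambda_t$ on $x_{it}$) with an error that has mean zero in each group; the shorthand $\bar{x}_d(t)$ and $\Delta(t)$ is introduced here only for convenience.

$$
\begin{aligned}
E\left[Y^0(t) \mid D = d\right] &= \alpha_0 + \zeta_t + \alpha_1 d + \lambda_t\, \bar{x}_d(t), \qquad \bar{x}_d(t) = E\left[x_{it} \mid d_i = d\right],\\
E\left[Y^0(t^*) - Y^0(t_0) \mid D = 1\right] - E\left[Y^0(t^*) - Y^0(t_0) \mid D = 0\right] &= \lambda_{t^*}\,\Delta(t^*) - \lambda_{t_0}\,\Delta(t_0), \qquad \Delta(t) = \bar{x}_1(t) - \bar{x}_0(t).
\end{aligned}
$$

The time effects $\zeta_t$ and the level difference $\alpha_1$ cancel, which is why group-invariant trends and constant level differences do not threaten parallel trends. The remaining term vanishes when $\Delta(t) = 0$ for all $t$, or when both $\lambda_t = \lambda$ and $\Delta(t) = \Delta$ are constant (it becomes $\lambda\Delta - \lambda\Delta = 0$). It is generally nonzero when the groups differ and $\lambda_t$ changes over time, or when $\Delta(t)$ changes over time and the covariate affects the outcome.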
Compare this to the definition of a confounder in cross-sectional settings, which is a variable associated with both treatment and outcome. In diff-in-diff, a confounder always has some time-varying effect: either the relationship of the variable to the outcome changes over time or the variable evolves differently between the groups over time.
Next, we consider adjusting for these types of confounding variables. An effective adjustment strategy must account for the covariate's time-varying differences between groups or its time-varying effect on the outcome. In addition to regression adjustment, we also consider matching 8,9 in the section titled "What about matching?"

| Adjusting for confounders
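We use a linear regression model to estimate the ATT $\gamma$ in the presence of a confounder $X$. In our simulations, we explore models of the following form:
$$ Y_{it} = \alpha_0 + \zeta_t + \alpha_1 d_i + \gamma\, p_t d_i + \lambda_t x_{it} + \epsilon_{it}, $$
where $\zeta_t$ are time fixed effects, $\alpha_1$ is the constant difference between treated and comparison groups, $p_t$ is an indicator for posttreatment time points, and the covariate term, written here in its most general form, takes one of the specifications described below. The coefficient $\gamma$ on the interaction between treatment group and postintervention times, $p_t d_i$, is the ATT when the model is correctly specified.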
The correct form for regression models that account for confounding depends on whether the covariate is time-invariant or time-varying and whether its effect on the outcome is constant or time-varying. We consider models that include constant (main) effects of time-invariant and time-varying covariates (λx i and λx it ) or timevarying (interactions with time) effects of covariates (λ t x i and λ t x it ).

| Adjusting for time-invariant confounders
When X is a time-invariant confounder, linear regression with a (time-invariant) main effect will not eliminate bias. Nevertheless, practitioners often adjust for main effects only, 10-13 perhaps out of habit. A simple demonstration shows that adjusting only for main effects does not correct nonparallel trends. Suppose we have a time-invariant covariate $x_i$ with different means in the two groups at baseline. To be a confounder, it must also have a time-varying effect. Recall that confounding arises through a covariate's effect on parallel trends, which involve only the untreated outcomes, so we set the treatment effect to zero. Thus, the treated and untreated potential outcomes are the same, and we can illustrate our points with observed data. Outcomes are generated from Equation (4) with a time-varying relationship between the covariate and outcome and different covariate means in the treated and comparison groups.
In Panel A of Figure 1, we plot the mean outcomes by group and time, and the nonparallel outcome evolution is apparent. Panel B shows residuals from a simple linear regression with only a time effect.
In Panel C, we add a main effect for the covariate X to the model. However, Panels B and C still show diverging trends. In Panel D, we add an interaction between X and time. Only in Panel D do we properly account for the time-varying nature of the confounder and obtain an unbiased result (recall the true treatment effect is zero here).
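A minimal numerical sketch of the Figure 1 demonstration, assuming a hypothetical data-generating process of the form in Equation (4): a time-invariant covariate with mean 1 in the treated group and 0 in the comparison group, an effect λ_t = 1 + 0.2t that grows over time, and, unlike Figure 1, a true effect γ = 1 so that bias is visible against a known target. It fits the diff-in-diff regression by ordinary least squares (numpy's lstsq) with no covariate, a main effect λx_i, and time-varying effects λ_t x_i. The function names and parameter values are illustrative assumptions, not the paper's simulation code.

```python
import numpy as np

def simulate_panel(n=800, T=10, T0=5, gamma=1.0, seed=1):
    """Hypothetical DGP: time-invariant x_i whose mean differs by group
    and whose effect lambda_t on the outcome grows over time."""
    rng = np.random.default_rng(seed)
    d = rng.binomial(1, 0.5, size=n).astype(float)   # treated-group indicator d_i
    x = rng.normal(loc=d, scale=1.0)                 # E[x | d=1] = 1, E[x | d=0] = 0
    lam = 1.0 + 0.2 * np.arange(T)                   # lambda_t
    unit = np.repeat(np.arange(n), T)                # long format: one row per unit-time
    time = np.tile(np.arange(T), n)
    post = (time >= T0).astype(float)                # p_t
    y = lam[time] * x[unit] + gamma * post * d[unit] + rng.normal(size=n * T)
    return y, d[unit], x[unit], time, post

def fit_did(y, d, x, time, post, covariate="none"):
    """OLS of y on intercept, time dummies, d, p*d and an optional covariate term;
    returns the coefficient on p_t * d_i."""
    T = int(time.max()) + 1
    zeta = (time[:, None] == np.arange(1, T)).astype(float)  # time dummies, t = 0 is reference
    cols = [np.ones_like(y), zeta, d, post * d]
    if covariate == "main":
        cols.append(x)                                          # lambda * x_i
    elif covariate == "time_varying":
        cols.append((time[:, None] == np.arange(T)) * x[:, None])  # lambda_t * x_i
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[T + 1]  # columns: 1 intercept, T-1 dummies, d, then p*d

y, d, x, time, post = simulate_panel()
for spec in ("none", "main", "time_varying"):
    print(spec, round(float(fit_did(y, d, x, time, post, spec)), 3))
```

In this balanced panel, the main-effect estimate coincides with the unadjusted one: a time-invariant covariate with a constant coefficient only shifts levels, and levels are differenced away. Only the time-interacted specification should land near γ = 1.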

| Adjusting for time-varying confounders
Time-varying confounders can also invalidate parallel trends and introduce bias into our estimate of the ATT. If we adjust for time-varying confounders by including their main effects or their interactions with time in a regression, we risk conditioning on posttreatment covariates that may be affected by treatment. As Rosenbaum notes, at best, adjusting for posttreatment covariates provides no benefit; at worst, it may introduce additional bias. 14 This occurs because the time-varying covariate can act as both a confounder and a mediator. As such, when trying to recover the ATT via regression, the usual interaction parameter may not be an unbiased estimate of the ATT. However, if we fail to account for the covariate, we face parallel trends violations. For more details, see Supporting Information.
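The mediator side of this tension can be seen in a small simulation. In this hypothetical setup (all values are assumptions chosen for illustration), the covariate evolves identically in both groups before treatment, and treatment then shifts it by delta, so the true ATT is the direct effect gamma plus lam * delta. Regressing on the posttreatment covariate recovers roughly the direct effect gamma rather than the ATT.

```python
import numpy as np

def did_gamma(y, d, time, post, extra=None):
    """OLS of y on intercept, time dummies, d, post*d and an optional extra column;
    returns the coefficient on post*d."""
    T = int(time.max()) + 1
    cols = [np.ones_like(y), (time[:, None] == np.arange(1, T)).astype(float), d, post * d]
    if extra is not None:
        cols.append(extra)
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[T + 1]

# Hypothetical values: direct effect gamma, effect of x on y (lam), effect of treatment on x (delta).
rng = np.random.default_rng(7)
n, T, T0 = 800, 10, 5
gamma, lam, delta = 1.0, 1.0, 0.5
d_unit = rng.binomial(1, 0.5, size=n).astype(float)
unit, time = np.repeat(np.arange(n), T), np.tile(np.arange(T), n)
post, d = (time >= T0).astype(float), d_unit[unit]
# x evolves the same way in both groups before T0; treatment shifts it by delta afterwards.
x = rng.normal(size=n)[unit] + 0.1 * time + delta * post * d + rng.normal(scale=0.5, size=n * T)
y = 0.2 * time + lam * x + gamma * post * d + rng.normal(size=n * T)

true_att = gamma + lam * delta   # total effect: direct path plus the path through x
print("true ATT:", true_att)
print("unadjusted:", round(float(did_gamma(y, d, time, post)), 3))
print("adjusted for x_it:", round(float(did_gamma(y, d, time, post, extra=x)), 3))
```

Here the unadjusted model is the right choice because x causes no pretreatment divergence; if x also evolved differently before treatment, neither specification would recover the ATT, which is the dilemma described above.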

| What about matching?
Through matching, we aim to reduce confounding bias by selecting units from the treated and comparison groups that have similar observable characteristics, thereby eliminating imbalances between the groups, a key ingredient in confounding. When matching, we can match observations on pretreatment outcomes, pretreatment covariates, or some combination.
Matching on pretreatment outcomes allows us to use an alternative assumption to estimate the causal effect. This assumption (independence between potential outcomes and treatment assignment conditional on past outcomes) is the basis of lagged dependent variable regression and synthetic control methods. 6,7,15 However, matching on pretreatment outcomes in diff-in-diff can yield unwanted results. In some settings, it reduces bias, 8,9 while in others, matching induces regression to the mean and creates bias. 7,16
Matching on time-invariant covariates
Matching only on time-invariant pretreatment covariates is attractive because it removes covariate differences between groups. Returning to the demonstration of parallel trends in Figure 1, matching on the pretreatment covariate also fixes the diverging trends: eliminating the difference between the covariate means in the treated and comparison groups via matching is sufficient to address confounding. If the confounding had arisen due to a time-varying covariate, this strategy would not suffice.
Matching on time-varying covariates
Matching on time-varying covariates in the pretreatment period can produce bias due to regression to the mean. Moreover, if confounding arises because of differential evolution of the covariate in the two groups, matching only on pretreatment values will be insufficient to address the confounding. While it may be tempting in this case to match on both pre- and posttreatment values of a time-varying covariate, matching on posttreatment variables that may be affected by treatment can produce causal estimates that do not equal the ATT. 14 For this reason, we do not explore strategies that match on posttreatment covariates. Clearly, choosing the right matching variables is the key to effective matching. A good overview of the current state of matching for diff-in-diff is provided by Lindner and McConnell. 17
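Two of the matching claims in this section, that matching on a time-invariant pretreatment covariate fixes Figure-1-style confounding and that matching on pretreatment outcomes can induce regression to the mean, can be sketched with simulated data. All settings are hypothetical: setting (A) uses a time-invariant covariate whose mean differs by group and whose effect grows over time; setting (B) satisfies parallel trends but has group-level differences in unit effects and noisy outcomes. The sketch uses brute-force 1-nearest-neighbor matching with replacement in numpy, a stand-in for (not a reproduction of) the MatchIt-based matching used in the simulations, followed by a pre/post-average diff-in-diff on the matched sample.

```python
import numpy as np

def nn_match(treated_vals, control_vals):
    """1-nearest-neighbor matching with replacement on a scalar; returns control indices."""
    return np.abs(treated_vals[:, None] - control_vals[None, :]).argmin(axis=1)

def trends(y, T0):
    """Per-unit change: mean posttreatment minus mean pretreatment outcome (y is n x T)."""
    return y[:, T0:].mean(axis=1) - y[:, :T0].mean(axis=1)

rng = np.random.default_rng(3)
n, T, T0, gamma = 4000, 10, 5, 1.0
t = np.arange(T)
d = rng.binomial(1, 0.5, size=n).astype(bool)
effect = gamma * np.outer(d.astype(float), (t >= T0).astype(float))  # n x T treatment effects

# (A) Time-invariant x with group-dependent mean and a growing effect (confounding).
x = rng.normal(loc=d.astype(float))
y_a = np.outer(x, 1.0 + 0.2 * t) + effect + rng.normal(size=(n, T))
tr_a = trends(y_a, T0)
m_a = nn_match(x[d], x[~d])
print("A unmatched:", tr_a[d].mean() - tr_a[~d].mean())
print("A matched on x:", (tr_a[d] - tr_a[~d][m_a]).mean())

# (B) Parallel trends holds, but unit effects mu_i differ in level between groups.
mu = rng.normal(loc=d.astype(float))
y_b = mu[:, None] + 0.1 * t + effect + rng.normal(scale=2.0, size=(n, T))
tr_b = trends(y_b, T0)
pre_b = y_b[:, :T0].mean(axis=1)
m_b = nn_match(pre_b[d], pre_b[~d])
print("B unmatched:", tr_b[d].mean() - tr_b[~d].mean())
print("B matched on pre-outcomes:", (tr_b[d] - tr_b[~d][m_b]).mean())
```

In setting (A), matching on x should remove most of the bias of the unmatched contrast. In setting (B), the unmatched contrast is already close to 1, while matching on pretreatment outcomes pulls the estimate upward: the matched comparison units were selected partly on transient noise and regress toward their own group's mean after treatment.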

| METHODS
As we have discussed, both matching and regression adjustment have limitations. We conduct simulation studies to illustrate the advantages and shortcomings of regression and matching techniques that are commonly employed by practitioners of diff-in-diff. In each simulation scenario, we generate 400 datasets of n = 800 units observed at T = 10 time points. The first five time points are pretreatment times, and the rest are posttreatment. Each unit is assigned to the treatment group with probability 0.5. To each simulated data set, we apply regression and matching techniques and compare the bias of the resulting treatment effect estimates.
We simulate data and analyze them in the R environment. 18 We fit regression models using the lm function and estimate post hoc, cluster-robust SEs using the cluster.vcov function in the multiwayvcov package. 19 For our matching estimators, we implement nearest neighbor matching with replacement using the MatchIt package. 20 Across simulated data sets, we report the absolute percent bias of the mean estimated treatment effect and the mean SE. Mean absolute percent bias is calculated by taking the average of all estimates, subtracting the true value of the ATT, taking the absolute value, and converting it to a percentage relative to the true ATT. Mean SE is the mean of the 400 SE estimates.
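The two summary quantities described above can be written out directly. This is a numpy sketch, not the R code used for the simulations: mean_abs_percent_bias follows the definition in the text (absolute bias of the mean estimate, relative to the true ATT), and cluster_robust_vcov is a basic CR0 cluster sandwich that omits the finite-sample adjustments that packages such as multiwayvcov may apply, so its SEs can be slightly smaller. The function names and usage data are hypothetical.

```python
import numpy as np

def mean_abs_percent_bias(estimates, truth):
    """|mean(estimates) - truth| as a percentage of |truth|, per the definition in the text."""
    return abs(np.mean(estimates) - truth) / abs(truth) * 100.0

def cluster_robust_vcov(X, resid, cluster):
    """CR0 sandwich (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1 with no finite-sample correction."""
    bread = np.linalg.inv(X.T @ X)
    _, g = np.unique(cluster, return_inverse=True)   # map cluster labels to 0..G-1
    score_sums = np.zeros((g.max() + 1, X.shape[1]))
    np.add.at(score_sums, g, X * resid[:, None])     # sum X_it * u_it within each cluster
    meat = score_sums.T @ score_sums
    return bread @ meat @ bread

# Usage on one hypothetical data set: 200 units, 5 rows each, clustering on unit.
rng = np.random.default_rng(0)
unit = np.repeat(np.arange(200), 5)
x = rng.normal(size=unit.size)
y = 1.0 + 0.5 * x + rng.normal(size=200)[unit] + rng.normal(size=unit.size)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
se = np.sqrt(np.diag(cluster_robust_vcov(X, y - X @ beta, unit)))
print("coefficients:", beta, "cluster-robust SEs:", se)
print("mean absolute percent bias:", mean_abs_percent_bias([0.9, 1.1, 1.2], truth=1.0))
```

Note that this metric averages the estimates before taking the absolute value, so positive and negative errors across replicates can offset each other; the mean SE is reported separately.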
Below, we describe the specifics of our data-generating and analysis models, first for scenarios with time-invariant covariates and then for scenarios with time-varying covariates. Table 1 gives an overview of the data-generating process for each simulation scenario; more detail is provided in Table D1 in Supporting Information. Simulation code is on GitHub (https://www.github.com/zeldow/DID-confounding-supplementary).

| Data-generating models
Our first set of simulations involves a time-invariant covariate. In Scenario 1, the distribution of X differs by treatment group, but X has a time-invariant effect on the outcome Y. Scenario 2 is the same as Scenario 1 but we allow the effect of X on Y to be time-varying. In Scenario 3, the effect of X on Y is again time-varying, but the distribution of X is the same in the treated and control groups.
In Scenarios 1 and 3, analyses that do not adjust for X will be unbiased, because X does not satisfy the definition of a confounder.
In Scenario 1, this is because X does not have a time-varying effect on Y; in Scenario 3, this is because the distribution of X is the same in both groups. In Scenario 2, only analyses that adjust appropriately for the time-varying effect of X on Y will yield unbiased results. For all three scenarios, the ATT equals the regression parameter $\gamma$, which was set to 1. We measure bias with respect to this true ATT.
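The expected behavior of the unadjusted model in Scenarios 1 through 3 can be checked with the bias term implied by Equation (4). Assuming a balanced panel, the coefficient on p_t d_i in the simple model equals the average posttreatment group contrast minus the average pretreatment group contrast, so its bias is that same average contrast applied to λ_t Δ(t), where Δ(t) is the treated-minus-comparison difference in covariate means at time t. The function name and all parameter values below are hypothetical and are not those in Table 1.

```python
import numpy as np

def trend_bias(lam, delta, T0):
    """Bias of the unadjusted pre/post-average diff-in-diff under Equation (4):
    mean over post periods of lambda_t * Delta(t) minus the mean over pre periods.
    `delta` may be a scalar (constant group gap) or one value per period."""
    lam = np.asarray(lam, dtype=float)
    delta = np.broadcast_to(np.asarray(delta, dtype=float), lam.shape)
    contrast = lam * delta
    return contrast[T0:].mean() - contrast[:T0].mean()

T, T0 = 10, 5
t = np.arange(T)
scenarios = {  # hypothetical parameter values, not those in Table 1
    "1: group difference, constant effect": (np.ones(T), 1.0),
    "2: group difference, time-varying effect": (1.0 + 0.2 * t, 1.0),
    "3: no group difference, time-varying effect": (1.0 + 0.2 * t, 0.0),
}
for name, (lam, delta) in scenarios.items():
    print(name, round(float(trend_bias(lam, delta, T0)), 3))
```

Only Scenario 2 yields a nonzero bias, matching the expectations stated above. Passing a time-indexed delta (a group gap that widens over time) shows that differential covariate evolution produces bias even when the effect is constant.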

| Analysis approaches
We use both matched and unmatched regression to analyze the simulated data. All regression models include time fixed effects and indicators for treatment, the postperiod, and their interaction. The simple model includes only those elements, ignoring the covariate entirely:
$$ Y_{it} = \alpha_0 + \zeta_t + \alpha_1 d_i + \gamma\, p_t d_i + \epsilon_{it}. $$
TABLE 1 Illustration of the data-generating processes for simulation studies
The time-varying adjusted (TVA) model allows the coefficient on the covariate to vary over time:
$$ Y_{it} = \alpha_0 + \zeta_t + \alpha_1 d_i + \gamma\, p_t d_i + \lambda_t x_i + \epsilon_{it}. $$
Our matching strategies include matching on both outcomes and covariates. We use nearest-neighbor matching to create three matched data sets, to which we fit the model in Equation (5

| Data-generating models
The second set of simulations involves a time-varying covariate, which may evolve differently in the treated and comparison groups.
The setup of these simulations is the same as in Scenarios 1 through 3. We include three types of covariate evolution. In Scenario 4, the covariate evolves the same for both the treated group and the comparison group; in Scenario 5, the covariate evolves differently starting from baseline; and in Scenario 6, the covariate evolves the same in the two groups before treatment but differently after treatment.
For all these scenarios, we have two outcome processes: (a) the covariate has a time-invariant effect on the outcome and (b) the covariate has a time-varying effect on the outcome. The data-generating distributions are summarized in Table 1, with more detail in Table D2 of Supporting Information. For Scenarios 4 and 5, the ATT equals the regression parameter (set to 1). In Scenario 6, however, the covariate is changed by treatment and thus acts in part as a mediator.
Thus, for scenario 6, the ATTs are 0.85 and 0.87 for outcome processes (a) and (b), respectively. These calculations are provided in Supporting Information.

| Analysis approaches
The analysis methods are the same as for time-invariant covariates.
In Scenario 2, the time-varying effect of X on Y makes X a confounder … (Table 1 and Table D2). In Scenario 4a, there is no confounding when the effect of X on Y is constant over time and the mean of X evolves the same for each group. As a result, each modeling strategy is unbiased. However, when X has a time-varying effect on Y in Scenario … Figure 2).

| Time-varying covariate
A correctly specified regression approach avoids conditioning on pretreatment outcomes and thus is not susceptible to regression to the mean as some matching methods are. 16 Lastly, our regression adjustment strategy is agnostic to the structure of the data, whether we have panel data or repeated cross sections. Our simulations assumed panel data, but our results will hold for repeated cross sections.
Matching on repeated cross sections is trickier, since some covariates will necessarily be measured on different subjects at different time points, but it is possible. 29 Both matching and regression adjustment have clear pitfalls (discussed in the above paragraphs), and both have strengths in diff-in-diff applications. Deciding which to implement must be done carefully and depends on various factors, including data structure, which covariates are measured, and how many units are in the dataset. Our goal in this paper is not to provide guidance in choosing between matching and regression adjustment. However, in our simple simulations, matching was not better than regression adjustment, and in some cases, it increased bias. We only implemented nearest neighbor matching with replacement; many other matching techniques are possible.
For applied researchers using diff-in-diff, we recommend several steps for addressing confounding. First, researchers should clearly specify a causal model and explain how the inclusion of covariates and their functional forms conforms to their assumptions about the relationships among covariates, treatments, and outcomes over time.
This begins by writing out the full model specification and by providing analysis code in Supporting Information. Each covariate and coefficient should correspond to a threat to the validity of parallel trends and provide a remedy. We recommend that researchers comprehensively list covariates, both observed and unobserved, that might cause violations of parallel trends. The list should contain information on whether the variable is observed, whether the distribution of the covariate is expected to differ in the treatment and comparison groups, whether the covariate is time-varying, whether its effect on the outcome is likely to vary over time, and whether the covariate may be causally affected by treatment. Such a list is critical to choosing an analytical approach that is suited to the true underlying data-generating model. For example, if many unobserved covariates are a concern, the analyst may choose a different estimator (instead of one that relies on diff-in-diff and the parallel trends assumption). On the other hand, a single time-invariant confounder with a simple linear relationship to the outcome suggests a straightforward regression approach. Other authors have given similar advice, stressing attention to the reasons for baseline differences between the treated and comparison groups and how these differences might affect parallel trends. 30 Being thorough in our diff-in-diff studies will strengthen conclusions and help alleviate concerns about the credibility of parallel trends.
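One way to keep the recommended covariate list is as a small structured record per covariate. The field names and screening rules below are hypothetical: the rules simply restate this paper's definition of a confounder (a group difference combined with a time-varying effect, or differential evolution over time) and its caution about posttreatment covariates, and they are no substitute for an explicit causal model.

```python
from dataclasses import dataclass

@dataclass
class CovariateEntry:
    """One row of the covariate list recommended in the text (hypothetical field names)."""
    name: str
    observed: bool
    differs_by_group: bool          # distribution expected to differ between treated and comparison groups
    time_varying: bool              # value changes over time within a unit
    effect_varies_over_time: bool   # effect on the outcome likely changes over time
    affected_by_treatment: bool     # may be causally affected by treatment

def flags(c: CovariateEntry) -> list[str]:
    """Crude screening notes based on the diff-in-diff definition of a confounder."""
    notes = []
    if c.differs_by_group and c.effect_varies_over_time:
        notes.append("group difference with a time-varying effect: threatens parallel trends")
    if c.time_varying:
        notes.append("time-varying: ask whether it evolves differently in the two groups")
    if c.affected_by_treatment:
        notes.append("possibly affected by treatment: avoid adjusting for posttreatment values")
    if notes and not c.observed:
        notes.append("unobserved: cannot be adjusted for directly")
    return notes

example = CovariateEntry("baseline age (hypothetical)", observed=True, differs_by_group=True,
                         time_varying=False, effect_varies_over_time=True,
                         affected_by_treatment=False)
for note in flags(example):
    print(f"{example.name}: {note}")
```

A covariate flagged only for a time-varying effect with no group difference produces no notes, consistent with Scenario 3, where such a variable is not a confounder.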
We expect diff-in-diff to continue its critical role in informing policy decisions for the foreseeable future. Further development of diff-in-diff methodology should involve cooperation among statisticians, epidemiologists, economists, political scientists, and policy analysts.