A formal causal interpretation of the case-crossover design

The case-crossover design of Maclure is widely used in epidemiology and other fields to study causal effects of transient treatments on acute outcomes. However, its validity and causal interpretation have only been justified under informal conditions. Here, we place the design in a formal counterfactual framework for the first time. Doing so helps to clarify its assumptions and interpretation. In particular, when the treatment effect is nonnull, we identify a previously unnoticed bias arising from strong common causes of the outcome at different person-times. We analyze this bias and demonstrate its potential importance with simulations. We also use our derivation of the limit of the case-crossover estimator to analyze its sensitivity to treatment effect heterogeneity, a violation of one of the informal criteria for validity. The upshot of this work for practitioners is that, while the case-crossover design can be useful for testing the causal null hypothesis in the presence of baseline confounders, extra caution is warranted when using the case-crossover design for point estimation of causal effects.


Introduction
The case-crossover design (Maclure, 1991) is used in epidemiology and other fields to study causal effects of transient treatments on acute outcomes. One of its major advantages is that it only requires information from individuals who experience the outcome of interest (the cases). Another appealing feature is that under certain circumstances (which we will discuss at length) the case-crossover estimator adjusts for unobserved time-invariant confounding. In a seminal application of this design (Mittleman et al., 1993), researchers obtained data on the physical activity (a transient treatment) of individuals who experienced a myocardial infarction (MI, an acute outcome). They then defined any person-times less than one hour after vigorous activity as 'treated', and all other person-times as 'untreated'. Finally, they considered each person-time as an individual observation and computed a Mantel-Haenszel estimate of the corresponding hazard ratio (Tarone, 1981; Nurminen, 1981; Kleinbaum et al., 1982; Greenland and Robins, 1985). This hazard ratio estimate was interpreted as the causal effect of vigorous physical activity on MI. Some variants of the case-crossover design allow flexible control time selection strategies where control times can follow outcome occurrence (e.g. Levy et al., 2001), but in this paper we restrict attention to studies in which follow-up is terminated at the time of the first outcome occurrence, as in the above MI example.
Past authors have extensively considered several threats to validity of the case-crossover design (Maclure, 1991; Greenland, 1996
The Mantel-Haenszel estimator was originally applied to estimate the treatment-outcome odds ratio when subjects were classified in strata sharing values of confounders $V$, and observed subjects in each stratum could be conceived of as independent draws from the (hypothetically) infinite stratum population. Under the assumptions that stratum-specific odds ratios are all equal and observations are independent within each stratum, the Mantel-Haenszel estimator was proven consistent for the constant odds ratio as the number of strata approaches infinity, even if only a few subjects are observed in each stratum (Breslow, 1981). Since the values of the confounders $V$ are held constant within each stratum, the constant odds ratio can be endowed with a causal interpretation if $V$ includes all confounders. The same goes for the rate ratio (Robins and Greenland, 1985).
Maclure's idea was to regard person-times (rather than subjects) as the units of analysis and subjects as the strata, then apply the Mantel-Haenszel estimator. As Maclure (1991) put it: "In the case-crossover design, the population base is considered to be stratified in the extreme, so there is only one individual per stratum... Use of subjects as their own controls eliminates confounding by subject characteristics that remain constant". Analogy to past applications of the Mantel-Haenszel estimator would seem to imply that the case-crossover design eliminates baseline confounding as a source of bias, assuming a constant treatment effect across subjects (informal condition (e)) and independent identically distributed observations across time within each subject. Of course, these two assumptions are unlikely to be satisfied in most research settings: the effect of treatment is rarely the same in all subjects, and variables at different person-times are typically not independent within subjects. Informal assumptions (a)-(d) can be viewed as a more plausible alternative to independent person-times, but determining when the case-crossover estimator is asymptotically unbiased for causal effects in the presence of unobserved confounding requires a formal analysis.
Here we place the case-crossover design in a formal counterfactual causal inference framework (Rubin, 1978; Robins, 1986). Doing so helps to clarify its assumptions and interpretation. In Section 2, we introduce notation, describe the (possibly hypothetical) cohort that gives rise to the data in a case-crossover analysis, and summarize the MI study in more detail so that it can serve as a running example. In Section 3, we define a natural estimand motivated by a hypothetical randomized trial that practitioners of the case-crossover design might wish to emulate. In Section 4, we state formal assumptions (mostly analogous to informal assumptions (a)-(e)) that allow us to causally interpret the limit of the case-crossover estimator and under which the limit approximates the trial estimand from Section 3. We identify and characterize a previously unnoticed bias present when there exist strong common causes of the outcomes at different times (as would seem likely in many instances) and the treatment effect is non-null. In Section 5, we discuss this bias and illustrate it with simulations. We also use our results from Section 4 to analyze sensitivity to effect heterogeneity, i.e. violations of informal assumption (e). In Section 6, we conclude. Our general message to practitioners is that, while the case-crossover design can be a clever way to test the null hypothesis of no causal effect in the presence of unobserved baseline confounding, its point estimates of non-null effects can be sensitive to violations of unrealistic assumptions.

Notation
While case-crossover studies only use data from subjects who experience the outcome, we will nonetheless describe a full cohort from which these subjects are drawn in order to facilitate the definition of certain concepts and quantities of interest. Consider a cohort of individuals followed from baseline (i.e. study entry), defined by calendar time, age, or time of some pre-defined index event, until they develop the outcome or reach the administrative end of follow-up, whichever occurs first. For simplicity, we assume no individual is lost to follow-up. Subjects are indexed by $i \in \{1, \ldots, N\}$. Subject $i$ is followed for at most $T+1$ person-times (e.g. hours) indexed by $j \in \{0, \ldots, T\}$. For simplicity we take $T$ to be the same for all subjects. Let $A_{ij}$ be a binary variable taking values 0 and 1 indicating whether subject $i$ was treated at time $j$. Let $Y_{ij}$ be a binary variable taking values 0 and 1 indicating whether the outcome of interest occurred in subject $i$ before time $j+1$. We assume that $Y_{ij}$ is a 'time to event' outcome in the sense that if $Y_{ij} = 1$ then $Y_{ij'} = 1$ for all $j' > j$. The above implies the temporal ordering $A_{ij}, Y_{ij}, A_{i(j+1)}$. Thus the outcome has an acute onset as required by informal condition (a). We define $A_{ij} = 0$ if the event has occurred by time $j$.
For a time-varying variable $X$, we denote by $\bar{X}_{ij}$ the history $(X_{i0}, \ldots, X_{ij})$ of $X$ in subject $i$ up (i.e. prior) to time $j+1$. We will often omit the subscript $i$ in the subsequent notation because we assume the data from different subjects $i$ are independent and identically distributed. Let $V$ denote a possibly multidimensional and unobserved baseline confounding variable that we assume has some population density $p(v)$. (For notational convenience we shall write conditional probabilities $P\{\cdot \mid \cdot, V = v\}$ as $p_v(\cdot \mid \cdot)$. To avoid measure theoretic subtleties we shall henceforth assume that when $V$ has continuous components, conditions sufficient to pick out a particular version of $P\{\cdot \mid \cdot, V = v\}$ have been imposed as in Gill and Robins (2001).) Let $\bar{U}_T$ denote common causes of outcomes (but not treatments) at different person-times that are not included in $V$. For example, in the MI and exercise study, $U_j$ could denote formation of a blood clot by hour $j$ after baseline. We assume that the $N$ subjects are iid realizations of the random vector $(V, \bar{A}_T, \bar{Y}_T, \bar{U}_T)$ and that $U_j$ precedes $A_j$ and $Y_j$ in the temporal ordering at each $j$. Recall that in a case-crossover study the observed data on subject $i$ are $(\bar{A}_{iT}, \bar{Y}_{iT})$, as data on $V$ and $\bar{U}_T$ are not available.
We assume that the causal directed acyclic graph (DAG) (Greenland, Pearl, and Robins, 1999) in Figure 1 describes the data generating process within levels of baseline confounders $V$. This DAG encodes aspects of informal assumptions (b) and (c). One salient feature of the DAG is that there are no directed paths from a current treatment to an outcome at a later time that do not first pass through the outcome at the time of the current treatment or through a later treatment. This can be considered a representation of informal assumption (b) that treatment is transient. The DAG also excludes any common causes of treatments and outcomes other than $V$ not through past outcomes. (Since occurrence of the outcome at time $j$ determines the values of all variables at all later time points, outcome variable nodes in the DAG trivially must have arrows to all temporally subsequent variables.) This represents informal assumption (c), which bars non-baseline confounding. This DAG also has fully forward connected treatments with arbitrary common causes of treatment at different times $\bar{U}_{AT}$, indicating that we put no causal restrictions on the treatment assignment process. (We will, however, impose distributional assumptions.) We provide a fuller discussion of causal assumptions in Section 4, but we find it helpful to keep this DAG in mind.

The Case-Crossover Design
The outcome-censored case-crossover Mantel-Haenszel estimator requires data, from subjects who experience the outcome, on treatment status at the time of outcome occurrence and at designated 'control' times preceding the outcome. It is computed as follows:
• Select a random sample of $H$ person-times from the $H^*$ person-times $ij$ satisfying $Y_{ij} = 1$, $Y_{i(j-1)} = 0$, and $j > W$, where $W$ is a maximum 'look back' time chosen by the investigator. We refer to these $H^*$ person-times, at which the outcome occurred for the first time and after time $W$, as the set of 'case' person-times.
• Let $i_h j_h$ denote the person-time of the $h$th element of the set of $H$ sampled case person-times. From the same subject $i_h$, select $m$ times $\{j_h - c_1, \ldots, j_h - c_m\}$ from the $W$ times prior to the time $j_h$ of subject $i_h$'s first outcome event. We call these $m$ times the 'control' person-times for subject $i_h$. We discuss selection of 'control' times below.
• Let $A^1_h$ denote the treatment at the case time and $(A^0_{h1}, \ldots, A^0_{hm})$ denote treatments at the $m$ control times in subject $i_h$. The Mantel-Haenszel case-crossover estimator $IRR_{MH}$ is
$$IRR_{MH} = \frac{\sum_{h=1}^{H} \sum_{l=1}^{m} A^1_h (1 - A^0_{hl})}{\sum_{h=1}^{H} \sum_{l=1}^{m} (1 - A^1_h) A^0_{hl}}. \tag{1}$$
Note that for subject $i_h$ the only data necessary to compute $IRR_{MH}$ are $(A^1_h, A^0_{h1}, \ldots, A^0_{hm})$.
Intuitively, the more subjects tend to be treated at the time of the outcome but not at earlier control times, as opposed to vice versa, the stronger the estimated effect of treatment. To fix ideas, we consider an example of a case-crossover study from the literature. In a simplified version of Mittleman et al.'s (1993) study on the impact of exercise on MI mentioned in the introduction, suppose we collect data from a random sample of patients suffering MI on a particular Sunday. We record whether each patient exercised in the hour immediately preceding their MI and whether they exercised in the same hour the day before their MI. We compute the Mantel-Haenszel case-crossover estimator (1): the numerator is the number of subjects who exercised immediately prior to their MI but not 24 hours before, and the denominator is the number of subjects who did not exercise immediately prior to their MI but did 24 hours before. Mittleman et al. estimated a ratio of 5.9 (95% CI 4.6, 7.7). They found the ratio was much higher among subjects who rarely exercised (107, 95% CI 67, 171) than among those who exercised regularly (2.4, 95% CI 1.5, 3.7) prior to the study period.
Many approaches to selecting control times might be acceptable. In the MI example, the look-back window is the 24 hours before the MI and there is only one control time, exactly 24 hours before the outcome time. So $W = 24$, $m = 1$, and $c_1 = 24$ in the notation above.
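As a minimal sketch of the computation just described (the ratio of the count of case/control pairs treated at the case time but not at the control time to the count of pairs with the reverse pattern), the following Python function takes the treatment indicator at each case time and the list of treatment indicators at that subject's control times. The function name, argument names, and toy data are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def mh_case_crossover_irr(case_treated: Sequence[int],
                          control_treated: Sequence[Sequence[int]]) -> float:
    """Ratio of discordant pair counts over all sampled cases and control times."""
    num = 0  # pairs treated at the case time and untreated at the control time
    den = 0  # pairs untreated at the case time and treated at the control time
    for a_case, controls in zip(case_treated, control_treated):
        for a_control in controls:
            num += a_case * (1 - a_control)
            den += (1 - a_case) * a_control
    if den == 0:
        raise ValueError("no pairs untreated at the case time and treated at the control time")
    return num / den


# Hypothetical toy data: six cases, one control time each (m = 1).
case = [1, 1, 1, 0, 1, 0]
ctrl = [[0], [0], [1], [1], [0], [0]]
irr = mh_case_crossover_irr(case, ctrl)
print(irr)  # three (treated, untreated) pairs over one (untreated, treated) pair
```

Concordant pairs (treated at both times or at neither) contribute nothing, which is why only treatment data from cases are needed.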

A Natural Estimand
Consider $T$ parallel group randomized trials in which, in trial $k$, treatment is randomly assigned at and only at time $k$ to all subjects who have yet to experience the outcome. Such a time $k$-specific trial could estimate the immediate effect of treatment at time $k$. To formalize, we adopt the counterfactual framework of Robins (1986). Let $Y^{\bar{a}_j}_{ij}$ be the value of the outcome at time $j$ had, possibly contrary to fact, subject $i$ followed treatment regime $\bar{a}_j \equiv (a_1, \ldots, a_j)$ through time $j$. We refer to $Y^{\bar{a}_j}_{ij}$ as a counterfactual or potential outcome. Since we will frequently consider treatment interventions at a single time point, we also introduce the notation $Y^{a_j}_j$ as shorthand for $Y^{\bar{A}_{j-1}, a_j}_j$, i.e. the counterfactual value of random variable $Y_j$ under observed treatment history through $j-1$ and treatment at time $j$ set to $a_j$. The randomized trial described above conducted at time $k$ would yield an estimate of the relative risk or hazard ratio
$$\rho_k \equiv \frac{P(Y^{a_k = 1}_k = 1 \mid \bar{Y}_{k-1} = 0)}{P(Y^{a_k = 0}_k = 1 \mid \bar{Y}_{k-1} = 0)}. \tag{2}$$
In the remainder of the paper, we will establish (strong) assumptions under which the case-crossover estimator approximately converges to a causally interpretable quantity. Under these assumptions, $\rho_k$ does not depend on $k$ and the case-crossover estimator approximately approaches $\rho \equiv \rho_k$, constant over $k$.

Derivation of the Counterfactual Interpretation of the Limit of the Case-Crossover Estimator

Assumptions
Our goal is to specify natural and near minimal assumptions that allow us to causally interpret the limit of the case-crossover estimator. Counterfactuals and the observed data are linked by the standard consistency assumption (3). Consistency states that the counterfactual outcomes corresponding to the observed treatment regimes are equal to the observed outcomes. Consistency is a technical assumption that has no counterpart in the informal assumptions (a)-(e) but is implicit in almost all analyses. We assume that the causal graph in Figure 1 describes the data generating process (Greenland, Pearl, and Robins, 1999). We will state some specific assumptions implied by the graph in counterfactual notation and also state additional assumptions. Figure 1 encodes informal assumption (c) that there are no post-baseline confounders not contained in $V$, i.e.

Sequential Exchangeability:
(4)
See Appendix 2 for further details. An example violation of (4) in the MI study would be if caffeine intake at hour $j$ both encouraged exercise and increased MI risk at $j$. We might expect that confounders of this sort in the MI study (short-term encouragements to exercise that are associated with MI) are weak.
The DAG in Figure 1 also reflects informal assumption (b) that effects are transient by implying that $A_j$ has no direct effect on $Y_{j+1}, \ldots, Y_T$ not through $A_{j+1}, \ldots, A_T$ and $Y_j$, for all $j$. Graphically, this is the statement that the only treatment variable that is a parent of $Y_j$ is $A_j$. We might hope that the graphical definition of the transient effect assumption would be equivalent to the assumption that, conditional on $V$, counterfactual hazards are independent of past treatment history, i.e. that $\lambda^{\bar{a}_j}_{vj} \equiv p_v(Y^{\bar{a}_j}_j = 1 \mid \bar{Y}_{j-1} = 0)$ does not depend on $\bar{a}_{j-1}$. However, this is not generally true due to collider bias (Hernan et al., 2004) stemming from selection on survival and the presence of common causes $V$ and $\bar{U}_T$ of the outcome in Figure 1 since, e.g., conditioning on $Y_{j-1}$ opens paths from $A_{j-1}$ to $Y_j$ through common causes of $Y_{j-1}$ and $Y_j$. Because of the collider bias, a formal counterfactual definition of the transient effect assumption requires that we condition on $U$-histories. Specifically, let $\lambda^{a_j}_{vj}(\bar{a}_{j-1}, \bar{u}_j)$ denote the conditional counterfactual hazard at time $j$ under treatment $a_j$ given past treatments $\bar{a}_{j-1}$, common causes of outcomes $\bar{u}_j$, and baseline confounders $v$. As in Figure 1, we assume (5): conditional on $V$ and the history of $U$, the current counterfactual hazard does not depend on past treatments. This assumption is consistent with the absence of any mention of such dependence in the case-crossover literature. Biological considerations determine the plausibility of (5). In the MI study, (5) would be violated if exercise can have delayed effects on MI. Maclure (1991) argued that delayed effects would be weak in this setting. Under (5), we can write the counterfactual hazard $\lambda^{a_j}_{vj}(\bar{a}_{j-1}, \bar{u}_j)$ for any $\bar{a}_{j-1}$ as $\lambda^{a_j}_{vj}(\bar{u}_j)$. The causal hazard ratio at time $j$ given $\bar{u}_j$ and $v$ is then $\lambda^1_{vj}(\bar{u}_j)/\lambda^0_{vj}(\bar{u}_j)$. We assume that the causal hazard ratio is constant:
$$\beta_{vj}(\bar{u}_j) \equiv \frac{\lambda^1_{vj}(\bar{u}_j)}{\lambda^0_{vj}(\bar{u}_j)} = \beta \quad \text{for all } v, j, \bar{u}_j. \tag{6}$$
(6) is a version of the constant effects assumption (e). Under the constant hazard ratio assumption (6), $\beta = \rho$ from (2) and the trial described in Section 3 would target $\beta$. (Note that for (6) not to depend on the specific set of variables included in $V$ and $\bar{U}$, which we leave unspecified, requires that $\beta_{vj}(\bar{u}_j)$ is collapsible over $\bar{U}$ and $V$. While it is well known that hazard ratios are not generally collapsible (Greenland, 1999), the scenario in which non-collapsibility arises entails a baseline exposure influencing failure at all future time points. Under our transient effects assumption, $\beta_{vj}(\bar{u}_j)$ is just an immediate conditional relative risk, which is collapsible.) (6) is a very strong assumption unlikely to ever hold exactly. Violations can be less extreme in subpopulations, e.g. subjects who exercise regularly in the MI study. We examine sensitivity to violations of (6) in Section 5.2. Under the above assumptions, we will show (in Theorem 1) that the case-crossover estimator approaches $\beta$ times a multiplicative bias term. The assumptions below, invoked in the order they are introduced, are sufficient to ensure the multiplicative bias term is near 1.

Rare Outcome:
$$1 - \prod_{j=1}^{T} \left(1 - \lambda^{a_j}_{vj}(\bar{u}_j)\right) < \epsilon \quad \text{for all } \bar{u}_T, v, \bar{a}_T, \text{ with } \epsilon \text{ a small positive number}. \tag{7}$$
This rare outcome assumption is required to hold at all levels of $V$, $\bar{U}$, and $\bar{A}$. Because $(V, \bar{U})$ can be very high dimensional and contain post-baseline information, it is unlikely that this assumption holds in the MI study. For example, formation of a clot might cause a violation. But we will see that bias can be small even if this assumption fails, as long as cases occurring under the violating $(V, \bar{U})$ levels do not account for a large proportion of total cases. Next, we require the assumption of No Time-Modified Confounding (8), in which $\lambda^0_{vk}$ is the untreated counterfactual hazard at $k$ marginal over $\bar{u}_k$ and $k - c_l$ is a control time for an outcome occurring at $k$. A sufficient condition for (8) to hold is that, for each $k$ and $l$, the marginal covariance $\mathrm{Cov}\left(p_V(A_k = 1, A_{k-c_l} = 0) - p_V(A_k = 0, A_{k-c_l} = 1),\ \lambda^0_{Vk}\right)$ is zero. In fact, we require only that the sum over $k$ and $l$ of the $k$-specific covariances for each control time is zero. This condition prevents bias from so-called time-modified baseline confounders $V$ (Platt et al., 2009) which, by definition, are baseline confounders $V$ that predict both (i) the hazard of an unexposed subject failing at various times $k$ and (ii) the difference in marginal probabilities of the events $(A_k = 0, A_{k-c_l} = 1)$ and $(A_k = 1, A_{k-c_l} = 0)$. The case-crossover literature distinguishes between baseline and post-baseline confounders and says the former are allowed but not the latter. The more relevant distinction is whether a confounder has time-varying effects. To understand the issue, first consider a post-baseline confounder. We gave the example earlier of caffeine intake ($C_k$) at time $k$ impacting probability of both exercise and MI at $k$ (more precisely, between $k$ and $k+1$). $C_k$ is temporally a post-baseline variable as its value is realized at time $k$, but in the causal ordering it could be equivalent to a baseline variable if it is not influenced by past treatments. For example, coffee at time $k$ could be equivalent to a $k$-hour delayed release caffeine pill at baseline. Suppose $Z(k) \in V$ is a baseline variable (like the delayed release caffeine pill) such that $Z(k) = 1$ causes $A_k = 1$ and $Y_k = 1$ to be more likely. $Z(k)$ would induce bias just like $C_k$, even though $Z(k) \in V$ is a baseline confounder that (unlike $C_k$) would not lead to a violation of (4). However, whenever $Z(k) = 1$, the difference $p_v(A_k = 1, A_{k-c_l} = 0) - p_v(A_k = 0, A_{k-c_l} = 1)$ and $\lambda^0_{vk}$ will both be large, inducing a correlation of the sort banned by (8). Thus, (8) serves to ban time-modified confounding.
There is a straightforward intuitive motivation behind informal assumption (d) that there are no time trends in treatment. Because control times always precede case times, a steady change in treatment probability over time would result in a preponderance of discordant pairs of one type over the other in the estimator (1), even in the absence of any causal effect of treatment on outcome. Our version of informal assumption (d) is the No Time Trends assumption (9), imposed for all $k$, $c_l$ such that $k - c_l$ would be a control time if the outcome were to occur at $k$.
Note that we make this assumption marginally over $V$ and $\bar{U}_k$. A sufficient condition for the No Time Trends assumption to hold is that, at every time $k$ and for every control time $k - c_l$, the assumption $P(A_k = 1, A_{k-c_l} = 0) = P(A_k = 0, A_{k-c_l} = 1)$ of (marginal) pairwise exchangeability holds, as previously derived by Vines and Farrington (2001). (9) can be viewed as a more precise formulation of the informal "no time trends in treatment" assumption (d). Exposure can exhibit arbitrarily complex temporal dependence (as illustrated in the DAG in Figure 1) as long as (9) holds. Whether this assumption holds depends in part on how control times are chosen. In the MI study, control times 12 hours prior to the outcome could be much less likely to satisfy (9) than control times 24 hours prior (e.g. 2PM the previous day would be a better control time than 2AM the morning of an MI that occurred at 2PM).
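To make the time-trend intuition concrete, here is a small sketch of what the discordant-pair ratio would converge to when treatment has no effect at all. It rests on simplifying assumptions that are ours and not part of the formal development: treatments are independent across times with time-specific probabilities, and cases are equally likely to occur at every eligible time. The function name and the example probability sequences are hypothetical.

```python
def null_limit_with_trend(p_treat, lag=1):
    """Limit of the discordant-pair ratio under no treatment effect.

    Assumes (for illustration only) independent treatments with
    P(A_k = 1) = p_treat[k] and cases spread evenly over times k >= lag,
    with a single control time k - lag per case.
    """
    num = den = 0.0
    for k in range(lag, len(p_treat)):
        num += p_treat[k] * (1 - p_treat[k - lag])   # treated at case, untreated at control
        den += (1 - p_treat[k]) * p_treat[k - lag]   # untreated at case, treated at control
    return num / den


flat = [0.3] * 24                              # no trend in treatment probability
rising = [0.1 + 0.02 * k for k in range(24)]   # steadily increasing treatment probability

print(null_limit_with_trend(flat))
print(null_limit_with_trend(rising))
```

With a flat treatment probability the two kinds of discordant pair are equally likely and the ratio is 1. With a rising probability the ratio exceeds 1 even though treatment does nothing, which is the spurious signal that the pairwise exchangeability condition rules out.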
We note that all the results in the paper hold if there exist any $V$ and $\bar{U}$ for which both the independencies of the causal DAG in Figure 1 and assumptions (3)-(9) are satisfied.

The limit of the Mantel-Haenszel estimator
In the theorem below we derive the limit of $IRR_{MH}$ (1) in the outcome-censored case-crossover design under an asymptotic sequence in which the full cohort, the number of cases in the cohort, and the number of sampled cases grow at similar rates, i.e. $N \to \infty$, $H^*/N \to d_1 > 0$, and $H/H^* \to d_2 > 0$. We also assume subjects are iid. The proof is in Appendix 1.

Bias due to strong common causes of the outcome
As discussed earlier, our rare outcome assumption within levels of (possibly post-baseline and high dimensional) common causes of the outcome is novel and unreasonably strong. In this subsection, we will examine analytically and through simulations the bias that arises when it fails. We first consider the special case in which, at each time $k$, exposure is determined by an independent coin flip with success probability $p$. In that case, as shown in Remark 1 in Appendix 1, the multiplicative bias $\tau$ from Theorem 1 is well approximated by expression (10). Disparities between the numerator and denominator of the bias term (10) will lead to bias of the estimator. Before examining disparities related to non-negligible $V$- and $U$-specific survival probabilities, we note that the bias contribution of a disparity at a given level of $v$ and $\bar{u}_k$ depends on the weight $M_v(\bar{u}_k)p(v)$, which is large when both the probability of observing $(v, \bar{u}_k)$ and the probability of an untreated event occurring at $k$ given $v$ and $\bar{u}_k$ are large. Thus, the larger the proportion of total cases occurring at $v$ and $\bar{u}_k$, the more that failure of the rare outcome assumption at $v$ and $\bar{u}_k$ biases the estimator.
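The only difference between the numerator and denominator of (10) is that, suppressing indices and writing $\lambda^0$ for the untreated hazard at the control time given $v$ and the $U$-history, a factor $(1 - \lambda^0)$ appears in the numerator where $(1 - \beta\lambda^0)$ appears in the denominator. The ratio of the term in the numerator to that in the denominator is $(1 - \lambda^0)/(1 - \beta\lambda^0)$. When $\beta = 1$, this factor is equal to 1 and there is no bias. When $\beta \neq 1$, the bias is away from the null, since $(1 - \lambda^0)/(1 - \beta\lambda^0) > 1$ if and only if $\beta > 1$, and thus the MH estimator converges to a limit that is further from 1 than the true $\beta$ and in the same direction. For the multiplicative bias to be non-negligible requires a violation of the rare outcome assumption in which there exist histories $v, \bar{u}^*_k$ for which both the weight $M_v(\bar{u}^*_k)p(v)$ and the untreated hazard at the control time are non-negligible. We illustrate this bias with a simulation. For $N = 100{,}000$ subjects, we simulated treatments and counterfactual outcomes for 24 time steps or until the first occurrence of the outcome according to the following data generating process (DGP).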
The DAG for this DGP is depicted in Figure 2. The true value of $\beta$ is 2. There are no common causes of treatments and outcomes, treatments are independent and identically distributed and hence exhibit no time trends, and the outcome is rare when marginalized over $U$. (While the outcome is not rare when $U_t = 1$, it is rare that $U_t = 1$.) Yet the limit of the case-crossover estimator using the time prior to outcome occurrence as the control is approximately 2.8. The estimator fails because the outcome was common when $U_t$ or $U_{t-1}$ were 1 and a large proportion of total cases occurred when $U_t$ or $U_{t-1}$ were 1. The bias is away from the null, as predicted by our analysis above. The effect of $U$ on the outcome needed to be strong to produce the bias in this simulation. With a weaker effect of $U$ on the outcome, the case-crossover estimator is about 2.3 instead of 2.8. A recently formed blood clot could roughly play the role of $\bar{U}$ in the MI example: a rare event that does not influence probability of exposure, greatly increases probability of the outcome at multiple time points after the clot forms, and without which the outcome is rare.
Now we consider bias in the more general scenario where treatments are correlated across time. Bias then arises when the highly weighted treatment trajectories in the numerator imply different survival probabilities (products of factors $(1 - \lambda^{a_s}_{vs}(\bar{u}_s))$) for some values of $v$ and $\bar{u}_k$ than the highly weighted treatment trajectories in the denominator. By the reasoning we applied to infer direction of bias in the case with uncorrelated exposures, strongly weighted untreated survival probabilities in the numerator combined with strongly weighted treated survival probabilities in the denominator would lead to bias away from the null, and vice versa. Depending on the treatment correlation pattern, treated or untreated survival probabilities might be more strongly weighted in the numerator or denominator. Thus, in the correlated treatment case the resulting bias can be either toward or away from the null. As in the case without correlated exposures, the magnitude of the bias contribution stemming from this dynamic for a given $v$ and $\bar{u}_k$ depends on $M_v(\bar{u}_k)$.
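For the uncorrelated-exposure case, the mechanism can be checked without Monte Carlo noise by computing the large-sample limit of the discordant-pair ratio exactly under a deliberately simple DGP. The sketch below is not the DGP used in the simulations above. It assumes, purely for illustration, a binary common cause $U$ that is fixed over time and affects only the outcome, independent coin-flip treatments, a multiplicative treatment effect `beta` on the hazard, and the immediately preceding time as the single control time. All names and parameter values are hypothetical.

```python
def cc_limit_common_cause(beta, p, q, lam0_by_u, T=24):
    """Exact limit of the discordant-pair ratio with control time k - 1.

    Hypothetical DGP (an assumption for illustration, not the paper's DGP):
      U ~ Bernoulli(q), constant over time, affects outcomes only;
      A_k ~ Bernoulli(p), independent across times 1..T;
      hazard at time k is lam0_by_u[U] if untreated and beta * lam0_by_u[U] if treated.
    """
    num = den = 0.0
    for u, prob_u in ((0, 1 - q), (1, q)):
        lam0 = lam0_by_u[u]
        lam1 = beta * lam0
        # probability of surviving one time step, averaging over the coin flip
        step_survival = p * (1 - lam1) + (1 - p) * (1 - lam0)
        for k in range(2, T + 1):
            reach = prob_u * step_survival ** (k - 2)  # event-free through time k - 2
            # untreated survivor at k - 1, then treated case at k
            num += reach * (1 - p) * (1 - lam0) * p * lam1
            # treated survivor at k - 1, then untreated case at k
            den += reach * p * (1 - lam1) * (1 - p) * lam0
    return num / den


rare = cc_limit_common_cause(2.0, 0.5, 0.001, (1e-5, 1e-5))   # outcome rare at every level of U
strong = cc_limit_common_cause(2.0, 0.5, 0.001, (1e-4, 0.3))  # outcome common when U = 1
null = cc_limit_common_cause(1.0, 0.5, 0.001, (1e-4, 0.3))    # no treatment effect
print(rare, strong, null)
```

In this sketch the numerator carries the untreated survival factor at the control time and the denominator the treated one, so the limit sits at `beta` when the outcome is rare at every level of $U$, moves away from the null when a strong common cause makes the outcome common, and equals 1 under the null regardless of $U$.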
To illustrate, we modify our previous simulation example to add correlations in treatments across time. If time bins are interpreted as hours in the previous simulation, they are seconds in this one. Exposure and the unobserved common cause of the outcome are still independently assigned to one-hour intervals as in the previous simulation. This induces perfect correlation between treatments corresponding to one-second time bins within the same hour. The untreated one-second discrete hazards are set to preserve the hourly untreated survival probability from the previous simulation, and the multiplicative treatment effect within each one-second bin is again set to 2. To formalize, we simulated data according to
$\tilde{U}_k \sim \text{Bernoulli}(.001)$ for $k \in \{1, \ldots, 24\}$; $U_{kt} = \tilde{U}_k$ for $k \in \{1, \ldots, 24\}$, $t \in \{1, \ldots, 3600\}$
$\tilde{A}_k \sim \text{Bernoulli}(.5)$ for $k \in \{1, \ldots, 24\}$; $A_{kt} = \tilde{A}_k$ for $k \in \{1, \ldots, 24\}$, $t \in \{1, \ldots, 3600\}$
where we have indexed 'hours' by $k$ and seconds within hours by $t$. The true value of $\beta$ in this DGP is again 2, but the case-crossover estimate using the time bin exactly one hour (3600 seconds) prior to the case as the control (as in the previous simulation) is 1.84. So modifying the DGP from the previous simulation to be finer grained (and thus inducing correlations between treatments across times), while still constructing the case-crossover estimator in the identical way, made the bias switch direction. While neither DGP (fine or coarse) is likely to be a good approximation to any realistic process, we would argue that it is difficult to reason about which would be a better approximation for any given use case with the broad characteristics that both simulations share. Thus, the two simulations taken together illustrate that bias from strong common causes of the outcome, when present, can be both sizable and unpredictable. (See Web Appendix A for analytic confirmation of simulation results from both DGPs using (22), discussion of what drives the discrepancy between the two simulations, and further analysis of bias in the correlated exposure setting.)

Treatment Effect Heterogeneity
We now examine sensitivity to violations of the constant causal hazard ratio assumption when the rare outcome assumption holds. For simplicity, we consider a scenario where there are just two types of subjects and counterfactual hazard ratios are constant across time within types. For $g \in \{0, 1\}$, say subjects of type $g$ arise from the following data generating process:
$$A_j \sim \text{Bernoulli}(p_{A,g}); \qquad Y_j \mid A_j \sim \text{Bernoulli}(\lambda^{A_j}_g),$$
with data censored at the first occurrence of the outcome. So within each type $g$, the constant causal hazard ratio is $\lambda^1_g/\lambda^0_g$. Let $p_g$ denote the proportion of the population of type $g = 1$ at baseline, which under the rare outcome assumption would also be approximately the proportion of type $g = 1$ among surviving subjects at all subsequent follow-up times. According to equation (20) from the proof of Theorem 1, if the rare outcome assumption holds then the case-crossover estimator with $m = 1$ (i.e. using just one control) will approach
$$\frac{(1 - p_g)\lambda^1_{g=0}\,p_{A,g=0}(1 - p_{A,g=0}) + p_g\lambda^1_{g=1}\,p_{A,g=1}(1 - p_{A,g=1})}{(1 - p_g)\lambda^0_{g=0}\,p_{A,g=0}(1 - p_{A,g=0}) + p_g\lambda^0_{g=1}\,p_{A,g=1}(1 - p_{A,g=1})}. \tag{12}$$
(12) can be expressed as a weighted average of $\lambda^1_{g=0}/\lambda^0_{g=0}$ and $\lambda^1_{g=1}/\lambda^0_{g=1}$,
$$\frac{\delta}{\delta + \theta}\,\frac{\lambda^1_{g=0}}{\lambda^0_{g=0}} + \frac{\theta}{\delta + \theta}\,\frac{\lambda^1_{g=1}}{\lambda^0_{g=1}},$$
where $\delta = \lambda^0_{g=0}\,p_{A,g=0}(1 - p_{A,g=0})(1 - p_g)$ and $\theta = \lambda^0_{g=1}\,p_{A,g=1}(1 - p_{A,g=1})\,p_g$. Hence, the limit of the case-crossover estimator is bounded by the group-specific hazard ratios.
The relative risk computed from any of the RCTs described in Section 3 would approach
$$\frac{(1 - p_g)\lambda^1_{g=0} + p_g\lambda^1_{g=1}}{(1 - p_g)\lambda^0_{g=0} + p_g\lambda^0_{g=1}}. \tag{13}$$
Like the case-crossover limit, the RCT estimand can be expressed as a weighted average of $\lambda^1_{g=0}/\lambda^0_{g=0}$ and $\lambda^1_{g=1}/\lambda^0_{g=1}$:
$$\frac{(1 - p_g)\lambda^0_{g=0}}{(1 - p_g)\lambda^0_{g=0} + p_g\lambda^0_{g=1}}\,\frac{\lambda^1_{g=0}}{\lambda^0_{g=0}} + \frac{p_g\lambda^0_{g=1}}{(1 - p_g)\lambda^0_{g=0} + p_g\lambda^0_{g=1}}\,\frac{\lambda^1_{g=1}}{\lambda^0_{g=1}}. \tag{14}$$
Without loss of generality assume $\lambda^1_{g=1}/\lambda^0_{g=1} > \lambda^1_{g=0}/\lambda^0_{g=0}$. The ratio of the weight placed on the higher hazard ratio to the weight placed on the lower hazard ratio in the RCT estimand is
$$\frac{p_g\lambda^0_{g=1}}{(1 - p_g)\lambda^0_{g=0}}. \tag{15}$$
The corresponding case-crossover weight ratio is
$$\frac{p_g\lambda^0_{g=1}}{(1 - p_g)\lambda^0_{g=0}} \times \frac{p_{A,g=1}(1 - p_{A,g=1})}{p_{A,g=0}(1 - p_{A,g=0})}. \tag{16}$$
(16) implies that bias of the case-crossover estimator due to treatment effect heterogeneity depends on the difference in treatment probability between groups with different effect sizes. If treatment probability does not vary across groups with different treatment effects, effect heterogeneity will not induce bias in the case-crossover estimator. When treatment probabilities do vary, whichever group has higher treatment variance $p_{A,g}(1 - p_{A,g})$, i.e. whichever group has probability of treatment closer to .5, will be weighted too highly by the case-crossover estimator compared to the RCT estimand. Some intuition behind this behavior is that the closer the treatment probability within a group is to .5, the more subjects from that group will contribute discordant case-control pairs to the case-crossover estimator, weighting the estimator disproportionately toward the effect within that group. For illustrative purposes, consider a numerical example where we set: $\lambda^0_{g=0} = .001$; $\lambda^1_{g=0} = .002$; $\lambda^0_{g=1} = .0005$; $\lambda^1_{g=1} = .005$; $p_{A,g=0} = .8$; $p_{A,g=1} = .5$; $p_g = .5$.
Then $\lambda^1_{g=1}/\lambda^0_{g=1} = 10$, $\lambda^1_{g=0}/\lambda^0_{g=0} = 2$, and the estimand (13) is equal to 4.67. $IRR_{MH}$ converges to 5.5, while the naive cohort hazard ratio estimator $P(Y = 1 \mid A = 1)/P(Y = 1 \mid A = 0)$ that does not adjust for the confounder $g$ approaches 4.9. In this example, bias from effect heterogeneity overrides any benefits from control of unobserved confounding. (While we set baseline outcome risks to be different across levels of $g$ in this example, note that (16) implies this plays no role in inducing bias due to effect heterogeneity.)
The specific numerical example above is a cautionary tale illustrating the potential significance of heterogeneity-induced bias. But if both cohort and case-crossover analyses are feasible with available data, and unobserved baseline confounding and effect heterogeneity vary within realistic ranges, does one estimator tend to be more biased than the other? We addressed this question in the framework of our toy example by computing the limiting values of case-crossover and cohort estimators for a large grid of data generating process parameter settings. We let $\lambda^0_{g=0}$ and $\lambda^0_{g=1}$ take values in $\{0.0005, 0.001\}$, $\lambda^1_{g=0}/\lambda^0_{g=0}$ take values in $\{1, \ldots, 5\}$, $\lambda^1_{g=1}/\lambda^0_{g=1}$ take values in $\{1 \times \lambda^1_{g=0}/\lambda^0_{g=0}, \ldots, 10 \times \lambda^1_{g=0}/\lambda^0_{g=0}\}$, and $p_{A,g=0}$ and $p_{A,g=1}$ take values in $\{1/20, \ldots, 19/20\}$. Figure 3 shows that neither estimator has a general advantage over the other across parameter settings.
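The three limiting values quoted for the numerical example can be reproduced with a few lines of arithmetic. The sketch below assumes, as in the toy example, two subject types, treatments independent across time within type, and a rare outcome, so that each limit reduces to a ratio of population-weighted hazards. The function names are ours, and the formulas are our reading of the weighted-average description in the text rather than code from the original analysis.

```python
def cc_limit_heterogeneity(lam0, lam1, p_a, p_g):
    """Case-crossover limit with one control time under the rare outcome assumption."""
    share = (1 - p_g, p_g)  # population shares of types g = 0 and g = 1
    num = sum(share[g] * lam1[g] * p_a[g] * (1 - p_a[g]) for g in (0, 1))
    den = sum(share[g] * lam0[g] * p_a[g] * (1 - p_a[g]) for g in (0, 1))
    return num / den


def rct_estimand(lam0, lam1, p_g):
    """Marginal relative risk targeted by the time-specific randomized trial."""
    share = (1 - p_g, p_g)
    return (sum(share[g] * lam1[g] for g in (0, 1))
            / sum(share[g] * lam0[g] for g in (0, 1)))


def naive_cohort_ratio(lam0, lam1, p_a, p_g):
    """P(Y=1 | A=1) / P(Y=1 | A=0) without adjustment for type g."""
    share = (1 - p_g, p_g)
    risk_treated = (sum(share[g] * p_a[g] * lam1[g] for g in (0, 1))
                    / sum(share[g] * p_a[g] for g in (0, 1)))
    risk_untreated = (sum(share[g] * (1 - p_a[g]) * lam0[g] for g in (0, 1))
                      / sum(share[g] * (1 - p_a[g]) for g in (0, 1)))
    return risk_treated / risk_untreated


lam0 = (0.001, 0.0005)  # untreated hazards for g = 0, 1
lam1 = (0.002, 0.005)   # treated hazards for g = 0, 1
p_a = (0.8, 0.5)        # treatment probabilities for g = 0, 1
p_g = 0.5               # proportion of type g = 1

rct = rct_estimand(lam0, lam1, p_g)
cc = cc_limit_heterogeneity(lam0, lam1, p_a, p_g)
naive = naive_cohort_ratio(lam0, lam1, p_a, p_g)
print(round(rct, 2), round(cc, 2), round(naive, 2))
```

Setting the two treatment probabilities equal in this sketch makes the case-crossover limit coincide with the trial estimand, in line with the observation that heterogeneity bias is driven by differences in treatment probability across groups.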
In the MI study, the effect of exercise appeared much greater in subjects who rarely exercised than in those who exercised regularly. Probability of treatment (i.e. exercise) clearly varied considerably between regular and rare exercise groups. Hence, we would expect an estimate of the marginal effect to be biased. The authors of the MI study reported separate effect estimates for the strata over which the effect was thought to vary. This is appropriate, as marginal effect estimates for the full population can be misleading.

We note that the numerical analyses in this section provide a framework for quantitative bias analysis (Lash et al., 2014) to assess sensitivity to violations of the effect homogeneity assumption. Given a case-crossover estimate, an analyst can first specify a simple heterogeneity model (or many) similar to our toy example above. The analyst can then derive from (20) the expression for the limit of the case-crossover estimator under that model as a function of its parameters, just as we easily derived (12). Finally, by exploring a grid of plausible model parameters, the analyst can identify a range of true effect sizes that might result in the observed case-crossover estimate under effect heterogeneity.
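The three steps of this bias analysis can be sketched as follows. This is a minimal illustration under the two-group toy model, not the authors' code: the limit formulas are the same assumed weighted averages described in the text (with $p_g$ taken as the probability of $g = 1$), the grid matches the one described earlier in this section, and the observed estimate of 5.5 and the 5% tolerance are hypothetical choices.

```python
from itertools import product

def limits(lam0, hr, p_a, p_g=0.5):
    """Return (RCT limit, case-crossover limit) for groups g = 0, 1.

    Assumed forms: the RCT limit weights hazard ratios by prevalence times
    baseline hazard; the case-crossover limit additionally weights by the
    treatment variance p_a * (1 - p_a).
    """
    prev = (1 - p_g, p_g)
    w_rct = [prev[g] * lam0[g] for g in (0, 1)]
    w_cc = [w_rct[g] * p_a[g] * (1 - p_a[g]) for g in (0, 1)]
    rct = sum(w_rct[g] * hr[g] for g in (0, 1)) / sum(w_rct)
    cc = sum(w_cc[g] * hr[g] for g in (0, 1)) / sum(w_cc)
    return rct, cc

def compatible_effects(observed, rel_tol=0.05):
    """Range of RCT estimands whose case-crossover limit is near `observed`."""
    lam0_values = (0.0005, 0.001)
    hr0_values = range(1, 6)                    # hazard ratio in group 0
    multipliers = range(1, 11)                  # group 1 ratio = multiplier * group 0 ratio
    p_a_values = [k / 20 for k in range(1, 20)]
    hits = []
    for l0, l1, hr0, m, pa0, pa1 in product(
            lam0_values, lam0_values, hr0_values, multipliers,
            p_a_values, p_a_values):
        rct, cc = limits((l0, l1), (hr0, hr0 * m), (pa0, pa1))
        if abs(cc - observed) <= rel_tol * observed:
            hits.append(rct)
    return (min(hits), max(hits)) if hits else None

# Hypothetical observed case-crossover estimate.
low, high = compatible_effects(5.5)
print(f"compatible RCT estimands range from {low:.2f} to {high:.2f}")
```

The width of the reported range depends heavily on the grid and the tolerance, so in practice the analyst would restrict the grid to parameter values judged plausible for the application.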

Discussion
We have put the case-crossover estimator on more solid theoretical footing by providing a proof of its approximate convergence to a formal counterfactual causal estimand, β, under certain assumptions. This result alone may not be of much utility, but it was overdue for such a widely used method. The derivation also yielded some practical insights as byproducts.
First, we discovered a new source of potential bias when the treatment effect is not null: strong common causes of the outcome across time. We analyzed this bias and illustrated its potential significance and unpredictability with simulations. The effect of the common cause needs to be quite strong to induce sizable bias, but the fact that $(V, \bar{U})$ can be high dimensional and temporally post-baseline increases the likelihood of this in a real analysis. Formation of a blood clot might induce a bias of this sort in the MI example, but it is difficult to speculate about how often meaningful bias of this type appears in practice.
Second, expression (20) characterizing the limit of the case-crossover estimator allowed us to quantify sensitivity to violations of the constant treatment effect assumption. We analyzed a simple scenario with two groups of subjects having potentially different baseline risks, exposure rates, and treatment effects. The limit of the case-crossover estimator was a weighted average of the group-specific hazard ratios. The bias relative to the estimand (2) that would be targeted by an RCT depends on the exposure rates in the groups. If the groups have the same exposure rate, effect heterogeneity would not induce any bias. Otherwise, whichever group had exposure rate closer to 0.5 would be overweighted. We provided a numerical example in which significant unobserved baseline confounding (which could be controlled by the case-crossover estimator) and effect heterogeneity were both present. In this example, the effect heterogeneity bias in the case-crossover estimator was greater than the confounding bias in a standard cohort hazard ratio estimator, illustrating that effect heterogeneity can sometimes override benefits from control of unobserved baseline confounding in the case-crossover estimator. More extensive numerical analyses showed that neither the cohort estimator nor the case-crossover estimator had a general advantage across a range of settings in which the levels of unobserved confounding and effect heterogeneity varied. An analyst concerned about bias from effect heterogeneity could employ the general framework of our numerical studies to conduct a quantitative bias analysis (Lash et al., 2014).
Overall, the formal assumptions required for consistency mostly mapped onto informal assumptions (a)-(e). Unsurprisingly for a method that has been used for thirty years, our contributions do not drastically alter its recommended use. As an illustrative exercise, we assess our simplified version of Mittleman et al.'s (1993) study of the effect of exercise on MI, assumption by assumption, through the lens of our analysis in Web Appendix B.
We might summarize our general guidance to practitioners and consumers of case-crossover analyses as follows. If unobserved baseline confounding is thought to be serious and/or data collection for a cohort study is infeasible, the case-crossover design should be considered as an option. If interest lies only in testing the null hypothesis of no effect, fewer assumptions are necessary. Under the null: the transient treatment assumption automatically holds; common causes of the outcome do not induce bias; the rare outcome assumption is not necessary; and there is no treatment effect heterogeneity. Hence, the case-crossover design remains a clever method for causal null hypothesis testing in the presence of unmeasured baseline confounders under the exchangeability (4), no time trends in treatment (9), and no time-modified confounding (8) assumptions. If interest lies in obtaining a point estimate, results should be interpreted with considerable additional caution, as effect heterogeneity, delayed treatment effects, and common causes of outcomes will all be present to some degree and, as we have shown, can have a large impact on results.
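The claim about null hypothesis testing can be illustrated with a small simulation. The sketch below is our own illustration, not the simulation design used in the paper: the data generating process, its parameter values, and all names are assumptions. A binary unobserved baseline confounder raises both the hourly treatment probability and the hourly hazard, treatment has no effect, and the control hour is taken to be the hour immediately before the case hour. With one control hour per case, the matched-pair Mantel-Haenszel estimate is the ratio of the two kinds of discordant pairs.

```python
import random

def simulate(n_subjects=20000, hazard_ratio=1.0, hours=48, seed=1):
    """Return (case-crossover estimate, naive cohort rate ratio).

    Assumed DGP: confounder v ~ Bernoulli(0.5); hourly treatment is an
    independent coin flip with probability depending on v; the hourly hazard
    depends on v and is multiplied by hazard_ratio when treated; follow-up
    ends at the first event.
    """
    rng = random.Random(seed)
    n10 = n01 = 0                       # discordant (case hour, control hour) pairs
    events = {True: 0, False: 0}        # events by treatment in the event hour
    person_hours = {True: 0, False: 0}  # hours at risk by treatment
    for _ in range(n_subjects):
        v = rng.random() < 0.5
        p_a, lam0 = (0.6, 0.02) if v else (0.2, 0.005)
        previous = None                 # treatment in the preceding hour
        for _hour in range(hours):
            a = rng.random() < p_a
            person_hours[a] += 1
            if rng.random() < lam0 * (hazard_ratio if a else 1.0):
                events[a] += 1
                if previous is not None:        # cases in hour 1 have no control hour
                    if a and not previous:
                        n10 += 1
                    elif previous and not a:
                        n01 += 1
                break
            previous = a
    case_crossover = n10 / n01
    cohort = ((events[True] / person_hours[True])
              / (events[False] / person_hours[False]))
    return case_crossover, cohort

cc_null, cohort_null = simulate()
print(f"case-crossover: {cc_null:.2f}, naive cohort: {cohort_null:.2f}")
```

In this setup the case-crossover estimate should fall close to 1, while the unadjusted cohort rate ratio is pulled well above 1 by the baseline confounder. The outcome here is not rare within the high-risk stratum, which is consistent with the statement that the rare outcome assumption is not needed under the null.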
There are many variants of the case-crossover design, of which we have here only analyzed arguably the simplest one. One important extension of the MH estimator adjusts for post-baseline confounders through matching. Another variant employs conditional logistic regression in place of the MH estimator. In this case, Vines and Farrington (2001) showed that joint exchangeability is required among all control times and the case time as opposed to just pairwise exchangeability. Additionally, in situations where time trends in treatment are present, the case-time-control method (Suissa, 1995) is often utilized and requires alternative assumptions (Greenland, 1996). The case-crossover design is also frequently applied in air pollution epidemiology. In this setting, the treatment regime is shared among all subjects and later values of treatment are not influenced by past values of subjects' outcomes, allowing more flexible control time selection strategies, including using control times following outcome occurrence (Navidi, 1998; Levy et al., 2001; Janes et al., 2005). It would be interesting to investigate these variants in a similar counterfactual framework.

We go from (18) to (19) by basic probability rules; (19) to (20) by consistency (3), Sequential Exchangeability (4), and UV-transient hazards (5); (20) to (21) by Constant Hazard Ratio (6); and (21) to (22) by the law of total probability.

(ii) Under rare outcome assumption (7), pv( And by (7) and the DAG in Figure 1 . Therefore, we can approximate the bias term in (22) as Now, by applying assumption (8) and then (9), where we again used the rare outcome assumption to approximate $\lambda^0_{vk}(\bar{u}_k)$ by the density $f^0_{vk}(\bar{u}_k)$. This implies that (23) is approximately 1, proving the result.
Remark: In the absence of the rare disease assumption we can expand τ as where $p_v(a_s \mid \bar{Y}_{s-1} = 0, \bar{A}_{s-1} = \bar{a}_{s-1}, \bar{U}_s = \bar{u}_s)$. If at each time $s$, $A_s$ is determined by an independent coin flip with success probability $p$, the bias is approximately where we have indexed 'hours' by $k$ and seconds within hours by $t$.
We can again confirm these results analytically. We use the shorthand $\lambda^a(U = 1)$ to denote the hazard at time $kt$ if $U_{(k-1)t} = 1$ or if $U_{kt} = 1$, ignoring for the sake of convenient approximation the possibility that there are multiple hours with $\tilde{U}_k = 1$. The bias term for the case-crossover estimator (S1) approximately (again, under the simplifying assumption that there are not multiple hours with $\tilde{U}_k = 1$) reduces to:

where $p_U$ and $p_A$ denote the Bernoulli parameters of $\tilde{U}_k$ and $\tilde{A}_k$, respectively, in the DGP. Plugging in the parameter values from the DGP, this expression is equal to 0.92 = 1.84/2, the bias factor obtained in simulation.
Examining this bias approximation, we can see how the bias gets pushed toward the null. Selection on surviving the control hour when $\tilde{U}_{k-1} = 1$ leads to $1 - \lambda^0(U = 1)$ terms in the numerator and $1 - \lambda^1(U = 1)$ terms in the denominator. We argued in Section 5.1 that the discrepancy between these terms pushes the bias away from the null, as in the first simulation. Selection on surviving the portion of the case hour preceding the occurrence of the event leads to $1 - \lambda^1(U = 1)$ terms in the numerator and $1 - \lambda^0(U = 1)$ terms in the denominator, which by analogous reasoning pushes the bias toward the null. Selection on surviving the control hour only enters into the formula if $\tilde{U}_{k-1} = 1$, since risk is 0 whenever $U$ is 0. Selection on surviving the case hour preceding the event, however, occurs whether $\tilde{U}_k = 1$ or $\tilde{U}_{k-1} = 1$. This explains how terms pushing the bias factor toward the null outweigh terms pushing the bias factor away from the null in this example.
No direct effect of treatment on later outcomes (UV-Transient Hazards, assumption (5)). It is possible that exercise has a cumulative effect on the outcome. Two consecutive hours of exercise might cause an MI in some subjects for whom just one hour would not. Perhaps extended vigorous exercise is rare enough that the cumulative effect of exercise does not seriously impact results or their interpretation. Delayed effects of exercise are not thought to be significant.
No time-modified confounding (assumption (8)). It might be that within levels of certain baseline confounders, exercise is more probable on Saturday (or Sunday) afternoon and MIs are more (or less) probable than average, even if marginal probability of exercise in the full cohort is equal on the two days. If the control hour is taken to be 24 hours before the MI, this scenario would induce bias. As a strained, stylized, and purely illustrative example, in the United States men are more likely to have MIs than women and also more likely to be fans of the National Football League (NFL). Suppose our study takes place the weekend of Super Bowl Sunday, which is the day of the NFL championship game. Then men in our study would be particularly less likely to exercise on Sunday than Saturday, which could lead to excessive unexposed MIs on Sunday, making exercise appear a less potent cause of MI than it is.
No time trends in treatment (assumption (9)). Marginal probability of recent exercise varies greatly by time of day. If the control time is chosen appropriately (e.g. exactly 24 hours before the MI), then approximate pairwise exchangeability may hold. But perhaps there are reasons why exercise is generally more or less common on Sunday than Saturday (e.g. church or football games).
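A back-of-the-envelope calculation shows how a time trend alone can distort the estimate. The sketch below is our own illustration under simplifying assumptions that are not taken from the paper: a single homogeneous stratum, no treatment effect, and treatments at the case and control times that are independent with probabilities `p_case` and `p_control`. In that setting the ratio of discordant pairs converges to the odds ratio of the two treatment probabilities. The probabilities used are hypothetical.

```python
def null_discordant_ratio(p_case, p_control):
    """Limit of the discordant-pair ratio under the null.

    Assumes independent treatments at the case and control times within a
    single homogeneous stratum, so that
    P(case treated, control untreated) / P(case untreated, control treated)
    equals the odds ratio of the two marginal treatment probabilities.
    """
    return (p_case * (1 - p_control)) / ((1 - p_case) * p_control)

# Hypothetical values: exercise in a given hour is less common on the day
# of the MI (10%) than at the control time one day earlier (15%).
trend = null_discordant_ratio(0.10, 0.15)
no_trend = null_discordant_ratio(0.10, 0.10)
print(round(trend, 2), no_trend)  # about 0.63, and 1.0 without a trend
```

Even though exercise has no effect in this hypothetical setting, the limit is well below 1, so a time trend of this size would make exercise look protective.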
Under the above assumptions (in addition to Consistency), the case-crossover design could reasonably be applied to test the causal null hypothesis. To interpret the case-crossover point estimate, additional considerations are required.
Rare outcome (assumption (7)). The outcome must be rare within all levels of the baseline confounders, exposure, and common causes of the outcome. MIs are certainly rare marginally at the level of a day, and probably also rare across levels of baseline confounders and exposures. However, we mentioned that perhaps causes of the outcome such as presence of a clot could make the outcome common, particularly under exposure. This would induce bias of the sort seen in the simulation in Section 5.1.
Constant causal hazard ratio (assumption (6)). It is highly unlikely that the multiplicative effect of exercise across hour and covariate levels is constant. While true under the null, this is a very strong assumption if the null does not hold. We saw in Section 5.2 that it is difficult to interpret the point estimate if the effect is heterogeneous.
; Vines and Farrington, 2001; Levy et al., 2001; Janes et al., 2005; Mittleman and Mostofsky, 2014), and conditions for causal interpretation of the estimator have been informally stated in the literature. The usual criteria cited are that: (a) the outcome has acute onset; (b) the treatment is transient; (c) there are no unobserved post-baseline common causes of treatment and outcome; (d) there are no time trends in treatment; and (e) the treatment effect is constant across subjects.

Figure 1: Causal DAG within levels of $V$. A $V$ node with arrows pointing into every other node was omitted for visual clarity. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

Figure 2: Causal DAG for simulation DGP with unobserved post-baseline common causes of outcomes at different times

) where $\bar{a}_{k/k,k-c}$ denotes $\bar{a}_k$ excluding $a_k$ and $a_{k-c}$, and $G_v(a, a', \bar{a}_{k/k,k-c}, \bar{u}_k)$ (defined in Appendix 1) roughly corresponds to the probability of observing a treatment trajectory with $a_k = a$, $a_{k-c} = a'$, and treatment at the other time points equal to $\bar{a}_{k/k,k-c}$. When treatments are correlated, $G_v(1, 0, \bar{a}_{k/k,k-c}, \bar{u}_k)$ in the numerator might assign high weights to different treatment sequences $\bar{a}_{k/k,k-c}$ than $G_v(0, 1, \bar{a}_{k/k,k-c}, \bar{u}_k)$ in the denominator, and under failure of the rare outcome assumption the highly weighted treatment sequences in the numerator might have significantly different survival probabilities ( s =k−c,k

Figure 3: Left: Scatterplot of case-crossover vs cohort estimator multiplicative bias across a range of settings. Middle: Distribution of case-crossover estimator bias across settings. Right: Distribution of ratio of case-crossover bias to cohort bias across settings.