Causal inference in outcome-dependent two-phase sampling designs


Address for correspondence: Weiwei Wang, Department of Operations Research and Financial Engineering, 215 Sherrerd Hall, Princeton University, Princeton, NJ 08544, USA.


Summary. We consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals is drawn from a population. For these individuals, information is obtained on treatment, outcome and a few low-dimensional covariates, and the individuals are then stratified according to these variables. In the second phase, a random subsample of individuals is drawn from each stratum, with known stratum-specific selection probabilities, and a rich set of covariates is collected for the subsampled individuals. In this setting, we introduce five estimators: simple inverse weighted, simple doubly robust, enriched inverse weighted, enriched doubly robust and locally efficient. We evaluate the finite-sample performance of these estimators in a simulation study. We also use our methodology to estimate the causal effect of trauma care on in-hospital mortality by using data from the National Study of Cost and Outcomes of Trauma.
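To make the sampling design concrete, the sketch below simulates an outcome-dependent two-phase sample and computes one plausible inverse-weighted estimate of the average treatment effect (a risk difference). It is not the authors' simple inverse weighted estimator and is only an illustration under stated assumptions: a binary treatment A, a binary outcome Y, one low-dimensional phase-one covariate Z, one rich phase-two covariate X, a correctly specified logistic propensity model fitted to the phase-two data with Horvitz-Thompson weights 1/π, and Hajek (self-normalized) weights in the final contrast. All function names, model coefficients and selection probabilities are hypothetical.

```python
import numpy as np


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic(design, a, w, iters=25):
    """Fit a logistic model for P(A=1 | covariates) by weighted Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = expit(design @ beta)
        grad = design.T @ (w * (a - p))
        hess = design.T @ (design * (w * p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def simulate_two_phase(n, rng):
    """Hypothetical data-generating process for an outcome-dependent two-phase design."""
    # Phase one: treatment A, binary outcome Y and one low-dimensional covariate Z.
    z = rng.binomial(1, 0.5, n)
    x = rng.normal(size=n)  # rich covariate; only its phase-two values are used later
    a = rng.binomial(1, expit(-0.5 + x + 0.5 * z))
    y = rng.binomial(1, expit(-1.0 + 0.8 * a + x + 0.5 * z))
    # Strata are (A, Y, Z) cells with known selection probabilities; here they
    # depend on (A, Y) only and oversample individuals with Y = 1.
    pi_table = np.array([[0.15, 0.70], [0.30, 0.90]])  # indexed by [a, y]
    pi = pi_table[a, y]
    r = rng.random(n) < pi  # phase-two selection indicator
    return a, y, z, x, pi, r


def ipw_two_phase(a, y, z, x, pi, r):
    """Inverse-weighted (Hajek) estimate of E[Y(1)] - E[Y(0)] from phase-two data."""
    a, y, z, x, w = a[r], y[r], z[r], x[r], 1.0 / pi[r]
    design = np.column_stack([np.ones_like(x), x, z])
    e = expit(design @ weighted_logistic(design, a, w))  # estimated propensity score
    w1 = w * a / e            # sampling weight times inverse propensity, treated
    w0 = w * (1 - a) / (1 - e)  # sampling weight times inverse propensity, untreated
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def true_ate(rng, m=1_000_000):
    """Monte Carlo value of the true risk difference under the simulated model."""
    z = rng.binomial(1, 0.5, m)
    x = rng.normal(size=m)
    return np.mean(expit(-0.2 + x + 0.5 * z) - expit(-1.0 + x + 0.5 * z))


rng = np.random.default_rng(2024)
est = ipw_two_phase(*simulate_two_phase(50_000, rng))
truth = true_ate(rng)
print(f"IPW estimate: {est:.3f}   true effect: {truth:.3f}")
```

The propensity model is fitted with weights 1/π because phase two oversamples some (A, Y) strata: an unweighted fit on the subsample would estimate the treatment probability in the selected subsample rather than in the population. Each phase-two individual therefore carries two inverse weights, one for second-phase selection and one for treatment assignment.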