Volume 53, Issue 1
Methods Article

Covariate Balancing through Naturally Occurring Strata

Farrokh Alemi Ph.D.

Corresponding Author

E-mail address: falemi@gmu.edu

Department of Health Administration and Policy, George Mason University, Fairfax, VA

Address correspondence to Farrokh Alemi, Ph.D., Department of Health Administration and Policy, George Mason University, Fairfax, VA; e‐mail: falemi@gmu.edu.
Amr ElRafey

Department of Health Administration and Policy, George Mason University, Fairfax, VA

Ivan Avramovic

Department of Computer Science, George Mason University, Fairfax, VA

First published: 14 December 2016
Citations: 2

Abstract

Objective

To provide an alternative to propensity scoring (PS) for the common situation where there are interacting covariates.

Setting

We used 1.3 million assessments of residents of United States Veterans Affairs nursing homes, collected from January 1, 2000, through October 9, 2012.

Design

In stratified covariate balancing (SCB), data are divided into naturally occurring strata, where each stratum is an observed combination of the covariates. Within each stratum, cases with the target event and controls without it are counted, and controls are weighted to be as frequent as cases. This weighting guarantees that covariates, or combinations of covariates, are balanced, meaning they occur at the same rate among cases and controls. Finally, the impact of the target event is calculated in the weighted data. We compared the performance of SCB, logistic regression (LR), and propensity scoring (PS) in simulated and real data. In real data, we examined the calibration of SCB and PS in predicting 6‐month mortality from inability to eat, controlling for age, gender, and nine other disabilities, for 296,051 residents in Veterans Affairs nursing homes. In the simulation study, outcomes were randomly generated from treatment, 10 covariates, and an increasing number of covariate interactions, and we report the accuracy of SCB, PS, and LR in recovering the simulated treatment effect.
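The weighting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy records, and the column names (`old`, `male`, `cannot_eat`, `died`) are all hypothetical. Two choices are assumptions that the abstract does not specify: strata that lack either cases or controls are dropped, and impact is summarized as a difference in weighted outcome rates.

```python
from collections import defaultdict


def scb_weights(rows, covariates, event):
    """Weight controls so each stratum has as many (weighted) controls as cases.

    A stratum is one observed combination of covariate values.
    Cases get weight 1.0; controls get n_cases / n_controls for their stratum.
    Assumption: strata with no cases or no controls get weight 0.0 (dropped).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_controls, n_cases]
    for row in rows:
        key = tuple(row[c] for c in covariates)
        counts[key][1 if row[event] else 0] += 1

    weights = []
    for row in rows:
        n_controls, n_cases = counts[tuple(row[c] for c in covariates)]
        if n_controls == 0 or n_cases == 0:
            weights.append(0.0)
        elif row[event]:
            weights.append(1.0)
        else:
            weights.append(n_cases / n_controls)
    return weights


def weighted_rate(rows, weights, event, outcome, event_value):
    """Weighted outcome rate among rows whose event flag equals event_value."""
    pairs = [(r, w) for r, w in zip(rows, weights) if bool(r[event]) == event_value]
    total = sum(w for _, w in pairs)
    return sum(w * r[outcome] for r, w in pairs) / total


def scb_effect(rows, covariates, event, outcome):
    """Assumed effect measure: weighted risk difference, cases minus controls."""
    w = scb_weights(rows, covariates, event)
    return (weighted_rate(rows, w, event, outcome, True)
            - weighted_rate(rows, w, event, outcome, False))


def make_row(old, male, cannot_eat, died):
    return {"old": old, "male": male, "cannot_eat": cannot_eat, "died": died}


# Made-up toy records, for illustration only.
rows = [
    # stratum (old=1, male=1): 2 cases, 4 controls
    make_row(1, 1, 1, 1), make_row(1, 1, 1, 1),
    make_row(1, 1, 0, 1), make_row(1, 1, 0, 0),
    make_row(1, 1, 0, 0), make_row(1, 1, 0, 0),
    # stratum (old=0, male=1): 1 case, 1 control
    make_row(0, 1, 1, 0), make_row(0, 1, 0, 0),
    # stratum (old=0, male=0): controls only, so it is dropped
    make_row(0, 0, 0, 0), make_row(0, 0, 0, 0),
]

weights = scb_weights(rows, ["old", "male"], "cannot_eat")
effect = scb_effect(rows, ["old", "male"], "cannot_eat", "died")
print(weights)
print(effect)
```

In this toy data, each of the four controls in the first stratum receives weight 0.5, so the weighted controls total two and match the two cases. Because the match holds stratum by stratum, any single covariate or combination of covariates has the same weighted frequency among cases and controls.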

Findings

In the simulated environment, as the number of interactions among the covariates increased, SCB and properly specified LR remained accurate, but pairwise LR and pairwise PS, the most common applications of these tools, performed poorly. In real data, applying SCB was practical, and SCB was better calibrated than linear PS, the most common method of PS.

Conclusions

In environments where covariates interact, SCB is practical and more accurate than common methods of applying LR and PS.

