Volume 21, Issue 2
Original Article

Graphical displays for assessing covariate balance in matching studies

Ariel Linden DrPH

Corresponding Author

President, Adjunct Associate Professor

Linden Consulting Group, LLC, Ann Arbor, MI, USA

Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, MI, USA

Correspondence

Dr Ariel Linden

Linden Consulting Group, LLC

1301 North Bay Drive

Ann Arbor, MI 48103

USA

E‐mail: alinden@lindenconsulting.org

First published: 26 December 2014

Abstract

Rationale, aims and objectives

An essential requirement for ensuring the validity of outcomes in matching studies is that study groups are comparable on observed pre‐intervention characteristics. Investigators typically use numerical diagnostics, such as t‐tests, to assess comparability (referred to as ‘balance’). However, such diagnostics test equality along only one dimension (e.g. means in the case of t‐tests) and therefore do not adequately capture imbalances that may exist elsewhere in the distribution. Furthermore, these tests are generally sensitive to sample size, raising the concern that a reduction in power may be mistaken for an improvement in covariate balance. In this paper, we demonstrate the shortcomings of numerical diagnostics and show how visual displays provide a complete representation of the data with which to assess balance more robustly.

Methods

We generate artificial datasets specifically designed to demonstrate that widely used equality tests capture only a single dimension of the data and are sensitive to sample size. We then plot the covariate distributions using several graphical displays.
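
As a minimal sketch of this kind of artificial dataset (not the data or code used in the study), the Python example below builds two groups with identical means but different spreads. The covariate values, group names, seed and the helper functions `welch_t` and `qq_pairs` are all illustrative assumptions. The t statistic is essentially zero, while the matched quantiles, which are the points a quantile-quantile plot would display, diverge in both tails.

```python
import random
import statistics as st

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    se = (st.variance(a) / len(a) + st.variance(b) / len(b)) ** 0.5
    return (st.mean(a) - st.mean(b)) / se

def qq_pairs(a, b, n=10):
    """Matched quantiles of a and b: the points of a quantile-quantile plot."""
    return list(zip(st.quantiles(a, n=n), st.quantiles(b, n=n)))

# Hypothetical covariate: centre the draws so both groups share one mean.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
z_mean = st.mean(z)
z = [v - z_mean for v in z]

control = [50 + v for v in z]      # spread of roughly 1 unit
treated = [50 + 3 * v for v in z]  # same mean, three times the spread

t_stat = welch_t(treated, control)                       # essentially zero
var_ratio = st.variance(treated) / st.variance(control)  # 9, up to rounding
pairs = qq_pairs(control, treated)                       # nine decile pairs

for c_q, t_q in pairs:
    print(f"control decile {c_q:6.2f}   treated decile {t_q:6.2f}")
```

Plotting `pairs` against the 45-degree line with any plotting library would show the treated quantiles falling below the line in the lower tail and above it in the upper tail, a pattern the test on means cannot register.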

Results

As expected, tests showing perfect covariate balance in means failed to reflect imbalances at higher moments (variances). However, these discrepancies were easily detected upon inspection of the graphical displays. Additionally, smaller sample sizes produced the appearance of covariate balance when the non‐significant test results in fact reflected lower statistical power.
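
The sample-size effect can be illustrated with a short, hedged sketch; it is not the study's analysis. The means, standard deviation, sample sizes and seeds below are assumptions chosen for illustration. Both comparisons have exactly the same mean difference (2 units) and standard deviation (10 units), so the t statistic is 0.2 × √(n/2) and changes only because the number of units per group changes.

```python
import random
import statistics as st

def standardized_sample(n, seed):
    """n pseudo-random values rescaled to have mean 0 and SD 1 exactly."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    m, s = st.mean(z), st.stdev(z)
    return [(v - m) / s for v in z]

def t_statistic(a, b):
    """Two-sample t statistic (Welch form) for a difference in means."""
    se = (st.variance(a) / len(a) + st.variance(b) / len(b)) ** 0.5
    return (st.mean(a) - st.mean(b)) / se

def t_for_size(n, diff=2.0, sd=10.0):
    """t statistic when both groups have n units, SD `sd`, means `diff` apart."""
    control = [50 + sd * v for v in standardized_sample(n, seed=1)]
    treated = [50 + diff + sd * v for v in standardized_sample(n, seed=2)]
    return t_statistic(treated, control)

t_large = t_for_size(1000)  # about 4.47: the test flags the imbalance
t_small = t_for_size(50)    # about 1.00: same imbalance, no longer flagged

print(f"n = 1000 per group: t = {t_large:.2f}")
print(f"n =   50 per group: t = {t_small:.2f}")
```

Judged against the conventional 1.96 threshold, the smaller sample would be declared balanced even though the underlying difference between the groups is unchanged.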

Conclusions

Given the limitations of numerical diagnostics, we advocate using graphical displays for assessing covariate balance and encourage investigators to provide such graphs when reporting balance statistics in their matching studies.

