Graphical displays for assessing covariate balance in matching studies
Abstract
Rationale, aims and objectives
An essential requirement for ensuring the validity of outcomes in matching studies is that study groups are comparable on observed pre‐intervention characteristics. Investigators typically use numerical diagnostics, such as t‐tests, to assess comparability (referred to as ‘balance’). However, such diagnostics test equality along only one dimension (e.g. means in the case of t‐tests), and therefore do not adequately capture imbalances that may exist elsewhere in the distribution. Furthermore, these tests are generally sensitive to sample size, raising the concern that a reduction in power may be mistaken for an improvement in covariate balance. In this paper, we demonstrate the shortcomings of numerical diagnostics and show how visual displays, by representing the full distribution of the data, allow a more robust assessment of balance.
Methods
We generate artificial datasets specifically designed to demonstrate how widely used equality tests capture only a single dimension of the data and are sensitive to sample size. We then plot the covariate distributions using several graphical displays.
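As a rough illustration of this kind of artificial dataset (a minimal sketch, not the authors' actual data or code), the Python snippet below builds two hypothetical groups with identical means but different spreads. The group values, the helper names `welch_t`, `variance_ratio` and `qq_pairs`, and the use of Welch's t statistic are all assumptions made for this example. The quantile pairs are the coordinates one would plot in a quantile-quantile display.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def variance_ratio(a, b):
    """Ratio of sample variances; 1.0 would indicate equal spread."""
    return statistics.variance(a) / statistics.variance(b)


def qq_pairs(a, b, n_cuts=9):
    """Matched quantiles of the two groups (points for a QQ plot)."""
    qa = statistics.quantiles(a, n=n_cuts + 1)
    qb = statistics.quantiles(b, n=n_cuts + 1)
    return list(zip(qa, qb))


# Hypothetical covariate: both groups are centred on 50, but the
# treated group is three times as spread out as the control group.
offsets = [-2, -1, 0, 1, 2] * 20
control = [50 + 1.0 * d for d in offsets]
treated = [50 + 3.0 * d for d in offsets]

t_stat = welch_t(treated, control)
ratio = variance_ratio(treated, control)
pairs = qq_pairs(treated, control)

print(f"t statistic for means: {t_stat:.3f}")
print(f"variance ratio (treated/control): {ratio:.2f}")
for q_treated, q_control in pairs:
    print(f"treated quantile {q_treated:6.2f} | control quantile {q_control:6.2f}")
```

In this construction the test on means sees no difference at all, while the variance ratio and the diverging quantile pairs make the imbalance in spread obvious.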
Results
As expected, tests showing perfect covariate balance in means failed to reflect imbalances in higher moments (variances). However, these discrepancies were easily detected upon inspection of the graphical displays. Additionally, smaller sample sizes produced the appearance of covariate balance when the non‐significant test results in fact reflected lower statistical power.
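The sample-size effect can be sketched in the same spirit. The snippet below is an illustrative assumption, not the authors' simulation: it keeps the difference in means fixed at a hypothetical 0.5 units and changes only the number of observations per group. The helper `make_groups`, the group sizes of 20 and 500, and the threshold of roughly 2 for conventional significance are choices made for this example.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def make_groups(n_per_group, shift=0.5):
    """Hypothetical groups: same shape, treated shifted up by `shift`."""
    pattern = [-2, -1, 0, 1, 2]
    control = [50.0 + pattern[i % 5] for i in range(n_per_group)]
    treated = [x + shift for x in control]
    return treated, control


results = {}
for n in (20, 500):
    treated, control = make_groups(n)
    diff = statistics.mean(treated) - statistics.mean(control)
    results[n] = (diff, welch_t(treated, control))
    print(f"n per group = {n:3d}: mean difference = {diff:.2f}, "
          f"t = {results[n][1]:.2f}")
```

The mean difference is the same in both runs, yet only the larger sample yields a t statistic beyond the conventional threshold of about 2. A non‐significant test in the smaller sample therefore says more about power than about balance.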
Conclusions
Given the limitations of numerical diagnostics, we advocate using graphical displays for assessing covariate balance and encourage investigators to provide such graphs when reporting balance statistics in their matching studies.