Measuring balance and model selection in propensity score methods
ABSTRACT
Purpose
Propensity score (PS) methods aim to balance confounders between groups in order to estimate an unbiased treatment or exposure effect. However, little attention has been paid to actually measuring, reporting and using information on balance, for instance for model selection. We propose using a measure of balance in PS methods and describe several such measures: the overlapping coefficient, the Kolmogorov–Smirnov distance, and the Lévy distance.
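The three distribution-based measures named above compare the distribution of a covariate in the treated group with its distribution in the control group, for example after PS matching. The Python sketch below is a hypothetical illustration, not the authors' implementation. The function names are invented for this example. The Lévy distance is found by bisection on the empirical CDFs. The overlapping coefficient is estimated from Gaussian kernel density estimates with Silverman's rule-of-thumb bandwidth, which is an assumption, since the abstract does not say how densities were estimated.

```python
import bisect
import math
import statistics


def ecdf(sample):
    """Right-continuous empirical CDF of `sample`, returned as a function."""
    xs = sorted(sample)
    return lambda x: bisect.bisect_right(xs, x) / len(xs)


def ks_distance(a, b):
    """Kolmogorov-Smirnov distance sup_x |F_a(x) - F_b(x)|.

    Both ECDFs are step functions that jump only at observed values,
    so the supremum is attained at one of the pooled sample points.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def levy_distance(a, b, iters=60):
    """Levy distance: smallest eps with F_a(x-eps) - eps <= F_b(x) <= F_a(x+eps) + eps for all x.

    For step functions each inequality only needs checking where its
    piecewise-constant side jumps; eps is then found by bisection on [0, 1],
    giving an upper bound within 2**-iters of the true value.
    """
    fa, fb = ecdf(a), ecdf(b)

    def holds(eps):
        lower = all(fa(x) - eps <= fb(x + eps) for x in a)
        upper = all(fb(x) <= fa(x + eps) + eps for x in b)
        return lower and upper

    lo, hi = 0.0, 1.0  # eps = 1 always satisfies both inequalities
    for _ in range(iters):
        mid = (lo + hi) / 2
        if holds(mid):
            hi = mid
        else:
            lo = mid
    return hi


def overlap_coefficient(a, b, grid_size=1001):
    """OVL = integral of min(f_a, f_b), with densities from Gaussian KDEs (assumed)."""
    def kde(sample):
        n = len(sample)
        h = 1.06 * statistics.stdev(sample) * n ** -0.2  # Silverman's rule of thumb
        norm = n * h * math.sqrt(2 * math.pi)
        return (lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm), h

    (fa, ha), (fb, hb) = kde(a), kde(b)
    pad = 5 * max(ha, hb)
    lo = min(min(a), min(b)) - pad
    hi = max(max(a), max(b)) + pad
    step = (hi - lo) / (grid_size - 1)
    m = [min(fa(lo + k * step), fb(lo + k * step)) for k in range(grid_size)]
    return step * (sum(m) - 0.5 * (m[0] + m[-1]))  # trapezoidal rule
```

The Kolmogorov–Smirnov and Lévy distances are 0 for identical distributions and at most 1, whereas the overlapping coefficient equals 1 for identical distributions and approaches 0 for disjoint ones. Unlike the other two, the Lévy distance depends on the measurement scale of the covariate.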
Methods
We performed simulation studies to estimate the association between bias (i.e., the discrepancy between the true and the estimated treatment effect) and these three measures, as well as several mean-based measures of balance.
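The core quantities of such a simulation can be sketched as follows. This sketch computes a mean-based balance measure, the absolute standardized mean difference (ASMD), for each simulated dataset, then correlates a balance measure with bias across datasets. It is a hypothetical illustration, not the simulation code used in the study. It makes two assumptions. First, the ASMD divides by the square root of the average of the two group variances. Second, bias is taken as the absolute difference between the estimated and the true effect in each replicate. The function names are invented for this example.

```python
import math


def asmd(treated, control):
    """Absolute standardized mean difference; SD pooled as sqrt((s1^2 + s0^2) / 2) (assumed)."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

    m1, v1 = mean_var(treated)
    m0, v0 = mean_var(control)
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2)


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def balance_bias_correlation(replicates, true_effect):
    """Correlate a balance measure with bias across simulated datasets.

    `replicates` holds one (balance_value, effect_estimate) pair per dataset;
    bias is taken here as |estimate - true effect| (an assumption).
    """
    balance = [bal for bal, _ in replicates]
    bias = [abs(est - true_effect) for _, est in replicates]
    return pearson_r(balance, bias)
```

Any of the distribution-based measures can be passed as the balance value in place of the ASMD. Averaging the per-scenario correlations over repeated simulation runs would give summaries of the kind reported in the Results.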
Results
For large samples (n = 2000), the average Pearson's correlation coefficients between bias and the Kolmogorov–Smirnov distance (r = 0.89), the Lévy distance (r = 0.89) and the absolute standardized mean difference (r = 0.90) were similar, whereas the correlation for the overlapping coefficient was weaker (r = −0.42; the sign is negative because a larger overlap indicates better balance). When the sample size decreased to 400, mean-based measures of balance had stronger correlations with bias. PS models that included all confounding variables, their squares and their interaction terms resulted in smaller bias than models that included only the main terms of the confounding variables.
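The sketch below illustrates how a balance measure can be used to choose between the two kinds of PS model compared above. It fits a main-terms-only logistic PS model and a model that adds squares and pairwise interactions. Each model is then used for inverse probability of treatment weighting (IPTW), and the model with the smallest maximum weighted ASMD over the main terms, squares and interactions is kept. Everything here is chosen for illustration and is not taken from the study: IPTW as the PS method, the Newton–Raphson fit, the max-ASMD criterion, the function names, and the simulated data-generating model. The abstract does not state which PS method or balance summary was used.

```python
import itertools
import math
import random


def expand(x, full):
    """Design row: intercept + main terms, plus squares and pairwise interactions if full."""
    row = [1.0] + list(x)
    if full:
        row += [v * v for v in x] + [u * v for u, v in itertools.combinations(x, 2)]
    return row


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(r) + [v] for r, v in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    beta = [0.0] * n
    for r in reversed(range(n)):
        beta[r] = (m[r][n] - sum(m[r][k] * beta[k] for k in range(r + 1, n))) / m[r][r]
    return beta


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, t, max_iter=50):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, ti in zip(X, t):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (ti - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def max_weighted_asmd(terms, t, w):
    """Largest absolute standardized mean difference over all balance terms under weights w."""
    worst = 0.0
    for j in range(len(terms[0])):
        moments = []
        for g in (1, 0):
            pairs = [(row[j], wi) for row, ti, wi in zip(terms, t, w) if ti == g]
            sw = sum(wi for _, wi in pairs)
            mean = sum(v * wi for v, wi in pairs) / sw
            moments.append((mean, sum(wi * (v - mean) ** 2 for v, wi in pairs) / sw))
        (m1, v1), (m0, v0) = moments
        worst = max(worst, abs(m1 - m0) / math.sqrt((v1 + v0) / 2))
    return worst


def select_ps_model(covariates, t):
    """Fit each candidate PS model, apply IPTW, and keep the one with the best balance."""
    # Balance is assessed on main terms, squares and interactions (intercept dropped).
    terms = [expand(x, full=True)[1:] for x in covariates]
    scores = {}
    for name, full in (("main terms", False), ("main + squares + interactions", True)):
        X = [expand(x, full) for x in covariates]
        beta = fit_logistic(X, t)
        ps = [sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        w = [1.0 / e if ti == 1 else 1.0 / (1.0 - e) for e, ti in zip(ps, t)]
        scores[name] = max_weighted_asmd(terms, t, w)
    return min(scores, key=scores.get), scores


# Hypothetical data: the true PS contains a squared term and an interaction.
rng = random.Random(2011)
covariates, t = [], []
for _ in range(1000):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    logit = -0.6 + 0.4 * x1 + 0.4 * x2 + 0.3 * x1 * x1 + 0.3 * x1 * x2
    covariates.append((x1, x2))
    t.append(1 if rng.random() < sigmoid(logit) else 0)
best, scores = select_ps_model(covariates, t)
print(best, scores)
```

Balance is checked on squares and interactions as well as on main terms. This gives a mean-based measure a chance to reveal that a main-terms-only model has left nonlinear imbalance behind. Checking means of the main terms alone may not show this.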
Conclusions
We conclude that measures of balance are useful for reporting the amount of balance achieved in propensity score analysis and can be helpful in selecting the final PS model.
Copyright © 2011 John Wiley & Sons, Ltd.