Volume 27, Issue 12
Research Article

A critical appraisal of propensity‐score matching in the medical literature between 1996 and 2003

Peter C. Austin

Corresponding Author

E-mail address: peter.austin@ices.on.ca

Institute for Clinical Evaluative Sciences, Toronto, Ont., Canada

Department of Public Health Sciences, University of Toronto, Toronto, Ont., Canada

Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ont., Canada

Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ont., Canada M4N 3M5
First published: 23 November 2007
Citations: 655

Abstract

Propensity‐score methods are increasingly being used to reduce the impact of treatment‐selection bias in the estimation of treatment effects using observational data. Commonly used propensity‐score methods include covariate adjustment using the propensity score, stratification on the propensity score, and propensity‐score matching. Empirical and theoretical research has demonstrated that matching on the propensity score eliminates a greater proportion of baseline differences between treated and untreated subjects than does stratification on the propensity score. However, the analysis of propensity‐score‐matched samples requires statistical methods appropriate for matched‐pairs data. We critically evaluated 47 articles that were published between 1996 and 2003 in the medical literature and that employed propensity‐score matching. We found that only two of the articles reported the balance of baseline characteristics between treated and untreated subjects in the matched sample and used correct statistical methods to assess the degree of imbalance. Thirteen (28 per cent) of the articles explicitly used statistical methods appropriate for the analysis of matched data when estimating the treatment effect and its statistical significance. Common errors included using the log‐rank test to compare Kaplan–Meier survival curves in the matched sample and applying Cox regression, logistic regression, chi‐squared tests, t‐tests, and Wilcoxon rank sum tests to the matched sample, all of which fail to account for the matched nature of the data. We provide guidelines for the analysis and reporting of studies that employ propensity‐score matching. Copyright © 2007 John Wiley & Sons, Ltd.
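To illustrate what "methods appropriate for matched‐pairs data" can look like for a binary outcome, the following is a minimal sketch of an exact McNemar test, which uses only the pairs whose two members had different outcomes. The abstract does not name this test; it is shown here as one standard matched‐pairs alternative to the chi‐squared test. The function name `mcnemar_exact` and the pair counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def mcnemar_exact(pairs):
    """Exact two-sided McNemar test for matched binary outcomes.

    pairs: sequence of (treated_outcome, untreated_outcome) tuples,
    one per matched pair, with each outcome coded 0 or 1.
    Returns (b, c, p_value), where b and c are the discordant-pair counts.
    """
    pairs = list(pairs)
    # b: treated subject had the event, matched untreated subject did not
    b = sum(1 for t, u in pairs if t == 1 and u == 0)
    # c: untreated subject had the event, matched treated subject did not
    c = sum(1 for t, u in pairs if t == 0 and u == 1)
    n = b + c
    if n == 0:
        # No discordant pairs: the data carry no evidence of a difference
        return b, c, 1.0
    # Under the null hypothesis, b follows Binomial(n, 0.5)
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical matched sample of 25 pairs (illustrative numbers only)
pairs = [(1, 0)] * 2 + [(0, 1)] * 10 + [(1, 1)] * 5 + [(0, 0)] * 8
b, c, p_value = mcnemar_exact(pairs)
print(f"discordant pairs: b={b}, c={c}; exact p-value={p_value:.4f}")
```

Concordant pairs, in which both members had the same outcome, do not enter the test statistic. A chi‐squared test that treats the two groups as independent samples ignores this pairing.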

