A comparison of 12 algorithms for matching on the propensity score


  • Peter C. Austin

    Corresponding author
    1. Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
    2. Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
    3. Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, Ontario, Canada
    • Correspondence to: Peter C. Austin, Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada.

      E-mail: peter.austin@ices.on.ca


Propensity-score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub-algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement.

© 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
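To make the greedy variants described above concrete, the following is a minimal Python sketch of nearest neighbor matching without replacement, with an optional caliper and the four selection orders for treated subjects. It is an illustration under stated assumptions, not the code used in the study: the function name `greedy_match`, the dictionary input format, the `seed` parameter, and a caliper expressed as an absolute difference on the raw propensity-score scale are all hypothetical choices made for this example. Optimal matching and matching with replacement are not shown.

```python
import random


def greedy_match(treated, untreated, order="random", caliper=None, seed=0):
    """Greedy nearest neighbor matching on the propensity score, without replacement.

    treated, untreated: dicts of subject id -> propensity score (illustrative format).
    order: "lowest", "highest", "best", or "random"; the order in which
        treated subjects are selected for matching.
    caliper: largest allowed absolute score difference (assumed scale), or None.
    Returns a list of (treated_id, untreated_id) pairs.
    """
    if order not in ("lowest", "highest", "best", "random"):
        raise ValueError(f"unknown order: {order!r}")

    available = dict(untreated)  # untreated subjects not yet used in a pair
    pairs = []

    def nearest(score):
        # Closest still-available untreated subject and its distance.
        uid = min(available, key=lambda u: abs(available[u] - score))
        return uid, abs(available[uid] - score)

    if order == "best":
        remaining = dict(treated)
        while remaining and available:
            # Select the treated subject whose nearest available match is closest.
            tid = min(remaining, key=lambda t: nearest(remaining[t])[1])
            uid, dist = nearest(remaining.pop(tid))
            if caliper is not None and dist > caliper:
                break  # every other remaining pair is at least this far apart
            pairs.append((tid, uid))
            del available[uid]
        return pairs

    ids = list(treated)
    if order == "lowest":
        ids.sort(key=treated.get)
    elif order == "highest":
        ids.sort(key=treated.get, reverse=True)
    else:
        random.Random(seed).shuffle(ids)

    for tid in ids:
        if not available:
            break
        uid, dist = nearest(treated[tid])
        if caliper is not None and dist > caliper:
            continue  # no untreated subject within the caliper; leave unmatched
        pairs.append((tid, uid))
        del available[uid]
    return pairs


# Toy scores, invented purely for illustration.
treated = {"t1": 0.40, "t2": 0.52}
untreated = {"u1": 0.45, "u2": 0.70}
print(greedy_match(treated, untreated, order="lowest"))   # [('t1', 'u1'), ('t2', 'u2')]
print(greedy_match(treated, untreated, order="highest"))  # [('t2', 'u1'), ('t1', 'u2')]
print(greedy_match(treated, untreated, order="lowest", caliper=0.1))  # [('t1', 'u1')]
```

In this toy example the selection order changes which treated subject receives the close match, and the caliper drops the treated subject whose only remaining candidate is too far away. That trade of fewer matched pairs for closer matches is the mechanism behind the bias and variability comparison summarized above.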