• Open Access

Comparing paired vs non-paired statistical methods of analyses when making inferences about absolute risk reductions in propensity-score matched samples


  • Peter C. Austin

    Corresponding author
    1. Institute for Clinical Evaluative Sciences, Toronto, Ont., Canada
    2. Department of Health Management, Policy and Evaluation, University of Toronto, Toronto, Ont., Canada
    3. Dalla Lana School of Public Health, University of Toronto, Toronto, Ont., Canada
    • Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ont., Canada M4N 3M5


Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating treatment effects from observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used to assess the statistical significance of treatment effects in propensity-score matched samples. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects in propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine how inferences about risk differences (or absolute risk reductions) in propensity-score matched samples are affected when statistical methods for independent samples are used rather than statistical methods for paired samples. We found that, compared with methods for independent samples, methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger than when it was weaker. We recommend using statistical methods for paired samples when propensity-score matched samples are used to make inferences about the effect of treatment on the reduction in the probability of an event occurring. Copyright © 2011 John Wiley & Sons, Ltd.
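To make the contrast between the two approaches concrete, the sketch below computes a risk difference from a matched sample and its standard error in two ways: once treating the treated and untreated groups as independent samples, and once accounting for the pairing. It is a minimal illustration, not necessarily the estimators used in the study. The function name and the cell labels `a`, `b`, `c`, `d` are hypothetical, and the paired standard error is a standard large-sample estimator for the difference between correlated proportions, assumed here for illustration.

```python
import math

def risk_difference_se(a, b, c, d):
    """Risk difference and two standard errors from matched-pair counts.

    Hypothetical cell labels (counts of matched pairs):
      a: event in both the treated and the untreated subject
      b: event in the treated subject only
      c: event in the untreated subject only
      d: event in neither subject
    """
    n = a + b + c + d                      # number of matched pairs
    p_treated = (a + b) / n                # event probability, treated
    p_control = (a + c) / n                # event probability, untreated
    rd = p_treated - p_control             # equals (b - c) / n

    # Independent-samples standard error: ignores the matching.
    se_indep = math.sqrt(
        p_treated * (1 - p_treated) / n + p_control * (1 - p_control) / n
    )

    # Paired standard error: depends only on the discordant pairs (b, c).
    # Assumed large-sample estimator for correlated proportions.
    se_paired = math.sqrt((b + c) - (b - c) ** 2 / n) / n

    return rd, se_indep, se_paired

def wald_ci(estimate, se, z=1.96):
    """Approximate 95 per cent Wald confidence interval."""
    return estimate - z * se, estimate + z * se

# Illustrative counts only (not data from the study).
rd, se_indep, se_paired = risk_difference_se(a=20, b=10, c=30, d=40)
ci_indep = wald_ci(rd, se_indep)
ci_paired = wald_ci(rd, se_paired)
```

With these illustrative counts, outcomes within matched pairs are positively correlated, so the paired standard error (0.06) is smaller than the independent-samples standard error (about 0.068), and the paired confidence interval is correspondingly narrower. The two methods share the same point estimate and differ only in how they quantify its uncertainty.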