Investigating differences in treatment effect estimates between propensity score matching and weighting: a demonstration using STAR*D trial data

Authors

  • Alan R. Ellis,

    Corresponding author
    1. Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  • Stacie B. Dusetzina,

    1. Division of General Medicine and Clinical Epidemiology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
    2. Department of Health Policy and Management, UNC Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
  • Richard A. Hansen,

    1. Department of Pharmacy Care Systems, Harrison School of Pharmacy, Auburn University, Auburn, Alabama, USA
  • Bradley N. Gaynes,

    1. Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
    2. Department of Psychiatry, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  • Joel F. Farley,

    1. Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  • Til Stürmer

    1. Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
    2. Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
    3. Department of Epidemiology, University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA

  • This research was conducted at the University of North Carolina at Chapel Hill.

Correspondence to: Alan R. Ellis, PhD, MSW, Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, CB 7590, Chapel Hill, NC 27599–7590, USA. E-mail: are@unc.edu

ABSTRACT

Purpose

The choice of propensity score (PS) implementation influences treatment effect estimates not only because different methods estimate different quantities, but also because different estimators respond in different ways to phenomena such as treatment effect heterogeneity and limited availability of potential matches. Using effectiveness data, we describe lessons learned from sensitivity analyses with matched and weighted estimates.

Methods

With subsample data (N = 1292) from Sequenced Treatment Alternatives to Relieve Depression (STAR*D), a 2001–2004 effectiveness trial of depression treatments, we implemented PS matching and weighting to estimate the treatment effect in the treated and conducted multiple sensitivity analyses.
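
As an illustration only, weighting to estimate the treatment effect in the treated can be sketched as follows. The abstract does not state the weight formula; this sketch assumes the commonly used form in which treated observations receive weight 1 and untreated observations receive PS / (1 − PS). The function names `att_weights` and `weighted_risk_ratio` are hypothetical, and the PS values are assumed to have been estimated already.

```python
def att_weights(treated, ps):
    """Assumed weights targeting the effect in the treated:
    1 for treated observations, ps / (1 - ps) for untreated ones."""
    return [1.0 if t else p / (1.0 - p) for t, p in zip(treated, ps)]


def weighted_risk_ratio(treated, outcome, weights):
    """Ratio of the weighted outcome risk in the treated to that in the untreated."""
    def risk(group):
        rows = [(y, w) for t, y, w in zip(treated, outcome, weights) if t == group]
        return sum(w * y for y, w in rows) / sum(w for _, w in rows)

    return risk(1) / risk(0)


# Toy data (not from the trial): two treated and two untreated observations.
treated = [1, 1, 0, 0]
ps = [0.5, 0.5, 0.5, 0.2]
outcome = [1, 0, 1, 0]

weights = att_weights(treated, ps)
rr = weighted_risk_ratio(treated, outcome, weights)
```

Under this weighting, an untreated observation with a PS close to 1 receives a very large weight, which is one way a few observations can dominate a weighted estimate.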

Results

Matching and weighting both balanced covariates but yielded different samples and treatment effect estimates (matched RR 1.00, 95% CI: 0.75–1.34; weighted RR 1.28, 95% CI: 0.97–1.69). In sensitivity analyses, as increasing numbers of observations at both ends of the PS distribution were excluded from the weighted analysis, weighted estimates approached the matched estimate (weighted RR 1.04, 95% CI: 0.77–1.39 after excluding all observations below the 5th percentile of the treated and above the 95th percentile of the untreated). Treatment appeared to have benefits only in the highest and lowest PS strata.
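
The trimming rule in that sensitivity analysis can be sketched roughly as below. This is a minimal illustration, not the authors' code: the helper names `percentile` and `trim_indices` are hypothetical, the linear-interpolation percentile definition is an assumption, and the cutoffs are read as percentiles of the PS distribution within the treated and untreated groups.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation (an assumed definition)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def trim_indices(treated, ps, pct=5):
    """Indices of observations kept after excluding those with a PS below the
    pct-th percentile of the treated or above the (100 - pct)-th percentile
    of the untreated."""
    lower = percentile([p for t, p in zip(treated, ps) if t], pct)
    upper = percentile([p for t, p in zip(treated, ps) if not t], 100 - pct)
    return [i for i, p in enumerate(ps) if lower <= p <= upper]


# Toy data (not from the trial); a wide 25% cutoff makes the effect visible.
treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ps = [0.2, 0.4, 0.6, 0.8, 0.9, 0.05, 0.1, 0.3, 0.5, 0.7]
kept = trim_indices(treated, ps, pct=25)
```

Re-estimating the weighted effect on the kept observations at several cutoffs shows how strongly the estimate depends on observations in the tails of the PS distribution.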

Conclusions

Matched and weighted estimates differed due to incomplete matching, sensitivity of weighted estimates to extreme observations, and possibly treatment effect heterogeneity. PS analysis requires identifying the population and treatment effect of interest, selecting an appropriate implementation method, and conducting and reporting sensitivity analyses. Weighted estimation especially should include sensitivity analyses relating to influential observations, such as those treated contrary to prediction. Copyright © 2012 John Wiley & Sons, Ltd.
