Weaknesses of goodness-of-fit tests for evaluating propensity score models: the case of the omitted confounder

Authors

  • Sherry Weitzen PhD,

    1. Department of Community Health, Brown Medical School, Providence, RI, USA
    2. Department of Obstetrics and Gynecology, Division of Research, Women and Infants Hospital, Providence, RI, USA
    Corresponding author: Department of Obstetrics and Gynecology, Box G-WIH, Brown University, Providence, RI 02912, USA.
  • Kate L. Lapane PhD,

    1. Department of Community Health, Brown Medical School, Providence, RI, USA
    2. Center for Gerontology and Health Care Research, Brown Medical School, Providence, RI, USA
  • Alicia Y. Toledano ScD,

    1. Department of Community Health, Brown Medical School, Providence, RI, USA
    2. Center for Statistical Sciences, Brown Medical School, Providence, RI, USA
  • Anne L. Hume PharmD,

    1. Department of Pharmacy Practice, University of Rhode Island, Kingston, RI, USA
    2. Department of Family Medicine, Brown Medical School, Providence, RI, USA
  • Vincent Mor PhD

    1. Department of Community Health, Brown Medical School, Providence, RI, USA
    2. Center for Gerontology and Health Care Research, Brown Medical School, Providence, RI, USA

  • No conflict of interest was declared.

Abstract

Purpose

Propensity scores are used in observational studies to adjust for confounding, although they do not provide control for confounders omitted from the propensity score model. We sought to determine if tests used to evaluate logistic model fit and discrimination would be helpful in detecting the omission of an important confounder in the propensity score.

Methods

Using simulated data, we estimated propensity scores under two scenarios: (1) including all confounders and (2) omitting the binary confounder. We compared the propensity score model fit and discrimination under each scenario, using the Hosmer–Lemeshow goodness-of-fit (GOF) test and the c-statistic. We measured residual confounding in treatment effect estimates adjusted by the propensity score omitting the confounder.
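
The sketch below (Python, not from the original article; the variable names, coefficients, and sample size are illustrative assumptions) mimics this design: it simulates treatment assignment driven by a continuous covariate and a binary confounder, fits propensity score models with and without the confounder, and reports the Hosmer–Lemeshow p-value and c-statistic for each.

```python
# Illustrative simulation sketch: compare fit/discrimination diagnostics for a
# propensity score model with and without a binary confounder. All parameter
# values below are assumptions chosen for illustration.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000

# Covariates: one continuous covariate and one binary confounder
x = rng.normal(size=n)
c = rng.binomial(1, 0.4, size=n)

# Treatment assignment depends on both x and the confounder c
p_treat = 1 / (1 + np.exp(-(-0.5 + 0.8 * x + 1.0 * c)))
t = rng.binomial(1, p_treat)

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic and p-value over deciles of predicted risk."""
    order = np.argsort(p)
    y, p = y[order], p[order]
    chi_sq = 0.0
    for idx in np.array_split(np.arange(len(y)), groups):
        obs, exp, n_g = y[idx].sum(), p[idx].sum(), len(idx)
        chi_sq += (obs - exp) ** 2 / (exp * (1 - exp / n_g) + 1e-12)
    return chi_sq, chi2.sf(chi_sq, groups - 2)

for label, X in [("full model", np.column_stack([x, c])),
                 ("confounder omitted", x.reshape(-1, 1))]:
    ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    hl_stat, hl_p = hosmer_lemeshow(t, ps)
    print(f"{label}: HL p = {hl_p:.3f}, c-statistic = {roc_auc_score(t, ps):.3f}")
```

In simulations of this kind, both models typically yield an unremarkable Hosmer–Lemeshow p-value and similar c-statistics, which illustrates why these diagnostics cannot be relied on to reveal an omitted confounder.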

Results

The GOF statistic and discrimination of the propensity score model that excluded an important predictor of treatment were essentially the same as those of the full propensity score model. The GOF test failed to detect the poor fit of the propensity score model omitting the confounder, and the c-statistics under both scenarios were similar. Residual confounding was observed in treatment effect estimates adjusted by the propensity score that excluded the confounder (range: 1–30%).

Conclusions

Omission of important confounders from the propensity score leads to residual confounding in estimates of treatment effect. However, tests of GOF and discrimination do not provide information to detect missing confounders in propensity score models. Our findings suggest that it may not be necessary to compute GOF statistics or model discrimination when developing propensity score models. Copyright © 2004 John Wiley & Sons, Ltd.
