Article first published online: 7 JAN 2013
Copyright © 2013 John Wiley & Sons, Ltd.
Statistics in Medicine
Volume 32, Issue 9, pages 1467–1482, 30 April 2013
How to Cite
Pepe, M. S., Kerr, K. F., Longton, G. and Wang, Z. (2013), Testing for improvement in prediction model performance. Statist. Med., 32: 1467–1482. doi: 10.1002/sim.5727
- Issue published online: 10 APR 2013
- Manuscript Accepted: 11 DEC 2012
- Manuscript Received: 19 DEC 2011
- National Institutes of Health. Grant Numbers: GM54438, CA86368
Keywords:
- logistic regression
- receiver operating characteristic curve
- risk factors
- risk reclassification
Abstract
Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0: P(D = 1 | X, Y) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance.
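The sketch below illustrates the workflow the abstract recommends, under stated assumptions. It tests H0: P(D = 1 | X, Y) = P(D = 1 | X) with a likelihood ratio test comparing logistic models with and without Y. This is one standard risk-factor test; the abstract does not name a specific one. The change in AUC is then reported as an estimate, not tested. The data-generating model, its coefficients, the sample size, and every helper name (`simulate`, `fit_logistic`, `analyze`, and so on) are hypothetical illustrations, not the paper's simulation design. It uses only the Python standard library, so the logistic fit is done by hand with Newton-Raphson.

```python
import math
import random


def expit(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, d, iters=25):
    # Maximum likelihood by Newton-Raphson; each row starts with an intercept term 1.0.
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                       # score vector X'(d - mu)
        info = [[0.0] * p for _ in range(p)]   # Fisher information X'WX
        for x, y in zip(rows, d):
            mu = expit(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (y - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def log_lik(rows, d, beta):
    ll = 0.0
    for x, y in zip(rows, d):
        mu = expit(sum(b * v for b, v in zip(beta, x)))
        mu = min(max(mu, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += y * math.log(mu) + (1 - y) * math.log(1.0 - mu)
    return ll


def lr_pvalue(stat):
    # Upper tail of chi-square with 1 df: P(chi2_1 > s) = P(|Z| > sqrt(s)).
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def auc(scores, d):
    # Empirical AUC: probability a case outscores a control, ties counted as 1/2.
    cases = [s for s, y in zip(scores, d) if y == 1]
    controls = [s for s, y in zip(scores, d) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def simulate(n, beta_y, seed=1):
    # Hypothetical model: logit P(D=1 | X, Y) = -0.5 + 1.0*X + beta_y*Y.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = [1 if rng.random() < expit(-0.5 + xi + beta_y * yi) else 0
         for xi, yi in zip(x, y)]
    return x, y, d


def analyze(x, y, d):
    reduced = [[1.0, xi] for xi in x]
    full = [[1.0, xi, yi] for xi, yi in zip(x, y)]
    b_red, b_full = fit_logistic(reduced, d), fit_logistic(full, d)
    # Test: H0 P(D=1|X,Y) = P(D=1|X), i.e. the coefficient of Y is zero.
    lr = 2.0 * (log_lik(full, d, b_full) - log_lik(reduced, d, b_red))
    # Estimate: change in empirical AUC of the linear predictors (in-sample).
    lin_red = [sum(b * v for b, v in zip(b_red, r)) for r in reduced]
    lin_full = [sum(b * v for b, v in zip(b_full, r)) for r in full]
    return lr, lr_pvalue(lr), auc(lin_full, d) - auc(lin_red, d)


if __name__ == "__main__":
    lr, p, delta_auc = analyze(*simulate(500, beta_y=0.5))
    print(f"LR = {lr:.2f}, p = {p:.3g}, delta AUC = {delta_auc:.3f}")
```

The AUCs are computed from the linear predictors. These are monotone in estimated risk, so they give the same AUC as the fitted probabilities. The ΔAUC here is evaluated on the data used to fit both models, so it is optimistic, and it carries no standard error. Any interval estimate would need to reflect the variability in the estimated coefficients. One option is to refit both models within each bootstrap resample, although this is not necessarily the procedure used in the paper.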