Article first published online: 3 MAY 2013
Copyright © 2013 John Wiley & Sons, Ltd.
Statistics in Medicine
Volume 32, Issue 24, pages 4196–4210, 30 October 2013
How to Cite
Demler, O. V., Pencina, M. J. and D'Agostino, R. B. (2013), Impact of correlation on predictive ability of biomarkers. Statist. Med., 32: 4196–4210. doi: 10.1002/sim.5824
- Issue published online: 1 OCT 2013
- Manuscript Accepted: 22 MAR 2013
- Manuscript Revised: 20 MAR 2013
- Manuscript Received: 18 JUL 2012
- risk prediction model;
- linear discriminant analysis;
- logistic regression
In this paper, we investigate how the correlation structure of independent variables affects the discrimination of a risk prediction model. Using multivariate normal data and a binary outcome, we prove that zero correlation among predictors is often detrimental to discrimination in a risk prediction model and that negatively correlated predictors with positive effect sizes are beneficial. A very high multiple R-squared from regressing the new predictor on the old ones can also be beneficial. As a practical guide to new variable selection, we recommend selecting predictors that are negatively correlated with the risk score based on the existing variables. This step is easy to implement even when the number of new predictors is large. We illustrate our results using real-life Framingham data, which suggest that the conclusions hold outside of normality. The findings presented in this paper might be useful for preliminary selection of potentially important predictors, especially in situations where the number of predictors is large.
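The effect of correlation on discrimination can be illustrated with a minimal sketch. It assumes two unit-variance, normally distributed predictors with the same covariance matrix in events and non-events (the linear discriminant analysis setting), in which case the AUC of the optimal linear score equals Φ(√(M²/2)), where M² is the squared Mahalanobis distance between the group means. The function names and the effect sizes used below are illustrative assumptions, not values taken from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mahalanobis_sq(d1, d2, rho):
    """Squared Mahalanobis distance between group means.

    Assumes two unit-variance predictors with standardized mean
    differences d1 and d2 and within-group correlation rho.
    """
    return (d1 ** 2 - 2.0 * rho * d1 * d2 + d2 ** 2) / (1.0 - rho ** 2)


def auc_two_predictors(d1, d2, rho):
    """AUC of the optimal linear score under the assumptions above."""
    return normal_cdf(math.sqrt(mahalanobis_sq(d1, d2, rho) / 2.0))


if __name__ == "__main__":
    # Hypothetical effect sizes: both predictors have positive effects.
    for rho in (-0.5, 0.0, 0.5):
        print(f"d=(0.5, 0.5) rho={rho:+.1f}  AUC={auc_two_predictors(0.5, 0.5, rho):.3f}")
    # A weak new predictor that is highly correlated with the old one.
    for rho in (0.0, 0.9):
        print(f"d=(0.5, 0.1) rho={rho:+.1f}  AUC={auc_two_predictors(0.5, 0.1, rho):.3f}")
```

With equal positive effect sizes, the AUC in this sketch is larger at negative correlation than at zero correlation, and larger at zero than at positive correlation. The second loop shows the other case mentioned in the abstract: a weak new predictor that is very highly correlated with the existing one can also improve discrimination relative to an uncorrelated one.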