Special Issue Article
Is your QSAR/QSPR descriptor real or trash?
Article first published online: 21 OCT 2010
Copyright © 2010 John Wiley & Sons, Ltd.
Journal of Chemometrics
Special Issue: Herman Wold Medal Winners 2007–2009
Volume 24, Issue 11-12, pages 681–693, November - December 2010
How to Cite
Kiralj, R. and Ferreira, M. M. C. (2010), Is your QSAR/QSPR descriptor real or trash? J. Chemometrics, 24: 681–693. doi: 10.1002/cem.1331
- Issue published online: 29 DEC 2010
- Manuscript Accepted: 2 JUN 2010
- Manuscript Revised: 20 MAY 2010
- Manuscript Received: 18 FEB 2010
Keywords
- univariate regression
- multivariate regression
The sign change problem in quantitative structure–activity relationship (QSAR), quantitative structure–property relationship (QSPR) and related studies is the disagreement among the signs of a descriptor's correlation coefficients and regression coefficients in univariate and multivariate regressions, before and after the data split. Among 50 regression models with 227 descriptors taken from the literature, the sign change problem proved very frequent when assessed by four new criteria proposed in this work. For a given dataset, it can be substantially reduced, and even eliminated, by statistically based variable selection and by checking for sign changes before model validation and interpretation. Knowing the statistical fundamentals behind the sign change problem, and identifying and understanding it, helps in finding effective remedies for regression models with this deficiency.
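The kind of check described above can be illustrated with a minimal sketch. It is not the four criteria proposed in the article, which the abstract does not spell out. It only compares, for each descriptor, the sign of its Pearson correlation with the response (the univariate view) against the sign of its ordinary least squares coefficient (the multivariate view), once on the full data and once on a training subset. The function names, the synthetic data and the choice of the first six samples as a training set are all assumptions made for illustration.

```python
import numpy as np

def coefficient_signs(X, y):
    """Per descriptor: sign of Pearson r with y, and sign of its OLS coefficient."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Univariate view: correlation of each descriptor column with the response.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # Multivariate view: least squares fit with an intercept column prepended.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sign(r).astype(int), np.sign(beta[1:]).astype(int)

def sign_change_flags(X, y):
    """True where the univariate and multivariate signs of a descriptor disagree."""
    r_sign, b_sign = coefficient_signs(X, y)
    return r_sign != b_sign

# Hypothetical, noise-free data: two strongly correlated descriptors.
x1 = np.arange(1, 9, dtype=float)
x2 = x1 + 0.5 * np.array([1, -1] * 4)
y = 2.0 * x1 - 1.0 * x2          # x2 enters the model with a negative coefficient
X = np.column_stack([x1, x2])

flags_all = sign_change_flags(X, y)            # full dataset
flags_train = sign_change_flags(X[:6], y[:6])  # assumed training subset after a split
```

In this constructed example the second descriptor correlates positively with the response but receives a negative regression coefficient, so it is flagged both before and after the split, while the first descriptor is not. A real study would apply such a check to each candidate descriptor before model validation and interpretation.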