• correlation;
• univariate regression;
• multivariate regression;
• descriptors


The sign change problem in quantitative structure–activity relationship (QSAR), quantitative structure–property relationship (QSPR) and related studies is the inconsistency among the signs of a descriptor's correlation coefficients and regression coefficients in univariate and multivariate regressions, before and after the data split. Among 50 regression models with 227 descriptors extracted from the literature, the sign change problem was found to occur very frequently according to four new criteria proposed in this work for its assessment. For a given dataset, the problem can be substantially reduced, and even eliminated, by statistically based variable selection and by checking for sign changes before model validation and interpretation. Knowing the statistical fundamentals behind the sign change problem, and identifying and understanding it, helps in finding effective remedies for regression models with this deficiency. Copyright © 2010 John Wiley & Sons, Ltd.
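As an illustration of the basic phenomenon, the following minimal sketch compares the sign of each descriptor's Pearson correlation with the response against the sign of its coefficient in a multivariate least-squares fit. It is not one of the four criteria proposed in this work, which the abstract does not spell out; the function name `sign_change_check` and the synthetic, deliberately collinear data are assumptions made only for this example.

```python
import numpy as np


def sign_change_check(X, y):
    """Hypothetical helper: flag descriptors whose univariate correlation
    with y and multivariate regression coefficient have different signs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pearson correlation of each descriptor (column of X) with y.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    b = coef[1:]  # drop the intercept
    changed = np.sign(r) != np.sign(b)
    return r, b, changed


# Synthetic data (an assumption, not taken from the paper): x1 is strongly
# collinear with x2, and y depends positively on x1 but negatively on x2.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = x2 + 0.3 * rng.normal(size=n)
y = 1.0 * x1 - 2.0 * x2 + 0.1 * rng.normal(size=n)

X = np.column_stack([x1, x2])
r, b, changed = sign_change_check(X, y)

for name, rj, bj, cj in zip(["x1", "x2"], r, b, changed):
    print(f"{name}: r = {rj:+.2f}, b = {bj:+.2f}, sign change = {bool(cj)}")
```

In this construction `x1` correlates negatively with `y` on its own, because it mostly tracks `x2`, yet receives a positive coefficient in the two-descriptor model. A check of this kind can be run before model validation and interpretation, and repeated on the training and external sets after the data split.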