Impartial graphical comparison of multivariate calibration methods and the harmony/parsimony tradeoff



For multivariate calibration with the relationship y = Xb, it is often necessary to determine the degrees of freedom, both for assessing parsimony and for computing the error measure root mean square error of calibration (RMSEC). This paper shows that the model fitting degrees of freedom can be estimated by an effective rank (ER) measure, and that the more parsimonious model is the one with the smaller ER. This paper also shows that when such a measure is used on the x-axis, model errors and other regression diagnostics can be graphed simultaneously for ridge regression (RR), partial least squares (PLS) and principal component regression (PCR), so that a fair comparison between all potential models can be made. The ER approach is general and applicable to other multivariate calibration methods. It is often noted that selecting variables, typically with multiple linear regression (MLR), yields more parsimonious models. Using the ER, it is shown graphically that the MLR model is not always the more parsimonious one. Additionally, a harmony measure is proposed that expresses the bias/variance tradeoff for a particular model. By plotting this new measure against the ER, the proper harmony/parsimony tradeoff can be graphically assessed for RR, PCR and PLS. Essentially, pluralistic criteria for fairly evaluating and characterizing models are better than a dualistic or single-criterion approach, which is the usual tactic. Results are presented using spectral, industrial and quantitative structure activity relationship (QSAR) data. Copyright © 2007 John Wiley & Sons, Ltd.
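
To make the idea of a shared complexity axis concrete, the following minimal sketch computes one common notion of effective degrees of freedom for ridge regression: the trace of the hat matrix, written in terms of the singular values of X. This is an assumption for illustration only; the abstract does not state the paper's ER formula, and the function names `ridge_effective_df` and `pcr_effective_df` are hypothetical. Under this convention, PCR with k retained components has an effective rank of k, so ridge and PCR models can be placed on the same axis.

```python
import numpy as np


def ridge_effective_df(X, lam, tol=1e-12):
    """Trace-of-hat-matrix degrees of freedom for ridge regression.

    Assumption: df(lam) = sum_i s_i^2 / (s_i^2 + lam), where s_i are the
    nonzero singular values of X. This is a standard definition, not
    necessarily the ER measure used in the paper.
    """
    s = np.linalg.svd(X, compute_uv=False)
    s = s[s > tol * s.max()]          # drop numerically zero singular values
    filter_factors = s**2 / (s**2 + lam)
    return float(filter_factors.sum())


def pcr_effective_df(X, k, tol=1e-12):
    """PCR with k components: filter factors are 1 for the first k
    singular directions and 0 otherwise, so the sum is min(k, rank)."""
    s = np.linalg.svd(X, compute_uv=False)
    rank = int(np.sum(s > tol * s.max()))
    return float(min(k, rank))
```

With no penalty (`lam = 0`) the ridge value equals the rank of X, which corresponds to the full least-squares fit, and it shrinks continuously toward zero as `lam` grows. Plotting an error measure against this quantity, rather than against `lam` for ridge and the component count for PCR, is what allows different methods to share one x-axis.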