Keywords:

  • PLS regression;
  • model selection;
  • prediction;
  • bootstrap;
  • linearization

Abstract

We investigate several approaches to estimating the mean squared error of prediction (MSEP) in partial least squares (PLS) regression without resorting to external validation. Using two simulation examples based on real data, we evaluate the methods in terms of their accuracy and their usefulness in determining the optimal number of factors to include in the PLS model. We find that for problems with relatively few variables, methods that either ignore the non-linearity of PLS regression or use a linear approximation to it give good estimates of MSEP, with little to choose between them. Nevertheless, where linear approximation is feasible, we prefer it, since it gives estimates of MSEP with lower bias and variance than those obtained by cross-validation. In situations with large numbers of variables, these methods break down. In these circumstances, cross-validation and bootstrapping methods are better able to capture the changes in MSEP with the number of factors fitted, and are thus more useful for identifying the optimal PLS regression model. Copyright © 2000 John Wiley & Sons, Ltd.
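To illustrate the cross-validation approach mentioned in the abstract, here is a minimal sketch of K-fold cross-validation for estimating MSEP as a function of the number of PLS factors. It is not the authors' implementation. Several parts are assumptions made only for illustration: a single response (PLS1) fitted with the NIPALS algorithm, NumPy as the only dependency, the hypothetical helper names `pls1_fit` and `cv_msep`, the five-fold split, and the simulated data. For each number of factors, MSEP is taken as the mean squared residual on held-out observations, and the number of factors that minimises it is selected.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Hypothetical helper: fit PLS1 by NIPALS and return (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centred working copies, deflated each step
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # weight vector (unit length)
        t = Xk @ w                           # score vector
        tt = t @ t
        p = Xk.T @ t / tt                    # X loading
        c = (yk @ t) / tt                    # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^{-1} q
    return beta, x_mean, y_mean


def cv_msep(X, y, max_factors, n_folds=5, seed=0):
    """Hypothetical helper: K-fold cross-validated MSEP for 1..max_factors PLS factors."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    press = np.zeros(max_factors)            # prediction error sum of squares per model size
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        for a in range(1, max_factors + 1):
            beta, xm, ym = pls1_fit(X[train], y[train], a)
            resid = y[test] - (ym + (X[test] - xm) @ beta)
            press[a - 1] += np.sum(resid ** 2)
    return press / n                         # MSEP estimate for each number of factors


# Illustrative simulated data (assumption): 60 samples, 10 predictors, 2 relevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=60)

msep = cv_msep(X, y, max_factors=6)
best = int(np.argmin(msep)) + 1              # number of factors with the smallest estimated MSEP
print("CV MSEP by number of factors:", np.round(msep, 3))
print("selected number of factors:", best)
```

With as many factors as (linearly independent) predictors, PLS reproduces ordinary least squares. That property gives a quick check on the fitting routine. The bootstrap and linearization estimators discussed in the paper would replace `cv_msep` but could reuse the same fitting step.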