Keywords:

  • PLS regression algorithm;
  • Kernel;
  • Many-variable data sets

Abstract

A fast PLS regression algorithm is presented for large data matrices with many variables (K) and fewer objects (N). For such data matrices the classical algorithm is computer-intensive and memory-demanding. Recently, Lindgren et al. (J. Chemometrics, 7, 45–49 (1993)) developed a quick and efficient kernel algorithm for the case with many objects and few variables. The present paper focuses on the opposite case, i.e. many variables and fewer objects. A kernel algorithm is presented based on eigenvectors of the ‘kernel’ matrix XXᵀYYᵀ, which is a square, non-symmetric matrix of size N × N, where N is the number of objects. Using the kernel matrix and the association matrices XXᵀ (N × N) and YYᵀ (N × N), it is possible to calculate all score and loading vectors and hence conduct a complete PLS regression including diagnostics such as R². This is done without returning to the original data matrices X and Y. The algorithm is presented in equation form, with proofs of some new properties, and as MATLAB code.
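
The score-extraction step described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the paper's MATLAB, and it is not the authors' code. The function name `kernel_pls_scores`, the use of `numpy.linalg.eig` to obtain the dominant eigenvector, the deflation of both association matrices with the projector I − ttᵀ, and the synthetic data are all assumptions made for illustration. The sketch covers only the X-scores, Y-scores and cumulative R² for Y. Loadings and regression coefficients are not computed.

```python
import numpy as np


def kernel_pls_scores(XXt, YYt, n_components):
    """Sketch: PLS scores and cumulative R2(Y) from the N x N association matrices.

    XXt and YYt are assumed to come from column-centred X and Y.
    """
    XXt = np.array(XXt, dtype=float)  # work on copies; inputs stay untouched
    YYt = np.array(YYt, dtype=float)
    n = XXt.shape[0]
    ss_y = np.trace(YYt)  # total sum of squares of Y
    T = np.zeros((n, n_components))  # X-scores, one unit-length column each
    U = np.zeros((n, n_components))  # Y-scores
    r2y = np.zeros(n_components)     # cumulative explained fraction of Y

    for a in range(n_components):
        # Dominant eigenvector of the square, non-symmetric kernel matrix.
        # (Assumption: a dense eigensolver is used here for simplicity.)
        eigvals, eigvecs = np.linalg.eig(XXt @ YYt)
        t = np.real(eigvecs[:, np.argmax(eigvals.real)])
        t /= np.linalg.norm(t)

        T[:, a] = t
        U[:, a] = YYt @ t

        # Deflate both association matrices with G = I - t t^T,
        # which equals deflating X and Y themselves by the score t.
        G = np.eye(n) - np.outer(t, t)
        XXt = G @ XXt @ G
        YYt = G @ YYt @ G

        r2y[a] = 1.0 - np.trace(YYt) / ss_y

    return T, U, r2y


# Synthetic example (hypothetical data): few objects, many variables.
rng = np.random.default_rng(0)
N, K, M = 12, 500, 2
X = rng.standard_normal((N, K))
Y = X[:, :3] @ rng.standard_normal((3, M)) + 0.1 * rng.standard_normal((N, M))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# X and Y are used once to form the N x N matrices, never again afterwards.
T, U, r2y = kernel_pls_scores(X @ X.T, Y @ Y.T, n_components=3)
print("cumulative R2(Y):", r2y)
```

Every quantity inside the loop has size N or N × N, so the cost per component does not depend on the number of variables K once XXᵀ and YYᵀ have been formed.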