On the effectiveness of cross-fitting in multi-block PLS (CF-MBPLS)

Authors


Jarno Kohonen, Lappeenranta University of Technology, Lappeenranta FI-53851, Finland.

E-mail: jarno.kohonen@lut.fi

Abstract

Multi-block PLS is an extension of partial least squares, or projection to latent structures (PLS), in which the descriptor matrix is divided into meaningful blocks based on either process units or type of data. A typical application uses process variables as one block and spectral data as another. The method has been used to obtain more information about processes and about the effects of different types of variables. In contrast to priority or hierarchical PLS, multi-block PLS does not require the blocks to be prioritized in advance because they are calculated iteratively at the same time. With multi-block PLS, however, it is easy to overfit the data, resulting in poor predictive ability. A recent development called cross-fitting has been reported to alleviate the problem of overfitting in PLS. This approach was adapted to multi-block PLS and tested on two different data sets in which overfitting and sensitivity to outliers are issues. Copyright © 2012 John Wiley & Sons, Ltd.
