Partitioned partial least squares regression with application to a batch fermentation process



The performance of Partial Least Squares (PLS) regression in predicting the output from multivariate, cross-correlated and autocorrelated data is studied. When there are many correlated predictors of varying importance, PLS does not always predict well, and we therefore propose a modified algorithm, Partitioned Partial Least Squares (PPLS). In PPLS, the predictors are partitioned into smaller subgroups, and the important subgroups with high predictive power are identified. Regular PLS analysis is then performed using only those subgroups. PPLS is applied to data from a real pharmaceutical batch fermentation process in which the process variables follow certain profiles during a specific fermentation period. We observed that PPLS leads to more accurate prediction of the yield of the fermentation process and to easier interpretation, since fewer predictors are used in the final PLS model. The application also addresses important issues such as alignment of the profiles from one batch to another and standardization of the predictors. For instance, in PPLS, noise magnification due to standardization does not seem to cause the problems it might in regular PLS. Finally, PPLS is compared with several recently proposed functional PLS and principal component regression (PCR) methods and with a genetic algorithm for variable selection. Specifically, for a couple of publicly available data sets of near-infrared (NIR) spectra, it is shown that overall PPLS has lower cross-validated error than PLS, PCR and their functional modifications, and performs similarly to a more complex genetic algorithm.

Copyright © 2011 John Wiley & Sons, Ltd.
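The three PPLS steps (partition the predictors, identify the subgroups with high predictive power, refit PLS on those subgroups only) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: contiguous, equal-sized predictor blocks; ranking blocks by 5-fold cross-validated RMSE; keeping a fixed number `n_keep` of best blocks; a NIPALS PLS1 fit; and the hypothetical function names `pls1_fit`, `pls1_predict`, `cv_rmse` and `ppls`. The optional `scale` flag mimics the standardization of predictors discussed above.

```python
import math
import random

def _col_stats(X, scale):
    """Column means and, if scale=True, sample standard deviations (rows = samples)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [1.0] * p
    if scale:  # standardization (autoscaling); zero-variance columns left unscaled
        for j in range(p):
            var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
            sds[j] = math.sqrt(var) if var > 0 else 1.0
    return means, sds

def pls1_fit(X, y, ncomp, scale=False):
    """Single-response PLS (PLS1) fitted with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    means, sds = _col_stats(X, scale)
    ymean = sum(y) / n
    E = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    f = [yi - ymean for yi in y]
    comps = []
    for _ in range(min(ncomp, p)):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # weights X'y
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * pl[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        comps.append((w, pl, q))
    return means, sds, ymean, comps

def pls1_predict(model, x):
    """Predict one sample by deflating it component by component."""
    means, sds, ymean, comps = model
    e = [(x[j] - means[j]) / sds[j] for j in range(len(x))]
    yhat = ymean
    for w, pl, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        yhat += q * t
        e = [e[j] - t * pl[j] for j in range(len(e))]
    return yhat

def cv_rmse(X, y, cols, ncomp, folds=5, scale=False):
    """K-fold cross-validated RMSE of PLS1 restricted to predictor columns `cols`."""
    n = len(X)
    Xs = [[row[j] for j in cols] for row in X]
    sse = 0.0
    for k in range(folds):
        test = [i for i in range(n) if i % folds == k]
        train = [i for i in range(n) if i % folds != k]
        model = pls1_fit([Xs[i] for i in train], [y[i] for i in train], ncomp, scale)
        sse += sum((y[i] - pls1_predict(model, Xs[i])) ** 2 for i in test)
    return math.sqrt(sse / n)

def ppls(X, y, block_size, n_keep, ncomp=2, scale=False):
    """Hypothetical PPLS sketch: score predictor blocks, keep the best, refit PLS."""
    p = len(X[0])
    blocks = [list(range(s, min(s + block_size, p))) for s in range(0, p, block_size)]
    scores = [cv_rmse(X, y, b, ncomp, scale=scale) for b in blocks]
    ranked = sorted(range(len(blocks)), key=lambda b: scores[b])
    selected = sorted(ranked[:n_keep])
    cols = [j for b in selected for j in blocks[b]]
    model = pls1_fit([[row[j] for j in cols] for row in X], y, ncomp, scale)
    return selected, cols, scores, model

# Synthetic usage: only the first 5 of 20 predictors carry signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(40)]
y = [sum(row[:5]) + rng.gauss(0, 0.1) for row in X]
selected, cols, scores, model = ppls(X, y, block_size=5, n_keep=1)
print(selected, [round(s, 2) for s in scores])
```

In this toy example the block containing the informative predictors should receive by far the lowest cross-validated error, so the final PLS model uses only 5 of the 20 predictors, which mirrors the interpretability argument in the abstract. For real batch profiles or NIR spectra the block size, the selection rule and the number of components would all have to be tuned, for example by cross-validation.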