Automated feature extraction from profiles with application to a batch fermentation process


Address for correspondence: Stina W. Andersen, Biopharm Manufacturing Development, Statistics, Novo Nordisk A/S, Hagendornsvej 1, DK-2820, Gentofte, Denmark.


Summary. An automated approach for extracting interpretable features from univariate or multivariate profiles (functional data) is proposed. A landmark alignment algorithm is modified, and the alignment is combined with piecewise linear approximations. Least absolute shrinkage and selection operator (lasso) regression is used to select the most important intercepts and slopes, yielding an alternative to partial least squares for modelling a response associated with the profiles. Latent variables can be difficult to interpret, whereas our extracted features simply correspond to the slopes and intercepts of particular parts of the profiles. Features that relate to the degree of warping between a given profile and a reference can also be extracted as predictors. Selection criteria for the number of knots and for knot locations that are common across profiles are developed. We apply the proposed method to batch fermentation data in which the profiles consist of on-line measurements of process variables and the response is the corresponding yield of the process. The extracted features are readily interpretable, give a large reduction in dimension and, in combination with the lasso, achieve prediction accuracy comparable with that of partial least squares applied to the original profiles. The feature extraction method is also applied to publicly available data in which near infrared spectra define the profiles; there, the prediction accuracy of our feature lasso method is comparable with that of more complicated alternatives.
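To make the two central steps concrete, the following is a minimal sketch and not the authors' implementation. It assumes profiles that are already aligned, knots that are fixed in advance and shared by all profiles, and an independent least-squares line on each segment (the summary does not say whether the approximation is continuous at the knots). The function names `segment_features`, `soft_threshold` and `lasso_cd` are hypothetical, and the lasso is solved by plain coordinate descent rather than by any particular library. Landmark alignment and knot selection are not shown.

```python
def segment_features(t, y, knots):
    """Return [intercept_1, slope_1, intercept_2, slope_2, ...] for one profile.

    Assumption: each segment [knots[k], knots[k+1]] contains at least two
    distinct time points. The intercept is the fitted value at the left knot.
    """
    feats = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        pts = [(ti, yi) for ti, yi in zip(t, y) if lo <= ti <= hi]
        n = len(pts)
        mt = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mt) ** 2 for p in pts)
        sxy = sum((p[0] - mt) * (p[1] - my) for p in pts)
        slope = sxy / sxx
        intercept = my + slope * (lo - mt)
        feats += [intercept, slope]
    return feats


def soft_threshold(z, g):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=500):
    """Minimise (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.

    X is a list of feature rows (one row per profile), y the responses.
    Columns are centred internally; they are not rescaled.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = [row[j] for row in Xc]
            norm = sum(v * v for v in xj) / n
            if norm == 0.0:
                continue  # constant column carries no information
            # covariance of column j with the partial residual
            rho = sum(v * ri for v, ri in zip(xj, r)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                r = [ri - delta * v for ri, v in zip(r, xj)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_mean))
    return intercept, beta


# Toy profile: rises with slope 2 up to t = 5, then falls with slope -1.
t = [float(i) for i in range(11)]
profile = [2.0 * ti if ti <= 5 else 15.0 - ti for ti in t]
features = segment_features(t, profile, knots=[0.0, 5.0, 10.0])
```

In use, one feature row per profile would be stacked into `X` and passed to `lasso_cd` together with the responses; coefficients shrunk exactly to zero mark segments whose slope or intercept is not needed for prediction. In practice the penalty `lam` would be chosen by cross-validation, and the feature columns would usually be standardised first.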