Canonical variate analysis (CVA) has been applied successfully in process monitoring. This paper proposes an efficient recursive CVA approach to monitor time-varying processes. The exponentially weighted moving average approach is adopted to update the covariance matrix of past observation vectors without the need to recall past training data. The most important challenge faced by the recursive CVA algorithm is its high computation cost. To reduce this cost, first-order perturbation theory is introduced to update the singular value decomposition (SVD) of the Hankel matrix recursively. The computation cost of recursive SVD based on first-order perturbation theory is significantly lower than that of conventional SVD. The proposed method is illustrated by simulation of a continuous stirred tank reactor system. Simulation results indicate that the proposed method not only adapts effectively to the natural changes of time-varying processes but also identifies two types of abrupt sensor faults.
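The exponentially weighted update of the covariance matrix described above can be sketched as follows. This is a minimal illustration of the EWMA idea only, not the paper's implementation; the function name, the forgetting factor `lam`, and the joint mean update are assumptions for the example.

```python
import numpy as np

def ewma_cov_update(cov, mean, x, lam=0.98):
    """One exponentially weighted update of the mean and covariance.

    cov, mean : current estimates; x : new observation vector;
    lam : forgetting factor in (0, 1); larger values adapt more slowly.
    (Hypothetical names; a sketch of the EWMA idea, not the paper's code.)
    """
    mean_new = lam * mean + (1.0 - lam) * x
    d = x - mean_new
    cov_new = lam * cov + (1.0 - lam) * np.outer(d, d)
    return cov_new, mean_new

# Feed a stream of observations; past data never needs to be stored.
rng = np.random.default_rng(0)
cov, mean = np.eye(2), np.zeros(2)
for _ in range(500):
    cov, mean = ewma_cov_update(cov, mean, rng.standard_normal(2))
```

Because each update touches only the current estimates and the newest sample, the memory cost is constant in the number of past observations, which is the point of the recursive scheme.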

As people have become more focused on their own health, the role of ginseng for medical uses has begun to receive substantial interest. However, the quality control of ginseng remains in question because different species vary considerably in this respect. In this paper, particle swarm optimization–support vector regression combined with microwave plasma torch–atomic emission spectrometry (MPT-AES) was used, for the first time, for quality control of ginseng. To build calibration models, quantitative determination of target element concentrations in ginseng samples was conducted by MPT-AES because ginseng quality is closely related to the place of origin and can thus be judged by the elemental composition. Characteristic spectral lines were extracted via principal component analysis to reduce the computational effort and improve the representativeness of the input variables. Two heuristic algorithms, particle swarm optimization and a genetic algorithm, were selected to optimize the parameters (eg, *c*, *g*, and *ε*) that are critical to the construction of the support vector regression (SVR) models. A linear regression approach, partial least squares regression (PLSR), was also applied for comparison. The comparisons were based on two evaluation indexes, namely, the root mean square error and the squared correlation coefficient (*R*^{2}). A significant difference between SVR and PLSR showed that SVR outperformed PLSR in such a multivariate regression problem. The acquired results showed that particle swarm optimization was slightly better than a genetic algorithm. In conclusion, the proposed MPT-AES combined with particle swarm optimization–support vector regression is appropriate for quantitative elemental analysis and further application in the quality control of ginseng.
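The particle swarm step used to tune the SVR parameters can be illustrated with a minimal, generic optimizer. This is a sketch of the PSO idea under assumed settings (inertia 0.7, acceleration coefficients 1.5), not the paper's implementation; here it minimizes a simple quadratic surrogate rather than an actual SVR cross-validation error.

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=20, n_iter=100, seed=0):
    """Minimal particle swarm optimizer (a generic sketch).

    In the paper's setting, f would be the SVR cross-validation error as a
    function of parameters such as c, g, and epsilon; here it is arbitrary.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()]
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity mixes inertia, attraction to personal best, and to global best.
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[pbest_f.argmin()]
    return gbest

# Toy objective with minimum at (2, -1), standing in for a tuning criterion.
best = pso_minimize(lambda p: (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2,
                    (np.array([-5.0, -5.0]), np.array([5.0, 5.0])))
```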

Traditionally, chemometric models consist of parameters found by solving a least squares criterion. However, these models can suffer from overfitting and can be hard to interpret because of the large number of active parameters. This work proposes the use of a generalized *L*_{1} norm penalty for constraining models to obey certain structural properties, including parameter sparsity and sparsity on pairwise differences between parameter estimates. This framework is used to modify principal component analysis, partial least squares, canonical correlation analysis, and multivariate analysis of variance types of models applied to synthetic and chemical data. This work argues that *L*_{1} norm penalized models offer parsimony, robustness, and predictive performance, and reveals a path for modifying unconstrained chemometric models through convex penalties.
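The parameter sparsity that an *L*_{1} penalty induces comes from its proximal operator, soft thresholding, which sets small coefficients exactly to zero. A minimal sketch (illustrating the mechanism only, not any of the penalized models above):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm with threshold t: shrinks every
    entry toward zero and sets entries smaller than t exactly to zero,
    which is the source of sparsity in L1-penalized estimates."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

b = np.array([3.0, -0.2, 0.05, -1.5])
b_sparse = soft_threshold(b, 0.5)  # small entries become exactly zero
```

Penalizing pairwise differences between parameters works the same way, with the operator applied to differences rather than to the parameters themselves.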

Fundamental matrix operations, including the pseudo-inverse, are described.
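As one concrete instance of the operations covered, the Moore-Penrose pseudo-inverse can be computed from the singular value decomposition; the sketch below checks the result against NumPy's built-in and against the defining identity. The example matrix is arbitrary.

```python
import numpy as np

# Pseudo-inverse via SVD: if A = U S V^T, then A+ = V S^{-1} U^T
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Agrees with NumPy's built-in and satisfies the Penrose identity A A+ A = A
assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(A @ A_pinv @ A, A)
```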

A historical perspective of the development of chemometrics as a field is given. Special attention is given to early workers in the analysis and modeling of chemical data in an effort to show how the field evolved from a broad, mostly statistical discipline to one more narrowly focused on multivariate predictive analysis. A discussion of the relationships of chemometrics with statistics, chemical modeling, and analytical chemistry is provided to show how early conflicts have shaped and continue to shape the field. The author considers the state of the field now and makes suggestions for the way in which chemometrics should change to remain vital in the rapidly changing area of data analysis.

Many of the issues that arose early in the development of chemometrics continue to limit growth of the field. A change in perspective and approach is needed both to grow the field and to respond to the changing demands on data analysis.

The idea of variable space and object space is distinguished from that of column space and row space, and the relationship to matrix rank is described.
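The connection to matrix rank can be made concrete with a small numerical check: row rank always equals column rank, so the dimension of both spaces is the single matrix rank. The example matrix below is an assumption chosen so that one column is a linear combination of the others.

```python
import numpy as np

# A 4x3 data matrix whose third column is the sum of the first two:
# the column space is spanned by two independent columns, so its
# dimension (and, equally, the dimension of the row space) is 2.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 3.0],
              [1.0, 1.0, 2.0]])

assert np.linalg.matrix_rank(X) == 2
assert np.linalg.matrix_rank(X.T) == 2  # row rank equals column rank
```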

Tikhonov regularization was recently proposed for multivariate calibration. We use this framework for modeling the statistical association between spectroscopy data and a scalar outcome. In both the calibration and regression settings, this regularization process has advantages over methods of spectral preprocessing and dimension-reduction approaches such as feature extraction or principal component regression. We propose an extension of this penalized regression framework by adaptively refining the penalty term to optimally focus the regularization process. We illustrate the approach using simulated spectra and compare it with other penalized regression models and with a 2-step method that first preprocesses the spectra then fits a dimension-reduced model using the processed data. The methods are also applied to magnetic resonance spectroscopy data to identify brain metabolites that are associated with cognitive function.
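In its simplest (ridge) form, the Tikhonov-regularized estimate has a closed-form solution; the sketch below shows that form and the shrinkage effect of the penalty on simulated data. This illustrates the baseline framework only, not the paper's adaptive refinement of the penalty term.

```python
import numpy as np

def tikhonov(X, y, lam):
    """Ridge form of Tikhonov regularization:
    b = argmin ||y - X b||^2 + lam ||b||^2 = (X'X + lam I)^{-1} X'y
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated calibration data (arbitrary example values)
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
b_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ b_true + 0.01 * rng.standard_normal(50)

b_ols = tikhonov(X, y, 0.0)     # lam = 0 recovers ordinary least squares
b_reg = tikhonov(X, y, 10.0)    # a heavier penalty shrinks the coefficients
```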

Penalized regression with a combination of sparseness and an interframe penalty is explored for image deconvolution in wide-field single-molecule fluorescence microscopy. The aim is to reconstruct superresolution images, which can be achieved by averaging the positions and intensities of individual fluorophores obtained from the analysis of successive frames. Sparsity of the fluorophore distribution in the spatial domain is obtained with an *L*_{0}-norm penalty on estimated fluorophore intensities, effectively constraining the number of fluorophores per frame. Simultaneously, continuity of the fluorophore localizations in the time mode is obtained by penalizing the total numbers of pixel status changes between successive frames. We implemented the interframe penalty in a sparse deconvolution algorithm (sparse image deconvolution and reconstruction) for improved imaging of densely labeled biological samples. For simulated and real biological data, we show that more accurate estimates of the final superresolution images of cellular structures can be obtained.

The feasibility of direct (ie, without sample preparation) quantitative analysis of total hydrocarbons and water in oil-contaminated soils using mid-infrared spectroscopy and an attenuated total reflection (ATR) probe has been investigated. Spectral characteristics of unpolluted and oil-contaminated soils composed of sand, clay, dolomite, and humus have been studied over the full mid-infrared range (4000-400 cm^{−1}). Spectra of 25 typical soil samples containing varying levels of oil and water have been analyzed using a chalcogenide infrared fiber–based probe with a ZrO_{2} crystal as an ATR element. The spectral data were used to build calibration models for the analysis of hydrocarbon contamination as well as moisture content of soil samples. The low quality of ATR spectra of drier samples and variable spectral intensity inherent in the ATR measurement of solids has been overcome by suitable data processing. Further improvement of the model performance has been achieved using a variable selection based on the modified genetic algorithm. Our proposed method allows the determination of oil and moisture content in soils with accuracies of 1.1% and 0.6%, respectively, which is sufficient for a number of practical applications. The reported results may be used to develop portable devices for measuring petroleum and water content of soils.

Multivariate curve resolution (MCR) of absorption spectra is now a ubiquitously used tool. However, MCR methods that use the ordinary least squares (OLS) approach assume that the measurement uncertainties are unbiased and homoscedastic. This is not true for absorption measurements, in which uncertainty variance and bias both increase as the true absorbance increases. The bias produces a well-known flattening/saturation of the peaks at high optical densities, which makes the data nonlinear and unsuitable for OLS-based MCR analysis. This problem can be reduced by using weighted least squares (WLS).

In the present paper, the ability of WLS-based MCR to handle simulated and real datasets with realistic optical noise and flattening was assessed. Three weighting schemes were tested: OLS (unity weights), weights based on the maximum likelihood principle (MLP) and the physics of absorption measurement, and weights based on empirical cutoff (zero weights for saturated data points). The abilities of MCR to recover the true profiles and to evaluate rotational ambiguity of the solutions were compared for the 3 weighting schemes. MLP- and cutoff-based WLS-MCR produced better resolution of flattened data than OLS, but the success of the extension to strongly flattened spectra depended on data structure. MLP-based MCR was general and stable, while cutoff-based MCR was more sensitive to the data but could recover unbiased profiles. Generally, the use of WLS can expand MCR functionality to the analysis of flattened spectra.

The specifics of finding WLS bilinear solutions and approaches to migrate factor-based MCR methods from OLS to WLS are also discussed.
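The core WLS step that such migrated MCR methods rely on can be sketched in isolation: scaling each row by the square root of its weight turns a weighted problem into an ordinary one, and down-weighting a saturated point (as in the cutoff scheme) pulls the fit back toward the reliable data. The toy data below are assumptions for illustration, not from the paper.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i (y_i - x_i'b)^2,
    solved as OLS on rows scaled by sqrt(w_i)."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Three points on the line y = x plus one 'flattened' (saturated) point.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 2.0, 1.5])

b_ols = wls(X, y, np.ones(4))                      # biased by saturation
b_wls = wls(X, y, np.array([1.0, 1.0, 1.0, 1e-6]))  # near-zero weight: cutoff
```

With the saturated point effectively removed, the WLS fit recovers intercept 0 and slope 1, whereas the OLS slope is dragged downward by the flattened value.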

Maintaining multivariate calibrations involves keeping models developed on an instrument applicable to predicting new samples over time. Sometimes, a primary instrument model is needed to predict samples measured on secondary instruments. This situation is referred to as calibration transfer. Sometimes, a primary instrument model is needed to predict samples that have acquired new spectral features (chemical, physical, and environmental influences) over time. This situation is referred to as calibration maintenance. Calibration transfer and maintenance problems have a long history and are well studied in chemometrics and spectroscopy. In disciplines outside of chemometrics, particularly computer vision, calibration transfer and maintenance problems are more recent phenomena, and these problems often go under the umbrella term *domain adaptation*. Over the past decade, domain adaptation has demonstrated significant successes in various applications such as visual object recognition. Since domain adaptation already constitutes a large area of research in computer vision and machine learning, we narrow our scope and report on penalty-based eigendecompositions, a class of domain adaptation methods that has its motivational roots in linear discriminant analysis. We compare these approaches against chemometrics-based approaches using several benchmark chemometrics data sets.

No abstract is available for this article.

A diversity of multiresponse optimization methods has been introduced in the literature; however, their performance has not been thoroughly explored, and only a classical desirability-based criterion has been commonly used. With the aim of helping practitioners select an effective criterion for solving multiresponse optimization problems developed under the response surface methodology framework, and thus find compromise solutions that are technically and economically more favorable, the working ability of several easy-to-use criteria is evaluated and compared with that of a theoretically sound method. Four case studies with different numbers and types of responses are considered. Less-sophisticated criteria were able to generate solutions similar to those generated by sophisticated methods, even when the objective is to depict the Pareto frontier in problems with conflicting responses. Two easy-to-use criteria that require less-subjective information from the user yielded solutions similar to those of a classical desirability-based criterion. The impact of the range and increment of the preference parameters on the optimal solutions was also evaluated.

We propose a form of random forests that is especially suited for functional covariates. The method is based on partitioning the functions' domain in intervals and using the functions' mean values across those intervals as predictors in regression or classification trees. This approach appears to be more intuitive to applied researchers than usual methods for functional data, while also performing very well in terms of prediction accuracy. The intervals are obtained from randomly drawn, exponentially distributed waiting times. We apply our method to data from Raman spectra on boar meat as well as near-infrared absorption spectra. The predictive performance of the proposed functional random forests is compared with commonly used parametric and nonparametric functional methods and with a nonfunctional random forest using the single measurements of the curve as covariates. Further, we present a functional variable importance measure, yielding information about the relevance of the different parts of the predictor curves. Our variable importance curve is much smoother and hence easier to interpret than the one obtained from nonfunctional random forests.
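The interval-mean feature construction at the heart of this approach can be sketched in a few lines. For simplicity the sketch uses equal-width intervals rather than the randomly drawn, exponentially distributed waiting times described above; the function name and shapes are assumptions for the example.

```python
import numpy as np

def interval_means(curves, n_intervals):
    """Summarize each functional observation (one row = one sampled curve)
    by its mean value over contiguous intervals of the domain. These means
    then serve as predictors in regression or classification trees."""
    splits = np.array_split(np.arange(curves.shape[1]), n_intervals)
    return np.column_stack([curves[:, idx].mean(axis=1) for idx in splits])

# Three 'spectra' sampled at 100 points, reduced to 5 interval-mean predictors
rng = np.random.default_rng(2)
curves = rng.standard_normal((3, 100))
Z = interval_means(curves, 5)
```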

Uniformly distributed uncertainty exists in industrial processes; additive error introduced by quantization is an example. To handle additive uniform and Gaussian measurement uncertainty simultaneously in system identification, the Flat-topped Gaussian distribution is considered in this paper as an alternative to the Gaussian distribution. To incorporate this type of uncertainty in the maximum likelihood estimation framework, the explicit form of its density function is required. This work proposes an approach for obtaining both the functional structure and the corresponding parameter estimates of the Flat-topped Gaussian distribution by a moment fitting strategy. The performance of the proposed approximation function is verified by comparison to Flat-topped Gaussian distributed random variables with different Gaussian and uniform components. Results of numerical simulations and industrial applications in system identification are presented to verify the effectiveness of the Flat-topped Gaussian distribution in handling additive uniform uncertainty.
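The noise model in question is the sum of a Gaussian and a uniform component, whose density is their "flat-topped" convolution. A quick simulation illustrates the moment relation that a moment fitting strategy can exploit: the variances of the two components add. The parameter values below are assumptions for the example.

```python
import numpy as np

# Noise formed as Gaussian + uniform, as with quantized measurements.
rng = np.random.default_rng(3)
sigma, a = 0.5, 1.0  # Gaussian standard deviation; uniform half-width
e = rng.normal(0.0, sigma, 200_000) + rng.uniform(-a, a, 200_000)

# Second moments add: var = sigma^2 + a^2/3, since Var[U(-a, a)] = a^2/3.
var_theory = sigma**2 + a**2 / 3.0
```

Matching sample moments such as this variance (and higher even moments, which distinguish the flat top from a pure Gaussian) to their theoretical expressions is what yields the distribution's parameter estimates.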

Models such as ordinary least squares, independent component analysis, principal component analysis, partial least squares, and artificial neural networks can be found in the calibration literature. Linear or nonlinear methods can be used to explain the structure of the same phenomenon. Each type of model has its own advantages with respect to the others. These methods are usually grouped taxonomically, but different models can sometimes be applied to the same data set. Taxonomically, ordinary least squares and artificial neural networks use completely different analytical procedures but are occasionally applied to the same data set. The aim of the study of methodological superiority is to compare the residuals of models because the model with the minimum error is preferred in real analyses. Calibration models, in general, consist of deterministic and stochastic parts; in other words, the data are equal to the model plus the error. Explaining a model solely using statistics such as the coefficient of determination or its related significance values is sometimes inadequate. The errors of a model, also called its residuals, must have minimum variance compared to its alternatives. Additionally, the residuals must be unpredictable, uncorrelated, and symmetric. Under these conditions, the model can be considered adequate. In this study, calibration methods were applied to the raw materials, hydrochlorothiazide and amiloride hydrochloride, of a drug, as well as a sample of the drug tablet. The applied chemical procedure was fast, simple, and reproducible. The various linear and nonlinear calibration methods mentioned above were applied, and the adequacy of the calibration methods was compared according to their residuals.

Nowadays, the detection, localization, and quantification of different kinds of features in an RGB image (*segmentation*) is extremely helpful for, e.g., process monitoring or customer product acceptance. In this article, some of the most commonly used RGB image segmentation approaches are compared in an orange quality control case study. Analysis of variance and correspondence analysis are combined for determining their most relevant differences and highlighting their pros and cons.

Human anaplastic lymphoma kinase (ALK) is a potential target for the treatment of pediatric acute lymphoblastic leukemia. However, a number of residue mutations in the ALK kinase domain have been observed to cause drug resistance in pediatric acute lymphoblastic leukemia chemotherapy. Here, a chemometric quantitative structure-activity relationship predictor was developed using a structure-based panel of kinase-inhibitor activity data. The predictor was validated rigorously through internal cross-validation and an external blind test to ensure its statistical reliability, and it was then used to computationally construct a systematic activity profile of 13 noncognate kinase inhibitors against both wild-type ALK^{wt} and cancer-related variants ALK^{vt}. It is revealed that most noncognate inhibitors exhibit weak potency against ALK^{wt}, but some of them are able to selectively target ALK^{vt} over ALK^{wt}. The chemometric findings were then evaluated using a kinase inhibition protocol; the results showed that a few noncognate inhibitors are 2- to 5-fold more potent against ALK variants than against the wild-type kinase.