Journal of Chemometrics

April 2011

Volume 25, Issue 4

Pages 139–223

  1. Special Issue Articles

    1. Construction of stable multivariate calibration models using unsupervised segmented principal component regression (pages 139–150)

      Bahram Hemmateenejad and Sadegh Karimi

      Version of Record online: 8 APR 2011 | DOI: 10.1002/cem.1390

      A segmentation approach based on unsupervised pattern recognition is proposed to identify the most informative spectral region and then to construct a stable multivariate calibration model by PCR. The instrument channels are clustered into segments via a Kohonen self-organizing map, and the segments are then subjected to PCA. The derived PCs are used as input variables for an ILS regression model. The proposed method modeled both simulated and experimental data sets with lower prediction errors than conventional PLS and PCR methods.

    2. Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers (pages 151–158)

      Károly Héberger and Klára Kollár-Hunek

      Version of Record online: 28 MAY 2010 | DOI: 10.1002/cem.1320

      The theoretical background and algorithm are described for the sum of ranking differences (SRD), a novel procedure for ordering and grouping methods and models. Close SRD values indicate similar methods, whereas large differences imply dissimilarity. The SRD procedure can be validated by comparison with simulated random numbers (a permutation test called CRRN). The theoretical distribution is visualized, and probabilities are assigned to SRD values to show whether they are far from random.
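
The SRD idea summarized in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the reference (benchmark) values are assumed to be supplied by the user, tied values are assumed to share their mean rank, and the random comparison simply shuffles ranks to approximate the spirit of the CRRN permutation test.

```python
import random


def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(method_values, reference_values):
    """Sum of absolute differences between method ranks and reference ranks."""
    r_method = average_ranks(method_values)
    r_ref = average_ranks(reference_values)
    return sum(abs(a - b) for a, b in zip(r_method, r_ref))


def random_srd_distribution(n_objects, n_permutations=10000, seed=0):
    """SRD values of randomly shuffled rankings against the ranking 1..n."""
    rng = random.Random(seed)
    ideal = list(range(1, n_objects + 1))
    values = []
    for _ in range(n_permutations):
        perm = ideal[:]
        rng.shuffle(perm)
        values.append(sum(abs(a - b) for a, b in zip(perm, ideal)))
    return values


def fraction_random_at_or_below(srd_value, random_values):
    """Empirical probability that a random ranking gives an SRD this small."""
    return sum(v <= srd_value for v in random_values) / len(random_values)


# Illustrative data (made up): two methods scored on five objects.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
method_a = [1.0, 2.1, 2.9, 4.2, 5.0]  # same ordering as the reference
method_b = [5.0, 3.0, 4.0, 1.0, 2.0]  # very different ordering

random_values = random_srd_distribution(len(reference))
for name, scores in (("A", method_a), ("B", method_b)):
    value = srd(scores, reference)
    print(name, value, fraction_random_at_or_below(value, random_values))
```

A method whose SRD is small, and which random rankings rarely match or beat, orders the objects much like the reference. A method whose SRD falls in the bulk of the random distribution cannot be distinguished from a random ordering.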

  2. Research Articles

    1. Partitioned partial least squares regression with application to a batch fermentation process (pages 159–168)

      Stina W. Andersen and George C. Runger

      Version of Record online: 10 FEB 2011 | DOI: 10.1002/cem.1332

      A modification of PLS, denoted Partitioned PLS (PPLS), is proposed as a suitable approach for predicting the yield of batch processes where several variables are measured successively, leading to profiles of process variables. PPLS exploits the natural grouping or ordering of predictors that is often present in high-dimensional data and iteratively finds and selects the best predictors. PPLS also performs well on two reference data sets from the literature consisting of near-infrared (NIR) spectra.

    2. Hard–soft modeling parallel factor analysis to solve equilibrium processes (pages 169–182)

      S. M. Sajjadi and H. Abdollahi

      Version of Record online: 2 FEB 2011 | DOI: 10.1002/cem.1341

      The PARAFAC model is the best-known model for analyzing three-way data because of its uniqueness properties. However, uniqueness does not hold for data with rank-overlapped profiles in at least one mode. The goal of this work is to present hard-soft PARAFAC (HSPARAFAC), which overcomes the non-uniqueness problem in equilibrium processes involving linearly dependent factors in at least one mode. It is shown that unique results are obtained if the rank-overlapped species obey the equilibrium model in HSPARAFAC.

    3. Quantification of magnetic resonance spectroscopy signals with lineshape estimation (pages 183–192)

      M. I. Osorio-Garcia, D. M. Sima, F. U. Nielsen, U. Himmelreich and S. Van Huffel

      Version of Record online: 4 MAR 2011 | DOI: 10.1002/cem.1353

      Quantification of magnetic resonance spectroscopy (MRS) signals is required for providing metabolite concentrations of the tissue under investigation. However, inhomogeneities of the static magnetic field and tissue heterogeneities affect the lineshape of MRS signals and thus quantification. We propose an extension of the self-deconvolution method, by estimating and smoothing a common lineshape using a robust method with local regression. This common lineshape is imposed in the metabolite quantification method and the spectral parameters are obtained via nonlinear least squares.

    4. Three-way analysis of a designed compost experiment using near-infrared spectroscopy and laboratory measurements (pages 193–200)

      Tom Lillhonga and Paul Geladi

      Version of Record online: 16 FEB 2011 | DOI: 10.1002/cem.1371

      A lab-scale designed experiment with nine compost batches was monitored over five weeks by near-infrared spectroscopy and by wet chemical and physical measurements: pH, energy content, moisture content, NH3/NH4+ and temperature. The data were organized in three-way arrays, and different three-way methods were used for analysis: (1) PARAFAC, (2) Tucker3 and (3) PARAFAC2. The results reproduced the design and gave common (PARAFAC) and individual (PARAFAC2) time profiles for all compost batches, and rate constants (half-lives) could be calculated.

    5. Feature importance sampling-based adaptive random forest as a useful tool to screen underlying lead compounds (pages 201–207)

      Dong-Sheng Cao, Yi-Zeng Liang, Qing-Song Xu, Liang-Xiao Zhang, Qian-Nan Hu and Hong-Dong Li

      Version of Record online: 17 FEB 2011 | DOI: 10.1002/cem.1375

      Ensemble approaches generally perform well when the base classifiers are diverse and accurate. In the present study, feature importance sampling-based adaptive random forest (fisaRF) is proposed to obtain classification performance superior to that of the original one-step random forest (RF). fisaRF uses a convenient yet effective procedure, feature importance sampling (FIS), to select a more suitable feature subset at each splitting node instead of simple random sampling, thereby improving the accuracy of individual trees without sacrificing diversity among them. Additionally, the iterative use of the feature importance obtained in the previous step can adaptively capture the most significant features in the data and effectively deal with multiple classification problems that are not easily solved by other feature importance indexes.

    6. Cross fitted partial least squares (CF-PLS): an alternative algorithm for a more reliable PLS (pages 208–215)

      Olivier Cloarec

      Version of Record online: 10 FEB 2011 | DOI: 10.1002/cem.1380

      In this paper, a modified version of the NIPALS algorithm for PLS regression on a single response is presented. The new algorithm reduces overfitting and allows hypothesis testing within a probabilistic framework, because the distribution of R2 under the null hypothesis is characterized: in this case, R2 follows a beta distribution that is a function only of the number of observations. The interpretation of the scores and loadings is also more reliable because they are directly related to R2.

    7. Ultrasonic characterization of aqueous solutions with varying sugar and ethanol content using multivariate regression methods (pages 216–223)

      Daniel Krause, Thomas Schöck, Mohamed Ahmed Hussein and Thomas Becker

      Version of Record online: 10 FEB 2011 | DOI: 10.1002/cem.1384

      Calibration of a new measuring device for investigating the temperature-dependent ultrasonic velocity in aqueous solutions containing sugar and ethanol was tested with different multivariate regression methods. The accuracy, expressed as the absolute error achieved with a PLS regression model, was on average three times lower than with other regression methods. This spectral analysis shows the possibility of using ultrasonic sensor devices in combination with multivariate regression for concentration determination.