Comparison and improvement of commonly applied statistical approaches for identification of plant species from IR spectra



Fourier transform infrared (FTIR) spectroscopy has been studied repeatedly as a tool for identifying plant, fungal and bacterial species. Infrared spectra are commonly analyzed with multivariate statistical methods such as cluster analysis (CA), principal component analysis (PCA), partial least squares analysis (PLS) and discriminant analysis (DA). In this study, a univariate statistical method, analysis of variance (ANOVA), was used to reduce the number of variables before the multivariate methods were applied. Selecting variables with ANOVA, alone or in combination with CA, produced better results. ANOVA was performed on the first derivative of the spectra rather than on the original spectra or their second derivative, because the first-derivative variables gave a better distinction between species. The results also depended on the validation method. Leave-one-out validation gave higher results than validation with separate training and validation sample sets, which indicates that leave-one-out validation is not objective. Copyright © 2010 John Wiley & Sons, Ltd.
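The variable-reduction step described above can be illustrated with a minimal sketch, assuming Python with NumPy. The function names, the finite-difference derivative and the rule of keeping a fixed number of top-ranked variables are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np

def first_derivative(spectra, step=1.0):
    # Finite-difference first derivative along the wavenumber axis
    # (rows = samples, columns = wavenumbers). The study's derivative
    # method is not stated; this is a simple stand-in.
    return np.gradient(spectra, step, axis=1)

def anova_f(X, labels):
    # One-way ANOVA F-statistic computed separately for each column of X,
    # with samples grouped by species label.
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = X.shape[0], groups.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for g in groups:
        Xg = X[labels == g]
        group_mean = Xg.mean(axis=0)
        ss_between += Xg.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((Xg - group_mean) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_variables(spectra, labels, n_keep):
    # Rank first-derivative variables by F-statistic and keep the n_keep
    # highest (hypothetical selection rule). Returns the kept column
    # indices in ascending order and the reduced data matrix.
    d1 = first_derivative(spectra)
    f = anova_f(d1, labels)
    keep = np.sort(np.argsort(f)[::-1][:n_keep])
    return keep, d1[:, keep]
```

The reduced matrix returned by `select_variables` would then be passed to CA, PCA, PLS or DA. When results are validated with separate training and validation sets, the selection would normally be computed on the training samples only.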