Keywords:

  • clustering of variables;
  • multivariate calibration;
  • QSAR;
  • clustering algorithm;
  • self-organizing map

We recently proposed clustering of variables (CLoVA), an approach based on unsupervised pattern recognition, for partitioning variables into informative and redundant ones. Because data clustering plays a central role in CLoVA, in the present study we compared the efficiency of different clustering methods, including the Kohonen self-organizing map (SOM), principal component analysis, fuzzy c-means clustering, K-means clustering, and hierarchical cluster analysis, for clustering spectroscopic data and molecular descriptors to build multivariate calibration and quantitative structure–activity relationship (QSAR) models. To investigate which clustering methods are more efficient for CLoVA, four data sets (three spectroscopic and one QSAR) were analyzed. Most of the CLoVA-based models obtained by SOM gave the lowest root mean square errors of cross-validation and prediction, suggesting that SOM is more efficient for clustering variables. In all cases, the results obtained by the CLoVA-based method were compared with those obtained by conventional principal component regression and by partial least squares regression combined with a genetic algorithm or the successive projections algorithm. The models produced by the CLoVA-based method were more predictive than those of the other methods, as indicated by the lowest root mean square error of prediction. Copyright © 2013 John Wiley & Sons, Ltd.
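
The general idea of clustering variables and then judging each cluster by its cross-validation error can be illustrated with a minimal sketch. This is not the authors' implementation: plain K-means stands in for the SOM, an ordinary least-squares fit on each cluster's mean signal stands in for the regression models named in the abstract, and all function names (`kmeans_variables`, `rmsecv_loo`, `score_clusters`) and the synthetic data are hypothetical choices made only for illustration.

```python
import numpy as np


def kmeans_variables(X, k, n_iter=50, seed=0):
    """Cluster the columns (variables) of X with plain K-means.

    Returns one integer cluster label per variable.
    """
    rng = np.random.default_rng(seed)
    # Autoscale each variable, then treat its profile across samples as a point.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    V = ((X - X.mean(axis=0)) / std).T  # shape: (n_variables, n_samples)
    centers = V[rng.choice(len(V), size=k, replace=False)]
    labels = np.zeros(len(V), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every variable to every center.
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = V[labels == j].mean(axis=0)
    return labels


def rmsecv_loo(X, y):
    """Leave-one-out RMSECV of a least-squares model with an intercept."""
    n = len(y)
    residuals = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        A = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        residuals[i] = y[i] - np.concatenate(([1.0], X[i])) @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


def score_clusters(X, y, labels):
    """RMSECV obtained when each cluster's mean signal is the only predictor."""
    scores = {}
    for j in np.unique(labels):
        feature = X[:, labels == j].mean(axis=1, keepdims=True)
        scores[int(j)] = rmsecv_loo(feature, y)
    return scores


if __name__ == "__main__":
    # Synthetic data (an assumption): 6 variables track a latent factor t,
    # 6 variables are pure noise, and the response y depends on t only.
    rng = np.random.default_rng(1)
    t = rng.normal(size=30)
    informative = t[:, None] + 0.1 * rng.normal(size=(30, 6))
    noise = rng.normal(size=(30, 6))
    X = np.hstack([informative, noise])
    y = 2.0 * t + 0.1 * rng.normal(size=30)

    labels = kmeans_variables(X, k=2)
    print("cluster label per variable:", labels)
    print("RMSECV per cluster:", score_clusters(X, y, labels))
```

In a sketch like this, a cluster with a low RMSECV would be treated as informative and one with a high RMSECV as redundant. Swapping `kmeans_variables` for another clustering routine is the kind of comparison the study describes.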