This paper was presented in its entirety at FACSS in Louisville, KY on October 22, 2009.
Special Issue Article
One stop shopping: feature selection, classification and prediction in a single step†
Article first published online: 16 FEB 2011
Copyright © 2011 John Wiley & Sons, Ltd.
Journal of Chemometrics
Special Issue: Thirty-sixth Annual Meeting of the Federation of Analytical Chemistry and Spectroscopy Societies (FACSS 2009)
Volume 25, Issue 3, pages 116–129, March 2011
How to Cite
Lavine, B. K., Nuguru, K. and Mirjankar, N. (2011), One stop shopping: feature selection, classification and prediction in a single step. J. Chemometrics, 25: 116–129. doi: 10.1002/cem.1358
- Issue published online: 16 MAR 2011
- Manuscript Accepted: 20 SEP 2010
- Manuscript Revised: 14 SEP 2010
- Manuscript Received: 1 JUL 2010
- feature selection;
- genetic algorithms;
- machine learning;
- pattern recognition;
- principal component analysis;
- transverse learning
We report on the application of a genetic algorithm (GA) for pattern recognition that uses both supervised and transverse learning to mine spectroscopic and proteomic data. The pattern recognition GA selects features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. For training sets with small amounts of labeled data (i.e. data points tagged with a class label) and large amounts of unlabeled data (i.e. data points that are not tagged with a class label), this approach is preferred because, as our results show, the fitness function uses information in the unlabeled data to guide feature selection. The advantages of incorporating transverse learning into the fitness function of the pattern recognition GA have been evaluated in two recently published studies by our group. In one study, Raman spectroscopy and the pattern recognition GA were used to develop a potential method to discriminate hardwoods, softwoods and tropical woods. In a second study, biopsy material of small round blue cell tumors analyzed by cDNA microarrays was identified as to type (Ewing's sarcoma, Burkitt's lymphoma, neuroblastoma and rhabdomyosarcoma) through supervised learning implemented by the pattern recognition GA.
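The idea in the abstract, a GA that searches for feature subsets giving well-separated classes in a principal component plot, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the fitness function, so the nearest-neighbor hit rate used here is an assumed stand-in, and the names `pc_scores`, `fitness` and `run_ga`, the GA settings and the synthetic data are all hypothetical. Unlabeled samples enter only by being included when the principal components are computed, which is one assumed way for them to influence feature selection.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project mean-centered rows of X onto the largest principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fitness(mask, X_lab, y_lab, X_unlab=None, k=3):
    """Assumed score: mean fraction of each labeled point's k nearest labeled
    neighbors (in the PC plot) that share its class."""
    if mask.sum() < 2:
        return 0.0  # need at least two features for a two-component plot
    # Assumption: unlabeled samples help define the PC space when provided.
    X_all = X_lab if X_unlab is None else np.vstack([X_lab, X_unlab])
    T = pc_scores(X_all[:, mask])[: len(X_lab)]
    d = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    return float((y_lab[nn] == y_lab[:, None]).mean())

def run_ga(X_lab, y_lab, X_unlab=None, pop_size=30, n_gen=20, p_mut=0.05, seed=0):
    """Toy GA over boolean feature masks: truncation selection,
    uniform crossover, bit-flip mutation, one elite survivor."""
    rng = np.random.default_rng(seed)
    n_feat = X_lab.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5

    def score(p):
        return np.array([fitness(m, X_lab, y_lab, X_unlab) for m in p])

    for _ in range(n_gen):
        order = np.argsort(score(pop))[::-1]
        parents = pop[order[: pop_size // 2]]
        i = rng.integers(len(parents), size=pop_size - 1)
        j = rng.integers(len(parents), size=pop_size - 1)
        take = rng.random((pop_size - 1, n_feat)) < 0.5
        children = np.where(take, parents[i], parents[j])  # uniform crossover
        children ^= rng.random(children.shape) < p_mut     # bit-flip mutation
        pop = np.vstack([pop[order[:1]], children])        # keep the best mask
    final = score(pop)
    return pop[np.argmax(final)], float(final.max())

# Synthetic demo (hypothetical data): two classes, 12 features,
# only features 0 and 1 carry class information.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 12))
X[y == 1, :2] += 6.0
X_unlab = rng.normal(size=(60, 12))
X_unlab[:30, :2] += 6.0  # unlabeled samples drawn from the same two groups

best_mask, best_score = run_ga(X, y, X_unlab)
print("selected features:", np.flatnonzero(best_mask))
print("fitness of best mask:", best_score)
```

On real spectra or microarray data the score would need to be validated on held-out samples, since a subset chosen to look well separated on the training points can overfit.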