One stop shopping: feature selection, classification and prediction in a single step


  • This paper was presented in its entirety at FACSS in Louisville, KY on October 22, 2009.


We report on the application of a genetic algorithm (GA) for pattern recognition that uses both supervised and transverse learning to mine spectroscopic and proteomic data. The pattern recognition GA selects features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. For training sets with small amounts of labeled data (i.e. data points tagged with a class label) and large amounts of unlabeled data (i.e. data points that are not tagged with a class label), this approach is preferred because, as our results show, information in the unlabeled data is used by the fitness function to guide feature selection. The advantages of incorporating transverse learning into the fitness function of the pattern recognition GA have been evaluated in two recently published studies by our group. In one study, Raman spectroscopy and the pattern recognition GA were used to develop a potential method to discriminate hardwoods, softwoods and tropical woods. In a second study, biopsy material of small round blue cell tumors analyzed by cDNA microarrays was identified as to type (Ewing's sarcoma, Burkitt's lymphoma, neuroblastoma and rhabdomyosarcoma) through supervised learning implemented by the pattern recognition GA. Copyright © 2011 John Wiley & Sons, Ltd.
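As a rough illustration of the idea summarized above, the following Python sketch scores a candidate feature subset by how well the labeled classes separate in the plane of the two largest principal components. It is not the fitness function used in the paper: the between-class to within-class scatter ratio is a stand-in chosen for brevity, and the names pc_scores and separation_fitness, the convention that unlabeled samples carry the label -1, and the synthetic data are all assumptions made for this example. Unlabeled samples enter only through the principal component projection, which is one simple way such data could influence feature selection.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project the mean-centered rows of X onto the largest principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def separation_fitness(X, y, mask, n_components=2):
    """Score a feature subset (boolean mask) by class separation in PC space.

    Assumption: y holds class labels >= 0 for labeled samples and -1 for
    unlabeled ones. The score is a scatter ratio, a stand-in for the paper's
    fitness function, which is not specified in the abstract.
    """
    cols = np.flatnonzero(mask)
    if cols.size < n_components:
        return 0.0
    # The projection is computed from all samples, labeled or not.
    scores = pc_scores(X[:, cols], n_components)
    labeled = y >= 0
    S, labels = scores[labeled], y[labeled]
    grand_mean = S.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Sc = S[labels == c]
        between += len(Sc) * np.sum((Sc.mean(axis=0) - grand_mean) ** 2)
        within += np.sum((Sc - Sc.mean(axis=0)) ** 2)
    return between / (within + 1e-12)

# Synthetic data (hypothetical): two classes, six features, mostly unlabeled.
rng = np.random.default_rng(0)
n_per_class = 20
X = rng.normal(size=(2 * n_per_class, 6))
X[n_per_class:, :2] += 5.0            # only features 0 and 1 separate the classes
y = np.full(2 * n_per_class, -1)      # -1 marks an unlabeled sample
y[:5] = 0                             # five labeled samples of class 0
y[n_per_class:n_per_class + 5] = 1    # five labeled samples of class 1

informative = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
noise = np.array([0, 0, 1, 1, 1, 1], dtype=bool)
fit_informative = separation_fitness(X, y, informative)
fit_noise = separation_fitness(X, y, noise)
print(fit_informative > fit_noise)
```

In a GA, each chromosome would encode one such mask, and selection, crossover and mutation would favor masks with higher scores. The sketch covers only the scoring step.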