Selecting Genes for Cancer Classification Using SVM: An Adaptive Multiple Features Scheme
Article first published online: 12 AUG 2013
© 2013 Wiley Periodicals, Inc.
International Journal of Intelligent Systems
Volume 28, Issue 12, pages 1196–1213, December 2013
How to Cite
Hsu, W.-C., Liu, C.-C., Chang, F. and Chen, S.-S. (2013), Selecting Genes for Cancer Classification Using SVM: An Adaptive Multiple Features Scheme. Int. J. Intell. Syst., 28: 1196–1213. doi: 10.1002/int.21625
- Issue published online: 16 OCT 2013
Selecting important genes from microarray data is a challenging problem, as shown in Guyon's 2002 paper. We have developed an alternative feature ranking and selection methodology, AMFES (adaptive multiple features selection), to tackle this problem. On several cancer data sets, AMFES outperforms Guyon's RFE (recursive feature elimination). In this paper, we present a comprehensive and systematic comparison of three methods, AMFES, RFE, and CORR (correlation coefficient), on five data sets, including leukemia, colon, lymphoma, and prostate cancer. The leukemia, colon, and lymphoma data sets are adapted from Guyon's paper for convenience, and the prostate cancer data set is from a public database, NCBI GEO (Gene Expression Omnibus). The three methods are compared in terms of test accuracy, number of selected features, computational time (total and training), statistical significance (t test, p values, and receiver operating characteristic (ROC) curves with area under the curve (AUC)), and the discovery rate of informative features. AMFES obtains better results in computational time and number of selected features while maintaining higher or comparable test accuracy, statistical significance, and discovery rate of informative features. In addition, AMFES can serve as a general methodology for other similar problems such as sampling and data mining.
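To make the simplest of the three compared methods concrete, the following is a minimal sketch of a correlation-coefficient ranking. The abstract does not define CORR, so this assumes the per-gene criterion (mean of class + minus mean of class -) divided by (standard deviation of class + plus standard deviation of class -), which is the correlation baseline commonly used in this literature. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def corr_scores(samples, labels):
    """Score each feature by (mu_pos - mu_neg) / (sigma_pos + sigma_neg).

    Assumed form of the CORR criterion; the abstract does not spell it out.
    `samples` is a list of rows (one per sample), `labels` holds +1 or -1.
    """
    pos = [row for row, y in zip(samples, labels) if y == 1]
    neg = [row for row, y in zip(samples, labels) if y == -1]
    scores = []
    for j in range(len(samples[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        denom = stdev(p) + stdev(n)
        # A feature that is constant in both classes gets a neutral score.
        scores.append((mean(p) - mean(n)) / denom if denom > 0 else 0.0)
    return scores


def rank_features(samples, labels):
    """Return feature indices sorted by decreasing absolute score."""
    scores = corr_scores(samples, labels)
    return sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)


# Made-up toy data: 6 samples, 3 "genes"; only gene 0 separates the classes.
toy_samples = [
    [5.0, 1.0, 2.0],
    [5.2, 0.9, 2.1],
    [4.8, 1.1, 1.9],
    [1.0, 1.0, 2.1],
    [1.2, 1.1, 1.9],
    [0.8, 0.9, 2.0],
]
toy_labels = [1, 1, 1, -1, -1, -1]
ranking = rank_features(toy_samples, toy_labels)
```

In this toy example `ranking[0]` is gene 0, the only feature whose class means differ. A ranking of this kind scores each gene independently, whereas wrapper methods such as RFE repeatedly retrain a classifier and discard low-weight features.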