Selecting important genes from microarray data is a challenging problem, as shown in Guyon's 2002 paper in this journal. We have developed an alternative feature ranking and selection methodology, AMFES (adaptive multiple features selection), to tackle this problem. On several cancer data sets, AMFES outperforms Guyon's RFE (recursive feature elimination). In this paper, we present a comprehensive and systematic comparison of three methods, AMFES, RFE, and CORR (correlation coefficient), on five data sets (leukemia, colon, lymphoma, prostate, and potentially others). The leukemia, colon, and lymphoma data sets are adapted from Guyon's paper for convenience, and the prostate cancer data set is from a public database, NCBI GEO (Gene Expression Omnibus). The three methods are compared in terms of test accuracy, number of selected features, computational time (total and training), statistical significance (t test, p values, and ROC (receiver operating characteristic)/AUC (area under curve)), and the discovery rate of informative features. AMFES obtains better results in computational time and number of selected features while maintaining higher or comparable test accuracy, statistical significance, and discovery rate of informative features. In addition, AMFES can serve as a general methodology for other similar problems, such as sampling and data mining.