Research Articles
Controlling the False Discovery Rate for Feature Selection in High-resolution NMR Spectra
Article first published online: 25 MAR 2008
DOI: 10.1002/sam.10005
Copyright © 2008 Wiley Periodicals, Inc., A Wiley Company
Additional Information
How to Cite
Kim, S. B., Chen, V. C. P., Park, Y., Ziegler, T. R. and Jones, D. P. (2008), Controlling the False Discovery Rate for Feature Selection in High-resolution NMR Spectra. Statistical Analysis and Data Mining, 1: 57–66. doi: 10.1002/sam.10005
Publication History
- Issue published online: 12 JUN 2008
- Manuscript Accepted: 15 FEB 2008
- Manuscript Revised: 13 FEB 2008
- Manuscript Received: 16 APR 2007
Keywords:
- false discovery rate;
- metabolomics;
- nuclear magnetic resonance;
- orthogonal signal correction;
- feature selection
Abstract
Successful implementation of feature selection in nuclear magnetic resonance (NMR) spectra not only improves classification ability, but also simplifies the entire modeling process and thus reduces computational and analytical effort. Principal component analysis (PCA) and partial least squares (PLS) have been widely used for feature selection in NMR spectra. However, extracting meaningful metabolite features from the reduced dimensions obtained through PCA or PLS is complicated because these reduced dimensions are linear combinations of a large number of the original features. In this paper, we propose a multiple testing procedure that controls the false discovery rate (FDR) as an efficient method for feature selection in NMR spectra. The procedure compensates for this limitation of PCA and PLS and identifies the individual metabolite features necessary for classification. In addition, we present orthogonal signal correction to improve classification and visualization by removing unnecessary variations in NMR spectra. Our experimental results with real NMR spectra showed that classification models constructed with the features selected by our proposed procedure yielded smaller misclassification rates than those with all features.
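As an illustration of how FDR-controlling feature selection can work, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure or per-feature test the authors use, so both are assumptions here: the sketch assumes one p-value has already been computed for each spectral feature (for example, from a two-sample test between classes), and the function name `benjamini_hochberg` and the parameter `q` are hypothetical names chosen for this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of features selected at FDR level q.

    Assumption: p_values[i] is the p-value of an individual test on
    feature i. This is the standard Benjamini-Hochberg step-up rule,
    not necessarily the exact procedure used in the paper.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Feature indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Step-up: select every feature ranked at or below k_max,
    # even if some of them missed their own threshold.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Made-up p-values for four features, for illustration only.
    example = [0.01, 0.03, 0.035, 0.9]
    print(benjamini_hochberg(example, q=0.05))
```

Unlike PCA or PLS, a procedure of this kind returns indices of original features, so each selected feature can be traced back directly to a position in the spectrum.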
