Raman spectroscopy utilizing Fisher-based feature selection combined with Support Vector Machines for the characterization of breast cell lines

Authors

  • Michael B. Fenn,

    Corresponding author
    1. Center for Applied Optimization, University of Florida, Gainesville, FL, USA
    2. Particle Engineering Research Center, University of Florida, Gainesville, FL, USA
    3. J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA
  • Vijay Pappu,

    1. Center for Applied Optimization, University of Florida, Gainesville, FL, USA
    2. Industrial Systems Engineering, University of Florida, Gainesville, FL, USA
  • Pando G. Georgeiv,

    1. Center for Applied Optimization, University of Florida, Gainesville, FL, USA
    2. Industrial Systems Engineering, University of Florida, Gainesville, FL, USA
  • Panos M. Pardalos

    1. J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA
    2. Center for Applied Optimization, University of Florida, Gainesville, FL, USA
    3. Industrial Systems Engineering, University of Florida, Gainesville, FL, USA
    4. McKnight Brain Institute, University of Florida, Gainesville, FL, USA

Correspondence to: Michael B. Fenn, J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, FL, USA.

E-mail: mfenn@ufl.edu

Abstract

Raman spectroscopy has the potential to significantly aid cancer research and diagnosis. Its information-dense, complex spectra generate massive datasets in which subtle correlations may provide critical clues for biological analysis and pathological classification. Implementing advanced data mining techniques is therefore imperative for complete, rapid and accurate spectral processing. Numerous recent studies have applied various data mining methods to Raman spectra for classification and biochemical analysis. However, because Raman datasets from biological specimens are often characterized by high dimensionality and small sample sizes, many of these classification models are prone to overfitting. Furthermore, attempts to reduce dimensionality yield transformed feature spaces, which makes the biological evaluation of significant, discriminative spectral features problematic. We have developed a novel data mining framework optimized for Raman datasets, called Fisher-based Feature Selection Support Vector Machines (FFS-SVM). This framework performs supervised classification and user-defined Fisher criterion-based feature selection simultaneously, reducing overfitting and directly yielding significant wavenumbers from the original feature space. Herein, we investigate five cancerous and non-cancerous breast cell lines using Raman microspectroscopy and our FFS-SVM framework. The classification performance of our framework is then compared with that of several other frequently employed classification methods on four classification tasks. These four tasks, each corresponding to a different grouping of the cell lines (e.g. cancer vs non-cancer), were constructed using an unsupervised clustering method. FFS-SVM achieves both high classification accuracies and the extraction of biologically significant features. The ten most discriminative features are discussed in terms of their cell-type-specific biological relevance.
Our framework provides comprehensive cellular-level characterization and could lead to the discovery of cancer biomarker-type information, which we have informally termed ‘Raman-based spectral biomarkers’. The FFS-SVM framework, together with Raman spectroscopy, will be used in future studies to investigate dynamic biological phenomena in situ. Copyright © 2013 John Wiley & Sons, Ltd.
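The abstract does not give the FFS-SVM formulation, which couples Fisher criterion-based feature selection with SVM training in a single step. As a simplified illustration of the Fisher criterion alone, the sketch below uses a separate filter step instead. It scores each wavenumber j by the classic two-class Fisher ratio (μ_a,j − μ_b,j)² / (σ²_a,j + σ²_b,j) and keeps the k highest-scoring wavenumbers. This is not the authors' method: the SVM stage is omitted, and the function names (`fisher_scores`, `top_k_features`), the input layout (one list of intensities per spectrum, all sampled at the same wavenumbers) and the toy values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(class_a, class_b):
    """Two-class Fisher ratio for every feature (wavenumber).

    Hypothetical layout: class_a and class_b are lists of spectra, and each
    spectrum is a list of intensities sampled at the same wavenumbers.
    score_j = (mean_a_j - mean_b_j)**2 / (var_a_j + var_b_j)
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        a = [spectrum[j] for spectrum in class_a]
        b = [spectrum[j] for spectrum in class_b]
        between = (mean(a) - mean(b)) ** 2          # between-class separation
        within = pvariance(a) + pvariance(b)        # within-class spread
        if within > 0:
            scores.append(between / within)
        else:
            # Constant feature in both classes: perfectly separated if the
            # means differ, otherwise uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(wavenumbers, class_a, class_b, k=10):
    """Return (wavenumber, score) pairs for the k highest Fisher scores.

    The selected features stay in the original wavenumber space, so each one
    can be inspected directly for biological meaning.
    """
    scores = fisher_scores(class_a, class_b)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [(wavenumbers[j], scores[j]) for j in order[:k]]


# Toy, hand-made spectra (3 samples x 3 wavenumbers); NOT data from the study.
wavenumbers = [800, 1000, 1200]
group_a = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.1], [0.8, 4.9, 1.9]]
group_b = [[1.1, 3.0, 2.0], [0.9, 3.1, 2.2], [1.0, 2.9, 1.8]]
print(top_k_features(wavenumbers, group_a, group_b, k=2))
```

In the toy data only the middle wavenumber separates the two groups, so it ranks first. A filter like this ranks features independently of the classifier, which is the main simplification relative to a framework that selects features and trains the SVM jointly.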
