
Sequential support vector regression with embedded entropy for SNP selection and disease classification

Authors

  • Yulan Liang

    Corresponding author
    1. Department of Family and Community Health, University of Maryland, Baltimore, 655 W. Lombard Street, Baltimore, MD 21201-1579, USA
  • Arpad Kelemen

    1. Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, 655 W. Lombard Street, Baltimore, MD 21201-1579, USA

Abstract

Comprehensive evaluation of common genetic variation, by associating single nucleotide polymorphism (SNP) structure with common diseases on a genome-wide scale, is an active area of human genome research. For less costly and faster diagnostics, advanced computational approaches are needed to select the smallest set of SNPs that gives the highest prediction accuracy for common complex diseases. In this article, we present a sequential support vector (SV) regression model with an embedded entropy algorithm that handles redundancy when selecting the SNPs with the best disease prediction performance. We implemented the proposed method for both SNP selection and disease classification, and applied it to simulated data sets and two real disease data sets. Results show that, on average, the proposed method outperforms the well-known methods of support vector machine recursive feature elimination (SVMRFE), logistic regression, classification and regression tree (CART), and logic regression-based SNP selection for disease classification. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2011
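
The abstract does not spell out how the entropy step removes redundant SNPs, so the following is only a minimal sketch of one common way entropy can be used for that purpose, not the authors' algorithm. It assumes genotypes coded as 0/1/2 and drops a SNP when its normalized mutual information with an already-kept SNP reaches a cutoff. The function name `drop_redundant`, the 0.9 threshold, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def drop_redundant(snps, threshold=0.9):
    """Greedy redundancy filter (hypothetical helper, not from the paper).

    snps: dict mapping SNP name -> list of genotype codes (e.g. 0/1/2).
    A SNP is kept only if its mutual information with every already-kept
    SNP, normalized by the smaller of the two entropies, is below threshold.
    """
    kept = []
    for name, col in snps.items():
        h = entropy(col)
        if h == 0:
            # A monomorphic SNP carries no information; skip it.
            continue
        redundant = any(
            mutual_information(col, snps[k]) / min(h, entropy(snps[k])) >= threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy genotype data (assumed for illustration only).
snps = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1],  # exact copy of snp_a, so redundant
    "snp_c": [0, 0, 1, 1, 2, 2, 0, 1],
    "snp_d": [1, 1, 1, 1, 1, 1, 1, 1],  # monomorphic
}
print(drop_redundant(snps))
```

In a sequential scheme of the kind the abstract describes, a filter like this would presumably be interleaved with a predictive model that ranks the surviving SNPs; that coupling is not shown here because the abstract gives no details.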
