• support vector regression;
  • sequential algorithm;
  • entropy measures;
  • embedded methods;
  • sliding window;
  • single nucleotide polymorphism;
  • common complex disease


Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphism (SNP) structure with common diseases on the genome-wide scale is currently a hot area in human genome research. For less costly and faster diagnostics, advanced computational approaches are needed to select the minimum SNPs with the highest prediction accuracy for common complex diseases. In this article, we present a sequential support vector (SV) regression model with embedded entropy algorithm to deal with the redundancy for the selection of the SNPs that have best prediction performance of diseases. We implemented our proposed method for both SNP selection and disease classification, and applied it to simulation data sets and two real disease data sets. Results show that on the average, our proposed method outperforms the well-known methods of support vector machine recursive feature elimination (SVMRFE), logistic regression, classification and regression tree (CART), and logic regression-based SNP selections for disease classification. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2011