• Adaptive genetic algorithm;
• Heme-thiolate proteins;
• Machine learning;
• Single nucleotide polymorphisms


The support vector machine (SVM), an effective statistical learning method, has been widely used in mutation prediction. Two factors, feature selection and parameter setting, strongly influence the efficiency and accuracy of SVM classification. In this study, we combined a genetic algorithm (GA) with an SVM to develop a GA-SVM program and applied it to human cytochrome P450s (CYP450s), which are important monooxygenases in phase I drug metabolism. The program optimizes features and parameters simultaneously, so fewer features are used and the overall prediction accuracy is improved. We focus on non-synonymous single nucleotide polymorphisms (nsSNPs), mutations that alter the protein sequence and appear to significantly influence drug metabolism. The final predictive model performs satisfactorily, with a prediction accuracy of 61% and a cross-validation accuracy of 73%. The results indicate that the GA-SVM program is a powerful tool for optimizing mutation prediction models for nsSNPs of human CYP450s.
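The idea of optimizing features and parameters simultaneously can be illustrated with a minimal sketch in which a single GA chromosome encodes both a feature mask and the SVM parameters. This is not the authors' program. The chromosome layout, the parameter ranges for `C` and `gamma`, the fixed crossover and mutation rates (an adaptive GA would vary them during the run), and the `toy_fitness` surrogate are all assumptions made for illustration.

```python
import math
import random

N_FEATURES = 8   # hypothetical number of candidate features
PARAM_BITS = 4   # hypothetical number of bits encoding each of C and gamma


def decode(chrom):
    """Split a bit-list chromosome into (feature mask, C, gamma)."""
    mask = chrom[:N_FEATURES]
    c_bits = chrom[N_FEATURES:N_FEATURES + PARAM_BITS]
    g_bits = chrom[N_FEATURES + PARAM_BITS:]

    def to_int(bits):
        return int("".join(str(b) for b in bits), 2)

    C = 2.0 ** (to_int(c_bits) - 5)        # assumed range: 2^-5 .. 2^10
    gamma = 2.0 ** (to_int(g_bits) - 15)   # assumed range: 2^-15 .. 2^0
    return mask, C, gamma


INFORMATIVE = {0, 2, 5}  # hypothetical "useful" features for the toy fitness


def toy_fitness(chrom):
    """Stand-in for SVM cross-validation accuracy (assumption, not real data).

    Rewards informative features, penalizes extra features, and prefers
    C = 1 and gamma = 2^-3.
    """
    mask, C, gamma = decode(chrom)
    selected = {i for i, bit in enumerate(mask) if bit}
    hits = len(selected & INFORMATIVE)
    extras = len(selected - INFORMATIVE)
    param_penalty = abs(math.log2(C)) + abs(math.log2(gamma) + 3)
    return hits - 0.25 * extras - 0.05 * param_penalty


def evolve(fitness, pop_size=30, generations=40, p_cross=0.8, p_mut=0.02, seed=0):
    """Plain GA with tournament selection, one-point crossover, and elitism."""
    rng = random.Random(seed)
    length = N_FEATURES + 2 * PARAM_BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        def pick():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:
                cut = rng.randrange(1, length)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


best = evolve(toy_fitness)
mask, C, gamma = decode(best)
```

In a real setting, `toy_fitness` would be replaced by a function that trains an SVM on the selected feature columns with the decoded `C` and `gamma` and returns its cross-validation accuracy, for example with scikit-learn's `SVC` and `cross_val_score`. Because features and parameters share one chromosome, each candidate is scored as a complete configuration, so the two are tuned together instead of one after the other.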