• Open Access

Predicting the Functional, Molecular, and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models

Authors


  • Communicated by Christophe Béroud

  • Contract grant sponsors: UK Medical Research Council (G1000427 to T.R.G. and I.N.M.D.); UK Biotechnology and Biological Sciences Research Council (BB/G022771 to J.G.); BIOBASE GmbH (to D.N.C. and P.D.S.).

Correspondence to: Tom Gaunt, Bristol Centre for Systems Biomedicine and MRC CAiTE Centre, School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK. E-mail: Tom.Gaunt@bristol.ac.uk

ABSTRACT

The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained accuracies that exceeded those of traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieved accuracies that exceeded those of current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
