• active site;
• genetic variance;
• nsSNP;
• nsSNV;
• proteome-wide analysis

An enzyme's active site is essential to normal protein activity, so any disruption at this site may lead to dysfunction and disease. Nonsynonymous single-nucleotide variations (nsSNVs), which change the amino acid sequence, are one type of disruption that can affect the active site. When this occurs, enzyme activity is assumed to vary because the site is critical to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active-site-impacting nsSNVs in the human genome, and we identify the pathways associated with this data set to assess the likely consequences. We find 934 unique nsSNVs that occur at the active sites of 559 proteins. Comparing the list of nsSNV-impacted active site residues with the list of all possible proteomic active site residues shows an over-representation of arginine and an under-representation of cysteine, phenylalanine and tyrosine, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over- or under-represented in the data set, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation–substrate/product pairs that can be used in targeted metabolomics experiments to assay the effects of specific variations. In addition, we report the significant prevalence of aspartic acid to histidine variation in eight proteins associated with nine diseases including glycogen storage diseases, lacrimo-auriculo-dento-digital syndrome, Parkinson's disease and several cancers.
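
The residue over- and under-representation comparison described above can be illustrated with a minimal sketch. The abstract does not state which statistical test was used, so the hypergeometric test below is an assumption chosen for illustration, and all counts in the example are hypothetical placeholders rather than values from the study. The function names (`hypergeom_pmf`, `enrichment_p_values`) are likewise illustrative.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n sites are drawn without replacement from N sites,
    K of which carry the residue of interest."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_values(k, N, K, n):
    """Return one-sided p-values (over, under) for observing k sites with
    the residue of interest among n variant-impacted active sites.

    N: all annotated active site residues in the proteome (background)
    K: background residues that are the amino acid of interest
    n: active site residues impacted by at least one nsSNV
    k: impacted residues that are the amino acid of interest
    """
    lowest = max(0, n - (N - K))   # smallest k that is possible
    highest = min(K, n)            # largest k that is possible
    over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return over, under


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    background_sites = 5000      # N
    background_arginine = 400    # K
    impacted_sites = 900         # n
    impacted_arginine = 120      # k

    p_over, p_under = enrichment_p_values(
        impacted_arginine, background_sites, background_arginine, impacted_sites
    )
    print(f"P(over-representation)  = {p_over:.3g}")
    print(f"P(under-representation) = {p_under:.3g}")
```

A small over-representation p-value would suggest that the residue is varied at active sites more often than its background frequency predicts; a small under-representation p-value would suggest the opposite. When every amino acid is tested in this way, a multiple-testing correction would normally be applied to the resulting p-values.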