
Using Reference Databases of Genetic Variation to Evaluate the Potential Pathogenicity of Candidate Disease Variants


  • Additional Supporting Information may be found in the online version of this article.

  • Communicated by Lars Bertram

Correspondence to: Kevin Kenna, Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland. E-mail:


The potential pathogenicity of genetic variants identified in disease-based resequencing studies is often overlooked when those variants have previously been reported in dbSNP, the 1000 Genomes Project, or the National Heart, Lung, and Blood Institute Exome Sequencing Project (ESP). In this work, we estimate that, collectively, these databases capture ∼52% of the mutations reported as disease causing within phenotype-based locus-specific databases (LSDBs) (dbSNP 50.4%; 1000 Genomes Project 4.8%; ESP 10.2%). To investigate whether these mutations may simply represent benign population variants, we evaluated whether the carrier frequencies of mutations implicated in amyotrophic lateral sclerosis were higher than could be accounted for by high-penetrance disease models. In doing so, we questioned the veracity of 51 mutations, but also demonstrated that each of the three databases includes credible disease variants. Our results demonstrate the benefits of using databases such as dbSNP, the 1000 Genomes Project, and the ESP to evaluate the pathogenicity of putative disease variants, and suggest that many disease mutations reported across LSDBs may not actually be pathogenic. However, they also demonstrate that, even in the context of rare Mendelian disorders, the potential pathogenicity of variants reported by these databases should not be dismissed without proper evaluation.
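The carrier-frequency argument can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-sided binomial test, the significance threshold, and every number in the usage note are assumptions chosen for illustration. The idea is that, by Bayes' rule, a variant's population carrier frequency under a given disease model cannot exceed (lifetime risk × fraction of cases carrying the variant) / penetrance, so a variant seen in a reference database far more often than this ceiling allows is hard to reconcile with a high-penetrance model.

```python
from math import comb


def max_carrier_frequency(lifetime_risk, case_fraction, penetrance):
    """Carrier frequency implied by a disease model, via Bayes' rule.

    P(carrier) = P(carrier | disease) * P(disease) / P(disease | carrier)
    All three inputs are hypothetical model parameters, not values from the paper.
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    return min(1.0, lifetime_risk * case_fraction / penetrance)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)


def exceeds_model(carriers, sample_size, lifetime_risk, case_fraction,
                  penetrance, alpha=0.05):
    """True if the observed carrier count is improbably high under the model.

    alpha is an arbitrary illustrative threshold.
    """
    ceiling = max_carrier_frequency(lifetime_risk, case_fraction, penetrance)
    return binomial_upper_tail(carriers, sample_size, ceiling) < alpha
```

With purely hypothetical inputs (lifetime risk 0.0025, the variant assumed to explain 1% of cases, penetrance 0.5), the ceiling is a carrier frequency of about 5 in 100,000. Under those assumptions, `exceeds_model(10, 4000, 0.0025, 0.01, 0.5)` flags a variant with 10 carriers among 4,000 sampled individuals, whereas a single carrier in the same sample is not flagged.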