Simple and Efficient Identification of Rare Recessive Pathologically Important Sequence Variants from Next Generation Exome Sequence Data


  • Contract grant sponsors: Sir Jules Thorn Charitable Trust (Grant 09/JTA); EPSRC (Grant FP/I000623/1); Cancer Research UK (Grant 600130); KACST Grants 08MED497-20 and 09-MED941-20; DHFMR Collaborative Research Grant; CNPq of Brazil (Projects 472588/2004-5 and 401983/2010-2).

  • Communicated by Nobuyoshi Shimizu


Massively parallel (“next generation”) DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, either across the entire genome (“exome sequencing”) or within selected mapped candidate loci. Sequence variants, identified as differences between the patient's sequence and the human genome reference sequence, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in an effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer both to generate NGS data and to align it to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to use this technology. However, the ability to effectively filter and assess sequence variants remains an important bottleneck in identifying deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen enrichment-capture NGS data. These programs (“Agile Suite”) are particularly suitable for small-scale gene discovery or for diagnostic analysis.