For the Focus on the NIH Undiagnosed Diseases Program
Exome and whole-genome sequencing for gene discovery: The future is now!†
Article first published online: 12 MAR 2012
© 2012 Wiley Periodicals, Inc.
Special Issue: Focus on the NIH Undiagnosed Diseases Program
Volume 33, Issue 4, pages 591–592, April 2012
Majewski, J. and Rosenblatt, D. S. (2012), Exome and whole-genome sequencing for gene discovery: The future is now!. Hum. Mutat., 33: 591–592. doi: 10.1002/humu.22055
- Issue published online: 12 MAR 2012
Personalized medicine is a recurring theme in current medical and public literature, but the prospect that the tools of modern genetics and genomics will have an immediate impact on patient care is at times a matter of hype and hope. The obvious exception is in the area of Mendelian disease, where alterations in a single gene can have a major impact on the health of an individual patient. One of the disappointments of modern medicine is that although the tools are in place to find virtually all the causal genes for these diseases, thousands have not yet been discovered, either because of the rarity of the diseases or because of the lack of a coordinated, systematic approach on the part of the international medical community [Rosenblatt, 2011]. In the last 2 years, the tide has begun to turn, and increasing attention is being focused on problems whose solution can help patients directly.
In Canada, where we both work, FORGE (Finding of Rare Disease Genes in Canada) has focused its attention on rare disorders in Canadian children. Its stated goals include helping doctors identify patients, sequencing the genomes of patients to identify disease-causing genetic changes, setting up a Canadian coordination center to streamline and improve existing large-scale sequence analysis tools, and creating ethical guidelines for analyzing sequence data from entire genomes and sharing the results with families (http://www.cpgdsconsortium.com/).
In Europe, the UK10K project (http://www.uk10k.org/) aims to discover the effects of rare genetic variation. The project does not focus on rare diseases per se; rather, by sequencing patients with extreme phenotypes of specific conditions, the group aims to associate those extreme traits with the presence of rare protein-coding variants.
In the United States, the National Institutes of Health (NIH) Undiagnosed Diseases Program has gained national and international prominence. Its stated goals are to provide answers to patients with mysterious conditions that have long eluded diagnosis, and to advance medical knowledge about rare and common diseases (http://rarediseases.info.nih.gov/Resources.aspx?PageID=31). A recent report describes the findings of this program during its first 2 years [Gahl et al., 2012].
Another NIH initiative, the National Human Genome Research Institute's Focus on Rare Inherited Diseases (http://www.genome.gov/27546261), was recently created to fund sequencing and analysis centers for the purpose of identifying mutations underlying rare disorders.
It is becoming increasingly clear that next-generation sequencing, in one form or another, will soon become a first-line approach for patients suspected of having a Mendelian disease when there is more than one candidate gene or when the gene is unknown. Just as microarrays have replaced traditional karyotypes as a first-line test for children with suspected chromosomal abnormalities, next-generation sequencing will replace targeted sequencing of individual candidate genes.
The four articles published in this issue of Human Mutation [Adams et al., 2012; Dias et al., 2012; Fuentes Fajardo et al., 2012; Sincan et al., 2012] address the question “Can we bundle a large number of molecular diagnostic studies using whole-exome sequencing of a single affected patient?” (Cornelius Boerkoel and William Gahl, personal communication). A researcher or physician embarking on a personal genome or exome sequencing quest must be aware of a number of pitfalls associated with generating gigabytes of massively parallel sequencing data. The papers included in this focus section describe several of the most important problems and walk the reader through the fundamentals of high-throughput sequence data analysis. The four general themes are: (1) incomplete coverage resulting in possible false negative results; (2) sources of false positive results and some guidelines for limiting them; (3) identification of candidate causative mutations within hundreds of variants; and (4) integrating pedigree and genotype information into the analysis of exomes from related individuals.
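Themes (3) and (4), and to some extent theme (1), can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the VAR-MD algorithm or any method from the papers in this issue. It assumes a simple in-memory representation in which each variant carries a population allele frequency and, for every sequenced family member, the number of alternate alleles called (None where coverage was too low to make a call). It keeps only rare variants that segregate with disease under a homozygous recessive model; the gene names, samples, and 1% frequency cutoff are all made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variant:
    gene: str                             # hypothetical gene label
    position: str                         # hypothetical "chrom:pos" label
    pop_freq: float                       # population allele frequency (assumed field)
    genotypes: Dict[str, Optional[int]]   # sample -> alt-allele count; None = no call


def fits_recessive(v: Variant, affected: List[str], unaffected: List[str]) -> bool:
    """True if the variant segregates with disease under a homozygous recessive model."""
    # Every affected individual must be called homozygous for the alternate allele;
    # a missing call (None) in an affected person therefore rejects the variant.
    if any(v.genotypes.get(s) != 2 for s in affected):
        return False
    # No unaffected relative may be homozygous alternate; missing calls are tolerated here.
    return all(v.genotypes.get(s) != 2 for s in unaffected)


def candidate_variants(variants: List[Variant], affected: List[str],
                       unaffected: List[str], max_freq: float = 0.01) -> List[Variant]:
    """Keep rare variants that are consistent with recessive segregation in the pedigree."""
    return [v for v in variants
            if v.pop_freq <= max_freq and fits_recessive(v, affected, unaffected)]


# Toy pedigree: two affected siblings, both parents, and one unaffected sibling.
variants = [
    Variant("GENE_A", "chr1:1000", 0.001,
            {"child1": 2, "child2": 2, "mother": 1, "father": 1, "sib": 1}),
    Variant("GENE_B", "chr2:2000", 0.30,   # too common to cause a rare disease
            {"child1": 2, "child2": 2, "mother": 1, "father": 1, "sib": 2}),
    Variant("GENE_C", "chr3:3000", 0.002,  # does not segregate with disease
            {"child1": 2, "child2": 1, "mother": 1, "father": 0, "sib": 0}),
    Variant("GENE_D", "chr4:4000", 0.0,    # no call in child2 (poor coverage)
            {"child1": 2, "child2": None, "mother": 1, "father": 1, "sib": 0}),
]
affected = ["child1", "child2"]
unaffected = ["mother", "father", "sib"]

print([v.gene for v in candidate_variants(variants, affected, unaffected)])
# prints ['GENE_A']
```

In this toy pedigree only GENE_A survives. GENE_D might well be the true mutation, but because one affected sibling has no call at that site, the filter discards it: this is how incomplete coverage quietly turns into a false negative.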
Compared with traditional disease gene mapping, which has historically been based on statistical genetics, exome- and genome-based mutation searches work in reverse order. In the past, researchers collected large sets of samples, allowing them to employ linkage or association analyses and define candidate intervals based on predefined statistical cutoffs. Having established statistical significance and delimited a candidate chromosomal interval of (hopefully) manageable size, the researcher would then prioritize potential functional candidate genes within the interval and pursue them sequentially by targeted Sanger sequencing. This stepwise process could take years to complete. The first successful genetic linkage study of Huntington's disease [Gusella et al., 1983] was published in 1983, but the causative mutation, the now-famous CAG trinucleotide repeat expansion in the HTT gene, was not discovered until 10 years later [Huntington's Disease Collaborative Research Group, 1993].
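As a reminder of what such a predefined statistical cutoff looks like, the classical two-point LOD score for n informative, phase-known meioses, R of which are recombinant, compares the likelihood of linkage at recombination fraction θ with the likelihood of no linkage (θ = 1/2). A LOD of 3 or more, that is, odds of roughly 1,000:1 in favor of linkage, is the conventional threshold. The worked example below assumes a hypothetical 10 informative meioses with no recombinants.

```latex
\mathrm{LOD}(\theta) = \log_{10}
  \frac{\theta^{R}\,(1-\theta)^{\,n-R}}{(1/2)^{n}},
\qquad
n = 10,\; R = 0,\; \theta = 0:
\quad
\mathrm{LOD}(0) = \log_{10} 2^{10} = 10\log_{10} 2 \approx 3.01
```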
Today's exome- and genome-sequencing technologies, which allow us to uncover virtually all variants within an individual's genome simultaneously, often make it possible to forgo mapping studies and proceed directly to examining the list of variants and, given enough insight, to “pinpoint” the mutation responsible for a disease. What we gain is speed; however, this gain comes at the expense of some scientific rigor and the support of a P value or a LOD score. To begin with, the list of genes covered by a sequencing study will depend strongly on the technology used: genome versus exome sequencing, the type of capture kit, the sequencing platform, and the sequencing depth. The resulting variant lists will also depend on the alignment algorithms and on the stringency settings of the bioinformatics tools used to identify variants. Finally, the ultimate list of candidate genes will be highly subjective, reflecting the preferences and hunches of individual researchers. Despite these somewhat “unscientific” drawbacks, exome and genome sequencing are effective and have greatly accelerated the mutation-finding process over the past 2 years [Majewski et al., 2011].
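The dependence of the candidate list on stringency settings is easy to demonstrate. The sketch below uses made-up variant calls and two hypothetical pairs of thresholds on read depth and call quality; the values are illustrative only and are not the defaults of any real variant caller or of the tools described in this issue.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    gene: str       # hypothetical gene label
    depth: int      # reads covering the site (made-up values)
    quality: float  # variant-call quality score (made-up values)


def passing_genes(calls: List[Call], min_depth: int, min_quality: float) -> List[str]:
    """Return the sorted set of genes with at least one call passing both thresholds."""
    return sorted({c.gene for c in calls
                   if c.depth >= min_depth and c.quality >= min_quality})


calls = [
    Call("GENE_A", depth=45, quality=99.0),
    Call("GENE_B", depth=12, quality=60.0),
    Call("GENE_C", depth=6, quality=35.0),
    Call("GENE_D", depth=25, quality=18.0),
]

# Two hypothetical stringency settings applied to the same data.
lenient = passing_genes(calls, min_depth=5, min_quality=10.0)
strict = passing_genes(calls, min_depth=20, min_quality=50.0)
print("lenient:", lenient)   # all four genes pass
print("strict:", strict)     # only GENE_A passes
print("dropped by strict settings:", sorted(set(lenient) - set(strict)))
```

Neither list is “correct”: a genuine mutation in GENE_B or GENE_D would be filtered out by the strict settings just as readily as a sequencing artifact would.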
The compendium of papers presented in this issue of Human Mutation should serve as a useful illustration of the modern-day gene-hunting process. It presents the problems encountered, as well as some of the solutions developed by the NIH Undiagnosed Diseases Program, and will be a valuable practical guide for researchers interested in applying such methodologies in their own research.
- Adams et al. 2012. Analysis of DNA sequence variants detected by high throughput sequencing. Hum Mutat 33:599–608.
- Dias et al. 2012. An analysis of exome sequencing for diagnostic testing of the genes associated with muscle disease and spastic paraplegia. Hum Mutat 33:614–626.
- Fuentes Fajardo et al., NISC Comparative Sequencing Program. 2012. Detecting false positive signals in exome sequencing. Hum Mutat 33:609–613.
- Gahl et al., for the NISC Comparative Sequencing Program. 2012. The National Institutes of Health Undiagnosed Diseases Program: insights into rare diseases. Genet Med 14(1):51–59.
- Gusella et al. 1983. A polymorphic DNA marker genetically linked to Huntington's disease. Nature 306:234–238.
- Huntington's Disease Collaborative Research Group. 1993. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell 72:971–983.
- Majewski et al. 2011. What can exome sequencing do for you? J Med Genet 48(9):580–589.
- Rosenblatt DS. 2011. A RaDiCAL approach to gene discovery. J Med Genet 48(9):577–578.
- Sincan et al. 2012. VAR-MD: a tool to analyze whole exome/genome variants in small human pedigrees with Mendelian inheritance. Hum Mutat 33:593–598.