Exome and whole-genome sequencing for gene discovery: The future is now!


  • Jacek Majewski,

    1. Rare Disease Consortium for Autosomal Loci (RaDiCAL), Department of Human Genetics, McGill University, Montreal, Quebec, Canada
  • David S. Rosenblatt

    Corresponding author
    1. Rare Disease Consortium for Autosomal Loci (RaDiCAL), Department of Human Genetics, McGill University, Montreal, Quebec, Canada
    • David Rosenblatt, Department of Human Genetics, McGill University, Montreal, Quebec, Canada

  • For the Focus on the NIH Undiagnosed Diseases Program

Personalized medicine is a recurring theme in current medical and public literature, but the prospect that the tools of modern genetics and genomics will have an immediate impact on patient care is at times a matter of hype and hope. The obvious exception is in the area of Mendelian disease, where alterations in a single gene can have a major impact on the health of an individual patient. One of the disappointments of modern medicine is that although the tools are in place to find virtually all the causal genes for these diseases, thousands have not yet been discovered, either because of the rarity of the diseases or because of the lack of a coordinated, systematic approach on the part of the international medical community [Rosenblatt, 2011]. In the last 2 years, the tide has begun to turn, and increasing interest is being focused on those problems that can help patients directly.

In Canada, where we both work, FORGE (Finding of Rare Disease Genes in Canada) has focused its attention on rare disorders in Canadian children. Its stated goals include assisting doctors with identifying patients, sequencing the genomes of patients to identify disease-causing genetic changes, setting up a Canadian coordination center to streamline and improve existing large-scale sequence analysis tools, and creating ethical guidelines for analyzing sequence data from entire genomes and sharing the results with families (http://www.cpgdsconsortium.com/).

In Europe, the UK10k project (http://www.uk10k.org/) aims to discover the effects of rare genetic variation. The project does not focus on rare diseases per se, but by sequencing patients with extreme phenotypes of specific conditions, the group aims to associate those extreme traits with the presence of rare protein-coding variants.

In the United States, the National Institutes of Health (NIH) Undiagnosed Diseases Program has gained national and international prominence. Its stated goals are to provide answers to patients with mysterious conditions that have long eluded diagnosis, and to advance medical knowledge about rare and common diseases (http://rarediseases.info.nih.gov/Resources.aspx?PageID=31). A recent report describes the findings of this program during its first 2 years [Gahl et al., 2012].

Another NIH initiative, National Human Genome Research Institute Focus on Rare inherited Diseases (http://www.genome.gov/27546261), has been recently created to fund sequencing and analysis centers for the purpose of identifying mutations underlying rare disorders.

It is becoming increasingly clear that next-generation sequencing, in one or another iteration, will soon become a first-line approach for patients suspected of having a Mendelian disease, when there is more than one candidate gene or when the gene is unknown. In the same way that microarrays have replaced traditional karyotypes as a first-line test for children with suspected chromosomal abnormalities, next-generation sequencing will replace targeted sequencing.

The four articles published in this issue of Human Mutation [Adams et al., 2012; Dias et al., 2012; Fuentes Fajardo et al., 2012; Sincan et al., 2012] address the question “Can we bundle a large number of molecular diagnostic studies using whole-exome sequencing of a single affected patient?” (Cornelius Boerkoel and William Gahl, personal communication). A researcher or physician embarking on a personal genome or exome sequencing quest must be aware of a number of pitfalls associated with generating gigabytes of massively parallel sequencing data. The papers included in this focus section describe several of the most important problems and walk the reader through the fundamentals of high-throughput sequence data analysis. The four general themes are: (1) incomplete coverage resulting in possible false negative results; (2) sources of false positive results and some guidelines for limiting them; (3) identification of candidate causative mutations within hundreds of variants; and (4) integrating pedigree and genotype information into the analysis of exomes from related individuals.
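The first of these themes, incomplete coverage, can be made concrete with a minimal sketch. It is not the method used in the papers of this issue; the target names, the depth threshold of 10 reads, and the 90% coverage fraction are illustrative assumptions. The idea is simply that a target region with too few reads cannot be called "variant-free," so it should be reported as a possible source of false negatives.

```python
from typing import Dict, List


def low_coverage_targets(
    depth_by_target: Dict[str, List[int]],
    min_depth: int = 10,         # assumed threshold, not taken from the source
    min_fraction: float = 0.9,   # assumed threshold, not taken from the source
) -> List[str]:
    """Return targets where too few bases reach min_depth.

    depth_by_target maps a (hypothetical) target name to the per-base
    read depths observed across that target.
    """
    flagged = []
    for target, depths in depth_by_target.items():
        if not depths:
            # No data at all: the target was effectively not sequenced.
            flagged.append(target)
            continue
        covered = sum(1 for d in depths if d >= min_depth)
        if covered / len(depths) < min_fraction:
            flagged.append(target)
    return flagged


# Hypothetical per-base depths for three capture targets.
example_depths = {
    "GENE_A_exon1": [30, 25, 40, 35],
    "GENE_A_exon2": [2, 0, 12, 3],
    "GENE_B_exon1": [],
}
poorly_covered = low_coverage_targets(example_depths)
```

In this toy input, the absence of a variant call in the second exon of the hypothetical GENE_A, or anywhere in GENE_B, says nothing about whether a mutation is present there.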

Compared with traditional disease gene mapping, which has been historically based on statistical genetics, exome- and genome-based mutation searches work in reverse order. In the past, researchers collected large sets of samples, allowing them to employ linkage or association analyses and define candidate intervals based on predefined statistical cutoffs. Having established statistical significance and delimited a candidate chromosomal interval of (hopefully) manageable size, the researcher would then prioritize potential functional candidate genes within the interval and pursue them sequentially by targeted Sanger sequencing. This stepwise process could take years to complete. The first successful human genetic linkage study of Huntington's disease was published in 1983 [Gusella et al., 1983], but the causative mutation, the now famous CAG trinucleotide repeat expansion in the HTT gene, was not discovered until 10 years later [Huntington's Disease Collaborative Research Group, 1993].

Today's exome- and genome-sequencing technologies, which allow us to simultaneously uncover virtually all variants within an individual's genome, often make it possible to forgo the mapping studies and proceed directly to the examination of the list of variants and, given enough insight, “pinpoint” the mutation responsible for a disease. What we gain is speed; however, this gain comes at the expense of some scientific rigor and the support of a P value or a LOD score. To begin with, the list of genes covered by sequencing studies will be very dependent on the technology used: genome versus exome sequencing, the type of capture kit, the sequencing platform, and sequencing depth. Additionally, the lists produced will depend on the alignment algorithms and the stringency settings of the bioinformatics tools used for identifying variants. Finally, the ultimate list of candidate genes will be highly subjective, dependent on the preferences and hunches of individual researchers. Despite those somewhat “unscientific” drawbacks, exome and genome sequencing are effective and have hugely accelerated the mutation-finding process in the past 2 years [Majewski et al., 2011].
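The dependence of the candidate list on stringency settings can be illustrated with a small sketch. Everything here is hypothetical: the Variant fields, the gene names, and the threshold values are assumptions chosen for illustration and do not describe any particular variant caller or the pipelines discussed in this issue. The same set of calls yields different candidate genes under strict and lenient filters.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Variant:
    gene: str                    # hypothetical gene label
    quality: float               # caller-reported quality score
    depth: int                   # read depth at the variant site
    population_frequency: float  # allele frequency in a reference set
    protein_altering: bool       # e.g. missense, nonsense, frameshift


def candidate_genes(
    variants: Iterable[Variant],
    min_quality: float,
    min_depth: int,
    max_frequency: float,
) -> List[str]:
    """Genes carrying at least one rare, well-supported, protein-altering variant."""
    genes = {
        v.gene
        for v in variants
        if v.protein_altering
        and v.quality >= min_quality
        and v.depth >= min_depth
        and v.population_frequency <= max_frequency
    }
    return sorted(genes)


# Hypothetical calls from a single exome.
calls = [
    Variant("GENE_A", quality=60, depth=40, population_frequency=0.0, protein_altering=True),
    Variant("GENE_B", quality=25, depth=8, population_frequency=0.001, protein_altering=True),
    Variant("GENE_C", quality=80, depth=50, population_frequency=0.05, protein_altering=True),
    Variant("GENE_D", quality=70, depth=35, population_frequency=0.0, protein_altering=False),
]

strict = candidate_genes(calls, min_quality=30, min_depth=10, max_frequency=0.01)
lenient = candidate_genes(calls, min_quality=20, min_depth=5, max_frequency=0.1)
```

Under the strict settings only one hypothetical gene survives; under the lenient settings three do. Neither list is "correct" in a statistical sense, which is the loss of rigor described above.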

The compendium of papers presented in this issue of Human Mutation should serve as a useful illustration of the modern-day gene-hunting process. It presents the problems encountered, as well as some of the solutions developed by the NIH Undiagnosed Diseases Program, and will be a valuable practical guide for researchers interested in applying such methodologies in their own research.