Exome and genome analysis as a tool for disease identification and treatment: The 2011 Human Genome Variation Society scientific meeting

Authors

  • William S. Oetting

    Corresponding author
    1. Department of Experimental and Clinical Pharmacology, College of Pharmacy, and the Institute of Human Genetics, University of Minnesota, Minneapolis, Minnesota
    • MMC 485, 420 Delaware St. S.E., University of Minnesota, Minneapolis, MN 55455

Abstract

The 2011 annual scientific meeting of the Human Genome Variation Society (HGVS) was held on the 11th of October, in Montreal, Canada. The theme of this meeting was “Exome and Genome Analysis as a Tool for Disease Identification and Treatment.” In the last few years, there has been a substantial increase in the use of next-generation sequencing in identifying variants associated with both single-gene disorders and complex diseases. The advent of exome sequencing with the subsequent transition to whole genome sequencing will require methods to identify candidate causal variants both in coding and regulatory regions. As this technology slowly moves into the clinical diagnostic laboratory, the need to accurately predict the functional consequences of variants becomes more critical for both the diagnosis and treatment of disease. This year's annual meeting focused on these issues. Hum Mutat 33:586–590, 2012. © 2011 Wiley Periodicals, Inc.

Introduction

The 2011 annual meeting of the Human Genome Variation Society (HGVS) was officially opened by HGVS President Graham Taylor of St. James University Hospital, United Kingdom. The main topic, “Exome and Genome Analysis as a Tool for Disease Identification and Treatment” was introduced by Bruce Gottlieb of McGill University, Canada. One mission of the HGVS is to help facilitate research into understanding the impact of genetic variants on changes in phenotype, including disease. New technologies have allowed for the identification of large numbers of variants, but how do you analyze all of them? We are only beginning to be able to interpret the relationship between the genotype and the observed phenotype or changes in the phenotype. To date, most researchers have focused on variants within the coding sequence, but we can now identify variants within the whole genome, which is about to launch us into a new genetic era. As more data accumulates, we should expect that our understanding of many concepts that we thought we knew about in genetics will be radically altered.

Sequencing the Personal Genome

One of the biggest impacts of next-generation sequencing (NGS) is the ability to sequence the entire personal genome at a cost that is now (or soon will be) affordable for most research or diagnostic laboratories. This results in the identification of all variants not only within the coding region (the exome) but also throughout the genome, including all regulatory regions. Tools predicting the functionality of a variant have been limited for the most part to the exome, but understanding the effect of variation on the regulation of gene expression is crucial if we want to identify all genetic variation associated with disease.

For the first invited talk, Michael Snyder, of Stanford University, California, spoke on “Variation in regulatory information and annotation of personal genomes for disease risk prediction.” Personal genome sequencing is here, but interpreting the functional consequences of the identified genetic variants is difficult. It is hard to know which variants affect gene function and/or are helpful for disease prediction. The goal is to predict risk, diagnose, monitor, treat, and understand different disease states. To this end, the field of disease prediction is moving toward whole 'omics profiling. Not only will variations in the DNA sequence (the genome) need to be identified, but changes in the transcriptome, the epigenome, the proteome, and the metabolome will also need to be identified if we wish to fully understand the disease condition. Including the effect of the environment is also necessary, but this will be a much more difficult task. As part of this effort to predict the functional effect of genetic variation, information was presented showing the comparison of DNA variants with expression data and proteomic data. One result of this research is the mapping of transcription factor binding sites and the effect of variation in these regulatory regions on gene expression. This information is being placed in a web accessible database called RegulomeDB (http://regulome.stanford.edu). Chromatin Immunoprecipitation followed by DNA sequencing (ChIP-Seq) is being used to identify both the regulatory regions bound by trans-acting factors and the effect of variation in these sequences [Kasowski et al., 2010]. The goal will be to create algorithms that can computationally predict the effects of variation on protein binding to DNA and their effect on gene expression. Together with additional programs that can predict the effect of coding SNPs on protein function, we will be able to estimate the effect of candidate variants on the phenotype or disease susceptibility.

The emergence of the personal genome is expected to have a profound influence on medical care. Atul Butte of Stanford University, California, spoke on the medical relevance of genome sequencing in his talk entitled “Clinical Assessments Incorporating Personal Genomes.” It was noted that for an individual, we can make predictions of genetic variant effects for 100–200 drugs. Unfortunately, predictions for complex diseases are less solid. Several efforts are underway to curate disease SNPs from GWAS and other candidate gene studies, by the National Human Genome Research Institute (NHGRI) and private industries. To date, Dr. Butte has curated publications associating 67,000 SNPs with over 1,500 diseases. Of note, many SNPs are reported in publications on the wrong DNA strand. Additionally, 45% of known published disease-associated SNPs are outside of gene exons and will be missed in exome-sequencing efforts. It has also been difficult to determine how these associated variants are functional. Eventually, predicted risk will need to be presented in a way that clinicians can use to treat patients. Dr. Butte promoted the use of likelihood ratios over odds ratios, as likelihood ratios associated with different independent SNPs can be chained together and overall risk estimated. So far, there are very few instances in which an increase or decrease in risk has been accurately predicted for a disease when both genetic and environmental exposures are considered. For this ability to expand to additional diseases, researchers will need to get out of their “silos” and study both genes and environmental factors together if we wish to help clinicians best treat their patients.
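
The chaining of likelihood ratios mentioned in Dr. Butte's talk can be sketched in a few lines of Python. This is a minimal illustration and not the method presented at the meeting: it assumes the SNPs are independent, and the pre-test probability and likelihood-ratio values are hypothetical numbers chosen only to show the arithmetic.

```python
from math import prod


def probability_to_odds(probability):
    """Convert a probability (0 <= p < 1) to odds."""
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_probability, likelihood_ratios):
    """Chain likelihood ratios of independent SNPs onto a pre-test probability.

    Post-test odds = pre-test odds x LR_1 x LR_2 x ... x LR_n.
    Independence of the SNPs is an assumption of this sketch.
    """
    pre_odds = probability_to_odds(pre_test_probability)
    post_odds = pre_odds * prod(likelihood_ratios)
    return odds_to_probability(post_odds)


# Hypothetical example: 10% pre-test probability and three independent SNPs.
risk = post_test_probability(0.10, [1.5, 0.8, 1.2])
print(round(risk, 3))  # 0.138
```

Working in odds rather than probabilities is what makes the chaining a simple product; an empty list of likelihood ratios leaves the pre-test probability unchanged.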

Exome Sequencing and Mendelian Disorders

The initial identification of a gene associated with a specific Mendelian (single gene) disorder sometimes took years of intensive work. The introduction of new NGS methods has, in some instances, decreased the time frame to weeks. Sarah Ng of the University of Washington, Seattle, spoke on this in her talk entitled “Next generation Mendelian genetics.” Exome sequencing was used as a method for gene discovery to identify the gene associated with Kabuki syndrome, a multiple malformation disorder that includes mild-to-moderate intellectual disability. DNA from 10 unrelated individuals was analyzed using exome sequencing and nine had frameshift or other protein-truncating mutations in the mixed lineage leukemia 2 (MLL2) gene that were highly likely to abrogate function [Ng et al., 2010]. In a follow-up screen of 100 additional patients, most mutations identified were protein-truncating mutations, but some putative mutations were missense mutations that have not been associated with a loss of function, making their functional status unclear. Further comparison of these missense mutations with those missense singletons found in MLL2 in data from the NHLBI Exome Sequencing Project (ESP; Exome Variant Server at http://snp.gs.washington.edu/EVS) revealed that the functional significance of missense variants in affected and unaffected individuals cannot be bioinformatically distinguished, underscoring the need for high-throughput functional assays. As a start in this direction, comprehensive analysis of variation in regulatory sequences (e.g., enhancers) at nucleotide resolution using massively parallel reporter assays is ongoing in an effort to identify potential regulatory mutations and to understand their effect on gene expression and eventually phenotype.

As more putative causal mutations are identified in NGS data, there is a need for improved tools to determine the functionality of these variants. Gholson Lyon of Children's Hospital of Philadelphia and the Utah Foundation for Biomedical Research spoke on “Using VAAST and exome sequencing to identify the genetic basis of idiopathic disorders.” New diseases with a dramatic phenotype will aid in identifying new genes and their role in disease. As an example, the exons on the X chromosome of an infant boy with a newly described X-linked disease named Ogden Syndrome, consisting of an aged appearance and several congenital anomalies, were captured and sequenced and several variants identified. Using the Variant Annotation Analysis and Search Tool (VAAST; http://www.yandell-lab.org/software/vaast.html) to predict functional candidate mutations, a causal mutation was identified in the N-alpha-acetyltransferase 10 (NAA10) gene, which encodes the catalytic subunit of the N-terminal acetyltransferase complex [Rope et al., 2011]. A similar methodology was used to identify putative causative mutations from exomes in a pedigree segregating for attention deficit hyperactivity disorder (ADHD). Several variants have been identified and are being analyzed to determine if they are causative for ADHD [Lyon et al., 2011]. During this analysis, they identified two variants in the pyruvate kinase (PKLR) gene that explained idiopathic hemolytic anemia in one individual who was a compound heterozygote for the mutations. This is an example of what some people currently refer to as “incidental” findings unrelated to the disease being studied. Dr. Lyon stated that there is nothing incidental about proving causality for unrelated findings and that there is an ethical and moral obligation to find out what is going on in research subjects who report unrelated diseases during sequencing experiments. In this case, the “unrelated finding” was provided to the patient's healthcare provider, so that the physician could determine whether to verify the results in a CLIA-certified lab.

The power of NGS methodologies to identify causative mutations was exhibited by Leslie Biesecker of the Genetic Disease Research Branch of the National Institutes of Health in his talk “Exome sequencing of combined malonic and methylmalonic acidemia: hypothesis-testing and hypothesis-generating approaches to genetics research.” Combined malonic and methylmalonic aciduria (CMAMMA) is an autosomal recessive disease that can result in CNS infarctions, and in rare instances, coma. In an effort to identify the gene responsible, the exome was sequenced from a single nuclear trio containing an individual with CMAMMA. Using a series of bioinformatic filters, 12 candidate genes were identified with one having a predicted mitochondrial leader sequence. Analysis of additional individuals with CMAMMA positively identified acyl-coenzyme A synthetase family member 3 (ACSF3) as the gene associated with CMAMMA. Though initially considered a pediatric disease, interrogating the ClinSeq exome database of adult individuals has shown that the phenotypic spectrum of CMAMMA is much broader than previously thought. It was found that adult forms of CMAMMA can mimic some neurodegenerative disorders. In the conclusion of his talk, Dr. Biesecker stated that for this type of research you first generate 'omic data, analyze the data for potential variants, then make your hypothesis and follow up with additional clinical research. In cases such as CMAMMA, this research has the potential to reveal previously unexpected clinical heterogeneity in inherited diseases. Such research requires collaboration between basic and clinical researchers and clinicians, with attention to technical scientific issues, informed consent, and the collection of clinical phenotypes and samples.

As shown above and for other disorders, NGS methodology is useful for identifying genes associated with inherited genetic diseases. Tom Walsh of the University of Washington, Seattle showed how this methodology can be used for both germline and somatic mutation-based diseases in his talk “Targeted sequencing for inherited breast, ovarian, and colon cancer.” Using targeted capture and NGS of 21 tumor suppressor genes (including BRCA1 and BRCA2), an assay called BROCA was created with the goal of developing a comprehensive genetic test for all known breast, ovarian, and colon cancer predisposition genes. This assay captures exons, introns, UTRs, and 10 kb upstream and downstream of the coding sequences. The assay methodology and reagents are not patented and the capture design is freely available. Additionally, a method for the detection of copy number variants (CNVs) was designed [Nord et al., 2011]. This is particularly important in genes such as BRCA1 and BRCA2 that have a high incidence of large deletions and duplications. These techniques were used to determine the inherited mutations involved in ovarian cancer. A cohort of 360 women was enrolled at time of surgery and DNA from blood was used to identify germline mutations in 21 tumor suppressor genes. Only obvious protein-damaging mutations were counted. Overall, 23% of patients had germline mutations; 17% were within BRCA1 or BRCA2 and 6% were found in 10 other genes (see Walsh et al., 2011, for further details). Six percent of the mutations were CNVs, which are not currently detected by exome sequencing. Interestingly, 30% of women with inherited mutations had no prior personal or family history of breast or ovarian cancer and 37% of women were diagnosed with their ovarian cancer after the age of 60 years. A similar assay designed for colorectal cancer called ColoSeq, a 7-gene panel based on the BROCA gene panel, is now being offered clinically at the University of Washington.

NGS and Disease Identification

NGS technologies have made an impressive impact on genetic research, but so far, the impact on clinical analysis is limited. In his talk “Genetics and Stratified Medicine Using Clonal Sequencing,” Graham Taylor of St. James University Hospital, United Kingdom, spoke of some of the issues in the clinical use of NGS methodology. Targeted sequencing allows one to analyze a subset of genes for high-quality sequence results, due to a greater depth of coverage at a cost much lower than complete exome sequencing. This strategy can be used for specific diagnostic tests, such as the identification of cancer-specific variations in tumor tissue, sequencing of long PCR products and custom sets of genes, both for diagnostic purposes and identification of new genes associated with disease. One problem is with the short reads associated with many NGS techniques. This can be critical with variants that are large deletions, such as those found in many BRCA1 and BRCA2 variants. One example provided was the deletion c.1175_1214del40 within BRCA1. In the case of larger chromosome alterations, a sequencing karyogram can be created to identify deletions and duplications using sequencing data instead of comparative genomic hybridization (CGH). NGS is also being used to identify novel associations of genes with inherited disorders. Specific pipelines are being created (http://autozygosity.org) and were used in the example of peroxidasin (PXDN) being associated with developmental glaucoma. Methods based on NGS will become more prevalent in diagnostic laboratories, but there is a need for greater standardization of the workflow and data analysis.
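
The deletion named in this talk, c.1175_1214del40, follows the HGVS coding-DNA style, in which the two positions are inclusive, so the deleted span is 1214 − 1175 + 1 = 40 bases. The sketch below checks that arithmetic. It is an illustration only: the regular expression and the function name `deletion_span` are hypothetical helpers written for this example, and the pattern handles only simple exonic deletions of this one form, not the full nomenclature.

```python
import re

# Hypothetical helper pattern: matches only simple forms such as
# "c.1175_1214del" or "c.1175_1214del40" (no intronic offsets, no UTR positions).
_SIMPLE_DELETION = re.compile(r"^c\.(\d+)_(\d+)del(\d*)$")


def deletion_span(description):
    """Return (start, end, length) for a simple coding-DNA deletion description.

    Positions are inclusive, so length = end - start + 1.
    Raises ValueError if the description is not in the simple form handled here
    or if a stated length disagrees with the positions.
    """
    match = _SIMPLE_DELETION.match(description)
    if match is None:
        raise ValueError(f"unsupported description: {description!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError("end position precedes start position")
    length = end - start + 1
    stated = match.group(3)
    if stated and int(stated) != length:
        raise ValueError("stated length does not match the positions")
    return start, end, length


print(deletion_span("c.1175_1214del40"))  # (1175, 1214, 40)
```

A deletion of this size removes a substantial fraction of a short read, which is one reason such variants can be difficult to align and call.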

The first of two company-sponsored lectures was presented by Mike Lelivelt of Life Technologies who spoke on “Ion Torrent semiconductor sequencing for life.” The Ion Torrent is a semiconductor chip-based sequencing platform that sequences DNA by measuring hydrogen ion production during DNA synthesis. This technique allows for an increase in sequencing depth (the Ion Torrent has increased from 1 million sensors to 7 million within a few months) without significantly increasing the cost. Read lengths have also been increased to over 500 bp with 99.997% accuracy.

As exhibited in previous talks, exome sequencing is a powerful technique to identify disease-causing mutations in Mendelian disorders, but methodology is critical to provide for proper interpretation of the resulting data. Christian Gilissen of Radboud University Nijmegen Medical Centre, the Netherlands, spoke on “Disease gene identification strategies for exome sequencing.” Interpretation of exome information derived from NGS is still a problem. The initial approach is to identify all variants including splice site variants. The next step is to focus on protein-changing variants, especially those that are rare. Usually 150–500 private variants in the patient are identified as potentially disease causing. There are six strategies to prioritize these variants. The choice of the strategy requires knowledge of the inheritance of the disorder, genetic heterogeneity, and availability of family members. Linkage strategy: variants that segregate with the disease or lie within a region that segregates with the disease are good initial candidates. Homozygosity strategy: known consanguinity can help with identifying mutations associated with a recessive disease. Double-hit strategy: for recessive diseases, variants are selected that are homozygous or compound heterozygous. In some cases, a single exome can be sufficient to identify the gene associated with the disease. Overlap strategy: multiple unrelated patients with the same phenotype are compared to find genes mutated in more than one of them; a well-defined phenotype is an important consideration. De novo strategy: a patient and both parents are sequenced to find de novo mutations. This is most useful in cases of sporadic disease with reduced fecundity. Validation of the results using additional individuals with the same phenotype is required to provide sufficient proof. Candidate strategy: biological information at the variant and gene level is used to prioritize disease-causing mutations. Potential problems that can limit success include a lack of sequence coverage resulting in the mutation not being targeted (especially if the mutation is not in the coding region), misalignment of reads or miscalling of variants, misinterpretation of the variants, a mutation that is not a SNP, and clinical and/or genetic heterogeneity. Even with the many potential problems listed above, there is an estimated success rate of 60%. Exome sequencing is the current method of choice for resolving the genetic cause of Mendelian disorders.
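
Three of the prioritization strategies described in this talk (de novo, double-hit, and overlap) reduce to simple filters once variants have been called. The sketch below is a hedged illustration and not the pipeline presented at the meeting. The record layout (a gene name plus alternate-allele counts of 0, 1, or 2 for the patient, mother, and father), the function names, and the gene labels are all assumptions made for this example. The double-hit filter also does not check phase, so two heterozygous variants in one gene are only candidates for compound heterozygosity.

```python
from collections import defaultdict


def de_novo_candidates(variants):
    """De novo strategy: variants seen in the patient but in neither parent."""
    return [v for v in variants
            if v["patient"] > 0 and v["mother"] == 0 and v["father"] == 0]


def double_hit_genes(variants):
    """Double-hit strategy: genes carrying at least two alternate alleles
    in the patient (one homozygous variant or two heterozygous variants).
    Phase is NOT checked here, so compound heterozygosity is only suggested."""
    alt_alleles = defaultdict(int)
    for v in variants:
        alt_alleles[v["gene"]] += v["patient"]
    return sorted(gene for gene, count in alt_alleles.items() if count >= 2)


def overlap_genes(genes_per_patient):
    """Overlap strategy: genes with candidate variants in every unrelated patient."""
    gene_sets = [set(genes) for genes in genes_per_patient]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical trio data; gene labels are placeholders, not real findings.
trio_variants = [
    {"gene": "GENE_A", "patient": 1, "mother": 0, "father": 0},  # de novo
    {"gene": "GENE_B", "patient": 1, "mother": 1, "father": 0},  # inherited het
    {"gene": "GENE_B", "patient": 1, "mother": 0, "father": 1},  # second het hit
    {"gene": "GENE_C", "patient": 2, "mother": 1, "father": 1},  # homozygous
    {"gene": "GENE_D", "patient": 1, "mother": 1, "father": 0},  # single het
]

print([v["gene"] for v in de_novo_candidates(trio_variants)])  # ['GENE_A']
print(double_hit_genes(trio_variants))  # ['GENE_B', 'GENE_C']
print(overlap_genes([["GENE_A", "GENE_C"], ["GENE_C"], ["GENE_C", "GENE_D"]]))  # {'GENE_C'}
```

In practice these filters would be applied after the rarity and protein-changing filters described above, which is what brings the list down to the 150–500 private variants mentioned in the talk.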

As stated earlier, assigning functionality to variants in regulatory sequences has been difficult. New methods will need to be created to understand the effect of associated variants on gene expression. On this subject, Kuan-Bei Chen of Pennsylvania State University spoke on “Differential regulation of gene expression in 5 human cell lines.” The motivation for this work is to identify regulatory regions and to understand the relationship between the binding strengths of trans factors on gene regulation and variation in these regulatory regions. RNA-seq data and ChIP analysis were used to identify variants that alter trans factor binding, producing differential regulation of the tumor suppressor gene STK11/LKB1 in HeLa cells. The potential utility of this approach to uncover regulatory regions involved in disease was discussed. This approach may allow for the identification of variants in regulatory regions that are associated with disease.

The second company-sponsored lecture was presented by Michael Rhodes of Life Technologies who spoke on “Pay per lane sequencing: the 5500 Series Genetics Analysis System.” The last few years have witnessed many advances in NGS throughput including increases in the read length, the addition of up to 96 available barcodes allowing multiple samples to be sequenced in a single sequencing lane and the ability to use only a single lane at a time, available in the 5500 Series Genetic Analysis System. The creation of the most efficient sequencing library is also critical to successful sequencing runs and several possibilities were presented, including TargetSeq custom enrichment kits, exome capture libraries, and RNA-Seq libraries. A new method for template preparation that replaces emulsion PCR by amplifying the library directly onto the slide was announced.

The Future of Genetic Variation Analysis

A core mission of the HGVS is to facilitate the creation of databases for the accumulation and presentation of genetic data, as well as the analysis of identified mutations. One example is the UMD-THAP1 LSDB. Arnaud Blanchard of INSERM U827, France, spoke on “Genetics of dystonia and creation of a locus-specific database for THAP1 gene mutations UMD-THAP1.” Dystonia is an inherited movement disorder with more than 20 different loci. The most recently identified gene, THanatos-Associated Protein 1 (THAP1), codes for a transcription factor. THAP1 mutations were first identified in Amish-Mennonite dystonic families and then extended to other populations (78 probands). Molecular and clinical data have been curated and are now available at http://umd.be/THAP1/. The LSDB proposes the standardization of the clinical vocabulary, allowing the user to create homogeneous phenotypic groups among the patients and thus facilitating genotype–phenotype correlations. It also contains statistical and predictive tools (pathogenicity of missense mutations). All of these UMD features led Dr. Blanchard to consider the UMD as a “knowledge base,” playing a central role in the challenge of the coming years: the interpretation of the data.

Though the new sequencing technologies can produce an immense amount of sequence data, we need to be careful about how this information is interpreted. Bruce Gottlieb concluded the HGVS 2011 annual meeting with a talk titled “How reliable is human genome sequence in determining human phenotype? Is it time to rethink our reliance on genome sequence as the primary basis for constructing genetic databases?” As huge amounts of DNA sequence are being produced, the identification of sequence variation has risen exponentially, and we are finding information that we cannot understand when we try to interpret it using our present understanding of the occurrence and significance of genomic variation. This is because NGS has revealed an unexpected amount of genetic variation in normal individuals, including sequence alterations previously solely associated with well-known disease phenotypes. Further, we are now finding within the same individual that some tissues contain differences in their sequences when compared to the traditional source of genomic sequence, that is, peripheral blood lymphocytes. As an extreme example of this intra-organismal genetic heterogeneity, variation in the androgen receptor (AR) exon 1 CAG repeat length, a well-known functional polymorphism on the X chromosome, was examined in breast tumors. Sequence analysis of breast tumor samples, using a unique NGS technique, produced over 35,000 reads and revealed up to 30 different CAG repeat lengths instead of the expected two. This showed that even within the same tissue, somatic genetic heterogeneity exists, which is being missed using traditional sequencing techniques. Further, sequencing the AR exome in over 140 patients with classical androgen insensitivity syndrome (AIS), a disease that is caused by AR mutations, revealed that 40% of the patients had no detectable mutation in the AR gene, even though many of these individuals were shown biochemically to have a dysfunctional AR. This questions the basic assumption that AIS is always the result of an alteration to the inherited sequence of the AR gene. Based on this and other evidence, a model was presented that showed how a number of post-DNA events, including DNA and RNA editing, interacting RNA molecules such as miRNA, epigenetic factors such as methylation, and interacting proteins, could all affect the genotype-to-phenotype pathway. Thus, it was concluded that future genetic databases will likely have to include a whole host of factors that can affect the phenotype in addition to the genomic sequence.
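
The tally of CAG repeat lengths across reads described in this talk can be illustrated with a short sketch. The report does not describe the NGS technique that was used, so this is not that method. It is a naive illustration that takes the longest uninterrupted run of CAG in each read as the repeat length. It ignores sequencing errors, reverse-strand reads, and anchoring to flanking sequence, and the reads shown are made-up strings.

```python
import re
from collections import Counter

# One or more consecutive CAG triplets.
_CAG_RUN = re.compile(r"(?:CAG)+")


def longest_cag_run(read):
    """Return the number of triplets in the longest uninterrupted CAG run."""
    runs = _CAG_RUN.findall(read.upper())
    return max((len(run) // 3 for run in runs), default=0)


def repeat_length_distribution(reads):
    """Count how many reads support each CAG repeat length."""
    return Counter(longest_cag_run(read) for read in reads)


# Made-up reads for illustration only.
example_reads = [
    "A" + "CAG" * 3 + "T",
    "CAG" * 3,
    "G" + "CAG" * 4 + "C",
]
distribution = repeat_length_distribution(example_reads)
print(sorted(distribution.items()))  # [(3, 2), (4, 1)]
```

Under this simplified model, a tissue sample with only germline alleles would show at most two well-supported lengths at an X-linked locus, so a distribution with many distinct lengths is the kind of signal the talk attributed to somatic heterogeneity.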

Concluding Remarks

At the conclusion of the 2011 annual meeting, Bruce Gottlieb left us with several questions to ponder. Is the definition of SNP out of date? Can we differentiate between benign polymorphisms and functional mutations? Does a reliable and stable human genome reference sequence exist? Is there such a thing as a “normal” sequence? The quest for common mutations associated with disease has been difficult. It could be that many common diseases are actually tens of thousands of individual diseases. Additionally, a consistent direct relationship of genotype to phenotype may no longer be relied upon even for single locus-specific disorders. Epigenetic factors are being identified as important constituents of disease, and disease states may also be affected by tissue and even cellular microenvironments. In the end, personalized medicine will be realized when we understand the nature and relationship of all genetic and nongenetic events that result in an individual's phenotype, not just the genomic contributions.

HGVS Awards

As part of the HGVS annual meeting, two individuals were recognized for their important contributions to the advancement of the study of human genetic variation. Dr. Mark Paalman was recognized for his enthusiastic promotion of the Human Genome Variation Society and as Managing Editor of Human Mutation, the official Journal of the HGVS. Professor Johan den Dunnen was recognized for his extraordinary contribution to the science of human genome variation, the establishment of international mutation nomenclature standards, and to celebrate the 1,000th download of Leiden Open Variation Database software (http://www.lovd.nl/2.0/). Both individuals have also played important roles in the direction of the HGVS. The HGVS has been very fortunate in having many leaders in the area of human genetic variation associated with the Society and we are looking forward to continued progress by these individuals and all members of the HGVS.

Acknowledgements

This year's annual meeting was co-chaired by Bruce Gottlieb of McGill University, Steven Brenner of the University of California–Berkeley and Marc Greenblatt of the University of Vermont. The sessions were chaired by Bruce Gottlieb, William Oetting, and Alistair Brown. The author would like to thank the speakers for their help in the preparation of this report.
