Clinical Applications of Next-Generation Sequencing: The 2013 Human Genome Variation Society Scientific Meeting

Authors


ABSTRACT

Next-generation sequencing (NGS) has helped transform genomic research by opening the genome to analysis, sharply decreasing sequencing costs, and increasing throughput. The next goal is to exploit this powerful technology in the clinic, namely for diagnostics and therapeutics. The 2013 annual meeting of the Human Genome Variation Society, held in Paris, France, provided a forum to discuss possible clinical applications of NGS, the potential of some current NGS systems to transition to the clinic, the identification of causative mutations for rare genetic disorders through whole-genome or targeted genome resequencing, the application of NGS for family genomics, and NGS data analysis tools.

Introduction

The 2013 annual scientific meeting of the Human Genome Variation Society (HGVS) was held on the 8th of June in Paris, France. The theme of this meeting was “Clinical Applications of Next Generation Sequencing.” The technology used to identify disease-associated variants in the DNA sequence has progressed rapidly since the first DNA sequence was published in 1968. Today, sequencing the exons of every gene (whole-exome sequencing; WES) or the entire genome (whole-genome sequencing; WGS) can be accomplished at a relatively low cost. How this technology can be used in a clinical diagnostic setting was the main focus of this meeting.

The identification of disease-producing genetic variants is becoming increasingly important in understanding different disease states. There is increasing pressure on clinical laboratories to provide clinicians with these variants as well as their effect on the function of the protein product. The ability to generate this information is established, but the ability to understand the effect of a genetic variant on disease, or disease predisposition, is still in its infancy. The talks presented at this meeting addressed both the implementation of next-generation sequencing (NGS) in the clinical arena and the means to determine the effects of mutations on clinical outcomes.

NGS and Clinical Applications

The first session was chaired by George Patrinos of the University of Patras, Greece. In some cases, large amounts of DNA are not available, especially in the case of diseased tissues. To address this, Radoje Drmanac of Complete Genomics, Inc., Mountain View, California, spoke on “Accurate whole genome sequencing from ten human cells as the ultimate genetic test.” Using new technologies, WGS can be performed on very small samples while still achieving high (approximately 110×) sequence coverage. With additional methods, extended haplotypes up to 500 kb can be created, especially when parental DNA is also sequenced [Peters et al., 2012]. Possible roles for this technology include sequencing embryo DNA before implantation or fetal cells from the mother's blood to detect potential disease-causing variants. Circulating tumor cells can also be sorted and sequenced to identify de novo mutations in tumor cells when compared with the original genome sequence. Ultimately, WGS of all individuals is expected to provide a comprehensive genetic test for health improvement, resulting in longer healthy lives, disease prevention, and improvements in diagnosis and treatment. As this information becomes available for more individuals, we need to make sure that results are not overinterpreted. We should not be creating new diseases. We also need to educate the public and healthcare professionals as this information becomes available. These are exciting beginnings of genomic medicine and should result in substantial improvements in healthcare.

Many clinical laboratories are already providing results from NGS technologies to clinicians and patients. Joris Veltman of the Radboud University Medical Center, Nijmegen, the Netherlands, spoke on this in his talk “Diagnostic exome sequencing in genetically heterogeneous disorders.” Traditional Sanger sequencing for diagnostic purposes limits the number of genes that can be analyzed. NGS allows a greater number of genes to be analyzed with less work and cost. The sequencing analysis at Radboud, which is approved by the Dutch Accreditation Council, is focused on a targeted set of genes where mutations are known to cause blindness, deafness, movement disorders (e.g., ataxias), intellectual disability (ID), mitochondrial disorders, or colorectal cancer. Even when the analysis is limited to these clinical disorders, hundreds of candidate genes need to be considered for multiple panels that would require regular updating. To overcome these limitations and simplify the workstream, their protocol employs exome sequencing for all patients, with targeted analysis of genes for the specific disease referral. An important part of this analysis is the development of software tools that have a high diagnostic yield, offer a practical workflow, and minimize incidental findings. To determine the impact of identified variants, software tools such as Sorting Intolerant From Tolerant (SIFT), PolyPhen-2, and Grantham Variation and Grantham Deviation (GVGD) are used. Identified variants are validated with Sanger sequencing. Dr. Veltman commented that incidental findings are still a challenge. An expert committee was created, and incidental findings are discussed on a case-by-case basis. Two examples given were a variant of unknown significance in RB1 in a patient with ID, and a patient with an XYY karyotype.

Although NGS is usually used to identify base substitutions or small insertion/deletions (indels), this technology can also be used to identify chromosomal abnormalities. Sian Ellard of the University of Exeter Medical School, United Kingdom provided an example of this in her talk “Whole genome sequencing and copy number analysis of exome sequencing in two families with split-hand/split foot malformation identifies chromosomal rearrangements affecting putative exonic enhancers.” There are multiple genes associated with split-hand/split foot malformation (SHFM). Genome sequencing of a patient with a de novo balanced translocation t(2;7)(p25.1;q22) showed that the chromosome 7 breakpoint was located within a critical region on 7q21 near the distal-less homeobox 5 (DLX5) and DLX6 genes known to be important for limb bud development. In a second family, originally thought to have X-linked SHFM, no mutation was identified within the candidate SHFM2 locus at Xq26. Instead, a 106 kb deletion within the SHFM1 locus was identified by read depth analysis of exome sequence data. The deletion breakpoints were characterized by genome sequencing and include DYNC1I1 exons 15 and 17. These exons have recently been shown to act as tissue-specific enhancers of DLX5 and DLX6. These cases provide evidence of the importance of DYNC1I1 exonic enhancers in human limb formation and demonstrate the utility of genome sequencing for precise characterization of chromosomal abnormalities.
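The read depth analysis mentioned above can be illustrated with a minimal sketch. This is not the pipeline used in the work described; it only shows the general idea of comparing normalized per-exon depth in a sample against a reference profile and reporting runs of exons whose log2 ratio falls below a cutoff. The function names, the cutoff of -0.6, the minimum run length, and the pseudocount are all assumptions chosen for illustration.

```python
import math


def log2_ratios(sample_depths, reference_depths):
    """Return per-exon log2(sample/reference) after scaling each profile by its total depth."""
    sample_total = sum(sample_depths)
    reference_total = sum(reference_depths)
    ratios = []
    for s, r in zip(sample_depths, reference_depths):
        # A small pseudocount (assumed value) keeps the logarithm defined for zero-depth exons.
        s_norm = (s + 0.5) / sample_total
        r_norm = (r + 0.5) / reference_total
        ratios.append(math.log2(s_norm / r_norm))
    return ratios


def deletion_segments(ratios, threshold=-0.6, min_exons=2):
    """Return inclusive (start, end) index pairs for runs of exons below the threshold."""
    segments = []
    start = None
    for i, ratio in enumerate(ratios):
        if ratio < threshold:
            if start is None:
                start = i
        else:
            # The run ended at the previous exon; keep it only if it is long enough.
            if start is not None and i - start >= min_exons:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(ratios) - start >= min_exons:
        segments.append((start, len(ratios) - 1))
    return segments
```

In practice a reference profile would be built from many unrelated samples, and breakpoints would still need to be confirmed by another method, as was done here with genome sequencing.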

Although NGS can provide information on thousands of genes, it can also be used for targeted diagnosis using a limited panel of genes. For this, Silvia Borras of NewGene Ltd., Newcastle upon Tyne, United Kingdom, talked about the “Noonan spectrum test – one year on…” Germline mutations in the RAS/MAPK pathway (RASopathies) result in a number of developmental autosomal dominant disorders including Noonan syndrome, cardio-facio-cutaneous syndrome (CFC), Costello syndrome, Noonan syndrome with multiple lentigines (formerly known as LEOPARD syndrome), Legius syndrome, and various Noonan-like syndromes. To detect causative mutations in these RASopathies, NewGene, in collaboration with the SW Thames Molecular Genetics Diagnostic Laboratory at St. George's Healthcare NHS Trust, developed a comprehensive panel test that screens for variants in all coding regions and splice sites of 11 genes implicated in RASopathies and also tests for the presence of the common SHOC2 mutation. This 454-based assay has been successful in the identification of mutations in many individuals affected with these syndromes.

Although NGS can identify numerous sequence variants, a major problem is predicting the functional consequence of mutations identified after sequencing. Much work has been done on determining the functionality of protein coding mutations within exon sequences, but significant mutations that affect splicing or gene expression have not been comprehensively assessed. Peter Rogan of the Schulich School of Medicine, Western University, London, Canada, has taken on this task and presented some of his results in his talk “Genome-wide prediction and validation of mRNA splicing mutations in cancer.” Using Shannon information theory, software packages were created for genome-scale analysis of variants within splice sites that alter binding site strength [Shirley et al., 2013] and to predict the transcript isoforms generated by splicing mutations [Mucaki et al., 2013]. Variants within the BRCA1 and BRCA2 genes were used to test their model and were grouped into four functional states: wild type, leaky mutations with residual activity, inactivating mutations, and cryptic splicing. Predicted effects of 50% of these calls were confirmed using RNAseq analysis. Additional efforts underway include predicting functional mutations in promoter and intronic regions and the potential effects of mutations in regulatory splicing elements with information theory-based models. The goal is to use these software tools to perform a genome-scale mutation analysis to identify functional variants outside the coding regions affecting either splicing or expression levels.
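To make the information-theoretic idea concrete, the following minimal sketch scores a candidate splice site against a position frequency matrix and reports the change in bits caused by a variant. It is not the software presented in the talk. It uses the commonly cited per-position form 2 + log2 f(base, position), omits the small-sample correction, and adds a pseudocount; the function names and the pseudocount value are assumptions for illustration, and any alignment passed in is expected to contain equal-length sites made up of A, C, G, and T only.

```python
import math

BASES = "ACGT"


def frequency_matrix(aligned_sites, pseudocount=1.0):
    """Build per-position base frequencies from equal-length aligned sites."""
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases do not give log2(0).
        counts = {base: pseudocount for base in BASES}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix


def individual_information(site, matrix):
    """Sum over positions of 2 + log2 f(base, position), in bits."""
    return sum(2.0 + math.log2(matrix[pos][base]) for pos, base in enumerate(site))


def information_change(reference_site, variant_site, matrix):
    """Bits gained (positive) or lost (negative) when the reference site is replaced by the variant."""
    return individual_information(variant_site, matrix) - individual_information(reference_site, matrix)
```

A negative change suggests a weakened site, and a gain at a nearby non-canonical position could point to cryptic splicing. Thresholds for calling a site leaky or inactivated would have to come from validated data, not from this sketch.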

Databasing and NGS

Many efforts are being made to create software tools that can predict whether a mutation is functional or not. Databases that collect both gene-specific variants and the associated clinical and functional information are critical, especially for testing predictive software packages. The second session, chaired by William S. Oetting from the University of Minnesota, Minneapolis, focused on progress in the creation of this type of database.

To begin this session, Anthony Brookes from the University of Leicester in the UK, spoke on “Towards a genetic disease data ecosystem.” He presented achievements from the now complete GEN2PHEN project (http://www.gen2phen.org), progress toward unifying and sharing genotype–phenotype data, and extended this with thoughts on future development of the field. The vision presented was that of a mixed model involving centralized genetic variant databases (e.g., the National Center for Biotechnology Information) combined with federated resources such as the locus-specific databases. GEN2PHEN has produced and deployed many standards, tools, principles, and components needed to achieve this kind of interoperability. This allows far more data to be brought together and holistically accessed than ever before. But many challenges remain—not least of which is facilitating further interconnectivity between multiple databases including research based data systems, diagnostics data systems, and patient data systems, to allow for meaningful interaction between these three domains. To this end, the project has devised an advanced database platform (PathoKB) based upon a highly flexible data model (Observ-OM). This “pathogenicity-focused system” integrates patient clinical information and phenotype, genetic variants, and the pathogenicity of these variants along with supporting evidence data. Information will be supplied from clinics, research, and diagnostic laboratories, along with expert curation. Decisions for sharing data will be made by the multiple research, clinical, and patient communities involved. Going beyond traditional forms of data sharing (i.e., open sharing or managed access), GEN2PHEN has explored novel ideas around open discovery of data—wherein the existence rather than the substance of the data is made widely and freely accessible. 
This principle has been enabled via a novel data discovery platform for mutation and phenotype data, called Cafe Variome (http://www.cafevariome.org). This is currently being deployed across sets of diagnostic labs with a common disease focus, as well as across groups of diagnostic labs from whole nations. Thereby, teams in these networks will be able to instantly find mutation records and rare disease (RD) patients of interest recorded by other groups, plus some summary level data (e.g., mutation frequencies), without compromising patient privacy or releasing actual data. Thereafter, the Cafe Variome tool facilitates tailored options for primary data access, controlled by the data owner on a case-by-case basis in a secure manner.

There are approximately 8,000 genetic diseases affecting over 30 million European Union citizens, and most are very rare. Several initiatives have been undertaken to promote and stimulate collaboration and data sharing in order to facilitate breakthroughs in RD diagnosis and research. National and international initiatives were presented by David Salgado from the Medical Genomics and Functional Genetics Institute at the Aix-Marseille University in Marseille, France, in his talk “Next generation sequencing in the rare disease field: international initiatives for data analysis and data sharing.” France, one of the leading European countries in RD research, has launched the 2nd French National Rare Diseases Plan, funded by the French Ministry of Health and the French Ministry of Research. The Rare Diseases Foundation (Fondation Maladies Rares; http://fondation-maladiesrares.org) is a national organization created to speed up research on RDs. The foundation supports all aspects of RD research, including the funding of gene identification using WES and WGS technologies, and provides a repository to promote data sharing through the Banque Nationale des Variants Partagés. The foundation has been supporting approximately 1,000 WES projects in the RD field. At a higher level, the International Rare Diseases Research Consortium (IRDiRC; http://www.irdirc.org) has been given the ambitious goal, by 2020, to help in the identification of genes responsible for most RDs, to create tests for their molecular diagnosis, and to develop 200 new therapies for RDs. Thirty-two member institutions from Europe, North America, Canada, Australia, and China currently compose IRDiRC. Four projects have already been funded at the European level, including RD-Connect. RD-Connect is a unique global, interdisciplinary project that links up databases, registries, biobanks, and clinical bioinformatics data used in RD research into a central resource for researchers and geneticists worldwide.

The large number of variants identified, either in large databases or through NGS data, will require highly automated analytical software tools to separate the functional variants from the benign. Mauno Vihinen, of Lund University, Lund, Sweden, spoke on “Interpreting NGS Data with PON-P.” NGS allows for the creation of large amounts of variant data, but how can it be interpreted? Tools need to be developed to identify the needles in the haystack. There are many ways an amino-acid substitution can affect protein function. PON-P was created for this purpose. A new version (PON-P2, http://structure.bmc.lu.se/PON-P2) will soon be released; it is much faster than its predecessor, performs well, can handle large numbers of cases, accepts many submission formats, and is freely available. An additional tool focuses on mismatch repair (MMR) gene variants and is also available (http://bioinf.uta.fi/PON-MMR). The performance of these tools was measured using datasets of variations with known effects as gold standards (e.g., for prediction method performance assessment and predictor development). There is a concern as to the accuracy of predictive software tools [Joppa et al., 2013]. Additional datasets, like those available in the VariBench benchmark database [Sasidharan Nair and Vihinen, 2013], are needed to help improve variant effect prediction tools.

To date, most of the focus on genetic variants has been on SNPs and small indels. We are now finding that copy number variants (CNVs), large indels, and even chromosome rearrangements are much more frequent than previously thought. As CNVs are being associated with changes in the phenotype, there is a need to accurately describe these changes using a common nomenclature. The HGVS has been at the forefront of creating a nomenclature for small sequence variants (http://www.hgvs.org/mutnomen), including nomenclature for complex variants [Taschner and den Dunnen, 2011]. In his talk “Describing translocations by extending HGVS sequence variation nomenclature” (adapted slide version available at http://www.hgvs.org/mutnomen/SVtrans_HGVS2013_PT.pdf), Peter Taschner of the Leiden University Medical Center, Leiden, The Netherlands, presented the HGVS recommendations to extend the nomenclature to include molecular details on chromosomal variation. There is a current nomenclature for describing chromosomal translocations (International System for Human Cytogenetic Nomenclature [ISCN] 2013 guidelines), but current technologies can now provide a detailed description of the specific breakpoints. The proposed HGVS recommendations include current ISCN guidelines describing the chromosomal abnormality along with a specific description of the breakpoint itself. An example of a balanced translocation between chromosomes 9p24 and 22q11 would be as follows: t(9;22)(p24;q11)(oNC_000022.10:g.23631785::NC_000009.11:g.5069032;NC_000022.10:g.23631784::oNC_000009.11:g.5069031).

The last part contains the detailed descriptions of the derivative chromosomes 9 and 22 breakpoint junctions (separated by the semicolon), including chromosomal reference sequence accession and version numbers, their orientation and the nucleotide positions joined (as indicated by double colons). Additional sequences can be inserted at the junctions using a second set of double colons. It is important that a formal method for these changes be incorporated to limit alternative interpretations and ambiguous descriptions of the same chromosomal change.
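A description in this form can be parsed mechanically, which is part of the appeal of a formal nomenclature. The sketch below is a hypothetical parser, not an official HGVS tool. It assumes that the junctions sit in the final parenthesized group, that each breakpoint has the form accession.version:g.position, and that a leading "o" marks the opposite orientation. The pattern also accepts "g" without the period, a tolerance added here for robustness. Inserted sequences between a second set of double colons are not handled.

```python
import re
from typing import NamedTuple


class Breakpoint(NamedTuple):
    accession: str              # reference sequence accession with version, e.g. NC_000009.11
    position: int               # genomic nucleotide position at the junction
    opposite_orientation: bool  # True when the breakpoint is prefixed with "o" (assumed meaning)


# Optional "o", an NC_ accession with version, then "g." (period optional) and the position.
BREAKPOINT_RE = re.compile(r"^(o?)(NC_\d+\.\d+):g\.?(\d+)$")


def parse_breakpoint(text):
    """Parse one breakpoint such as 'oNC_000022.10:g.23631785'."""
    match = BREAKPOINT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized breakpoint: {text!r}")
    flag, accession, position = match.groups()
    return Breakpoint(accession, int(position), flag == "o")


def parse_junctions(description):
    """Return a list of junctions, each a list of the breakpoints joined by '::'."""
    start = description.rindex("(")
    end = description.rindex(")")
    inner = description[start + 1:end]
    return [
        [parse_breakpoint(part) for part in junction.split("::")]
        for junction in inner.split(";")
    ]
```

Applied to a complete description of the t(9;22) example, `parse_junctions` returns two junctions of two breakpoints each, one per derivative chromosome.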

Personal/Family Genomes and WGS

Sequencing DNA for clinical diagnostic needs has evolved from the sequencing of single exons of candidate genes to WES and even WGS. This provides unprecedented access to the genetic variants in a patient, but also opens up issues from determining the functionality of observed variants to issues of consent and privacy of personal information. In this session, chaired by George Patrinos and Mauno Vihinen, several speakers provided different experiences in the utilization of NGS technology in the clinical diagnostic laboratory.

In the first talk of this session, William Oetting spoke on the “Introduction of next generation sequencing in a clinical diagnostic laboratory.” He described a strategy in which an entire testing menu was validated as a single CLIA certified “test.” This testing menu was composed of 568 different genes organized into more than 100 panels covering the most commonly ordered genetic tests at their institution. This approach allows the clinical laboratory to maximize the coverage depth of clinically relevant regions by targeting a small fraction (<3%) of the exome. Offering a mix of both small and large panels allows the lab to maximize sample volume, which in turn reduces cost and turnaround time. Multiplexing patient samples using “bar-codes” to differentiate samples provides additional cost effectiveness. Variants are identified by a custom cloud-based genotyping pipeline utilizing the Genome Analysis Toolkit (GATK) unified genotyper (http://www.broadinstitute.org/gatk). This pipeline analyzes the sequence data and provides calls only for the specific genes/panels requested by the clinician. Utilizing this approach, they are able to achieve >20× coverage at every coding base in 98% of targeted exons. Supplemental Sanger sequencing is employed to analyze all bases not covered at >20× coverage.
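The coverage rule described above (every coding base above 20×, with Sanger sequencing filling in the remainder) can be sketched as follows. This is an illustration only, not the laboratory's pipeline; the function names and the input layout (a list of per-base depths per exon) are assumptions.

```python
def low_coverage_intervals(depths, start_position, min_depth=20):
    """Return inclusive (start, end) positions where depth does not exceed min_depth."""
    intervals = []
    run_start = None
    for offset, depth in enumerate(depths):
        position = start_position + offset
        if depth <= min_depth:
            # Open a new low-coverage run if one is not already in progress.
            if run_start is None:
                run_start = position
        elif run_start is not None:
            intervals.append((run_start, position - 1))
            run_start = None
    if run_start is not None:
        intervals.append((run_start, start_position + len(depths) - 1))
    return intervals


def fraction_fully_covered(exon_depths, min_depth=20):
    """Fraction of exons in which every base exceeds min_depth."""
    covered = sum(1 for depths in exon_depths if all(d > min_depth for d in depths))
    return covered / len(exon_depths)
```

The intervals returned by `low_coverage_intervals` are the regions that would be sent for supplemental Sanger sequencing, and `fraction_fully_covered` corresponds to the per-exon figure quoted in the talk.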

Many believe that pharmacogenomics will have a large impact on improving clinical care by increasing efficacy of medications, while at the same time decreasing adverse events. Joseph Borg of the University of Malta, Msida, presented preliminary data from a collaborative project among the University of Malta (Malta), the University of Patras (Greece), and Complete Genomics, Inc. (Mountain View, CA) in his talk entitled “The impact of whole genome sequencing on pharmacogenomics.” Pharmacogenomics focuses on variants in proteins that are involved in drug utilization, including drug absorption, distribution, metabolism, excretion, and toxicity (ADMET). The DMET™ chip from Affymetrix analyzes 1,936 SNPs within 231 genes that alter the function of proteins involved in these pathways, but it has several limitations. The DMET™ chip is 4 years old and is missing many important variants that affect drug utilization, including variants within controlling regions, TSS-binding sites, enhancer regions, siRNA loci, and noncoding RNA genes. Additionally, the chip is not representative of many populations. In an effort to identify novel ADMET variants, 69 whole genomes of nondiseased individuals were sequenced; a total of 87,000 variants were identified in genes, and 2,000 variants were found in exons and regulatory regions. Only 25%–32% of these variants were identified with the DMET™ chip for each individual. Additionally, 3% are novel ADMET variants. A number of indels were also identified. It is clear that WGS and even WES can identify novel variants, many of which are missed by current pharmacogenomics chip-based analysis, and will work equally well for all ethnic groups or populations. Using WGS analysis for ADMET variants will aid in the identification of poor and intermediate metabolizers, even in cases when the variant has a low minor allele frequency.
If prices are low enough, we can consider standard genotyping for pharmacogenomics and genomic-based medicine testing in routine clinical care. In silico prediction to determine functionality of novel variants will be very important to be able to use this information effectively.

CNVs have already been implicated in a number of genetic disorders and the importance of this class of variants will only increase. Jayne Hehir-Kwa, of the Radboud University Medical Centre, Nijmegen, The Netherlands, showed how these variants can be identified in her talk “Detection of clinically relevant CNVs with whole exome-sequencing.” Current medium resolution microarrays can detect de novo CNVs in 10%–15% of patients with ID. This can be improved with high resolution microarrays (4.2 M), but these assays are more expensive. A comparison was made between WES data and medium resolution genomic microarrays for the detection of CNVs: 88% of the CNVs were detected by WES and 96% of the clinically relevant CNVs were identified. Unfortunately, software tools that are used to identify CNVs tend to overestimate CNV calls. Different CNV calling programs were tested on WES data from 10 patients with known clinically relevant CNVs. Of the CNV calling programs compared (CoNIFER, Cn.MOPS, Contra, and ExomeDepth), CoNIFER was found to be the best for calling rare clinically relevant CNVs, detecting 11/12 of them, but it was also found that the larger the sequence involved, the higher the percentage of true calls. Problems still exist, including the detection of small CNVs, the identification of breakpoints, and oversegmentation.

Because of the reduction in cost, NGS is now being introduced into smaller clinical diagnostic laboratories. Debbie Prosser of LabPLUS, Auckland Hospital, New Zealand, talked about this transition in her presentation “Introduction of New Generation Technology to the Diagnostic Laboratory.” NGS was done using a Roche GS Junior 454 instrument with retrofitted custom designed M13 tailed primers that resulted in “bar-coding” of individual patient DNA samples. Initial testing was done on the BRCA1 and BRCA2 genes. Up to five patients (representing 500 amplicons) were simultaneously sequenced, greatly reducing the turnaround time and cost. One issue was that two single base duplications within homopolymer regions were missed, a common problem with the GS Junior 454 sequencing platform. Throughput was limited by the requirement for 100 PCRs per patient, but this was being addressed using the Multiplicom BRCA MASTR assay kit, which performs the initial amplicon PCR step as a multiplex assay of five PCRs. Novel functional variants were identified using Simultaneous On-Line Variant Analysis, which incorporates a number of variant calling tools. CNVs were initially identified using a Nimblegen CGH array; the laboratory will transition to an Agilent custom array following the withdrawal of Nimblegen from the custom array market.

Although the technology is rapidly allowing for identification of all variants within an individual's genome, there are still issues as to how clinicians, and more importantly patients, respond to this information. Project CARDIO showed that the major challenges are not in the technology, but rather in correspondence between the laboratory and the clinic, including a uniform interpretation of results, exchange of data, and reporting. Terry Vrijenhoek, of the University Medical Centre Utrecht, The Netherlands, spoke on this in his talk “Diagnostic application of NGS: it's the people, not the technology.” A pilot project, CARDIO, was set up to see how NGS in the diagnostic laboratory works in the Netherlands. DNA samples and clinical information from the same nine patients were sent to eight clinical centers (academic centers) for analysis. All eight centers found the same mutations, though they each had different diagnostic platforms, and presented the same diagnosis for each of the nine patients. Although this showed that high-quality NGS-based diagnostics exist in The Netherlands, the interpretation and presentation of the data remain challenging. Additionally, some diagnoses were easier for some centers than others, requiring extensive consultation that could allow for the introduction of error.

In the final discussion, Kyu-Baek Hwang of Soongsil University, Seoul, South Korea, spoke on “A comparative analysis of whole-genome sequencing of an extended family using two different platforms.” A 17-member extended CEPH family (Utah pedigree 1463) was analyzed with WGS using the Complete Genomics standard service (CG) at 80× mean coverage and using an in-house Illumina HiSeq 2000 sequencer at 40× and CASAVA software. Between the two platforms, CG had the higher quality, but consensus calls had the highest quality, and both platforms produced platform-specific variants that were questionable. It was shown that de novo calls having a high degree of confidence are produced when multiple platforms or family-aware analysis are used. As sequencing using multiple platforms is usually impractical, the addition of family members to the analysis should be considered when trying to identify de novo variants. Additionally, more sophisticated calling tools using a machine learning technique can help filter out platform-specific variants from WGS results obtained using a single platform.
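The two filtering ideas in this talk, consensus between platforms and family-aware de novo calling, can be expressed in a few lines. The sketch below is a simplified illustration, not the analysis presented. Genotypes are assumed to be unphased strings such as "0/0" and "0/1" keyed by a (chromosome, position, reference, alternate) tuple, and a missing parental genotype is treated conservatively as not supporting a de novo call; both choices are assumptions.

```python
def candidate_de_novo(child, mother, father):
    """Return variant keys where the child carries an alternate allele and both parents are homozygous reference."""
    candidates = set()
    for variant, genotype in child.items():
        if genotype not in ("0/1", "1/1"):
            continue
        # A parent with no call at this site yields None here, so the site is skipped.
        if mother.get(variant) == "0/0" and father.get(variant) == "0/0":
            candidates.add(variant)
    return candidates


def consensus_calls(platform_a, platform_b):
    """Keep only the variant keys reported by both platforms."""
    return set(platform_a) & set(platform_b)
```

In a larger pedigree the same check can be extended to grandparents and siblings, which is one way additional family members help separate true de novo variants from platform-specific artifacts.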

Conclusions

NGS is moving increasingly rapidly into the clinical diagnostic laboratory, but several issues need further refinement. These include the ability to accurately call the functional variants responsible for observed changes in the phenotype, the accurate calling of CNVs, the proper nomenclature for larger variants including chromosomal abnormalities, and the databasing of disease-producing variants to aid the clinical community. This annual meeting of the HGVS presented some of the solutions to these problems, but several questions still need to be answered. Each year we get closer to the answers, but there always seems to be much work to do. We expect more answers to be reported at the next HGVS annual meeting.

Acknowledgments

This year's annual meeting was chaired by George Patrinos with cochairs Alistair Brown, Marc Greenblatt, and William Oetting. It was sponsored by Complete Genomics, Inc. The sessions were chaired by George Patrinos, William Oetting, and Mauno Vihinen. The authors would like to thank the speakers for their help in the preparation of this report.

Disclosure statement: The authors declare no conflict of interest.
