Future of whole genome sequencing

Authors

  • David J Amor

    Corresponding author
    1. Royal Children's Hospital, Murdoch Childrens Research Institute, Melbourne, Victoria, Australia
    2. Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
    • Correspondence: Associate Professor David Amor, Murdoch Childrens Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Vic. 3052, Australia. Fax: +61 3 8341 6390; email: david.amor@mcri.edu.au

  • Conflict of interest: None declared.

The era of genomic medicine, anticipated since the completion of the Human Genome Project in 2003, is now upon us. Before long, genomic information will be an integral part of the medical record, and paediatricians will be able to interrogate these data for a range of purposes: to make a diagnosis in a sick child, to diagnose or exclude a rare genetic disorder, to identify and anticipate future health problems, or to increase the precision of medication prescribing. At the heart of genomic medicine is the use of genomic data, derived from the whole human genome, to better diagnose, predict and treat disease.[1] The availability of clinical-level sequencing and analysis of the whole genome is predicted to transform many aspects of paediatric medicine over the next 5 years and may ultimately become as routine as serum biochemistry.

The Technology behind Genome Sequencing

The remarkable increase in genetic testing capacity is due primarily to advances in genetic sequencing technology, from the ‘old’ Sanger sequencing to the ‘new’ next-generation sequencing (NGS, also called massively parallel sequencing). Sanger sequencing, sometimes referred to as ‘first-generation’ sequencing, was the cornerstone of DNA sequencing for more than 25 years but was always limited by low capacity and relatively high cost. Although the Human Genome Project used Sanger technology to sequence the entire human genome (at a cost of ∼$3 billion), clinical use of Sanger sequencing is usually restricted to the testing of single genes, typically at a cost of ∼$1000 per gene.

The key advance of NGS is its ability to sequence short fragments of DNA on a massive scale.[2] For NGS, DNA is first fragmented into stretches of 50–500 nucleotides, and billions of these DNA fragments are then analysed in a single reaction. The sequence of each DNA fragment is then mapped back to, and compared with, the ‘reference human genome’. To maximise diagnostic accuracy, NGS testing aims to achieve ‘coverage’ of as much of the genome as possible and to sequence each part of the genome many times (so-called ‘sequencing depth’). Depending on the setup, Sanger sequencing allows the reading of hundreds of thousands to several million bases per machine each day, whereas NGS devices can analyse up to billions of bases in the same period.
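As a rough illustration of how sequencing depth relates to the amount of sequence generated, the sketch below estimates mean depth as (number of reads × read length) ÷ size of the region sequenced. The function name and the read count are illustrative assumptions rather than figures from any particular platform; the read length of 100 bases simply falls within the 50–500 nucleotide range mentioned above, and the genome size is rounded to 3 billion bases.

```python
def mean_depth(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of times each base in the target region is sequenced."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length / target_size


# Approximate size of the haploid human genome, in bases (rounded).
GENOME_SIZE = 3_000_000_000

# Hypothetical run: 900 million reads of 100 bases each (assumed values).
depth = mean_depth(n_reads=900_000_000, read_length=100, target_size=GENOME_SIZE)
print(f"Mean depth: {depth:.0f}x")  # prints "Mean depth: 30x"
```

A mean figure of this kind hides local variation: some regions will be covered many more times than the average and others poorly or not at all, which is why coverage and depth are usually reported together.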

Although NGS was originally used to study whole genomes (whole genome sequencing (WGS)), it is currently inefficient to sequence a whole genome when only a small fraction of the genome is required for analysis. This has led to the development of approaches that limit sequencing to defined regions of the genome. The most commonly used approach is to include an extra step, prior to sequencing, to enrich the DNA sample for regions of interest. Enrichment for all of the exonic regions (the part of the genome that codes for proteins, and corresponding to approximately 1–2% of the genome) is referred to as ‘exome capture’, and the resulting sequence is an ‘exome sequence’ (ES). A large proportion of human genetic disease is caused by mutations within exons, and hence, exome sequencing can be a powerful diagnostic test. Enrichment for only a subset of the exome (e.g. genes responsible for a particular disease) is referred to as a ‘targeted panel’.[3] Although targeted enrichment increases time and costs associated with sample preparation, it has the advantage of reducing the amount of sequencing required.

Once DNA sequence is generated, the data must be analysed, a process that requires high-level computational and data management skills coupled with clinical knowledge. After comparing the patient sample with the reference DNA sequence, the variants (alterations in the normal gene sequence that may or may not be associated with disease) are sorted or ‘filtered’ to determine which are of possible clinical relevance. Typically ∼20 000 variants are detected per ES and ∼3 million per WGS.[4, 5] A number of software tools are available to assist with variant filtering; nonetheless, data analysis is still the most time-consuming and expensive component of NGS. Following analysis, typically a number of candidate variants remain, and consultation with the clinical team may assist in deciding whether any of these is a plausible explanation for the patient's presentation.
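The filtering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any specific software tool: the `Variant` fields, the 1% population frequency threshold, the list of consequence types treated as potentially damaging and the gene names are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str                    # gene symbol (placeholder names used below)
    consequence: str             # predicted effect, e.g. "missense"
    population_frequency: float  # fraction of a reference population with the allele


# Assumed set of consequence types worth reviewing; real pipelines differ.
POTENTIALLY_DAMAGING = {"missense", "stop_gained", "frameshift", "splice_site"}


def filter_variants(variants, max_frequency=0.01, genes_of_interest=None):
    """Keep rare, potentially damaging variants, optionally within a gene list."""
    kept = []
    for v in variants:
        if v.population_frequency > max_frequency:
            continue  # common in the population, unlikely to cause a rare disorder
        if v.consequence not in POTENTIALLY_DAMAGING:
            continue  # e.g. synonymous changes
        if genes_of_interest is not None and v.gene not in genes_of_interest:
            continue  # outside the genes relevant to the clinical question
        kept.append(v)
    return kept


# Hypothetical input: three variants with placeholder gene names.
candidates = filter_variants([
    Variant("GENE_A", "missense", 0.0001),
    Variant("GENE_B", "synonymous", 0.0001),
    Variant("GENE_C", "stop_gained", 0.20),
])
print([v.gene for v in candidates])  # prints ['GENE_A']
```

Passing a `genes_of_interest` set restricts the analysis to genes relevant to the presentation, in the manner of a panel applied to existing data. Variants that survive such filters are still only candidates and require clinical interpretation.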

Translating NGS to the Clinic

To date, the main use of NGS has been in research to identify genes for rare disorders. Since the introduction of NGS into gene discovery research in 2009, the discovery of new genes has become commonplace. In 2013, new genes were identified at a rate of approximately three per week, and it is estimated that by 2020, genes for most of the ∼7000 single gene disorders will have been identified.[6]

In the last 2 years, NGS-based testing has started to move into the clinic. Although WGS can be used clinically, most NGS testing in diagnostic use is either ES or targeted panel-based testing. Targeted panels, which typically test for between 10 and 150 genes, are now commercially available for a range of clinical presentations including hereditary cancers, cardiac disorders, eye disorders, deafness, neuromuscular disorders, dementia and epilepsy.[3] Targeted panels are attractive because they offer high-quality coverage of genes of interest while minimising the risk of unwanted or incidental findings (see below); however, a disadvantage is that the panels need revision when new genes are identified, and multiple different panels are required to cover the spectrum of human genetic disease. In contrast, ES has the advantage of providing a single test that can then be analysed according to the clinical indication, but the disadvantage of providing inferior coverage of some important genes, along with an increased risk of unwanted findings.

For clinical use, WGS/ES is incorporated into a multi-step diagnostic pathway (Fig. 1). Prior to testing, the critical steps are clinical evaluation, selection of the most appropriate test, genetic counselling and consent, which includes discussion about the types of results that might be obtained (including results of uncertain significance), the reporting of incidental findings, the storage and use of genetic data, the limitations of WGS/ES, and the possibility of re-analysis of data at a later date.[7] Following testing, results must be interpreted back to the patient, and decisions must be made about future data storage and further analysis. Initially, genetic specialists (clinical geneticists and genetic counsellors) will lead this process, but eventually, many of these skills will need to be acquired by paediatricians, with the genetic specialists available for support when required.

Figure 1. A typical pipeline for clinical use of next-generation DNA sequencing.

Clinical Utility of Genome and Exome Sequencing

While the utility of WGS/ES for gene discovery was quickly evident, until recently, the clinical utility of WGS/ES was unknown. In paediatrics, the most obvious use of WGS/ES is for the diagnosis of children with intellectual disability. Despite the inroads made over the previous 5 years by chromosome microarray testing (which has improved the diagnostic yield in children with neurodevelopmental disorders from around 3% for microscope karyotyping to ∼15–20%),[8] there exists a vast backlog of children awaiting definitive diagnosis. Several studies have now been published on the diagnostic yield of WGS/ES in children with neurodevelopmental disorders. These indicate that a definite diagnosis is achievable in ∼25% of patients and a possible diagnosis in an additional ∼15%.[4, 9, 10] Importantly, the majority of mutations detected in studies of patients with intellectual disability have been either de novo (new, rather than inherited) autosomal dominant mutations or mutations on the X chromosome. In fact, one of the important findings to emerge from the NGS revolution is just how prevalent de novo mutations are, both in healthy individuals and as a cause of disease. It has been demonstrated that healthy individuals have ∼50–100 de novo sequence changes.[11] Fortunately, on average only approximately one of these de novo sequence changes occurs within the protein-coding part of the genome (the exome),[10] and the vast majority of such changes are without phenotypic effect. Detection of de novo variants is greatly facilitated by sequencing parent–child ‘trios’, which simplifies the analysis despite increased sequencing cost. In patients with a diagnosis of autism, a similar diagnostic yield has been obtained, with WGS detecting deleterious de novo mutations in 19% and inherited autosomal or X-linked alterations in an additional 31%.[12]
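The logic of trio analysis can be illustrated with a simplified sketch: a variant is flagged as a candidate de novo change when the child carries it and neither parent does. The data layout (a dictionary mapping a variant site to the number of alternate alleles), the function name and the example sites are assumptions made for illustration. In particular, the sketch treats a site missing from a parent's data as homozygous reference, which in practice is only safe when the parent was adequately covered at that position.

```python
def find_de_novo(child, mother, father):
    """Return sites where the child has an alternate allele seen in neither parent.

    Each argument maps a site, (chromosome, position, ref, alt), to the number
    of alternate alleles carried (0, 1 or 2). Sites absent from a parent's
    dictionary are ASSUMED to be homozygous reference (a simplification).
    """
    de_novo = []
    for site, child_alt_count in child.items():
        if child_alt_count == 0:
            continue  # child does not carry the variant
        if mother.get(site, 0) == 0 and father.get(site, 0) == 0:
            de_novo.append(site)
    return sorted(de_novo)


# Hypothetical trio with made-up coordinates.
child = {("1", 100, "A", "G"): 1, ("2", 200, "C", "T"): 1, ("X", 300, "G", "A"): 2}
mother = {("2", 200, "C", "T"): 1, ("X", 300, "G", "A"): 1}
father = {}

print(find_de_novo(child, mother, father))  # prints [('1', 100, 'A', 'G')]
```

Because only a handful of sites survive this comparison, the trio design sharply reduces the number of variants requiring manual review, at the cost of sequencing three individuals instead of one.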

When there is parental consanguinity, a different picture is emerging. In this setting, an autosomal recessive aetiology is far more likely, and analysis is facilitated by targeting homozygous regions (where both chromosomes have identical DNA sequence) across the genome. Exome sequencing studies of children with neurodevelopmental disorders whose parents are consanguineous have demonstrated a diagnostic yield of ∼15%, with potentially causative mutations detected in an additional ∼30% of patients.[13, 14]

Beyond neurodevelopmental medicine, WGS/ES has the potential to provide a diagnosis for a range of rare paediatric presentations where a genetic aetiology is suspected. In a demonstration of the power of ES, a 15-month-old child with severe inflammatory bowel disease, for whom all known diagnoses had been exhausted, was diagnosed by ES to have a mutation in the X-linked inhibitor of apoptosis gene (XIAP), previously associated with X-linked lymphoproliferative disease.[15] The genetic diagnosis was critical for patient management because it indicated the need for haematological stem cell transplant, which was completed successfully. The same laboratory published the results of the first 18 paediatric patients who underwent WGS at their clinic, demonstrating a definite diagnosis in five (28%) and a suspected diagnosis in four (22%) of the patients.[16] The utility of WGS has also been demonstrated in the neonatal intensive care unit, where WGS results have been obtained for critically ill neonates within 50 h.[17] Of four neonates studied prospectively, a definite diagnosis was made in two, a novel gene for heterotaxy was identified in a third, and in the fourth neonate, WGS data helped rule out some differential diagnoses.

Taken together, these results suggest that WGS/ES has the potential to provide a diagnosis in 25–50% of children with suspected genetic aetiology across a range of paediatric presentations. Although the benefits of improved diagnostic yield vary according to the clinical scenario, tangible benefits include providing families with an explanation for their child's disorder, availability of prenatal diagnosis (or restoration of reproductive confidence in the case of de novo mutations), better targeted management plans, avoidance of further unnecessary and costly diagnostic tests, and facilitating contact with other families with the same disorder.

There are now a number of laboratories providing WGS/ES as a clinical service, with costs ranging between $US4500 and $US10 000 depending on the extent of sequencing and data analysis provided.[7] One diagnostic laboratory recently published the results of their first 250 patients tested, 80% of whom were children with neurodevelopmental disorders.[18] A diagnosis was made in 25% of patients, and once again, most diagnoses were the result of de novo autosomal dominant mutations.

A more controversial role of WGS/ES is in patients who are healthy but where genomic data may predict future health problems and facilitate their prevention. In this setting, comprehensive genome analysis has the ability to simultaneously test for the presence of high-penetrance gene mutations (such as genes causing a high risk of breast cancer or Alzheimer disease), detect carrier status for autosomal recessive disorders (such as cystic fibrosis), provide a profile of the likely response of the individual to a range of drugs and provide lifetime risk estimates for a range of common disorders such as cancers, cardiovascular disease, diabetes, obesity, stroke and bipolar disorder.[5, 19, 20] To date, such testing has mainly been limited to research and to the worried (and wealthy) well; however, the use of WGS/ES as a preventative tool is expected to expand, potentially even to newborn screening.[21]

Challenges of Genome and Exome Sequencing

Although WGS/ES will deliver unquestionable health benefits for some individuals, its broader use also presents challenges. Central to these challenges is the very large number of variants that inevitably will be detected in every individual tested and the need to provide comprehensive clinical interpretation for these variants. The challenges of implementing clinical WGS are illustrated by a recent study in which participants underwent WGS.[5] Each participant had ∼3 million variants detected, ∼100 of which required a detailed analysis that took nearly 1 h per variant, that is, on the order of 100 h of expert review per genome. Ultimately between two and six clinically relevant findings were reported back to each participant. This highlights the fact that the cost of providing clinical WGS/ES is likely to remain high, even as sequencing costs fall, and that the lack of funding mechanisms for WGS/ES will remain a barrier to implementation.

A related challenge is how to obtain adequate consent from patients for such a broad range of test outcomes and how to deal with incidental or ‘secondary’ findings when they arise. Secondary findings are findings that are unrelated to the reason for doing the test but which have implications for the health of the individual being tested. What distinguishes secondary findings in genomic testing from incidental findings in other areas of medicine is that with genomic tests, secondary findings are very likely or even inevitable. Although secondary findings may be welcomed by many patients, others may be unwanted, particularly if they predict the onset of an incurable disease such as dementia. For this reason, it may be desirable to deliberately avoid analysis of certain genes, particularly when testing children. On the other hand, it has been argued that genetic predispositions to preventable diseases, such as certain cardiovascular diseases and inherited cancers, should be deliberately sought by the testing laboratories and reported back to the patient without reference to whether or not the patient wishes to receive this information.[22]

Other challenges, particularly when using WGS/ES in an untargeted manner, are the risk of false positive results and the inevitable detection of numerous variants of uncertain clinical significance.[23] It is also important to recognise that there are mutations that will be missed by WGS/ES, either because of inadequate coverage of the gene of interest or because some mutation types (e.g. the trinucleotide expansion of Fragile X) are not readily detectable by NGS technology.

The Future of Genome Sequencing

Inevitably, NGS diagnosis will transition into prenatal diagnosis, where the stakes are considerably higher, due to the complexities of interpreting NGS-detected variants, the compressed timeframe in which a result is required and potential decisions about termination of pregnancy. DNA samples for prenatal analysis can be obtained by traditional methods (chorionic villus sampling and amniocentesis), but fetal DNA can also be obtained from a maternal blood sample. NGS is already being used for ‘non-invasive’ prenatal testing to detect common chromosome abnormalities such as Down syndrome, and it is expected that, eventually, the whole fetal genome could be analysed in this way.[24]

In paediatrics, new technologies will continue to drive advances in genomic medicine and are expected to deliver tangible improvements in paediatric health care, initially in the form of improved diagnostics and later through improved prevention and treatment and personalised medicine.[1] While the benefits of WGS/ES for the diagnosis of rare genetic disorders have already been demonstrated, challenges remain in relation to funding, consent, counselling and how to deal with incidental findings. Clinical pathways for the use of these tests will continue to evolve, and paediatricians will need to understand the science, ethical issues and diagnostic scope of genomic testing; providing health professionals with adequate training in genomics will be a major challenge for the medical profession. Across a broader range of paediatric presentations, the benefit of genomic testing remains to be determined and will depend on the extent to which we can unravel the complex genomic interactions that contribute to common paediatric presentations.
