Clinical utility in infants with suspected monogenic conditions through next‐generation sequencing

Abstract Background Rare diseases are complex disorders with wide variability in clinical manifestations. The falling cost of next‐generation sequencing (NGS) in recent years has made these tests affordable. We evaluated the diagnostic yield and clinical utility of different NGS strategies for a range of monogenic disorders in a pediatric setting. Methods NGS tests were performed for 98 unrelated Chinese patients within their first year of life who were admitted to Xin Hua Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, during a 2‐year period. Results Clinical indications for NGS tests included a range of medical concerns. The mean age was 4.4 ± 4.2 months for infants undergoing analysis targeting specific (known) disease‐causing genes (TRS) and 4.4 ± 4.3 months for those undergoing whole‐exome sequencing (WES) (p > 0.05). A molecular diagnosis was made in 72 infants (73.47%), with a relatively high yield for phenotypes of metabolism/homeostasis abnormality (HP: 0001939) (odds ratio, 1.83; 95% CI, 0.56–6.04; p = 0.32) and a significantly low yield for atypical symptoms (without a definite HPO term) (odds ratio, 0.08; 95% CI, 0.01–0.73; p = 0.03). TRS analysis provided a higher molecular yield than WES (p = 0.01). Ninety‐eight different mutations were discovered in the 72 patients, 27 of which had not been reported previously. Nearly half of the diagnosed patients (43.06%, 31/72) carried one of 11 common disorders, mostly inborn errors of metabolism (IEM) and neurogenetic disorders, and all of them were identified through TRS analysis. Eight positive cases were identified through WES; all were sporadic, with highly variable phenotypes and severity. Twenty‐six patients had negative findings in this study.
Conclusion This study provides evidence that NGS can yield high success rates in a tertiary pediatric setting, and suggests that the scope of known Mendelian conditions may be considerably broader than currently recognized.


| INTRODUCTION
A rare disease is a health condition that affects a small number of people compared with other prevalent diseases in the general population (Baldovino, Moliner, Taruscio, Daina, & Roccatello, 2016). To date, between 5,000 and 8,000 distinct rare diseases have been documented, with new ones reported regularly in the medical literature (Taruscio, Floridia, Salvatore, Groft, & Gahl, 2017). Although each is individually rare, the total number of patients affected is large [e.g., 25-50 million in the United States (Fernandez-Marmiesse, Gouveia, & Couce, 2018), 27-36 million in the EU (Moliner & Waligora, 2017), and 16.8 million in China (Yang, Su, Lee, & Bai, 2015)]. Rare diseases are typically severe and mostly genetic in origin, and the majority of cases are reported in patients with very early onset (Luzzatto et al., 2015). Therefore, continuous efforts have been made to identify the causative mutations for these infantile-onset rare Mendelian diseases (Bacchelli & Williams, 2016), which is of great importance for patient management (Silibello et al., 2016) and family counseling (Babac, 2017).
Although traditional gene mapping approaches, such as Sanger sequencing (Botstein & Risch, 2003), linkage analysis (Teare & Santibanez Koref, 2014), and homozygosity mapping (Lander & Botstein, 1987), have led to great insights into Mendelian diseases over the past few decades, they are unable to detect all forms of variation in a single experiment. The rapid development of next-generation sequencing (NGS) was a turning point in our understanding of this type of disease, which requires a broad search for causal variants across a genetically heterogeneous spectrum within a short time (Shen, Lee, Shen, & Lin, 2015), especially in life-threatening or chronically debilitating cases. Today, different NGS techniques can be used for diagnostic purposes. The two commonly used tools are sequencing that targets specific (known) disease-causing genes (TRS), which assists with the molecular diagnosis of well-defined disorders caused by a group of genes (Deleye, Gansemans, De Coninck, Van Nieuwerburgh, & Deforce, 2018), and sequencing of the exons of every protein-coding gene (whole-exome sequencing, WES) for patients without an identified molecular cause (Al-Shamsi, Hertecant, Souid, & Al-Jasmi, 2016).
In the present work, we study 98 patients with the clinical suspicion of a rare Mendelian disease with infantile onset. The patients were referred for NGS testing to establish a definitive genetic diagnosis. We demonstrate the clinical utility of NGS techniques in a pediatric setting by systematically describing our patient cohort.

| Editorial policies and ethical considerations
We submitted our research proposal to the Ethics Committee of Xin Hua Hospital, affiliated with Shanghai Jiao Tong University School of Medicine. The study protocol and the application form were fully reviewed, and the committee certified that this study would not pose any risk to patients and is in accordance with the Declaration of Helsinki.

| Clinical samples
Our study included 98 unrelated Chinese pediatric patients aged 1 year or younger at the time of testing at Xin Hua Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, between January 2016 and December 2017. They were referred by medical specialists for either WES or TRS, and analysis and results disclosure had been completed for all of them. The patients in this cohort had diverse clinical features, which are summarized in Tables 1-3. Informed written consent was obtained from the patients' parents or legal guardians prior to the collection of 3 ml of peripheral blood from each patient.

| The targeting specific disease-causing genes (TRS) analysis and Sanger confirmation
A total of 12 disease-specific panels based on Targeted Exome Sequencing (TES) (designed by MyGenostics, Beijing, China) were applied to our cohort according to the patients' clinical features to capture the protein-coding regions of the targeted genes. Gene capture was performed with GenCap custom exome enrichment kits (MyGenostics, Beijing, China). The extracted DNA samples were quantified with a Nanodrop 2000 (Thermo Fisher Scientific, Wilmington, DE). A minimum of 3 μg of DNA from each patient was used to generate indexed libraries (average size of 350-450 bp, including adapter sequences) for Solexa HiSeq2000 sequencing (Illumina, San Diego, CA). Sequencing was carried out using 90 cycles per read. The mean exome coverage obtained was more than 98%, with variant accuracy of more than 99%. Clinically relevant variants from proband and parental samples (whenever available) were confirmed by Sanger sequencing.

| Whole-exome sequencing and Sanger confirmation
Whole-exome sequencing and its analysis protocols were developed and validated by MyGenostics, Beijing, China. Genomic DNA from patients was fragmented by sonication. The fragments were ligated to Illumina multiplexing paired-end adapters, amplified by polymerase chain reaction, and hybridized to biotin-labeled P039-Exome (at 65°C for 16 hr). Paired-end sequencing was performed on the Illumina NextSeq 500 platform, with an average sequencing depth of more than 100×. Meanwhile, coverage of the targeted bases for the N20 read was 95%. Following sequencing, raw image files were processed using Bcl2Fastq software (Bcl2Fastq 2.18.0.12, Illumina, Inc.) for base calling and raw data generation. Low-quality variations were filtered out by retaining only those with a quality score ≥20. Short Oligonucleotide Analysis Package (SOAP) aligner software (SOAP2.21; soap.genomics.org.cn/soapsnp.html) was then used to align and refresh reads to the reference human genome (hg19). Variants were prioritized on the basis of the phenotype-driven gene lists for each participant and their predicted effect. Clinically relevant variants from proband and parental samples (whenever available) were confirmed by Sanger sequencing.
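The quality filter described above can be illustrated with a minimal sketch. It assumes that the quality score is Phred-scaled (so that a score of 20 corresponds to an estimated error probability of 1 in 100); the text does not state the scale explicitly. The record type, field names, and example calls are hypothetical and do not reflect the MyGenostics pipeline.

```python
from dataclasses import dataclass

MIN_QUALITY = 20  # threshold stated in the text


@dataclass
class VariantCall:
    # Hypothetical record layout, for illustration only.
    gene: str
    change: str
    quality: float  # assumed to be Phred-scaled


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def filter_low_quality(calls, min_quality=MIN_QUALITY):
    """Keep only calls whose quality score is at least min_quality."""
    return [c for c in calls if c.quality >= min_quality]


# Made-up example calls; gene names are placeholders.
calls = [
    VariantCall("GENE_A", "c.100A>G", 35.0),
    VariantCall("GENE_B", "c.200C>T", 12.0),
    VariantCall("GENE_C", "c.300G>A", 20.0),
]
kept = filter_low_quality(calls)
print([c.gene for c in kept])
print(phred_to_error_probability(MIN_QUALITY))
```

Under the Phred assumption, a call at the threshold itself is retained, which matches the "≥20" wording.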

| Molecular diagnoses
In this study, sequence changes including rearrangements, stop codon-introducing (nonsense) variants, insertion/deletion (indel) variants, and splice site variants were regarded as null alleles (Lander & Botstein, 1987), abolishing production of the corresponding protein from the affected allele. Pathogenicity prediction tools (Nakken, Alseth, & Rognes, 2007), SIFT [bii.a-star.edu.sg] and PolyPhen-2 [genetics.bwh.harvard.edu], were used to evaluate the putative pathogenicity of novel (previously unreported) nonsynonymous coding variants. All findings were classified into three categories. Category I comprised patients with causative mutations, defined by consistent correspondence to the patient's phenotype, biochemical findings, familial (segregation) studies, or previously reported pathogenicity, following the American College of Medical Genetics and Genomics (ACMG) variant classification guidelines (Lander & Botstein, 1987). Category II comprised patients with variants that were consistent with the patient's phenotype and predicted to be deleterious but had not been reported previously. Patients with category I or II variants were considered positive (confirmed) cases. Category III included patients with variants that were inconsistent with the patient's phenotype or with biochemical/familial (segregation) study results, patients with no identified pathogenic variants, and patients with previously unreported variants that were predicted as either consistently nondamaging or inconsistent between the two prediction tools.
We used human phenotype ontology (HPO) terms (Shen et al., 2015) to classify each patient's primary disease as annotated from the clinical notes, which is essential for variant interpretation in a cohort characterized by clinically and genetically heterogeneous disorders.
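The three-category scheme can be summarized as a small decision rule. The sketch below is a deliberate simplification under stated assumptions: it reduces the evidence for each patient to a few boolean flags with hypothetical names, and it does not implement the ACMG guidelines themselves. The SIFT cut-off of 0.05 is the one quoted in the table notes of this paper; no PolyPhen-2 cut-off is encoded because none is given in full in the text.

```python
from dataclasses import dataclass

SIFT_DAMAGING_BELOW = 0.05  # cut-off quoted in the table notes


def sift_is_damaging(sift_score: float) -> bool:
    """SIFT call: scores below 0.05 are 'Damaging', otherwise 'Tolerated'."""
    return sift_score < SIFT_DAMAGING_BELOW


@dataclass
class Finding:
    # All field names are hypothetical, chosen for this illustration only.
    has_candidate_variant: bool    # a candidate variant was found at all
    consistent_with_patient: bool  # fits phenotype, biochemical, or segregation data
    previously_reported: bool      # variant already reported as pathogenic
    sift_damaging: bool
    polyphen_damaging: bool


def categorize(f: Finding) -> str:
    """Return 'I', 'II', or 'III' following the simplified rule in the text."""
    if not f.has_candidate_variant or not f.consistent_with_patient:
        return "III"
    if f.previously_reported:
        return "I"
    # Novel variant: both tools must agree that it is damaging.
    if f.sift_damaging and f.polyphen_damaging:
        return "II"
    return "III"


def is_positive(category: str) -> bool:
    """Categories I and II count as positive (confirmed) cases."""
    return category in ("I", "II")


novel = Finding(True, True, False, sift_is_damaging(0.01), True)
print(categorize(novel), is_positive(categorize(novel)))
```

A novel variant on which the two tools disagree falls through to category III, which mirrors the "inconsistent between the two prediction tools" wording.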

| Statistical analysis
A chi-squared test was applied to compare the diagnostic yields between the two groups of patients. The statistical calculations were performed using SPSS version 22.0.

| RESULTS
This work is a retrospective evaluation of the utility of an advanced clinical diagnostic tool in a tertiary pediatric center. We investigated the diagnostic yield of NGS in a cohort of 98 Chinese patients with suspected rare Mendelian disease of infantile onset. Their clinical and biochemical profiles were assessed prior to referral for NGS analysis.
The NGS methods consisted of TRS analysis (n = 81/98, 82.65%) and WES (n = 17/98, 17.35%), chosen according to the range of clinical concerns. There was no significant difference in the age of the patients at the time of testing between the two groups (p = 0.9678). The median turnaround time was 30.0 days for TRS analysis and 50.0 days for WES. The age at diagnosis of infants undergoing TRS analysis (mean ± SD: 4.4 ± 4.2 months) was not significantly different from that of infants undergoing WES (mean ± SD: 4.4 ± 4.3 months).
The NGS results of the 98 patients were divided into three groups according to the criteria described in the methods. Group A included 15 patients in Category II (Table 1), Group B included 57 patients in Category I (Table 2), and Group C included 26 patients in Category III (Table 3). A definitive genetic diagnosis was therefore achieved for 72 patients (73.47%, 72/98). TRS analysis provided a higher molecular yield (64 of 81 patients, 79.01%) than WES (8 of 17 patients, 47.06%) (OR for WES relative to TRS: 0.24; 95% CI, 0.08-0.70; p = 0.01, Fisher's exact test). All reported pathogenic and deleterious point mutations are listed in Tables 1 and 2.
Table notes:
If SIFTori is smaller than 0.05 (rank score >0.395), the corresponding nsSNV is predicted as "Damaging"; otherwise it is predicted as "Tolerated". Multiple predictions are separated by ";".
Polyphen2 prediction based on HumDiv: "D" ("probably damaging", HDIV score in [0
Patients had clinical features of more than two of the broad aforementioned HPO terms, or atypical symptoms, so that they were not given exact HPO terms for their primary phenotypes.
Variants in bold were unreported previously.
For variants with autosomal recessive inheritance, homozygous variants are in italics.
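The comparison of diagnostic yield between WES (8 of 17 diagnosed) and TRS (64 of 81 diagnosed) can be checked from these counts alone. The sketch below computes the odds ratio with a Woolf (log-based) 95% confidence interval and a two-sided Fisher's exact p-value using only the Python standard library. The choice of the Woolf interval is an assumption, since the text does not name the interval method; it does, however, reproduce the reported odds ratio and interval to two decimal places. The function names are illustrative.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a, b: diagnosed / undiagnosed in group 1
    c, d: diagnosed / undiagnosed in group 2
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum of all tables with fixed
    margins that are no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k):
        # Hypergeometric probability of k diagnosed patients in group 1.
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_observed = pmf(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(p for p in (pmf(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * tolerance)


# WES: 8 diagnosed, 9 undiagnosed; TRS: 64 diagnosed, 17 undiagnosed.
or_, lo, hi = odds_ratio_woolf(8, 9, 64, 17)
p_value = fisher_exact_two_sided(8, 9, 64, 17)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, p = {p_value:.3f}")
```

Placing WES in the first row gives an odds ratio below 1, which is the direction in which the reported value of 0.24 is expressed.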

| Cohort description
All patients were under 1 year of age at the time of NGS analysis (average age, 4.38 months), with 41 females (41.84%, 41/98) and 57 males (58.16%, 57/98). Eighteen of them were <1 month of age (18.37%, 18/98), while 38 were between 1 and 3 months of age (38.78%, 38/98). Thus, more than half of our patients developed symptoms within 3 months of age. Of this cohort, 23.47%, 22.45%, 8.16%, 8.16%, and 7.14% were patients with primary phenotypes defined by HPO terms related to abnormality of the nervous system (HP:0000707), abnormality of the metabolism/homeostasis (HP:0001939), abnormality of the immune system (HP:0002715), abnormality of the eye (HP:0000478), and abnormality of the integument (HP:0001574), respectively (Figure 1a, primary indication). A further 5.10% (5/98) had clinical features of more than two of the aforementioned broad HPO terms, or atypical symptoms, so that they were not given exact HPO terms for their primary phenotypes. For most patients, both parents' DNA was tested (Figure 1b, family members tested).

| Effect of clinical presentation on molecular diagnosis
One third of the 72 diagnosed individuals (33.33%, 24/72) had an atypical or unrecognized infantile presentation of a genetic disorder. Examples include a 3-month-old infant with seizures caused by a pathogenic ABCD1 (MIM 300371) variant, and a short-limbed neonate hospitalized for persistent hyperlactic acidemia due to a defect in COL2A1 (MIM 120140). Other examples of atypical presentations of known Mendelian disorders in infants include minicore myopathy with external ophthalmoplegia in an 8-month-old girl harboring RYR1 (MIM 180901) mutations, who showed intermittent poor feeding and diffuse muscle weakness, and a CHD7 (MIM 608892) mutation identified in a 20-day-old neonate who presented only with facial asymmetry, without heart defect, extremity abnormalities, or genital hypoplasia.
To assess whether specific clinical presentations were more likely to be associated with a molecular diagnosis, the diagnostic rate was compared among patients annotated with different phenotypes as represented by HPO terms. Analyses were performed at the top-level branching of HPO phenotypes to ensure adequate counts of participants (Table 5). Individuals with phenotypes in the HPO category "abnormality of metabolism/homeostasis" (HP: 0001939) had a higher diagnostic rate, although the difference was not significant (odds ratio, 1.83; 95% CI, 0.56-6.04; p = 0.32). In contrast, individuals with atypical symptoms (without a definite HPO term) had a significantly lower diagnostic rate (odds ratio, 0.08; 95% CI, 0.01-0.73; p = 0.03).

| Negative cases
Of the 26 infants who did not receive a diagnosis in this study (Table 3), four (15.38%, 4/26) had only one variant detected under a suspected compound heterozygous model; one received a partial diagnosis from a special panel, in which the variant c.817C>T (p.Q273X) in the ATP13A4 (MIM 609556) gene, predicted to be a null allele, explained several of the patient's clinical features (seizures and epilepsy); two (7.69%, 2/26) received dual and triple molecular diagnoses, respectively; five (19.23%, 5/26) had previously unreported findings that were predicted as either consistently nondamaging or inconsistent between the two tools; and in the remaining 14 (53.85%, 14/26), no pathogenic variants related to the patient's phenotype were identified in the analyzed genes.

| DISCUSSION
In applying NGS to the diagnosis of 98 unrelated patients in their first year of life at a single tertiary institution, we observed an overall molecular diagnostic yield of 73.47%, which is higher than the positive rates in published clinical NGS reports (Okazaki et al., 2016; Smith, Willig, & Kingsmore, 2015; Stark et al., 2016). This is likely due to differences between our study and others in the number of participants, the nature of their clinical problems, and selection bias in the choice of diagnostic tools (Al-Shamsi et al., 2016; Okazaki et al., 2016). Moreover, a significantly higher detection rate with TRS analysis was shown in this study (OR for WES relative to TRS: 0.24; 95% CI, 0.08-0.70; p = 0.01), as well as in previous studies (Coene et al., 2018; Ponzi et al., 2018). All 31 diagnosed infants with the 11 most common disorders in our cohort were identified through TRS analysis. Our high diagnostic yield demonstrates the importance of having distinct NGS strategies available for the genetic diagnosis of a myriad of monogenic disorders, as well as the effect of the disease spectrum itself on the outcomes. In our study, there were 22 patients with a primary indication of infantile-onset inborn errors of metabolism (IEM) (Rice & Steiner, 2016). For 18 of them, the reported pathogenic variants derived from the specific IEM panel were fully consistent with their clinical/biochemical (if available) features. For one patient with metabolic acidosis, recurrent hypoglycemia, poor feeding, and vomiting, the initial panel test did not identify any mutations, whereas WES gave a positive diagnosis of combined oxidative phosphorylation deficiency-23 (COXPD23, OMIM 616198) (Kopajtich et al., 2014), one of the common causes of inborn errors in energy metabolism. Among these 22 individuals, 20 underwent IEM panel testing and 2 underwent WES.
The results of this group indicate that abnormality of the metabolism/homeostasis accounts for a substantial proportion of the pediatric disease burden, that a number of IEM have nonspecific biomarkers, which makes their diagnosis challenging with traditional approaches, and that a TRS analysis covering an appropriate panel of genes has significant clinical utility for this group. Our results also illustrate that some variants not captured by one pipeline were detected by the other (Jacob et al., 2018; Mori et al., 2017).
In our study, we applied WES rather than TRS to 17 patients, mainly because the patients had nonspecific features and/or because a feasible TRS analysis was unavailable. The diagnosis was confirmed in eight of these patients. The definite diagnoses were minicore myopathy with external ophthalmoplegia (OMIM 255320), the Strudwick type of spondyloepimetaphyseal dysplasia (OMIM 184250), CHARGE syndrome (OMIM 214800), acrokeratosis verruciformis (OMIM 101900), obesity with impaired prohormone processing (OMIM 600955), combined oxidative phosphorylation deficiency-23 (OMIM 616198), Niemann-Pick disease type C1 (OMIM 257220), and pseudovaginal perineoscrotal hypospadias (OMIM 264600). These diagnoses were reached without prior knowledge of the genetic condition in the patients, since all cases were sporadic, with highly variable phenotypes and variable severity. Eleven patients developed their clinical manifestations during the neonatal period or early infancy (before 3 months of age), and 10 of them were critically ill babies in our NICU who required rapid, comprehensive genetic reporting for both prognostication and clinical decision making. Our results support the conclusion of Meng et al. (2017) that the atypical and unrecognized presentations of genetic disorders observed in some young infants further challenge the traditional paradigm of tiered genetic testing in critical care units, because the earlier the onset, the faster the progression and consequently the shorter the life span (Fitzgerald et al., 2015; Retterer et al., 2016). Since this work did not include a cost-effectiveness analysis of the various NGS tests compared with conventional tools in our patients, it is unknown whether NGS would increase or decrease the overall cost.
Also, since this work did not include management details and follow-up investigations of these patients, it is not yet known how much NGS testing could affect personalized treatment for each patient. We hope to address these questions in future research.
The negative results for 26 cases in our study could have various explanations. We applied WES to nine of these patients and various panels to the other 17, depending on our understanding of the function of various genes and the primary indication of each patient. In 14 individuals (53.85%, 14/26), no pathogenic variants related to their clinical phenotypes were identified. The main reasons might be that the causative gene was not included in the panel design, or that the genes encoding proteins involved in the alteration of a specific biochemical marker or clinical phenotype are currently unknown or not yet linked to human disease. Nine patients had a primary indication of abnormality of the nervous system; their highly heterogeneous phenotypes and puzzling paraclinical investigations might have confused the clinical orientation, leading to negative results. For five infants in this group, the variants were previously unreported and were predicted as either consistently nondamaging or inconsistent between the two in-silico tools. They were therefore classified as negative cases, which may signal a determination bias. It is therefore essential for clinicians to understand the strengths and limitations of every molecular test in order to choose the appropriate one for each patient (Meng et al., 2017). Also, functional studies should be performed to assess the impact of those variants of uncertain significance (VUS) on the corresponding genes (Bao et al., 2014).
Unusual combinations of signs, symptoms, and biochemical phenotypes can sometimes confuse even expert clinicians and geneticists. Therefore, HPO terms were used to classify the primary disorder of each patient in our cohort. Assessment of the effect of HPO phenotype on our diagnostic yield indicated a significantly lower success rate for patients with atypical clinical features (no exact HPO terms); this agrees with the conclusion of another study, in which a compound phenotype yielded a lower diagnostic rate than an isolated phenotype. On the other hand, the HPO analysis showed a higher diagnostic rate for the "abnormality of the metabolism/homeostasis" phenotype, although the difference was not significant, which might be due mainly to the sample size of our study. In another study, a higher diagnostic rate was associated with the "abnormality of the musculature" phenotype (Meng et al., 2017). Even though the diagnostic yield was low for patients with nonspecific or overlapping clinical phenotypes, the confirmed case of Prader-Willi syndrome is a good example of the value of NGS technology, because traditional methods for this disease have given limited results at high cost and after lengthy testing (Butler, 2017).

| CONCLUSION
In our study, NGS tools identified pathogenic mutations in 73.47% of our cases, demonstrating that they are informative in a tertiary clinical setting for Mendelian disorders. Moreover, our study shows that NGS is effective in identifying new variants in known diseases as well as in widening the spectrum of phenotypes resulting from deleterious variations in known genes. NGS is therefore likely to become a routine diagnostic test for many genetic conditions before long.