Deep clinicopathological phenotyping identifies a previously unrecognized pathogenic EMD splice variant

Abstract
Exome sequencing (ES) has revolutionized rare disease management, yet only ~25%–30% of patients receive a molecular diagnosis. A limiting factor is the quality of available phenotypic data. Here, we describe how deep clinicopathological phenotyping yielded a molecular diagnosis for a 19‐year‐old proband with muscular dystrophy and negative clinical ES. Deep phenotypic analysis identified two critical data points: (1) the absence of emerin protein in muscle biopsy and (2) clinical features consistent with Emery‐Dreifuss muscular dystrophy. Sequencing data analysis uncovered an ultra‐rare, intronic variant in EMD, the gene encoding emerin. The variant, NM_000117.3: c.188‐6A>G, is predicted to impact splicing by in silico tools. This case thus illustrates how better integration of clinicopathologic data into ES analysis can enhance diagnostic yield with implications for clinical practice.


Introduction
Molecular diagnostic rates have significantly improved since the widespread implementation of exome sequencing (ES). 1,2 At present, the molecular diagnostic rate of ES obtained in clinical diagnostic laboratories is approximately 25%-30%, although higher or lower rates may be seen in certain disease states and patient populations. [3][4][5][6][7][8] Advances in ES data analyses, including copy number variation (CNV) assessment, 9,10 better detection of insertion/deletion (indel) variant alleles, 11 and homozygosity mapping using absence of heterozygosity (AOH) data as a surrogate measure of identity-by-descent (IBD), 10 have improved diagnostic rates. Nevertheless, the notion of a "diagnostic ceiling" has been proposed because of similar diagnostic rates observed in multiple disease cohorts 5,12 as well as known limitations of ES technology (e.g., poor coverage of noncoding regions, limited detection of structural variants and repeat expansions). 13 While both short-read whole-genome sequencing (SR-WGS) and long-read whole-genome sequencing (LR-WGS) technologies will likely improve molecular diagnostic rates by increasing coverage of noncoding regions, mounting evidence suggests the "molecular diagnostic gap" can be further narrowed by better integration of detailed phenotypic data into ES data analysis. [14][15][16] Here, we provide an illustrative example of how a detailed analysis of extant clinicopathologic data led to a molecular diagnosis in a patient with muscular dystrophy and negative clinical ES (cES) and reflect on its heuristic implications for clinical practice.

Participants
All participants in this study provided informed consent as part of the Baylor-Hopkins Center for Mendelian Genomics (BHCMG) initiative, including consent to publish photographs. This study was approved through Baylor College of Medicine Institutional Review Board (IRB) protocol H-29697.

Histology, immunofluorescence and western blot analysis of muscle samples
A vastus lateralis muscle biopsy was obtained during the proband's clinical care. Hematoxylin and eosin (H&E) staining, immunostaining, and western blot analysis were performed by the Texas Children's Hospital Neuropathology and Molecular Neuropathology Laboratory (Houston, TX) by board-certified neuropathologists (CAM and AMA). For immunofluorescence, cryosections of skeletal muscle were stained using the nuclear stain 4′,6-diamidino-2-phenylindole (DAPI) and antibodies against emerin or lamin A/C. Western blots were stained with antibodies for emerin and alpha-sarcoglycan. Antibodies were obtained from Leica (emerin, cat. no. Emerin CE; alpha-sarcoglycan, cat. no. A-SARC-L-CE) or Abcam (lamin A/C, cat. no. AB5090).

Exome sequencing
Research trio ES of genomic DNA obtained from peripheral blood was performed in the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC). 1,17 Rare variant family-based exome analysis was performed as previously described. 1,17 Variants identified after computational parsing and filtering were confirmed experimentally, and their segregation was assessed, by orthogonal Sanger dideoxy sequencing.

Results
The proband is a 19-year-old male with muscular dystrophy. Since early childhood he had frequent falls, easy fatigability, joint stiffness, and motor difficulties. Weakness and stiffness gradually progressed with age. On last physical examination at 18 years of age he had bilateral ankle contractures, elbow contractures (right greater than left), and weakness of bilateral ankle dorsiflexion, biceps, and hand interossei (Fig. 1A and B). There was no family history of neuromuscular disease. His creatine kinase (CK) levels trended upward with time (588 U/L at 17 years old, normal <245 U/L), suggesting a dystrophic condition. Needle electromyography showed small amplitude, short duration, polyphasic motor unit action potentials consistent with a myopathic condition. Muscle biopsy at age 11 years demonstrated unremarkable hematoxylin and eosin staining but absent emerin staining by immunofluorescence and western blot (Fig. 1C-F). Trio cES was subsequently performed (Baylor Genetics, BG, Laboratories, Houston, TX) and failed to identify any variants in EMD or other myopathy genes.
The proband and his family were subsequently enrolled in a "molecularly undiagnosed" neuromuscular disease cohort in the BHCMG and underwent research ES. Analysis began with an extensive review of the proband's medical history, laboratory findings, electrodiagnostic studies, and muscle biopsy pathology. Because of the negative emerin staining of the muscle biopsy and phenotypic features consistent with Emery-Dreifuss muscular dystrophy, the proband's BAM file (a data file containing aligned sequencing reads in a format that facilitates visualization) was inspected, leading to the identification of a hemizygous variant in EMD intron 2 (Fig. 2). The variant, NM_000117.3: c.188-6A>G, is predicted to alter splicing by multiple in silico algorithms (SpliceAI acceptor gain score 0.99; MMSp acceptor score −2.996; Human Splice Finder, alteration of the wild-type acceptor site) [18][19][20][21] (Fig. 2B). The variant is absent from gnomAD v2.1.1. 22 Segregation analysis within the family demonstrated that the variant was maternally inherited and absent from the proband's unaffected brothers (Fig. 2C).
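
The segregation logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Member` class and `fits_x_linked_recessive` function are hypothetical names, the number of brothers is a placeholder (the text says only "brothers"), and the model assumes full penetrance in hemizygous males and unaffected heterozygous females.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    label: str
    sex: str          # "M" or "F"
    affected: bool
    alt_alleles: int  # copies of the variant allele on the X chromosome


def fits_x_linked_recessive(members):
    """Return True if every genotype/phenotype pair fits simple
    X-linked recessive segregation (full penetrance in males)."""
    for m in members:
        if m.sex == "M":
            # Males are hemizygous: 0 or 1 copies, and carriers are expected to be affected.
            if m.alt_alleles not in (0, 1) or m.affected != (m.alt_alleles == 1):
                return False
        elif m.sex == "F":
            # Simplifying assumption: females are affected only when homozygous.
            if m.alt_alleles not in (0, 1, 2) or m.affected != (m.alt_alleles == 2):
                return False
        else:
            raise ValueError(f"unknown sex for {m.label!r}")
    return True


# Genotypes as reported in the text; two brothers is a placeholder count.
family = [
    Member("proband", "M", affected=True, alt_alleles=1),
    Member("mother", "F", affected=False, alt_alleles=1),
    Member("brother_1", "M", affected=False, alt_alleles=0),
    Member("brother_2", "M", affected=False, alt_alleles=0),
]
print(fits_x_linked_recessive(family))  # prints True
```

A check of this kind only tests consistency with the inheritance model. It does not establish pathogenicity, and it does not capture manifesting female carriers.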

Discussion
While diagnostic rates have improved since the advent of ES, many patients with presumed Mendelian disorders still lack a definitive molecular diagnosis. This diagnostic gap is often attributed to: (1) yet unidentified "disease-contributing genes" and variant alleles and (2) the limited ability of ES to detect non-coding and structural variants. 13 By providing a complete sequence of all genic and intergenic regions, whole-genome sequencing (WGS) has been regarded as a potential "panacea" and solution for the latter issue. However, WGS results to date have been underwhelming, with many additional diagnoses resulting from variants previously captured on ES and interim gene discoveries. 23 Future hopes for resolving the diagnostic gap include LR-WGS and RNAseq. [24][25][26][27] Although these approaches are promising, they will remain inaccessible in the clinic for the foreseeable future due to unavailability, high cost, and/or lack of appropriate tissue specimens.
An alternative hypothesis to explain the diagnostic gap is the under-utilization of extant ES data. For example, expansion from proband-only to trio ES improves diagnostic yield by permitting detection of de novo mutations and phasing (i.e., cis or trans configuration). 1,28 Copy number analysis of ES data, a practice not routinely performed by clinical diagnostic laboratories, may identify large deletions or duplications (>100 Kb) or even smaller homozygous exonic deletions. 9,10 Absence of heterozygosity (AOH) analysis, as a surrogate measure of runs-of-homozygosity (ROH), recognizes genomic intervals of identity-by-descent in families with or without a known history of consanguinity, which prompts a thorough investigation of those regions for a potentially causative homozygous variant. 1,28,29 Greater integration of clinicians and deep clinical phenotyping also improves diagnostic rates by enhancing variant prioritization and drawing increased scrutiny of extant single gene or gene families' ES data. [14][15][16] Deep phenotyping, the process of comprehensively assessing and categorizing individual phenotypic features often through Human Phenotype Ontology (HPO) terms, is routinely performed by medical geneticists and neurologists, yet the requisition forms for clinical genetic and genomic testing often fail to capture the depth of phenotyping performed by clinicians. 14,30 Finally, large amounts of "off-target" sequencing data, for example intronic and 3′/5′ untranslated regions, are generated by ES yet are often filtered out by cES bioinformatic pipelines despite increasing evidence of their significance in Mendelian disorders and improved in silico tools for evaluating their pathogenicity. 19,21,31
Here we provide an illustrative example of how the incorporation of deep phenotyping into ES analysis improves molecular diagnostic yield. The proband carried a clinical diagnosis of muscular dystrophy with supporting laboratory and electrophysiologic data. The absence of emerin protein in his muscle biopsy strongly supported the clinical diagnosis of X-linked Emery-Dreifuss muscular dystrophy 1 (MIM #310300), and retrospective review of his clinical presentation identified compatible features including childhood-onset joint contractures and slowly progressive muscle weakness. However, trio cES failed to identify pathogenic variants in EMD or other myopathy genes. Considering his clinical history and biopsy results, EMD sequencing data were reanalyzed, identifying a pathogenic hemizygous variant in EMD, c.188-6A>G. The variant results in the substitution of a guanine for an adenine six nucleotides from the intron 2-exon 3 boundary (Fig. 2B), which is predicted to create a new splice acceptor site by multiple in silico prediction tools. While the precise impact of the variant on splicing was not determined, the convergence of in silico algorithms predicting alteration of the splice acceptor site and the in vivo readout provided by western blot and immunofluorescence strongly suggests the mutant transcript either has a premature termination codon (PTC) resulting in nonsense-mediated decay or encodes an unstable protein subject to rapid decay.
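
To make the acceptor-gain reasoning concrete, the following Python sketch shows how a single A>G substitution at position -6 could, in principle, create a new AG dinucleotide upstream of the native acceptor and shift the reading frame. The sequence `HYPOTHETICAL_TAIL` is invented for illustration and is not the real EMD intron 2 sequence, and the function name is an assumption. The actual splicing outcome in the proband was not determined.

```python
def new_acceptor_candidates(intron_tail, offset, ref, alt):
    """List AG dinucleotides created by a substitution near an acceptor site.

    intron_tail: intronic bases ending at position -1 (the base before the exon).
    offset: negative HGVS-style intronic offset of the substituted base (e.g. -6).
    Returns one (offset_of_A, retained_intronic_bases, shifts_frame) tuple per new AG.
    """
    n = len(intron_tail)
    idx = n + offset  # string index of the substituted base
    if not (offset < 0 and 0 <= idx < n):
        raise ValueError("offset falls outside the supplied sequence")
    if intron_tail[idx] != ref:
        raise ValueError("reference base does not match the supplied sequence")

    mutant = intron_tail[:idx] + alt + intron_tail[idx + 1:]
    results = []
    for i in range(n - 1):
        # Keep only AG dinucleotides that are absent from the reference sequence.
        if mutant[i:i + 2] == "AG" and intron_tail[i:i + 2] != "AG":
            retained = n - (i + 2)  # intronic bases downstream of the new AG
            results.append((i - n, retained, retained % 3 != 0))
    return results


# Invented sequence: A at -7 and -6, native AG at -2/-1. NOT the real EMD intron.
HYPOTHETICAL_TAIL = "TTCTAATTCAG"
print(new_acceptor_candidates(HYPOTHETICAL_TAIL, -6, "A", "G"))
# prints [(-7, 5, True)]
```

In this toy case, use of the new AG would retain five intronic nucleotides in the transcript, which is not a multiple of three and would therefore shift the reading frame. An AG dinucleotide alone does not define a functional acceptor, so the dedicated splice prediction tools cited above remain the appropriate evidence.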
Identification of the variant had immediate clinical impact and management implications for the patient and his family. Gene therapy trials increasingly require a definitive genetic diagnosis for enrollment, and identification of a specific pathogenic intronic variant offers the opportunity for bespoke therapies like personalized antisense oligonucleotides (ASO). While personalized gene therapies may seem impractical, the recent story of milasen, an ASO customized and administered to a single patient with neuronal ceroid lipofuscinosis 7 (MIM #610951), has demonstrated their feasibility and provides a pathway forward for rare disease. 32 Additionally, these studies identified the carrier status of the proband's mother, a finding of considerable significance as female carriers can develop cardiac conduction defects and are at risk of sudden death. 33 Therefore, cardiology follow-up and screening of the extended family were recommended.
The importance of intronic variants such as EMD c.188-6A>G, which impact cis-acting elements, in human disease is well-recognized. [34][35][36] The proportion of human pathogenic variants disrupting cis-acting elements has been estimated at between 15% and 60%. 34,35 Prior to this report, only a single non-consensus splice variant, EMD: c.449+23_450-35del, was recognized. 37 Located within intron 5, the variant was detected on a neuromuscular gene panel and would have been well-covered in the BCM-HGSC ES platform (Fig. 2A). Studies of EMD constructs with variably sized intron 5 deletions demonstrated that the 23-nucleotide deletion does not impact the major branchpoint c.450-24A but rather causes splicing abnormalities due to excessive intronic shortening. 37 Such an intron size constraint mutational mechanism may disproportionately affect genes with small introns and remains underappreciated despite the fact that it was described over a decade ago. 38 Additional pathogenic intronic EMD variants will undoubtedly be identified with increased implementation of WGS and closer scrutiny of extant ES data. Further identification and study of pathogenic intronic variants through ES/GS, mini-gene assays, and RNA-seq will clarify the mechanisms involved in splicing and in turn improve in silico predictive models.
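
The intronic shortening described for the intron 5 deletion can be worked out from the HGVS notation alone. The Python sketch below parses an intronic deletion of the form c.449+23_450-35del and reports how many nucleotides remain on each side. The function name and the regular expression are illustrative assumptions that handle only this one pattern, not a general HGVS parser.

```python
import re

# Matches only deletions of the form c.<exon_end>+<a>_<exon_start>-<b>del.
_INTRONIC_DEL = re.compile(r"^c\.(\d+)\+(\d+)_(\d+)-(\d+)del$")


def retained_intron_flanks(hgvs):
    """Return (donor_side, acceptor_side) intronic nucleotides left after the deletion."""
    match = _INTRONIC_DEL.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported variant description: {hgvs!r}")
    exon_end, first_deleted, exon_start, last_deleted = map(int, match.groups())
    if exon_start != exon_end + 1:
        raise ValueError("breakpoints are not within a single intron")
    donor_side = first_deleted - 1     # positions +1 .. +(first_deleted - 1) remain
    acceptor_side = last_deleted - 1   # positions -(last_deleted - 1) .. -1 remain
    return donor_side, acceptor_side


donor_side, acceptor_side = retained_intron_flanks("c.449+23_450-35del")
print(donor_side, acceptor_side, donor_side + acceptor_side)
# prints 22 34 56
```

Under this reading, 56 intronic nucleotides remain after the deletion. Taken together with the 23-nucleotide deletion size stated above, that would imply an intron of roughly 79 nucleotides before the deletion. No minimum intron length is assumed here, because the text does not give one.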
In summary, this report illustrates how the integration of deep clinicopathological phenotypic data into ES analysis improves molecular diagnostic yield. Clinicians play a critical role in this process by providing accurate and detailed clinical data to clinical diagnostic laboratories and following up on all exome-negative studies. Additionally, an active dialogue between clinicians and laboratories is essential to maximize diagnostic yield.