Human Mutation

Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia and The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
Division of Pediatric Gastroenterology, Hepatology, and Nutrition, Department of Pediatrics, Children’s Hospital of Philadelphia and The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
Division of Gastroenterology, Hepatology and Nutrition, Department of Pediatrics, Hospital for Sick Children and the University of Toronto, Toronto, Canada
Division of Human Genetics, Roberts Individualized Medical Genetics Center, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania
Department of Pediatrics, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania

Pathogenic variants in JAG1 are most commonly protein-truncating, including frameshift, nonsense, exon-level deletion, and splice site variants, though missense variants and whole gene deletions have also been reported (Crosnier et al., 1999; Warthen et al., 2006). The predominance of these protein-truncating variants, along with the observation that whole gene deletions and intragenic pathogenic variants cause similar phenotypes, supports a haploinsufficient disease mechanism (Oda et al., 1997; Saleh et al., 2016; Spinner et al., 2001). Early studies that aimed to determine whether the location of a pathogenic variant can predict the clinical manifestation of the disease did not support a genotype-phenotype correlation (Crosnier et al., 1999; Spinner et al., 2001).
Conversely, a high degree of variable expressivity has been observed, and significant phenotypic variability is often reported in families harboring the same pathogenic variant (Dhorne-Pollet, Deleuze, Hadchouel, & Bonaiti-Pellie, 1994; Elmslie et al., 1995; Emerick et al., 1999; Izumi et al., 2016; Kamath, Bason, Piccoli, Krantz, & Spinner, 2003; Kamath, Krantz, Spinner, Heubi, & Piccoli, 2002; Krantz et al., 1998; Shulman, Hyams, Gunta, Greenstein, & Cassidy, 1984). These observations have led to the hypothesis that a second gene could act as a modifier, and studies have been carried out to test this theory. It has been proposed that defects in glycosylation of the mature JAG1 and NOTCH2 proteins will result in mutant proteins that are improperly trafficked and not effectively expressed at the cell membrane. Lunatic Fringe, Radical Fringe, Manic Fringe, and POGLUT1 are all known glycosyltransferases that have been studied in this capacity, and the data support a role for these proteins in modifying the effects of pathogenic JAG1 variants (Ryan et al., 2008; Thakurdas et al., 2016). A second candidate genetic modifier, THROMBOSPONDIN2 (THBS2), was identified from a genome-wide association study (GWAS) that stratified ALGS patients with pathogenic variants in JAG1 by whether they had mild or severe liver disease (Tsai et al., 2016). THBS2 encodes an extracellular matrix protein that is expressed in murine bile ducts and can interact with Notch signaling. Data from the GWAS suggested that individuals with a pathogenic JAG1 variant and increased THBS2 expression could be at risk for developing more severe liver disease (Tsai et al., 2016).
The pathogenic mechanism of NOTCH2 variants has been far less clear than that of JAG1 variants. Fewer pathogenic NOTCH2 variants have been identified, and unlike with JAG1, these variants are predominantly missense (Kamath et al., 2012). It is possible that NOTCH2 is less tolerant than JAG1 to missense variants, resulting in functional haploinsufficiency; however, other mechanisms of pathogenesis may be in effect. The higher frequency of missense variants in NOTCH2 may also indicate that NOTCH2 is intolerant of more severe, loss-of-function variants. As with pathogenic JAG1 variants, genotype-phenotype correlations have not been noted with NOTCH2 variants, though very few patients with NOTCH2 variants have been described to date. However, it has been reported on a preliminary basis that the clinical presentation of individuals with pathogenic NOTCH2 variants differs from that of individuals with pathogenic JAG1 variants, with a lower prevalence of cardiac involvement, vertebral anomalies, and facial features (Kamath et al., 2012).
In 1992, before the discovery that pathogenic variants in JAG1 cause ALGS, our lab initiated a clinical study to identify the causal gene for ALGS. Since that time, we have enrolled 401 probands who are clinically consistent with ALGS, as well as numerous affected and unaffected relatives to test for inheritance. We and others have previously described 608 JAG1 variants and 16 NOTCH2 variants that are thought to cause disease (Fokkema et al., 2011; Landrum et al., 2018; Stenson et al., 2017). Here, we report an additional 86 novel JAG1 and three novel NOTCH2 pathogenic variants, and provide functional validation for nine previously uncharacterized JAG1 missense variants. Through this mutation update, we aim to combine our data of 27 years with previously published data of known pathogenic and likely pathogenic variants to provide up-to-date statistics on the frequency and type of JAG1 and NOTCH2 variants in ALGS. In addition, we will discuss mutation trends that we and others have observed in both the JAG1 and NOTCH2 genes as a resource for missense variant interpretation and classification.
Finally, we will end with our thoughts on how best to understand the small population of patients with clinically defined ALGS who do not have a pathogenic variant in JAG1 or NOTCH2 and are currently molecularly uncharacterized.

| Patient cohort
We studied 401 probands whose phenotypic features met the clinical definition of ALGS based on the presence of three out of five characteristic liver, heart, eye, vertebral, and/or facial phenotypes as previously described (Alagille et al., 1987;Emerick et al., 1999;Kamath et al., 2003). The majority of these probands were ascertained from the Liver Clinic at the Children's Hospital of Philadelphia (CHOP), therefore, enriching our patient population for liver disease and potentially for JAG1 pathogenic variants associated with cholestasis. We also include data from 111 affected family members.
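The inclusion rule described above (at least three of five characteristic feature categories) can be sketched in a few lines. This is only an illustration: the function name, the dictionary encoding, and the example proband are assumptions of the sketch, and real clinical assessment involves judgment that a boolean checklist does not capture.

```python
# Hypothetical helper: the names and the boolean encoding are assumptions
# of this sketch, not part of the study protocol.
ALGS_FEATURES = ("liver", "heart", "eye", "vertebral", "facial")


def meets_clinical_definition(findings, minimum=3):
    """Return True if at least `minimum` of the five feature categories are present."""
    present = [feature for feature in ALGS_FEATURES if findings.get(feature, False)]
    return len(present) >= minimum


# Made-up example proband with liver, heart, and facial findings.
proband = {"liver": True, "heart": True, "eye": False, "vertebral": False, "facial": True}
print(meets_clinical_definition(proband))
```

Categories missing from the dictionary are treated as absent, which is a conservative choice for incomplete records.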
Some of the patients in our cohort have been previously reported and are included here to provide a comprehensive summary of our clinical study, with prior reports referenced in all corresponding tables (Colliton et al., 2001; Heritage et al., 2000; Izumi et al., 2016; Kamath et al., 2003; Kamath et al., 2009; Kamath et al., 2012; Krantz et al., 1998; Laufer-Cahana et al., 2002; Li et al., 1997; Lin et al., 2012; McDaniell et al., 2006; Morrissette, Colliton, & Spinner, 2001; Oda et al., 1997; Warthen et al., 2006). Our cohort contains both probands and affected family members. All patients were enrolled into our study using a consent protocol approved by the Institutional Review Board at CHOP. All JAG1 variants described in our study can be retrieved from an existing Locus Specific Database (LSDB) at https://databases.lovd.nl/shared/genes/JAG1.

| Literature search
The majority of reported JAG1 and NOTCH2 variants are found in The Human Gene Mutation Database (HGMD® Professional 2019.1, last queried on May 3, 2019; Stenson et al., 2017). Variants were filtered to include only those that were reported to be disease-causing ("DM") and were associated with ALGS. Variants were also identified from ClinVar (last queried on May 3, 2019), and were filtered to include only those that were reported as "pathogenic" or "likely pathogenic" and listed "Alagille syndrome" as the associated condition (Landrum et al., 2018). A literature search was also performed on PubMed, with a last check on May 3, 2019. Finally, the Leiden Open Variation Database (LOVD V3.0) was last queried on May 3, 2019 for JAG1, and variants were filtered to include only those reported as "pathogenic" or "likely pathogenic" (Fokkema et al., 2011).
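The filtering rules above could be expressed roughly as follows. This is a minimal sketch under stated assumptions: the `VariantRecord` fields, the case-insensitive matching, and the placeholder variant labels are all hypothetical, and none of the databases' actual export formats or APIs are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantRecord:
    gene: str            # "JAG1" or "NOTCH2"
    variant: str         # variant label; the values below are placeholders
    source: str          # "HGMD", "ClinVar", or "LOVD"
    classification: str  # classification string as reported by the source
    condition: str       # associated condition, may be empty


# Accepted classifications per source, lower-cased for comparison.
ACCEPTED = {
    "hgmd": {"dm"},
    "clinvar": {"pathogenic", "likely pathogenic"},
    "lovd": {"pathogenic", "likely pathogenic"},
}


def passes_filter(record):
    """Apply the per-source inclusion rules described in the text."""
    source = record.source.casefold()
    if record.classification.casefold() not in ACCEPTED.get(source, set()):
        return False
    if source == "lovd":
        # The LOVD query described above filters on classification only.
        return True
    return "alagille" in record.condition.casefold()


def unique_variants(records):
    """Merge sources, keeping each (gene, variant) pair once."""
    return sorted({(r.gene, r.variant) for r in records if passes_filter(r)})


# Placeholder records for illustration; these are not real variants.
records = [
    VariantRecord("JAG1", "placeholder_A", "HGMD", "DM", "Alagille syndrome"),
    VariantRecord("JAG1", "placeholder_A", "ClinVar", "Pathogenic", "Alagille syndrome"),
    VariantRecord("JAG1", "placeholder_B", "ClinVar", "Uncertain significance", "Alagille syndrome"),
    VariantRecord("NOTCH2", "placeholder_C", "ClinVar", "Likely pathogenic", "Alagille syndrome"),
    VariantRecord("JAG1", "placeholder_D", "LOVD", "likely pathogenic", ""),
]
print(unique_variants(records))
```

Deduplicating on the (gene, variant) pair is one simple way to avoid counting a variant twice when it appears in more than one database; in practice, variant descriptions would first need to be normalized to a common nomenclature.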

| Mutation identification
Genomic DNA extracted from whole blood was screened first by polymerase chain reaction (PCR) and Sanger sequencing of all 26 exons of the JAG1 gene. Samples in which no pathogenic or likely pathogenic variant was identified were further screened by MLPA or single nucleotide polymorphism (SNP) array analysis of the JAG1 gene to identify copy number variants. If a sample was not found to have a pathogenic or likely pathogenic variant in JAG1 by either sequencing or copy number analysis, the sample was screened for pathogenic variants in the NOTCH2 gene by PCR and Sanger sequencing. Patients who were diagnosed as clinically consistent with ALGS, but in whom no pathogenic or likely pathogenic variant was identified by this three-tiered approach, were classified as mutation-negative. PCR-free whole genome sequencing (150 bp paired-end reads) at an average depth of 30× was performed using HiSeq X at the Center for Applied Genomics at the Children's Hospital of Philadelphia.
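The three-tiered screening logic above can be sketched as an ordered list of assays in which each later tier is consulted only when the earlier ones are negative. The function names, labels, and toy results below are assumptions made for illustration and do not represent laboratory software.

```python
from typing import Callable, Sequence, Tuple

# A tier is a label plus a function that reports whether the assay found
# a pathogenic or likely pathogenic variant for a given sample.
Tier = Tuple[str, Callable[[str], bool]]


def classify_sample(sample_id: str, tiers: Sequence[Tier]) -> str:
    """Run tiers in order and stop at the first positive assay."""
    for label, assay_is_positive in tiers:
        if assay_is_positive(sample_id):
            return label
    return "mutation-negative"


# Made-up assay results, used only to exercise the control flow.
toy_results = {
    "S1": {"jag1_seq": True},
    "S2": {"jag1_cnv": True},
    "S3": {"notch2_seq": True},
    "S4": {},
}


def lookup(assay: str) -> Callable[[str], bool]:
    """Build an assay function backed by the toy results table."""
    return lambda sample_id: toy_results[sample_id].get(assay, False)


TIERS = [
    ("JAG1 sequence variant", lookup("jag1_seq")),      # tier 1: PCR and Sanger sequencing
    ("JAG1 copy number variant", lookup("jag1_cnv")),   # tier 2: MLPA or SNP array
    ("NOTCH2 sequence variant", lookup("notch2_seq")),  # tier 3: NOTCH2 sequencing
]

for sample_id in toy_results:
    print(sample_id, classify_sample(sample_id, TIERS))
```

Passing the assays as functions keeps the evaluation lazy, mirroring the fact that a later test is ordered only after an earlier one comes back negative.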

| Mutant JAG1 constructs
Human JAG1 cDNA has previously been cloned into the pBABEpuro retroviral expression vector (Morrissette et al., 2001). Point mutations were introduced using the QuikChange Site-Directed Mutagenesis Kit (Stratagene, San Diego, CA) and resultant clones were sequenced for mutation verification. Stable cell lines were generated by infecting NIH-3T3 cells with these mutant retroviral vectors as previously described (Morrissette et al., 2001).
Endo H: 50 micrograms of protein obtained from NP40 lysis were treated with 1,500 units of Endo H (New England Biolabs, Ipswich, MA) at 37°C for 1 hr.

| Western blot analysis
Western blot analysis was performed according to standard protocols. JAG1 was detected using an antibody recognizing the C-terminal region (H-114, Santa Cruz Biotechnology, Inc., Dallas, TX) and an HRP-goat anti-rabbit secondary antibody (Amersham, Inc., Buckinghamshire, United Kingdom).

| Immunofluorescence
Stable cell lines were plated on culture slides and treated as previously described. A JAG1 antibody (H-114; Santa Cruz Biotechnology, Inc.) was used at a 1:40 dilution for immunodetection.
GILBERT ET AL.

| Luciferase assays
Luciferase assays were performed as previously described. Briefly, cells transfected with 199 ng of 4xCBF-Luc reporter construct (Hsieh et al., 1996) and 1 ng of an internal control SV40 Renilla construct (Promega, Madison, WI) were cocultured with stable cell lines expressing mutant JAG1.
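Reporter assays that include an internal Renilla control are commonly analyzed by dividing the reporter signal by the Renilla signal in each well and then expressing the result relative to a baseline condition. The sketch below illustrates that arithmetic only; the choice of baseline, the function names, and the readings are assumptions of this example and are not taken from the study.

```python
from statistics import mean


def normalized_activity(reporter_counts, renilla_counts):
    """Divide each reporter reading by the Renilla reading from the same well."""
    if len(reporter_counts) != len(renilla_counts):
        raise ValueError("each reporter reading needs a matching Renilla reading")
    return [reporter / renilla for reporter, renilla in zip(reporter_counts, renilla_counts)]


def fold_activation(sample_ratios, baseline_ratios):
    """Mean normalized activity of a sample relative to the baseline condition."""
    return mean(sample_ratios) / mean(baseline_ratios)


# Made-up luminometer readings for two replicate wells per condition.
baseline = normalized_activity([100.0, 120.0], [10.0, 10.0])   # hypothetical no-ligand coculture
wild_type = normalized_activity([400.0, 480.0], [10.0, 10.0])  # hypothetical wild type JAG1 coculture

print(fold_activation(wild_type, baseline))
```

Normalizing per well before averaging corrects for well-to-well differences in transfection efficiency and cell number, which is the purpose of the internal control.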

| VARIANTS IN JAG1
Frameshift variants are predominantly caused by deletions (51%) and duplications (40%), but are also caused by insertion-deletions (indels, 8%), and rarely by insertions (1%). Seventy-eight percent of nonsense variants are caused by single nucleotide substitutions. Stop gain variants account for 20% of nonsense variants, and occur through deletions (77%), indels (15%), and duplications (8%). A single start loss variant accounts for the remaining 2% of nonsense variants. The overwhelming majority of splice site variants are due to single nucleotide substitutions (83%), with the remaining 17% caused by deletions, duplications, or indels at or near the splice site. The incidence of each mutation type is relatively unchanged when our data set is combined with all reported pathogenic and likely pathogenic variants (totaling 694 unique variants), and has also remained relatively stable in the 27 years since pathogenic variants in JAG1 were first identified as the cause of ALGS, suggesting that these frequencies are an accurate indication of mutation-type prevalence in JAG1 for ALGS (Crosnier et al., 1999; Stenson et al., 2017; Warthen et al., 2006; Figure 2a).

| Large gene deletions
Large gene deletions differ in both length and in the location of their breakpoints, two findings that have previously been used to suggest that there is no specific genomic hotspot for rearrangement. It has been recognized that patients with 20p deletions can have other abnormalities, including developmental delay, hearing loss, and autism, among others, and work by Kamath et al. defined a 5.4 Mb region, including 12 genes, within which deletions led to ALGS-specific disease phenotypes (Kamath et al., 2009). They further showed that individuals with deletion variants extending distally or proximally from this region all presented with additional phenotypes.
Of the 44 deletions that we report here, we provide mapped breakpoints for 23 (52%), of which three have not previously been described. These three deletions include two that fall within the 5.4 Mb ALGS-specific region (257 and 861 Kb) and one that is larger (10.57 Mb). Clinical data from the two patients with the smaller deletions do not include phenotypes outside of ALGS; however, we only have records from infancy and cannot speculate whether additional conditions arose with age. Clinical data from the patient with the 10.57 Mb deletion include obesity and significant developmental delay.

| JAG1 missense variants
Missense variants were found throughout the entire extracellular region of the gene, with a statistically significant overrepresentation (p = .0002; unpaired, two-tailed t test) of missense variants clustering within the first 6 exons of the gene, an observation that has previously been reported (Masek & Andersson, 2017; Spinner et al., 2001; Figure S3). The statistical significance increases (p < .0001; unpaired, two-tailed t test) when reported pathogenic or likely pathogenic missense variants that are not present in our cohort are added to our data set (Figure 3). Overall, 15% of all JAG1 pathogenic or likely pathogenic variants (our cohort and previously reported variants, n = 104 of 694) are missense. About one fifth of these JAG1 missense variants involve the gain or loss of a cysteine within the EGF-like domain (n = 22 of 104, 21% of total reported and novel variants). The importance of cysteine in the proper folding of the EGF-like domain in ALGS as well as in other syndromes, including cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) and Marfan syndrome, has previously been described, and it is accepted that variants of this kind in this region are very likely disease-causing (Haritunians et al., 2005; Le Caignec et al., 2002; Schrijver, Liu, Brenn, Furthmayr, & Francke, 1999; Whiteman et al., 2007). To further understand cysteine changes in relation to disease, we plotted the frequency and distribution of cysteine changes observed in gnomAD compared to all cysteine changes reported in ALGS (including our cohort), and found that cysteine loss was more prevalent in the disease population whereas cysteine gain was overrepresented in the control population derived from gnomAD, suggesting a greater tolerance for cysteine gain in healthy individuals (Figure 4). We also [...] construct containing four tandem Notch-responsive CBF binding sites in the promoter region of the luciferase gene (Hsieh et al., 1996).
Here, we found that seven of the nine mutants were unable to increase luciferase activity (p.Cys78Ser, p.Cys92Tyr, p.Cys229Tyr, p.Cys271Arg, p.Cys438Phe, p.Cys902Ser, and p.Cys911Tyr) whereas two retained Notch signaling function (p.Cys693Tyr and p.Cys714Tyr; Figure 7). Variants have previously been described as "leaky," meaning that the proteins retain partial, albeit reduced, wild type function, and indeed we included the known pathogenic variant p.G274D as a positive control, which has been shown to have impaired signaling ability but only a partial loss in cellular localization/trafficking (Lu, Morrissette, & Spinner, 2003; Morrissette et al., 2001).
Efforts to identify whether variants that retain some partial protein function lead to milder or cardiac-specific clinical features have proven inconclusive, but suggest that there may be a threshold for JAG1 haploinsufficiency. It is also possible that there are innate cellular differences between the in vitro signaling assay and the in vivo environment of the developing liver. Vascular smooth muscle cells, which express NOTCH3 (Baeten & Lilly, 2017), are also likely to be a major source of JAG1 during biliary development, and it is possible that these mutations (

| VARIANTS IN NOTCH2
We identified nine unique NOTCH2 variants in 10 of 401 (2.5%) probands in our cohort. These variants are predominantly missense, but also include splice site and nonsense variants (Table 3; Figure S4).
Three of these pathogenic NOTCH2 variants have not previously been described, which brings the total number of known pathogenic NOTCH2 variants to 19, and we describe the clinical features of the individuals with these novel variants in Table S1. All three of the [...] (Zhou et al., 2016) that these observations are able to better guide missense variant interpretation in ALGS.
Functional characterization is necessary to conclusively classify missense variants, and our group and others have shown that many pathogenic missense variants result in improper protein folding, incorrect cellular localization, and/or a defect in Notch signaling activation (Guarnaccia, Dhir, Pintar, & Pongor, 2009; Lu et al., 2003; Morrissette et al., 2001; Tada, Itoh, Ishii-Watabe, Suzuki, & Kawasaki, 2012). However, these studies have also categorized variants that were thought to be disease-causing as benign, which highlights the need for functionally validating individual variants (Morrissette et al., 2001; Tada et al., 2012). Interestingly, while we and others had previously [...]

FIGURE 5 Cysteine-loss missense variants are defective in protein localization. Confocal microscopy of stably-transfected NIH-3T3 cells expressing the following controls: (a) wild type JAG1 and two positive controls with known nuclear retention and perinuclear localization, (b) p.G274D and (c) p.L37S (Lu et al., 2003; Morrissette et al., 2001) [...]

Kamath et al. (2012). Note: RefSeq NM_024408.3.

[...] 54% of reported pathogenic missense variants, of which we see five in our cohort. These two exons encode EGF-like domains (EGF repeats 9-12) of NOTCH2. The second hub occurs in exons 31 and 32, which accounts for 31% of reported pathogenic missense variants, of which we see four in our cohort. These two exons code for the Ankyrin (ANK) repeat domain of NOTCH2. A few of these missense variants have been studied to determine their functional consequence by assaying their ability to be activated by JAG1 using luciferase reporters, which confirmed pathogenicity in five out of six tested variants (Kamath et al., 2012).
Little else has been done to specifically interrogate NOTCH2 missense variants in the context of ALGS; however, a study in fruit flies found that a specific missense variant, V361M, located within an EGF-like domain was able to discriminate between ligands, such that it effectively abrogated the ability of Serrate (Jagged homolog) ligands to signal through NOTCH, whereas Delta (Delta-like homolog) ligands were able to signal normally, thus defining a domain that specifically affects Serrate binding (Yamamoto et al., 2012). Additional work in NOTCH1 has identified a minimal region of EGF repeats (EGF repeats 6-15) that is sufficient to fully activate signaling in an in vitro reporter assay (Andrawes et al., 2013), and this, combined with work by Yamamoto et al. (2012), supports a growing hypothesis that missense variants within this region are less tolerated and more likely to confer a functional consequence. It will be interesting to see if some of the identified missense variants in ALGS act similarly.

| DIAGNOSTIC RELEVANCE AND FUTURE PROSPECTS
Results from our comprehensive 27-year, single-center study provide updated statistics regarding the incidence of JAG1 (94.3%; n = 378 of 401), NOTCH2 (2.5%; n = 10 of 401), and mutation-negative cases (3.2%; n = 13 of 401) of ALGS. In addition, we report 86 novel JAG1 pathogenic variants and three novel NOTCH2 pathogenic variants. When combined with previously published data, we provide the most up-to-date data on the frequency of mutation types seen in patients with JAG1 or NOTCH2 pathogenic variants.
Successful screening of patients necessitates both sequencing and copy number analysis, which can be carried out by Sanger sequencing and MLPA, or by next generation sequencing (NGS) with copy number variation (CNV) analysis across the gene (Gilbert, 2018; Spinner, Leonard, & Krantz, 2013). The current standard is to sequence all exons in JAG1, which should identify approximately 85% of ALGS pathogenic variants. If CNV analysis is not carried out simultaneously with sequencing, second-tier diagnostics involves large deletion/duplication analysis through multiplex ligation-dependent probe amplification (MLPA), chromosomal microarray (CMA), or fluorescence in situ hybridization (FISH), which should identify an additional 9% of pathogenic variants. Samples without an identified JAG1 pathogenic variant would then undergo Sanger sequencing for NOTCH2, which should uncover an additional 2-3% of pathogenic variants.
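The tiered yields quoted above can be accumulated to show the expected diagnostic rate after each step. The following is a small arithmetic sketch; the tier labels and the representation of each yield as a (low, high) range are choices made for this illustration.

```python
# Approximate per-tier yields from the text, as (label, low %, high %).
TIER_YIELDS = [
    ("JAG1 exon sequencing", 85.0, 85.0),
    ("JAG1 deletion/duplication analysis", 9.0, 9.0),
    ("NOTCH2 sequencing", 2.0, 3.0),
]


def cumulative_yield(tier_yields):
    """Return the running (low, high) diagnostic yield after each tier."""
    low_total = high_total = 0.0
    running = []
    for label, low, high in tier_yields:
        low_total += low
        high_total += high
        running.append((label, low_total, high_total))
    return running


for label, low, high in cumulative_yield(TIER_YIELDS):
    print(f"after {label}: {low:.0f}-{high:.0f}%")
```

Under these approximate figures, roughly 3-4% of clinically diagnosed individuals would remain without a molecular diagnosis after all three tiers, which is in line with the 3.2% mutation-negative rate reported for this cohort.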
A notable finding from our study is the percentage (3.2%) of mutation-negative individuals that we describe. These individuals have all met the standards for clinical classification of ALGS, but do not have a pathogenic variant in JAG1 or NOTCH2. We hypothesize that these include patients with JAG1 variants not previously identified by conventional testing (Sanger sequencing and MLPA), as well as a subset of patients who will be found to have a different diagnosis with overlapping features of Alagille syndrome. The best approach towards a molecular understanding of this population is to apply more comprehensive sequencing methodologies, including whole exome sequencing (ES), whole genome sequencing (GS), and/or RNA sequencing (RNAseq). Using ES, we have previously identified compound heterozygous pathogenic variants in ATP8B1, a gene involved in progressive familial intrahepatic cholestasis type I (PFIC1), in a patient with overlapping features of ALGS and PFIC1 (Grochowski et al., 2015). Individuals with ABCB4 deficiency, which results in a variety of hepatic phenotypes including PFIC Type 3, have also been misdiagnosed as having ALGS (Schatz et al., 2018). Similarly, siblings with an initial diagnosis of ALGS were found to have a pathogenic variant in the NEK8 gene, which is commonly mutated in renal-hepatic-pancreatic dysplasia 2 (RHPD2) and in nephronophthisis (NPHP9), and this resulted in a reclassification of the disease to encompass a spectrum of disorders that involve NEK8 pathogenic variants rather than ALGS (Rajagopalan et al., 2016). These studies suggest that full evaluation of our 13 mutation-negative individuals, which has not yet been performed, may lead to disease reclassification.
Given the obvious molecular etiology of ALGS as a disease of Notch signaling dysfunction, we anticipate that regulatory regions within JAG1 or NOTCH2, or regions within those two genes that are missed by more traditional sequencing technologies, including ES, are the most likely candidates for novel molecular discovery. The advanced technology provided by GS is able to identify more complicated structural variants in JAG1, and indeed we describe here a partial gene deletion and an inversion detected by GS (Rajagopalan et al., in preparation). We are confident that a larger subset of mutation negative individuals with a clear clinical indication of ALGS will be definitively diagnosed as we screen this cohort.

| CONCLUSIONS
Overall, our decades-long study on ALGS has allowed us to accumulate comprehensive information on the types and frequencies of mutations in ALGS. We report an additional 86 JAG1 pathogenic variants and three NOTCH2 pathogenic variants, bringing the total number of described variants to 694 and 19, respectively (Stenson et al., 2017). We find that 94.3% of individuals with clinically diagnosed ALGS have a pathogenic variant in the JAG1 gene, 2.5% have a pathogenic variant in the NOTCH2 gene, and 3.2% are molecularly uncharacterized. We caution other researchers and clinicians on the functional relevance of missense variants, both in JAG1 and particularly in NOTCH2, where they predominate. Finally, we suggest that NGS strategies may best interrogate the small population of molecularly undiagnosed patients, and that these approaches should prioritize screening of JAG1, NOTCH2, and other Notch signaling genes and regulatory regions.