A discarded synonymous variant in NPHP3 explains nephronophthisis and congenital hepatic fibrosis in several families

Half of patients with a ciliopathy syndrome remain unsolved after initial analysis of whole exome sequencing (WES) data, highlighting the need for improved variant filtering and annotation. By candidate gene curation of WES data, combined with homozygosity mapping, we detected a homozygous predicted synonymous allele in NPHP3 in two children with hepatorenal fibrocystic disease from a consanguineous family. Analysis of patient-derived RNA shows activation of a cryptic mid-exon splice donor leading to a frameshift. Remarkably, the same rare variant was detected in four additional families with hepatorenal disease from UK, US, and Saudi patient cohorts. In addition, another synonymous NPHP3 variant was identified in an unsolved case from the Genomics England 100,000 Genomes data set. We conclude that synonymous NPHP3 variants, not reported before and discarded by pathogenicity pipelines, solved several families with a ciliopathy syndrome. These findings prompt careful reassessment of synonymous variants, especially if they are rare and located in candidate genes.

nephronophthisis, next generation sequencing, NPHP3, RNA splicing, synonymous variant

Whole exome sequencing (WES) has become an accessible and cost-effective way of investigating inherited human diseases in routine clinical practice (Bamshad et al., 2011; Cameron-Christie et al., 2019; Groopman et al., 2019). Limited to the protein coding regions of the genome (the exome), which account for only 2% of the genome, WES generates a reduced amount of data compared to whole genome sequencing (WGS), translating into decreased data storage costs as well as cheaper, quicker and easier data analysis.
Indeed, variant annotation is generally more accurate in the protein coding regions of the genome because functional consequences are more readily assessable. Nevertheless, data generated from next generation sequencing (NGS) requires extensive pipeline analysis and automated filtering of variants or pathogenicity ranking remains challenging. Despite well-established workflows and software in place to process raw data (Jalali Sefid Dashti & Gamieldien, 2017), the typical European WGS in the 1K genome project has about 10,000 non-synonymous (1% singletons) variants (Auton et al., 2015), that need to be filtered down to a subset of variants relevant to the patient phenotype and pattern of inheritance. Crucially these final steps of variant prioritisation require biological and biomedical reasoning, demanding input from scientists and clinicians in addition to bioinformatics approaches. Besides filtering strategies, mutation detection rate by WES vastly depends on the underlying phenotype and the likelihood of a genetic aetiology (Mann et al., 2019 (Kagan et al., 2017). Yet, in familial cases of suspected NPHP, that is, with high a priori likelihood of a genetic aetiology, WES detected a causative mutation in 63.3% of cases (Braun et al., 2016).
This suggests that a significant number of cases remain unsolved, either because a novel gene is involved, the pathogenic variant is outside the coding regions or, because the pathogenicity of the underlying genetic variant is not readily assessable and requires in-depth curation.
Here we report an Omani pedigree where the parents were first cousins (Figure 1a), and in which WES was carried out in two siblings OM-1 (II.1) and OM-2 (II.4) with clinical features of a hepatorenal ciliopathy syndrome. Both presented with small echogenic kidneys with multiple cysts (Figure 1b), suggestive of NPHP associated with congenital hepatic fibrosis and ultrasonographic signs of portal hypertension (Figure 1c and Figure S1). OM-1 had reached end stage kidney disease (ESKD) within 2 years of life (Table 1). Considering parental consanguinity, we applied initial standard WES filtering (using Qiagen Clinical Insight software) for rare (allele frequency [AF] less than 1% in any reference subpopulation) homozygous variants classified as pathogenic/likely pathogenic, truncating, predicted nontolerated SNV or leading to splice site loss (see supplementary materials). This analysis revealed only one homozygous variant shared by the two siblings, a variant of uncertain significance (VUS) in TSNARE1 (RefSeq NM_145003.5: c.761C>T; p.(Pro254Leu)), a gene implicated in synaptic vesicle exocytosis and polygenic risk for schizophrenia (Sleiman et al., 2013).
Of note, no variants were detected in genes known to cause hepatorenal ciliopathies. Given the parental consanguinity, we next mapped the regions of homozygosity shared between OM-1 and OM-2 against 17 genes known or potentially associated with autosomal recessive cystic kidney disease and congenital hepatic fibrosis (Table S1) (Halbritter et al., 2013; Vilboux et al., 2017). We detected several, often nonoverlapping regions of autozygosity in the two siblings, and only NPHP3 was located in the middle of a shared stretch of homozygosity (~3 Mb) (Figure S2). By removing all pathogenicity filters and specifically looking for rare (AF < 1%) homozygous variants in these candidate genes, we indeed identified one shared predicted synonymous VUS in NPHP3 (RefSeq NM_153240.5: c.2805C>T p.(Gly935Gly)) (Figure 1d), and recessive segregation of this variant was confirmed by Sanger sequencing (Figure 1a). The identified allele has not been reported before in cases of ciliopathy and is rare (gnomAD 2/251,374 alleles; no homozygous individuals).
In line with this rarity, we detected only two heterozygous adults without features of hepatorenal disease among 200,000 individuals in the UK Biobank.
Specialized in silico tools predict a possible impact on NPHP3 splicing (Table 1). Given its rarity, its location within a strong candidate gene NPHP3 in a shared region of homozygosity, the predicted effect on splicing and the absence of alternative genetic explanations, we first sought to look for additional patients with the identical allele.
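The candidate-gene step described above (dropping all pathogenicity filters while keeping the rarity, zygosity and gene constraints) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, which relied on Qiagen Clinical Insight. The `Variant` class, the function name, the two-gene candidate set (a stand-in for the 17 genes of Table S1), the TSNARE1 allele frequency and the third toy variant are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    label: str          # HGVS-style name or a free-text label
    consequence: str    # predicted consequence, e.g. "synonymous"
    max_af: float       # highest allele frequency in any reference subpopulation
    genotypes: dict     # sample ID -> "hom", "het" or "ref"


def shared_rare_homozygous(variants, samples, candidate_genes, max_af=0.01):
    """Return rare variants that are homozygous in every sample and lie in a
    candidate gene. The predicted consequence is deliberately ignored, so
    synonymous variants are retained."""
    return [
        v for v in variants
        if v.gene in candidate_genes
        and v.max_af < max_af
        and all(v.genotypes.get(s) == "hom" for s in samples)
    ]


siblings = ["OM-1", "OM-2"]
both_hom = {"OM-1": "hom", "OM-2": "hom"}

# Placeholder subset of the candidate list; the real list is in Table S1.
candidate_genes = {"NPHP3", "PKHD1"}

variants = [
    # Allele count taken from the text (gnomAD 2/251,374 alleles).
    Variant("NPHP3", "c.2805C>T", "synonymous", 2 / 251_374, both_hom),
    # The frequency of this variant is not given in the text; 0.001 is invented.
    Variant("TSNARE1", "c.761C>T", "missense", 0.001, both_hom),
    # Entirely hypothetical common variant, included to exercise the AF filter.
    Variant("NPHP3", "hypothetical_common", "synonymous", 0.20, both_hom),
]

hits = shared_rare_homozygous(variants, siblings, candidate_genes)
print([f"{v.gene} {v.label}" for v in hits])
```

In this toy input, only the NPHP3 c.2805C>T entry passes: the TSNARE1 variant lies outside the candidate list and the hypothetical common variant fails the allele frequency threshold.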
Searching the whole rare disease data set (73,988 genomes) from the Genomics England 100,000 Genomes Project, we identified two probands with the identical homozygous change c.2805C>T in NPHP3 (GEL-1 and GEL-2) and phenotypes suggestive of a multisystem ciliopathy with features of hepatorenal fibrocystic disease (Figure 1e,f and Table 1). Homozygosity plots for GEL-1 and GEL-2 detected a large homozygous region on chromosome 3, including NPHP3, in keeping with known parental consanguinity (Figure S3). In both cases, the unaffected mother is heterozygous for the NPHP3 allele (Figure S4A,B). Of note, both GEL-1 and GEL-2 are reported as genetically unsolved by the Genomics England analysis team, as the predicted synonymous NPHP3 allele had been filtered out from variant tiering tables. A further two cases with NPHP-like disease associated with congenital liver disease were identified in patient databases from clinical collaborators. The first was identified in a cohort of WES data (~4500 whole exomes) from Saudi Arabia (SA-1) (Figure 1g) and the second in a worldwide cohort of patients with inherited renal disease, including 800 patients with NPHP and related phenotypes (NP642) (Figure 1h and Table 1). In both consanguineous families, no alternative genetic diagnosis was detected. Notably, the involvement of NPHP3 in case NP642 is supported by a region of genome-wide homozygosity on chromosome 3 encompassing NPHP3 (Figure 1i).
The identified change c.2805C>T is located in the middle of NPHP3 exon 20, just before the C-terminal tetratricopeptide repeat domains (Figure 1j). In silico tools indicate a certain constraint for the cytosine at position 2805. Importantly, the substitution of this cytosine by thymine creates a GT motif that is predicted to act as a cryptic alternative splice donor (Table 1). We evaluated the effect of this variant using whole blood-derived RNA from the index Omani family. We performed reverse-transcription polymerase chain reaction (RT-PCR) with primers targeted to exons 19 and 21 and sequenced the amplicon. In both heterozygous and homozygous individuals (Figure 1k and Figure S5), sequencing confirmed the predicted alternative splicing event joining mid-exon 20 to exon 21. We also detected the canonically spliced RNA, indicating that the effect on splicing is incomplete, at least in blood. Altogether, the detected RNA sequence is compatible with the presence of both (i) canonically spliced mRNA leading to RNA substitution r.2805c>u and amino acid change p.Gly935Gly as well as (ii) alternatively spliced mRNA, out of phase (1 nucleotide from exon 20 and 2 nucleotides from exon 21) and leading to RNA deletion r.2804_2883del and frameshift p.Gly935GlyfsTer47 (Figure 1k). On gel electrophoresis, we confirmed the presence of a shorter NPHP3 transcript, compatible with the expected alternative splicing, in unaffected heterozygotes and in homozygous patients but not in unrelated controls (Figure S6). The ratio of short to full-length transcripts is higher in blood RNA from homozygotes, indicating a dosage effect on splicing.
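The reading-frame arithmetic behind this result can be checked with a short Python sketch. The coordinates come from the text (c.2805C>T, r.2804_2883del, p.Gly935). The codon sequences GGC and GGT are an inference rather than a quotation from the paper: every glycine codon starts with GG, and position 2805 is the third base of codon 935. The helper names are hypothetical.

```python
def codon_position(c_pos):
    """Map a 1-based coding position to (codon number, position in codon 1..3)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def deleted_length(start, end):
    """Number of nucleotides removed by a deletion spanning start..end inclusive."""
    return end - start + 1


# c.2805 is the third base of codon 935, so C>T leaves glycine unchanged.
ref_codon = "GGC"   # inferred: glycine codon with C in the third position
alt_codon = "GGT"   # inferred: still glycine after c.2805C>T

# The variant places GT at positions 2804-2805, the motif used as cryptic donor.
creates_gt = alt_codon[1:3] == "GT" and ref_codon[1:3] != "GT"

# r.2804_2883del removes 80 nucleotides; 80 is not a multiple of 3, so the
# reading frame downstream of the new junction is shifted.
n_deleted = deleted_length(2804, 2883)
is_frameshift = n_deleted % 3 != 0

# The last retained exon 20 base (2803) is the first base of codon 935, which
# matches "1 nucleotide from exon 20 and 2 nucleotides from exon 21".
print(codon_position(2803), codon_position(2805), n_deleted, is_frameshift, creates_gt)
```

The sketch only verifies internal consistency of the reported coordinates; it says nothing about how efficiently the cryptic donor is used in any tissue.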
Considering that a predicted synonymous variant discarded from several independent NGS pipelines and analyses provided a plausible genetic diagnosis in five families with this rare ciliopathy, we wondered whether filtering by rare predicted synonymous NPHP3 variants could solve additional cases in the 100,000 Genomes Project rare disease data set (73,988 genomes). We did not detect other patients with rare homozygous NPHP3 synonymous variants and matching phenotypes but, when filtering for monoallelic variants, we identified a genetically unsolved 34-year-old female (GEL-3) presenting with ESKD in childhood and compatible with an autosomal recessive inheritance. This patient was compound heterozygous for a pathogenic NPHP3 nonsense allele (RefSeq NM_153240.5: c.1729C>T p.(Arg577Ter)) and a rare (UK Biobank exomes: 1/399,454 alleles) predicted synonymous NPHP3 allele (RefSeq NM_153240.5: c.3129T>C p.(Tyr1043Tyr)). The latter variant is situated 4 nucleotides from an intron-exon boundary and in silico tools predicted a possible effect on splicing (Table S2 and Figure S7) but unfortunately no patient RNA was available to test this hypothesis. Again, the rare synonymous variant was filtered out from the Genomics England tiering tables.

FIGURE 1 (e, f) Additional families have been recruited within Genomics England 100,000 Genomes project, from a (g) Saudi Arabia cohort and (h) from a US cohort. Segregation of the NPHP3 variant c.2805C>T is indicated by mut/+ for heterozygous individuals and mut/mut for homozygous patients. (i) Genome-wide homozygosity plot for NP642. The genomic location of NPHP3 is indicated. Pedigrees were constructed and drawn using Progeny Free Online Pedigree Tool (Progeny Genetics LLC, Delray Beach, FL, www.progenygenetics.com). (j) NPHP3 RNA (RefSeq NM_153240.5) and exon structure with UTR in grey and annotated with domains. Variant c.2805C>T mid-exon 20 is indicated. The red line indicates the predicted premature stop codon at amino acid position 982 (see below). Protein visualization using ProteinPaint (Zhou et al., 2016) and domain annotation with SMART (Schultz et al., 1998). (k) Genomic map of NPHP3 exons 20 and 21 showing canonical splicing (i) and alternative splicing (ii) due to activation of a cryptic splice site mid-exon 20 in NPHP3 pre-mRNA harbouring c.2805C>T. Below is shown a Sanger sequence extract from RT-PCR (forward primer exon 19, reverse primer exon 21) performed on whole blood RNA from homozygous patient OM-1 (II.1). Sequence reveals presence of both canonically spliced (i) and alternatively spliced (ii) transcripts leading to shift in reading frame. The predicted amino acid sequence is above the nucleotide sequence. The full consequence of c.2805C>T is thus r.[2805c>u, 2804_2883del] p.[Gly935Gly, Gly935GlyfsTer47]. CHF, congenital hepatic fibrosis; mRNA, messenger RNA; NPHP, nephronophthisis; RT-PCR, reverse-transcription polymerase chain reaction; UTR, untranslated region.
Synonymous variants occur frequently as they typically elude evolutionary constraint and may often be filtered out of automated lists of pathogenic variants. This is especially the case when in silico prediction scores based on amino acid changes, such as SIFT and PolyPhen-2, classify them as benign alleles. Disease-causing predicted synonymous SNV have been reported in association with cystic kidney diseases including autosomal dominant polycystic kidney disease with variants identified in PKD1 and PKD2 (Claverie-Martin et al., 2015) and autosomal recessive polycystic kidney disease with variants identified in PKHD1 leading to aberrant splicing (Molinari et al., 2020). Finally, we have previously reported families with a clinical diagnosis of NPHP in whom a predicted synonymous SNV in NPHP3 was demonstrated to cause aberrant splicing, using RNA from urinary renal epithelial cells (Molinari et al., 2018). In these cases, pathogenic synonymous variants have been detected because of investigator-led manual curation of disease-associated genes.
Mutations in NPHP3 are among the more common causes of infantile NPHP (Halbritter et al., 2013; Tory et al., 2009) and may also cause the perinatal lethal Meckel syndrome (Bergmann et al., 2008). Aside from the renal features which are commonly classified as NPHP and cystic kidney disease, NPHP3 causes liver phenotypes, typically congenital hepatic fibrosis (Olbrich et al., 2003). Remarkably, the predicted synonymous variants identified in this study were able to provide a plausible genetic diagnosis in six families suffering from a rare ciliopathy. The most recent and among the larger patient cohorts described 13 families with biallelic NPHP3 variants, without evidence for recurrent homozygous changes as we detected here (Tang et al., 2020). Among pathogenic truncating variants, the majority are located downstream of exon 21, suggesting indeed that the premature termination reported here is disease-causing (Chaki et al., 2011; Tang et al., 2020). Furthermore, exons 20 and 21 are present in all main NPHP3 transcripts (Olbrich et al., 2003) and no alternative splice junctions affecting exons 20 and 21 were detected in human RNA sequencing data from different tissues (https://www.gtexportal.org/home/). If translated, C-terminal truncation affects the tetratricopeptide repeat domain with potential functional importance in protein-protein interactions (Olbrich et al., 2003).
Although we detected an allele-dosage effect on aberrant splicing, we still detected significant amounts of canonically spliced RNA (p.Gly935Gly) in homozygous patient blood. Considering the potential for cell-type and developmental stage differences in splicing, and in particular that splicing for NPHP3 markedly differed between whole blood and kidney (Molinari et al., 2018), we hypothesize that a predominant alternative splicing occurs in organs with disease manifestations. Unfortunately, we had no access to kidney or liver RNA and this constitutes a major limitation. Nevertheless, the demonstration that the identified variant indeed affects splicing strongly supports its pathogenicity.
The findings presented here and reported in the literature show that rare predicted silent genetic variants are routinely discarded by NGS tiering algorithms, which ignore the functional consequences such variants can have on pre-mRNA splicing. Some unexplained cases of ciliopathies may therefore not be solved until a more refined analysis of any impact on splicing is performed to determine pathogenicity more accurately. It is estimated that cryptic splice mutations, i.e., those outside of the essential GT and AG splice dinucleotides, are responsible for 10% of pathogenic mutations in patients with rare genetic disorders (Jaganathan et al., 2019) and that up to 62% of all pathogenic single-nucleotide variants disrupt RNA splicing (Wai et al., 2020).
There are a variety of tools that can be applied to this situation including SpliceAI (Jaganathan et al., 2019), MaxEntScan (Yeo & Burge, 2004) and Human Splicing Finder (Desmet et al., 2009;Moles-Fernández et al., 2018), with large variations in sensitivities and specificities for different types of splicing defects. A recent report from the UK Splicing and disease working group examined potential splicing-altering effects of 257 coding and noncoding VUS identified in a true-to-life clinical diagnostics context (mostly VUS in BRCA1/2 and FBN1) (Wai et al., 2020). Using RT-PCR or RNAseq on patient blood RNA, 33% of these VUS were associated with abnormal splicing. Interestingly, 13% of non-splice region variants (outside of the region defined as 3 exonic nucleotides and 8 intronic nucleotides from the exon-intron boundary) still significantly affected splicing.
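The splice-region definition quoted above (3 exonic and 8 intronic nucleotides from the exon-intron boundary) can be written as a small Python check. This is a sketch under an assumed counting convention, in which the nucleotide immediately adjacent to the boundary has distance 1 on either side; neither the function nor the constant names come from the cited report.

```python
EXONIC_WINDOW = 3     # exonic nucleotides counted as splice region (from the text)
INTRONIC_WINDOW = 8   # intronic nucleotides counted as splice region (from the text)


def in_splice_region(distance, exonic):
    """Return True if a variant lies within the splice region.

    distance: number of nucleotides from the exon-intron boundary, where the
              base adjacent to the boundary is 1 (assumed convention).
    exonic:   True if the variant is on the exonic side of the boundary.
    """
    if distance < 1:
        raise ValueError("distance must be a positive integer")
    limit = EXONIC_WINDOW if exonic else INTRONIC_WINDOW
    return distance <= limit


# An exonic variant 4 nucleotides from the boundary falls just outside the window.
print(in_splice_region(4, exonic=True))    # False
print(in_splice_region(3, exonic=True))    # True
print(in_splice_region(8, exonic=False))   # True
print(in_splice_region(9, exonic=False))   # False
```

Under this convention, an exonic variant described as lying 4 nucleotides from a boundary, as well as any mid-exon variant, would be classed as a non-splice region variant, which is the category in which 13% of tested variants still affected splicing.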
Despite the increasing accuracy of some bioinformatics prediction tools, all continue to show significant miscalling (Jaganathan et al., 2019;Wai et al., 2020). Therefore, one take home message from this study is that in silico tools should neither be relied on in isolation nor should pathogenic predictions represent a pre-requisite for seeking experimental evidence for altered splicing, especially for variants located outside of classical splice regions. The study concludes that experimental RNA analysis has the ability to produce clear results that help classify variant pathogenicity in relatively large cohorts and should be routinely considered to clarify VUS in unsolved genetic disease. However, further work and validation is needed to determine how best to incorporate experimental splicing analysis into clinical practice, how to report and standardize experimental evidence, the potential advantages of RNAseq over RT-PCR coupled with Sanger sequencing and how to translate insights on splicing from blood RNA to affected tissues (Wai et al., 2020).

ACKNOWLEDGMENTS
The authors would like to thank the affected individuals, their fa-

DATA AVAILABILITY STATEMENT
The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.
Further phenotypic or sequencing data are available from the corresponding author (JAS), upon reasonable request.