A novel cause of DKC1‐related bone marrow failure: Partial deletion of the 3′ untranslated region

Abstract Telomere biology disorders (TBDs), including dyskeratosis congenita (DC), are a group of rare inherited diseases characterized by very short telomeres. Mutations in the components of the enzyme telomerase can lead to insufficient telomere maintenance in hematopoietic stem cells, resulting in the bone marrow failure that is characteristic of these disorders. While an increasing number of genes are being linked to TBDs, the causative mutation remains unidentified in 30‐40% of patients with DC. There is therefore a need for whole genome sequencing (WGS) in these families to identify novel genes, or mutations in regulatory regions of known disease‐causing genes. Here we describe a family in which a partial deletion of the 3′ untranslated region (3′ UTR) of DKC1, encoding the protein dyskerin, was identified by WGS, despite being missed by whole exome sequencing. The deletion segregated with disease across the family and resulted in reduced levels of DKC1 mRNA in the proband. We demonstrate that the DKC1 3′ UTR contains two polyadenylation signals, both of which were removed by this deletion, likely causing mRNA instability. Consistent with the major function of dyskerin in stabilization of the RNA subunit of telomerase, hTR, the level of hTR was also reduced in the proband, providing a molecular basis for his very short telomeres. This study demonstrates that the terminal region of the 3′ UTR of the DKC1 gene is essential for gene function and illustrates the importance of analyzing regulatory regions of the genome for molecular diagnosis of inherited disease.


INTRODUCTION
Telomeres are nucleoprotein complexes capping the ends of chromosomes, composed of tracts of the repeated DNA sequence TTAGGG and sequence-specific binding proteins. In most human somatic cells, telomeres shorten with each cell division, due to the inability of the DNA replication machinery to replicate the ends of linear DNA molecules; telomere shortening in these cells is a mark of normal aging [1]. In stem cells, including those of the hematopoietic system, the enzyme telomerase counteracts telomere shortening [2]. Telomere biology disorders (TBDs) are a group of rare inherited diseases characterized by the presence of very short telomeres relative to the general population [3][4][5][6]. In these patients, who carry pathogenic variants in components of telomerase or other telomere-protective proteins, telomeres become abnormally short or dysfunctional, leading to activation of a DNA damage response and cellular senescence or apoptosis. This causes reduced replicative capacity of hematopoietic progenitor cells, leading to progressive bone marrow failure (BMF), a common feature of these disorders and the most common cause of mortality [7].
The first of these diseases linked to short telomeres was dyskeratosis congenita (DC) [4], classically defined by the triad of abnormal skin pigmentation, oral leukoplakia, and nail dystrophy. DC is, however, a multisystem disorder with clinical manifestations that include BMF, pulmonary fibrosis, liver cirrhosis, gastrointestinal symptoms, dental abnormalities, and predisposition to malignancies. BMF is seen in up to 80% of affected patients [7]. Other patients who are now being recognized as having an underlying TBD can present with isolated organ involvement, such as aplastic anemia or pulmonary fibrosis [7].
There are currently 16 genes associated with TBDs, which all play a role in telomerase biogenesis and function, telomere capping, or telomere replication [3,8,9]. While an increasing number of genes are being linked with TBDs using single gene analysis, gene panels, or whole exome sequencing (WES), the causative gene remains unidentified in ∼30-40% of DC patients [10]. There is therefore a need to analyze the genomes of such patients using whole genome sequencing (WGS), to identify novel TBD genes or pathogenic variants in the regulatory regions of known TBD genes.
The X chromosome gene DKC1, encoding the protein dyskerin, was the first gene in which pathogenic variants were identified in DC patients [11]. Dyskerin is an RNA-binding protein that specifically binds and stabilizes the H/ACA family of small RNAs, including the human telomerase RNA subunit (hTR, encoded by the gene TERC) [4]. Dyskerin is a pseudouridine synthase and catalyzes the conversion of uridine to pseudouridine at specific sites in ribosomal and spliceosomal RNAs [12]. Dyskerin is also an integral component of the telomerase complex [4,13,14]. The discovery that it is responsible for maintaining hTR levels and thereby regulating telomere length, was pivotal in the recognition of the role of telomere biology in the etiology of DC and related disorders [4].
Most patient-associated variants identified in DKC1 to date are missense changes, or small deletions or inversions in the protein-coding region [15,16], since more extensive deletions are likely to be incompatible with survival [17]. One exception is a family found to harbor a 2 kb deletion, removing the entire last exon of the gene, including the whole 3′ untranslated region (3′ UTR) [18]. Since the terminal 22 amino acids of the protein were also removed, it was unclear whether the protein truncation or loss of the 3′ UTR was responsible for disease.
Here, we describe a family with a much smaller deletion of the 3′ end of the DKC1 3′ UTR, that segregates with features of DC across the extended family. This variant was not identified through initial targeted sequencing of DC genes and WES, but was discovered in subsequent WGS analysis. We identified the locations of two polyadenylation signals in the 3′ UTR; this deletion removes both of them, and we demonstrate that this is sufficient to cause dramatic reductions in the levels of DKC1 mRNA and hTR, which likely leads to the short telomeres of the patient. Thus, we have found that loss of a small portion of the DKC1 3′ UTR is sufficient to cause DC. This illustrates the importance of examining the regulatory regions of known disease-causing genes by WGS in patients for whom a causative variant has yet to be identified.

Subjects
The male proband presented to The Children's Hospital at Westmead at 7 years of age with skin pigmentation, dysplastic nails, dysphagia, and celiac disease, and a family history suggestive of DC ( Figure 1,

Genome sequencing and bioinformatic analysis
WGS of peripheral blood genomic DNA from individuals III-3, IV-3, V-1, and V-2 ( Figure 1) was performed on an Illumina HiSeq X Ten platform with 150 bp paired-end reads, using TruSeq Nano library preparation with 350 bp inserts (Macrogen, Korea). Genome alignment and variant calling were performed by Macrogen using an ISAAC pipeline [19]. The

PCR across the deletion and Sanger sequencing
Predesigned PCR primers (Table S1, Figure 2B; ThermoFisher Scientific) included the forward primer of a pair in the DKC1 3′ UTR  Table S1; Sigma-Aldrich) by AGRF, Westmead, Australia. Sequencing data were visualized and analyzed in SnapGene.

Telomere length analysis (qPCR)
A previously described monochrome multiplex qPCR telomere length assay [20,21] was used, in view of its accuracy and sample throughput.
Briefly, each reaction was performed on a BioRad CFX384 Touch PCR detection system (BioRad) in 384-well PCR plates. Twenty nanograms of DNA was added to a master mix containing 300 nM each of telc and telg telomere PCR primers (Table S1), 350 nM each of albu and albd single-copy gene PCR primers and Rotor-Gene SYBR Green PCR Master Mix (Qiagen), made up to a total volume of 10 µL. Each patient sample was assayed in quadruplicate, and each batch of PCR reactions included four control DNA samples, also assayed in quadruplicate.
FIGURE 1 Pedigree of family with dyskeratosis congenita. Relevant phenotypes are shown across five generations (key: dyskeratosis congenita (DC); likely affected (DC); myeloma; lung disease). The proband and one maternal uncle have been diagnosed with DC, while several female family members show signs consistent with mild DC (skin pigmentation, dysplastic nails, and premature greying). The genotype at the location of a deletion in the DKC1 3′ UTR is shown in red (determined by PCR across the deletion [Figure 2D], and confirmed by Sanger sequencing in individuals IV-2 and V-2). ΔUTR indicates the presence of the deletion in Figure 2; + indicates a wild-type allele.
Telomere content was measured using the ΔΔCT method, using a reference DNA. The difference between CT alb and CT tel was calculated for the reference sample and the control sample. Relative telomere length was then calculated using 2^(CT-sample − CT-reference), where each term is the CT alb − CT tel difference for that DNA. The coefficient of variation (CV) of both the telomere and single copy gene quadruplicate measurements was 1-2%, while the telomere length CV between batches was 3-5%. Values obtained from 240 healthy individuals show the normal percentiles for different age groups (Figure 3A).
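The relative telomere length calculation described here can be illustrated with a minimal Python sketch. It assumes that the exponent is the difference between the sample and reference values of (CT alb − CT tel), that replicate CT values are averaged before subtraction, and that both amplicons amplify with perfect efficiency (hence the base of 2). The function names and all CT values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def delta_ct(ct_alb, ct_tel):
    """Mean CT of the single-copy gene (albumin) minus mean CT of telomere."""
    return mean(ct_alb) - mean(ct_tel)


def relative_telomere_length(sample_alb, sample_tel, ref_alb, ref_tel):
    """2 ** (dCT_sample - dCT_reference); assumes 100% PCR efficiency."""
    return 2 ** (delta_ct(sample_alb, sample_tel) - delta_ct(ref_alb, ref_tel))


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100 * stdev(values) / mean(values)


# Hypothetical quadruplicate CT values (not data from the study).
ref_alb = [25.0, 25.0, 25.0, 25.0]
ref_tel = [15.0, 15.0, 15.0, 15.0]
sample_alb = [25.0, 25.0, 25.0, 25.0]
sample_tel = [16.0, 16.0, 16.0, 16.0]  # telomere signal appears one cycle later

rtl = relative_telomere_length(sample_alb, sample_tel, ref_alb, ref_tel)
print(f"relative telomere length: {rtl:.2f}")
```

In this toy case the sample's telomere amplicon crosses threshold one cycle later than the reference at the same albumin CT, so its telomere content is half that of the reference DNA.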

Telomere length analysis (Southern blot)
Telomere terminal restriction fragments were prepared by HinfI and RsaI digestion of genomic DNA, and 2 µg was loaded on a 1% (wt/vol) agarose gel in 0.5 × TBE. Pulsed-field gels were run at 6 V/cm for 14 hours at 14 °C, with an initial switch time of 1 second and a final switch time of 6 seconds. Gels were dried for 2 hours at 60 °C, denatured and hybridized overnight to a [γ-32P]-ATP-labeled (CCCTAA)4 oligonucleotide probe in Church and Gilbert hybridization buffer [23]. Gels were washed in 4 × SSC (0.06 M sodium citrate, 0.6 M NaCl, pH 7) and exposed to a PhosphorImager screen overnight prior to visualization using a Typhoon TRIO Imager (GE Healthcare Life Sciences). Mean telomere restriction fragment (TRF) lengths (the peak of the smear) were determined by comparison to size markers using ImageQuant TL (GE Healthcare Life Sciences).

Reverse transcription-PCR (RT-PCR) of DKC1 transcripts
RNA was isolated from whole blood from individuals IV-2, IV-3, V-1, and V-2 using PAXgene Blood RNA Tubes and a PAXgene Blood RNA Kit (PreAnalytiX, Switzerland), following the manufacturer's directions. The 3′ end of DKC1 mRNA was amplified using the technique of 3′ RACE (Rapid Amplification of cDNA ends). Total RNA (80 ng) was reverse transcribed using oligo(dT)-based primer AP (Invitrogen; Table S1) and Superscript IV reverse transcriptase (Invitrogen), following the manufacturer's directions. A portion (10%) of the resulting cDNA was amplified by PCR using 0.2 µM primers A-Fwd and AUAP (Table S1) (Table S1), and products electrophoresed as above.

Quantitative RT-PCR analysis (RT-qPCR)
Total RNA (300 ng) was reverse transcribed as described above, using random hexamers or primer AP for DKC1 (Table S1)

Assay for detection of non-random X-inactivation
A modified version of the standard HUMARA assay was used to measure X-chromosome inactivation in female deletion carriers and wild-type controls [24,25]. Peripheral blood DNA was digested with restriction enzyme DdeI to improve accessibility of methylated regions to PCR, in the presence or absence of methylation-sensitive enzyme at 75 W on a 6% (wt/vol) acrylamide/8 M urea sequencing-style gel in TBE and exposed to a PhosphorImager screen. Band intensities were quantitated using ImageQuant TL, and the proportion of each allele digested by HpaII (i.e. undermethylated and hence active) was calculated as described [24].
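As an illustration of how band intensities can be converted into an X-inactivation ratio, the Python sketch below applies a normalisation commonly used for HUMARA-type assays: the digested/undigested ratio of each allele is divided by the sum of the two ratios. This is an assumption made for illustration and is not necessarily the calculation of reference [24]; the function name and the intensity values are hypothetical.

```python
def inactive_fraction(undigested_1, digested_1, undigested_2, digested_2):
    """Fraction of cells in which allele 1 is on the inactive X.

    HpaII cuts only unmethylated (active) DNA, so the band remaining after
    digestion reflects the methylated, inactive copies of each allele.
    Dividing by the undigested band corrects for unequal amplification
    of the two alleles.
    """
    ratio_1 = digested_1 / undigested_1
    ratio_2 = digested_2 / undigested_2
    total = ratio_1 + ratio_2
    if total == 0:
        raise ValueError("no signal after digestion for either allele")
    return ratio_1 / total


# Hypothetical band intensities (arbitrary units, not data from the study).
skewed = inactive_fraction(100.0, 90.0, 100.0, 10.0)
balanced = inactive_fraction(100.0, 50.0, 120.0, 60.0)

# Each cell inactivates one X, so the active fraction of allele 1 is the
# complement of its inactive fraction.
print(f"skewed carrier: allele 1 inactive in {skewed:.0%} of cells")
print(f"balanced control: allele 1 inactive in {balanced:.0%} of cells")
```

Under this normalisation, random inactivation gives a value near 0.5, whereas values approaching 0 or 1 indicate skewing.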

Identification of deletion of part of DKC1 3′ UTR in a family with DC
In this study we performed genetic characterization of a large family presenting with DC (Figure 1). The male proband developed skin pigmentation, dysplastic nails, mild dysphagia, and celiac disease from 7 years of age. The full blood count was normal at diagnosis at  Table 1). A peripheral blood mononuclear cell sample from the proband showed an average telomere length of less than the 1st percentile by Flow-FISH (Figure S1). Together, these features and family history suggested a diagnosis of DC (OMIM #305000); the milder phenotype of the affected females suggested a possible X-linked inheritance.
Peripheral blood DNA from the proband was subjected to Sanger sequencing over the entire protein-coding region and intron-exon boundaries of the dyskerin (DKC1) gene (Centogene GmbH, Germany), and no variants were detected. He was retested using WES, and no significant variants in any of the known DC genes were detected. We therefore performed WGS on peripheral blood DNA from the proband DKC1 is an X-linked gene; the proband (V-2) is hemizygous for the deletion, whereas the WGS data showed that his mother, sister, and grandmother (IV-3, V-1 and III-3) are heterozygous. The boundaries of the deletion were confirmed by PCR of a region spanning the deletion followed by Sanger sequencing, in the proband and his unaffected father (IV-2) ( Figure 2C). The presence or absence of the deletion was determined by PCR in 12 members of the extended family ( Figure 2D).
All female family members displaying skin, nail, and hair symptoms (labeled in red in Figure 2D) are heterozygous for the deleted allele, whereas no asymptomatic individual carries the deletion. The 3′ UTR deletion in DKC1 therefore segregates perfectly with disease across this large family.

Carriers of the deletion have moderately short telomeres
To provide additional evidence for the link between the 3′ UTR DKC1 deletion and telomere-related disease, we measured telomere lengths of the 12 members of the extended family using quantitative PCR (Figure 3A). In agreement with the Flow-FISH result (Figure S1), telomere length in the proband was well below the 1st centile of the normal population (Figure 3A). Southern blot analysis of TRFs also showed that the shortest telomere fragments of the proband (V-2) were shorter than those of any of his relatives, despite his young age (Figure 3B). Heterozygous female carriers of the deletion (red, Figures 3A and 3B, Figure S1) had telomeres that were comparable to or slightly shorter than those of wild-type individuals of similar age by qPCR, TRF, and Flow-FISH analysis, consistent with their mild DC symptoms. Thus, telomere lengths of the extended family are consistent with the DKC1 deletion being causative of disease.

The proband expresses very low levels of DKC1 mRNA
Functional analysis was then performed to determine the impact of the deletion on the function of dyskerin. The 3′ UTRs of genes are involved in many gene regulatory processes, including transcript polyadenylation and stability, translation efficiency, and microRNA binding [26].
Polyadenylation of mammalian mRNAs occurs 10-30 nt downstream of a conserved polyadenylation signal of sequence AAUAAA or AUUAAA [27]. Inspection of the DKC1 3′ sequence revealed two AUUAAA motifs, 132 nt and 41 nt upstream of the end of the transcript, respectively (Figure 2B). To determine whether either or both of these sequences constitute the poly(A) signal of DKC1, we examined a published single-cell transcriptomic dataset from human peripheral blood mononuclear cells [28], where the 3′-focused sequencing reads extended into non-templated poly(A) stretches that correspond to polyadenylation. For the DKC1 locus, we saw two peaks of such reads associated with each of the AUUAAA motifs (Figure 4). Thus, DKC1 is expressed in mononuclear cells as two 3′ UTR isoforms. The deleted allele of DKC1 is lacking both of the canonical poly(A) signals, and hence the stability of transcripts arising from this allele is likely to be compromised.
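The motif inspection described in this paragraph can be sketched in a few lines of Python. The example scans a 3′ terminal sequence for the two canonical signals and reports which of them overlap a deletion of the last N nucleotides. The sequence used is a made-up toy string, not the DKC1 3′ UTR, and the convention of measuring distance from the first base of the motif to the transcript end is an assumption; the function names are hypothetical.

```python
import re

POLY_A_SIGNALS = ("AAUAAA", "AUUAAA")


def find_poly_a_signals(seq):
    """Return (motif, start index, nt from motif start to the 3' end)."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for motif in POLY_A_SIGNALS:
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer(f"(?={motif})", rna):
            hits.append((motif, match.start(), len(rna) - match.start()))
    return sorted(hits, key=lambda hit: hit[1])


def signals_lost(seq, n_deleted):
    """Signals with at least one base inside the deleted 3' terminal region."""
    cutoff = len(seq) - n_deleted  # index of the first deleted base
    return [h for h in find_poly_a_signals(seq) if h[1] + len(h[0]) > cutoff]


# Toy 51-nt sequence with two AUUAAA motifs (hypothetical, not DKC1).
toy = "GGCC" + "AUUAAA" + "C" * 20 + "AUUAAA" + "G" * 15

hits = find_poly_a_signals(toy)
for motif, start, upstream in hits:
    print(f"{motif} at index {start}, {upstream} nt from the 3' end")

print(len(signals_lost(toy, 22)), "signal(s) lost with a 22-nt deletion")
print(len(signals_lost(toy, 48)), "signal(s) lost with a 48-nt deletion")
```

A terminal deletion that reaches upstream of the proximal motif removes both signals, which is the situation described for the deleted DKC1 allele.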
To determine whether polyadenylated DKC1 transcripts from the deleted allele were detectable, 3′ RACE (Rapid Amplification of cDNA ends) was performed using a tailed oligo(dT) primer for reverse transcription, followed by PCR with the tail primer and a primer within the DKC1 3′ UTR (A-Fwd; Figure 2B). RACE was performed on RNA isolated from peripheral blood of the proband, his parents, and sister, and products analyzed by gel electrophoresis (Figure 5A). Transcription of the wild-type or deleted alleles would result in products of 283 bp and 141 bp, respectively. A band consistent with the size of the wild-type allele was seen in all wild-type and heterozygous individuals and controls, along with an additional band of ∼180 bp consistent with a shorter transcript utilizing the internal poly(A) signal. No bands in this size range were detected in the RACE products from the proband (Figure 5A). To increase specificity and sensitivity of the PCR, the products were subjected to an additional round of PCR using a "nested" gene-specific primer (C-seq; Figure 2B). The same two bands were observed in heterozygous and wild-type individuals, and faint bands of different sizes were detected in the proband (Figure 5B). These data suggest that RNA transcribed from the deleted allele of DKC1 may be utilizing an alternative poly(A) signal, such as one present on the non-transcribed strand of the 3′ end of the neighboring MPP1 gene (Figure 2B) [18], and this results in an unstable transcript. reported that blood cells of most female carriers of X-linked DC show skewed X-chromosome inactivation, with ≥95% transcription arising from their WT allele [29-31]. We measured the degree of skewed X-inactivation in peripheral blood of three of the female carriers using a standard assay involving digestion with a methylation-sensitive restriction enzyme followed by PCR across a heterozygous region of the androgen receptor gene on the X-chromosome (Figure S3).
Carriers of the mutation showed substantially skewed inactivation of a single allele (89-96%), consistent with previous studies, whereas in the two wild-type females, the two alleles were approximately equally likely to be inactivated. The reduced DKC1 expression observed in the two female carriers analyzed here, relative to a WT male, is therefore likely to be only partly due to the reduced stability of the mutant transcript.
This is consistent with their almost-normal telomere lengths and very mild presentation of DC features.
Since one end of the deletion was very close to the 3′ UTR of the neighboring gene, MPP1 (Figure 2), we also measured levels of expression of MPP1 in the same four family members by RT-qPCR. MPP1 transcript levels varied between individuals and between different blood samples from each individual, but did not correlate with presence or absence of the DKC1 deletion (Figure S2B). We therefore conclude that the deletion in this family does not affect MPP1 expression.

The proband has low levels of telomerase RNA
One of the major functions of dyskerin is to stabilize the RNA subunit of telomerase (hTR) [4,32], so levels of hTR in peripheral blood RNA were determined by RT-qPCR in these four family members. Again, the proband had much lower hTR levels (∼20%) than his father; his mother and sister had intermediate hTR levels (Figure 5D).

DISCUSSION
Most patient-associated variants in DKC1 are missense changes, or small deletions or inversions in the coding region [15,16]. We describe here the first example of a family with DC who instead carries a deletion of a small region of the DKC1 3′ UTR. Multiple lines of evidence support the pathogenicity of this deletion: (a) the deletion segregates with disease and correlates with telomere length across a large family, (b) steady-state levels of DKC1 mRNA are greatly reduced in peripheral blood cells from the proband, and (c) the proband also has dramatically reduced levels of the telomerase RNA subunit, hTR. Past studies have shown that a 50-60% reduction of hTR levels in peripheral blood cells or fibroblasts from DKC1-mutated DC patients is sufficient to lead to telomere shortening and disease [33,34]; the 80% reduction in hTR observed in the patient in this study therefore provides a molecular explanation for his short telomeres.
Discovery of this deletion illustrates that the last 142 nt of the 822 nt DKC1 3′ UTR are essential for full stability of the DKC1 mRNA. This is most likely because this region contains both canonical polyadenylation signals for DKC1; polyadenylation is known to promote mRNA stability by providing a binding platform for proteins and protecting against exonucleolytic degradation [35]. Nevertheless, RT-PCR using an oligo(dT) primer indicated that some poly(A)-containing DKC1 transcript exists in the proband's blood cells, albeit at greatly reduced levels ( Figure 5B, Figure S2A). The size of this transcript is consistent with use of a cryptic polyadenylation signal in the antisense strand of the neighboring MPP1 gene, as has previously been demonstrated in a patient missing the whole DKC1 3′ UTR [18]. The existence of this cryptic polyadenylation signal is likely the only reason that a DKC1 3′ UTR deletion is compatible with survival in male patients, who are necessarily hemizygous for their DKC1 variant, since complete loss of dyskerin expression is known to be lethal in mice [17]. The identification of the two canonical polyadenylation signals in DKC1 may have implications for future therapy of patients with deletions in this region; with emerging gene editing technologies it might become possible to engineer a more effective polyadenylation signal upstream of the deletion, resulting in a higher level of DKC1 expression than is conferred by the existing cryptic polyadenylation signal.
Genomic DNA from the proband was initially analyzed by targeted sequencing of the protein-coding region of DKC1, as well as by WES, without this deletion being detected. WES has the potential to detect deletions or other variants in UTRs, since UTRs are within exons; however, the kits currently used for enriching for exonic regions of DNA vary in which regions of the genome they target and their ability to capture UTRs [36]. This DC family is therefore an excellent example of the importance of thorough analysis of non-coding regions of the genome using WGS or targeted gene sequencing for molecular diagnosis of inherited disease.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

DATA AVAILABILITY STATEMENT
All data that support the findings of this study are available from the corresponding author upon reasonable request.