RNA Sequencing of Urine‐Derived Cells for the Characterization and Diagnosis of Osteogenesis Imperfecta

DNA sequencing is a reliable tool for identifying genetic variants in osteogenesis imperfecta (OI) but cannot always establish pathogenicity, particularly in variants altering splicing. RNA sequencing can provide functional evidence of the effect of a variant on the transcript but requires cells expressing the relevant genes. Here, we used urine‐derived cells (UDC) to characterize genetic variants in patients with suspected or confirmed OI and provide evidence on the pathogenicity of variants of uncertain significance (VUS). Urine samples were obtained from 45 children and adolescents; UDC culture was successful in 40 of these participants (age range 4–20 years, 21 females), including 18 participants with OI or suspected OI who had a candidate variant or VUS on DNA sequencing. RNA was extracted from UDC and sequenced on an Illumina NextSeq550 device. Principal component analysis showed that the gene expression profiles of UDC and fibroblasts (based on Genotype Tissue Expression [GTEx] Consortium data) clustered close together and had less variability than those of whole blood cells. Transcript abundance was sufficient for analysis by RNA sequencing (defined as a median gene expression level of ≥10 transcripts per million) for 25 of the 32 bone fragility genes (78%) that were included in our diagnostic DNA sequencing panel. These results were similar to GTEx data for fibroblasts. Abnormal splicing was identified in 7 of the 8 participants with pathogenic or likely pathogenic variants in the splice region or deeper within the intron. Abnormal splicing was also observed in 2 VUS (COL1A1 c.2829+5G>A and COL1A2 c.693+6T>G), but no splice abnormality was observed in 3 other VUS. Abnormal deletions and duplications could also be observed in UDC transcripts. In conclusion, UDC are suitable for RNA transcript analysis in patients with suspected OI and can provide functional evidence for pathogenicity, in particular of variants affecting splicing. © 2023 The Authors. Journal of Bone and Mineral Research published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research (ASBMR).


Introduction
O steogenesis imperfecta (OI) is one of the most common rare bone disorders and is typically associated with fractures, short stature, blue or gray sclerae, joint hypermobility, and dentinogenesis imperfecta. (1) OI is most often caused by pathogenic variants in COL1A1 and COL1A2, but defects in more than 20 other genes have also been associated with an OI phenotype. (1) As genetic testing is becoming more widely available, the phenotypic spectrum of OI is expanding. It has emerged that a substantial proportion of children, adolescents, and younger adults with recurrent fractures who do not have the typical clinical features of OI nevertheless have mutations in one of these OI-associated genes. (2)(3)(4) Identification of a pathogenic variant is important for individuals with suspected genetic disorders, as it provides diagnostic clarification without the need for invasive investigations and helps understand inheritance risk and tailor screening and management. (5) Current genetic testing for rare disorders typically relies on sequence analysis of genomic DNA to assess targeted gene panels or the whole exome. Although this approach has greatly advanced our ability to diagnose heritable disorders, there are situations where the analysis of genomic DNA is not sufficient to conclusively establish a diagnosis. For example, variants affecting splicing can be missed if they do not affect splice consensus sites. Additionally, the consequences of variants in the splicing region are difficult to determine based on genomic DNA alone, and bioinformatic splice prediction tools are not yet reliable enough to accurately predict splice outcome. (6) Therefore, even if variants in or near the splicing region are detected on DNA sequencing, they are often classified as variants of uncertain significance (VUS) based on the American College of Medical Genetics and Genomics (ACMG) guidelines. (7) It is estimated that 10% to 50% of individuals with genetic conditions have variants affecting splicing, making this an important target for research in genetic testing. (8) When analysis of genomic DNA is inconclusive, RNA sequencing can sometimes help to identify pathogenic variants. (9) However, genetic variants can only be identified and characterized by RNA sequencing if the genes of interest are expressed at sufficiently high levels in the cells from which the RNA is extracted. (10) This can be problematic for the diagnosis of bone disorders, as the cells typically used for genetic testing (white blood cells) have a very different transcriptome to bone cells. (11) The most logical source of cells for diagnosing bone disorders, bone tissue, requires an invasive procedure. Fibroblasts have traditionally been used for the diagnosis of OI, but obtaining fibroblasts requires a skin biopsy, so the issue of invasive sampling remains.
Recently, urine-derived cells (UDC) have garnered attention as a source of cells to study genetic disorders because of their ease of isolation and expansion, multi-differentiation potential, and noninvasive sampling. (12) RNA sequencing of UDC has been used to successfully identify disease-causing genetic variants in a variety of heritable disorders, (6,13,14) but the use of these cells in bone disorders such as OI is yet to be explored.
In the present proof-of-concept study, we therefore obtained urine samples from individuals with suspected or confirmed OI, cultured UDCs, and performed RNA sequencing. The aim of the study was to assess the suitability of this approach for identifying and characterizing splice defects and determining the transcriptional effects of rare and unusual variants.

Subjects
Study participants were recruited at Shriners Hospitals for Children -Canada between May 2021 and May 2022. Patients aged 3 years to 21 years were eligible for inclusion if they were able to provide a clean-catch urine sample. Because the main purpose of the study was to assess whether UDC could be useful for the molecular diagnosis of OI, we recruited individuals who had suspected or confirmed OI and who had a candidate variant identified on DNA sequencing that was thought to affect splicing or represented an unusual type of variant. A total of 23 patients with OI were recruited. To characterize the range of UDC gene expression in more detail, an additional 22 individuals were recruited who were seen at our institution but did not have a diagnosis of OI. Thus, a total of 45 individuals participated in the study.
Informed consent was obtained from a parent/legal guardian of participants younger than 18 years and from the participant if aged 18 years or older. Ethics approval was obtained from the McGill University Institutional Review Board (IRB study no. A04-M10-21A).

DNA sequencing
Molecular diagnosis results based on DNA sequencing were obtained by chart review. These results had been obtained either by Sanger sequencing (Applied Biosystems [Carlsbad, CA, USA] 3500 Genetic Analyzer) or by next-generation sequencing. Next-generation sequencing was performed with an Ion Torrent PGM (Life Technologies, Carlsbad, CA, USA) or a NextSeq550 (Illumina, San Diego, CA, USA) device in the Molecular Diagnostics Laboratory at Shriners Hospitals for Children -Canada, which is accredited for clinical diagnostic testing. Next-generation sequencing targeted a "metabolic bone disorder" panel of genes that are associated with fracture disorders, mineralization defects (rickets) or osteopetrosis. The following 41 genes were included in the panel: Variants were classified as pathogenic, likely pathogenic, benign, likely benign, or a VUS based on the ACMG classification system. (7) We defined consensus splice sites and splice regions according to the definitions from The Sequence Ontology. (15) A consensus splice site was defined as the first 2 bases of an intron at an exon-intron junction. A splice region included the first 3 bases of the exon adjacent to an intron and bases 3-8 of the intron surrounding the splice junction.
Culture of urine-derived cells Urine samples (mean volume 92 mL, range 10-261 mL) were obtained via clean catch, stored on ice, and processed within 4 hours. Samples were centrifuged, washed, and cultured according to steps 1-16 of the protocol published by Zhou and colleagues (16) with the following modifications: (i) The aspirated supernatant was recentrifuged after step 4 to capture additional cells, and the cell suspension was distributed across multiple wells (rather than a single well) in a 12-well plate in samples with large numbers of cells or excess sediment. (ii) The following ingredients were added to the growth media: Y-27632 (rho-associated protein kinase inhibitor) at a final concentration of 10 μM (Stem Cell Technologies, Vancouver, Canada; catalog no. 72302), which was essential for establishing UDC growth, (17) and TGF-β1 at a final concentration of 1 ng/mL (PeproTech, Rocky Hill, NJ, USA; catalog no. 100-21C), which appeared to improve colony growth. (iii) Media changes were performed every 2 to 3 days (rather than daily) from culture day 4 onward.
The first cell passage was performed when the primary cell colonies covered approximately a quarter of the surface of the well. RNA extraction was performed at a median of 18 days (range 12-34) after urine sample collection.

RNA sequencing
Total RNA was obtained from UDCs by TRIzol extraction (Thermo Fisher Scientific, Waltham, MA, USA; #15596026). Total RNA was quantified using a NanoDrop Spectrophotometer ND-1000 (Thermo Fisher Scientific), RNA quality was assessed on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Study samples had an RNA concentration between 100 ng/μL and 300 ng/μL, and all samples had an RNA integrity number of 10.0.
Libraries were generated from 100 ng of total RNA as follows: mRNA was enriched by poly-A selection (NEB Next Poly(A) mRNA Magnetic Isolation Module, New England Biolabs, Ipswich, MA, USA; E7490) and depletion of ribosomal RNA (NEBNext Globin and rRNA Depletion Kit, New England Biolabs, E7755). Libraries were prepared using the NEBNext Ultra II Directional RNA Library Prep Kit (NEB, E7645). cDNA synthesis was achieved by the NEB-Next Ultra Directional RNA First Strand Synthesis and the Second Strand Synthesis Modules. The sample library indexing was prepared using NEBNext Multiplex Oligos for Illumina (NEB, E6440S). Libraries were quantified with a Qubit 2.0 Fluorimeter using Qubit dsDNA HS and BR Assay Kits (Thermo Fisher Scientific, Q32853). Library fragment average size was determined by Agilent DNA 1000 Kit (Agilent Technologies, 5067-1504). The libraries were normalized, pooled, and then diluted to a concentration of 2 nM. The pool was denatured by 0.2 N NaOH and neutralized by Tris-HCl (pH 7.0, Teknova, Mansfield, MA, USA; T2260). The final library was diluted to 1.5 pM and phiX library was added as an internal control at 1% level.
Sequencing was performed on an Illumina NextSeq550 device using a high-throughput flow cell with eight samples multiplexed (150-bp paired-end reads). Base calling was performed with RTAv3, and bcl2fastq2 v2.20 was used to demultiplex the samples and generate fastq reads. The fastq reads quality control was evaluated by FastQC (version 0.11.9) and aggregated results were generated by MultiQC (version 1.14). Alignment to the human hg19 reference genome was performed using the STAR aligner (version 2.7.9a). Read counts were determined using Samtools (version 1.16.1). StringTie (version 2.2.1) was used for quantification of gene expression, and gene abundance estimates were expressed as transcripts per million (TPM).
Integrative Genomics Viewer (IGV v2.8.2) was used to visually assess the RNA sequencing output, determine the number and percentage of reads containing a specific variant, and assess for consequences at the transcript level (such as addition or deletion of bases). The Sashimi plot function in IGV was used to assess splicing abnormalities and determine the number and percentage of reads using a particular splice junction.
To compare UDC transcriptomes with transcriptomes from whole blood cells and cultured skin fibroblasts, we obtained publicly available TPM values from reference gene expression samples (V8 data of the GTEx portal; https://www.gtexportal. org/home/datasets). TPM data from all 755 whole blood and 504 cultured fibroblast samples of the GTEx data set were included. PCA plots were generated using the PCAtools package in R (version 4.2.3). The lower 10% of genes based on variance were removed, as per the default settings of the software.

Participants
Among the 45 study participants, UDC culture of the first urine sample was unsuccessful in 9 individuals. In 4 of these 9 study participants, UDC could eventually be grown from a repeat sample. The remaining 5 individuals were not available for a second urine sampling. Thus, we were able to culture UDC and perform RNA sequencing in 40 study participants (21 females, 19 males; median age 12 years, range 4-20 years), including 18 individuals with OI or suspected OI. Demographics of the OI cohort are shown in Table 1.

Characteristics of UDC
The behavior of UDC corresponded to previously published descriptions, including their adherence to plastic in culture dishes, rapid expansion, and their capacity to grow after freezing and thawing. (16,19) We also confirmed two distinct primary colony morphologies, including rounded cobblestone-like cells and elongated spindle-shaped cells, as described by others. (16,19) To characterize UDC surface markers, flow cytometry was performed in UDC samples from 4 individuals with OI (Fig. 1A). This showed that UDC were strongly positive for several typical mesenchymal stem cell markers (CD24, CD29, CD44, and CD73). Results for CD90 and CD105 were variable, but none of the samples reached the threshold of 95% positive cells for these two markers that is considered characteristic of mesenchymal stem cells. (20) The hematopoietic stem cell markers CD34 and CD133 were consistently negative in all samples (data not shown). Overall, these results were similar to previous studies made in UDC. (18,21) RNA sequencing yielded a median of 121 million reads per sample (range 104-137 million). We compared the transcriptome profile of UDC to those of white blood cells and fibroblasts, two tissue types that are widely used in human RNA sequencing studies. Using principal component analysis (PCA) of the gene expression profiles, we compared UDC data from 40 participants of the present study to results of whole blood (n = 755) and cultured fibroblasts (n = 504) from the GTEx study. This revealed three distinct clusters (Fig. 1B). UDC and fibroblasts were clustering close together and had less variability than whole blood cells.

Quantitation of gene expression by RNA sequencing
Using transcript abundance estimates, we assessed whether the genes included in our 41-gene metabolic bone disorder panel were expressed at levels that allowed for identifying variants through RNA sequencing. As proposed by Murdock and colleagues, we considered genes with a TPM ≥10 as sufficiently expressed for RNA sequencing. (22) We observed that 25 of the 41 genes on our entire metabolic bone disorders panel (61%) had a median TPM across all samples of 10 or higher (Fig. 2). Among the subset of 32 genes that were associated with OI and other bone fragility disorders, a median TPM ≥10 was found in an average of 25 genes (78%, range [23][24][25][26]. Genes involved in collagen type I synthesis and processing (eg, COL1A1, COL1A2, CRTAP, FKBP10) were well expressed. Genes specific for osteoblasts and osteocytes (eg, IFITM5, SP7, PHEX, FGF23) were poorly or not expressed. We compared these gene expression results in UDC to the transcript abundance estimates that the GTEx consortium had observed for fibroblasts and white blood cells (Fig. 2). (23,24) UDC and fibroblasts had a similar proportion of genes with a TPM ≥10 among our 41 target genes. In contrast, white blood cells expressed only seven of the 41 genes at a sufficient level.

Transcript analysis-splicing abnormalities
Thirteen study participants had DNA variants in four different genes (COL1A1, COL1A2, FKBP10, and CRTAP) that were thought to potentially affect splicing (Table 1). According to ACMG criteria, 8 of these individuals had pathogenic or likely pathogenic variants (participant P1 to P8) and 5 individuals had VUS (P9 to P13). There were two pairs of study participants with identical variants.
RNA sequencing identified splice abnormalities in seven of the eight samples with pathogenic or likely pathogenic variants ( Table 1). Two of the five VUS also were associated with splice abnormalities (COL1A1 c.2829+5G>A and COL1A2 c.693+6T>G), whereas no effect on splicing was found for the remaining three VUS.
In the 5 individuals with splice variants in COL1A1 and a diagnosis of OI type I (P1 to P4, P9), the mis-splicing resulted mostly in frameshifts (Table 2; Fig. 3). In-frame variants, where present, represented less than 1% of reads. In contrast, the COL1A1 splice defect Fig. 2. Heat map of gene expression for UDC compared with GTEx data for human fibroblasts and whole blood cells for the 41 genes on our metabolic bone disorders panel. Transcripts per million (TPM) ≥10 is considered adequate expression for analysis. (22) Data for urine-derived cells (UDC) was obtained from RNA sequencing in the present study. Results for fibroblasts and whole blood cells were taken from data published by the GTEx Consortium. Genes are clustered by general disease type and pathogenic mechanism.
Journal of Bone and Mineral Research in participant P5 with OI type III was associated with 4% of reads harboring an in-frame nucleotide loss. The two COL1A2 splice variants in the present series (P6 and P12) both led to exon skipping and resulted in an in-frame deletion of nucleotides on the transcript level. The homozygous FKBP10 consensus splice site variant (P7) was associated with frameshifts and complete absence of wild-type splice product. The homozygous deep intronic variant in CRTAP (P8) led to inclusion of a 73 bp pseudoexon, and thereby a frameshift, in about half of the reads.
No abnormal splicing events were found in individual P10, who had recurrent fractures without additional clinical features of OI and who had a VUS in COL1A1 (c.1768-3del). Similarly, no abnormal COL1A1 splicing was detected in participant P11, who had typical clinical features of OI type I and a deeper intronic VUS in COL1A1 (c.1003-44_1003-33del). In participant P15, genomic DNA sequencing had shown two variants in COL1A2: a pathogenic glycine substitution (c.1684G>A, p.Gly562Ser) and a splice region variant (c.1719+3G>T). Both variants were de novo (not present in the unaffected parents). RNA was analyzed to determine if the splice region variant was also pathogenic, but no splicing abnormality was found in COL1A2 (Table 2).

Transcript analysis of rare variants
Individual P15 had a 26-base frameshift duplication in exon 4 of COL1A1 (c.336_361dup). This variant was found in less than 1% of transcript reads, suggesting efficient nonsense-mediated decay (NMD). In contrast, premature termination codons in the last and in the penultimate exon of a gene usually do not lead to the same degree of NMD. (25) This general rule was confirmed in two individuals of the present series. An 11-base frameshift deletion in the final exon of COL1A1 (c.4325_4335del; individual P16) was observed in $25% of transcript reads (Fig. 3E) and a stop mutation in the penultimate exon of COL1A1 (c.4051C>T, p.Gln1351Ter; individual P14) occurred in 15% of reads on the transcript level.
A large heterozygous deletion in COL1A1 spanning exons 26-49 (P17) demonstrated reduced coverage in the affected region of COL1A1 compared with the remainder of the gene. A frameshift insertion of a single nucleotide in exon 12 of LRP5 (c.2737dup; individual P18) was observed in 8% of reads on the transcript level.

Discussion
In this study, we found that UDC can be used as a source of material for RNA sequencing to diagnose OI and similar bone fragility disorders. The genes that are most commonly affected in OI, COL1A1 and COL1A2, and other genes related to collagen type I production are expressed at sufficiently high levels to detect transcript abnormalities by RNA sequencing. We used this approach to provide transcript-level functional evidence supporting pathogenicity (abnormal splicing) in two VUS. We also characterized splicing abnormalities caused by variants at the consensus splice site and deeper within the intron and transcript abnormalities resulting from deletions and duplications.
Data from GTEx indicate that OI-associated genes tend to be poorly expressed in white blood cells. (24) White blood cells are therefore not a suitable source of material for diagnosing OI by RNA sequencing. Our present observations suggest that most genes associated with OI are well expressed in UDC. Indeed, we found that UDCs can be used to analyze the transcripts of genes that are responsible for OI in 98% of individuals with a typical OI phenotype at our institution. (26)  The gene expression profile of fibroblasts is much less variable than of whole blood cells, which facilitates the detection of differences in gene expression by RNA sequencing in fibroblasts and makes fibroblast the preferred source of material for RNA sequencing studies. (22) In the present study, we observed that the gene expression profile was similar between UDC and fibroblasts, suggesting that UDC might also be useful for identifying disease-causing variants by RNA sequencing. Compared with fibroblasts, which require a skin biopsy, UDC can be obtained noninvasively from a clean-catch urine sample. By eliminating the necessity for an invasive procedure, the use of UDCs lowers the threshold for performing functional testing of variants that are classified as VUS after genomic DNA sequencing and thereby help to solve the VUS "conundrum." (27) The required time to grow UDC for RNA extraction was $3 weeks, which is comparable to skin fibroblasts. UDC can be frozen and thawed for later use, and it is possible to passage them multiple times. We found that UDC express several stem cell surface markers (CD24, CD29, CD44, and CD73). However, none of the UDC samples showed >95% positivity for CD90 and CD105, and thus our UDC did not meet minimal criteria required for the identification of mesenchymal stem cells. (20) DNA sequencing often results in VUS, in which case no definite molecular diagnosis can be made. The finding that a VUS in fact causes mis-splicing provides strong evidence for pathogenicity of the variant. This is what we observed for a VUS in COL1A1 (c.2829+5G>A) and a VUS in COL1A2 (COL1A2 c.693+6T>G), indicating that these variants are likely pathogenic. Conversely, intronic variants that do not have a detectable effect on the transcript appear less likely to be pathogenic. An example in the present study is the intronic COL1A1 deletion c.1003-44_1003-33del, which had previously been published as a variant that likely causes OI through an effect on splicing. (28) No such effect on splicing was found in the present study, raising doubts whether this variant is in fact causing OI.
Transcript analysis in UDC can also be useful to study genotype-phenotype correlations. For example, variants in COL1A2 are expected to cause OI only if they have a dominantnegative effect, given that simple heterozygous loss of function in COL1A2 does not cause an OI phenotype. (29,30) For COL1A2 splice variants to be responsible for an OI phenotype, they should therefore not lead to frameshifts and NMD, as is the case for many splice variants in COL1A1. (31) In accordance with this view, we observed that the two pathogenic splice variants in COL1A2 were associated with exon skipping and in-frame deletions.
A study on RNA testing in fibroblasts reported that individuals with the same genomic COL1A1 variant on the DNA level have similar splicing outcomes at the mRNA level. (31) Our examination of transcripts from participants with identical pathogenic variants in COL1A1 did show a small degree of variation in the abnormal mRNA products. Two unrelated participants (P3 and P4) with COL1A1 c.1354-12A>G had a similar OI phenotype, and both showed activation of the same cryptic splice site in 2% and 4% of reads. However, individual P3 also had exon skipping in less than 1% of reads. Siblings with COL1A1 c.334-9A>G (individuals P1 and P2) both had an addition of eight bases to the start of exon 4 in a small proportion of reads, but inspection of IGV showed abnormal splicing at the cryptic splice site only for P1. This finding is likely related to the lower coverage in COL1A1 in individual P2 (the TPM was 45 in P2 compared with 834 in P1). The cause of the difference in COL1A1 expression between these two individuals is unclear, but it may indicate the need for a higher TPM threshold when analyzing splicing abnormalities in cases where abnormally spliced products are anticipated to undergo NMD.
Transcripts containing a premature termination codon in the final exon of COL1A1 are predicted to escape NMD. (32) Accordingly, frameshift or stop mutations in the final exon of COL1A1 in our study resulted in a higher proportion of abnormal mRNA transcript (15% to 25% of reads) than frameshifts from cryptic splice sites in more upstream exons (≤4%).
There are several limitations to the use of UDC for RNA sequencing analyses in bone disorders. Urinary cells have to be processed within a few hours, which makes it difficult to ship samples to external laboratories. Not all urine samples result in successful cell cultures. Also, UDC may not be a suitable source of material for all bone disorders. For example, we observed that genes associated with heritable forms of rickets were typically expressed at levels that were insufficient for current RNA sequencing approaches. For genes with low expression, PCRbased RNA diagnostics rather than RNA sequencing may be necessary. (6) Another potential avenue is to differentiate cells into an osteoblast lineage to increase the expression of osteoblastspecific genes. Some splice events have been reported to be tissue-specific, and therefore results of splice analyses in UDC might differ from those in fibroblasts. However, four of the splice variants that we studied in UDCs had previously been investigated in skin fibroblasts and the results were very similar.
In conclusion, this study demonstrates that UDC can be used as a source of material for RNA sequencing to help diagnose OI. This approach can be used to identify pathogenic variants, provide functional evidence for potential splicing defects, and assist in characterizing known splicing variants, deletions, and duplications, including providing correlation with the phenotype. We envision that RNA-based diagnostics using UDC may be especially useful for the functional characterization of variants that are classified as VUS by DNA sequencing.