Mutational burden and potential oligogenic model of TBX6‐mediated genes in congenital scoliosis

Abstract Background Congenital scoliosis (CS) is a spinal deformity due to vertebral malformations. Although insufficiency of TBX6 dosage contributes to a substantial proportion of CS, the molecular etiology for the majority of CS remains largely unknown. TBX6‐mediated genes involved in the process of somitogenesis represent promising candidates. Methods Individuals affected with CS and without a positive genetic finding were referred to this study. Proband‐only exome sequencing (ES) were performed on the recruited individuals, followed by analysis of TBX6‐mediated candidate genes, namely MEOX1, MEOX2, MESP2, MYOD1, MYF5, RIPPLY1, and RIPPLY2. Results A total of 584 patients with CS of unknown molecular etiology were recruited. After ES analysis, protein‐truncating variants in RIPPLY1 and MYF5 were identified from two individuals, respectively. In addition, we identified five deleterious missense variants (MYOD1, n = 4; RIPPLY2, n = 1) in TBX6‐mediated genes. We observed a significant mutational burden of MYOD1 in CS (p = 0.032) compared with the in‐house controls (n = 1854). Moreover, a potential oligogenic disease‐causing mode was proposed based on the observed mutational co‐existence of MYOD1/MEOX1 and MYOD1/RIPPLY1. Conclusion Our study characterized the mutational spectrum of TBX6‐mediated genes, prioritized core candidate genes/variants, and provided insight into a potential oligogenic disease‐causing mode in CS.


| INTRODUCTION
Congenital scoliosis (CS) is a spinal deformity caused by malformations of vertebrae, which include the failure of formation (CS type I), failure of segmentation (CS type II), or a combination of the two (CS type III) (Hedequist & Emans, 2007). With an incidence of 0.5-1 per 1000 live births, CS is a major contributor to childhood and adolescent disability (Shen, Wang, Liu, Xue, & Qiu, 2013).
In our previous studies, we found that TBX6 gene contributes to about 10% of CS patients with a compound inheritance disease-causing mode, that is, a TBX6 null mutation or 16p11.2 deletion in trans with a common T-C-A (rs2289292, rs3809624, and rs3809627) haplotype Wu et al., 2015;Yang et al., 2019). We defined this entity of patients as TBX6associated congenital scoliosis (TACS) . In addition to null mutations and copy number deletions, missense variants in TBX6 which lead to a functional null effect also lead to TACS in combination with the risk T-C-A allele . In contrast to this mutation + polymorphism combination, biallelic loss-of-function or dominant-negative mutations in TBX6 cause Spondylocostal Dysostosis (MIM#122600), a severe skeletal dysplasia syndrome characterized by multiple segmentation defects of the spine and costal dysplasia (Sparrow et al., 2013). Although TBX6 mutations contribute to a substantial proportion of CS, the molecular etiology for the majority of CS remains largely unknown, and more candidate genes/loci warrant investigation.
Of these TBX6-mediated genes, RIPPLY2, MESP2, and MYF5 are associated with known Mendelian syndromes involving skeletal abnormalities. Similar with TBX6, biallelic loss-of-function variants in RIPPLY2 or MESP2 cause autosomal recessive Spondylocostal Dysostosis (MIM#616566 for RIPPLY2 and MIM#608681 for MESP2). Recessive MYF5 variants are recently reported to be associated with External Ophthalmoplegia, Rib, and Vertebral Anomalies (MIM#618155), whose spine phenotype highly resembles that of Spondylocostal Dysostosis (Di Gioia et al., 2018). Therefore, these TBX6-mediated genes together represent a promising candidate gene set for CS. However, due to the milder symptom of CS compared with the above-mentioned recessive Mendelian syndromes, the effect of the candidate variants in CS probably vary and their disease-causing modes are presumably complicated.
To explore the mutational landscape of TBX6-mediated genes in CS, and to give insight into the potential multi-factorial disease-causing mode, we performed exome sequencing (ES) on 584 individuals with congenital scoliosis without a prior molecular diagnosis, and then, studied both heterozygous variants and combinations of variant alleles.

| Ethical compliance
Approval for the study was obtained from the ethics committee at Peking Union Medical College Hospital (JS-098). Written informed consent was provided by each participant.

| Patient recruitment
We consecutively recruited individuals who are affected with CS in Peking Union Medical College Hospital from 2009 to 2016, as part of the Deciphering Disorders Involving Scoliosis and COmorbidities (DISCO) study (http://www. disco study.org/).

| Exome sequencing
Proband-only exome sequencing was performed on patients (1) without a molecular diagnosis of TACS, that is, not caused

| Analytic strategy
TBX6-mediated genes were curated through literature review. Only genes regulated by TBX6 during somitogenesis were selected, including MEOX1, MEOX2, MESP2, MYOD1, MYF5, RIPPLY1, and RIPPLY2. Variants in candidate genes were filtered against population databases. Because a MAF of 0.01 is generally used to distinguish a variant from a polymorphism, we define a rare variant as having a MAF lower than 0.01. Taken into consideration the disease incidence of congenital scoliosis (0.0005-0.001) (Shen et al., 2013), we suggest that a variant with MAF ≥0.001 should not be pathogenic in a dominant disease trait. Therefore, a MAF F I G U R E 1 Workflow and main findings of the study cutoff of 0.001 was used to filter for ultrarare variants. In our analysis, ultrarare variants (MAF ≤0.001) were selected for monogenic analysis and rare variants (MAF ≤0.01) were selected for potential oligogenic analysis. Protein-truncating variants (including stop-gain, frameshift, splice donor, splice acceptor, and start-loss variant), predicted deleterious missense variants (CADD score ≥20) and in-frame indels were prioritized. Sanger sequencing was performed on proteintruncating variants, deleterious missense variants, and variants presenting an oligogenic disease-causing mode.

| In-house controls
In-house controls are consisted of 1854 unrelated individuals without apparent skeletal deformities. ES were performed on these individuals using the same protocol as employed in CS cases.

| Statistics
R (version 3.6.1) was used for the statistical analysis. Fisher's Exact Test was used to compare the burden of deleterious missense variants with MAF ≤0.001 between cases and controls.

| Mutational spectrum of ultrarare variants in TBX6-mediated genes
We recruited 584 individuals affected with CS that could not be explained by TBX6 or other CS-causing genes ( Figure 1). After ES data processing and variant filtering, we identified a total of 28 ultrarare (MAF ≤0.001) variants in TBX6-mediated genes (Figures 2 and 4). Of them, seven variants are protein-truncating or predicted to be deleterious (CADD >15), presenting a tendency toward a significant mutational burden as compared with the in-house controls (nine ultrarare deleterious variants in 1854 control samples, p = 0.08, Fisher's Exact Test).

| Identification of protein-truncating variants in RIPPLY1 and MYF5
We first analyzed variants predicted to cause complete lossof-function of the proteins. As a result, two protein-truncating variants in TBX6-mediated genes were identified from two patients (Table 1).
A hemizygous splice-donor variant c.156-1G>C in RIPPLY1 was identified in CSS161458, a 6-year-old boy affected with CS type II. CSS161458 had a right thoracic curve F I G U R E 2 Developmental and molecular etiology of congenital scoliosis. The illustration demonstrates how genetic mutations identified in TBX6mediated genes may alter the process of somitogenesis and whereby cause congenital scoliosis. Arrows between gene symbols indicate activation. The diamond between RIPPLY1/2 and TBX6 indicates a negative feedback between the molecules (magnitude of 40°, apex located at T9) with multiple segmentation failures from T8 to L1 (Figure 3). Although RIPPLY1 has not been associated with a human developmental disease, its paralog, RIPPLY2, causes Spondylocostal Dysostosis (MIM#616566) in an autosomal recessive mode. In addition, knock-out of Ripply1/Ripply2 in mouse results in disrupted somitogenesis and causes gross spinal deformity (J. Takahashi et al., 2010), suggesting the important role of RIPPLY1 in the normal development of the spine. Our data suggest that truncating variants in RIPPLY1, in a hemizygous state, could be associated with vertebrae malformations in human.
A heterozygous stop-gain variant in MYF5 was identified in CSS160633, a 3-year-old boy affected with type I CS. He had two curves located at thoracic spine and lumbar spine, with wedge vertebrae located at apex region separately (Figure 3). Recently, biallelic frameshift and deleterious missense variants in MYF5 has been associated with External Ophthalmoplegia, Rib, and Vertebral Anomalies (MIM#618155). The spinal phenotypes of this syndrome include cervical and thoracic scoliosis, cervical fusions, clivus malformations, basilar invagination, and narrow disc spaces, which represents a more complex condition than that in our patient. Therefore, we propose that heterozygous loss-of-function of MYF5 might be associated with a non-syndromic form of CS.
Notably, we observed an excess of ultrarare deleterious missense variants in MYOD1 as compared with the in-house control (4/584 in CS cases vs. 2/1854 in controls, p = 0.032, Fisher's Exact Test). These findings suggest the possible involvement of MYOD1 in the pathogenesis of CS. MYOD1 encodes an early transcriptional factor during somitogenesis, and is required for MYF5 expression in the early mesoderm (Maguire, Isaacs, & Pownall, 2012). Intriguingly, two out of four deleterious missense variants in MYF5 were adjacently located in the HLH domain (Figure 4), which is essential for the transcriptional activity of MYF5 (Hamamori, Wu, Sartorelli, & Kedes, 1997). Therefore, these two variants c.464G>T(p.Arg155Leu) and c.458C>G(p.Ala153Gly) represent promising candidates in our analysis, and suggest the perturbation of the HLH domain as a potential etiology of CS.

| Potential oligogenic disease-causing mode
To give further insight into the complex molecular mechanism of CS, we analyzed variant combinations of TBX6-mediated genes. As a result, we found two patients with a potential oligogenic inheritance (Table 2). CSS170323 carries a heterozygous missense variant c.630G>C(p.Met210Ile) in MYOD1 and a heterozygous missense variant c.190G>A(p.Ala64Thr) in MEOX1 (Table 2). CSS170323 presented with L2 hemivertebra and fused ribs (the right 11th rib and 12th rib). During mesoderm development, the expression of MEOX1 is increased by MYOD1 (Gianakopoulos et al., 2011), suggesting that these two variant potentially result in the cumulative perturbation of TBX6-mediated pathway.
CSS161458 had a heterozygous splicing variant c.156-1G>C in RIPPLY1, as described above, and a heterozygous missense variant c.464G>T(p.Arg155Leu) in MYOD1 was also identified. Although no direct interaction between F I G U R E 3 Images and mutational information of patients with protein-truncating or deleterious missense variants. Spine X-rays of six patients with truncating variants or deleterious missense variants are presented. Sanger sequencing results and residue conservation of these variants are also shown F I G U R E 4 Mutational spectrum of the candidate genes in CS with unknown molecular etiology. Mapping of genetic variants in candidate genes to protein sequences annotated with functional domains RIPPLY1 and MYOD1 has been reported, they may together dysregulate the TBX6 pathway given the deleterious nature of both variants (Table 2).

| DISCUSSION
In this study, we performed exome sequencing on 584 patients with CS and without a molecular diagnosis. Variants in seven TBX6-mediated genes involved in somitogenesis were selected for analysis. Protein-truncating variants, in-frame indels and deleterious missense variants were prioritized. Potential oligenic disease-causing modes were also identified.
The candidate gene strategy has been widely used in parallel sequencing studies on presumptive genetic disorders Wu et al., 2019). The biological relevance and the rationality of the candidate gene set are critical for the success of this strategy. For example, focusing on genes encoding ion channels could promote the identification of novel genes in epilepsy (Epi & Epilepsy Phenome/Genome, 2017); the use of a systemic multiple candidate gene approach also explains a substantial proportion of the heritability in complex traits such as hypertension (Ji et al., 2017).
In our disease context of CS, the candidate gene set was selected based on a developmental biologic aspect: the skeletal system of a vertebrate embryo originates from the pre-somitic mesoderm, which segments into somite during early embryo development. During this process, TBX6 is expressed in the entire pre-somitic mesoderm, and regulates a series of genes to enable the normal somitogenesis and subsequent development of the skeletal system (Oginuma et al., 2008). As TBX6 has been revealed as a core disease gene of CS Liu et al., 2019;Wu et al., 2015), we conducted this study based on a hypothesis that TBX6-mediated genes may as well contribute to CS. The rationality of this hypothesis is supported by disrupted spinal development in animal models depleted of the candidate genes (Takahashi et al., 2010;Windner et al., 2015), and by several autosomal recessive syndromes resulting in spinal malformation cause by these genes (Di Gioia et al., 2018;Karaca et al., 2015).
Due to the fact that CS represents a relatively milder phenotype spectrum than the autosomal recessive syndromes caused by biallelic loss-of-function of the candidate genes, the variant spectrum and disease-causing mode of these genes in CS are assumed to be more complicated. More recently, there has been a surge of interest in digenic/oligogenic inheritance mode in complex disease (Schaffer, 2013). For example, mutational combination of PCDH15 and USH1G have been identified in non-syndromic hearing loss; oligogenic inheritance of three cardiomyopathy-associated genes has been reported in a family with cardiac anomaly and supported by experimental evidence (Gifford et al., 2019). In our study, potential oligogenic inheritance mode were identified in two cases. Mutation combinations between MYOD1/MEOX1 and MYOD1/RIPPLY1 were observed. Given the relatively low probability of loss-of-function intolerance (pLI) of the candidate genes in the gnomAD database, a deleterious variant alone in any of these genes might not be disease causing. However, the combined effect of deleterious variants in multiple genes might synergistically lead to the disease. Although experimental validation of the oligogenic model still needs to be conducted, our data give insight into the complex disease-causing mode of CS and suggest the combined effect of mutations in TBX6-mediated genes as an important mechanism in the pathogenesis of CS.
In conclusion, our study characterized the mutational spectrum of TBX6-mediated genes in CS, prioritized core candidate genes/variants, and provided insight into a potential oligogenic disease-causing mode.