Identification of common differentially expressed genes in Turner (45,X) and Klinefelter (47,XXY) syndromes using bioinformatics analysis

Abstract Background Analysis of patients with chromosomal abnormalities, including Turner syndrome and Klinefelter syndrome, has highlighted the importance of X‐linked gene dosage as a contributing factor for disease susceptibility. Escape from X‐inactivation and X‐linked imprinting can result in transcriptional differences between normal men and women as well as in patients with sex chromosome abnormalities. Objective To identify differentially expressed genes among patients with Turner (45,X) and Klinefelter (46,XXY) syndrome using bioinformatics analysis. Methodology Two gene expression data sets of Turner (45,X) and Klinefelter syndrome (47,XXY) were obtained from the Gene Omnibus Expression (GEO) database of the National Center for Biotechnology Information (NCBI). Statistical analysis was performed using R Bioconductor libraries. Differentially expressed genes (DEGs) were determined using significance analysis of microarray (SAM). The functional annotation of the DEGs was performed with DAVID v6.8 (The Database for Annotation, Visualization, and Integrated Discovery). Results There are no genes over‐expressed simultaneously in both diseases. However, when crossing the list of under‐expressed genes for 45,X cells and the list of over‐expressed genes for 47,XXY cells, there are 16 common genes: SLC25A6, AKAP17A, ASMTL, KDM5C, KDM6A, ATRX, CSF2RA, DHRSX, CD99, ZBED1, EIF1AX, MVB12B, SMC1A, P2RY8, DOCK7, DDX3X, eight of which are involved in the regulation of gene expression by epigenetic mechanisms, regulation of splicing processes and protein synthesis. Conclusion Of the 16 identified as under‐expressed in 45,X cells and over‐expressed in 47,XXY cells, 14 are located in X chromosome and 2 in autosomal chromosome; 8 of these genes are involved in the regulation of gene expression: 5 genes are related to epigenetic mechanisms, 2 in regulation of splicing processes, and 1 in the protein synthesis process. Our results are limited by it being the product of a bioinformatic analysis from mRNA isolated from whole blood, this makes necessary further exploration of the relationships between these genes and Turner syndrome and Klinefelter syndrome in the future.


| INTRODUCTION
Turner syndrome (TS) and Klinefelter syndrome (KS) are two of the most common sex chromosome aneuploidies (Berglund et al., 2019). The prevalence of TS is approximately 1 in 2000 to 1 in 4000 among women and is characterized by a chromosomal disorder affecting phenotypically female individuals harboring one intact X chromosome and complete absence [X monosomy (45,X), resulting in 50% of cases] or partial absence of the second X chromosome (including short or long arm deletion, ring X, isochromosome of the long arm and mosaicism, and fusion of 45,X and 46,XX cell lines) along with one or more clinical manifestations (Gravholt et al., 2017;Kesler, 2007). TS presents with broad clinical manifestations, potentially including features such as the characteristic facial appearance (ocular hypertelorism, epicanthal folds, ptosis, a depressed nasal root, short nose with a broad base, low-set posteriorly rotated ears, and micrognathia), with neck webbing and lymphedema, linear growth failure, ovarian insufficiency (pubertal delay), early sensorineural hearing loss, distinctive congenital cardiovascular, skeletal, digital and renal anomalies, a specific neurodevelopmental profile, and a constellation of other disorders commonly occurring in TS, including hypothyroidism and celiac disease (Gravholt et al., 2017). This broad phenotypic spectrum potentially results from the haploinsufficiency of different genes evading X chromosome inactivation and are normally biallelically expressed in 46, XX women and 46, XY men (Carrel & Willard, 2005;Kesler, 2007;Morgan, 2007).
KS occurs in approximately 1 of 600 newborn boys and is characterized by an extra X chromosome. Most (90%) cases present as a pure form with a 47, XXY karyotype, while the remaining 10% include the following sex-chromosomal abnormalities: mosaicism (46, XY/47, XXY), higher grade aneuploidy (48,XXXY;49,XXXXY), and structurally abnormal X chromosomes (Bearelly & Oates, 2019). Men with KS are usually tall, have a gynecoid habitus, are hypogonadal, and may have metabolic, cognitive, psychiatric, and cardiac disorders Groth, Skakkebaek, Høst, Gravholt, & Bojesen, 2013;Zitzmann et al., 2015) and, similar to women, are 14 times more likely to develop systemic lupus erythematosus (SLE) than 46,XY men (Scofield et al., 2008). However, no distinct dysmorphic features have been reported, and presentation may vary in accordance with the degree of gonadal dysfunction. Furthermore, the severity of disease presentation is strongly correlated with the severity of the sex chromosome aneuploidy, that is, higher-grade aneuploidies (Bearelly & Oates, 2019).
Although several studies have investigated the association between the karyotype and the severity of the clinical manifestations in TS (Al Alwan, Khadora, & Amir, 2014;Bispo et al., 2013) and a subset of phenotypic characteristics of TS and KS appear to display a linear dose-dependent association with the number of sex chromosomes, including stature and performance in cognitive subdomains of language and visuospatial processing (Zhang et al., 2020); nevertheless, other phenotypic traits including the altered risk of several autoimmune disorders in both syndromes (Jørgensen et al., 2010;Scofield et al., 2008), cannot exclusively account for sex-chromosome dosage compensation. In general, the mechanisms through which an extra or absent X chromosome determines the characteristics of KS and TS are poorly understood, and it has been hypothesized that the TS and KS phenotype may result from not only genomic imbalance owing to genes present on sex chromosomes, but also the additive effect on associated genes within a particular gene network with altered gene regulation due to epigenetic factors (Álvarez-Nava & Lanes, 2018).
Epigenetic mechanisms include all processes involved in the regulation of gene expression. Epigenetic modifications involve the following four mechanisms: (a) DNA methylation of CpG sites in the promoter, (b) posttranslational covalent histone modifications and the use of variant histone proteins, (c) nucleosome remodeling, and (d) regulation of gene expression by noncoding RNA (miRNA and long noncoding RNA). These mechanisms are active during embryonic development, cognitive processing, and lipid and glucose metabolism. However, limited information is available regarding the epigenetic phenomena in TS and KS, since studies on genetic and chromosomal abnormalities explaining the clinical manifestations of these diseases have received increased attention (Álvarez-Nava & Lanes, 2018).
It is important to understand these regulatory effects to clarify the biological underpinnings of phenotypic sex differences and the clinical features of sex chromosome aneuploidy. This study aimed to compare differentially expressed genes (DEGs) in 45,X and 47, XXY cells. Gene expression patterns were analyzed from microarray data obtained from the National Center for Biotechnology Information (NCBI) of the relationships between these genes and Turner syndrome and Klinefelter syndrome in the future.

MANOTAS eT Al.
Gene Expression Omnibus (GEO) database. Our results show that there are 16 common genes that are oppositely regulated in Turner syndrome (45,X) and in Klinefelter syndrome (47,XXY). It is important to note that eight of these genes are involved in the regulation of gene expression: five genes are related to epigenetic mechanisms, two to regulation of splicing processes and one to the protein synthesis process.

| MATERIALS AND METHODS
We selected two data sets from the NCBI GEO database (accessible at http://www.ncbi.nlm.nih.gov/geo) of microarray data on gene expression obtained from mRNA isolated from whole blood of TS and KS patients. Raw data sets were obtained and processed separately, to render samples comparable in each data set. The preprocessing steps were as follows: quality control and gene sample filtering for non-correlated or outlier measurements. In house functions and available functions in bioconductor packages were used for these tasks. Furthermore, summarization from probes to genes was carried out for each dataset. Thereafter, we identified DEGs for each data set using bioconductor R packages. Finally, DEGs were compared between both data sets and enrichment analysis was performed.

| Selection of the data sets
The TS data set corresponds to the GSE46687 experiment (Cheng, Zhou, & Bondy, 2014) and contains 36 samples, where 10 patients have two X chromosomes (10 controls), 16 patients have an X chromosome of maternal origin, and 10 patients have an X chromosome of paternal origin (26 cases). These data were obtained from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICDH) and the US National Institutes of Health (NIH). The data were curated on March 1, 2014. The data update date was March 1, 2014.
The KS data set corresponds to the GSE42331 experiment (Zitzmann, Bongers, & Werler, 2018) and contains 65 samples, where 35 individuals had KS (35 cases) and 30 individuals (15 male, 15 female) were healthy controls. The KS data set only included 47, XXY individuals. These data were curated on July 26, 2018, by Muenster University Hospital, Germany.
Individuals with mosaicism were excluded from both data sets. However, information regarding age and health conditions and their previous medical history was unavailable, potentially constituting a study limitation.

| Quality control
Quality control analysis of the two data sets was performed suing descriptive statistics (boxplot, histograms, correlations, and trend statistics) to identify atypical samples and mRNA values. Thereafter, the data were normalized using the vsn function (Huber, von Heydebreck, Sültmann, Poustka, & Vingron, 2002) of Bioconductor in R (R Foundation for Statistical Computing, 2018) for each data set separately, and we confirmed that the samples were comparable and that the dependence between the mean and variance was adjusted. We excluded one control sample from the KS data set, since it was an outlier and presented a low correlation with other control samples.

| Differential gene expression analysis
DEGs were determined using significance analysis of microarray (SAM) (Tusher, Tibshirani, & Chu, 2001) using the samr package. At each time point, DEGs were detected by comparing the mean gene expression levels under the two conditions (controls vs. TS/KS). Differential gene expression analysis was performed individually for each data set. This analysis yielded lists of DEGs between control and TS/KS individuals.
SAM was performed to detect DEGs by assigning a score (analogue to the t statistic, called di) to each gene in accordance with the mean expression levels relative to the standard deviation of repeated measures of each condition. To determine whether this score is greater than a specified threshold, the probability (p-value) was determined through permutations of repeated measures to generate a probability distribution. Furthermore, the percentage of genes potentially exceeding this threshold simply owing to chance events was determined and reported as the false discovery rate (FDR), facilitating double-filtering of false-positive results. The user can use a desired FDR percentage reflecting the confidence levels of DEGs detected therein.

| Gene set enrichment analysis
DEG lists were annotated and enriched using DAVID tools (Huang, Sherman, & Lempicki, 2009), wherein abnormal annotation terms were identified in the gene list by comparing its frequency to the expected frequency through chance events. This procedure was carried out for each annotation term, using a hypothesis test including the chi-square or Fisher's exact test, and the p-value was then corrected for multiple testing.

| RESULTS
The TS data set used the reference human genome HG-U133_Plus_2, thus, facilitating effective summarization in R and differential gene expression analysis at the gene level. However, the KS data set used the reference human genome HuGene-1_0-st, which resulted in complications during summarization, since numerous probes in the R library do not correspond to any gene. Therefore, for KS, differential gene expression analysis was performed at the probe level and the probes were annotated to identify the gene they correspond to, using the DAVID notation tool (Jiao et al., 2012).
For TS, a critical value of di = 3.0649 was selected, corresponding to a false positive rate approaching zero (FDR < 2E16). In total, 996 DEGs were identified, of which 501 were upregulated and 495 were downregulated (Figure 1). Similarly, a value of di = 0.9579 (FDR = 0.0770) was determined for KS, thus, retaining the false-positive ratio at zero. Since differential gene expression analysis for KS was performed at the probe level, all differentially expressed probes were selected and, through DAVID enrichment analysis (Jiao et al., 2012), were associated with a gene. Finally, 56 genes were over-expressed ( Figure 1).
To identify DEGs common to TS and KS, we initially searched for matches between the lists of upregulated genes; however, no genes were simultaneously upregulated in TS and KS. However, on cross-checking the list of downregulated genes for TS and upregulated genes for KS, 16 genes overlapped (Table 1). In particular, most genes displaying changes in their expression levels were located on the X chromosome, except for DOCK7 and MVB12B, which are located at 1p31.3 and 9q33.33, respectively (Table 1) (Figure 2a).

| DISCUSSION
X chromosome inactivation (XCI), dysregulation of gene dosage and the parental origin of the X chromosome are the phenotypic determinants of TS and KS (Trolle et al., 2016;Viana et al., 2014). Through in silico analysis of microarray data, this study reports 16 genes downregulated in TS and upregulated in KS, including KDM5C, KDM6A, ZBED1, ASMTL, ATRX, DDX3X, AKAP17A, EIF1AX, CD99, CSF2RA, DOCK7, DHRSX, P2RY8, SLC25A6, SMC1A, and MVB12B (Table 1). Twelve of these 16 genes were upregulated in KS patients according to the original data of data set GSE42331 analyzed by Zitzmann et al., 2015, and ATRX, AKAP17A, DHRSX, and the autosomal gene MVB12B were not reported to be upregulated. This difference may have been obtained because in their study, transcriptome profiling was restricted to only two individuals, while herein, all data corresponding to 65 samples are described.
Fourteen of these 16 DEGs between TS and KS are located on the X chromosome, 8 of which (ZBED1, ASMTL, AKAP17A, CD99, CSF2RA, DHRSX, P2RY8, and SMC1A) are located in the pseudo-autosomal 1 region of the X or Y chromosome (PAR1) (Figure 2a), owing to the presence of the extra or missing X chromosome, characteristic of KS and TS. This was expected, considering the canonical model for sex chromosome dosage compensation (SCD), which states that PAR genes are potentially upregulated with an increase in the X or Y chromosome count, while Y-linked genes are upregulated linearly with an increase in the Y chromosome  count. Owing to the non-binary nature of XCI-mediated gene silencing, the theorized effects of SCD on SCI-related gene expression and X-linked genes evading XCI (XCIE) represent the extreme ends of an X-chromosome dosage-sensitivity continuum. A recent study reported that an X-linked gene completely silenced on XCI displays an inverse correlation between significant XCI cluster expression and X-chromosome dosage, whereas an X-linked gene completely evading XCI displays sublinear upregulation with an increase in the X chromosome count (Raznahan et al., 2018). Interestingly, 8 of the 16 genes encode proteins regulating gene through different mechanisms, including epigenetic regulators (KDM5C, KDM6A, ASMTL, ATRX, and ZBED1), splicing regulators (DDX3X and AKAP17A), and translation regulators (EIF1AX) (Figure 3). These genes are reportedly upregulated in KS patients in comparison with 46,XY individuals , which in addition to validating our results, suggests other causes for these pathologies, considering the gene sequence or the overall impact of chromosomal imbalance (Álvarez-Nava & Lanes, 2018) (Table 1). KDM5C, KDM6A, EIF1AX, and DDX3X evade XCI ( Figure 2a) (Fang, Disteche, & Berletch, 2019;Y. Zhang et al., 2013) and a previous study investigating sex chromosome dosage through the analysis of genome-wide expression data in humans with diverse sex chromosome aneuploidies (XO, XXX, XXY, XYY, and XXYY), disparities in X chromosome dosage were always accompanied by significant differences in expression levels in the aforementioned four genes (Raznahan et al., 2018).
KDM5C functions as an enzyme erasing the activating H4K4Me3 signature and plays a role in cognition, resulting in the neurocognitive phenotype observed among KS patients . However, KDM6A encodes demethylase UTX, which erases the repressing H3K27Me3 signature (Allis, Caparros, Jenuwein, & Reinberg, 2015), whose expression levels and the levels of EIF1AX, DDX3X, and CD99 are associated with an increased abdominal circumference among KS patients and with serum IL-6, TNF, C-reactive protein, and procoagulant plasminogen activator inhibitor 1 (PAI-1) levels, which are potentially associated F I G U R E 2 X-linked genes upregulated in Klinefelter Syndrome and downregulated in Turner Syndrome. Differential patterns of gene expression obtained from microarray data from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (a). Presence of 14 common genes on the X chromosome, which are oppositely regulated in Turner syndrome and in Klinefelter syndrome. Pseudoautosomal genes are represented in green and genes coding for gene expression regulators are represented in red. (b) 8 genes involved in the regulation of gene expression: five genes are related to epigenetic mechanisms, two to regulation of splicing processes and one to the protein synthesis process. Include highest earned academic degree for authors on title page. Created with BioRe nder.com with an increased risk of metabolic syndrome among KS patients (Zitzmann et al., 2015).
In TS, KDM6A haploinsufficiency is associated with gonadal dysgenesis associated with this disease (Berletch, Deng, Nguyen, & Disteche, 2013). Abnormal changes in KDM5C and KDM6A expression levels lead to patterns of aberrant histone modifications potentially affecting transcriptional regulation in different cellular phenomena (Allis et al., 2015) (Figure 3).
Regarding the other genes located in PAR1, including ZBED1 and ASMTL, ZBED1 encodes a transcription factor that regulates the activation of H1 (a linker histone mediating chromatin stability) promoters (Figure 3). Because of its biallelic expression, the gain or loss of an X chromosome can affect its expression and function, similar to ASMTL, which encodes N-acetylserotonin O-methyltransferase-like protein (Allis et al., 2015). Otherwise, ATRX encodes a transcriptional regulator with an ATPase/helicase domain and belongs to the SWI/SNF family of chromatin remodeling proteins, being required for the deposition of the histone variant H3.3 at telomeres and other genomic repeats. These interactions are important to retain silencing marks at these sites (Allis et al., 2015). DDX3X and AKAP17A encode splicing regulators ( Figure 3); hence, alterations in their expression potentially affects their downstream protein-coding genes (Alberts, 2007). Furthermore, EIF1AX encodes an essential eukaryotic translation initiation factor, which is a component of the 43S pre-initiation F I G U R E 3 Differentially expressed genes in Turner syndrome and Klinefelter syndrome involved in gene expression mechanisms. Proposed model showing eight genes expressed differentially in TS and KS. KDM5C, KDM6A, ZBED1, ASMTL, and ATRX genes are involved in epigenetic processes. DDX3X and AKAP17A genes encode for regulatory proteins of splicing processes, and EIFIAX is related to the protein synthesis process. Created with BioRe nder.com complex (PIC), which mediates the recruitment of the small 40S ribosomal subunit to the 5′ cap of mRNAs ( Figure 2b) (Alberts, 2007). These findings support those of previous studies reporting numerous quantitative changes in gene expression in TS and KS (Allis et al., 2015). Moreover, apart from simple transcriptional effects, a previous hypothesis suggesting that the presence or absence of X chromosomes act in trans to elicit histotypic modifications in histones, DNA methylation signatures, and the chromatin landscape at genome-wide distributed loci, thus potentially elucidating the pathogenesis of TS and LS from the epigenetic viewpoint (Álvarez-Nava & Lanes, 2018) (Figure 3).
Regarding CD99 and CSF2RA located in PAR1, a recent study using the same data set used herein GSE46687), investigating DEGs between X monosomy TS patients and healthy women, reported that CD99 and CSF2RA were downregulated, thus, potentially increasing the frequency of autoimmune diseases among women with TS (Wang et al., 2020). This study shows that CD99 and CSF2RA are potentially associated with the pathogenesis of TS, although our study was different and describes the comparison of extreme phenotypes (45,X and 47,XXY cells).
CD99 contributes to the pathogenesis of systemic lupus erythematosus (SLE) (Fattal et al., 2010). SLE is more prevalent among women, and men with KS (47, XXY men) are at a similar risk of SLE as women (Scofield et al., 2008) (Cooney et al., 2009). Moreover, TS patients are at an increased risk of autoimmune diseases, most notably autoimmune thyroid diseases and type 1 diabetes mellitus, and are at a lower risk of SLE (Jørgensen et al., 2010). These findings concurrently suggest a gene dose effect at the X chromosome, indicating that individuals with two X chromosomes (46,XX or 47,XXY), are at a higher risk of SLE than those with one X chromosome (46,XY or 45,X) (Cooney et al., 2009), potentially consistent with the dose of CD99.
A histotypic expression pattern of SLC25A6 and DHRSX, located in PAR1, has been previously reported among KS patients. SLC25A6 is reportedly overexpressed in peripheral blood and cerebellar leukocytes among KS patients Viana et al., 2014), DHRSX is overexpressed in the cerebellum and the prefrontal cortex of the same individual (Viana et al., 2014). However, the association between the potential role of SLC25A6 and DHRSX in the neurocognitive phenotype in KS and TS remains unknown. SLC25A6 upregulation is reportedly associated with a shorter QTc interval in KS patients (Zitzmann et al., 2015). CSF2RA is also located in PAR1 and its upregulation in KS is associated with insulin resistance, waist circumference, and levels of pro-coagulatory substance PAI-1 and cytokines (Ross, Roeltgen, Kushner, Wei, & Zinn, 2000). CSF2RA haploinsufficiency is associated with a short stature among TS patients (Joseph et al., 1996). Furthermore, herein, two autosomal genes including DOCK7 and MVB12B were up-regulated in KS and downregulated in TS. DOCK7 positively regulates gene expression among KS patients Zitzmann et al., 2015). Furthermore, DOCK7 is implicated in the neurocognitive phenotype in KS , considering its contribution to the cartridge/bouton, synapse development, and chandelier cell (ChC) morphogenesis; ChCs are a unique subset of GABAergic interneurons that selectively innervate the axon initial segment of excitatory pyramidal neurons (Gallo, Paul, & Aelst, 2020). However, a previous study, with an approach similar to the present study, reported that DOCK7 and MVB12B were differentially expressed in KS and TS (Zhang et al., 2020). MVB12B is associated with endocytosis and innate immunity (Nandakumar et al., 2019); however, its potential association with the pathophysiology of KS and TS is unknown. Genetic interactions between these and specific genes present on sex chromosomes remain unclear; however, this is potentially conditioned by the aberrant expression of some genes identified herein, which encode transcriptional regulators. Moreover, loss of sex chromosome material affects autosomal gene expression (Trolle et al., 2016), and sex chromosomes globally regulate gene expression, with an approximate 3% influence on autosomal genes (Trolle et al., 2016). Furthermore, the potential of SCD to disrupt genome function extensively varies, indicating that differential involvement of autosomal genes is central to this variation (Raznahan et al., 2018).
Our results indicate that genes present in PARs in sex chromosomes, particularly including ASMTL, ZBED1, AKAP17A, CD99, CSF2RA, DHRSX, and SLC25A6, and those evading XCI, including KDM5C, KDM6A, EIF1AX, and DDX3X, which are biallelically expressed under normal conditions, potentially account for some clinical features of TS and KS. Furthermore, 8 of the 16 genes, overlapping and displaying opposite expression profiles in TS and KS, encoded transcriptional regulators, thus greatly influencing embryonic development (Figure 3).
Our study has several limitations. Being an in silico study, and notwithstanding previous studies on blood mRNA profiling yielding indicators for different diseases (Olsen, Skeie, & Lund, 2015), the tissue-level relevance of the present findings might be challenged. Our results may guide further studies on the role of KDM5C, KDM6A, ZBED1, ASMTL, ATRX, DDX3X, AKAP17A, EIF1AX, CD99, CSF2RA, DOCK7, DHRSX, P2RY8, SLC25A6, SMC1A, and MVB12B in the pathophysiology of TS and KS. Furthermore, DNA methylation and gene expression assays for phenotypes observed with karyotypes 47,XXY and 45,X in different tissues and functional validation of the genes reported herein would further support the present findings and elucidate the role of these genes in the phenotypes of TS and KS, respectively.