Identification of aberrantly methylated differentially expressed genes and associated pathways in endometrial cancer using integrated bioinformatic analysis

Abstract Endometrial cancer (EC) is a fatal female reproductive tumor. Bioinformatic tools are increasingly developed to screen out molecular targets related to EC. In this study, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17025 and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40032 were obtained from Gene Expression Omnibus (GEO). “limma” package and Venn diagram tool were used to identify hub genes. FunRich was used for functional analysis. Retrieval of Interacting Genes Database (STRING) was used to analyze protein‐protein interaction (PPI) complex. Cancer Genome Atlas (TCGA), GEPIA, immunohistochemistry staining, and ROC curve analysis were carried out for validation. Univariate and multivariate regression analyses were performed to predict the risk score. Compound muscle action potential (CMap) was used to find potential drugs. GSEA was also done. We retrieved seven oncogenes which were upregulated and hypomethylated and 12 tumor suppressor genes (TSGs) which were downregulated and hypermethylated. The upregulated and hypomethylated genes were strikingly enriched in term “immune response” while the downregulated and hypermethylated genes were mainly focused on term “aromatic compound catabolic process.” TCGA and GEPIA were used to screen out EDNRB, CDO1, NDN, PLCD1, ROR2, ESPL1, PRAME, and PTTG1. Among them, ESPL1 and ROR2 were identified by Cox regression analysis and were used to construct prognostic risk model. The result showed that ESPL1 was a negative independent prognostic factor. Cmap identified aminoglutethimide, luteolin, sulfadimethoxine, and maprotiline had correlation with EC. GSEA results showed that “hedgehog signaling pathway” was enriched. This research inferred potential aberrantly methylated DEGs and dysregulated pathways may participate in EC development and firstly reported eight hub genes, including EDNRB, CDO1, NDN, PLCD1, ROR2, ESPL1, PRAME, and PTTG1 that could be used to predict EC prognosis. Aminoglutethimide and luteolin may be used to fight against EC.


| BACKGROUND
Originating from endometrial cells, endometrial cancer (EC) is a lethal tumor in female reproductive system. 1 In 2015, the American Cancer Society (ACS) predicted that the number of new cases of EC was 54 870, and 10 170 of them had died. This means that in the past 20 years, the mortality of EC has almost doubled. The average age at the establishment of EC diagnosis is 63. EC is most likely to attack postmenopausal women, 90% of whom are over 50 years old. 2 The symptoms of EC, represented by irregular vaginal bleeding after menopause, are often overlooked, making early EC diagnosis a thorny challenge. 3 To solve this challenge, database-based bioinformatic analysis has been increasingly used to screen out target biological molecules of diagnostic value. 4 For example, Shenghui Yao et al screened out some DEGs causing cervical intraepithelial neoplasia via GEO database. 5 Xiangsheng Liu et al analyzed the gene modules related to human osteosarcoma through a coexpression network. 6 DNA methylation of the gene promoter region typically inhibits gene expression. Most methylated CpG islands are located within genes or intergenic regions, while less than 3% in the gene promoter region. Intra-or intergene DNA methylation may regulate gene expression. 7 Many studies in recent years have shown that DNA methylation is closely related to tumor progression. [8][9][10] Extensive research has found that DNA methylation affects the occurrence and progression of EC. [11][12][13][14] We reasonably speculated that some methylation genes act on EC and could be used as biomarkers for targeted therapy of EC. In this research, we first screened out DEGs from GSE17025, and hypermethylated/hypomethylated genes from GSE40032. Comprehensive analysis, functional analysis, and PPI network analysis were performed. Potential drugs, hub genes, and terms related to EC were determined.

| Microarray data profile
The study design is shown in Figure 4A. Gene Expression Omnibus (GEO) is an online database which provides comprehensive data on gene profiling and sequencing. In GEO database (https://www.ncbi.nlm.nih.gov/geo/), we retrieved the gene expression profile dataset GSE17025 along with methylation profile dataset GSE40032. Based on the GPL570 platform [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array, 15 we collected data from GSE17025 containing 91 samples of stage I endometrial cancers and 12 samples of postmenopausal atrophic endometrium. Based on GPL8490, we collected the methylation profile microarray data from GSE40032, including 64 endometrial cancer samples and 23 cancer-free samples. 16

| Pathway analysis of DEGS
Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways involving the two gene lists were determined using an enrichment analysis online tool FunRich. 22 The biological function of the overlap genes was interpreted. A value of P < .05 was seen to be statistically significant. STRING 23 and FunRich were used to build PPI network that hinted at the molecular mechanism involved in EC PRAME, and PTTG1 that could be used to predict EC prognosis. Aminoglutethimide and luteolin may be used to fight against EC.

K E Y W O R D S
bioinformatic analysis, CMap, endometrial cancer, gene expression omnibus (GEO), methylation, PPI, risk model tumorigenesis. The protein-protein interaction (PPI) network was plotted by Cytoscape v3.7.0. A value of P < .05 was considered statistically significant.

| Validation of the selected genes
To validate the aberrantly methylated DMGs, the data from the Cancer Genome Atlas (TCGA) database were analyzed using the online software GEPIA (http://gepia.cance r-pku. cn/) 24 for cancer and healthy gene expression profiles. In order to confirm epigenetic methylation level, we validated using TCGA data and R software. We analyzed the complete follow-up data of all EC patients. To validate the selected oncogenes and TSGs on the translational level, the immunohistochemistry stained samples of both the normal and endometrial cancer were downloaded from the Human Protein Atlas database (https://www.prote inatl as.org/). ROC curve analysis was for distinguishing normal and cancer tissues.

| Construction of a prognostic signature
First, univariate Cox proportional hazards regression analysis was performed based on eight genes. Prognosis-associated genes (P < .05) were determined. Next, multivariate Cox regression analysis was utilized to identify significant prognosis-related genes. Regression model was constructed to assess each patient's risk score for gene expression. Patients were divided into low-and high-risk groups based on the average risk score. Then, we used Kaplan-Meier curve analysis to compare the survival time of the low-and the high-risk group. Cutoff was set as P < .05. In addition, we used the receiver operating characteristic (ROC) curve to predict the 5-year survival. The predictive value was calculated as the area under the curve, sensitivity, and specificity. Harrel's concordance index (C-index) was measured to validate the predictive ability of this signature using the "survcomp" R package. 25

| Identification of candidate small molecules
We used CMap to find potential drugs related to EC. CMap is a program for predicting potential drugs that may induce biological status encoded by specific gene expression signatures. 26 Upregulated hypomethylated genes and downregulated hypermethylated genes were used to query the CMap database. Finally, the enrichment score representing similarity was calculated, ranging from −1 to 1. The positive connectivity score indicates that drugs can induce the biological phenomena queried in human cell lines. Conversely, a negative connectivity score indicates that the drug reverses the requested biological characteristics and has potential therapeutic value. The connectivity scores (P < .05) for the various instances were filtered out. Tomographic scans of these potential effective drugs were studied in Pubchem.

| Gene set enrichment analysis (GSEA)
In order to reveal the function of eight key genes, GSEA (http://softw are.broad insti tute.org/gsea/index.jsp) 27 was used to determine the enrichment of previously defined biological processes in the ranked DEGs between the two groups. 3 From TCGA, 546 EC samples were divided into two different groups based on their median expression levels. The collection of annotated gene sets of c2.cp.kegg.v6.0.symbols.gmt in Molecular Signatures Database (MSigDB, http://softw are. broad insti tute.org/gsea/msigd b/index.jsp) was chosen as the reference gene sets in GSEA software, and the P-value < .05 was set as the cutoff.

| DEGs and DMGs in endometrial cancers
Expression matrices were obtained from GSE17025 consisting of 91 endometrial cancer, 12 serous samples of atrophic endometrium from postmenopausal women. The DEGs were presented in Figure 1A,B. We obtained 1737 DEGs, including 690 upregulated and 1047 downregulated. A total of 4097 DMGs were obtained in GSE40032 ( Figure 1C). As a result, 1761 hypermethylated genes and 2336 hypomethylated genes were also screened out.

| Aberrantly methylated DEGS
Through Venn analysis, we identified the genes with overlapped expression between the upregulated genes and hypomethylated genes. We also identified the genes with overlapped expression between downregulated genes and hypermethylated genes. Subsequently, we searched 121 downregulated hypermethylated genes and 84 upregulated F I G U R E 2 Aberrantly methylated DEGs, and associated oncogenes and tumor suppressor genes (TSGs). A, 84 hypomethylated and upregulated genes were identified, including seven oncogenes. B, 121 hypermethylated and downregulated genes were identified, including 12 TSGs hypomethylated genes. So as to pinpoint the aberrantly methylated DEGs, upregulated hypomethylated genes were overlapped with oncogenes, and downregulated hypermethylated genes with TSGs. Correspondingly, we retrieved seven upregulated hypomethylated oncogenes ( Figure 2A). We also identified 12 downregulated hypermethylated TSGs, indicating hypomethylation may downregulate the expression of these specific genes in tumorigenesis ( Figure 2B).

| Go and KEGG pathway analysis of DEGS
GO terms cover biological process (BP), molecular function (MF), and cellular component (CC) ontologies. In our study, BP terms of upregulated hypomethylated genes were significantly enriched in immune response, positive regulation of T-helper cell differentiation, and lipopolysaccharide-mediated pathway ( Figure 3A). CC terms of upregulated hypomethylated genes were significantly enriched in mitotic spindle, spindle pole, and chromosome in the centrometric region ( Figure 3B). MF terms of the genes were mostly enriched in endopeptidase inhibitor activity, nerve growth factor binding, and 5′-3′ exodeoxyribonuclease ( Figure 3C). KEGG analysis indicated that these genes were mainly involved in Gastric Cancer Network 1, regulation of sister chromatid separation at the metaphase-anaphase transition as well as retinoblastoma gene in cancer ( Figure 3D). By contrast, downregulated hypermethylated genes were mainly enriched in: (a) aromatic compound catabolic process, vascular smooth muscle contraction, and hematopoietic stem cell differentiation (BP, Figure 3E); (b) L-type voltage-gated calcium channel complex, integral component of plasma membrane, and cytoplasmic vesicle (CC, Figure 3F); (c) dipeptidase activity, platelet-derived growth factor receptor binding, and protein homodimerization activity (MF, Figure 3G). These genes were mainly enriched in term "Wnt signaling pathway" (Figure 3H).

| PPI network construction
STRING was for constructing the PPI network. A total of 120 nodes and 60 edges were involved in the PPI network of the downregulated hypermethylated genes, with a PPI enrichment P-value of 3.88e-05 ( Figure 4A). PPI complex of the upregulated hypomethylated genes consisted of 84 nodes and 164 edges (P-value: 1.0e-16) ( Figure 4B). A total of 11 downregulated hypermethylated TSGs and seven upregulated hypomethylated oncogenes along with their associated genes are presented in Figure 4C,D. The pathways involving the 11 downregulated hypermethylated TSGs and their associated genes are shown in Table S1. These genes were mostly enriched in "pathway in cancer." The pathways involving the seven upregulated hypomethylated oncogenes and their associated genes are shown in Table S2. These genes were mostly enriched in "cell cycle."

| Validation of the selected genes
We used the data from TCGA to confirm how the expression and methylation of selected genes work in EC carcinogenesis. Five upregulated hypomethylated oncogenes, along with 10 downregulated hypermethylated TSGs were differentially expressed in normal tissues and tumor tissues. The difference in expression was validated based on GEPIA ( Figure 5). In addition, based on TCGA UCEC F I G U R E 4 PPI network for the aberrantly methylated DEGs. A, 121 genes filtered into the downregulated hypermethylated PPI network containing 120 nodes and 60 edges. B, 84 genes filtered into the upregulated hypomethylated PPI network containing 84 nodes and 164 edges. C, The PPI network of 11 downregulated hypermethylated TSGs, and their related genes, created by the FunRich software. D, The PPI network of the seven upregulated hypomethylated oncogenes, and their related genes, created by the FunRich software data, we detected eight differentially methylated genes including EDNRB, CDO1, NDN, PLCD1, ROR2, ESPL1, PRAME, and PTTG1 ( Figure 6). Immunohistochemistry results showed that these genes were dysregulated in EC samples. The expression levels of ENDRB, ROR2, and PLCD1 were lower in EC tissue than in normal tissue, whereas the expression levels of ESPL1, PRAME, and PTTG1 were higher in EC tissue than in normal tissue. Besides, the expression of CDO1 showed no difference between normal tissue and tumor tissue (Figure 7). Moreover, ROC curve analysis using "pROC" packages was performed to calculate the capacity of eight genes to distinguish EC tissue from healthy tissue. EDNRB, CDO1, NDN, ESPRL1, PRAME, and PTTG1 all exhibited excellent diagnostic efficiency (AUC > 0.9), and this efficiency was more obvious when the eight were combinedly used (AUC 0.987) ( Figure 8A). Furthermore, to evaluate the prognostic significance of the six methylated DEGs, we loaded the survival time and gene expression levels from the Human Protein Atlas database. Kaplan-Meier method was applied to estimate the survival time predicted by each gene. The analysis results showed that

| Prognostic signature
Univariate Cox proportional hazards regression analyses were performed for the above eight DEGs, including ESPL1, NDN, ROR2, and PLCD1 (Table 1). Multivariate Cox proportional hazards regression analysis was further performed on the above genes, which screened ROR2 and ESPL1 ( Figure S1). The risk score for predicting overall survival was calculated as follows: Risk score = 0.336* ESPL1-0.101* ROR2. According to the median risk score, EC patients were divided into low-risk (n = 267) and highrisk (n = 267) groups. Survival analysis showed that low-risk patients had longer overall survival than high-risk patients in TCGA cohort ( Figure 9A). We also used optimal cutoffs to analyze the prognosis of high-and low-risk groups (P < .001) ( Figure S4B) In ROC curve analysis, the AUC value for 5-year survival showed the highest specificity and sensitivity when the risk score was 0.633 (95% confidence interval[CI] 0.52-0.74) ( Figure 9B). C-index = 0.63 (95%CI 0.57-0.69), P-value = 7.06e-06. As shown in supporting Figure 4B-C, when clinical factors (including age, stage, grade, and histological type) were combined, the AUC value increased to 0.792. The survival status, the expression of five genes and distribution of risk score in each patient were also analyzed ( Figure 9C-E). In addition, the heatmap showed the expression of the two genes in low-and high-risk patients in the TCGA dataset. We observed significant differences in tumor status (P < .001), grade (P < .001), histological type (P < .001), stage (P < .001), and age (P < .001) between the high-and low-risk groups ( Figure S3).

| Related small molecule drugs screening
To screen out small molecule drugs, we analyzed upregulated genes and hypomethylated genes, as well as downregulated genes and hypermethylated genes with CMap. The top 10 EC-related small molecules are displayed in Table 2. Among these small molecules, aminoglutethimide and luteolin showed a highly negative correlation with EC. Sulfadimethoxine, maprotiline, isoflupredone, vancomycin, 3-acetamidocoumarin, clofazimine, adiphenine, and merbromin showed a highly positive correlation with EC. They all might have the potential therapeutic effects on EC. The tomographes of the top 4 potential molecule drugs were researched from Pubchem and shown in Figure 10A-D.

| DISCUSSION
EC mortality has increased by more than 100% during the past 20 years. 2 Only 20% of EC patients are diagnosed before menopause. 28 Database-based bioinformatic analysis helps to screen out target biological molecules for early EC diagnosis.
Based on GSE17025, we extracted the expression matrices of 91 endometrial cancer samples and 12 serous samples of postmenopausal atrophic endometrium. We obtained 1737 DEGs, including 690 upregulated and 1047 downregulated. A total of 4097 DMGs were obtained from GSE40032. Then, With Venn analysis, we detected the overlapped expression between the upregulated genes and hypomethylated genes, as well as downregulated genes and hypermethylated genes. We accessed 121 downregulated hypermethylated genes and 84 upregulated hypomethylated genes. Upregulated hypomethylated genes were overlapped with oncogenes, and downregulated hypermethylated genes with TSGs. Correspondingly, we retrieved seven upregulated hypomethylated oncogenes and 12 downregulated hypermethylated TSGs.
Immune response is involved in tumor development. 29 This involvement arises with DNA methylation. 30 Aromatic compound is engaged in the development of lung cancer. 31 Wnt Signaling is a star pathway associated with many tumors. 32,33 In the present research, the upregulated hypomethylated genes were mostly enriched in term "immune response" according to GO analysis and in Gastric Cancer F I G U R E 7 Immunohistochemistry (IHC) of the seven genes based on The Human Protein Atlas. A, Protein levels of CDO1 in tumor tissue (staining: negative; intensity: negative; quantity: none). Protein levels of CDO1 in normal tissue (staining: negative; intensity: negative; quantity: none). B, Protein levels of EDNRB in tumor tissue (staining: negative; intensity: negative; quantity: none). Protein levels of EDNRB in normal tissue (staining: medium; intensity: moderate; quantity: 75%-25%). C, Protein levels of ESPL1 in tumor tissue (staining: high; intensity: strong; quantity: >75%). Protein levels of ESPL1 in normal tissue (staining: high; intensity: strong; quantity: 75%-25%). D, Protein levels of PLCD1 in tumor tissue (staining: medium; intensity: moderate; quantity: 75%-25%). Protein levels of PLCD1 in normal tissue (staining: high; intensity: strong; quantity: >75%). E, Protein levels of PRAME in tumor tissue (staining: low; intensity: moderate; quantity: <25%). Protein levels of PRAME in normal tissue (staining: negative; intensity: negative; quantity: none). F, Protein levels of PTTG1 in tumor tissue (staining: high; intensity: strong; quantity: <25%). Protein levels of PTTG1 in normal tissue (staining: medium; intensity: strong; quantity: <25%). G, Protein levels of ROR2 in tumor tissue (staining: low; intensity: weak; quantity: >75%). Protein levels of ROR2 in normal tissue (staining: medium; intensity: moderate; quantity: >75%) Network 1 according to KEGG analysis. The downregulated hypermethylated genes were mainly enriched in aromatic compound catabolic process according to GO analysis and in Wnt Signaling according to KEGG analysis. Our research verified the significance of these genes in EC research.
Downregulated hypermethylated TSGs and upregulated hypomethylated oncogenes, along with their associated genes, conjured up a PPI network with FunRich tool. In this network, the function of the downregulated hypermethylated TSGs and their associated genes were mostly enriched in "pathways in cancer;" the function of the upregulated hypomethylated oncogenes and their associated genes were mostly enriched in "cell cycle." Cell cycle participates in EC growth and proliferation. 34 Qiu H et al found that JQ1 suppressed tumor growth via PTEN/PI3K/AKT pathway in endometrial cancer. 35 These all indicated that our functional analysis had guiding significance for EC.
We used GEPIA to confirm the expression and methylation of the 19 selected genes in carcinogenesis. Five upregulated hypomethylated oncogenes, along with 10 downregulated hypermethylated TSGs, were differentially expressed between normal tissue and tumor tissue. Using TCGA UCEC data, we further detected eight DMGs, including EDNRB, CDO1, F I G U R E 8 A, ROC curve analysis and AUC analysis were implemented to evaluate the capacity of eight genes to distinguish EC tissue from normal tissue. B-C, GSEA using TCGA UCEC databases. The 37 Yang CM et al further demonstrated that ROR2 was involved in multiple biological behaviors of renal carcinoma. 38 Equally, ESPL1 was proven to be a cancer oncogene for breast cancer. 39 Ushiku H et al demonstrated that CDO1 promoted DNA methylation in the process of gastric cancer. 40 Ushiku H et al also demonstrated that promoter DNA methylation of CDO1 gene regulated esophageal squamous cell carcinoma progression. 41 CDO1 affected the procession of prostate cancer, 42 clear-cell renal cell cancer, 43 breast cancer, 44 and lung cancer. 45 Hu YH et al found that hypermethylation of NDN promoted the cell proliferation by activating the Wnt signaling pathway in colorectal cancer. 46 Yang H et al found that NDN inhibited ovarian cancer development. 47 Phospholipase C δ1 (PLCD1) manipulates the biological behaviors of pancreatic cancer cells. 48 Methylation of PLCD1 can be regulated to improve the treatment of breast cancer. 49 Epigenetic inactivation of PLCD1 occurs in chronic myeloid leukemia. 50 PRAME is implicated in the growth and metastasis of breast cancer, the hypomethylation of epithelial ovarian cancer, and the prognosis of nonsmall cell lung cancer. 51 PTTG1, as an androgen responsive gene, acts in the progression of androgen-induced prostate cancer, colorectal  [52][53][54][55] Endothelin receptor type B (EDNRB) gene, a member of G protein-coupled receptor superfamily, plays in the development of embryonic and enteric ganglia. 56 Reza Mousavi Ardehaie et al demonstrated aberrantly methylated EDNRB acted as a potential diagnostic biomarker in sporadic colorectal cancer. 57 However, these genes have not been linked to EC. Here we demonstrate that their methylation may be involved in the development of EC. CMap showed EC had negative correlation with aminoglutethimide and luteolin and positive correlation with sulfadimethoxine, maprotiline, isoflupredone, vancomycin, 3-acetamidocoumarin, clofazimine, adiphenine, and merbromin. Aminoglutethimide has a therapeutic effect on breast cancer and luteolin on digestive tumors. 58,59 Research by Liu XS et al suggested that aminoglutethimide was effective in treating EC through improving endocrine environments and inhibiting cell growth. 60 Sulfadimethoxine has been used in the treatment of bladder cancer. 61 Hsu SS et al found antidepressant maprotiline worked in Ca 2+ movement and the proliferation of human prostate cancer cells. 62 The mentioned results all suggested that the drugs screened by C map were useful in fighting against EC.
However, this study has some limitations. Firstly, although an internal verification of the potential aberrantly methylated DEGs and dysregulated pathways was performed, a multicenter and prospective study is needed to evaluate the practicality of eight hub genes. Secondly, further in vivo and in vitro experimental verification is also needed to elucidate the molecular mechanisms. Finally, due to the lack of clinical information from an external database (such as GEO), the prognostic value of our signature should be further warranted.