Novel miRNA markers for the diagnosis and prognosis of endometrial cancer.

Abstract As endometrial cancer (EC) is a major threat to female health worldwide, the ability to provide an accurate diagnosis and prognosis of EC is promising to improve its treatment guidance. Since the discovery of miRNAs, it has been realized that miRNAs are associated with every cell function, including malignant transformation and metastasis. This study aimed to explore diagnostic and prognostic miRNA markers of EC. In this study, differential analysis and machine learning were performed, followed by correlation analysis of miRNA‐mRNA based on the miRNA and mRNA expression data. Nine miRNAs were identified as diagnostic markers, and a diagnostic classifier was established to distinguish between EC and normal endometrium tissue with overall correct rates >95%. Five specific prognostic miRNA markers were selected to construct a prognostic model, which was confirmed more effective in identifying EC patients at high risk of mortality compared with the FIGO staging system. This study demonstrates that the expression patterns of miRNAs may hold promise for becoming diagnostic and prognostic biomarkers and novel therapeutic targets for EC.

classification systems or biomarkers have been proposed to complement the current risk stratification approaches, 3-6 none of them have been widely applied or become a standard of care in clinical practice. Thus, the identification of effective and reliable biomarkers for the early diagnosis and prognosis of EC remains a significant clinical challenge.
MicroRNAs (miRNAs), a class of short noncoding RNAs with a length of 19-24 nucleotides, were demonstrated to work as regulators of gene expression by antisense complementarity or complementarity into specific mRNAs. 7 Many studies have shown that multiple miRNAs are aberrantly expressed in various tumours and are involved in tumorigenesis and progression as oncogenes or tumour suppressor genes. 8,9 A number of studies have demonstrated that miRNA expression is associated with cell proliferation, metastasis, invasion and response to therapy. 10,11 Moreover, miRNAs act as highly specific, sensitive and stable molecules, making them potential markers for diagnosing specific cancers and their progression. 12 With the rapid development of various miRNA sequencing technologies, especially next-generation sequencing technology, miRNA signatures in EC could be used to distinguish EC from normal counterparts and may shed light on the molecular diagnosis and prognosis of EC. Previous studies have tried to determine the correlation between miRNAs and EC. For instance, has-mir-337-3p, let-7b and miR135a were indicated to have the potential to serve as noninvasive biomarkers, 13 and the abnormal expression patterns of several miRNAs were suggested to have a strong association with carcinogenesis in the early stages. Previous research indicated that the expression pattern of miRNAs in EC can be correlated with their respective target mRNAs, inferred from an increase in the expression of several oncogenic proteins such as ERBB2, EGFR, EPHA2, BAX, GNA12, GNA13 and JUN, further substantiating that the combination of miRNAs with target mRNAs play an important role in carcinogenesis. 1 Thus, miRNAs have the potential to become novel noninvasive biomarkers measured as diagnostic and prognostic indicators of EC to guide surgical therapies and promote the understanding of the carcinogenesis of EC.
In this study, we aimed to identify potential diagnostic and prognostic miRNA biomarkers in EC. Using miRNA expression data in EC patients, we constructed a novel diagnostic miRNA classifier for distinguishing between EC and normal endometrium tissues, and a prognostic miRNA model for survival prediction in EC patients. Finally, these two novel models were validated and evaluated, respectively, indicating their diagnostic or prognostic values for EC.

| Data collection and preprocessing
The miRNA expression and mRNA expression data of patients with endometrioid endometrial adenocarcinoma were obtained from the public databases including The Cancer Genome Atlas Program (TCGA, https://portal.gdc.cancer.gov/) and Gene Expression Omnibus Datasets (GEO, https://www.ncbi.nlm.nih.gov/geo/).
Dealing with TCGA data, the patients whose clinical information and follow-up data were not complete were excluded, whereas only patients with fully characterized mRNA and miRNA profile data were collected in our study (Table S1). Moreover, the study types were limited to RNA-Seq based on IlluminaHiSeq/IlluminaGA platforms (processed by HTSeq). According to the inclusion criteria, a total of 441 samples for miRNA expression data and 419 samples for mRNA expression data were enrolled in further analysis (Table S2).
The entire cohort from TCGA database (N = 419) was randomly divided into training and testing cohorts according to the ratio of 2 to 1 (Table S2). Of note, the miRNAs/mRNAs with reads count > 1 that did not exist across 50% of samples in the training cohort were excluded. The normalization of expression levels was performed using the DESeq2 package 14 in R, and batch correction was processed by limma package. miRNA expression microarray data (from GEO database) were normalized and transformed to expression values by using affy package.

| Differential expression analysis of miRNAs and genes
Differential expression between EC tissues and normal counterparts in the training cohort was assessed using the DESeq2 package, and the adjusted P value was calculated afterwards. The differentially expressed miRNAs and genes were then screened with the filtering criteria of an adjusted P value < .001. Mann-Whitney U test implemented in SciPy package was conducted to examine the differential expression level of miRNA marker in the testing cohort.

| Identification of diagnostic miRNA markers
Least absolute shrinkage and selection operator (LASSO), a method of automatic variable selection in high dimensional data, was used for the selection of diagnostic miRNAs. As previously described, the tuning parameters were determined according to the expected generalization error estimated from 10-fold crossvalidation. 6 Unsupervised hierarchical clustering of the expression pattern of these diagnostic miRNA markers was conducted using the pheatmap package. Based on the expression level of these miRNA markers, the diagnostic classifier was constructed by implementing LASSO method under a binomial distribution. Receiver operating characteristic (ROC) curves and confusion matrices were subsequently applied to evaluate the prediction accuracy of the miRNA markers and diagnostic classifier. The best cut-off values in ROC curves were obtained for distinguishing EC and normal endometrium tissues in a confusion table.

| Identification of prognostic miRNA markers
As a prescreening procedure, the univariate Cox regression analysis was performed to identify miRNAs/genes associated with survival. A variable hunting method implemented in the ran-domForestSRC package was employed to screen candidate prognostic markers. Subsequently, multivariate Cox regression was applied to construct a prognostic model and remove any miRNAs that might not be independent factors in the model. For the gene model devised by our previous work, the risk score for each patient was computed using the list of nine genes (SLC16A1-AS1, KDM4B, MAP2K5, SYP, MPP1, DLX4, BOLA3-AS1, HOMEZ and STAP2). 15 For the prognostic model proposed by Wang et al, 16 six genes (PCSK4, IHH, CTSW, LRRC8D, TNFRSF18 and CDKN2A) were used to calculate the risk score for EC patient. To evaluate the ability of the prognostic model, the concordance index was calculated. In addition, the survivalROC package was applied to calculate time-dependent ROC curve from censored survival data and compute the value of the area under the curve (AUC value). Moreover, the ability of the model to predict outcome at three and 5 years was assessed, respectively. According to the maximum Youden index in the ROC curve, the best cut-off point was obtained to divide patients into different prognostic groups.
Kaplan-Meier curves were then plotted to evaluate the correlations between models and overall survival (OS). Meanwhile, hazard ratio (HR) and P values were computed by using the 'survdiff' function in the survival package. All aforementioned P values were two-sided.

| Correlation analysis of miRNA-mRNA expression
miRNA-mRNA regulation interactions were identified by two criteria. First, the pairwise correlation coefficients between differentially expressed miRNAs and genes were calculated by Pearson's correlation test. A P value less than .05 was considered to be statistically significant. Second, six miRNA-target prediction tools/ databases (miRWalk, 17 miRDB, RNA22, miRanda, PICTAR2 and Targetscan) were employed to predict target genes regulated by miRNA markers. The predicted miRNA-target pairs were screened out by no less than four algorithms, except hsa-miR-7706, which was screened out by no less than three. Additionally, the miRNAtarget pairs verified by experiments in the miRWalk database were also included. All the miRNA-target pairs were finally determined, which were not only negatively correlated but also predicted by algorithms (or verified by experiment). Then, the miRNA-target regulatory network was constructed, which was visualized using Cytoscape program. ClusterProfiler 18 package was used to perform over-representation analysis on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the target genes regulated by miRNAs. The tool took the target gene list and the background gene list of whole human as input and conducted statistical enrichment analysis using hypergeometric testing. The pathways were considered significantly enriched when their P values were smaller than .05.
To further understand the potential biological function of these differentially expressed miRNAs in EC, functional enrichment analysis of these miRNAs was carried out. As a result, these miRNAs were found to be significantly related to pathways in cancer, the cell cycle, focal adhesion, the Wnt signalling pathway, cell adhesion molecules etc (Table S4, P < .05).

| Diagnostic miRNA markers for EC
By using LASSO, the number of these differentially expressed miR-NAs was reduced to screen optimal signatures for diagnosis of EC.
A panel of nine miRNAs was ultimately selected as diagnostic markers. In both the training and testing cohorts, the expression levels of hsa-miR-1307-3p, hsa-miR-183-3p, hsa-miR-183-5p, hsa-miR-200b-3p and hsa-miR-429 were significantly up-regulated in EC tissues whereas the expression levels of hsa-miR-152-3p, hsa-miR-24-1-5p, hsa-miR-374b-5p and hsa-miR-542-3p were down-regulated in EC tissues (P < .001, Figure 1B,C). Unsupervised hierarchical clustering of training cohort samples was performed based on the expression pattern of these miRNA markers, and the heatmap demonstrated that most of the same types of tissues could cluster together ( Figure 1D).
These results suggest that these miRNAs might serve as diagnostic markers to distinguish EC from normal endometrium tissues.
Subsequently, a combined diagnostic classifier was established using these nine miRNA markers to identify the group members of the tissue samples (Table 1). In addition, ROC curves were plotted to compare the efficiency of these diagnostic classifiers built on one miRNA or all nine miRNA markers. In the training cohort, the AUC of the combined classifier was 1.000, followed by hsa-miR-542-3p with an AUC of 0.997 and hsa-miR-183-5p with an AUC of 0.992 ( Figure 2A). In the testing cohort and an additional cohort (GSE35794), the AUC of the combined classifier was higher than or similar to other classifiers ( Figure 2B,C). Therefore, it is clear that the combined diagnostic classifier showed better performance in tissue prediction than other classifiers built on a single miRNA marker. When the final diagnostic classifier (combined classifier) was applied to the training, testing and GSE35794 cohorts, we obtained overall correct diagnosis rates of 100%, 100% and 95.45%, respectively, which point to the potential utility of the combined classifier (Table 2).

| Correlation of diagnostic miRNA markers and their regulated genes
To explore the potential interactions for these nine miRNA markers and their regulated genes, we integrated the miRNA and gene expression data of EC. A differential analysis between EC and normal endometrium tissues was conducted, and 6501 differentially expressed protein-coding genes were identified with the cut-off of an adjusted P < .001. By performing the miRNA-mRNA expression correlation analysis, 485 miRNA-mRNA pairs, which were inversely

F I G U R E 2
Performance of the miRNA diagnostic classifiers in tissue prediction. ROC curves of diagnostic classifiers built on each miRNA and all nine miRNA markers together in the training (A), testing (B) and GSE35794 (C) cohorts. The best cut-off values in ROC curves were obtained to distinguish between EC and normal endometrium tissues associated as well as predicted by algorithms or verified by experiment, were ultimately obtained (Table S5). Meanwhile, a miRNA-mRNA regulation network was constructed, which showed that the expression level of 375 genes had a high correlation with that of at least one of the nine miRNA markers ( Figure 3A). The UpSet plot, generated for visualizing intersecting sets of these differentially expressed genes, showed that one gene might be regulated by up to four miRNA markers ( Figure 3B). The KEGG enrichment analyses of these interacted genes revealed various significantly enriched signalling pathways, such as the MAPK signalling pathway, focal adhesion, the Wnt signalling pathway, cell adhesion molecules, the Hedgehog signalling pathway and some pathways involved in cancers ( Figure 3C and Table S6).

| Specific prognostic miRNA markers of EC patients
Subsequently, the prognostic utility of miRNA signatures for EC was assessed. Only miRNAs with significant differences in expression level between EC and normal endometrium tissues were retained for following analysis. Using the training cohort, miRNAs associated with OS were identified with P values less than .05 (Table S7). Through variable screening using the machine learning method and multivariate Cox regression, five specific prognostic miRNA markers, including hsa-miR-128-3p, hsa-miR-106a-5p, hsa-miR-7706, hsa-miR-18b-3p and hsa-miR-455-5p, were selected as candidates ( Figure 4A).
The expression data of these five markers in the training cohort were collected to establish a prognostic model ( Table 3) Figure 4C). A similar situation was found in the testing cohort ( Figure 4D). These results suggest that the five-miRNA model can be used for survival prediction of EC patients.

| Correlation of prognostic miRNA markers and their regulated genes
miRNA-mRNA expression correlation analysis showed that there were 396 miRNA-mRNA pairs that were not only negatively correlated but also predicted by algorithms or verified by experiment. The miRNA-mRNA regulation network is presented in Figure 5A, and a total of 367 genes were identified, whose expressions were highly correlated with these five miRNA markers. Among these regulated genes, one gene might be regulated by multiple miRNA markers ( Figure 5B). Kyoto Encyclopedia of Genes and Genomes functional enrichment analysis for these 367 target genes was conducted and showed that these target genes were significantly related to the FoxO signalling pathway, Ras signalling pathway, Wnt signalling pathway etc ( Figure 5C and Table S8). It is noteworthy that 23 target genes were found to be associated with the OS of EC ( Figure 5D).

| Performance assessment of the prognostic miRNA model
To evaluate whether the predictive power of the five-miRNA model is independent of other clinical factors, we included this model in univariate (Table 4) and multivariate (  Figure 6B).  have been demonstrated. In support of our findings, miR-542-3p is down-regulated in endometrial serous adenocarcinoma compared to normal endometrial tissues, 19 and it inhibits tumour angiogenesis and progression via directly targeting the protein angiopoietin-2. 20 miR-152 has been reported to be a tumour suppressor miRNA that is silenced by DNA hypermethylation in EC. 21 As previously reported, miR-200b and miR-429 are oncomiRNAs that target the PTEN gene in endometrioid endometrial carcinoma. 22 MiR-374b-5p could suppress tumour progression and enhance cisplatin sensitivity in ovarian cancer by targeting FOXP1. 23 MiR-1307-3p has been shown to be overexpressed in breast cancer and could be used as a diagnostic marker. 24 Noteworthy, miR-183 was found to be overexpressed in endometrioid endometrial adenocarcinoma, 25 which is consistent with our finding, whereas another study reported that the up-regulated expression of miR-183-5p promoted apoptosis and suppressed the epithelial-mesenchymal transition, proliferation, invasion and migration of human endometrial cancer cells. 26 The role of miR-183-5p in EC needs to be further elucidated. Additionally, miR-24-1-5p could act as an oncogene that facilitates the proliferation of ovarian epithelial cells, 27 which is contrary to our findings. The functional differences in the role of the same miRNA in different cancers might be due to tissue specificity.

| D ISCUSS I ON
Compared with a single biomarker alone, the aggregation of multiple biomarkers into one model can improve the prognostic ability. 28 Moreover, the establishment of a classifier or model that provided biomarker coefficients or a risk scoring formulas would be beneficial to the wide application of these markers/models in clinical practice. 6 Therefore, a diagnostic classifier that aggregated the aforementioned miRNA markers was constructed. Performance assessments were also conducted by using a fusion table and ROC curves, which further confirmed the accuracy and effectiveness of this miRNA classifier for EC diagnosis.
We also tried to discover specific biomarkers for predicting prognoses of EC patients based on miRNA expression data. By performing multiple screening procedures, five specific miRNAs were eventually chosen as prognostic markers that were involved in the development and prognosis of EC. Regarding the characteristics of these signature miRNAs, higher expression levels of hsa-miR-18b-3p, hsa-miR-128-3p, hsa-miR-106a-5p and hsa-miR-7706 correlated with a shorter OS. For another, higher expression level of hsa-miR-455-5p is associated with a longer OS.
Specifically, consistent with our results, miR-106a was found to be overexpressed in both endometrioid endometrial adenocarcinoma and endometrial serous adenocarcinoma. 7 Lower expression of miR-455-5p is significantly correlated with poor overall survival of endometrial serous adenocarcinoma, and miR-18b was observed to be overexpressed in endometrial serous adenocarcinoma. 19 Although there is no EC study on miR-128-3p, high expression level of miR-128-3p was found in endometriosis plasma and it can be a potential diagnostic biomarker for endometriosis. 29 Similarly, miR-7706 has been shown to inhibit the proliferation of hepatocellular carcinoma. 30 Lymphovascular space invasion (LVSI) is considered as a major determinant of recurrence and overall survival in endometrial carcinomas. 31 Canlorbe et al 32 identified three miRNAs (has-miR-34c-5p, has-miR-23b-5p and has-miR-23c) associated with LVSI, which may be used as a diagnostic tool for LVSI status.
Although these miRNAs were not included in our prognostic model, has-miR-23c was shown to be associated with prognosis of EC in our study (P < .05, Table S7).
Notably, 23 target genes regulated by prognostic miRNA markers were determined to be associated with the survival of EC patients via correlation analyses. Among these genes, PDPN, which is up-regulated in transformed cells, may be used as a biomarker and therapeutic target for many types of cancer, including glioma, squamous cell carcinoma, mesothelioma and melanoma. 33 C14orf28 was reported to be a target of miR-519d, which contributes to tumorigenesis and might provide new potential targets for colorectal cancer therapy. 34 In addition, previous studies demonstrated that MFAP5 could promote tumour progression and bone metastasis by regulating the ERK/MMP signalling pathways in breast cancer. 35,36 For the LARGE gene, previous findings have shown that the silencing of LARGE is responsible for the defects in dystroglycan-mediated cell adhesion and points to a defect in dystroglycan glycosylation as a factor in cancer progression. 37 PTEN, a tumour suppressor that is mutated in a large number of cancers at high frequency, [38][39][40][41] is also included in our prognostic models.

F I G U R E 4
Construction and validation of the prognostic miRNA model. A, Forest plots of hazard ratios of the five miRNA markers and prognostic model in the training cohort. B, Forest plots of the concordance index of the prognostic model in the training and testing cohorts. (*P < .05, **P < .01, ***P < .001) (C, D) Performance of the prognostic model in the OS prediction in the training (C), and testing (D) cohorts. The ROC curves (left) were generated for the 3-and 5-year OS predictions of EC. The best cut-off value of the 5-year OS prediction was obtained to divide the patients into low-and high-risk groups. Kaplan-Meier curves (right) based on these two groups were plotted to analyse the correlations between this model and the OS To investigate the functional roles of identified miRNA markers, we constructed miRNA-mRNA regulation networks and identified 375 and 367 genes regulated by diagnostic and prognostic miRNA markers, respectively. Functional enrichment analyses of these targeted genes revealed several potential pathways that might be related to both carcinogenesis and progression of EC. It is noteworthy that some of these pathways have been reported to be linked to cancer. MAPKs regulate various cellular activities related to cancer development, including proliferation, differentiation, apoptosis, inflammation and immunity. 43 Previous studies found that the crosstalk between MAPK signalling and ER status might exert a key role in the progression of EC. 44 Wnt signalling, which is pivotal in embryogenesis, healing and homeostasis, is important in the endometrium and has been linked to carcinogenesis. 45 Current studies have discovered that β-catenin mutations in Wnt signalling are reported in approximately 20%-50% of endometrioid endometrial carcinomas. 46 Traditionally, oxytocin (OT) is well known to play an important role in the regulation of cyclic changes in the uterus implantation of embryos and parturition, although oxytocin also is shown as a growth regulator, participating in endothelial cell growth and migration. 47 Moreover, several experiments have demonstrated that OT signalling serves as a major factor involved in the cell invasion of F I G U R E 5 The regulatory network of five miRNA markers and target genes in EC. A, Cytoscape visualization of the 396 miRNA-mRNA pairs. The diamonds and circles represent the miRNAs and target genes, respectively. The orange and green colours represent up-regulation and down-regulation, respectively. The red circle indicates the genes associated with patient survival. The grey edge indicates the verified miRNA-target pairs, whereas the black edge indicates the predicted pairs. B, The UpSet plot of the interactions between the five prognostic miRNA markers. One miRNA may have up to two regulated genes. C, KEGG functional enrichment analysis for 367 target genes regulated by the five prognostic miRNA markers. Only the pathways with an adjusted P value threshold of <.05 were displayed. D, Forest plots of hazard ratios of the 23 genes associated with survival of EC EC. 48 The Hippo pathway has been implicated in epithelial-to-mesenchymal transition and stemness, 49 and previous findings revealed a role of the Hippo pathway in the progression of aggressive subtypes of EC. 50 Aberrant JAK/STAT signalling has been shown to contribute to cancer progression and metastasis, 51 and targeting the JAK/ STAT pathway is currently one of the most promising strategies for prostate cancer treatment. Whereas p53 is the key protein in the pathway and has been considered to play an important role in tumorigenesis, the p53 pathway is one of multiple oncogenic pathways in EC. 52 Markers of the p53 pathway improve the stratification of EC and provide novel insights into the role of this pathway in the disease. 53 The limited sample size of EC tissue samples imposes limitations on this study. Therefore, before clinical application, more EC and normal endometrium samples are needed for further validating the diagnostic and prognostic value of the constructed models.
Additionally, the mechanisms of most diagnostic and prognostic markers of EC were unclear, and downstream experimental studies on these markers are necessary, which will further deepen the understanding of their functions.

F I G U R E 6
Performance of the miRNA prognostic model in the OS prediction of EC patients stratified by FIGO stage. A, EC patients with early (FIGO I/II stage) and advanced stages (FIGO III/IV stage) were divided into high-and low-risk groups using the miRNA prognostic model, respectively. By plotting Kaplan-Meier curves, the prognostic model capability for the OS prediction of EC patients with early stage (left) and advanced stage (right) was evaluated individually. B, Comparison of the survival prediction power of the FIGO stage and the prognostic models by ROC curve analysis for the 5-year OS prediction In summary, we identified a series of novel diagnostic miRNA markers and subsequently constructed and validated a diagnostic classifier that could accurately and effectively distinguish between EC and normal endometrium tissues. We also screened prognostic miRNA markers and established a prognostic model that can assist FIGO stage in survival prediction for EC patients.
This study also presents novel miRNA therapeutic targets for patients with EC and provides new insights into the mechanisms underlying EC.

ACK N OWLED G EM ENTS
This study was supported by the Fundamental Research

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest.

AUTH O R CO NTR I B UTI O N S
JY and DH designed the study and revised the manuscript. QW, KX and YT analysed the data and drafted the manuscript. KX and XD interpreted the results of experiments. TX contributed to the writing of the manuscript. All authors read and approved the final manuscript. QW, KX and YT contributed equally to this work.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are available from the corresponding author upon reasonable request.