MicroRNA‐17 as a promising diagnostic biomarker of gastric cancer: An investigation combining TCGA, GEO, meta‐analysis, and bioinformatics

Integrated studies of accumulated data can provide more reliable information and more feasible measures for investigating potential diagnostic biomarkers of gastric cancer (GC) and for exploring the related molecular mechanisms. This study aimed to identify microRNAs involved in GC by integrating data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO). Through this analysis, we identified hsa‐miR‐17 (miR‐17) as a suitable candidate. We performed a meta‐analysis of published studies and analyzed clinical data from TCGA to evaluate the clinical significance and diagnostic value of miR‐17 in GC. miR‐17 was upregulated in GC tissues and showed favorable value for diagnosing GC. In addition, we predicted that 288 target genes of miR‐17 participate in GC‐related pathways, and we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, Gene Ontology (GO) analysis, and protein–protein interaction analysis of these 288 genes. Through this study, we identified possible core pathways and genes that may play an important role in GC. The possible core pathways include the cAMP, phosphoinositide‐3‐kinase–Akt, Rap1, and mitogen‐activated protein kinase signaling pathways. miR‐17 may be involved in several biological processes, including DNA‐templated transcription, the regulation of transcription from RNA polymerase II promoters, and cell adhesion. The target genes were also enriched in cellular components (such as cytoplasm and plasma membrane) and molecular functions (such as protein binding and metal ion binding).

It is known that GC tumorigenesis, progression, and metastasis are closely related to dysregulated gene expression. Thus, identifying genes that are differentially expressed at the DNA and RNA levels between tumor and normal tissues may benefit the diagnosis, prognosis, and prediction of GC, help elucidate the molecular mechanisms underlying oncogenesis, and inform therapeutic strategies [4,5].
MicroRNAs (miRNAs) are small noncoding RNAs, 21–24 nucleotides in length, that participate in post-transcriptional regulation by binding to the 3′ untranslated regions of target genes, as well as to 5′ untranslated regions and coding sequences, leading to translational inhibition and cytoplasmic degradation of the target mRNAs [6][7][8]. The aberrant expression of miRNAs is involved in multiple diseases, including various types of cancer. As research has advanced, miRNA-mediated regulatory networks such as miRNA–lncRNA–mRNA, miRNA–circRNA–mRNA, and miRNA–mRNA–miRNA networks have increasingly been used to elucidate the complex molecular mechanisms underlying tumorigenesis, disease progression, invasion, and metastasis [9][10][11][12][13]. In the past few years, studies in many fields have focused on the use of miRNAs in GC, ranging from diagnosis to therapy [14,15]. This accumulated knowledge provides abundant resources for integrated studies that aim to obtain more reliable information and more feasible measures in medicine. In recent years, microarrays and sequencing technology have been extensively used as efficient tools for identifying differentially expressed miRNAs (DEMs), and widely available open-access databases include The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). By searching the TCGA and GEO databases, we identified hsa-miR-17 (miR-17) as a promising candidate for the diagnosis of GC and explored the associated molecular mechanisms. Hence, this study aimed to comprehensively investigate the clinical significance and diagnostic value of miR-17 in GC through a meta-analysis of data from the two databases and the literature, combined with bioinformatics analysis.

Materials and methods
miRNA-seq data from TCGA database
Publicly available miRNA-seq data on miRNA levels in GC samples were downloaded directly from the TCGA data portal (http://cancergenome.nih.gov/) on 12 December 2017, using the file filters [Transcriptome Profiling (Data Category), miRNA Expression Quantification (Data Type), miRNA-Seq (Experimental Strategy)] and the case filter [TCGA-STAD (Project)]. The corresponding clinical data were downloaded from TCGA using Xena (http://xena.ucsc.edu/). There were 491 files covering a total of 436 stomach adenocarcinoma (STAD) samples. In addition, 41 cases had miRNA-seq data from matched adjacent normal gastric mucosal tissues, whereas another 24 cases lacked pathological staging information. Finally, miRNA-seq data for 412 GC samples and 41 normal gastric mucosal samples were obtained for further analysis. Reads per million (RPM) values were extracted for 1882 mapped miRNAs in each sample, and the data were cleaned using Python. DEMs between GC samples with pathological stages I–IV and normal gastric control samples were identified by fold change (FC) (|log2(FC)| > 1 and P < 0.05) using the R package DESeq. One-way analysis of variance or Student's t-test was used to analyze the relationship between relative miRNA expression levels and clinical characteristics in SPSS Statistics version 20.0 (IBM Corp., Armonk, NY, USA). P < 0.05 was considered statistically significant.
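To make the DEM selection rule concrete, the sketch below applies the |log2(FC)| > 1 and P < 0.05 thresholds. It is not DESeq: DESeq fits a negative binomial model to raw counts and reports its own fold changes and P-values, which would be the inputs to `select_dems` here. The mean-RPM fold change with a pseudocount of 1, the function names, and the example values are illustrative assumptions.

```python
import math


def log2_fold_change(tumor_rpm, normal_rpm, pseudocount=1.0):
    """log2 ratio of mean RPM (tumor / normal); the pseudocount avoids log(0)."""
    mean_tumor = sum(tumor_rpm) / len(tumor_rpm)
    mean_normal = sum(normal_rpm) / len(normal_rpm)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


def select_dems(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split miRNAs into up- and downregulated DEMs.

    results maps a miRNA name to (log2FC, P); a miRNA is kept only if
    |log2FC| > lfc_cutoff and P < p_cutoff.
    """
    up = sorted(m for m, (lfc, p) in results.items() if lfc > lfc_cutoff and p < p_cutoff)
    down = sorted(m for m, (lfc, p) in results.items() if lfc < -lfc_cutoff and p < p_cutoff)
    return up, down


# Hypothetical values for illustration only (not data from this study).
example = {
    "miR-a": (2.0, 0.01),    # upregulated and significant
    "miR-b": (-1.5, 0.001),  # downregulated and significant
    "miR-c": (0.5, 0.001),   # fold change too small
    "miR-d": (3.0, 0.20),    # not significant
}
print(select_dems(example))  # (['miR-a'], ['miR-b'])
```

Both cutoffs are strict inequalities, so a miRNA with exactly |log2(FC)| = 1 is not selected.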

Microarray profiles from the GEO database
Microarray profiles (up to 11 January 2018) related to GC were obtained from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) with the following search strategy: (miR OR miRNA OR microRNA) AND (malignant OR tumor OR tumour OR cancer OR carcinoma OR neoplasm OR neoplasms) AND (gastric OR stomach). Microarrays that met the following criteria were collected: (a) the study included at least 20 samples and (b) miRNA expression was examined in tissue, serum, or blood samples from GC patients. Microarrays that did not provide useful data for analysis were excluded. Finally, 12 GEO datasets, namely GSE93415, GSE78775, GSE63121, GSE54397, GSE26595, GSE33743, GSE30070, GSE28770, GSE85589, GSE59856, GSE61741, and GSE31568, were included in the present study. The DEMs between GC and healthy control samples in each GEO dataset were ranked by signal-to-noise ratio (SNR) using the online tool MORPHEUS (http://software.broadinstitute.org/morpheus/), and the top 250 DEMs in each direction (upregulated and downregulated) were chosen for further analysis. An independent Student's t-test or a paired t-test was used to compare the levels of a particular miRNA between GC and healthy control samples. P < 0.05 was considered statistically significant.
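The SNR ranking can be sketched as follows, assuming the common definition used in GSEA-style marker ranking, SNR = (mean_GC − mean_control) / (SD_GC + SD_control). MORPHEUS's exact implementation (for example, any minimum-SD adjustment) may differ, and the function names are hypothetical. Positive scores indicate higher expression in GC, and the top k of each sign are returned, mirroring the "top 250 in each direction" selection.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """SNR = (mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations.

    Assumes at least two values per group and non-constant data; otherwise
    the denominator is zero.
    """
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def top_by_snr(expr, gc_idx, ctrl_idx, k=250):
    """Return the top-k miRNAs with positive SNR (higher in GC), the top-k with
    negative SNR (lower in GC), and all scores.

    expr maps a miRNA name to its expression values across samples;
    gc_idx and ctrl_idx are the column indices of GC and control samples.
    """
    scores = {}
    for mirna, values in expr.items():
        gc = [values[i] for i in gc_idx]
        ctrl = [values[i] for i in ctrl_idx]
        scores[mirna] = signal_to_noise(gc, ctrl)
    ranked = sorted(scores, key=scores.get, reverse=True)
    up = [m for m in ranked if scores[m] > 0][:k]
    down = [m for m in reversed(ranked) if scores[m] < 0][:k]
    return up, down, scores
```

Filtering by sign keeps the two lists disjoint even when a dataset contains fewer than 2k miRNAs.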
Real-time PCR data of miR-17-5p from published studies
A literature search (up to 11 January 2018) was conducted in the following databases: PubMed, Web of Science, EMBASE, and Cochrane. The following search strategy was used: . Eligible studies met the following criteria: (a) the study was an original article; (b) the study was on human GC patients; (c) miR-17-5p (or miR-17) expression in GC tissues, serum, or plasma was measured by qRT-PCR; (d) the required data could be determined or extracted from the original article; (e) if the same data were published in multiple papers, the study with the largest patient sample size was included; and (f) the study was published in English. Studies were excluded if they were published as an abstract, summary, case report, comment letter, review, or editorial. Two authors (G-FH and Q-WL) independently assessed and selected eligible studies. Studies with disputed eligibility were reassessed by a third author (J-XY), and consensus was reached by discussion. After careful review, the required data were extracted from the included studies using GetData Graph Digitizer version 2.26 (Germany).

Meta-analysis
The following data were extracted from each included study for meta-analysis: the sample size and mean ± SD of the GC group and the healthy control group, and the numbers of true positives, false positives, false negatives, and true negatives. STATA 12.0 (StataCorp, College Station, TX, USA) was used to conduct the meta-analysis. The METAN module of STATA 12.0 was used to calculate standardized mean differences (SMDs) and 95% confidence intervals (CIs) for pooled values. Heterogeneity was evaluated using Cochran's Q (chi-square) test and the I² statistic; P < 0.1 for the Q test and/or I² > 50% was considered to indicate significant heterogeneity. If no obvious heterogeneity was detected, a fixed-effects model was used; otherwise, a random-effects model was used. Additionally, subgroup analyses were performed based on the features of different studies to identify the source of heterogeneity. Publication bias was assessed using Deeks's funnel plot asymmetry test, with P ≥ 0.05 considered to indicate the absence of publication bias. The summary receiver operating characteristic (SROC) curve was constructed from the sensitivity and specificity.
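The pooling logic described above can be sketched as follows. This is a minimal illustration, not the METAN implementation: it assumes Hedges' g as the SMD and the DerSimonian-Laird estimator for the random-effects model, both common choices, because the SMD variant and random-effects method used here are not stated. Only the I² > 50% rule is implemented; the Q-test P-value requires a chi-square distribution, which the Python standard library does not provide.

```python
import math


def hedges_g(n1, m1, s1, n2, m2, s2):
    """Bias-corrected standardized mean difference (Hedges' g) and its
    approximate sampling variance, for group 1 (e.g. GC) vs group 2 (control)."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    d = (m1 - m2) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    var = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var


def pool(effects, variances, i2_threshold=50.0):
    """Inverse-variance pooling; switches to a DerSimonian-Laird random-effects
    model when I^2 exceeds i2_threshold (the Q-test P < 0.1 rule is not applied)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    model = "fixed"
    if i2 > i2_threshold:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
        w = [1 / (v + tau2) for v in variances]
        model = "random"
    est = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {"model": model, "smd": est, "ci": (est - 1.96 * se, est + 1.96 * se),
            "Q": q, "I2": i2}
```

Under the random-effects model, adding the between-study variance to each study's variance widens the CI, which is why heterogeneous pools such as the tissue meta-analysis report wider intervals.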

Integrative bioinformatics analysis
The predicted target genes of miRNAs were pooled for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and Gene Ontology (GO) enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/). Protein–protein interaction (PPI) networks were constructed using STRING version 10.5 (http://string-db.org/cgi/input.pl).
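Enrichment tools of this kind test whether a KEGG pathway or GO term is over-represented among the target genes relative to a background. The sketch below computes the one-sided hypergeometric (Fisher's exact) P-value. The optional `ease` flag subtracts one gene from the overlap, a conservative adjustment in the spirit of DAVID's EASE score; DAVID's exact computation, background choice, and multiple-testing correction may differ, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def enrichment_p(overlap, list_size, term_size, population_size, ease=False):
    """One-sided over-representation P-value for a KEGG/GO term.

    overlap: target genes annotated to the term; list_size: annotated target
    genes; term_size and population_size: the same counts in the background.
    ease=True subtracts one gene from the overlap (EASE-style penalty).
    """
    k = max(overlap - 1, 0) if ease else overlap
    return hypergeom_sf(k, population_size, term_size, list_size)
```

The penalty favors terms supported by several genes: a term hit by a single target gene always receives an EASE-style P-value of 1.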

miR-17 at the intersection of the TCGA and GEO datasets for GC
Data for a total of 412 GC patients (266 males and 146 females) and 41 healthy control individuals were obtained from TCGA, and RPM values of 1882 mapped miRNAs were extracted for each subject. The GC patients were divided into four groups according to pathological stage: stage I (n = 58), stage II (n = 128), stage III (n = 183), and stage IV (n = 43). We compared the RPM data of the 1882 miRNAs among the five groups (the healthy control group and the four stage groups). Initially, we attempted to identify miRNAs that were differentially expressed across all five groups. However, this strategy was unsuccessful because the intersection of the DEM sets was empty. Hence, we separately identified the DEMs between the healthy control group and the stage I–IV groups. The relevant volcano plots are shown in ….

We searched the GEO database, and a total of 12 eligible GSE microarrays were included in the present study (Table 1). Because the GSE microarrays differed in sample type, the common DEMs were examined separately. Notably, we ignored the differences between the 3p and 5p arms of miRNAs when identifying the DEMs in GSE datasets on account of … (Fig. 2). The 5p and 3p arms of miR-17 among the top 500 miRNAs in the GSE datasets, ranked by SNR calculated with MORPHEUS, are shown in Table 2; miR-17-5p and miR-17-3p are also known as miR-17 and miR-17*, respectively [26,27]. Our literature search also indicated that miR-17-5p is more commonly reported and more significant in GC than miR-17-3p; thus, we focused on miR-17-5p. The general flowchart is shown in Fig. 3. The present study consisted of four sequential procedures: the identification of GC-related DEMs based on TCGA and GEO, the verification of clinical value based on a comprehensive meta-analysis, the prediction of target genes, and multiple bioinformatics analyses.
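Finding miRNAs shared across several DEM lists while ignoring the 3p/5p arm distinction can be sketched as below. The name-normalization rule (dropping the "hsa-" prefix and a trailing "-3p", "-5p", or "*") is an assumption; real GEO platform annotations use varied identifiers and miRBase versions that would need mapping first. Setting `min_datasets` to the number of lists gives a strict intersection, the criterion that returned an empty set for the TCGA stage comparisons.

```python
import re
from collections import Counter

# Matches a trailing arm label such as "-5p", "-3p", or the older "*" notation.
ARM_SUFFIX = re.compile(r"(-[35]p|\*)$")


def collapse_arm(name):
    """Drop the species prefix and arm label: 'hsa-miR-17-5p' -> 'miR-17'."""
    name = name.strip()
    if name.startswith("hsa-"):
        name = name[len("hsa-"):]
    return ARM_SUFFIX.sub("", name)


def recurrent_dems(dem_lists, min_datasets):
    """Return miRNAs (arms merged) found among the DEMs of at least
    min_datasets lists; each list counts at most once per miRNA."""
    counts = Counter()
    for dems in dem_lists:
        counts.update({collapse_arm(m) for m in dems})
    return sorted(m for m, c in counts.items() if c >= min_datasets)
```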

miR-17 expression in GC in TCGA database
The expression level of miR-17 was higher in the 412 GC tissues of different pathological stages than in the 41 normal gastric mucosal tissues (P < 0.001) (Fig. 4A), and the same result was obtained for the 41 pairs of matched GC and adjacent normal tissues (Fig. 4B,C). miR-17 expression in the TCGA data was log-transformed. Furthermore, we explored the relationship between miR-17 expression level and clinicopathological characteristics (Table 3). No significant differences were observed by American Joint Committee on Cancer T, N, or M stage, age, or gender. Among histological types, miR-17 expression was increased in the tubular and papillary types of intestinal adenocarcinoma and reduced in the diffuse type of adenocarcinoma (P = 0.014; Fig. 4D and Table 3). In receiver operating characteristic (ROC) analysis, miR-17 discriminated GC from normal tissues with an area under the curve (AUC) of 0.857 (95% CI: 0.808–0.905, P < 0.001; Fig. 4E). Survival analysis showed no significant association between miR-17 expression and survival (hazard ratio = 1.186, 95% CI: 0.866–1.624, P = 0.289) (Fig. 4F).
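The empirical ROC AUC has a simple interpretation: the probability that a randomly chosen GC sample has higher miR-17 expression than a randomly chosen normal sample, which equals the Mann-Whitney U statistic divided by the number of GC-normal pairs. The sketch below computes only this point estimate; the software and CI method used for Fig. 4E are not stated, and the function name is hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """Empirical ROC AUC: P(score of a GC sample > score of a normal sample),
    counting ties as 1/2. Equals Mann-Whitney U / (n_pos * n_neg)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Because the AUC depends only on the ordering of scores, the log transformation applied to the TCGA data does not change it.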
miR-17-5p expression in GC based on the GEO database
A total of 12 GSE datasets, comprising 371 GC samples and 480 healthy control samples, were included in the present study (Table 1). All of these datasets except GSE31568 contained miR-17-5p (Table 2). Interestingly, miR-17-5p expression showed different trends in tissues and serum/blood: it was upregulated in five of the eight GEO datasets with tissue samples (Fig. 5A–H) but downregulated in all three GEO datasets with serum/blood samples (Fig. 5I–K).

Meta-analysis of miR-17-5p expression in GC
The flowchart for the meta-analysis is shown in Fig. 6. To assess the difference in miR-17-5p expression between GC tissues and adjacent normal gastric mucosal tissues, the meta-analysis included the TCGA data, the eight GEO datasets with tissue samples (GSE93415, GSE78775, GSE63121, GSE54397, GSE26595, GSE33743, GSE30070, and GSE28770), and four additional published studies [28][29][30][31]. The data extracted from the original studies for the meta-analysis are shown in Table 4. Using a random-effects model, the pooled SMD of miR-17-5p was 0.695 (95% CI: 0.241–1.150, P = 0.003; Fig. 7A). The P-value of the heterogeneity test was <0.001 (I² = 88.8%). No obvious publication bias was observed (Deeks's test: P = 0.257; Fig. 7B). Diagnostic accuracy was evaluated by plotting an SROC curve and calculating the AUC (AUC = 0.86, 95% CI: 0.82–0.88; Fig. 7C). The pooled sensitivity was 0.69 (95% CI: 0.54–0.80), and the pooled specificity was 0.90 (95% CI: 0.73–0.97). Forest plots of sensitivity and specificity are presented in Fig. 7D,E. Next, we conducted a meta-analysis of miR-17-5p expression in the serum/blood of GC patients. Because no eligible published studies or relevant TCGA data were available, only the three GEO datasets with serum/blood samples (GSE59856, GSE61741, and GSE85589) were included in this meta-analysis (Table 4). Using a fixed-effects model, the pooled SMD of miR-17-5p was −0.774 (95% CI: −1.048 to −0.5, P < 0.001; Fig. 8A). The P-value of the heterogeneity test was 0.777 (I² = 0%), and publication bias was not statistically significant (Deeks's test: P = 0.43; Fig. 8B).
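The pooled sensitivity, specificity, SROC curve, and Deeks's test all start from each study's 2×2 counts. The sketch below computes the per-study quantities: sensitivity, specificity, the log diagnostic odds ratio (DOR), and 1/√ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) is the effective sample size used by Deeks's test. Deeks's test then regresses log DOR on 1/√ESS weighted by ESS, and a non-zero slope indicates funnel plot asymmetry. That regression and the bivariate pooling model are not shown. The 0.5 continuity correction for zero cells is a common convention assumed here, and the function name is hypothetical.

```python
import math


def diagnostic_summary(tp, fp, fn, tn, correction=0.5):
    """Per-study diagnostic accuracy measures from a 2x2 table."""
    if 0 in (tp, fp, fn, tn):  # continuity correction when any cell is zero
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    log_dor = math.log((tp * tn) / (fp * fn))
    n_diseased = tp + fn
    n_healthy = fp + tn
    ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)  # effective sample size
    return {"sensitivity": sensitivity, "specificity": specificity,
            "log_dor": log_dor, "inv_sqrt_ess": 1 / math.sqrt(ess)}
```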

Identification of miR-17-5p target genes and bioinformatics analysis
Based on 12 target gene prediction databases and the TCGA database, 288 prospective target genes of miR-17-5p were included (Fig. 3). KEGG pathway analysis and GO enrichment in DAVID identified a total of 33 KEGG pathways, 69 GO biological process terms, 28 GO cellular component terms, and 25 GO molecular function terms (Fig. 9A–D). The top five KEGG pathways and GO terms are listed in Fig. 10. The 288 genes were significantly enriched in the cAMP, phosphoinositide-3-kinase (PI3K)–Akt, Rap1, and mitogen-activated protein kinase (MAPK) signaling pathways and in pathways in cancer (P < 0.05; Fig. 10A and Table S1). In the GO enrichment analysis, the genes were mainly involved in the biological processes of DNA-templated transcription, regulation of DNA-templated transcription, regulation of transcription from RNA polymerase II promoter, and cell adhesion (P < 0.05; Fig. 10B and Table S2); the cellular components of cytoplasm, plasma membrane, integral component of plasma membrane, actin cytoskeleton, and axon (P < 0.05; Fig. 10C and Table S3); and the molecular functions of protein binding, metal ion binding, "transcription factor activity, sequence-specific DNA binding", zinc binding, and calcium binding (P < 0.05; Fig. 10D and Table S4). The PPI networks of the 288 target genes are shown in Fig. 11.
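Hub genes in a PPI network are often ranked by degree, the number of distinct interaction partners. The criterion used to select the nine hub genes in this study is not stated, so degree ranking is an assumption here. The sketch takes an undirected edge list, such as one exported from STRING, and the example uses hypothetical placeholder genes rather than STRING output.

```python
from collections import defaultdict


def rank_hubs(edges, top_n=9):
    """Rank proteins in an undirected PPI edge list by degree (number of distinct
    interaction partners); ties are broken alphabetically. Self-loops and
    duplicate edges are ignored."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return sorted(partners, key=lambda gene: (-len(partners[gene]), gene))[:top_n]


# Hypothetical edges between placeholder genes (not STRING output).
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_D"),
         ("GENE_B", "GENE_C"), ("GENE_A", "GENE_B")]
print(rank_hubs(edges, top_n=2))  # ['GENE_A', 'GENE_B']
```

Other centrality measures (betweenness, MCC, and so on) can give different hub lists from the same network.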

Discussion
GC is a fatal disease that has attracted increasing attention from clinicians worldwide because of its high morbidity and mortality rates. Despite advances in life science and medicine, current achievements in diagnosis, treatment, and understanding of the pathology do not yet meet patients' needs for earlier diagnosis and longer survival. In the present study, we integrated information from next-generation sequencing (TCGA) and noncoding RNA profiling by microarray (GEO) in GC. As a promising miRNA candidate, miR-17-5p appeared in almost all of the datasets, suggesting that there may be an important connection between miR-17-5p and GC. miR-17, a member of the miR-17-92 cluster, is located in an intron of the non-protein-coding gene MIR17HG (the miR-17-92 cluster host gene) on chromosome 13 in the human genome [32]. The miR-17-92 cluster, also termed oncomiR-1, is upregulated in several types of cancer, such as lung, breast, stomach, prostate, colon, and pancreatic cancers [33]. In our study, only miR-17 was present in all of the GEO and TCGA datasets; the other members of the miR-17-92 cluster (miR-18a, miR-19a, miR-20a, miR-19b-1, and miR-92a) were not. Upregulated miR-17-5p expression has been reported to enhance pancreatic cancer proliferation by altering cell cycle profiles [34]. Low levels of miR-17 and miR-20a resulting from single nucleotide polymorphisms in the promoter of the miR-17-92 cluster may decrease the risk of colorectal cancer [35]. In patients with recurrent breast cancer, miR-17-5p is upregulated in tumor tissues and, as one of the exosomal miRNAs, significantly downregulated in serum [36]. In contrast, statistically significant reductions in plasma miR-17 and miR-19a levels have been observed between early and advanced stages of breast cancer [37].
Regarding the expression of miR-17-5p in GC, most studies have reported that this miRNA is upregulated in GC tissues [28][29][30][31][38][39][40]. Consistent with those previous studies, our meta-analysis revealed that miR-17-5p was significantly upregulated in GC tumors. A meta-analysis of the miR-17-92 cluster in various cancers, including GC, indicated a poor prognosis in patients with high expression of this cluster [41]. Another meta-analysis identified 23 significantly upregulated miRNAs, including miR-17, that were correlated with poor prognosis in gastrointestinal cancers [42]. However, neither meta-analysis evaluated miR-17-5p expression in GC separately. In addition, the expression level of miR-17-5p in the serum/blood of GC patients remains controversial. The microarray data in the GSE85589, GSE59856, and GSE61741 datasets suggested lower miR-17-5p levels in the serum/blood of GC patients than in healthy control individuals. In a study by Zeng et al., serum miR-17 levels were significantly reduced in both GC (n = 40) and benign gastric disease (gastric ulcer and gastric polyp) (n = 32) patients compared with healthy control individuals (n = 36) [43]. However, Zhou et al., analyzing mononuclear cells collected from peripheral blood containing circulating tumor cells, found that miR-17 and miR-106a levels were significantly higher in preoperative (n = 41) and postoperative (n = 49) GC patients than in healthy volunteers (n = 27) [44]. Wang et al. identified four miRNAs, including miR-17-5p, in serum-circulating exosomes from a cohort of 20 healthy control individuals and 20 GC patients; however, in that study, the upregulation of miR-17-5p was not statistically significant [45]. Furthermore, Tsujiura et al. demonstrated that the plasma concentration of miR-17-5p, measured without contamination by cellular nucleic acids, was significantly higher in GC patients (n = 69) than in healthy controls (n = 30) [46].
According to the results of our two independent meta-analyses, miR-17-5p expression was upregulated in GC tissues but downregulated in the serum/blood of GC patients. We hypothesize that GC tumorigenesis may affect the extracellular transport of miR-17-5p; however, further research is needed to explore possible mechanisms. Notably, the meta-analysis of miR-17-5p expression in serum/blood included samples from only 70 GC patients and 263 healthy controls, drawn from only three independent studies. Thus, although we conducted this meta-analysis, further investigations are required to confirm the results. Nevertheless, the increased expression of miR-17-5p in GC tissues, verified in our study, suggests a promising prospect for miR-17-5p as a biomarker in GC. In the diagnostic accuracy analysis, miR-17-5p showed a pooled sensitivity of 0.69 and a pooled specificity of 0.90 (Fig. 7C–E).
To investigate the underlying molecular mechanism, we performed a comprehensive bioinformatics analysis. Based on 12 miRNA target gene prediction databases and the relevant TCGA data, 288 genes were selected for further analysis. KEGG pathway analysis and GO enrichment suggested that several core pathways and GO terms may play a crucial role in GC. Nine hub target genes, namely PRKACB, ITGA4, PAFAH1B1, PIK3R1, ESR1, EFNB2, ATP2B1, AKT3, and LAMC1, may be closely associated with the tumorigenesis, disease progression, invasion, and metastasis of GC. Our study may be the first to integrate data from the GEO database, TCGA database, and published literature to investigate the differential expression of miRNAs and their potential molecular mechanisms in GC. The core pathways and genes identified here may facilitate further exploration of these mechanisms. However, our study has some limitations. First, the number of miRNAs detected differed among the GSE chips (Table 1), suggesting that some newly discovered miRNAs might be missing; in addition, some miRNAs might have been excluded by our rigorous screening criteria. Second, the meta-analysis of miR-17-5p expression in the serum/blood of GC patients needs further refinement because few reliable studies are currently available. Third, target gene prediction was based on different algorithms; further experiments will be needed to validate, or even correct, the predicted targets and to confirm the KEGG pathway and GO enrichment results.
In conclusion, we believe that miR-17 may serve as a promising diagnostic marker for GC and that miR-17-5p may promote the occurrence and development of GC by targeting certain downstream genes. Future studies should focus on the functions and underlying pathways of miR-17 in different GC sample types, such as tissue, serum, blood, circulating tumor cells, and serum exosomes, to further explore its utility in the diagnosis and molecular therapy of GC.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Table S1. KEGG pathway enrichment of the 288 target genes of miR-17-5p.
Table S2. GO biological process (BP) analysis of the 288 target genes of miR-17-5p.
Table S3. GO cellular component (CC) analysis of the 288 target genes of miR-17-5p.
Table S4. GO molecular function (MF) analysis of the 288 target genes of miR-17-5p.