Identification of hub genes and their correlation with the prognosis of acute myeloid leukemia based on high-throughput data analysis

Acute myeloid leukemia (AML) is one of the most common forms of leukemia worldwide, but its molecular mechanism is still not well understood. This study aimed to use multiple AML datasets to obtain differentially expressed genes and to identify key genes involved in the development and progression of AML.

differentially expressed genes (DEGs) and those involved in AML carcinogenesis and progression, as well as functional pathway analysis. [3][4][5][6] However, the high false-positive rate of individual microarray analyses makes it difficult to obtain reliable results. Therefore, in this study, four mRNA microarray datasets were downloaded from the Gene Expression Omnibus (GEO) database and analyzed. 7 The DEGs between AML and normal tissue in each dataset were identified and intersected, and the overlapping genes were then subjected to Gene Ontology (GO) 8 and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and to protein-protein interaction (PPI) network analysis. The results provide insight into the molecular mechanisms of cancer development and progression.

Microarray data
Four gene expression datasets (GSE24395, GSE30029, GSE38865, and GSE90062) were downloaded from GEO, and the probes were converted into the corresponding gene symbols based on the platform annotation information. The inclusion criterion for the datasets was that they were derived from human blood samples.
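The probe-to-symbol conversion, combined with the probe filtering rules given under Statistical analysis, can be sketched in Python as follows. This is a minimal illustration and not the authors' code. The function name `collapse_probes`, the dictionary inputs, and the toy probe IDs are hypothetical. A real analysis would read the GPL annotation table that GEO provides for each platform.

```python
from collections import defaultdict


def collapse_probes(expression, annotation):
    """Convert probe-level expression to gene-level expression.

    expression: dict mapping probe ID -> list of expression values
    annotation: dict mapping probe ID -> gene symbol ('' if none)
    Probes without a gene symbol are skipped, and genes measured by
    more than one probe set are dropped entirely.
    """
    probes_per_gene = defaultdict(list)
    for probe, values in expression.items():
        symbol = annotation.get(probe, "").strip()
        if not symbol:
            continue  # probe set has no corresponding gene symbol
        probes_per_gene[symbol].append(values)
    # Keep only genes represented by exactly one probe set.
    return {gene: rows[0] for gene, rows in probes_per_gene.items() if len(rows) == 1}


# Hypothetical toy data: probe IDs and symbols are placeholders.
expression = {"p1": [1.0, 2.0], "p2": [3.0, 4.0], "p3": [5.0, 6.0], "p4": [0.5, 0.7]}
annotation = {"p1": "GENE_A", "p2": "GENE_B", "p3": "GENE_B", "p4": ""}
print(collapse_probes(expression, annotation))  # {'GENE_A': [1.0, 2.0]}
```

In this sketch, GENE_B is discarded because two probe sets map to it, and p4 is discarded because it has no symbol. The text states that multi-probe genes were removed, so the sketch drops them rather than averaging them.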

DEG identification
DEGs between AML and non-cancer patients were obtained using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r). 10 GEO2R is an interactive web tool that allows users to compare two or more groups of samples in a GEO series to identify genes that are differentially expressed across experimental conditions. DEGs were first identified in each dataset separately. The intersection of these lists was then taken to obtain the DEGs common to the different datasets.
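The intersection step can be sketched in Python with plain sets. The gene names below are placeholders. The optional `min_datasets` parameter is a hypothetical addition that reflects one possible reading of Figure 1a, which describes genes overlapping in three of the four datasets. With the default value, the function returns the strict intersection.

```python
from collections import Counter


def shared_degs(deg_sets, min_datasets=None):
    """Return genes reported as DEGs in at least `min_datasets` datasets.

    With min_datasets=None a gene must appear in every dataset, which is
    the plain intersection (the centre of a Venn diagram).
    """
    if min_datasets is None:
        min_datasets = len(deg_sets)
    # Count each gene at most once per dataset.
    counts = Counter(gene for degs in deg_sets for gene in set(degs))
    return {gene for gene, n in counts.items() if n >= min_datasets}


# Hypothetical DEG lists for four datasets.
deg_lists = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "D"}]
print(sorted(shared_degs(deg_lists)))                  # ['A']
print(sorted(shared_degs(deg_lists, min_datasets=3)))  # ['A', 'B']
```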

Hub gene screening
In the present study, the PPI network of the DEGs was constructed using the STRING database and analyzed in Cytoscape 11 with the Molecular Complex Detection (MCODE; version 1.4.2) 12 and CytoHubba plug-ins, which are used to explore important nodes/hubs and fragile motifs in an interactome network. Cytoscape was used to draw the PPI network, and MCODE was used to identify the most important modules in the network. The selection criteria were as follows: MCODE degree cut-off = 2, node score cut-off = 0.2, maximum depth = 100, and k-core = 2. CytoHubba was used to select the top 10 nodes.

KEGG and GO enrichment analysis, annotation, and visualization of DEGs
Enrichr (https://icahn.mssm.edu/research/labs/maayan-laboratory; Ma'ayan Lab, New York, USA) is an online bioinformatics tool that combines biological data and analytical tools. It aggregates data from 35 databases to provide comprehensive functional annotation of genes and proteins. 13 KEGG is a database resource for understanding high-level functions of biological systems from large-scale molecular data generated by high-throughput experimental techniques. 14 GO is a major bioinformatics resource for annotating genes and analyzing their biological processes. KEGG and GO enrichment analyses were performed on the DEGs in the analysis module. The top 20 terms ranked by the combined score c = log(p) · z were reported, where p is the P-value from Fisher's exact test and z is the z-score of the deviation from the expected rank. The PPI network was predicted using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING 10.0; http://string-db.org). 15
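The enrichment P-value and the combined-score ranking can be sketched in Python as below. This is an illustration under stated assumptions, not Enrichr's implementation. Enrichr derives z from runs on random gene lists, whereas here z is simply supplied. The logarithm is assumed to be natural. The function names and toy terms are hypothetical.

```python
import math


def hypergeom_sf(k, N, K, n):
    """One-sided P(X >= k) for the overlap X between an n-gene input list
    and a K-gene annotation term, drawn from N background genes
    (equivalent to a one-sided Fisher's exact test)."""
    total = math.comb(N, n)
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def combined_score(p, z):
    """Enrichr-style combined score c = log(p) * z (natural log assumed)."""
    return math.log(p) * z


def top_terms(results, top_n=20):
    """results: list of (term, p_value, z_score); keep the top_n by combined score."""
    scored = [(term, combined_score(p, z)) for term, p, z in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical terms with made-up P-values and z-scores.
print(hypergeom_sf(2, N=4, K=2, n=2))  # 0.16666666666666666 (= 1/6)
print([term for term, _ in top_terms([("term1", 0.01, -2.0), ("term2", 0.001, -1.0)])])
```

Because log(p) is negative for p < 1, a negative z yields a positive combined score. Sorting in descending order therefore places the most strongly enriched terms first.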

Hub gene bioinformatics analysis
The cBioPortal online platform (http://www.cbioportal.org) was used to construct a network of the hub genes and to analyze their co-expressed genes. 16 The biological process analysis of the hub genes was visualized using the Biological Networks Gene Ontology tool (BiNGO 3.0.3) plug-in for Cytoscape. 17 Hierarchical clustering of the hub genes was performed using the UCSC Cancer Genomics Browser (http://gray-cancer.ucsc.edu). 18,19 Kaplan-Meier curves generated in cBioPortal were used to show the overall survival and disease-free survival associated with the hub genes. The online database Oncomine (http://www.oncomine.com) was used to analyze the association of gene expression patterns with AML grade and survival status. 20
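The text does not state the distance metric or linkage method used by the UCSC Cancer Genomics Browser. The Python sketch below therefore assumes Euclidean distance with average linkage to show how samples can be hierarchically clustered on hub gene expression. The sample names and values are hypothetical. A real analysis would more likely call a library routine such as `scipy.cluster.hierarchy.linkage`.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(samples):
    """Agglomerative clustering with average linkage.

    samples: dict sample_name -> list of hub gene expression values.
    Returns the merge history as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of sample names.
    """
    clusters = [frozenset([name]) for name in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean of all pairwise distances between the two clusters.
            d = sum(euclidean(samples[x], samples[y]) for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, d))
    return merges


# Hypothetical expression of two hub genes in two AML and two normal samples.
samples = {"AML1": [5.0, 5.0], "AML2": [5.0, 6.0], "N1": [0.0, 0.0], "N2": [0.0, 1.0]}
merges = average_linkage(samples)
print([sorted(a | b) for a, b, _ in merges])
# [['AML1', 'AML2'], ['N1', 'N2'], ['AML1', 'AML2', 'N1', 'N2']]
```

If the hub genes separate AML from non-cancer samples, the final merge joins a pure AML cluster with a pure non-cancer cluster, as it does in this toy example.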

Statistical analysis
Differential expression analysis was performed using R packages. P-values were adjusted with the Benjamini-Hochberg false discovery rate (FDR) method to balance the discovery of statistically significant genes against the number of false positives. Probe sets without a corresponding gene symbol, and genes with more than one probe set, were removed. Genes with logFC > 1 and adjusted P < 0.01 were considered significantly differentially expressed. Enrichr performs enrichment analysis using Fisher's exact test or the hypergeometric test. The rank score (z-score) was calculated using a modification of Fisher's exact test.
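The Benjamini-Hochberg adjustment and the DEG thresholds can be sketched in Python as follows. GEO2R performs this adjustment internally, so the code only illustrates the procedure and is not the authors' pipeline. The threshold is applied to the absolute logFC. This is an assumption made so that both up- and downregulated genes are retained. The gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, fc_cutoff=1.0, alpha=0.01):
    """Keep genes with |logFC| > fc_cutoff and BH-adjusted P < alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [g for g, fc, q in zip(genes, log_fc, adjusted)
            if abs(fc) > fc_cutoff and q < alpha]


# Hypothetical results for four genes.
genes = ["G1", "G2", "G3", "G4"]
log_fc = [2.0, 0.5, -1.5, 1.2]
p_values = [0.001, 0.001, 0.002, 0.2]
print(select_degs(genes, log_fc, p_values))  # ['G1', 'G3']
```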
Survival analysis was performed with the Kaplan-Meier method in cBioPortal, and the log-rank test was used to compare overall survival between the two groups. The association between hub gene expression and tumor grade in the Oncomine TCGA-AML dataset was analyzed using one-way ANOVA. All tests were two-sided, with a significance level of 0.05.
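The Kaplan-Meier estimator and the two-group log-rank test that cBioPortal reports can be sketched in Python with the standard library only. This is a textbook version and not cBioPortal's code. The function names and toy follow-up data are hypothetical. The P-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]  # all subjects with this time
        deaths = sum(tied)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(tied)  # deaths and censorings leave the risk set
        i += len(tied)
    return curve


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the chi-square P-value with 1 df."""
    subjects = [(t, e, 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in subjects if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(e for tt, e, _ in at_risk if tt == t)
        d_a = sum(e for tt, e, g in at_risk if tt == t and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group a
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up data (months) for one group.
times_a, events_a = [1, 2, 2, 3, 4], [1, 1, 0, 1, 0]
print([(t, round(s, 3)) for t, s in kaplan_meier(times_a, events_a)])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```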
FIGURE 1 Analysis of differentially expressed genes: Venn diagram, protein-protein interaction network, and Cytoscape module analysis. (a) Differentially expressed genes with logFC > 1 and P < 0.01 were selected from the mRNA expression profiling sets; 134 genes overlapped in three of the four datasets. (b) Protein-protein interaction network of the differentially expressed genes, constructed using Cytoscape. (c) The most important module, obtained from the protein-protein interaction network using the Molecular Complex Detection and CytoHubba plug-ins. It shows the top 10 genes among the important nodes (red nodes).

Figure 1a shows that, after the microarray results were normalized, the DEG lists from the four datasets were overlaid, and 134 genes were shared by three of the datasets.

DEG KEGG and GO enrichment analysis
The DEGs were analyzed with Enrichr, and their functions and pathways were

PPI network construction and module analysis
A PPI network of the DEGs was constructed in Cytoscape (Figure 1b), and a significant module was obtained from it (Figure 1c). The functions of the genes in this module were analyzed by GO enrichment (Figure 3).
The results showed that the genes in this module were mainly involved in abnormal regulation of tumors, leukocyte transendothelial migration, regulation of RNA polymerase II, transcription factor activation, and sequence-specific DNA binding (Figure 3).

Hub gene selection and analysis
A total of 16 genes were identified as hub genes. A network of the hub genes and their co-expressed genes was analyzed using the cBioPortal online platform (Figure 4a). The biological process analysis of

DISCUSSION
In the USA, there are 19 000 new cases of AML, one of the common hematological malignancies, each year. Although stratified treatment based on karyotype and mutation status is now available, the standard treatment regimen for most newly diagnosed AML subtypes has not changed substantially. 21,22 Until recently, no fundamentally new chemotherapy regimen was available for most AML patients. 23,24 Because the long-term prognosis of most adult AML patients is poor, especially in older patients, many new drugs are being developed in combination with new molecularly targeted therapies, which are superior to traditional chemotherapy alone. [25][26][27] In … has not been reported. In addition, the hub genes were hierarchically clustered. 34 The results showed that these hub genes distinguish AML samples from non-cancer samples, suggesting that they might be candidate biomarkers.
In summary, the present study aimed to identify DEGs that might be involved in AML carcinogenesis or progression. A total of 134 DEGs and 16 hub genes were identified, which may serve as potential biomarkers for the diagnosis and prognosis of AML. The novelty of this study lies in two aspects. First, multiple datasets were used to obtain the hub genes, which reduces the bias of any single analysis. Second, the effects of these genes were analyzed in multiple dimensions, including enrichment, cluster, survival, and expression analyses. However, the in-depth biological functions of these genes in AML require clarification through further research.

ACKNOWLEDGMENTS
This work was supported by the Science Foundation of the National Health Commission of Guizhou Province (gzwjkj-2017-1-017).