Metabolic reprogramming‐associated genes predict overall survival for rectal cancer

Abstract Metabolic reprogramming has become a hot topic recently in the regulation of tumour biology. Although hundreds of altered metabolic genes have been reported to be associated with tumour development and progression, the important prognostic role of these metabolic genes remains unknown. We downloaded messenger RNA expression profiles and clinicopathological data from The Cancer Genome Atlas and the Gene Expression Omnibus database to uncover the prognostic role of these metabolic genes. Univariate Cox regression analysis and lasso Cox regression model were utilized in this study to screen prognostic associated metabolic genes. Patients with high‐risk demonstrated significantly poorer survival outcomes than patients with low‐risk in the TCGA database. Also, patients with high‐risk still showed significantly poorer survival outcomes than patients with low‐risk in the GEO database. What is more, gene set enrichment analyses were performed in this study to uncover significantly enriched GO terms and pathways in order to help identify potential underlying mechanisms. Our study identified some survival‐related metabolic genes for rectal cancer prognosis prediction. These genes might play essential roles in the regulation of metabolic microenvironment and in providing significant potential biomarkers in metabolic treatment.

reprogramming their microenvironments. [4][5][6][7] Previous studies [8][9][10] have been discussed the emerging importance role of dysregulated metabolism for cancer biology. However, the underlying processes and molecular alterations of metabolic programming in cancers have not been well elucidated yet, especially in rectal cancer.
In this study, we thought to focus on the important metabolic alterations in rectal cancer through analysing data downloaded from the TCGA database and validated the conclusions by the GEO database. The significant prognostic role of metabolic genes in rectal cancer has been conducted in this study.

| mRNA expression profiles and clinical information
We downloaded mRNA expression profiles and corresponding clinical information from The Cancer Genome Atlas (TCGA) database, which contains a total of 91 cases. We then downloaded 363 cases from the GEO database and obtained the gene expression matrix of the GSE87211 series. 11 The metabolic genes were obtained from the GSEA website (http://softw are.broad insti tute.org/gsea/index. jsp), downloads section. GSEA v4.0.3 for Windows and c2.cp.kegg.
v7.0.symbols.gmt were downloaded from the GSEA website for the further extraction analyses.

| Extraction of metabolic genes from the TCGA database and GEO database
Genes enriched in metabolism pathways of the KEGG database were utilized in this study as metabolic genes. The mRNA expression of metabolic genes in the TCGA database and GEO database was extracted, respectively. Then, an adjustment was given to adjust different mRNA expression levels of metabolic genes between TCGA and GEO databases by 'sva' package 12,13 in R software (version 3.6.1).
Genes were selected as candidate metabolic genes for the following analysis if: (a) they have the same expression pattern in TCGA database and the GEO database; and (b) they were listed in the metabolic-associated pathways.

| Identification of differentially expressed metabolic genes
Candidate metabolic genes have the same expression pattern both in the TCGA and GEO databases were selected. We utilized the R package 'limma' to screen the differentially expressed metabolic genes. 14 The expression of candidate metabolic genes in the TCGA database was used to identify differentially expressed metabolic genes. The screening criteria were set as |logFC| > 0.5 and P-value < .05.

| Identification of prognostic associated metabolic genes
Univariate Cox regression analysis was utilized in this study to identify prognostic associated metabolic genes in the TCGA database.
Genes with HR < 1 have better overall survival outcomes, while genes with HR > 1 have worse overall survival outcomes. Genes with P < .05 were regarded as prognostic associated metabolic genes.

| Construction of lasso Cox regression model
Prognostic associated metabolic genes screened by univariate Cox regression analysis were utilized to construct the lasso Cox regression model. 15 Lasso Cox regression model was constructed aimed to calculate the risk score of every patient. The formula of risk score was as follows: riskscore = the sum of each coefficient of mRNA multiple each expression of mRNA. Patients were divided into two groups based on the median risk score of each patient.

| Survival analysis based on the stratification of low-and high-risk scores in TCGA database
Patients were divided into two groups based on the median risk score. Survival analysis was performed by the Kaplan-Meier method.
Risk score curves were generated based on the risk score of each patient.

| Validation of TCGA survival analysis by utilizing data from the GEO database
In order to validate the survival data in the TCGA database, we downloaded survival data from the GEO database to perform Kaplan-Meier analysis.

| Validation of risk score in the TCGA database by univariate and multivariate Cox analysis
The clinicopathological characteristics and risk score were included in univariate and multivariate Cox regression analysis to validate whether the risk score can be regarded as an independent risk factor to predict overall survival outcome. ROC curve was used to assess the constructed model.

| GO and KEGG analyses by GSEA
We utilized GESA software 16

| Extraction of metabolic genes from the TCGA database
We downloaded a total of 91 cases from the TCGA database. A total of 804 metabolic genes were extracted from the TCGA gene expression matrix, including 375 up-regulated metabolic genes and 429 downregulated metabolic genes. We then downloaded 363 cases from the GEO database and obtained the gene expression matrix of the s series.

| Validation of TCGA survival analysis by utilizing data from the GEO database
Kaplan-Meier analysis demonstrated that high-risk group had a worse overall survival outcome compared with the low-risk group ( Figure 3A). Also, data from the GEO database showed that high-risk group had a worse overall survival outcome ( Figure 3B).

| Validation of risk score, survival status distribution and heatmap of the TCGA database by utilizing data from the GEO database
On top of Figure 4A,B, every patient was ranked from left to right according to the risk score. The risk score was elevated from left to right in both TCGA and the GEO database. Also, in the middle of Figure 4A,B, every patient was ranked from left to right according to the risk score.
The distribution of the survival status of each patient has demonstrated accordingly. Plus, at the bottom of Figure 4A,B, the heatmaps demonstrated the differential expression of metabolic genes in high-risk and low-risk groups in the TCGA database ( Figure 4A) and GEO database ( Figure 4B).

| Univariate and multivariate Cox analysis in the TCGA database
Univariate Cox regression analysis demonstrated that the risk score was associated with the overall survival of rectal cancer patients.
With the increasing risk score, the risk of poor survival outcomes elevated ( Figure 5A). The results of the multivariate Cox regression analysis showed that the risk score could be treated as an independent risk factor to predict overall survival outcome in rectal cancer patients ( Figure 5B). The results of the ROC curve demonstrated that the

| GO and KEGG analyses by GSEA
The results of GO analysis demonstrated that genes were mainly enriched in antibiotic metabolic process, arachidonic acid metabolic process and icosanoid metabolic process ( Figure 6A). The results of KEGG analysis showed that alpha-linolenic acid metabolism, arachidonic acid metabolism, drug metabolism cytochrome p450, glycerophospholipid metabolism, linoleic acid metabolism, metabolism of xenobiotics by cytochrome p450 and sphingolipid metabolism pathways ( Figure 6B).

| D ISCUSS I ON
The standard treatment for rectal cancer remains the total mesorectal excision. 17 Nevertheless, patients are requesting gradually for better-tailored managements for rectal cancer due to the focusing on the molecular alterations in rectal cancer in order to find some valid targets in the treatment of rectal cancer. [21][22][23] Recently, metabolic reprogramming has become a hot topic. 24 Accumulating evidence demonstrates the influence of metabolic alterations in neoplastic cells, mainly focused on the different cellular components of the cell microenvironment, like the regulation of tumour-infiltrating immune cells. [25][26][27] Therefore, the identification of novel metabolic genes in cancer to predict mortality risk of cancer has become a hotspot. In this study, we screened novel metabolic prognostic related genes from the TCGA database and validated the efficiency by data downloaded from the GEO database. The low-risk and high-risk patients were effectively stratified based on the different overall survival outcomes. The efficacy was further validated by data from the GEO database, which indicates a robust prognostic value of the prediction efficacy of these metabolic-related genes.
GSEA was utilized in this study to uncover the enriched GO terms and pathways in rectal cancer. Most GO terms and pathways were metabolism-related. On the one hand, the results enriched in GO terms and pathways proved the robust connection of the screened metabolic-related genes and indicated the underlying dysregulated metabolic microenvironment in rectal cancer. Patients with high-risk were more likely to be associated with the metabolism GO terms, like antibiotic metabolic process, arachidonic acid metabolic process and icosanoid metabolic process. Plus, patients with high-risk were more likely to be connected with the metabolism pathways, including alpha-linolenic acid metabolism, arachidonic acid metabolism, drug metabolism cytochrome p450, glycerophospholipid metabolism, linoleic acid metabolism, metabolism of xenobiotics by cytochrome p450 and sphingolipid metabolism pathways. Therefore, we hypothesized that high-risk patients might benefit more from metabolic therapy and metabolic-related management. Also, these results uncovered the underlying molecular mechanisms of these prognostic metabolic genes. However, many works are needed to further validate the relationship between these genes and metabolic POLR1D has been reported to be associated with the promotion of colorectal cancer progression and prediction of poor prognosis of patients. 28  DGAT2 has been reported to be associated with hepatocellular carcinoma malignancy by regulation of the cell cycle-related gene expression. 34 Also, DGAT2 has been reported to be associated with the regulation of the development of prostate cancer. 35 INPP5A could be treated as the target of miR-181a-5p in regulation on the progression of cervical cancer. 36 POLR2H was reported to be associated with the progression of prostate cancer. 37 Our results have identified some prognostic related metabolic genes for predicting survival outcomes of rectal cancer based on TCGA data. Some results could be validated by data downloaded from the GEO database, which reflected that these genes might be involved in the dysregulation of the metabolic microenvironment and might be treated as novel biomarkers for metabolic therapy.

CO N FLI C T O F I NTE R E S T
The authors declared that they have no conflicts of interest to this work.

AUTH O R CO NTR I B UTI O N S
ZYZ and JQL involved in study concept and design; QZY, HYL, QNG and PJQ involved in acquisition of data; QZY, HYL, QNG, PJQ and JPC involved in analysis and interpretation of data. ZYZ, QZY and JQL drafted the manuscript; ZYZ, QZY and JQL involved in critical revision of the manuscript for intellectual content.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in The Cancer Genome Atlas and the Gene Expression Omnibus (GSE87211) at http://www.tcga.org/ and https://www.ncbi.nlm.nih.