Development and validation of a CTNNB1‐associated metabolic prognostic model for hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is a heterogeneous malignancy closely related to metabolic reprogramming. We investigated how CTNNB1 mutation regulates the HCC metabolic phenotype and thus affects the prognosis of HCC. We obtained the mRNA expression profiles and clinicopathological data from The Cancer Genome Atlas (TCGA), the International Cancer Genomics Consortium (ICGC) and the Gene Expression Omnibus database (GSE14520 and GSE116174). We conducted gene set enrichment analysis on HCC patients with and without mutant CTNNB1 through TCGA dataset. The Kaplan‐Meier analysis and univariate Cox regression analysis assisted in screening metabolic genes related to prognosis, and the prognosis model was constructed using the Lasso and multivariate Cox regression analysis. The prognostic model showed good prediction performance in both the training cohort (TCGA) and the validation cohorts (ICGC, GSE14520, GSE116174), and the high‐risk group presented obviously poorer overall survival compared with low‐risk group. Cox regression analysis indicated that the risk score can be used as an independent predictor for the overall survival of HCC. The immune infiltration in different risk groups was also evaluated in this study to explore underlying mechanisms. This study is also the first to describe an metabolic prognostic model associated with CTNNB1 mutations and could be implemented for determining the prognoses of individual patients in clinical practice.

metabolic morphotype and is often cholestatic and infrequently steatotic. 3,9,10 A recent report showed that proteins involved in different metabolic activities such as drug metabolism, amino acid metabolism, glycolysis and gluconeogenesis are enriched in CTNNB1 mutant tumours, 11 indicating that it is closely related to metabolic reprogramming. However, the potential mechanism is not fully understood.
Metabolic reprogramming refers to the significant changes in metabolic patterns that occur during the process of cell carcinogenesis, which involves many processes such as glycolysis, oxidative phosphorylation, tricarboxylic acid cycle, amino acid metabolism, nucleic acid metabolism and fatty acid metabolism. [12][13][14] Tumorigenesis is a multistep process involving modifications to pathways that promote uncontrolled proliferation and eliminate cell death, which requires metabolic reprogramming to provide large molecules for cell survival, growth and migration. In recent years, targeting metabolic reprogramming has become a promising novel treating strategy, but the metabolic reprogramming of HCC has not yet been deciphered.
The study conducted gene set enrichment analysis on the CTNNB1 mutant HCC and CTNNB1 wild-type HCC through the Cancer Genome Atlas (TCGA) database, finding that gene sets significantly up-regulated in CTNNB1 mutant HCC were all related to metabolism, further confirming the close relationship between CTNNB1 mutation and metabolic reprogramming. We further explored the prognostic value of these genes in HCC and constructed a prognostic risk model consisting of five metabolic genes. The results of multiple datasets (including TCGA, ICGC, GSE14520 and GSE11 6174; total of 876 HCC patients) indicate that this risk score is highly accurate in evaluating the prognosis of HCC. In addition, we found that the prognostic model may reflect the immune microenvironment of the tumour and thus has high clinical application potential.

| Data collection from TCGA
We obtained the sequence data for 374 samples with HCC from the TCGA website (https://portal.gdc.cancer.gov/repos itory). The corresponding clinical data including overall survival time, survival status, sex, age, race, alpha-fetoprotein (AFP) level, body mass index F I G U R E 1 Gene set enrichment analysis between with and without CTNNB1 mutation HCC in TCGA (BMI), vascular invasion, histological grading, AJCC TNM stage, family history of cancer, tumour history, presence of new tumours after initial treatment and individual tumour status of the TCGA cohort were obtained from the UCSC Xena website (https://xenab rowser.net/). The CTNNB1 mutation sample list was obtained from the cBioPortal website (https://www.cbiop ortal.org/). We retained genes of which the average expression value is over 1 and meanwhile removed RNA-sequencing data with low abundance. 15,16 This study meets the publication requirement of TCGA (http://cance rgeno me.nih.gov/publi catio ns/publi catio nguid elines).

| Data collection from ICGC and GEO
Three independent cohorts were used for external validation (ICGC-LIRI-JP, GSE14520 and GSE11 6174). We obtained the gene expression files (ICGC-LIRI-JP gene expression files from the Illumina HiSeq RNA-seq platform, GSE14520 gene expression files from the GPL571 platform and GSE11 6174 gene expression files from the GPL13158 platform) and corresponding clinical data from the Gene Expression Omnibus (GEO) database (https:// www. ncbi.nlm.nih.gov/geo/) and the International Cancer Genomics Consortium (ICGC, https://icgc.org/). The batch effects of RNAseq data sets were eliminated via the combat function contained in the SVA R package (R Core Team, R Foundation for Statistical Computing, Vienna, Austria). The downloaded profiles all meet the GEO and ICGC data access rules.

| Gene set enrichment analysis (GSEA)
Gene set enrichment analysis was utilized in this study to uncovering the differences in the metabolic-related genes between the CTNNB1 mutant group (n = 98) and the non-mutant

| Construction and validation of a metabolicrelated prognostic model
We first used univariate Cox regression and the Kaplan-Meier (K-M) for survival analysis of each gene and selected genes with P < 0.05 in both algorithms as candidate genes to construct the model. 17 Next, the Lasso regression analysis assisted in further narrowing down the candidate gene number. Finally, we constructed a risk score system, multiplying the normalized expres-  be pointed out is that in order to avoid the impact of other factors on the prognosis of patients, we excluded patients with a survival time of less than one month.

| Correlation analysis between clinicopathological parameters and risk score
We used the Wilcoxon signed-rank test ( stage, main tumour size, tumour differentiation, AFP and vascular tumour cell type) and the risk score. P < 0.05 was considered with statistical significance, and box plots were generated via the beeswarm packages implemented in R software.

| Identification of differentially expressed genes (DEGs) between different risk groups
DEGs in the tissues of the two HCC groups were examined using the Wilcoxon test method in R package 'limma'. The thresholds were confirmed to be |log2-fold change (FC)| > 1.0 and FDR < 0.05. Gene ontology (GO) enrichment analysis was further used for biological function annotation of the DEGs using the R package 'clusterProfiler'. The normalized enrichment score (NES) that calculated from the ssGSEA was used in the 'GSVA' and 'GSEABase' R package. The independent-samples t tests were used for comparing the differences in immune infiltration levels and immune function between high-and low-risk groups, and P-values < 0.05 were suggested to exhibit statistical significance.

| Identification of differential metabolic gene sets between HCC samples with and without CTNNB1 mutations
Although CTNNB1 mutant HCC has been reported to have unique metabolic characteristics, the associated metabolic genes are still unknown. 3 Table 1). In contrast, no gene sets related to metabolic pathways were significantly enriched in wildtype CTNNB1 HCC.

| Construction of prognostic model in the TCGA cohort
Although CTNNB1 has been confirmed to be one of the genes with the highest mutation frequency in HCC, [1][2][3][4]7,18 Figure 2B). The expression levels of ALAS1, CYP3A5 and GOT2 were higher in the low-risk group, whereas the expression levels of AMD1 and PRDX1 were higher in the high-risk group ( Figure 2C).

TA B L E 3
The chi-square test of the relation between risk score and clinical features in TCGA cohort score ( Figure 2D-E). As found by the univariate and multivariate Cox regression analyses, the risk score could independently predict the prognosis ( Figure 2F-G).

| Internal validation of the prognostic model in TCGA cohort
Patients

| Relationship between risk score and clinicopathological characteristics
Previous studies have shown that AFP levels, histological grade,

| Gene Ontology functional enrichment analysis
We used Gene Ontology (GO) functional enrichment analysis to annotate the function of DEGs ( Figure 6A) in the group with a high risk and the group with a low risk. As found, the genes significantly upregulated in the group with a high risk were associated with a variety of immune regulatory processes including regulation of lymphocyte activation, regulation of immune effector processes and humoral immune response ( Figure 6B). The estimation of immune cell infiltration used by TIMER method showed that the infiltration level of six types of immune cells in high-risk group was all higher than that in low-risk group ( Figure 6C), indicating that the prognostic signature may affect the prognosis of HCC patients through regulating immune microenvironment of tumour.

| Relationship between risk score and immune cell infiltration
We further analysed the correlation between the prognos- neutrophils (r = 0.448, P < 0.001) ( Figure 7A). CTNNB1-mutated HCC is characterized by immune rejection, 23 and a recent clinical study showed that this subgroup is not sensitive to the treatment of immune-checkpoint inhibitors. 8 Our study found that the risk score positively corrected with the expression levels of six major immune checkpoints (PDL1, LAG3, CTLA4, CD276, PDCD1 and TIGIT) ( Figure 7B), which were consistent with the risk score of the CTNNB1-mutated subgroup, which was lower than that of the non-mutated subgroup ( Figure 7C).

| Relationship between risk score and tumour immune microenvironment
We used ssGSEA to further explore the internal relationship between prognostic model and tumour immune microenvironment (TIME

TA B L E 5
The chi-square test of the relation between risk score and clinical features in GSE14520 cohort terms of immune cell infiltration, the abundance of Tregs, aDCs, Th2 cells and macrophages of the high-risk group was significantly higher than those of the low-risk group in the four independent cohorts ( Figure   S1A-D). In terms of immune function, the results revealed that the type II IFN response in the low-risk group was significantly stronger than that in the low-risk group ( Figure S1A-D). In addition, the expression of immune checkpoint-related gene set was up-regulated in the high-risk group, which was consistent with the previous results ( Figure S1A-B).

| D ISCUSS I ON
In the past few decades, considerable progress has been made in understanding the risk factors and molecular characteristics of HCC. [24][25][26] However, morbidity and cancer-specific mortality continue to increase in many countries. 26,27 The existing prognostic staging system still has many limitations in guiding accurate treatment and predicting more accurate clinical outcome. 28,29 We need to further improve the existing prognostic staging system using gene sequencing technology, so that patients can benefit from the accurate treatment. 8 There is increasing evidence that metabolic reprogramming can remarkably affect HCC development and may be related to advanced diseases and adverse clinical outcome 30,31 ; hence, it seems that targeting tumour metabolism can promisingly assist in effectively treating HCC.
A recent report found that proteins involved in different metabolic activities such as drug metabolism, gluconeogenesis, glycolysis and amino acid metabolism are enriched in CTNNB1 mutant HCC. 11 NRF2 mutation is an early event in the carcinogenesis of HCC and is considered to be an important factor in promoting the progression of precancerous hepatocytes to HCC. 32 Existing studies have found that the abnormal transduction of NRF2 signal pathway is associated with simvastatin. 33,34 In contrast, CTNNB1 mutation usually occurs in the later stage of HCC progression, and the relevant research on its specific mechanism is still lacking. Considering this, we conducted GSEA of mRNA expression profiles in the TCGA database. According   44,45 However, the role of ALAS1 in HCC remains F I G U R E 6 Identification of differentially expressed genes in high-risk and low-risk HCC patients. A, Heat map of differentially expressed genes samples between high-risk and low-risk groups. B, GO circle plot of immune pathways differentially enriched in patients with different risk scores. C, Violin plot of relationships between the risk score and infiltration abundances of six types of immune cells (red represents high-risk group, and blue represents low-risk group) not well known. CYP3A5 is a cytochrome P450 protein that plays a role in the metabolism of many carcinogens and anticancer drugs in the liver. Jiang et al found that CYP3A5 plays an antitumour role in HCC by regulating the mTORC2/Akt signalling pathway. 46 Glutamate oxaloacetate transaminase 2 (GOT2) has been repeatedly reported in recent years to be associated with the progression of pancreatic cancer, 47,48 but its role in HCC remains unclear.
Although the above studies support that our prognostic model has a high potential for clinical application, the study faces some limitations. Although the accumulated data of high-throughput analysis from a large number of samples have been optimally applied, further verification through prospective studies is necessary. Furthermore, the specific biological functions of the five genes in HCC need to be explored experimentally.

| CON CLUS IONS
We constructed a novel prognostic model that is useful for further improving the prognostic evaluation system of patients with HCC,

F I G U R E 7
The landscape of immune infiltration in high-and low-risk HCC patients. A, Correlation analysis of risk score and immune cell infiltration. B, The relationship between risk score and expression level of immune checkpoint. C, The relationship between risk score and CTNNB1 status (P-value significant codes: 0 ≤ *** < 0.001 ≤ ** < 0.01 ≤ * < 0.05 ≤ . < 0.1) and the five metabolic genes in this model are expected to become potential targets for the treatment of HCC.

CO N FLI C T O F I NTE R E S T
The authors have no conflicts of interest to declare.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets analysed for this study were obtained from The