Identifying the predictive role and the related drugs of oxidative stress genes in the hepatocellular carcinoma

Abstract Background and Aims Oncogenesis and tumor development have been related to oxidative stress (OS). The potential diagnostic utility of OS genes in hepatocellular carcinoma (HCC), however, remains uncertain. As a result, this work aimed to create a novel OS related‐genes signature that could be used to predict the survival of HCC patients and to screen OS related‐genes drugs that might be used for HCC treatment. Methods We used The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO) database to acquire mRNA expression profiles and clinical data for this research and the GeneCards database to obtain OS related‐genes. Following that, biological functions from Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were performed on differentially expressed OS‐related genes (DEOSGs). Subsequently, the prognostic risk signature was constructed based on DEOSGs from the TCGA data that were screened by using univariate cox analysis, and the Least Absolute Shrinkage and Selection Operator (LASSO) regression, and multivariate cox analysis. At the same time, we developed a prognostic nomogram of HCC patients based on risk signature and clinical‐pathological characteristics. The GEO data was used for validation. We used the receiver operating characteristic (ROC) curve, calibration curves, and Kaplan–Meier (KM) survival curves to examine the prediction value of the risk signature and nomogram. Finally, we screened the differentially expressed OS genes related drugs. Results We were able to recognize 9 OS genes linked to HCC prognosis. In addition, the KM curve revealed a statistically significant difference in overall survival (OS) between the high‐risk and low‐risk groups. The area under the curve (AUC) shows the independent prognostic value of the risk signature model. Meanwhile, the ROC curves and calibration curves show the strong prognostic power of the nomogram. The top three drugs with negative ratings were ZM‐336372, lestaurtinib, and flunisolide, all of which inversely regulate different OS gene expressions. Conclusion Our findings indicate that OS related‐genes have a favorable prognostic value for HCC, which sheds new light on the relationship between oxidative stress and HCC, and suggests potential therapeutic strategies for HCC patients.

nomogram.The top three drugs with negative ratings were ZM-336372, lestaurtinib, and flunisolide, all of which inversely regulate different OS gene expressions.

Conclusion:
Our findings indicate that OS related-genes have a favorable prognostic value for HCC, which sheds new light on the relationship between oxidative stress and HCC, and suggests potential therapeutic strategies for HCC patients.

| INTRODUCTION
Hepatocellular carcinoma (HCC) is the most common type of primary liver tumor, accounting for more than 90% of all primary liver tumors. 1 HCC is the fifth most common cause of cancer worldwide and the second leading cause of cancer death and the overall survival rate for HCC remains poor. 2 Current HCC evaluation and treatment do not satisfy the standards for early identification and longer overall survival.As a result, new biomarkers with higher predictive value must be investigated to improve patient prognosis.
The mechanism of HCC development is still unknown at the moment.5][6] HCV and HBV infection causes oxidative DNA damage, which actively encourages hepatocellular tumorigenesis. 7Therefore, OS may be closely related to the occurrence of HCC.However, the prognostic value of these OS genes in HCC and their associated mechanisms require further investigation.
Current diagnostic methods for HCC are mainly dependent on imaging and histopathological examination.In recent years, large-scale tumor genome sequencing data has provided an excellent opportunity to identify potential molecular markers. 8,9In present study, using mRNA sequencing data and clinical information from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets, we screened OS genes linked with HCC prognosis and developed a risk signature and prognosis-related nomogram based on prognosisrelated OS genes.In addition, we screened small molecule drugs linked to OS genes to provide a new perspective on HCC treatment.

| Data collection and identification of differentially expressed OS-related genes (DEOSGs)
The mRNA expression profile dataset contained 374 HCC samples and 50 normal liver tissues, as well as clinical data from the TCGA database 10 (https://portal.gdc.cancer.gov/) on March 27, 2022.
In addition, we downloaded the clinical and gene expression profiles of 167 HCC patients in the GSE76427 dataset from GEO. 11 Finally, we extracted the complete clinical data (age, stage, sex, survival time) of 279 patients in TCGA and 114 patients in GEO, respectively.
TCGA data was used as a training group and GEO data was used as a validation group.To identify differentially expressed oxidative stress genes, we obtained 611 OS-related genes with relevance scores ≥7 from the GeneCards website (https://www.genecards.org).We finally obtained 544 common OS-related genes in both datasets.Subsequently, we used the edgeR package 12 for further processing with the criteria of FDR <0.05 and jLog2FCj ≥ 0.5, concerning previous study criteria. 13Ultimately, differentially expressed genes were visualized with the heatmap package and ggplot package in R software.

| Gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) analyses
To further clarify the potential functions of differentially expressed genes, GO 14 and KEGG analysis 15 were performed using the cluster Profiler package in R software and visualized using the enrichplot and graphics packages.

| Identification of prognosis-related DEOSGs and construction of prognostic model
We firstly performed univariate Cox regression and Kaplan-Meier (KM) analysis using the survival R package to identify OS-related genes from TCGA that were associated with overall survival (OS).Only p value <.05 genes were identified as prognosis-associated DEOSGs in both investigations.To narrow down the spectrum of prognosisassociated genes, LASSO regression analysis of the above prognosisassociated DEOSGs was conducted using the glmnet package. 16Subsequently, a multivariate cox regression analysis was performed using the survival package (https://cran.r-project.org/package = survival) to create an OS-related risk signature based on the regression coefficients from the analysis and the expression levels of the selected OS-related genes.The formula was calculated as follows: risk score = Σexpgene i *β i , where expgene represents the relative expression value of OS genes and β represents the regression coefficient.The median risk score was used to divide patients into high-risk and low-risk groups, and overall survival was compared between the two categories using the survival package's Kaplan-Meier approach and the log-rank test.To validate the predicted accuracy and ability of the risk score, the survival ROC and time ROC packages 17 in R software were used.The link between clinical features, risk scores, and prognosis was further investigated using univariate and multivariate Cox regression analyses.Finally, the rms package was used to create a nomogram containing calibration plots to estimate the prognosis of HCC patients.
The GEO cohort was used to test the model's prognostic performance.

| Identification of potential therapeutic drugs
The DEOSGs have been uploaded to the L1000FWD website (https://maayanlab.cloud/L1000FWD).After that, a table of prospective medications was obtained.We further used PubChem (pubchem.ncbi.nlm.nih.gov) to visualize the above results.

| Identification and functional enrichment analysis of DEOSGs
Figure 1 shows the workflow used for bioinformatics analysis of the datasets.We first identified 544 OS genes by taking the intersecting three datasets (Figure 2A).Then, we analyzed the expression of 544 OS genes in 50 normal samples and 374 tumor tissues from the TCGA dataset.There were 282 OS genes identified as DEOSGs,

| Screening of DEOSGs associated with prognosis and development of risk signatures
To discover DEOSGs, KM survival and univariate Cox analyses were used to analyze 15 DEOSGs, and 15 DEOSGs were chosen as HCC prognosis-associated candidate OS genes with p < .05. (Figure 5A).Subsequently, the LASSO algorithm was used to narrow down the range of prognosis-related OS genes, and 9 DEOSGs were selected to construct the risk signature of OS genes (Figure 5B,C).The coefficients of the 9 DEOSGs are shown in Table 1.
Based on the median risk score, HCC patients in two datasets were separated into low-risk and high-risk categories.(Figure 6A-F).The survival ROC curves of the risk signature in the TCGA and GEO were 0.685 and 0.645, respectively (Figure 5D,E).The results showed a better performance of the risk signature than clinical-pathological factors in both datasets.Overall survival was considerably worse in high-risk HCC patients than in low-risk patients in the TCGA and GEO datasets.
(Figure 5F,H).We also plot the time-dependent ROC of risk signatures in the two datasets.The AUCs of 1, 2, 3-year were 0.662, 0.712.0.712 in TCGA dataset, and 0.679, 0.724, 0.771 in GEO T A B L E 1 Nine prognosis-associated OS genes in the TCGA dataset were identified by univariate Cox analysis and LASSO analysis.

| Evaluating the levels of expression of prognostically relevant DEOSGs in HCC patients
We extracted the expression values of each prognostic-related OS gene from the TCGA dataset and further to investigate the transcript levels of DEOSGs in HCC patients.The TCGA database was used to extract the expression values of each important gene, and box plots (Figure 8A) and heat maps (Figure 8B) were created.To discover OS genes associated with prognosis, we used univariate cox analysis and LASSO analysis, generating 9 DEOSGs: ANXA5, MMP1, MTR, REN, SCN4A, HP, ACOX1, and SMAD3.
We also analyzed the levels of expression of these genes in the TCGA dataset as well as the levels of protein expression in the HPA database.The results showed that ANXA5, MMP1, MTR, REN, SCN4A, and SMAD3 were upregulated in HC samples, while HP and ACOX1 were downregulated in tumor tissues.The results of this study were the same as those of previous studies.Overexpression of ANXA5 and MMP1 in HCC may enhance clinical progression and lymphatic metastasis. 20,21Moreover, low levels of HP expression are related to superior HCC tumor differentiation and enhanced five-year overall survival in HCC patients. 22 evaluate whether the risk signature of all these specific DEOSGs can be used as a prognostic marker, we performed univariate and multivariate Cox regression analyses, which revealed that this risk signature is an independent predictor with a strong predictive significance for HCC.Survival and ROC analysis confirmed the prognostic value of risk signature for predicting HCC.Furthermore, the nomogram analysis revealed that risk characteristics were of significant value in predicting overall survival in HCC patients, and their predictive power was significantly higher than that of other clinicopathological characteristics.Thus, OS plays a pivotal role in all stages of carcinogenesis and cancer progression. 23,24is study has several advantages.On one hand, the risk signature was performed external validation, demonstrating the robustness and reliability of the risk signature.On the other hand, a nomogram for quantitative calculations was further established, which facilitates clinical dissemination and application.The present study still has some limitations.First, this study is a retrospective analysis requiring further validation by prospective studies.Second, the mechanism of these genes and HCC progression needs to be further explored.
In conclusion, our study revealed 9 DEOSGs associated with overall survival in HCC and created a prognosis-related risk signature and nomogram based on these 9 DEOSGs.Simultaneously, we screened small molecule drugs that could be linked to DEOSGs.

F
I G U R E 1 Flowchart of the research.F I G U R E 2 Identification of DEOSGs.(A) The common OS-related genes in GEO, TCGA, and GeneCards.(B) Volcano plot of DEOSGs between normal and tumor tissues from TCGA.(C) Heatmap of DEOSGs.

3
| STATISTICAL METHODS R software (version 4.3.1)was used as a statistical analysis tool in this study.Quantitative data were expressed as mean ± standard error (SEM) or standard deviation (SD).Statistical significance was defined as p < .05.Figures were constructed by R software.

F
I G U R E 3 GO analysis of upregulated and downregulated DEOSGs.(A, B) Top 10 classes of GO enrichment terms of upregulated DEOSGs in biological process (BP), cellular component (CC), and molecular function (MF).(C, D) Top 10 classes of GO enrichment terms of downregulated DEOSGs in biological process (BP), cellular component (CC), and molecular function (MF).(E) Grid diagram of top 5 GO terms and upregulated DEOSGs.(F) Grid diagram of top 5 GO terms and downregulated DEOSGs.

F
Figure 2B,C, we illustrated the distribution of DEOSGs expression in HCC.Figures 3 and 4 present the analytical results of molecular mechanisms and potential pathways of the DEOSGs using the GO and KEGG analysis.

F I G U R E 6
Association of risk score with survival time and prognosis-related OS gene expression.(A, D) The risk curves of risk score distribution in the TCGA and GEO datasets.(B, E) Scatterplot of survival status of HCC patients in the TCGA and GEO datasets.(C, F) The heatmap displayed the expression levels of prognostic-related DEOSGs in the high-risk and low-risk groups in the TCGA and GEO datasets.dataset, respectively (Figure 5G,I).All the above results demonstrated that risk signatures have moderate specificity and sensitivity.However, when HCC patients were divided into living and dead groups, growing risk scores were no longer significantly associated with HCC patient survival time.(Figure 6B,E), suggesting that our risk signature only predicts survival in the entire population and cannot be utilized to predict HCC patient survival time.Furthermore, Univariate and multivariate Cox regression analyses were performed to see whether the risk signature was a prognostic characteristic.The risk signature was an independent predictive characteristic in the TCGA dataset that was substantially linked with the prognosis of HCC.The nomogram is a quantitative model that is used to forecast clinical outcomes in HCC patients.Following that, we created a predictive nomogram for HCC patients using the TCGA dataset, based on risk F I G U R E 7 Construction and validation of prognostic nomogram.(A) The prognostic nomogram for HCC patients.(B, C) The time-dependent ROC of nomogram at 1, 2, and 3 years in the TCGA and GEO datasets.(D-F) The 1,2,3-years calibration curves of nomogram in the TCGA dataset.(G-I) The 1,2,3-years calibration curves of nomogram in the GEO dataset.signature and clinical-pathological variables (Figure 7A).We further evaluated the predictive performance of this nomogram in the TCGA and GEO datasets using time-dependent ROC and calibration curves.The AUCs of the nomogram at 1, 2, and 3 years were 0.712, 0.694, and 0.701 in the TCGA dataset (Figure 7B), and 0.699, 0.810, and 0.830 in the GEO dataset (Figure 7C).The calibration curves in the TCGA (Figure 7D-F) and GEO datasets (Figure 7G-I) demonstrated that the nomogram had powerful calibration performance.

F I G U R E 8
Expression level of prognosis-related DEOSGs.(A) The box plot of prognostic-related DEOSGs in the TCGA dataset.(B) The heatmap of prognostic-related DEOSGs in the TCGA dataset.T A B L E 2 The screened drugs for HCC treatment.

4. 4 |
Screening of DEOSGs related smallmolecule drugsTo find prospective medications for the treatment of HCC, we uploaded up-and down-regulated DEOSGs to the L1000FWD website and matched them with small-molecule therapeutic methods.Table2summarizes the top 10 small-molecule drugs, as well as their similarity scores.The top three drugs were ZM-336372, lestaurtinib, and flunisolide, all of which were predicted to have an inverse effect on DEOSGs expression.Using the PubChem website, they were visualized in 2D and 3D forms (Figure9A-F).These potential small molecule drugs are expected to inhibit OS-induced gene expression and pave the way for the development of targeted treatments for the treatment of HCC.

5 |
DISCUSSIONCurrent research on the pathogenesis of HCC has concentrated on the cellular and molecular levels.However, few studies have attempted to utilize the research results in clinical situations.In our study, 282 DEOSGs were identified based on the TCGA dataset, and GO and KEGG enrichment analyses were performed.The results of KEGG enrichment analysis revealed that DEOSGs were remarkably related to lipid and atherosclerosis, fatty acid degradation, ROS, and hepatitis B. Additionally, DEOSGs were found to be considerably concentrated in a variety of biological processes, including response to oxidative stress, cellular response to chemical stress, response to xenobiotic stimulus, and fatty acid metabolic process.All of these biological processes are linked to the onset and progression of liver disease.18,19

F I G U R E 9
Structures of the screened small-molecule drugs.(A, B) 2D and 3D structures of ZM-336372.(C, D) 2D and 3D structures of lestaurtinib.(E, F) 2D and 3D structures of flunisolide.