Development and validation of apoptosis‐related signature and molecular subtype to improve prognosis prediction in osteosarcoma patients

Abstract Background Previous evidence has shown that apoptosis performs integral functions in the tumorigenesis and development of various tumors. Therefore, this study aimed to establish a molecular subtype and prognostic signature based on apoptosis‐related genes (ARGs) to understand the molecular mechanisms and predict prognosis in patients with osteosarcoma. Methods The GEO and TARGET databases were utilized to obtain the expression levels of ARGs and clinical information of osteosarcoma patients. Consensus clustering analysis was used to explore the different molecular subtypes based on ARGs. GO, KEGG, GSEA, ESTIMATE, and ssGSEA analyses were performed to examine the differences in biological functions and immune characteristics between the distinct molecular subtypes. Then, we constructed an ARG signature by LASSO analysis. The prognostic significance of the ARG signature in osteosarcoma was determined by Kaplan–Meier plotter, Cox regression, and nomogram analyses. Results Two apoptosis‐related subtypes were identified. Cluster 1 had a better prognosis, higher immunogenicity, and immune cell infiltration, as well as a better response to immunotherapy than Cluster 2. We discovered that patients in the high‐risk cohort had a lower survival rate than those in the low‐risk cohort according to the ARG signature. Furthermore, Cox regression analysis confirmed that a high risk score independently acted as an unfavorable prognostic marker. Additionally, the nomogram combining risk scores with clinical characteristics can improve prediction efficiency. Conclusion We demonstrated that patients suffering from osteosarcoma may be classified into two apoptosis‐related subtypes. Moreover, we developed an ARG prognostic signature to predict the prognosis status of osteosarcoma patients.


| BACKG ROU N D
Osteosarcoma has been identified as the most prevalent type of primary bone malignancy in adolescents and children. 1 Osteosarcoma usually occurs in the metaphysis, including the humerus, tibia, or femur, which leads to high mortality and disability rates, especially in patients with metastasis. 2 Despite the significant advances in therapies, including immunotherapy, radiotherapy, chemotherapy, and differentiation therapy, the effectiveness in the treatment of patients with osteosarcoma has remained unsatisfactory in the past few decades due to genomic complexities and instability. [3][4][5] Although there is an approximately 70% 5-year survival rate for patients suffering from localized osteosarcoma, patients suffering from metastatic disease experience unfavorable overall survival (OS) rates of <20%. 6 As a consequence, novel treatment targets must be explored, and biomarkers must be identified for the purpose of effectively stratifying patients and designing tailored therapy regimens for patients with osteosarcoma.
Apoptosis, also known as programmed cell death, is a main cellular process by which mammals eliminate DNA-damaging cells and sustain tissue homeostatic control. 7 There are two significant apoptosis pathways, namely, mitochondria-mediated pathways (intrinsic pathway) and death receptor-mediated pathways (extrinsic pathways). 8,9 Apoptosis is implicated in a variety of biological as well as pathological mechanisms, including the progression of tumors, oncogenesis, organ and tissue homeostasis, and embryonic growth. 10,11 Tumor cells have the capacity to escape programmed cell death, which could also increase invasiveness in the process of tumor growth, boost tumor angiogenesis, and accelerate cell proliferation. 12,13 Additionally, selective induction of apoptosis is one of the most effective anticancer therapies, including targeted therapy, radiotherapy, and chemotherapy. 14 In view of the fact that large-scale public databases comprising gene expression data as well as clinical information are now available, it has become feasible to create a highly accurate prognostic signature. In recent years, the development of apoptosis-related gene (ARG) signatures for the risk assessment and prognosis prediction of cancers has become a research hotspot and has yielded excellent outcomes. 15,16 However, there is no clarifying ARG signature for predicting osteosarcoma patient prognosis.
To investigate the ARG molecular subtypes of osteosarcoma, we first obtained the ARG expression patterns and relevant clinical data of osteosarcoma patients from the TARGET database.
Subsequently, utilizing ARGs from the TARGET cohort, we created a prognostic signature that was verified in the GEO cohort to enhance risk stratification as well as prognostic predictions in patients with osteosarcoma. Overall, the present research could aid in achieving an enhanced comprehension of the fundamental process as well as the evaluation of prognosis of osteosarcoma patients.

| Data collection
We downloaded the FPKM RNA-sequencing data as well as relevant clinical data of 88 osteosarcoma patients (TARGET-OS cohort) from the genomic data commons data portal (https://portal.gdc.cancer. gov/) as a training cohort. The clinical characteristics and mRNA expression data of 53 osteosarcoma patients in GSE21257 were acquired from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) as a validation cohort. Table 1 illustrates the patients' clinical data.

| Consensus clustering of osteosarcoma patients
We searched GSEA-MSigDB (http://www.gsea-msigdb.org/gsea) for ARGs by searching for "apoptosis" as a keyword, and 580 ARGs were identified (Table S1). To identify ARGs that were substantially correlated with the prognosis of osteosarcoma patients, we performed a univariate Cox proportional hazards regression analysis. For the purpose of performing subsequent analysis, 58 ARGs (Table S2) with a p value <0.05 were defined as prognosis-related ARGs and subjected to further analysis. Subsequently, utilizing the "Consensus ClusterPlus" tool in R, these osteosarcoma samples were subjected to consensus clustering with a clustering factor (k) ranging from 2 to 9. The highest intragroup correlations and the lowest intergroup correlations were obtained when k = 2, demonstrating that the 88 osteosarcoma patients can be divided into two clusters, Cluster 1 and Cluster 2. The osteosarcoma patients in the two clusters were visualized in a two-dimensional scatter plot after t-distributed stochastic neighbor embedding (tSNE) and principal component analysis (PCA) dimensionality reduction.

| Functional enrichment analysis
The threshold values for identifying differentially expressed genes (DEGs) across the two clusters were set as a false discovery rate (FDR) p < 0.05 and log2|fold change|>1. Gene Ontology (GO) 17

and Kyoto
Encyclopedia of Genes and Genomes (KEGG) 18 pathway enrichment analyses were conducted according to these DEGs to examine the distribution of the biological functions between the two clusters. We also conducted gene set enrichment analysis (GSEA, version 4.0.1) between Clusters 1 and 2 for the purpose of examining the different enrichment of pathways. 19 A FDR p < 0.05 was set to illustrate statistical significance.

| Immune characteristics of the two apoptosisrelated clusters
First, the ESTIMATE algorithm 20 was used to quantify the scores of the tumor microenvironment (TME) of each osteosarcoma patient, including estimate, immune scores, and stromal scores. Single-sample gene set enrichment analysis (ssGSEA) 21 was utilized to assess the enrichment scores of 16 distinct types of immune cells and the activities of 13 pathways correlated with immune function in each osteosarcoma sample. Then, we compared these scores between the two apoptosisrelated clusters. Ultimately, we examined the expression of immune checkpoint genes as well as human leukocyte antigen (HLA) genes between the two clusters to anticipate immunotherapy responsiveness.

| Establishment and verification of an apoptosis-related risk signature
In this protocol, we used the TARGET dataset as the training cohort for the purpose of developing the prognostic model. In addition, we obtained survival-related ARGs in the univariate Cox analysis that had been preliminarily filtered to conduct the least absolute shrinkage and selection operator (LASSO) regression utilizing the glmnet R package, 22 assisting in the identification of suitable factors with which to standardize the completed signature as well as prevent overfitting. In addition, utilizing the Survminer R package, we successfully extracted the prognostic risk score equation by performing a multivariate Cox regression analysis. According to the formula, each osteosarcoma patient's risk score can be derived as depicted below: Coefi is the coefficient of ARG in the signature. Subsequently, the osteosarcoma patients were classified into low-and high-risk cohorts based on the median value, which served as the threshold value in this study. The performance of the AGR signature was verified in 53 osteosarcoma patients who had survival data from the GSE21257 dataset as a validation set. The sva module in R was used to adjust all of the data from the two datasets. Time-dependent 1-, 3-, and 5-year receiver operating characteristic (ROC) curves, the Kaplan-Meier plotter with the log-rank test, and multivariate and univariate Cox regression analyses were employed to examine the AGR signature's predictive power and accuracy. To determine whether our ARG signature had a superior predictive ability for osteosarcoma patients, we compared it with five published prognostic signatures related to glycolysis, 23 immunity, 24 hypoxia, 25 ferroptosis, 26 and metastasis. 27 Moreover, a nomogram incorporating the risk model together with clinical data was developed to accurately predict the prognosis status of osteosarcoma patients. The prediction accuracy of the nomogram was confirmed utilizing calibration curves as well as time-dependent 1-, 3-, and 5-year ROC curves. Eventually, the chi-square test was employed to examine the correlation between the risk score and the clinicopathological variables of osteosarcoma patients.

| Statistical analysis
R software (version: 4.1.0) was employed to perform all statistical analyses and visualize the data. The chi-square test or Wilcoxon signedrank test was utilized to conduct data comparisons between various cohorts. With the help of a Kaplan-Meier plotter with a log-rank test, we created OS curves for the various cohorts. With respect to clinicopathological factors, multivariate and univariate Cox regression analyses were conducted to determine whether the risk score independently served as a prognostic predictor for patients suffering from osteosarcoma. A criterion of p < 0.05 was set to indicate statistical significance.

| Apoptosis-related genes distinguished osteosarcoma patients
A total of 58 ARGs associated with prognosis were discovered using univariate Cox regression analysis. According to consensus clustering analysis based on these prognosis-related ARGs, 88 osteosarcoma  Figure 1A). A heat map ( Figure 1B) showed the two clusters classified by 58 prognosis-related ARGs. PCA ( Figure 1C) and tSNE ( Figure 1D) plotters based on ARG expression showed a clear distinction between the two clusters. Additionally, the Kaplan-Meier curve revealed that Cluster 2 had significantly worse OS than Cluster 1 ( Figure 1E, p < 0.001).

| Functional enrichment analysis between the two apoptosis-related clusters
To examine the possible biological functions across the two clusters, we next identified 389 DEGs (Figure 2A and Table S4)

| Analysis of immune infiltration characteristics
We applied ESTIMATE and ssGSEA to explore the TME, immunerelated pathways, immune-related functions, and immune cell infiltration ( Figure 3A). TME analysis ( Figure 3B

| Construction and evaluation of ARG signature
We used the TARGET cohort as the training set, whereas GSE21257 was the validation set. We constructed the ARG To determine whether our ARG signature had a superior predictive performance, we calculated the AUC values and C-index

| Correlation between risk score and clinical parameters
Furthermore, the association between the risk scores and clinical variables was investigated. Based on clinical factors, we classified the patients into distinct groups. The heatmap ( Figure 8A) showed that metastasis status was significantly different between the lowand high-risk groups (p < 0.05), highlighting the fact that there were no statistically significant differences in any other clinical information between the two groups. As shown in Figure 8B, we classified the patients into four groups according to their risk scores and metastatic status. The chi-square test demonstrated that osteosarcoma patients in the high-risk cohort exhibited a greater metastatic rate than those in the low-risk cohort (37% vs. 14%, p = 0.032).

| DISCUSS ION
Osteosarcoma is a prevalent malignancy that usually affects children as well as adolescents. Despite the fact that developments in surgical procedures and holistic treatments have enhanced the local control rate and improved the quality of life of osteosarcoma patients, the five-year survival rate of osteosarcoma patients remains as low as 20% or less due to metastasis and recurrence. 6,28 The most effective method of improving the prognosis of these patients is to identify patients at high risk in a timely manner and to implement more Cluster 2, indicating that Cluster 1 may have higher immunogenicity than Cluster 2. Growing evidence has shown that the increase in immunogenicity can activate the immune response, which contributes to a better prognosis for cancer patients. 35,36 To further assess immune infiltration characteristics between the two subtypes, we first utilized the ESTIMATE algorithm to explore the TME, which consists of stromal and tumor cells 37 and performs an indispensable function in tumorigenesis, 38 tumor progression, 39 and metastasis. 40 In the present research, we revealed that Cluster Although immunotherapy is increasingly acknowledged as an effective and promising treatment for osteosarcoma, 47 The present research has a few limitations. First, given the low morbidity of osteosarcoma, the representative sample size employed in the present research was not large, and further confirmation in larger cohorts is needed. Second, the present research was focused solely on bioinformatic analysis, and additional experiments are required to confirm our findings. Third, our research is retrospective, and the ARG signature should be validated in large-scale clinical trials and prospective studies. Finally, the biological functions of apoptosis-related genes in the signature were not explored in osteosarcoma, and future studies are needed to investigate them.
In conclusion, we divided osteosarcoma patients into two subtypes based on ARGs. We then identified that Cluster 1 had higher immunogenicity and infiltration of immune cells, as well as a better response to immunotherapy, than Cluster 2. In addition, the ARG signature we developed can accurately predict the OS of osteosarcoma patients. Moreover, the nomogram we developed, which combines the ARG signature and clinical data, may exert a powerful prediction

CO N FLI C T O F I NTE R E S T
All authors hereby state that there are no financial or other conflicts of interest associated with the present research.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available from The Cancer Genome Atlas database (https://portal.gdc.cancer. gov/) and the Gene Expression Omnibus database (https://www. ncbi.nlm.nih.gov/geo).