Development of a prognostic gene signature based on an immunogenomic infiltration analysis of osteosarcoma

Abstract Osteosarcoma is the most common primary malignant bone tumour predominantly occurring in children and adolescents with a high tendency of local invasion and early metastases. Currently, tumour immune microenvironment (TME) is becoming the focus of studying of malignant tumours.. However, no sound evidence shows a specific immune molecular target in osteosarcoma. We downloaded the gene expression profile and clinical data of osteosarcoma from the TARGET portal, and extracted and normalized via R software. Then, the immune cell infiltration assessed by CIBERSORT and ESTIMATE algorithms. Three survival‐related immune cells and immune score were obtained via Kaplan‐Meier survival analysis, and 232 immune‐related genes were obtained as candidate genes. Enrichment and protein‐protein interaction co‐expression analyses were performed to identify 13 hub genes. Lastly, a seven gene prognostic signature was identified by univariate and multivariate Cox regression analyses. More importantly, our validations and TIMER algorithm suggested this immune‐related prognostic signature a good predictive tool. Our findings have provided novel insights that could demonstrate new targets of immunotherapy in osteosarcoma.

progression and tumorigenesis of osteosarcoma, such as MYC and microRNA-133a. [3][4][5] But the rarity and heterogeneity of the tumour itself represent a limitation in the development of practical prognostic factors and biomarkers. Currently, the emerging researching focus of malignancy is tumour microenvironment (TME), which represents the communication between tumour cells and the corresponding immune cells. It mirrors the traits of a tumour from another perspective. And several molecular mechanisms are potentially being developed into immune therapeutic targets.
Though there were many clinical trials of immune therapies conducting on osteosarcoma, no sound evidence shows a specific molecular target effective. [6][7][8] It still remains a big unknown world to be explored.
The tumour-infiltrating immune cells play an important role in TME. 9 Many algorithms were exploited to assess the infiltration of immune cells in genetic levels, such as Cell type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT), ESTIMATE, Tumor Immune Estimation Resource (TIMER), ImmPort and xCell. In this study, we provide a new approach combining some of these algorithms together to assess immune cell infiltration of osteosarcoma samples for the first time. The RNAseq gene expression profile and clinical data were acquired from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) database, which is the most comprehensive genomic resource for childhood cancers and sarcomas. Moreover, our study then provides the cell and gene signature that could be considered biomarkers for prognosis for osteosarcoma immune therapy. And it might shed some light on identifying a new potential immune target for osteosarcoma.

| Study design and data collection
The raw expression data and clinical data were downloaded directly from TARGET (OS) database including 98 osteosarcoma patients with 101 samples. The RNA-seq data were updated on 4 September 2018. And the clinical data were updated on 5 August 2019. Then, R software ("tximport" and "edgeR" packages) was used to extract raw data and sort to obtain gene expression matrixes.
The study design was shown in a flow chart (Figure 1), and the clinical data were arranged in Table S1. The validation cohorts were downloaded from GEO 10 in the National Center for Biotechnology Information database. The gene expression profiles of GSE21257, GSE39058 and GSE16091 met our validation conditions. 11-13

| CIBERSORT, ESTIMATE and TIMER
Cell type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) is an analytical tool developed by Newman to assess immune cell infiltration ratio based on RNA-seq profiles. 14 We ran CIBERSORT locally in R software. The abundance ratio matrix of 22 immune cells was obtained with P < 0.05. Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) algorithm is developed by Yoshihara K to calculate the immune score of sequencing samples. 15 Furthermore, the osteosarcoma samples were divided into high-or low-score groups to identify a possible association of immune score with overall survival and progression-free survival. TIMER (https://cistr ome.shiny apps.io/ timer/) is a comprehensive resource to systematically analyse immune infiltrates. 16 The prognostic signature was validated for immune correlation via TIMER.

| Clinical relationship with survival-related immune cells
The data of metastasis, relapse, necrosis stage and 5-year survival were analysed in the study using Wilcoxon rank-sum test. The definitions of clinical traits were based on the osteosarcoma clinical trials. 17 "Relapse": confirmed primary relapse cases in 5 years; to delete the unconfirmed alive cases followed less than 5 years. "Necrosis stage": histological necrosis rate < 90 as Stage 1/2; histological necrosis rate ≥ 90 as Stage 3/4. "5-year survival": cases with follow-up time more than 5 years. All the cases were summarized in Table S1.
Differences among clinical parameters were tested using independent t tests. P-values of less than 0.05 were considered statistically significant.

| Gene ontology and Kyoto encyclopaedia of genes and genomes enrichment analyses
Gene Ontology (GO) and Kyoto encyclopaedia of genes and genomes (KEGG) pathway analysis were conducted via R software "clusterProfiler" package (http://www.bioco nduct or.org/packa ges/ relea se/bioc/html/clust erPro filer.html). Counts ≥ 4 and P < .05 were set as the enrichment cut-offs to screen meaningful enrichment results. P-value was the judgement of significance of enrichment results. The enrichment results were visualized via the ggplot2 R package.

| Protein-protein interaction network construction and MCODE to identify hub genes
The STRING database (https://string-db.org/) was used to explore protein-protein interaction (PPI) network construction of 232 candidate genes, and the combined score was set to ≥ 0.4. Then, the network was reconstructed via Cytoscape software "molecular complex detection (MCODE)" plug-in. The degree cut-off was set to 2, and node score cut-off: 0.2. The most significant module was screened (score > 10, Num.nodes > 50), and the genes were identified as Hub genes.

| Survival analysis
The Kaplan-Meier analysis for overall survival (OS) and progressionfree survival (PFS) was proceeded. PFS was calculated based on the "First Event" in clinical data Table S1. The survival analysis was applied with the aid of R software, and the log-rank was utilized to test.
We identified survival-related immune cells according to the results of the Kaplan-Meier survival analysis. The analysis also applied on 13 hub genes.

| Immunohistochemistry
After deparaffinating, rehydration and heat-induced antigen retrieval with citrate solution, paraffin sections were incubated with the corresponding antibodies overnight at 4°C. The staining scores were the sum of the staining area (0: no staining or staining in <10%, 1: staining in 10%-40%, 2: staining in 40%-70%, 3: staining in ≥70%) and the staining intensity (0: no staining, 1: yellow, 2: brown, 3: maroon). The immunohistochemistry (DAB) was assessed by two independent pathologists without any previous information of the clinical characteristics and outcomes. The antibodies are listed in Table S4. Student's t test was used to analyse the data for two groups.

| Statistical analysis
All data were analysed using GraphPad Prism 8.0.2 (GraphPad software, Inc, La Jolla, CA, USA). P-value < 0.05 was considered statistically significant.

| Identification of survival-related immune cells
The infiltration ratio of 22 immune cells calculated by CIBERSOT algorithm and their correlations is shown in Figure 2A

| Relationship between ESTIMATE immune score and OS or PFS
All of 98 samples were assessed by ESTIMATE algorithm in R software. According to the immune score, we separated them into 2 groups (50 in high group, 48 in low group) in order to do survival analysis next. Both OS and PFS Kaplan-Meier curves reveal that the high immune score group is significantly associated with better survival rate (OS P-value = 0.008; PFS P-value = 0.012 Shown in Figure 2I, J).   Figure 4E). 232 genes were selected as candidate genes. As shown in the heatmap ( Figure 4F), the expression profiles of these genes were relatively consistent among these 101 samples.

| GO enrichment analysis and KEGG pathway analysis of DEGs
GO and KEGG analyses of all 2810 DEGs and 232 candidate genes were performed via R software "clusterProfiler" package. Go analysis of 2810 DEGs showed that "leukocyte chemotaxis", "T cell activation", "leukocyte migration" were most frequently enriched among biological processes, "extracellular matrix", "side of membrane", "external side of plasma membrane" among cellular components, and "receptor regulator activity", "receptor ligand activity", "cytokine activity" among molecular functions. For the KEGG analysis, "Cytokinecytokine receptor interaction", "Staphylococcus aureus infection" and "Viral protein interaction with cytokine and cytokine receptor" were most often enriched. As expected, the results of GO and KEGG analysis of 232 candidate genes were very similar to the results of 2810 DEGs. 45% results were same ( Figure 5A-D).

| Identification of hub genes
The PPI network of 232 candidate genes mapped by STRING was shown in Figure 5E. There were 162 nodes and 590 edges in this network. Then, the Cytoscape MCODE plug-in was used to reconstruct the PPI network. 7 modules were analysed, and the most significant module was screened ( Figure 5F). 13 genes in the module were identified as Hub genes for the next analyses, which were CXCR3, SSTR3, SAA1, CCL4, PYY, CCR9, CXCL9, CXCL11, C3, CXCL2, S1PR4, CXCL10 and CXCR6 (Table S2).

F I G U R E 7
The validation of correlation analysis of prognostic signature and immune cells infiltration. (A) The correlation analysis. The x-axis represents immune cell types, and y-axis represents genes. Red dots represent positive correlation, and blue dots represent negative correlation. The dot size represents P-value. The bigger the dot is, the smaller the P-value is. And shade of colour represents Pearson correlation index r. (B) The definite correlationships between genes in the prognostic signature and infiltration of CD8 + T cells, CD4 + naïve T cells, follicular helper T cells, M0 and M1 macrophages. The correlation was performed by using Pearson correlation analysis. Each dot represents a sample, and the blue line represents the relationship between the expression level of each gene and immune cell infiltration level

| Exploration of immune-related prognostic gene signature model and validation
To identify key components of the hub genes, univariate and multivariate Cox regression analyses were further applied. All the 13 genes were included in both univariate and multivariate Cox regression analyses. The univariate result was shown in Table S3

| Validation of the correlation of the prognostic signature model and immune cells infiltration
Next, the method of Pearson correlation analysis was applied to validate the relationship between the prognostic signature and the infiltrating immune cells. The results between the 7 genes and 22 infiltrating immune cells were shown in Figure 7A. As shown in Figure  Besides, TIMER was also applied to validate the infiltration of the immune cells on TCGA-SARC data set ( Figure S1). And the same methods were used in validation cohorts ( Figures S2F-G, S3F-G, S4F-G).

| CXCL9, CXCL11/CXCR3 axis validation
The IHC assays were performed on our own samples to validate the expression level of CXCL9, CXCL11/CXCR3 axis ( Figure S5).
We found that IHC staining exhibits that all the protein expression of CXCL9, CXCL11/CXCR3 axis was significantly elevated in osteosarcoma tissues compared with adjacent normal tissues (bone, marrow, soft tissue and cartilage). Meanwhile, we compared the protein expression of CXCL9, CXCL11/CXCR3 axis in paired lung metastasis samples ( Figure S6). This results showed that no significant differences between metastasis and in situ lesions.

| D ISCUSS I ON
Osteosarcoma is the most common primary malignant bone tumour with poor clinical outcome. Though the immunotherapy for cancers and sarcomas is rapidly developed, a comprehensive study on osteosarcoma immunogenome characteristics has been rarely conducted.
Plenty of clinical studies illustrate that cancer immunotherapy is potential to play a key role in future clinical cancer management. 19 This study aims to identify and explore cells and genes closely related to immune infiltration and clinical outcomes. We also provide a comprehensive, integrated analysis approach to acquire an individualized immune prognostic signature proposing to reflect immune cell infiltration and clinical survival outcomes of osteosarcoma patients.
CIBERSORT, ESTIMATE and TIMER are widely approved algorithms to assess immune infiltration and scores based on RNA-seq profiles. In this study, these algorithms are synthetically applied to process osteosarcoma sequencing data acquired from TARGET data set. Our recombined approach could help with more accurate and highly correlated gene signature predicting clinical outcome.
In this study, the immune cell infiltration landscape of osteosar- In spite of relatively little proportion of CD8 + T cell infiltrated, it obviously impacts the survival outcome. As it has confirmed in other solid tumours, higher infiltration of CD8 + T cell always predicts higher survival rate. 22,23 In recent years, some investigations proposed that immune checkpoint blockage such as CTLA-4 and PD-L1 inhibits progression of osteosarcoma by stimulating CD8 + T cell. [24][25][26] Cytotoxic lymphocyte (CTL) activation of CD8 + T cells could lead to less lung metastasis and suppression of osteosarcoma progression. Another survival-related immune cell is naïve CD4 + T cell, also known as CD45RA + T cell. It is correlated with poor outcome in our study. Naive T cells were previously thought to traffic exclusively to lymphoid organs. But it is recently documented that they can be detected in non-lymphoid organs, even in inflamed tissues and in tumours. 27 This means that in our study the survival-related immune cells impact the prognosis but not primarily by influencing metastasis and relapse. Metastasis and relapse are so complicated pathological processes in osteosarcoma that many experiments have illustrated the immune microenvironment takes part in. 33,34 We support that it will be more credible if the study samples were paired with metastasis or relapse lesions. In line with previous reviews, 17,35,36 poor necrosis stage (<90% necrosis) was potentially responsible for the poor overall survival of patients with osteosarcoma. As we know, poor necrosis stage always represents a poor histological response to chemotherapy. The Huvos grade (1-2, <90% necrosis; 3-4, >90% necrosis) was always included in prognostic factors for osteosarcoma. 37 The GO and KEGG enrichment analysis were conducted to describe the relationship among the DEGs. As we expect, the results of all the DEGs and candidate genes calculated by VENN were highly similar. It is supposed that the 232 candidate genes were dominant in the gene functional enrichment analysis. Of the KEGG results, cytokine-cytokine receptor interactions and chemokine signalling pathways were mainly involved. Interestingly, the similar result was reported in an immunogenomic analysis of papillary thyroid cancer. 38 Our results also suggest that NF-kappa B signalling pathway and IL-17/TH17 signalling pathway involved in regulating survival-related immune cell infiltration in osteosarcoma. The general consensus is that NF-kappa B signalling activation maintains tumour cell survival to promote osteosarcoma progression, invasion and metastasis. 39, 40 Trieb Klemens et al illustrated that the expression of receptor activator of nuclear factor kappa B (RANK) was immunohistochemically evaluated in biopsies of high-grade osteosarcoma, and was found to be correlated with histological response to chemotherapy and overall survival. 41 NF-kappa B provides a mechanistic link between inflammation and cancer, and is a major factor controlling the ability of both pre-neoplastic and malignant cells to resist apoptosis-based tumour surveillance mechanisms. 42  13 hub genes were identified via PPI co-expression analysis and MCODE plug-in in Cytoscape software. And we finally present a prognostic model via univariate and multivariate Cox regression analyses. 7 genes were included in the immune-related genes signature, 2 of which were related to survival independently, namely CXCR3 and CXCL11. CXCR3 is known to have a key role in T-cell trafficking to inflammatory sites and to tumours. 46,47 In our study, we found CXCR3 acts like protective factor in osteosarcoma.
Lower expression of CXCR3 was associated with poor prognosis.
We also found that CXCR3 was positively correlated with the infiltration of CD8 + T cell, follicular helper T cell and macrophage M1.  50 52 On the other hand, CXCR3-endogenous ligands have been described to activate distinct signalling pathways, leading to specific cellular responses, such as CXCL9, CXCL11/CXCR3 axis. Interestingly, CXCL9 and CXCL11 were both included in our prognostic gene signature. This axis works primarily for immune cell migration, differentiation and activation. Immune cells show anti-tumour effect against cancer cells through paracrine CXCL9, CXCL11/CXCR3 axis in tumour models. However, the autocrine signalling in cancer cells increases proliferation, angiogenesis and metastasis. 53 Thus, it is promising to develop therapeutic targets by activating the paracrine axis, and inhibiting the autocrine axis. Unfortunately, there has been little research in osteosarcoma. In our validation, the relationships between CXCL9, CXCL11, CXCR3 and infiltration of immune cells were much similar. SAA1 coding for the major SAA isoform is expressed in trabecular and cortical bone fractions and bone marrow.
Recently, SAA1 has been deemed to contribute to bone damage.
Kovacevic Alenka et al reported high SAA1 levels in plasma of osteosarcoma patients. 54 Beyond that, we failed to find more studies on osteosarcoma to investigate the other 3 genes in the signature. Even so, they still performed moderately in the validations of correlation with immune cell infiltration in both osteosarcoma and sarcoma data sets. In summary, this prognostic signature, based on differentially expressed genes in osteosarcoma, clearly reflected the infiltration of survival-related immune cells and properly predicted clinical outcome.

| CON CLUS ION
In conclusion, we combined some algorithms to assess immune infiltration of osteosarcoma. In our analysis, macrophage (M0, M1 and M2) is the majority infiltration in osteosarcoma while CD8 + T cell, CD4 + naïve T cell and follicular helper T cell are related to survival outcome. We then explored a seven immunerelated gene prognostic signature, which was relatively credible to predict clinical outcome according to assessment of immune infiltration. Of importance, the potential relationship between CXCL9, CXCL11/CXCR3 and efficiency of immunotherapy should be further investigated. Our findings have provided novel insights that could demonstrate new targets of immunotherapy in osteosarcoma.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data generated during this study are included in this published article. Publicly available data sets were analysed in this study, and these can be found in the Therapeutically Applicable Research To Generate Effective Treatments (https://ocg.cancer.gov/progr ams/ target).