Systematic screening identifies a 2‐gene signature as a high‐potential prognostic marker of undifferentiated pleomorphic sarcoma/myxofibrosarcoma

Abstract The Cancer Genome Atlas (TCGA) Research Network confirmed that undifferentiated pleomorphic sarcoma (UPS) and myxofibrosarcoma (MFS) share a high level of genomic similarities and fall into a single spectrum of tumour. However, no molecular prognostic biomarkers have been identified in UPS/MFS. In this study, by extracting data from TCGA‐Sarcoma (SARC), we explored relapse‐related genes, their prognostic value and possible mechanisms of the dysregulations. After systematic screening, ITGA10 and PPP2R2B were included to construct a 2‐gene signature. The 2‐gene signature had an AUC value of 0.83 and had an independent prognostic value in relapse‐free survival (RFS) (HR: 2.966, 95%CI: 1.995‐4.410, P < .001) and disease‐specific survival (DSS) (HR: 2.283, 95%CI: 1.358‐3.835, P = .002), as a continuous variable. Gene‐level copy number alterations (CNAs) were irrelevant to their dysregulation. Two CpG sites (cg15585341 and cg04126335) around the promoter of ITGA10 showed strong negative correlations with ITGA10 expression (Pearson's r < −0.6). Transcript preference was observed in PPP2R2B expression. The methylation of some CpG sites in two gene body regions showed at least moderate positive correlations (Pearson's r > .4) with PPP2R2B expression. Besides, the 2‐gene signature showed a moderate negative correlation with CD4+ T cell infiltration. High‐level CD4+ T cell infiltration and neutrophil infiltration were associated with significantly better RFS. Based on these findings, we infer that the 2‐gene signature might be a potential prognostic marker in patients with UPS/MFS. Considering the potential benefits of immunotherapy for UPS/MFS patients, it is imperative to explore the predictive value of this signature in immunotherapeutic responses in the future.


| INTRODUCTION
Soft tissue sarcomas (STS) are a heterogeneous group of rare tumours of mesenchymal origin. 1 Undifferentiated pleomorphic sarcoma (UPS) and myxofibrosarcoma (MFS) are the most common histological subtypes of STS and are highly similar to each other. 2 Both affect the elderly population, and their exact origin remains controversial and unproven. 2 Historically, MFS was considered a subset of UPS but was reclassified as a distinct entity in the 2002 WHO classification because of its clinicopathological characteristics. One recent study by The Cancer Genome Atlas (TCGA) explored their genetic/epigenetic profiles and indicated that they are not distinct tumours but rather belong to a single spectrum of tumour, owing to high similarities in somatic copy number alterations (SCNAs), methylation, miRNA expression and protein expression. 3 Therefore, common systematic therapeutic strategies might be appropriate.
Currently, surgical resection with radiotherapy is still the most effective strategy for patients with non-metastatic tumours. 2,4 However, UPS is usually associated with deep-seated lesions that grow aggressively, while MFS is usually characterized by infiltrative growth. 4,5 Although the five-year overall survival (OS) rate is around 70% in UPS/MFS, 6,7 these tumours carry a high risk of local recurrence and subsequent poor prognosis. 4,5 Currently, prognosis prediction largely relies on clinicopathological features, such as age, tumour size, margin status and the presence of infiltrative growth. 8-10 However, no molecular prognostic biomarkers have been identified in UPS/MFS; it is therefore necessary to analyse their molecular features and to explore reliable prognostic biomarkers.
In this study, by extracting the genomic and survival data of UPS/MFS from TCGA-Sarcoma (SARC), we explored relapse-related genes, their prognostic value and possible mechanisms of the dysregulations.

| Retrospective analysis using data from TCGA
Genomic data in TCGA-SARC, including RNA-seq (IlluminaHiSeq) data of gene expression, gene-level copy number alterations (CNAs), somatic mutations and gene-level DNA methylation data, were obtained using the UCSC Xena browser (https://xenabrowser.net/). 11 The subset of data from patients with UPS/MFS (N = 61) was then extracted. The updated clinicopathological and survival data were obtained from a previous publication of the TCGA Research Network. 3 RNA-seq data were normalized and represented as log2(norm_count + 1), in which norm_count refers to the RSEM value. Gene-level CNAs were calculated by the method of Genomic Identification of Significant Targets in Cancer 2.0 (GISTIC2), 12 in which the alterations were defined as homozygous deletion (−2), heterozygous loss (−1), copy neutral (0), low-level copy gain (+1) and high-level amplification (+2). Somatic mutations detected include single-nucleotide polymorphisms (SNPs) and small insertions and deletions

| Comparison of gene transcripts in sarcoma and normal human tissues
The alternative transcripts of the target genes in sarcoma tissues and in normal tissues with known positive gene expression were examined using the transcript analysis in the UCSC Xena browser (https://xenabrowser.net/). 11 Transcript data in normal tissues were obtained from The Genotype-Tissue Expression (GTEx) project, which contains tissue-specific transcriptional data based on a large number of samples. 14,15 Log2 transcripts per million (TPM) values were calculated and compared.

| Construction of potential prognostic gene signatures
To construct a prognostic gene signature, only the genes with independent prognostic value were included for model construction.
The weight coefficients of the genes were derived from the regression coefficients in multivariate survival analysis, with RFS set as the dependent variable. Therefore, the risk score is calculated by the following formula: risk score = B1 × Exp1 + B2 × Exp2 + ... + Bn × Expn, where Bi refers to the regression coefficient of gene i in multivariate survival analysis and Expi to the expression value of that gene.
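The weighted sum described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `risk_score` and the example expression values are hypothetical, while the two coefficients are the ones reported for the 2-gene signature in the Results.

```python
def risk_score(expression, coefficients):
    """Weighted sum of gene expression values.

    expression   -- dict: gene symbol -> expression, e.g. log2(norm_count + 1)
    coefficients -- dict: gene symbol -> Cox regression coefficient (B)
    """
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"expression missing for: {missing}")
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Coefficients reported for the 2-gene signature in the Results section.
SIGNATURE = {"ITGA10": 0.254, "PPP2R2B": -0.317}

# Hypothetical expression values for one patient (illustration only).
patient = {"ITGA10": 10.0, "PPP2R2B": 5.0}
score = risk_score(patient, SIGNATURE)
```

A positive coefficient raises the score as expression increases, whereas a negative coefficient lowers it, which is why the two genes pull the score in opposite directions.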

| Estimation of immune cell infiltration
The tumour immune cell infiltration in UPS/MFS was estimated using data provided by the Tumor Immune Estimation Resource (TIMER; cistrome.shinyapps.io/timer), which is a web-based platform that provides the abundance of six tumour-infiltrating immune subsets (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages and dendritic cells) based on data from TCGA. 16

| Analysis of prognosis-related genes across different cancer types
The potential prognostic significance of the signature genes in other cancers was explored with the use of GEPIA2 (http://gepia2.cancer-pku.cn/#index), which is a web server for large-scale expression profiling and interactive analysis. 17 Prognostic value in terms of RFS and OS was assessed with the log-rank test, using median gene expression as the cutoff. Cancer types with P < .05 were highlighted. Kaplan-Meier (K-M) curves of RFS/DSS were generated for prognostic comparisons, with the log-rank test used to detect differences.
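For readers unfamiliar with the log-rank test used in these comparisons, a minimal two-group sketch in plain Python follows. It is not the GEPIA2 implementation; the function name and the toy data are hypothetical, and the P-value relies on the fact that a chi-square statistic with one degree of freedom has survival function erfc(sqrt(x/2)).

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value).

    times_*  -- follow-up times
    events_* -- 1 if the event (e.g. relapse) occurred, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 degree of freedom
    return chi2, p_value


# Toy data (hypothetical): group A relapses early, group B late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
```

In practice, an established survival analysis package would normally be preferred over a hand-written version; the sketch only makes the at-risk bookkeeping explicit.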

| Statistical analysis
Univariate and multivariate Cox regression (method, forward: LR) models were used to evaluate the independent prognostic value of the 2-gene signature as a continuous variable, regarding RFS and DSS, respectively. Pearson's correlation coefficients were calculated to estimate correlation. P < .05 was considered statistically significant.
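As an illustration of the correlation step, Pearson's r can be computed directly from its definition. The sketch below is a generic implementation with a hypothetical function name and made-up values, not the software used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical methylation beta values and expression values for five samples.
methylation = [0.10, 0.25, 0.40, 0.60, 0.80]
expression = [9.1, 8.4, 7.9, 6.5, 5.8]
r = pearson_r(methylation, expression)  # negative: higher methylation, lower expression
```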

| Systematic screening to identify relapse-related genes in UPS/MFS
RNA-seq data of 20 500 genes in TCGA-SARC were downloaded from the UCSC Xena Browser. To obtain the list of dysregulated genes between cases with relapse (N = 28) and without relapse (N = 33), the following criteria were applied: absolute log2 fold change ≥ 2 and Welch's t test P-value < .05 (Figure 1A). After this screening process, 20 dysregulated genes were identified (Figure 1B). Then, we preliminarily analysed the prognostic value of these genes in terms of relapse by generating ROC curves. The AUC values of these genes are given in Table S1. Only the genes with an AUC value ≥ 0.70 or ≤ 0.30 were considered candidate biomarkers (N = 7) (Figure 1C). The K-M curves of RFS were generated for each of the 7 genes by setting its median expression as the cutoff. The log-rank test showed that the group with high CLEC3B (3p21.31) or PPP2R2B (5q32) expression had significantly longer RFS (Figure 1D
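The screening criteria of this section (fold change, Welch's t test and AUC thresholds) can be illustrated with a short Python sketch. All function names are hypothetical. It assumes that expression values are already on the log2(norm_count + 1) scale, so the log2 fold change is taken as the difference of group means, which may differ from the authors' exact calculation. Only the Welch statistic and its degrees of freedom are computed here; the P-value would come from a t distribution in a statistics package.

```python
import math
from statistics import mean, variance


def log2_fold_change(relapse, no_relapse):
    """Difference of group means; inputs are assumed to be log2-scale values."""
    return mean(relapse) - mean(no_relapse)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


def auc(relapse, no_relapse):
    """ROC AUC: probability that a relapse case has the higher value (ties count 0.5)."""
    wins = 0.0
    for r in relapse:
        for n in no_relapse:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(relapse) * len(no_relapse))


def is_candidate(auc_value, upper=0.70, lower=0.30):
    """AUC filter from the text: keep genes with AUC >= 0.70 or <= 0.30."""
    return auc_value >= upper or auc_value <= lower
```

An AUC at or below 0.30 is kept because it marks a gene whose low expression, rather than high expression, is associated with relapse.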

| Construction of prognostic gene signature for UPS/MFS
Then, the independent prognostic value of these genes was as-  (Table 1). Therefore, only these two genes were included to construct a risk score model (2-gene signature), with the following formula: risk score = 0.254 × ITGA10 − 0.317 × PPP2R2B. ROC curve analysis showed that the 2-gene signature had an AUC value of 0.83 when predicting relapse, which was higher than that of ITGA10 or PPP2R2B alone (Figure 2A).
Patients were separated into two groups by setting the median value of the risk score as the cut-off. K-M analysis indicated a significant difference in RFS between the high-score and low-score groups (log-rank test P = .0016, Figure 2B). Subgroup analysis con-  Table 2). We then checked the potential prognostic value of the 2-gene signature in terms of DSS. The ROC curve showed that the 2-gene signature had an AUC of 0.71 (Figure 2C). The K-M survival curve failed to identify a significant difference between the two groups by median separation (Figure 2D), but confirmed the difference between the first tertile and the third tertile (Figure 2E).
By performing univariate and multivariate analysis, we found that the 2-gene signature showed an independent prognostic value in DSS as a continuous variable (HR: 2.283, 95%CI: 1.358-3.835, P = .002) (Table 3).
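The two grouping schemes used in this section, median separation and tertiles, can be sketched as follows. The function names and the example scores are hypothetical, and assigning scores equal to the median to the low-score group is an assumption, since the text does not say how ties were handled.

```python
import statistics


def median_split(scores):
    """Label each score 'high' or 'low' relative to the median risk score.

    Assumption: scores exactly equal to the median go to the 'low' group.
    """
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def tertile_groups(scores):
    """Assign each score to tertile 1 (lowest), 2 or 3 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * 3 // n + 1
    return labels


# Hypothetical risk scores for six patients.
scores = [0.5, -1.0, 2.0, 0.0, 1.0, -0.5]
groups = median_split(scores)
tertiles = tertile_groups(scores)
```

Comparing only the first and third tertiles discards the middle third of patients, which sharpens the contrast between groups at the cost of sample size.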

| Analysis of potential mechanisms leading to dysregulated ITGA10 and PPP2R2B
Gene-level CNA, somatic mutation and methylation data were extracted for dysregulation-related analysis. We first checked ITGA10 CNA status and the corresponding gene expression. Among the 61 UPS/MFS cases, there were 36 amplification, 20 copy neutral and five deletion cases (Figure 3A). However, no significant difference in ITGA10 expression was observed among the three groups (Figure 3B). There
PPP2R2B is a large gene of around 500 kb that contains at least 20 exons. Transcript expression analysis suggests that this gene encodes quite complex alternative transcripts in sarcoma tissues (Figure S3).
Previous studies showed that the complex alternative transcription is at least partly driven by the promoter region flanking exon 7 18,19 (now exon 10 in UCSC Xena browser). This exon has multiple internal splicing sites, thereby generating several splice variants. Only the transcripts spanning exon 10 to exon 21 were expressed in sarcoma. The two dominant transcripts (ENST00000394411.8 and ENST00000394409.7, Figure S3, red dotted frame) contain only 9 coding exons. PPP2R2B has well-characterized expression in brain tissues. 19,20 Therefore, we compared its expression profile in sarcoma with that in normal brain tissues. In normal brain tissues, not only the transcripts starting from exon 10 but also the transcripts starting from exon 2/3 were expressed (Figure S3). These findings suggest that PPP2R2B has a tissue-specific transcriptional pattern.
By checking the methylation status of the CpG sites in the PPP2R2B locus, we found the methylation of some CpG sites in two gene body

| The 2-gene signature was associated with immune cell infiltration in UPS/MFS
The correlations between the 2-gene signature and the infiltration of  (Table S2). Then, we conducted an ROC analysis to compare the prognostic value of immune cell infiltrations and the 2-gene signature.
Results showed that the 2-gene signature had the highest AUC value, suggesting a better prognostic value (Figure S4).
integrin-α10 in human genome. Integrin-α10 has a tumour-specific physical association with TRIO and RICTOR, the expression of which promotes sarcoma cell survival via activating the RAC/PAK and AKT/ rapamycin (mTOR) complex 1 (mTORC1) signalling 21 (Figure 6A).

| DISCUSSION
Integrin-α10 is an integral transmembrane glycoprotein that participates in cell adhesion as well as cell surface-mediated signalling. Its dysregulation and oncogenic effects were observed in lung cancer, 23 prostate cancer 24 and MFS. 21 PP2A is a serine/threonine (Ser/Thr) protein phosphatase, which has a heterotrimeric structure formed    33,36,37 In addition, it might contribute to the formation of borders at enhancers or promoters, thereby enhancing the transcription of specific transcripts. 38-40 Among STS subtypes, UPS/MFS has the highest median macrophage infiltration and the infiltration of immature dendritic cells was positively correlated with DSS. 3 These findings suggest that UPS/MFS may have immunologic mutated protein targets and thus respond to immune checkpoint therapy. 3,41 In the SARC028 trial of pembrolizumab (a PD-1 inhibitor), four out of 10 UPS patients had responses to the drug. 42 These findings suggest that patients with UPS/MFS might benefit from immunotherapy. In this study, we found that the 2-gene signature had a moderate negative correlation with CD4+ T cell infiltration. We also confirmed that a high level of CD4+ T cell infiltration was associated with significantly better RFS.
Therefore, we infer that the 2-gene signature might also predict the infiltration of CD4+ T cells. However, the predictive value of this signature in immunotherapy responses should be studied in the future.
A series of previous studies reported that activation of AKT/mTOR signalling is associated with the pathology of UPS/MFS, such as higher histologic grade and progression. 21
This study also has some limitations. Firstly, no validation cohort was used to verify the prognostic value of the 2-gene signature. Besides, the exact mechanisms involved in ITGA10/PPP2R2B dysregulation were not explored by molecular studies. Future studies are required to address these issues.

| CONCLUSIONS
The systematic screening in this study found that a 2-gene signature based on ITGA10 and PPP2R2B expression had potentially good prognostic value for RFS and DSS in patients with UPS/MFS. DNA methylation might be a critical mechanism leading to their dysregulation, but it might regulate the two genes in opposite directions.
Furthermore, this signature has potential predictive value in estimating CD4+ T cell infiltration. Both ITGA10 and PPP2R2B are involved in the mTOR signalling pathway but show opposite regulatory effects. Considering the potential benefits of immunotherapy for UPS/MFS patients, it is imperative to explore the predictive value of this signature in immunotherapeutic responses in the future.

ACKNOWLEDGEMENT
This study was supported by grants from the National Natural

CONFLICT OF INTEREST
The authors confirm that there are no conflicts of interest.