Bioinformatics analysis of the prognosis and biological significance of VCAN in gastric cancer

Abstract Gastric cancer (GC) is one of the most common cancers in the world, and the tumor microenvironment (TME) plays a vital role in the occurrence and development of GC. In this study, we obtained differential expressed genes in GC tissues from The Cancer Genome Atlas. After ESTIMATE and weighted correlation network analysis, we obtained differentially expressed genes (DEGs). With further screening DEGs of immune infiltration and then through Kaplan‐Meier survival analysis, least absolute shrinkage and selection operator regression analysis and COX analysis, we found that VCAN was a gene positively correlated with high immune infiltration and poor prognosis of patients in GC. In addition, we selected a Gene Expression Omnibus dataset (GSE84433) to verify the effect of VCAN on the patient's prognosis, and analyzed the value of VCAN in immunotherapy through TIMER database and TISIDB. In conclusion, we hold the view that VCAN may affect the development of GC by regulating the TME, which may act as a potential therapeutic target for GC.

A counting number of studies have shown that the tumor microenvironment (TME) has an important impact on tumor development. 8 The microenvironment of tumor tissue includes tumor cells, tumor-related normal epithelial cells, stromal cells, immune cells, and vascular cells. 9 Infiltrating stromal cells and immune cells are the main components of normal cells in microenvironment of tumor tissue, which not only interfere with tumor signals in molecular research, but also play an important role in tumor biology. 9 Besides, tumor-infiltrating immune cells in the TME are regarded as effective biomarkers of tumor therapy. 10 In GC, immune cell infiltration is identified as an independent prognostic factor and helps to predict the overall survival rate of GC patients after chemotherapy. 11 For example, increased CD8 infiltration is related to progression-free survival and impaired overall survival. GC patients with more CD8 T cell also have higher PD-L1 expression. 12 ESTI-MATE is a tool to predict tumor purity, stromal cells and immune cells infiltrated in tumor tissues through the gene expression data, and it generates stromal score, immune score, and ESTIMATEScore based on a singlesample gene set enrichment analysis (ssGSEA). 9 Therefore, we supposed to screen the genes relate to TME by ESTIMATE.
In this study, we obtained differentially expressed genes (DEGs) in GC tissue of The Cancer Genome Atlas (TCGA). For further screening the differential genes related to the tumor microenvironment in GC, we used weighted correlation network analysis (WGCNA) to screen out 43 differential genes according to ESTIMATE score. It is also because immune infiltration can predict the effect of treatment in the tumor microenvironment. 10 We tested the relationship between 43 gene expression and immune cell infiltration. It was found that the expression levels of 22 genes had statistically significant difference between the high and low immune infiltration groups. Subsequently, through Kaplan-Meier survival analysis, least absolute shrinkage and selection operator (LASSO) regression analysis and COX analysis, we found that VCAN may be not only an immune-differential gene, but also an independent risk factor for poor prognosis in GC patients. With analyzing the data from Gene Expression Omnibus (GEO) dataset (GSE84433), TIMER database and TISIDB, we confirmed that VCAN is associated with immune infiltration and prognosis of GC patients. Therefore, we believed that VCAN had the potential to role as a biomarker for predicting the prognosis of GC, and VCAN may affect the TME of GC by regulating the immune infiltration of GC cells.

| Public database collection
We obtained the high-throughput sequencing data of GC from TCGA-STAD. 13 As |logFC| > 1.5, p < .01 were defined as standard of DEGs, we acquired DEGs through limma R package. 14 Besides, GEO data set (GSE84433) were selected to verified the clinical value of VCAN. Furthermore, the immune public database, TIMER database (https://cistrome.shinyapps.io/timer) and TISIDB (http://cis.hku.hk/TISID), were used for further investigating the sense of VCAN in the immune infiltration of GC and the corresponding immunotherapy. KM-plotter database (http://www.kmplot.com) was an online database to plot Kaplan-Meier survival curve from the clinical data of TCGA and GEO.

| Single sample gene set enrichment analyses
ssGSEA was an extension of gene set enrichment analyses (GSEA) method, which was mainly designed for the single sample that could not perform GSEA. 15 For further investigating the immune infiltration in GC, we performed ssGSEA analysis to the expression data from TCGA-STAD and GEO according to 29 immune genesets 16 through GSVA R package. Then, according to the result of ssGSEA analysis, we divided samples into high/low immune infiltration groups by clustering.

| CIBERSORT
CIBERSORT 17 was a method to calculate the proportion of 22 immune cells in each sample based on the gene expression data.

| Weighted correlation network analysis
WGCNA was used for the construction of co-expressed network, which divided genes with similar functions into the same module. 19 As 1997 differentially expressed genes were obtained from TCGA-STAD, WGCNA analysis were used to acquire module genes by analyzing the score from ESTIMATE.

| Kyoto Encyclopedia of Genes and Genomes and Gene Ontology enrichment analysis
Clusterprofiler R package 20 was applied for Gene Ontology (GO) enrichment analysis, which included molecular function and biological process as well as cellular component of DEGs, 21 and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. p < .05 was defined as significant.

| Protein-protein interaction
For further analyzing the relationship among proteins, STRING v11 database 22 was applied to construct protein-protein interaction (PPI) network.
2.8 | COX univariate analysis and least absolute shrinkage and selection operator regression analysis COX univariate analysis was used to determine the risk factors for GC patient prognosis. Besides, LASSO regression analysis, an estimation for processing data with complex collinearity, was performed to screen the prognosis-related genes. And COX univariate analysis and LASSO regression analysis was performed by R Language.

| Survival analysis
Overall survival in GC patients were investigated by Kaplan-Meier analysis, which was conduct by survival R package.
2.10 | Wound healing assay GC cell AGS was plated in a six-well plate with same number. When the cell confluence in the culture dish reached 100%, draw straight lines in the petri dish, then placed it in an incubator at 37°C for 24 h with 5% carbon dioxide in humid air and photograph in the third day at the marked position.

| Transwell
GC cell AGS was digested with microprotease, 2 × 10 5 cells were implanted in each transwell test tube, and 200ul rppi-1640 containing 10% fetal bovine serum was added. The test tube was then placed in a polycarbonate gel 24-well plate (BD Biosciences), and rotated at 37°C after 24 h, and the invasion was photographed and observed.

| Western blot analysis
The tissues and cells were decomposed with radioimmunoprecipitation assay buffer (Beyotime), separated by 10% sodium dodecyl sulfate polyacrylamide gel electrophoresis, and then transferred to a polyvinyl fluoride membrane (EMD Millipore). Transition to primary antibody and primary antibody reducing solution (Thermo Fisher Scientific). After washing with Tris-buffered saline with Tween 20 (TBST), the membrane and primary antibody were separated at room temperature for 1 h, and then washed with TBST three times.

| Statistical analyses
The significance Statistical difference was determined by t test, which was performed by R. As p < .05, the difference was considered as statistically significant.

| ESTIMATEScore in TCGA-STAD and WGCNA analysis of DEGs
In this study, we acquired the expression data of STAD from TCGA database. With the filtering criteria (|logFC| > 1.5, p < .01), we obtained 1997 DEGs including 875 upregulated genes and 1127 downregulated genes ( Figure 1A,B). To investigate the relationship between GC and tumor microenvironment, we performed WGCNA analysis to 1997 DEGs ( Figure 1C) and separately acquired stromal score, immune score, ESTIMA-TEScore ( Figure 1D). It was obviously that modules in color turquoise, brown and yellow were significantly correlated with ESTIMATEScore. As ESTIMATEScore was the sum of ImmuneScore and StromalScore (26), we found that hub genes in these 3 modules had significant correlation with immune cells and Stromal cells infiltration in STAD. After screening (Gene significance for ESTIMATEScore > 0.5, module membership >0.8), 43 hub genes were revealed in 3 modules ( Figure 1E-G). Then, after correlation analysis and cluster were employed to investigate 43 hub genes, we found that genes in the module with same color had high correlation  Figure S1A), which demonstrated that results of our WGCNA analysis were significant. In addition, we performed enrichment analysis to these genes via KEGG and GO, which discovered that these genes had function in extracellular and Osteoclast differentiation (Figure S1B-F). Furthermore, STRING database was used to establish PPI network of these hub genes ( Figure S2).

| VCAN is associated with immune infiltration and prognosis of GC patients
As immune infiltration can predict the effect of treatment in the tumor microenvironment, 10 we further verified the relationship between ESTIMATEscore and immune infiltration. According to several subsets of immune genes, we performed ssGSEA and divided 375 tumor samples in STAD into 2 groups (high immune infiltration vs. low immune infiltration) by cluster analysis. Furthermore, we verified our different immune infiltration groups via the ESTIMATEscore. As shown in Figure 2A-C, high immune infiltration group had significant higher ESTIMATEscore when comparing with low immune infiltration group, which indicated that ESTIMATEscore could significantly evaluate immune infiltration. To further screen genes regulated immune infiltration, we compared the expression of 43 hub genes from 3 modules between high/low immune infiltration groups, and 22 hub genes had significant different expression ( Figure 2D-F). Besides, we performed Kaplan-Meier survival analysis to the 22 hub genes via KMplotter, whose result showed that 10 hub genes were significant in survival analysis according to the survival data of TCGA-STAD ( Figure S3A-J). According to the overall survival outcome of 10 genes, LASSO regression analysis was used to filter genes and 2 genes (VCAN, CHRDL1) were remained ( Figure 2G,H). Furthermore, COX analysis was performed for further verification, which demonstrated that p value of VCAN was significant ( Figure 2I). Therefore, we prefer VCAN and consider that VCAN is associated with not only the immune infiltration but also prognosis of GC patients.

| VCAN act as a tumor promoter in gastric cancer
To investigated the clinical value of VCAN, we obtained genetic data of GC expression in TCGA-STAD and found that VCAN was highly expressed in GC tissues relative to adjacent tissues ( Figure 3A). As high VCAN could lead to poor overall survival prognosis in STAD, we further analyzed it in the GSE84433 and found that VCAN act as a tumor promoter in GC ( Figure 3B). For further research in clinical, we selected other prognosis features from GSE84433 and found that T stages was significantly correlated with VCAN expression (Table 1, Figure 3C,D). For further convince, we chose data from TCGA-STAD to investigate the impact of VCAN to other clinical features (Table 2), which demonstrated that T stages and TNM stages ( Figure 3E,F) were significantly correlated with VCAN expression. As COX analysis performed, Stage IV and VCAN both had significant p value ( Figure 3G), which indicated that both Stage IV and VCAN expression were significant tumor promoter in GC.

| Relationship between VCAN and immune infiltration
To verify the correlation between VCAN and immune infiltration, we selected another GEO dataset (GSE84433) for further investigation. Besides, after we performed ssGSEA analysis and evaluated ESTIMATEScore, the relationship between immune infiltration and ESTIMA-TEscore was verified ( Figure S4A-C). It was obvious that ESTIMATEScore had significant different expression between high/low immune infiltration groups. Therefore, we further estimated expression of VCAN in different groups of GSE84433 and revealed that VCAN had high expression in high immune infiltration group, which indicated that VCAN expression was positively correlated with immune infiltration level ( Figure S4C. Then we transformed gene expression data of TCGA-STAD into the matrix of immune cells in GC samples via the CIBERSORT, which offered us the proportion of immune cells in each sample ( Figure 4A). Furthermore, as samples in TCGA-STAD were divided into high or low immune infiltration groups, we estimated the different expression of each immune cell and found that 7 immune cells (T cells CD8, T cells CD4 memory activated, macrophages M0, macrophages M1, Mast cells resting, Mast cells activated, Neutrophils) had significant different expression between two groups ( Figure 4B). As VCAN had high expression in high immune infiltration group, we verified the relationship between VCAN expression and immune infiltration level in TIMER database and found that only three immune cells (T cell CD8+, macrophage M1, mast cell activated) had corresponding trend ( Figure 4C-E). Besides, we also focused on the mutation of VCAN through TIMER database,  Figure 4F). Afterwards, we expanded our research by using TISIDB to investigate the correlation between VCAN expression and immune cells, which demonstrated that macrophage M1 (R = 0.548, p < 2.2 × 10 −16 ) and mast (R = 0.548, p < 2.2 × 10 −16 ) were significantly correlated with VCAN expression (Figure 4G-H). According to the different analysis tools (TIMER, TISIDB), we held the view that VCAN was

| The value of VCAN in immune therapy
As VCAN was a significant immune infiltration biomarker, we were curious of the function of VCAN in immune therapy. Therefore, we performed further data mining via TISIDB. As GC was divided into five immune subtypes in TISIDB, we found that VCAN had highest expression in TGF-B subtype, which indicated that VCAN may role as an immune factor in this GC subtype ( Figure 5A). Then, for investigating its function in immune molecular therapy, we investigated the correlation between VCAN and each immune regulator, which included immunostimulator, immunoinhibitor and MHC molecular ( Figure S5-7). After screening (|R| > 0.5), PDCD1LG2 as well as ENTPD1 were regarded as molecular with strong correlation ( Figure 5B,C). Besides, to verify the results from TISIDB, we also investigated their correlation from GEPIA and TIMER for confirm, which both demonstrated that VCAN had strong significant correlation with these two genes (PDCD1LG2, ENTPD1) ( Figure 5D-G). Furthermore, we also revealed that VCAN role as a target site of hyaluronic acid from TISIDB, whose data was originated from Drugbank database. As hyaluronic acid was reported to participate in immune therapy in several articles, a new direction of investigating the impact from VCAN to immune infiltration in GC was provided.

| Effect of VCAN on function of GC cells and CD44
To verify the reliability of the data analysis, we knocked down VCAN in AGS of GC cells, and found that the migration ability of AGS of GC cells was weakened ( Figure 6A). In addition, through Transwell invasion experiment, we found that the invasion ability of AGS of GC cells was decreased after VCAN was knocked down ( Figure 6B), indicating that VCAN can promote the invasion and migration of GC cell. In addition, we found by Western blot that the expression of immune-related protein CD44 was decreased after VCAN was knocked down ( Figure 6C), so we believed that VCAN was immune-related in GC. In this study, we tried to find genes affecting GC TME, especially immune infiltration, from TCGA, which could also act as biomarkers for the diagnosis, treatment and regulating prognosis of GC. Therefore, we found that VCAN was not only related with tumor immune infiltration, but also an independent factor for the prognosis of GC patients. The TME referred to the internal and external environment where tumors occurred, grew, metastasized and the location of tumor cells. It had influence to the development of cancer through not only tumor cells, but also nontumor cells and noncellular components. 23 In tumor tissues, nontumor cells mainly infiltrated stromal cells and immune cells. 9 Therefore, by evaluating the status of stromal cells as well as immune cells in tumor tissues, ESTIMATE could indirectly transform the TME. ESTIMATE used gene expression data to generate stromal score, immune score and ESTIMATEScore based on ssGSEA analysis. 9 Besides, ESTIMATEScore represented the comprehensive ratio of ImmuneScore and Stromal-Score in the TME. Therefore, to find genes related to TME, we acquired genes of different functional modules through WGCNA, and screened 43 genes related to ES-TIMATEScore. As infiltrating immune cells played a role in predicting the prognosis of tumor patients in the TME, we further investigated the relationship between the expression of 43 genes and the immune infiltration, which demonstrated that the expression of 22 genes was significantly different in high and low immune infiltration groups. Furthermore, through Kaplan-Meier survival analysis, we further revealed that the expression of 10 genes is statistically significant to the GC patient's survival. Through LASSO regression analysis and COX analysis, we found that VCAN was an independent risk factor for the prognosis of GC patients. To ensure the sense of the results, we verified it through GEO dataset (GSE84433), TIMER database and TISIDB, and discussed the value of VCAN in immunotherapy. Meanwhile, for exploring the useage of VCAN in clinical therapy, we revealed that VCAN a binding target of hyaluronic acid, which was demonstrated in the DRUGBANK database. 24 With a simlar structure to hyaluronic acid-binding proteins in the amino-terminal domain, the protein of VCAN (versican), with a B-loop domain, was regarded to performed a binding function towards hyaluronic acid in human. [25][26][27] Besides, as different functions varied from multiple sizes of hyaluronic acid, we further investigated it to ensure the size of hyaluronic acid targeted by versican. Then, we discovered that reduced amount of high molecular weight hyaluronic acid (high MW HA: 1.5MDa) could lead to low expressed vesican (low MW HA: 40 kDa) in protein level, while change was not found in the fewer expression of low MW HA. 28 Therefore, we held the view that versican, as a protein production of gene VCAN, could target hyaluronic acid with high molecular weight via its structure that resembled hyaluronic acid-binding proteins, which need further investigation.
VCAN had been found to affect the occurrence and development of a variety of cancers, 6,29,30 but its research in GC tissue was still unclear. Studies had shown that the high expression of VCAN in GC tissue was an independent risk factor for poor prognosis of patients, which was extremely common in more aggressive tissues. 7,31 Based on the analysis of the differentially expressed GC genes from TCGA, our study tried to origin from the TME and immune infiltration, which was further combined with the clinical survival prognosis analysis, and found that VCAN affected the prognosis of GC patients which was related to the TME and immune infiltration. Diagnosis and treatment of GC would be offered new guidelines. However, through this bioinformatics analysis only reveal the role of VCAN in immune infiltration and TME of GC. We will perform experiments in vivo and in vitro for further exploring the mechanism of VCAN affecting on GC.

| CONCLUSION
VCAN is highly expressed in GC tissues, whose upregulated expression is correlated with the poor prognosis of GC. In addition, the expression of VCAN is high in the group with high immune infiltration in GC tissue. We believe that VCAN is a new immune-induced prognostic indicator in GC by regulating the infiltration of several immune cells in GC.