A MUCINs expression signature impacts overall survival in patients with clear cell renal cell carcinoma

Abstract Background Kidney cancer, especially clear cell renal cell carcinoma (ccRCC), is one of the most common cancers in the urinary system. Previous studies suggested that certain members of MUCINs could serve as independent predictors for the survival of ccRCC patients. None of them, however, is robust enough to predict prognosis accurately. Objective To analyze the correlation of MUCINs alterations and their expression levels with the prognosis of ccRCC patients and develop a prognosis‐related predictor. Methods We applied whole‐exome sequencing in samples from 22 Chinese ccRCC patients to identify genetic alterations in MUCIN genes and analyzed their genetic alterations, expression, and correlation with survival using the TCGA, GSE73731, and GSE29069 datasets. Result Genetic alternations in MUCINs were identified in 91% and 51% of ccRCC patients in our cohort and the TCGA database, respectively. No correlation with survival was found for the genetic alterations. Using unsupervised clustering analysis of gene expression, we identified two major clusters of MUCIN expression patterns. Cluster 1 was characterized by a global overexpression of MUC1, MUC12, MUC13, MUC16, and OVGP1; and cluster 2 was characterized by a global overexpression of MUC4, MUC5B, MUC6, MUC20, EMCN, and MCAM. Patients with cluster 1 expression pattern had significantly shorter overall survival time and worse clinical features, including higher tumor grades and metastasis. Meanwhile, they had a higher level of mutation counts and more infiltrated immune cells, but lower enrichment in angiogenesis signature genes. A five‐MUCINs expression signature was constructed from cluster 1, and notably, it was demonstrated to be associated with shorter overall survival. A similar worse clinical feature, lower angiogenesis but the more immune signature, was identified in samples presented with signature 1. In the validation data set GSE29069, patients with signature 1 were also associated with a trend of poor survival outcomes. Conclusion We established a five‐MUCINs expression signature as a new prognostic marker for ccRCC. The distinct tumor microenvironment feature between the two signatures may further affect ccRCC patients’ clinical management.


| INTRODUCTION
Renal carcinoma is a malignant neoplasm originating from the urinary tubular epithelial of the renal parenchyma. 1 Worldwide, the incidence and death of renal carcinoma are continuously rising, with a predicted incidence of 403,262 cases and 175,098 deaths in 2018. 2 In China, approximately 68,300 new cases were diagnosed and 25,600 deaths attributed to renal carcinoma 3 The five-year survival rate for patients with metastatic ccRCC is less than 10%. 4 Even for those with localized tumors, the risk for recurrence or metastatic following complete resection is as high as 40%. 5 Tumor grade and histological features were found to be correlated with prognosis in ccRCC patients. 6 Nevertheless, intratumor and intertumoral heterogeneity of the same patient is conspicuous, which complicates the prediction accuracy by pathology. 7,8 Molecularly classification of ccRCC includes the analysis of single-nucleotide polymorphism, somatic mutations, DNA methylation, and gene expression. 9,10 Integrative analysis of genetic mutation and clinical information demonstrated that several genetic changes, such as BAP1 mutation, were associated with poor clinical outcomes in ccRCC patients as independent predictors. 11 However, these prognostic markers are not robust enough for clinical practice.
MUCINs are epithelial cells-expressed large Oglycoproteins, and their overexpression and glycosylation in malignancies are found to facilitate oncogenic processes. 12 Previous studies have found that the expression levels of MUCINs, including MUC1, MUC4, MUC12, MUC13, MCAM, and LAMA4 were correlated with poor survival of ccRCC patients, and/or promotion of tumor cell proliferation in vitro. [13][14][15][16][17][18] MUCINs could be classified into secreted, membrane-bounded, and atypical types according to their structure and localization; however, the specific biological functions of different types in the prognosis of ccRCC are undefined. Previous studies mainly focused on the individual member of MUCIN's correlation with prognosis in ccRCC and a comprehensive evaluation of the MUCIN family members in ccRCC is still lacking.
In this study, we analyzed the correlation of patients' survival with genetic alteration and mRNA expression profile of MUCINs in ccRCC patients. Subsequently, we developed and validated a combined MUCINs expression-based signature for predicting the prognosis of ccRCC, which is worth being validated prospectively and utilized in further clinical practice.

| Data resource and study design
Whole-exome sequencing was performed for tumor tissues from 22 ccRCC patients from Qilu hospital and genomic alternation profiles were identified in 20 MUCINs genes.
Meanwhile, publicly available MUCINs genomic alternation data and corresponding clinical feature information (including overall survival, neoplasm disease grade, tumor stage, and metastasis stage) of ccRCC were downloaded from cBioPortal for Cancer Genomics database (https:// www.cbiop ortal.org/), which contained 538 ccRCC samples. Transcriptome data for 100 cases of normal kidney tissue were downloaded from Genome Tissue Expression (GTEX).
Then, data sets GSE73731 and GSE29069 were downloaded from Gene Expression Omnibus (GEO) database (http://www.ncbi.nml.nih.gov/geo/). The data set GSE73731 included MUCINs genomic alternation data of 265 ccRCC samples, which was used to verify the correlation of MUCINs expression with prognosis. The data set GSE29069 containing 39 ccRCC samples and survival data were applied for validation of the findings for prognosis.

| Screening of differentially expressed MUCINS
In the preprocessing of the raw data, MUCINs with lowquality data were excluded. GEPIA server was applied to analyze the MUCINs expression profile in ccRCC compared with normal tissue controls. P <= 0.05 and logarithmic fold changes >=0.3 were considered statistically significant. 19 GEPIA is available at http://gepia.cance r-pku.cn/. All plotting features in GEPIA are developed using R (version 3.3.2).

| Clustering analysis of CCRCC patients
The Pearson correlation coefficients (r-values) and p-values for each combination of MUCIN genes were calculated using the Hmisc package in R studio. Principal component analysis (PCA) and unsupervised hierarchical clustering were performed using the FactoMineR package in R studio. The MUCIN members with no expression in 20% samples or more were excluded.

Conclusion:
We established a five-MUCINs expression signature as a new prognostic marker for ccRCC. The distinct tumor microenvironment feature between the two signatures may further affect ccRCC patients' clinical management.

| Survival analysis
Kaplan-Meier survival curves and Cox survival analysis were generated by the "survival" package and "survminer" package in R. The Cox proportional hazard ratio and the 95% confidence interval were included in the survival plot.

| Gene enrichment analysis
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were conducted using the "ClusterProfiler" package in R. We used "adjusted p < 0.05" as the significant criteria and selected the GO terms and KEGG pathways with the highest statistical significance.

| XCELL and GENE set variation analysis
We estimated the abundance of different immune and stromal cell types in each ccRCC case using xCell, an analysis of the cellular heterogeneity landscape through tumors gene profiles, which includes 64 different immune and stromal cell types. 20 xCell is available at http://xCell.ucsf.edu/.

| Statistical analysis
Student's t test and non-parametric test (Chi-square analysis of variance and Wilcoxon Rank-sum test) were used for the comparison of molecular differences and clinical features between two groups using R studio (https://rstud io.com/). A log-rank test was used to evaluate the survival curves between different groups. And p-value < 0.05 was considered statistically significant.

| Genetic alteration in MUCIN genes
In our ccRCC cohort, 91% (20/22) cases were identified as having at least one alternation in MUCINs, with the most prevalence in MUC16, MUC2, and MUC12 (59%, 56%, and 45%, respectively). Meanwhile, only 52% (235/448) of ccRCC cases had at least one genetic alternation in MUCINs in the TCGA database, but the majority had alternations in MUCINs mRNA expression levels ( Figure 1A,B). There was no significant difference in the overall survival between ccRCC patients with and without genomic alternations in MUCINs (p = 0.11, Figure 1C). However, patients with higher mRNA expression in MUCINs had significantly worse survival (p = 0.01, Figure 1D).

| MUCIN expression pattern in CCRCC
GEPIA web-server was applied to assess the MUCINs expression level in ccRCC patients compared to the normal controls. Expression levels of 9 MUCINs were significantly altered in ccRCC patients (p < 0.05), including six membrane-bound

| The relationship between MUCIN expression and survival in CCRCC
Excluding MUCINs with low-quality expression data, 11 MUCINs were available for the subsequent analysis. Univariate Cox analysis was conducted to estimate the hazard ratio (

| Correlation of MUCINS expression levels in CCRCC
To further understand the MUCINs' expression pattern in ccRCC patients, we calculated the Pearson correlation coefficient for each pair of MUCINs and revealed a principal component analysis (PCA) for the 11 MUCINs expression profile. There were 25 pairs of MUCINs with a positive correlation (0.09 < r < 0.73, Table 1) and 12 with negative correlation (−0.37 < r < −0.09, Table 2) (p < 0.05, Figure 4A). We identified a strong coexistence between MUC4 and MUC20 (r = 0.73).  Table S1, Figure S1).

| Clustering analysis in CCRCC
Unsupervised hierarchical clustering was applied according to samples' MUCIN expression patterns and the ccRCC cases were classified into two distinct clusters (cluster 1 and cluster 2) ( Figure 5A). There were 160 cases (29.96%) in cluster 1 and 374 (70.04%) in cluster 2. Notably, cluster 1 was significantly associated with worse survival (median OS: 54.57 vs. 116.75 months, p < 0.0001, Figure 5B). To further characterize each cluster, we performed an unpaired t-test for two clusters on log-transformed mRNA expression. There were five MUCINs (MUC1, MUC12, MUC13, MUC16, and OVGP1) significantly overexpressed in cluster 1 (p < 0.05) and most of them were membrane-bound MUCINs except OVGP1. In contrast, six MUCINs (MUC4, MUC5B, MUC6, MUC20, EMCN, and MCAM, p < 0.05) were overexpressed in cluster 2, which contained two membrane-bound MUCINs, two secreted MUCINs, and all the atypical MUCINs ( Figure 5C). By investigating the clinical feature of these two clusters, a significant increase in the genetic mutation counts in patients from cluster 1 was noticed (median counts 53 vs. 44.5, p = 0.0044) ( Figure 6A). Patients in cluster 1 were presented with later tumor stage, clinical stage, and higher neoplasm histological grade than those in cluster 2 ( Figure 6B-G). There was no significant difference in the age at diagnosis (61 years old, Figure 6H) and gender between the two clusters ( Figure 6I).

| Tumor characteristics in different clusters
Compared with cluster 2, 1730 genes were upregulated and 1837 genes were downregulated in cluster 1. To further distinguish the biological features between the two clusters, we performed gene ontology (GO) and KEGG pathway enrichment analysis on the differentially expressed genes (DEGs).
The following GO annotations of DEGs were found to be enriched (FDR < 0.05, Figure 7Aa): as for the molecular function, transporter and channel, extracellular matrix and receptor; as for the cellular component, extracellular matrix, apical region and transporter, and ion channel complex. Meanwhile, the significantly enriched DEGs were involved in renal system development, extracellular matrix and cell adhesion and calcium ion process. Similarly, KEGG analysis showed that the DEGs were mainly enriched in receptor interaction, extracellular matrix, cell adhesion, and calcium signaling pathway ( Figure 7B). We also investigated the association between the two clusters and angiogenesis/immune biology with a predefined gene set. The heatmap indicated that genes related to angiogenesis had relatively lower expression level in cluster 1, but genes related to immune biology were upregulating instead ( Figure 7C). By analyzing the composition of tumor microenvironment using xCell, it was revealed that cluster 1 was correlated with a higher immune cell level and microenvironment score, whereas cluster 2 had a higher stroma score ( Figure 7D). As shown in Figure  S2

| MUCIN signature and the prognosis of CCRCC patients
As clusters 1 and 2 were based on the expression data of the whole family of MUCIN members, which limited its application in clinical practice, we investigated whether simplified signatures may give sufficient performance. We exacted signature 1 (MUC1, MUC12, MUC13, MUC16, and OVGP1) from cluster 1 and signature 2 (MUC4, MUC5B, MUC6, MUC20, EMCN, and MCAM) from cluster 2, and each signature was weighted by hazard ratio ( Figure 8A). Though signature 2 did not show a significant association with survival (p = 0.6), patients with signature 1 were associated with worse survival than those without the feature (p < 0.0001, Figure 8B,C). Then, we validated this predictor in the GSE29069 dataset, which was independent of the previous data. Although MUC12 expression was absent and the sample size was limited in GSE29069, univariate Cox analysis indicated that each MUCIN in signature 1 (MUC1, MUC13, MUC16, OVGP1) were all associated with a trend with poor survival, especially OVGP1 ( Figure S3). The enrichment of signature 1 was associated with a trend of poor survival (p = 0.087, Figure S4). Interestingly, though there was no difference in the mutation counts and serum calcium level between samples with or without signature 1, a similar feature to cluster 1, including later tumor stage and disease stage, higher neoplasm histologic grade with more lymph node metastasis, and long-distance metastasis, were also presented in samples with signature 1 (Figure 8D).

F I G U R E 3
Hazard ratio of high and low expression levels of MUCINs in the TCGA cohort signaling pathways were focal adhesion, cGMP−PKG signaling pathway, and ECM−receptor interaction ( Figure 9B). Additionally, differential expression of genes involved in angiogenesis and immune and antigen presentation was observed in ccRCC samples with signature 1 (Figure 9C). A significantly higher level of the immune score but a lower level of stroma score was identified in samples with signature 1, with the increased presence of multiple types of infiltrated immune cells, such as activated myeloid dendritic cells, CD4 positive effector memory T cells, and CD8 positive T cells ( Figure 9D).

| DISCUSSION
MUCINs, especially membrane-bound MUCINs, have been demonstrated to be associated with a shared tumor progression pattern, 12 but the correlation of MUCINs' mutation and expression pattern and with ccRCC patients' prognosis is not fully documented. Although various MUCIN members have been studied singly in ccRCC, the comprehensive and integrated role of MUCINs in ccRCC patients' prognosis is still unclarified.
In the present study, we analyzed the ccRCC data from the TCGA database to examine MUCINs' correlation with patients' survival and identified cluster 1 expression pattern as being associated with poor survival. To simplify the prognostic biomarker, we successfully established a five-MUCIN expression signature that significantly correlated with worse survival and validated in independent ccRCC patient datasets. This signature may potentially be used as a prognostic marker in further clinical practice.
The signature is composed of the overexpression of MUC1, MUC12, MUC13, MUC16, and OVGP1, which were all contribution variables associated with poor survival in ccRCC. The predictive power of the signature may lie in the function of each MUCIN gene, some but not all of which had been reported to have various functions in different types of cancer and/or kidney development. Among them, MUC1 and MUC16 were found to be associated with immune modulation and metastasis in cancer. The presence of MUC16 neo-antigen-specific T cell clones and anti-MUC1 antibodies in cancer suggests that MUCINs can serve as potential targets for developing cancer therapeutics. 38 Notably, MUCIN mutations have been demonstrated to affect normal kidney function. For instance, MUC1 mutations directly cause autosomal dominant tubulointerstitial kidney disease. 21 MUC1 could be directly regulated by HIF-1-alpha, whose aberrant expression has been found frequently in ccRCC. [22][23][24] Moreover, MUC1 can be involved in renal tumor development and can be considered as potential markers of ccRCC development and prognostic, 25,26 which provides a foundation for the development of a cancer vaccine (TG4010, based on MUC1 and interleukin-2) in the first-line therapy of metastatic ccRCC patients. 27 Previous studies also showed that aberrantly glycosylated MUC1 is overexpressed in most human epithelial cancers 28 and it mediates enhanced expression of glucose uptake and metabolism genes, facilitating cancer cell survival and growth in multiple cancers. [29][30][31] And importantly, MUC1 could induce resistance to anticancer drugs by upregulating the expression of multi-drug resistance genes, for instance, multidrug resistance protein 1. 32 MUC16 overexpression has also been linked to worse prognosis in intrahepatic cholangiocarcinoma and pancreatic tumors. 33,34 Otherwise, it was proposed as a suppressor in TLR-mediated innate immune activation. 22,35 However, MUC16 overexpression was not described in ccRCC before. Overexpression of MUC12 and MUC13 in ccRCC was found to promote RCC progression depending on c-Jun/TGF-β signaling, which was associated with poor prognosis by increasing RCC cell growth and cell invasion. 14,16 OVGP1 was identified as an independent prognostic factor and had an association with drug resistance in ovarian cancer, and its elevation in serum could serve as an accuracy biomarker for ovarian cancer independent of CA125. 36,37 OVGP1 overexpression had not been described in ccRCC before, either. In addition to serving as a prognostic marker, the cluster 1 pattern and signature 1 may be considered for informing the selection of anti-angiogenesis and/or immune checkpoint inhibitors. Nowadays, treatments targeting the aberrant vascular endothelial growth factor/receptor (VEGF/VEGFR) pathway and/or immune checkpoint inhibitors are among the main and most effective therapy for ccRCC. However, the determinants for therapy selection are still undefined. In the present analysis, cluster 1 and its related signature 1 revealed a higher level of microenvironment immune score and multiple infiltrated immune cells, including CD4 positive effector memory T cells, CD8 positive T cells, and macrophages. Meanwhile, a relatively lower enrichment in the angiogenesis signature was found in patients from cluster 1 and signature 1. Previous studies, including IMmotion 150, have identified the association between anti-VEGFR therapy and angiogenesis signature, immune cell infiltration, and ICIs. 39 MUC1 had been found to engage with Siglec-9, induced tumor-associated macrophage-like phenotype, and increased PD-L1 expression in the tumor microenvironment. 40 These results may suggest that ccRCC patients with signature 1 may benefit less from anti-VEGFR monotherapy but have more response to ICIs and/or combination therapy instead.
However, our study also has distinct limitations. First, only 11 of 20 MUCINs expression data were obtained for the prognosis-related analysis, which might affect the outcome of the study. Besides, the sample size in the validation cohort was too small to fully validate the prognosis value of our established predictor. Though we found the potential difference in the sensitivity to anti-VEGFR and/or immune checkpoint inhibitors between patients with or without signature 1 feature, the absence of direct evidence on the response to related therapy may need further study to evaluate its clinical application. Further research with local MUCINs expression and immunotherapy data could better elucidate the relationship between our findings and the prognosis and management of patients with ccRCC.
In this study, we also found a higher prevalence of genetic alterations in MUCINs in our cohort comparing with the TCGA database, but the genetic alterations in MUCINs were not significantly correlated with ccRCC patients' survival. The underlying reason for the higher mutation rate in our cohort is currently unclear, which may be related to the difference in the genetic background of Chinese and Western populations, or environmental exposure to different foods/medicines. The answer to this question awaits future investigation.
In conclusion, we successfully classified ccRCC patients into two clusters according to their expression patterns and deciphered each cluster's feature in the clinical and prognosis. Especially, a 5-gene expression signature 1 extracted from cluster 1 could serve as a combined predictor marker and potentially help clinicians assign therapy choices for ccRCC patients.

AVAILABILIT Y STATEMENT
The raw sequencing data files were deposited in the Chinese National Genomics Data Center(https://bigd.big.ac.cn/). Left biospecimens may be shared for academic research under prior approval from the government of China.

ACKNOWLEDGMENTS
This work was supported by China National Funds for young Scientists Program (81702559). We are grateful to all participants in this research and Lifehealthcare Clinical Laboratories for technical assistance.

CONFLICTS OF INTEREST
The authors have no conflict of interest to declare.

AUTHOR'S CONTRIBUTION
HM: data collection, analysis, and drafting the article, XJ: manuscript revision, HH: data analysis, NS: data collection, CG: data collection, CY: data collection, GY: data collection, YW: design of this work and data analysis.

ETHICAL STATEMENT
This study was approved by the ethics committee of Qilu Hospital and conducted under the principles of the Declaration of Helsinki and the Good Clinical Practice guidelines. All enrolled patients provided written informed consent.