Single‐cell analysis reveals metastatic cell heterogeneity in clear cell renal cell carcinoma

Abstract Renal cell carcinoma (RCC) is one of the leading causes of cancer‐related death worldwide. Tumour metastasis and heterogeneity lead to poor survival outcomes and drug resistance in patients with metastatic RCC (mRCC). In this study, we aimed to assess intratumoural heterogeneity (ITH) in mRCC cells by performing a combined analysis of bulk data and single‐cell RNA‐sequencing data, and develop novel biomarkers for prognosis prediction on the basis of the potential molecular mechanisms underlying tumorigenesis. Eligible single‐cell cohorts related to mRCC were acquired using the Gene Expression Omnibus (GEO) dataset to identify potential mRCC subpopulations. We then performed gene set variation analysis to understand the differential function in primary RCC and mRCC samples. Subsequently, we applied weighted correlation network analysis to identify coexpressing gene modules that were related to the external trait of metastasis. Protein‐protein interactions were used to screen hub subpopulation‐difference (sub‐dif) markers (ACTG1, IL6, CASP3, ACTB and RAP1B) that might be involved in the regulation of RCC metastasis and progression. Cox regression analysis revealed that ACTG1 was a protective factor (HR < 1), whereas the other four genes (IL6, CASP3, ACTB and RAP1B) were risk factors (HR > 1). Kaplan‐Meier survival analysis suggested the potential prognostic value of these sub‐dif markers. The expression of sub‐dif markers in mRCC was further evaluated in clinical samples by immunohistochemistry (IHC). Additionally, the genetic features of sub‐dif marker expression patterns, such as genetic variation profiles, correlations with tumour‐infiltrating lymphocytes (TILs), and targeted signalling pathway activities, were assessed in bulk RNA‐seq datasets. In conclusion, we established novel subpopulation markers as key prognostic factors affecting EMT‐related signalling pathway activation in mRCC, which could facilitate the implementation of a treatment for mRCC patients.


| INTRODUC TI ON
Renal cell carcinoma (RCC) is characterized by various genetic abnormalities, and the accompanying clinical and biological heterogeneity plays a crucial role in the modification of drug resistance, signalling networks, distant metastasis and prognosis. [1][2][3] Clear cell RCC (ccRCC) is the most prevalent histopathological type, constituting more than 85% of metastatic RCC (mRCC) cases and showing a poor prognosis. Nearly 20% of ccRCC cases present with de novo metastatic disease at initial diagnosis, and the 5-year overall survival (OS) rate of metastatic cases is as low as 10%. 4,5 Over the last 12 years, the clinical management of mRCC has progressed significantly. Multiple targeted agents have been developed to block the activity of known tyrosine kinases and signalling pathways, such as inhibitors of the mammalian target of rapamycin (mTOR), platelet-derived growth factor (PDGF) and vascular endothelial growth factor (VEGF) pathways, which mediate crucial network functions involved in ccRCC, metastasis and angiogenesis. [6][7][8][9][10] Although targeted drugs for VEGF or PI3K/mTOR are effective firstline treatment options, many patients with mRCC eventually tend to develop drug resistance. The median time of disease progression into a drug-resistant phenotype is approximately 6-15 months, depending on the therapeutic programmes and intratumoural heterogeneity (ITH). 11,12 Epithelial-to-mesenchymal transition (EMT), a process in which epithelial cells lose their apical-basal polarity and concomitantly acquire a migratory phenotype, 13,14 is a key step in tumour metastasis.
The EMT programme in tumour metastasis adapts to the ITH and constantly evolving microenvironment to allow tumour cells to successfully metastasize. 13 Since EMT is dynamically regulated during metastasis, more studies on the molecular regulators of EMT could shed light on the therapeutic approaches to inhibit tumour colonization. However, the mechanisms by which EMT heterogeneity cooperates with the tumour microenvironment (TME) and activates distinct downstream signalling pathways in metastatic sites to promote tumour progression remains unclear.
The emerging single-cell RNA-sequencing (scRNA-seq) technology provides deeper insights into transcriptome expression profiles at a single-cell resolution and a detailed understanding of ITH. [15][16][17][18] Tumour metastasis is an evolutionary process, and cells may acquire novel or different phenotypes through selection. 19,20 scRNA-seq has helped reveal unidentical subpopulations, greatly facilitating the development of novel approaches for improving precision targeted therapy. 21 Nevertheless, elucidation of the role of ITH in mRCC cells and its contribution to drug resistance remains a challenge, and this information is expected to have profound implications in improving our understanding of the process of tumour subclonal evolution and development of effective treatment regimens. Thus, in this study, we sought to assess mRCC cell heterogeneity by analysing a combination of bulk data and scRNA-seq data.
We hypothesized that alterations in gene expression profiles during the evolutionary process of tumour metastasis affect the phenotype of metastatic cancer cells, resulting in the activation of distinct signalling pathways and drug resistance to specific treatments. Our analyses indicated that EMT activation reflects a main cell subset with distinct mechanisms of action in mRCC.
Using publicly available human sequencing datasets (The Cancer

Genome Atlas [TCGA] and Gene Expression Omnibus [GEO]
Series [GSE73121]) and immunohistochemistry (IHC) analysis, we confirmed that IL6, CASP3, ACTB, ACTG1 and RAP1B, which are referred to as subpopulation-difference or sub-diff markers, are co-regulators of the differentially expressed gene (DEG) network between two metastatic subpopulations and play key roles in RCC metastasis. We further explored the expression patterns and genetic mutations of these five sub-dif markers to evaluate their influence on the prognosis of RCC in large patient cohorts. These results indicated that the activation of pathways differed in mRCC subpopulations and that EMT was the main pathway. Sub-dif markers are key prognostic factors that participate in EMT induction in mRCC subpopulations. Furthermore, the correlations of sub-dif markers with immune infiltration and drug response were analysed to ascertain their value as molecular markers for predicting a subpopulation evolution course, and as novel potential pharmacological targets.

| Gene expression data and IHC-confirmed patient population
Bulk RNA-seq data for kidney renal clear cell carcinoma (KIRC) tumours and normal samples, including level-3 RNA-seq data and clinical data, were downloaded from the TCGA data portal (https://gdc. nci.nih.gov) and used as bulk RNA-seq data to explore sub-dif marker gene function.
Using the single-cell sequencing data of 118 cell samples (after removing three bulk RNA-seq samples from the 121 cell samples) from patients with primary RCC and mRCC, single-cell transcriptome profiles were obtained from the GEO repository under the accession number GSE73121. The criteria for filtering single cells for downstream analyses included exclusion of low-quality cells (<200 genes/ cell, <3 cells/gene and >10% mitochondrial genes) and log2transformation of gene expression levels by using the ScaleData R

| Gene set variation analysis of primary and metastatic tumour subpopulations
To estimate the differential activities of 50 hallmark pathway gene signatures between primary ccRCC and metastatic ccRCC, we implemented the gene set variation analysis (GSVA) algorithm in the scRNA-seq data. Significantly enriched pathways were identified based on a |logFoldChange| ≥ 1 and an adjusted P value of <0.05. In addition, to evaluate whether the metastatic subpopulations were highly enriched with EMT-and VEGF-activated pathways, GSVA was employed to determine the pathway gene set activity score for each sample by using a non-parametric and unsupervised algorithm in the GSVA R package. 22

| Principal component analysis and t-distributed stochastic neighbour embedding analyses in metastasis subgroups
Based on the assumption that genes capturing cell heterogeneity often display high variability, we focused on identifying highly variable genes (HVGs) by using the FindVariableFeatures function and used them for subsequent analyses. 23 Principal component analysis (PCA) was conducted on the scaled HVGs, which returned six principal components (PCs) based on Cattell's screen test, and the first four principal components were chosen for visualization ( Figure S1). The six returned PCs were presented for t-distributed stochastic neighbour embedding (tSNE) dimension reduction to obtain a two-dimensional representation of the cell state. The FindClusters function was used for clustering to realize a modular and optimized clustering algorithm of the shared nearest neighbour, which resulted in the formation of two clusters, and a resolution of 2.2 was selected. We further analysed the gene expression patterns of the mRCC subpopulations and identified 734 significant DEGs with dynamic expression changes under the threshold of an adjusted P value of <0.05. Cluster-specific marker genes from the two clusters were identified and presented as violin plots and tSNE plots.

| Defining core subpopulation differential genes in metastatic cancer cells
Weighted correlation network analysis (WGCNA) was used to acquire metastatic subpopulation-related protein clusters. 24 To identify the gene co-expression modules, a hierarchical clustering analysis and a soft threshold power were assigned to group genes with similar expression patterns, and co-expression networks were created. The networks consisted of highly similar coexpression modules, and the eigengenes of these modules were further determined. Finally, Pearson correlations between the module eigengenes and clinical data were displayed. A proteinprotein interaction (PPI) network was constructed using the STRING online portal (https://strin g-db.org/). A protein with a F I G U R E 1 Flow diagram of the study protocol contribution >6 was selected to construct the PPI network, which was visualized using Cytoscape software (version 3.6.1) (http:// www.cytos cape.org). 25

| Analysis of sub-dif marker expression patterns and prediction of drug sensitivity
The cBioPortal database (http://www.cbiop ortal.org) serves as a web resource for exploration, visualization and analysis of multidimensional cancer genomics data, and it contains DNA copy number, DNA methylation, transcriptome, micro RNA and non-synonymous mutation data. 26,27 In this study, the cBioPortal database was used for systematic analysis of the single-nucleotide variation profile of the sub-dif markers. Gene set cancer analysis (GSCALite), a web-based platform for gene set cancer analysis, 28 was applied to analyse the drug sensitivity of the sub-dif marker genes along with the pathway activities of TCGA-KIRC samples. We further evaluated the correlation between gene expression and IC50 using the Genomics of Drug Sensitivity in Cancer (GDSC) database.

| Tumour IMmune Estimation Resource database analysis
The Tumor IMmune Estimation Resource (https://cistr ome. shiny apps.io/timer/) is a web tool that provides a robust and comprehensive estimation of immune infiltration levels in diverse cancer types. 29 Correlation analysis of sub-dif marker expression with tumour-infiltrating immune cells (TIICs) was conducted using the 'Gene' module. Prognostic analysis of sub-dif marker expression and TIICs was conducted using the 'Cox' module. The x-axis of F I G U R E 2 Differential distribution patterns and involved pathways between primary RCC and metastatic RCC. A, A total of 3719 highvariant genes were identified through the M3Drop R package, and the yellow dots represent the genes with a high dropout rate. Fitting analysis suggests that these genes may be differentially expressed in cell subpopulations. B, Clustering heatmap generated by unsupervised hierarchic clustering reveals that the high-variant genes discriminate primary RCC from metastatic RCC. C, Differences in pathway activities scored per cell using GSVA between tumour cells isolated from primary RCC or metastatic RCC. t values are independent of effects from the patient of origin the scatterplot represents the expression level of TIICs, whereas the y-axis represents the expression level of the sub-dif markers.
All gene expression levels were represented by log2 RNA-Seq by Expectation-Maximization (RSEM), and the adjusted P value of <0.05 was considered to be statistically significant.

| Immunohistochemistry
Tissue microarrays (TMAs) containing samples from 20 metastatic renal carcinoma patients with a definite pathological diagnosis of ccRCC were constructed for immunohistochemical analysis, as previously described. 30   showed two significantly different distribution patterns between the primary RCC tissue and mRCC samples, indicating differential expression patterns between the primary and metastatic sites in RCC.

| Data availability and processing
Functional differences between the primary and metastatic cancers were further explored using GSVA ( Figure 2C) because one study showed that the two tumour cell clusters exhibit differences in the hallmark pathway gene signatures. 31

F I G U R E 4
The metastasis-associated module selection and protein-protein interaction networks. A, Determination of the softthresholding power in the mRNA WGCNA. B, Module-trait associations of mRNAs were evaluated by correlations between MEs and clinical traits. Left: Analysis of the scale-free fit index for various soft-thresholding powers (β). Right: Analysis of the mean connectivity for various soft-thresholding powers. C, Scatter plot of the correlation between gene MM in the grey module, which was associated with pathologic metastasis. D, PPI network of DEGs with an interaction score >0.7. Based on STRING and Cytoscape analysis, proteins (the red nodes) with a contribution of greater than six were filtered into the PPI network complex identified as hub genes for the RCC metastatic phenotype, and the red circles were referred to the sub-dif markers that have demonstrated a synergistic effect of tumour vessel normalization and EMT on tumour metastasis.

| Single-cell expression atlas of RCC metastases
To generate a comprehensive view of cellular diversity in RCC metastases, we selected metastasized cells for further analysis. We applied PCA to HVGs across all 71 metastasized cells (n = 1602 genes) using tSNE on the informative PCs (n = 8). We generated two-dimensional maps of the data using tSNE and partitioned the cells into two main clusters ( Figure 3A). Marker gene expression in cell clusters was dissimilar. Individual clusters were labelled with subpopulation-specific marker genes and stratified into the EMT pathway-related cluster (EMT-RC) (yellow) or the VEGF pathway-related cluster (VEGF-RC) (blue) for characterization ( Figure 3B). GSVA analysis suggested that subclonal populations of ITH contribute to induction of EMT in metastatic cancer cells ( Figure 3C). A heatmap revealed the differential expression patterns of the top 40 DEGs from the two clusters ( Figure 3D). In this analysis, EMT-RC and VEGF-RC were identified as the main changeable subclonal populations associated with distinct signalling pathway activation in the tumour metastatic samples ( Figure S2). The analysis implied considerable heterogeneity within the mRCC cell populations, which indicates that the EMT-RC and VEGF-RC subpopulations are major functional subpopulations extensively expanded in mRCC. These differences reflect well-known disparities in clonal selection following cell evolution in tumour metastasis. 35

| WGCNA and PPI
To estimate the relative contributions of the DEG subpopulations of the two clusters to bulk RNA expression, we applied WGCNA, which F I G U R E 6 Genetic alterations of sub-dif markers in RCC determined with the cBioPortal database is a powerful tool for identifying coexpressing gene modules and relating these modules to external traits, including metastasis. Using a soft threshold power setting of four ( Figure 4A), we identified five modules with genes with similar expression patterns in the TCGA-KIRC RNA-seq dataset ( Figure 4B). The grey module, containing 179 genes, was significantly correlated with metastasis (R 2 = 0.22, P = 3e-06; Figure 4C).
Using the STRING online database and Cytoscape software, a total of 178 coexpressing genes from the grey module were filtered into the PPI network, which contained 87 nodes and 103 edges ( Figure 4D).
The five most significant nodes in the network were IL6, CASP3, ACTB, ACTG1 and RAP1B, which were identified as hub genes for the RCC metastatic phenotype and are referred to as sub-dif markers.

| Pathway activity analysis of sub-dif markers
Sub-dif markers were selected to perform the pathway activity analyses. We identified genes that are widely associated with characteristic metastasis-related biological processes, such as cell cycle, PI3K/AKT and RAS/MAPK, which are involved in cell proliferation ( Figure 5A). The dysregulation of these sub-dif markers may trigger cell cycle dysfunction and cell proliferation, differentiation and apoptosis pathways, thus inducing tumour cell metastasis, which is in accordance with another independent group's publication. 36 Some sub-dif marker genes (RAP1B, IL6 and ACTB) were positively correlated with the EMT pathway ( Figure 5B). We also identified two positively correlated signalling pathways related to apoptosis and DNA damage response. Thus, we not only confirmed the metastasisrelated pathways frequently reported in other studies, but also verified the strong relationship between sub-dif markers and the EMT pathway.

| Genetic alteration, pathway analysis and drug sensitivity analysis of sub-dif markers
To explore the genetic features of sub-dif marker expression patterns, we analysed the genetic variation profiles of the sub-dif markers cases from TCGA, PanCancer Atlas; and 106 cases from Nat Genet 2013) using the cBioPortal database. We found that the sub-dif markers exhibited a low mutation frequency in the range of 0.12%-0.25%. ACTG1, ACTB and RAP1B were well represented with mutation frequencies of 1%, 0.8% and 0.25%, respectively. Specifically, 0.12% of the total patient series harboured only the IL6 amplification mutation, whereas 0.25% of the patients harboured the CASP3 deletion mutation ( Figure 6). Next, we investigated the effect of sub-dif markers on drug sensitivity. Using the GDSC database, we found that IL6 and CASP3 are potential therapeutic targets. High expression of CASP3 was correlated with sensitivity to 45 small molecular drugs, while high expression of IL6 was associated with sensitivity to seven drugs but with resistance to MPS-1-IN-1 and WZ3105 (Figure 7), which may provide further support for the optimization of targeted therapy in patients with mRCC. The nature of the sub-dif marker response to therapy is unclear, and our data provide a basis for further elucidation of drug sensitivity prediction through these marker genes.

| Correlation between sub-dif marker expression and immune biomarkers and prognosis in KIRC
In the tumour microenvironment, the crosstalk between tumour cells and TILs is essential for tumour metastasis. We further investigated the relationship between the sub-dif markers and TILs, which is associated with disease prognosis. 37,38 This analysis showed strong positive correlations between sub-dif markers were identified as risk factors (HR > 1) (Table 1) Figure 9D) were significantly associated with worse OS.
In contrast, higher RAP1B ( Figure 9E) expression was associated with a better prognosis. These results further validated that subdif markers are of great significance for assessing the prognosis of mRCC.

| Validation of sub-dif markers by IHC staining
To support our findings, we sought to provide an independent vali-

| D ISCUSS I ON
In the present study, we used scRNA-seq to identify two distinct subpopulations in metastatic ccRCC samples. We found evidence of cancer cell heterogeneity within each distinct subpopulation by using different markers. Our findings are analogous to those of other studies, which demonstrated that tumours are transcriptomically heterogeneous. 3,4,39 The two separate lineages evolve separately into two pathways, EMT-activated pathways, and VEGF-related pathways, which apparently include more related genes in the context of metastatic ccRCC. Moreover, we showed that sub-dif markers from the subpopulations are linked to RCC TILs, drug sensitivity and prognosis. These results indicate that a combined regimen of anti-vascular therapy and anti-EMT therapy may be an effective therapeutic strategy for patients with mRCC.
Intratumoural genomic heterogeneity in RCC has been well-  Analysis of the expression profiles of the DEGs between different subpopulations suggested that the sub-dif markers are important genes that activate EMT-related pathways to induce metastasis. We identified the two cell populations associated with the activation of different signalling pathways as the most prevalent subpopulations in metastatic cells, and this finding is consistent with those obtained in a study by Elzbieta. 53 Furthermore, we showed that IL6, CASP3, ACTB, ACTG1 and RAP1B are hub genes that regulate the tumour metastasis regulation network. The expression of these hub genes is related to the prognosis in patients with mRCC. The association between IL-6 and JAK2, and STAT3 signalling pathway activation and cancer cell migration was observed in a paracrine or autocrine IL-6-rich inflammatory environment. 54 Moreover, IL6 is associated with a network that controls cellular movement, and it serves as an unfavourable prognostic biomarker in terms of overall survival. 55 Our study revealed that the IL-6-rich network hub gene is closely related to the EMT and VEGF subgroups. Using bulk sequencing data from cancerous and normal cell samples, we showed that significant differences in the expression of sub-dif markers have a high prognostic value, which implies that the sub-dif markers play crucial roles in the acquisition of the EMT phenotype. This finding is consistent with our hypotheses.
Growing evidence indicates that immune cell infiltration plays an important role of in cancer metastasis, which could affect the prognosis of cancer patients. [56][57][58] Moreover, immune-related pathways and immunotherapeutic strategies in cancers are expected to provide a potential direction for cancer therapy. 59 Notably, the significant associations of sub-dif markers with TILs, drug sensitivity and OS suggest the potential of using these sub-dif markers as clinical prognostic biomarkers for predicting the risk in mRCC patients.
Some immune biomarkers, such as PD-1, have been suggested to function as negative immunoregulatory molecules and regulators of cancer cell immune evasion. 60 Thus, sub-dif markers may play a vital role in immune escape in the mRCC microenvironment.
In summary, we demonstrated that utilization of ITH in met-

| Limitations
Our study had some limitations. First, we analysed only metastatic tumour samples and could not elucidate the results in the primary tumour samples. Second, we did not compare our findings with those of other independent cohort studies, which is essential for validating our findings and could have yielded more reliable results. Hospital. The funders did not play a role in manuscript design, data collection, data analysis, data interpretation, or writing of the manuscript.

CO N FLI C T O F I NTE R E S T
The authors report no conflicts of interest in this work.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets analysed during the current study are available from the corresponding author on reasonable request.