Pan‐cancer analysis of alternative splicing regulator heterogeneous nuclear ribonucleoproteins (hnRNPs) family and their prognostic potential

Abstract As the most critical alternative splicing regulators, heterogeneous nuclear ribonucleoproteins (hnRNPs) have been reported to be implicated in various aspects of cancer. However, a comprehensive understanding of hnRNPs in cancer is still lacking. The molecular alterations and clinical relevance of hnRNP genes were systematically analysed in 33 cancer types based on next‐generation sequencing data. The expression, mutation, copy number variation, functional pathways, immune cell correlations and prognostic value of hnRNPs were investigated across different cancer types. HNRNPA1 and HNRNPAB were highly expressed in most tumours. HNRNPM, HNRNPUL1 and HNRNPL showed high mutation frequencies, and most hnRNP genes were frequently mutated in uterine corpus endometrial carcinoma (UCEC). HNRNPA2B1 showed widespread copy number amplification across various cancer types. hnRNPs participated in cancer‐related pathways including protein secretion, mitotic spindle, G2/M checkpoint, DNA repair, IL6/JAK/STAT3 signalling and coagulation, in which HNRNPF, HNRNPH2, HNRNPU and HNRNPUL1 were the most likely to be implicated. Significant correlations of hnRNP genes with T helper cells, NK cells, CD8‐positive T cells and neutrophils were identified. Most hnRNPs were associated with worse survival in adrenocortical carcinoma (ACC), liver hepatocellular carcinoma (LIHC) and lung adenocarcinoma (LUAD), whereas hnRNPs predicted better prognosis in kidney renal clear cell carcinoma (KIRC) and thymoma (THYM). The prognosis analysis of KIRC suggested that the hnRNP gene cluster was significantly associated with overall survival (HR = 0.5, 95% CI = 0.35‐0.73, P = 0.003). These findings provide novel evidence for further investigation of hnRNPs in the development and therapy of cancer.


| INTRODUCTION
RNA splicing removes introns and joins exons of precursor mRNA (pre-mRNA), a process essential for cellular homoeostasis, functional regulation, tissue development and species diversity. 1,2 Almost every transcript derived from human genes undergoes diverse patterns of alternative splicing (AS), including exclusion or inclusion of 'cassette' exons, changes of splice sites, intron retention, alternative promoters or terminators, and mutually exclusive exons. 3,4 Alternative splicing of pre-mRNA is responsible for various aspects of biological processes, and aberrant AS contributes to a series of disorders, including cancer. 5,6 Emerging evidence has demonstrated that cancer cells hijack and alter the AS process, thereby facilitating their growth and metastasis. 7,8
As the most critical alternative splicing regulators, the heterogeneous nuclear ribonucleoprotein (hnRNP) family is responsible for the maturation of pre-mRNAs into functional mRNAs as well as the stabilization of mRNA translocation. 9,10 Through their RNA binding domains (RBDs), hnRNPs recognize specific RNA sequences and control various biological processes of RNA function and metabolism. 11,12 Mechanistically, hnRNPs constitute the mRNA-protein 40S core complex by binding to RNA elements, including exonic and intronic splicing regulators, and thereby precisely control the alternative splicing of pre-mRNAs. 13 Until now, approximately twenty key members of the hnRNP family have been identified, including hnRNP A-U, which share common characteristics but differ in biological properties. 14
Emerging evidence has suggested a close relationship between hnRNPs and multiple malignant behaviours of cancer. 15 For instance, hnRNP A1 modulates the alternative splicing of CDK2, thereby contributing to oral squamous cell carcinoma by altering cell cycle progression. 16 In pancreatic cancer, hnRNP E1 regulates cancer cell metastasis by controlling the alternative splicing of integrin β1, a membrane receptor involved in cell adhesion, immune response and metastatic diffusion of cancer cells. 17 Studies have suggested that hnRNP A1, A2/B1 and K bind to the promoter of the tumour suppressor Annexin-A7, which alters Annexin-A7 splicing patterns and leads to prostate cancer. 18 In addition, hnRNP L has been found to regulate VEGFA mRNA translation and induce apoptosis of cancer cells, thereby inhibiting the development of cancer. 19
Although current reports indicate a significant contribution of hnRNPs to carcinogenesis, our knowledge of their specific involvement remains limited. Considering the increasingly essential role of hnRNPs in cancer, it is of great interest to unravel the whole landscape of expression, mutation and copy number variation of the alternative splicing regulator hnRNP family as well as their prognostic potential. By analysing multiple levels of data from The Cancer Genome Atlas (TCGA) covering 33 cancer types, we describe in this study the specific implications of the alternative splicing regulator hnRNPs in various cancers. It is anticipated that this comprehensive pan-cancer analysis will shed light on how alternative splicing leads to cancer.

| Collection of hnRNP genes
We collected 22 hnRNP genes from recently published review papers. All gene symbols were converted into Ensembl gene IDs and HGNC symbols by manual curation using GeneCards (https://www.genecards.org/).

| Genome-wide omics data across 33 cancer types from next-generation sequencing data
The results in our analysis were based upon omics datasets generated

| Identification of differentially expressed genes
To identify alterations in gene expression in each cancer type, we used the DESeq2 package in R to identify differentially expressed genes. Genes with an adjusted P-value < 0.05 and at least a twofold change in expression were considered differentially expressed in each cancer type.
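The selection rule above can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' R workflow; the `DEResult` record, the `select_de_genes` helper and the example values are hypothetical, and it assumes that per-gene log2 fold changes and adjusted P-values have already been exported from the differential expression step.

```python
import math
from typing import List, NamedTuple, Optional


class DEResult(NamedTuple):
    gene: str
    log2_fold_change: float   # tumour vs. normal, log2 scale
    padj: Optional[float]     # adjusted P-value; None if not available


def select_de_genes(results: List[DEResult],
                    padj_cutoff: float = 0.05,
                    min_fold_change: float = 2.0) -> List[DEResult]:
    """Keep genes with padj < cutoff and at least a min_fold_change-fold change."""
    # A twofold change in either direction corresponds to |log2FC| >= 1.
    min_abs_lfc = math.log2(min_fold_change)
    return [
        r for r in results
        if r.padj is not None
        and r.padj < padj_cutoff
        and abs(r.log2_fold_change) >= min_abs_lfc
    ]


# Hypothetical values for illustration only (not results from this study).
example = [
    DEResult("GENE_A", 1.8, 0.001),    # up-regulated, significant
    DEResult("GENE_B", 0.4, 0.0005),   # significant, but below twofold
    DEResult("GENE_C", -1.2, 0.03),    # down-regulated, significant
    DEResult("GENE_D", 2.5, 0.20),     # large change, not significant
]
selected = select_de_genes(example)
print([r.gene for r in selected])
```

Genes without an adjusted P-value are dropped in this sketch; that handling is an assumption, not something stated in the text.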

| Protein-wide omics data across pan-cancer from protein expression data
The protein expression data of hnRNP genes were obtained from 'The Human Protein Atlas' database (https://www.proteinatlas.org/). In total, we analysed hnRNP protein expression in 20 cancer types, including BRCA (breast cancer), carcinoid (carcinoid), CECA

| Genome-wide mutation data across pan-cancer cell lines from CCLE datasets
Mutation frequencies of hnRNP family genes in pan-cancer cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE) datasets (https://portals.broadinstitute.org/ccle).

| Oncogenic pathway activity across cancer types
To calculate the activity of cancer hallmark-related pathways, the TPM gene expression values were subjected to gene set variation analysis (GSVA), a non-parametric, unsupervised method for estimating the variation of gene set enrichment across the samples of an expression dataset. To identify the hnRNP genes correlated with activation or inhibition of a given pathway, we calculated the Pearson correlation coefficient (PCC) between the expression of each hnRNP gene and pathway activity. Regulator-pathway pairs with |PCC| > 0.3 and adjusted P-value < 0.05 were considered significantly correlated.
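The correlation filter described above can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: the text does not name the multiple-testing correction, so the Benjamini-Hochberg procedure is assumed here; raw P-values are taken as inputs computed elsewhere; and the function names and example pairs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[str, str, float, float]  # (gene, pathway, PCC, raw P-value)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pairs(pairs: Sequence[Pair],
                      min_abs_pcc: float = 0.3,
                      padj_cutoff: float = 0.05) -> List[Pair]:
    """Keep pairs with |PCC| > min_abs_pcc and adjusted P-value < padj_cutoff."""
    padj = benjamini_hochberg([p for _, _, _, p in pairs])
    return [(g, pw, r, q) for (g, pw, r, _), q in zip(pairs, padj)
            if abs(r) > min_abs_pcc and q < padj_cutoff]


# Hypothetical gene/pathway names and values for illustration only.
pairs = [
    ("HNRNP_X", "PATHWAY_1", 0.45, 0.01),
    ("HNRNP_Y", "PATHWAY_1", 0.10, 0.04),
    ("HNRNP_Z", "PATHWAY_2", -0.50, 0.03),
    ("HNRNP_W", "PATHWAY_2", 0.35, 0.50),
]
hits = significant_pairs(pairs)
print([(g, pw) for g, pw, _, _ in hits])
```

Positive coefficients would indicate association with pathway activation and negative ones with inhibition; the sign is kept in the output so both directions can be reported.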

| Correlation of hnRNP genes with immune-related genes
The major immune cell-related genes are shown in Table S1.
To explore the correlation between hnRNP genes and immune-related genes, we calculated the Spearman correlation coefficient (SCC) between the expression of hnRNP genes and immune-related genes. The hnRNP-immune gene pairs with |SCC| > 0.3 and adjusted P-value < 0.05 were considered significantly correlated.
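For readers unfamiliar with the Spearman coefficient, the sketch below shows one common way to compute it: rank both expression vectors (ties share the average rank) and take the Pearson correlation of the ranks. The function names and example values are hypothetical, and this is an illustration rather than the implementation used in the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for illustration only.
hnrnp_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
immune_expr = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotone but non-linear
scc = spearman(hnrnp_expr, immune_expr)
print(abs(scc) > 0.3)
```

Because it works on ranks, the coefficient captures any monotone relationship, which is one reason it is often preferred for gene-gene expression comparisons.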

| Clinical significance of hnRNP genes
To explore whether the expression of hnRNP genes was associated with patient survival, we divided the patients into two groups based on the median expression of each hnRNP gene. The log-rank test was used to compare survival between the two groups. P-values < 0.05 were considered statistically significant.
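The median split and log-rank comparison can be sketched as below. This is a minimal standard-library illustration, not the authors' pipeline: the function names and the toy cohort are hypothetical, patients with expression exactly equal to the median are assigned to the low group (an assumption), and the P-value uses the chi-square distribution with one degree of freedom.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def split_by_median(expression: Sequence[float]) -> List[int]:
    """Group label per patient: 1 if expression > median, else 0."""
    cutoff = median(expression)
    return [1 if value > cutoff else 0 for value in expression]


def logrank_test(times: Sequence[float],
                 events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    events: 1 = death observed, 0 = censored. groups: 0 or 1.
    Returns (chi-square statistic, P-value on 1 degree of freedom).
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)       # patients still at risk just before t
        n1 = sum(at_risk)      # of those, patients in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy cohort for illustration only.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
times = [2, 3, 4, 6, 8, 9]       # follow-up time
events = [1, 1, 1, 1, 1, 1]      # all deaths observed
groups = split_by_median(expression)
chi2, p_value = logrank_test(times, events, groups)
print(groups, p_value < 0.05)
```

In practice a dedicated survival analysis library would normally be used; the explicit loop is shown only to make the observed-versus-expected logic of the test visible.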

| Expression profile of hnRNP genes across different cancer types
A total of 22 hnRNP genes were identified from published review papers, and their information is summarized in Table S1. Using the TCGA count data, we described the differential expression of these genes across cancer types. As shown in Figure 1A, hnRNP genes showed heterogeneous expression patterns across cancer types: HNRNPA1 and HNRNPAB were highly expressed in most tumours, whereas HNRNPA1P33 expression was increased in COAD, READ and LUAD but decreased in CHOL, PRAD and BLCA. The detailed logFC changes are listed in Table S2. Next, we visualized the differential expression of HNRNPAB in each cancer (Figure 1B). Based on the immunohistochemistry results from the Human Protein Atlas database, we showed the protein expression of hnRNP genes in various cancer types (Figure 1C). In addition, representative immunohistochemistry results for HNRNPD from 'The Human Protein Atlas' database are shown in Figure 1D.

| Pan-cancer genetic alterations of hnRNP genes
The mutation frequency of hnRNP genes was analysed, and the results indicated that most hnRNP genes were frequently mutated in UCEC

| Association of hnRNPs with cancer-related pathways and immune status
In order to elucidate the molecular implication of hnRNPs in carcinogenesis, the relation of hnRNPs with cancer-related pathways was analysed and visualized in Figure 3A

| Prognostic significance of hnRNP genes
The prognostic significance of hnRNP genes in different cancer types was

| DISCUSSION
To clarify the critical role of the alternative splicing regulator hnRNP family across various cancer types, we comprehensively analysed the core genes belonging to the hnRNP family. Based on multiple levels of data from TCGA, the genomic and transcriptomic landscape of key hnRNP family genes was investigated by pan-cancer analysis. The results suggested that hnRNPs were differentially expressed between certain cancers and their corresponding controls, and that their expression also correlated with patient prognosis. The identified correlations between hnRNPs and multiple cancer-related pathways suggest that hnRNPs are closely involved in the development of various types of cancer.
By comprehensively analysing the transcriptional data of 22 core hnRNP genes in TCGA, we described the expression landscape of hnRNP genes across different cancer types. Heterogeneous expression patterns of hnRNP genes were observed across cancer types: HNRNPA1 and HNRNPAB were highly expressed in most tumours. It has been reported that HNRNPA1 is highly expressed in gastric cancer tissues, where it promotes proliferation, migration and EMT of gastric cancer cells. 20 In lung cancer, knockdown of HNRNPA1 suppressed the viability and growth of lung cancer cells and induced cell cycle arrest. 21 Both previous studies and our analysis suggest a critical role of HNRNPA1 in the initiation and development of different types of cancer. In addition, HNRNPAB overexpression has been found in metastatic cells and cancer tissues of hepatocellular carcinoma patients, and it leads to EMT and metastasis of hepatocellular carcinoma cells in vivo. 22 The oncogenic effects of HNRNPA1 and HNRNPAB are of great interest for understanding the underlying mechanisms of alternative splicing in carcinogenesis, which might provide novel insights into anti-tumour therapy. Moreover, HNRNPA1P33 expression was increased   patients with increased HNRNPD expression significantly correlated with shorter recurrence-free survival. 33 These findings indicate that hnRNPs are closely implicated in the prognosis of various cancers. As many hnRNP genes influenced KIRC prognosis, we further performed a clustering analysis of prognosis-related hnRNP genes. The prognosis analysis of clusters C1 and C2 suggested that cluster C2 was significantly associated with better survival than cluster C1, indicating that hnRNP genes might be used as prognostic predictors of cancer in the future.

| CONCLUSION
In summary, our study systematically demonstrated the expres-

DATA AVAILABILITY STATEMENT
All data used in this article were obtained from the TCGA datasets (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga).