Comprehensive characterization of cancer‐testis genes in testicular germ cell tumor

Abstract Cancer‐testis (CT) genes are a group of genes restrictedly expressed in testis and multiple cancers and can serve as candidate driver genes participating in the development of cancers. Our previous study identified a number of CT genes in nongerm cell tumors, but their expression pattern in testicular germ cell tumor (TGCT), a cancer type characterized by less genomic alterations, remained largely unknown. In this study, we systematically investigated the expression pattern of CT genes in TGCT samples and evaluated the transcriptome difference between TGCT and normal testis tissues, using datasets from the UCSC Xena platform, The Cancer Genome Atlas (TCGA) and the Genotype‐Tissue Expression (GTEx) project. Pathway enrichment analysis and survival analysis were conducted to evaluate the biological function and prognostic effect of expressed CT genes. We identified that 1036 testis‐specific expressed protein‐coding genes and 863 testis‐specific expressed long noncoding RNAs (lncRNAs) were expressed in TGCT samples, including 883 CT protein‐coding genes and 710 CT lncRNAs defined previously. The number of expressed CT genes was significantly higher in seminomas (P = 3.48 × 10−13) which were characterized by frequent mutations in driver genes (KIT, KRAS and NRAS). In contrast, the number of expressed CT genes showed a moderate negative correlation with the fraction of copy number altered genomes (cor = −0.28, P = 1.20 × 10−3). Unlike other cancers, our analysis revealed that 96.16% of the CT genes were down‐regulated in TGCT samples, while CT genes in stem cell maintenance related pathways were up‐regulated. Further survival analysis provided evidence that CT genes could also predict the prognosis of TGCT patients with both disease‐free interval and progression‐free interval as clinical endpoints. Taken together, our study provided a global view of CT genes in TGCT and provided evidence that CT genes played important roles in the progression and maintenance of TGCT.

study, we systematically investigated the expression pattern of CT genes in TGCT samples and evaluated the transcriptome difference between TGCT and normal testis tissues, using datasets from the UCSC Xena platform, The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Pathway enrichment analysis and survival analysis were conducted to evaluate the biological function and prognostic effect of expressed CT genes. We identified that 1036 testis-specific expressed protein-coding genes and 863 testis-specific expressed long noncoding RNAs (lncRNAs) were expressed in TGCT samples, including 883 CT protein-coding genes and 710 CT lncRNAs defined previously. The number of expressed CT genes was significantly higher in seminomas (P = 3.48 × 10 −13 ) which were characterized by frequent mutations in driver genes (KIT, KRAS and NRAS). In contrast, the number of expressed CT genes showed a moderate negative correlation with the fraction of copy number altered genomes (cor = −0.28, P = 1.20 × 10 −3 ). Unlike other cancers, our analysis revealed that 96.16% of the CT genes were down-regulated in TGCT samples, while CT genes in stem cell maintenance related pathways were

| INTRODUCTION
Cancer-testis (CT) genes are predominantly expressed in normal testis tissues and a wide type of cancers. 1 They were first named as Cancer-testis Antigens (CTAs) since early defined CT genes can encode immunogenic antigens which evoke immune responses in cancer patients. 2,3 Because of the immunogenicity and restricted expression pattern in normal tissues, CTAs are commonly considered as ideal immunotherapy targets for cancer therapy. [3][4][5] Recently, CT genes have also been proved that contribute to various neoplastic phenotype features, 3 and can be driver candidates participating in the process of tumorigenesis. 6 In our recent work, we performed a systematic identification of CT genes by integrating gene expression data of 24 normal tissues and tumor samples from 19 cancer types and defined 876 new CT genes. 6 However, all previous studies investigated CT genes in nongerm cell tumors, leaving a blank for testicular germ cell tumor (TGCT).
Testicular cancer is the most commonly diagnosed malignancy of adult males among Americans, 7 while TGCT accounts for more than 95% of testicular cancers. 8 The main types of TGCT are seminomas and non-seminomas, and non-seminomas consist of either undifferentiated (embryonal carcinoma) or differentiated (teratoma, yolk sac tumor, and choriocarcinoma) histologic subtypes. 9 In a recent study, Shen et al identified that TGCTs were characterized by a lack of genomic alterations, deficiency of methylation level and high aneuploidy, 10 which highlighted the importance of transcriptome alteration in the development of TGCT. Considering the special expression pattern, the transcriptome alteration of CT genes may participate in the development of TGCT. Interestingly, some CT genes have previously been reported to be expressed in TGCT samples and have a potential function in tumorigenesis, such as CTAG1B 11 and the MAGE families MAGEA1, MAGEA2, MAGEA3, and MAGEA4. [12][13][14][15] Although attracting more attention, CT genes in the process of TGCT development have not been well characterized.
In this study, to make a comprehensive description of CT genes in TGCT, we first described the expression pattern of CT protein-coding genes and CT long noncoding RNAs (CT-lncRNAs) in TGCT samples. Then we conducted differential expression analysis between normal and malignant testis samples to provide a global view of the expression alterations of CT genes between these samples. Additionally, we performed pathway enrichment analysis to investigate the function of these altered CT genes, as well as survival analysis to evaluate the prognostic effect of CT genes. Overall, our study provided evidence that CT genes may play important roles in the progression and maintenance of TGCT.

| Quantification expression data of testicular germ cell cancer samples used in this study
To comprehensively explore the expression pattern of our previous defined testis-specific expressed genes (TSGs) 6 in TGCT, we retrieved quantification expression data of 150 TGCT samples from the University of California Santa Cruz (UCSC) Xena website (https ://xenab rowser.net/datap ages/). 16 The expression data were quantified as raw read counts. In our previous study, 6 we analyzed the transcriptome data of normal tissues (24 different organs from 175 individuals) from the Genotype-Tissue Expression (GTEx) project (https ://www.gtexp ortal.org/home/) and used specificity measure to classify all human genes (50 016) into six categories (C1-C6) based on their expression pattern, including high confidence testis-specific expressed protein coding genes (C1) and noncoding genes (C2), moderate confidence testis-specific expressed protein-coding genes (C3) and noncoding genes (C4), low-confidence testis-specific expressed genes (C5), genes with transcripts exhibiting a testis-specific expression pattern (C6a), and genes without transcripts exhibiting a testis-specific expression pattern (C6b). In this study, only 1061 protein-coding genes and 1184 long noncoding RNAs (lncRNAs) exhibited testis-specific expression pattern with high confidence (C1 and C2 groups) as well as a median expression greater than 0.5 fragments per kilobase of exon model per million reads mapped (FPKM) in normal up-regulated. Further survival analysis provided evidence that CT genes could also predict the prognosis of TGCT patients with both disease-free interval and progression-free interval as clinical endpoints. Taken together, our study provided a global view of CT genes in TGCT and provided evidence that CT genes played important roles in the progression and maintenance of TGCT.

K E Y W O R D S
cancer-testis gene, stem cell maintenance, survival, testicular germ cell tumor | 3513 testis samples were included. 6 According to GENCODE v19 annotation data, lncRNAs were reclassified into 6 biotypes, 17 including three prime overlapping ncRNA (lncRNAs located within the 3' UTR of protein-coding genes), antisense (lncR-NAs overlapping any protein-coding genes on the opposite strand), lincRNA (long intergenic noncoding RNA with a length greater than 200 bp), sense intronic (lncRNAs located within the intron of any protein-coding gene), sense overlapping (lncRNAs within any protein-coding gene within its intron on the coding strand), and processed transcript (transcripts without an open reading frame). Testis-specific expressed genes exhibited expression (normalized read counts >5) in at least 1% TGCT samples were defined as expressed.

| Differential gene expression analysis
To further explore the expression difference of TSGs in malignant (TGCT) and normal testis samples, we performed differential expression analysis by using data from The Cancer Genome Atlas (TCGA) project (https ://xenab rowser. net/datap ages/) and the GTEx portal. Expression data of 174 normal testes were obtained from GTEx Analysis V7 (dbGap Accession phs000424.v7.p2, https ://www.gtexp ortal.org/home/datasets) and quantified as raw read counts. Differential expression analysis was performed with R package DESeq2. The batch effect was removed by sva package and raw read counts data was normalized by DESeq2. The magnitude (log2 transformed fold change) and significance (P-value) of differential expression between groups were calculated, and genes with |log2foldchange|≥1 and Benjamini-Hochberg false discovery rate (FDR) adjusted P < 0.05 were considered as significant.

| Pathway enrichment analysis
To investigate the potential function of altered TSGs in TGCT samples, we performed pathway enrichment analysis based on GO Biological Process Ontology gene sets with the R Bioconductor package clusterProfiler. As most lncRNAs were not included in the pathway annotation data, only protein-coding genes were included in this analysis. TSGs with significantly altered expression in TGCT samples from that in normal testis tissues were included in this analysis (|log-2foldchange|≥1 and P fdr < 0.05). Pathway with enrichment P < 0.01 was considered as significant.

| Survival analysis
The clinical and follow-up information of 134 TGCT samples were obtained from a previous TCGA study 18 (Table S1). As for the few disease-specific survival and overall deaths of TGCT patients, we used disease-free interval (DFI) and progression-free interval (PFI) as clinical outcome endpoints. 18 We performed a multivariate Cox proportional hazards regression model to calculate the Hazard Ratio (HR), the 95% confidence interval (95% CI), and P-values with adjustments for age and tumor stage. The Kaplan-Meier (K-M) method was used to create the survival plots and the log-rank test was used to compare the difference in survival curves.

| Statistical analysis
Wilcoxon rank-sum test was performed to compare the number of expressed CT genes in different groups. Spearman rank-sum test was used to evaluate the association between the expression level of CT genes and the stemness scores, the fraction of altered genomes, and immune cells level in 133 samples. The stemness scores, the fraction of altered genomes and immune cells information were obtained from a previous study. 19 General statistical analyses were performed with R software (R version 3.2.2).
In an attempt to explore the relationship between CT genes and clinical characteristics in TGCT samples, we performed subgroup analysis and identified that the number of expressed CT genes was significantly higher in the seminoma histologic subtype than other non-seminoma subtypes (Median seminomas = 867.5, Median non-seminomas = 764; P = 3.48 × 10 −13 ) ( Figure 1B). Further differential expression analysis between 68 seminomas and 65 nonseminomas revealed that 366 TSGs showed elevated expression in the seminomas and 165 were decreased. Interestingly, although spermatogenesis-related genes shared a similar expression level in both two groups of patients, meiotic cell cycle-related genes were specifically elevated in the seminoma type (Table S3-S4). Additionally, we identified that CT genes were more prone to be expressed in samples with mutations in the only 3 significantly mutated genes identified previously, 10 Figure 1C). However, the number of expressed CT genes showed a negative moderate correlation with the fraction of copy number altered genomes (cor = −0.28, P = 1.20 × 10 −3 ) ( Figure 1D).

F I G U R E 2 A, Differential expression analysis of testis-specific expressed genes. Volcano plot displayed differentially expressed TSGs in
TGCT and normal testis samples. B, Differential expression analysis of genes without testis-specific expression pattern. Volcano plot displayed differential expressed genes without testis-specific expression pattern in TGCT and normal testis samples. C, Differential expression analysis of testis-specific expressed genes. Differential expression pattern of DPPA2 and IGF2BP3 in TGCT and normal testis samples

| A comprehensive comparison of cancer-testis genes in malignant and normal testis samples
As TSGs were highly expressed in both malignant (TGCT) and normal testis samples, we then evaluated the differential expression of TSGs in 174 normal testis samples and 150 TGCT samples. Interestingly, we identified that 96.16% (1826/1899) of the TSGs genes were differentially expressed between malignant and normal testis samples (Figure 2A, Table S5) while 53.48% (6637/12411) of the genes without testis-specific expression pattern were differentially expressed ( Figure 2B). Additionally, as the majority of the differential expressed TSGs (98.69%, 1802/1826) were down-regulated in TGCT patients (Figure 2A), only 54.26% (3601/6637) of the differential expressed genes without testis-specific expression pattern were down-regulated in TGCT ( Figure 2B Table S5).

| Stem cell maintenance genes are upregulated in testicular germ cell cancer
To evaluate the potential function of differential expressed testis-specific protein-coding genes, we performed pathway enrichment analysis with GO Biological Process Ontology dataset. The result revealed that the down-regulated TSGs in TGCT were significantly enriched in 2 pathways, including spermatogenesis (P = 1.90 × 10 −4 ) and male gamete generation (P = 1.90 × 10 −4 ) pathways ( Figure 3A, Table S6); while the up-regulated testis-specific protein-coding genes in TGCT samples were involved in translational regulation and stem cell maintenance related pathways, such as negative regulation of translation (P = 9.04 × 10 −8 ), regulation of translation (P = 1.07 × 10 −5 ), stem cell population maintenance (P = 9.11 × 10 −6 ), somatic stem cell population maintenance (P = 8.88 × 10 −4 ), and stem cell proliferation F I G U R E 3 Pathway enrichment analysis of differentially expressed testis-specific expressed genes. A, Dotplot depicts the enriched pathways of differentially expressed testis-specific expressed protein-coding genes in TGCT samples compared with normal testis samples. B, Heatmap depicts the correlation of testis-specific expressed protein-coding genes in stem cell maintenance related pathways with the stemness signatures (P = 8.88 × 10 −4 ) ( Figure 3A, Table S7). Interestingly, although the number of expressed CT genes was significantly higher in the seminoma group with KIT, KRAS and NRAS mutations, expression of 4 genes in stem cell maintenance related pathways were significantly elevated in the nonseminoma group and samples that without driver gene mutations ( Figure S2). As 6 of 9 stem cell maintenance related pathways included in GO Biological Process Ontology dataset were over-represented in testis-specific expressed protein-coding genes, we further evaluated the association of 22 up-regulated testisspecific expressed protein-coding genes with the stemness scores. Interestingly, 16 of the 22 up-regulated testis-specific expressed protein-coding genes showed strong positive associations with both epigenetic-related and expression-related stemness signatures (Cor ≥ 0.2, P adj < 0.05) ( Figure 3B, Table S8).

| DISCUSSION
CT genes are a class of genes predominantly expressed in the germ cells of testis and are ectopically reactivated in various types of tumor cells. Accumulating evidence suggested that this group of genes can also serve as driver candidates in the development of cancer. 3 However, previous studies focused mainly on nongerm cell tumors although some classic CT genes have also been found in testicular germ cell tumors. 11,12 Interestingly, TGCT is a kind of cancer characterized by genomic de-methylation and a low genomic alteration rate, 10 which indicates the important role of transcriptome alteration during tumorigenesis. Therefore, systematically investigation of CT genes in TGCT could unravel the role of this group of genes in the development of TGCT. In this study, we performed a comprehensive description of CT genes in TGCT and provided the first evidence that CT genes play important roles in the progression and maintenance of TGCT.
Of the two most common types of TGCT, although seminomas are less aggressive, 8 we observed a significant enrichment of expressed CT genes in this type of cancer. Interestingly, consistent with our previous hypothesis that de-methylation was an important mechanism for the reactivation of CT genes in cancers, 6 seminomas are also a group of cancers that lack of DNA methylation. 10 Thus, the globally de-methylated genomes of seminoma cells could serve as the epigenetic mechanism for CT genes expression in TGCT. 6 However, in contrast to non-germ cell solid cancer, 6 CT genes in TGCT are prone to be expressed in samples with driver genes (KIT, KRAS and NRAS) mutations, which may be attributed to the altered aneuploidy and genomic instability in these samples. Additionally, we also observed a positive association between the number of expressed CT genes with different types of immune cells. Recently, there is accumulating evidence confirming the association between immune biomarkers and the response to anti-PD-1/L1 therapy across several solid tumors. 21,22 In this study, we found that PD-1/PD-L1 expression was positively correlated with the number of expressed CT genes, suggesting that CT genes represent a potential marker for effective PD-1/L1 immunotherapy.
During the tumorigenesis of the testis, we identified that the expression level of most CT genes in spermatogenesis pathway was de-regulated, while the expression of genes in the maintenance of stem cells was up-regulated. As TGCT is a cancer type derived from immature germ cells that retain expression of pluripotency genes, 9 further correlation analysis provided additional evidence that CT genes up-regulated in TGCT samples showed strong associations with the stemness scores. Stemness is defined as the ability of self-renewal and differentiation for normal cells, as well as the shared characteristic with cancer cells. 23 Recently, increasing evidence suggested that the development of cancer cells involves the acquisition of stem-cell-like phenotypes and loss of differential features, which may lead to the progression and distant metastasis of cancer cells. [23][24][25] Interestingly, we identified that LIN28A and IGF2BP1, the essential members in stem cell maintenance, were also associated with advanced tumor stages in addition to the oncogenic effect in tumorigenesis. 26,27 Thus, we proposed that the dysregulation of stemness-related CT genes in TGCT may not only lead to the initiation of tumorigenesis and oncogenic de-differentiation, but also contribute to the progression of testicular germ cells.
Consistent with our hypothesis, further survival analysis revealed that 22 genes were associated with the prognosis of TGCT, including 2 previously known CT genes. TSGA10, with the highest expression level in elongating spermatids and localized in the fibrous sheath of mature sperm, has previously been reported to be a biomarker in brain tumors 28,29 and regulated the development of esophageal squamous cell carcinoma. 30 AKAP4, as a kinase anchor protein, has been reported to contribute to the development of cervical cancer and ovarian serous carcinoma. 31,32 In this current study, we first provided evidence for the prognostic effect of TSGA10 and AKAP4 in the progression of TGCT, which may serve as potential treatment targets. Additionally, we identified that although some CT genes have previously been reported to be oncogenic, they could also predict a better survival with either DFI or PFI as the endpoint. LIN28A is a highly conserved RNA binding protein and has emerged as an oncogenic driver by blockading of let-7 micro-RNA biogenesis in many cancers [33][34][35] ; however, in this study, we identified that LIN28A was associated with a better prognosis of TGCT, which is different from the other cancers types. [36][37][38] Nevertheless, there also exists some limitations in this study. As for the small sample size used in this study as well as no independent validations, further studies are warranted to validate the findings.
In summary, we identified a total of 1899 testis-specific expressed genes (1036 testis-specific protein-coding genes and 863 testis-specific long noncoding genes), including 1593 CT genes (883 CT genes and 710 CT-lncRNAs) in TCGA TGCT samples. By integrating expression from both normal and malignant testis samples, we proposed that CT genes up-regulated in TGCT were primarily involved in stem cell maintenance and could serve as prognostic factors. This founding greatly broadens our understanding of CT genes in testicular germ-cell tumors.