The results published here are in whole based upon data generated by The Cancer Genome Atlas pilot project established by the National Cancer Institute and the National Human Genome Research Institute. Information about The Cancer Genome Atlas (TCGA) and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov.
Glioblastoma multiforme (GBM) is the most prevalent and deadly brain tumor. A variety of germline and somatic, genetic and epigenetic alterations at 9p21.3, which encodes the CDKN2A/CDKN2B tumor suppressor genes, have been reported in isolation to be associated with GBM risk and prognosis.
To obtain a comprehensive view of these events, we leveraged the wide-spectrum GBM data available from The Cancer Genome Atlas project and performed an integrated analysis by systematically evaluating 9p21.3-related germline single-nucleotide polymorphisms, somatic copy number alterations (CNAs), DNA methylation, and microRNAs (miRNAs) with regard to CDKN2A/CDKN2B expression and patient prognosis in GBM.
Our multivariate analysis indicated that expression of both CDKN2A and CDKN2B was strongly affected by CNAs (P = 1.00 × 10⁻⁴ and 2.37 × 10⁻¹⁴, respectively). The miRNAs hsa-mir-126, hsa-mir-517a, and hsa-mir-125b exhibited significant negative correlations with CDKN2A expression (P = .003, .041, and .050, respectively). Survival analysis showed that complete 9p21.3 loss and low CDKN2B expression were associated with worse prognosis for both tumor progression/recurrence-free survival (P = .041 and .019) and patient overall survival (P = .043 and .021) after adjustment for age and treatment, and that higher methylation at cg17449661 predicted poorer overall survival (P = .048).
Glioblastoma multiforme (GBM) is the most prevalent primary brain tumor and is among the deadliest of all human cancers.1 Several genetic and epigenetic factors have been suggested to play critical roles in the etiology and prognosis of GBM, but they remain poorly defined and insufficiently characterized.
CDKN2A and CDKN2B are 2 tumor suppressor genes encoded at 9p21.3 (Figure 1) whose gene products, p16INK4A and p15INK4B, both inhibit the binding of CDK4 and CDK6 to cyclin D, preventing cell cycle progression at G1 phase.2 An alternate reading frame product of CDKN2A, p14ARF, can stabilize p53, which in turn leads to cell cycle arrest at G1 or G2 phase.3 Thus, by negatively controlling cell cycle progression, the 9p21.3 genes function as a critical defense against tumorigenesis in a great variety of human cancers,4, 5 including GBM.6
The 9p21.3 locus has been closely linked with GBM susceptibility and prognosis. Germline mutations of CDKN2A/CDKN2B cause the melanoma-astrocytoma syndrome, which is a well-established risk factor for GBM.7 Recently, 4 common germline sequence variants (rs1063192, rs2157719, rs1412829, and rs4977756) within a 122-kb interval of linkage disequilibrium at 9p21.3 were shown to be associated with increased glioma susceptibility8, 9 (Figure 1). Somatic loss of CDKN2A/CDKN2B expression or function due to genomic/epigenetic alterations such as deletions and promoter methylation is frequently found in GBM tumors.10, 11 Poor prognosis of GBM patients has been reported to be linked with loss of CDKN2A expression12 or CDKN2B methylation.13
Notably, despite these isolated reports associating various 9p21.3-related germline and somatic, genetic and epigenetic events with GBM, their functional connections and relative contributions to the disease phenotypes have yet to be characterized in a holistic manner. To obtain such a comprehensive view, we performed an integrated analysis by leveraging the full-spectrum GBM data available from The Cancer Genome Atlas (TCGA) Research Network. We systematically evaluated multiple 9p21.3-related factors, including germline single-nucleotide polymorphisms (SNPs), somatic copy number alterations (CNAs), DNA methylation, and microRNAs (miRNAs), with regard to the expression of CDKN2A and CDKN2B in GBM tumors. We also correlated these factors with the clinical data of the GBM patients.
MATERIALS AND METHODS
TCGA Data Analysis
The SNP, copy number, DNA methylation, miRNA and messenger RNA (mRNA) expression, and clinical data for 321 GBM subjects and their brain tissue samples were downloaded from the TCGA data portal (http://cancergenome.nih.gov/dataportal) in February 2010. Data use certification was obtained for the controlled access data. Details on the data processing and platforms can be found below and in the reference describing the GBM data analysis.10
Dataset download and quality control
The SNP, copy number, DNA methylation, miRNA and mRNA expression, and clinical data for 340 GBM patients were downloaded from the TCGA project data portal (http://cancergenome.nih.gov/dataportal). We performed quality control by excluding samples in which the percentage of tumor cells was <50%, the combined percentage of normal/stromal cells was >50%, or the maximal necrosis percentage was >50%. To prevent duplicates, we selected only 1 portion if >1 portion was found for a sample. In total, 321 samples met the quality control criteria and were used for this study.
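The quality control rules above can be sketched in a few lines of Python. This is only an illustration: the record layout and field names (`pct_tumor_cells`, `pct_normal_cells`, `pct_stromal_cells`, `max_pct_necrosis`) are hypothetical rather than TCGA column names, and because the text does not say which portion was retained when a sample had several, the sketch assumes the first portion that passes the filters is kept.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Portion:
    """One tissue portion of a sample (hypothetical field names)."""
    sample_id: str
    portion_id: str
    pct_tumor_cells: float
    pct_normal_cells: float
    pct_stromal_cells: float
    max_pct_necrosis: float


def passes_qc(p: Portion) -> bool:
    """Apply the three exclusion thresholds described in the text."""
    if p.pct_tumor_cells < 50:
        return False
    if p.pct_normal_cells + p.pct_stromal_cells > 50:
        return False
    if p.max_pct_necrosis > 50:
        return False
    return True


def select_samples(portions: Iterable[Portion]) -> List[Portion]:
    """Keep at most one QC-passing portion per sample.

    Assumption: the first passing portion encountered is retained.
    """
    selected: Dict[str, Portion] = {}
    for p in portions:
        if passes_qc(p) and p.sample_id not in selected:
            selected[p.sample_id] = p
    return list(selected.values())
```

Applying the filters before deduplication is itself an assumption; reversing the order could drop a sample whose first portion fails quality control while a later portion passes.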
For the SNP-related analysis, we selected genotype data for the 4 GBM-associated SNPs (rs1063192, rs2157719, rs1412829, and rs4977756). Because these 4 SNPs are in a relatively strong linkage disequilibrium block, we chose rs1412829 for analysis.
Copy number data
Copy number data for each of the aforementioned GBM-associated SNPs were used to calculate copy number status from the downloaded raw CEL files.
DNA methylation data
The beta values of a total of 24 methylation probes (12 for CDKN2A and 12 for CDKN2B, coming from 3 platforms: GoldenGate OMA-002, GoldenGate OMA-003, and Infinium) were selected from the TCGA dataset. Probes with small sample sizes, predominantly low methylation measurements (90th percentile <0.20, and thus unlikely to be methylated in GBM tumor samples), or unexpected positive correlations with CDKN2A/CDKN2B expression were excluded from the methylation analysis because they may not be sufficiently informative or reliable. There were 2 probes (cg17449661 for CDKN2A and cg10210238 for CDKN2B) that met the criteria and were used for our analysis.
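As an illustration of the probe filter, the following Python sketch implements the sample-size and 90th-percentile criteria. The minimum sample size (`min_samples=30`) is a hypothetical placeholder because the text gives no cutoff, and the percentile interpolation method is also an assumption. The third criterion, exclusion of probes positively correlated with expression, is omitted here.

```python
import statistics
from typing import List, Optional, Sequence


def percentile_90(values: Sequence[float]) -> float:
    """Return the 90th percentile.

    statistics.quantiles with n=10 yields 9 cut points; the last one
    is the 90th percentile. The 'inclusive' method is an assumption.
    """
    return statistics.quantiles(values, n=10, method="inclusive")[-1]


def is_informative(
    beta_values: Sequence[Optional[float]],
    min_samples: int = 30,      # hypothetical cutoff, not from the paper
    low_cutoff: float = 0.20,   # 90th-percentile threshold from the text
) -> bool:
    """True if a probe has enough samples and is not predominantly unmethylated."""
    observed: List[float] = [b for b in beta_values if b is not None]
    if len(observed) < min_samples:
        return False
    return percentile_90(observed) >= low_cutoff
```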
The expression data of 45 miRNAs (43 against CDKN2A and 2 against CDKN2B) that potentially target CDKN2A or CDKN2B were selected from the TCGA dataset. Among those, hsa-mir-24 and hsa-mir-125b were shown to target CDKN2A14, 15; others are predicted to target either CDKN2A or CDKN2B by the miRBase database (http://www.miRBase.org). For data cleaning, any values less than Q1 − 3 × IQR or greater than Q3 + 3 × IQR were eliminated as outliers, where Q1 is the first quartile, Q3 is the third quartile, and IQR is the interquartile range, calculated as IQR = Q3 − Q1. For simplicity, miRNAs that showed unexpected positive correlations or nonsignificant negative correlations (P > .05) with CDKN2A/CDKN2B expression in the univariate analysis were excluded. There were 3 miRNAs (hsa-mir-125b, hsa-mir-126, and hsa-mir-517a, all targeting CDKN2A) that met this criterion and were used in the analysis.
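The outlier rule can be written compactly. The sketch below is a minimal Python illustration, assuming the default quartile definition of the standard library, which may differ slightly from the one used by the original statistical software.

```python
import statistics
from typing import List, Sequence


def remove_extreme_outliers(values: Sequence[float], k: float = 3.0) -> List[float]:
    """Drop values below Q1 - k*IQR or above Q3 + k*IQR (k = 3 in the text)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if lower <= v <= upper]
```

With k = 3 the fences are wider than the conventional 1.5 × IQR rule, so only extreme values are discarded.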
Two expression platforms (Agilent 244K and Affymetrix HuEx) were used to measure the expression of CDKN2A and CDKN2B in the original GBM data from the TCGA. The Agilent platform measures expression of target genes through probes targeting the 3′ end of the target transcripts. The Exon platform uses multiple probes against individual exons of target transcripts. Because of the relatively smaller sample sizes associated with the Exon platform measurements, only the expression data from the Agilent platform were used for our analysis.
The multivariate analyses of the multiple factors with regard to the transcript expression of CDKN2A/CDKN2B were performed using general linear models; interactions among independent variables were not considered because of limited degrees of freedom. The contribution of each factor to CDKN2A/CDKN2B expression was calculated from the regression result. Cox regression analyses were performed to identify significant predictors of progression/recurrence-free survival and patient overall survival because the predictors include both continuous and binary variables. Effects in these models were quantified by hazard ratio estimates with 95% confidence intervals. For Kaplan-Meier survival plots, continuous variables were dichotomized at the median values. The difference between 2 survival curves was evaluated by a log-rank test. P ≤ .05 was considered statistically significant. All analyses were performed using SAS statistical software.
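To make the Kaplan-Meier comparison concrete, the following Python sketch dichotomizes a continuous variable at its median and runs a two-group log-rank test. It is a stand-in written for illustration, not the SAS procedure used in the study. The handling of values equal to the median (assigned to the "low" group) is an assumption, as the text does not specify it.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def dichotomize_at_median(values: Sequence[float]) -> List[str]:
    """Label values above the median 'high'; all others 'low' (assumed tie rule)."""
    med = statistics.median(values)
    return ["high" if v > med else "low" for v in values]


def logrank_test(
    times: Sequence[float], events: Sequence[int], groups: Sequence[str]
) -> Tuple[float, float]:
    """Two-group log-rank test.

    events: 1 = event observed, 0 = censored.
    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    g1 = labels[0]
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Numbers at risk just before time t (overall and in group 1).
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, gi in zip(times, groups) if ti >= t and gi == g1)
        # Events occurring exactly at time t (overall and in group 1).
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(
            1 for ti, ei, gi in zip(times, events, groups)
            if ti == t and ei == 1 and gi == g1
        )
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In use, the group labels returned by `dichotomize_at_median` for an expression variable would be passed as `groups`, together with each patient's follow-up time and event indicator.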
Characteristics of Clinical and Genomic/Molecular Parameters
The TCGA GBM study population consists of 321 subjects and their brain tissue samples, including 310 primary GBM tumor samples and 11 nontumor brain samples. The characteristics of various clinical and 9p21.3-related genetic, epigenetic, and molecular parameters are summarized in Table 1. A variety of CNAs were found in the GBM tumor samples. Their relative frequencies were similar to those reported in a previous study,11 consistent with the absence of a selection bias in this TCGA-based study population. Among a total of 24 probes targeting the DNA methylation sites at 9p21.3 and 45 putative CDKN2A/CDKN2B-targeting miRNAs contained in the TCGA dataset, 2 methylation probes (cg17449661 and cg10210238, for CDKN2A and CDKN2B, respectively) and 3 miRNAs (hsa-mir-125b, hsa-mir-126, and hsa-mir-517a, all against CDKN2A) passed our selection criteria and were included for analysis (others were excluded). The mRNA levels of CDKN2A and CDKN2B displayed a high degree of positive correlation (Pearson's r = 0.78, P <.0001), suggesting that expression of these 2 genes may be coregulated by shared mechanisms.
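The correlation quoted above is a standard Pearson coefficient. A minimal Python sketch of the calculation is given below for reference; the input vectors would be the per-sample CDKN2A and CDKN2B expression values, which are not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)
```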
Table 1. Basic Characteristics of Study Subjects Within the TCGA GBM Dataset
Analysis of Factors That Potentially Influence CDKN2A/CDKN2B Expression in GBM Tumors
Because CNAs, SNPs, DNA methylation, and miRNAs all potentially influence the mRNA expression of CDKN2A/CDKN2B, a multivariate analysis was performed to systematically evaluate these germline and somatic, genetic and epigenetic factors with regard to the transcript levels of CDKN2A/CDKN2B (Table 2). Among the 6 factors we examined in the analysis of CDKN2A, CNAs were highly associated with CDKN2A expression levels (P = 1.00 × 10⁻⁴) after adjustment for other factors, with decreasing expression found in tumors containing hemizygous (1×) and homozygous (0×) deletions compared with those containing normal copy numbers (2×). Associations with CDKN2A expression were also observed for the 3 miRNAs, among which hsa-mir-126 and hsa-mir-517a reached statistical significance (coefficient estimate β = −0.88, P = .003, and β = −7.50, P = .041, respectively), and hsa-mir-125b reached borderline significance (β = −0.73, P = .050). No associations were found for other factors. The analysis of CDKN2B identified CNAs as the only factor that reached statistical significance (P = 2.37 × 10⁻¹⁴) in its association with CDKN2B expression after adjustment for the other 2 factors. Because CNAs appeared to play a predominant role in affecting both CDKN2A and CDKN2B expression, it is possible that the CNAs may have masked the effects of other factors in the multivariate models. We stratified samples based on their copy number status, but failed to identify any significant associations for the remaining factors, possibly due to the small sample sizes associated with the stratified analyses. Thus CDKN2A expression in GBM tumors may be affected by CNAs and the miRNAs hsa-mir-126, hsa-mir-517a, and hsa-mir-125b, whereas CDKN2B expression was influenced primarily by CNAs.
Table 2. Multivariate Analysis of CDKN2A/CDKN2B Expression
Multivariate F test least square means, coefficient estimates (β), and P values are presented where appropriate.
For copy number alteration effect, samples were categorized into 3 classes (0×, 1×, and 2×) based on their copy number status on 9p21.3, where 2× includes both normal and copy-neutral loss-of-heterozygosity samples.
The 3 genotypes were based on rs1412829.
Analysis was performed for all samples or samples stratified based on different copy number status (2×, 1×, and 0×).
Analysis of Factors That Potentially Predict Prognosis in GBM Patients
The prognostic values of the multiple 9p21.3-related factors were assessed with regard to the progression/recurrence-free survival (PFS) and overall survival (OS) of the GBM patients in the study population. Univariate analyses using the Cox regression model indicated that younger age at diagnosis (<60 years) and receipt of therapeutic treatment predicted better prognosis for both PFS (P = 1.00 × 10⁻⁴ and .004, for age and treatment, respectively) and OS (P = 7.86 × 10⁻⁸ and 4.88 × 10⁻¹⁴) (Table 3), findings consistent with well-established observations. Significant associations with PFS and/or OS were also observed for 9p21.3-related factors, including presence/absence of 9p21.3, DNA methylation at cg17449661, and CDKN2A and CDKN2B expression (Table 3), several of which were subsequently confirmed in the Kaplan-Meier survival analysis with log-rank significance test (Figure 2). To minimize the potential confounding effects of age and treatment in our evaluations, Cox multivariate regression analyses were performed to adjust for these 2 confounders (Table 3), which identified three 9p21.3-related factors with potential prognostic utility. Compared with the presence of 9p21.3, wherein the tumor samples contained 1 (hemizygous deletion) or 2 (normal) copies of the locus, the complete absence of 9p21.3 (homozygous deletion) was associated with worse prognosis for both PFS (hazard ratio [HR], 1.40; P = .041) and OS (HR, 1.36; P = .043). Higher DNA methylation beta values at cg17449661 were also found to predict poorer OS (HR, 2.23; P = .048). Notably, a high prognostic value was identified for CDKN2B expression, whose higher levels in GBM tumors were significantly associated with better PFS (HR, 0.87; P = .029) in GBM patients. Furthermore, median-dichotomized high CDKN2B expression levels were also observed to predict significantly better prognosis for both PFS (HR, 0.73; P = .019) and OS (HR, 0.75; P = .021).
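For readers less familiar with hazard ratios, a brief reminder of how they arise from a Cox model may help. The covariate layout shown here (one 9p21.3-related factor x, adjusted for age and treatment) is an assumed reading of the adjusted analyses, not the exact model specification used in the study.

```latex
h(t \mid x, \mathrm{age}, \mathrm{trt})
  = h_0(t)\,\exp\!\left(\beta_x x + \beta_{\mathrm{age}}\,\mathrm{age} + \beta_{\mathrm{trt}}\,\mathrm{trt}\right),
\qquad
\mathrm{HR}_x = e^{\beta_x}
```

Under this reading, an HR of 1.40 for homozygous deletion corresponds to a coefficient of about ln 1.40 ≈ 0.34, that is, a roughly 40% higher hazard of progression or recurrence at any given time, whereas an HR of 0.73 for median-dichotomized high CDKN2B expression corresponds to a roughly 27% lower hazard.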
We attempted to perform a multivariate survival analysis by integrating all of these factors together, but due to the rather small sample sizes (<30) associated with the integrated models, we failed to detect any meaningful associations with PFS and OS (data not shown).
Table 3. Evaluation of the Prognostic Values of Various Factors
The wide spectrum of data deposited in the TCGA database have provided myriad information with regard to the usually complex genome alterations frequently found in tumor samples at various layers, including both germline and somatic events involving changes at the genetic, genomic, epigenetic, and molecular levels. A holistic view of these multilevel events is essential for our understanding of cancer etiology and progression and is clinically critical for cancer subtype classifications and treatment optimizations. To our knowledge, this study represents one of the first attempts to systematically integrate the various levels of cancer genome alterations, through a demonstrated example of the multidimensional analysis over 9p21.3 in GBM. A variety of germline and somatic, genetic and epigenetic alterations at 9p21.3 are widely implicated in GBM risk and prognosis, but these have been only investigated as isolated events. In this integrated study, we took advantage of the full-scope GBM data available from the TCGA and systematically evaluated the functional connections of these seemingly disparate factors, which include germline SNPs, somatic CNAs, DNA methylation and miRNAs, with regard to the transcript levels of CDKN2A/CDKN2B and patient prognosis in GBM.
Among the multiple factors we investigated, CNAs were highly significantly associated with the expression of both CDKN2A and CDKN2B and accounted for a considerable proportion (24.4% and 45.6%, respectively) of the variation in the expression of these 2 genes, suggesting that deletion of the 9p21.3 locus may serve as a predominant mechanism for GBM precursor cells to lose the CDKN2A/CDKN2B tumor suppressors and gain a selection advantage during gliomagenesis. Our study also identified 3 miRNAs (hsa-mir-126, hsa-mir-517a, and hsa-mir-125b) that exhibited significant negative correlations with CDKN2A transcript levels, which together explained 11.3% of the observed variation. It is noteworthy that hsa-mir-125b is a bona fide CDKN2A-targeting miRNA that is frequently overexpressed in GBM tumors16 and can promote glial cell proliferation.15 It is therefore possible that the other 2 miRNAs also target CDKN2A in glial cells and that this miRNA-mediated dysregulation of CDKN2A expression contributes, to a nonnegligible extent, to GBM development and progression. Experiments are needed to confirm these findings, and more in-depth studies are warranted to examine this hypothesis. In our analysis, no significant associations with CDKN2A/CDKN2B expression were identified for other factors, such as germline SNPs and DNA methylation. It is possible that these factors play only trivial roles in the expression of CDKN2A/CDKN2B in GBM tumors; alternatively, the null observations could be caused by small sample sizes, unreliable measurements, or confounding by other unknown factors.
It is of note that the multivariate models that include all the factors we investigated explained only a small portion (<50%, data not shown) of the variation in CDKN2A/CDKN2B expression in GBM tumors, suggesting that other yet-to-be-investigated factors may also influence the expression of CDKN2A/CDKN2B to a meaningful extent. We noticed that both genes appeared to be overexpressed in GBM tumors compared with nontumor brain tissues; the latter expressed essentially no CDKN2A/CDKN2B, because expression in these normal tissues was at a level indistinguishable from that seen in tumor samples with homozygous deletions (data not shown). Given this observation, it seems plausible that CDKN2A/CDKN2B may be induced through certain known and/or yet-to-be-identified mechanisms during gliomagenesis. It is noteworthy that CDKN2B has been shown to be induced by transforming growth factor β (TGF-β),17 but is transcriptionally repressed by c-Myc.18 Thus, any alterations in the TGF-β/c-Myc signaling pathways, as well as in other yet-to-be-identified regulatory mechanisms, could potentially affect CDKN2B expression in GBM tumors.
This study identified prognostic values for several factors. Complete loss of the 9p21.3 locus and low expression of CDKN2B were significantly associated with worse prognosis for both tumor progression/recurrence and patient overall survival even after adjustment for age and treatment, indicating that the copy number status of 9p21.3 and CDKN2B expression may represent 2 valuable prognostic biomarkers predicting disease outcomes. Previously, there have been similar efforts toward this end,12, 19 but these earlier studies failed to demonstrate this strong prognostic value. It is also noteworthy that no prognostically significant associations were detected for CDKN2A expression levels in our study. Thus, although the current consensus holds that CDKN2A is the primary tumor suppressor gene and that CDKN2B functions mainly as a backup for loss of CDKN2A,20 our findings argue for a more important role of CDKN2B in determining the prognosis of cancer patients, at least in the case of GBM. In addition, higher cg17449661 methylation was found to predict poorer overall survival of GBM patients, suggesting some prognostic value as well.
There are several limitations to this study. First, the miRNA candidates were selected based on available reports and in silico predictions, which could introduce biases. We may have missed other bona fide CDKN2A/CDKN2B-targeting miRNAs that are not in our list. Second, because of the multifaceted and ongoing nature of the TCGA project, the different types of data from different specimens are updated in an as-available manner, which compromises the comprehensiveness of the datasets used in this study at the time of analysis. As a consequence, our analyses suffer from small sample sizes, and this problem is more severe in the multivariate analyses involving multiple parameters. Thus, the nonsignificant associations observed for some variables with regard to the expression/prognosis phenotypes could be due to small sample sizes. Finally, measurements of some parameters, such as the DNA methylation probes, were not sufficiently reliable in the TCGA dataset. For example, we noticed that samples with homozygous deletions still displayed abnormally high DNA methylation beta values for quite a few methylation probes, which is biologically implausible. Given these limitations, it is possible that we may have missed some significant associations while mistakenly obtaining others in our analyses.
This study was supported by an intramural postdoctoral fund from Wake Forest University School of Medicine (to J. F.).