Interrogation of ERG gene rearrangements in prostate cancer identifies a prognostic 10-gene signature with relevant implication to patients' clinical outcome
Tarek A. Bismar,
Department of Pathology and Laboratory Medicine, University of Calgary and Calgary Laboratory Services, Calgary, AB, Canada
Department of Oncology, University of Calgary, Calgary, AB, Canada
Southern Alberta Cancer Institute and Tom Baker Cancer Center, Calgary, AB, Canada
Correspondence: Tarek A. Bismar, University of Calgary, Faculty of Medicine, Departments of Pathology and Laboratory Medicine and Oncology, Rockyview General Hospital, 7007–14th Street SW, Calgary, AB, Canada T2V 1P9.
ERG-gene rearrangement defines a distinct molecular subtype of prostate cancer (PCA) with potential biological and clinical implications.
To identify a molecular signature that reflects the downstream effects of ERG-mediated transcriptional regulation and has prognostic implications in patients with PCA.
Materials and Methods
We used a singular value decomposition (SVD) bioinformatics approach to re-analyse gene expression data previously generated from 46 prostate tumours, and identified an ERG-like gene signature.
The signature was validated in several patient cohorts, and individual genes were correlated with ERG expression and PCA progression.
An ERG-like 10-gene signature was identified and validated in PCA cohorts of the Physicians' Health Study (PHS; n = 110) and in three independent public datasets, and was significantly associated with disease progression, biochemical recurrence and PCA-specific mortality.
Patients with the ERG-like signature had a significantly higher risk of disease recurrence than patients without it on both univariate (hazard ratio [HR] 2.6, 95% confidence interval [CI] 1.3–5.2; P = 0.004) and multivariate analysis (HR 2.3, 95% CI 1.1–4.6; P = 0.016).
Among patients with Gleason score (GS) 6 and 7 PCA, the signature added prognostic value beyond GS and identified patients at higher risk of cancer death more accurately than GS alone or GS combined with ERG status.
Protein expression of the 10 genes was significantly associated with ERG expression and with disease progression, regardless of ERG status.
The characterized ERG-like signature was reflective of aggressive features of ERG-mediated transcription and was prognostically robust.
The combination of this signature with clinicopathological variables should be validated prospectively to explore its clinical utility in stratifying patients with PCA and in identifying those at higher risk of metastatic and lethal disease.
ETS (E26 transformation-specific) fusions have been proposed to constitute a unique molecular subtype of prostate cancer (PCA), as evidenced both by their mutual exclusivity with other proposed PCA subtypes and by the fundamentally different chromatin biology conferred by ETS positivity [1-6]. These findings have led to a proposed road map that associates specific molecular subtypes of PCA with progression pathways and pairs targeted therapies with particular PCA subtypes, similar to the algorithms used in breast cancer and haematological malignancies. To date, however, the prognostic value of ETS gene rearrangements has remained uncertain, with studies reporting conflicting results. With few exceptions, the reports showing prognostic value have been based on cohorts reflecting the natural history of PCA, whereas those unable to confirm any prognostic significance have investigated surgically treated cohorts [7-20]. The vast majority of ETS gene rearrangements involve ERG, a prototypical ETS transcription factor that transcriptionally activates hundreds of genes. Although not all of these target genes have prognostic value, several studies have characterized novel genes associated with ERG gene rearrangements and their respective signalling pathways [21-29]. Other studies have compared gene expression between ERG-positive and ERG-negative tumours to identify downstream deregulated genes and genes with biological significance for patient prognosis [30, 31]. Taken together, these studies suggest that a thorough interrogation and characterization of the molecular mechanisms that result in, or result from, ERG-gene rearrangements could identify novel PCA biomarkers and new pathways involved in disease progression, allowing better prediction of patient prognosis.
It is also possible that a subset of these target genes contributes to the ‘ERG-ness’ of a tumour, that is, a measure of how strongly ERG-mediated regulation is active in the downstream gene pathway, independent of whether the tumour is ERG-positive or -negative. We therefore hypothesized that ERG-ness, captured as an ERG-associated gene expression signature, would have more prognostic value and clinical relevance than ERG status itself. In the present study, we used a differential expression microarray approach, combined with a bioinformatics interrogation of ERG-negative and ERG-positive tumours using the singular value decomposition (SVD) method, to identify novel genes associated with disease progression and with prognostic implications for cancer recurrence and disease-specific mortality, regardless of ERG status. Unlike methods that test single genes in isolation, SVD-based approaches rank genes by their contribution to the entropy of the whole expression dataset. We further validated the prognostic significance of this signature across several cohorts of patients with PCA.
Materials and Methods
Patient Cohorts and Samples
Several cohorts were interrogated in this study. To investigate the downstream ERG transcriptional regulation signature, we used a cohort of 52 patients with castration-resistant disease (54 samples) that had been profiled for gene expression, and we clustered these samples bioinformatically by ERG status. To assess the relationship of the 10-gene signature to patients' clinical outcome, we used three publicly available cohorts: the Swedish (GSE8402), Taylor and Glinsky cohorts [33-35]. We also used a case–control subgroup of patients from the Harvard University Physicians' Health Study (PHS; n = 115) who were treated by radical prostatectomy and for whom information on the development of metastasis and lethal disease was available. Protein expression of the 10-gene signature relative to disease progression and ERG status was assessed in a progression cohort of 61 patients, with samples representing benign prostate tissue, localized PCA and castration-resistant disease, obtained from radical prostatectomy and TURP specimens. Additional information about the cohorts can be found in the Supporting Information (supplementary Methods section). The study was approved by the institutional review boards at the University of Calgary, Calgary, AB, Canada, and the Harvard University School of Public Health.
Assessment of ERG-Gene Rearrangement
ERG status was confirmed using a fluorescent in situ hybridization (FISH) break-apart probe assay and immunohistochemistry (IHC) on a tissue microarray of the first cohort described above (patients treated with different androgen-deprivation therapies), as previously described. To exclude false-positive and false-negative calls, only cases with concordant FISH and IHC results were included in the SVD analysis.
We re-analysed previously published gene expression data, generated using the cDNA-mediated annealing, selection, extension and ligation (DASL) assay (Illumina, San Diego, CA, USA), after regrouping the tumour samples into ERG-positive and ERG-negative cases.
We used the Ventana autostainer system (NEXES IHC model; Ventana Medical Systems Inc., Tucson, AZ, USA) to assess protein expression by IHC, using the antibodies and conditions listed in Table S1 and following the manufacturer's protocol, which is outlined in full in the supplementary Methods section.
The samples used in gene expression profiling were previously validated by at least two pathologists. Protein intensity expression was assessed semi-quantitatively by two pathologists (L.H.T. and T.A.B.) using a four-tiered system (0, negative; 1, weak; 2, moderate; and 3, strong) without prior knowledge of clinical information.
One-way ANOVA tests were performed to determine whether the scores for each marker differed significantly between benign prostate tissue, localized PCA and castration-resistant PCA. The four-tiered IHC scores for each of the 10 markers were used for these tests.
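As a minimal sketch of the one-way ANOVA described above, the following computes the F statistic for four-tier IHC scores of one marker across the three tissue groups. The scores are hypothetical illustrative values, not study data, and the function name is our own. This stdlib-only sketch stops at the F statistic; the P value would come from the F distribution with the returned degrees of freedom (for example via `scipy.stats.f_oneway`).

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical four-tier scores (0 = negative ... 3 = strong) for one marker
benign = [2, 2, 3, 2, 1]
localized = [1, 2, 1, 1, 2]
crpc = [0, 1, 0, 1, 0]
f_stat, df_b, df_w = one_way_anova_f(benign, localized, crpc)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```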
Phi correlation coefficients were used to test each marker's association with ERG status in PCA samples. For this analysis, ERG expression was dichotomized by intensity into negative/weak vs moderate/strong groups. Analyses were performed using SPSS v.16 (IBM Corp., Somers, NY, USA). A two-tailed P value ≤ 0.05 was considered statistically significant.
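The phi coefficient is the Pearson correlation between two binary variables and can be computed directly from a 2 × 2 table. The sketch below dichotomizes four-tier scores as described (0–1 vs 2–3), applying the same cut to the marker as the Results section applies to marker intensity; the helper names and per-core scores are hypothetical.

```python
import math

def dichotomize(score):
    """Map a four-tier IHC score to 0 (negative/weak) or 1 (moderate/strong)."""
    return 1 if score >= 2 else 0

def phi_coefficient(x, y):
    """Phi coefficient for two equal-length binary (0/1) sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # undefined when any row or column total is zero
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical per-core scores for one marker and for ERG
marker_scores = [3, 2, 1, 1, 0, 2]
erg_scores = [3, 3, 2, 0, 1, 0]
phi = phi_coefficient([dichotomize(s) for s in marker_scores],
                      [dichotomize(s) for s in erg_scores])
```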
Multivariate Cox regression analysis of 150 patients from the Taylor cohort was performed to predict biochemical recurrence, including two presurgical clinical and pathological variables (prebiopsy serum PSA and biopsy Gleason score [GS]) in addition to the 10-gene signature.
Gene Selection and Computational Analysis
We developed a new technique based on SVD to identify genes with high entropy between the two cancer classes (ERG-gene rearrangement vs no ERG rearrangement). SVD is a linear transformation of the expression data, an n-genes × m-samples matrix A_{n×m}, to the reduced diagonal L-eigengenes × L-eigenarrays matrix, where L = min(n, m) and s_i, i = 1, …, L, are the singular values. We calculated the normalized relative significance p_k of the k-th eigengene of A_{n×m} as follows:
$$p_k = \frac{s_k^2}{\sum_{i=1}^{L} s_i^2}$$
and the Shannon entropy of the data represented by A_{n×m} is calculated as:
$$E(A_{n\times m}) = -\frac{1}{\log L}\sum_{k=1}^{L} p_k \log p_k$$
Varshavsky et al. defined the contribution of the i-th gene, denoted CE_i, by a leave-one-out comparison as:
$$CE_i = E(A_{n\times m}) - E(A_{[-i]})$$
where A_{[-i]} is the matrix A_{n×m} with the i-th row deleted. SVD-based approaches discover genes whose expression level changes across samples: genes that show a large change across samples are more likely than genes with a small change to have a large SVD contribution (CE_i), that is, to contribute large entropy relative to the other genes. To assess gene i, we calculate the entropy E(A_{n×m}) of the whole gene expression matrix including gene i, and then the entropy of the matrix without gene i. Genes are sorted by CE_i, and the top-ranked genes are selected as significant. We preprocessed the data by averaging gene expression within each of the two classes (ERG-rearranged and non-rearranged samples); genes with high entropy across the two averaged profiles are expected to show a global differential expression pattern. The advantage of this method over the traditional t-test and Significance Analysis of Microarrays (SAM) is that it considers the effect of each gene on all other genes: genes whose removal most changes the entropy of the whole dataset are ranked highest and identified as significant biomarkers.
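A minimal NumPy sketch of this ranking, assuming the 1/log L normalization of the entropy, class averaging into an n × 2 matrix, and the leave-one-out definition of CE_i given above. Function names and the toy data are hypothetical; this is an illustration of the technique, not the authors' code.

```python
import numpy as np

def svd_entropy(a):
    """Normalized SVD entropy E(A) = -(1/log L) * sum_k p_k log p_k, with L = min(n, m)."""
    s = np.linalg.svd(a, compute_uv=False)
    total = np.sum(s ** 2)
    n_components = min(a.shape)
    if total == 0 or n_components < 2:
        return 0.0  # no spread to measure for an all-zero or single-component matrix
    p = s ** 2 / total   # normalized relative significance of each eigengene
    p = p[p > 0]         # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)) / np.log(n_components))

def contribution_to_entropy(a):
    """Leave-one-out CE_i = E(A) - E(A with row i deleted) for every row (gene)."""
    e_full = svd_entropy(a)
    return np.array([e_full - svd_entropy(np.delete(a, i, axis=0))
                     for i in range(a.shape[0])])

def rank_genes_by_ce(expr, erg_positive, top_k=10):
    """expr: genes x samples array; erg_positive: one boolean per sample.
    Averages each gene within the ERG-negative and ERG-positive classes
    (giving an n x 2 matrix), then ranks genes by CE_i in descending order."""
    mask = np.asarray(erg_positive, dtype=bool)
    averaged = np.column_stack([expr[:, ~mask].mean(axis=1),
                                expr[:, mask].mean(axis=1)])
    ce = contribution_to_entropy(averaged)
    order = np.argsort(ce)[::-1]
    return order[:top_k], ce

# Toy data: genes 0-2 are flat across classes; gene 3 rises in ERG-positive samples
expr = np.array([[1, 1, 1, 1],
                 [2, 2, 2, 2],
                 [3, 3, 3, 3],
                 [1, 1, 5, 5]], dtype=float)
top, ce = rank_genes_by_ce(expr, [False, False, True, True], top_k=2)
```

In this toy example, the class-averaged matrix is rank 1 without gene 3, so deleting gene 3 collapses the entropy to about zero and gives it the largest CE_i.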
Computational Analysis and Characterization of Significantly Differentially Expressed Genes
We used the SVD-based gene selection method to discover highly differentially expressed genes, ranked by their contribution to the overall entropy of the expression data. Highly ranked genes were identified as potential biomarkers discriminating between ERG-rearranged (positive) and ERG-non-rearranged (negative) samples. Only samples with known and concordant ERG status (based on IHC and FISH) were used in this analysis. A total of 46 samples, 27 ERG-negative (ERG0) and 19 ERG-positive (ERG1), were included in the final analysis.
Based on SVD analysis, a list of 10 candidate genes was identified showing significant expression changes between ERG0 and ERG1 samples (Fig. 1A; Table S2). Using a linear support vector machine, the 10-gene list classified ERG status with 76% accuracy. The relationship of the 10-gene ERG-like signature to ERG status was validated in three independent public cohorts (the Swedish GSE8402, Taylor and Glinsky cohorts), with accuracy rates of 65–83% (data not shown) [34-36].
Association of Protein Levels of Identified Genes with ERG Protein Expression
To validate the association of the computationally identified genes with ERG-gene rearrangement, we analysed the protein expression of those genes in comparison with ERG protein expression using a progression tissue microarray of 86 samples (320 cores) from 61 patients. Within each subset of samples (i.e. ERG0 or ERG1), the intensity level for each marker was categorized as absent/weak or moderate/strong. Seven of the 10 markers (CHD5, Ankyrin, MEIS2, FRP-3, LEF1, PLA2G7 and WNT2) were significantly differentially expressed relative to ERG protein, while ING3 and ANXA4 showed similar, although nonsignificant, trends. In these samples, Syntenin expression was not significantly associated with ERG expression; however, an association was confirmed in a different cohort with a larger number of samples (data not shown). The mean protein expression of each of the 10 markers during PCA progression and its relation to ERG are shown in Fig. 1B and Table 1. Antibody specificities were confirmed by Western blots of protein isolated from patients' frozen tissues (data not shown).
Table 1. Mean protein expression of the 10 markers in relation to PCA progression and their association with ERG
Progression P value
Benign Mean ± sd
Localized PCA Mean ± sd
CRPC Mean ± sd
ERG association P value
CRPC, castration-resistant prostate cancer.
1.81 ± 0.430
1.98 ± 0.311
1.27 ± 0.522
0.34 ± 0.509
1.52 ± 0.517
1.61 ± 0.502
1.55 ± 0.724
2.09 ± 0.684
1.94 ± 0.725
0.59 ± 0.523
2.04 ± 0.528
2.04 ± 0.713
2.51 ± 0.901
1.23 ± 0.661
1.50 ± 0.816
1.07 ± 0.596
1.68 ± 0.592
2.53 ± 0.539
1.26 ± 0.822
1.90 ± 0.640
1.00 ± 0.833
1.04 ± 0.630
1.79 ± 0.629
1.63 ± 0.809
0.58 ± 0.597
1.73 ± 0.610
1.96 ± 0.690
1.95 ± 0.590
1.60 ± 0.660
0.98 ± 0.887
1.59 ± 0.496
2.19 ± 0.549
2.17 ± 0.509
Association of Protein Expression Levels of Identified Genes with PCA Progression
To test our hypothesis that the ERG signature reflects ERG-mediated transcriptional regulation regardless of ERG status, we analysed the expression levels of the 10 markers in benign prostate tissue, localized PCA and castration-resistant PCA using the progression tissue microarray described above. The mean intensity level for each marker was then plotted relative to disease progression (Fig. 2A). The 10 markers were differentially expressed between several stages of PCA progression, with ANXA4 significantly downregulated with disease progression (P < 0.001; Table 1).
ERG-Gene Signature in Relation to Prognosis and Overall Survival of Patients with PCA
As PCA is known to be heterogeneous, we hypothesized that ERG status alone might not be strongly predictive of patient prognosis, consistent with earlier reports, and that integrating the status of other genes reflecting the downstream effects of ERG-mediated transcription in each tumour sample might improve prognostic prediction. To validate our signature in other cohorts, the 10-gene signature was represented as a reference vector of length 10, in which each gene has a value of 1 if it is overexpressed and 0 if it is downregulated. To stratify samples based on the signature, the expression of each gene across all samples was categorized as high or low, and a vector of length 10 was constructed for each sample (1 for high expression and 0 for low expression). Finally, we calculated the correlation between the reference vector and each sample's vector; samples with high correlation (r > 0.5) were considered ERG1-like and the remainder ERG0-like.
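The classification step above can be sketched as follows. The text does not state how "high" and "low" expression were defined, so this sketch assumes a per-gene median split; the four-gene reference vector and expression values are hypothetical (the actual direction of change for the 10 genes is in Table S2), and all function names are our own. Because both vectors are binary, the Pearson correlation used here equals the phi coefficient.

```python
import math
from statistics import median

def binarize_by_median(values):
    """1 if a sample's expression is above the gene's median, else 0 (assumed cut-off)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant (assumption)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def classify_samples(expr_by_gene, reference, threshold=0.5):
    """expr_by_gene: one list of expression values per signature gene (same sample order).
    reference: 1 if the gene is overexpressed in the signature, 0 if downregulated.
    Returns 'ERG1-like' or 'ERG0-like' for each sample."""
    binary = [binarize_by_median(g) for g in expr_by_gene]  # genes x samples
    labels = []
    for j in range(len(expr_by_gene[0])):
        sample_vec = [row[j] for row in binary]
        r = pearson(reference, sample_vec)
        labels.append("ERG1-like" if r > threshold else "ERG0-like")
    return labels

# Hypothetical 4-gene example with three samples
reference = [1, 1, 0, 0]
expr_by_gene = [[5, 1, 3], [4, 2, 3], [1, 6, 3], [0, 3, 2]]
labels = classify_samples(expr_by_gene, reference)
```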
We first investigated the 10-gene signature in the Physicians' Health Study (PHS) cohort. Samples were grouped into ERG1-like and ERG0-like samples, and we assessed their association with cancer lethality. The 10-gene signature was associated with lethal disease, with an odds ratio (OR) of 4.33 (95% CI: 1.82–10.3; P < 0.001), compared with an OR of 1.49 (95% CI: 0.68–3.26; P = 0.32) for ERG status alone. In a net reclassification improvement analysis, the 10-gene signature significantly increased sensitivity (P = 0.02) and specificity (P = 0.008) for predicting lethal disease compared with GS alone.
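For readers checking how an OR and its 95% CI relate to a 2 × 2 table, a minimal sketch of the crude OR with a Woolf (log-based) CI follows. The text does not say how the PHS ORs were estimated (they may come from a regression model on the case–control design), so this crude calculation is a simplification, and the counts are hypothetical.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf 95% CI from a 2 x 2 table:
    a = signature-positive lethal cases, b = signature-positive controls,
    c = signature-negative lethal cases, d = signature-negative controls."""
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts, not PHS data
or_, lower, upper = odds_ratio_with_ci(20, 10, 15, 30)
```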
We then applied the 10-gene signature to various publicly available patient datasets (Swedish, Taylor and Glinsky cohorts) [34-36] to investigate the strength of our gene signature in predicting patients' prognosis compared with ERG expression alone. In the Swedish cohort, ERG status was determined by FISH, whereas in both the Taylor and Glinsky cohorts it was predicted from ERG-gene expression (samples with ERG expression above the third quartile were considered ERG-positive). Based on ERG status, we grouped samples into ERG1 (fusion-positive) and ERG0 (fusion-negative). We also used our 10-gene model to group patients into ERG1-like and ERG0-like signatures to investigate the significance of the signature in stratifying patients into different prognostic groups.
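The expression-based ERG call can be sketched with Python's standard library. The text does not specify how the third quartile was computed, so this sketch assumes the default (exclusive) interpolation of `statistics.quantiles`; other quantile methods can shift the cut-off slightly. The function name and values are hypothetical.

```python
from statistics import quantiles

def call_erg_by_upper_quartile(erg_expression):
    """Label samples ERG1 if ERG expression exceeds the third quartile (Q3), else ERG0."""
    q3 = quantiles(erg_expression, n=4)[2]  # third of the three quartile cut points
    return ["ERG1" if v > q3 else "ERG0" for v in erg_expression]

# Hypothetical ERG expression values for eight samples (Q3 = 6.75 here)
labels = call_erg_by_upper_quartile([3, 1, 8, 5, 2, 7, 4, 6])
```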
In the Swedish cohort, patients with ERG1 status (n = 46) were at higher risk of lethal disease than those with ERG0 status (n = 226; Fig. 3A; hazard ratio [HR]: 1.25, 95% CI: 0.99–1.58, P = 0.005). Application of our 10-gene signature separated patients into ERG1-like (n = 22; seven were ERG1) and ERG0-like (n = 259) groups, which showed a stronger association with patient prognosis than ERG status alone (Fig. 3B; HR: 2.38, 95% CI: 1.45–3.8, P < 0.001). Univariate analysis using a Cox proportional hazards model confirmed that the 10-gene model was more significantly associated with overall survival than ERG status alone (P = 0.003 vs P = 0.053, respectively).
In the Glinsky cohort, patients with high ERG expression (n = 32) were not well separated from patients with low ERG expression (n = 47; Fig. 3C; HR: 0.76, 95% CI: 0.38–1.47, P = 0.15). We therefore grouped samples into high-risk (n = 21) and low-risk (n = 58) groups using ERG status, as described by Varambally et al., but the separation remained poor (Fig. 3D; HR: 1.4, 95% CI: 0.67–2.67, P = 0.47). We then applied our 10-gene signature, which separated patients into high-risk (n = 18, 11 of whom were ERG1) and low-risk (n = 61) groups based on post-surgical biochemical recurrence (Fig. 3E; HR: 1.8, 95% CI: 1.2–2.8, P = 0.15). On univariate analysis, the 10-gene model was more strongly associated with prognosis than ERG status alone, albeit at borderline significance (P = 0.1 vs P = 0.4).
Similarly, in the Taylor cohort, ERG expression did not separate patients into clinically distinct groups (Fig. 3F; HR: 0.7, 95% CI: 0.5–1.5, P = 0.33), whereas the 10-gene model separated patients into high-risk ERG1-like (n = 23; nine were ERG1) and low-risk ERG0-like (n = 117) groups (Fig. 3G; HR: 3.2, 95% CI: 2.2–5.8, P = 0.0026).
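The survival comparisons above report HRs and P values for separating two groups, but the text does not name the test behind the Kaplan–Meier P values. A common choice for such comparisons is the log-rank test; the stdlib-only sketch below implements it as an assumed, swapped-in illustration rather than the study's stated method. Input times and event flags are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events: 1 = event observed, 0 = censored.
    Returns (chi-square statistic with 1 df, two-sided P value)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
            [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A under the null
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    # Chi-square survival function with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical: group A has early events, group B is censored late
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [0] * 5)
```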
The 10-gene signature also distinguished samples with very aggressive PCA (high GS and highly metastatic, i.e. cluster 5 in the Taylor data) from the other clusters slightly more accurately than ERG status alone (80%; HR: 1.07, 95% CI: 0.4–1.6 vs 77%; HR: 0.87, 95% CI: 0.2–1.4).
We also used multivariate Cox regression analysis to assess the significance of the ERG-like signature in stratifying patients before treatment; we therefore included the presurgical variables needle biopsy GS and prebiopsy serum PSA from the Taylor cohort (n = 150). Patients with the ERG1-like signature had an HR for PCA relapse of 2.6 (95% CI: 1.3–5.2; P = 0.004) on univariate analysis and 2.3 (95% CI: 1.1–4.6; P = 0.016) on multivariate analysis (Table 2).
Table 2. Univariate and multivariate analysis for cancer recurrence in PCA: Taylor data
Univariate HR (95% CI)
Multivariate HR (95% CI)
Next, we assessed the ability of the ERG-like signature to stratify patients with GS 6 and 7 PCA, given that these Gleason scores represent the majority of patients encountered in clinical practice and that current clinical and pathological variables cannot differentiate aggressive from indolent disease in this group. Among patients with GS 6 and 7 in the Swedish cohort (n = 200), our 10-gene signature still separated patients into high-risk (n = 13) and low-risk (n = 187, 13 of whom were ERG-positive) groups (Fig. 3H; HR: 3.5, 95% CI: 1.8–6.6, P < 0.001). ERG status alone separated patients into high-risk (n = 26) and low-risk (n = 174) groups, with a lower HR for lethal disease (HR: 2.4, 95% CI: 1.5–3.9, P < 0.001). Using a mixed clinical-molecular signature (i.e. GS and the ERG-like signature), we identified higher-risk patients more accurately than with GS or ERG status alone (Fig. 3H; Table 3).
Table 3. Hazard ratio for cancer-specific mortality in the Swedish cohort using GS alone, GS and ERG-gene rearrangements, and GS with the ERG-like signature
Model | Lower-risk group (number of samples) | Higher-risk group (number of samples) | P value/Cox value
GS 7 patients
GS 7 alone | n = 79 | n = 38 | 2 × 10⁻⁴/2.4 × 10⁻⁴
GS 7 + ERG | GS 7(3+4) and ERG0: n = 61 | GS 7(4+3) or ERG1: n = 44 | 9 × 10⁻⁴/5.6 × 10⁻⁵
GS 7 + ERG-like | GS 7(3+4) and ERG0-like: n = 72 | GS 7(4+3) or ERG1-like: n = 45 | 3 × 10⁻⁵/6.2 × 10⁻⁶
GS 6 and 7 patients
GS 6+7 alone | GS 6 + GS 7(3+4): n = 162 | n = 38 | <10⁻⁷/5.3 × 10⁻⁸
GS 6+7 + ERG | GS 6 + GS 7(3+4) and ERG0: n = 136 | GS 7(4+3) or ERG1: n = 46 | <10⁻⁵/5 × 10⁻⁵
GS 6+7 + ERG-like | GS 6 + GS 7(3+4) and ERG0-like: n = 153 | GS 7(4+3) or ERG1-like: n = 45 | <10⁻¹⁰/7.5 × 10⁻¹¹
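The mixed clinical-molecular grouping in Table 3 amounts to a simple rule: a tumour is placed in the higher-risk group if it is GS 7(4+3) or ERG1-like, and in the lower-risk group if it is GS 6 or GS 7(3+4) and ERG0-like. The function below is a hypothetical encoding of that rule for illustration only; its name and inputs are our own.

```python
def mixed_risk_group(primary_gleason, secondary_gleason, erg1_like):
    """Hypothetical encoding of the Table 3 rule for GS 6 and GS 7 tumours:
    higher risk if GS 7(4+3) or ERG1-like; otherwise lower risk."""
    if primary_gleason + secondary_gleason not in (6, 7):
        raise ValueError("rule is defined here only for Gleason scores 6 and 7")
    if (primary_gleason, secondary_gleason) == (4, 3) or erg1_like:
        return "higher risk"
    return "lower risk"

# A GS 7(3+4) tumour with an ERG1-like signature moves to the higher-risk group
group = mixed_risk_group(3, 4, erg1_like=True)
```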
Figure 4 is a flow chart of the analyses and cohorts evaluated in the present study, from the initial bioinformatics analysis to the development of the 10-gene signature and its validation in independent, well-annotated cohorts.
Predicting aggressive disease is one of the most important steps in cancer management. This is especially true in PCA, where overdiagnosis resulting from PSA screening has reached alarming levels. Currently, indolent and aggressive PCA cannot be reliably distinguished on the basis of pathological and clinical variables alone. ERG-gene rearrangements are among the most common gene alterations in PCA [1, 9]. Expression profiling studies suggest that ERG-rearranged tumours represent a subset of prostate tumours that share specific progression pathways with potential prognostic and therapeutic implications.
As PCA is among the most heterogeneous tumours, predicting tumour progression is expected to be more achievable and reliable with a multi-gene model than with individual genes, similar to the Oncotype DX assay currently used clinically for patients with breast and colon cancer. In the current study, we identified and characterized a 10-gene signature that shows potential for further development as a signature distinguishing aggressive from indolent PCA. This panel was identified by a combination of computational analysis and biological assays and was validated in several large, well-annotated cohorts. The signature was more accurate than ERG gene expression alone, which at times proved unreliable (in the Glinsky and Taylor surgical cohorts). Notably, the signature agreed with ERG status in only 76% of patients, indicating that it captures ERG-related transcriptional regulation within a tumour sample regardless of the tumour's ERG expression level. Together with the validation cohorts above, this indicates that the signature is a better potential biomarker of aggressive disease than ERG. Our signature also remained significant in multivariate analysis that included biopsy GS and prebiopsy serum PSA, two of the most powerful and widely used biomarkers in current clinical practice. Moreover, the 10-gene signature stratified patients in the intermediate-grade category (GS 6 and 7) into two distinct classes. When we implemented a mixed clinical and molecular model combining GS and the 10-gene signature, we identified aggressive tumours more accurately, including tumours that could have been misclassified if judged by GS alone. This indicates the strength of our signature compared with other models, in which combining the signature with GS did not improve predictive accuracy.
Our 10-gene signature, coupled with other pretreatment variables such as GS and serum PSA, could be an initial step towards stratifying patients into prognostic groups of aggressive and indolent disease before definitive therapy. Based on the signature of their tumours at the time of prostate biopsy, selected patients with indolent disease could be offered expectant management, avoiding overtreatment and its unnecessary harmful side effects.
These data collectively indicate that the expression signature of our 10-gene model reflects the downstream effects of ERG-mediated transcription in PCA better than the ERG gene alone. They further show that some ERG-negative tumours are molecularly, biologically and prognostically more closely related to ERG-positive tumours, based on the ERG-mediated transcriptional regulation contributing to PCA progression (and, conversely, that some ERG-positive tumours are classified as having an ERG0-like signature). Our model provides further evidence of the heterogeneous and multifocal nature of PCA. The 10-gene signature model is more robust than other published ERG-related signatures, as it more accurately identifies patients in the higher-risk group who could have been misclassified as having a ‘favourable prognosis’ based on GS alone.
Finally, this multi-gene model is currently being validated and further refined using quantitative RT-PCR-based methods, which would allow us to better quantify the expression levels of the 10 genes in a given patient's sample relative to appropriate housekeeping genes. This would be similar to developing a per-patient ‘recurrence score’ based on the relative risk of disease recurrence or lethal outcome. By incorporating such tests into PCA management, we hope to predict more accurately whether tumours are indolent or aggressive, especially those in the GS 6 and 7 category. These patients represent the most common group seeking medical attention and the most difficult to assess based on current clinical and pathological variables alone.
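The text does not specify how expression would be normalized to housekeeping genes. One widely used approach is the ΔCt method: subtract the mean Ct of the reference genes from the target's Ct and express relative quantity as 2^(−ΔCt), which assumes roughly 100% amplification efficiency for every assay. The sketch below illustrates only that normalization step; the housekeeping Ct values and target Ct values are hypothetical, and no recurrence-score formula is implied.

```python
def delta_ct(target_ct, housekeeping_cts):
    """Delta Ct = Ct(target) - mean Ct of the housekeeping (reference) genes."""
    return target_ct - sum(housekeeping_cts) / len(housekeeping_cts)

def relative_quantity(target_ct, housekeeping_cts):
    """2^(-delta Ct), assuming ~100% amplification efficiency for all assays."""
    return 2.0 ** (-delta_ct(target_ct, housekeeping_cts))

# Hypothetical Ct values for one biopsy sample
housekeeping = [20.0, 22.0]               # two placeholder reference genes (mean 21.0)
sample_cts = {"LEF1": 24.0, "MEIS2": 21.0}
rq = {gene: relative_quantity(ct, housekeeping) for gene, ct in sample_cts.items()}
```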
In summary, we identified and validated a multi-gene model, reflecting downstream ERG-mediated transcriptional regulation in a given tumour sample, which is more robust than ERG expression or GS alone in identifying patients at higher risk of disease recurrence and lethal outcome. This signature, coupled with clinicopathological variables in prostate biopsy, could enable us to separate aggressive from indolent disease and to identify patients at highest risk of cancer progression and lethal disease. Finally, functional studies investigating individual genes within this signature could shed light on novel pathways associated with disease progression and with therapeutic potential.
This work was supported in part by the Prostate Cancer Foundation Young Investigator Award (to T.A.B.). This work was also supported by Prostate Cancer Canada and is proudly funded by the Movember Foundation (Grant #B2013-01). The authors thank Felix Feng for scientific input and discussion.
Conflict of Interest
T. A. B. is a co-inventor on a patent filed by the University of Calgary covering the prognostic and therapeutic use of the 10-gene signature in prostate and breast cancer and leukaemia. No other conflict of interest exists.