Integration of ERG gene mapping and gene-expression profiling identifies distinct categories of human prostate cancer


Colin Cooper, Institute of Cancer Research, Molecular Carcinogenesis, Male Urological Cancer Research Centre, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK.



To integrate the mapping of ERG alterations with the collection of expression microarray (EMA) data, as previous EMA analyses have failed to consider the genetic heterogeneity and complex patterns of ERG alteration frequently found in cancerous prostates.


We determined genome-wide expression levels with GeneChip Human Exon 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA) using RNA prepared from 35 specimens of prostate cancer from 28 prostates.


The expression profiles showed clustering, in unsupervised hierarchical analyses, into two distinct prostate cancer categories, with one group strongly associated with indicators of poor clinical outcome. The two categories are not tightly linked to ERG status. By analysis of the data we identified a subgroup of cancers lacking ERG rearrangements that showed an outlier pattern of SPINK1 mRNA expression. There was a major distinction between ERG rearranged and non-rearranged cancers that involves the levels of expression of genes linked to exposure to β-oestradiol, and to retinoic acid.


Expression profiling of prostate cancer samples containing single patterns of ERG alterations can provide novel insights into the mechanism of prostate cancer development, and support the view that factors other than ERG status are the major determinants of poor clinical outcome.


FISH, fluorescence in situ hybridization

(E)(T)MA, (expression) (tissue) microarray

HC, hierarchical clustering

H&E, haematoxylin and eosin

GSEA, gene-set enrichment analysis.


There is considerable heterogeneity in the natural history of human prostate cancer. New prognostic markers are therefore urgently required so that radical therapies can be targeted to men with potentially fatal cancers, and so that the remainder, with indolent cancers, can be spared inappropriate treatment. The potential use of expression microarray (EMA) data in the classification of human cancers was originally reported by Golub et al.[1], using expression data from acute myeloid leukaemia and acute lymphoblastic leukaemia samples, and by Alizadeh et al.[2], who identified two clinically and molecularly distinct subgroups of diffuse large B-cell lymphomas. For breast cancer, for example, this approach identified the prognostically distinct ‘basal’, ‘ERBB2’ and ‘luminal’ subgroups [3,4]. By contrast, unsupervised analyses of EMA datasets for prostate cancer have so far failed to provide a consistent classification of this disease [5–8]. Significantly, the sampling procedures for these studies invariably failed to consider the well documented occurrence of multifocality [9–14] and of genetic heterogeneity [15–20] in prostate cancer.

Fusion of the TMPRSS2 gene to ETS transcription factor genes including ERG has been reported in around half of prostate cancer cases [21–28]. Less frequently TMPRSS2 or other fusion partners can become fused to the related ETS genes ETV1, ETV4 and ETV5[22,24,29,30]. As a consequence of this rearrangement, the 3′-exons of ERG that become fused to 5′-TMPRSS2 exons are expressed at very high level, in contrast to the untranslocated 5′-ERG-exons that remain expressed at a low level [24,31]. Recently we mapped ERG alterations in whole-mount prostatectomy specimens [32], showing that the patterns of ERG alteration were often complex. Different categories of ERG alteration were found either together in a single cancerous region or within separate foci of cancer in the same prostate slice. We found a single TURP specimen containing separate areas of cancer with rearrangements of both the ERG and ETV1 genes, showing that the observed complexity might at least in part be explained by multiple ETS alterations arising together in a single prostate. Cancers with ERG rearrangements often occurred together with cancer that lacked ERG alteration. Other recent studies also support the existence of TMPRSS2-ERG genetic heterogeneity in human prostate cancer [33,34]. We reasoned [8,27] that a significant problem with previous prostate cancer EMA studies is that genetic heterogeneity was not considered when samples were selected for analysis. Individual samples might therefore have contained mixtures of foci with distinct ERG status and genetic origins, introducing substantial ‘noise’ into the data. The aim of the present study was to assess whether improved stratification of prostate cancers and clues to the mechanism of cancer development can be obtained when EMAs are analysed on samples of prostate cancer of defined ERG status.


Fresh prostate cancer specimens were obtained from a systematic series of patients who had undergone a prostatectomy at the Royal Marsden NHS Foundation Trust and St George’s Hospital NHS Trust. Formalin-fixed paraffin wax-embedded prostatic tissue from the same set of patients was also obtained from the pathology archives of these hospitals. This study was approved by the Clinical Research and Ethics Committee at the Royal Marsden and Institute of Cancer Research.

Frozen prostate slices were prepared and stored in RNAlater (Ambion Inc, Austin, Texas, USA) as described by Jhavar et al.[35]. A small piece of tissue (≈2 mm³) containing areas of cancer and/or normal glands was identified within the frozen prostate slice with the help of haematoxylin and eosin (H&E)-stained whole-mount sections from the adjacent formalin-fixed slices. This piece of tissue was washed in ice-cold PBS for 2 min, embedded in OCT embedding medium (Raymond Lamb UK, Ltd) and frozen on dry ice. Frozen sections obtained from this tissue were stained with H&E and high molecular weight cytokeratin to confirm the presence and location of cancer and normal glands within it. Areas containing cancer or normal glands were then macro-dissected from the OCT block using a fresh sterile scalpel blade and were used for RNA extraction. Care was taken to avoid contamination by using a new scalpel blade for each macro-dissected area. Care was also taken to keep the OCT block frozen during macro-dissection. RNA was extracted from the tissue in TRIzol (Invitrogen Ltd, Paisley, UK) by following the manufacturer’s instructions, and purified using the RNeasy MinElute™ clean-up kit (Qiagen Ltd, Crawley, Sussex, UK) according to the manufacturer’s instructions.

Expression profiles were determined using GeneChip Human Exon 1.0 ST arrays (Affymetrix, Santa Clara, CA, USA) according to the manufacturer’s instructions. The Affymetrix GeneChip® Whole Transcript Sense Target Labelling Assay was used to generate amplified and biotinylated sense-strand DNA targets from the entire expressed genome (1.5 µg of total RNA) without bias. The manufacturer’s instructions were followed for the hybridization, washing and scanning steps. Arrays were hybridized by rotating them at 60 rpm in the Affymetrix GeneChip hybridization oven at 45 °C for 16 h. After hybridization, the arrays were washed in the Affymetrix GeneChip Fluidics Station FS 450 and scanned using the Affymetrix GeneChip Scanner 3000 7G system. Gene-level and exon-level expression signal estimates were derived from the CEL files using the robust multiarray analysis algorithm [36] implemented in the Affymetrix Power Tools software. Gene-level estimates were obtained using the ‘core’ meta-probe list annotation release 21. Exon-level data were filtered to include only those probesets that are in the ‘core’ meta-probe list.

The ERG gene break-apart assay was used exactly as previously described [32,37]. For detecting PTEN deletions the design of the fluorescence in situ hybridisation (FISH) assay is similar to the commercially available system (LSI PTEN/CEP 10 Dual Color DNA Probe; Vysis Inc., Downers Grove, IL, USA) except that the commercial PTEN probe is replaced with digoxigenin-labelled (green) BAC probes RP11–765C10 and RP11–959L24 at the PTEN locus. These two BACs precisely map to the minimal region of PTEN deletion at 10q23.3 in prostate cancer defined by Hermans et al.[38]. Cores analysed for the PTEN locus (labelled green) were considered hemizygously deleted for PTEN when >50% of nuclei contained either one signal of PTEN locus-targeted probe and two or more signals of reference chromosome 10 centromere CEP10 probe (red), or two signals of locus-targeted PTEN probe and four or more signals of reference CEP10 probe. Homozygous deletions were identified by the absence of both locus-targeted PTEN probes and the presence of signals of reference CEP10 probe in >90% of cells. FISH was used either on formalin-fixed slices taken immediately adjacent to the frozen sample used for RNA preparation, and/or on the whole-mount formalin-fixed section adjacent to the fresh frozen slice.
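
The PTEN scoring rules above can be written as a short decision function. The following Python sketch is illustrative only: the function name, the input format (one pair of signal counts per nucleus) and the order in which the two rules are checked are assumptions made for the example, and only the thresholds (>50% and >90% of nuclei) come from the text.

```python
def classify_pten(nuclei):
    """Classify a core from per-nucleus FISH counts.

    nuclei: list of (pten_signals, cep10_signals) tuples, one per scored nucleus
            (hypothetical input format, not taken from the study).
    """
    if not nuclei:
        raise ValueError("no nuclei scored")
    n = len(nuclei)

    # Homozygous pattern: no PTEN signal, reference CEP10 signal still present.
    homozygous = sum(1 for pten, cep10 in nuclei if pten == 0 and cep10 >= 1)

    # Hemizygous pattern: 1 PTEN with >=2 CEP10, or 2 PTEN with >=4 CEP10.
    hemizygous = sum(
        1
        for pten, cep10 in nuclei
        if (pten == 1 and cep10 >= 2) or (pten == 2 and cep10 >= 4)
    )

    # Assumption: the homozygous rule is tested first.
    if homozygous / n > 0.90:
        return "homozygous deletion"
    if hemizygous / n > 0.50:
        return "hemizygous deletion"
    return "no deletion detected"


# Toy example: 6 of 10 nuclei show one PTEN signal and two CEP10 signals.
example = [(1, 2)] * 6 + [(2, 2)] * 4
print(classify_pten(example))
```

Both thresholds are applied as strict inequalities, so a core in which exactly half of the nuclei show the hemizygous pattern is not called deleted.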

Tissue MAs (TMAs) prepared from TURP and prostatectomy specimens collected at the Royal Marsden Hospital were those reported in our previous studies [39]. SPINK1 (serine protease inhibitor Kazal-type 1) protein was detected in 4 µm slices of those TMAs exactly as described by Paju et al.[40].

Gene-level data were used for hierarchical clustering (HC) analyses carried out by a method similar to that described previously [41]. Briefly, all tumour samples were clustered on the complete gene-level dataset of 17 881 probesets with an agglomerative HC method (hclust in the R statistical programming language), using a dissimilarity metric defined as 1 − Pearson’s correlation and complete linkage, which at each step merges the two clusters for which the maximum distance between their elements is smallest. Differential expression analysis was applied to the gene-level data, with samples divided into two groups according to whether an ERG break was detected. Linear models were fitted for each transcript cluster (gene) and an estimate of the overall variance was calculated by an empirical Bayes approach [42]. A moderated t-statistic was computed for each transcript cluster, and the resulting P values were adjusted for multiple testing using Benjamini and Hochberg’s method to control the false discovery rate [43]. Transcript clusters with an adjusted P < 0.05 were considered significantly differentially expressed between the groups. Gene-set enrichment analysis was used to test whether sets of genes defining published prognostic signatures were differentially expressed between different HC groups. The approach used is that described in the ‘Limma’ package [44] and is based on the proposals of Mootha et al.[45]. The list of genes expressed differentially between ERG-rearrangement-positive and -negative cancers was uploaded into the Ingenuity pathway analysis software (Ingenuity, Mountain View, CA, USA). A score was computed for each network according to the fit of the original set of significant genes. This score reflects the negative logarithm of the P value, which indicates the likelihood of the focus genes in a network being found together as a result of random chance.
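
To make the clustering step concrete, the sketch below implements complete-linkage agglomerative clustering with a 1 − Pearson’s correlation dissimilarity in plain Python. It is not the R hclust code used in the study: the function names and the four toy profiles are invented for illustration, and the procedure simply stops once a requested number of clusters remains instead of building a full dendrogram.

```python
from math import sqrt


def pearson_distance(x, y):
    """Dissimilarity 1 - r between two equal-length expression profiles.

    Fails with a division by zero if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / sqrt(vx * vy)


def complete_linkage(profiles, n_clusters=2):
    """Merge samples until n_clusters remain; returns lists of sample indices."""
    n = len(profiles)
    dist = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest pairwise distance between their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy profiles (4 samples x 4 genes): samples 0 and 1 rise, samples 2 and 3 fall.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 9.0],
    [4.0, 3.0, 2.0, 1.0],
    [9.0, 6.0, 4.0, 2.0],
]
print(complete_linkage(toy, n_clusters=2))
```

Because the dissimilarity depends only on correlation, two samples with the same expression pattern cluster together even when their absolute signal levels differ.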


GeneChip Human Exon 1.0 ST arrays were used to obtain genome-wide expression profiles from RNA prepared from 35 specimens of prostate cancer from 28 prostates. A 3–5 mm thick ‘research slice’ was taken from individual prostatectomy specimens, as described by Jhavar et al.[35], and frozen. H&E-stained whole-mount sections from above and below this slice were examined histopathologically. When cancer was detected at a particular location in both these sections it was inferred that cancer was also present in the intervening research slice at the same position. Small samples from the centre of the largest foci of cancer were microdissected and used for RNA preparation [27,31]. Two methods were used to assess ERG gene status. First, individual expression profiles obtained using the exon MAs were used to score the presence of ERG alterations [31]. These analyses provide a check that the rearrangement detected by FISH (see below) does indeed cause alterations in ERG gene expression, and allowed us to eliminate the possibility that expression was altered at other ETS loci (i.e. ETV1 and ETV4). This is possible because, as a consequence of the rearrangement, 3′-ERG exons that become joined to TMPRSS2 are expressed at a very high level, in contrast to the 5′-ERG exons, which remain expressed at low levels. Second, ERG status was assessed directly for the frozen sample by examining the immediately adjacent material in the formalin-fixed slice using a FISH ‘break-apart’ assay, as previously described [32] and shown in Fig. 1. Samples were classified as previously described [32,37] according to: (i) the number of un-rearranged ERG signals (twinned red and green signals); (ii) the number of separate red signals, corresponding to 3′-ERG gene sequences that become joined to 5′-TMPRSS2 sequences in the TMPRSS2-ERG fusion; and (iii) the number of separate green signals, corresponding to 5′-ERG sequences. Each sample was thus given a score of the form (normal, separate 3′-ERG signals, separate 5′-ERG signals). A score of (2,0,0) would represent a cancer containing two un-rearranged copies of ERG, while scores of (1,1,1) and (1,1,0) represent cancers that contain both normal and rearranged ERG sequences, in the latter case with loss of 5′-ERG sequences. Prognostically, we have shown that patients with cancers with a single copy of 3′-ERG sequences have overall and cancer-specific survival that are not significantly different from those of patients with cancers lacking ERG rearrangements [37]. By comparison, patients with cancers harbouring a duplication or triplication of the 3′-ERG sequences together with loss of 5′-ERG sequences (termed 2+Edel) have very poor survival.
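
The three-number FISH score can be mapped onto the class labels used in this paper (N, Esplit, 1Edel and 2+Edel) with a few lines of Python. This is a sketch under stated assumptions, not the published classification scheme [32,37]: the function name is hypothetical, the mapping of (1,1,0) to 1Edel is inferred from the text, and scores that retain some but not all 5′-ERG signals are simply treated as Esplit here.

```python
def classify_erg(n_normal, n_3prime, n_5prime):
    """Map a FISH score (normal loci, separate 3'-ERG, separate 5'-ERG) to a class.

    n_normal is accepted for completeness but does not affect the class
    in this simplified sketch.
    """
    if n_3prime == 0 and n_5prime == 0:
        return "N"  # no separated signals: ERG not rearranged
    if n_3prime >= 1 and n_5prime == 0:
        # 5'-ERG sequences lost; class depends on the 3'-ERG copy number
        return "1Edel" if n_3prime == 1 else "2+Edel"
    if n_3prime >= 1 and n_5prime >= 1:
        return "Esplit"  # rearranged with 5'-ERG sequences retained (assumption)
    return "unclassified"


for score in [(2, 0, 0), (1, 1, 1), (1, 1, 0), (1, 2, 0)]:
    print(score, classify_erg(*score))
```

Under these assumptions the (2,2,2) pattern seen in one of the paired samples is also labelled Esplit, in line with its description as a possible duplication of the (1,1,1) genome.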

Figure 1.

ERG rearrangements detected by FISH. Each image shows the stained whole-mount prostatectomy section. The coloured areas indicate ERG status of the cancer in that region. The small squares indicate the position of the sample taken for RNA preparation and subsequent EMA analysis. The FISH ‘break-apart’ assay result for ERG is shown in the inset next to the region of cancer. N, R and G represent respectively the number of normal ERG loci, separate 3′-ERG signals and separate 5′-ERG signals.

The samples selected for EMA analyses included 31 with known and homogenous ERG status, and an additional four samples for which ERG status could not be unambiguously assigned. HC analysis of the gene-level expression profiles from these samples is shown in Fig. 2. Several observations emerged from these studies. The main aim of the study was to examine expression profiles from cancer samples of defined and homogenous ERG status. However, for three prostates, cancerous samples had been selected from two separate regions. The positions within the whole prostatectomy ERG maps where these pairs of samples were selected are shown in Fig. 1. For one of these pairs (sample IDs, 24 and 36) the individual expression profiles clustered in entirely separate positions, consistent with the cancers having distinct genetic origins. However, for two of the pairs (IDs, 1 and 40, and 20 and 21) the expression profiles clustered in immediately adjacent positions, indicating that the separate samples potentially represented different regions of the same cancer. In the prostate sample pair 20 and 21, one had the ESplit pattern (1,1,1) and the second (2,2,2), suggesting that the second region of cancer could have arisen from the first following genomic duplication.

Figure 2.

Analysis of expression profiles determined using Affymetrix GeneChip Human Exon 1.0 ST arrays. A dendrogram from unsupervised HC analysis of EMA profiles obtained from prostate cancer samples of defined homogenous ERG status. The cancers cluster into two major groups designated group I (left) and II (right). Immediately below the dendrogram, pairs with samples taken from a distinct region of the same prostate are each represented by one colour. Below this, information relating to the samples represented in the dendrogram are shown: (1) Age at diagnosis; (2) PSA level at diagnosis; (3) Gleason score; (4) clinical stage (≥T3b); (5) presence or absence of ERG rearrangement determined from analysis of the expression profile of individual samples as described in a previous study [31]; (6) ERG status determined using the FISH ‘break-apart’ assay. ERG alterations were stratified according to potential clinical significance, as previously determined by Attard et al.[37]: light blue indicates 1Edel or Esplit; dark blue indicates 2+Edel; (7) PTEN status determined by FISH, ‘X’ indicates homozygous (complete) deletion and ‘/’ indicates hemizygous loss of one copy; (8) SPINK1 up-regulated as determined from exon level expression MA data (see Fig. 4); (9) PSA recurrence defined as two consecutive PSA increases, final >0.2 ng/mL. NA, not assessed.

Overall, the expression profiles clustered into two distinct groups (I and II; Fig. 2). There were no significant differences in Gleason score (P = 1), clinical stage (P = 1) or PSA level at diagnosis (P = 0.3191) between the groups. Gene-set enrichment analysis (GSEA) provides a powerful method for evaluating MA data at the level of a gene set that is defined based on previous biological knowledge [45,46]. GSEA provides very strong evidence that group II cancers were associated with expression signatures of a poor prognosis. Remarkably, genes from nine of 10 previously published expression signatures of poor prognosis were differentially expressed between the groups (Table 1) [47–54]. These included the 70- and 25-gene signatures of cancer aneuploidy from Carter et al.[47], the 11-gene stem cell signature of Glinski et al.[48], the 17-gene signature of cancer metastasis of Ramaswamy et al.[49], the PTEN pathway signature of Saal et al.[50], and the 10-gene poor-prognosis signature for prostate cancer of Stephenson et al.[51]. In each case, group II was predicted to have the poorer prognosis. Analysis of outcome data supports this view. Although there were few patients with PSA failure, those in group II had a statistically significantly shorter time to PSA failure (log-rank test, P = 0.028). Cancers with ERG rearrangements, including cancers with the poor-prognosis 2+Edel marker, were distributed between the groups. There was an enrichment for cancers with ERG rearrangements in group II, but it did not reach statistical significance (Fisher’s exact test, P = 0.06). However, as many cancers with ERG rearrangement clustered in group I, factors other than ERG status must be determining membership of group II. We conclude that the EMA analyses have, in unsupervised analyses, identified a subgroup of prostate cancers associated with indicators of poor prognosis.

Table 1.  The differential expression of the prognostic signature between group I and II cancers. GSEA was used to test whether the set of genes defining each prognostic signature was differentially expressed between the expression profiles in each group
Reference for predictive signature | Type of signature | No. of genes in signature | No. of matching genes on 1.0 ST exon arrays | Gene enrichment P
[50] | PTEN loss | 207 | 158 | <<0.001
[52] | Prostate cancer progression | 70 | 54 | 0.017
[47] | Chromosomal instability | 70 | 66 | <<0.001
[53] | Prostate cancer metastasis | 44 | 43 | 0.002
[47] | Chromosomal instability | 25 | 24 | <<0.001
[49] | Solid tumour metastasis | 17 | 17 | 0.003
[54] | Prostate cancer progression | 12 | 12 | 0.015
[48] | Cancer stem cell | 11 | 9 | 0.008
[51] | Prostate cancer recurrence | 10 | 10 | <0.001
[51] | Prostate cancer recurrence | 10 | 9 | <0.001

We further interrogated this dataset to identify genes that were differentially expressed between ERG-rearrangement-positive and ERG-rearrangement-negative cancers. In all, 142 significantly differentially expressed genes were identified (full details available at …). Notably, only 16 of these genes, including HDAC1, were the same as those previously identified to be differentially expressed between prostate cancers that overexpressed ERG and those that did not [21]. Our analyses identified many additional genes that appeared to be differentially expressed between the categories. We previously reported that the potassium-channel genes KCNH8, KCNN4 and KCN33 were overexpressed in cancers containing ERG alterations [31], consistent with the view that potassium channels can be critical for the growth of human cancer cells and might represent effective targets for cancer therapy [55]. In the current analyses we also identified the calcium-channel gene CACNA1D as overexpressed in cancers with ERG rearrangements (Table 2).
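
The adjusted P values used to call genes significant come from Benjamini and Hochberg’s step-up procedure. A minimal Python sketch of that adjustment is given below; the study itself used the R ‘Limma’ package, and the function name and the four example P values here are invented purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four transcript clusters.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

With these toy values every adjusted P value stays below 0.05, so all four transcript clusters would be called differentially expressed at the threshold used in this study.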

Table 2.  Genes differentially expressed between ERG-rearrangement-positive and -negative prostate cancer
Rank | Affymetrix ID | P | Adjusted P | log2 ratio | z > 3 | Gene symbol | Description
Top 10 genes by statistical significance*
1 | 3931765 | <0.001 | <0.001 | 0.524 | 12 | ERG | v-ets erythroblastosis virus E26 oncogene like (avian)
2 | 3265175 | <0.001 | <0.001 | 0.667 | 7 | TDRD1 | tudor domain containing 1
3 | 2613293 | <0.001 | <0.001 | 0.346 | 1 | KCNH8 | potassium voltage-gated channel, subfamily H (eag-related), member 8
4 | 3864646 | <0.001 | <0.001 | 0.172 | 1 | KCNN4 | potassium intermediate
5 | 2658595 | <0.001 | <0.001 | 0.177 | 3 | HES1 | hairy and enhancer of split 1 (Drosophila)
6 | 3320169 | <0.001 | <0.001 | 0.105 | 4 | AMPD3 | adenosine monophosphate deaminase (isoform E)
7 | 3126504 | <0.001 | <0.001 | 0.147 | 3 | ChGn | chondroitin β1,4 N-acetylgalactosaminyltransferase
8 | 3147173 | <0.001 | <0.001 | 0.226 | 3 | NCALD | neurocalcin delta
9 | 3381879 | <0.001 | 0.002 | −0.102 | 3 | P4HA3 | procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), α polypeptide III
10 | 2624385 | 1.06E-06 | 0.001902 | 0.275 | 0 | CACNA1D | calcium channel, voltage-dependent, L type, α1D subunit
Top 10 genes by log2 ratio of means*
1 | 2880422 | <0.001 | 0.035 | −0.718 | 9 | SPINK1 | serine peptidase inhibitor, Kazal type 1
2 | 3265175 | <0.001 | <0.001 | 0.667 | 7 | TDRD1 | tudor domain containing 1
3 | 2956563 | <0.001 | 0.007 | 0.604 | 0 | CRISP3 | cysteine-rich secretory protein 3
4 | 3931765 | <0.001 | <0.001 | 0.524 | 12 | ERG | v-ets erythroblastosis virus E26 oncogene like (avian)
5 | 2344511 | <0.001 | 0.011 | 0.355 | 0 | DNASE2B | deoxyribonuclease II beta
6 | 2613293 | <0.001 | <0.001 | 0.346 | 1 | KCNH8 | potassium voltage-gated channel, subfamily H (eag-related), member 8
7 | 3567243 | <0.001 | 0.011 | 0.319 | 0 | C14orf39 | chromosome 14 open reading frame 39
8 | 3482274 | <0.001 | 0.027 | 0.306 | 0 | ATP8A2 | ATPase, aminophospholipid transporter-like, Class I, type 8 A, member 2
9 | 2624385 | <0.001 | 0.002 | 0.275 | 0 | CACNA1D | calcium channel, voltage-dependent, L type, α1D subunit
10 | 3462567 | <0.001 | 0.021 | 0.232 | 0 | KCNC2 | potassium voltage-gated channel, Shaw-related subfamily, member 2
Top 10 genes by outlier test (z > 3)*
1 | 3931765 | <0.001 | <0.001 | 0.524 | 12 | ERG | v-ets erythroblastosis virus E26 oncogene like (avian)
2 | 2880422 | <0.001 | 0.035 | −0.718 | 9 | SPINK1 | serine peptidase inhibitor, Kazal type 1
3 | 3265175 | <0.001 | <0.001 | 0.667 | 7 | TDRD1 | tudor domain containing 1
4 | 2956904 | <0.001 | <0.001 | −0.136 | 7 | PKHD1 | polycystic kidney and hepatic disease 1 (autosomal recessive)
5 | 3190242 | <0.001 | 0.045 | −0.061 | 7 | LOC554175 | hypothetical LOC554175
6 | 2993124 | <0.001 | 0.018 | 0.216 | 6 | NPY | neuropeptide Y
7 | 2677356 | <0.001 | 0.011 | −0.148 | 6 | WNT5A | wingless-type MMTV integration site family, member 5 A
8 | 3669382 | <0.001 | 0.015 | −0.127 | 6 | MON1B | MON1 homologue B (yeast)
9 | 3638590 | <0.001 | 0.021 | −0.071 | 6 | MESP1 | mesoderm posterior 1 homologue (mouse)
10 | 3352948 | <0.001 | 0.034 | 0.059 | 6 | SORL1 | sortilin-related receptor, L(DLR class) A repeats-containing
*The top 10 genes differentially expressed between samples with ERG rearrangement and those lacking ERG rearrangement, ranked according to: (a) statistical significance, using the P value adjusted by the Benjamini-Hochberg correction; (b) the log2 ratio of the mean expression levels; and (c) an outlier test that counts the number of samples in the ERG-rearrangement-negative group with z > 3, where z is defined relative to the sd of the expression levels in cancers with ERG gene rearrangement.

Analyses of the EMA dataset also highlighted SPINK1 as a gene that was overexpressed in ERG-rearrangement-negative cancers (Table 2). When genes were ranked according to the ratio of their mean expression in the two groups, SPINK1 was the top-ranked gene, with ERG occurring at position 4 (Table 2). When an outlier test was used to screen for genes that were over- or under-expressed in only a proportion of cases, SPINK1 ranked second, with ERG as the top-ranked gene (Table 2). The discovery that ERG had an outlier expression profile led directly to the observation that the high-level ERG expression present in around half of prostate cancers resulted from rearrangement of the ERG locus in the form of a TMPRSS2-ERG gene fusion. However, this type of genetic alteration does not appear to explain the outlier pattern of expression of SPINK1, because a FISH break-apart assay, similar in design to the ERG break-apart assay described above, failed to detect rearrangement of the SPINK1 gene in cancers that overexpressed it (results not shown).
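
An outlier screen of this kind can be sketched in a few lines of Python. The version below reflects one reading of the test summarized in the footnote to Table 2 and is not the code used in the study: it assumes that z is measured from the mean of the ERG-rearranged group in units of that group’s sd, and that both over- and under-expression count as outliers. The function name and all expression values are hypothetical.

```python
from statistics import mean, stdev


def count_outliers(reference, test, threshold=3.0):
    """Count samples in `test` lying more than `threshold` reference SDs from the reference mean.

    reference: expression values in ERG-rearranged cancers (sets the scale)
    test:      expression values in ERG-rearrangement-negative cancers
    """
    mu = mean(reference)
    sd = stdev(reference)  # sample standard deviation (n - 1 denominator)
    # Assumption: over- and under-expression are both counted, hence abs().
    return sum(1 for x in test if abs(x - mu) / sd > threshold)


# Hypothetical expression values for a single gene.
erg_rearranged = [1.0, 1.2, 0.8, 1.1, 0.9]
erg_negative = [1.0, 1.3, 2.0, 3.5, 0.2]
print(count_outliers(erg_rearranged, erg_negative))
```

A gene that is expressed at a consistently low level in one group but very highly in a few samples of the other group scores well on such a count even when the difference in group means is modest, which is why an outlier test can rank genes differently from a test on means.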

SPINK1, also called tumour-associated trypsin inhibitor (TATI) and pancreatic secretory trypsin inhibitor (PSTI), is a four-exon gene that encodes a mature 56-amino-acid (6 kDa) protein [56,57]. SPINK1 has been proposed as a prognostic marker for several human cancer types, and in prostate cancer its presence is linked to Gleason score [56]. The expression profile of SPINK1 mRNA in our dataset is shown in Fig. 3b. SPINK1 is expressed at a consistently low level in all cancers that have rearranged ERG. Higher expression of SPINK1 mRNA is restricted to the subgroup of cancers that lack ERG rearrangement. Interestingly, many of the cancers overexpressing SPINK1 also clustered as a single group in hierarchical analysis (Fig. 2A). We also examined SPINK1 expression in a published EMA dataset from patients with prostate cancer who had undergone prostatectomy [51]. As the ERG gene rearrangement status was not available for these cancers, we used ERG gene overexpression as a surrogate for the presence of the TMPRSS2-ERG fusion. SPINK1 was again expressed at a consistently low level in all cancers that had high-level expression of ERG, and high-level expression of SPINK1 occurred exclusively in a subset of cancers that expressed relatively low levels of ERG mRNA. Within the cancers expressing low levels of ERG mRNA, those overexpressing SPINK1 mRNA at a high level (z > 3, Fig. 3e,f) did not have significantly worse rates of PSA failure (Fig. 4d). However, when only cancers with very high SPINK1 mRNA levels (z > 4) were considered, there was a significantly worse rate of PSA failure (log-rank test, P = 0.003, Fig. 4e). In a TMA-based study we failed to find a link between immunohistochemically detected SPINK1 protein expression and poor survival (Fig. 4a–c,f). Interestingly, the tight and reproducible relationship between SPINK1 mRNA and absence of ERG rearrangements was not fully maintained when SPINK1 protein was examined (Table 3, Fig. 4). Consistent with this, we found only a poor correlation between SPINK1 mRNA level and SPINK1 protein when they were examined in parallel in the series of cancers (results not shown). Our results indicate that SPINK1 mRNA overexpression might define a distinct category of prostate cancers that lack ERG rearrangements, although we were unable to obtain conclusive evidence that this subgroup represents a distinct prognostic category.

Figure 3.

Gene expression profiles: (a–d) Selected gene expression profiles for 28 prostate cancers and three samples of non-neoplastic epithelium from the gene-level EMA dataset obtained using Affymetrix GeneChip Human Exon 1.0 ST arrays. (e,f) Expression profiles of ERG (e) and SPINK1 (f) from the published EMA dataset [51]. Samples were all ranked in order of the level of ERG expression (highest on the left). Blue indicates that the cancer is known to contain an ERG rearrangement.

Figure 4.

SPINK1 immunohistochemistry and Kaplan-Meier analyses. (a) Screening of SPINK1 detected by immunohistochemistry. Examples of the four levels of staining (0–3) are shown; (b) Example of intense level 3 staining with a Class N ERG status; (c) Example of focal level 3 staining with Class ESplit ERG status; (d) and (e) Kaplan-Meier curves showing prostate cancers from a dataset [51] comparing (i) cancers with higher ERG mRNA expression (as a surrogate for the presence of ERG rearrangement), (ii) cancers with low ERG mRNA expression and high SPINK1 mRNA expression, and (iii) cancers with low ERG and SPINK1 mRNA expression. (f) Kaplan-Meier plot showing survival in a series of prostate cancers described by Foster et al.[39] and represented on a TMA. The SPINK1 immunohistochemical study and detection of ERG alteration by FISH were done on adjacent slices. The curve shows the survival for the 78 cancers that lacked ERG alteration, stratified according to the level of SPINK1 protein detected by immunohistochemistry, i.e. no SPINK1, SPINK1 = 1 (low), SPINK1 = 2 or 3 (high). The three categories had similar survival values. There were no differences in survival when: (i) the three cancers that contained both high SPINK1 expression and ERG rearrangements were included together with ERG-rearrangement-negative cancers in the high SPINK1 staining category; or (ii) cancers containing SPINK1 = 2 and SPINK1 = 3 staining were considered separately (results not shown).

Table 3.  The relationship between SPINK1 staining score and ERG status
SPINK1 score | ERG rearranged | ERG not rearranged
0 (absent) | 19 | 29
1 (low) | 26 | 38
2 or 3 (high) | 3 | 11

Ingenuity pathway analysis software was used to identify the biological networks and pathways in which the genes expressed differentially between ERG-rearrangement-positive and ERG-rearrangement-negative cancer are involved (Figs 5–7). Six significant interlinked pathways were identified; 97 of the differentially regulated genes that we had identified (available at …) were recognized as unique genes in the Ingenuity knowledge database. Of these, 88 appeared in one or more of the six networks. One pathway involved genes known to be modulated by β-oestradiol (Fig. 5B). A second pathway involved HDAC1, PKA and genes known to be modulated by retinoic acid (Fig. 6A); HDAC1 had been previously identified as a gene overexpressed in prostate cancers containing ERG rearrangement [21]. A third pathway involved EGFR and genes modulated by progesterone and by hydrogen peroxide (Fig. 6B). The remaining three pathways centred around NFκB-RAS-MAPK (Fig. 7A), around TNF (Fig. 7B), and around d-glucose and Ca2+ (Fig. 7C).

Figure 5.

Significant interlinked networks found by Ingenuity pathway analysis network generation based on genes significantly differentially expressed between samples with the ERG rearrangement and those that lack that rearrangement. (A) Six significant pathways were identified. This diagram shows how the pathways are interlinked. A red line shows that the two connected pathways contained genes in common. Details of all six pathways are shown here and in Figs 6 and 7. (B) The β-oestradiol-containing pathway had a significance of P << 0.001. Colouring for the differential genes is based on the log2 ratio between ERG-rearranged samples and non-rearranged samples, with red indicating up-regulation in ERG-rearranged samples and green indicating down-regulation. The lines connecting molecules indicate molecular relationships, with the dashed and solid lines indicating an indirect or direct (physical contact between molecules) interaction, respectively.

Figure 6.

Networks found by Ingenuity pathway analysis network generation based on genes significantly differentially expressed between samples with ERG rearrangement and those that lack that rearrangement. (A) The HDAC1-, PKA- and retinoic acid-based network (P << 0.001); (B) The EGFR-, progesterone- and hydrogen peroxide-based network (P << 0.001).

Figure 7.

Networks based around (A) NFκB-RAS-MAPK, (B) TNF and (C) d-glucose and Ca2+ (all P << 0.001) found by Ingenuity pathway analysis network generation based on genes significantly differentially expressed between samples with ERG rearrangement and those that lack that rearrangement.


The occurrence of genetic heterogeneity at the ERG locus in prostate cancer is now well documented. However, the current study is the first to consider the issue of genetic heterogeneity when collecting EMA data. This has been made possible by our development of a method for collecting frozen slices of human prostates [31]. Our previous mapping studies [32] have shown that different patterns of ERG alteration are often juxtaposed in human prostate cancer. In previous EMA studies it is therefore likely that many of the samples analysed contained mixtures of cancer that were genetically distinct. In the current study, samples for analysis were selected based on maps of ERG alteration determined using whole-mount slices taken from immediately adjacent formalin-fixed material. Thus all selected samples were genetically pure for ERG status. The results show that the EMA profiles clustered into two distinct groups, with the smaller group associated with indicators of poorer survival; this group had a significantly worse rate of PSA failure after prostatectomy, and its expression profile correlated significantly with nine of 10 prognostic indicators of poor survival.

There has been considerable interest in assessing the prognostic utility of genetic alterations at the ERG gene locus [25,26,58,59]. For example, Demichelis et al.[58] reported both a significant association between the presence of a TMPRSS2-ERG fusion and prostate cancer-specific death, and a link between the presence of ERG alterations and higher Gleason score. In a series of 26 patients with Gleason 7 disease who had a prostatectomy, Nam et al.[59] found that the presence of a TMPRSS2-ERG fusion was associated with a greater probability of biochemical disease relapse (elevated PSA level). We recently determined that loss of 5′-ERG sequences coupled with duplication of translocated ERG predicts an extremely poor clinical outcome [37], independently of Gleason score and PSA level at diagnosis. Notably, groups I and II identified in the present study were not tightly linked to ERG status. Most of the cancers in the group associated with indicators of poor prognosis (II) had ERG rearrangements, but cancers with similar alterations of ERG were also found in group I. Even the presence of 2+Edel was not entirely restricted to group II. Such observations are consistent with a model in which alterations other than ERG are of key importance in determining poor clinical outcome, and indeed numerous prognostic biomarkers that have no apparent link to ERG status have been identified for this disease [8,15–20].

High levels of SPINK1 protein, commonly detected as a serum or urine marker, have been associated with markers of adverse outcome in various cancer types, including ovarian, endometrial, gastrointestinal, bladder and prostate cancer [40,56,60]. For example, in non-mucinous ovarian cancer, SPINK1 protein detected in serum is an independent prognostic marker of adverse outcome (175 and 229 in [56]). However, this relationship does not always apply, as in gastric cancer increased tissue expression of SPINK1 appears to be associated with a good prognosis [61]. In the prostate, SPINK1 is found in both benign and malignant disease, and increased SPINK1 is found in cancers with higher Gleason score. The present study shows that SPINK1 mRNA has an outlier pattern of expression that was entirely restricted to cancers that lacked ERG overexpression. Interestingly, there was a poor correlation between SPINK1 mRNA expression and SPINK1 protein expression, and the tight relationship with ERG status was not maintained when SPINK1 protein was examined. We failed to find evidence that SPINK1 overexpression is associated with a poor clinical outcome, except that cancers expressing very high levels of SPINK1 mRNA had a significantly worse rate of PSA relapse (P = 0.003). A larger series will need to be examined to assess the precise clinical utility of SPINK1 overexpression.

Exposure to oestrogens has been proposed as an initiating factor in the development of prostate cancer (reviewed in [62]). Intriguingly, our results showed that many of the genes with expression levels differentially regulated between ERG-rearrangement-positive and ERG-rearrangement-negative cancers are linked to regulation by β-oestradiol. Further analysis of ERG-rearrangement-positive and -negative cancers, to test whether they show differential sensitivity to oestrogen exposure, might be an interesting topic for future investigation. Links to genes controlled by progesterone, retinoic acid and hydrogen peroxide also deserve further investigation.

In mapping studies on sections of whole cancerous prostates we recently reported that several independent ETS alterations can arise together in the same prostate [32]. Analysis of TMA cores from distinct regions of cancer within a single prostate also supported this view. The results of the EMA analyses reveal an even more complex relationship between different foci of cancer within one prostate. For one pair the expression profiles clustered separately, supporting the view that the two foci had distinct genetic origins. Two further pairs appeared to have similar genetic origins, as judged by their expression profiles, despite having distinct ERG FISH patterns. Such observations are consistent with the idea that alterations in ERG rearrangements might occur during progression of an individual cancer.

We previously reported that rearrangement of ERG can be used to stratify patients with prostate cancer into distinct survival categories. In particular, the presence of 2+Edel alteration, in which loss of 5′-ERG sequences is accompanied by duplication of the rearranged 3′-ERG, provided prognostic information in addition to that provided by Gleason score and PSA level at diagnosis [37]. In the present study, the EMA profiling studies on cancer of known ERG status identified a category of cancers associated with markers of poor prognosis (group II), and suggest that factors other than ERG status might be of key importance in determining a poor clinical outcome.


This work was funded by Cancer Research UK, The National Cancer Research Institute, The Grand Charity of Freemasons, The Bob Champion Cancer Trust, and The Rosetrees Trust. We thank Christine Bell for help with typing the manuscript. We thank Elise Nilsson for excellent technical assistance and acknowledge funding from the European Union and the Swedish Cancer Society.


None declared.