A new genome-driven integrated classification of breast cancer and its implications


  • Sarah-Jane Dawson,

    1. Cancer Research UK, Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
    2. Department of Oncology, University of Cambridge, Cambridge, UK
    3. Cambridge Breast Unit, Addenbrooke's Hospital, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, UK
    • Joint current affiliation: Division of Research, Peter MacCallum Cancer Centre, St Andrew's Place, East Melbourne, Victoria 8006, Australia.
  • Oscar M Rueda,

    1. Cancer Research UK, Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
    2. Department of Oncology, University of Cambridge, Cambridge, UK
  • Samuel Aparicio,

    1. Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
    2. Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
  • Carlos Caldas

    Corresponding author
    1. Cancer Research UK, Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, UK
    2. Department of Oncology, University of Cambridge, Cambridge, UK
    3. Cambridge Breast Unit, Addenbrooke's Hospital, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, UK
    4. Cambridge Experimental Cancer Medicine Centre, Cambridge, UK
    • Corresponding author. Cancer Research UK, Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK. Tel.:+44 1223 769648; Fax:+44 1223 769510; E-mail: Carlos.Caldas@cruk.cam.ac.uk



Breast cancer is a group of heterogeneous diseases that show substantial variation in their molecular and clinical characteristics. This heterogeneity poses significant challenges not only in breast cancer management, but also in studying the biology of the disease. Recently, rapid progress has been made in understanding the genomic diversity of breast cancer. These advances led to the characterisation of a new genome-driven integrated classification of breast cancer, which substantially refines the existing classification systems currently used. The novel classification integrates molecular information on the genomic and transcriptomic landscapes of breast cancer to define 10 integrative clusters, each associated with distinct clinical outcomes and providing new insights into the underlying biology and potential molecular drivers. These findings have profound implications for the individualisation of treatment approaches, bringing us a step closer to the realisation of personalised cancer management in breast cancer, and they also provide a new framework for studying the underlying biology of each novel subtype.


Breast cancer remains one of the leading causes of cancer death in women, despite significant improvements in survival over the past 25 years. One of the greatest challenges faced by clinicians and researchers in this field is that breast cancer is not a single entity, but rather a heterogeneous group of several subtypes displaying distinct differences in biological and clinical behaviour. A primary aim in cancer management is to tailor clinical decisions to the individual, based on a detailed understanding of the molecular profile of the tumour and the likely clinical outcome of the individual's disease. This progress will facilitate personalised treatment approaches that are more targeted, have superior efficacy and are associated with less toxicity. Our increased knowledge of the genomic aberrations underlying human breast cancers, and of the molecular processes that are disrupted, is key to understanding the diversity of the disease and achieving the aims of personalised medicine. Over the past decade, the development of high-throughput technologies to study genetic, epigenetic and proteomic changes has allowed for rapid progress in our understanding of the complexity of breast cancer biology. Here, we review recent advances that have led to the integration of information on the genomic and transcriptomic landscapes of breast cancers to refine the molecular classification of the disease.

Current histopathological classification of breast cancer

The classification of invasive breast cancer currently involves the assessment of histological criteria encompassing both morphology-based and immunohistochemical (IHC) analyses. Traditional pathological parameters such as histological type, tumour size, histological grade and axillary lymph-node involvement have been shown to correlate with clinical outcome and provide the basis for prognostic evaluation (Elston et al, 1999). IHC markers such as the expression of hormone receptors (oestrogen (ER) and progesterone receptors (PR)) and the overexpression and/or amplification of the human epidermal growth factor receptor 2 (HER2) provide additional therapeutic predictive value and are of key importance in guiding treatment selection (Harris et al, 2007).

Histopathological subtypes and tumour grade

The vast majority of breast carcinomas (∼70–80%) are described as invasive ductal carcinomas not otherwise specified (IDC-NOS) based on architectural patterns and cytological features (Ellis, 2003). In contrast, around 25% of breast cancers are characterised according to ‘histological special types’ such as lobular, tubular, medullary and metaplastic carcinomas (Ellis, 2003). At the molecular level, each histological special type appears to be more homogeneous than IDC-NOS and is likely to be driven by key underlying molecular mechanisms (Weigelt et al, 2008). However, the majority of the special types are rare and to date, this has limited their analysis in large-scale molecular studies. In addition to histological tumour type, tumour grade is the other important intrinsic tumour characteristic that can be assessed by histopathological analysis. Tumour grade is an assessment of differentiation (tubule formation and nuclear pleomorphism) and proliferative activity (mitotic index), allowing tumours to be further stratified and providing key prognostic information (Rakha et al, 2010).

ER, PR and HER2

In conjunction with histopathological assessment, the standard evaluation of breast cancer for clinical purposes involves IHC characterisation of ER, PR and HER2 status. Hormone receptor-positive breast cancers account for around 75–80% of all cases and standardised IHC assays for the routine testing of ER and PR are used to guide the selection of patients for hormonal-based therapies. HER2 represents the only additional predictive marker currently in routine use. Approximately 10–15% of breast cancers have HER2 overexpression and/or amplification with around half of these co-expressing hormone receptors (Konecny et al, 2003). These patients are selected for anti-HER2 based therapies, including the humanised monoclonal HER2 antibody, trastuzumab, which targets the extracellular domain of the HER2 receptor. The remaining 10–15% of breast cancers are defined by hormone receptor and HER2 negativity (i.e., triple negative cancers), which represent a key clinical entity given their lack of therapeutic options (Dawson et al, 2009).
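The routine grouping described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and group labels are hypothetical, and the rule that a tumour counts as hormone receptor-positive when either ER or PR is positive is an assumption made for the sketch, not a statement of any particular testing guideline.

```python
def clinical_group(er_positive: bool, pr_positive: bool, her2_positive: bool) -> str:
    """Assign a routine clinical group from ER, PR and HER2 status.

    The labels returned here are illustrative, not an official nomenclature.
    """
    # Assumption: either receptor being positive makes the tumour HR-positive.
    hormone_receptor_positive = er_positive or pr_positive

    if her2_positive:
        # HER2-positive tumours may or may not co-express hormone receptors.
        return "HR+/HER2+" if hormone_receptor_positive else "HR-/HER2+"
    if hormone_receptor_positive:
        return "HR+/HER2-"
    # Negative for ER, PR and HER2.
    return "triple negative"


if __name__ == "__main__":
    print(clinical_group(er_positive=True, pr_positive=False, her2_positive=False))
    print(clinical_group(er_positive=False, pr_positive=False, her2_positive=False))
```

Keeping the three markers as separate boolean inputs mirrors how they are reported clinically and makes the HER2-positive split by hormone receptor status explicit.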

While the current classification of human breast tumours has been fundamental for prognostic and predictive evaluation, there remain a number of important limitations. First, considerable variation in response to therapy and clinical outcome still exists, even for tumours with apparent similarities in clinical and pathological characteristics. Second, this classification continues to provide limited insight into the complex underlying biology and the molecular pathways driving the disease in different subtypes.

Molecular classification of breast cancer

Gene expression profiling and the identification of intrinsic subtypes

Expression analysis using microarray-based technology has provided researchers with an opportunity to begin moving towards comprehensive molecular profiling of breast cancer. These efforts have resulted in the identification of clinically relevant molecular subtypes, and have provided early insights into the molecular heterogeneity of the disease (Perou et al, 2000; Sorlie et al, 2001, 2003; Hu et al, 2006). Five distinct intrinsic subtypes have been identified based solely on gene expression: luminal A, luminal B, HER2 overexpressing, basal-like and normal breast tissue-like. Differences in gene expression patterns reflect basic alterations in the cell biology of the tumours and importantly are associated with significant variation in clinical outcome (Sorlie et al, 2003). The prognosis of patients with ER-positive disease is largely determined by the expression of genes related to proliferation (Hu et al, 2006). More recently, the intrinsic classification has been refined in a PAM50 assay based on the expression of 50 genes designed to classify single samples into each of the five intrinsic subtypes (Parker et al, 2009; Nielsen et al, 2010).

Following the initial identification of the intrinsic molecular subtypes, gene expression studies have evolved and further sub-classification of breast cancers into new molecular entities has been proposed. For example, a detailed analysis of genes differentially expressed in ER-negative tumours has demonstrated that basal breast cancers are a heterogeneous group with at least four main subtypes (Teschendorff et al, 2007). Furthermore, this analysis revealed an immune response gene expression module, which identifies a good prognosis subtype in ER-negative disease. Other recent studies have also identified a new breast cancer intrinsic subtype known as Claudin-low or mesenchymal-like (Prat et al, 2010). This subtype is characteristically negative for ER, PR and HER2 and carries an intermediate prognosis between basal and luminal subtypes. Importantly, Claudin-low/mesenchymal tumours appear to be enriched with cells showing distinct biological properties associated with mammary stem cells and tumour initiating potential (Hennessy et al, 2009; Lim et al, 2009; Lehmann et al, 2011; Bruna et al, 2012).

In parallel with the identification of the intrinsic subtypes, gene expression profiling has also been used by several groups to identify distinct prognostic signatures (van de Vijver et al, 2002; van't Veer et al, 2002; Paik et al, 2004). Two of these signatures, Mammaprint (a microarray-based assay of the Amsterdam 70-gene breast cancer signature) and OncotypeDX (a PCR-based assay of a panel of 21 genes) have been approved for clinical use and are now being tested in randomised clinical trials (Cardoso et al, 2008; Sparano and Paik, 2008).

Subtypes defined through IHC markers

IHC can reproduce a similar molecular taxonomy of the disease (Callagy et al, 2003; Abd El-Rehim et al, 2005; Jacquemier et al, 2005; Blows et al, 2010). The largest IHC study involving close to 12 000 samples (Blows et al, 2010) showed that the luminal and non-luminal subtypes are recognised primarily by the presence or absence of ER and PR expression, respectively. These two groups can be further separated on the basis of HER2 expression with luminal HER2-positive tumours most closely resembling the luminal B subtype and non-luminal HER2 expressing tumours representing the HER2 molecular subtype. The triple negative subtype is characteristically negative for ER, PR and HER2 expression, but importantly, can also be divided into two further subgroups based on the expression of basal cytokeratins (such as CK5-6) and EGFR (epidermal growth factor receptor) (Nielsen et al, 2004; Blows et al, 2010). The six subtypes of breast cancer defined by this approach demonstrate distinct differences in terms of breast cancer survival (Blows et al, 2010). Furthermore, similarly to gene expression profiling, the IHC expression of proliferation markers such as Ki67 and Aurora A kinase is associated with prognosis in ER-positive disease (Cheang et al, 2009; Ali et al, 2012). In addition, BCL2 expression, as assessed by IHC, is a powerful predictor of favourable prognosis in breast cancer across different molecular subtypes (Dawson et al, 2010). Despite these findings, the routine assessment of IHC markers in addition to ER, PR and HER2 has not yet been implemented into standard clinical treatment guidelines.

Integrating changes at the genomic level into classification

The varied genomic landscape of breast carcinomas is not fully captured using histopathological or transcriptomic analysis. Distinct patterns of genomic rearrangements in breast cancer have been characterised (Stephens et al, 2009) and molecular portraits of breast cancer can be identified from studying the spectrum of copy number alterations using array comparative genomic hybridisation (aCGH) (Chin et al, 2007). For example, aCGH analysis has identified a novel genomic subtype of ER-negative breast cancer characterised by low genomic instability (Chin et al, 2007). Changes in gene expression patterns are influenced by the underlying genomic architecture and some features associated with the intrinsic subtypes have been defined by copy number profiling (Chin et al, 2006; Ding et al, 2010). Furthermore, measures of genomic complexity, such as the complex arm aberration index (CAAI), have been shown to provide important prognostic information in both ER-positive and ER-negative diseases (Russnes et al, 2010).

The emergence of next-generation sequencing technologies has now allowed the characterisation of the mutational landscape of the disease. These analyses have identified novel cancer genes found to be recurrently mutated in breast cancer (Shah et al, 2009, 2012; Banerji et al, 2012; Ellis et al, 2012; Stephens et al, 2012; TCGA, 2012). Although mutations in many of these genes are relatively infrequent, specific patterns of somatic mutations can be grouped according to their association with cellular pathways, underlying tumour biology and distinct clinical phenotypes. Furthermore, these studies have demonstrated the extent of heterogeneity across breast cancer genomes and have allowed further exploration into the role of intratumour heterogeneity (Shah et al, 2009, 2012; Nik-Zainal et al, 2012). The limitation of most of these first generation sequencing studies is the relatively modest number of samples analysed making it difficult to integrate this information with other IHC-based or expression-based classifiers.

Moving towards an integrated classification: METABRIC

All breast carcinomas show significant genetic diversity due to both inherited genetic variation and acquired genomic aberrations (Table I). Inherited variants consist of single-nucleotide polymorphisms (SNPs) and copy number variants (CNVs) and these changes form the background germline genetic landscape of the individual where a cancer might develop. Somatic genomic changes, which include single-nucleotide variants (mutations) and copy number aberrations (CNAs) are acquired and contribute to the initiation and progression of sporadic breast cancers. Genomic aberrations can contribute to carcinogenesis by inducing abnormal gene expression. Through the integrated analysis of both genomic and transcriptomic data across large numbers of breast cancers, the impact of genomic aberrations on the transcriptome can be appreciated. We have recently used this approach to characterise the genomic and transcriptomic architecture of 2000 breast tumours as part of METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) (Curtis et al, 2012).

Table I. Definition of genomic alterations
Genomic alteration | Definition | Origin | Description
SNP | Single-nucleotide polymorphism | Germline | Inherited genetic variation in the DNA sequence that occurs when a single nucleotide is altered
CNV | Copy number variant | Germline | Inherited alteration of DNA that results in an abnormal number of copies of one or more segments of DNA (1 kilobase or larger)
SNV | Single-nucleotide variant | Somatic | Acquired genetic variation in the DNA sequence that occurs when a single nucleotide is altered (i.e., point mutation)
CNA | Copy number aberration | Somatic | Acquired alteration of DNA that results in an abnormal number of copies of one or more segments of DNA (1 kilobase or larger)

In this analysis, both germline variants (CNVs and SNPs) and somatic aberrations (CNAs) were found to be associated with alterations in gene expression. However, CNAs accounted for the greatest variability in gene expression. Somatic CNAs were shown to modify the expression of genes located both in cis (nearby to the genomic aberration) and trans (distant to the genomic aberration), but the effects of cis-acting CNAs dominated. Clustering analysis of joint copy number and gene expression data from the cis-associated genes revealed 10 novel molecular subgroups (Figure 1; Table II). The 10 integrative clusters (IntClust 1–10) were each associated with distinct CNAs and gene expression changes (Figure 1). These clusters clearly demonstrated the heterogeneity present within tumours classified according to ER, PR and HER2 expression, and they divided all of the previously identified intrinsic subtypes into separate groups (Figure 2). Furthermore, the 10 groups were associated with distinct clinical features and outcomes (Figure 3). Here, we will provide an overview of each of the novel subtypes, including an analysis of the distribution of mutations in the TCGA data set (TCGA, 2012) (Figure 4), and summarise the new insights gained from this classification relating to the underlying biology and potential molecular drivers in each group.
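The distinction between cis and trans effects can be illustrated with a small sketch. Everything here is an assumption for illustration: the `Interval` type, the function name, the 1 Mb `window_bp` default and the coordinates in the usage lines are hypothetical, and the threshold actually used in the analysis is not stated in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # start position in base pairs (inclusive)
    end: int    # end position in base pairs (exclusive)


def association_type(cna: Interval, gene: Interval, window_bp: int = 1_000_000) -> str:
    """Label a CNA-gene pair as 'cis' (nearby) or 'trans' (distant).

    window_bp is a hypothetical distance threshold chosen for this sketch.
    """
    # A gene on a different chromosome is always distant from the aberration.
    if cna.chrom != gene.chrom:
        return "trans"
    # Gap between the two intervals; zero when they overlap or touch.
    gap = max(0, max(cna.start, gene.start) - min(cna.end, gene.end))
    return "cis" if gap <= window_bp else "trans"


if __name__ == "__main__":
    # Made-up coordinates, not real loci.
    amplicon = Interval("chrA", 55_000_000, 60_000_000)
    print(association_type(amplicon, Interval("chrA", 57_000_000, 57_100_000)))
    print(association_type(amplicon, Interval("chrB", 1_000_000, 1_100_000)))
```

Treating overlap as a gap of zero means genes inside an amplified segment are always labelled cis, which matches the intuition that amplification acts most directly on the genes it contains.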

Figure 1.

The genomic and transcriptomic landscape of the 10 integrative clusters. Genome-wide frequencies (Freq) of somatic copy number alterations (Y axis, upper plot) and the subtype-specific association (−log10 P-value) of gene expression (Y axis, bottom plot) based on the differential expression in each of the 10 integrative clusters compared to the rest. Regions of copy number gain are indicated in red and regions of loss in blue in the frequency plot (upper plot). Overexpressed genes are represented as positive and underexpressed genes as negative (lower plot). The left bars show the composition of each cluster in terms of the intrinsic subtypes (red: basal, pink: HER2, dark blue: luminal A, light blue: luminal B and green: normal-like).

Figure 2.

Relationship between the 10 integrative clusters and ER expression, HER2 expression and PR expression. IHC, immunohistochemistry; Expression, mRNA gene expression; SNP6, copy number alteration as assessed by Affymetrix SNP 6.0 array.

Figure 3.

The 10 integrative clusters represent distinct entities. (A) Distinct clinical features across the 10 integrated clusters. In the left plot, the coloured diamonds mark clinical features demonstrating significant subtype-specific associations. NPI, Nottingham prognostic index. The intensity of the colours in the right plot represents the frequency of each variable in every cluster. (B) Kaplan–Meier curves of disease-specific survival (left) and overall survival (right) across the 10 integrated clusters (truncated at 15 years). For each cluster, the number of cases at risk is indicated as well as the total number of deaths (in parentheses).

Figure 4.

The genomic instability and mutational landscape of the 10 breast cancer subtypes. (A) Genomic instability across the 10 integrative clusters. Genomic instability was measured by the area under the segmented means of the log ratios. (B) Somatic mutation spectrum across the 10 integrative clusters. The Cancer Genome Atlas Network has characterised the mutational landscape of ∼500 breast cancers, also classified into the 10 integrative clusters, using exome sequencing (TCGA, 2012). The top 10 genes mutated in each integrative cluster are represented. Mutational frequency is displayed on the Y axis. Red asterisks indicate genes where the mutational frequency shows a significant subtype-specific association.
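One way to read the instability measure in panel A is as an area computed over copy number segments. The sketch below is a rough interpretation, not the authors' implementation: the function name and the `(start, end, mean log ratio)` segment layout are hypothetical, and both the use of absolute values and the normalisation by total segment length are assumptions added here.

```python
from typing import Iterable, Tuple

# (start_bp, end_bp, mean_log_ratio) for one segment of a copy number profile.
Segment = Tuple[int, int, float]


def instability_score(segments: Iterable[Segment]) -> float:
    """Length-normalised area under the segmented mean log ratios.

    Assumptions: gains and losses both count (absolute value), and the
    area is divided by the total profiled length.
    """
    total_area = 0.0
    total_length = 0
    for start, end, mean_log_ratio in segments:
        length = end - start
        if length <= 0:
            raise ValueError("segment end must be greater than start")
        # Each segment contributes a rectangle: |mean log ratio| x length.
        total_area += abs(mean_log_ratio) * length
        total_length += length
    if total_length == 0:
        raise ValueError("no segments supplied")
    return total_area / total_length


if __name__ == "__main__":
    flat_profile = [(0, 100, 0.0), (100, 200, 0.0)]
    altered_profile = [(0, 100, -1.0), (100, 200, 1.0)]
    print(instability_score(flat_profile), instability_score(altered_profile))
```

Under these assumptions a flat, CNA-devoid profile scores zero, while a profile dominated by gains and losses scores higher regardless of their direction.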

Table II. Features of the integrative clusters
IntClust | Frequency (n, %) | Defining molecular features | Expression (n, %) | PAM50 (n, %) | Clinical features | Prognosis (5-year, 10-year DSS) | Genomic instability
1 | 139 (7%) | 17q23 amplification | ER+: 123 (88.49%); PR+: 60 (43.17%); HER2+: 20 (14.39%) | Basal: 9 (6.47%); HER2: 21 (15.11%); LumA: 11 (7.91%); LumB: 90 (64.75%); Normal: 8 (5.76%) | High grade | Intermediate (0.80, 0.69) | High
2 | 72 (4%) | 11q13/14 amplification | ER+: 69 (95.83%); PR+: 51 (70.83%); HER2+: 3 (4.17%) | Basal: 2 (2.78%); HER2: 6 (8.33%); LumA: 25 (34.72%); LumB: 36 (50%); Normal: 3 (4.17%) | No distinct clinical features | Poor (0.78, 0.51) | High
3 | 290 (15%) | Paucity of copy number changes | ER+: 278 (95.86%); PR+: 211 (72.76%); HER2+: 1 (0.34%) | Basal: 4 (1.39%); HER2: 9 (3.14%); LumA: 195 (67.94%); LumB: 43 (14.98%); Normal: 36 (12.54%) | Low grade; Low LN+ | Good (0.93, 0.88) | Low
4 | 343 (17%) | CNA devoid | ER+: 238 (69.39%); PR+: 155 (45.19%); HER2+: 20 (5.83%) | Basal: 64 (18.71%); HER2: 34 (9.94%); LumA: 106 (30.99%); LumB: 29 (8.48%); Normal: 109 (31.87%) | Low grade | Good (0.89, 0.76) | Low
5 | 190 (10%) | ERBB2 amplification | ER+: 79 (41.58%); PR+: 40 (21.05%); HER2+: 181 (95.26%) | Basal: 21 (11.05%); HER2: 108 (56.84%); LumA: 18 (9.47%); LumB: 33 (17.37%); Normal: 10 (5.26%) | Younger age at diagnosis; High grade; High LN+ | Poor (0.62, 0.45) | Intermediate
6 | 85 (4%) | 8p12 amplification | ER+: 85 (100%); PR+: 36 (45.88%); HER2+: 3 (3.53%) | Basal: 3 (3.53%); HER2: 10 (11.76%); LumA: 23 (27.06%); LumB: 43 (50.59%); Normal: 6 (7.06%) | No distinct clinical features | Intermediate (0.83, 0.59) | High
7 | 190 (10%) | 16p gain, 16q loss, 8q amplification | ER+: 187 (98.42%); PR+: 150 (78.95%); HER2+: 2 (1.05%) | Basal: 3 (1.59%); HER2: 9 (4.76%); LumA: 123 (65.08%); LumB: 41 (21.69%); Normal: 13 (6.88%) | Older age at diagnosis; Low grade | Good (0.94, 0.81) | Intermediate
8 | 299 (15%) | 1q gain, 16q loss | ER+: 297 (99.3%); PR+: 236 (78.93%); HER2+: 1 (0.33%) | Basal: 1 (0.33%); HER2: 9 (3.01%); LumA: 192 (64.21%); LumB: 89 (29.77%); Normal: 8 (2.68%) | Older age at diagnosis; Low grade | Good (0.88, 0.78) | Intermediate
9 | 146 (7%) | 8q gain, 20q amplification | ER+: 125 (85.62%); PR+: 79 (54.11%); HER2+: 10 (6.85%) | Basal: 20 (13.79%); HER2: 26 (17.93%); LumA: 24 (16.55%); LumB: 70 (48.28%); Normal: 5 (3.45%) | High grade | Intermediate (0.78, 0.62) | High
10 | 226 (11%) | 5q loss, 8q gain, 10p gain, 12p gain | ER+: 25 (11.06%); PR+: 19 (8.41%); HER2+: 6 (2.65%) | Basal: 202 (89.38%); HER2: 8 (3.54%); LumA: 1 (0.44%); LumB: 14 (6.19%); Normal: 1 (0.44%) | Younger age at diagnosis; High grade; Large tumours | Poor (0.71, 0.68) | Intermediate
IntClust, integrative cluster; DSS, disease-specific survival; LN+, lymph-node involvement.
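The frequency column of the table above can be checked with a short calculation. This sketch copies the per-cluster case counts from the table and recomputes each cluster's share of the total; the names `CLUSTER_COUNTS` and `cluster_percentages` are hypothetical, and rounding to whole percentages is assumed to match how the table reports them.

```python
# Case counts per integrative cluster, copied from the frequency column of the table.
CLUSTER_COUNTS = {
    1: 139, 2: 72, 3: 290, 4: 343, 5: 190,
    6: 85, 7: 190, 8: 299, 9: 146, 10: 226,
}


def cluster_percentages(counts: dict) -> dict:
    """Return each cluster's share of all classified cases, as a whole percentage."""
    total = sum(counts.values())
    return {cluster: round(100 * n / total) for cluster, n in counts.items()}


if __name__ == "__main__":
    print("Total classified cases:", sum(CLUSTER_COUNTS.values()))
    for cluster, pct in cluster_percentages(CLUSTER_COUNTS).items():
        print(f"IntClust {cluster}: {CLUSTER_COUNTS[cluster]} cases ({pct}%)")
```

The tabulated counts sum to 1980 cases, and the recomputed shares agree with the rounded percentages listed for each cluster.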

The ten integrative clusters

IntClust 1

Integrative cluster 1 is constituted by ER-positive tumours, predominantly classified into the luminal B intrinsic subtype. The subgroup typically has an intermediate prognosis, similar to that of IntClust 6 and 9 (Figure 3). All encompass a high proportion of higher proliferation ER+/luminal B tumours, and are characterised by relatively high levels of genomic instability (Figure 4). The defining molecular feature of IntClust 1 is amplification of the 17q23 locus (Figure 1), a region of amplification previously well described (Sinclair et al, 2003; Parssinen et al, 2007). IntClust 1 also has the highest prevalence of GATA3 mutations across all of the 10 clusters (Figure 4). These features separate IntClust 1 tumours from other ER-positive tumours previously grouped together within the luminal B intrinsic subtype.

Amplification of 17q23 in these tumours is associated with cis-driven overexpression of several adjacent genes including RPS6KB1, PPM1D, PTRH2 and APPBP2 (Figure 1). In particular, RPS6KB1 (ribosomal protein S6 kinase 1) and PPM1D (protein phosphatase 1D), show high cis outlying expression in this group and both have previously been implicated as potential oncogenic drivers (Sinclair et al, 2003). RPS6KB1 is a serine/threonine protein kinase that acts downstream of mammalian target of rapamycin (mTOR) signalling to regulate cell cycle, cell growth, proliferation and migration through translational control (Fingar et al, 2002, 2004; Hannan et al, 2003). PPM1D is a p53-inducible serine/threonine protein phosphatase known to dephosphorylate p38 mitogen-activated protein kinase (MAPK) and inhibit p38 MAPK-dependent phosphorylation of p53 leading to downregulation of p53-dependent transcription, inhibition of cell-cycle arrest and apoptosis (Bulavin et al, 2002). Both of these oncogene candidates represent novel potential therapeutic targets, highlighting the importance of understanding key genomic drivers within the subtypes and using this information to improve substratification within the disease.

IntClust 2

Integrative cluster 2 is comprised of ER-positive tumours and includes both luminal A and luminal B tumours. Remarkably, this subgroup is associated with the worst prognosis of all ER-positive tumours with a 10-year disease-specific survival rate of only around 50% (Figure 3). The defining molecular feature of this subtype is amplification of 11q13/14 (Figure 1) showing a characteristic ‘firestorm’ pattern identified by clustered narrow peaks of relatively high copy number gains (Hicks et al, 2006). This is reflected in the relatively high levels of genomic instability in this group (Figure 4). Amplification of 11q13/14 is well recognised in breast cancer, with several known and putative driver genes residing in this region including CCND1 (11q13.3), EMSY (11q13.5) and PAK1 (11q14.1) (Hughes-Davies et al, 2003; Santarius et al, 2010). Analysis of copy number data suggests two separate amplicons in this region; one amplicon centred around CCND1 at 11q13.3 and the other spanning UVRAG-GAB2 between 11q13.5 and 11q14.1 encompassing multiple genes that show strong cis outlying gene expression including PAK1, RSF1, EMSY, C11orf67 and INTS4 (Figure 1). Distinguishing a single driver in the region of 11q13/14 is challenging as the majority of individuals in this subgroup have amplifications involving multiple genes, suggesting that a combination of drivers is likely to be important rather than just a single oncogene. Pathway analysis in this subtype shows enrichment of genes involved in cell-cycle regulation, particularly the G1/S transition as exemplified by CCND1. These alterations are likely to explain the aggressive pathophysiology of this cluster and emphasise the importance of identifying this poor prognostic group within ER-positive subtypes.

IntClust 3

Integrative cluster 3 is composed primarily of luminal A cases and is enriched for histopathological subtypes that have a good prognosis such as invasive lobular and tubular carcinomas. Clinically, individuals within this subtype often present with small low-grade tumours and a low incidence of regional lymph-node involvement (Figure 3). At the molecular level, the subtype is characterised by low genomic instability, a very low prevalence of TP53 mutations, and a paucity of copy number and cis-acting alterations (Figures 1 and 4). However, of note, tumours within this subtype have the highest frequency of PIK3CA, CDH1 and RUNX1 mutations (Figure 4). Importantly, the subgroup is associated with the best prognosis of all the 10 integrative clusters with a 10-year disease-specific survival of around 90% (Figure 3). The excellent prognosis of this subtype emphasises the importance of identifying this cluster within the previously defined luminal A intrinsic subtype, as these individuals represent a distinct group that could potentially be spared treatment with systemic chemotherapy.

IntClust 4

Integrative cluster 4 is a unique cluster incorporating both ER-positive (n=238/343) and ER-negative (n=105/343) cases, including 26% of all triple negative tumours, and a mixture of intrinsic subtypes including basal-like cases (Table II). Importantly, the subtype is associated with favourable outcome and a 10-year disease-specific survival of around 80% (Figure 3). Similarly to IntClust 3, IntClust 4, the largest subtype of breast cancer (up to 17% of cases), is characterised molecularly by low levels of genomic instability and a ‘CNA-devoid’ flat copy number landscape (Figures 1 and 4). Around 20% of cases within this subtype demonstrate deletions at the T-cell receptor (TCR) loci on chromosomes 7 (TRG) and 14 (TRA), in the background of an otherwise genomically quiescent subtype. Many of the tumours within this subgroup show evidence of extensive lymphocytic infiltration and the observed deletions are the consequence of the somatic TCR rearrangement present in the infiltrating T cells. This was previously reported using single-cell sequencing in one case of triple negative breast cancer with extensive lymphocytic infiltration (Navin et al, 2011), but is now demonstrated using array profiling and a fairly large number of tumours. The genomic copy number loss at the TCR loci is associated in trans with an immune response expression signature mirroring the lymphocytic infiltration and probably explaining the favourable prognosis seen, in particular, for the triple negative tumours classified into this subtype. These findings, in a subset of basal triple negative tumours, support earlier observations using gene expression analysis (Teschendorff et al, 2007; Teschendorff and Caldas, 2008) and are also corroborated by a recent combined analysis of imaging and expression data (Yuan et al, 2012). The observations suggest that the presence of mature T lymphocytes in the tumour represents a specific immunological response to the cancer, a finding that could potentially be exploited in the development of future therapeutics.

IntClust 5

Integrative cluster 5 encompasses the ERBB2 amplified cancers composed of both HER2-enriched ER-negative (58%) and luminal ER-positive cases (42%). Women in the METABRIC study were enrolled before the general availability of trastuzumab, and as expected, this group demonstrated the worst disease-specific survival at 10 years of around 45% (Figure 3). In keeping with common clinical features of HER2-positive breast cancers, individuals within this subtype often present at a younger age, with high-grade tumours and involvement of regional lymph nodes (Figure 3). In addition to specific ERBB2 amplification at 17q12 (Figure 1), these tumours demonstrate intermediate levels of genomic instability and a high proportion of TP53 mutations (in >60% of cases) (Figure 4). In contrast to the HER2-enriched intrinsic subtype, IntClust 5 identifies almost all cases with ERBB2 amplification, thus grouping all individuals that might benefit from HER2-related targeted therapy into a single subtype.

IntClust 6

Integrative cluster 6 represents a distinct subgroup of ER-positive tumours, comprising both luminal A and luminal B cases. Clinically, this cluster shows an intermediate prognosis and a 10-year disease-specific survival of around 60% (Figure 3). Molecularly, this subtype is characterised by specific amplification of the 8p12 locus (Figure 1) and high levels of genomic instability (Figure 4). Notably, tumours within this cluster demonstrate the lowest levels of PIK3CA mutations across all of the ER-positive cancers (Figure 4). The genomic landscape is dominated by cis-acting alterations associated with the 8p12 amplicon, a region previously known to be commonly amplified in ER-positive breast cancers, which encompasses the known oncogenic driver ZNF703 (Holland et al, 2011; Sircoulomb et al, 2011; Slorach et al, 2011; Figure 1). ZNF703 (zinc finger protein 703) is a transcriptional repressor that regulates genes involved in key cancer phenotypes such as increased proliferation, invasion and the balance of the progenitor/stem cell compartment (Holland et al, 2011). Similarly to IntClust 2, identification of this more aggressive group of ER-positive/HER2-negative tumours within the luminal intrinsic subtypes may assist in improving the stratification and prediction of outcome in women with ER-positive disease.

IntClust 7

Integrative cluster 7 comprises predominantly ER-positive luminal A tumours and identifies a good prognostic subgroup with 10-year disease-specific survival rates of around 80% (Figure 3). As for cases in IntClust 3, the majority of individuals within this cluster present with low-grade well-differentiated tumours that display both ER and PR positivity (Figure 3). However, unlike the paucity of copy number changes seen in association with IntClust 3, IntClust 7 is characterised by intermediate levels of genomic instability, specific 16p gain and 16q loss, as well as a higher frequency of 8q amplification (Figures 1 and 4). Interestingly, tumours within IntClust 7 also demonstrate the highest frequency of MAP3K1 and CTCF mutations across all clusters (Figure 4).

IntClust 8

Integrative cluster 8 shares similarities with IntClust 7 and encompasses ER-positive tumours predominantly of the luminal A intrinsic subtype. As described for IntClust 7, individuals within IntClust 8 also present with low-grade well-differentiated tumours (Figure 3), and the subgroup is associated with a good prognosis and similar 10-year disease-specific survival rates of around 80% (Figure 3). This subgroup, however, is characterised molecularly by the classical 1q gain/16q loss event that corresponds to a common unbalanced translocation event (Kokalj-Vokac et al, 1993; Russnes et al, 2010), unlike IntClust 7, which lacks the 1q alteration but maintains the 16q changes (Figure 1). Although genomic instability is generally lower in well-differentiated breast carcinomas (Figure 4), the frequent identification of 1q gains and 16q losses that characterise the IntClust 8 subtype is well recognised in low-grade invasive ductal carcinomas (Roylance et al, 1999). Furthermore, tumours within IntClust 8 demonstrate high levels of PIK3CA, GATA3 and MAP2K4 mutations (Figure 4). Together with IntClust 3 and IntClust 7, IntClust 8 separates tumours previously grouped under the luminal A intrinsic subtype into three distinct biological subgroups driven by specific genomic aberrations (Figure 3).

IntClust 9

Integrative cluster 9 comprises a mixture of intrinsic subtypes, but includes a large number of ER-positive cases of the luminal B subgroup. Similarly to IntClust 6, IntClust 9 shows an intermediate prognosis with a 10-year disease-specific survival of around 60% (Figure 3). This cluster is characterised by high levels of genomic instability and the highest level of TP53 mutations among the ER-positive subtypes (Figure 4). Molecularly, it is defined by 8q cis-acting alterations and 20q amplification (Figure 1). In conjunction with IntClust 1 and IntClust 6, IntClust 9 shows a high proportion of cases with deletions of PPP2R2A, on chromosome 8p. PPP2R2A (protein phosphatase 2 regulatory subunit B alpha) is a serine/threonine phosphatase integral to several signal transduction pathways. Loss of transcript expression of PPP2R2A appears to predominate in mitotic ER-positive breast cancers typically of the luminal B intrinsic subtype, such as those grouped into IntClust 1, 6 and 9. Mutations and methylation silencing of PPP2R2A have recently been reported in other solid malignancies (Tan et al, 2010; McConechy et al, 2011), suggesting a possible role for PPP2R2A as a putative tumour suppressor in tumours associated with the luminal B subtype, in particular those associated with this integrative cluster.

IntClust 10

Integrative cluster 10 incorporates mostly triple negative tumours (n=190/320 classify into this cluster) from the core basal-like intrinsic subtype. Although the subtype represents a high-risk group in the first 5 years after diagnosis, beyond 5 years the prognosis for this subgroup is relatively good (Blows et al, 2010; Figure 3). Clinically, these women usually present at a younger age with high-grade and poorly differentiated tumours (Figure 3). These breast cancers have the highest rates of TP53 mutations despite displaying only intermediate levels of genomic instability (Figure 4). Molecularly, the subtype is characterised by copy number alterations involving 5q loss and gains at 8q, 10p and 12p (Figure 1). In particular, 5q deletions are associated with a basal-specific trans gene expression module enriched for many checkpoint, DNA damage repair and apoptosis genes such as AURKB, BCL2, BUB1, CDCA3, CDCA4, CDC20, CDC45, CHEK1, FOXM1, HDAC2, IGF1R, KIF2C, KIFC1, MTHFD1L, RAD51AP1, TTK and UBE2C. These transcriptional changes reflect the high mitotic index typically associated with this subgroup. Of note, TTK (MPS1), a dual-specificity kinase that assists AURKB in chromosome alignment during mitosis, is upregulated in association with 5q loss, and high levels of TTK have recently been reported to promote aneuploidy in breast cancer (Daniel et al, 2011). These findings suggest that 5q deletions modulate the landscape of genomic instability and cell-cycle regulation alterations observed within this subgroup.


The future of breast cancer classification will involve multiple levels of assessment incorporating clinical information about the patient, tumour-specific information determined by histopathology, and molecular information revealed by genomic, transcriptomic and proteomic profiling to provide subtype-specific diagnostic, prognostic and predictive tests. At the genomic level, next-generation sequencing will allow the complete genomic landscape of somatic mutations, structural rearrangements, copy number alterations and epigenetic events to be assessed, adding complexity yet helping to further elucidate the mechanisms driving each subtype. As well as focussing on understanding intertumour heterogeneity, the interpretation and implementation of molecular classification systems will also need to take intratumour heterogeneity into account (Caldas, 2012). Integrating multiple layers of complex data into robust classifiers, and implementing these in the routine clinical management of breast cancer patients, will present many challenges. However, rapid progress is being made, allowing us to move closer to individualising the diagnosis and treatment of breast cancer.

Through the integrated analysis of CNAs and their effect on gene expression, novel molecular subgroups have been identified that help refine our understanding of breast cancer heterogeneity. The integrative clusters provide important biological insights into the potential molecular drivers and pathways underlying certain groups, and these have distinct implications for the rational development of targeted therapeutics. In addition, the analysis highlights key subtypes, such as those devoid of somatic CNAs, that will require more extensive molecular profiling. As we gain a better appreciation of the heterogeneity of breast cancer, it is clear that thousands of patients must be studied to fully appreciate the clinical implications of novel and rare subgroups of the disease. Furthermore, more sophisticated model systems, both in vitro and in vivo, including the use of xenograft tumours derived directly from primary clinical material, will be needed to dissect the biological complexities of this heterogeneity. These approaches will provide the opportunity to understand the molecular events and pathways underpinning each group, and potentially allow the identification of the cell of origin or tumour-initiating cell in each subtype. These findings will undoubtedly lead to fundamental advances in our approach to the classification, biological characterisation and management of breast cancer.

Conflict of Interest

The authors declare that they have no conflict of interest.