Expression profiling technology: its contribution to our understanding of breast cancer


  • E A Rakha,
  • M E El-Sayed,
  • J S Reis-Filho,
  • I O Ellis

    1. Department of Histopathology, Nottingham City Hospital NHS Trust, Nottingham University, Nottingham and The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, UK

I O Ellis, Department of Histopathology, Molecular Medical Sciences, Nottingham City Hospital NHS Trust, University of Nottingham, Hucknall Road, Nottingham NG5 1PB, UK. e-mail:


Breast cancer is a complex genetic disease characterized by the accumulation of multiple molecular alterations. Routine clinical management of breast cancer, however, relies on clinical and pathological factors, which seem insufficient to reflect the full clinical heterogeneity of tumours and are less than perfectly adapted to each patient. Recent advances in human genome research and high-throughput molecular technologies have made it possible to tackle the molecular complexity of breast cancer and have contributed to the realization that the biological heterogeneity of breast cancer has implications for treatment. Gene expression profiling of breast cancer has been performed using several approaches. This review describes gene expression profiling of breast cancer, the different approaches used and their impact on clinical management.


Abbreviations: ER, oestrogen receptor; HR, hormone receptor; RT-PCR, reverse transcriptase-polymerase chain reaction; TMA, tissue microarray


Breast cancer, which is the most common cancer in women,1,2 is an extraordinarily heterogeneous disease, genotypically, phenotypically and clinically. Current routine clinical management of breast cancer relies on a constellation of clinical and pathological prognostic and predictive factors to support treatment decision making. With the ever-expanding armamentarium of systemic therapy agents, there is a perception that the traditional prognostic factors currently available are not sufficient to reflect the whole clinical and molecular heterogeneity of the disease and are therefore less than perfectly adapted for future clinical use. Despite the strong overall association between the standard clinicopathological variables currently used in breast cancer management and patients’ prognosis and outcome,3 it has become clear that patients with similar clinical and pathological features may show distinct outcomes and vary in their response to therapy.4 Although algorithms based on clinicopathological data are now routinely used to define prognostically significant groups and to tailor systemic therapy for breast cancer patients (e.g. Adjuvant! Online), further improvements are required. For example, several studies have shown that approximately one-third of lymph node-negative breast cancer patients who are classified as lying within a ‘good prognostic group’ develop recurrence,5 whereas a similar proportion of node-positive patients remain free from distant metastases.6 On the other hand, a significant proportion of patients allocated to a poor prognosis group will never develop distant recurrence.7–9 Therefore, there is an increasing need for additional prognostic factors to improve patients’ risk stratification and the targeting of treatment for those who will truly benefit, thereby avoiding iatrogenic morbidity in those who will not.

Recent advances in human genome research and high-throughput molecular technologies have made it possible to begin to tackle the molecular complexity of breast cancer and have contributed to the realization that the biological heterogeneity of breast cancer has implications for treatment. Gene expression profiling, which refers to any method that can analyse the expression of genes in selected samples, has been widely applied to cancer research in the past few years. These techniques include differential display,10 serial analysis of gene expression,11 various proteomics approaches12 and gene expression microarrays,13 which are currently the most ‘high profile’ or ‘popular’ method. Given the apocalyptic dimensions of the promise of microarrays,14 the focus of this review will be on the contribution of microarray-based expression profiling analysis. The two main types of expression microarrays are cDNA microarrays and oligonucleotide microarrays.15,16 Both types of microarray are hybridized with cDNA or RNA samples obtained from tissues of interest to assess changes in their expression levels. Technical details about microarrays are beyond the scope of this review, and readers are referred to the numerous excellent reviews on this subject.

As these global expression profiling approaches can be applied to a relatively limited number of cases and yield candidate genes that require verification, other techniques have been applied. Reverse transcriptase-polymerase chain reaction (RT-PCR) allows simultaneous PCR amplification and detection of target DNA or cDNA sequences and provides semiquantitative assessment of the relative abundance of specific transcripts using gene-specific primers. In addition, tissue microarrays (TMAs)17 have provided a method for high-throughput protein expression analysis of large cohorts of archival formalin-fixed paraffin-embedded tissue samples that can be readily linked to clinicopathological and long-term follow-up databases. In fact, the development and application of these high-throughput molecular techniques have ushered in an era of genome-wide approaches to cancer prognostication and therapy prediction and are paving the way for comprehensive studies of genes, gene products and signalling pathways.

Broadly speaking, molecular profiling of breast cancers by gene expression microarrays can be performed in one of two ways: unsupervised and supervised analysis. Unsupervised analysis refers to an extensive set of methods, of which hierarchical clustering analysis has become the most popular, for partitioning tumour samples into groups or classes on the basis of gene expression profiles, regardless of other features.18 The main objective of this approach in cancer is to determine whether discrete subsets can be defined on the basis of gene expression profiles and to identify new classes (class discovery) that may have clinical significance and therefore to develop a new molecular taxonomy. Supervised analysis requires tumour cases to be allocated to specific groups based on clinical or pathological features (e.g. clinical outcome or response to therapy). There are two main subtypes of supervised analysis: class comparison and class prediction. The former aims to identify the transcriptomic differences between two classes of tumours, whereas the development of a ‘gene signature’ is the ultimate goal of the latter.19–21 This review will focus on the details of these three approaches in breast cancer.
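To make the idea of unsupervised class discovery concrete, the following is a minimal sketch of average-linkage hierarchical clustering using a correlation-based distance. It is an illustration only and not the pipeline of any study cited here: the sample names, expression values, choice of four marker genes and the use of average linkage are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical log-expression values for four marker genes per tumour.
# Gene order (assumed for illustration): ESR1, GATA3, KRT5, ERBB2.
profiles = {
    "T1": [8.0, 7.5, 1.0, 2.0],
    "T2": [7.5, 8.0, 1.5, 2.5],
    "T3": [1.0, 1.5, 8.0, 2.0],
    "T4": [1.5, 1.0, 7.5, 2.5],
    "T5": [2.0, 1.5, 2.0, 9.0],
    "T6": [1.5, 2.0, 2.5, 8.5],
}


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def hierarchical_clusters(data, n_clusters):
    """Agglomerative clustering with average linkage.

    Starts with one cluster per sample and repeatedly merges the two
    closest clusters until only n_clusters remain. No class labels or
    clinical data are used at any point.
    """
    clusters = [{name} for name in data]

    def linkage(c1, c2):
        dists = [pearson_distance(data[a], data[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


clusters = hierarchical_clusters(profiles, 3)
print(sorted(sorted(c) for c in clusters))
```

The number of clusters is chosen by the analyst here, which mirrors one of the practical difficulties of class discovery: the grouping depends on the gene set, the distance measure and where the dendrogram is cut.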

Class discovery in breast cancer (molecular taxonomy)

Class discovery studies have shaken the world of breast cancer research. The Stanford group22 pioneered this approach in proof of principle studies which demonstrated that breast cancer could be classified into distinct groups based upon their gene expression profiles and their similarity to normal cell counterparts. Using a hierarchical clustering method and an ‘intrinsic gene set’, they classified breast cancer into four ‘molecular’ classes. Apart from the intuitive separation of breast cancers into oestrogen receptor (ER)+ and ER− disease (the two main branches or clusters), additional smaller secondary clusters have also been identified. According to that study, the ER+ group is characterized by higher expression of a panel of genes that are typically expressed by breast luminal epithelial cells (‘luminal’ cancer). The ER− branch encompassed three subgroups of tumours: one overexpressing ERBB2 (HER2); one expressing genes characteristic of breast basal/myoepithelial cells (basal-like cancer); and another with a gene expression profile similar to normal breast tissue which consistently clustered together with normal breast samples and fibroadenomas. Subsequent studies have been able to identify consistently most of the key groups originally defined by Perou et al.;22 however, some subgroups have proved quite unstable. For example, the subclassification of luminal tumours has ranged from a single group to up to three groups; ‘normal breast-like’ cancers have appeared to be indistinguishable from the ER− cluster in most,22,23 but not all, studies.24 Alternative technological approaches to gene expression arrays have also emerged, including use of RT-PCR to detect expression of a panel of selected genes25–27 and protein biomarker expression of tissue sections using immunocytochemistry,28–32 which, interestingly, have been able to some extent to replicate this molecular classification (Table 1).

The importance of this molecular taxonomy is twofold: (i) cases in each distinct molecular group differ in their clinical behaviour, even though the classification system was not developed to predict outcomes; and (ii) transcriptomic analysis has provided some leads on genes that may drive each molecular subgroup. These genes, or their protein products, could potentially be developed as therapeutic targets (e.g. epidermal growth factor receptor and c-kit in basal-like cancers).

Table 1.   Examples of class discovery studies in breast cancer

| Reference | Tumours (no.) | Intrinsic gene set (total) | No. of classes | Name and no. (%) of the different molecular classes |
|---|---|---|---|---|
| A. cDNA microarrays | | | | |
| 22 | 65 | 496 (8102) | 4 | Luminal, 36 (58), HER2, 7 (11), basal-like, 8 (13) and normal breast like, 11 (18) |
| 23 | 78 | 456 (8102) | 6 | Luminal A, 32 (38), luminal B, 5 (6), luminal C, 10 (12), HER2, 11 (13), basal-like, 14 (16) and normal breast like, 13 (15) |
| 54 | 115 | 534 (8102) | 5 | Luminal A, 28 (36), luminal B, 11 (14), HER2, 11 (14), basal-like, 19 (24) and normal breast like, 9 (12) (37 cases were unclassified) |
| 75 | 99 | 706 (7650) | 6 | Luminal-1, 19 (19), luminal-2, 23 (23), luminal-3, 24 (24), HER2, 7 (7), basal-like-1, 16 (16) and basal-like-2, 10 (10) |
| 48 | 105* | 1300 (then 306 gene subset) | 6 | Luminal A, 89 (28), luminal B, 60 (19), HER2, 27 (9), basal-like, 84 (27), normal breast like, 21 (7) and INF regulated class, 25 (8) (9 cases unclassified) |
| 24 | 412 | 484 (7650) | 5 | Luminal A, 122 (30), luminal B, 54 (13), HER2, 43 (10), basal-like, 59 (17) and normal breast like, 91 (22) (43 cases were unclassified) |
| 100 | 81 | 2300 (8000) | 5 | Luminal A, 26 (31), luminal B, 7 (8), HER2, 15 (18), basal-like, 16 (19) and normal breast like, 6 (7) (13 cases were unclassified) |
| 39 | 599† | | 3 | ESR1+/HER2−, ESR1−/HER2− and HER2+ |
| 27 | 117 | 53 genes | 4 | Luminal, HER2, basal-like and normal breast like subsets |
| 26 | 199 | 47 genes | 12 | Twelve subgroups were identified, some of which correspond to the luminal A/B, normal breast like, HER2 and basal-like tumour subsets |
| 25 | 124 | 40 genes | 4 | Luminal, HER2, basal-like and normal breast like subsets |
| B. Immunohistochemistry | | | | |
| 28 | 1076 | 25 proteins | 6 | Group 1 (luminal-1), 336 (31%), group 2 (luminal-2), 180 (17%), group 3 (HER2), 234 (22%), group 4, 4 (0.4%), group 5 (basal-like), 183 (17%), group 6 (luminal-3), 139 (13%) |
| 101 | 166 | 15 proteins | 3 | HER2 overexpressing, CK5/6 positive (basal-like), and ER/PgR and luminal CK positive tumours |
| 29 | 438 | 31 proteins | 3 | Group-1, group-2 and group-3 using 21 markers; the same results were produced using a set of 11 markers |
| 30 | 552 | 26 proteins | 4 | Using 21 proteins, BC was classified into: (1) two classes (good prognosis and poor prognosis); (2) three clusters [two major clusters, A (471), including subclusters A1 (409) and A2 (62), and B (81)]; and then four clusters (proliferation, mitosis, differentiation and ER-related clusters) |
| 102 | 97 | 7 proteins | 8 | Two distinct groups characterized by ER status were identified. Then, 8 subsets of tumours were further identified according to the expression of other markers |
| 31 | 633 | 5 proteins | 4 | Clusters 1–4 |
| 103 | 236 | 34 proteins | 5 | Using 24 proteins: luminal-A, 61 (27%), luminal-B, 28 (12%), HER2, 48 (21%), basal-like, 29 (13%) and a so-called ‘multiple marker negative’ (unclassified) cluster, 63 (27%), characterized by the absence of specifying markers |
| 32 | 595 | 35 proteins‡ | 4 | Cluster A (lowest ER levels), cluster B, cluster C (highest ER and best prognosis) and cluster D |

  1. *105 tumours were used to derive the intrinsic gene signature and 311 cases as a test and validation set.

  2. †The authors of this study applied a different analytical approach to 599 microarrays from five separate cDNA microarray datasets and identified three main groups, collectively called the ESR1/HER2 subtypes.

  3. ‡This study included oestrogen receptor (ER) and ER-related proteins and AQUA-based objective quantitative analysis of tissue microarray for the classification.

Features of the molecular classes

Although gene expression studies have demonstrated that ER, ER-related genes and HER2 are important biological drivers for some of the individual subclasses that they define, the difference between these classes was not based on single genes or a specific pathway, but on a constellation of several groups of genes that make the fingerprint or ‘portrait’ characteristic of each class. For example, Charafe-Jauffret et al.33 have identified a set of 1233 genes differentially expressed between basal-like and luminal samples. In fact, no single gene can identify these classes reliably. For example, although ER expression is a key factor in these classifications, both ER+ and ER− samples displayed heterogeneous expression profiles, with the identification of at least two or three subgroups in each category with different behaviour and outcome. In other words, these novel molecular subtypes can be thought of as being defined by expression of a collection of genes. Most of the discriminator genes appear to be involved in cell cycle regulation, cell proliferation, cell signalling, hormone receptors (HR) and oncogenic pathways. However, it is likely that further clinical substratification will occur; for example, the existence of a molecular ‘apocrine’ breast cancer subtype with increased androgen signalling and frequent HER2 amplification has been reported.34 Another classification based on the gene signatures of RAS and other deregulated pathways has also been reported.35 Table 2 shows some features of the different molecular classes of breast cancer described to date.

Table 2.   Some aspects of the different molecular classes of breast cancer

| Class | % | Important aspects |
|---|---|---|
| A. ER-positive/luminal tumours (refs 7,54,67,75)* | 34–66% | |
| 1 Luminal-A | 19–39% | Demonstrate the highest expression of the ER and ER-related genes and show the best prognosis |
| 2 Luminal-B | 10–23% | Have profiles enriched for ‘luminal genes’ but show low to moderate expression of genes pertaining to the ER cluster. Compared with luminal-A, they may have a higher proliferation rate, express genes that seem to be shared with the basal-like and HER2 subtypes and are associated with less favourable outcome48,104 |
| B. ER− tumours | 30–45% | Characterized by lack of HR expression and low to absent expression of some other luminal markers. This class is further subclassified into at least three distinct subtypes that share a common feature (e.g. HR and GATA3 negativity), but are otherwise molecularly and biologically different |
| 1 Basal-like (refs 7,67,75)† | 16–37% | Express genes previously identified to be characteristic of basal/myoepithelial cells such as CK5 and CK17, integrin β4, laminin, c-KIT, α6-integrin, metallothionein 1X, fatty acid binding protein 7, P-cadherin, EGFR and NF-κB.22,48,105–107 In addition, it is now recognized that tumours from patients carrying BRCA1 mutations fall within the basal-like subgroup33,54,108–110 |
| 2 HER2+ (refs 4,48,54,67,111,112) | 4–10% | Express high levels of genes located in the HER2 amplicon (17q11), including HER2, GRB7 and GATA4, and show high levels of NF-κB activation. Both basal-like and HER2 tumours share certain additional features, such as high levels of p53 mutation,48 aggressive clinical behaviour, poor prognosis and lack of response to hormonal therapy |
| 3 Normal breast-like (refs 23,104,113)‡ | Up to 10% of all breast cancers | Characterized mainly by high expression of genes characteristic of parenchymal basal epithelial cells and adipose/mesenchymal stromal cells with low expression of genes characteristic of luminal epithelial cells. These tumours have a prognosis that seems to be better than that of basal-like cancers23,104 and do not appear to respond to neoadjuvant chemotherapy at the same rates as other tumours pertaining to the ER− cluster88,113 |

  1. ER, Oestrogen receptor; CK, cytokeratin; EGFR, epidermal growth factor receptor.

  2. *Although there is no general consensus, this class can be grouped into at least two subtypes, luminal-A and luminal-B.

  3. †Comprehensive reviews on the biology and clinical significance of basal-like cancers have recently been published.52,108,114

  4. ‡These are sometimes called unclassified, undetermined or null subtype.44,51,103,115

Although the identification of more than one subtype within the ER+/luminal tumours is logical and expected, it remains to be determined whether expression arrays provide any additional information to adequate histological grading and growth fraction (e.g. Ki67 proliferation) indices. Unfortunately, data on ‘head-to-head’ comparisons between microarray subclassification of luminal breast cancers and the stratification obtained by means of grade, Ki67 and HER2 expression are scant. In addition, the frequency of the HER2 class, as reported in these studies (4–10%), is much lower than the percentage of HER2 amplification and overexpression (20–25%) seen in human breast cancers.36,37 This low representation of the HER2 class in profiling studies may be due, first, to differences in the criteria of positivity when compared with the present cut-off for positivity in immunohistochemistry (IHC), defined as 10% of tumour cells,38 which may result in sampling bias in expression studies. Secondly, a significant proportion of ER+, HER2-expressing tumours have been shown to cluster together with other luminal B cancers and a small proportion of ER−/HER2+ cancers fall into the basal-like cluster.23,33,39,40 This is also supported by the findings in some studies, for example, that up to 50% of patients with HER2+ tumours have been classified as HR+,36,41,42 and up to 17% of HR+ tumours show HER2 positivity.36 It is likely that the difference in the incidence of HER2 tumours between expression profiling and IHC studies reflects some molecular heterogeneity of HER2-expressing tumours, and further investigation may help in understanding the biology of this class and in predicting response to therapy.

Interestingly, it has been shown that HER2-amplified cancers that express genes pertaining to the basal-like cluster at high levels show a poor response to trastuzumab plus vinorelbine.43 Furthermore, the biological significance of certain groups, such as normal breast-like cancers, is yet to be determined. However, their existence has significant clinical implications; in a way akin to basal-like cancers, the majority of normal breast-like tumours lack ER, progesterone receptor and HER2 expression (i.e. have a triple-negative phenotype). Thus, triple negativity should not be considered a surrogate for basal-like cancers. This issue is specifically covered by another review in this issue of Histopathology.

Importantly, most studies have reported that the best prognosis is seen in patients with luminal (ER+) tumours and, in particular, those of luminal-A subtype when subclassified, and the worst prognosis in HER2 and basal-like (mainly ER−) tumours. Although these findings are not novel and are to some extent to be expected, several points of clinical importance should be mentioned: (i) ER+ tumours are not a single entity and one subclass (luminal-B) is reported to show a poor outcome, comparable to the ER− basal-like and HER2 tumours; (ii) triple-negative tumours can be further classified into at least two distinct types, namely basal-like and normal breast-like groups, each with a distinct molecular signature and behaviour; (iii) most, but not all, basal-like and HER2 tumours have poor prognostic features as defined by routine pathology methods (e.g. histological grade 3 or lymph node positive), which may have important implications for outcome and clinical management. However, with variation in methodology and defining criteria for each class, and consequent variation in cohort size and composition, differences in the clinical significance of the resulting molecular classification are to be expected. For example, although some studies have reported a worse prognosis for the basal-like tumours, other studies have failed to find this association.44,45

TMA and molecular profiling

After the identification of prognostic classes and the recognition of potential clinical utility of gene expression profiles, the issue of validation has been raised. Most breast cancer gene expression studies to date have examined limited sample sizes, which are not sufficient to generate reliable prognostic data.19,46 This has been mainly due to practical hurdles. The technique requires access to large numbers of fresh frozen tumour samples that are linked to reliable clinical information. However, before routine clinical application of the results, large-scale studies of well-characterized series of breast cancer need to be carried out to demonstrate the robustness and independent significance of these profiling results. In addition, the suboptimal results of microarray-based expression profiling using RNA extracted from formalin-fixed paraffin-embedded samples, coupled with the cost, complexity and interpretation of expression arrays, limit the use of this technology in the routine clinical setting. For this reason, IHC applied to TMAs has been used for the validation of molecular profiling in directing the discovery of optimal biomarker panels with maximal prognostic or predictive value and is, of course, suitable for use in clinical practice. Although molecular profiling of breast cancers at the transcriptomic and proteomic levels has been shown to yield comparable results (Table 1), differences between the two methods have also been documented. Ginestier et al.47 have found a good correlation between the two methods in only one-third of the 15 molecules examined. However, given that these approaches identify prognostically meaningful groups, some have suggested that a global molecular analysis at both the RNA and protein levels would lead to a more accurate prediction of prognosis.47

We, and others (Table 1), have applied a panel of IHC biomarkers with known relevance to breast cancer to TMA preparations of a large number of breast cancers with long-term follow-up data. Using unsupervised analysis, these studies have classified breast cancer into distinct molecular subgroups closely comparable to those identified by cDNA microarrays, with similar clinical significance. Another approach that has been successfully adopted is the use of a panel of IHC markers to identify specific molecular classes. Although these studies have confirmed the clinical significance of a protein biomarker expression-based molecular classification of breast cancer, variations in the criteria for definition of each class have been seen, particularly in the definition of the luminal-B and basal-like groups, with subsequent differences in their final and concluding results. For example, although most cDNA expression studies have not characterized the luminal-B group of tumours based on the expression of HER2 protein itself, but by the expression of genes characteristic of HER2+ tumours and a high proliferative rate,23,48 many authors have used HER2 protein expression to differentiate between luminal-A and -B subtypes.49–51 Similarly, many different markers have been used to define basal-like tumours.52,53

Molecular taxonomy of breast cancer: critical views

Although molecular profiling of breast cancer has attracted considerable attention and much hope that its application would result in a dramatic improvement in breast cancer management, to date the actual practical achievements have been surprisingly limited. Indeed, certain critical issues have been raised regarding molecular taxonomy in its current format. How novel are these molecular subgroups? What is the clinical significance of the classes identified? How much additional information does this classification offer us over traditional methods and are these important for patient management? Does it really outperform current classification systems? These, and other, questions of academic and clinical interest are yet to be satisfactorily answered.

In fact, the additional information provided by class discovery/molecular taxonomy studies so far is mainly of academic interest rather than of practical value. Not even basal-like cancers, which have received unprecedented attention, have become routinely identified in clinical practice. The sceptics have argued that almost none of the existing molecular taxonomy studies has included optimally accrued histopathological data in the multivariate models and, therefore, the claim of independent prognostic significance offered by microarray-based molecular classification is, at the very least, questionable. Furthermore, molecular classification based on hierarchical clustering analysis cannot be applied prospectively; the multiple studies performed to date have made use of different intrinsic gene sets; the groups are not stable; and some have called into question the whole technique.34 It should also be noted that it is not possible to classify a significant proportion (6–36%)24,54 of breast cancers into any of the identified categories. In addition, the argument about ‘cell of origin’, although theoretically possible, is arguable, given that only a minority of luminal cells in normal breast are ER+, that there is no known cell of origin for HER2 or normal breast-like tumours and that animal models inactivating specific genes in luminal cells (e.g. BRCA1 and p53) led to the development of basal-like cancers.55 Finally, one could argue that what we are seeing with hierarchical clustering expression data is merely a reflection of the ER status/HER2 status/proliferation status of breast cancers, which could be easily determined using routinely applied techniques. However, a more balanced view of the contribution of these studies would be that they have provided the first step towards a refined, molecular and functional classification for breast cancers that may add important prognostic and predictive value to the current classification systems.

Class comparison

Class comparison is the analysis of gene expression in predefined groups of tumours with the aim of determining whether the expression profiles are different between groups or classes and, if so, to identify the differentially expressed genes.20 The major characteristic of class comparison studies is that the groups being compared are defined independently of the expression profiles. Examples include comparison between in situ and invasive disease,56–59 different histological types of breast cancer,60–64 different histological grades65,66 and tumours of lymph node-positive and node-negative status.67,68 Although the findings of most class comparison studies may not have been unexpected per se, given the known phenotypically and biologically significant differences between the studied classes, they illustrate the power and sensitivity of microarrays for molecular characterization. In addition, they provide a powerful and appropriate research approach to improve our understanding of the complex biology of breast cancer and, in particular, to the potential discovery of diagnostic and therapeutic targets. Moreover, such analyses highlight the capacity of this technology to provide additional information that may refine the current clinical diagnostic and prognostic tools.

Interestingly, class comparison studies have corroborated concepts obtained by means of traditional pathology methods and molecular genetic studies. For example, as previously demonstrated by IHC, the most differentially expressed gene between ductal and lobular carcinomas is CDH1, which encodes E-cadherin.60,61 In addition, a comparison between matched in situ and invasive ductal carcinomas has demonstrated that tumours of similar histological grade cluster together; thus grade rather than nodal stage predicts the transcriptional profiles of breast cancers.57 This observation is particularly important, because it supports the view that grade 1 and grade 3 tumours, as defined either histologically or genetically, are independent pathobiological entities rather than points in a continuum of cancer progression.69–71 Microarrays have also been used successfully to characterize and identify the main differences between ER+ and ER− tumours;72,73 such studies have demonstrated that these two subtypes are entirely different disease entities at the transcriptomic level, with dramatic genetic differences involving multiple critical events.
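The logic of a class comparison can be sketched in a few lines: the groups are fixed in advance, and each gene is scored for how strongly its expression differs between them. The example below ranks genes by a Welch t-statistic. It is a sketch under stated assumptions rather than the method of any cited study: the expression values and sample names are invented, the choice of Welch's statistic is ours, and a real analysis would also need to correct for testing thousands of genes at once.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log-expression values; sample and group names are invented.
expression = {
    "CDH1": {"D1": 9.1, "D2": 8.8, "D3": 9.4, "L1": 2.1, "L2": 2.6, "L3": 1.9},
    "ESR1": {"D1": 7.0, "D2": 3.0, "D3": 8.0, "L1": 7.5, "L2": 8.0, "L3": 6.5},
    "GAPDH": {"D1": 10.0, "D2": 10.2, "D3": 9.9, "L1": 10.1, "L2": 9.8, "L3": 10.0},
}
ductal = ["D1", "D2", "D3"]    # predefined class 1 (by histology, not by expression)
lobular = ["L1", "L2", "L3"]   # predefined class 2


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr, group_a, group_b):
    """Score every gene and return (gene, t) pairs, largest |t| first."""
    scores = {}
    for gene, values in expr.items():
        a = [values[s] for s in group_a]
        b = [values[s] for s in group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_genes(expression, ductal, lobular)
for gene, t in ranked:
    print(f"{gene}\t{t:.2f}")
```

In this toy dataset CDH1 is ranked first, echoing the ductal versus lobular comparison described above; the housekeeping-style gene GAPDH, given nearly identical values in both groups, falls to the bottom of the list.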

Class prediction

Class prediction is similar to class comparison, in that the groups for analysis are predefined. Indeed, many studies have included both class comparison and class prediction objectives. However, the major aim of class prediction studies is to identify a set of key genes (also known as a predictor, classifier or signature) that can accurately predict the class membership of new samples, based solely on that predefined predictor set. Such predictors can be used for many types of clinical management decisions, including risk assessment, diagnostic testing, prognostic stratification and treatment selection. Previous class prediction studies in breast cancer can be categorized into two main subtypes: prognostic and predictive class prediction. Prognostic class prediction includes poor-prognosis gene signatures that can discriminate between a good and a poor outcome by comparison of highly aggressive and less aggressive primary tumours,7,9 and recurrence score gene signatures defining tumours based on the risk of disease relapse.8,74–76 Predictive class prediction describes predictors of response to therapy.77–81
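As an illustration of how a predefined predictor set assigns new samples to a class, the sketch below implements a nearest-centroid classifier. This is a generic technique chosen for its simplicity and is not claimed to be the algorithm behind any of the signatures cited in this review; the three-gene "signature", the class labels and all expression values are hypothetical.

```python
from math import sqrt

# Hypothetical training data: expression of a three-gene signature in
# tumours whose outcome class is already known.
training = {
    "good": [[2.0, 1.5, 8.0], [2.5, 1.0, 7.5]],
    "poor": [[7.5, 8.0, 2.0], [8.0, 7.0, 1.5]],
}


def centroid(vectors):
    """Mean expression of each signature gene across a set of tumours."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled):
    """Build one centroid per predefined class."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}


def predict(centroids, sample):
    """Assign a new sample to the class with the closest centroid."""
    def distance(u, v):
        return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    return min(centroids, key=lambda label: distance(centroids[label], sample))


model = train(training)
new_tumour = [7.0, 7.5, 2.5]   # measured only for the signature genes
print(predict(model, new_tumour))
```

The point of the design is that, once the centroids are fixed, a new tumour can be classified on its own, which is what allows a signature of this kind to be applied prospectively, in contrast to hierarchical clustering of a whole cohort.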

Prognostic class prediction in breast cancer was pioneered by the study of Van’t Veer et al.,8 who used cDNA microarray analysis of primary breast tumours and identified a list of 70 discriminatory genes (Amsterdam 70-gene prognostic signature). This ‘poor’ prognostic signature was found to be strongly predictive of a short interval to distant metastasis in lymph node-negative patients.7,8,82 Subsequently, microarray studies have been published which have described other prognostic signatures with clinical significance (Table 3). Some of these predictors have also been successfully developed using RT-PCR74,83 or IHC30 on formalin-fixed paraffin-embedded tissue.

Table 3. Examples of class predictive studies and the different gene signatures identified

| No. | Reference | Patient cohort | Gene signature (total) | Biological hypothesis |
| --- | --- | --- | --- | --- |
| Prognostic gene signatures | | | | |
| 1A | 8 | 98 LN-negative patients, < 55 years using cDNA microarrays | Amsterdam 70-gene signature (24,479) | Clinical outcome |
| 1B | 7 | 295 LN−/+ patients, < 53 years with early-stage disease using cDNA microarrays | | |
| 1C | 82 | 307 (T1-T2) patients (< 61 years) with long-term follow-up using cDNA microarrays | | |
| 2A | 9 | 286 LN-negative patients using cDNA microarrays | Rotterdam 76-gene signature (22,283) | |
| 2B | 116 | 171 LN-negative patients using cDNA microarrays | | |
| 2C | 117 | 198 LN-negative systematically untreated patients using cDNA microarrays | | |
| 3 | 118 | 159 tumour samples using cDNA microarrays | 64-gene signature (44,792) | |
| 4 | 119 | 98 tumours using cDNA microarrays | 62-gene signature | |
| 5 | 120 | 135 early-stage BC patients and validated on 715 patients from two external studies (using cDNA microarrays) | 70-gene signature (they also identified a common prognostic module of 29 gene signature) (22,575) | |
| 6 | 121 | 135 early-stage BC patients and applied to 877 ER+ patients using cDNA microarrays | 52-gene signature (they also identified a common prognostic module of 17 gene signature) (22,575) | |
| Invasiveness gene signature (IGS) | | | | |
| 7 | 122 | Based on comparison between the gene expression profiles of tumorigenic breast-cancer cells and normal breast epithelial cells using cDNA microarrays | 186-gene IGS | Tumorigenic cancer cells; CD44+; CD24–/low |
| Wound-response gene-expression signature | | | | |
| 8 | 123 | 295 early breast cancer patients using cDNA microarrays | 442-gene signature (24,479) | Wound healing and tumour progression |
| Hypoxia gene signature | | | | |
| 9 | 124 | Derived by analysing the temporal changes in global transcript levels in response to hypoxia in cultured mammary cells using cDNA microarrays. Validation in 78 locally advanced breast cancers and 295 early-stage breast cancer samples | | Variation in the global transcriptional response to hypoxia and clinical outcome |
| Recurrence score (recurrence gene signature) | | | | |
| 10A | Genomic Health | Using RT-PCR in formalin-fixed paraffin-embedded tissue | The Oncotype DX assay; recurrence score (21 genes) | Clinical outcome |
| 10B | 74 | 668 tamoxifen-treated, hormone receptor-positive breast cancer patients using RT-PCR | | |
| 10C | 83 | 220 were LN-negative patients who died from breast cancer. Controls = 570 | | |
| 10C | 125 | 99 node-negative and node-positive breast cancer patients | 28-gene signature of recurrence and metastases | |
| Other signatures | | | | |
| 11 | 126 | 251 p53-sequenced primary breast tumours | 32-gene p53 signature | Functional status of p53 |
| 12 | 127 | 1153 cancer patients diagnosed with 11 different types of cancer including breast cancer | 11-gene Death-from-cancer signature | BMI1 oncogenic pathway; self renewal |
| 13 | 128 | 311 LN−/+ patients < 55 years | 50-gene signature (24,479) | Associated with a poor outcome, but only in ER positive tumours |
| 14 | 76 | 189 invasive breast carcinomas and 597 independent tumours as a validation set | 97-gene histological grade and tumour progression signature | Histological grade and tumour progression |
| 15 | 66 | 666 ER+ breast tumours | 97-gene histological grade and tumour progression signature | |

LN, Lymph node; ER, oestrogen receptor; RT-PCR, reverse transcriptase-polymerase chain reaction.

Other groups have devised signatures to predict surrogates of clinical aggressiveness, such as lymph node metastasis;67,68 however, the applicability of this type of signature remains to be determined. Others have used a gene expression grade index, derived from comparison between grade 1 and grade 3 tumours, to reclassify patients with histological grade 2 tumours into two subgroups with a high versus a low risk of recurrence. This approach may allow the prognosis of patients with grade 2 cancers to be predicted more accurately.65,76 Screening for genes associated with the ability of tumour cells to form metastases has been undertaken and a breast cancer ‘lung metastasis’ signature, linked to a poor clinical outcome, has been identified.84 Moreover, predictors relating to specific therapeutic approaches have been sought. One of these, based on the ratio of the expression levels of two genes, one encoding homeobox B13 (HOXB13) and the other the interleukin 17B receptor (IL17BR), assessed by RT-PCR in formalin-fixed paraffin-embedded tissue, was developed to determine the risk of recurrence in women with node-negative, ER+ breast cancers treated with tamoxifen.85,86 However, validation of the clinical utility of the HOXB13:IL17BR ratio has proven controversial.86,87
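The idea of a two-gene expression ratio measured by RT-PCR can be sketched as follows. This is an illustration only: it assumes that both genes are measured as threshold cycle (Ct) values with ideal amplification efficiency (a doubling per cycle), so that expression is proportional to 2 to the power of minus Ct. The function names, the Ct values and the cut-off are hypothetical and are not taken from the published HOXB13:IL17BR assay.

```python
def two_gene_log2_ratio(ct_gene_1, ct_gene_2):
    """log2(gene 1 / gene 2) expression ratio from Ct values.

    A lower Ct means higher expression, so with ideal efficiency
    log2(ratio) = Ct(gene 2) - Ct(gene 1).
    """
    return ct_gene_2 - ct_gene_1

def classify_by_ratio(log2_ratio, cutoff):
    """Dichotomize samples at a cut-off (hypothetical value, for illustration)."""
    return "high ratio" if log2_ratio > cutoff else "low ratio"

# Hypothetical Ct values for one tumour sample
ct_hoxb13 = 24.0
ct_il17br = 27.0

log2_ratio = two_gene_log2_ratio(ct_hoxb13, ct_il17br)
fold_ratio = 2 ** log2_ratio
group = classify_by_ratio(log2_ratio, cutoff=0.0)  # cut-off is an assumption
```

Because the result depends on a single cut-off applied to a continuous measurement, differences in normalization or in the chosen threshold between laboratories are one plausible source of inconsistent validation results.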

Several other such ‘predictive’ class prediction studies have addressed whether molecular profiles can be identified which correlate with response to specific forms of systemic treatment. Many descriptions of potential predictive markers of response to both neoadjuvant and adjuvant chemotherapy have been reported.77–80,88 These have used supervised analysis of gene expression data from responsive compared with non-responsive tumours in order to identify a panel of clinically useful discriminatory genes. Preliminary data published in this area are promising, but are hampered by small sample size.81,89 Currently there is no signature which can be applied to specifically target systemic therapy based on gene expression profile, and data are scant to support such classifiers in selection of treatment in routine clinical practice.90 However, sufficiently powered studies using this approach are eagerly awaited, as preliminary studies have proved the principle that identification of drug-specific signatures may be achievable.81
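The supervised comparison of responsive with non-responsive tumours can be sketched as a gene-ranking step. The example below assumes a Welch t-statistic as the ranking criterion, which is a common but not the only choice and is not claimed to be the method of the cited studies; all gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic comparing the means of two groups of values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(expression, responder_flags, top_n):
    """Return the top_n genes ranked by absolute t-statistic."""
    scores = {}
    for gene, values in expression.items():
        resp = [v for v, r in zip(values, responder_flags) if r]
        non = [v for v, r in zip(values, responder_flags) if not r]
        scores[gene] = welch_t(resp, non)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_n]

# Hypothetical expression values for six tumours (three responders, three not)
expression = {
    "GENE_A": [5.1, 4.9, 5.3, 2.0, 2.2, 1.9],
    "GENE_B": [3.0, 3.4, 2.9, 3.1, 3.3, 3.0],
    "GENE_C": [1.0, 1.2, 0.9, 2.5, 2.4, 2.7],
}
responder = [True, True, True, False, False, False]

top = rank_genes(expression, responder, top_n=2)
```

With thousands of genes and only a few dozen tumours, some genes will rank highly by chance alone, which is why small sample size limits the reliability of the panels derived in this way.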

The advances outlined above have motivated the introduction of randomized clinical trials that apply these new technologies. The Breast International Group has launched a European Organisation for Research and Treatment of Cancer-led clinical trial called MINDACT (Microarray In Node negative Disease may Avoid ChemoTherapy). This is based on upfront patient stratification using the 70-gene predictor, now available as the commercial MammaPrint® assay, developed and marketed by Agendia (Agendia, Amsterdam, the Netherlands). The results from this trial, which aims to recruit 6000 patients, may help resolve many existing questions and may reveal an improved prognostic gene set from analysis of a large cohort of patient samples. Material from clinical trials has also been used to validate an alternative approach to gene expression analysis, the ‘Oncotype DX assay™’ (Genomic Health, Redwood City, CA, USA), which was developed to determine the risk of recurrence in women with node-negative, ER+ breast cancer who had received treatment with tamoxifen. This predictor system, which calculates a recurrence score based on the expression of 21 genes, uses RT-PCR analysis of routine formalin-fixed paraffin-embedded tissue samples. It has recently been shown to predict benefit from chemotherapy in patients entered into the NSABP B-20 trial, and a trial assigning individualized options for treatment (TAILORx) has been launched in order to validate its usefulness as a predictor of chemotherapeutic response in patients with ER+, node-negative tumours. Until the results of such clinical trials are available, caution should be exercised when tailoring systemic therapy for breast cancer patients solely on the basis of alternative and novel predictor assays.

Translation of gene signature to clinical practice

Some authors have warned of limitations and drawbacks of the current technology, particularly its ability to identify specific signatures that can be applied reliably to patient management (reviewed in Refs.20,91–93). They have argued that most of the existing studies are significantly underpowered, that results have not been sound and that conclusions have not been supported by the authors’ data. Studies have also suffered from bias in sample selection, in statistical analyses and in the analysis of data based on preconceptions of outcome. Moreover, several authorities have described different signatures of apparent independent significance, but very limited overlap of their component genes or even specific pathways has been observed when comparing profiles designed to address a similar question, which is puzzling and counterintuitive. Furthermore, some signatures suffer from a lack of reproducibility in independent studies, and multiple signatures have been shown to have prognostic significance in the same cohort of patients.94 All of these findings have raised critical suspicion about the reliability of these results and their clinical relevance, if any. Consequently, the US Food and Drug Administration has launched the Microarray Quality Control project, involving 137 participants from 51 academic institutions and industrial partners, to address systematically the technical reproducibility of microarray measurements within and between laboratories, as well as across different microarray platforms.94 However, regardless of the arguable clinical applicability of these signatures, current progress and development in this field is unprecedented. A balanced view about the contribution of these signatures would be that this technology should not be expected to replace current traditional diagnostic algorithms, but should be integrated within these and may contribute additional complementary prognostic information, which should improve patient management. Realistically, it is only a few years since the introduction of this technology and therefore, although the results so far are at the least promising, it is not surprising that it is yet to find a stable long-term role in the routine clinical management of breast cancer.
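The limited overlap between signatures noted above can be quantified in a simple way. The sketch below assumes that each signature is represented as a set of gene identifiers and uses the Jaccard index as the overlap measure; this choice of measure and the placeholder gene names are assumptions for illustration and do not reproduce any published comparison.

```python
def jaccard(signature_a, signature_b):
    """Shared genes divided by all genes present in either signature."""
    a, b = set(signature_a), set(signature_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two hypothetical signatures designed to address the same clinical question
signature_1 = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
signature_2 = {"GENE_D", "GENE_E", "GENE_F", "GENE_G"}

shared = signature_1 & signature_2
overlap = jaccard(signature_1, signature_2)
```

A low gene-level overlap does not by itself show that two signatures disagree, since different genes may act as surrogates for the same underlying pathway; comparing the classes assigned to the same patients is a complementary check.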

An alternative approach to the direct application of gene signatures is the translation of such profiles to protein expression characteristics, using IHC on formalin-fixed paraffin-embedded tissue, i.e. the commonplace tumour tissue resource available in most hospital laboratories. This may, however, also not be a straightforward task, because: (i) there is a plethora of signatures reported to have independent significance, but which show minimal overlap, so that the choice of appropriate protein biomarkers is troublesome; (ii) the difference in expression levels of genes in the various ‘gene signatures’ can be low (1.5- or twofold), and such potentially subtle changes in protein products are unlikely to be detectable by routine IHC methods; (iii) many of the genes in these signatures do not have validated antibodies available to their respective proteins; (iv) in order to simulate gene expression microarrays for identification of tumour classes, the protein expression needs to be scored reliably and reproducibly as a continuous variable and the results analysed as such, which again forms an obstacle to implementation in routine practice. However, the picture is not as bleak as it seems. Ongoing research may lead to the development of signatures comprising a limited number of genes with validated antibodies available and suitable for IHC detection of their respective proteins. Given the development of increasingly sophisticated image analysis methods, one can envisage that this process will be amenable to automation. This would hypothetically provide a uniform, objective and less costly means of routine protein analysis.

Some have interpreted the membership of a gene in a given signature as proof of the importance of this gene in breast cancer biology. We believe that this is a misconception, for several reasons. Predictive gene signatures suffer from instability of their membership, and genes in one given set are not necessarily superior to others in predicting disease outcome. As described above, different genes in other sets appear to have the same significance. Surprisingly, the statistical methods used to devise these gene signatures have led to the inclusion of very few genes known to play pivotal roles in specific classes of breast cancer. In fact, one could argue that most of the genes identified in such signatures to date are mere surrogates of the crucial pathways related to development and behaviour of breast cancer. This has important implications; the fact that a gene belongs to a given signature does not per se make it functionally or biologically relevant to breast cancer. In particular, its importance as a therapeutic target should not be taken for granted. Other technologies (e.g. siRNA screening) seem to be better suited to the identification of genes of functional relevance.

It is also important to mention that although some oncologists and scientists95 have envisaged microarrays replacing histopathologists, these gene expression profiling techniques are not, in our opinion, capable of replacing traditional methods of histopathological assessment. Rather, they should be seen as complementary to the well-established clinicopathological prognostic and predictive variables that currently form the basis of breast cancer management strategies. For example, the National Health Service Breast Screening Programme and the Royal College of Pathologists minimum dataset variables include tumour size, lymph node stage (1–3), tumour grade (1–3), lymphovascular invasion, hormone receptor (HR) and HER2 status. These can be combined with other factors, such as patients’ age and menopausal status, and used to identify clinically significant subgroups. Therefore, we believe that recent molecular profiling techniques will not simply replace the standard clinicopathological variables currently in use,19,96 but will complement them. In fact, the power of gene expression assessments, coupled with that of conventional prognostic markers, has the potential to improve and tailor treatment strategies for individual patients.


Gene expression profiling results have illustrated that molecular signatures contain information relevant to classification, prognosis and treatment prediction of breast cancers and have indicated that gene expression examination has the potential to be of direct benefit to patient management. It is likely that data generated from these techniques will have an important impact on the future drug and biomarker discovery process. Moreover, it is now recognized that grouping of tumours based on the expression of multiple genes and/or proteins is more powerful than assessment of individual genes for unravelling the complexity of breast cancer. In addition, gene profiling studies have already led to the identification of novel subtypes of breast cancer, such as the normal breast-like group, and have re-emphasized the clinical relevance of others, such as the basal-like subgroup.

These results are promising; some multigene prognostic predictors are already commercially available. However, the molecular-based clinical management of breast cancer still requires further investigation and validation (reviewed in Refs.20,97,98). There are still major limitations in our ability to assign consistently a molecular class to new cases of breast cancer. It remains to be determined how many molecular classes of breast cancer there are and how many classes can be identified reliably with currently available data. As discussed above, several gene signatures whose expression profiles have successfully predicted survival in patients with breast cancer have been identified; however, it remains to be determined whether these gene sets will be used individually, in conjunction (as elegantly illustrated by Chang et al.99) or if there will be a gene signature to rule them all.19 Nevertheless, this new technology holds the promise to help improve clinical management. Recent advances in microarray technology have demonstrated improved reproducibility, which may make its clinical application more achievable. The results of current and forthcoming clinical trials may demonstrate the appropriate clinical impact of the application of molecular profiling in breast cancer management. It should be emphasized, however, that this technology is likely to improve and to refine, rather than replace, current prognostic and predictive tools.