Early Detection and Diagnosis
Prediction of metastasis from low-malignant breast cancer by gene expression profiling
Article first published online: 27 NOV 2006
DOI: 10.1002/ijc.22449
Copyright © 2006 Wiley-Liss, Inc.
Additional Information
How to Cite
Thomassen, M., Tan, Q., Eiriksdottir, F., Bak, M., Cold, S. and Kruse, T. A. (2007), Prediction of metastasis from low-malignant breast cancer by gene expression profiling. Int. J. Cancer, 120: 1070–1075. doi: 10.1002/ijc.22449
Publication History
- Issue published online: 19 JAN 2007
- Article first published online: 27 NOV 2006
- Manuscript Accepted: 5 OCT 2006
- Manuscript Received: 20 JUL 2006
Funded by
- Clinical Institute and Regional Institute of Health Sciences Research
- Raimond and Dagmar Ringgård Bohns Fond
- Dagmar Marshalls fond
- AP Møllers Fond til Lægevidenskabens Fremme
- Fabrikant Einar Willumsens Mindelegat
- Kurt Boennelyckes fond
- Købmand Svend Hansens Fond
- Else Poulsens Mindelegat
- Bankdirektør Hans Stener og Hustru Agnes Steners legat
- Danish Biotechnology Instrumentation Centre (DABIC)
- Clinical Institute and Regional Institute of Health Sciences Research at University of Southern Denmark
Keywords:
- low risk;
- breast cancer;
- microarray;
- prognosis;
- metastasizing
Abstract
Promising results for prediction of outcome in breast cancer have been obtained by genome-wide gene expression profiling. Some studies have suggested that extensive overtreatment of breast cancer patients might be reduced by risk assessment with gene expression profiling. A patient group hardly examined in these studies is the low-risk patients, for whom outcome is very difficult to predict with currently used methods. These patients do not receive adjuvant treatment according to the guidelines of the Danish Breast Cancer Cooperative Group (DBCG). In this study, 26 tumors from low-risk patients were examined with gene expression profiling. An intermediate-risk group of 34 low-malignant T2 tumors that fulfilled all low-risk criteria other than tumor size was included to increase statistical power. A 32-gene classifier, HUMAC32, was identified, and it predicted metastases with 80% sensitivity and 77% specificity. The classifier was also validated in an independent group of high-risk tumors, resulting in comparable performance of HUMAC32 and a 70-gene classifier developed for this group. Furthermore, the 70-gene signature was tested in our low- and intermediate-risk samples. The results demonstrated high cross-platform consistency of the classifiers. Higher performance of HUMAC32 than of the 70-gene classifier was demonstrated among the low-malignant cancers. This suggests that although the metastatic potential is to some extent determined by the same genes in groups of tumors with different characteristics and risk, expression-based classification specifically developed in low-risk patients has higher predictive power in this group.
Breast cancer patients are classified into groups with high or low risk of recurrence using a combination of clinical and pathological criteria. The high-risk group comprises the majority of patients, who are offered adjuvant systemic treatment, while, according to the Danish Breast Cancer Cooperative Group (DBCG), patients with low risk are not offered treatment after surgery. Prediction of risk is, however, not optimal: considerable overtreatment occurs in the high-risk group, and recurrences still occur in the low-risk group.
Recently, microarray-based gene expression profiling of breast tumors has shown promising results for improvement of risk classification. van't Veer et al.1 identified a 70-gene profile that could predict development of metastasis within 5 years with higher accuracy than could be obtained with classical methods among a group of relatively poor prognosis lymph-node-negative patients. Today the majority of these patients would have been offered adjuvant treatment. The 70-gene profile has been validated with similar results by the same group on a cohort of patients that included patients with lymph-node-positive disease.2 These studies were performed with a 25K chip with 60-mer oligonucleotides from Rosetta. In a similar study performed with the Affymetrix platform, a 76-gene profile was used to predict distant metastasis within 5 years in patients without lymph-node metastasis who had no adjuvant systemic treatment.3 Furthermore, by use of Affymetrix chips, a Swedish group has developed a 64-gene signature classifying patients into three risk groups.4
These studies have mainly addressed the problem of overtreatment in the high-risk group. The classification method was aimed at detection of the majority of patients developing distant metastasis and it has been possible to classify a considerable group of the nonmetastasizing tumors correctly.
Besides the overtreatment in the high-risk group, another important issue is to identify women who may benefit from a treatment they are not offered today. In the low-risk group of patients, ∼10% of patients experience recurrence, and a significant proportion of these patients would probably benefit from adjuvant treatment. This problem was hardly touched upon in the studies mentioned above, presumably because of a lack of frozen samples from small tumors and short follow-up. It therefore remains unclear whether a profile developed specifically in low-risk patients would be superior in this group of patients. We therefore studied a group of low-risk node-negative patients with long follow-up, with the aim of predicting metastasis in this group by gene expression profiling using a 29K chip composed of 60-mer oligonucleotides. We have developed a 32-gene classifier accurately predicting metastasis in this group of patients and performed an extended comparison of this profile with the 70-gene signature in the present data set and the data set reported by van de Vijver et al.2 Testing of the 70-gene signature in the present data set served as an independent validation of this signature in the low-risk group, where it has hardly been examined previously.
Material and methods
Patient samples
For a large cohort of breast cancer patients treated in the county of Funen from 1982 to 1999, tumor biopsies have routinely been collected. The biopsies were snap-frozen in liquid isopentane within 1 hour of surgery and stored at −80°C. Thirteen patients who fulfilled the low-risk criteria, defined by DBCG, and who developed metastasis were matched with 13 patients who did not develop metastasis within 10 years (these 26 tumors are referred to as low-malignant T1 or low-risk tumors). The DBCG low-risk criteria are essentially the same as the criteria defined at the eighth St. Gallen meeting in 2003 (ref. 5; http://www.dbcg.dk): node negative, T ≤ 20 mm, grade = 1 if ductal carcinoma NOS (not otherwise specified), receptor positive, and age ≥ 35. In addition, a group of 17 intermediate-risk tumors (node negative, T ≤ 50 mm, grade = 1 if ductal carcinoma NOS, receptor positive and age ≥ 35) from patients who developed metastasis and 17 matched nonmetastasizing tumors was included. This group did not fulfill the DBCG low-risk criteria because the tumor size was 20–50 mm, but satisfied all other criteria (this group is referred to as low-malignant T2 tumors). All pathological examinations were performed at Odense University Hospital and grading was performed with the Bloom and Richardson scheme.6 Of the 30 patients who developed metastasis, 4 had regional metastasis while the remainder had distant metastasis. The tumors were matched pairwise according to tumor type as well as year of surgery, tumor size and age, as far as possible. Thirty-two of the patients had mastectomy (16 of whom developed metastasis) and 28 had lumpectomy followed by radiation therapy. None of the patients received adjuvant systemic therapy. The mean follow-up time was 12.3 years for the patients who did not develop metastasis. The study was approved by the regional ethical committee of Southern Denmark.
A data set from the Netherlands National Cancer Institute (NKI),2 was downloaded from the Rosetta Web site (http://rii.com). Two hundred forty-one samples from patients who developed distant metastases within 5 years or were disease free for at least 5 years were used. One hundred thirteen of these were lymph-node-positive.
Gene-expression analysis
RNA was purified from tumor biopsies with Trizol (Invitrogen) followed by further purification and DNAse treatment on RNeasy micro columns (Qiagen). The RNA yield was determined by OD 260 measurements and the quality was assessed on a Bioanalyzer (Agilent Technologies). The RNA integrity numbers (RIN) were calculated with Agilent 2100 Expert software. For gene expression analysis, a 29K oligonucleotide chip with duplicate measurement of each gene was used as previously described.7 This chip is manufactured with a 60-mer oligonucleotide set from Sigma-Compugene. To examine the 70-gene set, the 70 60-mer oligonucleotides with the same sequences as reported1 were spotted on the chips in duplicate.
Probes were prepared from 500 ng RNA using the Ambion (Austin, TX) Amino Allyl MessageAmp™ aRNA kit as described in the manual. Briefly, cDNA synthesis was performed with oligo-dT primers also containing a T7 RNA polymerase recognition site. After second cDNA strand synthesis, transcription was performed with T7 RNA polymerase, the resulting aRNA was purified, and 15 μg was labeled with Cy5 fluorescent NHS-esters. Universal human reference RNA (Stratagene, CA) was labeled with Cy3 and used as reference RNA on the chip. Hybridization was performed with hybridization buffer #1 (Ambion) as described in the manual using Lifterslip cover slips (Erie) and intensive mixing (150 RPM) in a New Brunswick shaking incubator for 16 hr at 42°C. Washing was performed 2 times for 15 min with 2× SSC, 0.1% SDS and with 0.5× SSC, 0.1% SDS, both at 42°C, followed by a 1 min wash in 0.2× SSC at 25°C and drying by centrifugation. Scanning was performed with an arrayWoRx Scanner (Applied Precision, WA).
Data analysis
Identification of spot locations and quantification were performed using arrayWoRx software. The raw intensity data were normalized using the variance stabilization normalization procedure,8 implemented in the R package vsn.
To perform classification of outcome, the most discriminative genes were selected by applying the nearest shrunken centroids method9 in the R package pamr. Instead of filtering genes according to their differential expression in the classes, the method continuously shrinks the number of genes by increasing the amount of shrinkage until only a few have influence on classification. Gene selection was also performed by paired t-test using SAM (significance analysis of microarrays) to exploit the pairwise matching of the samples; this, however, resulted in slightly lower accuracy (data not shown). The selected genes were submitted to the support vector machines (SVM) procedure in the R package e1071 to build a hyperplane that separates the training set with maximal margin and is then used to classify new samples in the testing set.10 SVM was chosen for classification because of its effectiveness in tumor classification11 and because it outputs a meaningful probability of poor outcome. The cut off limit for poor prognosis was set conventionally to 0.5. Classification and cross validation were performed by leaving 1 matched pair out at a time: genes were selected and a prediction model was developed on the remaining samples, followed by prediction of the left-out pair. Cross validation was performed for all pairs, a method referred to as the 30 classifier scheme. An optimal gene list, HUMAC32, was developed by applying the nearest shrunken centroids method to the entire data set.
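The leave-one-pair-out splitting described above can be illustrated with a minimal Python sketch. It only generates the training and testing indices; gene selection and SVM fitting (done in R with pamr and e1071 in the study) would be run inside each fold on the training indices alone. The function name and the assumed sample layout (pair k occupies positions 2k and 2k + 1) are illustrative assumptions, not taken from the study.

```python
from typing import Iterator, List, Tuple


def leave_one_pair_out(n_pairs: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) sample indices, holding out one matched pair per fold.

    Assumed layout (hypothetical): pair k consists of sample 2*k
    (metastasizing) and sample 2*k + 1 (matched nonmetastasizing).
    """
    all_idx = list(range(2 * n_pairs))
    for k in range(n_pairs):
        test = [2 * k, 2 * k + 1]
        train = [i for i in all_idx if i not in test]
        yield train, test


# 30 matched pairs give 30 folds, each with 58 training and 2 testing samples.
folds = list(leave_one_pair_out(30))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Holding out both members of a pair together keeps the matched partner of a test sample out of the training set for that fold.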
To test HUMAC32 and a 70-gene classifier in the NKI data set, the data set was divided into a training set of 61 samples and a validation set of 180 samples. Training and prediction were performed with SVM. To test the performance of the 70-gene profile in the present HUMAC data, classification was performed by leave-one-pair-out cross validation and SVM as described above. The cut off limit was set to 0.5 for all classifications to obtain comparable results.
In addition, we tested for differentially expressed genes using SAM analysis in the samr package12 in R. Expression analysis systematic explorer13 (EASE) and gene set enrichment analysis14 (GSEA) were used for functional analysis of HUMAC32 and the entire data set, respectively, and p-values were corrected by the Bonferroni method. The p-value for overrepresentation of upregulated genes in the 32-gene classifier was calculated by binomial distribution. All mentioned R packages are implemented in the R-based Bioconductor package (http://www.bioconductor.org). Data are available from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo, Accession No. GSE4796).
Results
The patients included in this study fulfilled the DBCG low-risk criteria, except 34 patients who had T2 tumors (20–50 mm) but otherwise fulfilled the criteria. The clinical and pathological parameters for patients and their tumors are summarized in Table I (further patient information is available in supplementary Table III). RNA of high quality was obtained from all tumors. The RNA integrity number (RIN) had a median of 8.5 and a 90% interval of 7.0–9.6.
| | Metastasizing tumors | Nonmetastasizing tumors |
|---|---|---|
| Age at diagnosis | 53 | 56 |
| Tumor size | | |
| T1 tumors | 14 | 15 |
| T2 tumors | 27 | 25 |
| Year of surgery | 1993 | 1995 |
| Tumor types | 17 Invasive ductal carcinomas, NOS | 17 Invasive ductal carcinomas, NOS |
| | 2 Mucinous carcinomas | 2 Mucinous carcinomas |
| | 1 Papillary carcinoma | 1 Papillary carcinoma |
| | 10 Invasive lobular carcinomas | 10 Invasive lobular carcinomas |
To perform classification, 58 samples were used in the training set and testing was performed on the remaining matched pair. In this way cross validation of all samples was performed leaving 1 matched pair out at a time. The resulting output is a probability for each patient for having poor outcome (Fig. 1). Choosing a probability of 0.5 as the discriminating limit resulted in 80% sensitivity (24 out of 30 samples, from patients who later developed metastases, correctly predicted) and 77% specificity (23 out of 30 samples, from patients who did not develop metastasis, correctly classified). This classification scheme gave 30 different classifiers with slightly different gene sets. To obtain the best gene list, the nearest shrunken centroids method was applied to the entire data set and classification accuracy assessed for various shrinkage limits. This resulted in an optimal list of 32 genes referred to as HUMAC32 (supplementary Table IV).

Figure 1. Classification of low-risk and low-malignant T2 tumors with the 30 classifier scheme. Genes were selected with the nearest shrunken centroids method, and classification was performed with SVM and leave-one-pair-out cross-validation. The resulting probability of poor outcome is plotted versus the tumor number (the same order as in supplementary Table III). The metastases are presented to the left of the dashed line and nonmetastases to the right of this line. Low-risk tumors are marked with asterisks and T2 tumors are marked with triangles. Samples above the horizontal line (probability of 0.5) are classified as poor prognosis tumors.
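The sensitivity and specificity quoted for the 0.5 cut-off follow directly from the counts given in the text (24 of 30 and 23 of 30). The Python sketch below shows the calculation; the function name is an assumption, and the probability values are toy placeholders chosen only to reproduce those counts, not the actual per-sample outputs.

```python
def sensitivity_specificity(probs, metastasized, cutoff=0.5):
    """Return (sensitivity, specificity) for a given probability cut-off.

    A sample is called 'poor outcome' when its probability lies above the cut-off.
    """
    tp = fn = tn = fp = 0
    for p, is_met in zip(probs, metastasized):
        predicted_poor = p > cutoff
        if is_met:
            tp += predicted_poor
            fn += not predicted_poor
        else:
            fp += predicted_poor
            tn += not predicted_poor
    return tp / (tp + fn), tn / (tn + fp)


# Toy placeholder probabilities: 24 of 30 metastasizing samples above 0.5,
# 23 of 30 nonmetastasizing samples at or below 0.5.
probs = [0.9] * 24 + [0.1] * 6 + [0.1] * 23 + [0.9] * 7
metastasized = [True] * 30 + [False] * 30

sens, spec = sensitivity_specificity(probs, metastasized)
print(f"{sens:.0%} {spec:.0%}")  # prints "80% 77%"
```

Passing a different `cutoff` shows how the two measures trade off against each other as the threshold moves.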
To test the prognostic value of HUMAC32 in higher risk patients the independent NKI data set including 241 tumors, mainly having high risk was used.2 Twenty-five of the 32 genes from HUMAC32 were contained on the Rosetta chip used for this data set. The large size of the data set offered the possibility to develop a classification profile in a training set and test in an independent set. Sixty-one samples that were part of the training set used by van't Veer et al.,1 were used for training to obtain results comparable with van de Vijver et al.,2 and to have comparable size of the training set used in the present data set (N = 58). The resulting probabilities for poor outcome for the 180 testing samples predicted with this classifier are shown in Figure 2a. The sensitivity is 81% and the specificity 57%. To compare the performance of HUMAC32 with the 70-gene classifier originally used for the NKI data set, classification was performed by applying SVM to the 70 genes and again using the 61 tumors as training set and 180 tumors as testing set. This resulted in 83 and 60% sensitivity and specificity, respectively (Fig. 2b). Because the 70-gene signature was developed with a correlation coefficient based classification method, this was also tested, using the 61 samples in the training set for calculation of an average good profile, resulting in 86% sensitivity and 58% specificity among the 180 testing samples (data not shown).

Figure 2. Evaluation of HUMAC32 (a) and the 70-gene signature (b) on 241 breast tumors from the NKI data set. Classification was performed with SVM by training on 61 tumors for both gene sets, and the resulting probability of poor outcome is plotted for each of the 180 testing tumors. The metastatic tumors are presented to the left of the dashed line and the nonmetastatic tumors to the right of the line. Samples above the horizontal line (probability of 0.5) are classified as poor prognosis tumors.
Finally, the performance of the 70 genes among low-risk patients was tested with the present data set. The expression values for the 70 genes were obtained by using the original 70 target sequences reported by van't Veer et al.1 Initially, their algorithm, which calculates correlation coefficients to an average good prognosis template, was tested, resulting in 77% sensitivity and 50% specificity (supplementary Fig. 5). To minimize the effect of the different platforms, classification was then performed as for HUMAC32, by leaving 1 matched pair of tumors out and performing cross validation using SVM, resulting in 73% sensitivity and 77% specificity (Fig. 3). The effect of using alternative oligonucleotide target sequences was also tested for the 70-gene profile (supplementary Fig. 6).

Figure 3. Performance of the 70-gene classifier in lower risk cancers. The 70 genes were used for classification of the present data set by using SVM and leave-one-pair-out cross-validation. The metastatic tumors are presented to the left of the dashed line and the nonmetastatic tumors to the right of the line. Samples above the horizontal line (probability of 0.5) are classified as poor outcome tumors. Low-risk tumors are marked with asterisks and low-malignant T2 tumors are marked with triangles. The tumors appear in the same order as in supplementary Table III.
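The correlation-based classification mentioned in the text (correlating each sample with an average good prognosis profile) can be sketched in Python as follows. This is an illustrative sketch only: the function names, the toy expression values and the threshold of 0.4 are hypothetical and are not taken from this article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_profile(samples):
    """Gene-wise mean over a list of expression vectors."""
    return [sum(col) / len(col) for col in zip(*samples)]


def classify(sample, good_template, threshold):
    """Call 'good' when the sample correlates with the template above the threshold."""
    return "good" if pearson(sample, good_template) > threshold else "poor"


# Toy expression values for 4 genes (hypothetical, for illustration only).
good_training = [[1.0, -1.0, 0.5, -0.5], [0.8, -1.2, 0.6, -0.4]]
template = average_profile(good_training)

THRESHOLD = 0.4  # hypothetical cut-off, an assumption of this sketch
print(classify([1.1, -0.9, 0.4, -0.6], template, THRESHOLD))
print(classify([-1.0, 1.0, -0.5, 0.5], template, THRESHOLD))
```

Because the template is built from training samples measured on one platform, applying it to data from another platform can shift the correlations, which is consistent with the authors' choice to retrain within each data set.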
The classifications performed above are based on discrimination by a probability cut-off of 0.5, and the sensitivity and specificity for the gene profiles are therefore sensitive to the results of a few samples near this threshold. By calculation of mean probability values, the entire range of probability values is taken into account (Table II).
| | HUMAC data (60 tumors): Metastases | HUMAC data (60 tumors): Nonmetastases | NKI data (241 tumors): Metastases | NKI data (241 tumors): Nonmetastases |
|---|---|---|---|---|
| 30 classifier/HUMAC32 | 0.70 | 0.35 | 0.62 | 0.37 |
| 70-gene signature | 0.63 | 0.46 | 0.67 | 0.43 |
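One simple way to read Table II is to take the difference between the mean probability for metastases and for nonmetastases as a measure of separation. The Python sketch below does this arithmetic on the tabulated values; using the plain difference as the separation measure is an assumption of this sketch, not a statistic defined in the article.

```python
# Mean probability of poor outcome from Table II: (metastases, nonmetastases).
table_ii = {
    ("HUMAC", "30 classifier/HUMAC32"): (0.70, 0.35),
    ("HUMAC", "70-gene signature"): (0.63, 0.46),
    ("NKI", "30 classifier/HUMAC32"): (0.62, 0.37),
    ("NKI", "70-gene signature"): (0.67, 0.43),
}

# Separation taken here as the difference of the two means (an assumption).
separation = {
    key: round(met - non, 2) for key, (met, non) in table_ii.items()
}

for (data_set, classifier), diff in separation.items():
    print(f"{data_set:5s} {classifier:22s} {diff:.2f}")
```

On these numbers the gap is 0.35 versus 0.17 in the HUMAC data and 0.25 versus 0.24 in the NKI data.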
To explore the biology of the genes in HUMAC32, functional annotation analysis with EASE was performed; it identified mitotic cell cycle (p = 0.0002), cell cycle (p = 0.03) and mitosis (p = 0.05) as significantly overrepresented gene categories. A striking feature of the 32 genes is that the majority (27) are upregulated in the metastasizing tumors (gene cluster 2, supplementary Fig. 7). This is a highly significant observation (p = 0.001), and it might reflect a tendency among a larger group of genes to be more highly expressed in metastasizing tumors. The latter was supported by SAM analysis revealing considerably more upregulated than downregulated genes (Fig. 4). Furthermore, gene set enrichment analysis (GSEA) on 16,025 well-annotated genes identified cell cycle genes as the only enrichment group to be significantly upregulated (p = 0.05), while no pathways were significantly downregulated (lowest p value 0.6).

Figure 4. Identification of genes with significant changes in expression for metastasizing tumors versus nonmetastasizing tumors by SAM analysis. Scatter plot of the observed relative difference versus the expected relative difference. The solid line indicates where the observed relative difference is identical to the expected relative difference. The dotted lines are drawn at a distance δ = 0.2 from the solid line. Red indicates genes that are significantly upregulated and green indicates genes that are downregulated.
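The binomial calculation for the overrepresentation of upregulated genes in HUMAC32 (27 of 32) can be sketched in Python as below. The article does not state the null proportion or whether the test was one- or two-sided; the null proportion of 0.5 and the one-sided upper tail used here are assumptions of this sketch, so its result need not reproduce the reported p = 0.001.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 27 of the 32 classifier genes are upregulated in metastasizing tumors.
# Null proportion 0.5 is an assumption, not a value given in the article.
p_one_sided = binomial_upper_tail(27, 32, p=0.5)
print(f"{p_one_sided:.2e}")
```

A different null proportion, for example the fraction of upregulated genes across the whole chip, can be passed as `p` to see how strongly the result depends on that choice.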
Discussion
We have analyzed the low-risk group of breast cancer patients that is not offered adjuvant systemic therapy according to the recommendations from DBCG. In contrast to several microarray profiling studies intended to reduce overtreatment, the aim has been to identify women who seem to be undertreated with the current protocol and to determine whether improvements in classification can be achieved by using a gene set and corresponding classification developed specifically in this group of patients. The low-risk group is particularly difficult to examine because the low frequency of recurrence requires a large cohort of patients. Because the metastases in this group often develop late, the follow-up needs to be very long. Furthermore, the tumors are small, causing low RNA yield and under-representation in tumor banks. Among the recurrences is a high proportion of contralateral cancers that are difficult to classify as metastases or new cancers. For this reason, distant and regional metastasis, which also represent the most severe forms of recurrence, were used as the endpoint.
By genome wide expression profiling we have established a 30 classifier scheme that predicts outcome accurately. Validation of the 30 classifier scheme and a 70-gene signature in the present data set resulted in higher sensitivity and better separation of samples with the 30 classifier scheme (Table II, Figs. 1 and 3). In the NKI data set, consisting of tumors mainly from patients with higher risk, the 70-gene profile had slightly higher sensitivity and specificity than HUMAC32. However, the separation of the samples was comparable for both signatures in this data set (Table II). The separation of samples is a more informative measure taking into account the actual probability-value of poor outcome for all samples compared with sensitivity and specificity that are sensitive to small changes in probability values for samples near the threshold. This indicates that the 32-gene signature has higher capability of prediction of metastasis in the low-malignant tumors and that the 70-gene signature performs slightly more precisely for higher risk cancers. The comparison is reasonable although the 70 genes were used with SVM in the present study, while a correlation coefficient based method was used by van't Veer et al., because the methods had very comparable performance (83% versus 86% sensitivity and 60% versus 58% specificity in the NKI data set by SVM and correlation method, respectively). The effect of different platforms used in the studies was minimized by training and prediction within each data set (classification of the present data with the average good prognosis profile calculated by van't Veer et al., resulted in very low specificity). Furthermore, the use of original target oligonucleotide sequences for the 70-gene signature on the present chip reduced platform differences. This did actually favor the 70-gene set because HUMAC32 was annotated to an incomplete target set with distinct sequences on the Rosetta chip. 
However, the effect of different target sequences is minor as illustrated by similar specificity and sensitivity obtained with original and alternative target sequences for the 70 genes (Fig. 3 and supplementary Fig. 6). Furthermore, the training set had comparable size in the 2 studies: 58 in the present study versus 61 from the NKI data set. The different approaches, leave-one-pair-out cross validation and division of samples in training and testing set, presumably did not bias the comparison because it can be shown that cross validation gives an almost unbiased estimate of the classification error.15 The more precise performance of HUMAC32 among low-risk samples was also supported by fewer misclassified T1 samples (4) compared with the 70-gene signature (6).
Nevertheless, the performance of the signatures based on the 32 genes identified in a group of low-risk patients, and the 70 genes identified in a group of somewhat higher risk patients, is rather similar. This indicates that the gene expression pattern reflecting the metastatic potential of a tumor can be present in tumors with different size, receptor status, nodal status and grade, and that the genes that are differently regulated are the same. This is supported by Wang et al., who found equal performance of a 76-gene profile in tumors of 10–20 mm and larger tumors.3 Furthermore, van de Vijver et al. reported comparable performance of the 70-gene profile among node-positive and node-negative patients.2 Comparison of gene expression profiles from primary tumors and metastases has also suggested that metastatic capability is an intrinsic feature.16, 17 However, there are indications that subgroups of patients with certain characteristics may be prognosticated more precisely with separate classifiers. For example, Dai et al. found a gene expression pattern that strongly predicted metastasis in a subgroup of patients with relatively high estrogen receptor expression for their age.18 The impact of the estrogen receptor expression was also emphasized by Wang et al., who improved the classification by developing separate classifiers for estrogen receptor positive and negative tumors.3 Chang et al. developed a wound response signature improving the classification of patients having poor prognosis according to the 70-gene signature.19 The similar performance of HUMAC32 and the 70-gene classifiers in both data sets indicates high cross-platform consistency. Furthermore, testing of the 70-gene signature in the present data set constitutes an independent validation of this signature and indicates fairly high performance in the low-risk group, where it has hardly been tested before.
The present tumor material has an overrepresentation of special type ductal carcinomas and lobular tumors compared with a clinical cohort, e.g., 33% invasive lobular carcinoma compared with the 10% generally observed.20 However, the sensitivity and specificity for ductal carcinomas and lobular carcinomas are comparable (82% versus 71% and 70% versus 80%, respectively, deduced from Fig. 1 and supplementary Table III). This may be supported by Korkola et al., who demonstrated no overall difference in gene expression between the 2 types, measured by hierarchical clustering.21
The clinical relevance of the current study is to prevent metastasis among low-risk patients. However, if a subgroup of patients from this group should be treated in the future, the positive predictive value of the classification should be high to avoid overtreatment. Because the present study is based on selected pairs of patients instead of a cohort, it is not possible to estimate this positive predictive value accurately. Including the entire cohort of patients in the available tumor bank fulfilling the low-risk criteria would entail ∼300 samples, and the predictive power for metastasizing tumors would still be determined by 13 tumors. Although only 26 low-risk tumors were included in this study, it is actually the largest of its kind. The cohort examined by van de Vijver et al. included only 22 low-risk patients, of whom 4 (all in the testing set) developed distant metastasis. In the study by Wang et al., 14 low-risk patients, including 3 who developed metastasis, were included in the testing sample set. Further large studies are needed to elucidate the potential of treatment decision by gene expression profiling in the low-risk group. Contrary to the DBCG recommendations, the St. Gallen consensus mainly advises endocrine therapy for low-risk patients and, to a lesser extent, no treatment. The indications for selection of patients who might do without treatment are very weak. This certainly results in prevailing overtreatment, and the long-term side effects of tamoxifen and aromatase inhibitors are not negligible.22 The present study demonstrates high sensitivity by gene expression profiling in this patient group, entailing a potential to reduce the overtreatment. Although this was also the aim for Wang and van de Vijver, the limited material impaired this. The cut off limit of 0.5 for probability of poor outcome will undoubtedly be optimized in a clinical setting to tailor the sensitivity and specificity according to the above-mentioned considerations.
A common evaluation of prognostic markers is comparison with existing classical markers in multivariate analyses. The present low-risk samples can, as mentioned, not be differentiated with existing methods. However, by expression profiling this group can be differentiated with the potential of targeted selection of patients who may benefit from treatment.
A striking difference between the present data set and the NKI data set is that a significantly higher specificity is obtained in the present data with both HUMAC32 and the 70-gene signature (Figs. 1–3). One explanation might be that several of the nonmetastatic patients in the NKI sample later had recurrence, whereas the long follow-up in the present data resulted in more reliable information concerning outcome in this group. The very low frequency of metastases in the low-risk group (3% within 10 years among patients in the county of Funen from 1982–1989) also makes the status of nonmetastasizing tumors more reliable.
In the present study, the matching of the tumors corrects for several factors that could bias the results. These include diagnostic procedures that have changed over time, e.g., implementation of more sensitive techniques for detection of lymph-node metastasis and receptor status. The sampling methods, which might have changed slightly, and the storage time at −80°C may have an impact on the expression profiles, but this bias is also minimized by the sample matching. Bias from technical variation during purification and the microarray procedure has also been minimized by simultaneous processing of the matched pairs.
The finding of cell cycle genes in the top ranking genes is not surprising. The 70-gene profile as well as the 76-gene profile reported by Wang both had considerable contents of cell cycle genes.1, 3 The finding of the same functional group in the different gene sets might explain why the different gene sets have similar performance despite relatively small overlap in the classifiers, which is also supported by recent meta-analyses.23, 24 However, there is actually a considerable overlap of eight genes between HUMAC32 and the 70-gene signature (MGC:42657, ZNF533, DIAPH3, CCNE2, ORC6L, HEC, PRC1 and MELK), and 2 of these genes (CCNE2 and HEC) are involved in cell cycle according to gene ontology. A substantial overlap of 12 samples that are misclassified by both HUMAC32 and the 70-gene signature supports the view that the same biological mechanisms are tracked by the classifiers.
The expression of four of the genes overlapping between HUMAC32 and the 70-gene classifier (CCNE2, HEC, PRC1 and MELK) has recently been associated with histological grade.25 The authors of that study suggested that grade-related genes have a major impact on the performance of the 70-gene classifier. Furthermore, they found a 97-gene grade classifier that predicted outcome more precisely than histological grading. The present study might support this, as the tumors are low-malignant and the four grade-related genes have prognostic value (Supplementary Fig. 7).
Conclusion
In the low-risk group of breast cancer patients, a subgroup experiences metastatic recurrence of the disease. We have developed a 32-gene profile, HUMAC32, which predicts metastasis in this group with a high degree of reliability. HUMAC32 performed better than the 70-gene classifier among the low-malignant cancers. However, the rather similar performance of these classifiers in the low-risk and the high-risk groups indicates that the same pathways may be represented in the gene sets. In the future, gene expression profiling may become an integrated part of breast cancer diagnosis and help to target treatment more specifically to the patients who benefit from it.
Acknowledgements
Laura van't Veer, Marc van de Vijver and colleagues are acknowledged for allowing us to use their data. The Danish Research Agency, through the Danish Biotechnology Instrumentation Centre (DABIC), is acknowledged for funding the Human MicroArray Centre. The Clinical Institute and the Regional Institute of Health Sciences Research at the University of Southern Denmark are acknowledged for funding the project. The following foundations are thanked for grants: Raimond and Dagmar Ringgård Bohns Fond, Dagmar Marshalls fond, AP Møllers Fond til Lægevidenskabens Fremme, Fabrikant Einar Willumsens Mindelegat, Kurt Boennelyckes fond, Købmand Svend Hansens Fond, Else Poulsens Mindelegat, and Bankdirektør Hans Stener og Hustru Agnes Steners legat.
References
- 1. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002; 415: 530–6.
- 2. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002; 347: 1999–2009.
- 3. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; 365: 671–79.
- 4. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res 2005; 7: R953–R964.
- 5. Meeting highlights: updated international expert consensus on the primary therapy of early breast cancer. J Clin Oncol 2003; 21: 3357–65.
- 6. Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer 1957; 11: 359–77.
- 7. Spotting and validation of a genome wide oligonucleotide chip with duplicate measurement of each gene. Biochem Biophys Res Commun 2006; 344: 1111–20.
- 8.
- 9. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 2002; 99: 6567–72.
- 10. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000; 97: 262–67.
- 11. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Inform 2002; 13: 51–60.
- 12. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001; 98: 5116–21.
- 13. Identifying biological themes within lists of genes with EASE. Genome Biol 2003; 4: R70.
- 14. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005; 102: 15545–50.
- 15. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 2006; 7: 91.
- 16. Gene expression profiles of primary breast tumors maintained in distant metastases. Proc Natl Acad Sci USA 2003; 100: 15901–5.
- 17. A molecular signature of metastasis in primary solid tumors. Nat Genet 2003; 33: 49–54.
- 18. A cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res 2005; 65: 4059–66.
- 19. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci USA 2005; 102: 3738–3743.
- 20. Infiltrating lobular carcinoma of the breast: tumor characteristics and clinical outcome. Breast Cancer Res 2004; 6: R149–R156.
- 21. Differentiation of lobular versus ductal breast carcinomas by expression microarray analysis. Cancer Res 2003; 63: 7167–75.
- 22. A comparison of letrozole and tamoxifen in postmenopausal women with early breast cancer. N Engl J Med 2005; 353: 2747–57.
- 23. Common markers of proliferation. Nat Rev Cancer 2006; 6: 99–106.
- 24. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA 2004; 101: 9309–14.
- 25. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 2006; 98: 262–72.
Supporting Information
This article contains supplementary material available via the Internet at http://www.interscience.wiley.com/jpages/1097-0215/suppmat .
