Breast cancer arises as genetic aberrations accumulate in precursor epithelial cells. Considerable information is available about the molecular alterations characterizing breast cancers, but knowledge of alterations in earlier lesions is limited. Recently, abnormalities have been appreciated in histologically normal breast epithelium. These abnormalities include allelic imbalance or loss of heterozygosity,1–8 aberrant methylation of p16INK49 and of RASSF1A,10 cytogenetic changes,11, 12 telomere shortening,13 loss of IGF2 imprinting,14 aberrant response to estrogen,15 loss of RARβ expression,16 aberrant phosphorylation of p38,17 and upregulation of EZH2.18 Some of these abnormalities have been detected in normal-appearing tissue adjacent to the tumor, and others have been found at a distance from it. Some abnormalities are concordant, and others are discordant, with abnormalities in the tumors themselves.
Despite the evidence supporting the existence of occult abnormalities in normal-appearing breast epithelium of breast cancer patients, the roles these abnormalities play in carcinogenesis are poorly understood. One approach to better understand their significance is to compare histologically normal breast epithelium of breast cancer patients to normal breast epithelium of women without breast cancer. The few studies comparing these groups have examined allelic imbalance, aneuploidy and methylation or expression of specific proteins, and have found abnormalities more frequently in patients with cancer than in controls.7, 8, 10, 12, 17–19 We hypothesized that by taking a comprehensive gene expression approach, we might detect consistent abnormalities in the normal-appearing epithelium of breast cancer patients, compared to controls. These abnormalities might suggest mechanisms predisposing to cancer, or activated early in carcinogenesis. If this hypothesis were true, then elucidating these abnormalities could enhance understanding of important functional alterations present early in carcinogenesis, suggest targets for cancer prevention and improve cancer risk assessment. To begin testing this hypothesis, we undertook the present study to identify gene expression differences in the histologically normal breast epithelium of breast cancer patients, compared to reduction mammoplasty controls.
Material and methods
After obtaining institutional review board approval, deidentified tissues not needed for pathological diagnosis were collected from breast cancer surgeries and reduction mammoplasties performed at Boston Medical Center. To preserve RNA quality, tissues were obtained within 1 hr of surgery and immediately snap-frozen in liquid nitrogen, embedded in OCT medium and stored at −80°C. Tissues from 2 groups were examined: (i) histologically normal tissue from breast cancer patients (CN) and (ii) tissue from reduction mammoplasty controls (RM).
Laser-capture microdissection, RNA isolation and amplification
These procedures were carried out to obtain RNA from homogeneous populations of breast epithelial cells of normal-appearing terminal ductal-lobular units (TDLU) of breast cancer patients and disease-free individuals, as described previously.20 Figure 1 is a representative photograph of a microdissection. Supplemental File 4 presents photographs of TDLUs from multiple cases. To obtain enough RNA, 2–3 TDLUs were microdissected per case. TDLUs from the CN group were “tumor-adjacent”, i.e., located 1–2 cm from the tumor, on blocks lacking malignant cells. Total RNA was extracted from the captured cells (Picopure RNA isolation kit, Arcturus Engineering) and 100 ng was used for T7-based RNA amplification (MessageAmp aRNA kit, Ambion, Austin, TX). To obtain enough amplified RNA (aRNA), a second round of RNA amplification was performed as described previously,20 except using 500 ng of aRNA from the first round amplification as starting material.
Hybridization, microarray data quantification and normalization
These procedures were carried out as described before.20 For each hybridization, 10 μg of fragmented, biotin-labeled aRNA were hybridized to U133A GeneChip arrays (Affymetrix, Santa Clara, CA), then washed, stained and scanned according to standard protocols (Affymetrix). The scanned arrays were quantified and scaled using the GCOS software package (Affymetrix). Each probeset's expression level was determined from the hybridization intensities of the 22 constituent probes using the Affymetrix Microarray Suite 5.0 (MAS5) algorithm. After removing probesets that lacked sequence-specific hybridization intensity in any sample, the final dataset included hybridization intensities for 14,681 probesets. These were log-transformed and used in subsequent analyses. The transcript interrogated by each probeset was determined using the NetAffx database.21
Identification of differentially expressed genes
Identification of genes differentially expressed between CN and RM samples was performed with Cyber-T,22 which combines Student's t test with a Bayesian estimate of the intragroup variance obtained from the observed variance of probesets at a similar expression level. We have used this approach previously23 as it provides increased sensitivity for identifying probesets with differential hybridization intensity without inflating the false-positive error rate. For the Cyber-T analysis, we set the Sliding Window Size parameter at 101 and the Bayes Confidence Estimate parameter at 10. To identify probesets with significant differential hybridization intensity between groups, we ranked probesets by their Cyber-T p-value and calculated a False Discovery Rate statistic.24 Hybridization intensities for the differentially expressed probesets were each z-score normalized (mean = 0, standard deviation = 1) and organized by hierarchical clustering using a Euclidean Distance Measure in DecisionSite for Functional Genomics (Spotfire, Somerville, MA).
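The variance regularization and false-discovery steps described above can be illustrated with a minimal sketch. It is not the Cyber-T code: the blending formula, the use of a running average of standard deviations as the background estimate, and the Benjamini-Hochberg procedure for the FDR are all assumptions made for illustration, and every function name is hypothetical. The defaults mirror the Sliding Window Size (101) and Bayes Confidence Estimate (10) given in the text.

```python
from math import sqrt
from statistics import mean, stdev


def background_sd(means, sds, window=101):
    """Average the SDs of the `window` probesets nearest in mean intensity."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    half = window // 2
    result = [0.0] * len(means)
    for rank, idx in enumerate(order):
        lo, hi = max(0, rank - half), min(len(order), rank + half + 1)
        result[idx] = mean(sds[j] for j in order[lo:hi])
    return result


def shrunken_variance(sample_var, background_var, n, conf=10):
    """Blend observed and background variance (assumed form of the estimate)."""
    return (conf * background_var + (n - 1) * sample_var) / (conf + n - 2)


def regularized_t(group_a, group_b, window=101, conf=10):
    """group_a / group_b hold one list of log intensities per probeset."""
    stats = []
    for group in (group_a, group_b):
        means = [mean(values) for values in group]
        sds = [stdev(values) for values in group]
        bg = background_sd(means, sds, window)
        n = len(group[0])
        var = [shrunken_variance(s * s, b * b, n, conf) for s, b in zip(sds, bg)]
        stats.append((means, var, n))
    (m1, v1, n1), (m2, v2, n2) = stats
    return [(a - b) / sqrt(x / n1 + y / n2) for a, b, x, y in zip(m1, m2, v1, v2)]


def benjamini_hochberg(pvalues):
    """Step-up FDR adjustment (assumed here; the text cites only an FDR statistic)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

With only 14 or 15 samples per group, the background term carries substantial weight (10 pseudo-observations), which is what stabilizes the variance estimate for probesets whose observed variance happens to be unusually small or large.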
Validation of microarray data
We used qRT-PCR to confirm gene expression levels of 4 of the 105 genes (CXCL2, FOS, FOSB, KLF6). Each gene was examined in 6 samples (3 CN and 3 RM). A total of 16 independent samples (8 RM and 8 CN) had sufficient remaining unamplified RNA for additional studies, and each was examined with 1–3 test genes, plus the control. For each qRT-PCR validation, 5 ng of total unamplified RNA were reverse transcribed (Multiscript RT and TaqMan RT reagent kit, Applied Biosystems, Foster City, CA). The RT reaction was performed using random hexamers, in a total volume of 25 μl and carried out at 25°C × 10 min, 37°C × 60 min, 95°C × 5 min. The PCR reaction was performed in a 25 μl volume, which included 11.25 μl cDNA solution, 12.5 μl Universal Mastermix (ABI) and 1.25 μl TaqMan gene expression assay (ABI) of the gene to be validated. PCR was performed at 95°C × 10 min followed by 40 cycles of 95°C × 15 sec and 60°C × 1 min. Amplifications for each of the 4 test and 1 control gene (GUSβ25) were conducted in duplicate and monitored (TaqMan Gene assays and Prism 7000 Sequence Detector System, ABI). Standard curves for each gene were generated from samples of known concentration. From these curves, the absolute quantity of each gene was determined for each sample, and the relative quantity of each test gene was calculated after normalization with GUSβ.
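The standard-curve quantification described above can be sketched as follows. This is an illustration under stated assumptions, not the instrument software's procedure: the curve is assumed to be a least-squares line of threshold cycle (Ct) against log10 of the known quantity, duplicate wells are assumed to be averaged after conversion to quantities, and all function names and Ct values are hypothetical.

```python
from math import log10
from statistics import mean


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [log10(q) for q in quantities]
    x_bar, y_bar = mean(xs), mean(cts)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to recover a quantity from a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def relative_quantity(test_cts, control_cts, test_curve, control_curve):
    """Mean test-gene quantity divided by mean control-gene (e.g., GUSB) quantity."""
    test_q = mean(quantity_from_ct(ct, *test_curve) for ct in test_cts)
    control_q = mean(quantity_from_ct(ct, *control_curve) for ct in control_cts)
    return test_q / control_q


# Hypothetical standards: a 10-fold dilution series with made-up Ct values.
curve = fit_standard_curve([1, 10, 100, 1000], [30.0, 26.7, 23.4, 20.1])
# Hypothetical duplicate wells for one test gene and the control gene.
ratio = relative_quantity([26.7, 26.8], [23.4, 23.5], curve, curve)
```

In practice each gene would have its own fitted curve; the same curve is reused for both genes here only to keep the example short.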
For immunohistochemical (IHC) corroboration of the microarray data, 5-μm serial sections were cut from paraffin blocks of both RM and CN samples, and mounted on Shandon Colormark Plus slides, deparaffinized in xylene and rehydrated in graded alcohol to water. The slides were steamed in Vector Retrieval Buffer for 60 min, blocked for 20 min with 10% normal goat serum in PBS and 30 min with Vector Avidin/Biotin blocking kit. The primary antibody (FosB rabbit monoclonal antibody, Cell Signaling, 17 μg/ml) at 1:50 dilution was incubated overnight at 4–6°C. Rabbit IgG (Vector, 5 mg/mL) was used as the negative control at the same concentration as the primary antibody. Using Vector Vectastain Elite ABC Kit, slides were then reacted with a biotinylated secondary antibody (goat anti-rabbit) and incubated with preformed avidin–biotin-peroxidase complex (ABC reagent). Diaminobenzidine (DAB) was used as a substrate. Sections were counterstained with hematoxylin, dehydrated, and mounted. Computer assisted morphometric analysis was performed (iVision Automated Digital Image Analysis System with proprietary software, BioGenex, San Ramon, CA). The pathologist was blinded to which group each sample belonged. The analysis was performed on 5 regions of interest for each case, selected to include glandular tissue and avoid stroma. The results were averaged and expressed as percent positive staining.
Molecular and functional analysis of the microarray data
For classification of genes into biological categories, we used the EASE program (http://david.niaid.nih.gov/david/ease.htm), which calculates overrepresentation of GO26 categories among genes on the gene list, compared to all genes on the chip used to generate the list, using Fisher's Exact Test. These analyses were supplemented by queries to UniProt (www.expasy.uniprot.org) and the literature. Placement of genes onto biological pathways was performed using the KEGG pathways (http://www.genome.jp/kegg/pathway.html) and 2 commercially available tools (iPATH, which maps genes to 225 well-established signaling and metabolic pathways based on the literature (http://escience.invitrogen.com/ipath/index.jsp); and Ingenuity Pathway Analysis (http://www.ingenuity.com) which connects a gene list to hypothetical networks of interacting genes derived from the literature).
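The overrepresentation test described above amounts to asking how likely it is to see at least the observed number of category members on the gene list, given the category's frequency among all genes on the chip. The sketch below computes a plain one-sided Fisher (hypergeometric) probability; the score that EASE itself reports may differ from this value, and the function name and all counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(list_hits, list_size, category_size, population_size):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from population_size genes, category_size of which belong to the category."""
    total = comb(population_size, list_size)
    favorable = sum(
        comb(category_size, i) * comb(population_size - category_size, list_size - i)
        for i in range(list_hits, min(list_size, category_size) + 1)
    )
    return favorable / total


# Hypothetical counts: 20 of 105 listed genes fall in a category that
# contains 800 of 13,000 genes on the chip.
p_value = hypergeom_upper_tail(20, 105, 800, 13000)
print(f"one-sided Fisher p = {p_value:.3g}")
```

Using the genes on the chip as the reference population, as the text specifies, avoids counting a category as enriched merely because the array itself is biased toward it.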
Samples and patients
Microarray analyses were performed on 29 samples from 29 patients belonging to 2 groups: (i) the CN group, consisting of 14 samples of histologically normal TDLUs microdissected from 14 patients with ER+ ductal breast cancers undergoing surgery (median age = 49 years, range: 34–65); (ii) the RM or control group, consisting of 15 samples of histologically normal TDLUs microdissected from 15 patients at usual risk of breast cancer, undergoing breast reduction surgeries (median age = 47 years, range: 41–60). No patient had received chemo- or radiation therapy. Although no genotyping of the breast cancer cases was done, the subjects' available histories and the tumors' immunophenotypes27 suggest that only a small proportion were likely to represent BRCA-associated tumors (see Supplemental File 1). The microarray data from these samples, including the raw probe-level hybridization intensities, are freely available from the NCBI Gene Expression Omnibus under accession GSE9574.
Genes differentially expressed between the RM and CN groups
We analyzed the probeset hybridization intensities for differences between the RM and CN samples using Cyber-T, and identified 127 probesets with a p-value < 0.0009, corresponding to a false-discovery rate < 0.10. Figure 2a shows the relative intensity of these probesets in each of the 29 samples. Among the 127 probesets are 7 that represent ESTs not yet assigned to a genetic locus, and 25 that represent a total of 10 genes that are detected by multiple probesets (range: 2–4 probesets per gene). When these were accounted for, 105 distinct locus-assigned genes were differentially expressed. Forty of 105 (38%) were overexpressed and 65 of 105 (62%) were underexpressed in CN compared to RM. Additional information about these probesets is provided in Supplemental File 2.
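The step from 127 probesets to 105 genes can be checked with a few lines of arithmetic. The counts are those reported above; the variable names are introduced here only for illustration.

```python
probesets = 127            # differentially expressed probesets
est_probesets = 7          # ESTs not assigned to a genetic locus
redundant_probesets = 25   # probesets that map to multiply-detected genes
redundant_genes = 10       # genes represented by those 25 probesets

# Drop the ESTs, then collapse the 25 redundant probesets into 10 genes.
genes = probesets - est_probesets - redundant_probesets + redundant_genes

overexpressed, underexpressed = 40, 65
pct_over = round(100 * overexpressed / genes)
pct_under = round(100 * underexpressed / genes)
print(genes, pct_over, pct_under)
```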
Validation of microarray data
We used qRT-PCR to confirm gene expression levels of 4 of the 105 genes (CXCL2, FOS, FOSB, KLF6) selected based upon consistent expression levels in the 29 RM and CN samples. Each gene was tested in 6 independent samples (3 RM and 3 CN). A total of 16 independent samples (8 RM and 8 CN) had sufficient remaining unamplified RNA, and each sample was examined with 1–3 test genes plus the control gene. As shown in Figure 3, we found that the relative abundance of every test gene recapitulated the relative expression levels on the microarray.
In a different approach to confirming the microarray data, we examined protein expression of FOSB in 8 microarray cases (3 RM and 5 CN) by IHC. FOSB was chosen because it had been evaluated by qRT-PCR, a reliable antibody was commercially available for use in formalin-fixed, paraffin-embedded tissue, and the protein's nuclear location makes quantitative automated image analysis feasible. One RM and 3 CN had been examined by qRT-PCR for FOSB transcript expression and the others had not. As shown in Figure 4, IHC corroborated what was seen in the microarray and by qRT-PCR. In all 3 RM cases, ∼70% of the ductal epithelial cell area stained for FOSB, whereas in 4 of 5 CN cases, ∼20% or less of the ductal cell area stained for FOSB. In the 5th CN case, FOSB staining resembled the level seen in RM tissues (69%).
To evaluate if the CN vs. RM expression data were relevant to breast cancer, we examined gene expression in a set of 6 CIS from independent patients with ER+ breast cancers collected using the same protocols as part of a separate study (manuscript in preparation). Despite the CIS patients being substantially older than the CN or RM patients (median age = 76, range 48–92), we found that the majority of the genes differentially expressed between CN and RM samples were also differentially expressed between CIS and RM samples. Specifically, 104 of the 127 (82%) probesets were also differentially expressed between the CIS and RM samples (Cyber-T-derived FDR < 0.10). For 102/104 (98%) probesets, the direction of differential expression in CIS was the same as in CN (χ2 p-value ≪ 0.0001). These data are summarized in Figure 2b and Table 1.
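The strength of the directional agreement reported above (102 of 104 probesets) can be illustrated with a short calculation. The exact form of the χ2 test used is not stated in the text, so the sketch assumes a goodness-of-fit comparison against a 50:50 split and, as a cross-check, substitutes an exact two-sided binomial (sign) test; both function names are hypothetical.

```python
from math import comb


def chi_square_gof_stat(same, opposite):
    """Chi-square statistic against the null that either direction is equally likely."""
    expected = (same + opposite) / 2
    return ((same - expected) ** 2 + (opposite - expected) ** 2) / expected


def exact_sign_test_p(same, opposite):
    """Two-sided exact binomial p-value under a 50:50 null (swapped-in alternative)."""
    trials = same + opposite
    k = max(same, opposite)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


stat = chi_square_gof_stat(102, 2)
p_exact = exact_sign_test_p(102, 2)
print(f"chi-square statistic = {stat:.1f}, exact p = {p_exact:.2g}")
```

Under these assumptions, both calculations indicate far more agreement in direction than would be expected by chance, in line with the reported p-value ≪ 0.0001.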
Table I. Functional Classification of Genes Differentially Expressed in Epithelium from ER+ Breast Cancers (CN) Compared to Reduction Mammoplasty Controls (RM)
Column headings: Genes and functional classes; CN vs. RM; Fold chg: CN vs. RM; CIS vs. RM; Fold chg: CIS vs. RM; Implicated in cancer previously
Each gene is listed only once, although genes may have multiple functions. The direction of differential expression is indicated with an arrow: higher is an upwards arrow, lower is a downwards arrow. A similar scheme is used for genes with significant differential expression in carcinoma in situ (CIS) relative to RM.
Immediate early (IE) gene.
Numbers in parentheses refer to references listed in Supplemental File 3.
For this gene, two probesets yielded contradictory results. For all other genes represented by multiple probesets, all probesets showed the same direction of change.
To further evaluate the relevance of our CN vs. RM results to breast cancer, we examined an independent dataset of genes differentially expressed between invasive ductal carcinomas and normal luminal epithelium cultured from reduction mammoplasties.28 We found that 75 of the 105 genes in our list were also differentially expressed between cancer and normal epithelium in that study, and that 80% of these 75 genes showed the same direction of change in both datasets, indicating significant concordance (χ2 test; p = 0.0002). Thus, the majority of the 105 genes show similar differential expression in an independent dataset (see Supplemental File 2).
Functional analysis of microarray data
We took several approaches to identify the potential functional significance of the 105 differentially expressed genes. We used gene ontology (GO) to classify each differentially expressed gene into functional categories and then to determine if any categories were overrepresented compared to all genes on the array. The most overrepresented GO molecular function and biological process categories relate to DNA binding and various types of transcriptional activity (see Table 2). This is reflected in the numerous transcription factors that are differentially expressed, including AP-1 components, Kruppel-like factors, nuclear hormone receptors and zinc-finger proteins (see Table 1). Other significant GO-defined categories included G-protein coupled- and chemokine-receptor binding and activity, and cell proliferation, metabolism and response to various stimuli (see Table 2).
Table II. Functional Classification of the 105 Differentially Expressed Genes, by GO, and Their Overrepresentation (by EASE)
No. of genes
All categories with EASE scores < 0.05 are listed.
Transcription regulator activity
Transcription factor activity
Nucleic acid binding
Transcription corepressor activity
G-protein-coupled receptor binding
Chemokine receptor binding
Regulation of transcription, DNA-dependent
Regulation of transcription
Negative regulation of transcription, DNA-dependent
Nucleobase, nucleoside, nucleotide and nucleic acid metabolism
Regulation of cellular process
Negative regulation of transcription
Regulation of transcription from Pol II promoter
Response to stimulus
Negative regulation of transcription from Pol II promoter
Response to biotic stimulus
Regulation of cell proliferation
Response to external stimulus
Regulation of biological process
Negative regulation of cell proliferation
Cell growth and/or maintenance
We also evaluated the pathways linking the differentially expressed genes. Using the KEGG pathway databases, as well as the iPATH and Ingenuity programs, we found that the MAPK signaling cascade contained the most genes from the list (DUSP1, DUSP2, FOS, GADD45β, JUN, JUND, NR4A1). The pathways containing the next largest number of genes were the cytokine–cytokine receptor interaction pathway (CCL2, CXCL1, CXCL2) and the calcium-signaling pathway (GNAS, ATP2B2, ADCY2). The genes noted above (except ADCY2) were underexpressed in CN epithelium compared to RM (see Table 1). Any functional connections among the 40 genes that were overexpressed in CN epithelium remain to be discovered.
Finally, we reviewed putative functions and categorization of the encoded proteins in the UniProt database and the published literature. We noted a large number (n = 16) of immediate early (IE) genes. We also noted that at least 32 of the 105 genes (31%) had been implicated previously in breast carcinogenesis and 34 additional genes (32%) had been implicated in other cancers, leaving 39 genes (37%) not previously reported to be associated with cancer (see Table 1). Some belong to functional categories implicated in cancer, and others are genes currently of unknown function.
The current understanding of events that initiate or predispose to breast carcinogenesis is limited. Therefore, the present study evaluated global gene expression in tumor-adjacent, histologically normal breast TDLUs microdissected from patients with untreated ER+ breast cancers, compared to TDLUs from control patients with no increased breast cancer risk. We identified differences in 127 probesets, corresponding to 105 genes. Most differences were maintained in a set of CIS. The 105 genes included a large group of transcriptional regulators, IE genes, and members of signaling pathways. The majority of these genes were expressed at lower levels in epithelium from women with cancer. One-third of the genes have been implicated previously in breast cancer, another third have been implicated in other cancers, and a final third have not been associated with cancer before. We cannot determine if these changes represent an effect of the tumor or an occult premalignant condition. But taken together, the data suggest that perturbations of key cellular functions are identifiable prior to the development of any histological abnormality, and that these perturbations may play important roles in the early stages of breast carcinogenesis.
Several potential objections could be raised to our study. The number of patients investigated is small, due to practical limits on the number of samples that can be investigated meticulously. However, a counterbalancing strength of the study is its use of primary uncultured epithelium, which eliminates introduction of artifacts inherent in cultured cells. We used amplified RNA, because only nanogram quantities are available from microdissected epithelium; however, we (and others) have shown that this approach yields reliable and reproducible data in which the biological variation between samples is greater than the technical variation between replicates.20 The data may not be generalizable to ER-negative breast cancers, but that would not be unexpected, given breast cancers' considerable intrinsic heterogeneity.
Despite these potential objections, the data raise several points for consideration. First, how do our results compare to existing expression data from human breast tissue? Most existing breast tissue expression signatures were derived to predict tumor subtype29, 30 or disease outcome,31–35 or to distinguish luminal from myoepithelial cells in RM tissue,28, 36 as opposed to distinguishing between patients with and without breast cancer, and so are not directly comparable to our data. It is therefore not surprising that few of the genes that we find to vary between CN and RM epithelium have been useful in predicting tumor subtype, disease outcome, or epithelial cell type (analyses not shown).
Other studies are more comparable to ours. One found no gene expression differences between RM and tumor-adjacent normal epithelium by unsupervised hierarchical clustering.37 We also could not discern differences between CN and RM by unsupervised hierarchical clustering or principal component analysis of all genes (results not shown). This may be due to the presence of genes that vary from patient to patient and obscure the consistent differences between CN and RM that we see when directly comparing these 2 sample types. In contrast, there is overlap between our results and those reported to distinguish TDLUs from an early hyperplastic breast cancer precursor.38 There is also overlap between our results and those reported in a study comparing luminal epithelium from RM and cancers.28 These reports, combined with the fact that the CN vs. RM differences are largely preserved in the independent CIS samples we examined, suggest that the CN vs. RM differences are authentic alterations reflecting a breast cancer related process.
Second, although the CN vs. RM differences appear authentic, we cannot distinguish whether they represent cause or effect, i.e., an occult premalignant condition, or secondary changes due to the tumor or its surrounding stroma. We favor the former explanation, because of the similarity of the CN vs. RM differences to cancer microarray data. However, we cannot determine how far the affected area might extend geographically, since all CN TDLUs were tumor-adjacent. Tissue that is adjacent to a breast tumor may harbor more, or different, genomic abnormalities than tissue that is more distant.4
If the CN vs. RM differences represent a primary abnormality, then the identification of genes whose expression varies in normal epithelium from patients with breast cancer, compared to controls, suggests mechanisms that may predispose to breast cancer development or are active early in carcinogenesis. If the CN vs. RM differences represent a secondary abnormality, then they can illuminate direct or paracrine effects occurring in vivo. Regardless, the largest functional category among the 105 genes is transcription factors and regulators, especially members of the composite transcription factor AP-1. Transcription regulators are implicated frequently in breast carcinogenesis (for review see Ref. 39). Many transcription-related genes (23/29 (79%)) were underexpressed in CN (and CIS) samples, which may reflect a generalized decrease in transcriptional activity, rather than involvement of a specific family. Also notable among the 105 genes was a large group (n = 16) of IE genes, which are rapidly induced upon cell stimulation and whose transcription is not dependent on protein synthesis. The IE genes were also underexpressed in CN (and CIS) epithelium. In addition, many of the 105 genes participate in signaling pathways. The largest number participates directly in the MAPK pathway, and others may affect MAPK signaling more peripherally. Considerable evidence supports the involvement of MAPK in breast cancer (for reviews see Refs. 39 and 40). Increased MAPK activity is usually reported, but the tumors examined have been mainly ER-negative and ERBB2-overexpressing.41, 42 In contrast, we found decreased expression of MAPK components, which could be related to using tissue from ER-positive tumors, or may reflect an initial step in the pathway's perturbation.
A final consideration is how these data can be utilized. If validated in future studies, the genes or pathways implicated here could identify new targets for chemoprevention, or help prioritize those already being studied.43 If differential expression of these genes can be detected in women without evident breast cancer, and associated with future disease, then they may be pertinent to risk assessment, since breast cancer risk is not thought to be uniform across all women.44, 45 DNA structural variants46–48 or single nucleotide polymorphisms might alter RNA expression49 and be associated with risk of disease.
To our knowledge, this is the first study to find gene expression differences between histologically normal epithelium of breast cancer patients and breast cancer-free controls. Our findings suggest that cancer-related pathways are already perturbed in normal epithelium of breast cancer patients. These perturbations could be markers of disease risk, of occult disease, or of the tissue's response to an existing tumor. Future studies should expand upon these results by examining expression of these genes in additional samples from breast cancer patients and controls, manipulating these genes' expression in model systems and developing clinically useful disease and risk classifiers.
This work was supported by grants from the Department of Defense Breast Cancer Research Program (DAMD17-01-1-0159) and the NIH (R01 CA081078, S10 RR021211) to CLR.