Esophageal cancer is a particularly aggressive tumor with poor prognosis. It is the eighth most common cancer in the world, sixth most common cause of death from cancer.1 The 2 major histological subtypes of esophageal cancer are squamous cell carcinoma (SCC) and adenocarcinoma (ADC). Over the last few decades, whilst the incidence of SCC has remained stable, the number of cases of ADC of the esophagus has risen 4-fold in the Western world and although not at the same rate, the incidence of ADC has risen in Asia as well.2, 3, 4, 5 The reason for this marked increase in ADC is unclear.
ADC is thought to arise from an acquired precursor condition known as Barrett's esophagus (BE).6, 7 BE is a condition in which squamous mucosa of the lower esophagus is replaced by columnar epithelium, possibly as a protective response to reflux of bile and acid across the gastro-esophageal junction. Not all patients exhibiting BE progress to cancer; however, BE confers a 30-fold increased risk of developing ADC.8, 9 At present there are no clinical or histological parameters that allow for prediction of which patients with BE will progress to ADC. For patients with BE, endoscopic surveillance is used in an effort to reduce mortality through early detection of progression to ADC. However, only about 5% of ADC arise in patients with previously diagnosed BE10 and the procedure is expensive and invasive. Understanding the molecular mechanisms underlying the development of BE and progression to ADC is key to the rational design of novel strategies for early detection and therapeutic intervention. However, our current knowledge of the genes and pathways involved in this process is limited.
Gene expression profiling by microarray has proven to be a powerful technique for delineating genes and pathways associated with cancer progression (reviewed by Bucca et al.11). Previous analyses of esophageal samples through gene expression profiling have focused on specific histological subtypes.12, 13, 14, 15, 16, 17 Few studies to date have conducted comparative analysis of gene expression profiles of BE, ADC, and SCC tissue samples in the same study. Thus, to obtain insight into the molecular processes underlying tumorigenesis in the esophagus, we have used a cDNA microarray approach to compare gene expression profiles of a large number of BE, ADC, and SCC with normal esophageal mucosa. Our thorough, unbiased analysis of genes differentially expressed between each of the tumor types will lead to a greater understanding of the molecular pathways involved in progression to ADC and SCC of the esophagus. Our data provide a rich source of information for further studies into the molecular basis of tumorigenesis of the esophagus, as well as identification of biomarkers for early detection of progression.
Material and methods
In total, 128 samples were profiled, including 25 BE, 38 ADC, 26 SCC and 39 normal samples from a total of 83 patients (Supplementary Table 1). Patient biopsy samples were collected during diagnostic endoscopy from patients seeking treatment at the Peter MacCallum Cancer Centre and St. Vincent's Hospital, Melbourne, Australia. Upon collection all biopsies were immediately immersed in RNAlater solution (Ambion) and stored at −20°C. Biopsies were divided in half with one of the pieces undergoing independent histopathological review and the other prepared for RNA extraction. Tumor biopsies containing less than 75% tumor were not used. Biopsies of normal mucosa were taken from normal appearing esophageal squamous mucosa greater than 5 cm proximal to tumor margin and confirmed to be 100% normal by independent histopathological review.
Microarray hybridization and scanning
RNA was extracted and purified from each sample by a 2-step process involving a phenol–chloroform extraction (Trizol, Life Technologies) followed by column chromatography (RNeasy minicolumns, Qiagen). Total RNA (4 μg) was amplified using a modified Eberwine method,18 which utilizes a T7 RNA polymerase to linearly amplify total RNA. First-strand cDNA synthesis commenced with the annealing of T7 PolyT primer to the total RNA template. The second DNA strand was produced in the presence of dNTPs, DNA Polymerase I (Promega), E. coli DNA ligase, and RNase H (Invitrogen). Transcription of RNA from double-stranded cDNA was performed with a MEGAscript T7 Kit (Ambion) according to the manufacturer's protocol. Amplified RNA was then purified using RNeasy Minicolumns (Qiagen). An indirect labeling protocol was employed to label amplified RNA samples (10 μg) with cyanine-5 (Cy5) fluorescent dye (Amersham Biosciences). Aminoallyl-dUTP was incorporated into the cDNA backbone during reverse transcription of amplified RNA followed by chemical coupling of Cy5 to the aminoallyl side chains. A common reference cDNA (extracted from a pool of 11 cell lines; MCF7, Hs578T, OVCAR3, HepG2, NTERA2, MOLT4, RPMI-8226, NB4+ATRA, UACC-62, SW872, and Colo205, as described by Perou et al.19) was prepared and labeled with cyanine-3 (Cy3) fluorescent dye in a similar way to the amplified sample RNA. The sample and common reference cDNA were then competitively hybridized to a cDNA array printed at the Peter MacCallum Cancer Centre Microarray Core facility. Each cDNA array contained 10,500 elements representing ∼9,400 unique cDNA clones. These arrays have undergone extensive quality control measures and expression data have been validated by quantitative real-time PCR of selected genes.20, 21 Hybridized slides were scanned with a confocal laser scanner (Agilent Technologies).
Data from each channel, Cy3 and Cy5, were processed using GenePix Pro Software 4.1 (Axon Technologies) to calculate the mean intensity for each spot. Raw array data and protocols are available at http://www.ebi.ac.uk/arrayexpress/Exp#.
Further analysis of signal intensity was undertaken using the limma software package.22 Background correction was based on the local median estimator, but used a moving minimum over a window of 9 spots to reduce the tendency of low intensity spots to have more variable log-ratios. The Cy3 and Cy5 log-intensities for each spot were summarized as M and A-values and normalized for dye-bias and spatial effects using print-tip loess normalization.23 All probes with an average A-value less than 6 across all arrays were filtered out, leaving 9,386 probes. Differential expression between each of the 4 tissue types (normal squamous epithelium, BE, ADC, and SCC) was assessed using the methods of Smyth22 with extensions to accommodate repeated measurement on patients.24 Gene-wise linear models with effects for each tissue type and print-run were fitted to the log-ratios using generalized least squares treating repeat observations on patients as correlated blocks. The overall intrapatient correlation was estimated as a robust average of the intrapatient correlations for each gene. This gave an overall correlation of 0.22. The linear models were then refitted using generalized least squares treating the intrapatient correlation as known. The gene-wise residual standard deviations were moderated using the empirical Bayesian method.22 Differential expression was assessed using the moderated F-test22 and the classifyTestsF function of the limma package. A probe was considered to show differential expression between the tissue types (without specifying which specific tissue types were different) if the raw p-value was less than 0.00001. After adjustment for multiple testing, this corresponds to a conservative overall p-value25 of ∼0.05. The false discovery rate26 is estimated to be less than 0.00003, meaning it is probable that there are no false discoveries.
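The M and A summaries and the intensity filter described above can be illustrated with a minimal sketch. It assumes the conventional definitions M = log2(Cy5/Cy3) and A = ½·log2(Cy5·Cy3) applied to background-corrected intensities; the function names, probe labels, and intensity values are hypothetical, and the sketch does not reproduce the limma implementation or the print-tip loess normalization.

```python
import math


def ma_values(cy5, cy3):
    """Return (M, A) for one spot from background-corrected intensities.

    M is the log2 ratio of sample (Cy5) to reference (Cy3);
    A is the average log2 intensity of the two channels.
    """
    log_cy5 = math.log2(cy5)
    log_cy3 = math.log2(cy3)
    return log_cy5 - log_cy3, 0.5 * (log_cy5 + log_cy3)


def filter_probes(a_by_probe, threshold=6.0):
    """Keep probes whose mean A-value across all arrays is at least `threshold`."""
    kept = []
    for probe, a_values in a_by_probe.items():
        if sum(a_values) / len(a_values) >= threshold:
            kept.append(probe)
    return kept


# Hypothetical spot: sample channel four times brighter than the reference.
m, a = ma_values(cy5=1024.0, cy3=256.0)

# Hypothetical A-values for two probes across two arrays.
kept = filter_probes({"probe_a": [5.0, 5.5], "probe_b": [7.0, 9.0]})
```

In this illustration the low-intensity probe is dropped because its mean A-value falls below 6, which mirrors the filtering rule stated in the text.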
Hierarchical clustering was performed with a Pearson's similarity metric and complete linkage27 using Gene Cluster 3.0 and viewed with TreeView.28 Linear discriminant analysis was performed using tools available in the R statistical package. Gene ontology (GO) analysis was performed using GoStat29 with a false discovery rate of 0.05.
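As an illustration of the clustering approach, the following minimal sketch performs agglomerative clustering with a Pearson correlation distance (1 − r) and complete linkage. It is not the Gene Cluster 3.0 implementation; the sample names and log-ratio profiles are hypothetical and serve only to show how the similarity metric and linkage rule interact.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def complete_linkage(profiles):
    """Cluster samples bottom-up; return merges as (cluster_a, cluster_b, distance)."""
    names = list(profiles)
    # Pairwise distance between samples: 1 - Pearson correlation.
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = 1.0 - pearson(profiles[a], profiles[b])

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical log-ratio profiles over four genes.
example = {
    "normal_1": [2.0, 1.0, -1.0, -2.0],
    "scc_1": [1.8, 1.2, -0.9, -2.1],
    "be_1": [-2.0, -1.0, 1.0, 2.0],
    "adc_1": [-1.5, -1.2, 0.8, 2.2],
}
merges = complete_linkage(example)
```

With these invented profiles the two squamous-like samples join first and the two columnar-like samples join second; the final merge links the two groups at a large distance because their profiles are anticorrelated.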
Results and discussion
Analysis of tissue types
A total of 128 tissue samples were collected including 25 BE, 38 ADC, 26 SCC, and 39 normal samples (Supplementary Table 1). Multiple tissue types were collected from 37 patients. The age of patients ranged from 28 to 86, with a mean of 63. The tumor stages were distributed between >IIB and IVa, with 28% ≥ IIB, 45% II–III, and 27% ≤ IV. BE patients ranged from no dysplasia to high grade dysplasia.
Unsupervised hierarchical clustering of the 9,386 genes remaining after normalization and background correction separated the tissue types into 4 distinct clusters (Fig. 1a). Consistent with their squamous phenotype, normal and SCC samples cluster closely and separately from the BE and ADC samples, which are columnar in nature. The normal and BE clusters were clearly defined, with only 1 ADC sample clustering with the BE, interestingly next to the BE sample from the same patient (42). The BE and ADC samples for another patient (60) also clustered next to each other on the ADC branch. Although the samples exhibited expression consistent with their histological subtype, some inconsistent clustering occurred among the ADC and SCC branches.
Linear discriminant analysis was performed on the subset of genes identified as differentially expressed between any 2 tissue types to find the best linear function of genes to separate the tissue groups based on their known histology. The plot of the tissue samples over the 2 linear discriminant functions, LD1 and LD3, shows a similar distance between groups as was shown in the cluster dendrogram (Fig. 1b).
Analysis of differentially expressed genes
A total of 3,516 clones were identified as differentially expressed between any 2 tissue types: 2,158 between normal and BE samples, 2,913 between normal and ADC samples, 1,306 between BE and ADC samples, 1,958 between normal and SCC samples, and 546 between ADC and SCC samples (Supplementary Table 3). There was a large overlap among the groups (Fig. 2).
To relate the gene expression observed with gene function, GO analysis was used to determine significantly over represented biological processes, cellular components, and molecular functions within the entire set of differentially expressed genes, and within sets of genes having greater than a 4-fold change in expression between samples. The complete GO analysis results are available in Supplementary Tables 4 and 5. In summary, when compared to normal esophageal epithelium, BE and ADC samples were found to contain an over representation of genes involved with tissue development, specifically keratinization, intercellular junctions, calcium–ion binding, and endopeptidase activity. An increase in genes associated with response to external and biotic stimulus, as well as immune and inflammatory response, collagen catabolism, and proteolysis, was found in ADC samples, but not in BE. Similarly, genes responsible for the biological processes of digestion and alcohol metabolism were over represented in BE tissue, but not in ADC. Similar ontologies have previously been identified for genes differentially expressed in BE and ADC.17
A number of categories were found to be over represented in both the SCC and ADC samples. These included immune and inflammatory response genes, as well as collagen binding, peptidase activity, and extracellular matrix genes. Categories identified as specifically over represented in SCC tissues include cell death and apoptosis, regulation of the NF-κB cascade, metallopeptidase activity, and plasminogen activator activity. In comparison with SCC, genes involved in calcium–ion binding were over represented in the ADC samples.
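Over representation of a GO category within a gene list is commonly assessed with a one-sided hypergeometric (Fisher-type) test. The sketch below shows that calculation under this assumption; it is not necessarily the exact statistic used by GoStat, it omits the false discovery rate correction across categories, and the counts used in the example call are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Probability of observing at least k annotated genes in a list of n.

    N: genes on the array (background)
    K: background genes annotated to the GO category
    n: genes in the differentially expressed list
    k: listed genes annotated to the GO category
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k + 1, ..., upper.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Small worked case: 10 genes, 4 annotated, 3 selected, at least 2 annotated.
p_small = hypergeom_upper_tail(k=2, n=3, K=4, N=10)

# Hypothetical category: 120 of 9,386 probes annotated, 70 of them among 3,516 selected.
p_category = hypergeom_upper_tail(k=70, n=3516, K=120, N=9386)
```

For the small worked case the favourable count is C(4,2)·C(6,1) + C(4,3)·C(6,0) = 40 out of C(10,3) = 120 possible selections, giving a probability of 1/3. In a full analysis one such p-value would be computed per GO category and then adjusted for multiple testing.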
Cluster analysis of genes
Hierarchical clustering of differentially expressed genes produced a dendrogram with 3 main branches, denoted 1–3, based on the relative expression level shown in BE, ADC, and SCC (Fig. 3).
Genes up-regulated in cancer.
Branch 1 contains genes that are predominantly up-regulated in SCC and ADC. Functional annotation through GO analysis revealed that these genes were involved with the extracellular matrix, collagen binding, immune response, the cell cycle, spindle organization, and signal transduction (Supplementary Table 6). Branch 1 can be subdivided into 3 clusters, A–C. Cluster A genes were up-regulated in SCC and ADC, but showed varying expression in BE. No significant enrichment for any GO categories was detected in Cluster A. Cluster B contains genes up-regulated across SCC, ADC, and BE. These genes are specific to the plasma membrane and represent MHC class I receptors and immune response genes. Cluster C represents an “esophageal cancer cluster,” including genes responsible for chemokine and cytokine activity, immune response, DNA metabolism, mitosis, and spindle organization. Specific gene subclusters, referred to as the “SPARC” cluster, immune response cluster, and proliferation cluster, were identified within C (Fig. 3, Supplementary Table 7).
SPARC has been linked to cancer and wound repair, through its effect on the extracellular matrix30 and has previously been linked to esophageal carcinoma progression.31 SPARC clustered with collagens, whose expression was specifically enhanced in ADC and SCC, as well as genes involved in the defense response and proteolysis. Previous gene expression profiling of SCC has similarly shown SPARC and COL1A2 clustering next to each other.32
The “immune response cluster” within Cluster C includes PTGS2 (cyclo-oxygenase 2). Linked to inflammation and proliferation, PTGS2 has been shown to have enhanced expression in ADC and SCC;33, 34, 35, 36 however, conflicting results for BE exist.35, 37 PTGS2 clustered with the stromelysins, MMP3 and MMP10, chemokines and SOCS3, which is involved in cell death. MMPs have long been believed to be involved in the breakdown of the extracellular matrix affecting tumor formation and growth, but have more recently been shown to target cytokines, chemokines, and cell adhesion molecules,38 groups of targets found to be affected throughout the samples analyzed.
A third subcluster identified within C is representative of a “Proliferation signature.”39 This cluster contains genes responsible for cytokinesis and cell cycle checkpoints, specifically genes involved in the M phase of the mitotic cell cycle including BUB1, STK6, CENPA, CENPE, TOP2A, CHEK1, and CDC2.
Genes down-regulated in esophageal cancer.
Branch 2 contains genes that are largely down-regulated in esophageal tumors compared to normal tissue. GO analysis of genes within Branch 2 revealed their involvement in oxidoreductase and phosphoprotein phosphatase activity, and cell junctions. Branch 2 can also be divided into 3 separate clusters, D, E, and F. The majority of genes in Clusters D and F are down-regulated across the SCC, BE, and ADC samples but no specific gene ontologies were observed. Cluster E contains genes specifically down-regulated in ADC and BE, not SCC. Cluster E is enriched for genes with enzyme inhibitor activity, specifically the serpins, negative regulators of apoptosis and genes responsible for keratinization (Supplementary Table 7). The serine protease inhibitors were found to cluster with a smaller group of genes containing the kallikrein serine proteases. This subcluster includes KLK13, 10, 11, 8, and 7, SERPINB2, SERPINB3, SLPI, CSTB, CSTA, and SPINK5 (Fig. 4). Interestingly, these genes appear to be coordinately regulated. In the samples in which they were down-regulated, all the genes were down-regulated. However, a number of samples showed coordinated up-regulation. The kallikreins are all located on a region of chromosome 19q13.3–13.4.40 This fact and their apparent coordinate regulation may indicate a frequent deletion of this chromosomal region in esophageal cancers. Further analysis of this region is necessary to determine if the pattern of expression observed is caused by a deletion specific to certain ADC or SCC samples, or if there is an alternative biological reason for the loss of kallikrein and serpin expression in these samples.
Cluster E also contained a group of genes centered on the VAV3 oncogene. Vav3 has been implicated in control of cytokinesis, and its deregulation leads to cytoskeletal changes.41, 42 Vav3 clusters with desmoplakin (DSP), a constituent of the cytoskeleton, as well as genes involved with lipid and protein binding. A calcium–ion binding and cell junction subcluster was also identified within E, which includes DSG1 and GJB2.
Genes up-regulated in BE.
Branch 3 predominantly contains genes that are up-regulated in BE, and can be divided into 2 distinct clusters, G and H. Cluster G contains those genes up-regulated in BE, but not ADC or SCC. The genes in cluster G are largely specific to the mitochondria, and also include metallothioneins and genes responsible for cellular lipid metabolism and oxidoreductase activity. Cluster H contains genes that are up-regulated in both BE and ADC and appears to reflect genes that are differentially expressed between the squamous and columnar phenotypes. Genes in cluster H are predominantly responsible for UDP-glycosyltransferase activity (Supplementary Table 7).
Participating in O-glycan biosynthesis, UDP-glycosyltransferases are responsible for the synthesis of mucins. Mucins have been identified as characteristic of BE and ADC.43, 44 Our analysis identified MUC2, MUC5AC, and MUC6 as being significantly up-regulated in BE in agreement with previous findings.44 Two “digestion” subclusters containing mucins were identified within H (Fig. 5). The first is a cluster of genes significantly up-regulated in BE, centered on the mucin-associated trefoil factor TFF1 and the secretory mucin MUC5AC. The second digestion cluster showed a similar up-regulation in BE and included TFF3 and MUC2.
Other significant gene clusters were also identified within H. A “transcription factor cluster” contained HOXB5 and B6, FOXA3 and GATA6, as well as other genes relevant to transcription. A fourth cluster within H, the “hydrolase” cluster, also contained a transcription factor, TCF2, as well as members of the nuclear receptor family (NR1H3 and NR1I2), a number of genes known to be associated with BE, including MUC3B and villin,45 and the hydrolases for which it was named, including fucosidase, GALC, and lysozyme (Fig. 5). Lysozyme has been identified in the Paneth cells that line the lower intestinal crypt.46 Paneth cells are believed to be a marker of complete intestinal metaplasia, more often seen in gastrointestinal metaplasia than BE.47 Although lysozyme has not formally been linked to BE, Paneth cells have been noted in BE and ADC.47, 48 The significant over-expression of lysozyme within our BE and ADC samples may suggest a greater presence of Paneth cells in these conditions.
In conclusion, analysis of gene clusters, their ontologies, and pathway involvement has allowed us to identify the differences and similarities between the 4 tissue types: normal esophageal epithelium, BE, ADC, and SCC. Through linear discriminant analysis and hierarchical clustering the samples could be resolved into 4 distinct clusters. Some inconsistent clustering occurred between the ADC and SCC samples, which might be expected considering that the esophageal cancer profile revealed was similar between the 2 groups, and that they had the fewest differentially expressed genes between them.
While SCC and ADC each represent a distinct overall gene profile, we have shown that they do have a coordinated regulation of genes sharing specific functional annotations. ADC and SCC of the esophagus have a mutual up-regulation of genes involved in carcinogenesis, including cell cycle regulators, extracellular matrix effectors, and immune response genes. A consistent coordinated regulation of kallikreins and serine protease inhibitors, and down-regulation of genes related to calcium–ion binding and gap junctions in comparison to normal esophageal epithelium, was also noted. However, overall analysis of the gene profiles revealed that the SCC samples were closely related to normal esophageal epithelium, a finding that is perhaps not surprising given that they share the squamous phenotype.
As with the previous study by Wang et al.,17 BE was found to cluster more closely with ADC than normal esophagus, a result consistent with the columnar phenotype of BE and ADC compared with the squamous phenotype of esophageal epithelium. Similarly, SCC clustered more closely with normal tissue than either BE or ADC. This separation between the squamous and columnar tissues was predominantly dictated by genes associated with the columnar phenotype observed after intestinal differentiation in BE and ADC, which may be used as biomarkers of this transformation. These clusters were dominated by hydrolases, including lysozyme and fucosidase, transcription factors, mucins, and the trefoil factors, known to be markers of digestion.
Previous gene expression profiling analysis of esophageal cancer has often focused on specific histological subtypes. Few studies have completed comparative analysis of BE, ADC, and SCC. In these, a limited number of samples were analyzed (13 samples in one study13 and 22 in another12) and the number of samples was not evenly divided among the subtypes. Our analysis of 128 samples stratified across 4 tissue types, and the implementation of a moderated F-test to identify statistically differentially expressed genes among each of the groups, has allowed us to complete a thorough comparative analysis of esophageal cancer. Through hierarchical clustering we were able to group the expression data into clusters of coregulated genes across the subtypes, rather than providing simple lists of differentially expressed genes. Clustering also allowed us to show that in multiple cases different clones of the same gene cluster next to each other, demonstrating the reproducibility of our data. We have identified clusters containing genes known to be associated with BE, ADC, and SCC analogous to clusters described by others. For example, our SPARC and proliferation clusters resemble those described by Su et al.32 and Whitfield et al.,39 and a cluster containing trefoil factor 1 described by Fox et al.49 was consistent with our digestion cluster in BE and ADC.
In summary, we have conducted gene expression analysis on 128 esophageal tissue samples, including matched normal samples, and identified 3,516 genes that are differentially expressed between the different tissue types. Ontological analysis of the differentially expressed genes and gene clusters has revealed valuable insights into the molecular and functional differences between the normal epithelium, the 2 distinct forms of esophageal cancer, and the ADC precursor, BE. These results are consistent with previous ontology analysis of ADC and BE.17 Our analysis clearly shows the overlap in genes expressed in both BE and ADC and between ADC and SCC. However, we were also able to show the distinct nature of SCC and ADC. They represent 2 distinct cancer types arising from 2 different cell types, squamous and columnar. The similarities and differences in gene expression observed between each of these groups will lead to a clearer understanding of tumorigenesis of the esophagus. Our results provide a rich source of data for analysis of specific genes or pathways related to tumorigenesis in the esophagus, and for the identification of potential new biomarkers and/or treatment targets.
The authors acknowledge the support and encouragement of Prof. D. Bowtell and the Peter MacCallum Cancer Centre Microarray Facility. We also thank the many surgeons and endoscopists who provided tissue samples for this study for their valuable assistance.