Genomic profiling of 766 cancer-related genes in archived esophageal normal and carcinoma tissues



We employed the BeadArray™ technology to perform a genetic analysis in 33 formalin-fixed, paraffin-embedded (FFPE) human esophageal carcinomas, mostly squamous cell carcinomas (ESCC), and their adjacent normal tissues. A total of 1,432 single nucleotide polymorphisms (SNPs) derived from 766 cancer-related genes were genotyped with partially degraded genomic DNAs isolated from these samples. This directly targeted genomic profiling identified not only previously reported somatic gene amplifications (e.g., CCND1) and deletions (e.g., CDKN2A and CDKN2B) but also novel genomic aberrations. Among these novel targets, the most frequently deleted genomic regions were chromosome 3p (including tumor suppressor genes FANCD2 and CTNNB1) and chromosome 5 (including tumor suppressor gene APC). The most frequently amplified genomic region was chromosome 3q (containing DVL3, MLF1, ABCC5, BCL6, AGTR1 and known oncogenes TNK2, TNFSF10, FGF12). The chromosome 3p deletion and 3q amplification coincided in nearly all of the affected cases, suggesting a molecular mechanism for the generation of somatic chromosomal aberrations. We also detected significant differences in germline allele frequency between the esophageal cohort of our study and normal control samples from the International HapMap Project for 10 genes (CSF1, KIAA1804, IL2, PMS2, IRF7, FLT3, NTRK2, MAP3K9, ERBB2 and PRKAR1A), suggesting that they might play roles in esophageal cancer susceptibility and/or development. Taken together, our results demonstrated the utility of the BeadArray technology for high-throughput genetic analysis in FFPE tumor tissues and provided a detailed genetic profile of cancer-related genes in human esophageal cancer. © 2008 Wiley-Liss, Inc.

Cancer is the result of a series of genetic or epigenetic changes,1 including aneuploidy, multiple gene amplifications, deletions and translocations.2 These genetic instabilities are caused by either inherited mutations in genes that monitor genome integrity or mutations that are acquired in somatic cells during tumor development. Environmental risk factors and individual cancer genetic susceptibilities could contribute to tumor development and progression by facilitating the inactivation or loss of tumor suppressor genes and by favoring the activation or amplification of oncogenes.3 Thus, comprehensive analysis of genetic alterations in tumors and identification of genes involved in tumorigenesis have been a major focus of cancer research.

Human esophageal cancer is one of the most common fatal cancers worldwide, with a 5-year survival rate of less than 10%.4 High incidences of the disease have been reported in certain areas of China, Japan, Iran, France, Italy and South Africa. In Linxian County, Henan province of China, for instance, the age-adjusted mortality rates for esophageal cancer have been reported as 150/100,000 for men and 115/100,000 for women.5 Although environmental and nutritional factors as well as cultural habits are thought to play important roles in esophageal carcinogenesis, multiple genetic alterations associated with the disease have been described. These include frequent amplification and over-expression of the cellular proto-oncogenes encoding epidermal growth factor receptor (EGFR), c-MYC and cyclin D1, and loss of and/or mutation in tumor suppressor genes (e.g., p53, Rb and p16) and death pathway genes (e.g., FAS and FAS ligand).6–9 Recent reports discussed frequent copy number aberrations in esophageal squamous cell carcinomas;10, 11 although several genomic regions were identified, further work is needed to reveal the corresponding gene(s).

We have developed a flexible, accurate and high-throughput single nucleotide polymorphism (SNP) genotyping system for large-scale genetic analysis.12 It includes a miniaturized BeadArray platform and a highly multiplexed SNP genotyping assay (GoldenGate® assay).13 Since the GoldenGate assay interrogates only ∼40–50 bp of sequence surrounding a SNP of interest, the assay can tolerate a certain degree of DNA degradation and allows reliable genotyping and targeted genomic profiling with partially degraded, low-quality DNA from FFPE tissues.14, 15 Formalin-fixed archival tissues represent an invaluable resource for genetic analysis in cancer, as they are the most widely available materials for which patient outcomes are known. The ability to perform genetic analysis in these samples will enable both prospective and retrospective studies, and should greatly facilitate research in correlating genetic profiles with clinical outcomes. In our study, we used the GoldenGate technology to analyze genomic DNAs (gDNA) extracted from 33 pairs of FFPE archived cancerous and matched normal adjacent tissues with 1,432 SNPs derived from 766 well-characterized cancer-related genes. We detected both novel and previously reported gene deletions/amplifications. Furthermore, we found SNPs showing significantly different allele frequency distribution between the esophageal cohort of our study and normal control samples from the International HapMap Project, suggesting that they may play a role in esophageal carcinogenesis.

Materials and methods

Tissue sample acquisition, medical data collection and DNA extraction

The study was approved by the Institutional Review Board of the Cancer Institute/Hospital, Chinese Academy of Medical Sciences. A total of 66 tissue samples from 33 patients diagnosed with esophageal cancer from 1998 to 2000 in Beijing city and Henan province, China, were entered into the study (see Supplementary Table I). These comprised the 33 esophageal tumor samples (mostly squamous cell carcinoma) and the 33 matched adjacent normal tissues. The samples were FFPE and had been stored for at least 4 years before use. The patients' medical data and lifestyle cancer risk factors (e.g., smoking, alcohol drinking and family history of cancer) were documented with informed consent.

Table I. Top 19 Genes Most Frequently Deleted in the 33 ESCC Samples
Gene | Chr | Position | Number of measured SNP(s) per gene | No. of individuals with LRR < −0.3
  1. We used an LRR < −0.3 cutoff to call a deletion event and required both technical replicates to meet this cutoff to make a call for a given individual.


To extract gDNA from these FFPE samples, multiple 5-μm tissue sections were placed in each reaction tube and deparaffinized as follows: 800 μl xylene was added, and the tube was inverted several times and centrifuged at 4,000 rpm for 3 min, after which the supernatant was removed; 800 μl xylene and 400 μl EtOH (100%) were added, and the tube was inverted several times and centrifuged at 4,000 rpm for 3 min, after which the supernatant was removed; 1 ml EtOH was added, and the tube was inverted several times and centrifuged at 4,000 rpm for 3 min; the supernatant was removed, the tube was centrifuged at 14,000 rpm for 1 min, and all remaining supernatant was carefully removed. Proteinase K treatment followed the “High Pure RNA Paraffin Kit” procedure (Roche, cat no. 3270289). DNA purification followed the “High Pure PCR Template Preparation Kit” procedure (Roche, cat no. 1796828).

SNP genotyping on Illumina universal bead arrays

We used a high-throughput SNP genotyping assay described previously.13 Assay probes were designed for 1,432 SNPs from 766 cancer-related genes (Supplementary Table II). The genes in this panel are distributed relatively evenly across the chromosomes (on average, n = 33 per chromosome), with the greatest number residing on chromosome 1 (n = 61) and the fewest on chromosome 21 (n = 11). For each SNP locus, 3 probes were designed: 2 allele-specific oligos (ASO) and 1 locus-specific oligo (LSO). The ASOs consisted of 2 parts: the locus-specific sequence and a universal PCR primer sequence at the 5′-end. The LSOs consisted of 3 parts: the locus-specific sequence, a unique address sequence, which is complementary to a capture sequence immobilized on the array, and a universal PCR primer sequence (P3) at the 3′-end. Assay oligos corresponding to the 1,432 SNPs were pooled and hybridized to the gDNA template. Hybridized ASOs were extended and ligated to their corresponding LSO to create a PCR template that was subsequently amplified with universal primers (P1, P2 and P3′). The PCR products, which were fluorescently labeled by incorporation of the 5′-labeled primers P1 (Cy3) and P2 (Cy5), were hybridized to capture probes on the beads in the array. The ratio of the fluorescent signals from the 2 allele-specific ligation products was used to determine the genotype. All of the SNPs were assayed on 1 array, and 500 ng of gDNA from each individual sample was used for each array experiment. Each individual sample was assayed twice on the array. Standard software developed at Illumina was used for automatic image registration and intensity extraction.16

Table II. Top 18 Genes Most Frequently Amplified in the 33 ESCC Samples
Gene | Chr | Position | Number of measured SNP(s) per gene | No. of individuals with LRR > 0.3
  1. We used an LRR > 0.3 cutoff to call an amplification event and required both technical replicates to meet this cutoff to make a call for a given individual.


Copy number detection

Genotyping data consist of 2-channel intensity data corresponding to the 2 alleles. The data are generated as rectangular coordinates of the raw A versus raw B allele intensities. After normalization using Illumina BeadStudio 2.0, the genotyping data were transformed to polar coordinates: the normalized intensity R = Xnorm + Ynorm and the allelic intensity ratio θ = (2/π) × arctan(Ynorm/Xnorm), where Xnorm and Ynorm represent the transformed normalized signals from alleles A and B for a particular locus. The observed normalized intensity of the subject sample (Rsubject) was compared to the expected intensity (Rexpected), computed by linear interpolation of the observed allelic ratio (θsubject) with respect to the canonical genotype clusters.17 This transformed parameter, the log2R ratio [log2(Rsubject/Rexpected)], was analyzed along the entire genome for all SNPs on the array. In some cases, the log2R ratio is shown for a given gene; this was calculated by averaging the log2R values of all SNPs within the gene of interest.
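The transformation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the BeadStudio implementation: the function names, the representation of the canonical genotype clusters as (θ, R) pairs and the example cluster values are all assumptions made for the example.

```python
import math


def polar_transform(x_norm, y_norm):
    """Return (R, theta) for normalized allele A and B intensities.

    R = Xnorm + Ynorm and theta = (2/pi) * arctan(Ynorm/Xnorm), as in the text.
    atan2 equals arctan(y/x) for non-negative inputs and tolerates x == 0.
    """
    r = x_norm + y_norm
    theta = (2.0 / math.pi) * math.atan2(y_norm, x_norm)
    return r, theta


def expected_r(theta, clusters):
    """Linearly interpolate the expected R at theta.

    `clusters` is an assumed representation: (theta, R) centers of the
    canonical AA, AB and BB clusters, sorted by theta.
    """
    if theta <= clusters[0][0]:
        return clusters[0][1]
    if theta >= clusters[-1][0]:
        return clusters[-1][1]
    for (t0, r0), (t1, r1) in zip(clusters, clusters[1:]):
        if t0 <= theta <= t1:
            frac = (theta - t0) / (t1 - t0)
            return r0 + frac * (r1 - r0)
    raise ValueError("clusters must be sorted by theta")


def log_r_ratio(r_subject, r_expected):
    """log2(Rsubject / Rexpected)."""
    return math.log2(r_subject / r_expected)


def gene_level_lrr(snp_lrrs):
    """Average the log2R values of all SNPs within one gene."""
    return sum(snp_lrrs) / len(snp_lrrs)


# Hypothetical cluster centers (theta, R) for AA, AB and BB.
example_clusters = [(0.0, 1.0), (0.5, 1.2), (1.0, 1.0)]
r, theta = polar_transform(1.0, 1.0)
lrr = log_r_ratio(r, expected_r(theta, example_clusters))
```

With roughly 2 SNPs per gene on this panel, the gene-level average is taken over very few values, so it smooths little of the per-SNP noise.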

Allele frequency analysis

The “fisher.test” function from R (Version 2.3.0 on i686-redhat-linux-gnu) was used to perform Fisher's exact test (for each SNP) on the allele frequency distribution between the esophageal cancer cohort of our study and the ethnically matched normal population (the Han Chinese population from the International HapMap study18). The null hypothesis is that rows and columns in the contingency table are independent, implying the same AA, AB and BB genotype frequency distribution in the 2 populations. The default parameters of “fisher.test” were used to report p-values.
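For readers who want to see what the test computes, the following Python sketch enumerates every 2 × 3 genotype table with the observed margins and sums the probabilities of those no more likely than the observed table, which is the two-sided p-value definition for a 2 × 3 exact test. It is not the R implementation used in the study; the function name is hypothetical, and the example counts are a reading of the rs963982 (KIAA1804) row of Table III that should be checked against the published table.

```python
from math import comb


def fisher_exact_2x3(row1, row2):
    """Two-sided exact p-value for a 2 x 3 table of genotype counts.

    row1 and row2 hold the (AA, AB, BB) counts of the two populations.
    Tables with the same margins follow a multivariate hypergeometric
    distribution under the null hypothesis of independence.
    """
    cols = [a + b for a, b in zip(row1, row2)]
    n1 = sum(row1)
    total = n1 + sum(row2)
    denom = comb(total, n1)

    def prob(a, b, c):
        return comb(cols[0], a) * comb(cols[1], b) * comb(cols[2], c) / denom

    p_obs = prob(*row1)
    p_value = 0.0
    for a in range(min(cols[0], n1) + 1):
        for b in range(min(cols[1], n1 - a) + 1):
            c = n1 - a - b
            if c > cols[2]:
                continue  # third cell would exceed its column total
            p = prob(a, b, c)
            # Small relative tolerance so ties with the observed table count.
            if p <= p_obs * (1 + 1e-7):
                p_value += p
    return min(p_value, 1.0)


# Assumed reading of Table III, rs963982: HapMap (AA, AB, BB), then ESCC.
hapmap_counts = (0, 2, 43)
escc_counts = (2, 13, 18)
p = fisher_exact_2x3(hapmap_counts, escc_counts)
```

Full enumeration is practical here because the cohorts are small (45 controls and 33 cases); for larger tables a library routine would be the better choice.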

Results and discussion

Genotyping of exonic SNPs with genomic DNAs extracted from FFPE human esophageal tissue samples

We employed the highly multiplexed GoldenGate SNP genotyping assay13 to identify both somatic DNA changes and germline genetic loci (i.e., cancer susceptibility genes) from archived human esophageal cancer samples. To this end, we compiled 1,432 SNPs from the exonic regions of 766 cancer-related genes (Supplementary Table II). These included (but are not limited to): tumor suppressor genes (e.g., CDKN2A, CDKN2B, BRCA1, APC); oncogenes (e.g., CCND1, ERBB2, EGFR, FGF12, VEGF); genes regulating cell growth and differentiation; and genes located within published genomic regions subject to deletion or amplification in cancer. We parsed the latest NCBI RefSeq database and retrieved mapped SNPs in these genes. The SNP collection contains both synonymous and nonsynonymous changes, as well as those that affect splicing.

All 1,432 SNPs were analyzed simultaneously by the GoldenGate assay using 500 ng gDNA isolated from each of the 33 paired FFPE esophageal cancer and adjacent normal tissue samples. The gDNAs isolated from these esophageal FFPE tissue blocks were partially degraded, with an average size of 500–700 bp (data not shown). We also genotyped gDNAs isolated from 4 human lymphoblastoid cell lines as intact gDNA controls. The genotyping calls were made automatically, and each call was assigned a quantitative score that reflects its quality. On average, we obtained high call rates in the FFPE cancer samples (97.2%), the FFPE adjacent normal tissue samples (98.6%) and the lymphoblastoid cell lines (99.7%) (Supplementary Table III); the difference between FFPE cancer and FFPE normal tissues likely reflects the fact that more chromosomal aberrations occurred in the cancer samples, making the determination of a genotype in those regions difficult. We also measured the concordance of genotype calls made from technical replicates, and obtained 98.3, 98.0 and 99.9% for these 3 sample groups, respectively (Supplementary Table III). These results, together with other recent reports,15 demonstrated that highly accurate genotyping results could be obtained from partially degraded gDNAs, such as those derived from FFPE tissue blocks, using the GoldenGate assay.
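The two quality metrics reported above can be illustrated with a short Python sketch. The text does not state exactly how no-calls were handled when computing concordance, so the sketch assumes that concordance is measured only over loci called in both replicates; the function names and the use of None for a no-call are likewise assumptions.

```python
def call_rate(genotypes):
    """Fraction of loci with a genotype call; None marks a no-call."""
    if not genotypes:
        raise ValueError("no loci supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def replicate_concordance(rep1, rep2):
    """Fraction of identical calls among loci called in both replicates.

    Assumption: loci with a no-call in either replicate are excluded.
    """
    both = [(a, b) for a, b in zip(rep1, rep2)
            if a is not None and b is not None]
    if not both:
        raise ValueError("no loci called in both replicates")
    agree = sum(1 for a, b in both if a == b)
    return agree / len(both)


# Toy example with 4 loci per replicate.
replicate_1 = ["AA", "AB", None, "BB"]
replicate_2 = ["AA", "BB", "AB", "BB"]
rate = call_rate(replicate_1)
concordance = replicate_concordance(replicate_1, replicate_2)
```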

Table III. Allele Frequency Distribution between the ESCC and the Normal (HAPMAP) Populations
SNP name | Gene | AA | AB | BB | Function | Protein residue | HapMap (normal controls) AA/AB/BB | ESCC patient samples AA/AB/BB | Fisher's test
rs1058885 | CSF1 | CC | CT | TT | Nonsynonymous | Leu [L]≫Pro [P] | 0/0/45 | 7/21/3 | 9.02E−18
rs963982 | KIAA1804 | AA | AG | GG | Synonymous | Ser [S] | 0/2/43 | 2/13/18 | 2.13E−05
rs3087209 | IL2 | GG | GT | TT | Nonsynonymous | Leu [L]≫Arg [R] | 0/0/45 | 7/20/6 | 1.65E−15
rs1805321 | PMS2 | CC | CT | TT | Nonsynonymous | Pro [P]≫Ser [S] | 12/22/9 | 33/0/0 | 3.15E−11
rs1061501 | IRF7 | AA | AG | GG | Synonymous | Arg [R] | 40/5/0 | 20/8/5 | 0.002425115
rs1933437 | FLT3 | CC | CT | TT | Nonsynonymous | Thr [T]≫Met [M] | 7/13/25 | 0/20/13 | 0.003952709
rs2289657 | NTRK2 | GG | GT | TT | Synonymous | Ile [I] | 40/4/1 | 10/17/5 | 3.36E−07
rs3829955 | MAP3K9 | CC | CT | TT | Synonymous | Asn [N] | 41/0/1 | 15/14/4 | 4.28E−08
rs1058808 | ERBB2 | CC | CG | GG | Nonsynonymous | Pro [P]≫Ala [A] | 31/14/0 | 8/17/7 | 2.86E−05
rs8080306 | PRKAR1A | AA | AC | CC | Untranslated | N/A | 44/1/0 | 10/22/1 | 5.32E−11

Somatic chromosomal alterations detected in human esophageal cancer

We examined the genomic profiles of the cancer-related genes in the human esophageal cancer and the matched adjacent normal tissue samples in detail. We compared each of the 1,432 SNPs across all samples and calculated the allele intensities in the 2 channels (Cy3 and Cy5) to derive both DNA copy number and allele ratio information using Illumina BeadStudio software.17 The log R ratio (LRR) measurement was used as an indicator of DNA copy number change to detect homozygous/hemizygous gene deletions and gene duplication/amplification in the tumor samples. For instance, a hemizygous deletion (loss of 1 copy) would be manifested as a decrease in the LRR from ∼0 to −0.55, although this depression is typically attenuated to approximately −0.4. One-copy duplications are manifested as an increase in the LRR from ∼0 to +0.40, and this signal is attenuated as well. Because of the low density of this array (on average ∼2 SNP probes per gene), we were unable to average over a large number of probes, which would have improved the precision of the measurement.17
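The calling rule stated in the footnotes of Tables I and II (an LRR below −0.3 for a deletion or above 0.3 for an amplification, met by both technical replicates) can be sketched as follows. The function names, the dictionary layout and the example LRR values are hypothetical and serve only to illustrate the rule.

```python
def call_copy_number_event(lrr_rep1, lrr_rep2, cutoff=0.3):
    """Classify one gene in one individual from its two replicate LRRs.

    Both replicates must pass the cutoff for an event to be called.
    """
    if lrr_rep1 < -cutoff and lrr_rep2 < -cutoff:
        return "deletion"
    if lrr_rep1 > cutoff and lrr_rep2 > cutoff:
        return "amplification"
    return "none"


def count_events(replicate_lrrs, event):
    """Count the individuals showing `event` for one gene.

    `replicate_lrrs` maps an individual ID to its (replicate 1, replicate 2)
    LRR pair for that gene.
    """
    return sum(
        1
        for rep1, rep2 in replicate_lrrs.values()
        if call_copy_number_event(rep1, rep2) == event
    )


# Hypothetical LRR pairs for one gene in three individuals.
example_gene = {
    "patient_01": (-0.45, -0.38),  # both replicates below -0.3
    "patient_02": (-0.45, -0.10),  # only one replicate passes: no call
    "patient_03": (0.52, 0.61),    # both replicates above 0.3
}
n_deleted = count_events(example_gene, "deletion")
n_amplified = count_events(example_gene, "amplification")
```

Requiring agreement between replicates trades some sensitivity for fewer spurious calls, which matters when each gene is covered by only about 2 SNPs.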

We detected a wide spectrum of chromosomal aberrations with several notable patterns across the esophageal cancer samples. For example, we found frequent homozygous/hemizygous deletions of the genes on chromosome 3p, including the known tumor suppressor genes FANCD2 and CTNNB1, and on chromosome 5, including the known tumor suppressor gene APC and ISL1, FER, ERCC4 and TGFBI (Table I). We also detected frequent duplication/amplification of the genes on chromosome 3q, including the known oncogenes (TNK2, TNFSF10 and FGF12) and other genes (BCL6, DVL3, ABCC5, AGTR1, MDS1 and MLF1) (Table II). There was a clear reciprocal correlation between gene deletions on chromosome 3p and gene duplications/amplifications on chromosome 3q in most of the affected esophageal cancer samples (Fig. 1), suggesting a molecular mechanism for the generation of somatic chromosomal aberrations. Deletion of chromosome 3p or duplication of chromosome 3q, or coupled 3p deletion and 3q duplication have also been observed in lung cancer,19–21 suggesting that genetic alterations of genes on chromosome 3 might be a common molecular mechanism for human epithelial cell carcinogenesis.

Figure 1.

A genome-wide (a) or chromosome-wide (b) LRR profile of esophageal samples. X-axis: assayed genomic loci (SNPs) on all chromosomes (a; 1,432 SNPs/766 genes) or chromosome 3 (b; 3p: 47 SNPs/26 genes; 3q: 46 SNPs/24 genes). Y-axis: all samples, including replicates. The LRR is the log2 of the SNP intensity from the tumor sample divided by the intensity from the matched normal sample. An increase in LRR (red) indicates an increase in copy number, whereas a decrease in LRR (green) indicates a decrease in copy number. Black indicates no change. Note the dramatic loss in intensity on chromosome 3p and gain in intensity on chromosome 3q.

High levels of amplification of CCND1 (the cyclin D1 gene), ranging from 3- to 10-fold or higher, were detected in ∼30% (9/33) of the human esophageal cancer samples (Table II). The amplification of CCND1 in these tumor samples was further validated by quantitative real-time PCR (qPCR) using a set of CCND1-specific primers (data not shown). Previously, we and others showed that CCND1 was amplified in ∼30–40% of human esophageal cancer cell lines and primary tumors using Southern blot and/or qPCR analysis.6–8 Our results are consistent with the previous findings, further demonstrating that amplification of CCND1 is a common genetic alteration in human esophageal cancer. The results also indicate that genomic profiling of partially degraded DNAs from FFPE tissues using the BeadArray technology is highly reliable and accurate.

Aside from CCND1, high levels of amplification of DVL3, TNK2, PTPN6, MLF1, BCL6, ABCC5, AGTR1 and E2F1 were also frequently detected in the esophageal cancer samples (Table II). BCL6 has been shown to be frequently over-expressed in several types of human cancers.22 Expression of DVL3 was reported to be upregulated in human head and neck squamous cell carcinomas (HNSCC).23 TNK2 (ACK1) has also been found to be amplified in primary tumors, and its amplification correlated with poor prognosis.24 PTPN6 encodes a member of the protein tyrosine phosphatase (PTP) family, whose members are known signaling molecules that regulate cell growth and differentiation.25 Over-expression of ABCC5 has been frequently detected in several types of human cancers and found to be associated with chemotherapeutic resistance.26

Deletions of the tumor suppressor genes CDKN2B (p15) and CDKN2A (p16) were observed in a large proportion of the esophageal cancer samples (Fig. 2). These genes are located on chromosome 9p21, a locus that is frequently deleted in many human cancers, including esophageal carcinomas.27, 28 Previously, we showed that CCND1 amplification and RB gene inactivation were mutually exclusive in human esophageal tumors, indicating that cell cycle control could be abrogated either by loss of Rb or by increased expression of cyclin D1 during esophageal tumor development.7

Figure 2.

Average LRRs across all cancer samples for the 3 CCND1 SNPs, 3 CDKN2A SNPs and 4 CDKN2B SNPs, respectively. Samples, shown on the X-axis, are clustered based upon their LRR profile using hierarchical clustering with a Manhattan distance metric, available within BeadStudio. Note the common pattern within samples on the right, which show an increase in the copy number of CCND1 and a concomitant loss of copy number of CDKN2A and CDKN2B. Bright green indicates a homozygous deletion.

To determine whether there was also any correlation between amplification of CCND1 and deletions of CDKN2B and CDKN2A, we analyzed the genomic profiles of CCND1 (3 SNPs), CDKN2A (3 SNPs) and CDKN2B (4 SNPs) across the esophageal cancer samples. As shown in Figure 2, while 5 samples with high levels of CCND1 amplification displayed low levels of gene deletion of CDKN2B and/or CDKN2A, 4 samples with high levels of CCND1 amplification showed homozygous deletions of CDKN2B and/or CDKN2A. These results, in sharp contrast to the reciprocal correlation between CCND1 amplification and RB gene inactivation, indicated that CCND1 amplification and CDKN2B/A deletion in human esophageal tumors were not mutually exclusive. CCND1 amplification and CDKN2B/A deletion, which could cause high Cdk4/6 kinase activities, might provide a greater growth advantage to tumor cells, as previously suggested.29 Coincident amplification of CCND1 and deletion of CDKN2B/A were also observed in HNSCC.30, 31

Frequent deletions of ISL1, GPX1, APC, FVT1, DCC, FANCD2, RAP1A, COL4A3, CCNA2 and XRCC4 were also detected in a significant proportion of esophageal cancer samples (Table I). Among them, APC had been implicated in esophageal cancer development.32 CCNA2 (cyclin A2) promotes both the G1/S and G2/M cell cycle transitions; it may have a role in the progression of Barrett's esophagus to esophageal adenocarcinoma.33 The frequent deletion of the FANCD2 gene in the esophageal cancers was also interesting, since this gene is mutated in the inherited genetic disease Fanconi anemia (FA). FA is a rare chromosome instability syndrome characterized by aplastic anemia, cancer susceptibility and cellular hypersensitivity to interstrand DNA crosslinking agents.34 Functional defects in the FA pathway have been detected in many human cancers, and deletion of FANCD2 in knockout mice caused epithelial cancer.35 Thus, genetic alterations of FANCD2 and the FA pathway could play a critical role in human esophageal carcinogenesis.

Identification of germline esophageal cancer susceptibility genes using allele frequency analysis

As discussed above, we generated highly accurate genotyping results with the FFPE ESCC samples. We next sought to determine whether any of the cancer-related genes we assayed could be associated with the risk of developing human esophageal cancer. To this end, we exported the genotyping data from the International HapMap Project18 and identified 1,032 overlapping SNPs between the HapMap data set and our study. We then compared the allele frequency distribution between the esophageal cancer cohort (i.e., the cases; n = 33) of our study and the ethnically matched normal Han Chinese population from the HapMap project (i.e., the controls; n = 45). We identified 10 SNPs derived from 10 genes (CSF1, KIAA1804, IL2, PMS2, IRF7, FLT3, NTRK2, MAP3K9, ERBB2 and PRKAR1A) that showed significantly different allele frequency distributions in the 2 populations (Table III), based on Fisher's exact test. These results suggested that these genes might play roles in esophageal cancer susceptibility and/or contribute to esophageal cancer development. Consistent with this notion, these genes have been reported to play critical roles in regulating cell proliferation, such as regulating cell signaling (CSF1, KIAA1804, IL2, FLT3, NTRK2, MAP3K9, ERBB2 and PRKAR1A) and gene expression (IRF7) in human cells.36–41

Of the 10 SNPs, 1 was located in the untranslated region of its gene (PRKAR1A) and 9 were located within the coding regions of their genes, leading to either synonymous (KIAA1804, IRF7, NTRK2 and MAP3K9) or nonsynonymous (CSF1, IL2, PMS2, FLT3 and ERBB2) changes. Notably, more heterozygotes were detected in the esophageal cancer cohort than in the Han Chinese normal control population for 9 of the 10 SNPs/genes (CSF1, KIAA1804, IL2, IRF7, FLT3, NTRK2, MAP3K9, ERBB2 and PRKAR1A) (Table III), consistent with Knudson's 2-hit theory.42

It was of interest that our analysis also detected an exception, the PMS2 gene, for which more heterozygotes were observed in the normal population whereas only wild-type homozygotes were detected in the esophageal cancer cohort (Table III). PMS2, reported previously to be involved in human esophageal carcinogenesis,43 is a component of the protein complex required for nucleotide mismatch repair and a potential tumor suppressor.44, 45 Our results implied that additional factors might influence the function of PMS2 and thus increase esophageal cancer susceptibility and/or development.

It is worth pointing out that only 33 cases and 45 controls were analyzed in our study; more samples are needed to further validate our findings. In addition, we would have to validate the genotypes extracted from the HapMap data set, even though the genotyping accuracy of the entire HapMap data was estimated at 99.99%.18 Except for PMS243 and ERBB2,46 the genes we identified have not yet been implicated in human esophageal carcinogenesis. Thus, more functional studies of all the alleles in a large number of samples are required to test their roles as risk factors/biomarkers for esophageal cancer in the future.

In summary, we have employed the BeadArray technology to perform high-throughput genetic analyses in 33 FFPE archived human esophageal carcinomas and their matched adjacent normal tissues. Our results indicate that highly accurate and reliable genotyping and genomic profiling results can be obtained with partially degraded gDNA derived from FFPE archived tissue blocks. The detailed analysis of the genomic profiles of the archived esophageal samples enabled us to identify not only previously reported genetic alterations (e.g., CCND1 amplification and CDKN2A/B deletion) but also novel genetic changes (e.g., BCL6, DVL3, TNK2, PTPN6, ABCC5 and AGTR1 amplifications and ISL1, GPX1, DCC, FANCD2 and CCNA2 deletions) in human esophageal cancer. Importantly, we found that the most frequently deleted genomic regions are chromosome 3p and chromosome 5, and the most frequently duplicated or amplified genomic region is chromosome 3q. The strong reciprocal correlation between gene deletion on chromosome 3p and gene duplication/amplification on chromosome 3q in most of the affected human esophageal cancer samples suggests that genetic alterations of genes on chromosome 3 might have crucial roles in esophageal carcinogenesis. We also detected significant differences in allele frequency distributions between the esophageal cohort of our study and normal control samples from the HapMap project for 10 genes (CSF1, KIAA1804, IL2, PMS2, IRF7, FLT3, NTRK2, MAP3K9, ERBB2 and PRKAR1A), suggesting that they might play roles in esophageal cancer susceptibility and/or development. Further validation of these alleles (genes) in a large number of samples will allow a better understanding of their roles in the etiology of and susceptibility to esophageal cancer.
Our study, together with the recent reports on the use of the BeadArray technology for FFPE sample analysis,14, 15 has paved the way for large-scale, genome-wide genomic profiling in archived patient samples, including cancer samples, to identify genetic alterations and risk factors associated with clinical outcomes both retrospectively and prospectively.


This work was supported by grants from National Key Basic Research Program of China (973-2002BC513101) to S.-H.L., Q.Z. and W.J., and from NIH (GM67859) to W.J.