Program for New Century Excellent Talents in University (No. 81201583), and National Key Scientific and Technological Project (No. 2011ZX09307-001-04).
The epithelial cell adhesion molecule (EpCAM) is overexpressed in a wide variety of human cancers and is associated with patient prognosis, including those with lung cancer. However, the association of single nucleotide polymorphisms (SNPs) in the EpCAM gene with the prognosis for non-small-cell lung cancer (NSCLC) patients has never been investigated. We evaluated the association between two SNPs, rs1126497 and rs1421, in the EpCAM gene and clinical outcomes in a Chinese cohort of 506 NSCLC patients. The SNPs were genotyped using the Sequenom iPLEX genotyping system. Multivariate Cox proportional hazards model and Kaplan–Meier curves were used to assess the association of EpCAM gene genotypes with the prognosis of NSCLC. We found that the non-synonymous SNP rs1126497 was significantly associated with survival. Compared with the CC genotype, the CT+TT genotype was a risk factor for both death (hazard ratio, 1.40; 95% confidence interval [CI], 1.02–1.94; P = 0.040) and recurrence (hazard ratio, 1.34; 95% CI, 1.02–1.77; P = 0.039). However, the SNP rs1421 did not show any significant effect on patient prognosis. Instead, the AG+GG genotype in rs1421 was significantly associated with early T stages (T1/T2) when compared with the AA genotype (odds ratio for late stage = 0.65; 95% CI, 0.44–0.96, P = 0.029). Further stratified analysis showed notable modulating effects of clinical characteristics on the associations between variant genotypes of rs1126497 and NSCLC outcomes. In conclusion, our study indicated that the non-synonymous SNP rs1126497 may be a potential prognostic marker for NSCLC patients.
Lung cancer is the most common malignancy in the world. It is a leading cause of cancer-associated death in both men and women. Non-small-cell lung cancer (NSCLC) accounts for approximately 80% of primary lung cancers. Tobacco use is known as the most significant risk factor leading to lung cancer. Other known risk factors for lung cancer include environmental and occupational exposures. However, the major causes underlying lung cancer development are genetic and epigenetic damage resulting from environmental carcinogens. In spite of advances achieved in diagnosis and treatment in recent years, the clinical outcome for patients with lung cancer remains disappointing. The 5-year survival rate of NSCLC patients who receive surgical resection is only approximately 15%. Although the key prognostic determinant in lung cancer is clinical stage at diagnosis, it is common to observe that patients at a similar stage have considerably different clinical outcomes. Identifying suitable prognostic biomarkers may therefore help to improve survival after surgical resection.
The EpCAM gene is located on chromosome 2p21 and encodes a carcinoma-associated type I membrane protein of 314 amino acids. It is expressed in a variety of human epithelial tissues and cancer tissues, as well as progenitor and stem cells. Previous studies have reported that epithelial cell adhesion molecule (EpCAM) plays diverse roles in different cancers.[8-11] Several studies have shown that EpCAM is overexpressed in many different types of cancers, such as NSCLC, hepatocellular carcinoma, and prostate cancer. Overexpressed EpCAM has been confirmed to increase the expression of c-Myc and cyclins A and E, and thereby to directly affect cell cycle control. Cells with high levels of EpCAM have a reduced requirement for growth factors; they grow in an anchorage-independent manner and proliferate more rapidly. However, previous studies have also reported that EpCAM has protective effects in some cancers such as renal cell, thyroid, oral squamous cell, and gastric carcinomas.[16-18] Therefore, whether EpCAM functions as an oncogene or as a tumor suppressor gene depends mainly on the tumor microenvironment. With regard to lung cancer, cumulative experimental evidence has shown that knockdown of EpCAM inhibits tumor growth and results in massive apoptosis of cancer cells, which suggests that EpCAM is a promising therapeutic target for lung cancer patients.
Single nucleotide polymorphisms located in coding regions that alter the amino acid sequence of a protein are termed non-synonymous SNPs (ns-SNPs). Recently, ns-SNPs have been investigated as biomarkers for risk assessment and prognosis prediction, as they are closely associated with a variety of diseases and their development.[20, 21] Previously published reports showed that an ns-SNP in the EpCAM gene (rs1126497) is associated with risk of breast cancer and cervical cancer.[22, 23] In addition, EpCAM plays an important role in the hereditary diseases hereditary non-polyposis colorectal cancer (Lynch syndrome) and congenital tufting enteropathy. Another study reported that a nonsense mutation in the EpCAM gene is significantly associated with congenital tufting enteropathy. All of these results indicate that the EpCAM gene has considerable functions in human hereditary and tumor diseases.
In this study, we selected two SNPs in the EpCAM gene and assessed their effects on the clinical outcomes of NSCLC patients. To the best of our knowledge, this is the first study on the prognostic role of SNPs in the EpCAM gene in NSCLC.
Materials and Methods
A total of 577 patients with NSCLC who had received surgical resection treatment were initially recruited into an ongoing molecular epidemiological study at the Department of Thoracic Surgery at Tangdu Hospital (Fourth Military Medical University, Xi'an, China) between July 2009 and February 2012. None of the patients had been treated with surgery, chemotherapy, and/or radiotherapy before enrolment into the study. There was no restriction for age, gender, or disease stage at enrolment, but the study was restricted to patients with the NSCLC type of lung cancer. In this prognosis study, we excluded 71 patients, including 32 who had incomplete clinical information or failed follow-up, 13 who died within 2 months after surgery, 17 patients with metastatic diseases, and nine with recurrence within 1 month after surgery. Finally, 506 NSCLC patients were included in the present study and successfully genotyped. All patients were Han Chinese. This study was approved by the Institutional Review Boards of the Fourth Military Medical University, and signed informed consent was obtained from each participant.
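The cohort flow described above can be tallied in a few lines. This is only a bookkeeping sketch in Python; the dictionary keys are shorthand labels introduced here, while the counts are those stated in the text.

```python
# Counts taken from the enrolment description; key names are illustrative labels.
recruited = 577
exclusions = {
    "incomplete_information_or_failed_follow_up": 32,
    "died_within_2_months_of_surgery": 13,
    "metastatic_disease": 17,
    "recurrence_within_1_month_of_surgery": 9,
}

excluded = sum(exclusions.values())
analysed = recruited - excluded

print(f"excluded: {excluded}, analysed: {analysed}")
```

The totals agree with the 71 exclusions and 506 analysed patients reported in the text.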
Demographic and clinical data
Demographic data were collected through in-person interviews using a standard epidemiological questionnaire, including age, gender, ethnicity, residential region, smoking status, alcohol use, education status, and family history of cancer. Detailed clinical information was collected through medical chart review, including medical imaging data, pathological diagnosis, TNM stage, and treatment information. A standard follow-up was carried out by a trained clinical specialist through medical record review or telephone interview. The latest follow-up data in this analysis were obtained in November 2012, and 41 (8.1%) patients were lost during follow-up and censored for the analyses. All patients donated 5 mL blood before treatment for genomic DNA extraction using the EZNA Blood DNA Midi Kit (Omega Bio-Tek, Norcross, GA, USA) in the laboratory.
Single nucleotide polymorphism selection and genotyping
Functional SNPs in the EpCAM gene were selected using a set of Web-based SNP selection tools. We found two functional SNPs, rs1421 and rs1126497. The former, rs1421, is located in the 3′-UTR, which was predicted to be a potential microRNA binding site by the FuncPred tool (available from http://snpinfo.niehs.nih.gov/snpinfo/snpfunc.htm), and may influence mRNA degradation or translation. The SNP rs1126497 is an ns-SNP located in exon 3, which changes Met at position 115 to Thr and may affect protein function. Genotyping was carried out using the iPLEX genotyping system (Sequenom, San Diego, CA, USA). Laboratory personnel who undertook the genotyping assays were blinded to patients' information. The average call rate for the genotyping was 98.5%. Strict quality control measures were implemented during genotyping with more than 99% concordance in samples that were randomly selected to be genotyped in duplicate.
SPSS version 19.0 statistical software (SPSS Inc., Chicago, IL, USA) was used for all statistical analyses. Three genetic models (additive, dominant, and recessive) were applied to assess the association between each SNP and the clinical outcomes of NSCLC patients. Two major endpoints, overall survival (OS) and recurrence-free survival (RFS), were analyzed in this study. Overall survival was defined as the time from diagnosis to death from any cause; RFS was defined as the time from diagnosis to the first date when patients developed recurrence. Considering that there was a very small number (≤5) of patients with the rare homozygous variant genotype for these two SNPs of the EpCAM gene, we did not present the data from the recessive model in our analyses to avoid unstable estimations. Hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated from a multivariate Cox proportional hazards model, adjusting for age, gender, smoking status, histology, TNM stage, differentiation, and adjuvant chemotherapy or radiotherapy. Kaplan–Meier curves and the log–rank test were used to assess the differences among patient groups in OS and RFS analyses. All P-values in this study were two-sided. P < 0.05 was considered statistically significant.
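As an illustration of how the three genetic models differ, the sketch below encodes a genotype as a numeric covariate under each model. It is written in Python for readability and is not the SPSS procedure used in the study; the function name `encode_genotype` is hypothetical, and the additive coding as a variant-allele count (0, 1, 2) is the conventional choice assumed here rather than something stated in the text.

```python
def encode_genotype(genotype: str, variant_allele: str, model: str) -> int:
    """Encode a biallelic genotype (e.g. "CT") as a regression covariate.

    additive:  number of variant alleles (0, 1 or 2)  [assumed conventional coding]
    dominant:  1 if at least one variant allele, else 0 (e.g. CT+TT vs CC)
    recessive: 1 if two variant alleles, else 0 (e.g. TT vs CC+CT)
    """
    count = genotype.count(variant_allele)
    if model == "additive":
        return count
    if model == "dominant":
        return int(count >= 1)
    if model == "recessive":
        return int(count == 2)
    raise ValueError(f"unknown genetic model: {model}")


# rs1126497: CC is the reference genotype and T the variant allele.
for g in ("CC", "CT", "TT"):
    print(g, [encode_genotype(g, "T", m) for m in ("additive", "dominant", "recessive")])
```

With very few homozygous variant carriers, the recessive covariate is almost always 0, which is why estimates under that model become unstable.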
Demographic and clinical characteristics and prognosis analysis
A total of 506 NSCLC patients were included in this study from Tangdu Hospital. They were an ethnically homogenous group of Han Chinese, the median age was 59 years (range, 27–86 years), and 392 (77.5%) were male patients. Only 160 (31.6%) patients were never smokers, and all others had a smoking history. Of the total 506 patients, 271 were diagnosed with squamous cell carcinoma, 152 with adenocarcinoma, 62 with adenosquamous carcinoma, 16 with carcinosarcoma, three with large-cell lung cancer, and two with mucoepidermoid carcinoma. There were 284 (56.1%) patients diagnosed with stage I and stage II disease, and 222 (43.9%) patients had stage III disease. There were 347 (68.6%) patients with well or moderately differentiated tumors, and 159 (31.4%) with poorly differentiated or undifferentiated tumors. According to the records, 375 patients received adjuvant chemotherapy or radiotherapy.
Associations between patients' characteristics and NSCLC outcomes
We carried out a multivariate Cox regression analysis to assess the associations between clinical characteristics and outcomes in NSCLC patients. As shown in Table 1, patients with adenocarcinoma showed better OS (HR = 0.61; 95% CI, 0.39–0.93) than those with squamous cell carcinoma. Patients at advanced tumor stage had significantly increased risk for death and recurrence with HRs of 1.94 (95% CI, 1.41–2.68) and 2.01 (95% CI, 1.53–2.65), respectively. Patients with poorly differentiated or undifferentiated tumors showed significant association with worse RFS (HR for recurrence = 1.61; 95% CI, 1.14–2.28). In addition, adjuvant chemotherapy or radiotherapy significantly improved the OS of patients with NSCLC (HR for death = 0.67; 95% CI, 0.47–0.69).
Table 1. Distribution of characteristics in Chinese patients with non-small-cell lung cancer (n = 506) and prognosis analysis
†Adjusted by age, gender, smoking status, histology, TNM stage, differentiation, and adjuvant chemotherapy or radiotherapy where appropriate. ‡Other carcinomas include adenosquamous carcinoma, large cell carcinoma, carcinosarcoma, and mucoepidermoid carcinoma. CI, confidence interval; HR, hazard ratio; Ref., reference.
Associations between SNPs and clinical outcome of NSCLC
A multivariate Cox regression model was used to evaluate the effect of EpCAM SNPs on death and recurrence in NSCLC patients. As shown in Table 2, patients carrying CT or TT genotypes in the SNP rs1126497 showed a significantly higher death risk when compared with those carrying the CC genotype (HR for additive model = 1.46; 95% CI, 1.09–1.97, P = 0.012; HR for dominant model = 1.41; 95% CI, 1.02–1.94, P = 0.039). Similar results were observed in the RFS analysis (HR for additive model = 1.37; 95% CI, 1.06–1.76, P = 0.016; HR for dominant model = 1.34; 95% CI, 1.01–1.77, P = 0.039). Additionally, there were no significant results observed for the SNP rs1421 in either the OS or RFS analyses. For rs1126497, Kaplan–Meier curves significantly distinguished the patient groups with different genotypes in the OS analysis (log–rank P for additive and dominant models = 0.007 and 0.021, respectively) and RFS analysis (log–rank P for additive and dominant models = 0.008 and 0.006, respectively) (Fig. 1).
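For readers unfamiliar with how Kaplan–Meier curves are constructed, the following is a minimal sketch of the product-limit estimator in Python. The function name `kaplan_meier` and the follow-up times are hypothetical and purely illustrative; they are not study data, and the study itself used SPSS.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up duration for each patient
    events: 1 if the event (death or recurrence) was observed, 0 if censored
    Returns a list of (time, S(time)) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # events and censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical follow-up times in months (not study data).
months = [5, 8, 8, 12, 15, 20]
observed = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(months, observed):
    print(f"t = {t:>2} months, S(t) = {s:.3f}")
```

Censored patients contribute to the risk set until their last follow-up but never reduce the survival estimate directly. Computing one such curve per genotype group gives the curves that the log–rank test compares.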
Table 2. Association between single nucleotide polymorphisms (SNPs) in the EpCAM gene and clinical outcomes in Chinese patients with non-small-cell lung cancer
†Adjusted by age, gender, smoking status, histology, TNM stage, differentiation, and adjuvant chemotherapy or radiotherapy. CI, confidence interval; HR, hazard ratio; Ref., reference.
SNP function: rs1421, miRNA binding site; rs1126497, non-synonymous SNP.
Associations between EpCAM polymorphisms and NSCLC progression
We applied binary logistic regression to explore the association between EpCAM SNPs and cancer progression. As shown in Table 3, the AG+GG genotype in rs1421 was significantly associated with early T stages (T1/T2) when compared with the AA genotype (odds ratio [OR] for late stage = 0.65; 95% CI, 0.44–0.96, P = 0.029). However, there was no significant association between the SNP rs1421 and N stage. Additionally, the SNP rs1126497 had no significant association with either the T or N stage of NSCLC.
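A reported ratio estimate and its 95% CI can be checked for rough consistency with the accompanying P-value. The sketch below, in Python, assumes a Wald-type interval that is symmetric on the log scale, which is the usual construction for logistic and Cox regression output but is not stated explicitly in the text; the function name `wald_p_from_ci` is hypothetical.

```python
import math


def wald_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate the Wald z statistic and two-sided P-value of a ratio
    estimate (OR or HR) from its 95% CI, assuming the interval is
    exp(log(estimate) +/- z_crit * SE)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# rs1421 AG+GG vs AA, late T stage: OR = 0.65; 95% CI, 0.44-0.96
z, p = wald_p_from_ci(0.65, 0.44, 0.96)
print(f"z = {z:.2f}, P = {p:.3f}")
```

Because the published values are rounded to two decimal places, the recovered P-value is only approximate; here it comes out close to the reported P = 0.029. The same check can be applied to the hazard ratios in Table 2.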
Table 3. Association between single nucleotide polymorphisms (SNPs) in the EpCAM gene and disease progression in Chinese patients with non-small-cell lung cancer
N stage, regional lymph node involvement; Ref., reference; T stage, size and tissue invasion of the primary tumor.
Associations between EpCAM polymorphisms and prognosis of NSCLC stratified by patient characteristics
To investigate the modulating effects of clinical characteristics on the association between the EpCAM polymorphism rs1126497 and prognosis, we carried out multivariate Cox proportional hazards analyses under a dominant model, stratified by age, gender, smoking status, histology, TNM stage, differentiation, and adjuvant chemotherapy or radiotherapy. As shown in Table 4, the CT+TT genotype was significantly associated with both worse OS and worse RFS in NSCLC patients with older age, male gender, smoking history, squamous cell type tumor, well or moderately differentiated tumor, and adjuvant chemotherapy or radiotherapy. Among older patients, the CT+TT genotype conferred a significantly increased risk of death (HR = 1.60; 95% CI, 1.02–2.53) and recurrence (HR = 1.67; 95% CI, 1.11–2.51). For male patients, the HRs for death and recurrence were 1.46 (95% CI, 1.02–2.10) and 1.57 (95% CI, 1.15–2.14), respectively. For ever-smokers, the HRs for death and recurrence were 1.52 (95% CI, 1.04–2.23) and 1.73 (95% CI, 1.24–2.42), respectively. For patients with squamous cell carcinoma, the HRs for death and recurrence were 1.94 (95% CI, 1.24–3.03) and 1.54 (95% CI, 1.04–2.28), respectively. For patients with well or moderately differentiated tumors, the HRs for death and recurrence were 1.84 (95% CI, 1.20–2.81) and 1.58 (95% CI, 1.10–2.26), respectively. For patients who received adjuvant chemotherapy or radiotherapy, the HRs for death and recurrence were 1.52 (95% CI, 1.04–2.21) and 1.57 (95% CI, 1.15–2.15), respectively. Tumor stage, differentiation, and adjuvant treatments are common and important clinical variables in cancer survival analysis. Therefore, we assessed the prognostic value of rs1126497 for OS stratified by these factors. As shown in Figure 2, compared with patients carrying the wild-type (CC) genotype of rs1126497, patients with variant genotypes had significantly shorter survival in the strata of patients with early-stage disease (P = 0.038), well or moderately differentiated tumors (P = 0.002), and adjuvant chemotherapy or radiotherapy (P = 0.014).
Table 4. Stratified analysis of association between single nucleotide polymorphism rs1126497 in the EpCAM gene with prognosis in Chinese patients with non-small-cell lung cancer (n = 503)
In this study, we investigated the effects of two SNPs in the EpCAM gene on the survival of NSCLC patients. We found that a variant genotype (CT+TT) of rs1126497 in the EpCAM gene was significantly associated with poor prognosis (both OS and RFS) of NSCLC, whereas there was no significant association observed between the SNP rs1421 and the clinical outcomes of NSCLC patients. Further stratified analysis indicated that the effect of rs1126497 was more prominent in older patients, smokers, patients with squamous cell carcinoma, patients with well or moderately differentiated tumors, and patients who received adjuvant chemotherapy or radiotherapy. Moreover, we examined the associations between these two SNPs and NSCLC progression using a logistic regression model and found that the variant genotype (AG+GG) of rs1421 was significantly associated with early T-stage (T1/T2) disease in NSCLC patients. To the best of our knowledge, this is the first study to investigate the association between EpCAM gene polymorphisms and NSCLC prognosis.
Previous studies have indicated that a functional polymorphism (rs1126497) in the EpCAM gene was associated with the risk of breast and cervical cancer.[22, 23] One of these studies showed that subjects carrying the CT+TT genotype had a 1.40-fold increased risk for breast cancer compared with those carrying the CC genotype. Another study found a dose–response relationship in cervical cancer risk for subjects with the CT genotype (adjusted OR = 1.72) and the TT genotype (adjusted OR = 1.96). Nonetheless, there has been no report on the association between functional polymorphisms in the EpCAM gene and cancer prognosis. Our study found that variant genotypes of rs1126497 had a significant association with worse prognosis of NSCLC patients in both OS and RFS analyses, which is in concordance with those previous reports.
As a cell adhesion molecule, EpCAM is involved in cell adhesion and proliferation, and plays a regulatory role in human cancer. It has been indicated that EpCAM is overexpressed in various types of cancers, including cancers of the colon and rectum, breast, prostate, bladder, and also frequently in lung cancer.[12, 29] Downregulation of EpCAM by RNAi decreases the proliferation, migration, and invasive capacity of breast cancer cells. In addition, EpCAM knockdown leads to substantial apoptosis in lung cancer cell lines but not in a bronchial epithelial cell line. Moreover, recent studies have revealed that EpCAM plays a versatile role in carcinogenesis, in that it not only affects cell–cell adhesion but is also involved in many processes including cell migration, differentiation, and metabolism. All these studies suggest that EpCAM plays an essential role in the development of cancer, including lung cancer.
The EpCAM molecule consists of three domains: an extracellular domain with epidermal growth factor-like repeat and thyroglobulin repeat-like (TY) domain, a single transmembrane domain, and an intracellular domain (EpICD). The TY domain is capable of selectively discriminating cathepsin L and S, two similar cysteine proteases. It has the potential role of inhibiting the activity of cysteine proteases, and this activity depends on the specific interaction with target protein. The SNP rs1126497 is an ns polymorphism (M115T) located in the TY domain of EpCAM. Thus, the rs1126497 polymorphism could alter the primary structure of the TY domain, which may decrease the interaction between the TY domain and cathepsin L, weaken the inhibiting ability of cathepsin L, and thus eventually affect NSCLC patients' prognosis. More recently, many studies have focused on the EpCAM intracellular signaling pathway, in which EpCAM is cleaved by tumor necrosis factor-α converting enzyme (TACE/ADAM17) and γ-secretase and the ectodomain is released into the medium and EpICD into the cytoplasm.[7, 34] The released EpICD as a signal transducer is transported into the nucleus and forms a nuclear complex with FHL2, β-catenin, and Lef-1 to stimulate expression of target gene c-myc. This result supports the proliferative signaling activity of overexpressed EpCAM.[34, 35]
In the stratified analysis, we observed that the significant associations between SNP rs1126497 and OS were only evident in NSCLC patients with better clinical parameters, such as early stage disease and well or moderately differentiated tumors, and in those receiving adjuvant chemotherapy or radiotherapy (Fig. 2). To date, the detailed mechanism underlying the effect of SNP rs1126497 on cancer progression is largely unknown. However, this SNP has been reported to be significantly associated with risk of breast and cervical cancers in Chinese populations.[22, 23] It has also been reported that EpCAM is an early biomarker commonly overexpressed in many premalignant tissues, especially in epithelial neoplasia. These data suggest that this ns-SNP may play an important role in cancers at early stage. Moreover, it is generally accepted that most SNPs only have subtle effects on complex traits, such as cancers. This may partially explain why rs1126497 did not show a significant effect on prognosis in patients with worse clinical parameters, as the worse the cancer, the more complex and varied the disease. Additionally, a previous in vitro study suggested that EpCAM is a promising target to sensitize tumor cells to chemotherapy, which is in concordance with our finding that the effect conferred by this ns-SNP on prognosis was only evident in patients receiving adjuvant treatment. Taken together, it is reasonable to find that the effect of SNP rs1126497 on prognosis is more prominent in those patients with better clinical parameters and receiving adjuvant treatments.
Additionally, although our data did not show any association between rs1421 and NSCLC prognosis, we found that the rs1421 AG+GG genotypes were significantly associated with early T stage disease. This could be due to the location of rs1421 in a miRNA binding site of the EpCAM 3′-UTR, which may affect expression of EpCAM. Further study is needed to understand the effects of the rs1421 polymorphism on tumor progression.
The biggest advantage of our study is that all subjects' characteristics were homogenous, and they were treated using similar procedures, which minimized the confounding effects. Nevertheless, there are limitations to our study. First, the follow-up time was relatively short. However, this limitation can be partially compensated by the high recurrence and death rate in our study population. Second, the relatively small sample size limited the analysis on the recessive model and caused an unstable estimation for the stratified analyses. A larger number of patients will be needed to assess the association between the SNP rs1126497 at EpCAM and NSCLC survival in future studies. Finally, functional assays are warranted to establish the underlying mechanism.
In summary, our study indicates for the first time that the SNP rs1126497 in the EpCAM gene is significantly associated with the prognosis of NSCLC. Our findings contribute to current understanding of the effects of EpCAM polymorphisms on NSCLC.
This work was supported by grant 81201583 from Program for New Century Excellent Talents in University of China and grant 2011ZX09307-001-04 from the National Key Scientific and Technological Project of China.