Lung cancer remains the leading cause of cancer-related death worldwide, with approximately 1.2 million deaths reported annually.1 In China, lung cancer incidence has increased significantly in both urban and rural areas over the last 2 decades,2 which may be mainly due to the continuous increase in cigarette consumption.3, 4 Accumulating evidence suggests that approximately 90% of lung cancer cases are caused by tobacco smoking,5 yet only a small fraction of smokers (usually < 20%) develop lung cancer, suggesting individual susceptibility to the disease.
Tobacco smoke contains numerous carcinogens that can produce bulky DNA adducts, crosslinks, oxidative damage and DNA strand breaks. The major pathway for repairing DNA damage caused by tobacco smoke is nucleotide excision repair (NER), which removes a variety of DNA lesions, such as bulky monoadducts and crosslinks.6, 7 The DNA repair capacity (DRC) of NER is central to maintaining normal cellular function but varies in the general population. For example, individuals with reduced DRC have higher levels of BPDE-DNA adducts when exposed to tobacco carcinogens8 and an increased risk of tobacco-induced lung cancer.9, 10 Polymorphisms in DNA repair genes, particularly single nucleotide polymorphisms (SNPs), may underlie this interindividual variation in DRC11 and may therefore be associated with an increased risk of smoking-related lung cancer.
XPC binds to HR23B to form the XPC-HR23B complex, which functions in global genome NER, and XPC is thought to be involved in the earliest steps of damage detection and NER initiation.12 Germline mutations in the XPC gene result in defective NER, as in patients with XP-C, 1 of the 7 XP complementation groups.13 Two potentially functional polymorphisms were recently identified in XPC: an A-to-C transversion at codon 939 (exon 15), resulting in a Lys-to-Gln substitution, and a biallelic poly(AT) insertion/deletion polymorphism in intron 9 (XPC-PAT); these 2 sites are reported to be in linkage disequilibrium.14 The XPC 939C variant influenced irradiation-specific DNA repair rates in peripheral lymphocytes15 and was associated with increased risk of bladder cancer16 and breast cancer,17 suggesting that it may be a functional variant. We previously reported that the XPC intron 9 poly(AT) insertion/deletion polymorphism was associated with DRC variation, measured by a host cell reactivation assay, in a normal population18 and that the XPC-PAT+ allele contributed to the risk of squamous cell carcinoma of the head and neck (SCCHN) in Caucasians.19 However, we did not find a significant association between XPC-PAT+ and lung cancer risk in a Chinese population.20 More recently, a novel C-to-T transition at XPC codon 499 (exon 9), resulting in an Ala-to-Val substitution, was identified in 102 cancer-free individuals of mixed ethnicities (http://snp500cancer.nci.nih.gov/home.cfm; dbSNP ID: rs2228000), but no reports on its functional relevance or association with cancer are available to date.
Here we describe a case-control study of 320 incident lung cancer cases and 322 cancer-free controls, frequency-matched on age and sex, in a Chinese population. We tested the hypothesis that genotypes and haplotypes of 2 exonic XPC polymorphisms contribute to host susceptibility to lung cancer.
Material and methods
This study included 322 lung cancer patients and 326 healthy controls. All subjects were genetically unrelated ethnic Han Chinese. Patients with newly diagnosed, histopathologically confirmed lung cancer who had not been treated with radiotherapy or chemotherapy were randomly selected from patients recruited from January 1997 to June 2001 at the Cancer Hospital, Chinese Academy of Medical Sciences (Beijing, China), without restrictions on age, stage or histology.21 Exclusion criteria were previous cancer, metastasized cancer and previous radiotherapy or chemotherapy. Population controls were randomly selected from a pool of cancer-free subjects recruited in a nutritional survey conducted in the same region as the cases.21 These control subjects had no history of cancer and were frequency-matched to the cases on age (± 5 years), sex and ethnicity. After informed consent was obtained, each subject was interviewed, and interviewers administered a structured questionnaire to collect demographic data and lifetime history of tobacco use. Each subject also donated a 5 ml blood sample for genetic testing. The study was approved by the institutional review board of the Chinese Academy of Medical Sciences Cancer Institute.
Genotyping of C499T and A939C polymorphisms
Genomic DNA was extracted from the leukocyte pellet obtained by centrifugation of 5 ml of whole blood from each subject. The XPC exon 9 C499T polymorphism was detected with a primer-introduced restriction analysis (PIRA)-PCR method.22 The sense primer introduced a mismatched C in place of A at −2 bp from the polymorphic site (GenBank accession no. AF261898), creating a SacII restriction site. The primers used to amplify this fragment were XPC499F, 5′-TAAGGACCCAAGCTTGCCCG-3′, and XPC499R, 5′-CCCACTTTTCCTCCTGCTCACAG-3′. The 20 μl PCR mixture contained approximately 50 ng of genomic DNA, 12.5 pmol of each primer, 0.1 mM of each dNTP, 1 × PCR buffer (50 mM KCl, 10 mM Tris HCl, 0.1% Triton X-100), 1.8 mM MgCl2 and 1.0 unit of Taq polymerase. The PCR profile consisted of an initial denaturation step at 95°C for 5 min; 35 cycles of 95°C for 30 sec, 63°C for 40 sec and 72°C for 40 sec; and a final extension step at 72°C for 10 min. Five microliters of the PCR product was digested overnight at 37°C with 3 U of SacII (New England BioLabs, Beverly, MA). The digestion products were separated on a 3% agarose gel at 80 V for 100 min and stained with ethidium bromide. The wild-type (499C) allele yields 2 fragments of 131 and 21 bp, and the variant (499T) allele yields a single 152 bp fragment (Fig. 1a).
We used a modified PCR-restriction fragment length polymorphism (RFLP) assay to genotype the A939C polymorphism in XPC exon 15 (GenBank accession no. AF261901), with primers XPC939F, 5′-ACCAGCTCTCAAGCAGAAGC-3′, and XPC939R, 5′-CTGCCTCAGTTTGCCTTCTC-3′, and a similar PCR profile with an annealing temperature of 62°C. The 281 bp product was then digested with PvuII (New England BioLabs) overnight at 37°C. The variant C allele contains a PvuII restriction site and yields 2 bands (150 and 131 bp), whereas the wild-type A allele lacks the site and yields a single 281 bp band.
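The band patterns described for the two digests can be turned into genotype calls mechanically. The Python sketch below is illustrative only: the fragment sizes come from the text, but the function name `call_genotype` and the ±3 bp matching tolerance are our assumptions, not part of the published protocol.

```python
# Expected fragment sizes (bp) for each allele, taken from the methods text.
# C499T (PIRA-PCR + SacII): 499C is cut (131 + 21 bp); 499T is uncut (152 bp).
# A939C (PCR-RFLP + PvuII): 939C is cut (150 + 131 bp); 939A is uncut (281 bp).
EXPECTED_FRAGMENTS = {
    "C499T": {"C": (131, 21), "T": (152,)},
    "A939C": {"A": (281,), "C": (150, 131)},
}


def call_genotype(assay: str, bands: list[int], tolerance_bp: int = 3) -> str:
    """Return a genotype such as 'CT' or 'AC' from the band sizes read off a gel.

    An allele is considered present when every fragment it is expected to
    produce is matched by an observed band within `tolerance_bp`
    (the tolerance is an illustrative assumption).
    """
    fragments_by_allele = EXPECTED_FRAGMENTS[assay]

    def band_seen(size: int) -> bool:
        return any(abs(band - size) <= tolerance_bp for band in bands)

    present = sorted(
        allele
        for allele, fragments in fragments_by_allele.items()
        if all(band_seen(size) for size in fragments)
    )
    if not present:
        raise ValueError(f"band pattern {bands} does not match any {assay} allele")
    if len(present) == 1:
        return present[0] * 2  # homozygote, e.g. 'CC'
    return "".join(present)  # heterozygote, e.g. 'CT' or 'AC'
```

For example, `call_genotype("C499T", [152, 131, 21])` returns `"CT"` and `call_genotype("A939C", [150, 131])` returns `"CC"`. As with any restriction-based typing, an incompletely digested 939CC sample would also show the 281 bp band and be called AC; the sketch does not try to detect that.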
Genotyping was performed blind to the subjects' case/control status. For the XPC C499T polymorphism, 100 randomly selected samples were regenotyped by a PCR-SSCP assay with primers 5′-GGAGAAGCCTCTGATCCCTC-3′ and 5′-CCACTTTTCCTCCTGCTCAC-3′, which amplify a 279 bp fragment, to confirm the PIRA-PCR genotypes (Fig. 1). For the XPC A939C polymorphism, 10% of the samples were regenotyped with the same assay. The repeated results were 100% concordant.
DNA quality or quantity was insufficient for XPC genotyping in 6 subjects (2 cases and 4 controls); the final analysis therefore included 320 cases and 322 controls. Differences between cases and controls in selected demographic variables, smoking status, pack-years smoked and the frequencies of XPC genotypes, alleles and haplotypes were evaluated with the chi-square test. Associations between XPC variants and lung cancer risk were estimated as ORs and 95% CIs from univariate and multivariate logistic regression, the latter adjusted for age, sex and pack-years of smoking. Hardy-Weinberg equilibrium was tested among the control subjects with a goodness-of-fit chi-square test comparing observed and expected genotype frequencies. We used a Bayesian statistical method23 to infer the most probable haplotypes and haplotype genotypes for each individual from their XPC genotypes. We also used the EH algorithm, available online (http://linkage.rockefeller.edu/soft), to estimate haplotype frequencies and compared them with those derived from the Bayesian method. We then estimated the joint effect of the 2 variant genotypes (XPC 499CT/TT and 939AC/CC) and cross-tabulated the combined genotypes and estimated haplotype genotypes with smoking status to evaluate potential gene-environment interaction. Subjects who had smoked fewer than 100 cigarettes in their lifetime were defined as nonsmokers, as is conventional in epidemiologic studies; all others were considered smokers. Pack-years smoked [(cigarettes per day/20) × years smoked] were calculated as a measure of cumulative smoking dose. All statistical analyses were performed with Statistical Analysis System software (version 8.0e; SAS Institute, Cary, NC).
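The Hardy-Weinberg check described above compares observed control genotype counts with the counts expected from the estimated allele frequency. A minimal Python sketch follows; the function name is hypothetical, and the control counts in the example are our back-calculation from the genotype percentages reported in the results (322 controls), not counts published by the authors.

```python
import math


def hwe_goodness_of_fit(n_hom_wt: int, n_het: int, n_hom_var: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, p-value). There are 3 genotype classes and
    1 estimated parameter (the allele frequency), so the test has 1 df.
    """
    n = n_hom_wt + n_het + n_hom_var
    q = (2 * n_hom_var + n_het) / (2 * n)  # variant allele frequency
    p = 1.0 - q                            # wild-type allele frequency
    observed = (n_hom_wt, n_het, n_hom_var)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Control genotype counts back-calculated from the reported percentages (n = 322).
chi2_499, p_499 = hwe_goodness_of_fit(158, 145, 19)  # C499T: CC, CT, TT
chi2_939, p_939 = hwe_goodness_of_fit(141, 152, 29)  # A939C: AA, AC, CC
print(f"C499T: chi2 = {chi2_499:.2f}, p = {p_499:.3f}")
print(f"A939C: chi2 = {chi2_939:.2f}, p = {p_939:.3f}")
```

With these reconstructed counts the statistics come out at about 3.68 and 1.77, in line with the values reported for the controls in the results.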
Selected characteristics and the XPC C499T and A939C allele and haplotype distributions of the 320 lung cancer patients and 322 controls are summarized in Table I. Chi-square tests suggested that cases and controls were adequately matched on age and sex. The mean age was 57.0 ± 9.9 years for the cases and 56.6 ± 6.9 years for the controls. Cases were more likely than controls to be smokers, and cigarette smoking was associated with a 3.05-fold (95% CI = 1.15–8.10) increased risk of lung cancer. The mean number of cigarettes smoked per day was 13.0 ± 13.9 among the cases and 5.3 ± 9.4 among the controls (p < 0.001). About 35.6% of the cases had smoked more than 28 pack-years, a significantly higher proportion than among the controls (10.6%; p < 0.001). Of the 320 lung cancer cases, 290 (90.6%) were classified as non-small cell lung cancer (130 squamous cell carcinomas, 88 adenocarcinomas and 72 large cell, mixed cell or undifferentiated carcinomas) and only 30 (9.4%) as small cell lung cancer.
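The smoking comparisons rest on the two definitions given in the methods: nonsmokers smoked fewer than 100 cigarettes in their lifetime, and pack-years = (cigarettes per day / 20) × years smoked. In this Python sketch the function names are hypothetical, and the three-level grouping around 28 pack-years is our illustration of the cut-off quoted above, not necessarily the categories used in Table I.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cumulative smoking dose: (cigarettes per day / 20) x years smoked."""
    return cigarettes_per_day / 20 * years_smoked


def is_smoker(lifetime_cigarettes: int) -> bool:
    """Smokers have smoked at least 100 cigarettes in their lifetime."""
    return lifetime_cigarettes >= 100


def smoking_group(lifetime_cigarettes: int, cigarettes_per_day: float,
                  years_smoked: float, cutoff: float = 28.0) -> str:
    """Illustrative grouping: nonsmoker, <= cutoff pack-years, > cutoff pack-years."""
    if not is_smoker(lifetime_cigarettes):
        return "nonsmoker"
    if pack_years(cigarettes_per_day, years_smoked) > cutoff:
        return f"> {cutoff:g} pack-years"
    return f"<= {cutoff:g} pack-years"
```

For example, 20 cigarettes a day for 30 years is 30 pack-years, which falls above the 28 pack-year cut-off, whereas 10 a day for 40 years is 20 pack-years.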
Table I. Frequency Distributions of Selected Variables, XPC Alleles/Haplotypes in Lung Cancer Patients and Cancer-Free Controls
Allele frequencies differed significantly between cases and controls for C499T but not for A939C. The variant 499T and 939C alleles were both more frequent among cases than among controls, suggesting that both may be putative risk alleles (Table I). Linkage disequilibrium (LD) between these 2 XPC loci and their haplotype frequencies were estimated using the EH algorithm. The C499T locus was in LD with the A939C locus (chi-square = 92.7; p < 0.001; D′ = 0.800). Four haplotypes were derived from these 2 XPC genotypes (Table I). The 499C-939A haplotype, composed of the 2 wild-type alleles (499C and 939A), was significantly less frequent in cases than in controls (30.9% vs. 42.2%; p = 0.004). Conversely, haplotypes harboring either the 499T or the 939C variant allele were more common in cases than in controls (32.5% vs. 25.2%, p = 0.034, for 499T-939A; 34.5% vs. 29.4%, p = 0.156, for 499C-939C). However, the 499T-939C haplotype was rare, accounting for only 13 (2.0%) case and 21 (3.2%) control haplotypes (Table I). Overall, haplotype frequencies differed significantly between cases and controls (chi-square = 21.74; p < 0.001). We also used the Bayesian statistical method to infer the most probable haplotypes; the resulting haplotype frequencies were 30.2% (499C-939A), 33.3% (499T-939A), 35.3% (499C-939C) and 1.3% (499T-939C) in the cases and 40.7% (499C-939A), 26.7% (499T-939A), 30.9% (499C-939C) and 1.7% (499T-939C) in the controls, which did not differ significantly from those estimated by the EH algorithm (p = 0.713 for cases and p = 0.276 for controls; data not shown).
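The haplotype frequencies and D′ above are estimated from unphased genotypes. A minimal Python sketch of the idea follows: an expectation-maximization (EM) estimator for two biallelic loci, of the kind that haplotype-frequency programs such as EH are based on (this is not the EH program itself), followed by Lewontin's D′. The function names and the 0/1 allele coding are our assumptions.

```python
from collections import Counter
from itertools import product

# Alleles are coded 0 = wild type, 1 = variant; a haplotype is (locus 1, locus 2),
# e.g. (0, 0) = 499C-939A and (1, 1) = 499T-939C under this coding.
HAPLOTYPES = list(product((0, 1), repeat=2))


def _alleles(n_variant: int) -> tuple[int, int]:
    """Alleles carried at one locus, given the number of variant copies (0, 1 or 2)."""
    return (0,) * (2 - n_variant) + (1,) * n_variant


def em_haplotype_frequencies(genotypes, max_iter=200, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    `genotypes` is a list of (variant copies at locus 1, variant copies at locus 2).
    Only double heterozygotes (1, 1) have ambiguous phase.
    """
    counts = Counter(genotypes)
    n_chromosomes = 2 * sum(counts.values())
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    for _ in range(max_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for (g1, g2), n in counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: split double heterozygotes between the two possible phases.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                for h in ((0, 0), (1, 1)):
                    expected[h] += n * w
                for h in ((0, 1), (1, 0)):
                    expected[h] += n * (1 - w)
            else:
                # At most one heterozygous locus: the haplotype pair is fixed.
                for h in zip(_alleles(g1), _alleles(g2)):
                    expected[h] += n
        # M-step: new frequencies are expected haplotype counts / chromosomes.
        new = {h: c / n_chromosomes for h, c in expected.items()}
        change = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if change < tol:
            break
    return freqs


def d_prime(freqs):
    """Lewontin's D' for two biallelic loci from haplotype frequencies."""
    p1 = freqs[(1, 0)] + freqs[(1, 1)]  # variant allele frequency, locus 1
    p2 = freqs[(0, 1)] + freqs[(1, 1)]  # variant allele frequency, locus 2
    d = freqs[(1, 1)] - p1 * p2
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    return abs(d) / d_max if d_max > 0 else 0.0
```

For a toy sample such as `[(0, 0)] * 20 + [(2, 2)] * 20 + [(1, 1)] * 10`, the EM assigns the double heterozygotes to the 0-0/1-1 phase, because only those haplotypes occur among the unambiguous genotypes, and `d_prime` of the result is 1 (complete LD).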
The XPC C499T and A939C genotype distributions among cases and controls are shown in Table II. The observed genotype frequencies of both exonic polymorphisms were in Hardy-Weinberg equilibrium in the controls (chi-square = 3.68, df = 1, p = 0.06 for C499T; chi-square = 1.77, df = 1, p = 0.18 for A939C). The XPC C499T genotype frequencies were 38.8% (CC), 53.4% (CT) and 7.8% (TT) in the cases and 49.1% (CC), 45.0% (CT) and 5.9% (TT) in the controls, a statistically significant difference (chi-square = 7.05; p = 0.03). Logistic regression analysis showed that, compared with the 499CC wild-type homozygote, the 499CT heterozygote had a significantly increased risk (adjusted OR = 1.61; 95% CI = 1.14–2.27) and the 499TT homozygote a nonsignificantly elevated risk (adjusted OR = 1.32; 95% CI = 0.67–2.62). When we combined the variant genotypes (499CT/TT), assuming a dominant effect of the variant allele, the combined 499CT/TT genotype was associated with a 1.57-fold (95% CI = 1.13–2.19) increased risk of lung cancer (Table II). For the XPC A939C polymorphism, the genotype frequencies were 39.4% (AA), 48.1% (AC) and 12.5% (CC) in the cases and 43.8% (AA), 47.2% (AC) and 9.0% (CC) in the controls, and the difference was not statistically significant (p = 0.27). Compared with the 939AA wild-type genotype, the 939AC, 939CC and combined 939AC/CC genotypes were not associated with significantly elevated risks (ORs and 95% CIs were 1.20, 0.85–1.70; 1.28, 0.72–2.28; and 1.21, 0.87–1.69, respectively; Table II).
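The ORs reported here are adjusted estimates from logistic regression, which needs individual-level data. As a rough cross-check, a crude OR with a Woolf (log-based) 95% CI can be computed from a 2 × 2 table. In this Python sketch the function name is hypothetical, and the counts are our back-calculation from the reported genotype percentages (320 cases, 322 controls), not counts published by the authors.

```python
import math


def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Crude OR with Woolf 95% CI from a 2 x 2 table.

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# "Exposed" = 499CT/TT carriers, "unexposed" = 499CC (back-calculated counts).
cases_var, cases_wt = 171 + 25, 124
controls_var, controls_wt = 145 + 19, 158
or_, lo, hi = crude_odds_ratio(cases_var, cases_wt, controls_var, controls_wt)
print(f"crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these counts the crude OR is about 1.52 (95% CI about 1.11–2.08), close to but not identical with the adjusted 1.57 (1.13–2.19), since the crude estimate ignores age, sex and pack-years.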
Table II. Frequency Distributions of the XPC Polymorphisms among Cases and Controls and their Associations with Lung Cancer
CT + TT
Two-sided chi-square = 7.05; p = 0.03
AC + CC
Two-sided chi-square = 2.60; p = 0.27
C499T and A939C combinations
499CC and 939AA
Either one variant genotype² (499CT/TT or 939AC/CC)
Both variant genotypes (499CT/TT and 939AC/CC)
² Individuals with a variant (heterozygous or homozygous) genotype at one site and the wild-type homozygous genotype at the other site.
We then examined the combined effect of the 2 XPC genotypes on lung cancer risk. As shown in Table II, 30.6% of the cases and 23.0% of the controls carried both the 499CT/TT and the 939AC/CC variant genotypes, a combination associated with a significantly increased risk of lung cancer (OR = 2.37; 95% CI = 1.33–4.21, adjusted for age, sex and pack-years of smoking) compared with carriers of both wild-type homozygous genotypes (499CC and 939AA). Individuals with either the 499CT/TT or the 939AC/CC variant genotype had a nonsignificantly elevated risk (adjusted OR = 1.56; 95% CI = 0.93–2.63) compared with the same reference group. In stratified analyses, the increased risk associated with the combined XPC genotypes was more pronounced in subjects older than 60 years at diagnosis (adjusted OR = 2.51, 95% CI = 1.02–6.17 for either variant genotype; OR = 4.54, 95% CI = 1.73–11.93 for both variant genotypes). The associations between the combined XPC genotypes and lung cancer risk did not differ significantly between men and women (data not shown).
In total, 8 haplotype genotypes composed of the 4 haplotypes were observed, and the distribution of the number of variant alleles within haplotype genotypes differed significantly between cases and controls (p < 0.001). When we treated 499T and 939C as risk alleles, the adjusted ORs increased significantly with the number of variants within the haplotype genotypes (ptrend < 0.001), and individuals with 2 or 3 variants had a significant 2.26-fold (95% CI = 1.30–3.89) increased risk compared with those without any variant (Table III). Because very few individuals carried no risk variant, which may yield an unstable OR estimate, we combined this group with those carrying 1 variant. Compared with individuals with 0–1 variant, individuals with 2 or 3 variants had a significant 1.70-fold (95% CI = 1.21–2.38) increased risk.
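The variant count used in Table III can be read directly off a haplotype genotype written as in the table footnotes (e.g. 499C939A/499T939A). A minimal Python sketch, assuming that string format; the function names are hypothetical. A 499T939C/499T939C carrier would score 4, although the highest category reported in the text is 3 variants.

```python
import re

# Risk alleles as defined for Table III: 499T and 939C.
RISK_ALLELES = {"499": "T", "939": "C"}


def count_risk_variants(haplotype_genotype: str) -> int:
    """Count risk alleles in a haplotype genotype such as '499C939A/499T939A'."""
    total = 0
    for haplotype in haplotype_genotype.split("/"):
        # Each haplotype is a run of <codon><allele> pairs, e.g. '499C939A'.
        for codon, allele in re.findall(r"(\d+)([ACGT])", haplotype):
            total += int(allele == RISK_ALLELES[codon])
    return total


def risk_group(n_variants: int) -> str:
    """Collapse counts into the 0-1 vs. 2 or more comparison described above."""
    return "0-1 variants" if n_variants <= 1 else "2+ variants"
```

For example, `count_risk_variants("499C939A/499T939A")` returns 1, matching the footnote example, and `count_risk_variants("499T939C/499C939C")` returns 3.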
Table III. Risk of Lung Cancer Associated with the Number of XPC Variants within the Haplotype Genotypes
Number of variants within the haplotype genotypes¹
The variant (risk) alleles used for the calculation were 499T and 939C.
Adjusted for age, sex and pack-years of smoking.
0 means no variant allele within the haplotype genotype (499C939A/499C939A); 1–3 give the number of variant alleles within the haplotype genotype, e.g., 1 variant represents 499C939A/499C939C or 499C939A/499T939A.
Chi-square test for the distribution of haplotype variants between cases and controls.
p-value for trend obtained from the logistic regression model with adjustment for age, sex and pack-years of smoking.
Table IV shows the joint effects of smoking status and the combined XPC genotypes or estimated haplotype genotypes on lung cancer risk. Risk tended to increase with the number of XPC variant genotypes or variants within haplotype genotypes among both smokers and nonsmokers. Smokers with both variant genotypes (499CT/TT and 939AC/CC) had the highest risk (adjusted OR = 7.36; 95% CI = 3.19–17.00) compared with nonsmokers with both wild-type genotypes. Similarly, smokers with 2 or 3 variants within haplotype genotypes had a more than 7-fold increased risk (adjusted OR = 7.27; 95% CI = 3.37–15.68) compared with nonsmokers without any variant (Table IV). However, the interaction term between the XPC combined genotypes/haplotype genotypes and smoking in the multivariate logistic regression model was not statistically significant (data not shown).
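Joint-effect analyses like those in Table IV place each subject in one smoking-by-genotype stratum and compare every stratum with a single reference group (nonsmokers with both wild-type genotypes). The Python sketch below shows one way to build those strata and the indicator (dummy) variables a logistic regression would use; the names and the string coding of genotypes are our assumptions, and the regression itself, which needs individual-level data, is not shown.

```python
# Combined XPC genotype groups, in the order used in the text.
GENOTYPE_GROUPS = ("both wild type", "either one variant", "both variants")
REFERENCE = ("nonsmoker", "both wild type")


def combined_genotype_group(g499: str, g939: str) -> str:
    """Classify e.g. ('CT', 'AA') by how many loci carry a variant genotype."""
    has_499_variant = "T" in g499  # 499CT or 499TT
    has_939_variant = "C" in g939  # 939AC or 939CC
    return GENOTYPE_GROUPS[int(has_499_variant) + int(has_939_variant)]


def joint_stratum(smoker: bool, g499: str, g939: str) -> tuple[str, str]:
    """Smoking status crossed with the combined genotype group."""
    return ("smoker" if smoker else "nonsmoker", combined_genotype_group(g499, g939))


def dummy_variables(stratum: tuple[str, str]) -> dict[tuple[str, str], int]:
    """0/1 indicators for every non-reference stratum (the reference is all zeros)."""
    all_strata = [(s, g) for s in ("nonsmoker", "smoker") for g in GENOTYPE_GROUPS]
    return {level: int(level == stratum) for level in all_strata if level != REFERENCE}
```

Coding the strata this way makes each stratum's OR a direct comparison with the common reference group, which is how the 7.36-fold estimate for smokers with both variant genotypes is framed.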
Table IV. Adjusted ORs (95% CI) for the Joint Effect of XPC Genotypes/Haplotype Genotypes and Smoking Status
XPC codon 499 and codon 939
499CC and 939AA
Either one variant genotype (499CT/TT or 939AC/CC)
Both variant genotypes (499CT/TT and 939AC/CC)
OR adjusted for age and sex.
The variant (risk) alleles used for the calculation were XPC 499T and 939C.
The associations between XPC genotypes/haplotype genotypes and lung cancer risk did not differ significantly by histologic type (non-small cell lung cancer, small cell lung cancer, squamous cell carcinoma and adenocarcinoma) or by clinical stage (data not shown).
In this hospital-based case-control study, we investigated the associations of a newly identified polymorphism (C499T) and a common variant (A939C) of the DNA repair gene XPC with lung cancer risk in a Chinese population. When each variant was evaluated separately, the XPC 499T and 939C variant genotypes were associated with increased risk, although the risk associated with the 939C variant genotype was not statistically significant. When the 2 exonic variants were evaluated together, risk increased significantly, in a dose-response manner, with the number of variant genotypes and of variants within haplotype genotypes. In addition, we found a potential joint effect of these XPC genotypes/haplotype genotypes and smoking, suggesting that the XPC variants may modulate the lung cancer risk associated with smoking. To the best of our knowledge, this is the first study to investigate the association between these 2 XPC variants and lung cancer risk.
The XPC gene encodes a 940 amino acid protein uniquely involved in global genome repair.24 In vitro data support a model in which formation of the preincision complex is triggered by binding of XPC-HR23B to the DNA damage,25 which is necessary for the recruitment of TFIIH.26 Studies of mutated XPC suggest that a normal XPC gene is critical for cells to complete excision repair of bulky DNA lesions,13 including smoking-induced DNA adducts. Polymorphisms in the XPC gene, especially in exons, may alter protein function and an individual's capacity to repair damaged DNA, and may therefore lead to genetic instability and carcinogenesis. Recently, in a genotype-phenotype correlation study, Vodicka et al.15 reported that the XPC 939C variant might influence irradiation-specific base excision repair activity. However, no functional significance of the XPC C499T polymorphism has been reported to date.
Few molecular epidemiologic studies of XPC genotypes and cancer susceptibility have been reported.16, 17, 19 We previously reported that a poly(AT) polymorphism in XPC intron 9 was associated with risk of head and neck cancer in a Caucasian population of 287 patients and 311 frequency-matched controls;19 that study did not include other functional XPC polymorphisms that may be in LD with XPC-PAT (e.g., XPC A939C, which was in LD with PAT as previously reported by Khan et al.14). More recently, 2 case-control studies in Swedish and Finnish populations found that the XPC A939C variant genotypes were associated with an increased risk of bladder cancer and breast carcinoma.16, 17 However, these studies did not report genotype information for XPC C499T. In this Chinese population, as we previously reported,20 we found no evidence of a significant association between the XPC-PAT polymorphism and lung cancer risk. In the present study, an increased risk of lung cancer was associated with the 2 exonic XPC variants C499T and A939C and their haplotypes, suggesting that these variants may be markers of genetic susceptibility to lung cancer in this Chinese population. The discrepancies among these studies may be due to differences in disease etiology and mechanisms, differences in ethnic background and/or small sample sizes with limited statistical power.
Interestingly, our data suggest that the increased lung cancer risk associated with the combined XPC C499T and A939C genotypes was greatest in individuals with both variant genotypes. We also found a dose-response relationship between the number of XPC variants within haplotype genotypes and lung cancer risk. These findings suggest that the effect could be polyallelic, with several tightly linked polymorphisms jointly influencing lung cancer risk. It is also possible, however, that these 2 XPC exonic polymorphisms are in linkage disequilibrium with other putative etiologic variants, which points to potential interactions or joint effects of different loci within a gene or among genes and illustrates the importance of analyzing multiple loci or multiple genes simultaneously. As new technologies make it easier to genotype hundreds of genes or loci at a time, it will become possible to construct genetic risk profiles or risk models composed of combinations of many polymorphisms in low-penetrance genes in the same biologic pathway, each contributing only slightly to overall risk. The more pronounced risk among smokers with both variant genotypes and haplotypes suggests that the variant XPC genotypes and haplotype genotypes may be biomarkers of susceptibility to smoking-induced lung cancer or may modulate the risk associated with smoking. Once validated in larger studies, these findings may help identify at-risk populations for primary cancer prevention.
In 2 European case-control studies, the XPC 939C allele frequencies in controls were 0.329 and 0.297, respectively,16, 17 comparable to that in this Chinese population (0.326). Because this is the first molecular epidemiologic study of the association between the XPC C499T polymorphism and cancer, no comparison with published studies can be made at this time. According to data on 102 anonymized cancer-free subjects of mixed ethnicities in the SNP500Cancer database (http://snp500cancer.nci.nih.gov/home.cfm; dbSNP ID: rs2228000), the genotype frequencies were 57.8% for CC, 37.3% for CT and 4.9% for TT, which did not differ significantly from those in our Chinese controls (49.1% for CC, 45.0% for CT and 5.9% for TT; p = 0.303). Furthermore, the genotype frequencies among 24 subjects from the Pacific Rim (50.0% for CC, 45.8% for CT and 4.2% for TT) were quite consistent with our data.
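The comparison with SNP500Cancer is a chi-square test on a 2 × 3 table of genotype counts. A minimal Python sketch follows; the function name is hypothetical, p-values are computed only for 1 or 2 df (enough for 2 × 2 and 2 × 3 tables), and both sets of counts are our back-calculation from the quoted percentages (102 SNP500Cancer subjects, 322 controls).

```python
import math


def chi_square_2xk(row1: list[int], row2: list[int]) -> tuple[float, int, float]:
    """Pearson chi-square test of homogeneity for a 2 x k table of counts."""
    n1, n2 = sum(row1), sum(row2)
    n = n1 + n2
    chi2 = 0.0
    for x1, x2 in zip(row1, row2):
        column_total = x1 + x2
        for observed, row_total in ((x1, n1), (x2, n2)):
            expected = row_total * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    df = len(row1) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(chi2 / 2))
    elif df == 2:
        p_value = math.exp(-chi2 / 2)  # closed form of the upper tail for 2 df
    else:
        raise NotImplementedError("p-value implemented only for 1 or 2 df")
    return chi2, df, p_value


# Genotype counts (CC, CT, TT), back-calculated from the quoted percentages.
snp500 = [59, 38, 5]
controls = [158, 145, 19]
chi2, df, p = chi_square_2xk(snp500, controls)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3f}")
```

These counts give a chi-square of about 2.39 with 2 df and p ≈ 0.303, matching the value quoted above. The same function applied to back-calculated C499T counts for cases and controls (124, 171, 25 vs. 158, 145, 19) gives a chi-square of about 7.05, as reported in the results.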
Several limitations of our study should be noted. First, although we observed significant main effects of the XPC variant genotypes, the relatively small sample size limited the reliability of the stratified analyses. Second, our study lacked information on occupational and environmental exposures other than smoking. However, because the controls were matched to the cases on age, sex and residential area, bias from such exposures may have been minimized. Finally, because we used the Bayesian statistical method to infer the most probable haplotype genotypes rather than working with haplotype probabilities, misclassification errors may exist. However, as presented above, the haplotype frequencies estimated by the EH algorithm and the Bayesian method did not differ significantly in either cases or controls, so this is unlikely to be a major concern.
In conclusion, our study provides evidence that 2 exonic polymorphisms of the XPC gene, particularly XPC C499T, may contribute to the etiology of lung cancer, and the variant XPC genotypes and haplotype genotypes may modulate the risk of lung cancer associated with smoking. However, their functional relevance warrants further laboratory work and the related cancer risk of these XPC genotypes/haplotype genotypes remains to be determined in larger well-designed epidemiologic studies. Such studies may benefit from analyses of multiple genes or genotypes/haplotype genotypes in the same biologic pathway and from the consideration of relevant exposures that may influence the likelihood of cancer in the presence of reduced DNA repair capacity.
Supported in part by National Key Basic Research Program grant 2002CB512902, National Natural Science Foundation grant 30371240 and Nanjing Medical University Innovative Foundation grant CX2003005 (to H.S.), as well as National High Technology Project grant 2002BA711A06 and National Natural Science Foundation grant 30128020 (to D.L.).