Polymorphisms in the DNA nucleotide excision repair genes and lung cancer risk in Xuan Wei, China


  • The research described here has been reviewed by the National Environmental and Health Effects Research Laboratory of the U.S. Environmental Protection Agency and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the agency nor does mention of trade names or commercial products constitute endorsement or recommendation for use.


The lung cancer mortality rate in Xuan Wei County is among the highest in China and has been attributed to exposure to indoor smoky coal emissions that contain very high levels of polycyclic aromatic hydrocarbons (PAHs). Nucleotide excision repair (NER) plays a key role in reversing DNA damage from exposure to environmental carcinogens, such as PAHs, that form bulky DNA adducts. We studied single nucleotide polymorphisms (SNPs) and their corresponding haplotypes in 6 genes (ERCC1, ERCC2/XPD, ERCC4/XPF, ERCC5/XPG, RAD23B and XPC) involved in NER in a population-based case-control study of lung cancer in Xuan Wei. A total of 122 incident primary lung cancer cases and 122 individually matched controls were enrolled. Three linked SNPs in ERCC2 were associated with lung cancer with similar ORs; e.g., persons with the Gln allele at codon 751 had a 60% reduction in lung cancer risk (OR = 0.40, 95% CI 0.18–0.89). Moreover, one haplotype in ERCC2 was associated with a decreased risk of lung cancer (OR = 0.40, 95% CI 0.19–0.85) compared to the most common haplotype. In addition, subjects with one or 2 copies of the Val allele at codon 249 of RAD23B had a 2-fold increased risk of lung cancer (OR = 1.91, 95% CI 1.12–3.24). In summary, our results suggest that genetic variants in genes involved in the NER pathway may play a role in lung cancer susceptibility in Xuan Wei. However, due to the small sample size, additional studies are needed to evaluate these associations within Xuan Wei and in other populations with substantial environmental exposure to PAHs. © 2005 Wiley-Liss, Inc.

Lung cancer is the leading cause of death from cancer worldwide, with an estimated mortality in 2000 of 31.4 per 100,000 for men and 9.5 per 100,000 for women.1 Overall, lung cancer incidence and mortality rates are higher for males than females, due largely to the higher prevalence of tobacco smoking among men. Compared to worldwide patterns and those in China, lung cancer has distinctive characteristics in rural Xuan Wei County, Yunnan Province, China. Lung cancer mortality in Xuan Wei has been reported to be 8 times the Chinese national average for women and 4 times that for men.2 Although very few women smoke, the lung cancer mortality rates in Xuan Wei County were similar between men and women (27.7 and 25.3 per 100,000 for males and females, respectively).2 This pattern has been attributed to burning smoky coal indoors for heating and cooking without adequate ventilation and with long exposure times, particularly among women; this exposure accounts for >90% of lung cancer cases in both men and women.2, 3 When smoky coal is burned, the indoor air concentration of particulate matter and extractable organic matter may be as high as 24.4 and 17.6 mg/m3,2 respectively, and the corresponding benzo[a]pyrene concentration, an indicator of carcinogenic PAHs, can reach as high as 19.3 μg/m3,4 which is comparable to exposure levels experienced by coke oven workers. As such, the study of lung cancer in Xuan Wei provides a unique opportunity to evaluate genetic susceptibility in a nonsmoking model of PAH carcinogenesis.

The removal and repair of DNA damage plays a key role in protecting the integrity of the genome from the insults of cancer-causing agents. Several different DNA repair pathways exist, including BER, NER, double strand break repair and mismatch repair.5 The NER pathway repairs bulky DNA adducts induced by chemical carcinogens, such as PAHs found in smoky coal emissions and tobacco smoke.6 Several studies have assessed the relationship between SNPs in NER genes and the risk of lung cancer, but the results have been inconsistent.7, 8, 9, 10 Differences in the interaction between environmental exposures and genetic risk factors could contribute to some of the heterogeneity between studies, and there is some evidence that effects may vary as a result of different levels and possibly different types of exposure to carcinogens.11, 12 As exposure to smoky coal emissions is the primary risk factor for lung cancer in Xuan Wei,2 we hypothesized that genetic variation in the NER pathway would play an important role in lung carcinogenesis in this special population. Here, we report the direction and magnitude of associations between genetic variants in 6 NER genes (ERCC1, ERCC2/XPD, ERCC4/XPF, ERCC5/XPG, RAD23B and XPC) and lung cancer risk in a population-based case-control study in Xuan Wei, China.


Abbreviations: BER, base excision repair; CI, confidence interval; EM, expectation-maximization; ERCC1, excision repair cross-complementation group 1; EX, exon; HWE, Hardy-Weinberg equilibrium; IVS, intervening sequence; LD, linkage disequilibrium; NER, nucleotide excision repair; OR, odds ratio; PAH, polycyclic aromatic hydrocarbon; RAD23B, RAD23 homolog B; SNP, single nucleotide polymorphism; TFIIH, transcription factor IIH; XPC, xeroderma pigmentosum, complementation group C.

Material and methods

Study population

This was a population-based case-control study of lung cancer in Xuan Wei, China. Details of the study have been described elsewhere.13 Briefly, the field phase of the study lasted from March 1995 to March 1996. A total of 122 newly diagnosed lung cancer cases were recruited, and the criteria for inclusion were positive histology or cytology results (105 cases) or clinically diagnosed cases who died within the 1-year period (17 cases). Within 2 weeks after the diagnosis and recruitment of each lung cancer case, a matched control was selected randomly from a list of registered households in the same villages. Participation rates for cases and controls were 98% and 100%, respectively. Matching conditions included sex, age (±2 years) and type of fuel currently used for cooking and home heating. A standardized structured questionnaire was used to obtain information about demographic characteristics, lifetime use of different types of coal, tobacco smoking, family history of lung cancer and personal medical history. The study was conducted according to the recommendations for human subject protection of the World Medical Association Declaration of Helsinki. The research protocol was approved by a U.S. Environmental Protection Agency Human Subjects Research Review official for international research projects, and informed consent was obtained from all subjects.


Sputum samples were collected from cases and controls. DNA from sputum samples was extracted using phenol-chloroform extraction,14 and 15 SNPs in 6 NER genes were genotyped by real-time PCR on an Applied Biosystems (Foster City, CA) 7900HT sequence detection system at the Core Genotyping Facility of the National Cancer Institute as described on the SNP500 website (http://snp500cancer.nci.nih.gov).15 Of the 122 cases and 122 controls, DNA was successfully extracted from 119 cases and 113 controls, and >95% of DNA samples were successfully genotyped for all candidate SNPs, except ERCC5 Leu700Leu (88%) and XPC EX 16 +315 C>G (93%). Concordance rates between quality-control samples were 99–100% for all assays.

Statistical analysis

An ever-smoker was defined as a smoker of at least one cigarette per day for 6 months or longer. Cut-off points for smoky coal use (tons) and tobacco smoking (pack-years) were based on the distribution of lifetime cumulative use in controls. The HWE for each SNP was tested in controls using a Pearson χ2 test or an exact test (if one of the genotypes was infrequent). Pairwise LD between SNPs within the same gene was calculated with HaploView (http://www.broad.mit.edu/personal/jcbarret/haploview/). Genotype data were analyzed using homozygotes for the common allele as the reference group. As genotype data were not obtained for all cases and controls, unconditional logistic regression was used to estimate the ORs and 95% CIs, with 2-sided p values, for the association between lung cancer risk and the SNPs. Models were adjusted for age, sex and current fuel type and, in an alternative model, also for pack-years of smoking (≥25 vs. <25) and smoky coal use (≥130 tons vs. <130 tons). The haplotype block structure for each gene with more than one SNP was examined with HaploView using the 4-gamete rule16 with a minimum frequency of 0.005 for the fourth gamete. For each haplotype block, individual haplotypes were estimated using the EM algorithm, and overall differences in haplotype frequencies between cases and controls were assessed using the omnibus test in SAS/Genetics (SAS Institute, Cary, NC). The association between each haplotype and lung cancer risk was estimated using unconditional logistic regression, with the most common haplotype or the haplotype containing the common alleles as the reference. Unless otherwise specified, data were analyzed with Statistical Analysis Software, version 8.02 (SAS Institute).
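The Pearson χ2 test for HWE described above can be illustrated with a minimal sketch. The function name hwe_chi_square is hypothetical, and the snippet is not the software used in the study; it only shows the standard one-degree-of-freedom calculation. The example counts are the XPC Lys939Gln control genotypes listed in Table II (39, 58 and 8).

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Returns (chi-square statistic, two-sided p value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# XPC Lys939Gln genotype counts in controls (AA, AC, CC) from Table II
chi2, p_value = hwe_chi_square(39, 58, 8)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With these counts the p value comes out close to the p = 0.03 reported for this SNP. An exact test, which the authors used for infrequent genotypes, would need a different calculation.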


Results

Demographic features, including age, sex, ethnicity, education level, household income, dwelling type and type of fuel source, were comparable between cases and controls (Table I). About 93% of men smoked tobacco, while only one woman smoked. Compared with studies in other populations, the impact of tobacco smoking in Xuan Wei was quite weak, with a 1.7-fold (95% CI 0.8–3.5) increase in the risk of lung cancer for exposure to more than 25 pack-years among men, which is consistent with previous studies in Xuan Wei.17 In contrast, smoky coal use was a strong risk factor in Xuan Wei. Compared to subjects who used <130 tons of smoky coal during their lifetime, heavy smoky coal users (≥130 tons) had a 2.27-fold (95% CI 1.25–4.10) increased risk of lung cancer.

Table I. Distribution of Demographic Features in Lung Cancer Cases and Controls³
                                Cases (%)    Controls (%)   p value¹
                                (n = 122)    (n = 122)
Age (years)
 <55                            52 (43)      51 (42)
 ≥55                            70 (57)      71 (58)        0.90
Sex
 Male                           79 (65)      79 (65)
 Female                         43 (35)      43 (35)        1.00
Tobacco smoking (pack-years)²
 Nonsmokers                     9 (11)       10 (13)
 <25                            27 (34)      36 (46)
 ≥25                            43 (54)      33 (42)        0.26
Smoky coal use (tons)
 <130                           51 (42)      72 (59)
 ≥130                           71 (58)      50 (41)        0.007

  ¹ Two-sided p value based on χ2 test.
  ² Males only.
  ³ Demographic data were previously reported.

Fifteen SNPs were genotyped in 6 NER genes. These SNPs comprised nonsynonymous and synonymous substitutions in coding regions, as well as single base pair changes in noncoding regions. With the exception of ERCC5 His46His (p = 0.04) and XPC Lys939Gln (p = 0.03), the genotype frequencies for all of the SNPs in controls were consistent with HWE. Quality-control samples for the 2 SNPs out of HWE were rechecked and the concordance rate was >99% for each. Genotype frequencies for cases and controls and the main effect of these SNPs on lung cancer risk are shown in Table II. Five SNPs in 3 genes displayed a significant or borderline significant association with lung cancer. Three linked SNPs in ERCC2 were associated with decreased risk of lung cancer. Homozygotes for the minor allele of ERCC2 Lys751Gln and ERCC2 IVS19 –70 C>T were rare, and individuals with at least one variant allele had a 60% reduction in lung cancer risk. The ERCC2 Arg156Arg polymorphism showed a borderline association with lung cancer, with the CC genotype displaying a lower risk of lung cancer compared to the AA genotype. Subjects with at least one 249Val allele for RAD23B had an approximately 2-fold higher risk of lung cancer, and XPC 939Gln homozygotes displayed an increased risk of lung cancer that was borderline significant (OR = 2.45, 95% CI 0.96–6.21, p = 0.06). Because RAD23B and XPC work collectively to recognize DNA damage as a primary step in NER,6 we analyzed the combined effects of the RAD23B Ala249Val and XPC Lys939Gln polymorphisms and found that subjects with both the XPC 939Gln/Gln genotype and either the RAD23B 249Ala/Val or 249Val/Val genotype had a 6-fold higher risk of lung cancer (Table III). Other variants evaluated in this study were not associated with lung cancer risk.
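As a rough cross-check of the ERCC2 Lys751Gln result, a crude (unadjusted) odds ratio with a Woolf (log-based) 95% CI can be computed directly from the genotype counts in Table II. This is only a sketch: the study estimated ORs by unconditional logistic regression adjusted for age, sex and fuel type, so a crude estimate need not match exactly, and the function name odds_ratio_woolf is hypothetical.

```python
import math

def odds_ratio_woolf(case_var, case_ref, ctrl_var, ctrl_ref, z=1.96):
    """Crude odds ratio and Woolf confidence interval from a 2x2 table.

    case_var / ctrl_var: cases / controls carrying the variant genotype(s)
    case_ref / ctrl_ref: cases / controls with the reference genotype
    """
    odds_ratio = (case_var * ctrl_ref) / (case_ref * ctrl_var)
    # Standard error of log(OR) is the root of the summed reciprocal counts
    se_log = math.sqrt(1 / case_var + 1 / case_ref + 1 / ctrl_var + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# ERCC2 Lys751Gln: Lys/Gln + Gln/Gln vs. Lys/Lys (counts from Table II)
or_, lo, hi = odds_ratio_woolf(case_var=11, case_ref=107, ctrl_var=22, ctrl_ref=86)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

The crude estimate is about 0.40 with an interval of roughly 0.18 to 0.87, close to the adjusted OR of 0.40 (95% CI 0.18–0.89) reported in Table II.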

Table II. Main Effect of SNPs of DNA NER Genes on Lung Cancer Risk in Xuan Wei, China
Gene and SNP position (dbSNP ID)  Cases (%)  Controls (%) OR¹    95% CI      p value³  OR²    95% CI      p value³
                                  (n = 119)  (n = 113)
ERCC1
 IVS5 +33 C>A (rs3212961)
  CC                              41 (35)    34 (32)      Ref.                         Ref.
  CA                              58 (49)    54 (50)      0.90   0.50–1.61   0.71      0.87   0.48–1.59   0.65
  AA                              19 (16)    19 (18)      0.83   0.38–1.81   0.63      0.82   0.37–1.83   0.63
  CA+AA                           77 (65)    73 (68)      0.88   0.50–1.53   0.65      0.86   0.48–1.51   0.59
  Trend                                                                      0.61                         0.59
 IVS3 +74 G>C (rs3212948)
  GG                              66 (56)    67 (60)      Ref.                         Ref.
  GC                              45 (38)    40 (36)      1.15   0.66–2.00   0.62      1.13   0.64–1.99   0.67
  CC                              7 (6)      5 (4)        1.44   0.43–4.80   0.55      1.66   0.49–5.62   0.42
  GC+CC                           52 (44)    45 (40)      1.18   0.70–2.01   0.53      1.19   0.69–2.04   0.54
  Trend                                                                      0.48                         0.43
ERCC2
 Ex23 +61 A>C (rs1052559)
  AA (Lys/Lys)                    107 (91)   86 (80)      Ref.                         Ref.
  AC (Lys/Gln)                    11 (9)     20 (19)      0.45   0.20–0.99   0.048     0.42   0.18–0.95   0.037
  CC (Gln/Gln)                               2 (2)
  AC (Lys/Gln) + CC (Gln/Gln)     11 (9)     22 (20)      0.40   0.18–0.89   0.02      0.39   0.17–0.86   0.02
 IVS19 −70 C>T (rs1799787)
  CC                              106 (91)   89 (80)      Ref.                         Ref.
  CT                              11 (9)     20 (18)      0.46   0.21–1.03   0.059     0.44   0.19–0.99   0.047
  TT                                         2 (2)
  CT+TT                           11 (9)     22 (20)      0.42   0.19–0.92   0.03      0.40   0.18–0.90   0.03
 Ex10 −16 G>A (rs1799793)
  GG (Asp/Asp)                    109 (92)   99 (88)      Ref.                         Ref.
  GA (Asp/Asn)                    9 (8)      14 (12)      0.58   0.24–1.40   0.23      0.62   0.25–1.53   0.30
 Ex6 −10 A>C (rs238406)
  AA                              30 (26)    24 (22)      Ref.                         Ref.
  AC                              64 (55)    50 (45)      1.01   0.52–1.96   0.98      1.02   0.52–2.01   0.96
  CC                              23 (20)    37 (33)      0.50   0.23–                        –1.01       0.05
  AC+CC                           87 (74)    87 (78)      0.79   0.42–1.46   0.45      0.77   0.41–1.46   0.42
  Trend                                                                      0.058                        0.046
ERCC4
 Ex11 −247 T>C (rs1799801)
  TT                              69 (59)    74 (67)      Ref.                         Ref.
  TC                              40 (34)    32 (29)      1.35   0.76–2.42   0.30      1.33   0.73–2.40   0.35
  CC                              8 (7)      5 (5)        1.70   0.53–5.48   0.37      1.71   0.52–5.58   0.38
  TC+CC                           48 (41)    37 (33)      1.40   0.81–2.43   0.22      1.38   0.79–2.41   0.26
  Trend                                                                      0.21                         0.23
ERCC5
 Ex2 +50 T>C (rs1047768)
  TT                              55 (47)    63 (56)      Ref.                         Ref.
  TC                              49 (42)    36 (32)      1.56   0.89–2.75   0.12      1.54   0.87–2.74   0.14
  CC                              14 (12)    13 (12)      1.23   0.53–2.85   0.63      1.31   0.55–3.07   0.54
  TC+CC                           63 (53)    49 (44)      1.47   0.88–2.48   0.14      1.48   0.87–2.52   0.15
  Trend                                                                      0.28                         0.25
 Ex8 −369 G>C (rs2227869)
  GG (Cys/Cys)                    103 (87)   100 (90)     Ref.                         Ref.
  GC (Cys/Ser)                    14 (12)    11 (10)      1.25   0.54–2.91   0.60      1.34   0.57–3.19   0.50
  CC (Ser/Ser)                    1 (1)
 Ex9 −100 C>A (rs2228959)
  CC                              97 (92)    88 (89)      Ref.                         Ref.
  CA                              9 (8)      11 (11)      0.75   0.29–1.91   0.54      0.79   0.30–2.06   0.63
 Ex15 −344 C>G (rs17655)
  CC (His/His)                    38 (33)    38 (35)      Ref.                         Ref.
  CG (His/Asp)                    52 (45)    46 (42)      1.16   0.63–2.12   0.63      1.08   0.58–2.01   0.81
  GG (Asp/Asp)                    26 (22)    25 (23)      1.05   0.52–2.14   0.89      1.03   0.49–2.13   0.95
  CG (His/Asp) + GG (Asp/Asp)     78 (67)    71 (65)      1.12   0.64–1.95   0.69      1.06   0.60–1.88   0.84
  Trend                                                                      0.85                         0.92
RAD23B
 Ex7 +65 C>T (rs1805329)
  CC (Ala/Ala)                    58 (49)    72 (65)      Ref.                         Ref.
  CT (Ala/Val)                    52 (44)    34 (31)      1.90   1.09–3.30   0.02      1.79   1.02–3.16   0.04
  TT (Val/Val)                    8 (7)      5 (5)        1.98   0.61–6.41   0.26      1.88   0.56–6.25   0.30
  CT (Ala/Val) + TT (Val/Val)     60 (51)    39 (35)      1.91   1.12–3.24                    –3.11       0.03
  Trend                                                                      0.03                         0.05
XPC
 Ex16 +315 C>G (rs2229090)
  CC                              52 (48)    47 (44)      Ref.                         Ref.
  CG                              52 (48)    52 (49)      0.91   0.52–1.58   0.74      0.90   0.51–1.58   0.70
  GG                              5 (5)      8 (7)        0.58   0.18–1.93   0.38      0.56   0.16–1.91   0.35
  CG+GG                           57 (52)    60 (56)      0.87   0.51–1.49   0.61      0.85   0.49–1.48   0.57
  Trend                                                                      0.45                         0.42
 Ex16 +211 A>C (rs2228001)
  AA (Lys/Lys)                    43 (38)    39 (37)      Ref.                         Ref.
  AC (Lys/Gln)                    50 (44)    58 (55)      0.78   0.44–1.38   0.39      0.67   0.37–1.23   0.20
  CC (Gln/Gln)                    21 (18)    8 (8)        2.45   0.96–6.21   0.06             –5.74       0.10
  AC (Lys/Gln) + CC (Gln/Gln)     71 (62)    66 (63)      0.98   0.56–1.69   0.93      0.86   0.48–1.52   0.60
  Trend                                                                      0.25                         0.43
 Ex9 −377 C>T (rs2228000)
  CC (Ala/Ala)                    56 (48)    50 (45)      Ref.                         Ref.
  CT (Ala/Val)                    47 (41)    47 (43)      0.89   0.51–1.56   0.69      0.91   0.52–1.62   0.75
  TT (Val/Val)                    13 (11)    13 (12)      0.89   0.38–2.11   0.79      0.98   0.41–2.38   0.97
  CT (Ala/Val) + TT (Val/Val)     60 (52)    60 (55)      0.89   0.53–1.51   0.67      0.93   0.54–1.59   0.78
  Trend                                                                      0.70                         0.86

  ¹ Adjusted for age, sex and current fuel type by unconditional logistic regression.
  ² Adjusted for age, sex, current fuel type, pack-years of smoking and smoky coal use by unconditional logistic regression.
  ³ Two-sided p value.

Table III. Combined Genotypes of RAD23B Ala249Val and XPC Lys939Gln on Lung Cancer Risk
RAD23B Ex7 +65 C>T      XPC Ex16 +211 A>C       Cases        Controls     OR       95% CI        p value¹
(rs1805329) Ala249Val   (rs2228001) Lys939Gln   (n = 113)    (n = 103)
CC                      AA/AC                   45 (37)      60 (49)      Ref.
CC                      CC                      58 (47)      41 (34)      1.91     1.09–3.34     0.02
CT/TT                   CC                      10 (8)       2 (2)        6.35²    1.26–62.54    0.02

  ¹ Two-sided p value. The test for multiplicative interaction between these 2 variants was not significant.
  ² By exact logistic regression.
  ³ Trend test.

We examined pairwise LD between the SNPs and the haplotype block structure for genes in which more than one SNP was genotyped. The extent of LD varied between the SNPs for ERCC2, ERCC5 and XPC (Table IV); and for both ERCC2 and ERCC5, we observed 2 different haplotype blocks within each gene. For ERCC2, Lys751Gln and IVS19 –70 C>T appeared to be in one haplotype block using the 4-gamete rule, whereas Asp312Asn and Arg156Arg were located in a second block. Similarly, the His46His and Cys529Ser variants in ERCC5 were in a separate haplotype block from Leu700Leu and His1104Asp. For XPC, only Lys939Gln and Ala499Val were in the same haplotype block. The SNPs analyzed in ERCC1 were not tightly linked (D′ = 0.77) and did not appear to be in the same haplotype block.
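For readers unfamiliar with the two quantities used here, the following sketch computes Lewontin's D′ for a pair of biallelic SNPs and applies a simple version of the 4-gamete rule with the 0.005 minimum frequency mentioned in the methods. It is not HaploView's implementation; the function names and the haplotype counts are hypothetical and chosen only for illustration.

```python
def d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Lewontin's D' from the counts of the four two-SNP haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total          # allele A frequency at SNP 1
    p_B = (n_AB + n_aB) / total          # allele B frequency at SNP 2
    d = p_AB - p_A * p_B                 # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return abs(d) / d_max if d_max > 0 else 0.0

def four_gametes_present(n_AB, n_Ab, n_aB, n_ab, min_freq=0.005):
    """True if all four haplotypes reach min_freq, which the 4-gamete
    rule treats as evidence of historical recombination."""
    total = n_AB + n_Ab + n_aB + n_ab
    return all(n / total >= min_freq for n in (n_AB, n_Ab, n_aB, n_ab))

# Hypothetical counts: one haplotype is absent, so D' = 1 and the
# two SNPs would be placed in the same block.
linked = (180, 0, 20, 26)
# Hypothetical counts: all four haplotypes are common.
recombined = (100, 20, 30, 50)

for counts in (linked, recombined):
    print(counts, round(d_prime(*counts), 2), four_gametes_present(*counts))
```

In practice the haplotype counts are not observed directly from unphased genotypes and must themselves be estimated, for example with the EM algorithm.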

Table IV. Lewontin's D′, a Measure of LD Between SNPs in ERCC2, ERCC5, and XPC
                Lys751Gln  IVS19−70 C>T  Asp312Asn  His46His  Cys529Ser  Leu700Leu  Ex16 +315 C>G  Lys939Gln
IVS19−70 C>T    1.0
Cys529Ser                                           1.0
Leu700Leu                                           0.19      0.08
His1104Asp                                          0.51      1.0        1.0
Lys939Gln                                                                           0.71
Ala499Val                                                                           1.0            1.0

Haplotypes were estimated for SNPs within the same haplotype block. The distribution of estimated haplotypes and results of logistic regression are shown in Table V. An omnibus likelihood ratio test showed a significant difference in frequency profiles of an ERCC2 haplotype (Lys751Gln, IVS19 –70 C>T) between cases and controls (p = 0.01). However, the Lys751Gln and IVS19 –70 C>T SNPs in ERCC2 were completely linked, so the estimated OR for the joint association of the variants was the same as for the individual SNPs (OR = 0.40, 95% CI 0.19–0.85). The omnibus test for the haplotype block containing ERCC2 Asp312Asn and Arg156Arg was not statistically significant (p = 0.11).
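The EM step can be sketched for the simplest case of two biallelic SNPs, where the only phase-ambiguous genotype is the double heterozygote. This is an illustrative implementation under that assumption, not the SAS/Genetics procedure used in the study; the function name and the genotype counts are hypothetical, with the counts chosen to mimic two completely linked SNPs.

```python
def em_two_snp_haplotypes(g, max_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies for two biallelic SNPs.

    g[i][j] is the number of subjects carrying i copies of allele 'a'
    at SNP 1 and j copies of allele 'b' at SNP 2 (i, j in 0..2).
    Returns a dict of frequencies for haplotypes AB, Ab, aB and ab.
    """
    n_hap = 2 * sum(sum(row) for row in g)
    # Haplotype counts that can be read off unambiguous genotypes
    base = {
        "AB": 2 * g[0][0] + g[0][1] + g[1][0],
        "Ab": 2 * g[0][2] + g[0][1] + g[1][2],
        "aB": 2 * g[2][0] + g[1][0] + g[2][1],
        "ab": 2 * g[2][2] + g[1][2] + g[2][1],
    }
    double_het = g[1][1]                  # phase unknown: AB/ab or Ab/aB
    freq = {h: 0.25 for h in base}
    for _ in range(max_iter):
        # E step: probability that a double heterozygote is AB/ab
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M step: re-estimate frequencies from expected counts
        new = {
            "AB": (base["AB"] + double_het * w) / n_hap,
            "ab": (base["ab"] + double_het * w) / n_hap,
            "Ab": (base["Ab"] + double_het * (1 - w)) / n_hap,
            "aB": (base["aB"] + double_het * (1 - w)) / n_hap,
        }
        change = max(abs(new[h] - freq[h]) for h in freq)
        freq = new
        if change < tol:
            break
    return freq

# Hypothetical genotype table for two completely linked SNPs:
# 80 common homozygotes, 18 double heterozygotes, 2 rare homozygotes.
genotypes = [[80, 0, 0],
             [0, 18, 0],
             [0, 0, 2]]
print(em_two_snp_haplotypes(genotypes))
```

With complete linkage the estimate converges to two haplotypes only, which is why a haplotype analysis of such a block gives the same OR as either SNP analyzed alone.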

Table V. Haplotype Analysis of DNA NER Genes on Lung Cancer Risk in Xuan Wei, China
Haplotypes                        Cases    Controls    OR¹    95% CI    p value²
ERCC2 Lys751Gln–IVS19−70 C>T
 Omnibus test                                                           0.01
 Omnibus test                                                           0.11
 Omnibus test                                                           0.47
 Omnibus test                                                           0.75
 Omnibus test                                                           0.44

  ¹ Adjusted for age, sex and current fuel type by unconditional logistic regression.
  ² Two-sided p value.


Discussion

The high mortality of lung cancer in Xuan Wei, China, is overwhelmingly driven by exposure to PAH-rich coal combustion emissions.2, 3 Further, it has been shown that DNA base pair changes in TP53 and KRAS from lung tumors of nonsmokers exposed to PAH-rich coal emissions in Xuan Wei have a mutational spectrum highly consistent with PAH mutagenesis, which is distinct from the mutation pattern in lung tumors caused by cigarette smoke.18 Here, we report that genotypes of 2 NER genes (ERCC2 and RAD23B), which play a key role in repairing DNA damaged by PAHs, altered the risk of lung cancer in this population. In addition, the XPC 939Gln/Gln genotype was associated with a borderline significant increase in the risk of lung cancer.

The protein encoded by ERCC2 is involved in transcription-coupled NER and is an integral member of the basal transcription factor TFIIH complex,19 which is necessary for normal transcription initiation and NER. The 751Gln variant of the ERCC2 gene leads to a conformational change in the encoded protein at the domain of interaction between the ERCC2 protein and its helicase activator, p44 protein, inside the TFIIH complex.20 The role of the ERCC2 polymorphism in human cancer is unclear, with equivocal results from both functional and molecular epidemiologic studies.11, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 The 751Gln allele has been associated with higher DNA adduct levels, lowered DNA repair capacity or increased chromosomal aberrations in some studies.21, 24, 25 However, in 2 studies the association between higher DNA adduct levels and the 751Gln allele was restricted to specific exposure subgroups, e.g., never-smokers23 and traffic-exposed workers.26 Other studies found null27, 28 or even protective22, 29 effects for this variant.

Similarly, epidemiologic studies have not yielded a clear picture of the effect of the ERCC2 751Gln allele on lung cancer risk. A meta-analysis of 9 published case-control studies showed a small increased risk (OR = 1.21, 95% CI 1.02–1.43) associated with 751Gln.10 However, there was no significant association among Asians, and there was evidence of significant heterogeneity among the 3 studies of Asian populations, indicating uncertainty about the role of this SNP in these populations. Differences in associations between Caucasian and Asian populations could be due to differences in LD patterns between the 2 populations if the ERCC2 751Gln allele is merely linked to the true “at-risk” variant. A study of ERCC2 haplotypes in 3 different ethnic groups revealed that the haplotype structure of ERCC2 differed substantially between Europeans, Asians and Africans.32

Alternatively, differences in exposures may alter the effect of the ERCC2 polymorphism on lung cancer risk. One study in Caucasians by Zhou et al.11 found that the risk associated with the ERCC2 751Gln allele decreased as pack-years increased and the 751Gln allele had a protective effect among heavy smokers, opposite to its effect among nonsmokers. In contrast, a case-control study in China found that the association between the 751Gln allele and lung cancer was greater among heavy smokers.33 However, the minimum number of pack-years for heavy smokers was much lower in the Chinese study compared to the Caucasian study (≥29 pack-years vs. >55 pack-years), and in the Caucasian study, persons with the 751Gln allele who smoked 25–55 pack-years also displayed an increased risk of lung cancer.11 Thus, it is likely that the protective effects of the 751Gln allele are seen only at very high exposure levels. We found that the ERCC2 751Gln allele as well as 2 other linked variants in ERCC2 were associated with a reduced risk of lung cancer. In view of the overall heavy exposure to PAHs in Xuan Wei, our finding is consistent with the results of Zhou et al.11 and provides a novel lead for further study on the role of the ERCC2 Lys751Gln polymorphism.

The protein encoded by RAD23B is one of 2 human homologs of Saccharomyces cerevisiae Rad23. RAD23B and XPC bind to form an XPC-HHR23 heterodimeric subcomplex, which plays a key part in DNA damage recognition in the NER global genome repair pathway.6 We found that the RAD23B Val allele was associated with increased risk of lung cancer; however, its biologic function is not clear, and additional studies evaluating the functional significance of this polymorphism are warranted.

XPC encodes a 940–amino acid protein that stably combines with RAD23B in the DNA damage recognition step of NER. Laboratory studies show that XPC–/– mice have an increased risk of chemically induced lung tumors compared to normal or heterozygous mice.34 Even though an XPC 939 allele-specific complementation assay utilizing post-UV host cell reactivation did not find different DNA repair capacity between the 2 alleles,35 Gln/Gln homozygotes at codon 939 were found to have a 2.5-fold increased risk (p = 0.06) of lung cancer compared to those with Lys/Lys and Gln/Lys genotypes in our study. Although this finding could be a false positive, the association may be due to LD with another polymorphism at intron 11 that is located at a splice acceptor site and associated with decreased DNA repair activity.36 Other studies have examined a biallelic poly(AT) insertion/deletion polymorphism (XPC-PAT), which is in LD with the functional splice site polymorphism at intron 11. Although one study in a Chinese population did not observe an association with the XPC-PAT polymorphism,37 a study in Caucasians found an increased risk of lung cancer for the PAT+/+ genotype.38 These findings suggest that variants in XPC may alter the risk of lung cancer. Since RAD23B and XPC unite in the DNA damage recognition step of NER, variants in both genes may interact to hinder NER and increase the risk of lung cancer. Our study found that persons with “at-risk” genotypes for both RAD23B and XPC had a significantly higher risk of lung cancer, which is consistent with this hypothesis.

Our study is limited by its small sample size and, consequently, low power to detect effects that may truly exist. Also, given the borderline significance of some associations and the multiple comparisons carried out, one or more findings may be false positives.39 As such, our results need to be considered preliminary. However, these findings are biologically plausible and derive from a unique population that provides a special model of nonsmoking PAH carcinogenesis. In addition to further elucidating the role of ERCC2 in lung cancer, this study points to 2 new variants (XPC Lys939Gln and RAD23B Ala249Val) in the NER pathway that deserve further exploration in other studies.

In summary, we found that genetic polymorphisms in ERCC2, RAD23B and XPC were associated with lung cancer risk in Xuan Wei, China. This suggests that NER may play a role in the pathogenesis of lung cancer, particularly in populations exposed to high levels of PAHs. A substantially larger case-control study of lung cancer will begin later this year in this region of China and will provide an opportunity to replicate and extend these findings.