DNA repair systems maintain the integrity of the genome; therefore, their activities are associated with mutation frequencies of cancer-related genes. To date, a number of genes have been shown to be involved in DNA repair systems.1 Some of these genes encode factors for base excision repair (BER), nucleotide excision repair (NER), mismatch repair (MMR), DNA double-strand break repair (DSBR) or other repair pathways, while others encode DNA polymerases that can bypass DNA damage. In addition, DNA damage response genes, which encode factors that transmit DNA damage signals to the cell cycle checkpoint machinery and to the monitoring systems controlling cellular apoptosis, can be regarded as a class of DNA repair genes. Polymorphisms in DNA repair genes are thought to lead to interindividual differences in the capacity to repair DNA damage; therefore, they could contribute to susceptibility to cancer.2, 3, 4
Lung cancer is the leading cause of cancer-related deaths in the world, and genetic factors responsible for susceptibility to lung cancer have been sought to establish novel and efficient ways of preventing the disease. Several epidemiologic studies have indicated that genetic factors modify an individual's risk of lung cancer.5, 6, 7 Segregation analyses suggest that rare autosomal dominant genes may explain susceptibility to early-onset lung cancer, but only a minority of lung cancer cases can be explained by the presence of such genes.5, 6, 7 Therefore, more common genetic polymorphisms have been considered to affect the risk of lung cancer in the general population. Lung cancer patients were reported to have lower capacities to repair DNA damage than healthy individuals.6, 8 Polymorphisms of DNA repair genes are therefore strong candidates for genetic factors responsible for lung cancer susceptibility. In fact, single nucleotide polymorphisms (SNPs) in several DNA repair genes have been examined for associations with lung cancer risk, and a few SNPs, TP53-Arg72Pro, OGG1-Ser326Cys, XRCC1-Arg399Gln and XPD-Asp312Asn, showed associations.3 Thus, the significance of genetic polymorphisms in DNA repair genes in lung cancer risk is being revealed. However, to clarify their contribution to lung cancer risk, polymorphisms in various classes of DNA repair genes should be examined more extensively for associations with the risk. For this reason, in this study, 36 DNA repair genes involved in diverse intracellular processes that maintain genome integrity (Table I) were searched for nonsynonymous SNPs (i.e., SNPs associated with amino acid changes), and the 50 SNPs detected were subjected to a case-control study to examine their associations with lung cancer risk. Lung cancer subjects analyzed in this case-control study consisted of adenocarcinoma (ADC) and squamous cell carcinoma (SQC) cases.
ADC and SQC are the first and second major histologic subtypes of lung cancer, respectively, and epidemiologic studies have indicated that carcinogenic processes are different between them.9, 10 Thus, associations were examined in each of the 2 histologic subtypes.
Table I. Fifty Amino Acid Substitution Polymorphisms in DNA Repair Genes
DNAs extracted from blood or other noncancerous tissues of 59 unrelated Japanese individuals, consisting of 17 noncancer volunteers, 23 kidney cancer patients and 19 endometrial cancer patients, were amplified according to a ligation-mediated PCR method using a high-fidelity DNA polymerase, KOD Plus (Toyobo, Osaka, Japan)11 and were used for resequencing. Secondary PCR, the reaction for amplifying DNA fragments intended for resequencing, was performed by using KOD Plus DNA polymerase and PCR primers designed to amplify every exon plus approximately 100 bases upstream and downstream of each exon. Reference sequences of the selected 36 genes were retrieved from public databases such as GenBank (http://www.ncbi.nlm.nih.gov/) and EMBL-EBI (http://www.ebi.ac.uk/). The accession numbers of the reference sequences used are NM_001618.2 (ADPRT), NM_001641.2 (APE1), NM_000057.1 (BLM), NM_000059.1 (BRCA2), NM_000123.1 (XPG), NM_000124.1 (ERCC6), NM_000135.1 (FANCA), NM_021922.1 (FANCE), NM_022725.1 (FANCF), NM_004629.1 (FANCG), D42045.1 (KIAA0086), NM_002312.1 (LIG4), NM_003925.1 (MBD4), NM_000249.1 (MLH1), NM_014381.1 (MLH3), NM_000251.1 (MSH2), NM_002439.1 (MSH3), NM_000179.1 (MSH6), AF058696.1 (NBS1), NM_002452.1 (NUDT1), NM_002542.3 (OGG1), NM_002691.1 (POLD1), NM_006502.1 (POLH), NM_007195.1 (POLI), NM_013274.1 (POLL), NM_002912.1 (POLZ), NM_002878.1 (RAD51L3), NM_003579.1 (RAD54L), NM_016316.1 (REV1), AF317622.1 (RINT1), NM_000546.2 (TP53), NM_000553.1 (WRN), NM_004628.1 (XPC), X52221.1 (XPD), NM_006297.1 (XRCC1) and NM_005432.2 (XRCC3). The output of the automated sequencer was base-called with PHRED (http://www.genome.washington.edu/UWGC/analysistools/Phred.cfm) and aligned fragment by fragment with PHRAP (http://www.phrap.org), and candidate polymorphic sites were detected by using POLYPHRED (http://www.codoncode.com/polyphred/).
Candidate polymorphic sites selected by POLYPHRED were then confirmed as SNPs by visual inspection of the sequence chromatograms using the fluorescence-based viewer CONSED (http://www.phrap.org/consed/consed.html).
The case population consisted of 752 ADC cases and 250 SQC cases. All the cases were Japanese and were recruited from 3 hospitals in the Kanto area of Japan (i.e., Tokyo and surrounding prefectures) from 1999 to 2002. All ADC and SQC cases from whom informed consent and blood samples were obtained were consecutively included in this study without any particular exclusion criteria. The participation rate was nearly 80%. All the cases were diagnosed by cytologic and/or histologic examinations according to the WHO classification.12 Pathologic examinations confirmed that these cases were primary lung cancers and not metastases of other cancers. The predominance of ADC cases in the case population was consistent with the distribution of the histologic subtypes of non-small cell lung cancer in Japan.13 Controls consisted of 383 inpatients/outpatients of the 3 hospitals and 302 healthy volunteers of Keio University, Tokyo, and they were selected with a criterion of no history of cancer during the study period. Three hundred thirty-nine of the ADC cases and 163 of the controls are the same as the ones used in our previous study.14 The study was approved by the ethical committees of the National Cancer Center, the Nishigunma Hospital, the Gunma Prefecture Cancer Center and Keio University. Ages of the participants were computed from their dates of birth. Smoking history was obtained via interview using a questionnaire. Smoking habit was expressed in pack-years, defined as the number of cigarette packs smoked daily multiplied by the number of years of smoking, in both current and former smokers. Smokers were defined as those who had smoked regularly for 12 months or longer at any time in their life, while nonsmokers were defined as those who had not. There were no individuals who had smoked regularly for less than 12 months. From each individual, a 10 or 20 ml whole blood sample was obtained.
Genomic DNAs for all cases and the 383 inpatient/outpatient controls were isolated directly from the samples as described previously.14 Genomic DNAs for the 302 healthy volunteers were isolated from EBV-transformed B lymphocytes derived from the collected whole blood samples. For each sample, 10 ng of genomic DNA was subjected to genotyping for the 50 SNPs by pyrosequencing using the PSQ96 system (Pyrosequencing, Uppsala, Sweden). The method for pyrosequencing was described previously.15 Briefly, a genomic fragment containing an SNP site was amplified by PCR with a set of PCR primers, one of which was biotinylated. The PCR products were purified using streptavidin-modified paramagnetic beads (Dynabeads M-280; Dynal, Skoyen, Norway), denatured and subjected to nucleotide sequencing with a sequencing primer. Information on the primer sequences and conditions for PCR is shown in Table II.
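The smoking variables defined for the study subjects (pack-years and the 12-month threshold separating smokers from nonsmokers) can be sketched in a few lines of Python. The function names and the example values are hypothetical and serve only to illustrate the definitions given in the text.

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = cigarette packs smoked daily x years of smoking."""
    return packs_per_day * years_smoked


def is_smoker(months_smoked_regularly: float) -> bool:
    """Smoker = smoked regularly for 12 months or longer at any time in life."""
    return months_smoked_regularly >= 12


# Hypothetical subject: 1.5 packs per day for 20 years.
example_dose = pack_years(1.5, 20)
print(f"pack-years: {example_dose}, smoker: {is_smoker(20 * 12)}")
```

The same definitions apply to current and former smokers, so no separate handling of smoking status is needed in this sketch.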
Table II. Primers and Conditions for Pyrosequencing
Columns: forward primer (5′→3′); reverse primer (5′→3′); annealing temperature (°C)
Differences in the allele and genotype distributions were tested by the chi-square test. The strength of association of genotypes with risk was measured as crude odds ratios (ORs).16 ORs adjusted for age, gender and smoking dosage, with 95% confidence intervals, were calculated using unconditional logistic regression analysis. When ORs in nonsmokers were calculated, adjustment was done only for age and gender. The statistical analyses described above were performed using StatView version 5.0 and JMP version 5.0 software (SAS Institute, Cary, NC). Hardy-Weinberg equilibrium (HWE) tests were performed using the TFPGA software (http://bioweb.usu.edu/mpmbio/). A level of p < 0.05 was considered statistically significant, while a level of 0.05 ≤ p < 0.10 was considered marginally significant.
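As an illustration of the allele-based comparison and the crude OR described above, the following Python sketch computes a Pearson chi-square test on a 2 × 2 allele table and a crude OR with a Woolf (log-based) 95% confidence interval. The study used StatView and JMP; this standalone code, its function names and the allele counts are assumptions for illustration only, and the choice of an uncorrected Pearson test and a Woolf interval is not stated in the text.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude OR with a Woolf confidence interval.

    a, b = minor and major allele counts in cases;
    c, d = minor and major allele counts in controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not taken from Table I).
chi2, p = chi_square_2x2(200, 800, 150, 850)
odds_ratio, lower, upper = crude_odds_ratio(200, 800, 150, 850)
label = "significant" if p < 0.05 else "marginal" if p < 0.10 else "not significant"
print(f"chi2={chi2:.2f}, p={p:.4f} ({label}), OR={odds_ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

Adjusted ORs would additionally require a logistic regression model with age, gender and smoking dosage as covariates, which is outside the scope of this sketch.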
All the exons of 36 DNA repair genes (Table I) were resequenced in 59 unrelated individuals (118 chromosomes). Three hundred fifty-five SNPs were found in the scanned regions, consisting of 272 synonymous and 83 nonsynonymous SNPs. Fifty of the 83 nonsynonymous SNPs appeared on 2 or more chromosomes (i.e., minor allele frequencies were at least 2/118, or approximately 0.017) in the 59 individuals. These 50 SNPs were distributed over the 36 genes, and 1–5 SNPs were present in each gene (Table I). Ten (20.0%) of the 50 SNPs were novel, while the remaining 40 (80.0%) had been reported in previous studies4, 17 or deposited in public SNP databases: dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) and JSNP (http://snp.ims.u-tokyo.ac.jp/).18, 19, 20 We prepared a population consisting of 685 control subjects, 752 ADC cases and 250 SQC cases (Table III), and the 50 nonsynonymous SNPs were subjected to a case-control study using this population. First, allele frequencies were compared between the cases and controls, since this approach is a simple and efficient method to identify SNPs involved in cancer susceptibility.21 With a cutoff p-value of 0.05, the power to detect an allele with a risk ratio of 1.5 was estimated as being > 0.33 and > 0.23 in the ADC and SQC cases, respectively, assuming that minor allele frequencies for these SNPs are approximately 0.017 or higher. With a cutoff p-value of 0.1, the power to detect such an allele was estimated as being > 0.46 and > 0.33 in the ADC and SQC cases, respectively.
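The kind of power estimate quoted above can be approximated with a two-sided normal approximation for comparing two allele frequencies. This is a sketch under stated assumptions: the text does not say how the power was computed, the risk ratio of 1.5 is treated here as an allelic odds ratio, and the function name is hypothetical. The output is therefore not expected to reproduce the reported values exactly.

```python
import math
from statistics import NormalDist


def allele_test_power(maf_control: float, odds_ratio: float,
                      n_case_alleles: int, n_control_alleles: int,
                      alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing allele frequencies."""
    p0 = maf_control
    # Case allele frequency implied by the assumed allelic odds ratio.
    odds_case = odds_ratio * p0 / (1.0 - p0)
    p1 = odds_case / (1.0 + odds_case)
    se = math.sqrt(p0 * (1 - p0) / n_control_alleles
                   + p1 * (1 - p1) / n_case_alleles)
    z_effect = abs(p1 - p0) / se
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)


# Values taken from the text: 2 of 118 chromosomes, and 2 alleles per subject.
MIN_MAF = 2 / 118
CONTROL_ALLELES = 2 * 685
ADC_ALLELES = 2 * 752
SQC_ALLELES = 2 * 250

for label, n_case in (("ADC", ADC_ALLELES), ("SQC", SQC_ALLELES)):
    for alpha in (0.05, 0.10):
        power = allele_test_power(MIN_MAF, 1.5, n_case, CONTROL_ALLELES, alpha)
        print(f"{label}: alpha={alpha:.2f}, approximate power={power:.2f}")
```

The sketch makes the qualitative pattern explicit: power is lower for the smaller SQC group than for the ADC group, and it rises when the cutoff is relaxed from 0.05 to 0.1.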
Table III. Distributions of Gender, Age and Smoking Habit Among Cases and Controls
Age, mean ± SD: 55.4 ± 14.8; 61.7 ± 10.4; 65.6 ± 9.0
Smoking habit (pack-years): ≥ 20, < 40; ≥ 40, < 60
All the ADC and SQC cases and control subjects were genotyped for the 50 SNPs, and the success rate was 99.98% (representative genotyping data in Fig. 1). Minor alleles for these SNPs were observed in all 3 populations. Deviations from HWE were examined separately in the control, ADC and SQC populations. Genotype distributions of the POLL-Arg438Trp SNP in the control population and the FANCA-Arg350Gln SNP in the SQC population deviated significantly from HWE (p < 0.05), while the other 48 SNPs were consistent with HWE (p ≥ 0.05) in all the case and control populations. The deviations observed were likely to be chance results, since the minor allele frequencies for these 2 SNPs in the control and SQC populations, respectively, were less than 1% (Table I). At such low frequencies, an excess or shortage of a few homozygotes for the minor allele is enough to produce a deviation from HWE. Minor allele frequencies for the 50 SNPs in the control, ADC and SQC populations are shown in Table I, together with frequencies in another Japanese population and several populations of other ethnicities, which were published elsewhere or deposited in SNP databases.17, 18, 19, 20 Frequencies of the SNPs varied among populations of different ethnicities. For instance, minor alleles for SNPs such as XPD-Asp312Asn and XRCC3-Thr241Met, which have been examined for associations with risks for several kinds of cancers,3 were rare (< 10%) in Japanese, but common in Caucasian and Hispanic populations.
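The sensitivity of HWE tests to a few homozygotes at very low allele frequencies can be illustrated with a chi-square goodness-of-fit test. This is a sketch only: the study used the TFPGA software, whose exact procedure is not described in the text, and the genotype counts below are hypothetical rather than taken from Table I.

```python
import math


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for HWE (1 degree of freedom).

    Returns (chi2, p); a monomorphic sample returns (0.0, 1.0).
    """
    n = n_AA + n_Aa + n_aa
    p_major = (2 * n_AA + n_Aa) / (2 * n)
    q_minor = 1.0 - p_major
    if p_major == 0.0 or q_minor == 0.0:
        return 0.0, 1.0
    expected = (n * p_major ** 2, 2 * n * p_major * q_minor, n * q_minor ** 2)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical rare SNP in 685 subjects (minor allele frequency below 1%):
# a single minor-allele homozygote is enough to give p < 0.05.
chi2_rare, p_rare = hwe_chi_square(676, 8, 1)
print(f"one homozygote: chi2={chi2_rare:.1f}, p={p_rare:.2g}")

# The same heterozygote count without the homozygote is consistent with HWE.
chi2_none, p_none = hwe_chi_square(677, 8, 0)
print(f"no homozygote:  chi2={chi2_none:.3f}, p={p_none:.2f}")
```

Because the expected number of minor-allele homozygotes is far below 1 at these frequencies, the chi-square approximation itself is unreliable here, which is consistent with treating such deviations as chance results.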
Allele distributions for the 50 SNPs in the ADC and SQC cases were compared with those in the controls (Table I). Significant (p < 0.05) differences in allele distribution between the controls and ADC cases were observed for 2 SNPs, PARP-Lys940Arg and MLH3-Pro844Leu, and a marginally significant (0.05 ≤ p < 0.10) difference was observed for 1 SNP, POLI-Thr706Ala. Significant differences between the controls and SQC cases were observed for 3 SNPs, XPG-His1104Asp, LIG4-Ile658Val and TP53-Arg72Pro, while marginally significant differences were observed for 4 SNPs, PARP-Lys940Arg, APEX-Ile64Val, NBS1-Gln185Glu and REV1-Phe257Ser. For the other SNPs, allele distributions did not differ significantly or marginally significantly either between the controls and ADC cases or between the controls and SQC cases. The relative risks of the genotypes for the 9 SNPs that showed differences in allele distribution were calculated as crude ORs. Heterozygotes or homozygotes for the minor alleles of 6 of the 9 SNPs showed significantly increased or decreased ORs when homozygotes for the major allele were used as a reference (Table IV). ORs in dominant (i.e., AA vs. Aa + aa, where A is the major allele and a is the minor allele) or recessive (i.e., AA + Aa vs. aa) modes for these 6 SNPs were also significantly increased or decreased (Table IV). On the other hand, ORs of genotypes for the remaining 3 SNPs, MLH3-Pro844Leu, APEX-Ile64Val and NBS1-Gln185Glu, did not show a significant increase or decrease in either ADC or SQC cases (data not shown). ORs for the 6 SNPs that showed significant differences in genotype distributions were then calculated after adjustment for age, sex and smoking dosage by the unconditional logistic regression method. ORs for 4 SNPs (i.e., POLI-Thr706Ala in ADC, and LIG4-Ile658Val, TP53-Arg72Pro and REV1-Phe257Ser in SQC) remained significant.
On the other hand, adjusted ORs for the other 2 SNPs, PARP-Lys940Arg and XPG-His1104Asp, were not significant, suggesting that the associations observed were due to confounding by the factors described above. Thus, the 4 SNPs for which both the crude and adjusted ORs were significantly increased or decreased were considered candidate SNPs associated with lung cancer risk.
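The genotype comparisons used above (each minor-allele genotype against major-allele homozygotes, plus the dominant and recessive modes) amount to collapsing the three genotype counts in different ways. The following Python sketch shows the collapsing explicitly; the function names and the genotype counts are hypothetical and are not taken from Table IV.

```python
def crude_or(case_exposed: int, case_ref: int,
             ctrl_exposed: int, ctrl_ref: int) -> float:
    """Crude odds ratio for an exposed group against a reference group."""
    return (case_exposed * ctrl_ref) / (case_ref * ctrl_exposed)


def genotype_ors(cases: tuple[int, int, int],
                 controls: tuple[int, int, int]) -> dict[str, float]:
    """Crude ORs from (AA, Aa, aa) counts; A = major allele, a = minor allele."""
    c_AA, c_Aa, c_aa = cases
    k_AA, k_Aa, k_aa = controls
    return {
        # Each genotype against major-allele homozygotes.
        "Aa_vs_AA": crude_or(c_Aa, c_AA, k_Aa, k_AA),
        "aa_vs_AA": crude_or(c_aa, c_AA, k_aa, k_AA),
        # Dominant mode: AA vs. Aa + aa.
        "dominant": crude_or(c_Aa + c_aa, c_AA, k_Aa + k_aa, k_AA),
        # Recessive mode: AA + Aa vs. aa.
        "recessive": crude_or(c_aa, c_AA + c_Aa, k_aa, k_AA + k_Aa),
    }


# Hypothetical genotype counts for 250 cases and 685 controls.
for mode, value in genotype_ors((100, 110, 40), (300, 300, 85)).items():
    print(f"{mode}: OR = {value:.2f}")
```

Collapsing genotypes into dominant or recessive modes reduces the number of categories being compared, which is why these modes are useful when genotype counts in a subgroup are small.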
Table IV. Associations Between Lung Cancer Risk and Genotypes for SNPs in DNA Repair Genes
To assess whether smoking modifies the contribution of the 4 candidate SNPs to lung cancer risk, ORs were calculated by smoking dosage (Table V). To increase statistical power, crude and adjusted ORs were calculated in dominant or recessive modes. LIG4-Ile658Val was excluded from this analysis, since the number of carriers of the LIG4-658Val allele in the SQC cases was small (i.e., n = 9; Table IV). Since only 11 nonsmokers were included in the SQC cases (Table III), they were excluded from the analysis of SQC risk because of a lack of statistical power. Significant increases or decreases of ORs were not observed in either subgroup of the ADC population. In SQC, ORs for REV1-Phe257Ser were significantly increased in heavy smokers, but not in light smokers, and the OR in heavy smokers was higher than that in light smokers. Significant increases or decreases of ORs for the other 2 SNPs were not observed in either subgroup of the SQC population.
Table V. Lung Cancer Risk Stratified by POLI, TP53 and REV1 Genotypes and Smoking Dosage
We can hypothesize that individuals with low DNA repair activities represent a group that is more prone to acquiring gene mutations and cancer at a younger age; thus, associations would be more clearly observed in subjects who developed cancer at a younger age.21, 22, 23 Therefore, associations between genotypes for the same 3 SNPs and lung cancer risk were examined after dividing the subjects into 2 subgroups by age with a cutoff point of 61 years, the median of all the control and case subjects (Table VI). To increase statistical power, crude and adjusted ORs were calculated in dominant or recessive modes. ORs for POLI-Thr706Ala were significantly increased in the ADC cases < 61 years, but not in the ADC cases ≥ 61 years. A preferential increase in ORs in cases < 61 years was also observed in the SQC population. Thus, it was suggested that the POLI-706Ala allele confers increased lung cancer susceptibility at a young age. ORs for the remaining 2 SNPs, TP53-Arg72Pro and REV1-Phe257Ser, differed slightly between the 2 age groups; however, preferential associations in either cases < 61 years or cases ≥ 61 years were not evident.
Table VI. Lung Cancer Risk Stratified by POLI, TP53 and REV1 Genotypes and Age
In the present study, SNPs associated with risks for ADC and SQC of the lung were first searched for based on differences in allele distribution using a cutoff p-value of 0.1 in the 2 × 2 chi-square test. By taking the results of the genotype comparisons into account, 4 SNPs, TP53-Arg72Pro, POLI-Thr706Ala, REV1-Phe257Ser and LIG4-Ile658Val, were defined as being associated with lung cancer risk. However, the possibility of false positives (type I statistical errors) must be considered. We performed 50 separate tests of significance in the analysis; therefore, at a significance level of 0.05, we might expect 2 or 3 significant results by chance alone in each of the ADC and SQC cases. When the Bonferroni adjustment was made for the present multiple analyses (i.e., a threshold of 0.05/50 = 0.001), TP53-Arg72Pro was the only SNP significantly associated with lung cancer risk (i.e., p < 0.001; Table IV). Thus, we concluded that TP53-Arg72Pro was the strongest candidate SNP underlying lung cancer susceptibility, while the other 3 SNPs could also be candidates.
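The multiple-testing argument above rests on two simple quantities: the number of false positives expected by chance and the Bonferroni-adjusted threshold. A minimal Python sketch follows; the function names are hypothetical, and the numbers of tests and significance level are those stated in the text.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of significant results by chance if no SNP is associated."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


N_SNPS = 50    # tests per histologic subtype
ALPHA = 0.05   # nominal significance level

print(f"expected chance findings: {expected_false_positives(ALPHA, N_SNPS):.1f}")
print(f"Bonferroni threshold:     {bonferroni_threshold(ALPHA, N_SNPS):.3f}")
```

The Bonferroni correction is conservative, so SNPs that fail this threshold are not ruled out; they remain candidates that require replication.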
The TP53 gene encodes a damage response protein, which functions as a tumor suppressor protein. Significant associations of the TP53-Pro/Pro genotype with increased lung cancer risk have been observed.24, 25 A recent study indicates that the TP53-Pro protein has a weaker activity in inducing apoptosis of human cells than the TP53-Arg protein.26 Therefore, the TP53-72Pro allele may contribute to carcinogenesis by reducing apoptosis of cells suffering from DNA damage. The present result thus strengthens the idea that TP53-72Pro is a risk allele for lung cancer. A previous study indicated that TP53-72Pro is a risk allele for lung ADC rather than SQC in a Caucasian population, especially in smokers.24 In the present study, this allele was associated with risk for SQC rather than ADC, and modification of its effect on the risk by smoking was not evident. The reason for this discrepancy is unclear; however, ethnic factors, including genetic and environmental ones, may have some influence on the effect of the TP53 polymorphism on lung cancer risk.
To the best of our knowledge, the other 3 SNPs, POLI-Thr706Ala, LIG4-Ile658Val and REV1-Phe257Ser, were defined for the first time as being associated with risks for ADC or SQC of the lung in this study. LIG4 encodes a DNA ligase employed in DSBR, and POLI and REV1 encode translesion DNA polymerases.1 The biologic significance of these SNPs has not been elucidated. However, the fact that all these SNPs change amino acids that are conserved in the mouse counterparts suggests that they have the potential to affect protein function. Carriers of the POLI-706Ala allele showed a significantly increased OR for ADC (Table IV). Moreover, the OR for carriers of the POLI-706Ala allele (i.e., dominant mode) was significantly increased specifically in ADC cases < 61 years (Table VI), and the same tendency was also observed in SQC. Thus, the POLI-706Ala allele was suggested as a risk allele for lung ADC and SQC, especially in individuals < 61 years of age. Interestingly, the mouse counterpart of POLI, Poli, was recently identified as a strong candidate for the Par2 (pulmonary adenoma resistance 2) gene responsible for reducing the mean multiplicity of urethane-induced lung adenoma/adenocarcinoma.27 POLI/Poli proteins carry out error-prone translesion DNA synthesis, incorporating deoxynucleotides opposite highly distorting or noninstructional DNA lesions. Thus, interindividual differences in the activity of POLI/Poli proteins due to polymorphisms might cause interindividual differences in the probability of mutagenesis in both humans and mice.
ORs in homozygotes for the REV1-257Ser allele were higher in heavy-smoker SQC cases than in light-smoker SQC cases (Table V). The REV1 gene encodes a deoxycytidyl transferase (i.e., an enzyme exclusively utilizing dCTP in polymerase reactions) that functions as a translesion synthesis DNA polymerase against several damaged bases, including those caused by smoking. Thus, the result may imply that the SNP causes interindividual differences in the probability of mutagenesis by smoking-related damaged bases. However, the fraction of homozygotes for the REV1-257Ser allele was not increased in heavy-smoker SQC cases; rather, the increased OR resulted from a decreased fraction of the homozygotes in heavy-smoker controls (Table V). Thus, whether smoking modifies the effect of the REV1-Phe257Ser SNP remains unclear and should be examined in different sets of subjects.
In previous case-control studies, homozygotes for the minor allele of OGG1-Ser326Cys and XRCC1-Arg399Gln were repeatedly proposed as being risk genotypes for lung cancer.3, 28 In this study, ORs of these homozygotes were increased both in ADC and SQC populations when homozygotes for the major alleles for these 2 SNPs were used as references; however, the increases were not statistically significant. Associations of XPD-Asp312Asn with lung cancer risk have been examined in several populations up to the present, but the risk allele/genotype is inconsistent among them.3, 23 In this study, ORs for minor allele homozygotes were higher and lower in the ADC and SQC populations, respectively, compared with those for major allele homozygotes. The frequency of the minor allele, XPD-312Asn, is considerably lower in Japanese than in U.S. individuals for whom previous case-control studies were undertaken. Therefore, this result might be due to the lack of power to detect associations.
The present study, focusing on 50 nonsynonymous SNPs in 36 DNA repair genes, led us to identify 4 candidate SNPs associated with lung cancer risk. These SNPs were present in genes involved in different DNA repair pathways, suggesting that modulation by DNA polymorphisms of the activities of multiple DNA repair pathways that maintain genome integrity underlies interindividual differences in lung cancer susceptibility. However, it is still possible that some of the remaining 46 SNPs are also associated with lung cancer risk, for the following reason. A majority (i.e., 28) of the 50 SNPs were rare SNPs with a minor allele frequency < 10% in the control population (Table I). The OR for the minor allele of the TP53-Arg72Pro SNP, which showed the strongest association with lung cancer risk in this study, was 1.4. With a cutoff p-value of 0.1 and a predicted OR of 1.4, the power to detect association of alleles with a frequency < 10% was < 0.90 and < 0.68 in the ADC and SQC cases, respectively. This implies that associations of several SNPs may have been overlooked because of a lack of statistical power. Thus, distributions of the 4 candidate SNPs as well as other SNPs in DNA repair genes should be further examined in larger numbers of subjects and in several different sets of subjects to reveal how they contribute to lung cancer risk.
The authors thank Dr. Ikuo Saito, Dr. Matsuhiko Hayashi and Dr. Keiichi Hirao of the Keio University School of Medicine and Dr. Teruhiko Yoshida, Dr. Hiromi Sakamoto, Dr. Kimio Yoshimura and Dr. Shunpei Ohnami of the National Cancer Center Research Institute for their help in collecting blood samples in Keio University; Dr. Kouichi Minato and Dr. Shinichi Ishihara of Gunma Prefectural Cancer Center for their collection of blood samples from lung cancer patients; Ms. Yuka Hiroi, Shizuka Shinohara, Nobutaka Naito, Rumie Sasaki, Naoko Okada, Mayumi Kudo, Nozomi Fujita, Anna Kikuchi, Sachie Kuroda, Miwako Osawa, Satoko Matsuyama, Kaoru Toyama and Ayaka Otsuka for providing considerable contributions to DNA sequencing and information processing. N.Y. was the recipient of a Research Resident Fellowship from the Foundation for Promotion of Cancer Research in Japan during the study.