The presence of single-nucleotide polymorphisms (SNPs) within the 3′-untranslated regions of genes could affect the binding between a microRNA (miRNA) and its target, with consequences on gene expression regulation. Considering the important role of miRNAs in carcinogenesis, it is hypothesized here that these SNPs could also affect the individual risk of colorectal cancer (CRC).
To test this hypothesis, a list was developed of 140 somatically mutated genes deduced from previous works on the mutome of the CRC. A further selection was conducted of SNPs within target sites for miRNAs that are expressed only in the colorectum (the colorectal microRNAome) and having adequate population frequencies. This yielded 12 SNPs that were genotyped in a case-control association study on 717 colorectal cases and 1171 controls from the Czech Republic.
Statistically significant associations were found between the risk of CRC and the variant alleles of KIAA0182 (rs709805) (odds ratio = 1.72; 95% confidence interval = 1.06-2.78, for the variant homozygotes) and NUP210 genes (rs354476) (odds ratio = 1.36; 95% confidence interval = 1.02-1.82, for the variant homozygotes).
Colorectal cancer (CRC) is the third most common cause of cancer death in the world. There is a consistent body of evidence from prospective studies to indicate that a high glucidic diet,1 high consumption of fat and red meat,2 and obesity are positively associated with CRC risk.3 Conversely, the use of nonsteroidal anti-inflammatory drugs4 and the consumption of brassica vegetables5 significantly reduce the risk for this cancer. Genetic factors can also contribute to modulating the risk of CRC. In fact, high-risk allelic variants are responsible for familial cases (such as in familial adenomatous polyposis or hereditary nonpolyposis colorectal cancer), whereas in the sporadic forms, the risk can be modulated by low-risk variants (genetic polymorphisms). Recently, several genome-wide association studies were performed and risk alleles were identified,6-9 such as those reviewed by Houlston et al.10 The genome-wide association studies offer the advantage of exploring thousands of loci without the need to establish any a priori hypothesis. However, they also suffer from some limitations, including the need for adjustment of the P values for multiple testing (lowering the power of the study) and the difficulty of interpreting the results or generating any hypothesis on anonymous markers associated with risk.11 In this context, the studies carried out on candidate genes under the classical hypothesis “common allele, common disease” may provide advantages.12 Indeed, these studies helped identify the null genotype of the gene GSTM1 (glutathione S-transferase mu 1),13 and specific single-nucleotide polymorphisms (SNPs) within, eg, MTHFR (methylene tetrahydrofolate reductase), NOD2 (nucleotide-binding oligomerization domain 2),14 or TP53 (tumor protein p53)15 as risk factors for CRC.
Gene deregulation is one of the key mechanisms by which cells can progress to cancer. The post-transcriptional regulation carried out by microRNAs (miRNAs) is one of the most interesting and powerful of these mechanisms.16 SNPs can reside within the genes encoding for miRNAs or within the target sequences at the 3′-untranslated regions (3′-UTRs); thus, they can affect the strength of the binding between an miRNA and its target.17, 18 As a result, SNPs can affect gene regulation, and therefore it is conceivable that these SNPs could be associated with a differential risk of cancer, as shown in a previous study.19 Following this concept, we report here a case-control association study on CRC where SNPs were selected on the basis of their potential effect on the different binding between miRNAs and their targets. In order to restrict the analysis to SNPs with a high likelihood of being associated with the risk of CRC, we reviewed the literature and filtered the results using a series of stringent a priori hypotheses.
Initially, we started from a list of 140 somatically mutated genes thought to be crucial in driving the development of CRC (Table 1). These genes derive from the analysis of the mutome, where 20,857 transcripts from 18,191 genes were sequenced in 11 patients with CRC.20 Then, we predicted in silico the target sequences for the miRNAs within the 3′-UTRs and indexed all the SNPs falling within these targets. Finally, we considered only those polymorphisms affecting the binding with miRNAs expressed specifically in the colon and rectum (Table 1), according to the colorectal microRNAome from Cummins et al.21 Thus, the overlap between the mutome and the microRNAome data sets allowed the selection of only 12 SNPs. These SNPs were checked for their association with the risk of CRC in a case-control study on 717 CRC cases and 1171 controls from the Czech Republic.
Table 1. List of 140 Candidate (CAN)-Genes for Colorectal Cancer20 and 190 MicroRNAs Expressed in the Colorectum21 From Which the Single-Nucleotide Polymorphism Selection Was Based
CAN-Genes Selected (Colorectal Mutome)
miRNA Expressed in Colorectum (the Colorectal MicroRNAome)
MATERIALS AND METHODS
Cases were patients with histologically confirmed CRC recruited between September 2004 and February 2009 from 9 oncological departments in the Czech Republic: Prague (2 departments), Benešov, Brno, Liberec, Ples, Příbram, Ústí nad Labem, and Zlín. During this period, a total of 968 cases provided blood samples. This study includes 717 subjects (74% of the whole set) who were able to be interviewed, provided biological samples, and who were genotyped appropriately. The mean age at diagnosis of the patients was 61.9 years.
Controls were 739 hospital-based volunteers with negative colonoscopy results for malignancy or idiopathic bowel diseases (cancer-free colonoscopy inspected controls [CFCCs]). CFCCs were selected among individuals admitted to the same hospitals during the same period of recruitment of the cases. The reasons for undergoing the colonoscopy were: 1) positive fecal occult blood test, 2) hemorrhoids, 3) abdominal pain of unknown origin, and 4) macroscopic bleeding. Cases and CFCCs had the same inclusion and exclusion criteria. Among the 739 CFCCs, 502 (67.9%) had complete covariates and valid genotypes and were analyzed in this study. The mean age at the time of sampling was 55.8 years.
A second group of controls consisted of 669 healthy blood donor volunteers (HBDV) collected from a blood donor center in Prague. All individuals were subjected to standard examinations to verify the health status for blood donation (detailed blood count, urinary examination, blood pressure, and general examination). The sample collection was performed at the same time as that of the other 2 study groups above. The mean age at the time of sampling was 49.2 years. All subjects were informed and provided written consent to participate in the study and to approve the use of their biological samples for genetic analyses, according to the Helsinki declaration. The design of the study was approved by the local ethics committee. Cases and controls were personally interviewed by trained personnel using a structured questionnaire to determine demographic characteristics and potential risk factors for CRC. Study subjects provided information on their lifestyle habits, body mass index, diabetes, and family/personal history of cancer. A portion of the cases and controls presented here were also analyzed in previous association studies.22, 23
Selection of Candidate Genes
Wood et al carried out a mutome study on 11 patients with CRC.20 In order to discriminate between passenger and driver mutations, each mutation was further verified in an independent series of 96 patients with CRC. Wood et al20 reported a list of 140 candidate genes (“CAN-genes” in Table 1) for driving carcinogenesis in the colorectum. The initial selection of the present study was based on this list.
The predicted miRNA binding sites were screened for the presence of SNPs by an extensive search in the SNP database (dbSNP; http://www.ncbi.nlm.nih.gov/SNP/). As a result, we found 61 SNPs within 31 genes. In order to have appropriate statistical power, we excluded SNPs with a minor allele frequency lower than 0.25 in Caucasians, and 37 SNPs were retained (Table 2).
Table 2. Selected Single-Nucleotide Polymorphisms (SNPs) with Minor Allele Frequency (MAF) > 0.25 and SNPs in MicroRNA Targets
SNPs in the 3′-UTR (Generic)
miRNA From the MicroRNAome21 Predicted to Bind the Target
For each SNP in miRNA targets, the ΔG was calculated (COFOLD software, Vienna Package).
As a third criterion of selection, we kept only those SNPs within target sites for miRNAs specifically expressed in CRC. These miRNAs were taken from a study on the microRNAome of the colorectum by Cummins et al21 (the list is shown in Table 1). Thus, only 12 SNPs were selected and verified in the case-control association study (Table 2).
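The successive selection filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type `Snp`, the function `select_snps`, the identifiers `rs_demo_*`, the gene names, and the miRNA names are all hypothetical placeholders, and the threshold of 0.25 is taken from the caption of Table 2. It does not reproduce the target-prediction step or the actual data of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rs_id: str                   # placeholder identifier, not a real dbSNP entry
    gene: str                    # placeholder gene name
    maf: float                   # minor allele frequency in the reference population
    predicted_mirnas: frozenset  # miRNAs predicted in silico to bind the site


def select_snps(snps, micrornaome, maf_threshold=0.25):
    """Keep SNPs that pass the frequency filter and whose predicted
    miRNAs overlap the set of miRNAs expressed in the tissue of interest."""
    selected = []
    for snp in snps:
        if snp.maf < maf_threshold:
            continue  # too rare for adequate statistical power
        if not (snp.predicted_mirnas & micrornaome):
            continue  # no predicted miRNA belongs to the microRNAome
        selected.append(snp)
    return selected


# Hypothetical inputs, for illustration only.
micrornaome = frozenset({"miR-A", "miR-B"})
candidates = [
    Snp("rs_demo_1", "GENE1", 0.31, frozenset({"miR-A", "miR-X"})),
    Snp("rs_demo_2", "GENE2", 0.12, frozenset({"miR-A"})),
    Snp("rs_demo_3", "GENE3", 0.40, frozenset({"miR-X"})),
]
kept = select_snps(candidates, micrornaome)
print([snp.rs_id for snp in kept])
```

In this toy input, the second SNP fails the frequency filter and the third has no predicted miRNA in the microRNAome, so only the first is retained.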
Genomic DNA was isolated from peripheral blood lymphocytes, using standard procedures. The DNA samples from cases and controls were randomly placed on plates where an equal number of cases and controls could be run simultaneously. Genotyping of the 12 selected SNPs was carried out by using the KASPar chemistry (KBioscience, Hoddesdon, UK), which is a competitive allele-specific polymerase chain reaction (PCR) SNP genotyping system that uses fluorescence resonance energy transfer quencher cassette oligonucleotides. The reaction employed the KASP 2× Reaction Mix, KASPar primers and probes, water, and 5 ng of DNA for 10 μL of reaction and a standard PCR protocol available from KBioscience. Duplicate samples (5%), no-template controls in each plate, and Hardy-Weinberg equilibrium tests were used as quality control tests.
For the selected SNPs, the algorithm RNAcofold (http://rna.tbi.univie.ac.at/cgi-bin/RNAcofold.cgi) was run to assess the Gibbs binding free energy (ΔG, expressed in kilojoules per mole), both for the common and the variant alleles. The algorithm RNAcofold computes the hybridization energy and base-pairing pattern of 2 RNA sequences.29 The difference of the free energies between the 2 alleles was computed as “variation of ΔG” (ie, ΔΔG) (Table 2). Because the neighbor sequence of each SNP can be a target for different miRNAs, we calculated the sum of the absolute values of ΔΔGs for each SNP (ie, |ΔΔG|tot = Σ |ΔΔG|) (Table 2). The |ΔΔG|tot should be considered as a sort of “disturbance index” predicting the likelihood for a given SNP to affect the function of the 3′-UTR, and it allows a ranking of SNPs for their relevance, as illustrated in previous studies.30, 31
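The computation of ΔΔG and of |ΔΔG|tot can be illustrated with a minimal Python sketch. It assumes that the free energies of the miRNA-target duplexes have already been obtained (for example from RNAcofold) and are supplied as numbers; the function names, the SNP labels, and the energy values are hypothetical, and the sign convention (variant minus common) is an assumption that does not affect the absolute values.

```python
def delta_delta_g(dg_common, dg_variant):
    """Difference in binding free energy between the two alleles
    (assumed convention: variant minus common)."""
    return dg_variant - dg_common


def disturbance_index(energy_pairs):
    """|ddG|tot: sum of |ddG| over every miRNA predicted to bind the site.

    energy_pairs is an iterable of (dg_common, dg_variant) tuples,
    one tuple per miRNA.
    """
    return sum(abs(delta_delta_g(c, v)) for c, v in energy_pairs)


def rank_snps(energies_by_snp):
    """Rank SNPs from the highest to the lowest disturbance index."""
    scores = {snp: disturbance_index(pairs)
              for snp, pairs in energies_by_snp.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical free energies, for illustration only.
example = {
    "snp_demo_1": [(-20.5, -18.0), (-15.0, -16.2)],  # two predicted miRNAs
    "snp_demo_2": [(-12.0, -12.4)],                  # one predicted miRNA
}
ranking = rank_snps(example)
for snp, score in ranking:
    print(f"{snp}: |ddG|tot = {score:.2f}")
```

Summing absolute values means that a SNP strengthening the binding of one miRNA and weakening that of another still receives a high index, which is consistent with its use as a measure of disturbance rather than of direction.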
Vectors Employed and In Vitro Assays
We PCR-amplified both the common and variant 3′-UTR regions of the NUP210 and KIAA0182 genes. The amplification was done using 2 primers, each carrying a 6-base sequence at its 5′ end recognized by the restriction enzyme SacI (gagctc; forward primer) or XhoI (ctcgag; reverse primer). The PCR products were cloned in pUC57 vectors. Subsequently, the plasmids were cleaved with SacI and XhoI, and the inserts were cloned downstream of the firefly luciferase gene in a reporter vector containing both the firefly luciferase (Photinus pyralis) and the Renilla luciferase (Renilla reniformis) genes (pmiR-GLO vector; Promega, Madison, Wis). Caco2, HCT116_p53 WT, and HCT116_p53−/− cell lines were plated at a density of approximately 2 × 10⁵ cells per well in 6-well plates and incubated overnight at 5% CO2, 37°C in a humidified incubator. They were transiently transfected using 12 μL of PolyFect transfection reagent (2 mg/mL; Qiagen Spa, Italy) and 1.5 μg of luciferase/Renilla chimeric construct, according to the manufacturer's protocols. Each experimental point was repeated 6 times, and the experiment was repeated 3 times. Forty-eight hours after transfection, cells were washed with phosphate-buffered saline and lysed with 500 μL of 1× passive lysis buffer (dual-luciferase reporter assay kit; Promega, USA). Cells were lysed for 15 minutes at room temperature, transferred to 1.5-mL microcentrifuge tubes, vortexed briefly, and centrifuged at 13,000 rpm for 30 seconds to pellet cell debris. Supernatants were transferred to clean tubes and used to measure the activity of firefly and Renilla luciferases, using a dual-luciferase reporter assay kit and a luminometer (Berthold Technologies, Germany). The firefly luciferase (Luc) reporter was measured first by adding Luciferase Assay Reagent II (LARII).
After the measurement of the firefly luminescence, this reaction was quenched, and the Renilla luciferase (Ren) reaction was simultaneously initiated by adding Stop & Glo Reagent (dual-luciferase reporter assay kit; Promega). The luminescence measured for firefly and Renilla luciferases in nontransfected cells (background) was subtracted from the values obtained for the cells transfected with the pmiR-GLO vector containing the 3′-UTR. The luminescence of the Renilla luciferase was used as the reference value to normalize the value of firefly luciferase (Luc/Ren ratio of luminescence).
This ratio (Luc/Ren) was compared to the one obtained for the transfection with the pmiR-GLO vector without 3′-UTR (empty vector (EV)).
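The normalization just described (background subtraction, Luc/Ren ratio, comparison with the empty vector) can be written out as a short Python sketch. The function names and all luminescence readings below are hypothetical and serve only to show the arithmetic; they are not measurements from this study.

```python
def luc_ren_ratio(luc, ren, bg_luc, bg_ren):
    """Background-subtracted firefly/Renilla luminescence ratio."""
    net_ren = ren - bg_ren
    if net_ren <= 0:
        raise ValueError("Renilla signal does not exceed background")
    return (luc - bg_luc) / net_ren


def percent_of_empty_vector(construct_ratio, ev_ratio):
    """Express a construct's Luc/Ren ratio as a percentage of the EV ratio."""
    return 100.0 * construct_ratio / ev_ratio


# Hypothetical raw readings (arbitrary luminescence units).
background = {"luc": 100.0, "ren": 50.0}      # nontransfected cells
empty_vector = {"luc": 10100.0, "ren": 1050.0}
construct = {"luc": 5400.0, "ren": 1050.0}    # vector carrying a 3'-UTR

ev_ratio = luc_ren_ratio(empty_vector["luc"], empty_vector["ren"],
                         background["luc"], background["ren"])
construct_ratio = luc_ren_ratio(construct["luc"], construct["ren"],
                                background["luc"], background["ren"])
relative_expression = percent_of_empty_vector(construct_ratio, ev_ratio)
print(f"Relative expression: {relative_expression:.1f}% of EV")
```

Dividing by the Renilla signal from the same vector corrects each well for transfection efficiency and cell number before the alleles are compared.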
To verify whether the genotypes were in Hardy-Weinberg equilibrium in controls, we used the chi-square test (1 degree of freedom), with a type I error rate (alpha) of 0.05. The multivariate logistic regression analysis (MLR) was used to test the association between genotypes and risk of CRC. The covariates included in the model were: sex, age, smoking habit (nonsmokers vs smokers and ex-smokers), body mass index, any positive familial history of CRC, education level (high, intermediate, and low), and living area (country, town neighborhood, and town). The individual SNPs were input in the MLR analysis; however, 3 and 2 SNPs were genotyped for ABCB11 and NUP210, respectively. Thus, we first reconstructed the individual haplotypes for these 2 genes with the software Fastphase,31 then we calculated the linkage disequilibrium (LD) between SNPs and found that they were strongly associated with each other (r² > 0.85). Thus, we used the SNPs that showed the highest values of |ΔΔG|tot (ie, rs354476 for NUP210 and rs495714 for ABCB11) because the others were almost completely tagged by them. The association between SNPs and CRC risk was calculated by estimating the odds ratio (OR) and its 95% confidence interval (CI), adjusted for both continuous and discontinuous covariates, as linear variables (the adjusted OR). For all genotypes, we performed the Cochran-Armitage trend test32 in order to detect the best genetic model (dominant, additive, recessive), and the one with the highest likelihood was input in the MLR analysis. The statistical threshold of significance was set at 0.05; however, the more restrictive Bonferroni correction was also applied. In this case, because of the strong LD between SNPs, only 9 completely independent statistics were performed on the genotypes, and this value was used for the Bonferroni correction (P threshold = 0.05/9 = 5.56 × 10⁻³).
For the in vitro assays, the ratios of the luminescence measurements (Luc/Ren), after background subtraction, were compared among cell lines and between genotypes (for each gene) using the multifactor analysis of variance (MANOVA). All statistical tests were 2-tailed and were carried out using Statgraphics Centurion software (StatPoint Technologies, Warrenton, Va).
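As an illustration of two of the steps above, the following Python sketch computes the Hardy-Weinberg chi-square test (1 degree of freedom) from genotype counts and the Bonferroni-corrected threshold for 9 independent tests. The function names and the genotype counts are hypothetical; the sketch assumes a biallelic SNP with both alleles observed and uses only the standard library, so it is not the Statgraphics procedure used in the study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Assumes both alleles are present,
    otherwise the expected counts would contain zeros.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical genotype counts in controls, for illustration only.
chi2, p_value = hwe_chi_square(30, 40, 30)
threshold = bonferroni_threshold(0.05, 9)
print(f"HWE chi2 = {chi2:.2f}, P = {p_value:.4f}")
print(f"Bonferroni threshold for 9 tests = {threshold:.2e}")
```

With 9 independent tests, a nominal alpha of 0.05 yields the per-test threshold of about 5.56 × 10⁻³ quoted in the text.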
Study Group and Genotype Analysis
Table 1 shows the initial list of genes, whereas Table 2 shows the calculations of the selected SNPs for their |ΔΔG|tot. The definitive number of subjects for whom all the data were available accounted for 717 cases and 1171 controls. Among the controls, 502 were CFCCs and 669 were HBDVs. The quality control of genotypes was assured (>99% concordance) and all the SNPs were in Hardy-Weinberg equilibrium (data not shown). The SNPs within ABCB11 and NUP210 were analyzed for their LD, and the haplotypes are reported in Table 3. The SNPs within ABCB11 and within NUP210 show a strong LD to each other (r² > 0.85), with the prevalence of 2 main haplotypes for each gene. Thus, for further analyses, as “tagging SNP” for each gene, we used the SNP showing the highest |ΔΔG|tot (Table 2).
Table 3. Haplotype Analyses of ABCB11 and NUP210 Loci
Frequency (All Samples)
SNP indicates single-nucleotide polymorphism.
vs SNP2: D′ = 1; r² = 0.995
vs SNP3: D′ = 1; r² = 0.949
vs SNP3: D′ = 1; r² = 0.954
vs SNP2: D′ = 1; r² = 0.86
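The two LD measures reported in Table 3 can be derived from the frequencies of the 4 possible haplotypes of a SNP pair. The Python sketch below is a generic illustration, not the Fastphase procedure used in the study; the function name `ld_stats` and the haplotype frequencies are hypothetical, and D′ is returned as an absolute value.

```python
def ld_stats(h11, h12, h21, h22):
    """Return (|D'|, r^2) for two biallelic SNPs.

    h11, h12, h21, h22 are the frequencies of haplotypes A1B1, A1B2,
    A2B1, and A2B2; they must sum to 1.
    """
    total = h11 + h12 + h21 + h22
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = h11 + h12            # frequency of allele A1 at the first SNP
    p_b = h11 + h21            # frequency of allele B1 at the second SNP
    d = h11 - p_a * p_b        # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D is scaled by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, r2


# Hypothetical haplotype frequencies: only 3 of the 4 haplotypes occur.
d_prime, r2 = ld_stats(0.45, 0.0, 0.05, 0.50)
print(f"D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

When one of the 4 haplotypes is absent, D′ equals 1 even though r² stays below 1 unless the allele frequencies of the 2 SNPs match, which is the pattern seen in Table 3.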
The characteristics of the study population as well as the outcomes from MLR analyses are given in Table 4. When the CFCCs were used as the reference group, the risk of CRC was significantly associated with increased age and with a positive history of smoking: the cases were, on average, approximately 6 years older than CFCCs, whereas the never-smokers represented 51.2% of the cases and 58.8% of the controls. Two genotypes were also associated with CRC risk: the AA homozygotes for rs709805 (KIAA0182) showed an OR of 1.72 (95% CI = 1.06-2.78; P = 2.8 × 10⁻²), as compared to the GG+GA group, and the CC homozygotes for rs354476 (NUP210) had an OR of 1.36 (95% CI = 1.02-1.82; P = 4.5 × 10⁻³), compared with the TT+TC group. This latter SNP was also significantly associated with the risk of CRC after applying the Bonferroni correction. The fact that only the homozygotes were the genotypes at risk prevented us from finding a statistically significant difference between alleles when the analyses were carried out on a per-allele basis (rs709805, minor allele frequency = 0.29 and 0.26; rs354476, 0.48 and 0.45 among cases and controls, respectively). When the CFCCs were pooled with the HBDV control group, the statistically significant association between the homozygous carriers of the rare allele for the SNP rs709805 within KIAA0182 and the risk of CRC was confirmed. In this case, the adjusted OR for the rare homozygotes was 1.57 (95% CI = 1.05-2.33; P = 2.7 × 10⁻²). A negative familial history of CRC was also found to be associated with a reduced risk of CRC (OR = 0.56; 95% CI = 0.41-0.76), a trend (not statistically significant) observed also when only CFCCs were used.
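A simple arithmetic check links each reported OR to its confidence interval. Assuming Wald-type intervals, which are symmetric on the logarithmic scale (an assumption, because the text does not state how the intervals were computed), the OR should equal the geometric mean of the 2 limits, and the standard error of log(OR) follows from the width of the interval. The Python sketch below applies this check to the values quoted in the paragraph above; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_point_estimate(lower, upper):
    """Geometric mean of the CI limits (midpoint on the log scale)."""
    return math.sqrt(lower * upper)


def implied_log_se(lower, upper, z=Z_95):
    """Standard error of log(OR) implied by a Wald-type interval."""
    return math.log(upper / lower) / (2.0 * z)


# (label, reported OR, lower limit, upper limit) as quoted in the text.
reported = [
    ("rs709805, cases vs CFCCs", 1.72, 1.06, 2.78),
    ("rs709805, cases vs CFCCs+HBDVs", 1.57, 1.05, 2.33),
    ("rs354476, cases vs CFCCs", 1.36, 1.02, 1.82),
]

for label, odds_ratio, lower, upper in reported:
    midpoint = implied_point_estimate(lower, upper)
    log_se = implied_log_se(lower, upper)
    print(f"{label}: reported OR = {odds_ratio:.2f}, "
          f"log-scale midpoint = {midpoint:.2f}, SE(log OR) = {log_se:.3f}")
```

Agreement to within rounding between the reported OR and the log-scale midpoint indicates that an estimate and its interval belong to the same analysis.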
Table 4. Characteristics of Colorectal Cancer Patients and Control Subjects and Adjusted Odds Ratio (OR) and 95% Confidence Interval (CI) Following Multivariate Logistic Regression Analysis
Cases n (%)
CFCCs n (%)
Adjusted OR (95% CI)
CFCC+HBDV n (%)
Adjusted OR (95% CI)
Statistically significant results are shown in bold. Only the best genetic model is given (recessive for rs354476 NUP210 and rs709805 KIAA0182).
BMI indicates body mass index; CFCC, cancer-free colonoscopy inspected controls; HBDV, healthy blood donor volunteers.
1.05 (1.04-1.06)
≤45 years old
>45 years old
Smokers + Ex-smoker
BMI ≤ median (26.2)
BMI > median (26.2)
Positive familial history of CRC
G/G + G/A
Because 2 genotypes were associated with the risk of CRC, we carried out in vitro assays to investigate whether these SNPs could have some direct functional role in the regulation of expression (ie, translation) of KIAA0182 and NUP210. For KIAA0182, we assayed rs709805, whereas for NUP210, we tested rs354476. The pmiR-GLO vector, carrying both the luciferase reporter gene and the Renilla reference gene, was chimerized by placing either the common or the variant form of the KIAA0182 and NUP210 3′-UTRs at the 3′ end of the luciferase gene. Thus, the measurements of luminescence for luciferase were indicative of the intensity of its expression, depending on the 3′-UTR that was adopted. These measurements were compared to the internal reference (Renilla). Because the same vector carries both luciferase and Renilla genes, the experimental variability was greatly reduced compared with the typical experiments where 2 independent vectors are cotransfected. This allowed a more precise evaluation of the slight differences in the biological activity of the tested alleles. The background measurements of the luminescence of luciferase and Renilla were subtracted from the values obtained after transfection, and their ratio (luciferase/Renilla) was compared to the EV (vector not chimerized). Three cell lines derived from CRC were used: Caco2, HCT116 p53wt (wild-type for p53), and HCT116 p53−/− (lacking a functional p53). The results from 3 independent experiments (with 6 replicates for each point) are reported in Figure 1. Each luciferase/Renilla ratio is given as percent of the maximal intensity obtained with the EV (luciferase without 3′-UTR) within each experiment. In fact, at least in this series of experiments, the addition of a 3′-UTR to the luciferase gene led to a significant reduction of the luciferase expression. This could be due to a less stable messenger RNA, to the presence of negative regulators acting on the chimeric 3′-UTR (eg, miRNAs), or to some other unknown mechanisms.
The C-to-T substitution at rs354476 (NUP210) resulted in a reduction of the expression of luciferase in all 3 cell lines employed. The average luciferase expression with the common C allele of NUP210 was 53%, whereas it was 41.1% with the T allele, as compared to the EV. The MANOVA showed that the difference between the ratios obtained for the C and the T alleles, combining all 3 cell lines, was statistically significant (P = .0035). For KIAA0182, we observed a modest increase in expression with the variant A allele (22.6%) compared with the common G allele (18.7% with respect to the EV), but the difference was not statistically significant (MANOVA, P = .378).
In this work, we combined the information provided by the CRC mutome20 with that from the microRNAome,21 and we extracted 12 SNPs that have a potential role in affecting the regulation of 9 candidate genes (namely ABCB11, ADARB2, KRAS, KIAA0182, IGSF22, CD109, NUP210, PKNOX1, and EYA4), thereby potentially affecting the individual risk of CRC. Indeed, we found an increased risk for rs709805 within the KIAA0182 gene in a population of cases and controls (either CFCCs or HBDVs+CFCCs) from the Czech Republic, the country with the highest incidence of CRC worldwide. A further association between rs354476 within NUP210 and risk of CRC was significant only when considering the CFCC group. In this study, 2 different control populations were chosen. The inclusion of colonoscopy-negative individuals as controls (CFCCs) ensured cancer-free control individuals, because a negative colonoscopy result serves as the best available proof of CRC absence. Because these controls may not necessarily represent the healthy general population, we decided to expand the group by also including healthy individuals recruited from a blood donor center (HBDVs). However, it should be stressed that the HBDVs differ from cases for several covariates. A meaningful share of HBDVs were younger and had a higher educational level than cases, and HBDVs were mainly from the Prague district, whereas cases were collected from throughout the country. These differences could explain why the association with the SNP rs354476 within NUP210 was not confirmed in the combined analysis (although the same trend was maintained). When cases were compared with the CFCCs, who share more similarities with CRC cases, the association between NUP210 and CRC was strong enough to survive the Bonferroni correction. Thus, it could be speculated that NUP210 constitutes a risk factor for CRC when other preexisting predisposing conditions, such as inflammation or organ dysfunctions, are present.
In order to further interpret our results, we questioned whether the 9 candidate genes were actively transcribed in the normal colonic mucosa. In fact, the selection of these “CAN-genes” by Wood et al20 was based on the resequencing of exonic regions, regardless of the actual expression of the genes in the colon. By browsing the database of the University of Tokyo, Japan (http://www.lsbm.org/site_e/database/index.html), we found that only KRAS, KIAA0182, and NUP210 are significantly expressed in the normal colonic mucosa. It is remarkable that 2 of the 3 SNPs that fulfilled all the criteria related to the mutome, microRNAome, and transcriptome were associated with the risk of CRC. When the predicted biological effect (ie, the |ΔΔG|tot of Table 2) was examined, rs709805 within KIAA0182 ranked fourth, whereas rs354476 (NUP210) ranked eighth, but they rose to second and third place, respectively, after filtering for the transcriptome. In vitro assays carried out to test the differences between the common and variant 3′-UTRs of NUP210 and KIAA0182 showed that the T allele of rs354476 (NUP210) was associated with a reduced expression of the reporter gene. This preliminary evidence does not prove any role of the predicted miRNAs in a differential regulation of NUP210. However, this finding highlights the fact that the alternative 3′-UTRs, placed into a “normal” cellular context and exploiting the “normal” cellular machineries, have different capacities in determining the levels of expression of NUP210. We employed colorectal cell lines, and it is conceivable that these cell lines express a set of miRNAs similar to that of normal colorectal cells. However, we cannot say whether miRNAs, messenger RNA stability, or other mechanisms are involved in the observed genotype-dependent differential expression of NUP210. No clear evidence came from the assay carried out on the KIAA0182 3′-UTR.
KIAA0182 maps at 16q24.1 and, as suggested by its designation, it belongs to a family of more than 2000 genes. It encodes a putative genetic suppressor element 1 protein and it might exhibit RNA-binding activity.33 However, there is little information about this gene, whereas more is known about NUP210. This gene maps at 3p25.1 and encodes the nuclear pore glycoprotein-210 (gp210) involved in the structural organization of the nuclear pore complex.34 Interestingly, during mitosis, Ser1880 of gp210 is phosphorylated by the cyclin B-p34cdc2,35 and an increased expression of NUP210 was found in cervical cancer.36 According to The Roche Cancer Genome Database (http://rcgdb.bioinf.uni-sb.de/MutomeWeb/), somatic mutations within KIAA0182 and NUP210 were already described not only for CRC (KIAA0182 = c.366_dupC and c.1879_C>T encoding for p.R627W; NUP210 = c.2951G>A encoding for R984H; c.923 C>T encoding for S308L; IVS6-3C>T) but also for malignant melanoma (KIAA0182 = c.2172C>T; NUP210 = c.3447C>T). The presence of somatic mutations in the same genes in different types of cancer reinforces the hypothesis that these genes play an important role in human carcinogenesis. NUP210 and KIAA0182 were also proposed among the biomarkers for human CRC (eg, see www.wipo.int/patentscope/search/en/WO2006081248). According to our results, it is likely that functional SNPs could modulate the normal levels of these proteins. Therefore, knowledge of the effects of the SNPs is also very important for setting appropriate thresholds to distinguish normal concentrations from pathological ones.
In conclusion, this hypothesis-driven study carried out using all the latest “-omics” information available from the literature suggests for the first time that the regulation of NUP210 and KIAA0182 may be important for modulating the risk of CRC. Future work is warranted to validate the results in other populations and to explore further the biological significance of the mentioned SNPs.
This work was supported by AIRC (Associazione Italiana Ricerca Cancro, investigator grant year 2008), by the MIUR (Italian Ministry of Research, PRIN) and the University of Pisa (Ex60%) and was supported by the Grant Agency of the Czech Republic: CZ:GACR:GA P304/10/1286, CZ:GACR:GA 310/07/1430, and CZ:GACR:GA 305/09/P194.