Identification of polymorphisms in ultraconserved elements associated with clinical outcomes in locally advanced colorectal adenocarcinoma


  • Presented as a poster discussion by Cathy Eng at the Annual Meeting of the American Society of Clinical Oncology; Chicago, IL; June 3-7, 2011.

  • The first 2 authors contributed equally to this article, with final editing by the second author.



Ultraconserved elements (UCEs) are noncoding genomic sequences that are completely identical among human, mouse, and rat species and harbor critical biologic functions. The authors hypothesized that single nucleotide polymorphisms (SNPs) within UCEs are associated with clinical outcomes in patients with colorectal cancer (CRC).


Forty-eight SNPs within UCEs were genotyped in 662 patients with stage I through III CRC. The associations between genotypes and recurrence and survival were analyzed in patients with stage II or III CRC who received fluoropyrimidine-based adjuvant chemotherapy using a training and validation design. The training set included 115 patients with stage II disease and 170 patients with stage III disease, and the validation set included 88 patients with stage II disease and 112 patients with stage III disease.


Eight SNPs were associated with clinical outcomes stratified by disease stage. In particular, for patients with stage II CRC who had at least 1 variant allele of reference SNP sequence 7849 (rs7849), a consistent association with increased recurrence risk was observed in the training set (hazard ratio [HR], 2.39; 95% confidence interval [CI], 1.04-5.52), in the replication set (HR, 3.70; 95% CI, 1.42-9.64), and in a meta-analysis (HR, 2.89; 95% CI, 1.54-5.41). Several other SNPs were significant in the training set but not in the validation set. These included rs2421099, rs16983007, and rs10211390 for recurrence and rs6590611 for survival in patients with stage II disease; and SNPs rs6124509 and rs11195893 for recurrence in patients with stage III disease. In addition, a significant cumulative effect was observed of multiple risk genotypes and potential gene-gene interactions on recurrence risk.


To the authors' knowledge, this is the first study to evaluate the association between SNPs within UCEs and clinical outcome in patients with CRC. The results suggested that SNPs within UCEs may be valuable prognostic biomarkers for patients with locally advanced CRC who receive 5-fluorouracil–based chemotherapy. Cancer 2012. © 2012 American Cancer Society.


Surgery is the primary treatment modality with curative intent for patients with localized colorectal cancer (CRC). However, approximately 50% of patients will develop recurrent or metastatic disease after undergoing radical resection.1 Administration of 5-fluorouracil (FU) may be considered in patients who have high-risk, stage II and III disease according to the American Joint Committee on Cancer (AJCC) Cancer Staging Manual (version 6.0). Unfortunately, 40% to 50% of patients will not experience beneficial effects and will suffer from treatment-related toxicities.2 Recent studies have demonstrated that single nucleotide polymorphisms (SNPs) can provide information for personalized chemotherapy.3 It has been estimated that there are perhaps 50,000 to 250,000 SNPs that confer a biologic effect, most of which are distributed in and around the 30,000 genes.4 Therefore, we believed it would be advantageous to evaluate SNPs that are more likely to be functional and to have a bearing on CRC recurrence.

Ultraconserved elements (UCEs) are 200-base pair (bp) to 779-bp, absolutely conserved, noncoding sequences that have 100% sequence identity among orthologous genomic regions of the human, mouse, and rat species.5 Recent studies suggest that UCEs frequently are located at fragile sites and genomic regions involved in cancers6 and have important functions in vertebrate genomes, such as serving as long-range enhancers of flanking genes7 and regulating splicing,8 epigenetic modifications,9 and transcriptional coactivation.10 Alleles derived from SNPs within conserved regions are rarer than new alleles in nonconserved regions (P = 3 × 10⁻¹⁸).11 The variants in these regions have been subjected to extreme evolutionary pressure and are conserved in humans over long evolutionary periods, suggesting that the few common SNPs within UCEs may harbor critical biologic functions.12 Thus, these SNPs may be excellent tools for studying cancer risk, treatment efficacy, and patient prognosis. However, to our knowledge, only 1 study has been published that analyzed the association between genetic polymorphisms within UCEs and cancer risk, revealing the potential impact of 6 SNPs on familial breast cancer risk.13 Moreover, another study demonstrated that UCEs have distinct expression signatures in CRC, and inhibiting overexpressed UCEs induced apoptosis,6 suggesting potential links between UCEs and prognosis and treatment response in patients with CRC. Therefore, we hypothesized that SNPs within UCEs modulate the clinical outcomes of patients with CRC. To test this hypothesis, we selected 48 potentially functional SNPs within UCEs and systematically evaluated their individual and joint associations with the clinical outcomes of patients with locally advanced CRC who received adjuvant fluoropyrimidine-based chemotherapy.


Study Population and Epidemiologic Data

Six hundred sixty-two patients with histologically confirmed colorectal adenocarcinoma were enrolled at The University of Texas M. D. Anderson Cancer Center (MD Anderson) between March 1995 and May 2008. All patients were diagnosed with stage I through III disease according to the AJCC TNM classification (version 6.0), and they all underwent radical surgery. There were no recruitment restrictions on age, sex, ethnicity, or cancer stage. Of all 662 patients, 435 patients had their disease diagnosed within 1 year before recruitment, and these patients were analyzed as the training set in this study. The remaining 227 patients had a longer history (>1 year) of CRC before their referral to MD Anderson and were used as the validation set. Each patient signed an informed consent form and donated a 10-mL to 20-mL peripheral blood sample for the isolation of DNA.

Epidemiologic data were collected using a structured questionnaire, including questions about demographic characteristics, smoking history, alcohol consumption, medical history, and family history of cancer. Clinical and follow-up data, including date of diagnosis, performance status, clinical stage, tumor location, histologic grade, primary surgery, pathologic stage, chemotherapy, chemoradiation, radiation, and tumor recurrence/progression, were abstracted from the patients' medical records. Information on vital status was obtained from the medical records and from the Social Security Death Index. The study was approved by the Institutional Review Board of MD Anderson.

Single Nucleotide Polymorphism Selection and Genotyping

Bejerano et al discovered 481 UCEs using a bioinformatic comparison of the mouse, rat, and human genomes.14 All ultraconserved sequences that were used in the current study are available in their report (available at:∼jill/ultra.html [accessed December 10, 2010])14 and from the UCbase and miRfunc database (available at: [accessed December 10, 2010]).15 For each UCE, we first selected haplotype-tagging SNPs based on data from the International HapMap Project (available at: [accessed January 2, 2012]), and we obtained a list of 141 SNPs. After filtering the SNPs with the LD Select program (available at: [accessed January 2, 2010]) and the University of California, San Francisco Golden Path Gene Sorter program (available at: [accessed January 2, 2012]), we retained 54 SNPs on the basis of a linkage disequilibrium r² threshold of 0.8 and a minor allele frequency >0.05 in Caucasians. The 54 SNPs were then submitted to Illumina (San Diego, Calif) technical support, and those with low Illumina quality design scores (<0.6) were excluded. Table 1 lists the 48 SNPs that we selected for genotyping in this study.
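The two numeric thresholds named in this paragraph (minor allele frequency >0.05 and design score of at least 0.6) can be sketched as a simple filter. This is only an illustration: the `CandidateSnp` record, the `passes_filters` helper, and the example identifiers and values are hypothetical, and the linkage disequilibrium step performed by the LD Select program is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class CandidateSnp:
    rs_id: str           # hypothetical identifier, not a SNP from this study
    maf: float           # minor allele frequency in the reference population
    design_score: float  # assay design score reported by the vendor


def passes_filters(snp, min_maf=0.05, min_design_score=0.6):
    """Keep SNPs with MAF above min_maf and a design score of at least min_design_score."""
    return snp.maf > min_maf and snp.design_score >= min_design_score


# Invented example records (assumptions for illustration only).
candidates = [
    CandidateSnp("rs_example_1", maf=0.23, design_score=0.91),
    CandidateSnp("rs_example_2", maf=0.03, design_score=0.95),  # too rare
    CandidateSnp("rs_example_3", maf=0.17, design_score=0.41),  # low design score
]

retained = [snp.rs_id for snp in candidates if passes_filters(snp)]
print(retained)
```

Only the first hypothetical record passes both thresholds.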

Table 1. Polymorphisms in Ultraconserved Elements Selected for this Study

| Gene Symbol | Gene Name | SNP ID | Chromosome | UCE | Position | Major/Minor Allele | MAF |
|---|---|---|---|---|---|---|---|
| HS2ST1 | Heparan sulfate 2-O-sulfotransferase 1 | rs10747335 | 1 | uc.29 | Intron | C/A | 0.23 |
| PKN2 | Protein kinase N2 | rs10493807 | 1 | uc.31 | 5′-FR | G/A | 0.08 |
| C1orf110 | Chromosome 1 open reading frame 110 | rs4412572 | 1 | uc.38 | 5′-FR | | |
| RGS4 | Regulator of G-protein signaling 4 | rs11580679 | 1 | uc.39 | 5′-FR | A/G | 0.38 |
| RGS4 | | rs16847292 | 1 | uc.39 | 5′-FR | A/G | 0.07 |
| NUF2 | NDC80 kinetochore complex component, homolog (S. cerevisiae) | rs10917804 | 1 | uc.40 | 3′-FR | A/G | 0.08 |
| C1orf75 | Chromosome 1 open reading frame 75 | rs7533689 | 1 | uc.41 | 5′-FR | G/A | 0.19 |
| LOC730134 | Protein-coding gene | rs10211390 | 2 | uc.54 | 3′-FR | C/G | 0.36 |
| BCL11A | B-cell chronic lymphocytic protein 11A (zinc finger protein) | rs12473113 | 2 | uc.57 | 3′-FR | G/A | 0.09 |
| BCL11A | | rs9784100 | 2 | uc.60 | 3′-FR | G/C | 0.39 |
| FLJ16124 | Hypothetical protein | rs2954963 | 2 | uc.65 | 3′-FR | G/A | 0.27 |
| SFXN5 | Sideroflexin 5 | rs2421099 | 2 | uc.66 | Intron | A/T | 0.14 |
| LOC730124 | Protein-coding gene | rs786255 | 2 | uc.79 | 35′-FR?? | A/G | 0.39 |
| LOC728773 | Similar to poly(A) binding protein | rs1399685 | 2 | uc.81 | 3′-FR | T/A | 0.06 |
| LOC728304 | Hypothetical protein | rs12619842 | 2 | uc.92 | 5′-FR | C/G | 0.17 |
| PDK1 | Pyruvate dehydrogenase kinase, isozyme 1 | rs6710129 | 2 | uc.99 | Intron | A/G | 0.20 |
| ZBTB20 | Zinc finger and BTB domain containing 20 | rs16822925 | 3 | uc.119 | Intron | C/A | 0.14 |
| | | rs9838168 | 3 | uc.126 | | G/A | 0.44 |
| RSRC1 | Arginine/serine-rich coiled-coil 1 | rs11713363 | 3 | uc.131 | Intron | A/G | 0.26 |
| C5orf36 | Chromosome 5 open reading frame 36 | rs13154972 | 5 | uc.172 | 3′-FR | A/G | 0.17 |
| EBF1 | Early B-cell factor 1 | rs4921445 | 5 | uc.175 | Intron | A/G | 0.13 |
| C7orf30 | Chromosome 7 open reading frame 30 | rs199657 | 7 | uc.208 | 5′-FR | G/C | 0.40 |
| LOC442660 | Ribosomal protein L7a pseudogene 38 | rs774265 | 7 | uc.213 | Intron | G/A | 0.15 |
| SHFM1 | Split hand/foot malformation (ectrodactyly) type 1 | rs6953983 | 7 | uc.220 | 5′-FR | G/A | 0.36 |
| EBF2 | Early B-cell factor 2 | rs9942838 | 8 | uc.235 | Intron | G/A | 0.50 |
| LOC347119 | C15orf2-like pseudogene | rs7033100 | 9 | uc.264 | 3′-FR | G/A | 0.32 |
| LHX6 | LIM homeobox 6 | rs1467737 | 9 | uc.278 | Intron | A/G | 0.47 |
| ZNF503 | Zinc finger protein 503 | rs12782308 | 10 | uc.287 | 5′-FR | G/C | 0.14 |
| PKD2L1 | Polycystic kidney disease 2-like 1 | rs2305386 | 10 | uc.294 | Intron | G/A | 0.05 |
| SCD | Stearoyl-coenzyme A desaturase (delta-9 desaturase) | rs7849 | 10 | uc.298 | 3′ UTR | A/G | 0.17 |
| TECTB | Tectorin beta | rs11195893 | 10 | uc.310 | 3′-FR | G/A | 0.10 |
| RAB11FIP2 | RAB11 family interacting protein (class I) | rs12218935 | 10 | uc.311 | 3′-FR | A/G | 0.46 |
| MGMT | O-6-methylguanine-DNA methyltransferase | rs1711662 | 10 | uc.318 | Intron | G/A | 0.28 |
| DLG2 | Disks large homolog 2 | rs3815988 | 11 | uc.331 | Intron | A/G | 0.35 |
| LOC647277 | Hypothetical protein | rs1395351 | 13 | uc.351 | 3′-FR | A/G | 0.29 |
| ARHGEF7 | Rho guanine nucleotide exchange factor (GEF) 7 | rs9560010 | 13 | uc.357 | Intron | G/A | 0.17 |
| AKAP6 | A kinase (PRKA) anchor protein 6 | rs8007042 | 14 | uc.367 | 5′-FR | C/A | 0.12 |
| AKAP6 | | rs1956211 | 14 | uc.369 | Intron | A/G | 0.22 |
| MAP2K5 | Mitogen-activated protein kinase kinase 5 | rs8037887 | 15 | uc.391 | Intron | A/C | 0.15 |
| IKZF3 | IKAROS family zinc finger 3 (Aiolos) | rs13313561 | 17 | uc.410 | Intron | G/C | 0.05 |
| TCF4 | Transcription factor 4 | rs12455881 | 18 | uc.436 | Intron | G/A | 0.16 |
| ZNF407 | Zinc finger protein 407 | rs4243289 | 18 | uc.437 | Intron | G/C | 0.44 |
| RBL1 | Retinoblastoma-like 1 (p107) | rs6124509 | 20 | uc.455 | 3′-FR | A/G | 0.14 |
| FAM48B1 | Family with sequence similarity 48, member B1 | rs16983007 | X | uc.465 | 5′-FR | G/A | 0.08 |
| LOC729188 | 40S ribosomal protein S26 pseudogene | rs1029496 | X | uc.466 | 3′-FR | A/G | 0.14 |
| PDK3 | Pyruvate dehydrogenase kinase, isozyme 3 | rs10482283 | X | uc.469 | 5′-FR | G/A | 0.19 |
| GRIA3 | Glutamate receptor, ionotropic, alpha 3 | rs1293524 | X | uc.481 | 5′-FR | A/T | 0.11 |

Abbreviations: A, adenine; C, cytosine; FR, flanking region; G, guanine; MAF, minor allele frequency; rs, reference single nucleotide polymorphism sequence; SNP ID, single nucleotide polymorphism identifier; T, thymidine; UCE, ultraconserved element; UTR, untranslated region.

DNA was isolated from the peripheral blood samples using a QIAamp DNA extraction kit (Qiagen, Valencia, Calif). SNP genotyping was conducted using an Illumina VeraCode GoldenGate Assay kit. The BeadXpress Reader was used for microbead code identification and fluorescent signal detection. Genotype clustering and calling were performed using Illumina GenomeStudio software. Ten duplicate DNA samples exhibited 100% concordance. The mean call rate for the SNP array was 99.9%. One SNP, reference SNP sequence 4412572 (rs4412572) in chromosome 1 open reading frame 110 (C1orf110), failed in all samples because of low signals and, thus, was discarded from further analysis.

Statistical Analysis

Chi-square tests were used to assess differences in the distributions of categorical variables, and Student t tests were used to evaluate continuous variables. A Cox proportional hazards model was used to estimate hazard ratios (HRs) and their 95% confidence intervals (CIs) for multivariate survival analyses, adjusting for age, sex, ethnicity, smoking status, and histologic grade. For each SNP, we tested 3 different genetic models: specifically, a dominant model, a recessive model, and an additive model. The model with the most significant P value was considered the best-fitting model. Fixed-effects and random-effects meta-analyses were used to calculate the pooled HRs. The Cochran Q statistic was used to assess heterogeneity between the data sets. When the Q test was significant, a random-effects model was used to accommodate the diversity in the magnitude of treatment effects. Otherwise, the pooled HR was estimated using the fixed-effects model. The associations between genotype and survival time were plotted using the Kaplan-Meier method and were analyzed using the log-rank test. We also evaluated the combined effects of the SNPs according to the number of unfavorable genotypes identified from the main-effects analysis of single SNPs. Higher order gene-gene interactions were evaluated using survival-tree analysis, as implemented in the STREE program (available at: [accessed February 14, 2012]), which uses recursive partitioning to identify subgroups of individuals with similar risk. All statistical analyses were performed using STATA software (version 10; STATA Corporation, College Station, Tex). All P values were 2-sided, and a P value < .05 was considered statistically significant.
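As a brief illustration of the 3 genetic models, the sketch below shows one conventional way to turn a genotype (expressed as the number of variant alleles carried) into the covariate that would enter a Cox model. The function name `code_genotype` is hypothetical, and the coding is the standard textbook convention rather than a description of the STATA commands actually used.

```python
def code_genotype(variant_alleles: int, model: str) -> int:
    """Encode a genotype (0, 1, or 2 variant alleles) under a genetic model.

    dominant : heterozygous and homozygous variant carriers vs wild type
    recessive: homozygous variant carriers vs all others
    additive : per-allele effect (the variant-allele count itself)
    """
    if variant_alleles not in (0, 1, 2):
        raise ValueError("variant_alleles must be 0, 1, or 2")
    if model == "dominant":
        return int(variant_alleles >= 1)
    if model == "recessive":
        return int(variant_alleles == 2)
    if model == "additive":
        return variant_alleles
    raise ValueError(f"unknown model: {model}")


# Coding of wild-type, heterozygous, and homozygous variant genotypes.
for model in ("dominant", "recessive", "additive"):
    print(model, [code_genotype(g, model) for g in (0, 1, 2)])
```

Under this convention, a dominant-model HR compares carriers of at least 1 variant allele with wild-type patients, whereas a recessive-model HR compares homozygous variant carriers with everyone else.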


Patient Characteristics

The demographic and clinical characteristics of patients are presented in Table 2. Of the 435 patients in the training set, 352 (80.9%) were Caucasian, and 263 (60.5%) were men. There were 61 patients with stage I disease, 171 patients with stage II disease, and 203 patients with stage III disease. All patients underwent primary surgery with curative intent. Of these 435 patients, we obtained genotype data from a total of 285 patients (66%) who had received fluoropyrimidine-based chemotherapy, including 115 patients with stage II disease and 170 patients with stage III disease. During the median follow-up of 45.1 months, there were 65 deaths and 93 recurrences. Sex, race, smoking pack-years, tumor location, and histologic grade were not associated significantly with clinical outcomes. AJCC stage was associated significantly with both recurrence (P = .005) and survival (P = .02), and age was correlated with survival (P = .01).

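Table 2. Demographic and Clinical Variables for Patients With Colorectal Cancer

Training Set, N = 435; Replication Set, N = 227

| Variable | Training: Recurrence | Training: No Recurrence | P | Training: Dead | Training: Alive | P | Replication: Recurrence | Replication: No Recurrence | P | Replication: Dead | Replication: Alive | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age: Mean±SD, y | 59.2±13.7 | 58.4±12.8 | .6 | 62.3±13.3 | 57.9±12.8 | .01ᵃ | 57.06±12.30 | 53.00±16.28 | .23 | 56.77±12.26 | 56.86±13.00 | .95 |
| Smoking: Mean±SD, pack-years | 37.4±36.9 | 30.8±39.3 | .33 | 37.2±36.4 | 31.4±39.2 | .47 | 27.30±20.74 | 22.70±15.12 | .63 | 25.30±21.68 | 30.20±21.05 | .29 |
| Sex | | | | | | | | | | | | |
| Men | 57 (61.3) | 206 (60.2) | | 41 (63.1) | 222 (60) | | 124 (58.8) | 7 (46.7) | | 72 (58.1) | 60 (58.3) | |
| Women | 36 (38.7) | 136 (39.8) | .85 | 24 (36.9) | 148 (40) | .64 | 87 (41.2) | 8 (53.3) | .36 | 52 (41.9) | 43 (41.7) | .98 |
| Ethnicity | | | | | | | | | | | | |
| Caucasian | 73 (78.5) | 279 (81.8) | | 51 (78.5) | 301 (81.6) | | 179 (84.8) | 12 (80) | | 105 (84.7) | 87 (84.5) | |
| African American | 9 (9.7) | 33 (9.7) | | 7 (10.8) | 35 (9.5) | | 11 (5.2) | 1 (6.7) | | 7 (5.6) | 5 (4.9) | |
| Other | 11 (11.8) | 29 (8.5) | .61 | 7 (10.8) | 33 (8.9) | .84 | 21 (10) | 2 (13.3) | .88 | 12 (9.7) | 11 (10.7) | .94 |
| Tumor location | | | | | | | | | | | | |
| Proximal | 21 (23.3) | 100 (29.4) | | 14 (22.6) | 107 (29.1) | | 64 (32) | 3 (20) | | 40 (35.1) | 27 (26.5) | |
| Distal | 20 (22.2) | 80 (23.5) | | 17 (27.4) | 83 (22.6) | | 70 (35) | 8 (53.3) | | 40 (35.1) | 39 (38.2) | |
| Rectal | 49 (54.4) | 160 (47.1) | .41 | 31 (50) | 178 (48.4) | .51 | 66 (33) | 4 (26.7) | .35 | 34 (29.8) | 36 (35.3) | .38 |
| AJCC stage | | | | | | | | | | | | |
| I | 4 (4.3) | 57 (16.7) | | 2 (3.1) | 59 (15.9) | | 27 (12.8) | 0 (0) | | 15 (12.1) | 12 (11.7) | |
| II | 36 (38.7) | 135 (39.5) | | 26 (40) | 145 (39.2) | | 78 (37) | 9 (60) | | 43 (34.7) | 45 (43.7) | |
| III | 53 (57) | 150 (43.9) | .005ᵃ | 37 (56.9) | 166 (44.9) | .02ᵃ | 106 (50.2) | 6 (40) | .13 | 66 (53.2) | 46 (44.7) | .36 |
| Histology grade | | | | | | | | | | | | |
| Well differentiated | 4 (4.4) | 11 (3.3) | | 3 (4.8) | 12 (3.3) | | 8 (3.9) | 0 (0) | | 5 (4.2) | 3 (3) | |
| Moderately differentiated | 73 (80.2) | 283 (84.2) | | 49 (77.8) | 307 (84.3) | | 159 (77.9) | 11 (78.6) | | 88 (74.6) | 83 (82.2) | |
| Poorly differentiated | 14 (15.4) | 42 (12.5) | .65 | 11 (17.5) | 45 (12.4) | .43 | 37 (18.1) | 3 (21.4) | .73 | 25 (21.2) | 15 (14.9) | .40 |

Abbreviations: SD, standard deviation.
ᵃ Significant P value.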

Among the 227 patients in the validation set, 200 patients (88%) had received fluoropyrimidine-based chemotherapy, including 88 patients with stage II disease and 112 patients with stage III disease. Overall, there were 124 deaths and 211 recurrences during the median follow-up of 57.5 months. These patients were diagnosed outside of MD Anderson at least 1 year before presenting to MD Anderson for treatment because of potential tumor recurrence or progression; therefore, their recurrence and death rates were higher than those for newly diagnosed patients (Table 2).

Individual Single Nucleotide Polymorphisms and Clinical Outcomes

We assessed the association of each individual SNP with disease recurrence and death using a multivariate Cox model, adjusting for age, sex, ethnicity, smoking status, and histologic grade. We identified 8 genetic loci that were associated with the clinical outcomes of patients who received fluoropyrimidine-based chemotherapy stratified by stage (Table 3). Next, we evaluated the associations between genotype and clinical outcomes after fluoropyrimidine-based chemotherapy in patients with stage II and III disease. We did not analyze the patients with stage I disease because they typically undergo surgery alone and have an excellent prognosis, and because there were very few recurrence or death events among them in our training set. Among the patients with stage II disease who received fluoropyrimidine-based chemotherapy (N = 115), those with the homozygous variant and heterozygous genotypes of rs7849 had an increased risk of recurrence (HR, 2.39; 95% CI, 1.04-5.52; P = .04) and a decrease in median recurrence-free survival (log-rank P = .03) compared with those who had the wild-type genotype. This association was confirmed in the replication set (HR, 3.70; 95% CI, 1.42-9.64; P = .007) and in the meta-analysis (HR, 2.89; 95% CI, 1.54-5.41; P = .001). For other SNPs that were significant in the training set, patients who carried a homozygous variant genotype of rs10211390 had a significantly increased risk of recurrence (HR, 2.79; 95% CI, 1.16-6.71; P = .02) and shorter median recurrence-free survival (log-rank P = .03) compared with those who had the wild-type and heterozygous genotypes. A significant increase in the risk of recurrence also was observed for patients who had the homozygous variant and the heterozygous genotypes of rs2421099 (HR, 2.44; 95% CI, 1.08-5.51; P = .03) and rs16983007 (HR, 2.81; 95% CI, 1.02-7.70; P = .04). Moreover, patients who carried at least 1 variant allele of rs16983007 had a significantly shorter recurrence-free survival than those who had the wild-type genotype (log-rank P = .03). The variant alleles for rs6590611 were associated with an increased risk of dying in a dose-dependent manner (per-allele HR, 2.92; 95% CI, 1.22-7.02).
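The pooled rs7849 estimate can be approximately reproduced from the 2 reported HRs and their 95% CIs with standard inverse-variance pooling. The sketch below is not the STATA routine used in the study; the function names are hypothetical, the standard errors are back-calculated from the published CIs (an assumption that introduces rounding error), and the random-effects branch uses the DerSimonian-Laird estimator as one common choice.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval


def log_hr_and_se(hr, lower, upper):
    """Recover the log hazard ratio and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z)


def pool(estimates):
    """Inverse-variance pooling of (HR, lower, upper) tuples.

    Returns fixed-effects and DerSimonian-Laird random-effects summaries
    as (HR, lower, upper), together with Cochran's Q and its degrees of freedom.
    """
    logs, ses = zip(*(log_hr_and_se(*e) for e in estimates))
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
    q = sum(wi * (li - fixed) ** 2 for wi, li in zip(w, logs))
    df = len(estimates) - 1
    # Between-study variance (truncated at zero when Q is below its df).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * li for wi, li in zip(w_re, logs)) / sum(w_re)

    def summary(center, weights):
        se = math.sqrt(1 / sum(weights))
        return tuple(round(math.exp(x), 2) for x in (center, center - Z * se, center + Z * se))

    return {"fixed": summary(fixed, w), "random": summary(random, w_re), "q": q, "df": df}


# rs7849, stage II recurrence: training and replication estimates from the text.
result = pool([(2.39, 1.04, 5.52), (3.70, 1.42, 9.64)])
# Chi-square upper tail for Q; this closed form is valid only for df = 1 (2 data sets).
q_p = math.erfc(math.sqrt(result["q"] / 2))
print(result["fixed"], round(q_p, 2))  # prints (2.89, 1.54, 5.41) 0.5
```

With these inputs, the fixed-effects summary agrees with the reported pooled HR of 2.89 (95% CI, 1.54-5.41), and the heterogeneity P value is close to the .50 listed in Table 3. Because Q is smaller than its degrees of freedom here, the estimated between-study variance is 0 and the random-effects summary coincides with the fixed-effects one.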

Table 3. Single Nucleotide Polymorphisms Associated With Clinical Outcome in Patients Receiving Fluoropyrimidine-Based Adjuvant Chemotherapy

| SNP | Model | Training Set: HR (95% CI) | Training Set: P | Training Set: Log-Rank P | Training Set: Genotype Distribution, MM/MV/VV | Replication Set: HR (95% CI) | Replication Set: Pᵃ | Replication Set: Genotype Distribution, MM/MV/VV | Pooled Analysis: HR (95% CI) | Pooled Analysis: Pᵃ | Pooled Analysis: Genotype Distribution, MM/MV/VV | Cochran Q Test Pᵇ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stage II (recurrence) | | | | | | | | | | | | |
| rs10211390 | Recessive | 2.79 (1.16-6.71) | .02ᶜ | .03ᶜ | 39/52/24 | 1.93 (0.73-5.13) | .18 | 26/18/8 | 2.37 (1.23-4.54) | .01ᶜ | 65/70/32 | .58 |
| rs2421099 | Dominant | 2.44 (1.08-5.51) | .03ᶜ | 0.16 | 86/25/4 | 0.50 (0.24-1.08) | .08 | 38/14/0 | 1.04 (0.60-1.80) | .90 | 124/39/4 | .005 |
| rs7849 | Dominant | 2.39 (1.04-5.52) | .04ᶜ | 0.03ᶜ | 80/28/7 | 3.70 (1.42-9.64) | .007ᶜ | 36/11/5 | 2.89 (1.54-5.41) | .001ᶜ | 116/39/12 | .50 |
| rs16983007 | Dominant | 2.81 (1.02-7.70) | .04ᶜ | 0.03ᶜ | 102/3/10 | 1.23 (0.28-5.34) | .78 | 44/6/2 | 2.16 (0.94-4.97) | .07 | 146/9/12 | .36 |
| Stage III (recurrence) | | | | | | | | | | | | |
| rs10211390 | Recessive | 2.70 (1.12-6.50) | .03ᶜ | 0.007ᶜ | 63/84/22 | 1.60 (0.77-3.35) | .21 | 45/47/14 | 1.98 (1.13-3.49) | .02ᶜ | 108/131/36 | .37 |
| rs6124509 | Dominant | 0.38 (0.16-0.91) | .03ᶜ | 0.04ᶜ | 118/47/4 | 1.00 (0.59-1.70) | 1.00 | 71/32/3 | 0.77 (0.49-1.21) | .26 | 189/79/7 | .06 |
| rs11195893 | Dominant | 0.26 (0.08-0.91) | .04ᶜ | 0.61 | 139/30/1 | 0.83 (0.42-1.61) | .58 | 90/15/1 | 0.63 (0.35-1.14) | .13 | 229/45/2 | .10 |
| Stage II (survival) | | | | | | | | | | | | |
| rs6590611 | Additive | 2.92 (1.22-7.02) | .02ᶜ | 0.05 | 54/52/9 | 1.46 (0.63-3.35) | .38 | 28/22/2 | 2.03 (1.11-3.72) | .02 | 82/74/11 | .26 |
| Stage III (survival) | | | | | | | | | | | | |
| rs9942838 | Recessive | 3.23 (1.17-8.92) | .02ᶜ | 0.17 | 44/88/38 | 1.27 (0.57-2.83) | .55 | 27/47/32 | 1.82 (0.97-3.41) | .06 | 71/135/70 | .16 |

Abbreviations: CI, confidence interval; HR, hazard ratio; M, major allele; rs, reference single nucleotide polymorphism sequence; SNP, single nucleotide polymorphism; V, variant allele.
ᵃ Adjusted by age, sex, ethnicity, smoking status, and histologic grade.
ᵇ The Cochran Q statistic is used to test for heterogeneity between studies.
ᶜ Significant P value.

For patients with stage III disease who received fluoropyrimidine-based chemotherapy (N = 170), a significantly decreased risk of recurrence was observed for those with the homozygous variant and heterozygous genotypes of rs6124509 (HR, 0.38; 95% CI, 0.16-0.91; P = .03) and rs11195893 (HR, 0.26; 95% CI, 0.08-0.91; P = .04), whereas the homozygous variant genotype for rs10211390 was associated with an increased risk of recurrence (HR, 2.70; 95% CI, 1.12-6.50; P = .03). In addition, patients who carried the homozygous variant genotype of rs10211390 had a shorter recurrence-free interval than those who carried the wild-type and heterozygous genotypes (log-rank P = .007). Patients who carried the homozygous variant genotype of rs9942838 also were at an increased risk of death (HR, 3.23; 95% CI, 1.17-8.92; P = .02). None of these associations in patients with stage III disease was validated in the replication set.

Cumulative Effects of Unfavorable Genotypes on Clinical Outcome

We defined those genotypes that were associated with increased risks of disease recurrence or death as unfavorable genotypes. Next, we asked whether combining the unfavorable genotypes would have an additive effect on the clinical outcomes of patients who had received fluoropyrimidine-based chemotherapy. We performed a joint-effect analysis using 4 SNPs that were associated significantly with recurrence risk in patients with stage II disease. There was a significant dose-response trend toward an increased risk of CRC recurrence with increasing number of unfavorable genotypes. Compared with the low-risk group (0-1 unfavorable genotypes), the medium-risk group (2 unfavorable genotypes) and the high-risk group (3-4 unfavorable genotypes) had a 4.36 times (95% CI, 1.66-11.47) and 9.67 times (95% CI, 2.99-31.25) higher risk of recurrence, respectively (P for trend = 1.78 × 10⁻⁵). The median recurrence-free survival was >143.3 months, 30.2 months, and 16.8 months for patients in the low-risk, medium-risk, and high-risk groups, respectively (log-rank P = 8.14 × 10⁻⁶) (Fig. 1A).
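The grouping used in this joint-effect analysis can be sketched as a small counting routine. The genetic model assigned to each SNP follows Table 3 (recessive for rs10211390, dominant for the other 3), but the function names and the example patient are hypothetical, and the code illustrates only the risk-group assignment, not the Cox regression itself.

```python
# Genetic model under which each stage II recurrence SNP was significant (Table 3).
STAGE_II_MODELS = {
    "rs10211390": "recessive",
    "rs2421099": "dominant",
    "rs7849": "dominant",
    "rs16983007": "dominant",
}


def is_unfavorable(variant_alleles: int, model: str) -> bool:
    """A genotype is unfavorable if it falls in the higher-risk category of its model."""
    if model == "recessive":
        return variant_alleles == 2
    if model == "dominant":
        return variant_alleles >= 1
    raise ValueError(f"unsupported model: {model}")


def count_unfavorable(genotypes: dict) -> int:
    """Count unfavorable genotypes; genotypes maps SNP ID to number of variant alleles."""
    return sum(is_unfavorable(genotypes[snp], model) for snp, model in STAGE_II_MODELS.items())


def risk_group(n_unfavorable: int) -> str:
    """Low risk: 0-1 unfavorable genotypes; medium: 2; high: 3-4."""
    if not 0 <= n_unfavorable <= 4:
        raise ValueError("expected between 0 and 4 unfavorable genotypes")
    if n_unfavorable <= 1:
        return "low"
    return "medium" if n_unfavorable == 2 else "high"


# Hypothetical patient: homozygous variant for rs10211390, heterozygous for rs7849.
patient = {"rs10211390": 2, "rs2421099": 0, "rs7849": 1, "rs16983007": 0}
n = count_unfavorable(patient)
print(n, risk_group(n))  # prints 2 medium
```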

Figure 1.

These Kaplan-Meier curves illustrate recurrence-free survival according to the number of unfavorable genotypes (UFG) for patients with (A) stage II disease or (B) stage III disease. MST indicates median recurrence-free survival time.

We also evaluated the combined effects of the 3 SNPs that were associated significantly with disease recurrence in patients with stage III disease. Compared with the reference group (those with 0-1 unfavorable genotypes), the HRs for individuals with 2 and 3 unfavorable genotypes were 3.21 (95% CI, 1.34-7.74) and 6.98 (95% CI, 2.12-22.97), respectively (P for trend = .001). Cumulative effect analysis also revealed a significant dose-dependent effect on median recurrence-free survival (log-rank P = .001) (Fig. 1B).

Single Nucleotide Polymorphism-Single Nucleotide Polymorphism Interactions and Clinical Outcomes

We next used survival tree analysis to further evaluate the potential interactions among the SNPs that were associated significantly with recurrence in patients who had stage II and III disease (Fig. 2A). For those with stage II disease, the tree structure resulted in 4 terminal nodes, ranging from low to high recurrence risk. The initial split was rs2421099, suggesting its value as a prognostic marker for patients who received adjuvant chemotherapy. When using terminal node A as the reference group (wild-type genotypes of rs2421099 and rs16983007), the HR was 1.46 (95% CI, 0.46-4.68) for terminal node B (heterozygous and homozygous variant genotypes of rs2421099 and wild-type genotypes of rs7849), 3.18 (95% CI, 0.90-11.19) for terminal node C (wild-type genotype of rs2421099 and heterozygous and homozygous variant genotypes of rs16983007), and 6.27 (95% CI, 2.25-17.41) for terminal node D (heterozygous and homozygous variant genotypes of rs2421099 and rs7849) (P for trend = .0004). The increase in recurrence risk resulted in a decrease in median recurrence-free survival for subgroups corresponding to terminal nodes A, B, C, and D (log-rank P = 6.21 × 10⁻⁵) (Fig. 2B).
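The stage II tree described above amounts to a short set of nested genotype rules, which can be written out as follows. This sketch only encodes the reported node definitions and their HRs; it does not perform the recursive partitioning that the STREE program used to find the splits, and the function name is hypothetical.

```python
# Reported hazard ratios for each terminal node (node A is the reference group).
STAGE_II_NODE_HR = {"A": 1.0, "B": 1.46, "C": 3.18, "D": 6.27}


def stage_ii_terminal_node(rs2421099: int, rs7849: int, rs16983007: int) -> str:
    """Assign a stage II patient to a terminal node from variant-allele counts.

    The first split is on rs2421099; wild-type patients are then split on
    rs16983007, and carriers of a variant allele are split on rs7849.
    """
    if rs2421099 == 0:
        return "C" if rs16983007 >= 1 else "A"
    return "D" if rs7849 >= 1 else "B"


# Hypothetical patients, given as (rs2421099, rs7849, rs16983007) allele counts.
for genotype in [(0, 0, 0), (1, 0, 0), (0, 2, 1), (2, 1, 0)]:
    node = stage_ii_terminal_node(*genotype)
    print(genotype, node, STAGE_II_NODE_HR[node])
```

Written this way, it is apparent that rs7849 affects the node assignment only among carriers of a variant rs2421099 allele, and rs16983007 only among patients who are wild-type for rs2421099.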

Figure 2.

Potential single nucleotide polymorphism (SNP)-SNP interactions are illustrated. (A) This tree structure identifies subgroups of patients with different genetic backgrounds. No rec indicates no recurrence; Rec, recurrence; rs, reference SNP sequence; W, wild type; M, major allele. (B,C) These Kaplan-Meier curves illustrating recurrence-free survival were based on survival tree analysis in patients with (B) stage II disease or (C) stage III disease. MST indicates median recurrence-free survival time.

We performed a similar analysis for patients with stage III disease. The analysis resulted in 3 terminal nodes, with rs10211390 as the initial split. When using terminal node 1 (those with the wild-type and heterozygous genotypes of rs10211390 and the heterozygous and homozygous variant genotypes of rs6124509) as the reference group, the HRs for terminal node 2 (those with the wild-type and heterozygous genotypes of rs10211390 and the wild-type genotype of rs6124509) and terminal node 3 (those with the homozygous variant genotypes of rs10211390) were 3.75 (95% CI, 1.23-11.50) and 7.96 (95% CI, 2.07-30.65), respectively (P for trend = .001). The corresponding decreased median recurrence-free survival was highly significant (log-rank P = .003) (Fig. 2C).


We have completed a comprehensive study to identify polymorphisms within UCEs that influence the clinical outcomes of patients with locally advanced CRC who received adjuvant fluoropyrimidine-based chemotherapy. We identified 8 genetic loci that are most likely to have an impact on the sensitivity to fluoropyrimidine agents. These SNPs can be used as prognostic biomarkers to assist in stratifying patients for fluoropyrimidine-based chemotherapy. For patients with stage II disease, rs7849 consistently was associated with disease recurrence in the training set, validation set, and meta-analysis. Moreover, we demonstrated that the genotype-drug interaction was much more pronounced when multiple gene variants were considered in combination.

Currently, chemotherapy is considered for patients with AJCC high-risk stage II and stage III disease. Thus, it is very important to define individual risk to determine who may or may not benefit from adjuvant chemotherapy. In this study, rs10211390 allowed us to identify the patients with stage II and III disease who had an increased risk of recurrence after fluoropyrimidine-based chemotherapy. The SNP rs10211390 affects the nonexonic element uc.54. It has been demonstrated that this category of nonexonic elements acts as a long-range enhancer to control flanking gene expression.16, 17 Such long-range enhancers can act at distances >2630 kilobases (kb) from their target genes.16 An in vivo analysis confirmed that 45% of human conserved noncoding sequences, including uc.54, function as tissue-specific enhancers of gene expression.7, 18 The nearest gene downstream of rs10211390 is Fanconi anemia, complementation group L (FANCL), which is 1 of 13 known Fanconi anemia genes and lies 705 kb from rs10211390. FANCL recently was identified as the putative catalytic E3 ubiquitin ligase subunit of the Fanconi anemia core complex, which monoubiquitinates FANCD2 to allow proper repair of exogenous DNA damage.19, 20 Moreover, cross-links between the Fanconi anemia core complex and the breast cancer susceptibility gene BRCA2 appear to be involved with multiple DNA repair mechanisms.21 Thus, the FANC protein network has an important role in promoting chromosomal instability and tumor development and determining the sensitivity of cancer cells to chemotherapy.22, 23 Recently, Zhang et al reported that a splice variant of FANCL resulted in decreased FANCL expression, which provided lung cancer cells with a growth advantage.22 Nevertheless, the biologic mechanisms that underlie the associations of rs10211390 with cancer and the function of uc.54 remain unclear and need further research.

The receipt of adjuvant chemotherapy by all patients with stage II disease remains controversial. Our results suggest that individual outcomes after fluoropyrimidine treatment can be determined based on the genotypes of rs7849 and rs6590611. Notably, rs7849 was associated consistently with an increased recurrence risk in both the validation set and the combined set. The rs7849 SNP is located in uc.298, which is 1 of 12 paralogous UCE sets. The finding that paralogous sets have changed minimally in the past 300 million years suggests that they have crucial functions. The nearest gene upstream of rs7849 is stearoyl-CoA desaturase 1, a critical mediator of fatty acid synthesis. It has been demonstrated that the rare allele of rs7849 has an effect on body mass index, waist circumference, and insulin sensitivity, suggesting its potential physiologic significance. Recently, Luyimbaz et al linked this cell fat metabolism gene to the mammalian target of rapamycin (mTOR) oncogenic cell signaling pathway.24 The mTOR pathway functions through its effectors to mediate protein synthesis and cell cycle progression, and it is involved in resistance to multiple anticancer drugs. The SNP rs6590611 affects uc.334, which is located in an intron of neurotrimin (HNT), a member of a cell adhesion molecule family. The intronic polymorphisms of HNT were identified as possible susceptibility loci for immunoglobulin A (IgA) nephritis and Alzheimer disease.25, 26 However, to our knowledge, no study has reported the genetic effects of HNT polymorphisms on CRC treatment response. It is noteworthy that HNT expression has been associated with disease recurrence in patients with stage I and II disease after surgery.27 Our results further suggest that these patients may be good candidates for chemotherapy but may not benefit from the fluoropyrimidine regimen.

Although a pooled analysis of fluoropyrimidine-based adjuvant therapy trials demonstrated a beneficial treatment effect in patients with stage III disease,28 we observed that patients who had minor alleles of rs9942838 had poorer survival. The rs9942838 SNP is located in the intron of early B-cell factor 2 (EBF2). The EBF family is a group of DNA-binding transcription factors with a basic helix-loop-helix domain.29 Several studies have indicated that EBF inactivation because of genomic deletion, epigenetic silencing, or somatic point mutations exists in several types of cancer, including leukemia, glioblastoma, and pancreatic cancer, supporting the emerging roles of the EBF family in tumor suppression.30-32 Another study demonstrated that silencing EBF2 led to a reduced resistance to apoptosis in chemotherapy-naïve, tumor-derived cell populations from patients diagnosed with sporadic osteosarcoma.33 However, to our knowledge, EBF2 has not been investigated within the context of a fluoropyrimidine regimen.

To enhance the identification of patients with CRC who would benefit from the fluoropyrimidine regimen, we completed a combined-effects analysis of unfavorable genotypes within the identified prognostic loci. A clear and significant trend was evident for increased risk with an increasing number of unfavorable genotypes. These results suggest that the cumulative influence of multiple genetic variants within the UCEs can further enhance the separation of patients based on clinical outcome. Complex interactions between the SNPs could determine the functional outcome more than the independent main effects of any 1 susceptibility gene. We also performed an exploratory analysis of the SNP-SNP interactions and identified subgroups of patients with dramatically different recurrence-free survival after fluoropyrimidine treatment. However, statistical modeling of an interaction does not amount to a true biologic interaction, and these results should be interpreted with caution.
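
The combined-effects idea described above amounts to counting, for each patient, how many unfavorable genotypes are carried and then comparing outcomes across count groups. The following is a minimal sketch of the counting step only. The SNP panel, the dominant-model risk definitions, the genotype labels, and the patient records are all hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter

# Hypothetical risk definitions: SNP -> genotype labels treated as unfavorable.
# A dominant model (at least 1 variant allele) is assumed for illustration only.
UNFAVORABLE = {
    "rs7849": {"het", "var"},
    "rs2421099": {"het", "var"},
    "rs10211390": {"het", "var"},
}

def count_unfavorable(genotypes):
    """Number of unfavorable genotypes carried by one patient.

    Missing genotype calls are simply not counted.
    """
    return sum(genotypes.get(snp) in risk for snp, risk in UNFAVORABLE.items())

# Made-up patients; "wt", "het", "var" denote 0, 1, or 2 variant alleles.
patients = {
    "P1": {"rs7849": "wt", "rs2421099": "wt", "rs10211390": "wt"},
    "P2": {"rs7849": "het", "rs2421099": "wt", "rs10211390": "wt"},
    "P3": {"rs7849": "var", "rs2421099": "het", "rs10211390": "wt"},
    "P4": {"rs7849": "het", "rs2421099": "het", "rs10211390": "var"},
}

scores = {pid: count_unfavorable(g) for pid, g in patients.items()}
group_sizes = Counter(scores.values())  # patients per number of risk genotypes
print(scores)
print(dict(group_sizes))
```

In a real analysis, the count (or grouped count) would then enter a survival model as an ordinal covariate to test for a trend in risk.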

The current study has several strengths. First, we reviewed and analyzed all variations within 481 UCEs and systematically reported the SNPs within those UCEs. Second, this is the first study to date that was designed specifically to identify the genetic effects of SNPs within UCEs in patients with locally advanced CRC and their treatment outcomes after adjuvant therapy. In addition, we have comprehensive epidemiologic and clinical data for all patients at our institution with locally advanced CRC who had a prolonged period of surveillance. The main limitation of this study is that the replication set did not consist of newly diagnosed patients but of patients who came to MD Anderson mainly because of potential recurrence or disease progression. Therefore, recurrence and progression were over-represented among patients in the replication set. We split the training and replication sets in this way to keep the training set as clean as possible and thereby identify promising and more reliable candidate SNPs for further validation by us and by other investigators in the field. However, the relatively small and heterogeneous population in the replication set may have produced false-negative results. Another weakness was that, although the overall patient cohort was large, we stratified our analysis by stage and treatment to limit confounding by stage and treatment on recurrence or survival, which reduced the numbers in each analysis and limited our power to detect additional significant associations. Only 1 SNP, rs7849, was validated in our replication set. Other SNPs were not significant in the replication set, although several of them exhibited a consistent trend and remained significant in pooled analyses (eg, rs10211390 for recurrence in both stage II and stage III patients and rs6590611 for survival in stage II patients). In addition, given the multiple comparison issue, it is possible that some of the significant results arose by chance. Future validation studies with patient populations comparable to our training set and with larger sample sizes are needed to confirm our results and to validate additional SNPs.
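
To give a sense of scale for the multiple comparison issue, the following minimal sketch computes a Bonferroni-adjusted significance threshold for the 48 genotyped SNPs. The choice of the Bonferroni correction and the nominal alpha of .05 are assumptions made for illustration. Stratification by stage and by outcome would increase the effective number of tests beyond 48.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

N_SNPS = 48    # SNPs genotyped within UCEs (from the abstract)
ALPHA = 0.05   # assumed nominal significance level

threshold = bonferroni_threshold(ALPHA, N_SNPS)
# Expected number of nominally significant SNPs if no SNP had any true effect
expected_false_positives = ALPHA * N_SNPS

print("Bonferroni threshold: %.5f" % threshold)
print("Expected chance findings at nominal alpha: %.1f" % expected_false_positives)
```

Under these assumptions, roughly 2 to 3 nominally significant SNPs per stratum and outcome would be expected by chance alone, which is why replication in an independent set matters.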

In conclusion, we have identified genetic variations within UCEs as prognostic markers in patients with locally advanced CRC who received fluoropyrimidine-based adjuvant chemotherapy. Once validated, the identified SNPs and their interactions, incorporated together with clinical variables, may allow clinicians to stratify patients for optimal adjuvant chemotherapy and thus take a step forward in personalized cancer care.

Note Added in Proof


This work was supported by a Multidisciplinary Research Program grant on colorectal cancer from The University of Texas MD Anderson Cancer Center and by grant NIH-NCI CA-16672 from the National Cancer Institute.


The authors made no disclosures.