Differential genetic influences over colorectal cancer risk and gene expression in large bowel mucosa

Site‐specific variation in colorectal cancer (CRC) incidence, biology and prognosis are poorly understood. We sought to determine whether common genetic variants influencing CRC risk might exhibit topographical differences on CRC risk through regional differences in effects on gene expression in the large bowel mucosa. We conducted a site‐specific genetic association study (10 630 cases, 31 331 controls) to identify whether established risk variants exert differential effects on risk of proximal, compared to distal CRC. We collected normal colorectal mucosa and blood from 481 subjects and assessed mucosal gene expression using Illumina HumanHT‐12v4 arrays in relation to germline genotype. Expression quantitative trait loci (eQTLs) were explored by anatomical location of sampling. The rs3087967 genotype (chr11q23.1 risk variant) exhibited significant site‐specific effects—risk of distal CRC (odds ratio [OR] = 1.20, P = 8.20 × 10−20) with negligible effects on proximal CRC risk (OR = 1.05, P = .10). Expression of 1261 genes differed between proximal and distal colonic mucosa (top hit PRAC gene, fold‐difference = 10, P = 3.48 × 10−57). In eQTL studies, rs3087967 genotype was associated with expression of 8 cis‐ and 21 trans‐genes. Four of these (AKAP14, ADH5P4, ASGR2, RP11‐342M1.7) showed differential effects by site, with strongest trans‐eQTL signals in proximal colonic mucosa (eg, AKAP14, beta = 0.61, P = 5.02 × 10−5) and opposite signals in distal mucosa (AKAP14, beta = −0.17, P = .04). In summary, genetic variation at the chr11q23.1 risk locus imparts greater risk of distal rather than proximal CRC and exhibits site‐specific differences in eQTL effects in normal mucosa. Topographical differences in genomic control over gene expression relevant to CRC risk may underlie site‐specific variation in CRC. Results may inform individualised CRC screening programmes.


| INTRODUCTION
Genome-wide association studies (GWAS) on large, well-characterised case-control series of colorectal cancer (CRC) have identified numerous common genetic variants associated with individually modest effects on CRC risk.Identifying the underlying causal mechanism may provide novel targets for cancer prevention or therapy.However, as these variants frequently lie in intergenic positions or noncoding regions of the gene, mechanisms responsible for risk modification are not readily identifiable.
[3][4] Site-specific variation in CRC incidence, 5 biology, 6,7 response to adjuvant therapy 8 and prognosis [9][10][11][12] are well recognised, yet incompletely understood.Differences may reflect different embryological origin or different exposure to faecal stream, microbiome and carcinogens. 13However, topographical differences in eQTL effects may explain part of the observed differences in CRC risk.We previously reported differential effects of rs3802842 and rs4939827 on cancer risk in the rectum and colon. 14Furthermore, differences in gene expression have been observed in normal mucosa 15 and cancers [16][17][18] originating from left and right colon.
One explanation for differences in gene expression and CRC risk is differential genomic control through site-specific eQTL effects.Differences in eQTL effects for known CRC risk SNPs previously reported 3 using GTEx RNAseq data from harvested transverse (full thickness) and sigmoid colon (smooth muscle only) samples are likely confounded tissue of origin effects since the sigmoid samples contain no colonic mucosa.
We hypothesised that a subset of risk loci might impart site-specific effects on CRC risk through topographical differences in genomic control over gene expression.To investigate this, we first tested CRC risk loci for differential site-specific effects.We then sought to identify differences in gene expression across the colorectum and explored association between CRC risk loci and gene expression through site-specific eQTL analysis.

| Site-specific association study
We searched relevant GWAS to identify all putative loci impacting CRC risk, with 160 loci identified, including those from the most recently reported GWAS. 19,20The list was constrained to include new and those replicated at P < 1 Â 10 À6 level and excluding variants identified previously in GWAS studies of Asian and African subjects.We also excluded SNPs identified by conditional analysis at known CRC risk loci, instead using the lead SNP (n = 87) (Supplementary Table 1).We then ran association analyses for the risk of distal and proximal cancers in previously described cases-control studies of CRC in Scotland and UK Biobank (Supplementary Methods 19 ).Tumours located proximal to or at the splenic flexure (ICD codes C18.0, C18.2, C18.3, C18.4,C18.5), were defined as proximal and cancer cases with tumours located distal to the splenic flexure were counted as distal CRC (ICD codes C18.7, C19, C20), reflecting the embryological origin of midgut and hindgut, respectively.The associations between cancer sites and genetic variants were tested using a multinomial logistic regression likelihood as implemented in SNPTEST v2.5.2 [21][22][23] and assuming additive model of inheritance.Meta-analyses of four case-control studies across proximal and CRC were performed using the fixed-effects inverse-variance method using META v1.7. 24Cochran's Q-statistic to test for heterogeneity and the I2 statistic to quantify the proportion of the total variation due to heterogeneity were calculated.Finally, we performed case-only analysis to study effects of genetic variants on the risk of developing distal compare to proximal cancers.Results of individual case-only analyses were combined in the fixed effects inversevariance meta-analysis as implemented in META v.1.7.

| Study population eQTL analysis
We included CRC cases from the Study of Colorectal Cancer in Scotland (SOCCS), a population-based case-control study designed to identify genetic and environmental factors impacting on CRC risk and survival outcomes. 25We also included pretreatment samples from participants in the Scottish Vitamin D study (SCOVIDS), which comprised patients with previous history of CRC and healthy volunteers.Clinical variables were collected from clinical records systems and pathology records, entered into a prospective study database and extracted for analysis.

| Mucosa sampling and storage
Normal colorectal mucosa (NM) was sampled from a single site from freshly resected surgical specimens or rectal biopsy.Samples were What's new?
Common genetic variants are known to influence colorectal cancer (CRC) risk.In our study, the authors asked whether sequences that influence gene-expression levels, known as "expression quantitative trait loci" (eQTLs), might lead to different transcription patterns depending on where in the mucosa the cells occur (eg, proximal vs distal mucosa).They found site-specific risk of CRC at a known risk locus and site-specific trans-eQTL effects with this locus and expression of four genes.These topographical differences in genomic control of gene expression may lead to more highly-individualised tools for CRC screening programs.immersed in the stabilisation solution RNAlater (Invitrogen).All samples were kept in RNAlater for 24-72 hours prior to RNA extraction or storage at À80 C. Assessment of gene expression was performed using Illumina HumanHT-12v4 BeadChip with validation of top genes performed using standard qRT-PCR (see Supplementary Methods).

| Differential expression analysis
All statistical analysis was undertaken in R. 26 Investigation of differential gene expression was undertaken using the lmFit and eBayes functions within the limma package with a total of 42 184 probes assessed.PEER factors 27 were estimated on the processed expression matrix, and were used as covariates in the model together with age and gender.Adjustment for multiple testing was undertaken and FDR P-values derived. 28Linear regression modelling was used to adjust for relevant demographic variables (PEER factors, age and gender), with adjusted fold-difference in expression between samples taken from distal and proximal sites calculated as the antilog 2 of the model beta value.

| Functional pathway analysis
Gene ontology and enrichment analysis was undertaken using the "GOrilla," Gene Ontology enRIchment anaLysis and visualisation tool through the Gorilla web page. 29Process, function and component ontologies were investigated using gene lists from differential gene expression analysis by site ranked by unadjusted P-value from smallest to largest.Terms enriched at FDR < 0.05 were considered to be significantly enriched.
For eQTL analysis, 462 samples were retained that passed quality control and for which we had genotyping data.The analysis was carried out for all samples together, and also separately by site of mucosa sample (proximal = 113, distal = 349).eQTL discovery was carried out with matrix eQTL 36 using linear model adjusted for age, gender and 15 PEER factors.Only the additive genetic model was used with genotypes considered as a quantitative variable.To explore the influence of anatomical location on eQTL signals, an analysis stratified by site was performed and charted for those probes with putative eQTL signals (nominal P < .05 at either both sites combined or in proximal samples or in distal samples).Interaction analysis with the site was performed using "modelLINEAR_CROSS" model specification as implemented in Matrix eQTL.

| Site-specific genome-wide association study
We conducted site-specific meta-analyses of case-control association studies (3089 proximal colon cancer cases, 7541 distal CRC cases, 31 331 controls) to identify whether established risk variants (n = 87) influence risk of proximal and distal CRC differently (Supplementary Table 2).Using the collated cases, we then performed case-only analysis to study effects of genetic variants on the risk of developing distal compared to proximal cancers across each of the included studies.
We meta-analysed results of individual studies with just one locus (rs3087967, P3.32 Â 10 À5 , FDR 0.003) showing evidence for differential risk of proximal vs distal cancer after FDR correction.Hits with a nominal P value <.01 in the case only analysis are given in Table 1 (allele frequencies given in Supplementary Table 3, case-only sitespecific results given in Supplementary Table 2).

| Topographical gene expression analysis
Next, we performed gene expression profiling to assess for topographical variation in normal mucosa gene expression between the proximal colon (proximal to splenic flexure) and distal colorectum (colon distal to splenic flexure and rectum).NM samples from 481 unique subjects were analysed (Table 2).
Transcriptomic analysis provided expression data for 42 436 probes and 28 707 unique named genes after QC.We identified differential expression between the proximal and distal colorectum for 1430 probes, accounting for 1261 genes (Table 3, Supplementary Table 4; Figure 1) with 486 genes more highly expressed in the proximal colonic mucosa.
PRAC was the top differentially expressed gene between proximal and distal samples in subgroup analyses of NM from patients with CRC and those subjects without CRC.Six hundred and nineteen differentially expressed genes by site were seen in the 329 samples from patients with CRC and 255 differentially expressed genes by site were seen in the 152 samples from subjects without CRC (Supplementary Table 5).
Gene ontology analysis demonstrated enrichment of numerous processes in relation to mucosal sampling site, with many hits relevant to carcinogenesis including cell cycle checkpoint, cell division and DNA repair (Supplementary Table 6).
To explore whether the site-specific effect on risk with rs3087967 genotype could be associated with topographical differences in genomic control over gene expression we performed cis-and trans-eQTL analysis.First, we sought association between genotype at chr11q23.1 and expression of genes within a 1 MB distance up and downstream of the transcription start site.Of the 34 probes assessed, 8 showed putative eQTL signals (nominal P < .05 at either all sites combined or in proximal samples or in distal samples, Supplementary Table 7), including strong cis-eQTL effects associated with the expression of COLCA2 (FDR 4.52 Â 10 À70 ), COLCA1 (FDR 2.48 Â 10 À40 ) and C11orf53 (FDR 4.74 Â 10 À7 ).Cis-eQTL effects associated with PPP2R1B expression (FDR 0.004) were seen in proximal colonic mucosa samples only and not seen when all sites were combined, and these site-specific differences maintained in a subgroup analysis including only CRC cases (Supplementary Table 8).Cis-eQTL effects associated with PIH1D2 expression (FDR 0.01) were seen in distal colorectal mucosa samples only.COLCA2 eQTL effects were stronger in proximal colonic samples (beta 0.98 vs 0.84, Supplementary Fig- ure 3), yet on formal interaction testing no probe showed significant differential eQTL effects between the proximal and distal colorectum (Supplementary Table 7).There were no baseline differences in expression in these eight probes between proximal and distal sample sites in the complete sample set (Supplementary Table 9).
To corroborate these data, COLONOMICS 37 and GTEx 38 were interrogated for data on eQTL effects at rs3087967 and expression of the eight genes reported in Supplementary Table 6 (Supplementary Tables 10 and   11).Both COLONOMICS and GTEx eQTL data correlated with the current findings showing stronger eQTL in proximal mucosa for COLCA2, FDX1 and C11orf53 (COLONOMICS only) yet in contrast to our data showed stronger effects for PPP2R1B in distal samples.We also tested for differential expression in COLONOMICS data between mucosa from healthy controls and adjacent normal mucosa from CRC patients.Significantly lower expression was seen in mucosa from CRC patients for PPP2R1B, PIH1D2, COLCA2, C11orf53, FDX1 and COLCA1 (Supplementary Table 12).
As cis-eQTLs explain only a small fraction of total transcript-level heritability, 39 we next performed trans-eQTL analysis for association between rs3087967 genotype and expression of 35 375 probes.Trans-eQTL effects were found to be associated with expression of 23 probes accounting for 21 genes (FDR < 0.05, top hit LRMP FDR 6.01 Â 10 À12 , Supplementary Table 13).Probes with a putative trans-eQTL signal (nominal P < .05 at all sites combined or in proximal samples or distal samples, n = 3798) were tested for differential trans-eQTL effects using a site interaction model.This revealed 4 probes with differential trans-eQTL effects dependent on site, all driven by eQTL signals in proximal colonic samples (top hit AKAP14, interaction FDR 0.006; Table 4; Figure 2).Site-specific trans-eQTL effects were maintained in a subgroup analysis including only CRC cases (Supplementary Table 8).There were no differences in expression in these four probes between proximal and distal sample sites in the complete sample set (Supplementary Table 14), or in COLONOMICS data (normal tissue).
Finally, we tested for differential expression in COLONOMICS data between mucosa from healthy controls and adjacent normal mucosa from CRC patients.No differences in AKAP14 or ASGR2 were seen.Of potential interest, the expression of ADH5 was less in adjacent mucosa from CRC patients, but only when comparing proximal samples (proximal expression 7.89 vs 7.55, P = .008;distal expression 7.89 vs 7.72, P = .3).

| DISCUSSION
We report a comprehensive analysis of site-specific difference in genetic risk for CRC and explore transcriptomic data for variation in gene expression, and eQTL effects dependent on mucosa sampling site.We show that genetic variation at the chr11q23.1 CRC risk locus imparts significantly greater risk of distal rather than proximal CRC.
Trans-eQTL analysis demonstrates significant differential eQTL effects for the chr11q23.1 locus between the proximal and distal T A B L E 3 Top genes with differences in expression between distal and proximal colorectum mucosa samples colorectum, suggesting that site-specific effects on risk may be attributable to topographical differences in genomic control over gene expression in large bowel mucosa.These findings shed further light on differential genomic control effects on gene expression relevant to CRC risk.
These findings establish that, at least in the case of rs3087967 as a paradigm, there are site-specific differential effects of CRC risk loci and that these might be mediated through changes in eQTLs.The degree of differential expression provided strong rationale to then test for site-specific effects on risk.Of the established risk loci tested, only the chr11q23.1 locus imparted significantly different risk between the proximal and distal colon.Previous data had demonstrated a greater risk of rectal cancer for both rs3802842 and rs4939827. 14r the purposes of this analysis, we partitioned the large bowel by the embryological interface at the splenic flexure.Due to statistical power, it was not appropriate to provide further breakdown of sites, yet given data suggesting linearity in tumour characteristics beyond the simple proximal-distal divide, 11 we acknowledge that there may be further risk SNPs that exert differential effects on risk and gene expression in an anatomically biased manner.
Genotype at the rs3087967 SNP imparted risk on the distal colorectum (odds ratio [OR] = 1.20,P = 1.28 Â 10 À20 ), with no significant impact on proximal colonic cancer risk.[42] F I G U R E 2 Differential trans-eQTL effects with rs3087967 in proximal and distal colorectal mucosa.Normal mucosa was sampled from resected colorectal specimens or by rectal biopsy.RNA was extracted and gene expression assessed using HT12 microarrays.Expression of genes with differential eQTL effects with rs3087967 are charted by site and genotype.The lower and upper hinges correspond to the first and third quartiles.The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge [Color figure can be viewed at wileyonlinelibrary.com] We demonstrate a large number of genes with differential expression between the proximal and distal colorectum, validating previous reports [15][16][17][18] and supporting our downstream site-specific eQTL analysis.Functional annotation indicates differences in processes relevant to carcinogenesis including regulation of cell cycle checkpoint, cell division and DNA damage responses which may underlie site-specific variation in CRC molecular pathogenesis and increased sensitivity to certain chemotherapy regimens. 8ans-eQTL analysis here validate previously reported findings. 1,37The TT genotype at rs3087967 is associated with higher expression of AKAP14 (A-kinase anchor protein 14), ASGR2 and ADH5P4 (Alcohol dehydrogenase 5) in proximal colonic samples (FDR < 0.05), but no effect on expression of these genes in distal colorectal samples (FDR > 0.05).ASGR2 (Asialoglycoprotein Receptor 5][46] Further investigation is required to define the relevance of AKAP14, ASGR2 and ADH5P4 to the observed site-specific risk associated with the rs3087967 TT genotype.
We acknowledge several limitations within the current study.
First, we only identified one locus with site-specific risk and we acknowledge increased sample size may uncover further relevant hits.
Colinearity in sample site and cancer status (94% samples from noncancer patients were from rectum) precluded a robust case-control analysis within the mucosa dataset, and eQTLs may differ between cases and controls thus impacting our analysis and downstream conclusions.The significance of this sampling colinearity may introduce a bias if current CRC differentially influences expression in distal and proximal tissue samples.To address this, we performed case-control gene expression analysis, stratified by sample site and identified no differences in gene expression between CRC-cases and controls (data not shown).We also performed eQTL analysis in CRC-cases only with site-specific eQTL effects reported in our overall cohort maintained in this subgroup (Supplementary Table 8).It was not appropriate to perform this analysis in "non-CRC" cases, given the low number of proximal samples in subjects without CRC (N = 8), thus further studies should consider how best to reliably sample the proximal colon out with the operating theatre (eg, at colonoscopy).Finally, we recognise that while the rs3087967 locus is associated with distal CRC risk, the observed eQTL effects for this locus are strongest in proximal mucosal samples.It is unclear why this might be, but given that distal and proximal cancers are known to have different risk factors, both genetic and environmental, 47,48 it is reasonable to propose that the mechanism by which a locus imparts risk for proximal and distal cancer may be different, with possible interplay with environmental factors.Such factors might include stool make-up, the microbiome, obesity, physical activity, smoking or aspirin exposure, which could underlie the absence of relevant eQTL effects in distal mucosa samples.
Despite these limitations this is, to our knowledge, the first study to perform site-specific genetic association analysis and carry candidate loci forward to explore whether differences in genomic control over gene expression might underlie site-specific risk.We report a T A B L E 4 Site interaction analysis for trans-eQTL signals for rs3087967 (11: Notes: Overall eQTL FDR adjusted for all probes within trans region (n = 35 375).Site-specific eQTL and interaction FDR adjusted for 3798 probes with putative evidence of eQTL (nominal P < .05) at any site (all sites combined, proximal or distal).Table shows top five hits for interaction analysis between SNP and sample site.Bold values significant as FDR < 0.05.
T A B L E 1 Top SNPS with evidence for differential risk of proximal vs distal cancer Case-control meta-analysis results given for case-control association study of 3089 proximal colon cancer cases and 31 331 controls and between 7541 distal colorectal cancer cases and 31 331 controls, nominal P value for association with risk given.Significant association (FDR < 0.05) confirmed at 70 SNPs for distal CRC and 42 SNPs for proximal CRC.Case-only meta-analysis OR and interaction P value indicates association between SNP and risk of distal CRC compared to risk of proximal colon cancer.Interaction FDR adjusted for 87 SNPs tested.
Samples taken in patients with CRC comprised 108 from proximal colon and 221 from distal colorectum.Samples taken in patients without CRC comprised 8 from proximal colon and 143 from distal colorectum.
Top genes with differential proximal/distal expression, with fold-difference adjusted for age, gender and PEER factors given.Analysis not adjusted for CRC status given collinearity between CRC status and sample site.Alternative probes for top genes given where available. 111156836:C:T)