Circulating white blood cell traits and colorectal cancer risk: A Mendelian randomisation study

Observational studies have suggested a protective role for eosinophils in colorectal cancer (CRC) development and have implicated neutrophils, but the causal relationships remain unclear. Here, we aimed to estimate the causal effect of circulating white blood cell (WBC) counts (N = ~550 000) for basophils, eosinophils, monocytes, lymphocytes and neutrophils on CRC risk (N = 52 775 cases and 45 940 controls) using Mendelian randomisation (MR). For comparison, we also examined this relationship using individual‐level data from UK Biobank (4043 incident CRC cases and 332 773 controls) in a longitudinal cohort analysis. The inverse‐variance weighted (IVW) MR analysis suggested a protective effect of increased basophil count and eosinophil count on CRC risk [OR per 1‐SD increase: 0.88, 95% CI: 0.78‐0.99, P = .04; OR: 0.93, 95% CI: 0.88‐0.98, P = .01, respectively]. In multivariable MR, which adjusted for all other WBC subtypes to account for genetic correlation between the traits, the protective effect of eosinophils remained [OR per 1‐SD increase: 0.88, 95% CI: 0.80‐0.97, P = .01]. A protective effect of increased lymphocyte count on CRC risk was also found following this adjustment [OR: 0.84, 95% CI: 0.76‐0.93, P = 6.70e‐4]. Consistent with the MR results, the cohort analysis showed a protective effect of eosinophils in the fully adjusted model [RR per 1‐SD increase: 0.96, 95% CI: 0.93‐0.99, P = .02] and following adjustment for the other WBC subtypes [RR: 0.96, 95% CI: 0.93‐0.99, P = .001]. Our study implicates peripheral blood immune cells, in particular eosinophils and lymphocytes, in CRC development, highlighting a need for mechanistic studies to interrogate these relationships.


| INTRODUCTION
Given current challenges, and the estimate that 50% of CRC cases may be preventable, 8 identifying novel risk factors, and subsequently prophylactic and treatment options, is warranted to limit the future healthcare burden.
White blood cells (WBCs) are commonly measured in routine blood tests and are divided into five subtypes: basophils, eosinophils, lymphocytes, monocytes and neutrophils. 9 Higher absolute lymphocyte counts [such as T, B and natural killer (NK) cells] have also been associated with better overall survival in CRC. 18,19 By contrast, a high absolute monocyte count has been associated with worse CRC survival, 18,19 consistent with the potential role of monocytes in tumour progression and metastasis. 20 Finally, neutrophils, a critical part of the innate immune system, 21 have also been associated with poor overall survival in CRC. 18,19
Traditional observational studies (ie, cross-sectional, case-control or cohort designs) account for the majority of epidemiologic analyses of CRC, but they can suffer from limitations such as confounding and reverse causation, which can bias effect estimates. 22 Mendelian randomisation (MR) is a method in genetic epidemiology that can help to overcome these issues by using single-nucleotide polymorphisms (SNPs) as proxies for an exposure of interest to estimate the effect of that exposure on an outcome. 23 It operates akin to a randomised controlled trial (RCT), as alleles are randomly allocated at conception. 23 We used MR to estimate the total effect of each WBC trait on CRC 24 and, to account for correlation between WBC traits, multivariable (MV) MR to estimate the direct effect of each trait on CRC by adjusting for their shared genetic architecture. 24
In our study, we aimed to investigate the relationship between circulating WBC subtypes and CRC incidence. First, an MR analysis was undertaken using the most comprehensive genome-wide association studies (GWAS) available for WBC subtype counts and for CRC. Second, for comparison, we ran the largest longitudinal cohort study of WBCs and CRC to date using UK Biobank.
Together, these analyses allow us to compare genetically proxied and observational estimates and so to explore the relationship between circulating WBC counts and CRC development thoroughly.

| METHODS
| Study design
We aimed to investigate the relationship between circulating WBCs and CRC using a genetic epidemiologic approach complemented by an observational one. First, MR analyses were undertaken to estimate the effect of WBC subtype counts on CRC. We then performed a multivariable MR (MVMR) analysis in which the direct effect of each WBC subtype count was estimated by including all five WBC subtypes in the model (Figure 1A). STROBE-MR guidelines were followed (STROBE-MR Supplement). 25 MR estimates are expressed as odds ratios (ORs) for CRC per 1-SD increase in normalised WBC count. Next, a prospective longitudinal cohort study was undertaken using UK Biobank individual-level data. Here, each subtype-specific WBC count was first studied individually and then adjusted for the other four subtypes (Figure 1B). STROBE guidelines were followed (STROBE Cohort Supplement). Cohort estimates are expressed as risk ratios (RRs) for CRC per 1-SD increase in normalised WBC count.

| WBC count GWAS data
Summary statistics for WBC subtype counts were obtained from the Blood Cell Consortium (BCX) meta-analysis (N ≈ 562 243), in which UK Biobank (UKBB) was the largest contributing study. 26 Genetic sex, age, age², study-specific covariates and principal components (PCs) 1 to 10 were used as covariates. A brief description of each study included in the meta-analysis is available in Table S1. Only variants that did not display heterogeneous effects across studies were selected. Specific details on quality control (QC) steps and association testing are available in the source manuscripts. Summary statistics for WBC counts were downloaded from: http://www.mhi-humangenetics.org/en/resources/.

| CRC GWAS data
GWAS summary statistics for overall CRC risk and anatomical subsite were taken from the most comprehensive meta-analysis to date, 27 in which data from several consortia, including the Colon Cancer Family Registry (CCFR), were meta-analysed. 27 The final sample was predominantly of European ancestry, with 5.36% of participants of East Asian ancestry; these participants were included because of the similar genetic architecture of CRC risk across the two ancestries. 28 Genetic sex, age, study-specific variables and PCs were used as covariates. An overview of all consortia included in the CRC meta-analysis is available in Table S2, and a breakdown of the sample sizes for each set of CRC summary statistics is presented in Table S3.

| Genetic data processing
To select valid MR instruments, summary statistics for exposures were processed using the 'TwoSampleMR' R package. 29,30 Correlated instruments, that is, SNPs in linkage disequilibrium (LD) with one another, can bias MR estimates. 29,30 Therefore, the exposure SNPs were clumped (r² = 0.001, window = 10 Mb, P-value threshold = 5 × 10⁻⁸) using the 1000 Genomes European dataset 31 as a reference panel. Following this step, the exposure and outcome datasets were 'harmonised', that is, their effect alleles were aligned to the same reference strand. 32 SNPs with incorrect but unambiguous strand references were corrected, while those with ambiguous strand references were removed.
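The clumping and harmonisation steps above can be illustrated with a minimal, self-contained Python sketch. It is not the 'TwoSampleMR' implementation: the `Snp` record, the `r2_lookup` callable (standing in for LD queries against a reference panel such as 1000 Genomes European) and the rule of dropping every palindromic (A/T or C/G) SNP are simplifying assumptions made for illustration, and only SNPs coded with A/C/G/T alleles are handled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Snp:
    rsid: str
    chrom: int
    pos: int      # base-pair position
    pval: float   # SNP-exposure association P-value


def clump(snps: List[Snp], r2_lookup: Callable[[str, str], float],
          p_threshold: float = 5e-8, r2_threshold: float = 0.001,
          window_bp: int = 10_000_000) -> List[Snp]:
    """Greedy LD clumping: keep the most significant SNP, drop SNPs in LD with it."""
    candidates = sorted((s for s in snps if s.pval < p_threshold), key=lambda s: s.pval)
    removed = set()
    kept = []
    for i, lead in enumerate(candidates):
        if lead.rsid in removed:
            continue
        kept.append(lead)
        # Only less significant SNPs (later in the sorted list) can be clumped away.
        for other in candidates[i + 1:]:
            if other.rsid in removed:
                continue
            nearby = other.chrom == lead.chrom and abs(other.pos - lead.pos) <= window_bp
            if nearby and r2_lookup(lead.rsid, other.rsid) > r2_threshold:
                removed.add(other.rsid)
    return kept


COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonise(exp_ea: str, exp_oa: str, out_ea: str, out_oa: str,
              out_beta: float) -> Optional[float]:
    """Return the outcome beta per copy of the exposure effect allele.

    Returns None (SNP removed) when the SNP is palindromic, and therefore
    strand-ambiguous, or when the alleles cannot be matched.
    """
    if COMPLEMENT[exp_ea] == exp_oa:          # A/T or C/G SNP: strand is ambiguous
        return None
    # Try the outcome alleles as reported, then on the opposite strand.
    for ea, oa in ((out_ea, out_oa), (COMPLEMENT[out_ea], COMPLEMENT[out_oa])):
        if (ea, oa) == (exp_ea, exp_oa):      # same effect allele: keep the sign
            return out_beta
        if (ea, oa) == (exp_oa, exp_ea):      # alleles swapped: flip the sign
            return -out_beta
    return None
```

For example, an exposure SNP coded A/G and an outcome SNP coded T/C (the same variant reported on the opposite strand) keeps its outcome beta, whereas an A/T SNP is dropped.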

| MR analysis of WBC counts on CRC risk
Primary MR analyses were undertaken using the inverse-variance weighted (IVW) method, which combines the SNP-specific estimates of the effect of the exposure on CRC in a multiplicative random-effects meta-analysis. 33 Conditional F-statistics were calculated to detect weak instrument bias for each exposure SNP as previously described. 34 Several sensitivity analyses were undertaken for comparison with the main IVW estimates. The presence of vertical pleiotropy, that is, when a trait downstream of the genetic variant lies on the same biological pathway as the exposure, 35 was assessed using Cochran's Q heterogeneity test. Horizontal pleiotropy, where some or all instruments for a trait act on the outcome through a pathway other than the exposure, 35 can violate one of the main MR assumptions. Several sensitivity MR analyses were undertaken to identify horizontal pleiotropy: MR-Egger (where the regression intercept is not constrained to zero), 36 the weighted median (the median of all SNP ratio estimates, each weighted by the inverse of its variance), 37 the weighted mode (which assumes that the most common pleiotropic effect across the instruments is zero) 38 and MR-PRESSO (which detects individual SNPs contributing to horizontal pleiotropy). 39 The direction of the causal relationship between WBC traits and CRC risk was tested using the MR Steiger method, which uses Steiger's test to compare the correlation of the genetic variants with the exposure against their correlation with the outcome. 30
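To make the estimators above concrete, the following Python sketch computes, from SNP-level summary statistics, the IVW estimate with a multiplicative random-effects standard error, Cochran's Q, the MR-Egger slope and intercept, and per-SNP F-statistics. It is an illustrative re-implementation under stated assumptions, not the 'TwoSampleMR' code: the per-SNP F-statistic uses the common approximation (beta_x / se_x)², which is not necessarily the conditional F-statistic used in the study, and MR-Egger standard errors are omitted for brevity.

```python
import math
from typing import Dict, List


def ivw(bx: List[float], by: List[float], se_by: List[float]) -> Dict[str, float]:
    """IVW estimate (multiplicative random effects) and Cochran's Q."""
    w = [1 / s ** 2 for s in se_by]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    # Cochran's Q: weighted squared deviations of each SNP from the pooled line.
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    se_fixed = math.sqrt(1 / sxx)
    # Multiplicative random effects: inflate the SE only if there is over-dispersion.
    phi = q / (len(bx) - 1)
    se = se_fixed * max(1.0, math.sqrt(phi))
    return {"beta": beta, "se": se, "Q": q, "Q_df": len(bx) - 1}


def mr_egger(bx: List[float], by: List[float], se_by: List[float]) -> Dict[str, float]:
    """Weighted regression of by on bx with an unconstrained intercept."""
    # Orient SNPs so that every SNP-exposure association is non-negative.
    pairs = [(abs(x), y if x >= 0 else -y) for x, y in zip(bx, by)]
    w = [1 / s ** 2 for s in se_by]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, pairs)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, pairs)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, pairs))
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, pairs))
    slope = sxy / sxx
    # A non-zero intercept suggests directional (horizontal) pleiotropy.
    return {"slope": slope, "intercept": my - slope * mx}


def f_statistics(bx: List[float], se_bx: List[float]) -> List[float]:
    """Approximate per-SNP F-statistic: (beta_x / se_x) squared."""
    return [(x / s) ** 2 for x, s in zip(bx, se_bx)]
```

With noise-free toy data in which every SNP gives the same ratio estimate, the IVW and MR-Egger slopes coincide, Q is zero and the Egger intercept is zero; real data will show heterogeneity.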

| Multivariable MR analysis of WBC counts on CRC risk
The IVW method was also used for the MVMR analysis. First, a pair-wise analysis between all five WBC subtype counts was undertaken, in which the proportion of variance explained (PVE) by the SNPs used to instrument each WBC trait was estimated in the other four WBC subtypes, using the previously described methodology. 34 The direct effect of each WBC subtype was estimated by including all five WBC subtypes in the MVMR model. Weak instrument bias was also assessed here, using the methodology described by Sanderson et al, in which a generalised version of Cochran's Q is used to evaluate instrument strength. 40 The standard Cochran's Q statistic was calculated to detect heterogeneity. For traits with a conditional F-statistic <10, a follow-up MVMR analysis was performed that accounted for the presence of weak instruments.
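The multivariable IVW estimator described above can be sketched as a weighted regression of the SNP-outcome associations on the SNP-exposure associations for all exposures jointly, with no intercept and weights equal to the inverse variance of the SNP-outcome associations. The pure-Python sketch below is illustrative only (`solve_linear` and `mvmr_ivw` are hypothetical names; the study itself used the 'MVMR' R package): it returns point estimates and omits standard errors, conditional F-statistics and the weak-instrument-robust estimator.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mvmr_ivw(bx: List[List[float]], by: List[float], se_by: List[float]) -> List[float]:
    """Direct effects: weighted least squares of by on all exposures, no intercept.

    bx[j][k] is the association of SNP j with exposure k (eg, one of five WBC counts).
    """
    k = len(bx[0])
    w = [1 / s ** 2 for s in se_by]
    # Normal equations (X' W X) beta = X' W y.
    xtwx = [[sum(wj * row[p] * row[q] for wj, row in zip(w, bx)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(wj * row[p] * y for wj, row, y in zip(w, bx, by)) for p in range(k)]
    return solve_linear(xtwx, xtwy)
```

Each returned coefficient is the effect of one exposure holding the genetic effects on the other exposures fixed, which is why correlated WBC traits are entered together rather than one at a time.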

| UK Biobank phenotypic data
Between 2007 and 2010, UKBB participants visited assessment centres (N = 22) throughout the UK. 41 Participants had their health records linked, were genotyped and underwent multiple evaluations, such as self-report questionnaires and medical examinations. 41 The latter included the analysis of blood samples using Beckman Coulter LH750 instruments designed for high-throughput screening. 42 Total WBC count and WBC subtype percentages (%) were measured, with absolute WBC subtype counts derived as (WBC subtype % / 100) × total WBC count and expressed as 10⁹ cells/L. 42 The blood sampling date variable was split into year, month, day and minutes elapsed since the start of the day of the appointment visit.
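As a worked example of the derivation above, the sketch below converts a subtype percentage into an absolute count and splits a sampling timestamp into the year, month, day and minutes-since-midnight variables. The function names and example values are hypothetical and are not UK Biobank data.

```python
from datetime import datetime
from typing import Tuple


def absolute_count(subtype_pct: float, total_wbc: float) -> float:
    """Absolute subtype count (10^9 cells/L) = subtype % / 100 x total WBC (10^9 cells/L)."""
    return subtype_pct / 100 * total_wbc


def split_sampling_time(ts: datetime) -> Tuple[int, int, int, float]:
    """Split a blood-sampling timestamp into year, month, day and minutes since midnight."""
    minutes = ts.hour * 60 + ts.minute + ts.second / 60
    return ts.year, ts.month, ts.day, minutes


# Hypothetical participant: 2.5% eosinophils and a total WBC count of 6.4 x 10^9 cells/L.
print(absolute_count(2.5, 6.4))                            # about 0.16 (x 10^9 cells/L)
print(split_sampling_time(datetime(2008, 5, 14, 9, 30)))   # (2008, 5, 14, 570.0)
```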
Additional variables were gathered, including recruitment centre, sampling device ID, age, genetic sex, PCs 1 to 10, body mass index (BMI), Townsend deprivation index (DI), smoking status and alcohol drinker status (self-report questionnaire; UKBB data fields 20116 and 20117). CRC cases were identified through hospital inpatient records coded to the 10th revision of the International Classification of Diseases (ICD-10). UK Biobank data were accessed under application 81499.

| Filtering and selection criteria
The UKBB dataset underwent a series of filtering steps prior to further analyses. Withdrawn participants and those of non-European ancestry were excluded. Viable controls and incident CRC cases were defined using the methodology previously described by Burrows et al 43 (Table S4).
Here, we defined incident CRC cases as those diagnosed at least 1 year after blood sampling. Participants with no WBC measurement data or sampling date were removed, as were those known to be pregnant, to have chronic conditions (eg, HIV, blood cancers, thalassaemia) or to be undergoing erythropoietin treatment, as in Astle et al 44 and Chen et al, 26 given the effects of these conditions on WBC measurements. Those with acute conditions (eg, upper respiratory infections) diagnosed less than 3 months before blood sampling were also excluded. Finally, participants with missing values for Townsend DI, BMI, smoking status or alcohol drinker status were removed.
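The case definition and exclusions above can be expressed as a single classification rule. The following sketch is illustrative only: the `Participant` fields are hypothetical, "1 year" and "3 months" are approximated as 365 and 91 days, participants diagnosed within a year of sampling are assumed to be excluded rather than kept as controls, and the full control definition of Burrows et al (Table S4) is not reproduced.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

ONE_YEAR = timedelta(days=365)        # assumption: "at least 1 year" as 365 days
THREE_MONTHS = timedelta(days=91)     # assumption: "3 months" as 91 days


@dataclass
class Participant:
    blood_date: Optional[date]        # date of blood sampling (None if missing)
    has_wbc: bool                     # WBC measurements available
    crc_date: Optional[date]          # first CRC diagnosis (ICD-10), None if none
    acute_date: Optional[date]        # latest acute condition diagnosis, if any
    excluded_condition: bool          # pregnancy, chronic condition or erythropoietin
    covariates_complete: bool         # Townsend DI, BMI, smoking and alcohol non-missing


def classify(p: Participant) -> Optional[str]:
    """Return 'case', 'control' or None (participant excluded)."""
    if not p.has_wbc or p.blood_date is None:
        return None
    if p.excluded_condition or not p.covariates_complete:
        return None
    # Acute condition diagnosed less than 3 months before blood sampling.
    if p.acute_date is not None and timedelta(0) <= p.blood_date - p.acute_date < THREE_MONTHS:
        return None
    if p.crc_date is None:
        return "control"
    if p.crc_date - p.blood_date >= ONE_YEAR:
        return "case"                 # incident: diagnosed >= 1 year after sampling
    return None                       # assumption: earlier diagnoses are excluded
```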

| Cohort study between WBC count and CRC
We conducted a cohort analysis of circulating WBCs and incident CRC. WBC count values were log-transformed and then adjusted for the following covariates: assessment centre, sex, age, age² and PCs 1 to 10, as in Chen et al's GWAS. 26 The resulting residuals were rank-inverse normal transformed and then used in a logistic regression on CRC incidence. This main observational analysis was termed 'Model 1', a minimally adjusted model.
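The log transformation, residualisation and rank-based inverse normal transformation above can be sketched as follows. Two assumptions are made: residualisation is shown for a single covariate (the analysis itself adjusted for several covariates in one regression), and ranks are mapped with offset c = 0.5, so that rank r becomes the normal quantile of (r - 0.5)/n, with tied values given their average rank; the source does not state which variant was used, and Blom's offset (c = 3/8) is another common choice.

```python
import math
from statistics import NormalDist, mean
from typing import List


def residualise(y: List[float], x: List[float]) -> List[float]:
    """Residuals of y after simple linear regression on ONE covariate x.

    Simplification: the analysis described above adjusts for several covariates
    (assessment centre, sex, age, age^2, PCs 1 to 10) in one multiple regression.
    """
    my, mx = mean(y), mean(x)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_inverse_normal(values: List[float], c: float = 0.5) -> List[float]:
    """Map ranks to standard-normal quantiles: Phi^-1((rank - c) / (n - 2c + 1))."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


# Hypothetical usage: log-transform counts, residualise on age, then transform.
counts = [0.12, 0.20, 0.08, 0.31, 0.15]
ages = [52.0, 61.0, 47.0, 66.0, 58.0]
transformed = rank_inverse_normal(residualise([math.log(v) for v in counts], ages))
```

The transformed values would then enter the logistic regression on CRC incidence, so that each estimate refers to a 1-SD change on the normalised scale.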
An additional, fully adjusted analysis, termed 'Model 2', was undertaken in which BMI, Townsend DI, smoking status and alcohol drinker status were included as additional covariates. Following this, another pair of analyses was run in which all five WBC subtype counts were included together in the model to reduce potential bias due to their correlated values. Analyses in which each WBC trait was studied individually were termed 'univariable', and those in which all five were included together were termed 'multivariable'.

| Working environment
All analyses were performed with R version 4.1.2 ("Bird Hippie") in a Linux environment supported by the University of Bristol's Advanced Computing Research Centre (ACRC). Genetic data preparation and the MR analyses were undertaken with the 'TwoSampleMR' R package. 29,30 The MVMR analyses were undertaken with the 'MVMR' R package. 45

| Patient and public involvement
Patients or the public were not involved in the design, conduct, reporting or dissemination plans of our research.

| RESULTS
| Effect of WBC count on CRC
Before performing the MR analysis, the average F-statistic for each WBC trait was estimated to detect weak instrument bias, which is generally indicated by an average F-statistic <10. 46 For overall CRC, these were 64.48 […] (Table S5).
Next, vertical and horizontal pleiotropy in the MR analyses were assessed. Cochran's Q test indicated heterogeneity in all but one WBC trait-CRC pair (basophil count-male CRC, P_HET = .104; Table S7). The MR-Egger test for horizontal pleiotropy was then performed and identified evidence of this type of pleiotropy for eosinophil count and overall CRC risk (P_PLT = .015), suggesting possible bias in the MR estimates (Table S7). Although MR-PRESSO identified horizontal-pleiotropic outlier SNPs, there was little evidence that removing these outliers notably shifted the point estimates (Table S8).

| Multivariable MR of WBC count on CRC
A pair-wise analysis of the proportion of variance explained (PVE) by the SNPs instrumenting each WBC trait indicated a low PVE for each of the other WBC traits, with the exception of basophil count (2.44% vs 2.39% when instrumenting neutrophil count; Table S9). Overall, these results indicated that statistical power should not suffer to a large degree from including all five WBC subtype counts in the MVMR analysis. Here, the MVMR IVW method estimated a protective effect of eosinophil count (OR: 0.88, 95% CI: 0.80-0.97, P-value: .011) and lymphocyte count (OR: 0.84, 95% CI: 0.76-0.93, P-value: .0007) on CRC risk (Figure 2). Site-specific CRC estimates are available in Table S10.
Sensitivity analyses were undertaken to assess heterogeneity and the presence of weak instruments in the MVMR analysis.There was evidence of heterogeneity in all WBC trait-CRC pairs (Table S11).
Conditional F-statistics showed evidence of weak instruments (F < 10) for basophil count (Table S11).Based on these results, an additional MVMR analysis was run adjusting for weak instruments for basophil count (Table S11).

| Cohort observational analysis between WBC count and CRC
In total, 336 816 UKBB participants remained after the filtering and selection criteria were applied (Figure S2). Of these, 332 773 were controls and 4043 were incident CRC cases. When split by genetic sex, there were 154 629 male and 178 144 female controls and 2316 male and 1727 female cases. Those with CRC were more likely to be male (57% vs 46%), were older on average (60.7 vs 55.8 years), had a slightly higher BMI (28.0 vs 27.4 kg/m²) and were more likely to have been cigarette smokers in the past (46% vs 55% never smokers and 44% vs 34% previous smokers; Table 1).
As a percentage of the total WBC count based on median values, basophils accounted for 0.3%, eosinophils 2.11%, lymphocytes 28.16%, monocytes 6.78% and neutrophils 60.39% (Table S12). A pair-wise correlation matrix between the WBC subtypes showed correlation coefficients of 0.3 or below (Figure S3).
Batch variables (eg, blood sample device and sampling date), Townsend DI and alcohol drinker status explained between 0% and 0.66% of the variance in WBC count. Depending on the WBC subtype, genetic sex explained between 0.23% and 2.93% of the variance, BMI between 0.14% and 2.64%, and smoking status between 0.44% and 3.85% (Figure S4).
Observational associations were re-computed with all five WBC subtype counts included together, largely showing agreement with the main analysis (Table S14).

| DISCUSSION
In our study, we aimed to estimate the effects of five circulating WBC subtypes on CRC risk using a combined genetic epidemiologic and longitudinal cohort framework. Using MVMR, we were able to assess the independent causal effect of each WBC count by adjusting for their shared genetic architecture. Taken together, the evidence across analyses suggests a potential protective effect of increased circulating eosinophil and lymphocyte counts on CRC risk.
Consistent with our study, Prizment et al found that eosinophil count (second and third vs first tertile) was inversely associated with the odds of developing colon cancer, but not rectal cancer. 16 Similar results have been reported for other cancers: Wong et al reported an inverse trend between increasing eosinophil count quartiles and the odds of lung adenocarcinoma in a UKBB study, 47 while a similar study of prostate cancer showed an inverse association for the third to fifth quintiles of eosinophil count, as well as per 1-SD increase in the trait (HR 0.96, compared with an OR of 0.93 for CRC in our analysis). 48 Eosinophils have a well-established role in allergic diseases, including asthma and allergic rhinitis. 49 Indeed, MR analyses have also reported that eosinophil count affects the risk of developing allergic disease, 44,50 and a recent systematic review of the relationship between allergies and cancer reported evidence of a reduced risk of CRC in those with allergic diseases. 51 Here, our results suggest that the immune response through eosinophils provides protection against tumour development. Indeed, in several neoplasias, including CRC, eosinophils have been found to play an anti-tumourigenic role and are a source of anti-tumourigenic molecules, such as eosinophil-derived neurotoxin (EDN). 52,53 Experimental studies have also found a tumour-protective effect of IgE. 54 Increased eosinophil recruitment to the CRC tumour site has also been associated with better survival, even when adjusting for the effects of CD8+ T-cells, 11 and eosinophil-specific granule secretion of granzyme A has been linked with the killing of CRC cells. 55
In addition to eosinophil count, we also found a protective effect of lymphocyte count on CRC risk. Although this was not apparent in the univariable MR analysis, the MVMR estimates indicated lower ORs for CRC across all anatomical subsites with increased lymphocyte counts (the estimate for proximal colon cancer trended towards a protective effect). The multivariable, fully adjusted 'Model 2' in the cohort analysis also indicated a possible inverse association with CRC risk. A protective effect of higher circulating lymphocyte levels on CRC risk is not surprising, given the established role of lymphocytes in combatting tumour development. 56 Tumour-infiltrating lymphocytes (TILs) such as CD8+ T-cells help antagonise tumour growth through direct action and recruitment of other immune cells. 56 High levels of TILs have previously been associated with better overall and disease-free survival in CRC. 19,56 In support of our findings, two observational studies found higher lymphocyte counts in controls than in cases between 6 months and a year prior to CRC diagnosis. 57,58 However, these results could reflect the production and recruitment of lymphocytes to the site of precancerous or undetected tumours rather than a causal effect.

| Limitations
There are several limitations to our study. In the cohort analysis, only baseline blood measurements were available. This assumes that WBC counts were constant over time and did not allow us to examine whether trends in WBC count relate to CRC risk. Nevertheless, baseline WBC count measurements have previously been shown to be associated with disease risk, 47,48,59,60 making their study in relation to disease development worthwhile. Also, in the cohort analysis, incident CRC cases were defined as those diagnosed at least 1 year after blood sampling, so as not to reduce the number of cases substantially. However, as CRC develops over a long period, our cohort analysis may not have excluded all participants with undiagnosed CRC.
With regard to the MR analysis, the genetic instruments used here proxied for lifelong variation in WBC count. Therefore, the MR analysis cannot be used to infer how large changes over a short timespan might affect CRC development. Regarding the MVMR method, caution should be applied when investigating traits with very weak instruments, as the method cannot reliably adjust for those traits. 45 This was the case for basophil count, for which the conditional F-statistic was estimated to be between 4.7 and 4.8 (Table S11). Therefore, although the MVMR analysis accounting for weak instruments pointed to a larger detrimental effect of basophil count than the main MVMR analysis, ORs derived from it should be interpreted with caution.

| CONCLUSION
In summary, the results generated here provide evidence for a protective causal effect of elevated circulating eosinophil and lymphocyte counts on CRC risk. Going forward, additional research is needed to disentangle the biological mechanisms and pinpoint specific pathways through which eosinophils and lymphocytes might protect against CRC development.

Figure 1. Study design. We triangulated findings from two study designs, a Mendelian randomisation analysis (A) and a longitudinal cohort analysis (B), to estimate the causal effect of WBC count on CRC risk.
Figure 2. The relationship between WBC count and CRC risk based on univariable (UV) and multivariable (MV) two-sample MR and cohort observational analyses. Each WBC trait is presented on the x-axis and the estimated effect on the y-axis. Point estimates are filled where P < .05. Results are interpreted as ORs (95% CI) for CRC risk per 1-SD normalised increment in WBC count. Data sources: 1 Chen et al 26; 2 Huyghe et al 27 (https://doi.org/10.1136/gutjnl-2020-321534); 3 UK Biobank. 47
Table 1. Baseline characteristics of the UK Biobank study sample. Values are n/N (%) or mean (SD).