A prospective study of serum metabolites and colorectal cancer risk


  • Amanda J. Cross PhD,

    Corresponding author
    1. Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, United Kingdom
    • Corresponding author: Amanda J. Cross, PhD, Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, St Mary's Campus, Norfolk Place, London, W2 1PG, United Kingdom; Fax: (011) +44 20 7402 2150; amanda.cross1@imperial.ac.uk

  • Steven C. Moore PhD,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland
  • Simina Boca PhD,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland
  • Wen-Yi Huang PhD,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland
  • Xiaoqin Xiong PhD,

    1. Information Management Services, Inc., Calverton, Maryland
  • Rachael Stolzenberg-Solomon PhD,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland
  • Rashmi Sinha PhD,

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland
  • Joshua N. Sampson PhD

    1. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland

  • This article has been contributed to by US Government employees and their work is in the public domain in the USA.

  • We thank Tom Riley, Craig Williams, Michael Furr, and Adam Risch of Information Management Services, Inc. of Silver Spring, Maryland, for data management.



Colorectal cancer is highly prevalent, and the vast majority of cases are thought to be sporadic, although few risk factors have been identified. Using metabolomics technology, our aim was to identify biomarkers prospectively associated with colorectal cancer.


This study included 254 incident colorectal cancers and 254 matched controls nested in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Serum samples were collected at baseline, and the mean length of follow-up was 8 years. Serum metabolites were analyzed by ultra-high performance liquid-phase chromatography with tandem mass spectrometry, and gas chromatography coupled with mass spectrometry. Conditional logistic regression was used to calculate odds ratios (OR) and 95% confidence intervals (CI) for metabolites above the limit of detection and present in at least 80% of participants.


A total of 676 serum metabolites were measured; of these, 447 were of known identity, and 278 of these were present in >80% of individuals. Overall, there was no association between serum metabolites and colorectal cancer. Some suggestive associations were observed between individual metabolites and colorectal cancer, but none reached statistical significance after Bonferroni correction for multiple comparisons. For example, leucyl-leucine was inversely associated with colorectal cancer (OR comparing the 90th with the 10th percentile = 0.50; 95% CI = 0.32-0.80; P = .003). In sex-stratified analyses, serum glycochenodeoxycholate was positively associated with colorectal cancer among women (OR, 90th vs 10th percentile = 5.34; 95% CI = 2.09-13.68; P = .0001).


No overall associations were observed between serum metabolites and colorectal cancer, but serum glycochenodeoxycholate, a bile acid metabolite, was positively associated with colorectal cancer among women. Cancer 2014;120:3049–3057. © 2014 American Cancer Society.


Colorectal cancer remains the third most common cancer around the globe, and is fatal in approximately half of cases.[1] Although overall incidence rates have recently decreased in the United States, they remain among the highest in the world and incidence is increasing among those less than 50 years of age.[2] The majority of colorectal cancer cases arise sporadically, and although screening reduces the risk of colorectal cancer, improvements in lifestyle and diet have been hypothesized to have the largest effect on incidence.[3, 4] However, a recent review of the literature showed that few known modifiable risk factors have been identified; only obesity, alcohol, red meat, and processed meat intake have been consistently positively associated with colorectal cancer.[5] Furthermore, observational studies report that these risk factors only confer modest increases in risk for this malignancy.[5, 6]

Epidemiologic studies usually capture data on health and lifestyle using self-administered questionnaires, which are subject to a number of random and systematic errors, such as reporting bias, and can result in misclassification. Biological markers are not subject to these self-reporting errors, and they may better characterize true exposures because they integrate multiple factors, such as diet, lifestyle, the environment, the microbiome, and genetics. Biomarkers could reflect nutritional status, food composition, food processing or cooking products, bioactive food components, and contaminants, as well as drugs and endogenous metabolites.

High-throughput technologies such as gas or liquid chromatography and mass spectrometry, or nuclear magnetic resonance spectroscopy, can be used to measure hundreds of metabolites (<1000 daltons) in any given biospecimen; this is known as metabolic profiling, or metabolomics. The aim of our study was to use metabolomics to agnostically investigate a large number of metabolites in prospectively collected serum samples in relation to incident colorectal cancer.


The PLCO Cancer Screening Trial

We conducted a nested case-control study within the screening arm of the Prostate, Lung, Colorectal, and Ovarian cancer screening (PLCO) trial, which is a large randomized controlled trial to test the efficacy of screening methods for each of these 4 cancers.[7-9] Approximately 155,000 men and women, aged 55 to 74 years, who had no history of prostate, lung, colorectal, or ovarian cancer were enrolled from 10 US centers between 1993 and 2001, and were randomly assigned to the screened or the nonscreened arm.

At baseline, all participants in the screening arm of the trial completed questionnaires that queried a variety of health conditions and lifestyle choices, as well as a 137-item food frequency questionnaire. In addition, individuals in the screened arm (n = 77,445) were offered a flexible sigmoidoscopy to examine the distal colorectum as a screening procedure for colorectal cancer; 83% (n = 64,658) complied, and 89% (n = 57,559) of these procedures were considered successful (insertion to at least 50 cm with >90% of mucosa visible or a suspect lesion identified). If neoplastic lesions were detected during flexible sigmoidoscopy, participants were referred to their health care provider for a colonoscopy. All participants in the screening arm of the trial were offered a follow-up flexible sigmoidoscopy either 3 or 5 years after baseline.

Colorectal cancers were ascertained through self-reported annual health surveys and linkage to the National Death Index (for completeness) and histologically confirmed through medical record review. The Institutional Review Boards of the US National Cancer Institute and the 10 screening centers approved the study, and all participants provided informed consent.

Study Sample

Our study sample was drawn from those in the screening arm of the trial who completed the baseline risk factor questionnaire and the dietary questionnaire, provided consent for biospecimens to be used in etiologic studies, and did not have colorectal cancer at study entry according to the questionnaire data and screening sigmoidoscopy (n = 52,705). We excluded individuals who had a self-reported personal history of cancer (except basal-cell skin cancer) (n = 4924), had less than 6 months of follow-up (an additional 168 individuals), had a rare cancer during follow-up (an additional 1074), had self-reported Crohn's disease, ulcerative colitis, familial polyposis, Gardner's syndrome, or colorectal polyps (an additional 6429), and those who did not have a serum sample available from baseline (an additional 2866 individuals); some individuals fell into multiple exclusion categories.

We then selected the 254 first primary incident colorectal cancers (International Classification of Diseases for Oncology, Third Edition [ICD-O-3],[10] codes C180-C189, C199, C209, and C260) whose ICD-O-3 morphology codes were not in the range 8240-8249 (carcinoid/neuroendocrine tumors, which are atypical for colorectal cancer) and that were identified at least 6 months after baseline through February 2011; of these, 30 were rectal cancers. Controls (n = 254) were free from any cancer at the time the matched case was diagnosed; they were selected by incidence-density sampling and matched to the cases on age at randomization (5-year intervals), sex, race, year of randomization, and season of blood draw.
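The control selection described above can be sketched as risk-set (incidence-density) sampling. The Python below is a minimal, hypothetical illustration rather than the PLCO code: the `Person` fields, the tuple used as a matching key, and the choice to let one person serve as a control for more than one case are assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Person:
    pid: int
    match_key: tuple            # hypothetical: (age group, sex, race, randomization year, season)
    cancer_free_until: float    # years to first cancer of any site, or to end of follow-up
    case_time: Optional[float]  # years to incident colorectal cancer; None if not a case


def sample_matched_controls(cohort: List[Person], seed: int = 0) -> List[Tuple[int, int]]:
    """Incidence-density (risk-set) sampling of one matched control per case.

    A person is eligible as a control for a case diagnosed at time t if they
    share the case's matching key and are still under follow-up and free of
    any cancer at t. People who later become cases remain eligible until
    their own diagnosis, and a person may be drawn for more than one case.
    """
    rng = random.Random(seed)
    cases = sorted((p for p in cohort if p.case_time is not None),
                   key=lambda p: p.case_time)
    pairs = []
    for case in cases:
        t = case.case_time
        risk_set = [p for p in cohort
                    if p.pid != case.pid
                    and p.match_key == case.match_key
                    and p.cancer_free_until > t]
        if risk_set:  # a case with an empty risk set stays unmatched
            pairs.append((case.pid, rng.choice(risk_set).pid))
    return pairs
```

Sampling from the risk set at each case's diagnosis time, rather than from people who never develop cancer, is what makes the matched odds ratio estimate the incidence rate ratio.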

Metabolite Assessment

Using nonfasting serum samples from baseline that had been stored at −70°C or lower and had not been previously thawed, we identified a range of metabolites (approximately <1000 daltons) in serum using mView metabolomic profiling conducted by Metabolon, Inc. (Durham, NC). The details of the technology have been described previously;[11, 12] in brief, a nontargeted single methanol extraction was performed, followed by protein precipitation. Ultra-high performance liquid chromatography–tandem mass spectrometry and gas chromatography–mass spectrometry were used to identify peaks. Using a chemical reference library generated from 2500 standards, mass spectral peaks, retention times, and mass-to-charge ratios were used to identify individual metabolites as well as their relative quantities.

Samples were arranged in batches of up to 30, which included blinded quality control samples of pooled serum at a level of 10%. Matched cases and controls were consecutive samples within a batch, and the order of case versus control was counterbalanced within each batch. In addition, Metabolon inserted a standard every sixth sample.

Metabolite levels were batch-normalized and then log-transformed. Within each batch, measurements of a given metabolite were divided by their batch median value; these batch-normalized levels were then log-transformed, and individuals with values below the limit of detection were assigned the minimum of all observed values for that metabolite.
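A minimal sketch of these normalization steps for a single metabolite follows. It assumes raw values are stored in a list with `None` marking measurements below the limit of detection, and it uses the natural logarithm; the log base is not stated in the text, and it does not affect the percentile-contrast ORs. The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def normalize_metabolite(values, batches):
    """Batch-median normalization, log transform, and imputation for one metabolite.

    values:  raw relative abundances; None marks a value below the limit of detection
    batches: batch label of each sample, in the same order as values
    """
    # 1) median of the observed values within each batch
    observed_by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        if v is not None:
            observed_by_batch[b].append(v)
    batch_median = {b: median(vs) for b, vs in observed_by_batch.items()}

    # 2) divide by the batch median, then take the natural log
    logged = [None if v is None else math.log(v / batch_median[b])
              for v, b in zip(values, batches)]

    # 3) values below the limit of detection get the minimum observed value
    floor = min(x for x in logged if x is not None)
    return [floor if x is None else x for x in logged]
```

Dividing by the batch median removes multiplicative run-to-run drift, which is why matched cases and controls were placed in the same batch.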

Statistical Analysis

Demographic data for the cases and controls were compared by Fisher's exact test for categorical variables or the Wilcoxon rank test for continuous variables. The analyses were restricted to metabolites measured in >80% of individuals. Our primary analysis modeled the association between each metabolite and colorectal cancer by conditional logistic regression, adjusting for body mass index (BMI; continuous) and study center. In addition to matching on age in 5-year intervals, we also adjusted for age in the models to control more finely for this important colorectal cancer risk factor. We used the Hosmer and Lemeshow goodness-of-fit test[13] to determine whether logistic regression appropriately captured the relation between case status and age, BMI, sex, and study center in the baseline model; the resulting P value of .26 indicated no evidence of lack of fit.
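For 1:1 matched pairs, the conditional likelihood contribution of pair i is expit(βᵀ(x_case − x_control)), so conditional logistic regression can be fitted as a no-intercept logistic regression on within-pair differences. The NumPy sketch below illustrates that idea together with the per-metabolite likelihood ratio test. It is an illustration under the 1:1 assumption, not the SAS or R code used in the study, and the function names are hypothetical. Matching factors are constant within a pair and drop out; covariates such as age, BMI, and study-center indicators enter as columns of the difference matrix.

```python
import math
import numpy as np


def fit_clogit_pairs(d, n_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched case-control pairs.

    d: (n_pairs, k) array of covariate differences, case minus control.
    Pair i contributes log expit(d_i @ beta) to the conditional log-likelihood,
    so the model is a no-intercept logistic regression on the differences,
    fitted here by Newton-Raphson. Returns (beta, standard errors, log-likelihood).
    """
    d = np.asarray(d, dtype=float)
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))
        score = d.T @ (1.0 - p)                          # gradient of the log-likelihood
        info = (d * (p * (1.0 - p))[:, None]).T @ d      # observed information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = d @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    info = (d * (p * (1.0 - p))[:, None]).T @ d          # information at the final estimate
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    loglik = -float(np.sum(np.logaddexp(0.0, -eta)))     # sum of log expit(eta)
    return beta, se, loglik


def lrt_pvalue(d_full, d_reduced):
    """Likelihood ratio test for one added column (e.g., the metabolite).

    The statistic is compared with a chi-square distribution on 1 df,
    whose upper tail probability is erfc(sqrt(stat / 2)).
    """
    _, _, ll_full = fit_clogit_pairs(d_full)
    _, _, ll_reduced = fit_clogit_pairs(d_reduced)
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))
```

For each metabolite, `d_reduced` would hold the differences in the adjustment covariates, and `d_full` would add the metabolite's difference as one more column.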

We report the odds ratios (ORs) comparing the 90th with the 10th percentile of the metabolite values, the corresponding confidence intervals (CIs), and the P value from the likelihood ratio test comparing models with and without the metabolite. Letting $X_{90}$, $X_{10}$, and $\beta$ denote the 90th percentile, the 10th percentile, and the log(OR) from the conditional logistic regression, we defined the OR of interest as $\mathrm{OR} = \exp\{\beta (X_{90} - X_{10})\}$. We repeated the analyses for each sex separately. We used Bonferroni correction to account for multiple comparisons, with the threshold for statistical significance set at P < .05 divided by the number of metabolites analyzed (.05/278 = 1.8 × 10⁻⁴). Using a Bonferroni-corrected threshold limits our power to detect associations: based on the threshold α = .05/278 and after accounting for within-individual variability, our power to detect a metabolite with an OR of 3 is only 0.4, and our power to detect a metabolite with an OR of 5 is 0.9. Therefore, many relatively strong associations may not be detectable in this analysis. For P values from our secondary analyses, which tested the metabolite-colorectal cancer association separately in men and women, a conservative Bonferroni threshold (6 × 10⁻⁵ = .05/(278 × 3)) accounts for both the multiple metabolites (278 metabolites) and the multiple tests per metabolite (both sexes combined, men only, and women only).
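The OR of interest, exp{β(X90 − X10)}, and the two Bonferroni thresholds can be computed as follows. This is a sketch: the Wald-type CI (β ± 1.96 SE, scaled by X90 − X10) is an assumption, because the text does not state how CIs were obtained, and `statistics.quantiles` uses its default ("exclusive") interpolation, which may differ slightly from the percentile definition used in SAS or R.

```python
import math
from statistics import quantiles


def percentile_contrast_or(beta, se, x, z=1.959964):
    """OR and Wald-type 95% CI comparing the 90th with the 10th percentile of x.

    beta, se: log(OR) per unit of the log-transformed metabolite, and its SE
    x:        the metabolite values used to define the percentiles
    """
    cuts = quantiles(x, n=10)      # 9 cut points: 10th, 20th, ..., 90th percentiles
    delta = cuts[-1] - cuts[0]     # X90 - X10 (non-negative)
    odds_ratio = math.exp(beta * delta)
    lower = math.exp((beta - z * se) * delta)
    upper = math.exp((beta + z * se) * delta)
    return odds_ratio, lower, upper


# Bonferroni thresholds used in the text
n_metabolites = 278
alpha_combined = 0.05 / n_metabolites          # about 1.8e-4
alpha_by_sex = 0.05 / (n_metabolites * 3)      # about 6.0e-5 (combined, men, women)
```

For example, the women-only P value for glycochenodeoxycholate (.0001) is larger than `alpha_by_sex`, which is consistent with the statement that this association did not withstand correction.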

We examined whether more metabolites were associated with cancer than would be expected by chance by using a quantile-quantile (QQ) plot. We plotted, on a −log10 scale, the expected P values, $n/(n+1), (n-1)/(n+1), \dots, 1/(n+1)$, against the observed P values ordered from largest to smallest, $p_{(1)} \ge p_{(2)} \ge \dots \ge p_{(n)}$, where n is the number of metabolites. We also plotted a pointwise 95% CI showing the range of $p_{(i)}$ that can occur by chance. Specifically, we created 1000 permuted data sets by randomly assigning case-control status within each matched pair, calculated the ordered P values $p_{(i)}^{(b)}$ for each permutation $b = 1, \dots, 1000$, and then extracted the 2.5th and 97.5th quantiles of each $p_{(i)}^{(b)}$ across permutations.
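The QQ-plot construction can be sketched in Python. As a stand-in for the adjusted conditional logistic regression, the example uses the unadjusted conditional-logistic score test for 1:1 pairs, whose statistic is z = Σ dᵢ / √(Σ dᵢ²) for within-pair differences dᵢ. Swapping case and control within a pair flips the sign of dᵢ, which is how the permutation is implemented. The function names are hypothetical, and the band uses `statistics.quantiles` cut points rather than whichever quantile rule the authors used.

```python
import math
import random
from statistics import quantiles


def expected_pvalues(n):
    """Expected ordered null P values, largest to smallest: n/(n+1), ..., 1/(n+1)."""
    return [i / (n + 1) for i in range(n, 0, -1)]


def score_test_pvalues(diffs):
    """Stand-in test: unadjusted conditional-logistic score test for 1:1 pairs.

    diffs: one list per metabolite of within-pair differences (case minus control).
    Returns a function mapping within-pair label flips (+1 keep, -1 swap) to P values.
    """
    def pvalue_fn(flips):
        pvals = []
        for d in diffs:
            z = sum(s * x for s, x in zip(flips, d)) / math.sqrt(sum(x * x for x in d))
            pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))   # two-sided normal P
        return pvals
    return pvalue_fn


def permutation_envelope(pvalue_fn, n_pairs, n_perm=1000, seed=0):
    """Pointwise 2.5th and 97.5th percentiles of each ordered P value p_(i)."""
    rng = random.Random(seed)
    ordered = []
    for _ in range(n_perm):
        flips = [rng.choice((1, -1)) for _ in range(n_pairs)]   # swap labels within pairs
        ordered.append(sorted(pvalue_fn(flips), reverse=True))
    band = []
    for i in range(len(ordered[0])):
        cuts = quantiles([perm[i] for perm in ordered], n=40)     # 39 cut points at 2.5% steps
        band.append((cuts[0], cuts[-1]))
    return band
```

The observed curve is `sorted(pvalue_fn([1] * n_pairs), reverse=True)`; plotting `-math.log10` of it against `-math.log10` of `expected_pvalues(n)`, with the band, reproduces the layout of Figure 1.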

We also performed a standard pathway analysis. Specifically, we evaluated whether the metabolites within predefined pathways were associated with the outcome using the Gene-Set Enrichment analysis.[14] P values for the Gene-Set Enrichment analysis were calculated by permutation and are therefore valid given the potentially high correlation between some groups of metabolites.
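A simplified version of this pathway test can be sketched as an unweighted running-sum (Kolmogorov-Smirnov-type) enrichment score. Significance is assessed by permuting case-control labels within pairs, which preserves the correlation among metabolites. Note the assumptions: the published Gene-Set Enrichment analysis weights the running sum by the association statistics and normalizes the scores, and this sketch omits both steps. `stats_fn` is a hypothetical callable that returns a signed statistic per metabolite for a given set of within-pair label flips.

```python
import random


def enrichment_score(stats, pathway):
    """Unweighted running-sum enrichment score for one pathway.

    stats:   {metabolite: signed association statistic}
    pathway: set of metabolite names belonging to one predefined pathway
    Walk down the ranked list, stepping up for pathway members and down for
    others; return the running sum with the largest absolute value.
    """
    ranked = sorted(stats, key=stats.get, reverse=True)
    n_hit = sum(1 for m in ranked if m in pathway)
    n_miss = len(ranked) - n_hit
    running = best = 0.0
    for m in ranked:
        running += 1.0 / n_hit if m in pathway else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def pathway_pvalue(stats_fn, pathway, n_pairs, n_perm=1000, seed=0):
    """Permutation P value: swap case/control labels within pairs, recompute
    every metabolite's statistic, and compare |ES| with the observed |ES|."""
    rng = random.Random(seed)
    observed = enrichment_score(stats_fn([1] * n_pairs), pathway)
    exceed = 0
    for _ in range(n_perm):
        flips = [rng.choice((1, -1)) for _ in range(n_pairs)]
        if abs(enrichment_score(stats_fn(flips), pathway)) >= abs(observed):
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because whole samples are permuted and every metabolite is recomputed together, the null distribution automatically reflects correlated metabolites within a pathway.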

We examined whether the metabolites, as a group, could distinguish cases from controls using principal component analysis.[15] We calculated the top 10 principal components of the metabolite measurements and used a likelihood ratio test to evaluate whether each principal component was associated with the outcome. We then attempted to classify case status correctly from multiple metabolites using random forests. Briefly, the random forests method, an ensemble learning approach, 1) constructs 500 classification trees based on 500 bootstrap samples; 2) uses each tree to classify the out-of-bag individuals (individuals not used to build that tree); and 3) obtains a single prediction or probability for each individual by averaging over the trees for which that individual was out of bag. We report the classification error, the area under the receiver operating characteristic curve (AUC) based on the probabilities, and their corresponding P values from Wilcoxon rank-sum tests.
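Given out-of-bag case probabilities (for example, the `oob_decision_function_` attribute of scikit-learn's `RandomForestClassifier` fitted with `n_estimators=500` and `oob_score=True`; the text does not name the random forests implementation used), the reported summaries can be computed as below. The AUC equals the Mann-Whitney U statistic divided by n₁n₀, so the Wilcoxon rank-sum test on the probabilities also tests AUC = 0.5. The 0.5 classification threshold, the two-sided P value, and the normal approximation without tie correction are assumptions of this sketch.

```python
import math


def auc_and_pvalue(case_probs, control_probs):
    """AUC from the Mann-Whitney U statistic, with a two-sided
    normal-approximation Wilcoxon rank-sum P value (no tie correction)."""
    n1, n0 = len(case_probs), len(control_probs)
    u = 0.0
    for pc in case_probs:
        for pk in control_probs:
            u += 1.0 if pc > pk else (0.5 if pc == pk else 0.0)
    auc = u / (n1 * n0)
    z = (u - n1 * n0 / 2.0) / math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12.0)
    return auc, math.erfc(abs(z) / math.sqrt(2.0))


def classification_error(case_probs, control_probs, threshold=0.5):
    """Share misclassified when 'case' is predicted for probabilities above threshold."""
    wrong = sum(p <= threshold for p in case_probs) + sum(p > threshold for p in control_probs)
    return wrong / (len(case_probs) + len(control_probs))
```

An AUC near 0.5 and an error rate near 0.5, as reported in the Results, are what these functions return when the probabilities carry no information about case status.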

Although colorectal cancer is roughly equally common in men and women, and several established risk factors are evident in both sexes, there are some sex differences in the incidence of this malignancy by anatomic subsite and age,[16] and female hormones clearly play a role.[17] With this in mind, we also conducted separate analyses within each sex. Furthermore, in exploratory analyses, we stratified the analyses by length of follow-up time.

We assessed the technical reliability of our data by calculating coefficients of variation and intraclass correlation coefficients (ICC) for the quality control samples. We have also previously reported on the overall reliability and validity of this platform.[18] All analyses were performed with SAS software version 9.1.3 (SAS Institute, Cary, NC) and the R statistical language version 3.0.1.
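A sketch of the two reliability summaries follows. It assumes that the quality control measurements for each metabolite are on the original (unlogged) scale and can be grouped into replicate sets of the same sample. The text does not specify the ICC form; the one-way random-effects ICC(1) shown here, and the replicate grouping it requires, are assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """CV = standard deviation / mean of repeated QC measurements of one metabolite."""
    return stdev(replicates) / mean(replicates)


def icc_oneway(groups):
    """One-way random-effects ICC(1).

    groups: one list of replicate measurements per QC sample, each of length k.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n, k = len(groups), len(groups[0])
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Per-metabolite CVs could then be summarized with `statistics.median` and `statistics.quantiles(cvs, n=4)` to give the median and interquartile range reported in the Results.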


The median follow-up time from serum collection to diagnosis of colorectal cancer was 7.8 years (25th and 75th percentiles were 5.6 and 10.1 years, respectively). None of the baseline characteristics were significantly different between cases and controls (Table 1).

Table 1. Characteristics of Colorectal Cancer Cases and Controls From the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial^a

| Characteristic | Category | Cases^b (n = 254) | Controls^b (n = 254) | P^c |
|---|---|---|---|---|
| Age (years) | | 64.3 (5.1) | 64.3 (5.1) | Matched |
| Cigarette smoking | Current | 26 | 19 | |
| Education | High school or less | 85 | 84 | |
| | Post–high school training/some college | 85 | 75 | |
| | College or postgraduate | 85 | 94 | .59 |
| BMI (kg/m²) | <25 | 78 | 88 | |
| | 25 to <30 | 103 | 112 | |
| | 30 to <35 | 54 | 41 | |
| Vigorous physical activity (h/wk) | <1 h | 75 | 78 | |
| | 1-3 h | 112 | 95 | |
| | 4+ h | 65 | 75 | .35 |
| Menopausal hormone use^d | Never | 38 | 28 | |
| Regular aspirin use | No | 124 | 131 | |
| Regular ibuprofen use | No | 174 | 183 | |
| Alcohol (g/day) | | 12.2 (23.0) | 13.2 (23.6) | .44 |

^a Figures may not add to total because of some missing values.
^b Total counts for categorical variables; means and standard deviations for continuous variables.
^c Fisher's exact test (categorical variables) or Wilcoxon rank test (continuous variables).
^d Among women only.

There were a total of 676 metabolites identified, of which 447 were named. Among the named metabolites, 278 were measured in >80% of the individuals in our study, and they included amino acids, carbohydrates, fatty acids, androgens, and xenobiotics.

Using the quality control samples to assess technical reliability, the median coefficient of variation across all of the metabolites was 0.10 (interquartile range = 0.04-0.21). Furthermore, the median ICC was 0.86 (10th-90th percentile: 0.39-0.95; 25th-75th percentile: 0.64-0.92).

A QQ plot revealed no overall association between serum metabolites and risk of incident colorectal cancer (Fig. 1). In analyses of individual metabolites, none were statistically significantly associated with colorectal cancer after Bonferroni correction for multiple comparisons. The 20 metabolites with the lowest P values are shown in Table 2; the lowest P value was for leucyl-leucine (OR = 0.50; 95% CI = 0.32-0.80; P = .003), although this did not reach the Bonferroni-corrected level of statistical significance. Associations for leucyl-leucine were also evident in each sex (OR = 0.42; 95% CI = 0.22-0.83; P = .0089 for men, Table 3; and OR = 0.45; 95% CI = 0.21-0.95; P = .029 for women, Table 4), although neither reached the Bonferroni-corrected threshold for multiple comparisons.

Figure 1.

A QQ plot of the P values (on a −log10 scale) and 95% confidence intervals from conditional logistic regression, adjusted for age, BMI (continuous), and study center, for each metabolite in relation to colorectal cancer, compared with the values that would be expected by chance.

Table 2. Odds Ratios (OR) and 95% Confidence Intervals (CI) for Diagnosis of Colorectal Cancer Comparing Men and Women Combined in the 90th and 10th Percentiles for Each of the Top 20 Metabolites, Using Conditional Logistic Regression

| Rank | Metabolite | % of Individuals With This Metabolite | OR (95% CI)^a | P^b |
|---|---|---|---|---|
| 1 | Leucyl-leucine | 99 | 0.50 (0.32-0.80) | .003 |
| 2 | Fumarate | 98 | 0.55 (0.36-0.84) | .004 |
| 3 | 10-undecenoate (11:1n1) | 100 | 0.52 (0.31-0.87) | .012 |
| 4 | Xanthine | 97 | 1.90 (1.14-3.18) | .012 |
| 5 | Stearate | 100 | 0.52 (0.30-0.89) | .015 |
| 6 | Glycochenodeoxycholate | 96 | 1.76 (1.09-2.85) | .018 |
| 7 | Andro steroid monosulfate 2 | 86 | 2.08 (1.12-3.87) | .019 |
| 8 | Alpha-tocopherol | 100 | 0.56 (0.33-0.95) | .028 |
| 9 | 17-methylstearate | 93 | 0.61 (0.39-0.96) | .030 |
| 10 | Palmitate | 100 | 0.56 (0.33-0.96) | .031 |
| 11 | Eicosenoate (20:1n9 or 11) | 100 | 0.55 (0.32-0.95) | .032 |
| 12 | Margarate | 95 | 0.57 (0.34-0.96) | .033 |
| 13 | Dihomo-linoleate (20:2n6) | 100 | 0.57 (0.34-0.96) | .033 |
| 14 | Dihomo-linolenate | 100 | 0.58 (0.35-0.97) | .034 |
| 15 | Cyclo(phe-phe) | 82 | 1.88 (1.03-3.41) | .036 |
| 16 | Octadecanedioate | 97 | 0.63 (0.41-0.98) | .036 |
| 17 | Docosapentaenoate | 100 | 0.57 (0.34-0.98) | .039 |
| 18 | N-acetylserine | 84 | 1.76 (1.01-3.06) | .043 |
| 19 | 13-methylmyristic acid | 95 | 0.67 (0.45-1.00) | .043 |
| 20 | Leucylalanine | 99 | 0.60 (0.36-1.00) | .046 |

^a Adjusted for age, BMI (continuous), and study center.
^b Bonferroni-corrected P value threshold for significance: 0.05/278 = 0.00018.
Table 3. Odds Ratios (OR) and 95% Confidence Intervals (CI) for Diagnosis of Colorectal Cancer Comparing Men in the 90th and 10th Percentiles for Each of the Top 20 Metabolites, Using Conditional Logistic Regression

| Rank | Metabolite | % of Individuals With This Metabolite | OR (95% CI)^a | P^b |
|---|---|---|---|---|
| 1 | 13-methylmyristic acid | 96 | 0.34 (0.17-0.67) | .0007 |
| 2 | Methyl palmitate (15 or 2) | 100 | 0.31 (0.14-0.66) | .0014 |
| 3 | 17-methylstearate | 94 | 0.34 (0.17-0.70) | .0017 |
| 4 | 10-undecenoate (11:1n1) | 100 | 0.34 (0.15-0.76) | .0062 |
| 5 | Leucyl-leucine | 99 | 0.42 (0.22-0.83) | .0089 |
| 6 | 2-hydroxystearate | 100 | 0.39 (0.18-0.83) | .012 |
| 7 | Margarate | 94 | 0.42 (0.21-0.85) | .013 |
| 8 | Scyllo-inositol | 97 | 0.45 (0.22-0.91) | .021 |
| 9 | Stearate | 100 | 0.42 (0.19-0.91) | .023 |
| 10 | Pentadecanoate | 100 | 0.44 (0.20-0.93) | .026 |
| 11 | 2-hydroxypalmitate | 100 | 0.45 (0.22-0.93) | .027 |
| 12 | Fumarate | 98 | 0.50 (0.26-0.96) | .028 |
| 13 | Stearidonate (18:4n3) | 91 | 0.41 (0.18-0.93) | .029 |
| 14 | N-acetylserine | 85 | 2.63 (1.07-6.45) | .030 |
| 15 | Glutaroyl carnitine | 97 | 0.46 (0.22-0.95) | .030 |
| 16 | Glycerate | 100 | 0.43 (0.20-0.94) | .030 |
| 17 | N-acetylneuraminate | 87 | 2.46 (1.03-5.89) | .038 |
| 18 | Xanthine | 96 | 2.19 (1.01-4.73) | .041 |
| 19 | Phenol sulfate | 100 | 1.98 (0.97-4.00) | .054 |
| 20 | Myristate | 100 | 0.49 (0.23-1.03) | .055 |

^a Adjusted for age, BMI (continuous), and study center.
^b Bonferroni-corrected P value threshold for significance: 0.05/(278 × 3) = 0.00006.
Table 4. Odds Ratios (OR) and 95% Confidence Intervals (CI) for Diagnosis of Colorectal Cancer Comparing Women in the 90th and 10th Percentiles for Each of the Top 20 Metabolites, Using Conditional Logistic Regression

| Rank | Metabolite | % of Individuals With This Metabolite | OR (95% CI)^a | P^b |
|---|---|---|---|---|
| 1 | Glycochenodeoxycholate | 95 | 5.34 (2.09-13.68) | .0001 |
| 2 | Pantothenate | 100 | 0.32 (0.14-0.74) | .0046 |
| 3 | Cyclo(phe-phe) | 81 | 3.69 (1.37-9.94) | .0064 |
| 4 | Taurochenodeoxycholate | 87 | 3.63 (1.35-9.78) | .0069 |
| 5 | Pyridoxate | 100 | 0.39 (0.19-0.81) | .0069 |
| 6 | Glycocholate | 87 | 3.62 (1.33-9.87) | .0077 |
| 7 | Taurolithocholate-3-sulfate | 85 | 3.30 (1.31-8.34) | .0084 |
| 8 | Taurodeoxycholate | 88 | 3.22 (1.27-8.19) | .0096 |
| 9 | Glycylvaline | 98 | 2.57 (1.14-5.81) | .018 |
| 10 | Glycoursodeoxycholate | 83 | 3.15 (1.12-8.90) | .025 |
| 11 | Pyroglutamine | 100 | 0.40 (0.17-0.92) | .027 |
| 12 | Leucyl-leucine | 100 | 0.45 (0.21-0.95) | .029 |
| 13 | Glycolate-hydroxyacetate | 97 | 2.16 (1.06-4.42) | .029 |
| 14 | Pyroglutamylglycine | 81 | 2.70 (1.04-7.00) | .034 |
| 15 | Dihomo-linolenate | 100 | 0.42 (0.19-0.96) | .035 |
| 16 | Glycolithocholate sulfate | 96 | 2.21 (1.04-4.70) | .035 |
| 17 | N-acetylalanine | 100 | 0.43 (0.19-0.97) | .036 |
| 18 | 1-palmitoyl glycerophosphoethanolamine | 100 | 2.18 (1.02-4.69) | .037 |
| 19 | Succinate | 92 | 0.47 (0.22-1.00) | .041 |
| 20 | 2-hydroxybutyrate (AHB) | 100 | 0.45 (0.21-0.99) | .042 |

^a Adjusted for age, BMI (continuous), and study center.
^b Bonferroni-corrected P value threshold for significance: 0.05/(278 × 3) = 0.00006.

With the exception of leucyl-leucine, the 20 metabolites with the lowest P values in men (Table 3) were different from the 20 metabolites with the lowest P values in women (Table 4). Among men, no metabolites were significantly associated with colorectal cancer. In women, glycochenodeoxycholate was positively associated with colorectal cancer (OR = 5.34; 95% CI = 2.09-13.68; P = .0001), although this did not withstand stringent correction for multiple testing. Glycochenodeoxycholate was not associated with cancer risk among men (OR = 1.16; 95% CI = 0.58-2.33; P = .67). The associations between glycochenodeoxycholate, a bile acid–glycine conjugate, and colorectal cancer did not materially change when the models were also adjusted for history of gallbladder disease or, among women, for hormone therapy use. Exploratory analyses did not reveal any distinct differences in the associations between metabolites and colorectal cancers diagnosed in the first 8 years of follow-up versus those diagnosed after 8 years of follow-up.

Examining metabolic pathways did not reveal any associations among men. Five metabolic pathways were statistically significant at the P < .05 level among women, including bile acid metabolism (P = .012), tocopherol metabolism (P = .022), glutamate metabolism (P = .027), Krebs cycle (P = .032), and dipeptides (P = .043); however, after correcting for multiple testing for 44 different pathways, none of these remained statistically significant.

Global tests of association, combining information across metabolites, similarly suggested that metabolomic profiles were not associated with case status. None of the top 10 principal components were significantly associated with case status (all likelihood ratio test P values > .2). The classification error rate using random forests was 0.497 (P = .483), and the AUC based on the resulting probabilities was 0.521 (P = .405).


Using an agnostic metabolomics approach to investigate serum biomarkers, we found no overall association between serum metabolites and colorectal cancer. A suggestive inverse association was evident between serum leucyl-leucine and colorectal cancer in analyses combining both sexes and within each sex, but none of these associations reached the Bonferroni-corrected level of statistical significance. Serum glycochenodeoxycholate was positively associated with colorectal cancer among women, but this association did not quite reach statistical significance after stringent correction for multiple testing.

For many years, it has been suspected that bile acids play a role in colorectal carcinogenesis through a variety of mechanisms, including apoptosis, tumor promotion, and oxidative stress.[19, 20] A previous study reported that serum levels of bile acids were higher in patients with colon cancer than in healthy controls.[21] Glycochenodeoxycholate is an acyl glycine and a bile acid–glycine conjugate;[22] it is formed in the liver by conjugation of the primary bile acid chenodeoxycholate with glycine and reaches the intestine in bile. Experimental studies have shown that glycochenodeoxycholate generates reactive oxygen species,[23] and it has been positively associated with hepatocellular carcinoma.[24] Its role as a bile acid lends biological credence to the association that we observed between this metabolite and colorectal cancer risk; it is not clear, however, why this suggestive association was only evident among women in our study.

The only association evident in both sexes was an inverse association between serum leucyl-leucine, a dipeptide, and colorectal cancer, although it was not statistically significant after Bonferroni correction. Leucyl-leucine is a product of incomplete protein breakdown and, to our knowledge, has not previously been studied in relation to carcinogenesis.

Although many of the associations in our study were not statistically significant after strict Bonferroni correction for multiple comparisons, we did identify some associations that have been reported previously, albeit not in prospectively collected biospecimens. Serum myristate was inversely associated with colorectal cancer in a previous study,[25] and it was among the 20 metabolites with the lowest P values in men in our study (OR = 0.49; 95% CI = 0.23-1.03; P = .055); myristate is found in nutmeg, oils, and fats. Although in a different biospecimen type, tissue fumarate was previously inversely associated with colorectal cancer,[26] and serum fumarate was also inversely associated with colorectal cancer risk in our study. In addition, urinary succinate was previously inversely associated with colorectal cancer;[27] among women in our study, succinate was also one of the metabolites with the lowest P values and was inversely associated with risk. Although fumarate is also used as a food additive, both fumarate and succinate are involved in energy production as intermediates in the Krebs cycle.

Previous studies have investigated metabolic profiles in relation to colorectal cancer, but not in prospectively collected samples; instead, they studied diagnosed cases to investigate potential screening or diagnostic markers. A study in Denmark analyzed serum metabolites in metastatic colorectal cancer cases (153 cases and 139 controls) and investigated markers of overall survival,[28] and several other studies have compared serum metabolic profiles of colorectal cancer cases and controls, including 2 small studies (one of 64 cases[25] and another of 60 cases[29]) and a larger study of 222 cases.[30] In addition to serum, studies have analyzed other types of biospecimens in relation to colorectal cancer, including fecal water extracts (among 21 cases),[31] urine samples (before and after surgery in 24 cases,[32] and in another study of 64 cases[27]), and tumor tissue (all studies had fewer than 32 cases).[26, 33-35] All of these previous studies collected biospecimens after colorectal cancer diagnosis; therefore, their results may reflect the effects of the disease itself, as well as of any treatment, including bowel preparation before endoscopy and biopsy. Our study, on the other hand, was designed to prospectively investigate potential markers of exposure before the onset of disease. We did conduct exploratory analyses to determine whether cases diagnosed during the first 8 years of follow-up were associated with different metabolites than those diagnosed after the first 8 years, but we did not identify any distinct differences. Nevertheless, future studies should consider the length of follow-up time from blood collection to cancer diagnosis and the effect this may have on the metabolite–cancer association.

There were several important limitations to our study. First, we measured metabolites in a single serum sample, which cannot account for day-to-day variability in metabolite levels, a source of error that could attenuate associations, and cannot capture past or lifetime exposures. A previous study highlighted that most metabolites have moderate variability over time and that additional sample collections would improve exposure assessment.[18] Second, the serum samples were not fasting samples; previous studies have shown that 20% to 50% of metabolites can be affected by fasting status. For some metabolites the effect of fasting status is large, although for most metabolites the effects are modest.[18, 36, 37] Third, our analysis was limited to metabolites captured by the technology we used; other technologies, such as nuclear magnetic resonance, may have captured different metabolites. In addition, our endpoint is potentially heterogeneous, as some studies have suggested that there may be distinct molecular subtypes of colorectal cancer, characterized by microsatellite instability or the CpG island methylator phenotype.[38] Furthermore, the subsite of the tumor may provide distinct etiologic information; unfortunately, we had too few cases within each subsite to investigate this, but future studies would benefit from considering tumor location. Finally, because of the relatively small sample size, particularly for the sex-stratified analyses, our findings could be due to chance.

Despite the limitations, our study has many strengths. A major strength is the large number of metabolites assayed in a relatively large sample size, which allowed us to investigate many associations in a population that is generalizable to other US populations. In addition, our study was based on prospectively collected serum samples and incident cases of colorectal cancer, reducing the chances of metabolic perturbations as a result of underlying disease. Finally, the laboratory produced good technical ICCs for the metabolites measured.

In conclusion, analyzing metabolites in nonfasting serum from a single point in time did not unveil any associations with colorectal cancer that withstood Bonferroni correction for multiple testing when both sexes were combined. Among women only, serum glycochenodeoxycholate was positively associated with colorectal cancer, but after correcting for multiple comparisons for each sex, the association did not quite reach statistical significance. These findings for colorectal cancer should be investigated in additional studies; furthermore, it would be informative to conduct metabolomics analyses of colorectal adenomas to compare with the data for colorectal cancer to shed light on potentially important metabolic pathways involved in the adenoma-carcinoma sequence.


This work was supported by the Intramural Research Program of the National Cancer Institute, National Institutes of Health, Department of Health and Human Services.


The authors made no disclosures.