Plasma metabolites associated with colorectal cancer: A discovery‐replication strategy

Colorectal cancer is known to arise from multiple tumorigenic pathways; however, the underlying mechanisms remain incompletely understood. Metabolomics is becoming an increasingly popular tool in assessing biological processes. Previous metabolomics research focusing on colorectal cancer has been limited by sample size and has not replicated findings in independent study populations to verify their robustness. Here, we performed an ultrahigh performance liquid chromatography‐quadrupole time‐of‐flight mass spectrometry (UHPLC‐QTOF‐MS) screening on EDTA plasma from 268 colorectal cancer patients and 353 controls using independent discovery and replication sets from two European cohorts (ColoCare Study: n = 180 patients/n = 153 controls; the Colorectal Cancer Study of Austria (CORSA): n = 88 patients/n = 200 controls), aiming to identify circulating plasma metabolites associated with colorectal cancer and to improve knowledge regarding colorectal cancer etiology. Multiple logistic regression models were used to test the association between disease state and metabolic features. Features that were statistically significantly associated in the discovery set were taken forward and tested in the replication set to assure robustness of our findings. All models were adjusted for sex, age, BMI and smoking status and corrected for multiple testing using False Discovery Rate. Demographic and clinical data were abstracted from questionnaires and medical records.

What's new? Colorectal cancer exhibits certain characteristic changes in metabolic pathways. To expand upon previous findings, these authors performed a discovery-replication study using two large independent study populations from different countries, Germany and Austria. They tested metabolic profiles of cancer patients and controls, identifying 691 statistically significant features in the discovery cohort. Testing the second cohort narrowed it to 97. These corresponded to 28 metabolites, of which 15 could be identified. It will be useful to go forward with prospective analysis on these 15 metabolites, to determine whether they have predictive or prognostic value.

Background
Colorectal cancer is a major public health concern worldwide, with 1.4 million new cases and an estimated 700,000 deaths annually. 1 Colorectal cancer is characterized by a distinct metabolic phenotype and changes in key metabolic pathways such as glycolysis or the tricarboxylic acid (TCA) cycle. 2,3 Yet, underlying mechanisms involved in colorectal carcinogenesis are still unclear. 4
Metabolomics is a powerful approach to unravel metabolic changes associated with disease and is gaining momentum in the field of cancer epidemiology. [5][6][7] Compared to other "-omics" techniques, metabolomics is more closely related to a measured clinical phenotype and is increasingly applied as the method of choice to screen for potential metabolites associated with disease status. 8,9 Moreover, metabolomics can help to understand the underlying etiology of cancer development. 10 Differences in metabolic profiles have been reported between colorectal cancer patients and colorectal cancer-free individuals using nuclear magnetic resonance techniques, 11 gas chromatography, [12][13][14][15] and liquid chromatography-mass spectrometry methods. 16 Various amino acids, such as aspartic acid, have been shown to be more abundant in cases in several relatively small studies, including a study by Nishiumi and colleagues comparing serum metabolite levels of 60 colorectal cancer patients and 60 healthy volunteers using gas-chromatography time-of-flight (TOF) mass spectrometry. 13 Similarly, a study by Denkert et al. examined metabolic profiles in colon tissue and normal mucosa samples of 27 colorectal cancer patients and 18 colorectal cancer-free individuals. 15 In addition to amino acids, serum taurine was shown to be more abundant among colorectal cancer patients compared to colorectal cancer-free individuals. Another study among 101 newly diagnosed colorectal cancer patients reported clear differences in serum glutamine, fatty acids, and urea and TCA cycle metabolites compared to 102 colorectal cancer-free controls. 17 The majority of these previous studies have been limited by sample size and did not replicate their findings in independent study populations.
As metabolomics studies often identify a wide range of metabolites due to the variety of analytical platforms, clinical protocols, and sample handling procedures used, leveraging an independent population for replication using the same platform and similar protocols is essential to ensure robustness of findings. To date, only a few studies have used a discovery-replication design to reproduce results in independent study populations. 16,18,19 Two of these studies investigated metabolic differences between colorectal cancer patients and apparently healthy individuals; 16,18 a third study evaluated metabolomic differences between matched tumor and healthy colon tissue samples from colorectal cancer patients. 19 In addition, a very recent study investigating metabolic profiles in adenomas, colorectal cancer cases and controls conducted analysis in two datasets utilizing different metabolomic approaches, but with both sample sets deriving from the same hospital and cohort. 20 To complement current research, we combined untargeted metabolomics analysis, able to reveal (novel) metabolites, with a rigorous discovery-replication design leveraging samples from two independent study populations and relatively large sample sizes to obtain sufficient statistical power. The overall purpose of our study was to discover and replicate plasma metabolites associated with colorectal cancer to improve knowledge regarding potential disease etiology.

Study populations
We utilized data from two cohort studies embedded in the MetaboCCC Consortium, a consortium of four independent European cohorts to investigate metabolic profiles across the continuum of colorectal carcinogenesis: (1) the Heidelberg site of the international ColoCare Study (ClinicalTrials.gov Identifier: NCT02328677) and (2) the Colorectal Cancer Study of Austria (CORSA). The CORSA and ColoCare studies were selected given the availability of samples from colorectal cancer patients as well as controls. EDTA plasma samples from 621 participants were analyzed, consisting of 268 patients with newly diagnosed colorectal cancer and 353 controls. We applied independent discovery (ColoCare Study: n = 180 patients/n = 153 controls) and replication (CORSA Study: n = 88 patients/n = 200 controls) sets using an identical metabolomics platform (Supporting Information Fig. S1).
The ColoCare Study, initiated in Heidelberg in 2010, is an ongoing, international, multicenter prospective study including women and men newly diagnosed with primary colorectal cancer. Patients are recruited at the University Hospital of Heidelberg and the National Center for Tumor Diseases in Heidelberg, Germany. Participants provided consent prior to tumor resection if they met the following inclusion criteria: newly diagnosed colorectal cancer (both colon (ICD-10 C18) and rectal or recto-sigmoidal cancer (ICD-10 C19/C20)), any stage of the disease, 18+ years at the time of diagnosis, and German-speaking. EDTA blood samples from colorectal cancer patients were collected prior to surgery. Control participants were enrolled in the PRAEVENT Study, a population-based study subjected to similar protocols and procedures, conducted at the National Center for Tumor Diseases in Heidelberg, Germany. All participants consented to take part in our study and EDTA blood samples were collected during a visit at the National Center for Tumor Diseases at recruitment (usually the same day after the consent dialog and after signing the informed consent form).
In the ongoing CORSA Study, participants have been recruited since 2003 in cooperation with the province-wide screening project "Burgenland Prevention Trial of Colorectal Disease with Immunological Testing" (B-PREDICT). All inhabitants of the Austrian province Burgenland aged between 40 and 80 years are invited annually to participate in fecal occult blood testing. Individuals with a positive fecal occult blood test are subsequently offered a complete colonoscopy, and EDTA blood samples are collected prior to examination. Additional colorectal cancer patients are recruited at the General Hospital of Vienna (Department of Surgery), and at three additional hospitals in Vienna. All colorectal cancer patients included in the CORSA Study are individuals with histologically confirmed, sporadic colorectal cancer. CORSA controls are individuals who received a complete colonoscopy within the B-PREDICT screening but exhibited no pathological findings of disease.
All colorectal cancer samples selected for inclusion in the present study were collected prior to any clinical treatment, including surgery or neo-adjuvant therapy, from patients without a prior history of cancer. Controls included in the study can be considered "cancer-free", having no prior history of cancer.
Of the patients and controls, 95% were of Caucasian origin. Participants were recruited within the last 15 years and selected to match on recruitment time point. Clinical data, including tumor location, staging, and treatment history, were abstracted from medical records. Demographic characteristics (e.g. age, weight, height and smoking status) were assessed by study-specific questionnaires. All clinical and demographic data were harmonized across all cohorts.

Sample collection and analysis
In both cohorts, nonfasted EDTA blood samples were collected and processed within 4 h, according to identical processing protocols, and stored at −80 °C. Samples from each study site were shipped on dry ice to the International Agency for Research on Cancer (IARC) in Lyon, France for analysis. Samples were analyzed with an ultrahigh performance liquid chromatography-quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MS) system (Agilent Technologies) consisting of a 1290 Binary LC system, a Jet Stream electrospray ionization (ESI) source, and a 6550 QTOF mass spectrometer. Samples from each study center were analyzed in cohort-specific batches, which consisted of five and six 96-well plates for CORSA and ColoCare, respectively.
A detailed overview of the sample preparation and a complete description of sample analysis by UHPLC-QTOF-MS and pre-processing of metabolomics data can be found in Supporting Information File S1. A summary of the data processing workflow is shown in Supporting Information Figure S1.

Data analysis
Features with missing values in >50% of either colorectal cancer patient or control samples in both populations were excluded from analysis. The remaining maximum 50% of missing values were not imputed according to the recommendations of Di Guida et al. 21 Blank adjustment was applied for the ColoCare and CORSA samples separately; features that had a minimum relative mean intensity below the relative mean intensity of blank samples were removed. "Features" were defined as chromatographic peaks formed by specific ions, while "compounds" or "metabolites" referred to a confirmed molecule that can consist of one or more features (adducts, clusters and fragments).
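The missing-value filter described above can be illustrated with a minimal Python sketch. It assumes one reading of the rule, namely that a feature is dropped when more than 50% of its values are missing in either the patient or the control group; the function names and the use of `None` to mark a missing intensity are hypothetical and are not taken from the study's pipeline.

```python
from typing import Optional, Sequence


def missing_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of samples in which the feature was not detected (None)."""
    if not values:
        return 1.0  # a group with no samples carries no information
    return sum(v is None for v in values) / len(values)


def keep_feature(
    patients: Sequence[Optional[float]],
    controls: Sequence[Optional[float]],
    max_missing: float = 0.5,  # assumed threshold: exclude if > 50% missing
) -> bool:
    """Keep a feature only if neither group exceeds the missingness threshold."""
    return (
        missing_fraction(patients) <= max_missing
        and missing_fraction(controls) <= max_missing
    )
```

Under this reading, a feature with exactly 50% missing values in one group is retained, and the remaining missing values are left as they are rather than imputed.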
Feature intensities were log transformed using the natural logarithm prior to statistical analysis, to reduce heteroscedasticity. 21,22 Demographic and clinical characteristics are presented as medians with the interquartile range (IQR), or as numbers with corresponding percentages. Body mass index (BMI) was calculated as weight (kg) divided by the square of height (m²). BMI status was categorized based on the recommendations from the World Health Organization (WHO): underweight (<18.5 kg/m²), normal weight (18.5-24.9 kg/m²), overweight (25.0-29.9 kg/m²) and obese (≥30.0 kg/m²). Smoking status was categorized as current, former, and never.
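As an illustration of the BMI calculation, the WHO categories, and the natural-log transformation described above, a short Python sketch follows. The function names are hypothetical, and the handling of values between the listed category bounds (for example a BMI of 24.95) is an assumption, since the text gives the bounds to one decimal place only.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def who_bmi_category(bmi_value: float) -> str:
    """WHO BMI categories as listed in the text (cut-offs 18.5, 25.0, 30.0)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


def log_intensity(intensity: float) -> float:
    """Natural-log transform of a (strictly positive) feature intensity."""
    return math.log(intensity)
```

For example, a participant weighing 70 kg with a height of 2.0 m has a BMI of 17.5 kg/m² and would be classified as underweight.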
Discovery stage. The discovery analysis was conducted in ColoCare samples. Log standardized odds ratios (OR.std) and 95% confidence intervals (CIs) were calculated using multiple logistic regression models with disease state as the dependent variable to test the association with feature intensities. The OR.std represents the change in colorectal cancer occurrence when there is a one standard deviation (SD) change in metabolite intensity, allowing comparison of effect sizes between different features. The SD of the controls was used to calculate the OR.std. Sex, age, BMI (continuous), and smoking status were included as covariates in the final model. Features that showed significant differences between colorectal cancer patients and controls in the discovery stage, after correction for multiple testing using the False Discovery Rate (FDR), were carried forward to the replication stage. A priori, an FDR p-value <0.05 was considered statistically significant.
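The two calculations in the discovery stage can be sketched in Python under stated assumptions. First, the OR.std is taken here to be exp(beta × SD of controls), where beta is the per-unit logistic regression coefficient for the log-transformed intensity; the text does not spell out this formula. Second, the FDR correction is assumed to be the Benjamini-Hochberg procedure, which the text does not name. Function names are hypothetical.

```python
import math
from statistics import stdev
from typing import List, Sequence


def standardized_odds_ratio(beta: float, control_intensities: Sequence[float]) -> float:
    """Odds ratio per one-SD change in intensity, using the SD of the controls.

    Assumes beta is the log-odds change per unit of log intensity.
    """
    return math.exp(beta * stdev(control_intensities))


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def carried_forward(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of features whose FDR-adjusted p-value is below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < alpha]
```

In this sketch, `carried_forward` mirrors the rule that only features with an FDR-adjusted p-value below 0.05 proceed to the replication stage.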
Replication stage. The replication stage was conducted in CORSA Study samples. Significant features (FDR p < 0.05) from the discovery stage were analyzed in the replication stage using the same modeling approach as in the discovery stage. Features were tested for whether their effects pointed in the same direction as the corresponding effects in the discovery stage (one-sided testing). Analyses were checked for any influence of analytical batch, but no marked effect could be identified in either stage. Features with significant test results were selected for identification using authentic chemical standards at IARC. A detailed overview of metabolite identification is provided in Supporting Information Table S1. When more than one mass spectrometry feature corresponded to a metabolite, the feature with the highest intensity was selected and presented in the manuscript (Supporting Information Table S2).
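The directional (one-sided) test in the replication stage can be illustrated as follows. This sketch assumes a Wald z statistic (coefficient divided by its standard error) from the replication model and a standard normal reference distribution; the text does not state which test statistic was used, and the function name is hypothetical.

```python
import math


def one_sided_p(z_replication: float, discovery_beta: float) -> float:
    """One-sided p-value for an effect in the direction seen in discovery.

    z_replication: Wald z statistic of the feature in the replication model.
    discovery_beta: coefficient of the same feature in the discovery model;
        only its sign is used.
    """
    if discovery_beta == 0:
        raise ValueError("discovery effect has no direction")
    if discovery_beta > 0:
        # Upper tail: P(Z >= z) under the standard normal distribution.
        return 0.5 * math.erfc(z_replication / math.sqrt(2.0))
    # Lower tail: P(Z <= z) under the standard normal distribution.
    return 0.5 * math.erfc(-z_replication / math.sqrt(2.0))
```

A replication effect pointing in the opposite direction to the discovery effect yields a p-value above 0.5 and therefore cannot be declared replicated.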
Spearman correlation analysis was used to identify metabolite-metabolite correlations among all identified metabolites and to understand the interrelation of metabolites. To account for deviations from linearity, Spearman correlation coefficients were calculated for all pairs of annotated features in samples from the discovery and replication sets. All statistical analyses were performed in R, version 3.3.3. 23
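For readers unfamiliar with the rank-based coefficient, a minimal Python sketch of Spearman correlation is given below: values are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. This is an illustration only; the analyses in the study were performed in R, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between two metabolites, linear or not, gives a coefficient of 1 or −1.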

Participant characteristics
Characteristics of the study population are summarized in Table 1. The ColoCare cohort consisted of 63% men in the colorectal cancer group and 38% men in the control group. In addition, fewer ColoCare controls than colorectal cancer patients were classified as overweight. The CORSA cohort consisted of 68% men in the colorectal cancer group and 65% men in the control group. Slightly more controls than colorectal cancer patients in the CORSA cohort were categorized as overweight.
In both cohorts, control groups included more never smokers than the colorectal cancer patient groups. In general, the distributions of covariates were relatively comparable between the discovery and replication cohorts. Controls from the ColoCare cohort were 13 years younger than controls from the CORSA cohort. The majority of participants had a BMI classified as overweight, except for the controls from ColoCare.

Metabolic profiles discriminating between colorectal cancer patients and controls
Metabolomics analysis yielded 10,015 mass spectrometry features (each defined as a chromatographic peak formed by specific ions) that were identified across all study samples. After data pre-processing, 1,156 and 1,148 features were carried forward for ColoCare and CORSA samples, respectively.
Next, 691 out of 1,156 features were found to be statistically significantly associated with disease state (discovery stage) after FDR correction and adjustment for age, sex, BMI, and smoking status. The 691 significant features were subsequently analyzed in the replication dataset, i.e. the CORSA Study samples. Of these features, 97 differed between CORSA patients and controls.

Correlation analysis
Spearman correlation analysis was used to identify potential metabolite-metabolite correlations among all identified metabolites (Fig. 2). Correlation patterns were similar across the discovery (Fig. 2a) and replication stage (Fig. 2b). For both stages, all LysoPCs were positively correlated (Spearman correlation coefficient [r_s] range: 0.40-0.91) but showed only a weak correlation to LysoPE (22:6) and LysoPE (20:4). Valine and leucine were highly correlated (discovery stage r_s: 0.73, replication stage r_s: 0.78). In addition, the majority of replicated compounds annotated as unknown (n = 13) were correlated with each other but showed only weak correlations with the other annotated compounds. Spearman correlation coefficients are shown in Supporting Information Table S3.

Discussion
In our study, we identified plasma metabolites that are associated with colorectal cancer and which were replicated in an independent study population. We found 28 metabolites associated with disease state in two independent study cohorts, the ColoCare and CORSA studies. In total, 15 out of 28 metabolites could be identified. Taurine, hypoxanthine, valine, leucine, LysoPCs, and LysoPEs have been reported to be linked with colorectal cancer in previous metabolomics studies. All LysoPCs were positively correlated, valine and leucine were highly correlated, and the majority of unidentified metabolites were correlated with each other. Except for valine and leucine, the identified metabolites were only slightly or not correlated with each other.
Taurine was previously shown to be increased in serum of 60 colorectal cancer patients compared to 60 apparently healthy individuals 13 and in tumor tissue of 16 colorectal cancer patients, 24 which is in agreement with our findings. Recent studies have suggested taurine as a microbiota-associated metabolite playing a mediating role in microbiome-host interactions. 25,26 Given the knowledge that gut microbiota differ between colorectal cancer patients and healthy individuals, and that microbial composition is linked to colorectal cancer risk, 27 taurine presents a promising candidate for further investigation.
Hypoxanthine has been previously reported to be increased in tumor tissue of colorectal cancer patients compared to normal tissue of healthy individuals. 15 In contrast, a recent study [...]. 20 Like taurine, 5 hypoxanthine is an antioxidant, and the increased levels reported in our study may be the result of increased oxidative stress, 28 which is recognized as an important process in carcinogenesis, including colorectal carcinogenesis. 29,30 Inconsistent findings in hypoxanthine levels may be due to the type of specimen analyzed, or to a lack of statistical power because of smaller sample sizes. Furthermore, the inconsistent hypoxanthine levels may be caused by red blood cell hemolysis during the preparation of the serum samples utilized in the Long study, in contrast to the plasma used in the present analysis. 31
With respect to branched-chain amino acids (BCAAs), we observed that valine was reduced among colorectal cancer patients compared to controls. This result is consistent with two prior studies: Ma and colleagues compared serum of 30 colorectal cancer patients to 30 colorectal cancer-free controls, 12 and Farshidfar et al. investigated metabolomic signatures in colorectal cancer serum of stage I-IV patients. 14 As with valine, decreased plasma levels of leucine were also observed in our colorectal cancer patients compared to controls. Decreased blood levels of BCAAs could reflect an increased requirement for amino acids due to the high protein turnover in the malignant setting. 15,19,32
Moreover, seven LysoPCs were detected at lower levels among colorectal cancer patients compared to controls. LysoPC (16:0) and LysoPC (18:0) were previously reported to be lower in the plasma of colorectal cancer patients versus control individuals. 33,34 There seems to be a general trend of lower levels of LysoPCs among colorectal cancer patients in existing studies, 17,33,35 which is in line with the findings reported in our study.
This pattern might reflect an increased degradation rate of LysoPCs as a result of the accelerated cell proliferation rate of cancerous cells. 36 It has been suggested that decreased levels of LysoPCs could result from weight loss and possibly inflammatory processes related to cancer. 37,38 While the majority of our study participants were classified as overweight, we did not have data on changes in body weight among patients prior to a colorectal cancer diagnosis.
LysoPE (20:4) and LysoPE (22:6) were increased in colorectal cancer patients compared to controls. LysoPEs belong to the group of signaling lipids and are constituents of cell membranes. Recently, serum LysoPEs were found to be elevated among breast cancer patients. 39 However, knowledge is limited regarding the role of LysoPEs in healthy and diseased individuals.
We also identified a notable decrease in 1-methylnicotinamide (MNA), an inactive metabolite of nicotinamide, 40 among colorectal cancer patients compared to controls. MNA has been reported in vivo to be involved in the COX-2/PGI2 pathway, 40 which plays a major role in inflammation and colorectal carcinogenesis. 41,42
In addition, this is the first metabolomics study to report lower plasma bilirubin levels in colorectal cancer patients compared to controls. Previously, a European study analyzing genomic alterations in promoter variants involved in bilirubin homeostasis, and another study investigating serum bilirubin levels in a large U.S. population, have proposed a protective effect of bilirubin against colorectal carcinogenesis; 43,44 our metabolomics findings cautiously support this hypothesis. The underlying mechanisms of the relationship between bilirubin and colorectal cancer remain unclear.
Table footnotes: 1 Controls are defined as individuals not diagnosed with any colorectal malignancy. 2 According to MSI. 3 Log transformed relative intensity values. 4 OR.std: standardized Odds Ratio, represents the relative change in colorectal cancer (CRC) risk when there is a one standard deviation (SD) change in metabolite intensity. OR.std is based on the SD of the controls.
Untargeted metabolomics is an elegant approach for the discovery of metabolites associated with cancer. However, one may wonder whether the seemingly small differences between colorectal cancer patients and controls are biologically relevant. It is important to keep in mind that the findings presented are log transformed relative values.
As a consequence, the reported results indicate only the direction of each association, and quantification of the metabolites is needed to interpret absolute differences. Our results for taurine, hypoxanthine, valine, leucine, bilirubin, and 1-methylnicotinamide call for future research to investigate the underlying biological mechanisms of these metabolites in relation to colorectal cancer.
A strength of the present study is the use of a discovery-replication design leveraging two independent, relatively large, patient cohorts, both including patients of Caucasian origin, from two different countries. In general, untargeted methods typically yield data with high amounts of noise and nonbiological information. 45 This makes replication of untargeted metabolomics findings within ethnically homogeneous cohorts extremely valuable, as it enables the exclusion of features that are not robustly associated with the case-control status.
A limitation of our study is that, due to recruitment procedures, we tend to have more early stage colorectal cancer cases (stage I-II) than patients with advanced metastatic disease (stage IV). This may indicate that our findings are mostly associated with early metabolic changes in colorectal carcinogenesis rather than with metastasis formation. Furthermore, findings are derived from cross-sectional data. Therefore, it is not possible to explore to what extent metabolites are causally related to cancer or cancer-related changes. Lastly, although our study was performed using a single stringent metabolomics approach across two independent populations, we acknowledge that metabolomics assays can be conducted using a variety of analytical platforms. As such, future studies should include multiple platforms to ensure the highest analytical coverage of the metabolome. Technical progress and the development of more comprehensive metabolite databases will also be needed to improve annotation of unknown compounds, including the unknown metabolites in our study. Future targeted approaches would allow quantification of absolute metabolite concentrations. 46,47
In summary, our study provides new evidence of associations of colorectal cancer with plasma metabolites and also confirms some previous findings.
The combination of an untargeted metabolomics approach and a rigorous discovery-replication design utilizing large sample sizes from independent cohorts led to the identification and replication of 28 metabolites associated with colorectal cancer, including 15 metabolites that could be identified. These 15 identified metabolites should be carried forward as candidates for targeted analysis in prospective cohort studies, preferably derived from a colorectal cancer screening program, to verify their discriminating or potential predictive properties. Our study provides important leads for further studies focusing on metabolic differences between colorectal cancer-free individuals and patients with different stages of colorectal cancer. Together, our findings emphasize the power of metabolomics as a strong molecular approach for gaining novel insights regarding metabolic changes associated with colorectal cancer.