Salivary metabolomics with machine learning for colorectal cancer detection

As the worldwide prevalence of colorectal cancer (CRC) increases, it is vital to reduce its morbidity and mortality through early detection. Saliva‐based tests are an ideal noninvasive tool for CRC detection. Here, we explored and validated salivary biomarkers to distinguish patients with CRC from those with adenoma (AD) and healthy controls (HC). Saliva samples were collected from patients with CRC, AD, and HC. Untargeted salivary hydrophilic metabolite profiling was conducted using capillary electrophoresis–mass spectrometry and liquid chromatography–mass spectrometry. An alternative decision tree (ADTree)‐based machine learning (ML) method was used to assess the discrimination abilities of the quantified metabolites. A total of 2602 unstimulated saliva samples were collected from subjects with CRC (n = 235), AD (n = 50), and HC (n = 2317). Data were randomly divided into training (n = 1301) and validation datasets (n = 1301). The clustering analysis showed a clear consistency of aberrant metabolites between the two groups. The ADTree model was optimized through cross‐validation (CV) using the training dataset, and the developed model was validated using the validation dataset. The model discriminating CRC + AD from HC showed area under the receiver‐operating characteristic curves (AUC) of 0.860 (95% confidence interval [CI]: 0.828‐0.891) for CV and 0.870 (95% CI: 0.837‐0.903) for the validation dataset. The other model discriminating CRC from AD + HC showed an AUC of 0.879 (95% CI: 0.851‐0.907) and 0.870 (95% CI: 0.838‐0.902), respectively. Salivary metabolomics combined with ML demonstrated high accuracy and versatility in detecting CRC.


| INTRODUC TI ON
Despite the advances in cancer diagnosis and management in the last decade, CRC still represents a significant global health burden.
Overall, CRC ranks third in cancer morbidity and second in mortality among all cancers worldwide. 1,2 The prevalence of CRC is strongly associated with the westernization of eating and health habits.
Furthermore, it is expected to increase further in developed countries with remarkable economic growth. 1,3,4 Therefore, cancer detection is an important issue in CRC diagnosis and treatment.
Fecal occult blood tests are the most commonly used CRC screening tests in Japan. Although these tests have contributed to the reduction in the mortality rate associated with CRC, 5,6 their limited sensitivity for early-stage precancerous lesions, such as AD or CRC, indicated the need for improvement. 7 In addition, a large proportion of the at-risk population is still often detected in advanced stages. 8 Currently used blood-based biomarkers for CRC, such as CEA and cancer antigen 19-9 (CA19-9), are suitable for surveillance or prognostic indicator in CRC treatment but are unsuitable for screening or diagnosing CRC due to their low sensitivity and specificity, as well as the association with other types of gastrointestinal cancers, including gastric cancer, pancreatic cancer, or gynecological cancer such as ovarian cancer. 9 Therefore, developing a convenient novel screening method with higher sensitivity and specificity is paramount.
Frequently mutated genes have been identified in association with CRC, including APC, CTNNB1, KRAS, BRAF, and SMAD4. 10 The epigenetic variation in CRC changes the hyper-and hypomethylation, which inactivates the tumor suppressor genes and activates oncogenes, leading to epithelial cell growth to cancerous tumor formulation. 11 In addition to these genetic changes, malignant cancers, including CRC, have shown drastic metabolic shifts. For instance, regardless of oxygen availability, tumor cells activate the glycolysis pathway to produce adenosine triphosphate (Warburg effect). 12 In addition, oxidative phosphorylation upregulation has been reported in several cancers. 13 Glutamine is used as a carbon source alternative to glucose via the TCA in proliferating tumor cells to synthesize purines and pyrimidines. 14 In addition, holistic changes in the metabolic pathways have been reported, including amino acid, pentose phosphate, urea cycle, polyamine, and nucleotide pathways. [15][16][17] Therefore, the metabolites in biofluids including blood and saliva that reflect these metabolic aberrances associated with CRC have been analyzed to establish a novel set of biomarkers. [18][19][20][21][22][23][24] Saliva is an ideal biofluid that enables various disease detections. 25 However, salivary components are expected to be fragile compared with those from other biofluids. [26][27][28][29] Therefore, strict protocols must be established for processing saliva samples for reproducible quantification. For example, the unstimulated saliva collection, overnight fasting duration, restriction of any oral treatments before the sample collection, and frozen sample storage should follow standard protocols. 30,31 However, saliva tests allow noninvasive sampling, which is beneficial as cancer screening.
Metabolomic biomarkers in saliva samples have been shown to represent a potential medium for cancer detection. 32,33 Biomarkers for oral cancer and cancers in the organs far from the oral cavity, such as breast cancer and pancreatic cancers, have been reported. [34][35][36][37] To enhance the discriminability of multiple biomarkers, ML is a cornerstone. 38,39 Using urinary polyamine profiles, we previously used an alternative decision tree (ADTree)-based prediction method to detect CRC. 22 Salivary polyamines with this ML method showed high discriminability for breast cancers. 37 In this study, we performed salivary metabolomic profiling of saliva collected from patients with CRC, patients with AD, and HC. We developed ML models to determine the combination of metabolite concentrations that could discriminate among these groups. Mainly, we drew two types of comparison: comparison between CRC + AD and HC and comparison between CRC and AD + HC. More than 2000 samples were examined, and the data were divided into two datasets. One dataset was used for the ML model development, and the other dataset was used to validate the ML model. Our approach has shown the screening potential of salivary metabolomic profiles to detect CRC.

| Saliva collection
Subjects were allowed only water intake after 9:00 p.m. on the day before collection. Salivary samples were collected between 9:00 and 11:00 a.m. They were required to brush their teeth without toothpaste on the day of collection and could not use lipstick, drink water, smoke, brush their teeth, or exercise intensively 1 hour before saliva collection. A polypropylene straw 1.1 cm in diameter was used to assist in the saliva collection. Approximately 400 μl of unstimulated saliva was collected and stored in 50 ml polypropylene tubes on ice to prevent the degeneration of salivary metabolites. 30 After collection, saliva samples were immediately stored at −80°C. Visibly cloudy and highly bubbly saliva was eliminated by visual inspection, and another saliva sample was collected 5 min later. described previously. 26,27,30 Raw data processing was conducted by following the typical data processing flow. 41

| Data analysis
The collected data were randomly split into training and validation datasets ( Figure 1A). The metabolites detected in more than  42 and pathway analyses.
PLS-DA is a classification method frequently used in the metabolomics field, which aims to maximize the covariance between the independent variables (metabolites) and the corresponding dependent variables (groups) by finding the subspace of the explanatory variables, for example, independent components. 43 PLS-DA can generate score plots and VIP score plots. 44 We evaluated the discrimination ability of multiple combinations of metabolites. As one of the multivariate analyses, MLR was used. Stepwise feature forward selection was used to eliminate colinearity of the independent variables. This method limits the number of variables and deals with only linear relationships between independent (metabolites) and dependent variables (groups). Therefore, we also used the ADTree algorithm, an ML method, which is an improved form of the conventional "if-then" decision tree-based method. 45 In addition, multiple datasets were generated by random sampling allowing redundant selection, and ADTree models corresponding to each of the generated datasets were developed. Predictive ability of each ADTree model was averaged to enhance prediction accuracy (ensemble method; Figure 1B). 46 The number of nodes and ADtrees were optimized via k-fold CV using the training dataset. The developed model was validated using the validation dataset ( Figure 1A).
In this study, although data were obtained from three groups (HC, AD, and CRC), we used MLR and ADTree as a two-group classification method. Therefore, we drew two types of comparison for discriminating CRC + AD from HC (1) and CRC from AD + HC (2).
ADTree models were developed for (1) and (2), named ADTree1 and ADTree2, respectively. In addition, MLR models were developed for (1) and (2), named MLR1 and MLR2, respectively ( Figure 1C).    higher concentrations than those of HC. Several metabolites associated with AD also showed higher concentrations than those of HC. The FC between HC and CRC is visualized in Figure S1.

| Overview of profiled metabolites
The acetylated polyamines, such as N-acetylputrescine, N 1acetylspermine, N 1 ,N 8 -diacetylspermidine, N 8 -acetylspermidine, N 1 -acetylspermidine, and N 1 ,N 12 -diacetylspermine, were included, and the first two polyamines showed relatively high FCs in both datasets. Two glycolysis metabolites, pyruvate and lactate, two citrate cycle metabolites, succinate and malate, and four amino acids were also included.

| Partial least squares discriminant analysis and pathway analysis
The overall differences in metabolomic profiles between HC and CRC were evaluated using PLS-DA ( Figure 3A,B) and pathway analysis ( Figure 3C). The score plots showed separated HC and CRC ( Figure 3A). N-acetylputrescine and N 1 -acetylspermine showed high VIP scores, thus highly contributing to this discrimination ( Figure 3B). Pathway analysis detected three significantly enriched pathways, including (1) amino sugar and nucleotide sugar metabolism, (2) alanine, aspartate, and glutamate metabolism, and (3) arginine as well as proline metabolism. However, the pathway impact of pathway (1) was small, while those of (2) and (3) were relatively high.

| Alternative decision tree models
The discrimination abilities of multiple combinations of metabolites were analyzed. Two ADTree models were developed for comparison (1) to discriminate AD + CRC from HC (ADTree1) and for comparison (2) to discriminate CRC from AD + HC (ADTree2; Figure 1C). CV using training data resulted in 16 trees and 7 nodes for ADTree1

F I G U R E 2
Heatmap illustrating salivary metabolite concentrations. Each metabolite concentration was divided by its average for the training and validation dataset. These data were averaged again for each group

| Multiple logistic regression models
To compare the discrimination ability of the ADTree and MLR models, we developed two MLR models for the comparisons (1) and (2) ( Figure 1C). We developed MLR1 to discriminate AD + CRC from HC, and MLR2 to discriminate CRC from AD + HC. The ROC curves using training and validation datasets and stage-specific prediction probabilities are shown in Figure S3. MLR1 and MLR2 included three metabolites, and N-acetylputrescine was selected in both models (Table S1). All AUC values yielded by both MLR models showed a p < 0.0001 (Table S2). However, comparing the results following validation, the AUC values of ML were higher than those of MLR models.

| Comparisons with tumor markers
Carcinoembryonic antigen and CA19-9 of AD and CRC subjects were measured, and the comparisons of sensitivities among the tumor markers (TMs), ML, and MLR models were summarized (Table S3).
For TMs, the subjects showing a CEA > 5.0 ng/ml or CA19-9 > 37 U/ ml 3 were counted as positive. For MLR and ML, the optimal cutoff value was defined to maximize the sensitivity and specificity using the training dataset, and validated using the validation dataset.
The predicted probabilities higher than these cutoff values were counted as positive. In the validation dataset, the sensitivity to CEA and CA19-9 from CRC subjects were 30.5% and 16.9%, respectively.

| Effect of age on the metabolomic profile
Age showed a significant effect on the metabolic profile. Therefore, age-matched data were generated by eliminating HC subjects of lower ages. The MLR1 and MLR2 model's coefficients without feature selection were trained using training datasets and evaluated using validation datasets. In short, these models used already selected metabolites (Table S2). We conducted feature selection using the age-matched data and developed MLR1 and MLR2 (Table S5). Both models with/without feature selection showed high AUC values (p < 0.0001) in both training and validation datasets (Table S6).

| DISCUSS ION
In this study, we investigated the use of metabolomics to discover salivary-based biomarkers and discriminate among CRC, AD, and HC. As described in the heatmap (Figure 2), both training and validation datasets were highly similar. All metabolites in the map were elevated in CRC ( Figure S1). Among them, end products of glycolysis, such as lactate and pyruvate, were elevated only in CRC. Lactate, also an end product of glycolysis, was observed in various reports, including our oral cancer saliva data. 35 It can be inferred that the Warburg effect 48 and glutaminolysis, 49,50 which are characteristic of cancer metabolism, might underlie the observed characteristics.
In addition, several amino acids, such as isoleucine, valine, lysine, and alanine, were elevated in both AD and CRC. The intermediate metabolites associated with these energy and amino acid pathways were frequently reported. 15,51 In both ML models, acetylated polyamines were selected as predictive features. We previously reported the high polyamines in the urine collected from CRC patients as a valuable biomarker. 20,22 The synthesis of polyamines is attributed to various pathways, with the activation of ornithine decarboxylase that converts ornithine to putrescine in cancer cells largely affecting the polyamine contents. 52 The activation of acetylation of spermine and spermidine resulted in high level of their acetylated forms, which spread to the scrounging biofluid ( Figure S4). 53 As frequently reported, we also previously confirmed that N 1 ,N 12 -diacetylspermine showed the most clearly elevated levels in CRC urine. 22,54,55 However, other forms of acetylated polyamines, such as N 1 -acetylspermine and N 8 -diacetylspermidine, were also elevated, consistent with other studies. 56,57 In the current saliva data, N 1 -acetylspermine and N 1 ,N 8diacetylspermidine showed higher discriminability than N 1 ,N 12diacetylspermine ( Figures S5 and S6). The differences in AUC values between all stages and early stages (0, I, and II) of CRC were small.
These data are beneficial in enhancing the capacity to detect earlystage CRC subjects. The prediction probabilities calculated by the two ML models were also successfully utilized. We observed significant differences between each CRC stage and HC, even in the early stages, although stage 0 showed no significance because of the small sample numbers (Figure 4). These metabolites also showed similar discriminability for AD ( Figures S5 and S6). We previously confirmed the similar aberrant metabolomic profiles of AD and CRC, which were shifted by the activation of MYC genes observed in AD. 16 MYC induced the activation of ODC, resulting in the polyamine synthesis activation ( Figure S4). The prediction probabilities of AD by the developed models were significantly higher compared with HC, even though AD was grouped as negative data in ADTree2 ( Figure 4I,J) and MLR2 ( Figure S3G,H). These data indicate the usefulness of detecting AD and CRC, whereas the differentiation between AD and CRC is not satisfactory.
We compared the sensitivity of ADTree models with those of CEA and CA19-9 in AD and CRC data. Both ADTree models showed better sensitivity compared with these two TM. TM did not detect AD subjects; however, the ADTree and MLR models showed more than 60% sensitivity (Table S3). Furthermore, R = 0.216 between CEA and ADTree2 was the highest positive correlation between the ML model and TM in the validation data (Table S4). Therefore, the complimentary use of ADTree models and TM would benefit the screening of AD and CRC.
Right-sided CRC has a higher mortality rate and worse prognosis than left-sided CRC, and both genetic and metabolomic differences between these two sides have been reported. 58-60 Therefore, we evaluated the difference between the left and right colon on prediction accuracy. There was no significant difference between tumor locations even in the training and validation data ( Figure S7). This trend was observed for both MLR1 and MLR2. Therefore, the prediction accuracies of the developed models were not affected by tumor location.
Recently developed liquid biopsies for CRCs include methylation and abnormal levels of circulating tumor DNA and noncoding RNA, mainly micro-RNA, as markers; however, most of these markers are detected in plasma and stool. 61 In saliva samples, MiR-21 has shown CRC and HC discrimination ability. 62 A single marker that shows high specificity for a disease is beneficial for developing simple and reasonable assays as compared with simultaneous analyses of multiple markers for the detection of diseases. The analysis of volatile compounds in saliva has also shown potential to detect CRC. 63 The current study measured only hydrophilic metabolites, and more comprehensive analyses could explore the accurate biomarkers of CRC.
Several limitations need to be acknowledged. First, although the sample size is relatively large, this is a case-control study; in short, the proportion of the three groups does not reflect the actual prevalence of these diseases. Second, the current data has an age bias between HC and the other groups. Therefore, agematched subsets were randomly generated, and the models showing discrimination at a significant level were confirmed (Tables S5   and S6). However, evaluating the developed models using agematched data, including larger samples, is necessary for rigorous validations. Third, the comparison with other diseases, especially using other cancer types, was not performed. For example, salivary polyamines were elevated even in breast and pancreatic cancers. 36,37,64 The elevation of salivary amino acids was also reported in breast cancer. 65 Therefore, a single marker may not be enough for a disease-specific index, and an ML capturing multiple metabolite patterns would enhance the specificity. To use the developed biomarkers for diagnostic purposes, rigorous validation is necessary, for example, comparison of clinical-pathological features between HC and patients with CRC. The approach in the current study showed CRC detection abilities; however, room to improve the sensitivity and specificity of CRC detection still exists. In general, a lower threshold to determine the positive cases enhances the sensitivity and reduces the false-negative cases for screening purposes. Meanwhile, a higher threshold is used to enhance the specificity and reduce the false-positive cases for diagnostic purposes. Saliva metabolomics demonstrated in this study showed a high sensitivity, which is suitable for a screening test; however, the specificity is not enough for diagnostic purposes. The current result can encourage patients who show a higher risk of AD or CRC to undergo other diagnostic tests.
In conclusion, we analyzed the salivary metabolic profiles of CRC, AD, and HC. The data showed consistent profile patterns, including polyamines, with previous studies. The ensemble ADTree models successfully discriminated against these groups with high sensitivity and specificity. We also showed high generality using validation datasets. In addition, the models showed higher sensitivity compared with CEA and CA19-9. The models could contribute to clinical screening for AD and CRC.

ACK N OWLED G M ENTS
The authors thank the members of Center for Health Surveillance and Preventive Medicine, Tokyo Medical University Hospital for collecting saliva samples.

D I SCLOS U R E
Tomoyoshi Soga is an editorial board member of Cancer Science. The authors have no conflict of interest.

Approval of the research protocol by an Institutional Reviewer
Board: This study was approved by the Ethics Committee of Tokyo Medical University (Nos. 2346 and 3405).

I N FO R M E D CO N S E NT
All informed consent was obtained from the subjects.