Salivary metabolites are promising non‐invasive biomarkers of hepatocellular carcinoma and chronic liver disease

Hepatocellular carcinoma (HCC) is a leading cause of cancer mortality worldwide. Improved tools are needed for detecting HCC so that treatment can begin as early as possible. Current diagnostic approaches and existing biomarkers, such as alpha‐fetoprotein (AFP), lack sensitivity, resulting in too many false negative diagnoses. Machine learning may be able to identify combinations of biomarkers that provide more robust predictions and improve sensitivity for detecting HCC. We sought to evaluate whether metabolites in patient saliva could distinguish those with HCC, those with cirrhosis, and those with no documented liver disease.


| INTRODUCTION
In the year 2020, it is estimated that 42 810 individuals will have been diagnosed with liver and intrahepatic bile duct cancers in the United States, resulting in 30 160 deaths. 1 These cancers are the 5th and 7th leading cause of cancer deaths in males and females, respectively. 1 Hepatocellular carcinoma (HCC) comprises 80% of all diagnosed liver cancers. 2 A majority of patients who develop HCC have preexisting cirrhosis, and HCC is a leading cause of death among individuals with cirrhosis. 3 Cirrhosis can develop after infection with hepatitis B or hepatitis C, heavy alcohol consumption, or in individuals with chronic liver diseases such as nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH). [4][5][6] The prevalence of liver cancers has been steadily rising since the 1970s 1 due to hepatitis C infections and increases in obesity resulting in chronic liver diseases. 4,6,7 The 5-year survival rate of HCC is drastically different depending on the stage at diagnosis because curative therapies are often only available if HCC is detected early. 8 Since early HCC detection improves survival, 9 surveillance using ultrasound of the liver, with or without monitoring alpha-fetoprotein (AFP) levels, is recommended every 6 months for those with cirrhosis. 8 Surveillance using AFP alone identifies HCC with a sensitivity of 61% and specificity of 86%, and using AFP in addition to ultrasound marginally increases the sensitivity and specificity to 62% and 88%, respectively. 10 Therefore, additional informative biomarkers that could be incorporated into the surveillance of these patients could help to prevent false negative diagnoses and enable curative treatment options prior to the onset of advanced disease.
Metabolomics aims to characterize the metabolites present in a particular biospecimen and is demonstrating promise for precision medicine with its ability to distinguish a variety of disease states. 11,12 To date, metabolite biomarkers for HCC have been identified in blood, breath and urine. 13,14 Saliva is an attractive biofluid for biomarker discovery because it can be collected non-invasively and requires limited training for collection and storage. At present, 44 studies have highlighted potential metabolite biomarkers in saliva for identifying patients with Alzheimer's disease, breast cancer, prostate cancer, oral cancer and other diseases. 15 However, to our knowledge, this is the first study to identify salivary metabolite biomarkers that can distinguish patients with HCC from healthy individuals and patients with cirrhosis.

| Subject recruitment
Saliva samples were collected from a real-world clinical cohort of 110 adult patients (>18 years of age) seen at the Cleveland Clinic (Cleveland, OH) between 2018 and 2020 with cirrhosis (N = 30) or HCC (N = 37) who underwent liver transplantation for HCC or cirrhosis, surgical resection for HCC or liver biopsy with confirmed cirrhosis and/or HCC. In addition, patients attending treatment for hernia with no history of liver disease or liver cancer were used as healthy control subjects (N = 43). Clinical characteristics of study participants and diagnostic criteria, including cirrhosis aetiology, Child-Pugh class and BCLC stage, can be found in Table 1 and Tables S1 and S2. In addition to an initial assessment from imaging and clinical presentation, a histopathological assessment was performed to confirm HCC and cirrhosis diagnoses as part of the patient's standard of care. Written informed consent was provided by all participants, the study conformed to the ethical guidelines of the 1975 Declaration of Helsinki, and was approved by the Cleveland Clinic IRB (IRB #10-347).

KEYWORDS
cirrhosis, liver cancer, machine learning, metabolomics, risk factor

Lay Summary
Changes in the presence and quantity of small molecules in saliva, such as metabolites, can indicate disease in the body. We measured the abundance of 125 metabolites in the saliva of individuals who have liver cancer compared with individuals who have cirrhosis or no documented liver disease. We found that the amount of acetophenone was significantly different among all three groups and seven other metabolites were significantly different in at least one comparison. We then used machine learning to determine if combinations of metabolites could predict if a person has liver cancer as opposed to cirrhosis or a healthy liver. We found that by measuring 12 metabolites in patient saliva, we could correctly classify 90% of patients as having liver cancer, cirrhosis or no liver disease.

| Saliva collection and gas chromatography mass spectrometry
A saliva sample was collected, after a standard mouth rinse, from each subject using the DNA Genotek OMNIgene ORAL OM-505 (Ottawa, Ontario) at the time of their scheduled visit with their physician. Samples were subjected to untargeted gas chromatography time-of-flight mass spectrometry (GC-TOF MS) at the West Coast Metabolomics Center (Davis, CA). A Leco Pegasus IV mass spectrometer was used with unit mass resolution at 17 spectra s⁻¹ from 80 to 500 Da at −70 eV ionization energy and 1800 V detector voltage with a 230°C transfer line and a 250°C ion source. The analytical GC column was protected by a 10-m long empty guard column, which is cut by 20-cm intervals whenever the reference mixture QC samples indicate problems caused by column contaminations. This chromatography method is designed to yield high-quality retention and separation of primary metabolite classes (amino acids, hydroxyl acids, carbohydrates, sugar acids, sterols, aromatics, nucleosides, amines and other compounds) with narrow peak widths of 2-3 seconds and high-quality within-series retention time reproducibility of better than 0.2 second absolute deviation of retention times. An automatic liner exchange was used after each set of 10 injections to reduce sample carryover for highly lipophilic compounds such as free fatty acids. Samples were run in two batches, resulting in 181 and 163 identified metabolites detected in each batch, respectively, and the relative abundance levels, quantified by peak height, were reported. One hundred and twenty-five metabolites were identified in both batches and represented lipids, amino acids, peptides and sugars involved in pathways such as glycolysis, citric acid cycle, the urea cycle, fatty acid metabolism, phospholipid biosynthesis and ethanol degradation among others.

| Data processing and quality control
Missing values (two metabolites in three subjects) were imputed with half the minimum relative abundance across the cohort. Metabolite relative abundance levels were right skewed and log transformation was effective at normalizing the data. Six technical duplicate samples were included in each of the two experimental batches for quality control purposes. Principal component analysis (PCA) revealed variation in the metabolite relative abundance due to experimental batch as evidenced by the separation of these technical replicates.
Mean centring and scaling by the metabolite standard deviations effectively corrected for differences due to batch (Figure S1).
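The preprocessing steps described in this section can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' R pipeline. The function names and toy abundance values are hypothetical, and both the natural-log base and the choice to centre and scale within each batch are assumptions made for the example.

```python
import math
from statistics import mean, stdev


def impute_half_min(values):
    """Replace missing values (None) with half the minimum observed value."""
    observed = [v for v in values if v is not None]
    fill = min(observed) / 2.0
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Natural-log transform to reduce right skew (log base is an assumption)."""
    return [math.log(v) for v in values]


def scale_within_batch(values, batches):
    """Mean-centre and divide by the standard deviation separately per batch."""
    scaled = [0.0] * len(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_vals = [values[i] for i in idx]
        mu, sd = mean(batch_vals), stdev(batch_vals)
        for i in idx:
            scaled[i] = (values[i] - mu) / sd
    return scaled


# Toy relative abundances for one metabolite: six samples in two batches.
abundance = [120.0, None, 300.0, 1500.0, 2400.0, 900.0]
batches = [1, 1, 1, 2, 2, 2]

processed = scale_within_batch(log_transform(impute_half_min(abundance)), batches)
```

After this step each batch has mean 0 and unit standard deviation for the metabolite, so a constant offset or scale difference between batches no longer separates technical replicates.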

| Metabolite associations
Metabolite associations with disease group were performed using the open source, statistical analysis software, R. 16 The relative abundance levels of the 125 identified metabolites were individually tested for associations with disease status (ie healthy, cirrhosis, HCC) using pair-wise logistic regression models. Age, sex and smoking status were tested for association using logistic regression models with each disease outcome and were included as model covariates when significantly associated with disease status (P < .05) (Tables S1 and S2). All metabolite P values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) approach and an FDR P < .2 was used as the threshold for statistical significance.
search to identify the optimal number of trees (ntree), and 150 was chosen as the optimal ntree value based on the mean misclassification, sensitivity and specificity across the LOOCV iterations (Figure S2). We then employed an iterative random forest approach (iRF) to select metabolites that would produce a model that would maximize predictive power with a minimal set of metabolites. This was done by generating a model using all 125 metabolites (RF125) and then iteratively eliminating the metabolite with the lowest mean Gini score across the LOOCV procedure until only a single metabolite remained. (2) The second model included 12 metabolites, representing the top 10% of metabolites selected using the iRF approach (iRF12). (3) The out of bag error was then used to select the optimal number of metabolites that would produce the best performing model, and this model optimized at four metabolites (iRF4). (4) We also employed a classification and regression tree method (CART) to generate a binary decision tree to classify disease status based on metabolite abundance using the R package, rpart. 18 The 12 selected metabolites from iRF12 were used as input into the CART model, which was built using a LOOCV procedure for the 99 subjects in the training set. The LOOCV was used to calculate sensitivity, specificity, balanced accuracy, misclassification, positive predictive value (PPV) and negative predictive value (NPV) for each of the four models. Each model was then evaluated for accuracy and overfitting using the withheld test cohort of 11 subjects (four healthy, three cirrhosis, four HCC).
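The performance measures named in this section can be computed from true and predicted labels by treating one class (for example HCC) as positive and the remaining classes as negative. The sketch below is illustrative Python rather than the authors' R code. The function names are hypothetical, the one-vs-rest framing is an assumption about how per-class values were derived, and the predicted labels are invented toy values, not study results.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def one_vs_rest_metrics(truth, predicted, positive):
    """Sensitivity, specificity, balanced accuracy, PPV and NPV for one class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def misclassification_rate(truth, predicted):
    """Fraction of subjects assigned to the wrong class."""
    return sum(1 for t, p in zip(truth, predicted) if t != p) / len(truth)


# Toy labels with the same group sizes as the withheld test cohort
# (four healthy, three cirrhosis, four HCC); predictions are invented.
truth = ["healthy"] * 4 + ["cirrhosis"] * 3 + ["HCC"] * 4
predicted = (["healthy"] * 3 + ["HCC"]
             + ["cirrhosis"] * 3
             + ["HCC"] * 3 + ["cirrhosis"])

hcc = one_vs_rest_metrics(truth, predicted, positive="HCC")
overall_error = misclassification_rate(truth, predicted)
```

In a leave-one-out setting, `predicted` would be assembled one subject at a time, each prediction coming from a model fitted on all remaining training subjects.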

| RESULTS
Out of the 110 participants (43 healthy, 30 cirrhosis, 37 HCC), a total of 125 metabolites were identified in the saliva samples (Figure 1). There were some significant demographic differences between the groups, which were used as covariates to adjust for potential bias in the metabolite associations. Individuals in the HCC group were older than individuals in the cirrhosis and healthy groups (P < .05). In addition, there were significantly more males in the HCC group than in the cirrhosis group (P < .05). Lastly, current smoking status was significantly higher in patients with HCC than those with cirrhosis (P < .05) (Tables S3-S5).

| Metabolite associations
Four metabolites (acetophenone, octadecanol, lauric acid and 3-hydroxybutyric acid) were significantly different between two or more groups (FDR P < .2) (Figure 2A,B, Table 2). Acetophenone was significantly different in all three pair-wise comparisons: compared to healthy individuals, it was significantly decreased in patients with cirrhosis and significantly decreased further in patients with HCC.
Octadecanol was also decreased in both patients with HCC and patients with cirrhosis in comparison to healthy control subjects (Figure 2A,B, Table 2). Additionally, lauric acid, 3-hydroxybutyric acid, threonic acid, glycerol-alpha-phosphate, butylamine and alpha-tocopherol were decreased in patients with HCC compared to healthy control subjects (Figure 2A,B, Table 2). Associations for all metabolites with each disease status are provided in Tables S6-S8.
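The FDR P values reported here come from a Benjamini-Hochberg adjustment applied across all tested metabolites. As an illustration of that adjustment, the following Python sketch implements the standard step-up procedure. The function name and the raw P values are hypothetical and are not taken from the study's tables.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for six metabolites (invented for illustration).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
fdr = benjamini_hochberg(raw_p)

# Apply the FDR P < .2 threshold used for significance in this analysis.
significant = [q < 0.2 for q in fdr]
```

With these toy inputs the first four metabolites pass the FDR P < .2 threshold and the last two do not.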

| Metabolite selection using iterative random forest (iRF) and decision tree (DT) approaches
Three RF models were considered based on their mean training LOOCV out of bag (OOB) error rates. The initial model, incorporating all 125 metabolites (RF125), had a mean LOOCV OOB error rate of 35.6%, and the range of Gini scores, demonstrating metabolite importance, across LOOCV iterations for the 125 metabolites is shown in Figure 3A,B. A subsequent model, iRF12, included the top 10% of metabolites (n = 12) selected using the iterative RF approach (Figure 3A), 19 and had a mean LOOCV OOB error of 19.7% (Figure 3A,C). iRF4 was the model with the lowest global mean misclassification (15.3%) and used four metabolites: octadecanol, acetophenone, 1-monopalmitin and 1-monostearin (Figure 3A,D). A decision tree classification model was developed with the 12 metabolites selected for iRF12 (Figure 3D). The pruned decision tree selected four metabolites (octadecanol, 1-monopalmitin, 1-monostearin and 4-hydroxybutyric acid) and had a LOOCV OOB error rate of 12.7% (Figure 4).
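The iterative selection that produced these nested models follows a backward-elimination pattern: fit on the current metabolite set, record the error, drop the least important metabolite, and repeat. The Python sketch below shows only that control flow. The `evaluate` callback is a hypothetical stand-in for fitting a random forest and returning its error and per-metabolite importance (in the study, OOB error and mean Gini score across LOOCV); the toy importances and errors are invented.

```python
def backward_elimination(features, evaluate):
    """Drop the least important feature one at a time.

    `evaluate(subset)` is a caller-supplied (hypothetical) function that
    returns (error_rate, {feature: importance}) for a model fitted on
    `subset`. Returns a list of (subset, error_rate) pairs, from the full
    feature set down to a single feature.
    """
    remaining = list(features)
    history = []
    while remaining:
        error, importance = evaluate(remaining)
        history.append((list(remaining), error))
        if len(remaining) == 1:
            break
        weakest = min(remaining, key=lambda f: importance[f])
        remaining.remove(weakest)
    return history


def best_subset(history):
    """Pick the subset with the lowest error, preferring fewer features on ties."""
    return min(history, key=lambda item: (item[1], len(item[0])))[0]


# Toy stand-in for model fitting: fixed importances, error depends on size.
TOY_IMPORTANCE = {"m1": 0.9, "m2": 0.7, "m3": 0.2, "m4": 0.05}
TOY_ERROR = {4: 0.30, 3: 0.22, 2: 0.18, 1: 0.35}


def toy_evaluate(subset):
    return TOY_ERROR[len(subset)], {f: TOY_IMPORTANCE[f] for f in subset}


history = backward_elimination(["m1", "m2", "m3", "m4"], toy_evaluate)
selected = best_subset(history)
```

Keeping the full `history` rather than only the final subset makes it possible to report both a fixed-size panel (such as a top 10% cut) and the error-minimizing panel from the same elimination run.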

| DISCUSSION
The incidence of HCC continues to increase, due in large part to the prevalence of cirrhosis from hepatitis B, hepatitis C, alcoholic liver disease, and the rapidly increasing incidence of NASH and NAFLD. [4][5][6] Prognoses for patients with HCC decline rapidly from the onset of disease, underscoring the need for inexpensive and accessible testing for individuals at high risk, such as those with cirrhosis. 8 Ultrasound, or ultrasound plus serum biomarker AFP, are the current gold standard for screening patients for HCC. However, the sensitivity of AFP plus ultrasound is only 62%, resulting in too many missed cases. 10 Saliva is an enticing biofluid for biomarker discovery because collection is noninvasive and samples can be stabilized at room temperature for extended periods of time. 20 To our knowledge, this analysis represents the first investigation of salivary metabolites in patients with HCC. We identified metabolites in saliva that differed significantly in abundance among disease states and we used machine-learning to discover combinations of metabolites  Figure 2B). Interestingly, acetophenone has also been shown to be significantly downregulated in the exhaled breath of patients with cirrhotic and non-cirrhotic NAFLD compared to healthy individuals. 23 Acetophenone is naturally found in many types of plants, and is used as a flavour additive in numerous products, including chewing gum and cigarettes, among others. 24 Although there were significantly more individuals with HCC who reported being current smokers compared to individuals with cirrhosis (P = .03) (Table S3), we did not observe a significant association between acetophenone and smoking status (P = .96) (data not shown). By leveraging combinations of multiple metabolites, we were able to discriminate between healthy individuals, those with cirrhosis, and those with HCC with high accuracy (Figure 4). 
We interrogated four different tree-based machine-learning models to identify the panel of metabolites with the best predictive power. The four models, RF125, iRF12, iRF4 and DT, displayed cross-validated sensitivities for detecting HCC of 81.8%, 84.9%, 87.9%, and 87.9% and specificities of 87.2%, 92.4%, 95.5%, and 93.5%, respectively.
Although we were unable to compare AFP levels in our real-world clinical cohort because the standard surveillance for HCC in patients with cirrhosis may or may not include AFP and is not indicated in otherwise healthy patients, it is notable that all models displayed better sensitivities and specificities across LOOCV than those reported by a meta-analysis of AFP (20-100 ng/mL) (61%, 86%) and AFP plus ultrasound (62%, 88%). 10 When the models were validated on the withheld test cohort, DT correctly classified 73% of the patients, compared to 84% during the cross-validation training procedure, suggesting that it may have been moderately overfitted to the training cohort. However, RF models are known to be robust to overfitting, 28 and RF125 and iRF12 both correctly classified 91% of the withheld test subjects. This indicated that RF models were robust to overfitting and the most likely to have high predictive accuracy. We hypothesized that patients who were misclassified may have been those with early-stage cirrhosis or HCC. However, salivary metabolites appear to be effective at classifying individuals with minimal or early-stage disease with no discernible patterns related to the detection of cirrhosis or HCC based on Child-Pugh class or BCLC staging (Figures S4 and S5). A single patient with HCC with BCLC stage 0 and 13/15 patients with BCLC stage A were classified correctly (Table S5). Furthermore, salivary metabolites were effective at classifying patients with early-stage cirrhosis (Child-Pugh class A) (Table S4).
Classification accuracies in the test cohort also supported the ability of the model to accurately classify individuals with early-stage disease (Table S9). In addition to saliva samples being easier to obtain compared to ultrasound or serum AFP, these results support the notion that salivary metabolites show promise for detecting early-stage HCC, with evidence for improved sensitivity and specificity over current clinical tests. However, additional prospective studies will be needed to validate these initial findings.
Several metabolites were not significantly different between the disease cohorts, but were determined to be, in combination with other metabolites, informative for distinguishing groups in the machine-learning models. iRF12 included glycyl-proline, which is a dipeptide cleavage product of glycyl-proline dipeptidyl aminopeptidase (GPDA). Interestingly, GPDA has been shown to be elevated in the serum of patients with HCC, and has also been proposed as a serum biomarker for detection of HCC. 29  (serum, liver tissue), 34,35 between healthy individuals with those with HCC (serum, liver tissue), [35][36][37] and between individuals with HCC and those with cirrhosis (serum, liver tissue). [36][37][38] The enzyme responsible for making glutamine, glutamine synthetase, has been identified as a potential biomarker of early HCC in proteomic analyses and has been shown to promote cell migration by mediating epithelial-mesenchymal transition. 39  after exercise. 45 Additionally, 1-monopalmitin and 1-monostearin (included in models iRF12, iRF4 and DT) have also been identified in serum as biomarkers of diabetes 46,47 and obesity. 47,48 However, this is the first study to propose that 1-monostearin and  (Tables S1 and S2), and the sample size was too small to determine if the model was more or less effective at detecting individuals based on certain aetiologies of cirrhosis. We also limited our study to classifying healthy individuals, those with cirrhosis and those with HCC, but do not know if these models would also be able to discriminate among other liver pathologies or metabolic syndromes.
To our knowledge, this is the first study to demonstrate the predictive capacity of salivary metabolites to discriminate patients with

ETHICS AND CONSENT
Written informed consent was provided by all participants, the study conformed to the ethical guidelines of the 1975 Declaration of Helsinki, and was approved by the Cleveland Clinic IRB (IRB #10-347).

CODE AVAILABILITY
The analysis pipeline and R scripts used to perform all statistical analyses are available at https://github.com/rotroff-lab/HCC_Saliva_Metabolomics.

CONFLICT OF INTERESTS
DMR has an equity stake in Interpares Biomedicine, LLC. DMR, FA, DSA hold intellectual property related to the detection of hepatocellular carcinoma.