Improved diagnosis of colorectal cancer using combined biomarkers including Fusobacterium nucleatum, fecal occult blood, transferrin, CEA, CA19‐9, gender, and age

Conventional blood and stool tests are normally used for early screening of colorectal cancer (CRC) but the accuracy and efficiency remain to be improved. Recent findings suggest Fusobacterium nucleatum to be a biomarker for CRC. This study evaluated the role of F. nucleatum and developed CRC diagnostic models by combining F. nucleatum with fecal occult blood (FOB), transferrin (TRF), carcinoembryonic antigen (CEA), carbohydrate antigen 19‐9 (CA19‐9), gender, and age.


| INTRODUCTION
Colorectal cancer (CRC) is the world's fourth most deadly cancer and has the highest rates of incidence in developed countries. 1,2 The 5-year survival rate of CRC patients is above 90% for Stage I and declines to 11%-15% for Stage IV. 3,4 Early diagnosis of CRC is crucial for timely tumor treatment and is the key to reducing the incidence and mortality associated with the disease. 5 Currently, early screening and monitoring of CRC rely on conventional tumor markers including fecal occult blood (FOB), fecal transferrin (TRF), carcinoembryonic antigen (CEA), and carbohydrate antigen 19-9 (CA19-9). [6][7][8] The fecal occult blood test (FOBT) is the most widely used noninvasive approach, but it has drawbacks such as poor specificity and low positive predictive value. [8][9][10] Previous clinical studies found that TRF was as useful as FOB in diagnosing colorectal diseases. 11,12 However, TRF and FOBT results cannot identify CRC lesions that are not accompanied by bleeding. 13 Moreover, both CRC and polyps can release trace amounts of blood into stool, which may lead to false-positive CRC screening results. 14 Serum CEA and CA19-9 are well-known tumor markers for CRC treatment monitoring, especially in chemotherapy patients. 6 Repeated elevations of CEA or CA19-9 levels after surgery may suggest the presence of residual disease or a risk of recurrence. 15 However, CEA and CA19-9 are not recommended as screening markers for CRC, probably because of their low sensitivity and specificity: other diseases can also raise the levels of these markers.
Human gut microbiota plays an important role in host physiology, immunity, metabolism, and nutrition. 16 Alterations of the gastrointestinal microbial community are related to different health and disease states, such as cancer, obesity, and a variety of bowel disorders. 17 Increasing evidence links certain microorganisms in the gut microbiota with CRC. Metagenomic analyses suggest that the intestinal bacterium Fusobacterium nucleatum is implicated in the development and progression of CRC. [18][19][20][21] Previous studies indicate that symbiotic Fusobacterium spp. were enriched in CRC tissues compared to healthy tissues. 22 F. nucleatum may potentiate intestinal tumorigenesis by modulating the tumor-immune microenvironment, activating the autophagy signaling pathway, or through other mechanisms. 21,23 These findings open new opportunities to use microbial biomarkers in clinical applications; for example, detection of fecal F. nucleatum may provide much-needed progress in noninvasive CRC screening. 20,24 However, current detection methods for F. nucleatum vary, and developing diagnostics and therapeutics for F. nucleatum remains challenging. 20 The value of F. nucleatum as a screening, prognostic, or predictive biomarker for CRC has not been fully defined in clinical applications.
Although biomarkers have many attractive features as cancer screening tests, the major concern for CRC screening or early diagnosis using different clinical biomarkers is their insufficient sensitivity and/or specificity. 7 Due to the low prevalence of cancer in the general population, most biomarkers used alone have a low positive predictive value when screening asymptomatic populations. 10 Therefore, developing a diagnostic model using a panel of tumor markers may be a feasible approach that can potentially increase the efficiency of CRC screening. 7 However, current approaches using fecal-/blood-based biomarker panels are not cost-effective and their detection accuracies remain unsatisfactory. 25,26 CRC-related gut microbe markers are under development to find the combination of biomarkers that is cost-effective and maximizes screening sensitivity and specificity. 27 Herein, we present the potential role of F. nucleatum in CRC screening and develop a combined biomarker model using F. nucleatum with other fecal-/blood-based markers including FOB, TRF, CEA, and CA19-9, as well as personal characteristics including gender and age. The resulting practical model may provide a new strategy for early CRC diagnosis.
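The dependence of positive predictive value (PPV) on prevalence follows from Bayes' rule. The short sketch below illustrates it; the function name and the sensitivity, specificity, and prevalence values are hypothetical, chosen only for illustration, and are not results from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical marker with 80% sensitivity and 90% specificity,
# evaluated at three assumed prevalences (illustrative values only).
for prevalence in (0.005, 0.05, 0.45):
    ppv = positive_predictive_value(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}")
```

With the test characteristics held fixed, the PPV falls sharply as prevalence decreases, which is why a marker that looks useful in a case-control sample can perform poorly in an asymptomatic screening population.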

Conclusions: F. nucleatum is valuable for CRC diagnosis. Combination of different clinical parameters could significantly improve CRC diagnostic performance. The combination F. nucleatum + FOB + gender + age may be an effective and noninvasive method for clinical application.

KEYWORDS
biomarker, colorectal cancer, Fusobacterium nucleatum, machine learning, risk model, screening

| Patients and sampling
A total of 130 individuals from Shanghai General Hospital during January 2021 to July 2022 were retrospectively analyzed, including 59 CRC patients and 71 healthy controls. The CRC patients were diagnosed and staged according to the International Union Against Cancer (UICC)/American Joint Committee on Cancer (AJCC) criteria. Stages of the CRC patients included 1 Stage 0 (TisN0M0), 3 Stage II, 21 Stage III, 20 Stage IV, and 14 unknown. Control subjects were selected randomly from individuals undergoing health screening. Participants who used antibiotics or probiotics within 1 month before sampling were excluded. Stool and serum samples were collected from both the healthy and CRC groups, before surgery in the case of CRC patients. Tissue samples were collected from CRC patients during surgery. Serum samples were processed for CEA and CA19-9 measurement immediately using the Beckman Coulter DxI 800 immunoassay system. A portion of each fresh fecal sample was used for FOB and fecal TRF measurement by chemical kit (BASO) according to the manufacturer's instructions. The remaining fecal samples and the tissue samples were stored in RNAlater solution (Thermo Fisher Scientific) at −80°C before DNA extraction. Patient demographic data including age and gender were recorded.

| Ethical approval
This study was approved and supervised by the ethical committees of Shanghai Center for Clinical Laboratory and Shanghai General Hospital (File Nos. 202003 and 221105). All experiments were conducted in accordance with the relevant guidelines and regulations. Written informed consent was obtained from all participants in the study.

| Primers and probes
A quantitative real-time PCR (qPCR) system was designed for detection of the nusG gene of F. nucleatum and the 16S rRNA conserved regions. The oligonucleotide primers and probes were synthesized by Sango Biotech Co., Ltd. The F. nucleatum forward primer sequence was 5′-TCAAGAGGGACTCGAACC-3′, and the reverse primer sequence was 5′-CCTGCATGTGTTGTTAACTG-3′. The amplicon size was 86 bp. The sequence of the F. nucleatum probe was FAM-5′-GGAGACCGATGCTCTACCAATTGAG-3′-BHQ1. The 16S rRNA forward primer 5′-GMAACRCGARGAACCTTACC-3′ and 16S rRNA reverse primer 5′-GCGCTCGTTRCGGGACTTAA-3′ were used for amplification of total bacteria and also as the internal control, with the 16S rRNA probe ROX-5′-GCATGGYTGTCGTCAGCTCGTGT-3′-BHQ2. Experiments were conducted using a QuantStudio™ 5 Real-Time PCR System (Applied Biosystems™, Thermo Fisher). All reactions were run under the following conditions: pre-denaturation at 95°C for 10 min (one cycle), followed by 40 cycles of denaturation at 95°C for 15 s and annealing/extension at 60°C for 1 min; fluorescence signal data were collected at the extension step.

| Optimization of the qPCR reactions
Relative abundance of F. nucleatum was calculated by the 2^(−ΔCt) method relative to total bacteria (16S rRNA), where ΔCt = Ct(F. nucleatum) − Ct(16S rRNA) is the difference between the Ct values of F. nucleatum and 16S rRNA.
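As a minimal sketch of this calculation, the function below converts a pair of Ct values into a relative abundance. The function name and the Ct values are hypothetical, and the formula implicitly assumes that both assays amplify with close to 100% efficiency (one doubling per cycle).

```python
def relative_abundance(ct_target, ct_reference):
    """Relative abundance by the 2^(-dCt) method, dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values: the F. nucleatum (nusG) signal crosses the threshold
# 4 cycles later than the total-bacteria (16S rRNA) signal.
print(relative_abundance(ct_target=24.0, ct_reference=20.0))  # 0.0625
```

Each additional cycle of delay halves the estimated abundance, so a ΔCt of 4 corresponds to 1/16 of the 16S rRNA signal.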

| Development of CRC diagnostic models
Seven clinical parameters from all participants in this study were included for modeling, namely CRC-related biomarkers including fecal F. nucleatum, CA19-9, CEA, FOB, and TRF, as well as personal characteristics such as gender and age of the participants. Ten different machine learning algorithms were tested to identify the best diagnostic model with good performance. To ensure the reliability of the developed model and improve the model stability, a 10-fold cross-validation method was used to further optimize the model parameters.
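The 10-fold cross-validation procedure can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the split is a simple shuffled (non-stratified) one, and a trivial majority-class baseline stands in for the fitted model. Only the group sizes (59 CRC, 71 controls) are taken from the text.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Hold out each fold once, train on the rest, and collect the fold scores."""
    scores = []
    for held_out in k_fold_indices(n_samples, k, seed):
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        scores.append(fit_and_score(train, held_out))
    return scores


# Labels with the group sizes reported in the study: 1 = CRC, 0 = control.
labels = [1] * 59 + [0] * 71


def majority_baseline(train, test):
    """Stand-in for a real model: predict the majority class of the training folds."""
    n_positive = sum(labels[i] for i in train)
    guess = 1 if 2 * n_positive > len(train) else 0
    return sum(labels[i] == guess for i in test) / len(test)


scores = cross_validate(len(labels), majority_baseline)
print(f"accuracy: {mean(scores):.2f} ± {stdev(scores):.2f}")
```

Replacing `majority_baseline` with a function that fits a classifier on the training indices and returns its AUC on the held-out fold would give a mean ± standard deviation summary of the kind reported for the models in this study.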

| Statistical analysis
Statistical analysis was carried out using R 3.5.1. A p value less than 0.05 was considered statistically significant.
The Mann-Whitney U-test or Wilcoxon signed rank test was used to analyze differences in F. nucleatum abundance and in CEA and CA19-9 levels. The independent t-test or chi-squared test was used to compare other characteristics between groups.

Table 1 gives a general description of the 59 CRC patients (CRC) and 71 healthy controls (Healthy) enrolled in this study. Means with standard deviations or case numbers with percentages were calculated for each group and are shown in the table. The two groups had a similar gender distribution, while the CRC group had a higher average age, higher CEA and CA19-9 levels, and more FOB- and TRF-positive results than the control group (p < 0.05).

| F. nucleatum was enriched in tumor tissue in CRC patients
The abundance of F. nucleatum in tumor tissues and paracancerous tissues was investigated in 51 CRC patients (Figure 1), as tissue samples could not be collected from 8 CRC patients. The average F. nucleatum abundance with standard error (SE) was 0.082 ± 0.016 in the tumor tissues and 0.040 ± 0.006 in the paracancerous tissues. F. nucleatum abundance in the tumor tissue was significantly higher than that in the paracancerous tissue (Wilcoxon signed rank test, p = 0.0087).

| Fecal F. nucleatum abundance was higher in CRC patients but not related to age
Fecal F. nucleatum abundance was measured and compared between the CRC group and the control group, as shown in Figure 2. The average F. nucleatum abundance with SE was 0.077 ± 0.044 in the CRC group (n = 59) and 0.0055 ± 0.004 in the control group (n = 71). A significantly higher level of F. nucleatum was found in CRC patients than in the controls (Mann-Whitney U-test, p = 0.0005).
As the CRC group had a higher average age than the control group (Table 1), fecal F. nucleatum abundance was further compared between age groups. Participants older than the mean age of their group were assigned to the high-age group, and those younger than the mean age of their group were assigned to the low-age group. F. nucleatum abundance in the high-age group did not differ significantly from that in the low-age group (Mann-Whitney U-test, p = 0.428).

| Logistic Regression algorithm shows good performance
The seven clinical parameters, namely FOB, TRF, CEA, CA19-9, fecal F. nucleatum abundance, gender, and age, were used to generate an applicable diagnostic strategy for CRC screening. Different machine learning algorithms were employed to develop diagnostic models, and their performance was compared as shown in Table 2. Figure 3 gives the receiver operating characteristic (ROC) curve of the Logistic Regression model for CRC diagnosis using the 7 clinical parameters (fecal F. nucleatum, FOB, TRF, CEA, CA19-9, gender, and age) on the whole data set. Model parameters were optimized using the 10-fold cross-validation method to improve model stability. The mean AUC score with standard deviation was 0.93 ± 0.08.
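To illustrate how a Logistic Regression model turns the seven parameters into a single risk score, the sketch below applies the logistic function to a weighted sum of the inputs. All names, coefficients, and input values are hypothetical placeholders for illustration; they are not the fitted parameters of the model in this study, which are not given in the text.

```python
import math


def crc_probability(features, weights, intercept):
    """Logistic model: sigmoid of the intercept plus the weighted sum of features."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT fitted to the study data.
weights = {
    "fn_abundance": 8.0,   # relative abundance of fecal F. nucleatum
    "fob_positive": 2.0,   # 1 if FOB positive, else 0
    "trf_positive": 1.0,   # 1 if TRF positive, else 0
    "cea": 0.1,            # serum CEA level
    "ca19_9": 0.02,        # serum CA19-9 level
    "male": 0.3,           # 1 if male, else 0
    "age": 0.04,           # age in years
}
intercept = -6.0

# Hypothetical participant.
participant = {
    "fn_abundance": 0.05, "fob_positive": 1, "trf_positive": 0,
    "cea": 4.0, "ca19_9": 20.0, "male": 1, "age": 65,
}
print(f"predicted probability: {crc_probability(participant, weights, intercept):.2f}")
```

Because the output is a probability between 0 and 1, it can be thresholded to give a positive or negative call, or used directly as the score from which an ROC curve is drawn.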

| Diagnostic performance of different biomarker combinations
ROC analyses were performed to evaluate the diagnostic value of F. nucleatum alone or in combination with other CRC-related markers such as FOB, TRF, CEA, and CA19-9, as well as the gender and age of the participants. The performance of these combinations is shown in Table 3 and Figure 4. The AUC and accuracy scores of F. nucleatum as a single marker (Figure 4A) were 0.68 and 0.56, respectively; when it was combined with FOB, the AUC and accuracy improved markedly to 0.86 and 0.81 (Figure 4B). With more biomarkers used in the diagnostic model, the AUC and accuracy could be further improved. The 5-biomarker combination F. nucleatum + FOB + TRF + CEA + CA19-9 had an AUC of 0.88 and an accuracy of 0.84 (Figure 4D). Furthermore, adding personal characteristics such as gender and age also increased the diagnostic performance of the model. The AUC and accuracy scores of F. nucleatum + gender + age (Figure 4E) were 0.87 and 0.74, respectively, while the combination F. nucleatum + FOB + gender + age (Figure 4F) had a very high AUC of 0.92 and an accuracy of 0.85, only slightly lower than the highest AUC (0.93) and accuracy (0.87), which were achieved using all 7 clinical parameters (Figure 3).
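The two metrics reported here measure different things: the AUC summarizes how well the model ranks cases above controls across all thresholds, whereas accuracy depends on a single decision threshold. The sketch below computes both from predicted scores. The function names, the 0.5 threshold, and the toy data are assumptions made for illustration and are not values from this study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > h) + 0.5 * (c == h) for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of correct calls when scores at or above the threshold are positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy predicted probabilities (hypothetical): 1 = CRC, 0 = control.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(f"AUC = {roc_auc(scores, labels):.3f}")
print(f"accuracy = {accuracy(scores, labels):.3f}")
```

In this toy example the AUC (8/9) is higher than the accuracy (4/6), which shows how a combination can rank participants well and still make more errors at one fixed cutoff.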

| DISCUSSION
Screening of CRC has been shown to reduce incidence and mortality, 3 but CRC diagnosis usually relies on an invasive procedure such as colonoscopy. This invasive procedure carries a modest risk of trauma to patients and is difficult to apply to a large population in routine physical examinations. 28,29 Hence, noninvasive CRC screening methods with low costs and better outcomes are of great importance. 27 Fecal indicators such as FOB and TRF as well as serological indicators such as CEA and CA19-9 are commonly used clinical biomarkers for CRC. 6,7,13 However, they have drawbacks such as low specificity and/or sensitivity. 9,30 Moreover, the diagnostic effectiveness of different tumor markers varies widely. 10,26,31,32 Therefore, other highly sensitive and effective biomarkers or biomarker combinations for noninvasive CRC screening are urgently needed in clinical practice.

Numerous studies have provided compelling evidence for the potential role of F. nucleatum in colorectal carcinogenesis and the outcomes of CRC patients. 20 Detection of F. nucleatum might serve as a novel diagnostic tool for CRC. 24 In this study, a fluorescence real-time PCR method was developed and applied to detect F. nucleatum in both fecal and tumor tissue samples. The relative abundance of F. nucleatum in CRC patients and healthy controls was quantified using the optimized qPCR system. The results showed a significantly higher abundance of F. nucleatum in tumor tissues than in paracancerous tissues. Moreover, fecal F. nucleatum levels were significantly higher in CRC patients than in healthy individuals. These results are consistent with previous reports, 33,34 and again indicate that F. nucleatum may be a useful biomarker for CRC detection. Measuring and targeting F. nucleatum may yield valuable insight into the clinical screening and management of CRC patients. Considering that tissue-based microbial markers are invasive and less accessible than stool-based microbial markers, detection of fecal F. nucleatum can be applied as a noninvasive method to aid CRC diagnosis.

However, screening decisions should not be based on a single indicator. 10,35 In the era of personalized medicine, sophisticated prediction software based on optimized algorithmic models with a panel of biomarkers may better guide screening results. 35 In this paper, 10 different algorithmic models were developed and compared using the 5 fecal-/blood-based biomarkers (F. nucleatum, FOB, TRF, CEA, and CA19-9) and two personal characteristics (gender and age). Among them, the Logistic Regression model using all seven clinical parameters demonstrated good performance for CRC screening on our current dataset. The diagnostic performance of F. nucleatum alone or in combination with other clinical indicators (FOB, TRF, CEA, CA19-9, gender, and age) was subsequently evaluated. Fecal F. nucleatum alone was not effective enough for CRC diagnosis, as the AUC and accuracy scores were only 0.68 and 0.56, respectively. However, the AUC and accuracy increased to 0.87 and 0.74 when fecal F. nucleatum was combined with gender and age. This might be because personal characteristics such as gender and age are strongly related to CRC risk: incidence is higher in men than in women and increases strongly with age. 3 Adding these parameters could therefore increase the diagnostic performance of the developed model.

TABLE 2 Performance of different machine learning models.

Figure 2 (plot not recovered): relative abundance of F. nucleatum, p = 0.0005.

Moreover, combinations of two or more types of biomarkers are often used in clinical practice to improve the accuracy and efficiency of disease diagnosis. 10 Our results demonstrate that better diagnostic performance can be achieved with combined clinical indicators than with a single indicator. Based on the optimized Logistic Regression model, the 7-parameter combination F. nucleatum + FOB + TRF + CEA + CA19-9 + gender + age showed the highest AUC (0.93) and accuracy (0.87). The combination F. nucleatum + FOB + gender + age had the second highest AUC (0.92) and accuracy (0.85). In addition, F. nucleatum + FOB also showed high AUC and accuracy scores (0.86 and 0.81, respectively). Considering that only fecal specimens are required to detect F. nucleatum and FOB, the combination of these two biomarkers is much easier to apply clinically. Therefore, fecal F. nucleatum detection combined with other clinical markers could increase the detection rate of CRC in apparently healthy individuals, and the combination F. nucleatum + FOB + gender + age may be the best choice in practice.
Our results provide information on how to apply F. nucleatum as a new biomarker for detecting CRC in clinical practice. However, different pathological features of the CRC patients, such as stage, differentiation, and lymph node metastasis, were not distinguished in this study. In addition, all participants were recruited from a single hospital within a 2-year period. As the risk factors for CRC may differ considerably among populations and regions, the diagnostic performance could differ when the generated diagnostic models are used in other areas. Therefore, a large population study should be conducted to further validate the diagnostic models before they are applied in practice. The use of the generated diagnostic models may be limited, but they may provide a pattern for CRC screening with combined biomarkers.

Figure caption: The model parameters were optimized using the 10-fold cross-validation method and calculated using the mean ± 1 standard deviation of the 10 different curves.

| CONCLUSIONS
Our results provide evidence that F. nucleatum could be a useful biomarker for CRC. Combining fecal F. nucleatum abundance with other clinical markers such as FOB, TRF, CEA, and CA19-9, as well as personal characteristics such as gender and age, significantly improved performance for CRC screening. The combination of F. nucleatum and FOB with gender and age may be an effective method for clinical application. This combined biomarker strategy can help to identify candidates with a high risk of CRC for further diagnostic colonoscopy.