A self‐evaluated predictive model: A Bayesian neural network approach to colorectal cancer diagnosis

Artificial intelligence has shown immense potential in cancer prediction, but existing models cannot estimate prediction uncertainty by themselves. Here, we developed a Bayesian neural network (BNN) model, BNN‐CRC15, for colorectal cancer (CRC) prediction while assessing its reliability. The model was trained on routine laboratory data obtained from 27,911 participants and provided quantified prediction uncertainty, allowing identification of a subset of participants in which the model was confident, mimicking the diagnostic process of human practitioners. Our model exhibited superior performance (area under the curve = 0.918) in the confident participant group, which accounted for 46.4% of the patients, indicating that routine laboratory data alone are sufficient for accurate predictions in this subset. For the non‐confident group, further advanced tests, such as colonoscopy, could be recommended to achieve more accurate predictions. In addition, our model demonstrated superior overall accuracy (0.848) in all patients, outperforming five other traditional algorithms (extreme gradient boosting, support vector machine, logistic regression, random forest, and artificial neural network) and the fecal immunochemical test in distinguishing CRC from non‐CRC. These findings suggest that our BNN‐CRC15 model could serve as a valuable tool for improving CRC diagnosis and prevention.


INTRODUCTION
There has been a growing interest in the application of big data in medicine due to the vast amount of medical data being generated in recent years, including electronic health records (EHRs), administrative or claims data, wearable devices, and data from global biobanks and cohort studies.1,2 The use of artificial intelligence (AI) has facilitated the extraction of valuable information from these datasets through data mining,3 offering great promise in identifying new and complex associations, patterns, and anomalies, and in predicting future events in the data.4 Such advancements have the potential to lead to more personalized and precise discoveries in disease pathogenesis, classification, diagnosis, and progression.5 Clinical laboratory data are a critical component of real-world medical data, and their standardization is of great importance for generating high-quality evidence.6 Laboratory data are highly structured and undergo rigorous quality control throughout production, making them more valuable than other EHR data. In predicting diseases, models based on laboratory data are also less expensive than those based on costly molecular measurements.7 Although previous studies have explored laboratory parameters to some extent for disease prediction,8-10 there is still much room for improvement in study methods, data presentation, and data cleaning, storage, and mining.6 Therefore, more efforts are needed to validate the clinical utility of these tools.
Machine learning has become an increasingly popular approach for predicting medical outcomes.11 It utilizes statistical, probabilistic, and optimization techniques to identify functional patterns in complex and large datasets.12 Traditional machine learning algorithms, including logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), artificial neural network (ANN), and extreme gradient boosting (XGBoost), have shown promising results in predicting diseases.13 Nevertheless, the accuracy (ACC) of these algorithms is heavily dependent on the quality of the training and test samples.14,15 As a result, it is essential not only to achieve high prediction ACC but also to provide uncertainty evaluation for new inputs that were not encountered during the training phase.16 In practical scenarios, the ability of algorithms to handle unforeseen samples is critical. One promising approach to incorporating uncertainty into predictions made by neural networks is the use of Bayesian neural networks (BNNs).17 BNNs treat weights and biases as random variables with prior distributions and update these distributions based on observed data, allowing for uncertainty quantification in the predictions made by the network. BNNs are particularly useful in applications where uncertainty is inherent, such as medical diagnosis or financial forecasting. In contrast to frequentist algorithms, which interpret probability as the frequency of an event occurring over a long period, BNNs interpret probability as the degree of belief that an event will happen.
Colorectal cancer (CRC), the third most common cancer, is a significant public health concern worldwide, with high mortality rates ranking it as the second leading cause of global cancer-related death.18-21 Machine learning methods have the potential to identify non-invasive and accurate markers for early diagnosis of CRC. A few investigations22-25 have aimed to identify individuals at an increased risk of CRC using machine learning methods based on routine medical results. Among these methods, the ColonFlag test,22 developed in 2016, is a gradient boosting decision tree (GBDT) or RF model that identifies individuals at higher risk of CRC by analyzing blood counts, age, and sex. Its area under the curve (AUC) for detecting CRC 3-6 months prior to diagnosis was 0.82 ± 0.01. Other CRC prediction models23-25 achieved AUC values ranging from 0.73 to 0.80. While these studies have made progress in utilizing machine learning for predicting CRC risk, they have typically focused on specific variables for modeling, potentially overlooking the value of other parameters. Additionally, these models adopted GBDT or RF,22 ANN,23 or LR.24,25 However, these methodologies cannot reliably predict results for samples that were not included in the training set.
To address these daunting challenges, our study aimed to develop and validate a more accurate CRC predictive model based on laboratory data collected without targeted pre-selection, using a robust BNN and following the TRIPOD checklist statement.26 By initially incorporating a wide range of laboratory parameters and then narrowing the model down to 15 parameters, we hope to establish a new model for CRC diagnosis and improve the ACC of predictions, particularly for unexpected samples. This approach provides uncertainty evaluation, offering a promising method for accurate and reliable predictions alongside a better understanding of the uncertainty associated with those predictions.

Study approval
The study protocol was approved by the Institutional Ethics Committee at Shanghai Changhai Hospital, with ethical approval number CHEC2018-154. Additionally, it was registered with the Chinese Clinical Trial Registry under registration number ChiCTR1800019105. To ensure the privacy of participants, all personal identifiers were hashed to anonymize the collected data.

Source of data
Figure 1 provides a summary of the dataset and workflow. The participants for this study were recruited from Changhai Hospital in Shanghai. All consecutive patients who underwent colonoscopy between January 1, 2013, and June 30, 2019, were included in the training phase. The independent validation cohort comprised all outpatients and inpatients prospectively collected from the EHR system database between July 1, 2019, and October 31, 2020. Demographic characteristics, diagnoses, laboratory data, colonoscopy results, and pathological results were extracted for all participants to facilitate the analysis.

Participants
The patients were categorized based on both colonoscopy reports and histopathological diagnoses. In cases where the colonoscopy and histopathological grouping methods produced different groupings, the researcher reviewed the participants and determined the final grouping for the training cohort (Figure 2). In the prospective validation cohort, participants were classified into the CRC group or non-CRC group based on ICD-10 codes. Individuals with a history of other malignant tumors were excluded from the study, and only de novo CRC cases were included.

Data collection and preprocessing
For each of the 17 binary features, we created two dummy features named "X positive" and "X missing." X missing = 1 indicates that the value of X is missing, while X missing = 0 indicates that X has a recorded value (positive or negative); X positive = 1 indicates a positive result, and X positive = 0 a negative one. Thus, we selected a total of 87 candidate features, including 34 dummy binary features, 50 quantitative features, age, gender, and BMI, for diagnostic model training. To further address missing features and imbalanced sample sets, we performed selective under-sampling on the data to maximize the retention of sample information.28 We balanced the sample sizes of CRC and non-CRC patients by excluding CRC patients who had completed fewer than 40% of the selected features and non-CRC patients who had completed fewer than 80%. The 40% threshold for CRC patients excludes those whose test records for the selected features are largely missing; the 80% threshold for non-CRC patients balances the sample sizes of the case and control groups while retaining the most informative non-CRC samples. To avoid scaling effects, we normalized the data.
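The dummy-feature encoding described above can be sketched in pandas. The column name and the input coding convention (1 = positive, 0 = negative, NaN = not performed) are assumptions for illustration:

```python
import pandas as pd
import numpy as np

def encode_binary_feature(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Expand one qualitative test column into two dummy columns.

    Assumed input coding: 1 = positive, 0 = negative, NaN = test not performed.
    """
    out = df.copy()
    out[f"{col} missing"] = df[col].isna().astype(int)   # 1 if the test was never performed
    out[f"{col} positive"] = (df[col] == 1).astype(int)  # 1 only for a recorded positive result
    return out.drop(columns=[col])

# Toy example with a hypothetical fecal immunochemical test (FIT) column.
df = pd.DataFrame({"FIT": [1, 0, np.nan, 1]})
df = encode_binary_feature(df, "FIT")
```

Note that a missing test is encoded as neither positive nor negative, so the model can learn from the fact that a test was not ordered at all.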

Data imputation
To handle missing data in qualitative variables, we treated missing data as a separate category. For quantitative variables, we evaluated three distinct imputation methods, namely median substitution, mean substitution, and K-nearest neighbor (KNN) imputation, using 10-fold cross-validation. The method with the best average imputation performance was selected for the final imputation of missing values.
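The selection among imputers can be sketched by hiding a fraction of known values and scoring each method on the held-out cells. This is a simplified stand-in for the study's 10-fold cross-validation, using scikit-learn's imputers on synthetic data:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]            # correlated columns give KNN something to exploit

# Hide 10% of entries whose true values we know, so imputation error is measurable.
mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[mask] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn": KNNImputer(n_neighbors=5),
}
errors = {}
for name, imp in imputers.items():
    X_imp = imp.fit_transform(X_missing)
    errors[name] = float(np.mean((X_imp[mask] - X[mask]) ** 2))  # MSE on held-out cells

best = min(errors, key=errors.get)   # method with the lowest held-out error wins
```

In the study, the comparison was made on downstream classification ACC rather than reconstruction error, but the structure of the selection is the same.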

Bayesian Neural network
The BNN is a type of neural network that leverages Bayesian methods for both training and prediction. As a machine learning method under the Bayesian framework, it treats all parameters and outputs as random variables under certain distributions. A BNN not only provides predictions but also estimates the associated uncertainty, which is valuable in numerous real-world applications.
BNNs are particularly effective with medical datasets, given their ability to mitigate overfitting and offer a more reliable estimate of the model's parameters. The central concept of a BNN is to place prior distributions over the model's parameters and use Bayes' theorem to update these distributions based on the observed data. This yields a posterior distribution over the parameters, which is used both to predict and to assess the uncertainty associated with the predictions.
In a classification problem, suppose we have a training set {X, Y} and a distribution mapping X to Y, parameterized by ω belonging to some high-dimensional parameter space Ω. In a BNN, the posterior distribution over the model's parameters is obtained using Bayes' theorem:

$$P(\omega \mid X, Y) = \frac{P(Y \mid X, \omega)\, P(\omega)}{P(Y \mid X)},$$

where $P(\omega)$ is the prior distribution over the model's parameters, $P(Y \mid X, \omega)$ is the likelihood, and $P(Y \mid X)$ is the evidence.
Let X′ and Y′ denote the input and output of a new dataset to be used for prediction. Once the posterior is obtained, the prediction is made by computing the predictive distribution, which integrates the likelihood of the new output over the posterior:

$$P(Y' \mid X', X, Y) = \int_{\Omega} P(Y' \mid X', \omega)\, P(\omega \mid X, Y)\, d\omega.$$

However, using Bayes' theorem directly for classification problems is impractical because integration over the entire parameter space Ω is computationally demanding. Variational inference is frequently used as a computationally efficient alternative to the true posterior. It casts posterior approximation as an optimization problem that seeks a variational distribution $q_\theta(\omega)$, parameterized by θ, approximating the true posterior $P(\omega \mid X, Y)$ by minimizing the difference between $q_\theta(\omega)$ and the posterior distribution. One popular measure of the closeness between the true posterior and the variational distribution is the Kullback-Leibler (KL) divergence.
The KL divergence between $q_\theta(\omega)$ and the posterior $P(\omega \mid X, Y)$ based on the dataset {X, Y} is defined as the expectation, taken over the variational distribution, of the log ratio between the variational distribution and the posterior:

$$\mathrm{KL}\big(q_\theta(\omega)\,\|\,P(\omega \mid X, Y)\big) = \int_{\Omega} q_\theta(\omega)\, \ln \frac{q_\theta(\omega)}{P(\omega \mid X, Y)}\, d\omega.$$

If the variational distribution matches the true posterior distribution exactly, the KL divergence is zero; otherwise, it increases with the discrepancy between $q_\theta(\omega)$ and $P(\omega \mid X, Y)$. During training, the parameter θ is chosen to minimize the KL divergence, yielding the best distribution $q_\theta(\omega)$. Once the optimized $q_\theta(\omega)$ is obtained, it replaces the true posterior in the predictive distribution:

$$P(Y' \mid X') \approx \int_{\Omega} P(Y' \mid X', \omega)\, q_\theta(\omega)\, d\omega.$$

Previous studies have used Bayesian networks parameterized by multivariate normal distributions to approximate the true posterior distribution.29,30 Building on this idea, we developed a BNN model that takes the normalized data of the selected features as input, produces a two-class output, and has a single hidden layer of 200 nodes with ReLU activation. The weights and biases of the neural network are treated as multivariate normal random variables. By optimizing the model parameters, we minimize the KL divergence between the approximate distribution in our model and the true posterior distribution.
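For intuition, when both the prior and the variational posterior factorize into independent normals over the weights, the KL term minimized during training has a closed form. A minimal numpy sketch (an illustration of the formula, not the paper's Pyro implementation):

```python
import numpy as np

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, diag sigma_q^2) || N(mu_p, diag sigma_p^2) ), summed over dimensions.

    Closed form used when both the variational posterior q_theta and the
    prior over each weight are independent normal distributions.
    """
    mu_q, sigma_q = np.asarray(mu_q, float), np.asarray(sigma_q, float)
    mu_p, sigma_p = np.asarray(mu_p, float), np.asarray(sigma_p, float)
    return float(np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
        - 0.5
    ))

# KL is zero when q matches p exactly, and grows as the two diverge.
kl_same = kl_diag_gaussians([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
kl_shifted = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
```

Shifting a unit-variance posterior mean one standard deviation away from a standard-normal prior costs exactly 0.5 nats, which matches the formula term by term.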

Classification model
Our classification problem involves two classes: CRC and non-CRC. To address this, we used a BNN with three layers. The weights and biases of the network are modeled as multivariate normal distributions. We employed the Adam optimizer to train these parameter distributions, minimizing the KL divergence.
After training the model, we generated multiple predictions for each test sample. To obtain each prediction, we randomly sampled a set of network weights and biases from the trained parameter distributions. The resulting predictions vary and can be treated as random variables with a binomial distribution. We took the mean of these predictions as the probability of a positive outcome, with the variance representing the degree of uncertainty. To identify confident predictions, we set a threshold value of 0.3 for the confidence level.31 To optimize the model construction process, we utilized 10-fold cross-validation. The training dataset was randomly divided into 10 folds, with nine used for training the classifier and one for evaluating its performance. The cross-validation process was repeated ten times to estimate the algorithm's ACC based on the average precision. The model with the best performance on the validation set was selected. To adapt the model for clinical use, we performed backward stepwise regression and Pearson correlation analysis to reduce the number of model features and improve interpretability. Features with a correlation coefficient greater than 0.6 were eliminated to avoid multicollinearity.32 We then retrained a simplified BNN model with the streamlined features, using 10-fold cross-validation to refine the model and selecting the one with the highest ACC as the final prediction model.
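The Monte Carlo prediction step above can be sketched as follows. This is a toy stand-in, not the trained BNN-CRC model: the single weight distribution, the one-dimensional input, and the confidence rule (treating a prediction as confident when the mean probability falls outside the ambiguous band between the level and one minus the level) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def bnn_predict(x, n_samples=101):
    """Monte Carlo prediction with a toy one-weight stochastic network.

    Hypothetical stand-in for the trained BNN: each forward pass samples
    weights from fitted normal distributions, giving a different probability.
    """
    mu_w, sigma_w = 0.8, 0.3          # assumed fitted weight distribution
    probs = []
    for _ in range(n_samples):
        w = rng.normal(mu_w, sigma_w)                 # one realization of the weights
        probs.append(1.0 / (1.0 + np.exp(-w * x)))    # sigmoid output for the positive class
    probs = np.array(probs)
    return probs.mean(), probs.var()

p, v = bnn_predict(1.5)
# One plausible confidence rule: call a prediction "confident" when the
# Monte Carlo mean lies outside the ambiguous band (level, 1 - level).
level = 0.3
confident = (p < level) or (p > 1.0 - level)
```

The 101 repeated forward passes mirror the averaging described for the calibration graph later in the Methods.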

Model validation
We assessed the performance of the BNN and simplified BNN models and compared them with the fecal immunochemical test (FIT). We also made comparisons with other traditional algorithms by constructing XGBoost, SVM, LR, RF, and ANN models that used the same features as the optimized simplified BNN model. In the independent validation phase, to better align with real clinical application scenarios and include a larger population of participants, we evaluated only the simplified BNN model with streamlined features and compared it with the XGBoost/SVM/LR/RF/ANN models and FIT. Additionally, we compared the performance of the simplified BNN model on the confident data and the full data.
We used ACC and AUC as discrimination measures to assess the models' performance. In addition, we determined sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), F1 scores, and diagnostic odds ratio (OR) at predetermined SPE values.
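These discrimination measures all derive from the 2×2 confusion matrix; a minimal sketch, using hypothetical counts rather than the study's actual results:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Discrimination measures from a 2x2 confusion matrix."""
    sen = tp / (tp + fn)                       # sensitivity (recall)
    spe = tn / (tn + fp)                       # specificity
    ppv = tp / (tp + fp)                       # positive predictive value (precision)
    npv = tn / (tn + fn)                       # negative predictive value
    f1 = 2 * ppv * sen / (ppv + sen)           # F1 score
    dor = (tp * tn) / (fp * fn)                # diagnostic odds ratio
    return {"SEN": sen, "SPE": spe, "PPV": ppv, "NPV": npv, "F1": f1, "OR": dor}

# Hypothetical counts: 80 true positives, 20 false positives, 20 false negatives, 80 true negatives.
m = diagnostic_metrics(tp=80, fp=20, fn=20, tn=80)
```

The diagnostic OR equals (TP×TN)/(FP×FN), which is why highly confident subsets with few misclassifications produce very large OR values.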

Calibration
To evaluate the model's calibration, we conducted the Hosmer-Lemeshow test using predicted probabilities from the validation set. Furthermore, we generated a calibration graph to visually assess the model's calibration. The calibration graph was generated as follows: we used the BNN model to predict the probability of each sample in the validation set by averaging 101 repeated predictions with various realizations of the model parameters. Then, we divided the validation set into 10 quantiles based on the predicted probabilities and calculated the average predicted probability and the average observed probability (the proportion of positive patients) for each quantile. Finally, we plotted the average predicted probability versus the average observed probability for each quantile to evaluate how well the predicted probabilities aligned with the true probabilities.
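The quantile-binning step of the calibration graph can be sketched as follows, on synthetic, perfectly calibrated data (not the study's predictions):

```python
import numpy as np

def calibration_points(pred_prob, y_true, n_bins=10):
    """Mean predicted vs. observed probability per quantile bin of predictions."""
    pred_prob = np.asarray(pred_prob, float)
    y_true = np.asarray(y_true, float)
    order = np.argsort(pred_prob)
    bins = np.array_split(order, n_bins)          # quantile bins by predicted probability
    mean_pred = [pred_prob[b].mean() for b in bins]
    mean_obs = [y_true[b].mean() for b in bins]   # observed positive fraction per bin
    return mean_pred, mean_obs

rng = np.random.default_rng(0)
p = rng.random(1000)
y = (rng.random(1000) < p).astype(int)            # simulated perfectly calibrated outcomes
mp, mo = calibration_points(p, y)
```

For a well-calibrated model, the (mean predicted, mean observed) points lie close to the diagonal; systematic deviation above the diagonal corresponds to the overestimation reported for the full prospective set.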

Feature importance analysis
To understand the importance of features in our model, we used Shapley additive explanations (SHAP), a model-agnostic method designed to explain feature importance in machine learning models. SHAP is derived from the Shapley value in game theory, which measures the contribution of a single feature $x_i$ as the difference between the outcome predicted by a model that includes $x_i$ and one built on a feature set excluding $x_i$.

To calculate the Shapley value, we first defined a subset Φ of features not containing $x_i$. We then calculated the ACC predicted by our BNN classifier using the features in Φ, denoted $f_{\mathrm{BNN}}(\Phi)$. The effect of adding $x_i$ is the increase in ACC contributed by the feature, that is, $f_{\mathrm{BNN}}(\Phi \cup \{x_i\}) - f_{\mathrm{BNN}}(\Phi)$. The Shapley value of $x_i$ is the mean of all such differences:

$$\phi(x_i) = \frac{1}{N} \sum_{\Phi \subseteq F \setminus \{x_i\}} \Big[ f_{\mathrm{BNN}}(\Phi \cup \{x_i\}) - f_{\mathrm{BNN}}(\Phi) \Big],$$

where F is the set of all variables and N is the number of subsets of $F \setminus \{x_i\}$. However, calculating the contribution of all subsets of features is computationally intensive. Therefore, we generated 200 subsets containing one-third to two-thirds of all features as an approximation. In addition, we randomly selected 2000 predictions made by our BNN to train the explanatory model using SHAP values. After training, the explanatory model generated SHAP values for the features of the remaining samples. These SHAP values represent the change in predicted probability attributable to a specific feature value, providing a straightforward and reliable interpretation of our results, which is typically challenging to achieve with conventional machine learning predictions. The SHAP values were used to rank the importance of the features in our model and helped us understand the contribution of each feature to the final prediction.
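The subset-averaging idea can be illustrated on a toy additive "accuracy" function, where each feature's Shapley value equals its own fixed gain. The feature names and gains here are invented for illustration, and the subsets are enumerated exactly; the paper instead subsamples 200 random subsets:

```python
import itertools
import numpy as np

def shapley_mean(model_acc, features, target):
    """Mean accuracy gain from adding `target` over all subsets not containing it.

    Follows the simple mean over subsets described in the text; in practice
    the subsets are subsampled rather than enumerated.
    """
    others = [f for f in features if f != target]
    diffs = []
    for r in range(len(others) + 1):
        for subset in itertools.combinations(others, r):
            with_f = model_acc(set(subset) | {target})
            without_f = model_acc(set(subset))
            diffs.append(with_f - without_f)       # accuracy gain from adding `target`
    return float(np.mean(diffs))

# Toy "accuracy": each feature contributes an independent fixed gain on top of
# a 0.5 baseline, so the Shapley value of a feature equals its own gain.
gains = {"RDW": 0.10, "age": 0.05, "FIT": 0.20}
acc = lambda s: 0.5 + sum(gains[f] for f in s)
phi_rdw = shapley_mean(acc, list(gains), "RDW")
```

With interacting features the differences would vary by subset, which is exactly what the averaging is designed to absorb.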

Statistical analyses
All statistical analyses were conducted using Python (version 3.7.4). Stepwise regression and the proposed imputation methods for missing data were implemented using Sklearn, while Pyro was used to implement our BNN. The SHAP package was used to compute Shapley values, interpreted as feature significance, and for visualization. Graphs were created using Seaborn and Microsoft Excel 2019.

Characteristics of the participants and missing data imputation
A total of 40,065 participants were included in this study, with 27,911 participants in the training cohort and 12,154 participants in the prospective validation cohort. The assembly of case patients and control subjects for the training and validation sets is illustrated in Figure 2. In the training stage, the age distribution between the CRC and non-CRC groups was significantly different (Table 1), with CRC patients (60.38 ± 11.84 years) being older than controls (55.26 ± 12.84 years). However, the gender compositions of the CRC group (male ratio: 62.52%) and controls (male ratio: 63.62%) were similar. In the prospective validation cohort, the male ratio was 62.93% and 63.97% in the CRC and non-CRC groups, respectively. The ages of the CRC and non-CRC groups were 61.48 ± 11.99 and 54.22 ± 14.38 years, respectively. The age distribution between the training and prospective validation cohorts was significantly different (p < .001), while the gender difference was not statistically significant. Given these differences in sample distribution between the two cohorts, the prospective cohort offers a meaningful test of the applicability and generalizability of our model.
For further analysis, we selected 70 candidate variables, comprising 50 quantitative indicators and 17 binary indicators, in combination with age, gender, and BMI. The distributions and results of these variables for both groups are presented in Table S1, while Figure S1 displays the number of participants with missing data for each predictor. To address the impact of missing values, we utilized three imputation methods: median substitution, mean substitution, and KNN imputation. While a previous study33 suggested that KNN imputation was more effective in managing a large (>50%) amount of missing data than traditional mean/median replacement, our findings showed that the performance differences among the three methods were minimal, with mean substitution achieving the highest average ACC (0.847). Therefore, mean substitution was chosen to handle missing data (Table S2).

BNN classifier for CRC diagnosis
We developed a BNN-based CRC detection model using 70 routinely documented variables in EHRs, achieving high diagnostic ACC with 10-fold cross-validation (Table S3).
The mean values of ACC (0.847), SEN (0.768), SPE (0.892), PPV (0.806), NPV (0.875), F1 score (0.845), and AUC (0.830) of this model for distinguishing CRC were obtained. The crude OR for prevalent CRC was 31.563. Furthermore, we simplified the model using stepwise regression, as shown in Figure S2. ACC improved with the addition of each feature, reaching an inflection point at eight features (ACC = 0.82) and a relative plateau (ACC = 0.827) at 20 features. These 20 candidate features, in order, included "FIT positive," red blood cell volume distribution width (RDW), prealbumin, "fecal transferrin missing," "FIT missing," serum sodium (Na+), age, serum potassium (K+), red blood cell count (RBC), gender, "fecal transferrin positive," platelet-large cell rate (P-LCR), percentage of monocytes (Mon%), total bile acid (TBA), percentage of eosinophils (Eos%), mean platelet volume (MPV), albumin (ALB), total protein (TP), albumin/globulin ratio (A/G), and white blood cell count (WBC). To prevent collinearity issues and further simplify the model, we conducted a correlation analysis of the 20 features (Table S4). The correlation coefficients between MPV and P-LCR (R2 = 0.96), "FIT missing" and "fecal transferrin missing" (R2 = 0.73), "FIT positive" and "fecal transferrin positive" (R2 = 0.67), A/G and ALB (R2 = 0.64), and TP and ALB (R2 = 0.6) were greater than 0.6. Among them, MPV, "fecal transferrin missing," "fecal transferrin positive," A/G, and TP had lower correlation rankings with CRC and were thus removed. Subsequently, we developed a simplified BNN model based on the remaining 15 features (termed BNN-CRC15) and conducted 10-fold cross-validation to evaluate its performance in classifying CRC. As demonstrated in Table S5, the simplified BNN model achieved an ACC ranging from 0.830 to 0.863. We compared the performance of the two models and observed that the simplified model did not sacrifice prediction efficiency after dropping 72 features (Table 2). The mean values of SEN, SPE, PPV, NPV, and AUC of the simplified model for distinguishing CRC were 0.760, 0.898, 0.808, 0.869, and 0.829, respectively. The simplified BNN model with the highest ACC (0.863) and AUC (0.853) showed optimal performance, with a SEN of 0.818, SPE of 0.888, PPV of 0.806, NPV of 0.896, F1 score of 0.863, and OR of 35.687 (Table S5).

Advantages of BNN-CRC15
Since BNN-CRC15 provides uncertainty for each prediction (Figure 3A,B), the model's performance in confident participants varies with the confidence level. As shown in Figure 3C, as the confidence level increases, the proportion of confident participants also increases, but at the expense of diagnostic ACC. Here, we set the confidence level to 0.3 (Figure 3D). We also compared the performance of BNN-CRC15 on confident data (confidence level = 0.3) and the full data (Figure 3). The model performed even better on confident data, with an ACC of 0.954 and AUC of 0.918 (Table 2, Figure 3G,I). The OR (368.471) for prevalent CRC was particularly advantageous in confident participants (Table 2, Figure 3J). In the prospective cohort, BNN-CRC15 likewise performed better on confident data (AUC = 0.863, SEN = 0.792, SPE = 0.933) than on all participants (AUC = 0.764, SEN = 0.679, SPE = 0.850) (Table 2, Figure 3H).
Moreover, the BNN-CRC15 model demonstrated good calibration on the confident samples in both the training and validation sets, with Hosmer-Lemeshow test p-values of .092 and .184, respectively. However, it exhibited some miscalibration on the full training and prospective validation sets, overestimating the probability of cancer for patients in these sets. The corresponding Hosmer-Lemeshow test p-values were .085 and <.001, respectively, indicating statistically significant miscalibration in the latter case (Figure 3K,L).

Contribution of parameters to the model
Using SHAP analysis, we identified the most important features in the full model (Table S6), which included all 87 features. RDW, "FIT positive," "fecal transferrin missing," prealbumin, and P-LCR were among the top five. To simplify the model features and make them more clinically relevant, we retrained the BNN-CRC15 model using stepwise regression and multicollinearity filtering to select only 15 features. The SHAP values of these features in BNN-CRC15 are shown in Figure 4A. RDW was found to be the most important and consistent feature in the model. Additionally, two features were associated with fecal occult blood ("FIT positive" and "FIT missing"). RDW, "FIT positive," age, K+, Eos%, gender, and Mon% were associated with a greater risk of CRC, while "FIT missing," prealbumin, P-LCR, TBA, ALB, RBC, Na+, and WBC were associated with a lower risk of cancer (Figure 4B).
We compared the normalized laboratory measurements contributing to BNN-CRC15 between the full and confident datasets, as shown in Figure 4C. RDW, age, K+, Eos%, and Mon% had significantly higher values in the CRC group than in the non-CRC group, whereas prealbumin, P-LCR, TBA, ALB, RBC, Na+, and WBC had significantly higher values in the non-CRC group, and this trend was more pronounced in the confident dataset. "FIT positive" had a higher positive rate in the CRC group, whereas "FIT missing" had a higher positive rate in the non-CRC group. However, there was no obvious difference in the gender ratio between the CRC and non-CRC groups (all data: p = .614; confident data: p = .392).

DISCUSSION
In this study, we used a BNN to develop an AI model for classifying CRC using laboratory medical big data from a large cohort of participants. The BNN algorithm predicts whether a participant has CRC and provides an estimate of the uncertainty associated with the prediction, much like a doctor making a diagnosis. This uncertainty estimation can facilitate the acceptance of the prediction model in clinical practice. Namely, our proposed BNN-CRC15 can predict CRC with a quantified probability (Figure 3A,B), enabling the doctor to judge with what certainty a patient can be diagnosed with CRC from the 15 laboratory features of that patient.
Because the weights and biases of a BNN are drawn from a trained distribution rather than fixed at a single point estimate, the output of the network is itself a distribution. This allows us to observe the effect of small changes in the model's parameters on the results, making our model highly robust and more likely to be applicable to other healthcare organizations. Our model can estimate the probability that a given patient has CRC based on their laboratory test results, along with the uncertainty associated with that prediction. Our study represents a novel and rare use of BNNs for the analysis of large amounts of medical data and disease prediction modeling, with the potential to improve the ACC of disease diagnosis and treatment planning.
The opacity of machine learning algorithms remains an ongoing challenge. To address this issue, we employed the SHAP method of interpretable AI to explain the significance of features in the BNN model, thereby increasing transparency. By analyzing the SHAP values contributing to the model, we identified a list of highly influential features. Notably, RDW exhibited significant changes and contributed to a greater risk of CRC, consistent with previous studies, as did FIT,34 age, and gender.27 Other features such as Na+, K+,35,36 Eos,37 prealbumin, ALB,38 P-LCR,39 TBA,40 and WBC41,42 have also been reported to be associated with CRC in some studies but require further clinical and functional research to confirm their associations. We also examined why our BNN model was confident about some predictions (46.4% confident participants) but less certain about others (non-confident participants). To investigate this, we analyzed the distribution of the 15 features between the CRC and non-CRC groups in confident and non-confident participants (Figure 4C). In the confident population, the differences between these features were more pronounced than in the overall population, enabling our model to distinguish the groups more easily.
While our study showed promising outcomes, one limitation is the use of data from only one institution. To improve the generalizability of our model, future research could perform multi-center validation with larger populations. Importantly, our method may have safety and acceptability benefits over other algorithms, as our model can express uncertainty when the prediction certainty is low. To investigate the clinical utility of our model in routine care settings, we need to integrate it with laboratory information systems and evaluate its performance in real-world scenarios.
In summary, we have created a BNN-based self-evaluating model for predicting CRC. This model utilizes real-world laboratory medical big data to offer a recommendation together with an evaluation of that recommendation's reliability, much as a physician does when making a diagnosis. Our model has undergone internal validation and has demonstrated superior performance compared with current methods such as FIT and XGBoost, particularly in confident populations. Additionally, we have employed the SHAP method to interpret the significance of features in our model, shedding light on the "black box" effect of machine learning. As our model is developed from clinically relevant data, it has strong potential for use in clinical practice. However, further studies should focus on validating our model in larger populations from multiple institutions and examining its usefulness in routine care settings via laboratory information systems.

F I G U R E 1
Workflow of Bayesian neural network (BNN) model development. The diagram illustrates the overview of the model development process. Input data: the training dataset includes birth date, gender, height, weight, and all available laboratory parameters. Data preprocessing: after cleaning and quality control processes (i.e., data cleaning, reduction, transformation, and imputation), the raw data are labeled and used for modeling. Model construction: a binary classification model is constructed using the BNN method to obtain robust results. Shapley additive explanation (SHAP) values are used to explain the feature importance in the BNN models. Output: the model predicts whether a participant has colorectal cancer or not, and provides the uncertainty in the predicted result. Validation: the performance of the BNN model is compared to that of the extreme gradient boosting (XGBoost)/support vector machine (SVM)/logistic regression (LR)/random forest (RF)/artificial neural network (ANN) models and the fecal immunochemical test (FIT), and the model is prospectively validated in a separate dataset.
FIGURE 2
Study design and enrollment of participants for the training and prospective validation sets. Abbreviation: CRC, colorectal cancer.

Statistical analyses were conducted using Python (version 3.7.4) and the following packages: Numpy (version 1.20.3), Pandas (version 1.20.3), Missingno (version 0.5.1), Sklearn (version 0.24.2), Pyro (version 1.8.2), SHAP (version 0.39.0), and Seaborn (version 0.11.2). The significance of variables between the CRC and non-CRC groups was determined using Welch's t-test or Fisher's exact test. Pearson's correlation coefficient was used to analyze the correlation between characteristics and CRC. Data preprocessing and modeling were performed in Python, with Pandas used to organize and clean the panel data and Missingno used to discover and visualize missing data.
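As a sketch of the statistical tests named above, the same comparisons can be run with scipy.stats on simulated data. The albumin values and contingency counts here are made up for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical albumin values (g/L) for CRC and non-CRC groups (simulated)
crc = rng.normal(40.0, 5.0, 200)
non_crc = rng.normal(43.0, 5.0, 200)

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_stat, p_welch = stats.ttest_ind(crc, non_crc, equal_var=False)

# Fisher's exact test on a 2x2 contingency table
# (e.g. FIT-positive vs FIT-negative counts per group; made-up numbers)
odds_ratio, p_fisher = stats.fisher_exact([[30, 170], [10, 190]])

# Pearson correlation between a feature and the binary CRC label
labels = np.r_[np.ones(200), np.zeros(200)]
r, p_corr = stats.pearsonr(np.r_[crc, non_crc], labels)
```

Welch's test is preferred over Student's t-test when the two groups may have unequal variances, which is common for laboratory parameters.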

TABLE 1
Characteristics of participants at baseline.

FIGURE 3
Bayesian neural network (BNN) classifier for colorectal cancer (CRC) diagnosis and advantages of BNN-CRC15. (A) The BNN algorithm. (B) Example cases of BNN-CRC15 model predictions. (C) Trends in the area under the curve (AUC) and the proportion of confident data as the confidence level varies. As the confidence level increases, the AUC decreases while the proportion of participants increases. When the confidence level is set at 0.3, the model achieves an AUC of 0.946 on 46.4% of all patients. (D) Prediction accuracy in groups with different confidence. (E) Receiver operating characteristic (ROC) curves for the BNN, BNN-CRC15, extreme gradient boosting (XGBoost), support vector machine (SVM), logistic regression (LR), random forest (RF), and artificial neural network (ANN) models and the fecal immunochemical test (FIT) in the training cohort. (F) ROC curves for the BNN-CRC15, XGBoost, SVM, LR, RF, and ANN models and FIT in the prospective validation cohort. (G) ROC curves for BNN-CRC15 in confident samples of the training cohort. (H) ROC curves for BNN-CRC15 in confident samples of the prospective cohort. (I) Comparison of BNN-CRC15 performance between confident data and all data in the training cohort. (J) Comparison of odds ratios (ORs) between confident data and all data in the training cohort. (K and L) Calibration graphs, showing the calibration of the model on the full training set and on confident participants. The x-axis represents the true probability (0-1, i.e., a 0%-100% probability of an event) and the y-axis the predicted probability. (K) The green line shows perfect calibration (true probability), while the orange and blue lines show the calibration of BNN-CRC15 on the full and confident training sets, respectively. (L) The green line shows perfect calibration (true probability), while the orange and blue lines show the calibration of BNN-CRC15 on
the full and confident prospective validation sets, respectively.

TABLE 2
Comparison of model performance among the Bayesian neural network (BNN), BNN-CRC15, extreme gradient boosting (XGBoost), support vector machine (SVM), logistic regression (LR), random forest (RF), artificial neural network (ANN), and fecal immunochemical test (FIT), and of BNN-CRC15 on the confident participants, in the training and prospective validation sets.
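The confidence-triage idea behind Figure 3C and Table 2, namely evaluating the model on all participants versus only those whose predictive uncertainty falls below the chosen confidence level, can be sketched on simulated data. All quantities below are illustrative assumptions, not study results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, n)                      # simulated true CRC labels
# Half the cohort gets sharp predictions, half gets noisy ones
noise = np.where(rng.random(n) < 0.5, 0.15, 0.45)
probs = np.clip(y + rng.normal(0.0, noise), 0.0, 1.0)
# Toy per-participant uncertainty, e.g. the MC std of the BNN output
uncertainty = np.abs(noise + rng.normal(0.0, 0.02, n))

auc_all = roc_auc_score(y, probs)              # performance on everyone
confident = uncertainty < 0.3                  # confidence level of 0.3
auc_confident = roc_auc_score(y[confident], probs[confident])
fraction_confident = confident.mean()          # share routed to confident group
```

Restricting evaluation to the low-uncertainty subset raises the AUC at the cost of covering only a fraction of participants, which is the trade-off Figure 3C traces as the confidence level varies.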

FIGURE 4
Contribution of 15 features to BNN-CRC15. (A) Shapley additive explanation (SHAP) overall and summary plot of the 15 feature clusters. Blue dots represent low values (for quantitative features) or 0 (for qualitative features), whereas red dots represent high values (for quantitative features) or 1 (for qualitative features). For each feature, the location of a dot on the x-axis represents its SHAP value: dots on the left reduce the predicted probability of colorectal cancer, whereas dots on the right increase the predicted risk. If the blue dots lie mainly on the left and the red dots mainly on the right, a high/positive value of that feature increases the risk of colorectal cancer (CRC). (B) Feature effect: the average effect of each feature, based on the SHAP value of each dot. (C) Laboratory measurements contributing to BNN-CRC15. Standard box plots present the distributions of variable measurements contributing to BNN-CRC15 in CRC and non-CRC samples. Box plots indicate the median, first and third quartiles, and 1.5× the interquartile range of quantitative features. The bar chart shows the positive rate of binary features. ***p < .001; **p = .001. Abbreviations: ALB, albumin; BNN, Bayesian neural network; Eos%, percentage of eosinophils; "FIT missing," fecal immunochemical test data missing; "FIT positive," fecal immunochemical test positive; K+, serum potassium; Mon%, percentage of monocytes; Na+, serum sodium; RBC, red blood cell; RDW, red blood cell volume distribution width; TBA, total bile acids; WBC, white blood cell.
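The SHAP attributions in panel A have a closed form in the special case of a linear model with independent features, which makes the idea easy to sketch without the SHAP package. The weights below are hypothetical; the study applied SHAP to the BNN itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))      # toy standardized feature matrix
w = np.array([2.0, -1.0, 0.0])     # hypothetical linear-model weights

# For a linear model f(x) = w @ x + b with independent features, the exact
# SHAP value of feature i at sample x is w_i * (x_i - E[x_i]).
shap_values = (X - X.mean(axis=0)) * w

# Global importance = mean |SHAP| per feature, as in a SHAP summary plot
importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(importance)[::-1]   # most important feature first
```

A feature with a zero weight gets zero attribution everywhere, while the sign of each per-sample SHAP value indicates whether that sample's feature value pushes the prediction up or down, matching the left/right dot placement in the summary plot.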