A novel machine learning algorithm, Bayesian networks model, to predict the high‐risk patients with cardiac surgery‐associated acute kidney injury

Abstract Background Cardiac surgery‐associated acute kidney injury (CSA‐AKI) is a well‐recognized complication with an ominous outcome. Hypothesis Bayesian networks (BNs) can not only reveal the complex interrelationships between predictors and CSA‐AKI but also predict an individual's risk of CSA‐AKI. Methods Between 2013 and 2015, we recruited 5533 eligible participants who underwent cardiac surgery at a tertiary hospital in eastern China. Demographic, clinical, and laboratory data were prospectively recorded in the electronic medical record system and analyzed by gLASSO‐logistic regression and BNs. Results The incidences of CSA‐AKI and severe CSA‐AKI were 37.5% and 11.1%, respectively. The BN model identified gender, left ventricular ejection fraction (LVEF), serum creatinine (SCr), serum uric acid (SUA), platelet count, and aortic cross‐clamp time (ACCT) as parent nodes of CSA‐AKI, while ultrafiltration volume and postoperative central venous pressure (CVP) were connected to CSA‐AKI as child nodes. In the severe CSA‐AKI model, age, proteinuria, and SUA were directly linked to severe AKI; the new nodes NYHA grade and direct bilirubin were related to severe AKI indirectly through LVEF, surgery type, and SCr level. The internal AUCs for predicting CSA‐AKI and severe AKI were 0.755 and 0.845, respectively, and remained 0.736 and 0.816 in external validation. Given the known variables, the risk of CSA‐AKI can be inferred at the individual level from the established BN model and prior information. Conclusion The BN model has high accuracy, good interpretability, and strong generalizability in predicting CSA‐AKI. It can help physicians identify high‐risk patients and implement protective strategies to improve prognosis.


| INTRODUCTION
With advances in medical technology, considerable progress has been made in the surgical treatment of cardiac diseases.
An estimated 2 million cardiac surgeries are performed worldwide each year. 1 Acute kidney injury (AKI) is a well-recognized complication of cardiac surgery; a global meta-analysis reported a pooled incidence of cardiac surgery-associated acute kidney injury (CSA-AKI) of 24.3%. 2 Furthermore, severe CSA-AKI is associated with higher mortality, prolonged hospital stay, and increased medical costs. 3 The occurrence of CSA-AKI involves both demographic and perioperative factors, and its specific mechanism and severity vary between individuals. 4 Early identification of high-risk patients with prediction models allows clinicians to monitor them regularly and take prophylactic measures to prevent AKI.
Several risk prediction models for AKI have been developed based on logistic regression, such as the Clinic Score, 5 the Mehta Score, 6 and the Simplified Renal Index Score. 7 Although these models performed well in internal evaluation, most had poor discrimination in external validation, with an area under the receiver operating characteristic curve (AUC) below 0.7. 8,9 Hence, more advanced algorithms are needed to develop a flexible and efficient model that identifies AKI, especially severe AKI, at an early stage. Bayesian networks (BNs) are one of the classical machine learning algorithms. A BN not only displays the relationships between causal or associated variables graphically as a network, but also quantifies the conditional probability distribution of each node. In nephro-epidemiology, this property makes BNs well suited to examining the hypothesis that CSA-AKI has multiple etiologies.
To this end, we conducted a prospective cohort study of patients who underwent cardiac surgery. The objective was to propose a BN-based predictive model that reveals the complex inherent relationships between CSA-AKI and its associated factors, and then to evaluate the model's predictive ability and external applicability.

| Patient selection
From January 1, 2013 to December 31, 2015, patients undergoing cardiac surgery at a tertiary hospital in Shanghai, China, were enrolled as study participants. We excluded patients who were under 18 years old, received a heart transplant, lacked surgical or biochemical data, or had no serum creatinine (SCr) test. The eligible participants were then divided into two cohorts. Patients admitted in 2013 to 2014 formed the derivation cohort, which was used for statistical analysis, BN modeling, and internal evaluation. The remaining patients, admitted in 2015, formed the validation cohort used to assess the model's external generalizability. The study was approved by the Zhongshan Hospital institutional review board (B2017-039). Participation was voluntary and anonymous, and the confidentiality of patients' identities was assured. Before data collection, informed consent was signed by all participants or their agents.
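The cohort construction above can be sketched in Python. This is a minimal illustration only: the `Patient` record and its field names are hypothetical stand-ins for whatever the electronic medical records actually contain, and the eligibility rule simply mirrors the stated exclusion criteria before splitting by admission year.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patient:
    # Hypothetical fields for illustration; not the study's actual data schema.
    patient_id: str
    admission_year: int
    age: int
    heart_transplant: bool
    has_surgery_and_biochem_data: bool
    n_scr_tests: int


def is_eligible(p: Patient) -> bool:
    """Apply the exclusion criteria: age < 18, heart transplant, missing data, no SCr test."""
    return (
        p.age >= 18
        and not p.heart_transplant
        and p.has_surgery_and_biochem_data
        and p.n_scr_tests >= 1
    )


def temporal_split(patients: List[Patient]) -> Tuple[List[Patient], List[Patient]]:
    """Derivation cohort = admitted 2013-2014; validation cohort = admitted 2015."""
    eligible = [p for p in patients if is_eligible(p)]
    derivation = [p for p in eligible if p.admission_year in (2013, 2014)]
    validation = [p for p in eligible if p.admission_year == 2015]
    return derivation, validation
```

Splitting by calendar year rather than at random keeps the validation patients strictly later in time than those used for model building.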

| Data collection
We used a self-designed questionnaire to collect demographic information and preexisting comorbidities. Perioperative data, together with their time records, were retrieved from the electronic medical records. Laboratory values measured within 24 hours of admission were used as baseline biochemical levels. We finally selected 27 medical indicators commonly recorded in cardiac surgery. These variables were divided chronologically into four groups: ① demographic features: age, gender, and body mass index (BMI); ② preoperative features: hypertension, diabetes, coronary angiography, New York Heart Association (NYHA) grade, left ventricular ejection fraction (LVEF), alanine aminotransferase (ALT), aspartate aminotransferase (AST), direct bilirubin (DBil), SCr, estimated glomerular filtration rate (eGFR), serum uric acid (SUA), urine protein, urine erythrocytes, albumin, hemoglobin, hematocrit, platelet count, serum sodium, and serum potassium; ③ intraoperative features: cardiopulmonary bypass (CPB), surgery type, aortic cross-clamp time (ACCT), and ultrafiltration volume; ④ postoperative feature: central venous pressure (CVP) within 6 hours after surgery.

| Definition and classification
According to the 2012 KDIGO criteria, 10

| gLASSO-logistic regression
Multicollinearity and high dimensionality are common in clinical data and, if not handled properly, can lead to incorrect parameter estimates or wrong inferences. The LASSO (least absolute shrinkage and selection operator) is a shrinkage estimation method proposed by Tibshirani (1996). 11 LASSO adds an ℓ1 penalty to the ordinary least-squares objective, which shrinks the regression coefficients β of irrelevant variables to exactly zero and thereby performs model estimation and variable selection simultaneously. Group LASSO (gLASSO) extends LASSO by selecting or discarding each predefined group of coefficients as a whole, so that a categorical variable is retained or removed together with all of its dummy variables rather than one dummy variable at a time. 12 The gLASSO estimate is

$$\hat{\beta} = \arg\min_{\beta} \left\{ -\log L(\beta) + \sum_{g=1}^{G} \lambda_g \left\lVert \beta_{I_g} \right\rVert_2 \right\}$$

where L(β) is the likelihood of the model (for gLASSO-logistic regression, the logistic likelihood), G is the number of groups, I_g is the set of variables in the g-th group (g = 1, 2, …, G), β_{I_g} is the corresponding subvector of coefficients, and λ_g is the penalty parameter of the g-th group. This penalty can be regarded as intermediate between the ℓ1- and ℓ2-type penalties.
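To make the grouping behaviour concrete, here is a minimal NumPy sketch of group-LASSO logistic regression fitted by proximal gradient descent. It is not necessarily the algorithm used by the `grpreg` package in this study; it assumes λ_g = λ·√|I_g| (a common default) and minimizes the mean (1/n-scaled) negative log-likelihood, which only rescales λ. The key step is block soft-thresholding: a group's coefficients are shrunk together, and the whole group is set to zero when the norm of its updated block falls below its threshold.

```python
import numpy as np


def block_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block, or zero it out."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso_logistic(X, y, groups, lam, n_iter=2000):
    """Group-LASSO logistic regression via proximal gradient descent.

    X: (n, p) design matrix; y: (n,) array of 0/1 outcomes.
    groups: list of integer index arrays partitioning the columns of X
            (e.g. all dummy columns of one categorical variable form one group).
    lam: overall penalty; lambda_g = lam * sqrt(|I_g|) is an assumption here.
    Returns the unpenalized intercept and the coefficient vector.
    """
    n, p = X.shape
    beta0, beta = 0.0, np.zeros(p)
    # Step size 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the Lipschitz
    # constant of the gradient of the mean logistic loss.
    X1 = np.column_stack([np.ones(n), X])
    step = 4.0 * n / np.linalg.norm(X1, 2) ** 2
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))
        resid = prob - y
        grad = X.T @ resid / n
        beta0 -= step * resid.mean()  # intercept: plain gradient step, no penalty
        z = beta - step * grad        # gradient step on the smooth loss
        for idx in groups:            # proximal step, one group at a time
            lam_g = lam * np.sqrt(len(idx))
            beta[idx] = block_soft_threshold(z[idx], step * lam_g)
    return beta0, beta
```

On simulated data in which only one group carries signal, the coefficients of the noise group come out exactly zero, which is the "select the whole variable" behaviour described above.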

| Bayesian networks
Bayesian networks were first proposed by Judea Pearl in 1988 and are widely used in machine learning. 13 A BN consists of a directed acyclic graph (DAG) G = (V, A) and a global probability distribution. In the DAG, each node v_i ∈ V corresponds to a random variable X_i. The global probability distribution can be decomposed into smaller conditional probability distributions (CPDs) according to the arcs a_ij ∈ A of the DAG. For a network of n variables, the factorization of the BN's global distribution is

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \pi(X_i)\right)$$

where π(X_i) is the set of parent variables of X_i. Given the values of π(X_i), each node X_i is conditionally independent of its non-descendants.
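As a small illustration of the factorization, the sketch below encodes a hypothetical four-node network (X1 → Y ← X2, Y → Z) with invented conditional probability tables; the node names and numbers are illustrative only and are not estimates from this study. The joint probability of a full assignment is simply the product of each node's local probability given its parents.

```python
# Hypothetical DAG: X1 -> Y <- X2, Y -> Z. All nodes are binary (0/1).
PARENTS = {"X1": (), "X2": (), "Y": ("X1", "X2"), "Z": ("Y",)}

# P(node = 1 | parent values). Numbers are invented for illustration only.
P_ONE = {
    "X1": {(): 0.3},
    "X2": {(): 0.4},
    "Y": {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.7},
    "Z": {(0,): 0.2, (1,): 0.8},
}


def local_prob(node, value, assignment):
    """P(X_i = value | pi(X_i)), read from the node's conditional probability table."""
    key = tuple(assignment[p] for p in PARENTS[node])
    p1 = P_ONE[node][key]
    return p1 if value == 1 else 1.0 - p1


def joint_prob(assignment):
    """Factorization: P(X_1, ..., X_n) = product over i of P(X_i | pi(X_i))."""
    prob = 1.0
    for node, value in assignment.items():
        prob *= local_prob(node, value, assignment)
    return prob
```

For example, P(X1 = 1, X2 = 0, Y = 1, Z = 1) = 0.3 × 0.6 × 0.4 × 0.8 = 0.0576, and summing the joint over all 16 assignments gives 1.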
Building a BN model requires two steps: structure learning and parameter learning. Tabu search is an established score-based algorithm for structure learning. Like hill-climbing, it moves through the space of DAGs by single-arc additions, removals, and reversals; unlike hill-climbing and K2, it can accept moves that worsen the network score while keeping a tabu list of recently visited structures, which helps it escape local optima and reach a better-scoring network. 14 Parameter learning estimates the numerical parameters of each local distribution by either maximum likelihood (ML) estimation or Bayesian estimation. ML estimation finds the value of the parameter θ that maximizes the likelihood P(X_i | θ):

$$\hat{\theta} = \arg\max_{\theta} P(X_i \mid \theta)$$

BN inference amounts to computing a posterior distribution with Bayes' rule. If E denotes the set of observed variables (evidence) and Z the set of target or unobserved variables, then inference on the graphical model computes

$$P(Z \mid E) = \frac{P(Z, E)}{P(E)}$$
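The parameter-learning and inference steps can be sketched for discrete data as follows. This is a minimal illustration, assuming binary nodes and a hypothetical three-node structure A → C ← B that is taken as already learned (structure learning such as tabu search is not shown). ML estimation of a discrete CPD reduces to relative frequencies, and exact inference computes P(Z | E) by summing the factorized joint distribution over the unobserved nodes (inference by enumeration). Enumeration is feasible only for small networks; larger ones call for more efficient exact or approximate algorithms.

```python
from collections import Counter
from itertools import product

# Hypothetical structure, assumed already learned: A -> C <- B (binary nodes).
PARENTS = {"A": (), "B": (), "C": ("A", "B")}


def fit_mle(records):
    """ML estimate of each discrete CPD: P(X_i = x | pi(X_i) = u) = N(u, x) / N(u)."""
    cpts = {}
    for node, parents in PARENTS.items():
        pair_counts = Counter((tuple(r[p] for p in parents), r[node]) for r in records)
        parent_counts = Counter(tuple(r[p] for p in parents) for r in records)
        cpts[node] = {(u, x): n / parent_counts[u] for (u, x), n in pair_counts.items()}
    return cpts


def joint_prob(cpts, assignment):
    """Factorized joint probability; configurations never seen in the data get 0."""
    prob = 1.0
    for node, parents in PARENTS.items():
        u = tuple(assignment[p] for p in parents)
        prob *= cpts[node].get((u, assignment[node]), 0.0)
    return prob


def posterior(cpts, target, evidence):
    """P(target | evidence) = P(target, evidence) / P(evidence), by enumeration.

    Raises ZeroDivisionError if the evidence has zero probability under the model.
    """
    hidden = [n for n in PARENTS if n != target and n not in evidence]
    unnormalized = {}
    for t in (0, 1):
        unnormalized[t] = sum(
            joint_prob(cpts, {**evidence, target: t, **dict(zip(hidden, values))})
            for values in product((0, 1), repeat=len(hidden))
        )
    total = unnormalized[0] + unnormalized[1]  # equals P(evidence)
    return {t: p / total for t, p in unnormalized.items()}
```

`posterior(cpts, "C", {"A": 1})` predicts the child from an observed parent, while `posterior(cpts, "A", {"C": 1})` reasons backwards from an observed outcome; both use the same factorized joint.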

| Statistical analysis
Distributional differences in covariates between the derivation and validation cohorts were assessed using standardized differences (SD). In large samples, SD reflects clinically meaningful differences rather than statistical significance. If the SD values of most variables exceeded the 10% threshold, participants in the two cohorts were considered to originate from different source populations. In the derivation cohort, we further described the distribution of CSA-AKI and severe AKI across clinical factors, and then quantified the strength of their associations as adjusted odds ratios (aORs) using multivariable logistic regression. These analyses were conducted in IBM SPSS 22.0 (IBM Corp., Armonk, New York) with a significance level of 0.05. gLASSO-logistic regression was run with the "grpreg" package in R 3.6.0 (R Core Team) to select predictors of CSA-AKI from the candidate variables. For the gLASSO penalty, 10-fold cross-validation was applied across a range of values of the regularization parameter λ. When cross-

| Baseline characteristics and CSA-AKI incidence
In total, 5533 patients were included in the final analysis: 3639 in the derivation cohort and 1894 in the validation cohort (Supplementary Figure S1). In the derivation cohort, the mean age was 55.0 ± 13.2 years and 59.6% of patients were male, compared with 55.8 ± 13.0 years and 57.5% male in the validation cohort. Most covariates had SD values below 10%, indicating that participants in the two cohorts were clinically comparable (Supplementary Table S1).
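The standardized differences can be reproduced from summary statistics. The sketch below assumes the commonly used definitions (difference in means divided by the pooled standard deviation for continuous covariates, and the analogous formula based on p(1 − p) for binary covariates); the exact formula used in this study is not stated. Applied to the age and sex figures reported above, both differences fall below the 10% threshold.

```python
import math


def smd_continuous(mean1, std1, mean2, std2):
    """Standardized difference of a continuous covariate, using the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt((std1 ** 2 + std2 ** 2) / 2)


def smd_binary(p1, p2):
    """Standardized difference of a binary covariate, given the two proportions."""
    return (p1 - p2) / math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)


# Summary statistics reported for the derivation vs validation cohorts.
age_smd = smd_continuous(55.0, 13.2, 55.8, 13.0)
male_smd = smd_binary(0.596, 0.575)
print(f"age: {abs(age_smd):.1%}, male: {abs(male_smd):.1%}")  # prints "age: 6.1%, male: 4.3%"
```

Because the denominator is a pooled spread rather than a standard error, the value does not shrink as the sample grows, which is why it is preferred over P values for judging balance in large cohorts.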

| Preoperative risk factors associated with CSA-AKI
In the derivation cohort, 1364 patients were diagnosed with CSA-AKI (37.5%), of whom 405 rapidly progressed to severe AKI. As shown in Figure 1, male patients had a higher incidence of AKI than female patients (42.7% vs 29.7%). When age was divided into four levels, the incidence of CSA-AKI increased significantly from 18.8% in the youngest group to 45.3% in patients over 60 years of age. Obese patients (BMI ≥28 kg/m²) also had a higher incidence of AKI (51.4%).
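Diagnosis and staging follow the 2012 KDIGO criteria cited in the Methods, with severe AKI corresponding to stage 2-3. The sketch below applies only the serum creatinine criteria, in mg/dL (divide µmol/L by 88.4); the urine-output criterion and renal replacement therapy are omitted, and the function and argument names are hypothetical. It also recomputes the incidence figures reported above from the raw counts.

```python
def kdigo_scr_stage(baseline, peak_7d, peak_48h):
    """AKI stage (0 = no AKI) from serum creatinine alone, in mg/dL.

    baseline: preoperative SCr; peak_7d: highest SCr within 7 days after surgery;
    peak_48h: highest SCr within 48 hours after surgery.
    Urine-output criteria and renal replacement therapy are not modeled.
    """
    ratio = peak_7d / baseline
    # AKI: rise >= 0.3 mg/dL within 48 h, or >= 1.5x baseline within 7 days.
    is_aki = ratio >= 1.5 or (peak_48h - baseline) >= 0.3
    if not is_aki:
        return 0
    if ratio >= 3.0 or peak_7d >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    return 1


def is_severe(stage):
    """Severe AKI = KDIGO stage 2-3, as defined in this study."""
    return stage >= 2


# Incidences in the derivation cohort, recomputed from the reported counts.
n_derivation, n_aki, n_severe = 3639, 1364, 405
aki_incidence = n_aki / n_derivation        # about 37.5%
severe_incidence = n_severe / n_derivation  # about 11.1%
severe_share_of_aki = n_severe / n_aki      # about 29.7%, "nearly one third"
```

The 11.1% severe-AKI incidence in the Abstract is therefore relative to the whole derivation cohort, whereas "nearly one third" in the Discussion is relative to AKI cases.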

| Variable selection in gLASSO-logistic regression
Before BN modeling, we applied gLASSO-logistic regression to select variables for both the CSA-AKI and severe AKI models.

| BNs establishment and model inference
The predictive models for CSA-AKI and severe CSA-AKI were constructed separately using BNs. Each predictor was represented by a node, and its relationships with other nodes were represented by directed edges. The CSA-AKI model contained 13 nodes and 18 directed edges linking CSA-AKI and its predictors. Figure S3B).
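The parent/child structure reported in the Abstract can be encoded as an edge list. The sketch below includes only the eight edges incident to the CSA-AKI node that the Abstract names; the remaining edges of the 13-node, 18-edge model are not listed in this section and are omitted. It recovers the parent and child sets and checks acyclicity with Kahn's algorithm, the property any BN structure must satisfy.

```python
from collections import defaultdict, deque

# Only the edges incident to CSA-AKI named in the Abstract; the full model has more.
EDGES = [(p, "CSA-AKI") for p in ("gender", "LVEF", "SCr", "SUA", "platelet", "ACCT")]
EDGES += [("CSA-AKI", "ultrafiltration volume"), ("CSA-AKI", "postoperative CVP")]


def parents_of(node, edges):
    return {a for a, b in edges if b == node}


def children_of(node, edges):
    return {b for a, b in edges if a == node}


def is_acyclic(edges):
    """Kahn's algorithm: a directed graph is a DAG iff all nodes can be removed in topological order."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(nodes)
```

Under the Markov property stated in the Methods, once the six parents are known, CSA-AKI is conditionally independent of its non-descendants; the two child nodes are where the model draws information backwards from postoperative observations.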
FIGURE 1 CSA-AKI and severe AKI incidence across demographic groups among patients receiving cardiac surgery
Perioperative factors of CSA-AKI and severe AKI in patients with cardiac surgery in the derivation cohort (n = 3639)

| Predictive ability of BNs model in the internal, 10-fold cross, and external validation
The precision for predicting CSA-AKI was about 70% in both internal and external validation (Figure 3A), suggesting that the BN model is suitable for predicting CSA-AKI. Notably, in the severe AKI model, the F-measure reached 85.9% in internal validation and 88.7% in external validation, indicating good agreement between the observed outcomes and the BN predictions of severe AKI (Figure 3B). Figure 3C,D shows the AUCs for both CSA-AKI and severe AKI: the internal AUCs for predicting CSA-AKI and severe AKI were 0.755 and 0.845, respectively.
In external validation, the AUCs remained at 0.736 and 0.816, respectively. The Mantel-Haenszel test showed no statistically significant differences in predictive accuracy among the internal, 10-fold cross-validation, and external validation datasets (P = .106 and .229 for the CSA-AKI and severe AKI models, respectively).
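For readers who want to recompute these metrics, a minimal sketch follows, assuming the usual definitions: precision = TP/(TP + FP), F-measure = harmonic mean of precision and recall, and AUC computed as the Mann-Whitney probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties counted as one half). The function names are illustrative, and the pairwise AUC uses O(n_pos × n_neg) memory, which is adequate for cohorts of a few thousand patients.

```python
import numpy as np


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F-measure for binary labels and binary predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc_mann_whitney(y_true, scores):
    """AUC = P(score of a random case > score of a random non-case); ties count 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()  # all case/non-case pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Precision and the F-measure depend on the probability cut-off used to turn BN risks into yes/no predictions, whereas the AUC summarizes ranking over all cut-offs.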

| DISCUSSION
In the present study, the incidence of CSA-AKI was 37.5%, consistent with the literature, in which reported incidences range from 3.1% to 42% depending on the source population and AKI definition. 15,16 Among these AKI cases, nearly one third (405/1364) progressed to severe AKI (stage 2-3) within a short time. CSA-AKI can be described as type 1 cardiorenal syndrome, a disorder of the heart and kidneys in which an abrupt deterioration of cardiac or renal function may trigger pathophysiologic disorders in the other organ. 17 Outside nephrology, physicians often overlook the early detection of AKI because SCr is not monitored dynamically. One cross-sectional survey in China reported that over 70% of hospitalized patients with identifiable AKI were not recognized. 18 The occurrence of CSA-AKI is affected by a range of risk factors, including not only demographic features such as age, gender, and comorbidity, but also perioperative factors such as surgical complexity, ACCT, ultrafiltration volume, and whether CPB is used. The main pathways of CSA-AKI include hypoperfusion, ischemia-reperfusion injury, neurohumoral activation, inflammation, oxidative stress, nephrotoxins, and mechanical factors. 4,19 In this study, we applied BNs to develop predictive models of AKI and severe AKI in patients undergoing cardiac surgery. The directed acyclic graph delineated the complex relationships between risk factors and AKI intuitively. Importantly, these interdependencies are consistent with biological and clinical interpretations. 20 As shown in Figure S3, if medical interventions were taken to avoid excessive ACCT and ultrafiltration volume and to correct the postoperative CVP level in time, the incidence of CSA-AKI could be reduced substantially, from 71.6% to 16.1%.
Preliminary evidence also suggests that avoiding hemodynamic instability and carefully controlling postoperative CVP and mean arterial pressure (MAP) may help reduce the risk of AKI. 28-30 These measures are almost cost-free and can be implemented as secondary prevention strategies in daily clinical practice.
Hospital-acquired AKI could have been reduced by one fifth if physicians had paid more attention to monitoring electrolytes, identifying high-risk patients, and implementing kidney prophylaxis. 31 Applying the BN model to personalized risk prediction can help identify patients at risk of CSA-AKI (even before SCr rises) and improve their postoperative outcomes. Nevertheless, this study has several limitations. First, participants came from a single medical center. Although we tried to recruit as many patients as possible, the sample may not be fully representative, which may limit the extrapolation of the BN models. Second, this study did not collect histories of nephrotoxic drug use because drug data were largely missing; the absence of this factor may affect the model's predictive ability to some extent. Third, novel biomarkers such as interleukin-18 and kidney injury molecule-1 have been reported to predict subclinical kidney injury 32 and are promising tools for improving early diagnosis, but they were not measured in this study. In future studies, we intend to conduct a multicenter, prospective cohort study collecting both clinical and molecular data. The BN structure and parameters will also be relearned on this larger database with a clear causal time sequence.

| CONCLUSIONS
The incidence of AKI remains high among patients undergoing cardiac surgery. We propose a BN model based on demographic and perioperative risk factors. The model not only reveals the complex relationships among predictors but also infers each individual's probability of developing CSA-AKI. It can help physicians identify patients at higher risk of AKI and adopt protective strategies to improve their prognosis.

ACKNOWLEDGMENTS
The study was financially supported by the Major Projects of Scien-