Machine learning for prediction of delirium in patients with extensive burns after surgery

Abstract Aims To identify key variables and predict postoperative delirium in patients with extensive burns using machine learning. Methods Five hundred and eighteen patients with extensive burns who underwent surgery were included and randomly divided into a training set, a validation set and a testing set. Multifactorial logistic regression analysis was used to screen for significant variables. Nine prediction models were constructed in the training and validation sets (80% of the dataset), and the testing set (20% of the dataset) was used to further evaluate the models. Model performance was compared using the area under the receiver operating characteristic curve (AUROC). SHapley Additive exPlanations (SHAP) was used to interpret the best-performing model, which was then externally validated in another large tertiary hospital. Results Seven variables were used to develop the nine prediction models: physical restraint, diabetes, sex, preoperative hemoglobin, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, time in the Burn Intensive Care Unit and total body surface area. Random Forest (RF) outperformed the other eight models in predictive performance (AUROC: 84.00%). On external validation, RF performed well (accuracy: 77.12%, sensitivity: 67.74% and specificity: 80.46%). Conclusion The first machine learning-based delirium prediction model for patients with extensive burns was successfully developed and validated. It can effectively identify patients at high risk of delirium so that targeted interventions can be made to reduce its incidence.

syndrome of diverse etiology characterized by impaired attention, consciousness and cognition. 5 Delirium exacerbates the patient's condition and leads to longer hospital stays, 6 increased mortality, 7 higher hospital costs, 8 cognitive decline and even dementia in patients with extensive burns. 9,10 Despite this, few studies have examined delirium in patients with extensive burns, and pharmacological treatments for delirium lack clear evidence of efficacy. 11,12 Notably, there is evidence that aggressive interventions reduce the risk of delirium during hospitalization in high-risk patients. 13 Machine learning has no universally agreed definition, 14 but is generally considered to be the process of identifying patterns in data. 15 Machine learning models are now widely used in medicine, for example in data mining, medical diagnosis and disease risk prediction, 16 where they have shown good predictive performance. 17 Risk factors for delirium have been used to develop delirium prediction models in different populations, but no such model has been developed for patients with extensive burns, 18,19 and existing models are difficult to transfer across populations. 18 Therefore, by studying the risk factors for delirium in patients with extensive burns, we developed a machine learning-based delirium prediction model, visualized it and made it available for clinical use, aiming to help healthcare professionals identify high-risk patients early, reduce the incidence of delirium in patients with extensive burns and improve patient prognosis.

| Study subjects, design and data collection
This study complies with the Declaration of Helsinki and was approved by the institutional ethics committee (approval number (2022) CDYFYYLK(09-052)). Because this was an observational study and no interventions were administered, the requirement for written consent from patients and their families was waived. Patients with extensive burns who met the inclusion and exclusion criteria and were admitted … The flow chart of the research design is shown in Figure 1.
A total of 47 preoperative variables were collected, including personal information (e.g. BMI and educational attainment), lifestyle (smoking and alcohol use history), pre-existing diseases (e.g. hypertension, diabetes and heart disease), burn status (e.g. TBSA, burn index and inhalation injury), treatment modality (e.g. mechanical ventilation (MV), number of procedures, and number and volume of blood transfusions), treatment environment (e.g. admission to the burn intensive care unit (BICU), family visits and physical restraint), preoperative laboratory parameters (e.g. hemoglobin (Hb), red blood cells (RBC) and white blood cells), Acute Physiology and Chronic Health Evaluation II (APACHE II) score and American Society of Anesthesiologists (ASA) score. All preoperative variables were collected from the electronic medical record systems of the First Affiliated Hospital of Nanchang University and Ganzhou People's Hospital.

| Delirium assessment
A variety of delirium assessment tools are currently in clinical use, 20 of which the Intensive Care Delirium Screening Checklist (ICDSC) 21 is widely used, particularly in intensive care units (ICU). 13 The ICDSC 21 has high sensitivity and specificity, which helps clinicians accurately identify patients with delirium. Some patients with extensive burns require mechanical ventilation, which impairs speech and therefore limits the use of many delirium assessment tools. The ICDSC avoids this limitation because it is based on bedside observation rather than on the patient's verbal responses. The ICDSC also does not require the additional use of the Richmond Agitation-Sedation Scale (RASS), making it well suited to the high-workload environment of the Burn Intensive Care Unit (BICU). In this study, the ICDSC 21 was used to assess patients with extensive burns in both the general burn wards and the BICU.
Patients were evaluated for delirium by four specialist psychiatrists at 12-h intervals for 5 days after surgery. If a patient was assessed as delirious, assessment continued until the patient returned to a non-delirious state. The psychiatrists performing the assessments were blinded to both the study variables and the study objectives.
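As an illustration of how the twice-daily screening could be recorded, the sketch below scores one ICDSC assessment and classifies a patient across the postoperative assessment series. It assumes the standard eight binary ICDSC items and the conventional cut-off of a total score ≥4; neither that cut-off nor the rule that a single positive assessment defines postoperative delirium is stated in this study, and all names are hypothetical.

```python
# The eight ICDSC items; each is scored 0 (absent) or 1 (present) per assessment.
ICDSC_ITEMS = (
    "altered_level_of_consciousness",
    "inattention",
    "disorientation",
    "hallucination_delusion_psychosis",
    "psychomotor_agitation_or_retardation",
    "inappropriate_speech_or_mood",
    "sleep_wake_cycle_disturbance",
    "symptom_fluctuation",
)

DELIRIUM_THRESHOLD = 4  # conventional ICDSC cut-off (assumed; not stated in the study)


def icdsc_score(items: dict) -> int:
    """Sum the eight binary ICDSC items recorded for one assessment."""
    missing = set(ICDSC_ITEMS) - set(items)
    if missing:
        raise ValueError(f"missing ICDSC items: {sorted(missing)}")
    if any(items[name] not in (0, 1) for name in ICDSC_ITEMS):
        raise ValueError("each ICDSC item must be scored 0 or 1")
    return sum(items[name] for name in ICDSC_ITEMS)


def postoperative_delirium(scores, threshold: int = DELIRIUM_THRESHOLD) -> bool:
    """Hypothetical rule: delirious if any 12-hourly assessment reaches the threshold."""
    return any(score >= threshold for score in scores)


if __name__ == "__main__":
    # Hypothetical scores from ten assessments (every 12 h for 5 days).
    example = [1, 2, 4, 5, 3, 2, 1, 1, 0, 0]
    print(postoperative_delirium(example))  # True
```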

| Statistical analysis
All data analyses were performed using Python and IBM SPSS 26.0. An adequate sample size is required to avoid overfitting and to obtain accurate prediction models. We calculated the sample size 22 as $n = \left(\frac{1.96}{\delta}\right)^{2}\,\phi\,(1-\phi)$, where $\phi$ is the expected outcome proportion ($\phi = 0.37$) and $\delta$ is the margin of error ($\delta = 0.05$). According to this formula, the minimum sample size for the training set used to develop the model is 358 individuals.
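The arithmetic behind the reported minimum of 358 can be checked with a short sketch of the formula for estimating the overall outcome proportion φ within a margin of error δ at 95% confidence, n = (1.96/δ)² φ(1 − φ). The function name is hypothetical.

```python
import math


def min_sample_size(phi: float, delta: float, z: float = 1.96) -> float:
    """Minimum n to estimate an outcome proportion phi within +/- delta (z = 1.96 for 95%)."""
    return (z / delta) ** 2 * phi * (1 - phi)


n = min_sample_size(phi=0.37, delta=0.05)
print(f"{n:.2f}")      # 358.19
print(round(n))        # 358, the reported minimum
print(math.ceil(n))    # 359 if rounded up conservatively
```

Here (1.96/0.05)² × 0.37 × 0.63 ≈ 358.19, which rounds to the reported 358; rounding up, as is often done for sample sizes, would give 359.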
The normality of continuous variables was tested using the Shapiro-Wilk test. Normally distributed continuous variables were expressed as mean ± standard deviation (SD) and compared using the independent-samples t-test. Skewed continuous variables were expressed as median and interquartile range (IQR) and compared using the Mann-Whitney U-test. Categorical variables were expressed as frequencies and percentages and compared using chi-square tests or Fisher's exact test. Multifactorial logistic regression was used to identify the variables included in the delirium prediction model. To achieve the best prediction, nine models were built, including eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), … When the missing percentage of a variable was lower than 20%, the random forest approach was used to impute the missing values.
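A minimal sketch of the univariate test-selection logic described above, using SciPy. Two choices are assumptions rather than details from the study: normality is judged per group at α = 0.05, and Fisher's exact test replaces the chi-square test only for 2×2 tables with any expected count below 5. Function names are hypothetical.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05


def compare_continuous(group_a, group_b, alpha: float = ALPHA) -> tuple:
    """Shapiro-Wilk on each group, then a t-test (both normal) or a Mann-Whitney U-test."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    both_normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if both_normal:
        return "t-test", float(stats.ttest_ind(a, b).pvalue)
    return "Mann-Whitney U", float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)


def compare_categorical(table, min_expected: float = 5.0) -> tuple:
    """Chi-square test; Fisher's exact test for 2x2 tables with any expected count < 5."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < min_expected).any():
        _, p_exact = stats.fisher_exact(table)
        return "Fisher's exact", float(p_exact)
    return "chi-square", float(p)
```

For example, `compare_categorical([[2, 8], [10, 10]])` falls back to Fisher's exact test because one expected count is 4.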
To prevent collinearity between variables from degrading the performance of the prediction models, the candidate modeling variables were subjected to multicollinearity and correlation analyses before modeling. Discrimination and calibration were used to assess the predictive performance of the models, and decision curve analysis (DCA) assessed their clinical utility. Discrimination was measured by the area under the receiver operating characteristic curve (AUROC), and calibration plots were used to evaluate the agreement between predicted and observed risks. The cut-off value was the threshold that maximized the Youden index (sensitivity + specificity − 1).
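The study does not state which multicollinearity statistic was used; a common choice is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below computes it with NumPy on hypothetical data in which an Hb-like and an RBC-like column are strongly related; the often-quoted threshold of VIF > 10 is a convention, not a value taken from this study.

```python
import numpy as np


def variance_inflation_factors(X) -> np.ndarray:
    """VIF for each column of X (n_samples x n_predictors), via auxiliary OLS regressions."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on the remaining predictors plus an intercept.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        centred = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Hypothetical data: Hb-like, strongly related RBC-like, and unrelated TBSA-like columns.
rng = np.random.default_rng(0)
hb = rng.normal(110, 20, 500)
rbc = hb / 30 + rng.normal(0, 0.05, 500)
tbsa = rng.uniform(30, 90, 500)
vifs = variance_inflation_factors(np.column_stack([hb, rbc, tbsa]))
print(np.round(vifs, 1))  # first two values are large, the third is close to 1
```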
The best model was selected by comparing performance across the models, and SHapley Additive exPlanations (SHAP) was then used to interpret it. Finally, the best model was visualized and externally validated. p < 0.05 was considered statistically significant.

| Participant characteristics
A total of 518 patients participated in the study, of whom 146 (28.19%) were female and 372 (71.81%) were male. A total of 191 patients developed delirium, an incidence of approximately 36.87%. C-reactive protein (CRP) and procalcitonin (PCT) had >20% missing data, so these two variables were excluded. Missing values for preoperative glucose and preoperative albumin accounted for 3.09% and 1.35% of the data, respectively, and were imputed using the random forest approach. The dataset was randomly divided into a training set, a validation set and a testing set. Data from the training and validation sets (n = 414, 80% of the dataset) were used to build the model, and data from the testing set (n = 104, 20% of the dataset) were used to further validate it. Patients were divided into a delirium group (n = 191) and a non-delirium group (n = 327) based on the presence or absence of delirium. There were no statistically significant differences in patient characteristics or preoperative variables between the training and testing sets (Table 1). Multicollinearity analysis showed that pre-op Hb and pre-op RBC were collinear (Figure S1A); after excluding pre-op RBC, the remaining seven variables showed no collinearity (Figure S1B) and were independent of each other (Figure S2). We ultimately built the models using the following seven variables: physical restraint, diabetes, sex, pre-op Hb, APACHE II score, time in BICU and TBSA.
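One way to implement random forest imputation and the 80/20 split is sketched below with scikit-learn's IterativeImputer wrapped around a RandomForestRegressor (a MissForest-style approach). The exact imputation procedure used in the study, the stratified split and all data are assumptions; the sketch fits the imputer on the training portion only, so that no information from the testing set leaks into model development.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split


def make_rf_imputer(seed: int = 0) -> IterativeImputer:
    """Impute each incomplete column from the other columns with a random forest regressor."""
    return IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=seed),
        max_iter=10,
        random_state=seed,
    )


# Hypothetical data: 518 patients, 3 preoperative variables, about 37% delirium.
rng = np.random.default_rng(0)
X = rng.normal(size=(518, 3))
X[:, 1] += 0.8 * X[:, 0]               # make column 1 predictable from column 0
X[rng.random(518) < 0.03, 1] = np.nan  # about 3% missing, similar to preoperative glucose
y = (rng.random(518) < 0.37).astype(int)

# 80/20 split; stratifying on the outcome is an assumption, not stated in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

imputer = make_rf_imputer()
X_train_imp = imputer.fit_transform(X_train)  # fit on training data only
X_test_imp = imputer.transform(X_test)        # reuse the fitted imputer on the testing set
print(X_train.shape[0], X_test.shape[0])      # 414 104
```

With 518 patients, `test_size=0.2` yields 104 testing and 414 training patients, matching the split sizes reported above.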

| Establishment of prediction models
Nine prediction models were constructed for delirium prediction in patients with extensive burns using the seven variables above. AUROC is a key metric for evaluating delirium prediction models. The ROC curves of the nine models are shown in Figure 2A,B; the RF model had the highest AUROC. We also performed 10-fold cross-validation to assess the stability of the models, and the RF model again performed best (mean AUC = 0.955 ± 0.004). The performance of the nine models in the training and validation sets is compared in Table 2 and Table S1, respectively. In addition to the best AUROC, the RF model showed satisfactory sensitivity (0.880), accuracy (0.893) and specificity (0.904).
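The 10-fold cross-validation used to assess stability can be sketched with scikit-learn as follows. The synthetic data, the stratified folds and the random forest hyperparameters are hypothetical; the output is a mean ± SD AUC in the same form as the reported 0.955 ± 0.004, not a reproduction of it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in for the 414 training/validation patients and 7 predictors.
X, y = make_classification(
    n_samples=414, n_features=7, n_informative=5, weights=[0.63], random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)  # hyperparameters assumed
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print(f"mean AUC = {auc_scores.mean():.3f} ± {auc_scores.std():.3f}")
```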
The RF model also performed best in the DCA (Figure 2C,D), and its calibration curve was closest to the 45° diagonal, indicating the best agreement between predicted and observed probabilities (Figure 2E).
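Decision curves plot net benefit against the threshold probability pt, where net benefit = TP/n − (FP/n) × pt/(1 − pt). The sketch below computes this quantity for a model and for the treat-all reference strategy (treat-none has a net benefit of 0). It is a generic implementation of the standard definition, not the authors' code, and the example data are hypothetical.

```python
import numpy as np


def net_benefit(y_true, y_prob, thresholds) -> np.ndarray:
    """Net benefit of intervening in patients whose predicted risk is >= each threshold pt."""
    y = np.asarray(y_true).astype(bool)
    prob = np.asarray(y_prob, dtype=float)
    n = y.size
    result = []
    for pt in thresholds:
        treat = prob >= pt
        tp = np.sum(treat & y)
        fp = np.sum(treat & ~y)
        # True positives are credited; false positives are penalised by the odds pt / (1 - pt).
        result.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(result)


def net_benefit_treat_all(y_true, thresholds) -> np.ndarray:
    """Reference strategy: intervene in every patient."""
    prevalence = float(np.mean(y_true))
    pt = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical example over a grid of threshold probabilities.
thresholds = np.arange(0.05, 0.95, 0.05)
y_demo = [1, 0, 1, 0, 0, 1, 0, 0]
p_demo = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.3, 0.5]
model_curve = net_benefit(y_demo, p_demo, thresholds)
treat_all_curve = net_benefit_treat_all(y_demo, thresholds)
```

A model is clinically useful over the range of pt where its curve lies above both the treat-all curve and zero.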
We therefore selected the RF model as the best delirium prediction model. Based on the Youden index, the optimal cut-off for the RF model was 42.19%. In the testing set, the AUC was 82.2%, sensitivity 89.7%, specificity 56.9%, accuracy 72.1%, PPV 61.9%, NPV 79.0% and F1 score 0.733. SHAP values show the impact of each variable on the model's predictions (Figure 3A) and the importance of each variable (Figure 3B).
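The cut-off selection and the testing-set metrics reported above can be sketched with scikit-learn's `roc_curve`. The function names are hypothetical, and the sketch counts a predicted probability at or above the cut-off as a positive prediction, which is how `roc_curve` defines its thresholds.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def youden_cutoff(y_true, y_prob) -> float:
    """Threshold maximising Youden's J = sensitivity + specificity - 1 = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return float(thresholds[np.argmax(tpr - fpr)])


def _ratio(num: float, den: float) -> float:
    return num / den if den else float("nan")


def threshold_metrics(y_true, y_prob, cutoff: float) -> dict:
    """Confusion-matrix metrics when predicting delirium for probabilities >= cutoff."""
    y_arr = np.asarray(y_true)
    y = y_arr.astype(bool)
    pred = np.asarray(y_prob, dtype=float) >= cutoff
    tp, tn = int(np.sum(pred & y)), int(np.sum(~pred & ~y))
    fp, fn = int(np.sum(pred & ~y)), int(np.sum(~pred & y))
    sens, ppv = _ratio(tp, tp + fn), _ratio(tp, tp + fp)
    return {
        "auc": float(roc_auc_score(y_arr, y_prob)),
        "sensitivity": sens,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": (tp + tn) / y.size,
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * sens, ppv + sens),
    }
```

In practice the cut-off would be chosen on the development data and then applied unchanged to the testing set.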

| Validate model performance
The RF prediction model was visualized as a web-based calculator (available at: https://www.xsmartanalysis.com/model/predict/?mid=2127&symbol=1iO16fE64184FN0220eQ). We collected data from 118 patients for external validation of the prediction model. …ing to the medical staff that this patient has a high likelihood of delirium and should be given appropriate interventions (Figure 4A).
Preoperative information for another patient was entered into the model: female sex, TBSA of 60%, no diabetes, APACHE II score of 6, no physical restraint, pre-op Hb of 98 g/L and 5 days in the BICU. The predicted probability of delirium for this patient was 8.71%, indicating a low risk of delirium (Figure 4B).
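To make the calculator workflow concrete, the sketch below shows one hypothetical way to package the seven inputs and apply the reported 42.19% cut-off. The field names, binary encodings and feature order are assumptions; a deployed calculator must use exactly the encoding and order of its training data, and the predicted probability itself would come from the trained RF model, which is not reproduced here.

```python
from dataclasses import dataclass

RF_CUTOFF = 0.4219  # Youden-index cut-off reported for the RF model


@dataclass
class PatientInputs:
    """The seven model inputs; field names and encodings are hypothetical."""
    physical_restraint: bool
    diabetes: bool
    female: bool
    preop_hb_g_per_l: float
    apache_ii: int
    days_in_bicu: float
    tbsa_percent: float

    def to_features(self) -> list:
        # Hypothetical column order; it must match the order used to train the model.
        return [
            float(self.physical_restraint),
            float(self.diabetes),
            float(self.female),
            float(self.preop_hb_g_per_l),
            float(self.apache_ii),
            float(self.days_in_bicu),
            float(self.tbsa_percent),
        ]


def risk_category(probability: float, cutoff: float = RF_CUTOFF) -> str:
    """Flag patients whose predicted probability exceeds the cut-off as high risk."""
    return "high risk" if probability > cutoff else "low risk"


# The second example patient described above, with the reported predicted probability.
patient = PatientInputs(
    physical_restraint=False, diabetes=False, female=True,
    preop_hb_g_per_l=98, apache_ii=6, days_in_bicu=5, tbsa_percent=60,
)
print(risk_category(0.0871))  # low risk
```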

| DISCUSSION
To our knowledge, no model has previously been developed to predict delirium in patients with extensive burns, and this study is novel in using machine learning algorithms to predict the risk of delirium in this population. The RF prediction model we developed accurately predicts the probability of postoperative delirium, showing satisfactory discrimination and generalization and outperforming the other eight prediction models.
Delirium is diagnosed mainly from clinical manifestations with the aid of delirium assessment tools, so hypoactive delirium may be underdiagnosed. In recent years, researchers have therefore sought objective indicators to aid the diagnosis of delirium, such as biomarkers 23 and electroencephalography (EEG). 5 Multiple biomarkers, such as CRP, 24 cerebrospinal fluid (CSF) markers 25 and plasma tau, 26 have been used in experimental studies, but the complexity of biomarker sampling procedures makes them difficult to apply clinically. We therefore asked whether analyzing routine clinical data could reveal common risk factors that would allow early prediction of, and early intervention for, delirium in burn patients.
Delirium is a complex disorder caused by a combination of factors and a common complication in patients with extensive burns; the incidence of delirium in this study was 36.87%. Targeted interventions for high-risk patients can effectively reduce the incidence of delirium. 13 For example, preoperative high-dose glucocorticoids reduced the incidence of delirium in adult patients 4 days after hepatectomy, 27 and perioperative dexmedetomidine effectively prevented delirium in elderly patients undergoing open esophagectomy. 28 However, there is no consensus and no guideline that clearly defines patients at 'high risk' of delirium. 13 Delirium prediction models are therefore important for helping clinicians objectively identify high-risk patients.

TABLE 2 Performance of models in the training set (columns include Specificity (SD), PPV (SD), NPV (SD) and F score).

Various methods are available for developing predictive models, and advanced statistical methods such as machine learning can improve model performance. 29 Because the formulas underlying prediction models are complex, many studies now translate them into nomograms for clinical use. 30 The inflammatory hypothesis is currently considered one of the pathogenic mechanisms of delirium, 32 and many studies point to indicators such as CRP 24 and PCT 33 as predictors of delirium in surgical patients. However, we excluded inflammatory mediators when screening the variables used to construct the predictive model.
First, CRP and PCT had too much missing data to be imputed reliably with random forest regression.
Second, whereas infection does not occur in every surgical patient, it can affect any patient with extensive burns because of the compromised skin barrier, immune dysfunction and invasion by pathogenic bacteria. 34 Inflammatory markers are therefore generally elevated in patients with extensive burns; as a result, these variables did not differ significantly between groups in the statistical analyses and were not used in the final model.
In our study, we developed a delirium prediction model using seven variables: physical restraint, diabetes, sex, pre-op Hb, APACHE II score, time in BICU and TBSA. The RF model, the best-performing of the nine, was translated into a web-based calculator for medical staff; a predicted probability >42.19% indicates that the patient is at higher risk of delirium and should receive intervention. The RF model achieved the highest AUC of the nine models in both the training (95.5%) and validation (84%) sets, with better sensitivity and specificity. In addition, it generalized well, with satisfactory accuracy on external validation at another large medical center. This supports the feasibility of extending the model to clinical use.
We also found that sex was associated with the incidence of delirium, consistent with previous findings. 35,36 Estrogen may protect cognitive performance. 37 In addition, men are more exposed than women to risk factors that impair cognitive function, such as obstructive sleep apnea 38 and alcohol use. 39 Sex differences in corticotropin-releasing factor (CRF) signaling during the acute stress response may also predispose men and women to different mental disorders. 40
This study shows that physical restraint is an independent risk factor for delirium in burn patients, and the SHAP analysis indicates that physical restraint has the strongest association with delirium in patients with extensive burns, consistent with previous studies. 41,42 Physical restraints should therefore be used with caution and for shorter periods. 43 Several studies of risk factors for delirium have shown that diabetes and APACHE II score play an important role in predicting its onset, 44,45 which supports the findings of this study. This study also identified other factors that increased the risk of delirium in patients with extensive burns, including time in BICU and TBSA. The skin is the largest organ of the body, burns cause varying degrees of skin damage, 1 and TBSA is an important measure of burn severity. The larger the TBSA, the more complex the patient's condition, which often involves multiple surgeries and a high risk of infection, 3,46 with a consequent increase in length of stay in the BICU. 47 In addition, a larger TBSA makes sedation and analgesia more complex. 48,49 These conditions increase the risk of delirium.
Hb is the protein that transports oxygen within red blood cells, and RBC counts and Hb levels generally change in parallel. In this study, the results of the multifactorial logistic regression analysis … in an easy-to-understand manner so as to improve the delirium prediction model. We are therefore embarking on a larger, multicenter study to refine this prediction model and further improve its generalizability.

| CONCLUSION
In this study, we used machine learning algorithms incorporating seven preoperative variables to build nine predictive models of delirium in patients with extensive burns, compared their performance using AUROC, DCA and calibration plots, and selected and visualized the best-performing model. To our knowledge, it is the first model to achieve individualized prediction of delirium in patients with extensive burns, and it is simple to use, with good performance and generalizability. We recommend that the model be used routinely to predict the risk of delirium in patients with extensive burns and that aggressive interventions be provided for high-risk patients to help reduce the incidence of delirium. This could reduce the physical, psychological and financial burden on patients, lighten the workload of doctors and nurses, and allow a more rational allocation of healthcare resources.

ACKNOWLEDGMENTS
The authors would like to thank Huang Zhen for his help in technical aspects.

This study was supported by the Science and Technology Research Project of the Education Department of Jiangxi Province [grant number GJJ2200141].

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.