Using machine learning to predict adverse events in acute coronary syndrome: A retrospective study

Abstract Background Up to 30% of patients with acute coronary syndrome (ACS) die from adverse events, mainly renal failure and myocardial infarction (MI). Accurate prediction of adverse events is therefore essential to improve patient prognosis. Hypothesis Machine learning (ML) methods can accurately identify risk factors and predict adverse events. Methods A total of 5240 patients diagnosed with ACS who underwent PCI were enrolled and followed for 1 year. Support vector machine, extreme gradient boosting, adaptive boosting, K‐nearest neighbors, random forest, decision tree, categorical boosting, and linear discriminant analysis (LDA) were developed with 10‐fold cross‐validation to predict acute kidney injury (AKI), MI during hospitalization, and all‐cause mortality within 1 year. Features with mean Shapley Additive exPlanations score >0.1 were screened by XGBoost method as input for model construction. Accuracy, F1 score, area under curve (AUC), and precision/recall curve were used to evaluate the performance of the models. Results Overall, 2.6% of patients died within 1 year, 4.2% had AKI, and 4.7% had MI during hospitalization. The LDA model was superior to the other seven ML models, with an AUC of 0.83, F1 score of 0.90, accuracy of 0.85, recall of 0.85, specificity of 0.68, and precision of 0.99 in predicting all‐cause mortality. For AKI and MI, the LDA model also showed good discriminating capacity with an AUC of 0.74. Conclusion The LDA model, using easily accessible variables from in‐hospital patients, showed the potential to effectively predict the risk of adverse events and mortality within 1 year in ACS patients after PCI.

Acute coronary syndrome (ACS) is a common type of cardiovascular disease and represents one of the leading causes of death worldwide. 1 Over the past decades, with the advancement of percutaneous coronary intervention (PCI) and other therapeutic strategies, patient outcomes have improved. However, up to 30% of patients still suffer from adverse events, which mainly include ischemia, bleeding, kidney failure, and death. 2 Thus, accurate prediction of adverse events, identification of risk factors, and strengthened management of high-risk patients are essential to improve prognosis. However, the prognoses of ACS patients are influenced by diverse pathological transformations and individual variations, 3 rendering the accuracy of existing risk scores insufficient for personalized patient management in the era of precision medicine. Furthermore, the underestimation or overestimation of risk in patients with dissimilar baseline characteristics must not be overlooked. Moreover, most risk prediction algorithms were initially developed in specific ethnic groups and may not be suitable for other populations. In traditional statistically derived risk prediction models, correlation among variables, heterogeneity, nonlinearity, and overfitting also restrict application, especially in multifaceted data sets with large numbers of features. 4
Machine learning (ML) methods can overcome the shortcomings of current risk prediction models. As an important branch of artificial intelligence, ML uses computer algorithms to identify characteristics in large data sets with numerous, multidimensional, and nonlinear relationships among clinical features to predict various outcomes. 5 Thus, ML has become a promising adjunct to prevention, diagnosis, treatment, and clinical decision support. A representative ML-based prediction model is Predicting with Artificial Intelligence Risk after Acute Coronary Syndromes (PRAISE). 6 This model showed high accuracy in detecting the risk of all-cause mortality, recurrent acute MI, and major bleeding in ACS patients within 1 year after discharge. However, the study mainly included European samples with little to no inclusion of Asian and African individuals. In addition, some procedure-related variables and angiographic features that may influence patient outcomes, such as the number of diseased vessels, were not included in the model.
The aim of this study was to develop an ML-based risk stratification model integrating demographics, concomitant drugs and diseases to predict perioperative MI, acute kidney injury (AKI) during hospitalization, and all-cause mortality within 1 year in patients with ACS undergoing PCI.

| Data sets
As shown in Figure 1 [...] evidence should include new pathological Q waves, imaging evidence, or procedure-related complications. 7 AKI was defined as an absolute increase in serum creatinine of ≥0.3 mg/dL within 48 hours or a relative increase to ≥1.5 times baseline within the prior 7 days, according to the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines. 8
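The creatinine-based AKI definition above can be expressed as a simple rule. The following is a minimal sketch, not the study's code: the function and parameter names are hypothetical, the thresholds are treated as inclusive (an assumption consistent with the KDIGO wording), and the caller is assumed to supply the peak creatinine values observed within each time window.

```python
from typing import Optional


def meets_aki_criteria(
    baseline_scr: float,
    peak_scr_48h: Optional[float] = None,
    peak_scr_7d: Optional[float] = None,
) -> bool:
    """Return True if either creatinine criterion is met.

    All values are serum creatinine in mg/dL. Names are hypothetical;
    the caller is assumed to have restricted each peak to its window.
    """
    # Absolute criterion: rise of at least 0.3 mg/dL within 48 hours.
    absolute_rise = (
        peak_scr_48h is not None and peak_scr_48h - baseline_scr >= 0.3
    )
    # Relative criterion: at least 1.5 times baseline within 7 days.
    relative_rise = (
        peak_scr_7d is not None and peak_scr_7d >= 1.5 * baseline_scr
    )
    return absolute_rise or relative_rise
```

For example, a patient with a baseline of 1.0 mg/dL and a 48-hour peak of 1.4 mg/dL would be flagged by the absolute criterion, whereas a 48-hour peak of 1.2 mg/dL alone would not.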

| Data processing and feature selection
All data included in the study were extracted from the electronic medical record (EMR) system using SQL. For data processing, missing values, which accounted for less than 5% of the data, were replaced with the mean value for continuous variables. The associated myocardial enzymes were discarded because more than 50% of their values were missing. All continuous variables were z-standardized, and categorical variables were one-hot encoded. To address class imbalance, the SMOTEENN algorithm (synthetic minority oversampling followed by edited nearest neighbors cleaning) was applied to the original data.
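The three preprocessing steps described above (mean imputation, z-standardization, and one-hot encoding) can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the helper names and toy values are hypothetical, the population standard deviation is assumed for standardization because the text does not specify which form was used, and the SMOTEENN resampling step is not reproduced here.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def z_standardize(values):
    """Rescale to zero mean and unit (population) standard deviation.

    Assumes the column is not constant; a zero standard deviation
    would raise ZeroDivisionError.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode a categorical column as one 0/1 indicator list per category."""
    categories = sorted(set(labels))
    return {c: [1 if x == c else 0 for x in labels] for c in categories}


# Hypothetical toy columns, not study data.
hemoglobin = [130.0, None, 150.0, 140.0]
killip = ["I", "II", "I", "III"]

hb_filled = impute_mean(hemoglobin)   # the missing value becomes 140.0
hb_z = z_standardize(hb_filled)       # mean 0, standard deviation 1
killip_encoded = one_hot(killip)      # {"I": [1, 0, 1, 0], ...}
```

In a real pipeline the imputation mean and the standardization parameters would normally be estimated on the training data and then reused on the validation data, so that no information leaks from the held-out set.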
Forty-four potential features that had an impact on ACS were selected based on an extensive literature review: sex, age, current smokers, BMI, diabetes, hypertension, hyperlipidemia, previous PCI, chronic kidney disease, AKI, diagnosis, number of diseased vessels (NDV), Killip class, β-blockers, angiotensin-converting enzyme inhibitors, angiotensin receptor blockers (ARB), calcium channel blockers, morphine, diuretics, P2Y12 inhibitors, hemoglobin (Hb), white blood cell (WBC), red blood cell (RBC), platelet count (PLT), mean platelet volume (MPV), platelet distribution width (PDW), platelet large cell ratio (P-LCR), platelet to lymphocyte ratio (PLR), neutrophil to lymphocyte ratio (NLR), neutrophil (Neu), lymphocyte (Lym), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), triglyceride (TG), total cholesterol (TC), troponin, brain natriuretic peptide (BNP), glucose, heart rate, systolic blood pressure, diastolic blood pressure, mean blood pressure, uric acid, ST-segment elevation MI, and unstable angina/non-ST-segment elevation MI. The features were ranked by their mean Shapley Additive exPlanations (SHAP) score obtained with the XGBoost method, and features with an importance score >0.1 were selected as inputs for model building. In addition, the SHAP approach was used to explain the contribution of each feature to the outcome of each patient. 9
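The threshold-based screening described above can be illustrated as follows. This is a hedged sketch rather than the authors' pipeline: it assumes that the "mean SHAP score" is the mean absolute SHAP value per feature (the usual global importance measure, but not confirmed by the text), it takes precomputed SHAP values as input instead of calling any SHAP library, and the function name and toy numbers are hypothetical.

```python
def select_features(shap_values, feature_names, threshold=0.1):
    """Keep features whose mean absolute SHAP value exceeds the threshold.

    shap_values: list of rows (one per patient), each a list of
    per-feature SHAP values in the same order as feature_names.
    Returns (name, mean_abs_shap) pairs, most important first.
    """
    n = len(shap_values)
    selected = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n
        if mean_abs > threshold:
            selected.append((name, mean_abs))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three patients and three features.
names = ["BNP", "age", "MPV"]
shap_matrix = [
    [0.40, -0.20, 0.05],
    [-0.30, 0.10, -0.02],
    [0.50, -0.30, 0.03],
]
ranked = select_features(shap_matrix, names)
```

In this toy example BNP and age pass the 0.1 threshold, while MPV (mean absolute value of about 0.03) is dropped.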

| Model development and validation
After the selection of important variables, the derivation cohort was randomly split into two data sets: a training cohort (70%) and an internal validation cohort (30%). Eight common ML classifiers, namely support vector machine (SVM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), K-nearest neighbors (KNN), random forest (RF), decision tree, categorical boosting (CatBoost), and linear discriminant analysis (LDA), were used to predict adverse events. Ten-fold cross-validation was used to train all models. Receiver operating characteristic (ROC) curves along with the area under the curve (AUC), accuracy, recall, specificity, precision, and F1 score were used to evaluate the performance of the models. Each metric was calculated from the confusion matrix (true positive, true negative, false positive, false negative). In addition, the precision/recall (PR) curve was used because it is better suited to evaluating binary decision problems on imbalanced data sets. 10
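The confusion-matrix metrics listed above can be computed as in the following sketch. The function name and the example counts are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, specificity, precision, and F1 score
    from confusion-matrix counts. Undefined ratios are reported as 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: a classifier that labels every patient "no event"
# in a cohort of 1000 patients with 26 events (an event rate of 2.6%).
trivial = classification_metrics(tp=0, tn=974, fp=0, fn=26)

# Hypothetical counts for a classifier that does detect events.
example = classification_metrics(tp=8, tn=80, fp=10, fn=2)
```

The first example shows why accuracy alone is uninformative for rare outcomes: the trivial classifier reaches an accuracy of 0.974 while its recall and F1 score are both 0, which is the motivation for also reporting recall, precision, and PR curves.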

| Statistical analysis
Continuous variables are displayed as the mean and standard deviation, and categorical variables are displayed as numbers or percentages. The t-test or Mann−Whitney U test was used for continuous variables as appropriate, and the χ² test or Fisher's exact test was used for categorical variables. A two-tailed p < .05 was considered statistically significant. SPSS 25.0 and Python 3.6 were used for the analyses.

| Baseline characteristics
As mentioned before, 5240 patients were included in the study. Table 1 shows the characteristics of all cohorts. Overall, the average age was 63.1 years, and 72.8% of patients were male. A total of 61.4% and 36% of patients had hypertension and diabetes, respectively. A total of 2.6% of patients died within 1 year, 4.2% had AKI, and 4.7% had MI during hospitalization (the characteristics of the AKI and MI cohorts are shown in Supporting Information: Table S1). Age was higher in the death, AKI, and MI cohorts than in the corresponding nonevent cohorts. Except for the use of ARBs, diabetes, smoking, lipid-related indicators, P-LCR, PDW, MPV, platelets, and TC, the clinical variables differed significantly between the nondeath and death groups.

| The selected features and ML models
The variable importance scores for each outcome according to the permutation importance method in the XGBoost model are shown in Supporting Information: Figure S1. To interpret the ML models, SHAP values were used to visualize and explain how these features affect events. Figure 2 summarizes the SHAP value plot by combining feature importance with feature effects. Overall, approximately 30 variables were selected for the outcomes based on a mean SHAP value >0.1. Heart function (BNP, troponin, Killip class, NDV), age, uric acid, and blood cell counts (WBC, RBC, Hb, Neu, Lym) were ranked as relatively important features for the prediction of each outcome. In addition, lipid metabolism (TG, TC, HDL-C, LDL-C), glucose, blood pressure, and platelet-associated parameters (PLT, MPV, PDW, P-LCR) were also important predictors for all three outcomes. Morphine was selected as a potential predictor only for AKI, which may reflect the specific underlying pathogenesis of that outcome.
TABLE 1 Baseline characteristics of all patients according to all-cause mortality.

| Prediction of the ML models
For all-cause mortality, the discriminative performance of the eight ML models is displayed by the ROC curves in Figure 3. Supporting Information: Table S2 presents the confusion matrix and evaluation metrics for all ML models. The LDA model exhibited the best discrimination with an AUC of 0.83, followed by RF (AUC 0.81), XGBoost (AUC 0.79), CatBoost (AUC 0.79), SVM (AUC 0.74), and AdaBoost (AUC 0.72). The decision tree and KNN models performed worst, with AUCs of 0.61 and 0.68, respectively. Specificity ranged from 0.2 to 0.68, and recall ranged from 0.81 to 0.96. In general, the LDA model had the best performance among the models when the AUC, accuracy, specificity, recall, precision, and F1 score were evaluated together. The PR curves of all models are also shown in Figure 3. Figure 4 shows the performance of the LDA model alone, with an AUC of 0.83 for all-cause mortality, 0.74 for AKI, and 0.74 for MI (more details in Supporting Information: Table S3).
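To make the AUC values above easier to interpret, the sketch below computes the AUC directly from its probabilistic definition: the chance that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. This is an illustration only, with a hypothetical function name and toy data; it uses a simple pairwise comparison rather than the routines the authors may have used.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case.
    Tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for six patients (1 = event, 0 = no event).
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]
auc = roc_auc(y_true, y_score)  # 8 of 9 positive/negative pairs are ordered correctly
```

Under this reading, an AUC of 0.83 means that in about 83% of such pairs the model ranks the patient with the event above the patient without it, whereas 0.5 corresponds to random ranking.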
Overall, the LDA model was better than the other models for predicting all-cause mortality, AKI, and MI in ACS patients.

| DISCUSSION
In this study, we demonstrated that ML models showed good discrimination for the prediction of all-cause mortality, AKI, and MI in patients with ACS who underwent PCI. Additionally, we identified important predictors of adverse events from the EMR. These findings add to the known risk factors for adverse events and have potential future applications in clinical practice. Early intervention guided by abnormal indicators is necessary to improve the prognosis of patients.
Identifying patients at high risk of developing AKI, MI, and poor outcomes remains a challenge in cardiovascular medicine. 11,12 Although traditional risk factors are helpful for identifying high-risk populations, they are of limited use for individual risk assessment. Even when using global summary scores, over- or undertreatment is inevitable. Thus, accurate prediction of adverse events still represents an unmet need. ML algorithms have achieved good performance when assessing high-dimensional and nonlinear relations among features. 6 In contrast to the prior study, we included 44 variables and employed eight ML models. We demonstrated the predictive power of the models, and the LDA model was superior to the other models, with an average AUC of 0.83. A previous study verified the usefulness of ML techniques in predicting the diagnosis and prognosis of ACS. 13,14 However, only one ML algorithm was used to predict in-hospital mortality, 30-day CHF rehospitalization, and 180-day cardiovascular death.
Note: Continuous variables are displayed as mean ± standard deviation, and categorical variables as numbers (percentages).
Age, BMI, heart function, lipid metabolism, uric acid, glucose, platelet-associated parameters, and concomitant drugs were considered important variables to predict adverse events. Consistent with a previous study, older patients usually suffered from concomitant comorbidities, such as hypertension, diabetes, dyslipidemia, and kidney disease. 15 Obesity, manifested as a higher BMI, is considered a risk factor for mortality in the general population. However, an obesity paradox, in which a higher BMI is associated with a better prognosis, has been reported in ACS patients. 16 In our study, BMI was slightly lower in the all-cause death cohort, and the so-called "obesity paradox" is still not well understood.
Many studies have confirmed that uric acid is an independent risk factor for long-term all-cause death in ACS patients. 17 Moreover, impairment of glucose and lipid metabolism triggers the development of arterial plaques and induces the occurrence of adverse events. 18 Platelets play a vital role in ACS, and their reactivity influences prognosis. Studies have reported that a higher baseline MPV is associated with more cardiovascular events. 19 A meta-analysis of eight studies also indicated that higher platelet levels at baseline increased the risk of mortality in ACS patients. 20 BNP, troponin, Killip class, and multivessel lesions are considered markers of cardiac function. Additionally, these markers have been developed as diagnostic and prognostic tools for ACS. 21 Perioperative MI and contrast-induced AKI (CI-AKI) can lead to markedly high mortality after PCI, and we therefore included perioperative MI and AKI as predictors of all-cause mortality. However, our model did not identify them as important variables, and the AUC did not improve substantially. In other words, their adverse effects may be masked by other factors. Additionally, the inflammatory response plays an important role in the progression of atherosclerosis, and PLR and NLR are classical markers of inflammation in cardiovascular disease. A previous study demonstrated a higher occurrence of major adverse cardiovascular and cerebrovascular events (MACCEs) with increased PLR and NLR. 22 Therefore, these inflammatory indices were also used as variables to predict prognosis in our study. [...] 24,25 However, these models focused on comparing the performance of ML models with that of traditional logistic regression and GRACE scores. Additionally, they used several variables and one ML algorithm, which may affect the interpretation of the model.

AKI is a common complication among patients undergoing interventional procedures, with a reported incidence of up to 30%. 26 AKI is directly associated with both a fivefold increase in in-hospital mortality and an increased risk of end-stage renal failure in the long term. 27 An updated simple risk score [...]. 28 Hence, this score, which is based on traditional methods, could be widely utilized in the clinic. However, the logistic regression model is theory-driven and rests on several assumptions, such as the absence of multicollinearity, a linear relationship between the independent variables and the logit of the dependent variable, and no outliers. An ML model that integrated preoperative variables and intraoperative time-series physiological data to predict AKI after cardiac surgery outperformed the traditional logistic regression model (AUC 0.843 vs. 0.806). 29 Our results revealed additional novel predictors, such as the use of morphine during hospitalization. Morphine is an opioid and has been recommended for the management of acute chest pain in ACS patients. 30 Evidence has suggested that opioid overdose can lead to AKI due to dehydration, hypotension, rhabdomyolysis, and urinary retention. 31 In our study, the AUCs of the ML models were more than 0.70, indicating a good discrimination capacity. Along with the popularization of PCI, the amount of clinical data available for patients has increased. Hence, the application of ML methods is useful for big data analysis. From a clinical point of view, early assessment of the effect of baseline candidate factors on outcomes could allow modifiable factors to be targeted to further improve the prognosis of ACS patients.
There are still some limitations that should be noted in our study. First, this is a retrospective, single-center study, making it vulnerable to bias. However, these data, which were derived from the EMR system, reflect real-world clinical practice and have high generalizability. Second, we focused on 1-year mortality and AKI. Future models could be extended to support the prevention, diagnosis, and therapy of ACS. Third, although as many relevant characteristics as possible were collected, no intraoperative or postoperative variables were gathered in the study; including them might improve the performance of the current ML model.

| CONCLUSION
In conclusion, our study demonstrated that an LDA model based on easily available variables of patients in-hospital has the potential to predict the risk of all-cause mortality within 1 year and perioperative AKI for ACS patients following PCI.
Massive efforts have been made to construct predictive risk score models that could serve as tools to guide clinical practice and decision-making. Currently, several influential predictive risk scores have been used in the clinic, mainly Thrombolysis in Myocardial Infarction (TIMI), Global Registry of Acute Coronary Events (GRACE), Patterns of Non-Adherence to Antiplatelet Regimen in Stented Patients (PARIS), Dual Antiplatelet Therapy (DAPT), and Predicting Bleeding Complications in Patients Undergoing Stent Implantation and Subsequent DAPT (PRECISE-DAPT). These scores have been developed to predict ischemia, bleeding, and death for patients with ACS or undergoing DAPT after coronary artery stent implantation.
As shown in Figure 1, a total of 7409 patients undergoing coronary angiography and successful PCI who were hospitalized at the Department of Cardiology of the Third Xiangya Hospital from June 2007 to June 2021 were followed for 1 year. The exclusion criteria were (a) patients who underwent multiple surgeries (n = 2111); (b) patients with a hospital stay of less than 1 day (n = 27); and (c) patients who presented with dialysis (n = 31). In all, 5240 patients were included in the data sets. The study was conducted in accordance with the ethical guidelines of the Declaration of Helsinki and approved by the Medical Ethics Committee of the Third Xiangya Hospital (Ethics approval number: R18030). Considering the minimal risk, the requirement for informed consent was waived for this study.

| Study outcomes
All-cause mortality within 1 year was determined according to the 10th Revision of the International Classification of Diseases (ICD-10). MI associated with PCI was defined by an elevation of cardiac troponin (cTn) values >5 times the 99th percentile upper reference limit (URL) in patients with normal baseline values within 48 hours after PCI. In addition, the [...]
FIGURE 1 Flow chart. ACS, acute coronary syndrome; PCI, percutaneous coronary intervention.
Accuracy = (TP + TN)/(TP + FP + TN + FN)
Recall = TP/(TP + FN)
Specificity = TN/(TN + FP)
Precision = TP/(TP + FP)
F1 score = 2 × precision × recall/(precision + recall)
[...] 83. A previous study verified the usefulness of ML techniques in predicting the diagnosis [...]
FIGURE 2 SHAP summary plot of the features by using the XGBoost model. A dot represents a patient and is created for each feature attribution value for the model. Dots are colored according to the values of features for the respective patient and accumulate vertically to depict density. The dot color is redder as the feature value increases and bluer as the feature value decreases. For the adverse event outcome, a higher SHAP value of a feature reflects a higher probability. If the SHAP value of a feature increases in the same direction as the x-axis, this indicates that an increase in the value of the feature increases the incidence of adverse events. Conversely, it indicates that an increase in the value of the feature decreases the incidence of adverse events. (A) SHAP value for all-cause mortality. (B) SHAP value for AKI. (C) SHAP value for MI. AKI, acute kidney injury; MI, myocardial infarction; SHAP, Shapley Additive exPlanations.
FIGURE 3 Receiver operating characteristic (ROC) curves and PR curves. In this study, we trained eight models: KNN, SVM, AdaBoost, XGBoost, CatBoost, random forest, decision tree, and LDA. (A) ROC curve; (B) PR curve. KNN, K-nearest neighbors; LDA, linear discriminant analysis; PR, precision/recall; ROC, receiver operating characteristic; SVM, support vector machine.

FIGURE 4 Receiver operating characteristic curve of the LDA model for all-cause mortality, AKI, and MI. (A) All-cause mortality; (B) AKI; (C) MI. AKI, acute kidney injury; LDA, linear discriminant analysis; MI, myocardial infarction.