Predicting the risk of mortality and rehospitalization in heart failure patients: A retrospective cohort study by machine learning approach

Abstract

Background: Heart failure (HF) is a global problem, affecting more than 26 million people worldwide. This study evaluated the performance of 10 machine learning (ML) algorithms and chose the best-performing one to predict mortality and readmission of HF patients, using the Fasa Registry on Systolic HF (FaRSH) database.

Hypothesis: ML algorithms may better identify patients at increased risk of HF readmission or death from demographic and clinical data.

Methods: Through comprehensive evaluation, the best-performing model was used for prediction. All trained models were applied to the test data, which comprised 20% of the total data. Five metrics were used for the final evaluation and comparison of the models: accuracy, F1-score, sensitivity, specificity, and area under the curve (AUC).

Results: Ten ML algorithms were evaluated. The CatBoost (CAT) algorithm, which uses an ensemble of decision trees to build a nonlinear model, performed best of the 10 models studied. Of the 2488 participants, 366 (14.7%) were readmitted to the hospital, 97 (3.9%) died within 1 month of follow-up, and 342 (13.7%) died within 1 year of follow-up. The most significant variables for predicting these events were length of hospital stay, hemoglobin level, and family history of MI.

Conclusions: The ML-based risk stratification tool was able to assess the risk of all-cause mortality and readmission in patients with HF. ML could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of critical features in the model.


| INTRODUCTION
Heart failure (HF) is a global problem, affecting more than 26 million people worldwide.1 In the United States, over 1 million people are hospitalized due to HF yearly.2,3 In Iran, results from the Persian Registry of Cardiovascular Disease/heart failure (PROVE/HF) study in Isfahan province showed that annual readmission and mortality rates were high.4 For these patients, readmission or death in the postdischarge phase is problematic. In Europe, the mortality rate is 7% within the first year and increases to 26.7% per year after hospitalization for HF,5 and the readmission rate within the first year is between 20% and 25%.6,7 HF imposes a giant economic burden: global costs associated with HF are projected to rise to approximately $400 billion by 2030.12 Although percutaneous coronary intervention (PCI), angiography, and echocardiography have advanced, and factors such as low high-density lipoprotein (HDL) cholesterol,13 copeptin,14 and B-type natriuretic peptide15 have been used to predict HF,16,17 HF patients remain vulnerable to hospital readmission, high mortality, and critical damage to quality of life, placing significant financial stress on the public health-care system.17,18 In recent years, artificial intelligence has increasingly penetrated the medical field to detect diseases.19 Identifying disease patterns by processing information with artificial intelligence can help provide better healthcare to patients.19 Confidence in using machine learning (ML) algorithms has increased in the health sector due to their ability to capture complex relationships between clinical parameters through complex mathematical formulations.20 ML algorithms are used in many fields of medicine, such as diagnosis, prediction, treatment, and interpretation of medical images.21,22

The ML algorithm informs physicians and patients about the prognosis of the disease, decision-making, disease management, and end-of-life preferences, and increases patients' motivation to follow treatment.23 One retrospective cohort study compared the performance of the XGBoost prediction model with three other models in 2098 intensive care unit (ICU) patients. The XGBoost model had the highest prediction performance among the four models, with an AUC of 0.824; the support vector machine (SVM) had the weakest performance, and mean blood urea nitrogen (BUN) was identified as the most critical predictor variable.24 A study by Peng et al. aimed to select an ML algorithm to predict 28-day mortality in HF patients with hypertension in the ICU; neural network (NN) models had the best predictive performance in the test set and external validation cohort, outperforming traditional logistic regression analysis.25 Angraal et al. conducted a study to predict mortality and readmission of HF patients with preserved ejection fraction (HFpEF), using logistic regression (LR), random forest (RF), gradient descent boosting, and SVM models over 3 years of follow-up; the RF model performed best, predicting mortality with an AUC of 0.72 and readmission with an AUC of 0.76.23 Considering the conflicting results in the available evidence and the limited number of studies predicting all outcomes of HF patients, we conducted the first study in southwestern Iran aiming to predict mortality and readmission of HF patients using the latest machine learning algorithms based on registry data and examining HF outcomes.

| Study design
This retrospective cohort study used the records from the Fasa Registry on Systolic Heart Failure (FaRSH). Fasa, a city of around 250,000 inhabitants, is located in Fars province in southwest Iran.

This study gathered data from the people of Fasa and 34 surrounding towns and villages. The research included participants hospitalized due to HF, whether acute new-onset HF or acute decompensation of chronic HF.

| Data source
We used the FaRSH database, which consists of 2488 patients with HF. All patients were included using the census method. The study was designed to provide a 1-year follow-up for each participant, with a recruiting period from March 2015 to March 2020 and follow-up ending in March 2025. All people included in the registry were evaluated based on their admission and discharge diagnosis of systolic HF, as determined by the attending cardiologists, who examined patients daily and used International Classification of Diseases, Tenth Revision (ICD-10) coding (Supporting Information S1: Table 1).26

| Outcomes
After admission to the hospital and documentation of their information, patients were followed for 1 month and 1 year, yielding three outcomes: 1-month mortality, 1-year mortality, and hospital readmission. The outcomes were analyzed as dummy variables: the first two were coded as dead or alive, and the third as readmitted or not readmitted due to HF.

| Predictors
Fifty-seven factors were evaluated as independent variables to predict the outcomes, based on a literature review and clinical relevance to HF. All these factors were entered into and assessed by the machine learning algorithms. Several continuous variables were classified into categories to boost computation speed, a process known as discretization.27 Two nurses were specially trained for patient evaluation and data entry. For at least 1 month, the nurses were taught by a single supervising senior cardiology nurse and five collaborating cardiologists. They documented patients' information at the time of admission.
Patients' demographic and anthropometric information, such as age, gender, ethnicity, marital status, place of residence, waist circumference, and body mass index (BMI), was obtained. People were separated into three groups based on ethnicity: Arabs, Persians, and others. They were split into two categories based on marital status: married and unmarried. People were sorted into three categories based on BMI: underweight (<18.5), normal (18.5-24.9), and overweight or obese (>25).23 The waist circumference cut-off values for abdominal obesity in males and females were 102 and 88 cm, respectively.28,29 Patients were asked about cigarette smoking and opium usage. The patients were divided into four groups based on the New York Heart Association (NYHA) classification. In addition, information about the participants' underlying disorders and diseases throughout their hospitalization was gathered. The conditions examined as factors were dilated cardiomyopathy, right ventricular (RV) failure, previous myocardial infarction (MI), atrial fibrillation/flutter (AF), chronic obstructive pulmonary disease (COPD), heart valve disease, prior stroke, left bundle branch block (LBBB), hypertension, and diabetes. In addition, patients were split into three groups depending on previous chronic HF hospitalization: not hospitalized, hospitalized less than 30 days ago, and hospitalized more than 30 days ago. Finally, they were asked about the duration of their HF, which was divided into two categories: over 6 months and under 6 months. The participants' medical histories were fully gathered before and after hospitalization. The therapeutic factors of interest were angiotensin-converting enzyme (ACE) inhibitors, β-blockers, mineralocorticoid receptor antagonists (MRAs), diuretics, digitalis, statins, long-acting nitrates, anticoagulants, intravenous inotropes, and acetylsalicylic acid (ASA) or other antiplatelets.
Information on device therapy was collected, and patients were separated into four groups: pacemaker, implantable cardioverter-defibrillator (ICD), cardiac resynchronization therapy defibrillator (CRT-D), and no device. In addition, they provided information about invasive procedures such as coronary artery bypass graft (CABG) and percutaneous coronary intervention (PCI). During hospitalization, blood samples were taken, and levels of hemoglobin, cholesterol, triglyceride, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), sodium, potassium, random blood sugar, white blood cells (WBC), and creatinine were retrieved. The Cockcroft-Gault formula was used to calculate the glomerular filtration rate (GFR). The cut-off hemoglobin level was set at 13 for males and 12 for females.30 The cholesterol, triglyceride, and LDL threshold values were 200, 150, and 130, respectively. HDL < 50 (female) or HDL < 40 (male) was considered abnormally low.30 The patients were separated into three groups based on their sodium and potassium levels: normal, hypo, and hyper. The remaining laboratory variables were entered as continuous values. Based on heart rate (HR), patients were categorized into three groups: bradycardia (HR < 60), normocardia (60 ≤ HR ≤ 100), and tachycardia (HR > 100). Systolic and diastolic blood pressure were evaluated quantitatively. The patients were asked about stroke, HF, and MI in their first-degree relatives. Depending on their caregivers, people were separated into two categories: hospital care and clinical care. During their stay, patients' electrocardiograms (ECGs) were collected and classified into one of four categories: sinus rhythm, pacemaker rhythm, atrial fibrillation, and other rhythms. Transthoracic echocardiography was performed on the patients throughout their hospitalization.

| Normalization
The data were modified to reduce computation and improve model accuracy. Two approaches, feature scaling and one-hot encoding, were employed in this procedure. The first transformed the values of all continuous variables to the range −1 to +1. The second was applied to variables with more than two categories, transforming categorical variables into numerical indicator variables consisting of zeros and ones.
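The two preprocessing steps above can be sketched with scikit-learn; the column names below are illustrative stand-ins, not the actual FaRSH registry variables.

```python
# Sketch of the two preprocessing steps: feature scaling to [-1, +1]
# and one-hot encoding of multi-level categorical variables.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "age": [54, 71, 63],
    "hemoglobin": [11.2, 13.8, 9.9],
    "nyha_class": ["II", "III", "IV"],   # categorical, more than two levels
})

# Feature scaling: map continuous variables into the range [-1, +1]
scaler = MinMaxScaler(feature_range=(-1, 1))
df[["age", "hemoglobin"]] = scaler.fit_transform(df[["age", "hemoglobin"]])

# One-hot encoding: expand the categorical variable into 0/1 indicator columns
df = pd.get_dummies(df, columns=["nyha_class"])
print(df.columns.tolist())
```

In practice the scaler would be fitted on the training set only and then applied to the test set, to avoid leaking test-set statistics into training.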

| Splitting total data
All data were first split into training and test sets; the test set contained 20% of the data. The algorithms used the training set to learn, while the test set was used to assess each classifier's prediction error after learning.
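A minimal sketch of this 80/20 split, assuming a feature matrix and a binary outcome vector of the study's dimensions (2488 patients, 57 predictors); the random data here are placeholders for the FaRSH records.

```python
# Hold out 20% of the data as a test set, stratified on the outcome.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2488, 57))      # 2488 patients, 57 candidate predictors
y = rng.integers(0, 2, size=2488)    # e.g., readmitted (1) vs. not readmitted (0)

# stratify=y keeps the outcome ratio similar in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```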

| MACHINE LEARNING MODELS
In this study, 10 supervised machine learning algorithms were utilized: LR, decision tree (DT), SVM, RF, Gaussian Naive Bayes (GNB), linear discriminant analysis (LDA), K-nearest neighbors (KNN), gradient boosting machine (GBM), XGBoost (XGB), and CatBoost (CAT). These ML models use a variety of methodologies and calculations and, depending on the data type, can carry out various functions. As a result, by applying many models, we may eventually identify the optimal model for our data. All ML algorithms were implemented in Anaconda (version 4.12.0) on the Jupyter Notebook platform (version 3.3.2), and the algorithms were run using the Scikit-Learn module (version 1.1.3).
Logistic regression: Logistic regression is a traditional machine learning method used to predict the probability of an event occurring.
This method creates a linear model on the training data by using a logistic function to calculate the probability of an event.
Decision tree: A decision tree is a machine learning method used to classify data.This method creates a nonlinear model on the training data using a series of rules.
Support vector machine: SVM is a machine learning technique used to classify data. This method separates two classes of data with a decision boundary; with kernel functions, it can create a nonlinear model on the training data.
Random forest: Random forest is a machine learning method that uses a set of decision trees to create a nonlinear model.
Gaussian Naive Bayes: Gaussian Naive Bayes is a machine learning method that uses a Gaussian probability distribution to calculate the probability of an event occurring.
Linear discriminant analysis: Linear discriminant analysis (LDA) is a machine learning method used to classify data. This method creates a model on the training data by finding a linear combination of features that separates two classes of data.
K-nearest neighbors: K-nearest neighbors (KNN) is a machine learning method used to classify data.This method builds a model on the training data using the K samples closest to the new sample.
Gradient boosting machine: A gradient boosting machine (GBM) is a machine learning method that uses a series of decision tree models to create a nonlinear model.
XGBoost: XGBoost is a machine learning method that uses a series of decision tree models to build a nonlinear model.
CatBoost: CatBoost is a machine learning method that uses a series of decision tree models to create a nonlinear model.
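The three boosted-tree methods above (GBM, XGBoost, CatBoost) share one core idea: decision trees are fitted sequentially, each correcting the errors of the ensemble built so far. As a dependency-light illustration of that shared idea, here is scikit-learn's gradient boosting classifier on synthetic data (CatBoost and XGBoost expose very similar interfaces but are separate libraries):

```python
# Boosted trees: a sequence of shallow trees, each fitted to the
# residual errors of the trees before it.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=100,   # number of sequential trees
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=3,        # depth of each weak learner
    random_state=0,
)
model.fit(X, y)
print(round(model.score(X, y), 3))
```

The `n_estimators` and `learning_rate` knobs correspond to the stopping criteria discussed later: training halts once the maximum number of trees is reached or the error stops improving.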
FIGURE 1 Top variable importance values for predicting hospital readmission, 1-month mortality, and 1-year mortality in heart failure patients using CatBoost (CAT). ARB, angiotensin receptor blocker; CABG, coronary artery bypass graft; COPD, chronic obstructive pulmonary disease; GFR, glomerular filtration rate; LBBB, left bundle branch block; MI, myocardial infarction; MRA, mineralocorticoid receptor antagonist; NYHA classification, New York Heart Association classification; PCI, percutaneous coronary intervention; WBC, white blood count.
We were interested in comparing the performance of these machine learning techniques in predicting the prognosis of HF.
These techniques use a wide variety of algorithms and approaches, so we believe that this comparison can provide valuable insight into the best technique for this task.

| Data augmentation and hyperparameter tuning
Data imbalance is a problem when dealing with asymmetric data sets; a data set is imbalanced if the categorical outcomes are distributed unequally. According to the three final results of this study, which included 2488 patients, only 366 (14.7%) were readmitted to the hospital, 97 (3.9%) died within a month of follow-up, and 342 (13.7%) died within a year of follow-up. For the machine learning process, the classes of each outcome should be balanced; otherwise, the models may appear more accurate overall, but the F1 score for predicting death and rehospitalization will likely be much lower, and the primary aim of this study would not be achieved.32 Oversampling was utilized to balance the classes of each outcome. The Synthetic Minority Oversampling Technique (SMOTE) is one of the most remarkable oversampling techniques: the minority class is oversampled by creating "synthetic" instances rather than oversampling with replacement. SMOTE selects samples from the minority class and creates synthetic samples along the line segments linking some or all of the minority class's K-nearest neighbors.33 Fivefold cross-validation and hyperparameter tuning were used to train the models and determine the optimal values for each model; both techniques were applied only to the training data. In the fivefold approach, all the training data were divided into five equal parts; each time, one part was held out as validation data and the model was trained on the remaining parts, the accuracy was recorded, and finally the average of the five accuracies was obtained. Each model's accuracy can then be altered by adjusting its hyperparameters. A range of hyperparameter values was explored in the tuning procedure to find the optimal combination.34
Data leakage frequently arises when SMOTE is used, so care should be taken when applying SMOTE within a hyperparameter tuning procedure: in each fold, only the four training parts were augmented, never the validation data.
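The leakage-safe pattern above can be sketched as follows. SMOTE interpolates synthetic minority samples along lines between nearest neighbors; to keep this sketch dependency-free (SMOTE lives in the separate imbalanced-learn package), simple random duplication of minority samples stands in for it. The essential point is the same: resampling happens inside each fold, on the training portion only.

```python
# Oversample the minority class *inside* each CV fold, on the training
# portion only -- the validation fold is never touched by resampling.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (rng.random(300) < 0.15).astype(int)   # ~15% minority, like readmission

scores = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Duplicate minority-class rows until the training fold is balanced
    minority = np.flatnonzero(y_tr == 1)
    extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])
    clf = LogisticRegression().fit(X_bal, y_bal)
    scores.append(clf.score(X[val_idx], y[val_idx]))  # untouched validation data
print(round(float(np.mean(scores)), 3))
```

Resampling before splitting, by contrast, would let synthetic copies of a validation sample appear in the training data and inflate the validation scores.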

| Data augmentation and training the machine learning algorithms
Models were trained using the training data after the best hyperparameter combination for each model was identified. The training set, comprising 80% of the total data, was unbalanced; to help the models train more effectively, it was augmented into a balanced data set. The test set comprised the remaining 20% of the data, which the models had never seen.
Test data are a helpful data set for assessing models since they are unbalanced, like real-world data.

| Model evaluation
Finally, all the trained models were applied to the test data, comprising 20% of the total data. Five metrics were used for the final evaluation and comparison of the models: accuracy, sensitivity, specificity, F1 score, and AUC. The following equations determine the evaluation metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
F1 score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. Because of the imbalanced structure of the data, AUC and F1 score are the most appropriate metrics for choosing the optimal ML model.
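These five metrics can be computed on a held-out test set as sketched below; the classifier and synthetic imbalanced data are illustrative, not the study's models or records.

```python
# Compute accuracy, sensitivity, specificity, F1, and AUC on a test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.85], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "sensitivity": recall_score(y_te, pred),   # TP / (TP + FN)
    "specificity": tn / (tn + fp),             # TN / (TN + FP)
    "f1": f1_score(y_te, pred),
    "auc": roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
}
print({k: round(v, 3) for k, v in metrics.items()})
```

Note that AUC is computed from predicted probabilities rather than hard class labels, which is why it is robust to the class-imbalance issues that inflate plain accuracy.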

| Feature importance
The model with the greatest performance across all three outcomes was used to rank the variables, and the top predictive variables for each outcome were displayed in order. The step-by-step machine learning process is summarized in Supporting Information S1: Figure 1.

| Ethics declarations
The study protocol was approved by the Research Council and the Ethics Committee of Fasa University of Medical Sciences (IR.FUMS.REC.1401.113), and all methods were performed in accordance with the relevant guidelines and regulations. Informed consent was obtained from all subjects and/or their legal guardian(s).

| Basic characteristics of participants
According to the three outcomes from this study, which involved 2488 participants, 366 (14.7%) of the patients were readmitted to the hospital, 97 (3.9%) died within a month of follow-up, and 342 (13.7%) died within a year of follow-up. Supporting Information S1: Tables 2-4 give a descriptive analysis of variables according to the three planned outcomes.

| Comparison of the performance of the machine learning algorithms

| Hospital readmission
The greatest AUC was associated with GNB (0.75), followed by CAT and LDA (0.74), LR and RF (0.73), and XGB (0.71). DT presented the least AUC (0.54). The AUCs of the remaining models were 0.67, 0.63, and 0.59 for GBM, SVM, and KNN, respectively. CAT also had the highest F1 score, with a value of 0.91. Table 2 contains all of the remaining information.

| One-month mortality
Ten models were used to predict 1-month mortality; the best AUC was associated with RF (0.62), followed by CAT and GNB (0.61), LR, GBM, and LDA (0.60), and XGB (0.56). KNN had the least AUC (0.51). The AUCs of the remaining models were 0.55 and 0.52 for DT and SVM, respectively. Furthermore, CAT had the highest F1 score, with a value of 0.13. Table 3 contains all of the remaining details.
In addition, in Supporting Information S1: Figure 2, the AUC of all models is represented visually, separated by three outcomes.
TABLE 2 Performance of the machine learning algorithms based on hospital readmission.

| Feature importance
Because the CAT model performed best for practically all three of our outcomes, it was utilized to identify and rank the most important factors. Figure 1 displays the significant predictors for each outcome. Length of stay in the hospital was the most important predictor of hospital readmission, followed by medical history of CABG, blood sugar, systolic blood pressure, and intravenous inotropic support type. Hemoglobin level was the most important predictor of 1-month mortality, followed by age, prior HF hospitalization, body mass index, gender, and sodium level. Family history of MI was the most significant predictor of 1-year mortality, followed by sodium level, body mass index, length of stay in the hospital, systolic blood pressure, and diastolic blood pressure.
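Ranking predictors this way amounts to reading per-feature importance scores off the fitted boosted-tree model. A hedged sketch with scikit-learn's gradient boosting standing in for CatBoost (both expose importance scores after fitting); the feature names are illustrative placeholders, not the actual FaRSH variables or their true ranking:

```python
# Rank features by importance from a fitted boosted-tree classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           random_state=0)
names = ["length_of_stay", "hemoglobin", "sodium", "age", "bmi", "sbp"]

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Sort features from most to least important
ranking = sorted(zip(names, model.feature_importances_), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The importance scores sum to 1 and reflect how much each feature contributes to the trees' split decisions.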

| DISCUSSION
This retrospective cohort study developed and validated 10 ML models to predict mortality and readmission of patients with HF. The current study supports using the CAT algorithm for risk evaluation in the medical care of HF patients. The CAT algorithm outperformed LR, DT, GNB, RF, LDA, SVM, XGB, KNN, and GBM.
This study identified some essential variables related to rehospitalization and mortality of patients with HF: length of stay in the hospital, medical history of CABG, blood sugar, systolic blood pressure, and intravenous inotropic support type were the most significant predictor variables for readmission. Hemoglobin level was the most important predictor of 1-month mortality, followed by age, prior HF hospitalization, body mass index, gender, and sodium level. Family history of MI was the most critical predictor of 1-year mortality.
Several studies have predicted readmission and mortality among HF patients using different ML models, although it is difficult to compare their outcomes because each study evaluated different features of HF patients.35 Li et al. and Sun et al. showed that CAT has been used to predict patient mortality and may help physicians in decision-making.24,36 The same technique can be used in any cohort study, allowing the algorithm to be revalidated over time to reflect an active population. Ultimately, predictive ML aims to recognize vulnerable HF patients and take action in the postdischarge period to reduce the future risk of HF. There was no evidence that CAT could be used to predict readmission. For example, Sun et al. developed and validated seven algorithms to predict the mortality of HF patients with hypoxic hepatitis (HH), using Kaplan-Meier and multivariate Cox analyses to determine the effect of HH on mortality. Internal and external validation suggested that the CAT algorithm had a higher ability than the other algorithms (internal validation: AUC, 0.832; 95% CI, 0.819-0.845; external validation: AUC, 0.757; 95% CI, 0.739-0.776).36 In another study, Li et al. validated 11 ML algorithms to predict mortality in mechanically ventilated patients with HF; the CAT algorithm was validated externally and showed the best performance (AUC = 0.806).24 The CAT algorithm works by building a set of decision trees, where each new tree is trained to correct the errors of the previous ones. This is done sequentially, meaning that each tree is built using information from previous trees, and the iterative process continues until a stopping criterion, such as a maximum number of trees or a minimum error rate, is met.37

In our study, the most important predictors for readmission, 1-month mortality, and 1-year mortality were, respectively, length of stay in the hospital, hemoglobin level, and family history of MI. A previous study41 also found a relationship between length of stay in the hospital and readmission. However, unlike our study, that study41 showed a negative relationship between length of stay and the possibility of readmission, especially in the case of a heart attack.
They stated that extending the length of stay for some patients may be a means of improving the quality of care by reducing readmissions during the 30-day postdischarge period. This difference may have been seen because, in our study, a longer hospital stay indicates a worse disease condition, which leads to readmission of HF patients.
Regarding 1-month mortality, our study confirmed the importance of hemoglobin level. A previous study42 also reported that patients with anemia were more exposed to in-hospital complications such as HF, recurrent ischemia, reinfarction, cardiogenic shock, stroke, and major bleeding. Hemoglobin level has a significant effect on the prognosis of HF patients, but other factors, such as patient age and HF type, should also be considered.43 That study also reported that anemia is associated with in-hospital, 1-month, and 1-year mortality. Therefore, anemia and other risk scores should be considered in the initial risk assessment.
Previous studies showed that age was the main predictor of mortality and readmission of HF patients in ML models.25,35,36,44 When HF worsens, particularly in old age, it can cause severe ischemia, respiratory failure, and death.25 Various studies45-49 have emphasized that older age is consistently associated with worse outcomes. This issue can be critical, especially considering the aging of the population and the effect of age on HF. Our study did not address the nonlinear relationship with age, which could be investigated in future studies.
Our study is also consistent with a previous study50 showing the association between a family history of MI and HF mortality. In that study, family history of MI was an independent risk factor for coronary heart disease (CHD) mortality, with effects differing by the gender of the index person and the type of family relationship. Life-course socioeconomic position had little impact on the association between family history and CHD, suggesting that this factor does not confound the effect. This may reflect shared family eating habits, environmental conditions, or genetics that can affect HF mortality, which could be specifically investigated in future studies.
As a retrospective analysis, this study had limitations. First, our algorithm was built from a single-center data set that may not generalize to other populations; however, it performed well on the internal data set because all cardiac patients from Fasa and the peripheral villages came to this center. Second, our models' performance depends on the accuracy of the data set, which was collected from patients, so we used trained staff to collect the data. Third, the data set did not contain patients' psychosocial information, which might improve the performance of ML models. Despite these limitations, ML models have a primary role in preventing rehospitalization, reducing mortality, improving patients' quality of life, and decreasing health costs.

| CONCLUSION
This study identified some essential variables related to rehospitalization and mortality of patients with HF: length of stay in the hospital, hemoglobin level, and family history of MI were the most significant predictor variables for readmission, 1-month mortality, and 1-year mortality, respectively. Health policymakers and managers should pay attention to these features to reduce mortality and readmission of HF patients by improving quality of life, paying attention to the elderly, and providing free health care services. Doctors and clinical staff can identify high-risk HF patients as soon as possible and take the necessary measures to prevent disease progression. In fact, predictive models based on machine learning can help doctors identify HF patients at high risk of readmission or death; this information can support preventive measures, such as adjusting medications or providing special care services.

Feature selection approaches play a crucial role in achieving efficient data reduction, which is essential for developing accurate models. In this work, we employed recursive feature elimination (RFE) in conjunction with a tree-based machine learning model to implement a wrapper technique. RFE is an iterative process that involves repeatedly training a machine learning model and removing the lowest-ranking features, ultimately identifying the most relevant predictors. By utilizing RFE, we identified the ideal number of variables and their combinations for predicting hospital readmission (20 variables), 1-month mortality (25 variables), and 1-year mortality (15 variables). Figure 1 illustrates the top predictors for each outcome.
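The RFE wrapper described above can be sketched with scikit-learn; the random forest estimator and synthetic data below are illustrative stand-ins for the study's tree-based model and the 57 FaRSH predictors.

```python
# Recursive feature elimination around a tree-based estimator:
# repeatedly fit the model and drop the lowest-ranked feature until
# n_features_to_select remain (e.g., 20 for the readmission outcome).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=400, n_features=57, n_informative=10,
                           random_state=0)

selector = RFE(RandomForestClassifier(random_state=0),
               n_features_to_select=20, step=1)
selector.fit(X, y)

# selector.support_ marks the retained features; ranking_ == 1 means kept
print(int(selector.support_.sum()))
```

In practice, `n_features_to_select` would be tuned per outcome (or `RFECV` used to pick it by cross-validation), which is how different variable counts can emerge for the three outcomes.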
TABLE 3 Performance of the machine learning algorithms based on 1-month mortality.
TABLE 4 Performance of the machine learning algorithms based on 1-year mortality.