Predicting 90 day acute heart failure readmission and death using machine learning‐supported decision analysis

Readmission or death soon after heart failure (HF) admission is a significant problem. Traditional analyses for predicting such events often fail to consider the gamut of characteristics that may contribute, and they tend to focus on 30-day outcomes even though the window of increased vulnerability may last up to 90 days. Risk assessments incorporating machine learning (ML) methods may be better suited than traditional statistical analyses alone to sort through the multitude of data in the electronic health record (EHR) and identify patients at higher risk.

Readmission or death in the post-discharge phase after an incident HF admission is particularly problematic. Up to 25% of HF patients may be readmitted within 30 days of discharge, with an additional mortality risk of 10%. [4][5][6][7] In the past few years, multiple HF risk-prediction models have been proposed to identify HF patients at highest risk of 30-day readmission or death, with the focus on 30-day outcomes in part related to reporting standards and reimbursement penalties in the U.S. [8][9][10][11] However, the window of vulnerability is likely longer, with the spike in event rates extending out to 90 days post-discharge and plateauing thereafter. [5][7] Few existing HF risk-prediction algorithms have assessed readmission and death over the post-discharge transition out to 90 days. [12][13][14]
Another limitation of existing HF risk-prediction models is the limited number of risk factors they assess. Compared to traditional statistical methodologies, machine learning (ML)-supported techniques have the advantage of being able to sift through the information-dense electronic health record (EHR) and to account for nonlinear interactions. [15][16] The EHR can be systematically mined not only for routine historical, laboratory, and medication data, but also for variables often lacking in other models, such as comprehensive echocardiographic data or socioeconomic contributors. However, the clinical interpretability of ML algorithms can vary. Combining ML techniques with more traditional statistical techniques like logistic regression (LR) has the potential advantage of greater interpretability, as well as increased performance.
The purpose of this study was to integrate multiple classes of risk factors found in the EHR into a comprehensive, ML-based risk prediction model for 90-day acute HF readmission or all-cause mortality after incident HF admission.

| Patient population
This retrospective, single-center study was approved by the University of Florida (UF) Institutional Review Board and Privacy Office as an exempt study with a waiver of informed consent. This study included 4368 consecutive patients with established outpatient primary care or cardiology care at UF Health Shands who subsequently underwent an incident HF hospitalization at UF Health Shands Hospital between January 2011 and January 2019 and survived to discharge. HF as a primary or secondary diagnosis was identified using ICD-9 or ICD-10 codes, listed in Supplemental Table 1.
Beyond the requirement for outpatient care within our hospital system, the cohort was restricted further to those living in the UF-Shands primary service area in order to limit the potential for outside hospitalizations. A further 1179 patients were excluded from the study because of a lack of recent echocardiographic data prior to discharge or insufficient vital information, resulting in 3189 patients in the analysis cohort.

| Outcomes
The primary outcome was defined as readmission due to acute HF or all-cause mortality within 90 days after discharge from an index HF hospitalization. Readmission was attributed to acute HF if any of the
Overall, 141 variables were considered, but 34 variables with ≥50.0% missing data and 9 variables with purity >99.0% were removed. The final design matrix had 98 variables. Among them, 89 variables had no missing data, 5 variables had <1.0% missing data, 1 variable (Troponin T) had 19.6% missing data, and 3 variables (albumin, bicarbonate, and N-terminal pro-brain natriuretic peptide [NT pro-BNP]) had between 32.7% and 47.7% missing data. Missing laboratory values were imputed using re-sampling based on readmission status. In order to model nonlinear effects, some continuous variables were transformed into categorical variables. Continuous variables were assessed using the t-test. Candidate variables with a univariate p < .10 were selected for inclusion in ML-based models for variable selection. These variables were also used to develop an LR-only model as the benchmark for performance comparison. Area under the curve (AUC) was used as the performance metric.
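The filtering and imputation steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: "purity" is assumed here to mean the share of non-missing entries taken by the single most common value, missing entries are represented by `None`, and all column names and values are invented.

```python
import random

def missing_fraction(values):
    """Fraction of entries that are None (the assumed marker for missing data)."""
    return sum(v is None for v in values) / len(values)

def purity(values):
    """Share of non-missing entries taken by the most common value (assumed definition)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 1.0
    most_common = max(set(observed), key=observed.count)
    return observed.count(most_common) / len(observed)

def filter_variables(columns, max_missing=0.50, max_purity=0.99):
    """Drop columns with >=50% missing data or purity >99%, per the stated rules."""
    return {
        name: values
        for name, values in columns.items()
        if missing_fraction(values) < max_missing and purity(values) <= max_purity
    }

def impute_by_outcome(values, outcomes, rng):
    """Fill each missing value with a random draw from the observed values of
    patients who share the same outcome label (raises KeyError if a group has none)."""
    pools = {}
    for v, y in zip(values, outcomes):
        if v is not None:
            pools.setdefault(y, []).append(v)
    return [v if v is not None else rng.choice(pools[y])
            for v, y in zip(values, outcomes)]

# Invented toy data: one usable lab value, one sparse column, one constant flag.
columns = {
    "sodium": [138, 141, None, 135, 129, 140, 137, 133],
    "mostly_missing": [None, None, None, None, 1.2, None, 0.9, None],
    "constant_flag": [0] * 8,
}
outcomes = [0, 0, 0, 1, 1, 0, 0, 1]  # 1 = readmission or death (hypothetical labels)

kept = filter_variables(columns)
imputed = impute_by_outcome(kept["sodium"], outcomes, random.Random(0))
print(list(kept))  # ['sodium']
```

Drawing replacements within the outcome group preserves each group's observed distribution, but it also uses the outcome during preprocessing, which is one reason a sensitivity analysis on heavily imputed variables is worthwhile.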

| Variable selection
For ML-based variable selection, the entire cohort was split randomly into four folds. Then, 4-fold cross-validation was performed in which each fold (25% of the dataset) was selected consecutively as a testing dataset and the remaining folds (75%) were combined into a training dataset. Subsequently, four ML algorithms typically used in this field [19][20][21] were applied: (a) least absolute shrinkage and selection operator (LASSO), (b) gradient boosting machine, (c) random forest, and (d) support vector machine. During this process, the parameters were optimized for each model by maximizing AUC. Using the best performing model, variables observed to be significant (p < .10) in at least three folds were selected for further inclusion into an LR model for interpretable risk factor analysis, referred to here as the combined ML-LR model.
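The fold bookkeeping behind this procedure can be illustrated with a short Python sketch. It covers only the random 4-fold split and the "at least three folds" rule. The per-fold selections are hard-coded, hypothetical stand-ins for what a selector such as LASSO might return, and the function names are illustrative.

```python
import random
from collections import Counter

def make_folds(n_samples, n_folds=4, seed=0):
    """Randomly partition sample indices into n_folds folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def train_test_splits(folds):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

def stable_variables(selected_per_fold, min_folds=3):
    """Return variables that were selected in at least min_folds folds."""
    counts = Counter(name for chosen in selected_per_fold for name in set(chosen))
    return sorted(name for name, c in counts.items() if c >= min_folds)

folds = make_folds(3189)  # size of the analysis cohort

# Hypothetical per-fold selections (variable names are illustrative only).
selected = [
    {"sodium", "heart_rate", "nt_pro_bnp"},
    {"sodium", "heart_rate"},
    {"sodium", "nt_pro_bnp", "albumin"},
    {"sodium", "heart_rate", "nt_pro_bnp"},
]
print(stable_variables(selected))  # ['heart_rate', 'nt_pro_bnp', 'sodium']
```

Requiring agreement across folds filters out variables whose selection depends on one particular training subset, at the cost of possibly dropping weak but genuine predictors.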

| Model development and performance analysis
A final combined ML-LR model with stepwise selection based on Akaike Information Criterion was developed to predict the risk of the 90-day acute HF combined endpoint. Final model performance was assessed using AUC. To classify patients, a discrimination threshold, that is, a cutoff value within the model for elevated 90-day HF risk, was assessed. Patients with a 90-day acute HF prediction value above the discrimination threshold were labeled as at high risk of readmission or death. The threshold was calculated by either (a) simultaneously optimizing the sensitivity and specificity of the model using Youden's J-statistic or (b) minimizing weighted total misspecification cost (by defining a relative cost in terms of a ratio between false negative and false positive cases).
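The two threshold rules can be compared on a toy example. The sketch below is a minimal illustration, assuming predicted risks and binary outcomes are available as plain lists with both outcome classes present. The scores, labels, and the 5:1 cost ratio are invented, and the cost function (ratio × false negatives + false positives) is one plausible reading of the weighted cost described above.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores above the threshold are flagged as high risk."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s > threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def candidate_thresholds(scores):
    """Every cutoff that yields a distinct classification, from 'flag all' upward."""
    return [float("-inf")] + sorted(set(scores))

def youden_threshold(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    def j_stat(t):
        tp, fp, tn, fn = confusion_counts(scores, labels, t)
        return tp / (tp + fn) + tn / (tn + fp) - 1
    return max(candidate_thresholds(scores), key=j_stat)

def min_cost_threshold(scores, labels, fn_to_fp_cost_ratio):
    """Cutoff minimizing ratio * (false negatives) + (false positives)."""
    def cost(t):
        tp, fp, tn, fn = confusion_counts(scores, labels, t)
        return fn_to_fp_cost_ratio * fn + fp
    return min(candidate_thresholds(scores), key=cost)

# Invented predicted risks and outcomes (1 = readmission or death).
scores = [0.10, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 1, 0, 0, 0, 1, 0, 1, 1]

print(youden_threshold(scores, labels))       # 0.5
print(min_cost_threshold(scores, labels, 5))  # 0.2
```

In this toy data, weighting a missed event five times as heavily as a false alarm pushes the cutoff down from 0.5 to 0.2, so more patients are flagged: sensitivity rises and specificity falls.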

| Risk factor analysis
To identify risk factors for 90-day acute HF readmission or death, relative risks were expressed as odds ratios (OR) with 95% confidence intervals (CI). Risk factors with p < .05 were considered significant. In addition, sensitivity analyses were performed to resample any variables with >20% imputation and to identify any impact on AUC characteristics.
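For readers less familiar with how LR output becomes an OR, the conversion can be sketched as follows. This assumes a Wald-type interval and a normal approximation for the p value, which the text does not specify, and the coefficient and standard error are invented for illustration.

```python
import math

def odds_ratio_with_ci(beta, std_error, z=1.96):
    """Convert a logistic regression coefficient to an OR with a Wald 95% CI."""
    return (math.exp(beta),
            math.exp(beta - z * std_error),
            math.exp(beta + z * std_error))

def wald_p_value(beta, std_error):
    """Two-sided p value for beta/SE under a standard normal approximation."""
    z = abs(beta / std_error)
    return math.erfc(z / math.sqrt(2))

# Hypothetical coefficient for a binary risk factor (not a study estimate).
beta, se = 0.405, 0.15
or_, lower, upper = odds_ratio_with_ci(beta, se)
p = wald_p_value(beta, se)
print(f"OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f}), p = {p:.3f}")
```

An interval that excludes 1 corresponds to p < .05 under the same approximation, which is why the two criteria usually agree.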

| Patient characteristics
More than half of the patients in the cohort were women and almost one-third were non-white. Mean age was 67.8 years with a standard deviation of 15.0. Coronary artery disease was present in 61% of the cohort. Mean left ventricular ejection fraction was 48.4% with a standard deviation of 16.6%. Among the overall cohort, 58% presented with acute HF. The mean length of stay was 7.2 days with a standard deviation of 13.0.
the variables selected by gradient-boosting models, or using the overlapped variables selected by both the LASSO and gradient-boosting models, and the performance of these combined ML-LR models was slightly inferior to the one using LASSO.

| Final risk assessment model
The final model had an AUC of 0.760 (95% CI 0.752 to 0.767). Random sampling of variables with >20% missing values did not affect the overall performance of the model (average AUC was 0.759, 95% CI 0.758 to 0.760). As shown in Figure 2, in the decision rule analysis, adjusting the threshold to maximize Youden's J statistic yielded a sensitivity of 70%, a specificity of 71%, and an accuracy of 71%, with a positive predictive value of 45%. Applying a lower threshold to minimize the misspecification cost resulted in a higher sensitivity of 83%, but a reduced specificity (56%), accuracy (63%), and positive predictive value (38%).
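The metrics reported here have simple definitions, sketched below in Python. AUC is computed as the share of (event, non-event) pairs in which the event received the higher predicted risk, and the remaining metrics come from confusion-matrix counts. All inputs are hypothetical and are not the study's data.

```python
def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and positive predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
    }

# Toy ranking example: 3 of the 4 event/non-event pairs are ordered correctly.
print(auc([0.2, 0.4, 0.6, 0.8], [0, 1, 0, 1]))  # 0.75

# Hypothetical confusion-matrix counts (not the study's confusion matrix).
metrics = classification_metrics(tp=70, fp=87, tn=213, fn=30)
print({name: round(value, 2) for name, value in metrics.items()})
```

Because positive predictive value depends on how common the outcome is, it can remain well below sensitivity and specificity when non-events outnumber events, as in these toy counts.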

| DISCUSSION
In this study, using data derived from a standard EHR, advanced ML techniques were employed to create a comprehensive combined ML-
The current model appears more discriminative with an AUC of 0.760, but its performance requires further validation. Although direct comparison was not performed, performance characteristics appear comparable to commonly accepted 30-day HF models but provide a longer outlook in the vulnerable window post-discharge. For comparison, existing 30-day models report AUCs ranging from 0.55 to 0.83. [12] Among them, the Yale Readmission Risk Score, the most widely used risk prediction model for 30-day HF readmission, demonstrates modest predictive ability with an AUC in the range of 0.60 to 0.61. [18][27] The current 90-day model shares some common characteristics with various 30-day models including acute HF admission, lung disease, BP, heart rate, sodium, NT pro-BNP, and left ventricular ejection fraction. However, this model has some distinct elements. [28] Figure 4 summarizes how ML-based processes can be used to identify HF patients at elevated risk at discharge.
FIGURE 4 Identifying at-risk patients for 90-day acute heart failure readmission or all-cause death. A schematic overview of how machine learning algorithms can be used to integrate multiple characteristics into a comprehensive risk prediction model for 90-day heart failure (HF) readmission or all-cause death after discharge from an index admission. Patients identified as greater risk during the 90-day window of vulnerability may receive additional attention and clinical intervention. BUN, blood urea nitrogen; LV, left ventricular; NT pro-BNP, N-terminal pro-brain natriuretic peptide

| Limitations
One limitation of this study is that it is a single-center retrospective study. Therefore, it has less applicability than existing 30-day models, like the Yale Readmission Risk Score, which have been extensively validated in Medicare recipients. The current model would benefit from validation in a larger multi-center or prospective cohort. Another limitation is that the applicability of this model to patients without baseline echocardiography was not assessed. Similarly, laboratory variables had missing data. Although standard methods for imputing data were employed, variables such as biomarkers that may be ordered less often might be missed in the risk prediction process.

| CONCLUSION
The current study supports a role for assimilating ML-based risk assessment into the clinical care of HF patients after index admission to identify those at high risk. Such combined ML risk assessment is broadly applicable to the care of HF patients, particularly in the era of common EHRs. The same technique of data extraction and variable selection can be applied to any institution or cohort, allowing for the model to be revalidated over time to reflect a dynamic population.
Ultimately, the goal of predictive models like this one is to identify vulnerable HF patients and to take actions in the post-discharge period that can reduce future HF risk.

ACKNOWLEDGMENTS
We acknowledge Nancy Lanni, ELS, for assistance in the review and preparation of this manuscript.

CONFLICT OF INTEREST
Dr Zhong and Dr Wokhlu received research funding support with a sub-grant to University of Florida from University of Wisconsin, funded by Baxter Healthcare. They (Xiang Zhong, Anita Wokhlu) take full responsibility for the preparation of the manuscript and the decision to publish.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study may be available on request from the corresponding author, but will require institutional permission and notification of the sponsor. The data are not publicly available and access may be further restricted due to privacy or ethical restrictions.