Expected and observed in‐hospital mortality in heart failure patients before and during the COVID‐19 pandemic: Introduction of the machine learning‐based standardized mortality ratio at Helios hospitals

Abstract

Background: Reduced hospital admission rates for heart failure (HF) and evidence of increased in-hospital mortality were reported during the COVID-19 pandemic. The aim of this study was to apply a machine learning (ML)-based mortality prediction model to examine whether the latter is attributable to differing case mixes and exceeds expected mortality rates.

Methods and Results: Inpatient cases with a primary discharge diagnosis of HF non-electively admitted to 86 German Helios hospitals between 01/01/2016 and 08/31/2020 were identified. Patients with proven or suspected SARS-CoV-2 infection were excluded. ML-based models were developed, tuned, and tested using cases of 2016-2018 (n = 64,440; randomly split 75%/25%). Extreme gradient boosting showed the best model performance indicated by a receiver operating characteristic area under the curve of 0.882 (95% confidence interval [CI]: 0.872-0.893). The model was applied on data sets of 2019 and 2020 (n = 28,556 cases) and the hospital standardized mortality ratio (HSMR) was computed as the observed to expected death ratio. Observed mortality rates were 5.84% (2019) and 6.21% (2020), HSMRs based on an individual case-based mortality probability were 100.0 (95% CI: 93.3-107.2; p = 1.000) for 2019 and 99.3 (95% CI: 92.5-106.4; p = .850) for 2020. Within subgroups of age or hospital volume, there were no significant differences between observed and expected deaths. When stratified for pandemic phases, no excess death during the COVID-19 pandemic was observed.

Conclusion: Applying an ML algorithm to calculate expected inpatient mortality based on administrative data, there was no excess death above expected event rates in HF patients during the COVID-19 pandemic.

KEYWORDS
administrative data, COVID-19 pandemic, heart failure, machine learning, mortality prediction

| INTRODUCTION
During the early phase of the ongoing COVID-19 pandemic, numbers of heart failure (HF)-related hospital admissions were significantly decreased. [1][2][3][4][5][6] This was accompanied by an increase in case severity with regard to New York Heart Association (NYHA) class and higher in-hospital mortality rates. 5,[7][8][9] It is unclear whether the inferior outcome is attributable only to differing patient profiles or also to additional factors such as changes in HF patient care during the pandemic. Risk-adjusted mortality prediction would allow standardized modeling with regard to time intervals and regional differences of endpoints like in-hospital mortality. We previously introduced different machine learning (ML)-based algorithms for the calculation of expected mortality rates on a populational level in a large German HF cohort that only implemented widely accessible administrative data. 10
Comorbidities were identified from encoded secondary diagnoses at hospital discharge according to the Elixhauser comorbidity score. 11,12 All patients with an encoded SARS-CoV-2 infection (U07.1, U07.2!) were excluded. Cases with missing information for NYHA classes (n = 7280) were discarded to allow an adequate calibration of the ML models. Detailed information regarding the ICD codes used is provided in the Supporting Information Material (Tables S1 and S2). We computed the number of laboratory-proven SARS-CoV-2 infections per 100,000 inhabitants within a federal state using data from the Robert-Koch-Institute and the Federal Bureau of Statistics (Germany), with tertiles defining areas with low (<152), intermediate (152-297), and high COVID-19 case volume (>297). 13 Hospitals were categorized with respect to the number of yearly HF
Patients' data were stored in a pseudonymized form and data use was approved by the local ethics committee (AZ490/20-ek) and the Helios Kliniken GmbH data protection authority. Considering the retrospective analysis of double-pseudonymized administrative clinical routine data, individual informed consent was not obtained.
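The grouping of federal states by COVID-19 case volume described above can be sketched as follows. This is a minimal Python illustration and not the code used in the study (the analyses were run in R). The function names `cases_per_100k` and `covid_case_volume` are hypothetical, the boundary values 152 and 297 are assigned to the intermediate group in line with the ranges stated above, and the example numbers are invented.

```python
def cases_per_100k(confirmed_cases: int, inhabitants: int) -> float:
    """Laboratory-proven infections per 100,000 inhabitants of a federal state."""
    return confirmed_cases * 100_000 / inhabitants


def covid_case_volume(rate_per_100k: float) -> str:
    """Assign the tertile label used in the text: low (<152),
    intermediate (152-297), high (>297)."""
    if rate_per_100k < 152:
        return "low"
    if rate_per_100k <= 297:
        return "intermediate"
    return "high"


# Invented example: 4,500 confirmed cases in a state with 2.5 million inhabitants.
rate = cases_per_100k(4_500, 2_500_000)
print(rate, covid_case_volume(rate))
```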

| Model development/testing
All analyses were performed within the R environment for statistical computing (version 3.6.1, 64-bit build). 14 Data from 2016 to 2018 (n = 59,125 cases from 69 German Helios hospitals, 69.8% aged ≥75 years, 51.9% female) were split into 75%/25% portions used for model development and testing, stratified for in-hospital mortality.
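A 75%/25% split that is stratified for the outcome keeps the in-hospital death rate the same in both parts. The following Python sketch illustrates the idea with the standard library only. It is not the authors' R code, and the function name `stratified_split`, the seed, and the toy data (1,000 cases with 6% deaths) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_indices, test_indices) so that each outcome class
    is split with the same train_fraction."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train_indices, test_indices = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        cut = round(len(indices) * train_fraction)
        train_indices.extend(indices[:cut])
        test_indices.extend(indices[cut:])
    return sorted(train_indices), sorted(test_indices)


# Toy outcome vector: 1 = in-hospital death, 0 = discharged alive.
labels = [1] * 60 + [0] * 940
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```

Splitting within each outcome class, rather than over the whole data set at once, is what prevents a rare endpoint such as in-hospital death from being over- or under-represented in the test portion by chance.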
Baseline characteristics were well balanced between the two parts of the data set, as can be seen in Table S3. Random forest, gradient boosting machine, single-layer neural network, and extreme gradient boosting were the ML-based models investigated and compared to logistic regression. Variable selection and scaling as well as model tuning were performed as described previously. 10 Final model adaptations, including a recalibration of approximated probabilities using a generalized additive model and a reclassification of thresholds based on receiver operating characteristic (ROC) curves and F1 statistics, were carried out according to our previous work. 10 Using the probabilities predicted within the test data and the optimal threshold, the predictive abilities of the algorithm were assessed by the ROC AUC, the precision-recall curve, the area under the precision-recall curve (AUPRC), calibration-in-the-large (overall expected and observed mortality rate), weak calibration (intercept and slope of the calibration curve), calibration plots, F1 statistic and confusion matrices. 15 (Table S4).
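Three of the performance measures listed above can be illustrated with a short, dependency-free Python sketch: the ROC AUC (computed here through its equivalence to the Mann-Whitney rank statistic), the F1 statistic at a given threshold, and calibration-in-the-large. This is an illustration under assumptions and not the authors' implementation: all function names are hypothetical, choosing the threshold that maximizes F1 is only one possible reading of the threshold step, the toy data are invented, and AUPRC and the calibration intercept and slope are omitted.

```python
def roc_auc(y_true, y_prob):
    """ROC AUC via the Mann-Whitney U statistic, using midranks for ties."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    ranks = [0.0] * len(y_prob)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def f1_at_threshold(y_true, y_prob, threshold):
    """F1 statistic when cases with probability >= threshold are predicted to die."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_f1_threshold(y_true, y_prob):
    """Candidate thresholds are the observed probabilities; pick the F1-optimal one."""
    return max(sorted(set(y_prob)), key=lambda t: f1_at_threshold(y_true, y_prob, t))


def calibration_in_the_large(y_true, y_prob):
    """Return (observed mortality rate, mean predicted mortality rate)."""
    return sum(y_true) / len(y_true), sum(y_prob) / len(y_prob)


# Invented toy data: 1 = in-hospital death, with predicted death probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

auc = roc_auc(y_true, y_prob)
threshold = best_f1_threshold(y_true, y_prob)
observed_rate, expected_rate = calibration_in_the_large(y_true, y_prob)
print(auc, threshold, f1_at_threshold(y_true, y_prob, threshold))
print(observed_rate, expected_rate)
```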

| Calculation of expected mortality rates
The model was used to calculate the expected number of deaths in 2019 and 2020 (admissions limited to August 31st) as the sum of individual in-hospital mortality probabilities. The hospital standardized mortality ratio (HSMR) was computed as the ratio between observed and predicted deaths. Its 95% CI was calculated using Byar's approximation. HSMRs within years were compared using the Spearman rank correlation.
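The HSMR calculation can be sketched in a few lines of Python. The expected number of deaths is the sum of the individual predicted probabilities, and the confidence limits use the commonly cited form of Byar's approximation to the exact Poisson interval for the observed count. The function name `hsmr_with_byar_ci`, the scaling to 100, the default z value of 1.96, and the example data are assumptions for illustration, not the authors' R code.

```python
import math


def hsmr_with_byar_ci(observed_deaths, predicted_probabilities, z=1.96):
    """Return (HSMR, lower, upper), scaled so that 100 means observed == expected.

    Byar's approximation for an observed count O:
      lower = O       * (1 - 1/(9*O)       - z / (3*sqrt(O)))**3
      upper = (O + 1) * (1 - 1/(9*(O + 1)) + z / (3*sqrt(O + 1)))**3
    Both limits are then divided by the expected count.
    """
    expected = sum(predicted_probabilities)
    o = observed_deaths
    if o > 0:
        lower_count = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    else:
        lower_count = 0.0  # the lower limit is zero when no deaths were observed
    upper_count = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    scale = 100 / expected
    return o * scale, lower_count * scale, upper_count * scale


# Invented example: 2,000 cases, each with a predicted death probability of 5%,
# and 100 observed in-hospital deaths.
probabilities = [0.05] * 2000
hsmr, ci_lower, ci_upper = hsmr_with_byar_ci(100, probabilities)
print(round(hsmr, 1), round(ci_lower, 1), round(ci_upper, 1))
```

An HSMR whose confidence interval includes 100 indicates that observed deaths do not differ significantly from the number predicted by the model.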

| RESULTS
In this retrospective cross-sectional analysis, 26,591 patient cases from 2019 to 2020 were analyzed and in-hospital mortality was predicted using the extreme gradient boosting machine model to compare differences between predicted and observed mortality rates. HSMRs within subgroups are presented in detail in Table 2 and illustrated in Figure 1.

| DISCUSSION
In this retrospective cross-sectional analysis, we applied our previously introduced ML algorithm (extreme gradient boosting) for the calculation of expected in-hospital mortality rates on a populational level in a nationwide, multicenter cohort of HF patients containing administrative data before and throughout the COVID-19 pandemic. Our model had high performance indices and was well calibrated. When comparing model-derived expected with observed mortality for 2019 and 2020, we found regional differences of HSMR values but overall high accordance between calculated and true in-hospital mortality rates. The relative increase of in-hospital mortality in HF patient cohorts that has been previously observed during the COVID-19 pandemic did not exceed the expected variation of death rates that were calculated by our model based on patients' baseline characteristics. 5 There were no significant differences in overall HSMRs of 2019 and 2020. Consequently, the higher in-hospital mortality rate in 2020 is likely attributable to the differing case mix with older patients suffering from a different composition of comorbidities. Increased in-hospital mortality rates in 2020 compared to previous years were also reported by other groups and were likewise related to differing baseline characteristics including a higher mean age and the presence of more comorbidities. 8,16 In contrast, other groups reported no or at least no significant differences in mortality rates in the same period. 1
Interestingly, observed mortality rates were even lower in areas with high COVID-19 case numbers. There is no obvious explanation for this observation. Since a similar trend was also apparent in 2019, a fixed regional effect caused by unknown structural differences is likely to influence the results. For example, population density has been shown to impact in-hospital outcomes in HF patients. 17 An uneven distribution of cases discharged as hospital transfers between different areas could also contribute to this finding, as those cases were excluded from our analysis. This was done to avoid a biased in-hospital death rate because no crosslinking of patient cases between hospitals was possible due to data structure and data privacy. Moreover, early discharge to prevent nosocomial infection and to keep capacities available, especially in areas highly affected by incident SARS-CoV-2 infections, could lead to the transfer of patients to the outpatient sector. Previous findings of a shortened length of stay during the pandemic point in this direction. 2,5 Other studies reported higher rates of out-of-hospital cardiac arrests with an increased case-fatality rate during the pandemic and overall excess mortality during the first half of 2020 in Germany compared to previous years. [18][19][20][21] Whether this is indicative of an actual shift of cardiovascular deaths from the hospital to the outpatient setting needs to be further studied.
The overall high concordance of expected and observed in-hospital mortality rates within different age groups indicates a high reliability of the investigated model. There were only two comparable prediction models focusing on administrative data only, which reported lower AUC values (0.72-0.78). 22,23 A direct comparison with our model is, however, hindered by a different set of included variables. Our data set does not contain information on ethnicity, insurance data, used medication, and other variables implemented in the mentioned models. Possible explanations for the better predictive performance of our model, besides the algorithm itself, might be a higher event rate in our cohort affecting the model quality during the developmental stage. Furthermore, the cohort size of our training data set was considerably larger, at least when compared to the study of Desai and colleagues. 22 Both studies also examined whether the addition of data from electronic medical records would lead to an improvement of the predictive power and presented ambiguous results. Whereas one report found similar model discrimination when only using claims data, the other one showed better performance metrics when augmenting the administrative data set with laboratory results and imaging data. 22,23 In contrast, a previous study by Lagu et al. reported even better performance of administrative data-based prediction models when compared to clinical prediction tools for HF patients. 24 Other algorithms designed to forecast short- or long-term mortality in acute as well as chronic HF showed a similar or lower discriminatory power even when including more sophisticated and disease-specific variables. 25
pandemic. This includes both the standardized comparison of HF-related mortality in quality management programs with regard to temporal and regional differences of medical treatments as well as an interesting solution for hospital benchmarking in general.

| Limitations
Data used for model development were collected retrospectively, with known limitations compared to a prospective data assessment. However, it has been stated that data collection mode per se did not influence the discriminatory power of the derived prediction model. 31 Differences with respect to baseline variables between the patient cohort used for model development and the cohorts the model was applied to may influence the predictive accuracy. However, absolute differences of variable prevalence were acceptable and are unlikely to impact model-derived predictions relevantly. As administrative data are stored not for research purposes but for remuneration reasons, the encoded information may be affected by this purpose. The quality of the results depends to a large extent on the correct encoding of hospital discharge diagnoses. 12 However, regarding the main discharge diagnosis and the adequacy of hospitalization as well as encoding, there is a continuous evaluation by reimbursement companies/health insurances, which supports the assumption of overall valid information. NYHA class assignment is influenced by the subjective assessment of the treating physician, but a potential bias would influence all investigated groups and is likely to be attenuated by the large cohort size. Supporting this information with more objective variables would be desirable, but neither data regarding patients' specific medical history, cardiac imaging, laboratory results, medication nor treatment-related data were available due to the type and structure of the analyzed database.

FIGURE 1 Hospital standardized mortality ratios within several subgroups in 2019 and 2020

| CONCLUSION
Using an ML algorithm processing widely available administrative data, we developed a reliable model to calculate expected in-hospital mortality rates on a population level in a cohort of inpatients urgently admitted for HF. Applying the model on HF patients' data during the COVID-19 pandemic, no significant increase in observed mortality above predicted event rates was found with respect to pandemic phases.

CONFLICT OF INTERESTS
The authors declare that there are no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.