A risk prediction model for heart failure hospitalization in type 2 diabetes mellitus

Abstract

Background: Antidiabetic therapies have shown disparate effects on hospitalization for heart failure (HHF) in clinical trials. This study developed a prediction model for HHF in type 2 diabetes mellitus (T2DM) using real-world data to identify patients at high risk for HHF.

Hypothesis: Type 2 diabetics at high risk for HHF can be identified using information generated during usual clinical care.

Methods: This electronic medical record (EMR)-based retrospective cohort study included patients with T2DM free of HF receiving healthcare through a single, large integrated healthcare system. The primary endpoint was HHF, defined as a hospital admission with HF as the primary diagnosis. Cox regression identified the strongest predictors of HHF from 80 candidate predictors derived from EMRs. High-risk patients were defined according to the 90th percentile of estimated risk.

Results: Among 54,452 T2DM patients followed for an average of 6.6 years, estimated HHF rates at 1, 3, and 5 years were 0.3%, 1.1%, and 2.0%. The final 9-variable model included age, coronary artery disease, blood urea nitrogen, atrial fibrillation, hemoglobin A1c, blood albumin, systolic blood pressure, chronic kidney disease, and smoking history (c = 0.782). High-risk patients identified by the model had a >5% probability of HHF within 5 years.

Conclusions: The proposed model for HHF among patients with T2DM demonstrated strong predictive capacity and may help guide therapeutic decisions.


| INTRODUCTION
Type 2 diabetes mellitus (T2DM) affects nearly 10% of the United States adult population, and the morbidity and mortality associated with T2DM are often attributable to cardiovascular (CV) disorders. [1][2][3] T2DM is a strong predictor of new-onset heart failure (HF) and HF-related sequelae including hospitalization and death. 4,5 Beyond the well-established association between T2DM and common HF antecedents such as atherosclerosis and myocardial infarction (MI), the metabolic aberration defining T2DM (elevated glucose) has been shown to have a direct and unique untoward effect on myocardial structure and function, with the term diabetic cardiomyopathy coined to describe the induced phenotype. [5][6][7] Furthermore, in experimental settings, tight glucose control has been shown to improve both [...]
[...] ICD9/10 codes at one or more inpatient encounters; (3) when an oral antidiabetic drug (except metformin) was ordered or listed on a medication reconciliation; or (4) when metformin was ordered or listed on a medication reconciliation in the absence of a diagnostic code for prediabetes or polycystic ovary syndrome. Among patients meeting diagnostic criteria, an index date was defined as the date of the first office visit where T2DM diagnostic criteria were met at least two years following the first EMR-documented encounter. Patients meeting the diagnostic criteria within two years of the first EMR-documented encounter were considered to have pre-existing T2DM at the index date, while those first meeting diagnostic criteria more than two years after the first EMR-documented encounter were considered new diagnoses. Type 2 diabetics with documentation of HF prior to the index date were excluded.
Follow-up for the study outcome (HF hospitalization) continued through December 31, 2016. The study institution's IRB granted a waiver of patient consent due to the retrospective nature of the study.
A collection of 80 candidate predictors drawn from multiple EMR domains was considered for inclusion in the HHF prediction model (Table 1, excluding medications). Candidate predictors were determined through EMR documentation on or prior to the index date. Historical diagnoses, procedures performed, and CV-related symptoms were determined through ICD9/10 and Current Procedural Terminology (CPT) codes documented at outpatient or inpatient encounters within the specified time window. For vital signs and laboratory tests, the value recorded in closest proximity to the index date was included. Laboratory tests considered in this study were hemoglobin A1c, basic metabolic panels, complete blood counts, liver function tests, and lipid profiles.
The study outcome was a new HHF, defined as an EMR-documented hospital admission after the index date with HF as the primary diagnosis and no prior documented HF diagnosis (a prior HF diagnosis being an exclusion criterion). A time-to-HHF variable was defined as the number of days from the index date until the first HHF or the last EMR-documented encounter, with the latter defining censored observations. Cumulative incidence rates for HHF were estimated by the Kaplan-Meier (KM) method for all study patients and repeated after stratifying by age and history of an HF diagnosis at the index date.
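The time-to-HHF definition and the KM estimate described above can be sketched as follows. This is a minimal illustration in plain Python, not the study's analysis code; the function name and the toy cohort are hypothetical.

```python
from collections import Counter


def kaplan_meier_cumulative_incidence(times, events, horizons):
    """Return {horizon: 1 - S(horizon)} from the Kaplan-Meier product-limit estimator.

    times    -- days from the index date to the first HHF or last encounter
    events   -- 1 if follow-up ended with an HHF, 0 if censored
    horizons -- time points (days) at which to report cumulative incidence
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []  # (time, survival probability just after that time)
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
        curve.append((t, survival))
        at_risk -= exit_counts[t]

    result = {}
    for h in horizons:
        s = 1.0
        for t, surv in curve:
            if t > h:
                break
            s = surv
        result[h] = 1.0 - s
    return result


# Hypothetical toy cohort (not study data): follow-up in days and event flags.
toy_times = [100, 200, 200, 365, 400, 500]
toy_events = [1, 0, 1, 0, 1, 0]
toy_incidence = kaplan_meier_cumulative_incidence(toy_times, toy_events, [365])
```

In practice a dedicated survival-analysis package would normally be used; the loop is written out only to make the handling of censored observations explicit.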
As a first step in the prediction model development process, T2DM patients meeting study inclusion criteria were randomly divided 1:1 into two independent data sets. Significant predictors of HHF were determined independently within each set using a similar variable selection process with the final proposed model combining the results of both sub-models. In each set, a forward stepwise variable selection procedure was employed to identify the strongest independent predictors of time-to-HHF using Cox proportional hazards regression. A stringent P-value threshold for variable inclusion/exclusion of .0001 was applied due to the large cohort and number of events. All continuous candidate predictors were categorized a priori into clinically relevant groups to facilitate model interpretation and development of an integer-based risk score for HHF as described below. As missing data for vital signs and laboratory tests tend to be not missing at random (missing implies healthier), 17 a conservative approach to missing data imputation was employed which involved taking a single random draw from the empirical distribution of nonmissing values for each continuous predictor. Drug therapies for CV disease or diabetes were not considered as candidate predictors as their effects may reflect confounding by indication.
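The 1:1 random split and the single-random-draw imputation described above might look like the following sketch. The helper names, the seed, and the example albumin values are assumptions made for illustration; the source does not describe its implementation.

```python
import random


def impute_random_draw(values, rng):
    """Replace each missing value (None) with one random draw from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    return [v if v is not None else rng.choice(observed) for v in values]


def split_half(patient_ids, rng):
    """Randomly divide patients 1:1 into two non-overlapping sets."""
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


rng = random.Random(2016)  # arbitrary seed, chosen only for reproducibility
toy_albumin = [4.1, None, 3.8, 4.4, None, 3.5]  # hypothetical values
imputed = impute_random_draw(toy_albumin, rng)
set_a, set_b = split_half(range(10), rng)
```

Because each missing value is drawn from the empirical distribution of observed values, the imputed variable keeps roughly the same marginal distribution as the observed one.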
Within each data subset, the variable selection process produced models with independent predictors each significant at the P-value threshold of .0001. For each model, the regression coefficients and 95% confidence intervals are reported, and the predictive strength of independent predictors was ranked according to the magnitude of the Wald chi-square statistic from the multivariable Cox regression model. The discriminatory capacity of each model was quantified by the c-statistic as appropriate for censored data. 18 A final model was fit after restricting predictors to those meeting significance criteria in both models. An integer-based risk score was created based on the regression coefficients from the final model.
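To make the c-statistic for censored data concrete, the sketch below computes Harrell's concordance index, one common estimator. The source does not state which estimator was used, so treating it as Harrell's is an assumption, and the toy inputs are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified).

    A pair is usable when the patient with the shorter follow-up had an event.
    It is concordant when that patient also has the higher predicted risk;
    ties in predicted risk count as half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: follow-up times, event flags, and predicted risk scores.
toy_c = harrell_c([2, 4, 6, 8], [1, 0, 1, 0], [9, 3, 5, 5])
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of patients by time to event.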
Integer scores were created for each variable in the final model by dividing each variable's regression coefficient by 0.243 (the coefficient for the weakest predictor in the final model) and then rounding the quotient to the nearest integer. An integer risk score for HHF was created for each study patient by summing risk score components.
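The scoring rule above amounts to a short calculation. In the sketch below, only the divisor 0.243 comes from the text; the predictor names and coefficients are hypothetical, and rounding halves away from zero is an assumption, since the source says only "nearest integer".

```python
import math

REFERENCE_COEFFICIENT = 0.243  # coefficient of the weakest predictor, from the text


def integer_points(coefficient, reference=REFERENCE_COEFFICIENT):
    """Divide a coefficient by the reference and round to the nearest integer.

    Halves are rounded away from zero (an assumed convention).
    """
    quotient = coefficient / reference
    sign = 1 if quotient >= 0 else -1
    return sign * int(math.floor(abs(quotient) + 0.5))


# Hypothetical coefficients for three predictor categories (not from Table 2).
toy_coefficients = {
    "predictor_a": 0.243,
    "predictor_b": 0.61,
    "predictor_c": 1.10,
}
toy_points = {name: integer_points(beta) for name, beta in toy_coefficients.items()}

# A patient's score is the sum of the points for the categories they fall into.
toy_patient_score = sum(toy_points.values())
```

By construction the weakest predictor is worth exactly one point, so every other component expresses its effect as an approximate multiple of that unit on the log-hazard scale.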
TABLE 1 Characteristics of type 2 diabetes mellitus patients at index date by occurrence of heart failure hospitalization during follow-up

Type 2 diabetics with a prior HF diagnosis had a greater cumulative incidence of HHF than those without (Figure 1). Among type 2 diabetics without a prior HF diagnosis, a postindex date HHF occurred in 1884 (3.5%) study patients over 360,258 cumulative years of follow-up (5.2 HHF per 1000 person-years). Mean (SD) follow-up among event-free patients was 6.6 (4.3) years, and maximum follow-up was 13.7 years. KM-based cumulative estimated event rates at 6 months, 1, 3, and 5 years after the index date were 0.1%, 0.3%, 1.1%, and 2.0%, respectively (Figure 1). Event rates varied greatly across age groups (Figure S1).
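The crude figures reported above can be checked with simple arithmetic. The sketch below uses only numbers stated in the article (1884 events, 360,258 person-years, and the cohort size of 54,452 from the abstract); the function name is illustrative.

```python
def rate_per_1000_person_years(events, person_years):
    """Crude incidence rate expressed per 1000 person-years of follow-up."""
    return 1000.0 * events / person_years


hhf_events = 1884        # HHF events, from the text
person_years = 360_258   # cumulative follow-up in years, from the text
cohort_size = 54_452     # patients, from the abstract

rate = rate_per_1000_person_years(hhf_events, person_years)
crude_proportion = hhf_events / cohort_size
```

Here `rate` rounds to 5.2 per 1000 person-years and `crude_proportion` to 3.5%, matching the reported values. The crude proportion counts events over the entire follow-up period, so it is not directly comparable to the KM estimate at a fixed horizon such as 5 years.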
The study cohort was randomly split into two equal subsets. [...] (Table S1). In the second set, 15 predictors were independently associated with HHF with a c-statistic of 0.804 (Table S1). Nine predictors were significant in both models, and after recombination of the data subsets the final prediction model with predictors in rank order of predictive strength included: (1) age, (2) coronary artery disease, (3) blood urea nitrogen, (4) atrial fibrillation, (5) hemoglobin A1c, (6) blood albumin, (7) systolic blood pressure, (8) chronic kidney disease, and (9) smoking history (Table 2). The final model had a c-statistic of 0.782.
Assignment of risk score points is shown in Table 2. The median (IQR) risk score for the final model was 9 (6, 12), and the maximum observed score was 29 (out of a possible maximum of 32) (Table 2).
The number of risk points defining low, mid-low, mid-high, and high-risk groups was ≤8, 9-11, 12-14, and ≥15, respectively (Supplemental Figure S2). The percent of patients at or above the respective risk scores is shown in Table 3. The observed 5-year risk of HHF was above 5% for all risk scores 15 and above (high risk), and the estimated 1-year risk was >0.5% (Table 3). Estimated event rates were widely divergent across risk strata (Figure 2).
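The four risk strata defined above map directly onto score cutoffs. The following sketch encodes them; the cutoffs and the maximum possible score of 32 come from the text, while the function name is hypothetical.

```python
MAX_POSSIBLE_SCORE = 32  # maximum possible score, from the text


def risk_group(score):
    """Map an integer HHF risk score to the risk stratum named in the text."""
    if not 0 <= score <= MAX_POSSIBLE_SCORE:
        raise ValueError("score outside the possible range")
    if score <= 8:
        return "low"
    if score <= 11:
        return "mid-low"
    if score <= 14:
        return "mid-high"
    return "high"  # scores of 15 and above


example_groups = [risk_group(s) for s in (6, 9, 12, 15)]
```

A patient with the median score of 9 falls in the mid-low group, while the high-risk group corresponds to the scores at which the observed 5-year HHF risk exceeded 5%.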

| DISCUSSION
[...] with the sodium glucose cotransporter-2 inhibitor class of diabetic drugs. 10,16 Notably, these beneficial effects were observed among patients both with and without established HF at randomization. 10,16 These findings contrast with previous trials showing adverse or neutral effects of other diabetic drug classes on HF outcomes. 9 Hemoglobin A1c has been found to be associated with HF in previous work. 28,29 The practical intent of risk prediction models is to identify high-risk patients such that cost-efficient provision of advanced management strategies (eg, a novel, efficacious, yet expensive pharmaceutical) can be directed toward those patients most likely to experience untoward events, thus minimizing the number needed to treat for benefit. 30 Ultimately, a prediction model's quantitative output implicitly proposes an action (or not) by separating "high (enough) risk" (take action) from "not high (enough) risk" (do not take action) patients. The current study is the first to our knowledge to propose such an objective "high risk" threshold for HHF among type 2 diabetics: a >5% probability of HHF within the next five years, corresponding to the 10% of patients at highest model-estimated risk. Any take-action threshold is preferably based on absolute risk estimates, which underscores the importance of evaluating the calibration of the proposed model in new settings. Because judgments of what constitutes "high risk" will vary, observed event rates for various risk scores (and percentiles) are provided in Table 3. Importantly, the proposed prediction model was developed using data generated from an EMR system, which facilitates transporting the model to other healthcare systems with a comparable data source and structure.
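The number-needed-to-treat argument above can be illustrated with a small calculation. The 30% relative risk reduction used below is a purely hypothetical assumption, not a result from this study or any cited trial. The 5% and 2.0% five-year risks are the high-risk threshold and the overall KM estimate reported in the article.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction, where ARR = baseline risk x RRR."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    if absolute_risk_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1.0 / absolute_risk_reduction


ASSUMED_RRR = 0.30  # hypothetical treatment effect, for illustration only

nnt_high_risk = number_needed_to_treat(0.05, ASSUMED_RRR)  # 5-year risk at the threshold
nnt_overall = number_needed_to_treat(0.02, ASSUMED_RRR)    # overall 5-year KM estimate
```

Under this assumed effect, roughly 67 patients at the high-risk threshold would need treatment over five years to prevent one HHF, compared with about 167 at the cohort's overall risk, which is why a take-action threshold based on absolute risk lowers the number needed to treat.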
Some limitations of this study should be noted. The use of EMR data precludes consideration of certain candidate predictors either not generated during usual clinical care or not routinely available in structured form in our EMR (eg, duration of diabetes). Though the applied operational definitions for study variables were designed to minimize misclassification, measurement error, and missing data, these shortcomings are impossible to resolve completely, and the net effect is likely attenuation of hazard ratios. External validation of the proposed model should precede its application to ensure the model's discrimination and calibration are satisfactory in other settings.
External validation is also vital due to the largely white patient population and limited geographical reach of the study institution.
In conclusion, the proposed 9-predictor model for estimating HHF risk in T2DM showed strong predictive capacity. The proposed high-risk threshold may serve as an action point for selection of antidiabetic therapeutics, a salient issue considering the opposing effects of different antidiabetic drug classes on HF outcomes.