Usefulness of subclassification of adult diabetes mellitus among inpatients in Japan

Abstract
Aims/Introduction: We aimed to replicate a new diabetes subclassification based on objective clinical information at admission in a diabetes educational inpatient program. We also assessed the educational outcomes for each cluster.
Methods: We included diabetes patients who participated in the educational inpatient program during 2009–2020 and had sufficient clinical information for the cluster analysis. We applied a data-driven clustering method proposed in a previous study and further evaluated the clinical characteristics of each cluster. We investigated the association between the clusters and changes in hemoglobin A1c level from the start of the education program. We also assessed the risk of re-admission for the educational program.
Results: We divided a total of 651 patients into five clusters. Their clinical characteristics followed the same pattern as in previous studies. The intercluster ranking of the cluster center coordinates showed strong correlation coefficients with those of the previous studies (mean ρ = 0.88). Patients classified as severe insulin-resistant diabetes (cluster 3) showed a more pronounced progression of renal dysfunction than patients classified as the other clusters. The patients classified as severe insulin-deficient diabetes (cluster 2) had the highest rate of reduction in hemoglobin A1c level from the start of the program (P < 0.01) and a tendency toward a lower risk of re-admission for the education program (hazard ratio 0.47, P = 0.09).
Conclusion: We successfully replicated the diabetes subclassification using objective clinical information at admission for the education program. In addition, we showed that severe insulin-deficient diabetes patients tended to have better educational outcomes than patients classified as the other clusters.


INTRODUCTION
Diabetes, one of the most challenging global health problems, is a leading cause of microvascular and macrovascular diseases [1]. Diabetes is classified into four major categories according to the conventional classification [2], of which type 2 diabetes accounts for the majority, nearly 90% [3]. However, as diabetes is a highly heterogeneous disease involving genetic and environmental factors [4], there is a wide variation in treatment response and complication progression among patients with type 2 diabetes [5]. Therefore, a new and more sophisticated classification system for diabetes is required to achieve precision medicine for diabetes.
Recently, Ahlqvist et al. [6] developed a novel subclassification method to classify diabetes patients into five clusters. They used a data-driven clustering method with six variables (i.e., age at diagnosis, body mass index [BMI], hemoglobin A1c [HbA1c], homeostatic model assessment-2 β-cell function [HOMA2-B], homeostatic model assessment-2 insulin resistance [HOMA2-IR] and glutamic acid decarboxylase antibodies [GADA]). In the subclassification, patients with positive GADA were assigned to severe autoimmune diabetes (SAID), whereas the other GADA-negative patients were classified into four categories: (i) severe insulin-deficient diabetes (SIDD); (ii) severe insulin-resistant diabetes (SIRD); (iii) mild obesity-related diabetes (MOD); and (iv) mild age-related diabetes (MARD). Furthermore, this subclassification was replicated in several studies with large prospective cohorts [7][8][9][10]. However, there are several limitations to this clustering method. First, the applicability of clustering methods in the absence of data at the onset of diabetes has not yet been adequately explored. In particular, in the real world, it is often the case that patients do not undergo a comprehensive examination for the assessment of diabetes at their first visit. Second, previous studies have mainly applied subclassification to outpatients, with no previous studies focusing on inpatients. Finally, the relevance of this subclassification to clinical outcomes, especially educational outcomes, which are key to diabetes care, is still unclear [7,8].
Diabetes specialty facilities in Japan offer an educational inpatient program for patients who require intensive intervention to enhance their self-reliance in diabetes care. The program aims to improve glycemic control and long-term prognosis based on a thorough understanding of the individual's level of self-reliance in diabetes care, the progression of complications and pathophysiological characteristics. However, key factors that predict the effectiveness of educational programs have not yet been determined [11].
The present study aimed to evaluate the usefulness of the clustering method using objective clinical information obtained through the educational program. We also evaluated the relationship between the clusters and educational outcomes.

Design and setting
We carried out a single-center, observational study of patients who visited the Center for Diabetes, Endocrinology and Metabolism at Shizuoka Prefectural Shizuoka General Hospital and participated in the educational admission program. The attending physician specializing in diabetes decides whether a patient who requires an educational intervention should participate in the program (e.g., when treatment goals are not met at regular visits, when complications arise, or when life or care transitions occur). The program requires hospitalization for a total of 2 weeks. The participants receive guidance from specialist healthcare professionals (doctors, nurses, dieticians, occupational therapists and dental hygienists) on diet, exercise, insulin injections and self-monitoring of blood glucose levels. The progression of their diabetes-related complications is also assessed and their treatments are reviewed.

Study participants
The study included 1,180 consecutive patients who underwent the educational admission program at Shizuoka General Hospital between January 2009 and December 2020. Of these, we included 714 patients with type 1 or type 2 diabetes defined by discharge diagnosis codes (International Classification of Diseases, 10th revision: E10 and E11) with complete data available at baseline for the six model variables of the cluster analysis (HbA1c, age, BMI, HOMA2-B, HOMA2-IR and GADA). Other types of diabetes, such as gestational diabetes, maturity-onset diabetes of the young, pancreatic diabetes and steroid-induced hyperglycemia, were excluded [8]. In addition, to avoid misclassification of patients with steroid-induced hyperglycemia and pancreatic diabetes, we further excluded 43 patients with a history of oral or injectable steroid administration within 3 months before the admission and/or with a pancreatic cancer diagnosis code (C25). Furthermore, 20 patients were excluded from the current analysis because the value of any model variable deviated from the mean by more than five standard deviations (calculated separately for men and women), leaving 651 patients for analysis.
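The sex-specific outlier rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the record layout (dictionaries with `id`, `sex` and one key per model variable) and the function name `exclude_outliers` are hypothetical, and the sample standard deviation is assumed because the text does not say which estimator was used.

```python
from statistics import mean, stdev

def exclude_outliers(patients, variables, sd_limit=5.0):
    """Keep patients whose values all lie within sd_limit standard
    deviations of the mean of their own sex (hypothetical record layout)."""
    kept = []
    for sex in sorted({p["sex"] for p in patients}):
        group = [p for p in patients if p["sex"] == sex]
        # Sex-specific mean and sample SD for every model variable.
        stats = {
            v: (mean([p[v] for p in group]), stdev([p[v] for p in group]))
            for v in variables
        }
        for p in group:
            within_limit = all(
                abs(p[v] - stats[v][0]) <= sd_limit * stats[v][1]
                for v in variables
            )
            if within_limit:
                kept.append(p)
    return kept
```

With the counts reported above, the flow is 714 - 43 - 20 = 651 patients. Note that the mean and standard deviation in this sketch include the outlier itself, so a single extreme value can only exceed five standard deviations in a reasonably large group.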

Clinical information
We collected the baseline characteristics from electronic medical records at initial participation in the program. If data were missing during the hospitalization, we collected them over a period extending from 1 month before the admission to 1 month after the discharge. Fasting blood glucose (FBG) and C-peptide immunoreactivity (CPR) were limited to those measured in the morning after overnight fasting during hospitalization. We collected primary clinical information and biochemical data, including variables for clustering, as follows: age at the admission, sex, BMI, biochemical information, drug information, HbA1c, FBG, CPR, HOMA2-B, HOMA2-IR, aspartate aminotransferase, alanine aminotransferase, γ-glutamyl transpeptidase, estimated glomerular filtration rate (eGFR), creatinine, high-density lipoprotein-cholesterol, low-density lipoprotein-cholesterol, triglycerides, 24-h urine microalbuminuria and GADA. When 24-h urine microalbuminuria data were unavailable, we estimated daily urinary albumin excretion from the albumin/creatinine ratio. GADA was defined as positive if a value exceeding the cut-off value (see below) was recorded at any point in the patient's history of visits to our hospital. In addition, we collected the HbA1c levels measured regularly during outpatient visits. The electronic medical record systems used at Shizuoka General Hospital (Shizuoka, Japan) are widely available in Japan (NEWTON until December 2015, Software Service, Osaka; and HOPE/EGMAIN-GX since January 2016, Fujitsu, Tokyo).

Measurements
CPR was measured by electrochemiluminescence immunoassay (Roche Diagnostics, Mannheim, Germany). GADA was measured using a commercially available radioimmunoassay kit (Cosmic Ltd, Tokyo, Japan) until 17 December 2015 and using a commercially available enzyme-linked immunosorbent assay kit (RSR Ltd., Cardiff, UK) from 18 December 2015. The cut-off values for GADA measured using the radioimmunoassay kit and the enzyme-linked immunosorbent assay kit were 0.5 and 5.0 U/mL, respectively [12]. HOMA2-B and HOMA2-IR were estimated from FBG and CPR using the Homeostasis Model Assessment calculator (University of Oxford, Oxford, UK) [13]. HbA1c level was measured using an enzymatic assay kit (Arkray Ltd, Tokyo, Japan). The eGFR (mL/min/1.73 m²) was calculated as 194 × serum creatinine^(-1.094) × age^(-0.287) × 0.739 (if female) [14].

k-means clustering
Following the previous study [6], we assigned GADA-positive patients to the SAID cluster, whereas the GADA-negative patients were subjected to k-means clustering using the remaining five variables (i.e., age at admission, BMI, HbA1c, HOMA2-B and HOMA2-IR). The k-means clustering was carried out separately for men and women using values of the five variables standardized to a mean of 0 and a standard deviation of 1 [6]. The kmeans function of the stats package (k = 4, nstart = 25, iter.max = 100) of R version 3.6.3 (The R Foundation for Statistical Computing, Vienna, Austria) was used [15]. We assigned four names (SIDD, SIRD, MOD and MARD) to the four clusters based on their clinical characteristics. To verify the consistency of the clusters with the same names between the present study and previous large cohort studies [6,7], we ranked the clusters by their center coordinates and compared the rankings using Spearman's correlation coefficient.
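The assignment procedure (GADA-positive patients to SAID, then sex-specific k-means on standardized variables) can be illustrated with a minimal sketch. Several points are assumptions rather than the study's implementation: the analysis itself was run in R, whereas this sketch is plain Python; it uses Lloyd's algorithm with random restarts, which is not necessarily the algorithm R applied; the sample standard deviation is assumed for standardization; and the record keys (`id`, `sex`, `gada_positive`) and function names are hypothetical. Naming the numeric clusters (SIDD, SIRD, MOD, MARD) from their clinical characteristics is a manual step and is not shown.

```python
import random
from statistics import mean, stdev

def standardize(rows):
    """Scale each column to mean 0 and (sample) standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_once(points, k, iter_max, rng):
    """One run of Lloyd's algorithm from k randomly chosen starting points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iter_max):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [mean(c) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return wss, labels, centers

def kmeans(points, k=4, nstart=25, iter_max=100, seed=0):
    """Return (wss, labels, centers) of the restart with the smallest
    within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = [lloyd_once(points, k, iter_max, rng) for _ in range(nstart)]
    return min(runs, key=lambda run: run[0])

def assign_clusters(patients, variables, k=4):
    """Map patient id -> 'SAID' or (sex, cluster index)."""
    result = {p["id"]: "SAID" for p in patients if p["gada_positive"]}
    for sex in ("M", "F"):
        group = [
            p for p in patients if p["sex"] == sex and not p["gada_positive"]
        ]
        if len(group) < k:
            continue  # too few patients to form k clusters
        scaled = standardize([[p[v] for v in variables] for p in group])
        _, labels, _ = kmeans(scaled, k=k)
        for p, label in zip(group, labels):
            result[p["id"]] = (sex, label)
    return result
```

Clustering each sex separately means that a cluster index for men is unrelated to the same index for women; the two sets of clusters are only matched afterwards, when names are assigned from the clinical profile of each cluster center.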

Evaluation of educational outcomes
We evaluated alterations of the HbA1c levels after the admission in each cluster using a generalized additive model (n = 651) to assess the outcome of the educational program.
For those with follow-up data available for >180 days (n = 475), we evaluated the association between HbA1c reduction and clusters using multivariable linear regression models, adjusting for age, BMI, eGFR, hemoglobin, HOMA2-IR and HOMA2-B. We defined the HbA1c reduction as the difference between the baseline HbA1c level and the HbA1c level at the first outpatient visit >180 days after the educational hospitalization.
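The definition of the HbA1c reduction can be made concrete with a short sketch. The function name, the `(date, value)` layout of the outpatient measurements, the use of the discharge date as the reference point and the sign convention (a positive value means HbA1c fell) are all assumptions made for illustration; the text only states that the difference is taken between baseline and the first outpatient visit >180 days after the educational hospitalization.

```python
from datetime import date, timedelta

def hba1c_reduction(baseline_hba1c, discharge_date, visits, min_days=180):
    """Baseline HbA1c minus HbA1c at the first outpatient visit more than
    min_days after discharge; None if there is no such visit.

    visits: iterable of (visit_date, hba1c_percent) pairs (hypothetical layout).
    """
    cutoff = discharge_date + timedelta(days=min_days)
    later = sorted((d, value) for d, value in visits if d > cutoff)
    if not later:
        return None  # no follow-up beyond the cutoff, so excluded from this analysis
    return baseline_hba1c - later[0][1]
```

Patients for whom the function returns `None` correspond to those without follow-up data beyond 180 days, who were not part of the n = 475 regression sample.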
We used the Cox regression model to evaluate the risk of re-admission for the same educational program among clusters, adjusting for age, sex, HbA1c and eGFR. We excluded SAID patients, most of whom had type 1 diabetes, from the comparative analysis. We defined the follow-up period as the period from the date of initial admission to the date of the last regular visit. Interruption of regular visits for more than a year was regarded as the termination of visiting the hospital.
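One possible reading of the follow-up rule is sketched below: follow-up ends at the last regular visit before the first interruption longer than a year, and a patient without a re-admission in that window is censored. The function names, the choice to measure gaps between consecutive contacts starting at the admission date, and the decision to ignore re-admissions after the end of follow-up are assumptions; the resulting (days, event) pairs are the usual input to a time-to-event model such as the Cox regression, which is not shown.

```python
from datetime import date

def follow_up_end(admission, visits, max_gap_days=365):
    """Return the last regular visit before the first gap longer than
    max_gap_days (the admission date if there is no such visit)."""
    last = admission
    for visit in sorted(v for v in visits if v > admission):
        if (visit - last).days > max_gap_days:
            break  # regular visits were interrupted; follow-up ended at `last`
        last = visit
    return last

def time_to_readmission(admission, visits, readmission=None):
    """Return (days, event), with event = 1 for an observed re-admission
    and event = 0 for censoring at the end of follow-up."""
    end = follow_up_end(admission, visits)
    if readmission is not None and admission < readmission <= end:
        return (readmission - admission).days, 1
    return (end - admission).days, 0
```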

Statistical analysis
Intercluster comparisons of the baseline characteristics were carried out using the Mann-Whitney U-test for continuous variables and the χ² test for categorical variables. Two-tailed probability values of <0.05 were considered statistically significant. All statistical analyses were carried out using R version 3.6.3 (The R Foundation for Statistical Computing) [15].

Baseline characteristics of the participants
The present study included a total of 651 patients (Table 1). The proportion of women was 36%, and the median age at admission was 63 years (interquartile range 51-72 years). According to the conventional classification, 32 of 651 patients (4.9%) had type 1 diabetes, 546 of 651 (83.9%) had type 2 diabetes and 73 of 651 (11.2%) had latent autoimmune diabetes in adults (Table S1), reflecting the enrichment of challenging cases with poor glycemic control, especially GADA-positive cases, referred to the diabetes center.

Subclassification of participants using data-driven clustering
We carried out the data-driven clustering to classify the patients into the five clusters using their clinical information at admission (Figure 1). Then, we assigned the clustering labels to the corresponding clusters based on the clinical characteristics reported in the previous study [6]. The clustering results were very similar between men and women (Figure S1A and B). Overall, we found that the distribution of the participants into clusters was comparable to the distribution of each cluster in the previous studies [6-9] (Figure 1; Figure S2). Meanwhile, the proportion of SAID patients was slightly higher in the current study, in concordance with the enrichment of GADA-positive cases discussed above (Figure 1). Furthermore, we found that each cluster had patterns of clinical features quite similar to those in the previous study [6]. The SAID patients (cluster 1), characterized by positive GADA, showed slightly younger age and lower HOMA2-B. The SIDD patients (cluster 2), characterized by severely impaired insulin secretion, showed higher HbA1c and lower HOMA2-B. The SIRD patients (cluster 3), characterized by severe insulin resistance, showed higher BMI and higher HOMA2-IR, and a significantly higher rate of renal dysfunction (Table S2). The MOD patients (cluster 4), characterized by severe obesity with mild insulin resistance, showed younger age, higher BMI and higher HOMA2-IR. The MARD patients (cluster 5) were characterized by older age and mild glucose intolerance.
We further used the cluster center coordinates to show the consistency of the clustering results with the center coordinates reported previously [6,7] (Table S3). The Spearman's correlation coefficients of the cluster center coordinates between the present study and the previous studies [6,7] were almost all >0.8 (mean ρ = 0.88; Table S4). Also, the relative ranking relationships in the present study were shown to be quite similar to those of the previous studies (Figure S3). These results showed that the clustering method successfully classified the diabetes patients in the current study into five clusters with similar patterns of clinical characteristics, even though we mainly utilized clinical information at the time of educational admission instead of at the onset of diabetes.
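The comparison of cluster center rankings can be illustrated with a small Spearman's correlation sketch: the clusters are ranked by their center coordinate for one variable in each study, and the two rankings are correlated. The function names are hypothetical, and the center values in the usage note are invented placeholders, not figures from Table S3 or from the previous studies.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, with placeholder standardized HbA1c centers in the order SIDD, SIRD, MOD, MARD, `spearman([1.2, -0.5, -0.1, -0.6], [1.5, -0.4, -0.2, -0.7])` compares the ranking in one study with that in another; repeating this for each variable and each sex and averaging gives a summary of the kind reported above.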

Evaluation of the educational impact in each cluster
We then assessed the effectiveness of the education to find novel clinical implications of this subclassification. To evaluate the educational outcomes among the clusters, we assessed the longitudinal changes of the HbA1c level in each cluster (n = 651, overall median follow up: 2.0 years [interquartile range 0.4-4.2 years]). The results showed a common trend among all clusters: the HbA1c level began to decline immediately after the educational program and reached a plateau after approximately 3-6 months (Figure 2). These results show substantial and long-lasting benefits of the educational program across all of the clusters. Meanwhile, the SIDD patients (cluster 2), with the highest HbA1c levels at baseline of all the clusters, showed the most pronounced decline in the HbA1c level (β = -3.27; 95% confidence interval -4.05 to -2.49; P < 1.0 × 10^-10; Figure 2; Table S5). To further investigate the possible advantage of the SIDD patients, the Cox regression models were used to compare the risk of re-admission between the SIDD patients and patients in other clusters (SIRD, MOD and MARD) after adjusting for age, sex, HbA1c and eGFR. As a result, we observed a trend toward a lower risk of re-admission (the repetitive requirement for the educational program) in the SIDD patients (hazard ratio 0.32, 95% confidence interval 0.09-1.15, P = 0.08; Figure 3; Figure S4; Table S6). These results suggest that the educational effect for the SIDD patients tends to be higher than for patients in other clusters, and that this subclassification might help select patients who should receive intensive educational interventions.

DISCUSSION
We have retrospectively replicated the diabetes subclassification method proposed by Ahlqvist et al. [6]. The present study had three novel aspects. First, whereas previous studies mainly used data at onset, this study used data irrespective of onset, taking into account real-world clinical practice, where data at onset are often insufficient. The results of the present study are a significant step forward in introducing this subclassification into clinical practice, especially in Japan, where clinicians would like to apply it to hospitalized patients for whom sufficient information was not available at the first visit. Second, this is the first study to apply the subclassification to patients with poor glycemic control who require educational admission, which is particularly important in Japan, where many patients are admitted for diabetes education. Finally, although the clinical significance of the results is not yet clear, a notable novelty is that the study showed that SIDD patients were the most likely to benefit from the education program.
We showed that the clustering method is robust using the values of the cluster center coordinates, as well as their patterns of distribution and clinical characteristics. These results showed the adaptability of the subclassification across races, which is consistent with the results of the previous studies including Asian populations [6][7][8][9][10]. Furthermore, the SIRD patients had a significantly higher rate of renal dysfunction at baseline. Zaharia et al. [8] found that patients in the SIRD group had lower eGFR and higher cystatin-C levels both at baseline and after 5 years, even with better glycemic and lipid control. Given the strong association between insulin resistance and impaired renal function [16], it seems plausible that SIRD patients are prone to diabetic nephropathy.
Furthermore, in the present study, the absolute values of BMI, HOMA2-B and HOMA2-IR were lower than in European studies [6][7][8]. These results are consistent with previous reports involving Asian participants [9,10] and might reflect racial differences in the pathogenesis of diabetes.
There is also some debate regarding the clinical parameters used for subclassification. Some previous reports suggested the possible use of C-peptide or high-density lipoprotein cholesterol instead of HOMA2-B or HOMA2-IR [17], and another study concluded that simple clinical features were more helpful in predicting the risk of diabetic complications [18]. Hence, further research is required to determine what clinical parameters should be used for a more practical subclassification.

Figure 3 | Differences of re-admission risk for the educational program among the clusters. The Kaplan-Meier curve was used to show the probability of no second admission for the same program from the end of the first educational program. We compared cluster 2 with the other clusters (both mainly including type 2 diabetes). We adjusted for age at the first admission, sex, hemoglobin A1c and estimated glomerular filtration rate using the Cox regression model (P = 0.08). As the number of censored cases increases after 2 years, the figure is presented for 2 years (shown for the entire period in Figure S4). MARD, mild age-related diabetes; MOD, mild obesity-related diabetes; SIDD, severe insulin-deficient diabetes; SIRD, severe insulin-resistant diabetes.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Table S1 | Characteristics of the participants stratified by the conventional classification.