Machine learning approach to predicting albuminuria in persons with type 2 diabetes: An analysis of the LOOK AHEAD Cohort

Abstract Albuminuria and estimated glomerular filtration rate (e‐GFR) are early markers of renal disease and cardiovascular outcomes in persons with diabetes. Although body composition has been shown to predict systolic blood pressure, its application in predicting albuminuria is unknown. In this study, we used machine learning methods to assess the risk of albuminuria in persons with diabetes using body composition and other determinants of metabolic health. This study is a comparative analysis of different methods to predict albuminuria in persons with diabetes mellitus who are older than 40 years of age, using the baseline characteristics of the LOOK AHEAD study cohort. Age, different metrics of body composition, duration of diabetes, hemoglobin A1c, serum creatinine, serum triglycerides, serum cholesterol, serum HDL, serum LDL, maximum exercise capacity, systolic blood pressure (SBP), diastolic blood pressure (DBP), and the ankle‐brachial index were used as predictors of albuminuria. We used the area under the curve (AUC) as the metric to compare the classification results of the different algorithms. The AUCs for the models were as follows: random forest classifier, 0.65; gradient boost classifier, 0.61; logistic regression, 0.66; support vector classifier, 0.61; multilayer perceptron, 0.67; and stacking classifier, 0.62. We used the random forest model to show that the duration of diabetes, A1C, serum triglycerides, SBP, maximum exercise capacity, serum creatinine, subtotal lean mass, DBP, and subtotal fat mass are important features for the classification of albuminuria. In summary, when applied to metabolic imaging (using DXA), machine learning techniques offer unique insights into the risk factors that determine the development of albuminuria in diabetes.


INTRODUCTION
The projected increase in diabetes prevalence is expected to pose an enormous burden on health care resources, affecting more than 400 million people between the ages of 20 and 79 by the third decade of this century. 1 Furthermore, among the different complications of diabetes, diabetes-related chronic kidney disease (CKD) is of concern due to its gradual and indolent progression over several years, often culminating in renal replacement therapy.
Albuminuria and estimated glomerular filtration rate are early markers of future renal disease when measured promptly and in specific populations. 2 In the past, diabetic nephropathy has been classified into microalbuminuria and macroalbuminuria. 3 Additionally, poor glycemic control, increased blood pressure levels, and genetic factors have been identified as risks for diabetic nephropathy. 3 Moreover, proteinuria, the cornerstone of diabetic nephropathy, can accelerate kidney disease progression to end-stage renal disease (ESRD) through multiple pathways. 4 Studies have also evaluated albuminuria in the context of worsening cardiac outcomes and have found it helpful independently and in combination with serum creatinine and e-GFR. 5,6 Dual-energy X-ray absorptiometry (DXA) is an accurate and easy technique to quantify adipose tissue, muscle mass, and bone density in different compartments of the human body. 7 However, although DXA-measured body composition has been shown to predict systolic blood pressure, its application in predicting albuminuria is unknown. 8 In this study, we used machine learning methods to assess the different features that may predict albuminuria in persons with diabetes, using body composition and other widely employed determinants of vascular health. 9

Research design, data, and methods
This study was a comparative analysis of different machine learning methods to predict the presence of microalbuminuria/overt proteinuria in persons with diabetes mellitus older than 40 years, using the baseline characteristics of the LOOK AHEAD study cohort (an NIH-funded study; ClinicalTrials.gov Identifier: NCT00000620). [10][11][12] The original study was performed at multiple sites. We obtained the de-identified data from the NIH-NIDDK repository after obtaining approval from the Johns Hopkins IRB.
The key aims of the study were to (1) examine the utility of body fat distribution in the prediction of albuminuria; (2) compare the different machine learning methods; and (3) elucidate the critical determinants of albuminuria when analyzed by the random forest classifier. 13

LOOK AHEAD study cohort
The LOOK AHEAD study had two groups: an intensive lifestyle intervention group that achieved weight loss through dietary changes and increased physical activity, and a control group that received only diabetes support and education. 14 The intervention group received individual and group sessions every week during the trial, while the control group received the usual care involving diet and education. Persons with type 2 diabetes who met the following inclusion criteria were part of the study: (1) age between 45 and 75 years; (2) overweight or obese status (BMI 25 kg/m² or more, or 27 kg/m² or more while on insulin); (3) blood pressure (BP) 160/100 mmHg or below; and (4) plasma triglycerides below 600 mg/dL. [10][11][12] The full inclusion and exclusion criteria can be found in the original manuscripts from the LOOK AHEAD group. 10-12
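The four inclusion criteria listed above can be expressed as a simple eligibility check. The following is a minimal sketch, not code from the study: the `Participant` class and its field names are hypothetical, and the blood pressure criterion is assumed to apply to both the systolic and the diastolic value.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names chosen for illustration only.
    age_years: float
    bmi: float  # kg/m^2
    on_insulin: bool
    sbp_mmhg: float
    dbp_mmhg: float
    triglycerides_mg_dl: float


def meets_inclusion_criteria(p: Participant) -> bool:
    """Return True if the participant satisfies criteria (1)-(4) from the text."""
    # (1) Age between 45 and 75 years.
    age_ok = 45 <= p.age_years <= 75
    # (2) BMI of 25 kg/m^2 or more, or 27 kg/m^2 or more while on insulin.
    bmi_ok = p.bmi >= (27.0 if p.on_insulin else 25.0)
    # (3) Assumption: "160/100 mmHg or below" must hold for both components.
    bp_ok = p.sbp_mmhg <= 160 and p.dbp_mmhg <= 100
    # (4) Plasma triglycerides below 600 mg/dL.
    tg_ok = p.triglycerides_mg_dl < 600
    return age_ok and bmi_ok and bp_ok and tg_ok
```

For example, a participant with a BMI of 26 kg/m² would pass criterion (2) when not on insulin but fail it when on insulin.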

Measurement of lipid values and A1C
Lipid parameters (total cholesterol, HDL-cholesterol, LDL-cholesterol, and triglycerides) were measured at the Look AHEAD Central Laboratory at baseline, annually for the first few years, and every two years during the extended follow-up period of the study. The levels were measured using standard methods previously described. 12,15 A1C was measured by ion-exchange high-performance liquid chromatography (Bio-Rad Variant II). 11,12

Learn. [20][21][22] Python code for the entire processing pipeline is stored in the GitHub repository (https://github.com/prasu2172/Albuminuria).

FIGURE 1 The correlation matrix after removing the highly correlated variables of body composition

Body composition variables that were highly correlated with one another were, as mentioned above, removed from the final analysis. Figure 1 shows the pairwise correlation matrix after removing the correlated features. The confusion matrices for the different models in the training dataset are shown in Figure 2. All the models showed excellent precision, recall, and F1 scores in the training dataset. The confusion matrices for the testing data are shown in Figure 3. Figure 4 shows the results of cross-validation.
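The removal of highly correlated body composition variables can be illustrated with a short sketch. This is not the authors' pipeline (see the GitHub repository for that). It is a dependency-free illustration of one common approach: compute pairwise Pearson correlations and greedily keep a feature only if it is not strongly correlated with any feature already kept. The threshold of 0.9, the feature names, and the toy values are assumptions made for this example; the text does not state the cutoff that was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep columns whose |r| with every already-kept column is <= threshold.

    `columns` maps a feature name to its list of values. Columns are visited
    in insertion order, so the first member of a correlated group is retained.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values, made up for illustration; they are not LOOK AHEAD data.
features = {
    "subtotal_fat_mass": [18.0, 23.1, 27.9, 33.0, 38.2],
    "total_fat_mass": [20.0, 25.0, 30.0, 35.0, 40.0],
    "subtotal_lean_mass": [50.0, 44.0, 52.0, 47.0, 49.0],
}
selected = drop_correlated(features)
print(selected)
```

In this toy example the two fat mass columns are almost perfectly correlated, so only the first is retained alongside lean mass. Because the procedure is greedy, the result depends on the order in which the columns are visited.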
The ROC curves are shown in Figure 5.  Figure 6.
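Since the models are compared by AUC, it may help to recall what that number measures. The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition using only the standard library. The function name and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels (1 = albuminuria present) and predicted probabilities,
# made up for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
auc = roc_auc(y_true, y_score)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.61 to 0.67. The pairwise loop is quadratic in the number of cases, so it is suited to explanation rather than to large datasets.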

DISCUSSION
Our study shows that machine learning algorithms can help enhance ... Urinary albumin excretion is an established predictor of poor metabolic health. In a report from the Framingham Heart Study cohort, low levels of urinary albumin (less than 30 mcg) were associated with an increased risk of cardiovascular disease and death in the nondiabetic, nonhypertensive population, even after adjustment for other important risk factors. 23 Albuminuria is strongly associated with calcification within the coronary and carotid arteries in Caucasians with type 2 diabetes, even if renal function is preserved. 24 Prior studies have shown that sarcopenia and obesity are both associated with albuminuria in persons with diabetes. 25,26 In a study using the LOOK AHEAD cohort, different predictors such as age, sex, race, duration of diabetes, A1C, hypertension, and ACE inhibitor use were included in a multivariate logistic regression model to examine the relationship between obesity and the risk of albuminuria. 26 The highest quartile of BMI was associated with albuminuria. 26 However, the study only examined the relationship between total body fat percent and albuminuria, and it did not find an association between the two. 26 In nondiabetic South Asian subjects, central obesity is an independent predictor of albuminuria. 27 This phenotype, in particular, can explain the higher incidence and poor outcome of microvascular complications like diabetic nephropathy in this population. 28

FIGURE 2 The confusion matrices of the different machine learning models in the training dataset

GENERAL NIDDK REPOSITORY ACKNOWLEDGMENTS
The Look AHEAD study was conducted by the Look AHEAD Investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The data from the Look AHEAD study reported here were supplied by the NIDDK Central Repositories. This manuscript was not prepared in collaboration with Investigators of the Look AHEAD study and does not necessarily reflect the opinions or views of the Look AHEAD study, the NIDDK Central Repositories, or the NIDDK.
The data was provided to us in accordance with the NIDDK-NIH researcher data sharing agreement.

CONFLICT OF INTEREST
The authors of the manuscript have no disclosures to make and report no conflict of interest.

PATIENT CONSENT STATEMENT
The work was conducted on de-identified data obtained from the NIH central repository, as outlined in the acknowledgments section.