Using machine learning to predict rapid decline of kidney function in sickle cell anemia

1 Center for Biomedical Informatics, University of Tennessee Health Science Center, Memphis, USA
2 Department of Bioengineering, University of Illinois at Chicago, Chicago, USA
3 Department of Health Informatics and Data Science, Loyola University Chicago, Maywood, USA
4 Department of Econometrics, Kirklareli University, Kirklareli, Turkey
5 Department of Computer Science, University of Michigan, Ann Arbor, USA
6 Department of Medicine, University of Illinois at Chicago, Chicago, USA
7 Center for Sickle Cell Disease, University of Tennessee Health Science Center, Memphis, USA

Keywords: chronic kidney disease, machine learning models, predictive capacity, rapid decline of eGFR, sickle cell disease

Chronic kidney disease (CKD) is prevalent in sickle cell disease (SCD) [1]. Kidney function declines more rapidly in SCD than in the general African-American population [2]. Furthermore, rapid decline in kidney function, defined as an estimated glomerular filtration rate (eGFR) loss of >3 mL/min/1.73 m² per year [3,4], occurs more commonly in SCD than in the general population [5,6]. As rapid kidney function decline is associated with increased mortality in SCD [2], early identification of patients at risk for such decline may facilitate risk modification.
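The definition of rapid decline above (eGFR loss of >3 mL/min/1.73 m² per year) can be illustrated with a minimal sketch. The function names, the two-measurement calculation, and the example values are assumptions made for illustration; they are not taken from the study.

```python
from datetime import date


def annualized_egfr_change(egfr_start, egfr_end, date_start, date_end):
    """Return the eGFR change in mL/min/1.73 m^2 per year (negative = loss).

    Illustrative assumption: the rate is taken between two dated measurements.
    """
    days = (date_end - date_start).days
    if days <= 0:
        raise ValueError("date_end must be later than date_start")
    return (egfr_end - egfr_start) / (days / 365.25)


def is_rapid_decline(change_per_year, threshold=3.0):
    """True when the annualized eGFR loss exceeds the threshold."""
    return -change_per_year > threshold


# Hypothetical patient: eGFR falls from 95 to 91 over one calendar year.
change = annualized_egfr_change(95.0, 91.0, date(2019, 1, 1), date(2020, 1, 1))
print(round(change, 2), is_rapid_decline(change))
```

Passing a different `threshold` applies the same check to a stricter cut-off.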
Machine learning (ML), characterized as the study of algorithms and statistical models that computer systems utilize to learn from previous experience, can assess relationships of multiple variables, create predictions based on characteristics and identify patient groups with comparable patterns [7]. We explored the potential of ML tools to predict rapid kidney function decline in SCD, hypothesizing that ML models are highly predictive of rapid kidney function decline in severe SCD genotypes.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
eJHaem. 2021;2:257-260.
Participants in this retrospective cohort study have been previously described [2,5]. The internal cohort consisted of SCD patients, ≥18
At each visit, GFR was estimated using the creatinine-based CKD Epidemiology Collaboration (CKD-EPI) formula [8]. Visits in which rapid eGFR decline occurred were defined using two thresholds: >3 mL/min/1.73 m² and >5 mL/min/1.73 m² [3,4]. Data were censored at the first occurrence of rapid eGFR decline. As the first occurrence of rapid decline may reflect acute kidney injury (AKI), additional analyses were performed, restricted to patients with at least two visits in 1 year and persistent or sustained decline in kidney function. Analyses were performed for each eGFR decline threshold and for prediction of rapid decline 6 and 12 months following a clinic visit. Missing data
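For readers unfamiliar with the creatinine-based CKD-EPI estimate mentioned above, the following is a minimal sketch. It assumes the 2009 CKD-EPI creatinine equation; the text does not state which version of the equation was used or whether the race coefficient was applied, so the `black` parameter and the function name are illustrative assumptions.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    scr_mg_dl: serum creatinine in mg/dL.
    Assumption: the 2009 equation is used; the race term is optional here.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Hypothetical example: 30-year-old man with serum creatinine 0.8 mg/dL.
print(round(ckd_epi_2009(0.8, 30, female=False), 1))
```

Because creatinine enters through both the `min` and `max` terms, the estimate falls monotonically as creatinine rises, on either side of the sex-specific knot `kappa`.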
wileyonlinelibrary.com/journal/jha2
TABLE 1 Baseline demographic, laboratory, and clinical characteristics of internal and external cohorts with stratification for rapid decline status
(Figures S3 and S4). In the external cohort, prediction of eGFR decline >3 mL/min/1.73 m² at 6 months was similar to that of the internal cohort, while prediction of eGFR decline >5 mL/min/1.73 m² 6 months in advance was poorer than in the internal cohort (Table 2).

Covariates
In the evaluation of feature importance, age, baseline eGFR, eGFR slope over the previous 6 months, reticulocyte count and SBP were associated with rapid decline at the >3 mL/min/1.73 m² threshold at 6 months and had similar importance overall (Figure S5). Age and eGFR slope over the previous 6 months were the strongest predictors of rapid eGFR decline at the >5 mL/min/1.73 m² threshold. Baseline eGFR, age, SBP, serum creatinine and eGFR slope over the previous 12 months predicted rapid eGFR decline at 12 months for both the >3 mL/min/1.73 m² and >5 mL/min/1.73 m² thresholds.
Rapid eGFR decline at 6 months was predicted with fairly high accuracy using ML models, but less so at 12 months. Despite concerns that the first decline in eGFR might reflect AKI, predictive capacities in the sensitivity analyses were only minimally decreased, suggesting that the first eGFR decline reflected rapid kidney function decline rather than AKI.
Notwithstanding the suitability of ML methods, prediction of distant events may not always be possible, as evidenced by the modest predictive capacities of our models at 12 months. Biomarkers of kidney injury and larger patient populations may be necessary to better predict long-term outcomes.
Our study differs from prior studies evaluating kidney function decline in that we calculated rate of eGFR decline at each visit, while others have calculated decline over the entire observation period [5,9].
We have also used eGFR calculated at each visit to predict future decline. Furthermore, in our model development, we used all the data up to any point in time to predict eGFR decline 6 and 12 months in advance.
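The idea of computing a decline rate at each visit from all data available up to that point, and of censoring at the first rapid decline, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the least-squares slope, the function names, and the example visits are all hypothetical.

```python
def ols_slope(times_years, values):
    """Least-squares slope of values against time; None if it is undefined."""
    n = len(times_years)
    if n < 2:
        return None
    mean_t = sum(times_years) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        return None
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_years, values))
    return sxy / sxx


def expanding_slopes(visits):
    """For each visit, the slope fitted to all visits up to and including it.

    visits: list of (time_in_years, egfr) tuples sorted by time.
    """
    slopes = []
    for i in range(len(visits)):
        history = visits[: i + 1]
        slopes.append(ols_slope([t for t, _ in history], [e for _, e in history]))
    return slopes


def censor_at_first_rapid_decline(visits, threshold=3.0):
    """Keep visits up to the first one whose slope shows a loss > threshold."""
    for i, slope in enumerate(expanding_slopes(visits)):
        if slope is not None and -slope > threshold:
            return visits[: i + 1]
    return visits


# Hypothetical patient with four visits six months apart.
visits = [(0.0, 100.0), (0.5, 99.0), (1.0, 94.0), (1.5, 93.0)]
slopes = expanding_slopes(visits)
censored = censor_at_first_rapid_decline(visits)
print(slopes, len(censored))
```

Each slope uses only past and current visits, so a feature built this way cannot leak information from the period being predicted.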
Despite its limitations, including the lack of quantified albuminuria, the use of real-world data, and the imputation of some eGFR data, this study demonstrates a role for ML models in predicting rapid decline in eGFR. Given the association of rapid eGFR decline with mortality in SCD, ML may play an important role in identifying patients at high risk for progressive kidney disease as early as 6 months in advance. More studies are required to further evaluate ML models in SCD-related kidney disease.
analyzed the data and assisted with manuscript preparation. Robert L. Davis analyzed the data and wrote the manuscript. Ibrahim Karabayir analyzed the data and assisted in the manuscript preparation. Maxwell Strome analyzed the data and assisted with manuscript preparation. Yang Dai analyzed the data and assisted with manuscript preparation. Santosh L. Saraf assisted in study design and manuscript preparation. Kenneth I. Ataga designed the study and wrote the manuscript.