Improved risk prediction of chemotherapy‐induced neutropenia—model development and validation with real‐world data

Abstract
Background: The existing risk prediction models for chemotherapy-induced febrile neutropenia (FN) do not necessarily apply to real-life patients in different healthcare systems, and external validation of these models is often lacking. Our study evaluates whether a machine learning-based risk prediction model could outperform previously introduced models, especially when validated against real-world patient data from another institution not used for model training.
Methods: Using Turku University Hospital electronic medical records, we identified all patients who received chemotherapy for non-hematological cancer between 2010 and 2017 (N = 5879). An experimental surrogate endpoint was first-cycle neutropenic infection (NI), defined as grade IV neutropenia with serum C-reactive protein >10 mg/l. A penalized regression model (Lasso) was developed to predict the risk of NI. The model was externally validated in an independent dataset (N = 4594) from Tampere University Hospital.
Results: The Lasso model predicted NI risk with good accuracy (AUROC 0.84). In the validation cohort, the Lasso model outperformed two previously introduced, widely accepted models, with AUROC 0.75. The variables selected by Lasso included granulocyte colony-stimulating factor (G-CSF) use, cancer type, pre-treatment neutrophil and thrombocyte counts, intravenous treatment regimen, and planned dose intensity. The same model also predicted FN, with AUROC 0.77, supporting the validity of NI as an endpoint.
Conclusions: Our study demonstrates that real-world NI risk prediction can be improved with machine learning and that differences in patient or treatment characteristics can have a significant impact on model performance. Here we outline a novel, externally validated approach that may facilitate more targeted use of G-CSFs in the future.


KEYWORDS
chemotherapy, clinical decision support, granulocyte colony-stimulating factor, machine learning, neutropenia

Novelty and Impact
There are several risk prediction models for chemotherapy-induced neutropenia, but the existing models may not always apply to real-life patients in different healthcare systems. A novel machine learning-based model was developed to predict neutropenic infection risk in cancer patients. In an independent external validation cohort, the model outperformed two previously introduced conventional models. In the future, our model may facilitate more targeted use of granulocyte colony-stimulating factors.

| INTRODUCTION
Prophylactic granulocyte colony-stimulating factors (G-CSFs) can be used to shorten the duration of neutropenia and to prevent febrile neutropenia (FN), which are major dose-limiting and resource-intensive complications of cancer chemotherapy. Meta-analyses show that the use of G-CSFs significantly reduces the risk of FN, with relative risks of 0.27-0.51 compared to patients with no G-CSF prophylaxis. 1,2 However, risk assessments based on real-world settings are sparse. 3 Significant underuse of G-CSFs in high- and intermediate-risk regimens, and overuse in low-risk regimens, have been reported both in the EU 4,5 and in the US. 6-8 G-CSFs also contribute significantly to healthcare costs. 9 Therefore, selective use of G-CSFs based on predetermined FN risk is recommended to optimize the use of healthcare resources. 10,11 European and North American guidelines recommend the use of prophylactic G-CSF based on the predetermined risk of FN, classified as low, intermediate, or high depending on the chemotherapy regimen. 10,11 Prophylactic G-CSFs are recommended for high-risk but not for low-risk regimens. For intermediate-risk regimens, a history of prior neutropenia or leukopenia, age >65 years, comorbid conditions, advanced disease (especially with bone marrow involvement), poor performance status, female sex, and low hemoglobin support the use of prophylactic G-CSFs. 10-14 Risk prediction models for neutropenic complications have been developed from retrospective data in the US, 14,15 from prospective cohorts and trial settings in Europe, 12,16 and for specific cancer types. 17 The model by Lyman et al. 14 has subsequently been re-evaluated in a larger population. 15 However, these models have typically not been externally validated against data from institutions other than the one used for model training. 17

In the development of any risk prediction model, external validation is essential to verify the generalizability of the model to patients outside the training cohort, and it should always be performed before the model is implemented and used more widely in a clinical setting. 18 In addition to the lack of external validation, the development of previous models has typically relied on conventional modeling strategies whose performance might be improved with machine learning-based modeling approaches. 19 For example, the widely used model by Lyman et al. is a multivariable logistic regression model developed with a stepwise variable selection procedure, 14 which is known to be less stable than penalized approaches and can magnify problems associated with model overfitting. 19,20 The recently re-evaluated version of this model introduced by Li et al. uses the same modeling approach and the same variables as the original study, with a few variable-related modifications made on clinical and numerical grounds. 15

The aim of the present study was to evaluate whether a machine learning-based risk prediction model could outperform the previously introduced models by Lyman et al. and Li et al. 14,15 for neutropenic complications in real-world patient data. To detect overfitting and to verify model generalizability, we carried out external validation against independent data from another university hospital.

| METHODS

| Patients
We gathered comprehensive clinical data on all patients who received intravenous chemotherapy between January 2010 and December 2017 at Turku University Hospital, Finland, which serves a population of 480 000 inhabitants in Southwest Finland. These data included the patient's sex, age, time of death, weight, height, body temperatures, diagnoses (ICD-10 codes), electronic prescriptions, intravenous chemotherapies and monoclonal antibodies, G-CSF therapies, laboratory values, and inpatient and outpatient hospital visits.
All patients aged at least 18 years with an ICD-10 diagnosis code C00-C79 recorded before their first dose of intravenous chemotherapy, given with either curative or palliative intent, were included (Figure 1). Patients treated with investigational products or non-chemotherapy regimens such as immune checkpoint inhibitors (232 patients), patients with non-melanoma skin cancer (9 patients), and patients with multiple primary malignancies (511 patients) were excluded. This left a total of 5879 patients for the analyses. For model training and internal testing, the data were randomly split into separate training (N = 3920, two thirds of the data) and test (N = 1959, one third of the data) cohorts.
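The two-thirds/one-third random split described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `split_two_thirds`, the unstratified shuffle, the seed, and the rule that the test cohort receives floor(n/3) patients are all assumptions.

```python
import random

def split_two_thirds(patient_ids, seed=0):
    """Randomly assign patients to a training cohort (two thirds) and a test cohort (one third).

    Hypothetical sketch. Assumptions not stated in the text: an unstratified
    shuffle, an arbitrary seed, and a test cohort of floor(n / 3) patients.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)   # reproducible random permutation
    n_test = len(ids) // 3
    return ids[n_test:], ids[:n_test]  # (training, test)

train_ids, test_ids = split_two_thirds(range(5879))
print(len(train_ids), len(test_ids))  # 3920 1959
```

Under this rounding rule, 5879 patients give cohorts of exactly the reported sizes; any other tie-breaking rule would differ by at most one patient.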
Using the same protocol, we collected an independent validation cohort from Tampere University Hospital, which has a catchment population of 515 100 inhabitants. In total, 4594 patients treated between January 2014 and June 2019 were identified.
The data gathering and analysis were performed with research permissions granted by institutional review boards of Turku and Tampere University Hospitals. Additional information on data gathering and confidentiality can be found in the Supplementary Material.

| Study endpoint and candidate predictors
Febrile neutropenia is a classical endpoint in neutropenia studies, but its definition varies depending on the source. 14,21 We defined neutropenic infection (NI) as the primary endpoint: grade IV neutropenia (absolute neutrophil count (ANC) <0.5 × 10⁹/l according to the Common Terminology Criteria for Adverse Events (CTCAE) version 4.0) within 14 days of the first chemotherapy infusion, followed by a serum C-reactive protein (CRP) level >10 mg/l within 5 days of the neutropenia. Patients who had no laboratory measurements within 14 days of the first chemotherapy infusion, or who did not fulfill both criteria, were classified as patients without NI. To further verify the selected primary endpoint, the occurrence of NI was compared with admissions to any tertiary care hospital ward during the NI episode. We also tested FN as an endpoint, defined as body temperature ≥38°C and ANC <1.0 × 10⁹/l according to CTCAE version 4.0, but without the CTCAE requirement that the fever be sustained for more than 1 hour. We focused on the first cycle of chemotherapy, when patients are at the highest risk of developing neutropenia. 12,14 The candidate predictors for estimating the risk of NI are provided in Table 1.
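To make the two endpoint definitions concrete, the sketch below encodes the NI rule and the relaxed FN rule as functions over one patient's dated records. The helper names (`has_ni`, `has_fn`), the (date, value) record format, inclusive windows, the requirement that CRP be measured on or after the day of grade IV neutropenia, and the same-day pairing of fever and low ANC for FN are assumptions made for illustration; the text does not specify these details.

```python
from datetime import date, timedelta

ANC_GRADE4 = 0.5   # ×10^9/l, CTCAE v4.0 grade IV neutropenia threshold
CRP_LIMIT = 10.0   # mg/l
ANC_FN = 1.0       # ×10^9/l, FN neutrophil threshold
FEVER_C = 38.0     # °C

def has_ni(infusion_day, anc, crp, anc_window=14, crp_window=5):
    """First-cycle neutropenic infection (NI); anc and crp are lists of (date, value).

    Assumed conventions: inclusive windows, and CRP measured on or after
    the day of the grade IV ANC value.
    """
    anc_end = infusion_day + timedelta(days=anc_window)
    for anc_day, anc_value in anc:
        if infusion_day <= anc_day <= anc_end and anc_value < ANC_GRADE4:
            crp_end = anc_day + timedelta(days=crp_window)
            if any(anc_day <= d <= crp_end and v > CRP_LIMIT for d, v in crp):
                return True
    return False  # also covers patients with no laboratory measurements

def has_fn(infusion_day, temps, anc, window=14):
    """Relaxed FN: temperature >= 38 °C and ANC < 1.0 on the same day, no 1-hour rule.

    The same-day pairing and the 14-day window are assumptions.
    """
    end = infusion_day + timedelta(days=window)
    low_anc_days = {d for d, v in anc if infusion_day <= d <= end and v < ANC_FN}
    return any(d in low_anc_days and t >= FEVER_C for d, t in temps)

start = date(2015, 3, 2)
print(has_ni(start, anc=[(date(2015, 3, 12), 0.3)], crp=[(date(2015, 3, 14), 85.0)]))  # True
print(has_ni(start, anc=[(date(2015, 3, 12), 0.3)], crp=[(date(2015, 3, 20), 85.0)]))  # False
```

In the second call, the CRP rise occurs 8 days after the neutropenic value, outside the 5-day window, so the patient is not counted as having NI.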

| Statistical analysis and model development
We applied penalized logistic regression with the least absolute shrinkage and selection operator (Lasso) penalty to the training cohort to construct a multivariable model for predicting the individualized risk of NI. To identify the most influential predictors and to account for model variability due to random subsampling, the model was constructed iteratively in multiple steps, as described previously. 22-24 In previous studies, this approach has yielded models with fewer variables that retain the prediction accuracy of more complex models. During model development and validation, only patients with complete data for the selected predictors were used. The performance of the Lasso model was compared against the previously introduced model for multiple cancer types by Lyman et al. 14 and the revised version of the Lyman model introduced by Li et al. 15 (Table S1). The comparisons were restricted to cancer types shared by all three studies (breast, ovarian, colorectal, small-cell, and non-small-cell lung cancer) and to patients for whom all risk estimates could be determined. Furthermore, patients who received G-CSFs were excluded from this comparison, since such patients were not included in the development of the Li model. Finally, we also compared the Lasso model with a model developed using the conventional stepwise variable selection procedure. All statistical analyses and modeling were carried out using the R statistical computing environment, version 3.4.3 (R Core Team, 2016. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/). For penalized regression, the implementation in the R package glmnet (version 2.0-16) 25 was used.
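For readers unfamiliar with Lasso, the sketch below shows the objective it minimizes for a binary outcome such as NI: the mean logistic loss plus λ times the sum of absolute coefficients, which shrinks weak predictors to exactly zero. The study itself used glmnet in R, which solves this problem by coordinate descent over a path of λ values; here, as a swapped-in technique, the same objective is minimized by proximal gradient descent (ISTA) in plain Python. `selection_frequency` is a hypothetical illustration of refitting on random subsamples and counting how often each variable is retained; it is not the iterative procedure of refs 22-24. All data below are synthetic.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Proximal operator of t*|z|; this step sets weak coefficients exactly to zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimise mean log-loss + lam * sum(|beta_j|) by proximal gradient descent (ISTA).

    X is assumed standardised (glmnet standardises by default); the intercept is unpenalised.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # residuals (predicted probability - outcome) give the log-loss gradient
        r = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
             for row, yi in zip(X, y)]
        grad = [sum(ri * row[j] for ri, row in zip(r, X)) / n for j in range(p)]
        b0 -= step * sum(r) / n
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta

def selection_frequency(X, y, lam, n_repeats=20, frac=2 / 3, seed=1):
    """Hypothetical stability check: refit on random subsamples and record how often
    each variable keeps a non-zero coefficient."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(n_repeats):
        sub = rng.sample(range(len(X)), int(frac * len(X)))
        _, beta = fit_lasso_logistic([X[i] for i in sub], [y[i] for i in sub], lam)
        for j, b in enumerate(beta):
            counts[j] += b != 0.0
    return [c / n_repeats for c in counts]

# Synthetic example: only the first of three standardised predictors carries signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(300)]
y = [1 if rng.random() < sigmoid(-1.0 + 2.0 * row[0]) else 0 for row in X]
b0, beta = fit_lasso_logistic(X, y, lam=0.1)
freq = selection_frequency(X, y, lam=0.1, n_repeats=5)
print([round(b, 2) for b in beta], freq)
```

In this synthetic example, the signal-carrying predictor should be retained in every refit, while the two noise predictors are expected to stay at or near zero; in practice λ would be chosen by cross-validation rather than fixed by hand.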
The discrimination performance of the risk assessment models was evaluated in terms of the area under the receiver operating characteristic curve (AUROC), and AUROCs were compared using the DeLong test 26 implemented in the R package pROC (version 1.12.1). Differences between patients with and without NI were tested using the Mann-Whitney test for continuous variables and the chi-squared test or, for cell counts N < 5, Fisher's exact test for categorical variables. The level of significance was set at p < 0.05. Additional details on model development and statistical analyses can be found in the Supplementary Material.
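AUROC equals the normalized Mann-Whitney U statistic: the probability that a randomly chosen patient with the event receives a higher predicted risk than one without, with ties counting one half. The study used pROC in R; the Python sketch below re-implements the standard DeLong formulas for comparing two correlated AUROCs computed on the same patients, for illustration only. Function names and the toy scores are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
import math

def _psi(x, y):
    """Mann-Whitney kernel: 1 if the case outranks the control, 0.5 for a tie."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def auroc(pos, neg):
    """AUROC as the normalised Mann-Whitney U statistic."""
    return sum(_psi(x, y) for x in pos for y in neg) / (len(pos) * len(neg))

def _components(pos, neg):
    """DeLong structural components for one model."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for two AUROCs computed on the same patients."""
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    a10, a01 = _components([score_a[i] for i in pos], [score_a[i] for i in neg])
    b10, b01 = _components([score_b[i] for i in pos], [score_b[i] for i in neg])
    m, n = len(pos), len(neg)
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    # variance of (AUC_a - AUC_b) from the case and control components
    var = (_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m \
        + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p

y_true = [1, 1, 1, 0, 0, 0, 0]
score_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]   # one case ranked below one control
score_b = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.05]  # perfect separation
auc_a, auc_b, z, p = delong_test(y_true, score_a, score_b)
print(round(auc_a, 4), auc_b, round(z, 4), round(p, 4))  # 0.9167 1.0 -0.7071 0.4795
```

With only seven toy patients, the difference between AUROCs of 0.92 and 1.00 is far from significant, which illustrates why cohort size matters for the p-values reported in Table 3.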

| RESULTS

| Study populations
In total, the Turku University Hospital cohort consisted of 5879 patients (Table 1). Among these, NI occurred in 262 (4%) patients, of whom 225 (86%) were subsequently admitted to a hospital ward. Of the 5879 patients, 314 (5%) received G-CSFs as primary prophylaxis, of whom eight (3%) developed NI. Patients who had NI were more often women (p < 0.001) with breast cancer (p < 0.001) treated with a higher relative dose intensity (RDI) (p < 0.001), and they had lower blood ANC (p < 0.001), leukocyte (p < 0.001), and thrombocyte (p < 0.001) counts at the start of treatment (Table 1). Despite the lower ANC and thrombocyte counts, these values were above the lower limit of the reference range (>1.5 × 10⁹/l for ANC and >150 × 10⁹/l for thrombocyte count) for nearly all patients (98%).
The validation cohort from Tampere University Hospital included 4594 patients whose distributions of age, sex, comorbidities, cancer types, and RDI were similar to those of the Turku University Hospital cohort (Table S2). However, the increased risk of NI associated with lower ANC and thrombocyte counts was not observed in the validation cohort.

| Model development and validation
Of the more than 30 variables included (Table 1), a subset of 10 variables (Table 2) was selected by the Lasso model, which predicted NI risk accurately in the training cohort (AUROC 0.87, 95% confidence interval [CI] 0.84-0.89). The selected model showed similar performance in the internal test cohort (AUROC 0.85, 95% CI 0.80-0.90), indicating good generalizability. As expected, use of prophylactic G-CSFs and reduced RDI were among the most influential predictors decreasing NI risk (Table 2). Intravenous treatment regimens involving antimetabolites were also associated with decreased NI risk compared to other treatment regimens. Of all cancer types, breast cancer and sarcoma were associated with increased NI risk. The other variables increasing NI risk were use of taxanes alone or in combination with monoclonal antibodies, use of topoisomerase inhibitors, and low pre-treatment ANC and thrombocyte counts. Most taxane treatments were docetaxel 80 mg/m² Q3W for breast cancer (65%), docetaxel 50 mg/m² Q2W for metastatic castration-resistant prostate cancer (14%), and paclitaxel for gynecological cancers (11%). Among patients treated with taxanes at full RDI, the NI risk was 12%, compared to 3% in those treated with reduced RDI.
Overall, the Lasso model predicted NI risk with high accuracy in the Turku University Hospital cohort ( Table 3). The discrimination performance of the model developed using the conventional stepwise variable selection procedure was comparable to the Lasso model in the Turku University Hospital cohort but significantly worse in the Tampere University Hospital cohort (Supplementary Material).
In the Turku University Hospital cohort, 56% of the patients with NI also met the criteria for FN. Overall, FN was observed in 221 (4%) patients. When applied to predict FN, the Lasso model (AUROC 0.77) significantly outperformed the previously introduced Lyman (AUROC 0.50, p < 0.001) and Li models (AUROC 0.73, p = 0.007) (Table 3).

| Assessing the effect of G-CSFs on predicted NI risk
We observed that, especially in the Turku University Hospital cohort, the higher the predicted risk of NI, the greater the expected benefit of G-CSFs in reducing that risk (Figure 2). Among patients with the highest predicted NI risk (>20%), the observed NI rate was 29% (79 of 270 patients) without G-CSF but approximately 3% (2 of 59 patients) with G-CSF. The proportion of patients receiving G-CSFs was also highest in this category (18%), compared with 14% (116 of 850 patients) and 2% (52 of 2502 patients) in the categories with predicted risk of 5-20% and <5%, respectively. A practical example of how to apply the fitted Lasso model to evaluate the effect of G-CSFs on NI risk is provided in the Supplementary Material.
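As a generic sketch of the idea behind this analysis (not the example in the Supplementary Material), a fitted logistic model can be applied to the same patient twice, once with the G-CSF indicator set to 0 and once set to 1, and each prediction assigned to the <5%, 5-20%, and >20% strata used above. The coefficient values and variable names below are illustrative placeholders, not the Table 2 estimates, and the handling of the exact 5% and 20% boundaries is an assumption.

```python
import math

def predicted_risk(intercept, coefs, patient):
    """Logistic-model risk: 1 / (1 + exp(-(intercept + sum_j coef_j * x_j)))."""
    lp = intercept + sum(coef * patient.get(name, 0.0) for name, coef in coefs.items())
    return 1.0 / (1.0 + math.exp(-lp))

def gcsf_effect(intercept, coefs, patient, gcsf_key="gcsf"):
    """Predicted risk for the same patient without and with prophylactic G-CSF."""
    without = predicted_risk(intercept, coefs, {**patient, gcsf_key: 0.0})
    with_gcsf = predicted_risk(intercept, coefs, {**patient, gcsf_key: 1.0})
    return without, with_gcsf

def risk_category(p):
    """Strata used in the text; boundary handling (5% and 20%) is assumed."""
    if p < 0.05:
        return "<5%"
    return "5-20%" if p <= 0.20 else ">20%"

# Hypothetical placeholder coefficients for illustration; NOT the Table 2 values.
INTERCEPT = -3.0
COEFS = {"gcsf": -1.5, "breast_cancer": 0.8, "taxane": 1.0}
patient = {"breast_cancer": 1.0, "taxane": 1.0}
without, with_gcsf = gcsf_effect(INTERCEPT, COEFS, patient)
print(f"{without:.3f} ({risk_category(without)}) -> {with_gcsf:.3f} ({risk_category(with_gcsf)})")
```

Because a negative G-CSF coefficient shifts the linear predictor by a fixed amount, the absolute risk reduction on the probability scale is largest for patients with high baseline risk, which is consistent with the pattern described above.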

| DISCUSSION
The decision whether to use prophylactic G-CSFs during chemotherapy is a daily question in oncological practice. Our study introduces a novel machine learning-based model to predict NI risk when initiating chemotherapy for cancer patients. The Lasso model uses a limited number of routinely available variables and predicted NI risk well in both the training cohort and the external real-world validation cohort (AUROC 0.84 and 0.75, respectively), indicating good model generalizability.
The Lasso model outperformed both previously introduced models, as well as a model based on a stepwise variable selection procedure, in predicting NI risk while requiring even fewer input variables, suggesting that machine learning-based modeling can improve both usability and predictive performance over more conventional approaches. This finding also held in the validation cohort. Lack of external validation is a common problem in neutropenia prediction studies and limits their implementation in clinical practice. The improved performance may be explained by the different input variables resulting from the iterative variable selection procedure, which has also been used successfully before. 22-24 Alternatively, poorly adjusted regression coefficients due to inherent differences between patient populations and local healthcare practices, or overfitting, may explain the worse performance of previous models. For example, the Lyman model had drastically different coefficients for different treatment regimens compared to the Lasso and Li models (Table 2, Table S1), which might explain its underperformance. This also highlights the importance of validating a model in the target population before use.

TABLE 2 Coefficients and covariates in the Lasso risk assessment model for the occurrence of neutropenic infection during the first round of chemotherapy in the training cohort
The most common clinically monitored neutropenic complication is FN, for which several different diagnostic criteria can be found in the literature. 10,14,21 However, FN is a challenging outcome measure in a real-world, retrospective study setting, as comprehensive body temperature information is usually unavailable: body temperature is not always documented and may be unreliable due to antipyretic medication. Therefore, we chose CRP >10 mg/l accompanied by severe neutropenia (ANC <0.5 × 10⁹/l) as the primary endpoint. In clinical practice, these laboratory values are generally available when antimicrobial treatment is initiated, even if the body temperature is below 38.0°C. Notably, the majority (86%) of the patients identified as having NI were also admitted to a hospital ward, supporting the validity of our primary endpoint. The occurrence of first-cycle NI (4%) was close to the occurrence of first-cycle FN reported in the literature. 14,15 We also tested our risk assessment model for predicting FN and observed good accuracy (AUROC 0.77), supporting the utility of our model for both FN and NI risk prediction.
Of the more than 30 risk factors studied, the Lasso model eventually required only a subset of 10 variables. As expected, the use of G-CSFs showed a risk-reducing effect against neutropenic complications similar to that reported previously. 1,2,14 Overall, patients receiving taxanes, with or without monoclonal antibodies, for breast cancer formed the largest group of patients with increased NI risk. This is in line with a recent meta-analysis that demonstrated a significantly increased risk of FN with the use of trastuzumab, 27 the most commonly used monoclonal antibody in our study cohort. The second risk group identified consisted of heterogeneous sarcoma patients, often treated with aggressive chemotherapy. Topoisomerase inhibitors, such as etoposide used in small-cell lung cancer and irinotecan used in gastrointestinal cancers, increased the risk of NI. In contrast, antimetabolites, including fluorouracil, which is commonly used in several types of cancer, were associated with a reduced relative NI risk. These chemotherapy regimens remained important predictors of NI regardless of the RDI used.
Reduced RDI, another means of reducing neutropenia risk alongside G-CSFs, was also identified as an important variable and could explain why age and comorbidities were not selected by the Lasso model. This further reflects oncology practice in Finland, where older and comorbid patients seem to be treated with reduced doses of chemotherapy. The risk-reducing effect of reduced RDI was consistent with previous reports. 14,15 Finally, among all the pre-treatment laboratory test results, only lower ANC and thrombocyte counts were associated with elevated NI risk in the final model. Similar effects of ANC and thrombocyte counts on the risk of neutropenic complications have been reported before. 14,28 A lower thrombocyte count may reflect bone marrow dysfunction and could therefore be linked to simultaneous leukopenia, but this association was not observed in the validation cohort and should be interpreted with caution.
In conclusion, morbidity due to neutropenic complications affects patients' quality of life, creates substantial costs, and may even threaten the outcome of cancer treatment if the treatment schedule is postponed because of infection. To improve the targeted use of prophylactic G-CSFs, well-calibrated risk models applicable to real-world patient populations are needed. Here, we demonstrate that risk prediction of neutropenic complications can be improved with machine learning and that even previously validated models do not necessarily lead to correct predictions in all patient populations. Our novel machine learning-based model outperformed both previously introduced models and can be easily applied to identify individuals at high risk of neutropenic complications, especially in countries with similar clinical practices. These findings were confirmed in an external validation cohort, supporting the generalizability and clinical applicability of our model. Overall, the presented method holds potential for avoiding resource-intensive and life-threatening neutropenic complications and could facilitate the proper use of G-CSFs in the future.