Added diagnostic value of routinely measured hematology variables in diagnosing immune checkpoint inhibitor mediated toxicity in the emergency department

Immune checkpoint inhibitors (ICI) show remarkable results in cancer treatment, but at the cost of immune‐related adverse events (irAE). irAE can be difficult to differentiate from infections or tumor progression, thereby challenging treatment, especially in the emergency department (ED) where time and clinical information are limited. As infections are traceable in blood, we were interested in the added diagnostic value of routinely measured hematological blood cell characteristics in addition to standard diagnostic practice in the ED to aid irAE assessment.


| INTRODUCTION
Within the immunotherapeutic field of cancer treatment, multiple new and promising treatment options have emerged over the past years. 1 Among these, immune checkpoint inhibitors (ICI) are increasingly being used as an oncologic treatment strategy for multiple types of cancer and have drastically improved survival of responding patients. For example, combined nivolumab and ipilimumab therapy in patients with advanced melanoma has been shown to result in a median overall survival of over 60 months, 2 whereas the median survival of patients with metastatic melanoma used to be less than 1 year before the introduction of checkpoint inhibitors. 3 The proportion of cancer patients benefiting from ICI is increasing rapidly, with now over 40% of cancer patients qualifying for ICI treatment. 4 However, their use is associated with a wide variety of immune-related adverse events (irAE), such as auto-immune colitis and pneumonitis. 5 Because of overlap in clinical presentation, it can be difficult to differentiate these irAE from progressive disease or other inflammatory conditions, such as infections. Especially in the emergency department (ED), where time and resources are limited, this may lead to diagnostic delay, inappropriate treatment, and a considerable amount of (unnecessary) diagnostic testing. 6,7 Accurate and early diagnosis of patients presenting in the ED with irAE is therefore key to starting adequate treatment as soon as possible. 8,9 Currently, only a few biomarkers are available that can aid in diagnosing irAE. 6,10 A solution to this problem might be found in routinely measured hematological variables. Bacterial and viral infections are commonly characterized by high neutrophil and lymphocyte counts, respectively, whereas auto-immune diseases and allergies typically show high eosinophil counts.
Previous research has found associations between irAE and increased counts of standard hematology measurements (e.g., absolute lymphocyte count and eosinophil count). 6 In addition, changes in B- and T-cell receptor repertoire show associations with irAE onset and prognosis. 6 However, none of these biomarkers has been extensively validated or is used in clinical practice. Most modern hematology analyzers not only provide blood cell counts, but also measure morphologic characteristics, such as cell size, intrinsic properties, and cell viability, that carry diagnostic and prognostic value. This raises the question whether they may also be of use in the setting of immunological toxicity. [11][12][13] Conventional statistical methods, such as logistic regression, are poorly suited to answering this type of question in complex datasets: when a model contains many variables and only a low number of samples, the estimates of the variables' coefficients are unstable. New advanced statistical and machine learning (ML) methods are able to remove irrelevant variables, thereby reducing the number of variables. In addition, variables of high importance, also known as predictors, can be identified by evaluating the coefficients of the trained model. In this way, ML allows for the possible identification of new biomarkers and the exploration of new horizons in research to aid irAE diagnosis.
The aim of this study was therefore to determine the added value of routinely measured hematology characteristics, modeled through ML, as compared to the standard diagnostic practice. This may aid in the diagnosis of irAE in the ED and understanding of the pathophysiology.

| Study population
This retrospective observational study included all visits to the ED of the University Medical Center Utrecht (UMC Utrecht) between 2013 and 2020 of patients who were being treated with any type of ICI for any type of cancer, until 3 months after cessation of treatment. Because irAE can occur even after cessation of treatment, we chose to include ED visits up to 3 months after treatment with ICI ended. 14 The cutoff of 3 months was chosen after discussion between the authors. If patients had more than one disease episode (defined as a consecutive period with infection-like symptoms), the ED visits for each episode were included separately. If patients visited the ED multiple times during one disease episode (e.g., due to worsening of symptoms), only the first visit was included.
variables could yield new insights into the pathophysiology underlying irAE and in distinguishing irAE from other inflammatory conditions.

KEYWORDS
biomarkers, emergency department, immune checkpoint inhibitor, immune-related adverse events, machine learning

| Data collection
For all ED visits, demographic (age and sex), medication, and hematology data were extracted from the Utrecht Patient Oriented Database (UPOD). In brief, UPOD is a relational database combining clinical characteristics, medication, and laboratory measurements of patients in the UMC Utrecht since 2004. 15 We used hematological variables measured by the CELL-DYN Sapphire hematology analyzer (Abbott Diagnostics). The CELL-DYN Sapphire is a cell counter equipped with a 488-nm blue diode laser and uses multiple techniques, such as electrical impedance, spectrophotometry, and laser light scattering, to measure morphological characteristics of leukocytes (incl. 5-part differential), red blood cells (RBCs), and platelets for both classification and enumeration. Each time a component of a complete blood cell count (CBC) is requested, all data generated by the hematology analyzer are automatically stored in UPOD, including a substantial number of raw and research-only values and background data on cell characteristics, which are made available for research purposes. Only visits with available Sapphire data within the first 4 h after ED presentation were included in this study to ensure we only used data from patients with infection-like symptoms during the ED visit. UPOD data acquisition and management are in accordance with current regulations concerning privacy and ethics.

| irAE label definition
A manual chart review was done for all ED visits within our study population by two of the authors (TVtH and BV). Visits for evidently unrelated conditions were excluded. We recorded both the preliminary and definitive diagnosis. The preliminary diagnosis was defined as the diagnosis made by the treating physician in the ED and was characterized as either suspected irAE or other. The definitive diagnosis was defined as the diagnosis made by the treating physician at discharge from the hospital or at the end of treatment and was characterized as irAE or other. Ambiguous cases were resolved through consensus.

| Model development
Two models were trained to evaluate the added diagnostic performance of the hematology variables for irAE diagnosis. The first model (base) assessed the preliminary diagnosis, sex, and age with logistic regression, thereby imitating clinical practice at the ED, whereas the second model (extended) also included the 77 additional hematology variables. A quality control protocol was performed to remove variables with no additional predictive value during model development: hematology variables with a Pearson correlation of >0.80 or with a low number of unique values (n = 5) were removed. The extended model was trained using lasso regression, an ML method that can automatically reduce the number of variables, thereby reducing the risk of overfitting and aiding the interpretability of the model. Means and standard deviations are shown for normally distributed variables, whereas medians and inter-quartile ranges (IQR) are shown for non-normally distributed variables.
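The quality control step can be illustrated with a short sketch. The Python below is not the authors' R code. It assumes that variables are stored as equal-length numeric lists, that the uniqueness rule means "at most `min_unique` distinct values", that the absolute Pearson correlation is used, and that the first variable of a highly correlated pair is the one retained. The function and parameter names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def quality_control(variables, max_corr=0.80, min_unique=5):
    """Return the names of the variables that pass both filters.

    Assumptions (not taken from the paper): a variable is dropped when it
    has <= min_unique distinct values, or when |r| > max_corr with a
    variable that was already kept.
    """
    kept = []
    for name, values in variables.items():
        if len(set(values)) <= min_unique:
            continue  # too few distinct values
        if any(abs(pearson(values, variables[k])) > max_corr for k in kept):
            continue  # redundant with a variable that is already kept
        kept.append(name)
    return kept
```

Because the retained member of a correlated pair depends on the order of the input, a real analysis would need an explicit rule for that choice.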
Model performance was assessed using cross validation (CV). With CV, the data are split into K partitions (folds), of which K-1 folds are used for training and 1 for testing. This exercise is repeated K times, resulting in K models with K performance estimates. Contrary to the conventional train-and-test split, multiple models are trained on multiple data splits, thereby using all data to assess the model's performance. The lasso algorithm shrinks coefficients, which can become as small as 0, thereby removing variables. The lambda hyper-parameter of lasso determines the degree of shrinkage and was optimized in a double loop cross-validation (DLCV) scheme, also known as nested cross validation (Figure S1). 16 A K of 10 was used for both the CV and DLCV schemes.
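The looping structure of the CV and DLCV schemes can be sketched as follows. This is an illustration in Python, not the authors' R implementation, and it does not fit a lasso model itself: `fit(train_rows, lam)` and `score(model, test_rows)` are hypothetical user-supplied callbacks, and a higher score is assumed to be better.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, k, fit, score, seed=0):
    """Plain K-fold CV: train on K-1 folds, test on 1, repeat K times."""
    folds = k_fold_indices(len(rows), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train = [rows[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [rows[j] for j in test_idx]
        scores.append(score(fit(train), test))
    return scores


def nested_cv(rows, lambdas, fit, score, k=10, seed=0):
    """Double loop CV: the inner loop selects lambda, the outer loop tests it.

    Returns one (selected lambda, outer test score) pair per outer fold.
    """
    outer = k_fold_indices(len(rows), k, seed)
    results = []
    for i, test_idx in enumerate(outer):
        train = [rows[j] for f, fold in enumerate(outer) if f != i for j in fold]
        test = [rows[j] for j in test_idx]

        def inner_mean(lam):
            # Inner CV uses the outer training data only, so the outer
            # test fold never influences the choice of lambda.
            s = cross_validate(train, k, lambda tr: fit(tr, lam), score, seed + 1)
            return sum(s) / len(s)

        best = max(lambdas, key=inner_mean)
        # Refit on the full outer training set with the selected lambda.
        results.append((best, score(fit(train, best), test)))
    return results
```

Keeping the lambda search inside the outer training data is what makes the outer test scores usable as performance estimates.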

| Model evaluation
The discrimination of the models was assessed by plotting receiver operating characteristic (ROC) curves. The area under the ROC curve (AUROC) is a measure of discrimination: an AUROC of 1 indicates a perfect model, whereas an AUROC of 0.5 indicates a random model. The 95% confidence interval (CI) of the AUROC was computed with the R cvAUC package by evaluating the test performances of the two model configurations trained in both CV schemes. 17 Variable coefficients of the ten models trained in the DLCV were evaluated as variable importance (predictors).
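As an illustration of what the AUROC measures, the sketch below computes it as the probability that a randomly chosen irAE visit receives a higher predicted score than a randomly chosen non-irAE visit, with ties counted as one half. This is a plain Python illustration of the quantity only. It is not the cvAUC routine used in the study and does not compute a confidence interval. The function name is hypothetical.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases.

    labels: 1 for the positive class (e.g., irAE), 0 otherwise.
    scores: predicted probabilities or any ranking score.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

A model that assigns the same score to every visit yields 0.5, and a model that ranks every irAE visit above every other visit yields 1.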
The clinical application and value of the trained models were evaluated with both calibration plots and net benefit curves. Calibration plots portray the agreement between predicted probabilities and the observed frequency of irAE. A calibration curve with an intercept of 0 and a slope of 1 shows perfect calibration, whereas a slope of >1 indicates a model that overestimates the outcome and a slope of <1 a model that underestimates it. The 80% and 95% CIs of the calibration plots were generated with the R givitiR package. 18 Net benefit is a measure to evaluate the clinical benefit of a prediction model by comparing the benefit [treating diseased, true positives (TP)] and the cost [treating non-diseased, false positives (FP)]. 19 Net benefit is assessed by subtracting the cost from the benefit for the complete range of prediction threshold values (p_t). Formula 1 shows that the net benefit increases with the number of TP and is penalized by the number of non-diseased (FP), especially when the prediction threshold value increases. In addition to the net benefit, the number needed to treat (NNT) is shown, as it relates to how healthcare professionals consider whether a patient has a specific illness or requires treatment. All analyses were performed in R version 4.1.2. 20
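Assuming Formula 1 is the standard decision curve definition, net benefit at threshold p_t equals TP/N - (FP/N) × p_t/(1 - p_t), where N is the total number of visits and a visit counts as test-positive when its predicted probability is at least p_t. The sketch below implements this assumed definition in Python. It is not the authors' R code, and the function names are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit at a single threshold probability p_t (0 <= p_t < 1).

    Assumed definition: TP/N - (FP/N) * p_t / (1 - p_t).
    """
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(labels)
    # True positives: diseased visits flagged at this threshold.
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    # False positives: non-diseased visits flagged at this threshold.
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def decision_curve(labels, probs, thresholds):
    """Net benefit over a range of thresholds, as plotted in a decision curve."""
    return [(t, net_benefit(labels, probs, t)) for t in thresholds]
```

The weight p_t/(1 - p_t) grows with the threshold, which is why false positives are penalized more heavily at higher thresholds.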

| Post hoc subgroup analysis
To assess whether the identified biomarkers were independent of the baseline clinical variables, we performed a multivariate analysis including the identified biomarkers, age, sex, cancer type, and ICI medication. To reduce the number of coefficients and to remove groups with low prevalence, various cancer types and ICI medications were grouped.
A second post hoc analysis was performed to check whether the identified biomarkers were associated with disease severity as measured by CTCAE grade.

| Patient characteristics
Between 2013 and 2020, 409 ED visits of 257 patients who were treated with ICI and had available blood counts were included in this study (mean ED visits per patient 1.6). The irAE diagnosis of 91 visits was inconclusive from the medical records, and was later adjusted in 24 of these cases. In both the other (n = 268) and irAE (n = 141) subgroups there were more males, 63.1% and 64.5%, respectively (Table 1). Mean age did not differ between the other (62.2) and irAE group (61.7). The combined use of ipilimumab and nivolumab was significantly higher in the irAE group (p < 0.01), whereas the use of nivolumab alone and of pembrolizumab was significantly lower in the irAE group (p < 0.01). An overview of the irAE diagnoses is shown in Table S1.

| Model performance
After removing variables that did not meet our quality control criteria, 53 of the 77 Sapphire variables were used in the extended model (Table S2 and Figure S2). The base model had an AUROC of 0.67 (95% CI 0.60-0.79) and the extended model had an AUROC of 0.79 (95% CI 0.75-0.84), a difference of 0.13. The training performance was marginally higher than the test performance for both the base and the extended model, 0.74 (95% CI 0.72-0.76) and 0.86 (95% CI 0.84-0.87), respectively, suggesting that there was no substantial overfitting. In line with the AUROC metrics, the extended model trained on all data shows the best ROC and precision-recall (PRC) curves (Figure 1).

| Discriminative metrics
To assess the potential value of the extended model in clinical practice, predictions of the base and extended models were evaluated with both calibration and net benefit plots. The extended model showed better calibration than the base model (Figure 2). The 95% CI of the base model is very wide compared to that of the extended model, and the predictions of the extended model are more equally distributed. In addition, decision curve analysis showed improved net benefit of the extended model as compared to the base model over the complete threshold probability range (Figure 3).

| Variable importance
Variable coefficients, as well as the number of times a variable was selected by the extended model, were documented during training and are shown in Figure 4 and Table 2. The preliminary diagnosis was highly predictive for irAE diagnosis in both the base and extended model, with a coefficient of 3.53 ± 0.14 and 2.88 ± 0.18, respectively. The extended model also identified the following Sapphire variables as predictors for irAE diagnosis: number of eosinophils (eos), red blood cell count measured with impedance (rbci), coefficient of variation of neutrophil depolarization (ndcv), and red blood cell distribution width (rdw), of which the latter was negatively associated with irAE. Eos was highly correlated with percentage of eosinophils (peos), and rbci with other red blood cell measurement variables (rbco, hgb, and hct) (Table S2). The sex and age variables were not selected by lasso in any of the ten iterations in the DLCV scheme.
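The bookkeeping behind such a summary can be sketched as follows. The Python below is illustrative and not the authors' code. It assumes that each DLCV iteration yields a mapping from variable name to lasso coefficient, that a variable counts as selected when its coefficient is non-zero, and that the mean and standard deviation are taken over all iterations, with unselected iterations contributing 0. The function name is hypothetical.

```python
import statistics


def summarize_coefficients(fold_coefs):
    """Summarize lasso coefficients across CV iterations.

    fold_coefs: list of dicts (one per iteration) mapping variable name to
    its coefficient; a missing name is treated as a coefficient of 0.
    Returns {name: (times_selected, mean, sd)}.
    """
    names = sorted({name for coefs in fold_coefs for name in coefs})
    summary = {}
    for name in names:
        values = [coefs.get(name, 0.0) for coefs in fold_coefs]
        selected = sum(1 for v in values if v != 0.0)  # non-zero = selected
        mean = statistics.fmean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (selected, mean, sd)
    return summary
```

A variable that is selected in every iteration with a stable sign is a more credible predictor than one that appears in only a few iterations.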

| Post hoc subgroup analysis
After adjusting for age, sex, cancer type (grouped as skin, lung, urological, or other) and ICI medication (grouped as ipilimumab, nivolumab, pembrolizumab, ipilimumab plus nivolumab, or other), we found that three of the four identified variables were still significantly associated with irAE, namely: eos (p-value 0.0144), rbci (p-value 0.0035), and rdw (p-value 0.0003).
In this model we did not find a significant association for ndcv (p-value 0.0781). Furthermore, we did not find an association between the values of the identified variables and the irAE severity as measured by CTCAE grade (Supplementary Figure).

| DISCUSSION
Accurate identification of irAE in patients using ICI in the ED is of vital importance to guide treatment decisions.
With new statistical methods and ML, we explored the possible added diagnostic value of 77 hematological variables measured by the CELL-DYN Sapphire in diagnosing irAE in patients using ICI as compared to standard clinical practice. The extended model showed improvement in discrimination, calibration, and net benefit as compared to the base model, indicating that the hematological variables indeed have added value in the diagnostic process of identifying irAE in patients using ICI in the emergency department setting. However, due to the low number of values of the base model and the good predictive performance of the preliminary diagnosis, the predictions of the base model were not equally distributed. The net benefit of the extended model was better than that of the base model, especially in the therapeutic range around 25%. The exact threshold for the number needed to treat will vary depending on the characteristics of the individual patient and the severity of the symptoms. A false-positive diagnosis of irAE will lead to cessation of the checkpoint inhibitor, which would possibly withhold a life-saving therapy from the patient. On the other hand, a false-negative diagnosis will lead to delayed treatment for irAE, which is potentially fatal. 21
FIGURE 1 ROC curves of the test predictions of the base (red line) and extended (blue line) models. Predictions on the test folds of the double loop cross validation scheme were concatenated to draw the ROC curves. The black dot denotes the discriminative metrics of the preliminary diagnosis. The diagonal line shows the performance of a random model.
Of all variables, the preliminary diagnosis was deemed highly important by both the base and extended models, indicating that the first diagnosis of the physician is a very good proxy for irAE diagnosis. Both age and sex showed low importance in the base model and were not selected by the lasso algorithm in any of the 10 DLCV iterations, which is in line with existing evidence. 22 Interestingly, only a few of the 77 hematological variables were selected by the lasso algorithm in each iteration. This diagnostic study cannot determine causality. However, a causal relationship can be postulated based on the literature. Eosinophils are thought to play a pathogenic role in auto-immune disorders and are known to be associated with irAE. 6 Neutrophil depolarization is a feature of neutrophil activation, which has also been associated with auto-immunity, but this has not been studied extensively. 23 We found the red blood cell distribution width (rdw) to be negatively associated with irAE. Increased rdw is known to be associated with infections, which are arguably the most likely alternative diagnosis when considering irAE. 24
FIGURE 2 Calibration plots of both the base and extended models. Both calibration curves computed with the number of expected (model predictions) and observed irAE are shown, as well as the 80% and 90% confidence intervals (CI). The segments on the lower part of both plots indicate the computed predictions for each model.
Our study has some limitations. The population is highly heterogeneous, with multiple types of tumors and treatments. This may have hampered the identification of a specific predictor for a particular subset of patients. Unfortunately, we did not have enough data to stratify patients based on either cancer type or medication. Even though the post hoc subgroup analysis showed significant results for 3 of the 4 identified variables after adjusting for the baseline characteristics, future research is needed to validate these results. Moreover, because our data were collected during routine care, the diagnoses were defined or changed retrospectively.
To our knowledge, this study is one of the first of its kind in exploring the diagnostic potential of these raw and research-only hematological variables using ML in the emergency department setting. Since the raw data from this type of hematology analyzer are not ubiquitously available, we were not able to externally validate our results. As a result, this study has to be viewed as exploratory and more research is required before these hematological variables, either individually or in a model, can be used in clinical practice. The diagnostic performance of such a model might be improved by combining hematological variables with other new sets of biomarkers, as well as the preliminary diagnosis.
This study raises the question whether the hematological variables might also have diagnostic value in the setting of other diseases and treatments. [11][12][13] As they are inexpensive and relatively easily and rapidly obtained in general blood counts, they could be an interesting new tool in future diagnostic research. As shown here, a clinical diagnostic model may aid the clinical decision-making process of a physician by providing a continuous prediction score that can be combined with the professional interpretation by a clinical chemist to accommodate integral diagnostics of a patient's clinical state. 25 Instead of looking at differences between patients using cross-sectional data, within-patient differences may be a better approximation of a patient's health trajectory, potentially allowing for prediction of the incidence of irAE at the start of ICI treatment.
Overall, we show that hematological variables have diagnostic value in the identification of irAE in patients using ICI at the ED and that they add value compared to standard diagnostic practice. Our results suggest new directions for further research using (advanced) hematological variables for irAE diagnosis in the emergency setting.

FUNDING INFORMATION
None.

CONFLICT OF INTEREST STATEMENT
MN is employed by SkylineDx, Rotterdam and receives a PhD fellowship from SkylineDx, Rotterdam. KS: Consulting/advisory relationship: Bristol Myers Squibb, Merck Sharp & Dohme, Abbvie, Pierre Fabre, Novartis. Honoraria received: Novartis, Roche, Merck Sharp & Dohme. Research funding: TigaTx, Bristol Myers Squibb, Philips, unrelated to this project. All paid to institution and outside the submitted work. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

DATA AVAILABILITY STATEMENT
Data from the UMC are available upon reasonable request from the corresponding author (S. Haitjema).

ETHICS STATEMENT
This study was performed in accordance with the Declaration of Helsinki and the ethical guidelines of our institution. The institutional review board of the UMC Utrecht approved this study (reference number 20-591/C) and waived the need for informed consent as only pseudonymized data were used for this study. Data collection and handling were conducted in accordance with European privacy legislation (GDPR).