Evaluation of the Sysmex DI‐60 digital morphology analyzer on Wright‐stained samples with a focus on prevalence‐dependent quality indicators

This study aims to evaluate the trueness of the DI‐60 Digital Cell Imaging Analyzer on Wright‐stained samples with a focus on prevalence‐dependent quality indicators for differential blood counts requested from non‐hematology wards.

plasma cells). If the pre-test probability of these conditions is increased, manual microscopic processing may be recommended.

KEYWORDS
DI-60, differential count, hematology, prevalence, trueness

1 | BACKGROUND
2][3][4] In addition to manual microscopic differential counting, which is considered the gold standard, automated solutions are now widely used. Devices based on flow cytometry, in which cells are assigned on the basis of physical, chemical or immunological properties, have been deployed for quite a long time. 5 In addition, digital morphology analyzers are available that use a neural network to recognize and assign cells of a digitized blood smear. 6,7 One such system is the Digital Morphology Analyzer DI-60 (Sysmex, Kobe, Japan). This system can be integrated into a hematology analysis line, thereby enabling time-efficient sequential processing of hematology parameters. 6][10][11][12] Most of the evaluations used selected healthy and pathological samples. 8,13 However, the evaluation of clinical performance indicators in particular requires the use of real-world data to evaluate a scenario that most closely resembles clinical practice.
The probability that a certain cell type detected (e.g., a basophil) is indeed a basophil depends not only on the sensitivity and specificity of the diagnostic tool but also on the prevalence, that is, the expected frequency of occurrence of the cell. Because basophils are a low-prevalence cell type, the specificity of the test system used to detect them must be near perfect to avoid producing a significant number of false-positive results. With a specificity of only 99%, 1 in 100 non-basophil cells would be misclassified as a basophil. Since the normal range of basophils is 0%-1%, the false positives could easily outnumber the true positives.
The positive predictive value (PPV), that is, the probability that a cell detected as "basophil" is actually a basophil, would therefore be less than 50% in this example. The PPV would rise with increasing prevalence, for example, when testing samples from patients with suspected myeloproliferative neoplasms. This implies that test systems should always be verified under the particular conditions in which they will be operating.
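The dependence of the PPV on prevalence described above follows from Bayes' rule and can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the perfect sensitivity and the three prevalence values are assumptions chosen for the example, not results of this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: P(cell truly belongs to the class | classified as the class)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed illustration values (not study data): perfect sensitivity,
# 99% specificity, and prevalences within and above the 0%-1% range.
for prevalence in (0.005, 0.01, 0.05):
    ppv = positive_predictive_value(1.0, 0.99, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

With the specificity held fixed, the printed PPV rises as the assumed prevalence rises, which is the reason for verifying a test system in the population in which it will be used.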
The staining method in particular seems to have a significant influence on the performance of the DI-60 system. 14 This is not surprising, as it affects the visual parameters of the individual cells and thus influences the matching of detected cells with the database underlying the AI. Most evaluation studies published so far are based on the use of Wright-Giemsa staining 9,15 or May-Grünwald-Giemsa staining. 6,11 In contrast, only few data are available for the Wright staining used locally.
The aim of this work was therefore to evaluate the trueness of the DI-60 system for the digital generation of Wright-stained differential blood counts from peripheral blood collected primarily at non-hematology wards at a central European tertiary hospital. Particular attention was paid to prevalence-related quality indicators, such as PPV, negative predictive value (NPV), and accuracy.

2.1 | Study design and participants
A total of 299 samples from 284 individuals collected at two different time points (N = 198 consecutive samples at time point T1, N = 101 consecutive samples at T2) were included in this prospective performance evaluation study of a CE-labelled in-vitro diagnostic device (IVD). The object of evaluation was the trueness of the results as specified in CLSI H20-A2. The procedure deviated in the preparation of the manual reference counts, which, to emphasize the real-life aspect, did not consist of 200-cell differentiations prepared by two different technicians but of 100-cell differential blood counts. This procedure was discussed before and considered acceptable. 16,17 In brief, only samples that were sent to the First Hematologic Laboratory, Department of Laboratory Medicine, Medical University of Vienna, for diagnostic purposes and required a microscopic evaluation were used. Hence, participation in the study did not entail any additional risks, and it was therefore reasonable to refrain from obtaining informed consent. However, the study protocol was reviewed and approved by

2.2 | DI-60 digital morphology analyzer
Digital morphological assessment was performed on a DI-60 digital morphology analyzer (Sysmex, Kobe, Japan) integrated into an XN-9100 hematology analyzer series (Sysmex). For this, blood smears from EDTA-whole blood samples requiring morphological examination were prepared by the SP-100 slide preparation unit and stained onboard using a Wright stain (CellaVision, Lund, Sweden). After this, the slides were evaluated both manually (see Section 2.3) and on the DI-60 system. Cells detected by the DI-60 system were automatically assigned after recognition by the CellaVision software. The allocation was based on parameters specified by the developer applying a neural network. 6 Subsequently, manual post-processing took place. In this process, those cells that had not been classified correctly or unambiguously were assigned manually. For parameters of erythrocyte morphology, samples were graded manually on a four-level scale (not present, sporadically present, "+", "++").

2.3 | Total blood count and microscopic evaluation
Total blood counts (red blood cell count, hemoglobin, erythrocyte indices, platelet count, white blood cell count, differential count) were performed on Sysmex XN hematology analyzers. Manual microscopic evaluation (100-cell differential counts) was performed by trained biomedical technicians (a) if explicitly requested by the clinician, or (b) if an automatic differential count on the Sysmex XN was requested but quality-flagged by the analyzer (flags "positive morph", "positive diff", or "positive count"). Parameters of the erythrocyte morphology were graded on a five-level scale (not present, sporadically present, "+", "++", "+++"). All analyses were performed at the Department of Laboratory Medicine, Medical University of Vienna, in an ISO 9001:2015-certified and ISO 15189:2012-accredited environment.

3.1 | Sample characteristics
One hundred and fifty-six (52%) of the samples were collected from female patients; the median age of the patients was 56 (41-67) years.
Samples were sent from the following departments/wards: medical

Note: Sensitivities were calculated as the number of true positives (TP) divided by all visually re-classified events of the respective category (TP + false negatives [FN]). Specificities were calculated as the number of true negatives (TN) divided by the sum of all events except for the category under investigation (TN + FP). Positive predictive values (PPVs) and negative predictive values (NPVs) were calculated based on the prevalences present in the data set (events in cell category divided by N of all events, which was 49 934 comprising classifiable events and 7994 artefacts). Accuracies are defined as the percentage of all correct classifications (TP + TN) by the N of all events. Abbreviations: LGL, large granular lymphocytes; WBC, white blood cells.
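The definitions in the note above can be applied to a confusion matrix one category at a time. The following is a minimal sketch under stated assumptions: the function name, the nested-dictionary layout, and the toy counts are hypothetical and are not taken from Table S1 or from the DI-60 software.

```python
def one_vs_rest_metrics(confusion, category):
    """Quality criteria for one category.

    confusion[reference][predicted] holds event counts, where "reference" is
    the visually adjusted class and "predicted" the automatic classification.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    tp = confusion[category].get(category, 0)
    fn = sum(confusion[category].values()) - tp
    fp = sum(row.get(category, 0)
             for ref, row in confusion.items() if ref != category)
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical toy counts, for illustration only.
toy = {
    "neutrophil": {"neutrophil": 90, "lymphocyte": 8, "basophil": 2},
    "lymphocyte": {"neutrophil": 5, "lymphocyte": 44, "basophil": 1},
    "basophil": {"neutrophil": 0, "lymphocyte": 0, "basophil": 3},
}
print(one_vs_rest_metrics(toy, "basophil"))
```

In the toy data every basophil is found, yet half of the events labelled "basophil" are false positives, because the few misclassified neutrophils and lymphocytes are as numerous as the true basophils.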

3.3 | Comparison of visually adjusted DI-60 results and manual-microscopic differential counts
In the next step, these visually adjusted differential counts from the DI-60 were compared to manual-microscopic differential counts performed on the same sample. Agreements were compared by Passing-Bablok regressions (continuous data, see Figure 1) as well as by Cohen's Kappa (dichotomized data). Due to high expected inter-rater variability, band and segmented neutrophils were combined into "neutrophils" for the following analyses. Moreover, "lymphocytes" comprise visually unsuspicious lymphocytes as well as atypical lymphocytes and LGLs. As only a few granulocyte precursors and blast cells were seen, these were combined into the category "precursors and progenitors". Spearman's ρ, as a measure of variability, was >0.9 only for neutrophils and lymphocytes. The highest variability was recorded for basophils, with a ρ of 0.29 for absolute and of 0.38 for relative counts.
The agreement in the detection of the clinically relevant categories (see Section 2.4) between adjusted DI-60 counts and manual-microscopic differential counts can be derived from Table 2. In brief, the agreement was good for the detection of neutropenia, eosino-

4 | DISCUSSION
In the on-site evaluation of diagnostic test systems, particular consideration must be given to local conditions. One of the most relevant parameters in this respect is the individual prevalence of diseases and clinical presentations, as it influences the pre-test probability for the diagnostic findings.

TABLE 2 Interrater agreement (Cohen's Kappa and its 95% confidence interval [CI]) between visually adjusted DI-60 counts and manual-microscopic differential counts was calculated for five defined conditions (neutropenia: neutrophils <1.5 × 10^9/L, lymphocytosis: lymphocytes >4.0 × 10^9/L, monocytosis: monocytes >1.0 × 10^9/L, eosinophilia: eosinophils >0.5 × 10^9/L, and basophilia: basophils >0.2 × 10^9/L).
PPVs, NPVs, as well as the overall accuracy (= 1 − error rate [%]), for example, depend on the actual prevalence of the respective cell category in our data set. When comparing AI-classified results with visually adjusted results on the DI-60, the PPV describes the probability that an AI-classified cell is a true positive, that is, that it is confirmed to be the respective cell after visual adjustment. Low-prevalence cell types require extraordinarily high specificities in order not to be misclassified, as the low number of true positives might easily be outnumbered by only sparse false positives. Moreover, for low-prevalence cell categories, a high specificity leads to a lower error rate and, thus, higher accuracies, whereas cell categories with a high prevalence have to be identified with an appropriate sensitivity to achieve a low total error rate. At these low prevalence rates, the number of false positives (1 − specificity) contributes more to the total error than the already expectedly low number of false negatives (1 − sensitivity). Under this premise, it can be explained why basophils, which are low-prevalent (0.6% of all events) and have a much better sensitivity than non-atypical lymphocytes (97.9% vs. 92.1%), yield only a slightly better overall accuracy (99.3% vs. 98.3%) at an identical specificity of 99.3%. Likewise, despite better sensitivities and specificities than neutrophils, the probability that an AI-classified eosinophil or basophil is a false positive is much higher (27.6% and 54.9% vs. 1.4%). In conclusion, the AI-driven classification of cells might give a good overall impression of the differential count but might be erroneous if small changes in the percentage of low-prevalence cells are to be identified.
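The argument above can be made explicit by splitting the total error into a false-negative share and a false-positive share, each weighted by prevalence. The sketch below uses a hypothetical function name and the rounded basophil figures quoted in the text (sensitivity 97.9%, specificity 99.3%, prevalence 0.6%); because these inputs are rounded, the output only approximates the values derived from the underlying event counts.

```python
def error_budget(sensitivity, specificity, prevalence):
    """Split the total error into false-negative and false-positive shares."""
    fn_share = prevalence * (1.0 - sensitivity)          # missed true cells
    fp_share = (1.0 - prevalence) * (1.0 - specificity)  # wrongly assigned cells
    accuracy = 1.0 - fn_share - fp_share
    # Probability that a cell assigned to the class is a false positive (1 - PPV).
    false_positive_prob = fp_share / (fp_share + prevalence * sensitivity)
    return fn_share, fp_share, accuracy, false_positive_prob


# Rounded basophil figures quoted in the text above.
fn, fp, acc, fpp = error_budget(0.979, 0.993, 0.006)
print(f"FN share {fn:.4%}, FP share {fp:.4%}")
print(f"accuracy {acc:.1%}, P(false positive | classified) {fpp:.1%}")
```

With these inputs the false-positive share exceeds the false-negative share by more than an order of magnitude, and the accuracy comes out near the reported 99.3%. The probability of a false positive lands in the mid-50% range, close to but not identical with the reported 54.9%, which is expected given the rounding of the inputs.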
The results of visually adjusted DI-60 counts were largely consistent with those of manual-microscopic differential counts. For all cell categories except for progenitors/precursors and normoblasts, Passing-Bablok regressions revealed insignificant or diagnostically irrelevant systematic additive or proportional errors. However, some cell categories, namely basophils and monocytes, showed considerable variability, yielding a Spearman's ρ below 0.7. Furthermore, a discrepancy between automated and manual differential counts was observed for segmented and band neutrophils. Because the classification of band neutrophils is observer-dependent, combining band and segmented neutrophils was considered to be more reliable. In fact, the band neutrophil count was found to be of little value already years ago, especially in the diagnosis of bacterial infections, which may justify our approach. 18 Moreover, the DI-60 tends to underestimate progenitor and precursor cells compared to manual-microscopic counts. This could possibly be the result of the restricted field of view of the digital morphology analyzer. During the preparation of the blood smear, progenitor and precursor cells often collect at the margins of the smear due to their size and the associated physical properties. However, for technical reasons, these margins cannot be inspected by the DI-60, which can lead to a slightly different distribution of cells within the differential count.
These findings are largely consistent with the previous literature, which also reported limitations in the correct automated identification of basophils, eosinophils, certain precursors and, to some extent, of monocytes, as well as a higher variability between adjusted DI-60 counts and manual-microscopic differential counts for these lower-prevalence cell categories.

Note: N: number of samples in which the respective parameter was at least sporadically detected by manual microscopy. % Agreement: percentage of cases in which both methods graded identically (manual-microscopic grades "+++" and "++" were combined, as the DI-60 features only three distinct positive grades).

the Ethics Committee of the Medical University of Vienna (EK 1689/2022). The First Hematologic Laboratory receives samples from all departments of the General Hospital of Vienna. Exempt from this are, among others, submissions from the Pediatric Department and the Department of Medicine I (harboring the Divisions of Hematology, Bone Marrow Transplantation and Infectiology) on working days until 15:00, which are processed elsewhere. The following inclusion and exclusion criteria were applied:

2.1.1 | Inclusion criteria
1. Consecutive EDTA-whole blood (collected at two different time points) sent to the First Hematologic Laboratory at the Department of Laboratory Medicine, Medical University of Vienna, for routine blood counts
2. Microscopic differential count either requested by the clinician or microscopic evaluation required according to lab guidelines

2.1.2 | Exclusion criteria
1. Insufficient material
2. Total WBC count <1.0 × 10^9/L (since no manual differential count is performed routinely according to the standard operating procedures at the Department of Laboratory Medicine, Medical University of Vienna, in samples with WBC counts <1.0 × 10^9/L)
3. Automatic differential count not possible (<100 cells detected in each of two successive counting attempts)

2.5 | Statistical analysis
Continuous data are presented as median (interquartile range), categorical data as counts (percentages). Paired continuous data were compared by Passing-Bablok regressions. An intercept of the regression line that differs significantly from 0 indicates a systematic additive error, whereas a slope that differs significantly from 1 might be related to a systematic proportional error. If applicable, intercepts and slopes are given together with their 95% confidence interval (CI). Agreements of paired categorical observations were evaluated by Cohen's Kappa with quadratic weights.
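The two agreement statistics named above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, only point estimates are computed (no 95% CIs), and the Passing-Bablok part assumes positively correlated methods, as the procedure itself does.

```python
import math
from statistics import median


def passing_bablok(x, y):
    """Point estimates (slope, intercept) of a Passing-Bablok regression."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points: slope undefined
            slope = math.copysign(math.inf, dy) if dx == 0 else dy / dx
            if slope != -1:  # slopes of exactly -1 are discarded
                slopes.append(slope)
    slopes.sort()
    k = sum(s < -1 for s in slopes)  # offset that shifts the median
    m = len(slopes)
    if m % 2:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic weights for integer grades 0..n_categories-1."""
    n = len(rater_a)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_categories))
           for j in range(n_categories)]
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2 / (n_categories - 1) ** 2  # disagreement weight
            num += w * observed[i][j]   # observed weighted disagreement
            den += w * row[i] * col[j]  # disagreement expected by chance
    return 1.0 - num / den


# Hypothetical paired values, for illustration only.
print(passing_bablok([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))
print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 2))
```

For dichotomized data (two categories), the quadratic weights reduce to 0 and 1, so the weighted and unweighted kappa coincide. The kappa function is undefined when both raters place every sample in the same single category, because chance disagreement is then zero.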

3.4 | Agreement between DI-60 results and manual-microscopic differential counts depending on Sysmex flags
Subsequently, we investigated whether the aforementioned suboptimal agreements improved when Sysmex XN-2000 flags were taken into account. If only those samples that lacked the Sysmex flag "Monocytosis" were included (N = 293), Cohen's Kappa for monocytosis remained largely unchanged (0.47 [0.28-0.67]). Likewise, the detection of precursors and/or progenitors was not improved after dichotomizing for those with or without a Sysmex flag "IG (immature granulocytes) present" (0.24 [−0.04 to 0.53], N = 59 with positive flag; 0.19 [0.04-0.33], N = 240 with negative flag). Similarly, exclusion of those with a positive flag "Basophilia" did not increase Cohen's Kappa for the detection of basophilia (0.28 [0.04-0.51], N = 296).

FIGURE 1 Passing-Bablok regressions for relative and absolute numbers of white blood cells (WBC), as well as for relative numbers of progenitor/precursor cells and normoblasts compared between visually adjusted DI-60 counts (vertical axes) and manual-microscopic differential counts (horizontal axes).
Quality criteria for automatic classifications of the DI-60 system compared to the results after visual adjustment.
3.2 | Comparison of the DI-60 raw values and manual re-classified differential counts
As mentioned above, the DI-60 software classifies cells applying pre-defined AI algorithms. Subsequently, misclassifications are to be corrected manually. Of the 49 934 recorded events (WBCs, normoblasts, giant thrombocytes and platelet aggregates, smudge cells and artefacts), 41 118 (82.3%) were correctly classified by the AI algorithms. A confusion matrix is provided in the supplement (see Table S1). Specificities were high for all cell categories (>98%-99%), except for segmented neutrophils yielding a specificity of 96.4%. However, most false positives (771/994, 71.4%) were visually adjusted as band neutrophils. When classifying band and segmented neutrophils together as "neutrophils", the specificity of this new category increased to 98.8%.

TABLE 1 Interrater agreement (DI-60 visually adjusted vs. manual-microscopic erythrocyte morphology) for different morphological erythrocyte parameters.
TABLE 3