Expert-level automated malaria diagnosis on routine blood films with deep neural networks

Over 200 million malaria cases globally lead to half a million deaths annually. Accurate malaria diagnosis remains a challenge. Automated image processing approaches to analyze Thick Blood Films (TBF) could provide scalable solutions for urban healthcare providers in the holoendemic malaria sub-Saharan region. Although several approaches have been attempted to identify malaria parasites in TBF, none have achieved negative and positive predictive performance suitable for clinical use in the west sub-Saharan region. While malaria parasite object detection remains an intermediary step in achieving automatic patient diagnosis, training state-of-the-art deep-learning object detectors requires the labor-intensive process of human experts labeling a large dataset of digitized TBF. To overcome these challenges and to achieve a clinically usable system, we show a novel approach that leverages routine clinical-microscopy labels from our quality-controlled malaria clinics to train a Deep Malaria Convolutional Neural Network classifier (DeepMCNN) for automated malaria diagnosis. Our system also provides total Malaria Parasite (MP) and White Blood Cell (WBC) counts, allowing parasitemia estimation in MP/μL, as recommended by the WHO. Prospective validation of the DeepMCNN achieves sensitivity/specificity of 0.92/0.90 against expert-level malaria diagnosis. The PPV/NPV performance of our approach is 0.92/0.90, which is clinically usable in our holoendemic settings in the densely populated metropolis of Ibadan, located within the most populous African country (Nigeria), which has one of the largest burdens of Plasmodium falciparum malaria. Our openly available method is of importance for strategies aimed at scaling malaria diagnosis in urban regions where daily assessment of thousands of specimens is required.


| INTRODUCTION
Plasmodium falciparum malaria remains one of the greatest global health burdens with over 219 million cases globally in 2017. 1 It is a widely prevalent disease, especially ubiquitous in parts of sub-Saharan Africa. In 2017 there were approximately 435 000 deaths due to malaria worldwide, with the African region accounting for 93% of these deaths, mostly among children. 1 Early diagnosis is important for reducing the mortality rate due to malaria. Although a range of techniques has been developed for the diagnosis of malaria, 2,3 conventional light microscopy on Giemsa-stained thick and thin blood films remains the gold standard. 1 Techniques such as polymerase chain reaction, flow cytometric assay 4 and fluorescence-dye based 5 approaches lack a universally standardized methodology, present high costs, and require quality control improvement. 2 While some of these approaches have shown promising results independently, they require infrastructure (eg, cold chain logistics for preservation of reagents) which makes them poorly suited for the resource-constrained sub-Saharan African region. Other methods based on lateral flow assays, known as malaria rapid diagnostic tests, are not ubiquitous in all settings, do not provide estimates of parasitemia and have not been able to outperform the well-established TBF clinical microscopy for malaria diagnosis. 6
Thick Blood Film (TBF) microscopy remains the internationally recognized gold standard. 1 Thick blood film clinical microscopy requires a trained human microscopist to visually inspect Giemsa-stained blood films under a light microscope, to identify and count the P. falciparum parasites. Unfortunately, visual inspection of thick blood films strongly relies on the availability of trained personnel, and it is time-consuming and subject to human error caused by fatigue and cognitive overload in busy clinical-microscopy services. As with other visual-based diagnostic techniques, accuracy depends on individual technician performance, which makes standardization difficult and reliability poor. 7 A wrong diagnosis of malaria can have negative consequences for patients and for anti-malarial therapy resources. Additionally, shortcomings in the availability of trained personnel in certain regions of the world can lead to over-treatment, which subsequently leads to parasite resistance.
The World Health Organization (WHO) has persistently encouraged the development of rapid and efficient diagnostic testing that will allow proper treatment to be given on time. Here we address the problem of automated diagnosis in color brightfield digitized images of Giemsa-stained thick blood films captured with a 100x/1.4 NA oil-immersion objective lens. The thick blood film, a concentration technique, is desirable for analysis compared to thin blood smears or red-cell monolayers, because a larger volume of blood is examined, and thus potentially higher parasite density per image field, providing greater sensitivity.
Although a series of classical computational vision and machine learning approaches have been used to identify various types of malaria parasites in digitized thin Giemsa-stained blood smears 8,9 or fluorescence-dye based red-cell monolayers, 5 only a few have attempted parasite detection in digitized thick blood films. 10,11 More recently, some studies have attempted to use deep learning classifiers 12-14 to distinguish malaria parasites from staining artifacts, which remains a challenge. 15 Advances in deep learning methods for object detection in natural images 16 offer great potential for malaria parasite detection in blood films. 17 However, training such object detectors involves an extremely laborious process in which human experts label ring malaria parasites in large numbers of fields of view from digitized thick blood films. Moreover, parasite object detection remains an intermediary step in achieving automatic patient diagnosis, which requires the analysis of multiple fields of view (FoV) of the TBF. The inherent parasite false positives detected by the computer vision approaches need to be taken into consideration when establishing such a final diagnosis.
To the best of our knowledge, only one group has attempted to do this on samples from patients admitted in clinics for malaria testing. 12,13 Briefly, their approach consisted of classifying a sample as malaria positive if the number of parasites detected by a deep learning model in 300 FoV surpassed a certain empirically determined threshold. Such an approach is likely to misclassify samples with low parasite counts.
To overcome these challenges and achieve a clinically usable Positive and Negative Predictive Value (PPV/NPV) performance, here we show a novel way to leverage routine clinical-microscopy diagnostic labels. These labels come from our quality-controlled malaria clinics and are used to train a Deep Malaria Convolutional Neural Network classifier (DeepMCNN) suitable for automated malaria diagnosis. We prospectively validate the DeepMCNN against expert-level diagnosis and assess its performance across the all-year-round malaria context of our clinical healthcare settings in the densely populated metropolis of Ibadan, located within the most populous country of Africa (Nigeria), with one of the largest burdens of P. falciparum malaria.

| Study site
All study participants were recruited under the auspices of the Childhood Malaria Research Group (CMRG) at the 850-bed tertiary hospital, University College Hospital (UCH) in the city of Ibadan, Nigeria, in west sub-Saharan Africa. Ibadan is a densely populated urban metropolis in Nigeria with about 5 million inhabitants. Malaria transmission and severe disease occur throughout the year. Although severe malaria syndromes are predominant in children under 5 years, there is still a large burden of severe disease in children up to 16. 18-20

| Malaria screening
Malaria parasites (MPs) were detected and counted using human-expert operated microscopy following Giemsa staining of thick and thin blood films. The criterion for declaring a participant to be malaria parasite-free was no detectable parasites in 100 high-power (100x) fields in thick films. We validated the diagnosis outcome by randomly selecting one in ten thick blood films for independent review by local external experienced senior malaria-microscopy technologists. Parasite density (PD), in malaria parasites per microliter (MPs/μL), was calculated by dividing the number of observed MPs by the number of counted white blood cells (WBC) and multiplying by 8 × 10^3. 15
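The parasite density calculation described above can be written as a short function. This is a minimal sketch: the function name and argument names are illustrative, and the default of 8000 WBC per μL is the assumed average stated in the text.

```python
def parasite_density(mp_count: int, wbc_count: int, wbc_per_ul: float = 8000.0) -> float:
    """Parasite density in MP/uL from manual MP and WBC counts.

    Assumes an average of `wbc_per_ul` white blood cells per microliter
    (8 x 10^3 in the text). Names are illustrative, not from the source.
    """
    if wbc_count <= 0:
        raise ValueError("at least one WBC must be counted")
    if mp_count < 0:
        raise ValueError("MP count cannot be negative")
    # Multiply first so that simple integer inputs give exact results.
    return mp_count * wbc_per_ul / wbc_count


if __name__ == "__main__":
    # 10 parasites counted against 200 WBC.
    print(parasite_density(10, 200))
```

For example, 10 parasites counted against 200 WBC corresponds to 400 MP/μL under the 8000 WBC/μL assumption.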

| Data acquisition and pre-processing
We captured images using an upright brightfield microscope (Olympus BX63) fitted with a 100X/1.4 NA objective lens (MPLAPON100XO), a motorized x-y sample positioning stage (Prior Scientific) and a color camera (Edge 5.5c, PCO) to capture images of Giemsa-stained thick blood smears prepared in our clinics. For each sample we captured 100 non-overlapping FoV, each covering an area of 166 μm × 142 μm. Such large numerical aperture objective lenses have limited depth of field. To capture the entire thickness of the blood film (typically ~5 μm), a z-stack of 14 focal planes with a separation of 0.5 μm was captured for each field. With a camera exposure time of 5 milliseconds the total acquisition time per sample was approximately 5 minutes (Figure S1 in Appendix S1). To reduce the data volume and render images into a form more amenable to annotation, after white balancing, z-stacks were projected onto a single plane using a wavelet-based Extended Depth of Field (EDoF) algorithm. 21 Briefly, each focal plane was decomposed using a 12-level "sym8" wavelet, and for each level and sub-band the coefficients with the maximum values were chosen among the 14 decomposed focal planes. Following a spatial and a sub-band consistency check, the inverse wavelet transform was applied to the selected coefficients.
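The acquisition figures quoted above imply a few derived quantities that are easy to check. The sketch below only does arithmetic on the numbers given in the text; the variable names are illustrative, and the reading that the z-stack span is (planes − 1) × step is an assumption about how the planes are spaced.

```python
# Acquisition parameters quoted in the text.
N_FOV = 100                 # fields of view per sample
FOV_W_UM, FOV_H_UM = 166.0, 142.0
N_PLANES = 14               # focal planes per z-stack
Z_STEP_UM = 0.5             # separation between planes
EXPOSURE_S = 0.005          # camera exposure per frame

# Axial range covered by one z-stack (assumes evenly spaced planes).
z_span_um = (N_PLANES - 1) * Z_STEP_UM

# Raw frames acquired per sample before EDoF projection.
frames_per_sample = N_FOV * N_PLANES

# Time spent on camera exposure alone.
exposure_total_s = frames_per_sample * EXPOSURE_S

# Total imaged area per sample, in square millimeters.
area_mm2 = N_FOV * FOV_W_UM * FOV_H_UM / 1e6

if __name__ == "__main__":
    print(z_span_um, frames_per_sample, exposure_total_s, area_mm2)
```

Under these assumptions the stack spans 6.5 μm, which covers the ~5 μm film, and exposure accounts for only about 7 seconds of the roughly 5-minute acquisition, which suggests that stage movement and data handling take up most of the time.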

| Parasite and white blood cell detection
We tested the use of deep learning-based object detection methods to identify both P. falciparum parasites and white-blood-cell (WBC) nuclei in the digitized EDoF thick blood film images. Current state-of-the-art deep learning object detectors usually follow two stages: first, a sparse set of region proposals that should contain all foreground objects is generated while excluding most of the background locations. 22 Next, these proposals are fed to a CNN providing each region with a class label probability and a refined bounding box. 23-26 In contrast, simpler and faster one-stage detectors 27 are applied over a regular, dense sampling of possible foreground object locations. Among these detectors, RetinaNet 28 exceeded the performance of previous two-stage approaches thanks to a focal loss function designed to give more attention to difficult examples.
We trained and tested three of these state-of-the-art object detectors: Faster R-CNN 25 ; R-FCN 29 ; and RetinaNet. 28
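To illustrate how a focal loss gives more attention to difficult examples, the sketch below implements the binary focal loss for a single prediction. The defaults γ = 2 and α = 0.25 are the values reported in the RetinaNet paper; whether the same settings were used in this study is not stated in the text, so they are an assumption here.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one predicted foreground probability `p`.

    y is the ground-truth label (1 = foreground, 0 = background).
    gamma and alpha defaults are assumptions taken from the RetinaNet paper.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With γ = 0 and α = 1 the expression reduces to the ordinary cross-entropy for a positive example, which is a convenient sanity check.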

| Automated diagnosis with negative adjustment
In this previously suggested approach, 12,13,30 only the negative samples from the training set (Table 1) were used. The trained RetinaNet parasite detection model described in the previous section was applied to these samples. Next, the average number of false positives per 100 image fields, $\mathrm{mean}_{fp}$, and its SD, $\mathrm{std}_{fp}$, were computed. A threshold θ computed from the $\mathrm{mean}_{fp}$ and $\mathrm{std}_{fp}$ values was then applied to the test samples for diagnosis.
In more detail, we obtained a variable number of potential malaria parasites (Np) from the 100 FoVs of each sample (Figure 1). These were then cropped from the FoV using a 64 × 64 pixel window corresponding to 4.2 × 4.2 μm, which is large enough to encompass a malaria ring parasite (3 × 3 μm). A VGG-19 model 31 was trained to classify these stacks of potential parasite images as positive or negative. The weights of the convolutional layers were initialized with weights from a VGG-19 model pre-trained on the ImageNet dataset. 32 For each stack of variable size, the Np feature vectors corresponding to the input of the fully connected layers are averaged into one single feature vector (Figure S3 in Appendix S1). This allows a variable number of potential parasite images as an input for the MCNN classifier.
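The averaging of a variable number of feature vectors into one fixed-length vector can be sketched in plain Python. This is an illustration of the pooling step only: the function name is hypothetical, and in the actual model the feature vectors would be the flattened VGG-19 convolutional outputs rather than short lists.

```python
from typing import List, Sequence


def average_features(features: Sequence[Sequence[float]]) -> List[float]:
    """Average Np feature vectors element-wise into a single vector.

    The output length depends only on the feature dimension, not on Np,
    so stacks of any size map to the same classifier input shape.
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    n = len(features)
    return [sum(f[j] for f in features) / n for j in range(dim)]


if __name__ == "__main__":
    # Two crops and five crops both yield a 3-dimensional pooled vector.
    print(average_features([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
    print(len(average_features([[0.0, 1.0, 2.0]] * 5)))
```

Because the pooled vector has the same length for any Np, the fully connected layers can be trained with one diagnostic label per sample regardless of how many candidate parasites the detector returns.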
FIGURE 1 The Deep Malaria Convolutional Neural Network (DeepMCNN) diagnostic classifier approach architecture. DeepMCNN leverages routine clinical microscopy human-expert diagnostic labels to provide expert-level malaria diagnosis of a thick blood film specimen. EDoF, extended depth of field; FOV, field of view; MP, malaria parasite; Parasitemia MP/μL, parasitemia in malaria parasites per microliter; WBC, white blood cell.
Mathematically, the classification problem can be re-formulated as described in the next paragraph. In general, the outcome of a CNN classifier can be written as a function of its input (Equation 2), where D represents the output label and I is the image to be classified.
A batch size of 1 is assumed for simplicity, that is, one input image and one output label. M is then the CNN transformation of the input image up to the softmax layer, and it is expressed in terms of C(I), the flattened output of the convolutional layers (CNN feature vector), and W_i and b_i, the weights and biases of the fully connected layers. To accommodate a variable number (Np) of input images corresponding to one single label, M was modified so that the Np feature vectors are averaged before entering the fully connected layers. This is equivalent to an average pooling of the feature vectors.
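The formulation described in this passage can be sketched as follows. This is a reconstruction from the surrounding definitions rather than the authors' typeset equations: a single affine layer $W, b$ stands in for the full stack of fully connected layers $W_i, b_i$, any activation functions between layers are omitted, and only the number of Equation (2) is taken from the text.

```latex
\begin{align*}
D &= \operatorname{softmax}\bigl(M(I)\bigr) \tag{2}\\
M(I) &= W\,C(I) + b\\
M(I_1,\dots,I_{N_p}) &= W\left(\frac{1}{N_p}\sum_{k=1}^{N_p} C(I_k)\right) + b\\
D &= \operatorname{softmax}\bigl(M(I_1,\dots,I_{N_p})\bigr)
\end{align*}
```

The third line is the average pooling of the $N_p$ feature vectors, and the last line is the patient-level form of Equation (2).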
With this modification, Equation (2) is applied to the averaged feature vector, and D is the patient diagnostic (malaria positive or negative) after inspecting Np potential parasites from 100 FoV. A gradient descent optimizer with a fixed learning rate of 0.0003 and a cross-entropy loss function were chosen to optimize the CNN weights.
In the parasitemia formula, $mp_{detected}$ and $wbc_{detected}$ represent the number of MP and WBC, respectively, detected by the object detector. This formula assumes on average 8000 WBC per μL. The predicted parasitemia was compared to the parasitemia computed with the human MP/WBC count.
The diagnosis performance of the DeepMCNN on the validation set is shown in Table 1 and Figure 2. We benchmarked our DeepMCNN automated diagnostic method against a previously proposed method referred to as Negative Adjustment 30 (NA), as described in the methods section and Figure S4 in Appendix S1. The NA detection threshold θ in Equation (1) was estimated at 177 MPs per 100 FoVs for a specificity of 0.9 on the training set (Table S1 in Appendix S1). Our DeepMCNN achieves a sensitivity of 0.92, a specificity of 0.90 and an accuracy of 0.91 on the validation set, with PPV/NPV of 0.92/0.90, outperforming the NA approach (Table 1). The trained DeepMCNN achieves a higher sensitivity (0.92) than the NA approach (0.66) for a specificity equal to or higher than 0.9 (Table 1).

| Study participants, datasets and annotations
To explore the clinical utility of DeepMCNN we calculated PPV and NPV for malaria prevalence values ranging from zero to one, and compared to that of NA (Figure 2A).DeepMCNN NPV clearly outperforms the NA approach (Figure 2A red-line).Moreover, DeepMCNN PPV and NPV performance across these prevalence ranges makes it usable in a wide range of clinical settings.
To evaluate PPV/NPV performance in relation to our Ibadan holoendemic (all-year-round) setting, we calculated PPV and NPV using the actual mean monthly prevalence obtained from our large clinical settings, serving five million inhabitants of the city of Ibadan in the sub-Sahara (Figure 2B).The mean monthly prevalence data (Figure 2B dotted line) is calculated from our large database over a five-year period from 2014 to 2019, and therefore represents an accurate and current snapshot of the burden of malaria in our clinical settings.DeepMCNN clearly shows NPV of over 0.9 across all months which is clinically usable in our settings (Figure 2B red-line).
On the contrary, the NA approach falls below 0.9 during the long Ibadan rainy season (Figure 2B red-line) which hinders its utility in sub-Saharan settings.
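The dependence of PPV and NPV on prevalence follows from Bayes' rule, and can be sketched as below. The function name is illustrative; the sensitivity and specificity in the usage example are the DeepMCNN values quoted in the text, while the prevalence values are arbitrary and not taken from Figure 2.

```python
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive values at a given prevalence."""
    for name, v in (("sensitivity", sensitivity),
                    ("specificity", specificity),
                    ("prevalence", prevalence)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    tp = sensitivity * prevalence                  # true positives (fraction)
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    tn = specificity * (1.0 - prevalence)          # true negatives
    fn = (1.0 - sensitivity) * prevalence          # false negatives
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npv = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Sweep a few illustrative prevalence values.
    for prev in (0.1, 0.3, 0.5, 0.7):
        print(prev, ppv_npv(0.92, 0.90, prev))
```

As prevalence rises, PPV increases and NPV decreases, which is why NPV is the quantity at risk during high-transmission months.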
Looking closer at the classification of the positive samples (Table S2 in Appendix S1), the NA method 30 completely misses all the low parasite count samples (less than 160 MP/μL). However, our DeepMCNN classifies 0.75 of these as positives, for a diagnostic specificity ≥0.90 (Table 3). In medium (160 to 1600 MP/μL) and high (>1600 MP/μL) parasite densities DeepMCNN has sensitivity greater than 0.9, also clearly outperforming the NA method (Table S2 in Appendix S1).
Figure 3A shows that DeepMCNN automated patient diagnosis is achieved by assessing a median of well above 1000 WBC per 100 FoV, for both malaria positive and malaria negative specimens.
For the vast majority of specimens more than 500 WBCs were assessed to achieve patient diagnosis and parasite counts (Figure 3A and Figure S3 in Appendix S1). This is twice the required WHO sampling protocol for the human expert microscopist. We then compared the estimated DeepMCNN parasitemia (see methods) against the parasitemia computed from the manual count reported by the human-expert microscopist, across a range of low, mid and high parasitemia (Figure 3B). In patients with high parasitemia, our approach agrees closely with the human-expert estimates. In those patients with low and mid parasitemia our approach overestimates the parasite densities (Figure 3B). While some approaches attempt to validate automated detection of malaria using the polymerase chain reaction (PCR) as reference, 33 we evaluated our approach against human-expert Thick Blood Film microscopy, since it remains the internationally accepted and realizable gold standard in sub-Saharan regions. Similarly, our diagnosis and parasitemia estimation follow well-accepted WHO protocols in the region. In large urban holoendemic settings such as ours, healthcare providers often lack the capacity to carry out TBF microscopy every six hours once malaria treatment has commenced, as required in severe malaria clinical pathways. Our automated approach could help healthcare providers process follow-up TBF to support these clinical pathways.
Overall, our DeepMCNN approach provides better accuracy in terms of diagnosing samples as malaria positive or negative compared to the NA approach. 30 The NA approach classifies a sample as malaria positive if the number of detected parasites exceeds a specific threshold determined empirically on a hold-out set of negative samples, so that the specificity on that set exceeds 0.9. Our experiments show that the NA method misdiagnoses samples with low to mid parasite densities where the number of overall detections in 100 image fields is below the decision threshold. In contrast, our DeepMCNN approach does not have this limitation as it does not rely on a decision threshold.
From the clinical point of view, it is generally accepted that in any child with fever, malaria diagnosis is so important that a false positive is better than a false negative. Our DeepMCNN achieves NPV consistently greater than 0.9 across all months in the Ibadan settings, rendering the system well suited to provide pediatric clinical pathway support. This is reinforced by the DeepMCNN PPV performance observed during Ibadan's lengthy rainy season. Furthermore, its performance at low parasitemia levels of less than 160 MP/μL is well suited to handle the adult population in high-transmission West sub-Saharan regions, who are more likely to have low to asymptomatic parasitemia.
Our DeepMCNN system provides a WBC count which, together with the MP count estimation, is used to determine a patient's diagnosis and parasitemia according to the WHO recommendations.With a median of more than 1000 WBCs observed, our approach is well above the recommended 500 WBCs required in low-parasitemia specimens.
Parasite density estimates produced by both the human expert and the deep-learning system have their own drawbacks. The human expert is subject to cognitive load when counting objects over a large number of FoV, while the automated approach is limited by the ability of the object detector to discard staining artifacts. As a result, our method overestimates parasite density, when compared to the human expert, in low and mid parasitemia specimens. However, the human expert is prone to fatigue and as a consequence their counting accuracy might fluctuate over time. 7 In contrast, our method consistently uses the parasite detection accuracy to adjust the parasite density estimate. We are of the opinion that this leads to a more robust estimation of the patient's parasite density.
Patient-level human-expert diagnostic labels routinely produced by our malaria clinical microscopy services are far easier to obtain than object-level labels from digitized blood films. Our study shows that our strategy delivers a deep-learning system that is capable of handling the burden of malaria disease observed in a large Plasmodium falciparum holoendemic setting.
The internationally recognized ethics committee at the Institute for Advanced Medical Research and Training (IAMRAT) of the College of Medicine, University of Ibadan (COMUI) approved this research. The research was conducted on the platform of the Childhood Malaria Research Group (CMRG) within the academic Department of Pediatrics, University of Ibadan, and at school and primary care centers throughout the city of Ibadan, with permit numbers UI/EC/10/0130 and UI/EC/19/0110. Parents and/or guardians of study participants gave informed written consent in accordance with the World Medical Association ethical principles for research involving human subjects.
Our expert microscopists annotated a total of 239 EDoF FoV containing 2986 MP and 1272 WBC nuclei (Figure S2 in Appendix S1). Two thirds of the annotated FoV were used to train the object detector models while the rest were used for evaluation (Figure S2 in Appendix S1). These image fields were obtained from 13 unique blood films. Geometrical transformations were applied "on the fly" during training to the image fields to augment the training dataset. At each iteration the image fields were rotated by a uniformly random angle between 0° and 270°. Additionally, the resulting rotated image was randomly flipped vertically, horizontally or not at all. An example of MP and WBC RetinaNet detections in a full FoV is shown in Figure S5 in Appendix S1. The dataset is available at https://doi.org/10.5522/04/12173568 under open licence CC BY-NC-SA 4.0.
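The on-the-fly augmentation described above can be sketched as a parameter sampler plus a flip operation. This is a minimal sketch under stated assumptions: the three flip options are taken to be equally likely (the text does not give the probabilities), the helper names are hypothetical, and the rotation itself is not implemented here since it would normally be delegated to an image library, together with the matching transformation of the bounding-box annotations.

```python
import random
from typing import List, Tuple

FLIPS = ("none", "horizontal", "vertical")


def sample_augmentation(rng: random.Random) -> Tuple[float, str]:
    """Draw one rotation angle (degrees) and one flip mode per iteration."""
    angle = rng.uniform(0.0, 270.0)   # uniformly random angle, as in the text
    flip = rng.choice(FLIPS)          # assumption: equal probability per option
    return angle, flip


def apply_flip(image: List[List[int]], flip: str) -> List[List[int]]:
    """Flip a 2D image stored as a list of rows; returns a new image."""
    if flip == "horizontal":
        return [row[::-1] for row in image]      # mirror left-right
    if flip == "vertical":
        return [row[:] for row in image[::-1]]   # mirror top-bottom
    if flip == "none":
        return [row[:] for row in image]
    raise ValueError(f"unknown flip mode: {flip}")


if __name__ == "__main__":
    rng = random.Random(0)
    angle, flip = sample_augmentation(rng)
    print(angle, flip, apply_flip([[1, 2], [3, 4]], flip))
```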
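$$D = \begin{cases} \text{malaria positive} & \text{if } mpd > \theta \\ \text{malaria negative} & \text{otherwise} \end{cases} \qquad (1)$$
where $mpd$ is the number of potential MP detected and $\theta = \mathrm{mean}_{fp} + \alpha \cdot \mathrm{std}_{fp}$, with D the automated diagnostic (malaria positive or negative) and $\alpha \in [0, 2]$ a sensitivity parameter. That is, a sample was classified positive if the number of parasites detected in 100 image fields was larger than the threshold θ (Figure S4 in Appendix S1).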
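The Negative Adjustment rule of Equation (1) can be sketched as follows. The function names are illustrative, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text only says "SD".

```python
from statistics import mean, stdev
from typing import Sequence


def na_threshold(fp_counts: Sequence[float], alpha: float = 1.0) -> float:
    """Threshold theta = mean_fp + alpha * std_fp.

    fp_counts: detections per 100 image fields on malaria-negative samples,
    which are all false positives by construction.
    Uses the sample standard deviation (an assumption).
    """
    if not 0.0 <= alpha <= 2.0:
        raise ValueError("alpha is expected in [0, 2]")
    if len(fp_counts) < 2:
        raise ValueError("need at least two negative samples")
    return mean(fp_counts) + alpha * stdev(fp_counts)


def na_diagnose(mpd: float, theta: float) -> bool:
    """True (malaria positive) if detections in 100 fields exceed theta."""
    return mpd > theta


if __name__ == "__main__":
    theta = na_threshold([100, 120, 140], alpha=1.0)
    print(theta, na_diagnose(150, theta), na_diagnose(90, theta))
```

A sample whose true parasite load adds fewer detections than the margin α·std_fp above the false-positive mean stays below θ, which is how low-parasitemia samples end up classified as negative under this rule.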

2.7 | The deep malaria CNN classifier (DeepMCNN)
Here we propose a novel approach to leverage routine clinical-microscopy labels from our malaria diagnosis clinics. We trained the DeepMCNN classifier (Figure 1 and Figure S3 in Appendix S1) as follows. First, the RetinaNet was applied to each of the 100 FoVs obtained from each sample from the training set (Table S1 in Appendix S1). Second, stacks of detected MP regions extracted from each image field were used, together with the human-expert clinical-microscopy diagnostic (malaria-positive or malaria-negative) label, to train the DeepMCNN classifier (Figure 1 and Figure S3 in Appendix S1).

Once a patient sample has been classified as positive, the patient parasitemia was estimated in the following manner. Let $re_{mp}$ and $pr_{mp}$, and $re_{wbc}$ and $pr_{wbc}$, be the recall (or sensitivity) and precision (or positive predictive value) of the object detector for the MP and the WBC respectively on the test image fields. The patient parasitemia $pp$ (MP/μL) computation according to the WHO recommendation 15 was adjusted:
$$pp = 8000 \cdot \frac{mp_{detected} \cdot pr_{mp}}{re_{mp}} \cdot \frac{re_{wbc}}{wbc_{detected} \cdot pr_{wbc}}$$
Training and validation data used for DeepMCNN automated patient diagnosis are described in Table S1 in Appendix S1. Each Thick Blood Film (TBF) corresponds to an individual, with a total of 169 in the training set and 130 in the validation set. The training set comprises 84 malaria-positive and 85 malaria-negative TBFs, each with 100 EDoF fields of view (Table S1 in Appendix S1). The validation set contains 60 malaria-positive and 70 malaria-negative TBFs, each with 100 EDoF fields of view (Table S1 in Appendix S1). Malaria-positive thick blood films have a range of parasitemia from 60 to 10^5 MP/μL. All the specimens have been collected and prepared at our quality-controlled malaria clinics and assessed by our expert microscopists and clinicians.
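The detector-adjusted parasitemia estimate given earlier in this section can be sketched as below. The function and argument names are illustrative. The reading used here is that each detected count is multiplied by the detector precision (to remove false positives) and divided by the recall (to account for missed objects) before the WHO ratio is applied; the numbers in the usage example are made up for illustration and are not results from the study.

```python
def adjusted_parasitemia(mp_detected: float, wbc_detected: float,
                         pr_mp: float, re_mp: float,
                         pr_wbc: float, re_wbc: float,
                         wbc_per_ul: float = 8000.0) -> float:
    """Parasitemia (MP/uL) corrected for object-detector precision and recall.

    Assumes an average of `wbc_per_ul` WBC per microliter, as in the text.
    """
    if min(pr_mp, re_mp, pr_wbc, re_wbc) <= 0.0:
        raise ValueError("precision and recall values must be positive")
    if wbc_detected <= 0:
        raise ValueError("at least one WBC detection is required")
    # Estimated true counts: detected * precision / recall.
    mp_estimate = mp_detected * pr_mp / re_mp
    wbc_estimate = wbc_detected * pr_wbc / re_wbc
    return wbc_per_ul * mp_estimate / wbc_estimate


if __name__ == "__main__":
    # Hypothetical detector statistics, for illustration only.
    print(adjusted_parasitemia(300, 1000, pr_mp=0.8, re_mp=0.9,
                               pr_wbc=0.95, re_wbc=0.98))
```

With a perfect detector (all precisions and recalls equal to 1) the expression reduces to the unadjusted WHO formula, 8000 × MP / WBC.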

3.2 | Automated malaria diagnosis with the DeepMCNN classifier
Malaria parasite detection in an individual FoV from a TBF only represents an intermediate step in achieving a final patient malaria diagnosis. To achieve patient-level diagnosis, we proposed and trained (see methods section) a novel Deep Malaria Convolutional Neural Network (DeepMCNN). It leverages routine clinical microscopy labels from our malaria diagnosis clinics to achieve an automated final diagnosis by assessing 100 FoVs (Figure 1 and Figure S3 in Appendix S1).

4 | DISCUSSION
Prompt, reliable and accurate malaria diagnosis is a challenge for healthcare providers servicing large urban metropoles in holoendemic malaria settings, such as the one presented in this work. Leveraging both the well-established malaria diagnosis gold standard and deep-learning image processing approaches could provide automated scalable solutions amenable to deployment in these clinical settings. However, the bottleneck of every deep learning-based approach is the lack of sufficient annotations. Obtaining a large number of accurate object-level human-expert annotations of malaria parasites is extremely time consuming and immensely laborious. To overcome these challenges and to create a deployable, clinically usable automated diagnosis system, here we show that routine clinical-microscopy human-expert diagnostic labels can be leveraged to train a Deep Convolutional Neural Network. It achieves NPV and PPV performance suitable for clinical services within Ibadan, a densely populated metropolis located in Nigeria, in west sub-Saharan Africa, where malaria is prevalent all year.

FIGURE 3 DeepMCNN WBC counts and malaria parasite density estimation. A, DeepMCNN total number of WBC assessed per sample (100 fields of view). WBC, white blood cell; EDoF, extended depth of field; FoV, field of view; MP, malaria parasite; MP (+ve), malaria parasite positive; MP (−ve), malaria parasite negative. Violin-plot horizontal line = median; violin-plot horizontal dotted lines = inter-quartile range. B, Scatterplot of parasite density estimates by human expert vs DeepMCNN. X-Y axes: parasite densities in MP/μL (parasitemia in malaria parasites per microliter). R² = 0.55; 95% CI = [0.48-0.73]
For Plasmodium falciparum holoendemic settings, our open data and easily deployable DeepMCNN provide a clinically relevant platform, where other healthcare providers could harness their readily available patient-level diagnostic labels to tailor and further improve the accuracy of the DeepMCNN classifier for their clinical pathway settings. In turn, this should increase their quality, allowing them to process the large number of blood films required in large urban holoendemic malaria sub-Saharan settings. Further investment in research and development is needed to make advances in deep learning assisted diagnosis accessible in peri-urban and rural settings.