An open‐source, expert‐designed decision tree application to support accurate diagnosis of myeloid malignancies

ABSTRACT
Accurate, reproducible diagnoses can be difficult to make in haemato‐oncology due to multi‐parameter clinical data, complex diagnostic criteria and time‐pressured environments. We have designed a decision tree application (DTA) that reflects WHO diagnostic criteria to support accurate diagnoses of myeloid malignancies. The DTA returned the correct diagnoses in 94% of clinical cases tested. The DTA maintained a high level of accuracy in a second validation using artificially generated clinical cases. Optimisations have been made to the DTA based on the validations, and the revised version is now publicly available for use at http://bit.do/ADAtool.


INTRODUCTION
National Institute for Health and Care Excellence (NICE) guidance recommends multi-disciplinary meetings (MDMs) as best practice in the diagnostic work-up of patients with suspected haematological malignancy. Compared to some areas of medicine, haemato-oncology is already a very well-studied discipline. A sizeable proportion of treatments are based on randomised controlled trial data, and the commonly used WHO Classification of Tumours, which sets out the diagnostic criteria, is supported by 4500 references [1,2].
Despite the wealth of information to support clinicians, making accurate diagnoses in a busy MDM can be difficult: this same abundance of information is hard to navigate, is subject to periodic updates and requires the integration of multiple sources of data across different platforms (clinical, morphological, genetic, radiological), all in a time-pressured environment [3]. Poor documentation may lead to inefficiencies in clinical service, incorrect treatment strategies and poor-quality data for local and national audit [4].
Technology is widely recognised as offering an opportunity to improve the accuracy and efficiency of diagnostics in medicine [5,6].
Examples include big data analyses of hospital records, artificial intelligence and automated algorithms, each of which could transform how we care for patients.
The use of algorithms is not new in diagnostic healthcare; they help to standardise how a diagnosis is reached and are widespread in everyday clinical practice [7]. Typically, these algorithms are described and shared as text documents or flowcharts and are designed for human use, not for automated or computational use [8,9]. These formats are also not amenable to automated testing or verification. However, many of the diagnostic criteria in the WHO classification are clearly set out, which makes it possible to design a digitised algorithm that supports clinicians with the complexity of the diagnostics.
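As an illustration of what a machine-readable diagnostic algorithm might look like, the following minimal sketch encodes a flowchart as a small tree that can be traversed and tested automatically. The questions, field names, thresholds and diagnosis labels are invented placeholders; they do not reproduce WHO criteria, the DTA's logic or the esyN data model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    """Terminal node holding a diagnosis label (placeholder text here)."""
    diagnosis: str


@dataclass
class Node:
    """Internal node: a yes/no question evaluated against a case record."""
    question: str
    test: Callable[[dict], bool]
    if_yes: Union["Node", Leaf]
    if_no: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], case: dict) -> Tuple[str, List[Tuple[str, bool]]]:
    """Walk the tree for one case; return the diagnosis and the audit trail."""
    path = []
    node = tree
    while isinstance(node, Node):
        answer = node.test(case)
        path.append((node.question, answer))
        node = node.if_yes if answer else node.if_no
    return node.diagnosis, path


# Hypothetical toy tree: every question, threshold and label is a placeholder.
TOY_TREE = Node(
    question="marker_x at or above threshold?",
    test=lambda c: c["marker_x"] >= 20,
    if_yes=Leaf("placeholder diagnosis A"),
    if_no=Node(
        question="feature_y present?",
        test=lambda c: c["feature_y"],
        if_yes=Leaf("placeholder diagnosis B"),
        if_no=Leaf("placeholder diagnosis C"),
    ),
)

diagnosis, trail = classify(TOY_TREE, {"marker_x": 5, "feature_y": True})
print(diagnosis)
for question, answer in trail:
    print(f"  {question} -> {'yes' if answer else 'no'}")
```

Because the tree is ordinary data, each branch can be exercised by an automated test, and the recorded trail shows which questions led to a given output.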
We have previously shown that digital solutions can improve the accuracy of myelodysplastic syndrome (MDS) subtyping in a single-centre retrospective study [10]. This multi-centre study set out to validate the DTA by independent testing. It shows that our DTA is accurate and can improve precision in the diagnosis of commonly occurring myeloid malignancies.

METHODS
The DTA was designed using the open-source platform esyN and is publicly available as a web application [11]. Accuracy and F1 scores (the harmonic mean of precision and recall) were calculated using scikit-learn 0.20.3 in Python 3.7.0.
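The two metrics can be sketched in plain Python to make their definitions explicit. This is not the study's analysis code: the labels below are made up, and the macro averaging of per-class F1 is an assumption, since the averaging mode is not stated here. In scikit-learn the corresponding functions are `accuracy_score` and `f1_score` from `sklearn.metrics`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # reference class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Made-up labels for illustration only; "A", "B", "C" stand in for diagnoses.
reference = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]

print(f"accuracy = {accuracy(reference, predicted):.3f}")
print(f"macro F1 = {macro_f1(reference, predicted):.3f}")
```

Accuracy treats every case equally, whereas macro F1 gives each diagnosis equal weight regardless of how often it occurs, so the two can diverge when some diagnoses are rare.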

RESULTS AND DISCUSSION
Sixty-two cases of myeloid malignancy were logged in the DTA from [...]
To further test the DTA, 108 randomly generated clinical cases covering all of the major branches and diagnoses within the DTA were selected for validation by pairs of clinicians working independently.
Eight cases were subsequently excluded as implausible. After independent review, the DTA was found to be incorrect in 15 of the remaining 100 cases, giving an overall accuracy of 85%. The average accuracy for the individual doctors was 53% (range 43%-71%).
Adjustments to the DTA were made to reflect the feedback from [...]
The accuracy of the DTA was lower when processing randomly generated cases than when processing genuine clinical cases. The case selection was designed to test the decision boundaries between diagnoses, so many cases differed only slightly from one another or nearly fit the criteria for an alternative diagnosis. Clinicians assigning diagnoses to these cases reported more heterogeneity of clinical features than is normally found in routine clinical practice. These factors may explain the lower-than-expected accuracy of the individual doctors.
The ability of the DTA to perform well in this context suggests that the algorithm is robust even with artificially generated, atypical presentations.
The review process of the artificially generated cases demonstrated some clinical scenarios where consensus was difficult to reach. For

DATA AVAILABILITY STATEMENT
Source results from patient records used at all sites in the study will not be made available because the data cannot be fully and safely anonymised to Information Commissioner's Office (ICO) standards, given their highly sensitive and granular nature (e.g. blood results, genetic tests, diagnoses). A subset of the data that support the findings of this study is available on reasonable request to the corresponding author.