Guidelines for clinical trials using artificial intelligence – SPIRIT‐AI and CONSORT‐AI†

The rapidly growing use of artificial intelligence in pathology presents a challenge in terms of study reporting and methodology. The existing guidelines for the design (SPIRIT) and reporting (CONSORT) of clinical trials have been extended with the aim of ensuring production of the highest quality evidence in this field. We explore these new guidelines and their relevance and application to pathology as a specialty. © 2020 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.

Although the word 'revolution' is somewhat overused in technology circles, the recent leap in performance of artificial intelligence (AI) systems surely does justify the term. Driven by advances in a particular type of neural network called 'deep learning' [1], computers have achieved human-level performance in a number of tasks previously considered to be some decades in the future [1][2][3].

Relevance to pathology
The area of pathological diagnosis has been included in this revolution [4] and arguably pathology data (and specifically image interpretation) are ideally suited to the application of deep learning, which at its core is a pattern-recognition tool 'trained' on data to classify new 'test' data. In a short period of time, we have seen the technology applied successfully in a variety of applications, with resulting histopathology-focused papers in high-impact general medical and science journals [5][6][7][8][9][10][11], many claiming pathologist-level performance.
But AI is neither magical nor truly 'intelligent' like a human. Despite impressive results in test datasets under controlled conditions, in real-world applications it does not always deliver on the hype and excitement of initial discoveries. This 'brittleness' has a variety of causes, including over-sensitivity to training data, lack of variety and depth in training sets, and failure to anticipate real-world conditions of deployment [12,13]. Many studies to date have been small and remote from real-world clinical use, and actual deployment of AI in routine pathology practice remains exceptionally rare.
The consequences of this are serious: a possible 'replication crisis' in digital pathology AI and, worse still, clinical harm due to the use of inaccurate or unreliable AI systems in clinical practice without proper oversight. The novelty of AI and our community's relative inexperience with the technology combine with the commercial pressure on AI companies to show positive results, and with the publication pressures on academic pathologists, to create a potentially serious risk.
Newly published guidelines will go some way towards alleviating this risk. The EQUATOR network was founded to bring together researchers, medical journal editors, peer reviewers, developers of reporting guidelines, research funding bodies and other collaborators with a mutual interest in improving the quality of research publications and of research itself [14]. The EQUATOR mission is to achieve accurate, complete and transparent reporting of all health research studies to support research reproducibility and usefulness [14,15]. To address potential issues around AI, extensions to the SPIRIT and CONSORT guidelines were registered as 'guidelines under development' with the EQUATOR network in 2019 [16,17].

SPIRIT-AI and CONSORT-AI guidelines
Using a systematic approach with domain experts and methodologists, the existing guidelines for the design (SPIRIT) and reporting (CONSORT) of clinical trials have been modified to address the challenges posed by AI. The guidelines have been extended to include 15 and 14 new items, respectively, covering areas such as:
• The need to clearly describe the intended use of the AI intervention
• Indications for how to use the AI intervention in the clinical setting
• Details of the data inputs used to train the AI tool, and the outputs it produces
• Descriptions of how errors or failures of the system are reviewed
• Human–computer interaction aspects of the AI intervention
The intention of the guidelines is not to be prescriptive or to stifle innovation, but to improve the consistency of the design and reporting of research in this area and to improve transparency, so that systems and results can be more easily evaluated. As such, the guidelines offer a much-needed framework in which researchers can frame their plans to evaluate AI technologies, which will drive up the quality of research in this area. The authors acknowledge that this is a rapidly evolving area and that frequent reviews and updates of the guidelines will probably be needed.
There are several areas for future work. Despite the publicity around AI, only seven clinical trials of AI have published results on clinicaltrials.gov (that is across all domains, and none in histopathology) [17]. So, as evidence and experience accumulate, trial design and reporting will probably become more sophisticated. Relatively little work has been carried out using AI in pathology, and more domain-specific recommendations may be needed. Finally, the guidelines specifically exclude the reporting of 'continuously improving' AI, as this is a more novel method that may require a different (revolutionary!) approach to design and reporting.

Conclusions
As we stand on the cusp of a technological transformation in the use of AI within pathological assessment and diagnosis, a quote from Alan Turing (considered the father of modern computing and AI) in The Times newspaper of 11th June 1949 remains pertinent: 'This is only a foretaste of what is to come, and only the shadow of what is going to be'. Nonetheless, in our urgency to develop these technologies, we must at the same time recall our Hippocratic Oath to 'do no harm' and ensure we create the best quality evidence for the benefit of our patients.