Exploring a new paradigm for the fetal anomaly ultrasound scan: Artificial intelligence in real time

Advances in artificial intelligence (AI) have demonstrated potential to improve medical diagnosis. We piloted the end‐to‐end automation of the mid‐trimester screening ultrasound scan using AI‐enabled tools.


Key points
What is already known about this topic?
• Artificial intelligence has shown great promise in medical diagnosis, including in antenatal settings
• Most published work has been based on retrospective data, with very little work exploring how AI might be used in real-life clinical practice
What does this study add?
• We have shown that real-time use of AI in obstetric ultrasound scanning is feasible and can fundamentally disrupt how sonographers perform the scan
• AI-assisted scans were significantly faster than standard manual scans
• Automatically measured fetal biometry was highly accurate
• The performance of automatic standard plane acquisition needs to be improved before these tools can enter mainstream clinical use

INTRODUCTION
The mid-trimester fetal anomaly ultrasound (US) scan is now in widespread use, but quality is not consistent. Despite a trend showing improvement, international antenatal screening detection rates remain variable. For example, the rate of antenatal diagnosis of severe congenital heart disease internationally ranges from 13% to 87%, with wide variation within countries. 1-3 A recent study examining why fetal heart defects are missed during screening found that in the majority of such cases, either the correct sonographic plane was not obtained, or the defect was clearly demonstrated on screen but not recognised by the operator. 4
Artificial intelligence (AI) has been shown to achieve human-level performance in some medical imaging analysis tasks. 5 This raises the potential for automating aspects of the fetal US scan, including automated image identification in real time and automatically measured fetal biometry. 6-13 Although excellent model performance is often described using retrospective data in silico, relatively little work has been published on translating these findings to the more chaotic and unpredictable clinical world. This has created a gap between AI development in the literature and peer-reviewed clinical validation in real-world clinical settings. 14 Very few prospective clinical trials have compared AI to human performance in medical imaging, and this pilot study bridges that gap by embedding integrated AI tools into a live clinical scan. 15
Although commercial US manufacturers have recently introduced new AI-driven obstetric ultrasound products to the market, these aim to complement and augment the sonographer's skills as 'assistive' optional compartmentalised tools and do not fundamentally alter how the scan is performed. 16 We propose a new paradigm, in which AI is used to free the sonographer from scan processes that force task switching. This would fundamentally alter the scanning workflow and may allow the human operator to focus more attentively on other diagnostic aspects of the examination during live scanning.
The aim of the present study was to pilot the end-to-end automation of multiple elements of the mid-trimester US screening scan using AI-enabled tools and to assess the impact of these on the efficiency and quality of the fetal examination compared to a standard manual scan.

METHODS
This study was performed as part of the intelligent Fetal Imaging and Diagnosis (iFIND) project, NRES number = 14/LO/1806 (ISRCTN = 16542843). All participants gave informed written consent. Scans were performed by two UK-trained obstetric sonographers with 20 years' combined experience (JM and ES) and equivalent professional qualifications. The trial design was a single-centre prospective method comparison study. A sample size target of between 20 and 25 participants was pragmatically selected to assess the feasibility of end-to-end automation and to assess the initial impact on efficiency, measurement reliability and image quality before a larger trial.

Participants
The inclusion criteria were: a completed routine fetal anomaly ultrasound scan (18+0 to 20+6 weeks) with no subsequent onward referral to specialist services, and gestational age between 20+0 and 24+0 weeks.

Study protocol
Participants were required to attend a dedicated research clinic, where two fetal US scans were performed sequentially, using a Philips EpiQ US system with a C5-1 MHz curvilinear transducer (Philips Healthcare, Best, Netherlands). The manual (standard) fetal ultrasound scan, considered the gold standard, was performed, and the AI-assisted scan (aided by automated scan plane detection, automated biometry and auto-report tools) was conducted as a comparator modality (see Figure 1). Supporting Information S1 gives further details of the research US scanning methodology with example standard images. The 13 standard planes for which acquisition was attempted during each scan were: trans-ventricular brain; trans-cerebellar brain; abdominal circumference view; femur length view; facial profile; lips and nose; right and left outflow tracts; four chamber view; three vessel tracheal view; kidneys; sagittal spine; coronal spine. The sonographers alternated between performing the manual scan and the AI-assisted scan for each participant to control for any difference in scanning ability or technique, blinded to each other's scan procedure and results. The time to perform each scan was recorded independently. The scan was considered completed when each view was either successfully acquired or had been attempted a maximum of three times (after the mother was repositioned, or drank some water to encourage fetal movement, as is common practice in a clinical setting).
The same display was used for both manual scanning and AI-assisted scanning. In manual mode, only the information normally displayed by the EpiQ system was presented. In the AI-assisted mode, a 'traffic light' system (Figure 2) provided additional real-time information for each anatomical view about the automated image capture and storage (green: high confidence of detection; amber: moderate confidence; red: insufficient confidence). The only input data for the AI algorithms was the full live stream of 2D US data collected for the duration of the examination. During AI-assisted examinations, the sonographer dynamically scanned and observed the fetus until they were satisfied that a comprehensive visual assessment had been completed. This was in combination with confirmation that the required planes had been captured (i.e. green on the 'traffic light' display).
At the end of an AI-assisted scan, a report (automatically populated with five candidate images for each required plane) was generated in .html format (Supporting Information S2). To complete the auto-report, the sonographer selected the best quality plane from each set of candidate views. This selection triggered the inclusion of the corresponding biometric measurements automatically extracted from the images, presented both as numerical values and on relevant growth curves. Other clinical data (e.g. gestational age and fetal position) were manually entered by the sonographer, and the report was saved as a .pdf file when complete.
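The 'traffic light' feedback described in this section amounts to mapping a per-view detection confidence onto three bands. A minimal Python sketch of that mapping follows. The function name, the 0 to 1 confidence scale and both thresholds are illustrative assumptions; the cut-offs used in the study are not stated in this text.

```python
def traffic_light(confidence, high=0.9, moderate=0.5):
    """Map a detection confidence in [0, 1] to a traffic-light colour.

    The thresholds are hypothetical placeholders, not values from the study.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= high:
        return "green"   # high confidence of detection
    if confidence >= moderate:
        return "amber"   # moderate confidence
    return "red"         # insufficient confidence


# Hypothetical per-view confidences at one moment during a scan.
view_confidence = {
    "trans-ventricular brain": 0.97,
    "femur length": 0.62,
    "kidneys": 0.20,
}
status = {view: traffic_light(c) for view, c in view_confidence.items()}
```

In a display of this kind, the operator would keep scanning until every required view shows green or has been attempted the maximum number of times.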
During the manual scan, acquisition of each standard view was attempted, and biometric measurements were repeated three times on different images (as is considered best practice in the FASP guidelines). 17 The 13 standard views (which include the four core biometric views) were saved in the standard way on the ultrasound machine's hard drive ready to be sent to a picture archiving and communication system (PACS) after the examination. The written manual report was produced using Astraia, a standard obstetric reporting software package (Astraia software GmbH, Munich, Germany). This required physical input of the technically best biometric measurement selected by the sonographer.

Image quality and biometric measurement validation
After the data collection period of the study was complete, a third independent sonographer (KL), blinded to the scan method, performed a subjective assessment of image quality. A purpose-built interface that presented side-by-side images of the same nominal standard view from the manual and the AI-assisted scans unlabelled and in random order was used for this assessment. For each image pair, the sonographer was asked to select which was of a higher quality, or if they were of equal quality. To allow for inter-observer comparison of biometry, the same sonographer also repeated all biometric measurements offline using MITK workbench software (German Cancer Research Center Division of Medical Image Computing, Heidelberg, Germany).

Analysis of constituent parts of the scan
To validate our findings from the blinded and paired comparison study between the manual and AI scanning methodologies, we used a bespoke software script to automatically analyse the frozen (that is, 'non-scanning') time in the manual scans for each study participant. To further assess how scan time savings may translate into an uncontrolled clinical environment, 782 consecutive scan datasets from the previously described iFIND-1 cohort, 7 which had been verified for completeness, were analysed in a similar fashion.
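The bespoke script itself is not reproduced in this text. As a rough illustration of how frozen time might be estimated from a recording, the Python sketch below treats any run of consecutive identical frames as 'frozen' and reports the share of frames in such runs. The function name, the `min_run` parameter and the identical-frame criterion are all assumptions made for this example, not the authors' method.

```python
def frozen_fraction(frames, min_run=2):
    """Return the fraction of frames that belong to frozen runs.

    frames:  list of hashable frame contents (e.g. raw bytes or a digest).
    min_run: minimum number of consecutive identical frames counted as a
             frozen run (hypothetical parameter).
    """
    if not frames:
        return 0.0
    frozen = 0
    run = 1
    for prev, cur in zip(frames, frames[1:]):
        if cur == prev:
            run += 1
        else:
            if run >= min_run:
                frozen += run
            run = 1
    if run >= min_run:  # close the final run
        frozen += run
    return frozen / len(frames)


# Toy recording: one run of three identical frames and one run of two.
toy_frames = [b"a", b"b", b"b", b"b", b"c", b"d", b"d", b"e", b"f", b"g"]
toy_fraction = frozen_fraction(toy_frames)
```

With real video, noise and overlays would probably require a tolerance-based comparison rather than exact equality, and `min_run` would be set from the frame rate.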

Artificial intelligence algorithms
To ensure a seamless real-time end user experience during the AI-assisted scan, the following six elements were required to work simultaneously (see Table 1, and for full technical methods see Supporting Information S3):
(1) Real-time image capturing through HDMI video output and frame grabber.
(2) Automatic standard plane detection, image saving, and clustering 7 (SonoNet (EpiQ), ClusterSPD.v2).
(5) Real-time display of the original images and AI-derived information ('traffic light' system).
(6) Dedicated computer running all the above simultaneously in real time.
FIGURE 2 Research clinic room set up and display monitors. Large white arrow: AI feedback overlay, displaying real-time detection confidence for each standard view. Small white arrow: 'Traffic light' system indicating the overall confidence of the completeness of the data capture for each standard view (high, moderate, low).
The number of standard plane image frames collected per case by the SonoNET (EpiQ) algorithm was too large for a sonographer to review during clinical reporting. 'ClusterSPD' was developed to address this and was used to provide five candidate views from which the operator performed a final offline selection. This provided both an efficient and manageable process and afforded sonographers the chance for a second review of key clinical information.
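The idea of reducing many detected frames to five candidates can be illustrated with a much simpler stand-in than ClusterSPD, whose details are in Supporting Information S3 and are not reproduced here. The Python sketch below groups detections that are close in time and keeps the most confident frame from each group. The function name, the `(frame_index, confidence)` input format and the `max_gap` parameter are assumptions for this example.

```python
def pick_candidates(detections, k=5, max_gap=10):
    """Pick up to k candidate frames from (frame_index, confidence) pairs.

    Detections whose frame indices are within max_gap of the previous one
    are grouped into the same temporal cluster. This is a simple temporal
    grouping used for illustration; it is not the ClusterSPD algorithm.
    """
    clusters = []
    for idx, conf in sorted(detections):
        if clusters and idx - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((idx, conf))
        else:
            clusters.append([(idx, conf)])
    # Keep the most confident frame per cluster, then the k best overall.
    best = [max(cluster, key=lambda d: d[1]) for cluster in clusters]
    best.sort(key=lambda d: d[1], reverse=True)
    return best[:k]


# Hypothetical detections for one standard view.
toy_detections = [(0, 0.6), (3, 0.9), (5, 0.7), (40, 0.8), (42, 0.85), (100, 0.95)]
toy_candidates = pick_candidates(toy_detections, k=2)
```

Taking one frame per temporal group avoids offering the operator five near-identical images from the same sweep.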

Data analysis
Means and standard deviations (SD) were calculated for the biometry, scanning and reporting times for each scan condition. Paired absolute and relative differences between scan conditions were compared across subjects and tested using a two-tailed paired t-test (where p < 0.05 was considered significant). The growth in gestational days equivalent to the size differences in each measurement was calculated using fetal growth reference charts. 19-22 All data were analysed using SPSS (version 24, SPSS Inc, Chicago, Ill, USA) and Excel (version 15.0, Microsoft Corp, Redmond, Washington, USA). At the end of the study, a short feedback survey was conducted with the scanning sonographers to gain structured feedback on the use of the tools.
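The analysis in this study was run in SPSS and Excel. As an illustration of what the paired t-test computes, the Python sketch below derives the mean paired difference, its SD and the t statistic using only the standard library. The function name and the example scan times are hypothetical, not study data.

```python
import math
import statistics


def paired_t(manual, assisted):
    """Return (mean difference, SD of differences, t statistic, degrees of freedom)."""
    if len(manual) != len(assisted) or len(manual) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [m - a for m, a in zip(manual, assisted)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, sd_d, t, len(diffs) - 1


# Hypothetical scan durations in minutes for five participants.
toy_manual = [22.0, 20.0, 25.0, 19.0, 24.0]
toy_assisted = [15.0, 14.0, 16.0, 13.0, 17.0]
mean_d, sd_d, t_stat, dof = paired_t(toy_manual, toy_assisted)
```

The t statistic is then compared with the two-tailed critical value for the given degrees of freedom at p < 0.05: 2.776 for the four degrees of freedom in this toy example, or 2.074 for 22 degrees of freedom, as would apply to 23 complete pairs.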

RESULTS
Between May 2019 and March 2020, 23 pregnant women were recruited into this single-centre study to undergo both AI-assisted and standard manual prospective mid-trimester fetal US scans as per the NHS FASP protocol (Figure 1). 17 In total, 299 manually acquired and 260 AI-assisted FASP standard image views were achieved. Table 2 outlines the baseline participant information.

Examination duration and processes
Scan times were 34.7% shorter in the AI-assisted scan compared to the manual scan (mean duration 14.32 vs. 21.93 min, mean time saving 7.62 min; Table 3). With the exception of two cases, AI-assisted scans were faster than the matched manual scan. When a subgroup of manual scans was analysed, the mean duration for which the screen was frozen during the manual scans (corresponding to periods where the images were being visually assessed, measured, and saved) was 7.8 min, corresponding to 38.8% of the total scan duration (Table 4, which also shows comparable data from a large clinical cohort database collected from a working antenatal clinic; see Discussion for further details).
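The headline percentage can be checked with simple arithmetic. The short Python sketch below recomputes the relative saving from the two reported mean durations. The function name is illustrative, and only the values 21.93 and 14.32 min come from the text.

```python
def relative_saving(manual_min, assisted_min):
    """Percentage reduction in duration relative to the manual scan."""
    return 100.0 * (manual_min - assisted_min) / manual_min


# Reported mean durations (minutes): manual 21.93, AI-assisted 14.32.
saving_pct = relative_saving(21.93, 14.32)
print(f"Relative saving: {saving_pct:.1f}%")
```

The difference between the two rounded means is 7.61 min, which agrees with the reported mean saving of 7.62 min to within rounding.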
Sonographers reported that the AI tools were easy to use, perceiving a change in their scanning approach during the AI-assisted scan compared to the manual scan. Both sonographers agreed that the AI tools made it easier to concentrate on image interpretation, with no adverse impact on patient interaction during the scan (Figure 4).
Despite a fundamental change in written report production (during which the AI-assisted process required the sonographers to undertake an additional review of the automatically obtained images for selection into the report: see Supporting Information S2), there was no significant difference in the time taken to complete the reports between the two methods (Table 3). When selecting a specific image for use, the auto-report automatically included any corresponding biometric measurements, displaying these in conventional tabular form and plotting them on population centile charts.

Scan completeness, image quality and biometrics
In considering completeness of the reports for the core fetal views (trans-ventricular brain view (TV), trans-cerebellar brain view (CB), abdominal circumference view (AC) and femur length view (FL)), the AI-assisted report included 93% of the required views, compared to 98% in the manual report (Figure 5 and Table 5). When all 13 standard views were considered (i.e. core biometry views plus additional facial profile, lips and nose, right and left outflow tract, four chamber view of the heart, three vessel tracheal view, kidneys, sagittal and coronal spine: see Supporting Information S1), the AI-assisted report successfully saved 73% of the required images compared to 98% in the manual report.
Independent, blinded assessment of image quality demonstrated that the automatically extracted image was of superior quality to the paired manual equivalent image in 33%, 41%, 44%, and 19% of the TV, CB, AC, and FL views, respectively, and that the manual view was superior in 33%, 29%, 38%, and 69%. The views were felt to be of equivalent quality in 33%, 29%, 19%, and 13% (Figure 5).
There were no statistically significant differences between the manual and AI-assisted measurements for any of the biometrics other than for the head circumference (Table 6). Although for this measurement there was a statistically significant difference between the manual and automated measurements, the human inter-observer difference was greater than the manual versus AI-assisted difference (−5.37 mm, SD 3.36 vs. −2.44 mm, SD 3.65). Additionally, neither of these differences was considered clinically significant (i.e. they would not change routine clinical management) based on the associated equivalent change in gestational age (in this case, between 1 and 3 days).

DISCUSSION
Standard manual obstetric US examinations feature an interrupted mode of operation with repeated pauses to save images and make measurements. A reporting process then follows, which is primarily a mechanical data entry task. In contrast, the AI-assisted approach that we tested fundamentally changed the way in which the scan was performed and reported. Many tasks were completely automated, enabling the sonographer to focus on optimal scanning and offering parent-centred care. The reporting process was transformed into a secondary image review task: as the images for review had been automatically acquired during live scanning, the operator could critically appraise image quality and evaluate likelihood of anomalies.

Algorithm performance and clinic workflow
The AI-assisted scans were significantly faster (saving on average 34.7%) than manual scans. When we analysed the constituent parts of a subgroup of manual scans, we found that the screen was frozen for 39% of the total scan time (i.e. a similar duration to the AI time saving). Selecting the ideal image plane and placing callipers for biometric measures are not required in the AI-assisted scans, so it is likely that this is where the time savings are gained. To investigate how this may translate into a true clinical environment, we analysed the constituent parts of a retrospective sample of 782 full video recordings of anomaly scans from a single centre (Table 4). During these, the proportion of the scan time where the screen was frozen was approximately 19.5%. This represents a clinically useful real-world time saving should our results be replicated outside of the controlled study environment.
Cognitive load theory concerns the overloading of working memory by extraneous factors that do not contribute to task completion. 23 [...] reduce cognitive load and improve performance in US-based tasks. 24, 25 We did not directly assess cognitive load in this study; however, sonographer feedback suggests that the less disrupted approach of the AI-assisted scan seemed more efficient and made it easier to focus on the interpretation (rather than acquisition) of relevant images.
We utilised a modified SonoNet algorithm 7 to automatically acquire standard planes (SonoNET EpiQ). This algorithm had excellent performance when previously tested with retrospectively acquired data, but unsurprisingly had poorer performance in our 'real-world' prospective setting. Considering the possible complete 13-image dataset, a suitable image was saved for 73%. This improved to 93% when only the four core views were included, but remained inferior to the standard manual scan (98% for both). This discrepancy highlights an area for necessary improvements before this technology enters mainstream clinical use.
Automatically acquired fetal measurements were found to be highly reliable, and comparable with manually obtained measurements. No clinically significant differences were found between the two methods, and the discrepancy between automatic and manual measurements was less than the difference between two independent human observers. This demonstrates that as well as being faster to

Challenges and limitations
Despite the recent volume of research on the use of AI in medical imaging, the majority of studies are based on the testing of trained algorithms using retrospectively acquired data. 5,15 Compared to other modalities, little work has been published on the use of AI in US. 16,26 This is likely due to additional challenges posed by ultrasound, particularly the high degree of operator-dependence and susceptibility to artefact. The way in which US is performed is also funda-
Limitations of this study include a relatively small sample size, although this is to our knowledge the largest study of the prospective use of fully integrated AI in a complete fetal screening examination. Also, the sonographers knew that the subjects had previously had a routine anomaly scan without onward referral, and so were aware of a low risk for significant clinical findings. The dedicated research setting included additional observers in the scan room (a timekeeper and engineers on standby for technical support), and the scan environment was artificial by clinical standards with a stricter protocol, possibly resulting in higher protocol adherence compared to a typical US clinic with multiple operators.

Future research
The two areas requiring improvement before our package of AI tools could be implemented in a real-world clinical environment are: (1) the completeness of the AI-assisted scans for an extended set of standard views, and (2) the quality of the acquired images. The quality of the acquired images compared between the AI-assisted and the manual scans (as assessed by a blinded independent observer) was broadly similar for the core views of the head and abdomen but was poorer in the views of the femur. These areas of potential improvement highlight the previously discussed challenges in translating AI into an actual clinical environment. 14
This study was designed to pilot the introduction of AI into the UK fetal anomaly screening programme (FASP). As such, only views which are specified in the FASP protocol were used. There are additional views, such as views of the fetal bladder or umbilical cord, that are often used by fetal medicine specialists to diagnose important fetal disease. Also, more advanced ultrasound methods such as colour Doppler are routinely used by fetal medicine and cardiac specialists. Although not currently part of population-level screening in the UK, the application of AI to aid the interpretation of these additional views and other imaging techniques may be a novel way of improving the performance of the screening programme in the future.

CONCLUSIONS
We have demonstrated the feasibility and utility of integrated AI tools in the performance of obstetric US scans. These tools have the potential to transform current ultrasound scanning practice by altering workflow. By removing tasks that interrupt live scanning, the ultrasound operator achieves a faster scan with similar image quality, although further work is required to improve the performance of the image plane detection algorithms for some standard views when used in real time. The subjective sonographic experience was also changed, most likely as a result of the reduction in task switching and interruption of observational scanning. Large-scale multicentre trials motivated by smaller evaluation studies (such as this one) will be needed to determine if AI assistance results in increased antenatal fetal anomaly detection rates. We hypothesise that in the future, AI will release human operators from repetitive tasks, allowing them to focus on other aspects of the scan which could enhance scanning quality and antenatal care for expectant families.

ACKNOWLEDGEMENTS
Firstly, we would like to thank the parent volunteers for taking part in this study. We would also like to thank the wider iFIND team and Care. The funding bodies had no influence in the study design, data collection, analysis, interpretation, preparation of the manuscript, or decision to submit for publication.

CONFLICT OF INTERESTS
AG reports consultancy fees from Ultromics Ltd. Bernhard Kainz reports consultancy fees from ThinkSono Ltd, Ultromics Ltd. and Cydar Medical Ltd. Daniel Rueckert reports consultancy fees from Heartflow, and IXICO, and in addition Daniel Rueckert has a patent US20200027237A1 pending. All other authors report no conflict of interests.

ETHICS STATEMENT
The study has been granted NHS R&D and ethics approval, NRES ref no = 14/LO/1086.

DATA AVAILABILITY STATEMENT
The collected and analysed datasets during the current study are available from the corresponding author upon request. The source code for the plug-in based real-time ultrasound software (PRETUS) is available at github.com/gomezalberto/pretus.