The development of “automated visual evaluation” for cervical cancer screening: The promise and challenges in adapting deep‐learning for clinical testing

Abstract There is limited access to effective cervical cancer screening programs in many resource‐limited settings, resulting in continued high cervical cancer burden. Human papillomavirus (HPV) testing is increasingly recognized to be the preferable primary screening approach if affordable due to superior long‐term reassurance when negative and adaptability to self‐sampling. Visual inspection with acetic acid (VIA) is an inexpensive but subjective and inaccurate method widely used in resource‐limited settings, either for primary screening or for triage of HPV‐positive individuals. A deep learning (DL)‐based automated visual evaluation (AVE) of cervical images has been developed to help improve the accuracy and reproducibility of VIA as assistive technology. However, like any new clinical technology, rigorous evaluation and proof of clinical effectiveness are required before AVE is implemented widely. In the current article, we outline essential clinical and technical considerations involved in building a validated DL‐based AVE tool for broad use as a clinical test.


1 | INTRODUCTION
Cervical cancer remains a leading cause of women's morbidity and mortality in resource-limited settings. 1 The World Health Organization's (WHO) global call to eliminate cervical cancer relies on high-coverage of human papillomavirus (HPV) vaccination and screening with accurate and practical technologies to detect and treat precancers. 2 Existing cervical cancer screening and triage technologies fall into three categories: visual, microscopic (eg, cytology) and molecular (eg, HPV testing). 3 Visual inspection of the cervix after applying acetic acid (VIA), though widely used in low-resource settings for primary screening or triage, is poorly reproducible across settings and not reliable in discriminating precancers from benign HPV-related and "look-alike" changes. 4 Cervical cytology as performed in most low-resource settings has had poor historical impact due to lack of infrastructure, poor quality assurance, need for repeated screening and poor follow-up of screen positives. 5 HPV testing is the most sensitive primary screening method for detecting precancers, thus providing long-term reassurance for HPV-negative women. 6 Moreover, HPV testing is compatible with self-collected vaginal specimens. 7 However, to avoid overtreatment, HPV positivity is best followed by triage testing to identify the minority of HPV infections linked to precancer. 7 Deep learning (DL)-based automated visual evaluation (AVE) of cervical images is emerging as an alternative novel, low-cost screening and triage solution. Machine learning (ML) is a type of artificial intelligence (AI) that uses computers to detect patterns in data without being explicitly programmed to do so. 8 DL, inspired by the network of neurons in the human brain, is a kind of ML method that uses many layers of arithmetic operations 9 to arrive at a model that mimics the pattern identification for which it has been trained. 
DL has numerous applications in medicine (eg, image recognition algorithms like AVE, automated dual-stain cytology, diagnostic radiology and automated diabetic retinopathy [DR] screening). [10][11][12] In a DL model for image recognition, information on different characteristics (eg, texture, edges and curves) associated with the target of interest is gathered from individual pixels in an image through different layers. Through big data and advanced computational resources, these elements, combined in what we call an algorithm, are analyzed to provide an accurate diagnosis for previously unseen images. 9,13,14 AVE as an assistive technology to VIA 15,16 offers an opportunity to improve VIA to create a screening process that supports accelerated control of cervical cancer.
General reporting guidelines for clinical trials with AI-interventions have been published previously. 17,18 This article, however, outlines our collective view of the considerations required specifically for developing and adopting a DL-based AVE algorithm for cervical precancer detection. We aim specifically to ensure its applicability as a well-validated clinical test in cervical cancer screening programs globally, although most principles are likely to be applicable to any AI-based clinical test. The text in this article elaborates on an accompanying checklist to guide the development and validation of an effective and clinically relevant DL-based AVE algorithm. In particular, we wish to caution clinicians and policymakers about the need to evaluate the clinical effectiveness and applicability of these tools when they are applied in cervical cancer screening programs to avoid premature introduction (Table S1).
2 | STEP-WISE CONSIDERATIONS FOR AI-BASED AVE

2.1 | Before training the algorithm

2.1.1 | The indicated use of AVE
Detecting and treating precancer is the main aim of cervical screening. 19 However, the point-prevalence of precancer, even in previously unscreened populations, is only 1% in the general population, and 2.5% in women living with HIV (WLWH). 20 Therefore, as a general screening tool, AVE needs to detect precancers sensitively, but with the perspective that almost all screened women (>95%) will never develop cervical cancer. 21 In contrast, among HPV positives, the prevalence of precancer increases considerably, from ~1% to >5%. 20 Based on the well-established role of HPV as a necessary cause in cervical carcinogenesis, 21 together with the evidence of the long-term negative predictive value of HPV tests (virtually zero risk over 5 years), 6 an ideal use-case of AVE is for triage of HPV-positive women (Box 1).
If found to be effective, this screening strategy is envisioned by our group to be scaled up in a community-based campaign combining: (a) screen-and-treat screening of mid-adult women (ie, aged 25 or 30 to around 45 or 50 years), and (b) single-dose vaccination of multiple birth cohorts of girls and younger women to induce herd protection. 32 Such a conjoined primary and secondary prevention effort is likely to lead to accelerated cervical cancer control in low-resource settings.
The prevalence of visual abnormalities and precancer increases further in a colposcopy clinic, where most women have been referred for equivocal or minor cytologic abnormalities such as HPV-positive atypical squamous cells of undetermined significance (ASC-US) or squamous intraepithelial lesion (SIL), respectively. 33 Within this context, women referred for colposcopy also have an increased prevalence of cervical visual abnormalities regardless of final diagnosis. Therefore, an AVE algorithm trained for indicated use in general screening should not be assumed to be suitable for use as a tool for triage in a colposcopy setting and vice versa unless the accuracy of both approaches is explicitly demonstrated in a formal evaluation (Figures 1 and S1).
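The dependence of a test's predictive value on the setting can be illustrated with a short calculation. The sketch below applies Bayes' rule at the precancer prevalences quoted above (1% in general screening, 2.5% in WLWH, and 5% taken as the lower bound for HPV-positive women). The sensitivity and specificity values are hypothetical placeholders chosen only for illustration; they are not measured AVE results.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating point (assumption, not an AVE estimate).
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Prevalences follow the figures quoted in the text; 0.05 stands in for ">5%".
settings = [
    ("general screening", 0.01),
    ("women living with HIV", 0.025),
    ("HPV-positive triage", 0.05),
]

for name, prevalence in settings:
    ppv, npv = predictive_values(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{name}: prevalence={prevalence:.3f} PPV={ppv:.3f} NPV={npv:.4f}")
```

With the test characteristics held fixed, the positive predictive value rises as prevalence rises, which is one reason performance measured in one setting should not be carried over to another without formal evaluation.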
In addition, until sufficient supportive evidence accumulates regarding accuracy, reliability and portability of the method to different settings, AVE is best used as an ancillary technology to aid health workers performing VIA to improve their accuracy, rather than a standalone tool. 34,35

2.1.2 | Clarifying target population for using AVE
Any visual cervical screening method, including AVE, works best when applied at an appropriate age range (eg, 25-49 years). 36 Within this age range, HPV infections are more likely to be clinically meaningful than at younger ages, at which transiently detectable HPV infections are extremely prevalent but cancer is very rare. 37 Moreover, prominent glandular epithelium ("ectopy" or "ectropion"), common at younger ages, may lead to false-positive AVE predictions. Also, in mid-adulthood compared to older ages, the squamocolumnar junction (SCJ) at which most cancers arise is frequently still fully visible, 20 and lesions, if detected, could still be treated safely without disproportionate risk of damaging atrophic pelvic structures. 38 Using an AVE algorithm on cervical images when the main site of cervical cancer, the SCJ, is no

BOX 1 AVE as a triage for HPV positives
HPV testing for carcinogenic HPV types is the most sensitive method for cervical cancer screening, providing many years of reassurance (negative predictive value). 6 Therefore, HPV testing is a desirable primary screening test, mainly when few screening rounds are possible. 22,23 Currently, cost is the prohibitive factor in adopting HPV testing as a primary screening test in many low-resource settings. However, based on available tests that cost <5 US dollars, take <1 hour to perform and offer partial HPV genotyping, even lower-cost, point-of-care HPV tests will likely be widely available in a few years. 24 HPV infection is too common to treat all infected women, most of whom do not need treatment, particularly given possible iatrogenic harms.
Relying on negative HPV testing to reassure most women against cervical cancer risk permits public health efforts to focus on the triage of HPV-positive women with newer technologies like HPV typing and AVE.
Risk-informed hierarchical partial genotyping of HPV, if incorporated with minimal additional cost into HPV testing, provides important risk stratification useful for triage of HPV-positive women. 25 Even among the types of HPV defined as carcinogens, there are at least four distinct categories based on the risk of invasive cancer. HPV16 (species alpha-9) is uniquely carcinogenic with the highest risk of cervical precancer and cancer, causing ~60% of squamous cancers. HPV18 and HPV45 (species alpha-7) cause ~15% of squamous cancers and, with HPV16, also account for >90% of adenocarcinomas. 26 The types of HPV closely related genetically to HPV16, namely, HPV31, HPV33, HPV35, HPV52 and HPV58, account for another ~15% of squamous cancers and are conceptually worth distinguishing from the lower risk, minimally carcinogenic types (HPV39, HPV51, HPV56, HPV59 and HPV68), which account for 5% of squamous cancers. 27,28 Of note, HPV35 is particularly pernicious for women of African origin. 29 It is pertinent to note that if AVE is used alone for primary screening, "look-alike" confounding conditions like severe cervicitis could lead to over-treatment 15 of many women with benign conditions unrelated to cervical cancer. Hence, AVE is used as a triage test for the relevant set of HPV-positives. Cervical sampling for HPV testing abrades the cervix's critical transformation zone (TZ; where most cancers arise), complicating the use of AVE for triage. Fortunately, vaginal sampling, either by the woman herself or a clinician, has now been convincingly shown to be almost equivalent to clinician sampling of the cervix when a sensitive HPV DNA test is used. 30 In addition, self-sampling has also been demonstrated to permit very high-throughput cervical screening in a COVID-safe manner. 20,25,31 Recognizing the eventual importance of vaginal HPV testing, we aim to develop a screening strategy using HPV self-sampling, with risk-informed partial HPV typing 28 and AVE.
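The four genotype categories listed above lend themselves to a simple hierarchical lookup, sketched here for illustration. The assumption is that "hierarchical" means a woman with multiple detected types is assigned to the highest-risk category present. The group labels, the function names and the AVE class names are hypothetical and are not taken from any published AVE implementation.

```python
# Genotype categories in descending order of risk, as listed in the text.
HPV_RISK_GROUPS = [
    ("HPV16", {"16"}),
    ("HPV18/45", {"18", "45"}),
    ("HPV16-related", {"31", "33", "35", "52", "58"}),
    ("lower-risk carcinogenic", {"39", "51", "56", "59", "68"}),
]

NEGATIVE_LABEL = "negative for listed types"


def hierarchical_group(detected_types):
    """Assign the highest-risk category among the detected HPV types."""
    detected = {str(t) for t in detected_types}
    for label, members in HPV_RISK_GROUPS:
        if detected & members:
            return label
    return NEGATIVE_LABEL


def risk_stratum(detected_types, ave_class):
    """Pair the genotype category with an AVE class label (labels are hypothetical)."""
    return hierarchical_group(detected_types), ave_class


# A co-infection with HPV51 and HPV16 is assigned to the HPV16 category.
print(risk_stratum(["51", "16"], "equivocal"))
print(risk_stratum(["58", "39"], "normal"))
print(risk_stratum([], "normal"))
```

Each resulting pair would still need to be linked to an observed absolute risk of precancer before it could guide management.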
When used sequentially in combination, this will classify the woman into risk strata (of highest to lowest probability of precancer) to guide treatment and limit overtreatment.

Figure 3, 20,21 which must be defined clearly to avoid misclassification by teaching ("training") the AVE on incorrect labels. In this regard, defining the cervical carcinogenesis stages based on nonreproducible historical grading systems (eg, dysplasia or cervical intraepithelial neoplasia [CIN] stages) is no longer optimal. [42][43][44] Rather, the four stages can be defined as follows.
Invasive cervical cancer is defined histologically unless the clinical picture is so severe that surgical pathology is not obtained.
Precancer is defined stringently as a histopathologic CIN3/AIS (adenocarcinoma in situ) since most histopathologic CIN3/AIS cases contain the same HPV types found in invasive cancers. 19,45 Moreover, CIN3/AIS histopathologic diagnosis of precancer is reasonably reproducible without resorting to expensive molecular markers of cellular transformation (eg, viral methylation and viral DNA integration). Additionally, selected high-risk histopathologic CIN2, if the diagnosis is corroborated by expert gynecologic pathologist review and accompanied by highest-risk HPV-type positivity, is likely to represent precancer. 46 However, one needs to be cautious in including all CIN2 as a precancer target because CIN2 is a poorly reproduced diagnosis with a mixture of high-grades and regressive low-grades (associated with noncarcinogenic HPV types such as HPV53), creating a phenocopy of early precancer. For colposcopic biopsy to be sensitive, multiple biopsies of all visible lesions (based on turning white after application of vinegar, called acetowhitening) are necessary, rather than targeting of the most severe-appearing lesion. Clinician colposcopic impressions, even when made by experienced gynecologists, are subjective and variable in distinguishing precancer from benign HPV-related changes and "look-alike" conditions. [47][48][49] An algorithm trained on target class definitions based on human interpretation of cervical images instead of histopathologic diagnosis, particularly for the "precancer" target, will be restricted by the same limitations in accuracy and intraobserver and interobserver variability as other visual methods (eg, VIA). 50 Thus, multiple biopsies and histopathologic definition of precancer are preferable to high-grade colposcopic impression.
However, histopathology cannot define the normal cervix, as most normal women are never thoroughly biopsied. Since the negative predictive value of the HPV test is very high, the ideal definition of "normal" (in the sense of virtually no imminent risk of cancer) will be images from women confirmed negative for high-risk HPV (HR-HPV). 6 Alternatively, in the absence of HPV results, the absence of any acetowhitening (ie, entirely "pink" cervix) on expert review of images from women at a general screening clinic can be used to define normal because acetowhitening is a sensitive measure of the risk of precancer, 51 and the chance of finding CIN3/AIS in women at general screening with no cervical acetowhitening is very low. 52 Once cancer and precancer are defined histologically (and ideally virologically as well), and the normal cervix is defined visually, the remaining category can be conceived of as "HPV-related and other equivocal changes." Histopathology has limitations in defining this category due to subjectivity in microscopic diagnosis and biopsy placement errors (eg, targeting only the worst appearing lesions). 53 In our experience, an algorithm not trained explicitly to recognize these "equivocal" images tends to give extremely erratic predictions on these images (Figure S2). Since it is in this "equivocal" zone where the experts also struggle the most and since the associated risk of cervical cancer is likely to be intermediate (ie, nonzero but much lower than precancer), it is desirable to train the algorithm on cervical images with acetowhite changes as a separate target interposed between "normal" and "precancer" targets.
Ongoing work by our group is addressing how best to include this equivocal class in training (ie, training a multiclass ordinal classifier).

an accurate yet generalizable AVE algorithm via DL approach, can be assumed to be hundreds or greater for each target class to achieve satisfactory disease discrimination. 54,55 It is worth recalling that, even in high-burden settings, cervical precancer is relatively uncommon 52; thus, ethical acquisition 35 of accurately labeled, representative case images is challenging.

| Image quality evaluation and pre-exclusion
The provider's training to capture good quality images is a first step for AVE's successful application. However, when an AI-based image recognition tool is applied in real-world clinics, variation in the quality of images is inevitable. The image quality is affected, in addition to the user training, by the lighting (eg, external ring light vs built-in camera flashlight, shade of white light), image capture device and postcapture processing of images by device-specific software, anatomic variation, speculums (eg, metallic vs transparent plastic) and so on.
Without a quality check, AVE will provide a prediction for any image given to it as an input, including images not even recognizable as cervix and images with a completely obstructed region of interest (ROI) (ie, SCJ) (Figure 2B,C). 20,56 Therefore, a manual or automated gatekeeping mechanism should be in place to exclude poor-quality images from training and evaluation to minimize false predictions.
Various parameters define the image quality, such as blur, Gaussian noise, resolution, color, angle and glare/reflections; not all affect the AVE's performance equally. The composite minimal image quality standards needed to obtain good AVE performance are a topic of ongoing research.
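An automated gatekeeping step of the kind described above could, in its simplest form, check a few of these parameters before an image is passed to the classifier. The following is a minimal sketch under stated assumptions: it checks only blur (variance of a discrete Laplacian) and glare (fraction of near-saturated pixels) on a grayscale image given as a list of rows. The function names and both thresholds are hypothetical; appropriate cut-offs would have to be established empirically for each device and protocol.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Low values indicate few sharp edges, which suggests a blurred image.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1]
        - 4 * gray[r][c]
        for r in range(1, height - 1)
        for c in range(1, width - 1)
    ]
    return pvariance(responses)


def glare_fraction(gray, saturation_level=250):
    """Fraction of pixels at or above a near-saturated intensity."""
    pixels = [p for row in gray for p in row]
    return sum(p >= saturation_level for p in pixels) / len(pixels)


def passes_quality_gate(gray, min_sharpness=50.0, max_glare=0.05):
    """Hypothetical gate: thresholds are placeholders, not validated values."""
    sharp_enough = laplacian_variance(gray) >= min_sharpness
    little_glare = glare_fraction(gray) <= max_glare
    return sharp_enough and little_glare


# Toy 8x8 images: a high-contrast checkerboard, a featureless image, full glare.
checker = [[200 * ((r + c) % 2) for c in range(8)] for r in range(8)]
flat = [[128] * 8 for _ in range(8)]
saturated = [[255] * 8 for _ in range(8)]

for name, image in [("checker", checker), ("flat", flat), ("saturated", saturated)]:
    print(name, passes_quality_gate(image))
```

A gate of this kind does not address whether the image shows a cervix at all or whether the SCJ is visible; those checks would need a separate, trained component.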

| Choosing DL methods for training AVE
Training a DL algorithm is more complex than the simple explanation described previously. 15 Multiple technical choices need to be made while training the algorithm (Box S1), 9,13,14 which may have implications for interpreting the output 54,57,58 (Figure S3). The aim is: (a) to achieve accurate and reliable prediction on hold-back images from the same database as the training set (called "internal validation"), and (b) not to lose generalizability in new images from different databases ("external validation"). Ongoing work from our group is exploring the optimal DL approach to train an AVE algorithm to achieve maximum risk discrimination that has external validity.
In addition, the choice of methods has implications for the time and computational requirements to run the algorithm. Ideally, a scalable AVE algorithm should be available to run as a standalone app (without internet) on the image-capture device itself, providing quick (within a few seconds), real-time predictions for on-site patient management to minimize losses to follow-up.

| Reproducibility of AVE
The essential first parameter in assessing AVE's validity, like any medical test, is reproducibility. Just as a thermometer gives a consistent reading of body temperature on repeated measurement of the same person, an AVE algorithm should give virtually identical outputs when asked to predict the same image repeatedly. However, in the case of near-duplicate images (ie, images collected consecutively from a woman under the same image capture protocol), subtle changes in the numerical pixel values of the image due to changes in body or camera position may alter the AVE predictions, especially for equivocal images, despite the visual similarity of the images to the human eye.
Clinically, it would be confusing to the user if an AVE algorithm were to label one image as a precancer and a near-duplicate image (or the same image in a different run) as normal (Figure S4). Therefore, before its use for clinical decision-making, AVE's robustness for near-duplicate pairs of images should be measured and reported. 59
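One simple way to report robustness on near-duplicate pairs is sketched below. It assumes each pair consists of two AVE scores for consecutive images of the same woman and summarizes (a) the mean absolute score difference and (b) the fraction of pairs that fall on opposite sides of a decision threshold. The function name, the 0.5 threshold and the example scores are all hypothetical.

```python
def repeatability_summary(score_pairs, threshold=0.5):
    """Summarize agreement of AVE scores on near-duplicate image pairs.

    score_pairs: list of (score_a, score_b), one tuple per near-duplicate pair.
    threshold: hypothetical cut-off separating "positive" from "negative".
    """
    n_pairs = len(score_pairs)
    mean_abs_diff = sum(abs(a - b) for a, b in score_pairs) / n_pairs
    n_discordant = sum(
        (a >= threshold) != (b >= threshold) for a, b in score_pairs
    )
    return {
        "n_pairs": n_pairs,
        "mean_abs_diff": mean_abs_diff,
        "discordant_fraction": n_discordant / n_pairs,
    }


# Made-up scores for illustration; the last two pairs straddle the threshold.
example_pairs = [(0.90, 0.85), (0.10, 0.20), (0.45, 0.55), (0.70, 0.30)]
summary = repeatability_summary(example_pairs)
print(summary)
```

Reporting the discordant fraction separately for clearly normal, equivocal and precancer images would show whether instability is concentrated in the equivocal range, as the text suggests.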

| Internal validity of AVE
To "teach" the algorithm to recognize the target of interest, we provide it with sets of "labeled" cervical images in each target class as a "training" set (to learn the features associated with the outcome of interest) and a "validation" set (to iteratively check on and optimize the algorithm's performance as part of training). 15 It is important to note that the validation set is not a true blinded test set. Performance achieved by the algorithm on the validation set is likely to be misleading and over-optimistic. 58 When the "training-validation" set is limited, an algorithm is prone to overfitting to the image features in the "training-validation" set and may completely fail on a third, independent "hold-back" set of previously unseen (ie, blinded) images from the same database as the training and validation sets. 55,60,61 Therefore, it is essential to assess AVE's performance on an independent, completely blinded "test" set of images not included in the "training-validation" process (Figure 4A) to have a realistic estimate of the internal validity of AVE on a dataset.
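The three-way partition described above can be sketched as follows. One assumption is added here that the text does not state explicitly: the split is made at the level of the woman rather than the image, so that near-duplicate images of the same cervix cannot appear on both sides of the training and test boundary. The function name, the 60/20/20 proportions and the identifiers are hypothetical.

```python
import random
from collections import defaultdict


def split_by_woman(image_records, fractions=(0.6, 0.2), seed=0):
    """Partition images into train/validation/test sets, grouped by woman.

    image_records: list of (woman_id, image_id) tuples.
    fractions: share of women for training and validation; the rest go to test.
    """
    images_by_woman = defaultdict(list)
    for woman_id, image_id in image_records:
        images_by_woman[woman_id].append(image_id)

    women = sorted(images_by_woman)
    random.Random(seed).shuffle(women)  # seeded for a reproducible split

    n_train = int(fractions[0] * len(women))
    n_val = int(fractions[1] * len(women))
    groups = {
        "train": women[:n_train],
        "validation": women[n_train:n_train + n_val],
        "test": women[n_train + n_val:],  # held back until training is frozen
    }
    return {
        name: [img for w in members for img in images_by_woman[w]]
        for name, members in groups.items()
    }


# Ten hypothetical women with two images each.
records = [(f"w{i}", f"w{i}_img{j}") for i in range(10) for j in range(2)]
splits = split_by_woman(records)
for name, images in splits.items():
    print(name, len(images))
```

The test partition should be evaluated once, after all model and threshold choices have been fixed on the training and validation partitions.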
In addition, it is important to include a realistic set of images in the "test" set on which the performance is finally evaluated. For example, we may observe good case-control discrimination by AVE on a

hence is theorized to have lower intraobserver variability in addition to lower interobserver variability than VIA and colposcopy, leading to higher consistency.

| External validity (generalizability) of AVE and avoiding overfitting
Verifying an AVE algorithm's performance is a two-step process.
Testing the algorithm's performance (achieving accurate predictions without overfitting) on an independent "test" set of images derived from the same source as the training set (called "internal validation") is a crucial first step, 15 but not a final benchmark. This testing set will be limited by the same "finite" representation and idiosyncratic random variations as in the "training" set. Thus, the process does not reflect true validation of an algorithm in terms of how it will perform in actual clinical practice with "infinite" variations in patient characteristics, user training and image capture protocols. 9 For example, an AVE algorithm that is overfitted to a particular set of images from a clinic 15 will learn to recognize random (ie, nonrelevant) variations in the particular training set that distinguish precancer from normal, but these distinctions are not necessarily generalizable to other settings (eg, images from different clinics captured under different light sources by different providers) to distinguish patterns associated with precancer detection 55,60,61 (Figure S5). Therefore, to assess true generalizability, one needs to evaluate the AVE algorithm's performance on a diverse set of images from various clinical settings worldwide. In addition, ideally, multiple independent formal efficacy assessments should demonstrate replicability of the results. 34 Unless efforts to develop a device-agnostic algorithm are successful, a dedicated image capture device or devices, with algorithms trained with their image types, will need to be used to ensure accurate and time-stable AVE performance.
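The comparison between internal and external validation can be made concrete by computing the same discrimination metric on both sets. The sketch below computes the area under the ROC curve (AUC) directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. All scores are invented for illustration, and the function name is hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented scores: a hold-back set from the training source versus images
# from a different clinic and device.
internal_test = {
    "cases": [0.91, 0.85, 0.78, 0.66],
    "controls": [0.10, 0.22, 0.35, 0.41],
}
external_test = {
    "cases": [0.80, 0.55, 0.42, 0.30],
    "controls": [0.15, 0.38, 0.47, 0.60],
}

auc_internal = auc(internal_test["cases"], internal_test["controls"])
auc_external = auc(external_test["cases"], external_test["controls"])
print(f"internal AUC={auc_internal:.3f} external AUC={auc_external:.3f}")
```

A marked drop from the internal to the external value, as in this toy example, is the pattern that would indicate overfitting to the source of the training images.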

| Anatomical and biologic confounding factors and effect modifiers for AVE
Several patient characteristics may contribute to the erroneous classification of a given image by AVE (Figure S7). 15 For example, ectopy among young women may result in a high AVE severity class prediction due to the ruddy glandular epithelium extending onto the ectocervix. Similarly, severe cervicitis and female genital schistosomiasis (FGS) can be misclassified as precancer, and certain noncarcinogenic HPV types (eg, HPV71) with no relationship to cervical cancer may

FIGURE 4 (A) AUC results for the discrimination of disease vs no disease in a validation set and a test set. Notice that the AUC value from the same study images decreases from 0.94 to 0.86 when the algorithm was tested on hold-back test set images that were not used at all during the training and validation of the algorithm. Source: Binary classification algorithm trained on cervigram images from NHS tested on cervigram images from NHS (unpublished results by NCI HPV-AVE research group). (B) Score values obtained in a binary classification algorithm trained on cervigram images. AVE prediction scores are presented per definite case, definite control, equivocal case and equivocal control. When using a selective set of clearly defined cases (precancers) and controls (normal), the algorithm easily discriminated between disease strata, but when adding equivocal images, as would be the case in a real-life scenario, the score distribution tended to be wider and less discriminative of disease status.

goal of AVE, ideally, is to directly predict the risk (conceptually a continuous probability from 0 to 1) of a woman having a precancer today while having some reassurance (ie, negative prediction) for the future. 67 However, the current classifier AVE algorithm approach is trained to predict discrete target classes (eg, histopathologic cancer or precancer, low-grade, normal). Such a classifier AVE provides a score associated with each target class it is trained to predict.
However, it is important to understand that these scores themselves are not true risk estimates (ie, a woman with a "raw" score of 0.9 associated with the "precancer" class does not necessarily have a 90% probability of precancer) and are not reliably portable. 8,68 In order to obtain a clinically meaningful, reliable and portable estimate of the true risk of precancer from a classification network, the final AVE class label prediction needs to be translated into a risk value. That is, the observed number of women with precancer among all women with a given class prediction should match the number expected from the absolute risk assigned to that class (for example, 90 observed women with precancer out of 100 when a precancer class prediction carries a risk of 90%). This translation should take into account other co-factors (ie, age, HIV status, HPV status, HPV types, etc), if available (Figure 5), 3 to accurately risk-discriminate low-risk and high-risk individuals for risk-based clinical management.
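The observed-versus-expected comparison described above can be tabulated per predicted class, as in the following sketch. It reuses the worked example from the text (90 women with precancer among 100 with a "precancer" class prediction that carries an assigned risk of 90%); the "normal" class counts, its assigned risk and the function name are hypothetical additions for illustration.

```python
from collections import defaultdict


def calibration_table(records, expected_risk):
    """Compare observed precancer frequency with the assigned risk per class.

    records: iterable of (predicted_class, has_precancer) pairs.
    expected_risk: dict mapping predicted_class to its assigned absolute risk.
    """
    totals = defaultdict(int)
    cases = defaultdict(int)
    for predicted_class, has_precancer in records:
        totals[predicted_class] += 1
        cases[predicted_class] += int(has_precancer)

    table = {}
    for predicted_class, n in totals.items():
        observed = cases[predicted_class] / n
        expected = expected_risk[predicted_class]
        table[predicted_class] = {
            "n": n,
            "observed": observed,
            "expected": expected,
            "gap": observed - expected,  # near zero when well calibrated
        }
    return table


# "precancer" row follows the example in the text; "normal" row is invented.
records = (
    [("precancer", True)] * 90 + [("precancer", False)] * 10
    + [("normal", True)] * 1 + [("normal", False)] * 99
)
expected_risk = {"precancer": 0.90, "normal": 0.01}

table = calibration_table(records, expected_risk)
for predicted_class, row in table.items():
    print(predicted_class, row)
```

In practice the table would be stratified further by the available co-factors (eg, HPV type group or HIV status), since the same class label may correspond to different absolute risks in different strata.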

| Field implementation
The considerations described here are mainly focused on the technical efficacy of AVE. When scaled up for implementation outside research settings, even well-validated algorithms will have many challenges (eg, data privacy, patient acceptability and provider training) as observed in other medical fields. 63,69,70 For example, even a highly accurate DR screening algorithm inside a computer lab has been documented to have failed in the field clinics due to practical challenges. 71 Some of these challenges present valuable parallels to the AVE implementation work. For example, the DR algorithm could not read a high proportion of images due to poor quality attributed to variation in lighting conditions across the field clinics, differentially affecting the retinal dilations. 70,71 Also, it was sometimes impossible to take a single image capturing the entire field of view (eg, retina), leading to a failed prediction by the algorithm. 70 These image quality issues lead to a similar dilemma as in AVE of balancing the risk of predictions based on imperfect data against the risk of inaction from losses-to-follow-up (LFU) during referrals. 70 There is a balance between efforts to improve image quality by users against making the algorithm robust enough to tolerate "less than perfect" images.

FIGURE 5 A recommended approach for cervical cancer screening based on HPV genotyping and AVE. HPV extended genotype provides a risk stratification that, when added to the AVE class label prediction, provides 17 risk strata. When each stratum is calibrated to represent the absolute probability of a woman having a precancer (ie, risk), a direct risk-based clinical management decision can be taken tailored to resources availability. Reprinted with permission from Wentzensen et al 3
The delayed or failed image analysis on a cloud-based DR algorithm due to poor internet connectivity at the field clinics is another parallel with AVE, confirming our group's insistence on the absolute need for the AVE algorithm to run on local hardware with sufficient processing power without internet connectivity. 70,71 The challenges encountered in developing a robust and reliable DR algorithm also have many analogs for the AVE development effort. Some of these challenges are: generating reproducible ground truth labels with high interobserver agreement among the experts for training the algorithm, particularly for the classes with high interclass similarities (eg, hard vs soft exudates) 72; difficulties in detecting lesions in the presence of noise (eg, optical reflections) and commonly encountered nonlesion structures (nerve fiber reflections, vessel reflections and drusen) 72; and developing a generalizable algorithm that could work accurately across inevitable common variations in the clinical environment (eg, images collected from multiple centers on machines ranging from smartphone cameras to high-end fundoscopes). 73 The main considerations specific to implementation of AVE for cervical cancer screening are human resource capacity-building to manage screen-positive women detected by AVE, developing data management systems to support tracking of women needing referral, and cost-effectiveness analysis to evaluate AVE's impact in real-life programs.
It is important to emphasize that to prevent cancer we need to detect precancer lesions and treat them adequately. Absence of treatment is a major and unfortunately very common reason for the failure of screening programs. For women requiring treatment, thermal-ablation using a battery-operated mobile device is currently the most portable option given that it is safe, effective, affordable and does not require sophisticated equipment. 74 However, because not all women are eligible for thermal-ablation due to abnormalities or benign changes on the cervix, 75 local providers will need to identify which women require referral for further evaluation for more invasive treatments (eg, conization, Large Loop Electrosurgical Excision of the Transformation Zone [LLETZ]), unavailable in many resource-limited settings. For providers, this assessment is prone to variability and challenges. 76 DL-based AVE based on expert reviews of cervical images is under development to predict a woman's eligibility for treatment with ablation; an initial pilot suggests good performance. 16

3 | CONCLUSIONS
DL-based AVE of the cervical image is a promising but still evolving clinical test. Even though the inner workings of DL remain obscure, DL-based AVE, in the end, is no different from any other clinical diagnostic test. Since the limitations of DL described here might not be fully appreciated by end-users, the onus lies on the developer of an AI-based device to make the subtle issues explicit, particularly in the less regulated markets. Raising awareness and knowledge of the goodness-of-fit and limitations of DL-based AVE among end users is critical to improve clinical practice. Nonetheless, some AVE-type products are already being marketed without substantial documentation of effectiveness. 77,78 Thus, in line with the WHO guidance, 35 we maintain that premature introduction of AI-based methods, without transparency and accountability, threatens their eventual acceptance and best use.

ACKNOWLEDGMENTS
The research was funded by the NCI Cancer Moonshot SM and NIH intramural research program. The work of BB was supported by NCI/NIH under Grant T32CA09168.

CONFLICT OF INTEREST
DL is the co-founder of MobileODT. In the last 3 years, he was an executive in the company and sat on the board of directors. He is no longer with the company, but still owns some stock. He is currently the owner of Imaging and Analytics Consulting, Ltd., a small consulting company based in Israel. His wife is the owner of DL Analytics, LLC, a small business based in California. Other authors have nothing to declare.