Advances in AI‐based cancer cytopathology

Cytopathological examination plays a crucial role in cancer diagnosis as it reflects the cellular pathology of cancer. However, this process traditionally relies on visual examination by cytopathologists. Recent advancements in computer and digital imaging technologies have enabled the application of artificial intelligence (AI)‐based models to identify tumor cells in images, thereby assisting cytopathologists in achieving enhanced performance. AI‐based models can improve the accuracy and reproducibility of image evaluation and streamline clinical workflows. Moreover, AI‐based models can analyze a diverse range of sample types, including peripheral blood, urine, ascites, and bone marrow. AI‐based cytopathological recognition can help clinicians screen for and diagnose cancer and predict the prognosis and recurrence of cancers such as leukemia, cervical cancer, urothelial carcinoma, and gastric cancer. Additionally, AI‐based models can predict the types of mutations in leukemia. A growing number of studies emphasize the potential of computational image analysis and deep learning‐based AI to build novel diagnostic tools for the biomedical field. This review describes recent developments in AI‐based cytopathological recognition and offers a perspective on how AI tools for cytopathology can help improve cancer diagnosis and prognosis prediction. Future developments in AI model applications can further contribute to the improvement of human health.

well as other new technologies. 5 However, despite continuous innovation, delayed diagnosis limits the effectiveness of available treatments and leads to poor patient outcomes. [6][7][8] Therefore, early diagnosis of cancer is particularly important.
Pathological biopsy is the gold standard for cancer diagnosis and relies on removing diseased tissue from the patient by cutting or puncture for pathological examination. 9,10 Another important component of tumor diagnosis is cytopathology, which can also help identify cancerous or precancerous changes and provide valuable information for diagnosis and treatment planning. 11 Cytopathology involves the microscopic examination of individual cells collected from a variety of body sites through fine needle aspiration, brushing, scraping, or washing. It commonly uses staining techniques that improve the visibility of cells and highlight specific structures, such as the nucleus, cytoplasm, and other cellular components. 12 Moreover, cytopathology offers several advantages as a complementary diagnostic tool. First, it is less painful and more cost-effective for patients. 13 Second, cytopathology can provide a rapid diagnosis in many cases and can be used to monitor disease progression or treatment response because it can be repeated more frequently than tissue biopsy. 14 Additionally, cytopathology can sample a wider range of anatomical sites, such as the urinary tract or serous cavities, which may not be accessible through biopsy.
The evaluation of cytopathology plays a crucial role in cancer diagnosis and in the prediction of prognosis and recurrence. 15,16 Traditionally, cytopathological diagnosis is performed by trained pathologists, who visually examine cells and assess their morphology and other features to diagnose cancer. 17 However, this approach relies heavily on the subjective interpretation of cytopathologists, leading to inter-observer variability and potential diagnostic errors. 18 In addition, the process is time-consuming and labor-intensive in time-limited clinical practice, especially when a large number of samples must be analyzed. 19 Accurate diagnosis of cell abnormalities and dysplasia requires extensive experience, which is a great challenge for inexperienced cytopathologists and for samples containing few tumor cells. 20 Furthermore, the sensitivity and specificity of visual examination can be affected by several factors, including the expertise of the examiner and the quality of the samples. 21 As a result, there is a risk of misdiagnosis, especially when dealing with rare or unusual cell types. Therefore, new methods are needed to reduce the probability of missed diagnosis and misdiagnosis and to improve the performance of clinical cytopathologists in morphological examination.
With the advancement of artificial intelligence (AI), computer algorithms and statistical models are employed to analyze and interpret large volumes of data, identify patterns, and make predictions or decisions. 22,23 In the medical field, AI has shown great potential in improving diagnosis, treatment, and patient care. For example, AI algorithms have been developed to analyze medical images and detect abnormalities, assist in surgical procedures, and predict treatment outcomes. 24 AI can also analyze large amounts of medical data and identify patterns that may not be easily recognizable by humans, allowing for more accurate and personalized diagnoses and treatments. 25 By leveraging advanced algorithms, AI can enable machines to perform complex tasks that were once thought to require human intelligence, improving efficiency, accuracy and decision-making in healthcare. 26 In this review, we summarize the application of AI-based cytopathology diagnostic systems in different cancers, including leukemia, cervical cancer, urothelial carcinoma, and gastric cancer. We also analyze the workflow of different AI models and discuss the advantages and challenges of AI in cytopathologic analysis and information integration. Finally, we explore the prospects and potential of AI in improving human health.

APPLICATIONS
AI is a rapidly developing field concerned with building intelligent machines or systems capable of performing tasks that normally require human intelligence. 27 The development of AI can be traced back to the 1950s, but it was not until recent years, with the advent of big data and cloud computing, that AI has seen significant breakthroughs and become increasingly integrated into our daily lives. 28 Huang et al. 28 summarized the relationship between AI, machine learning (ML) and deep learning (DL), together with the features of ML and DL, as displayed in Figure 1. In simple terms, ML is a technique used to implement AI, and DL is a subset of ML that employs artificial neural networks to enable more complex learning ( Figure 1A). The differences between ML and DL, as well as the advantages and disadvantages of their working principles, are shown in Figure 1B. ML relies on engineered features extracted from specific regions and uses large amounts of data for training; the trained algorithm then produces the output for a specific task ( Figure 1C). DL, on the other hand, is considered capable of independent learning because it can automatically learn features from the data without manual feature engineering. It consists of several layers, which simultaneously carry out feature extraction, selection, and classification during the training process ( Figure 1D). Both ML and DL are data-centric subfields of AI. With the continuous improvement of hardware technology and growth in data volume, ML and DL have been widely applied in the medical field.

BONE MARROW CYTOPATHOLOGY
Peripheral blood and bone marrow cytopathology is a key method to diagnose and evaluate hematopoietic diseases. 29 Peripheral blood and bone marrow cell classification and counting is one of the most common diagnostic methods in clinical practice, which can be used to assess patients' hematopoietic function and identify potential blood lesions. 30 Bone marrow cytopathology can also be used to assess disease progression and therapeutic effect. 31 In recent years, more and more attention has been paid to the study of peripheral blood and bone marrow cytopathology. With the development and improvement of technology, new technologies including automation and digital technology are widely used in the analysis and diagnosis of peripheral blood and bone marrow cells. 32 Digital imaging techniques can also be used to digitally scan and analyze bone marrow samples to more accurately assess cell counts and morphological features. 33 These new techniques are expected to bring greater accuracy, efficiency and reliability to peripheral blood and bone marrow cytopathology.
FIGURE 1 (caption, continued) (D) Workflow of DL. Both ML and DL are data-centric and involve AI. AI stands for artificial intelligence, ML stands for machine learning, and DL stands for deep learning. Reproduced with permission. 28 Copyright 2020, Elsevier.
In 2021, Sidhom and colleagues 34 proposed a DL framework, adaptable to any diagnostic pipeline, to assist in the diagnosis of acute promyelocytic leukemia (APL), a subtype of acute myelogenous leukemia (AML). They enrolled 34 APL patients and 72 AML patients, as determined by FISH. Peripheral blood collected from the enrolled patients was processed with CellaVision for smearing, staining, cell image acquisition, and pre-classification. The resulting single-cell images were divided into three categories for training: APL, non-APL and blurred cells. Three fully connected convolutional layers and a classification layer were used to learn and extract single-cell features and to classify each cell ( Figure 2A). The average single-cell predicted value for each individual was taken as the probability that the patient had APL ( Figure 2B iii, vi). The results show that single-cell DL can predict APL and reveal its morphological characteristics. The area under the curve (AUC) of this model was 0.822 in the discovery cohort and 0.739 in the validation cohort ( Figure 2B i, iv). In addition, they used the established integrated gradients method to identify the single-cell images most representative of APL and non-APL cells ( Figure 2C). They found that the chromatin of non-APL cells was dispersed and concentrated at the cell edge, whereas the chromatin of APL-associated cells was concentrated at the center, a pattern that had not been reported in previous studies.
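The patient-level step described above, in which per-cell predictions are averaged into one probability per patient, can be illustrated with a minimal sketch. The function name `patient_probability` and the example values are hypothetical and are not taken from the cited study; the per-cell probabilities are assumed to come from an already trained classifier.

```python
from statistics import mean


def patient_probability(cell_probs):
    """Average per-cell APL probabilities into one patient-level score."""
    if not cell_probs:
        raise ValueError("at least one classified cell is required")
    return mean(cell_probs)


# Hypothetical per-cell classifier outputs for two patients (illustrative only).
patients = {
    "patient_a": [0.9, 0.8, 0.7, 0.6],
    "patient_b": [0.2, 0.1, 0.3, 0.2],
}

# One score per patient; a decision threshold would be chosen on a discovery cohort.
scores = {pid: patient_probability(probs) for pid, probs in patients.items()}
```

Averaging makes the patient-level score robust to a few misclassified cells, at the cost of diluting the signal when only a small fraction of cells is abnormal.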
In some cases, underlying genetic changes in AML can be inferred from changes in cellular morphology. 35 Eckardt et al. 36 presented a multi-step convolutional neural network (CNN)-based predictive model to recognize AML in bone marrow smears and predict NPM1 mutation. Their training and testing cohort included digitized bone marrow images from 1251 AML patients and 236 healthy donors, and mutation status was determined by screening for NPM1 mutations. The first step in this workflow is initial cell segmentation with a Faster Region-based Convolutional Neural Net (FRCNN) using a human-in-the-loop approach, in which hematologists correct the model output; this step reached an accuracy of 0.97 for cell segmentation. Subsequently, the characteristics of the segmented cells, including cell lineage, cell type, plasma particles and Auer rods, were manually labeled by hematologists. The model was then trained extensively on the labeled images and tested for its ability to distinguish AML from healthy donors and to predict NPM1 mutations. The results show that AML cases can be accurately distinguished from healthy donors, with a mean area under the receiver operating characteristic curve (AUROC) of 0.9699 and a mean area under the precision-recall curve of 0.9691. The model can also predict NPM1 mutation status from bone marrow cell morphology with an accuracy of 0.86 and an AUROC of 0.92. In addition, this study reveals that AML with NPM1 mutation is associated with cup-like blasts and a pattern of condensed chromatin accompanied by perinuclear lightening zones.
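Because AUROC is the main performance measure reported in these studies, a short sketch of how it can be computed may be helpful. It uses the rank interpretation of AUROC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case). The labels and scores below are invented for illustration and are unrelated to the cited cohorts.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: 1 = mutated, 0 = wild type (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
value = auroc(labels, scores)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; library implementations typically use a sort-based computation instead.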
FIGURE 2 A DL model for single-cell classification from peripheral smears to predict and reveal morphological characteristics of APL. (A) The DL architecture is designed to train the cell-level classification of peripheral blood smear leukocytes. (B) The proposed model is trained, and its performance is tested in a discovery cohort and an independent prospective validation cohort. (C) The trained model was applied to the blast validation cohort to assess performance. Reproduced with permission. 34 Copyright 2021, Springer Nature.
One study developed the first reported AI system for evaluating dysplasia on bone marrow smears and presented AI predictions for decreased granules (DG), one of the most representative dysplastic features 37 ( Figure 3A,B). Two cytopathologists assessed the degree of dysplasia on a four-point scale from 0 to 3 (0: normal, 1: intermediate, 2: dysplasia, 3: severe dysplasia), and each bone marrow smear image was manually labeled for training. Each input image in the inference cohort was rotated to 16 different angles to obtain regressor outputs. Finally, the DG score was calculated from these regressor outputs and used as the DG prediction ( Figure 3C). For the morphologist-labeled DG1-3, the positive predictive value (PPV), specificity, sensitivity, accuracy and negative predictive value (NPV) were 76.3%, 97.7%, 91.0%, 97.2% and 99.3%, respectively. After the exclusion of DG1, PPV, specificity, sensitivity, accuracy and NPV were 80.6%, 98.9%, 85.2%, 98.2% and 99.2%, respectively.
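The five measures reported for the DG model all derive from the same four confusion-matrix counts. The sketch below shows the standard definitions; the counts in the example are hypothetical and were not taken from the cited study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against empty classes).
    """
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for a setting in which abnormal cells are rare.
m = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the abnormal class is in the evaluated sample, which is worth keeping in mind when comparing values across studies.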
AI-based approaches for cell segmentation and image classification provide a fast and highly accurate way to identify myelodysplasia and leukemia, as well as to predict gene mutations, from peripheral blood and bone marrow cell morphology. In addition, the doctor-in-the-loop mechanism, which corrected more than 10% of the labels, improves the accuracy and robustness of the AI model. 36 These DL studies revealed specific morphological features previously unreported in peripheral blood smears of APL and in bone marrow of AML with NPM1 mutations. The AI model allows laboratory technicians to work in parallel, avoiding treatment delays in highly suspected AML cases caused by waiting for the results of other diagnostic procedures such as flow cytometry and molecular genetics.

AI IN CERVICAL SECRETION CYTOPATHOLOGY
Cervical secretion cytopathology, also known as Pap smear, is an essential screening tool for detecting cervical cancer and precancerous lesions. 38 In recent years, liquid-based cytopathology (LBC) has emerged as an alternative method for cervical secretion cytopathology. 39 LBC offers several advantages over the conventional Pap smear, including increased sensitivity, reduced false negative rates, and improved reproducibility. 40 However, accurate diagnosis of LBC remains a challenge due to the lack of experienced and qualified cytologists or cell technicians, as well as factors such as diagnostic experience, mood and fatigue. Therefore, establishing an AI system that can assist clinical cytopathologists to improve work efficiency and reproducibility is a feasible solution.
A study 41 collected over 81,000 cervical liquid-based cytological smear samples from several major medical institutions and labeled them strictly according to the Bethesda system (TBS) diagnostic requirements (>1.7 million annotated images). The researchers developed a cervical LBC AI-assisted TBS classification diagnostic system (AIATBS system) that included quality control solutions by integrating the YOLOv3 model for target detection ( Figure 4A), the Xception classification model for feature extraction (Figure 4B), and a logical decision tree for fitting the features ( Figure 4F). Squamous intraepithelial lesions were further classified into the subtypes required for TBS classification using an XGBoost model, which included a patch-based region classification model and a nuclear segmentation model that could exploit the features already extracted ( Figure 4C-E). In multicenter prospective validation (>34,000 samples), the AIATBS system was fast (<180 s/smear), had high specificity (82.14%), and demonstrated better sensitivity (>83.00%) than senior cytopathologists. Moreover, the AIATBS system can adapt to smear samples prepared using different staining and scanning methods. The AIATBS system reduces the workload of cytopathologists and greatly improves their diagnostic accuracy and work efficiency. It also shortens the time required for clinical decision-making and demonstrates great potential in assisting diagnosis.

AI IN URINARY CYTOPATHOLOGY
Urinary cytopathology is a diagnostic tool used to detect and diagnose cancers of the urinary system, including bladder, kidney, and ureter cancers. 42 The technique is non-invasive, cost-effective, and easy to perform, which has made it a valuable tool for cancer screening and surveillance. 43,44 However, interpreting urine cytology can be challenging due to the variable morphology of urinary tract cells, the presence of inflammatory cells, and the need for highly trained personnel to distinguish between benign and malignant cells. 45,46 Recent advances in AI have shown promise in improving the accuracy and efficiency of urine cytopathology diagnosis. ML algorithms can be trained to recognize and classify abnormal cells, reducing the subjectivity and variability of human interpretation. 47 Masatomo et al. 48 used AI to develop a urine cytological classification system that employed CNN algorithms to classify images of urine cells as either negative (benign) or positive (atypical or malignant) ( Figure 5). They collected 195 urine smears from patients with confirmed urothelial carcinoma (UC), which were annotated and labeled independently by two cytopathologists, and ultimately selected 4637 images of cells with a consistent diagnosis, including 3128 negative cells and 1111 positive cells. The labeled cells were used for training and testing at a ratio of 4:1, and five-fold cross-validation was employed to verify the model performance ( Figure 5A). For data augmentation, all images were resized to 256×256 pixels, and cells were then segmented and mixed using a customized CutMix called Circle Cut to make the model robust against background image noise ( Figure 5B). The model was trained and tested on the augmented images using EfficientNet. During testing, feature vectors were extracted from the input images and their cosine similarity with a representative vector of each class was calculated, so that each sample could be assigned to the closest category ( Figure 5C). The AI model performed well in distinguishing negative and positive cells, with an AUC of 0.99, a highest accuracy of 95%, a sensitivity of 97%, and a specificity of 95%.
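The final classification step described above, assigning a cell to the class whose representative vector is most similar to its feature vector, can be sketched as follows. The three-dimensional vectors are toy values chosen for illustration; in practice the features would come from the trained network, and how the representative vectors are derived is not specified here and is left as an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def classify(feature, representatives):
    """Return the class whose representative vector is most similar."""
    return max(
        representatives,
        key=lambda name: cosine_similarity(feature, representatives[name]),
    )


# Hypothetical representative vectors for the two classes (illustrative only).
representatives = {
    "negative": [1.0, 0.1, 0.0],
    "positive": [0.1, 1.0, 0.5],
}
label = classify([0.2, 0.9, 0.4], representatives)
```

Cosine similarity compares direction rather than magnitude, so two feature vectors that differ only in overall scale are treated as identical.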
Beyond differentiating benign from malignant urine cells, automated AI analysis can also be used to diagnose UC. Zhang et al. 49 conducted a feasibility study on identifying UC by means of an ML algorithm based on Morphogo. A cytopathologist reviewed 37 urine slides that had been obtained as a training dataset and identified and manually annotated representative cell classes. The algorithm was validated on 27 slides from 37 samples of urine and was tested on 12 unknown slides, including high-grade urothelial carcinoma (HGUC), low-grade urothelial carcinoma (LGUC), prostatic adenocarcinoma, renal cell carcinoma and a variety of non-oncologic diseases. In the test dataset, 2 HGUC samples were initially positive for tumor cells in urine, and Morphogo analysis detected abnormal urine cytopathology in six patients with UC, including LGUC. Unexpectedly, Morphogo also found abnormal cells in prostatic adenocarcinoma and LGUC, suggesting that it may also be applicable to less typical cases, such as prostatic adenocarcinoma.
To standardize the diagnostic criteria of cytology, the Paris System (TPS), which contains concrete cytological criteria and diagnostic categories, was proposed and is extensively applied in the urine cytopathology diagnosis of UC. [50][51][52] TPS criteria in urine cytology are partly objective (nuclear-to-cytoplasmic [N/C] ratio) and partly subjective (nuclear atypia/irregularity/hyperchromasia). Interobserver agreement is poor in the critical N/C ratio range of 0.5 to 0.7, which is insufficient for a correct diagnosis. 53 Sanghvi et al. 54 suggested that combining automated image analysis with TPS-assisted screening for UC urine cytopathology is feasible. They developed a CNN model for predicting the classification of urine cells and the grade of UC based on digital whole slide images (WSIs). The study collected a total of 2405 ThinPrep slides after cell annotations and enrolled patients diagnosed according to TPS for training. The CNN outputs a TPS-dependent diagnosis by integrating cell-level and slide-level features, with the categories HGUC, suspicious for HGUC, LGUC and negative for HGUC. The model showed a specificity of 84.5% and a sensitivity of 79.5%, with an AUROC of 0.88 for HGUC. This work uses a larger dataset than previous studies and exploits the WSI level rather than just the single-cell level. The application of AI models will enhance the repeatability of urine cytopathology detection and may decrease the workload of cytopathologists. In addition, AI systems can assess a large amount of image data in seconds, whereas cytopathologists would need minutes or more. The superior processing power of AI has the potential to markedly reduce the time required for the evaluation of urine cytopathology. These advantages can be reinforced through continuous training on reliable data and through future updates of the algorithm.
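To make the objective part of the criteria concrete, the sketch below computes an N/C ratio from binary segmentation masks and reports where it falls relative to the 0.5 to 0.7 range mentioned above. It assumes an area-based definition (nuclear area divided by whole-cell area, with the nucleus included in the cell mask); this definition, the function names and the toy masks are assumptions made for illustration, not the method of any cited study.

```python
def nc_ratio(nucleus_mask, cell_mask):
    """Area-based N/C ratio from binary masks (lists of rows of 0/1).

    Assumption: ratio = nuclear pixel count / whole-cell pixel count.
    """
    nucleus_area = sum(sum(row) for row in nucleus_mask)
    cell_area = sum(sum(row) for row in cell_mask)
    if cell_area == 0:
        raise ValueError("cell mask is empty")
    return nucleus_area / cell_area


def nc_band(ratio, low=0.5, high=0.7):
    """Place a ratio relative to the critical 0.5 to 0.7 range."""
    if ratio < low:
        return "below critical range"
    if ratio <= high:
        return "within critical range"
    return "above critical range"


# Toy 4x4 cell whose nucleus covers a 3x3 block (illustrative only).
cell = [[1, 1, 1, 1] for _ in range(4)]
nucleus = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
ratio = nc_ratio(nucleus, cell)
band = nc_band(ratio)
```

A pixel-count ratio of this kind is reproducible once the segmentation is fixed, which is the property that makes the N/C criterion attractive for automation.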

AI IN SEROUS EFFUSION CYTOPATHOLOGY
Cytopathological analysis of serous effusion plays an important role in diagnosing cancer and detecting recurrence and helps guide treatment decisions. 55,56 Currently, there are many challenges associated with the cytopathological analysis of serous effusion, such as the difficulty of obtaining adequate samples and low cellularity. 57 With the help of AI technology, the analysis of effusion cytopathology can be more standardized and objective, reducing the subjectivity associated with traditional visual examination. 58 Recent studies have explored the use of AI algorithms in the diagnosis of effusions, showing promising results in terms of accuracy and efficiency.
In 2018, Khin et al. 59 proposed a computer-aided diagnostic (CAD) system to assist in detecting malignant cells in pleural effusion. Their CAD system consists of seven main stages: preprocessing, nuclear segmentation, post-processing, identification and isolation of overlapping nuclei, feature extraction, feature selection and final classification. They developed a new hybrid method combining Simple Linear Iterative Clustering (SLIC) and K-Means for preprocessing, segmentation and pixel-level feature extraction over the entire image. They extracted 201 features from the nucleus and used simulated annealing combined with an artificial neural network (SA-ANN) to select the most discriminative features with less redundant information. The selected features were used as inputs to classification models to predict malignancy. The method achieved a sensitivity of 87.97%, specificity of 99.40%, accuracy of 98.70% and F score of 87.79%.
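The feature selection stage can be illustrated with a generic simulated annealing search over feature subsets. This is a sketch of the general technique only, not the SA-ANN procedure of the cited study: the objective `toy_score` is a hypothetical stand-in for a classifier's validation performance, and the cooling schedule, step count and all names are assumptions.

```python
import math
import random


def anneal_features(n_features, score, steps=500, t_start=1.0, t_end=0.01, seed=0):
    """Search for a feature mask (list of bools) that maximizes score(mask)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a neighbor by flipping one randomly chosen feature.
        candidate = current[:]
        flip = rng.randrange(n_features)
        candidate[flip] = not candidate[flip]
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current[:], current_score
    return best, best_score


# Hypothetical objective: reward three "informative" features, penalize the rest.
INFORMATIVE = {0, 3, 5}


def toy_score(mask):
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in INFORMATIVE)
    extras = sum(1 for i, keep in enumerate(mask) if keep and i not in INFORMATIVE)
    return hits - 0.5 * extras


best_mask, best_value = anneal_features(8, toy_score)
```

In a real pipeline each call to the objective would involve training or evaluating a classifier on the candidate subset, so the number of annealing steps is usually limited by that cost.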
In a recent study, Feng Su et al. 60 proposed an AI algorithm to detect malignant cells in ascites and evaluate peritoneal metastasis and recurrence of gastric cancer (GC). They collected ascites samples from 139 GC patients with peritoneal metastasis and photographed the ascites smears after H&E staining and Pap staining. Images were divided into a labeled training set and a test set. The model comprises two parts, DetectionNet and ClassificationNet, which perform automatic cell localization and cell classification through transfer learning. Histograms of the diagonal length of the cells show significant differences between benign and malignant cells ( Figure 6A). Before further processing, all single-cell images were normalized to the same diagonal length to ensure consistent spatial resolution before being passed to the ClassificationNet model ( Figure 6B). The model achieved an AUC of 0.8851 and a precision of 96.80% in the automatic localization and identification of malignant cells in ascites ( Figure 6C) and performed well in terms of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) (Figure 6D), demonstrating that a DL system developed using transfer learning can achieve accurate cytopathological interpretation.
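The normalization of single-cell images to a common diagonal length comes down to computing one scale factor per crop. The sketch below shows only that arithmetic; the target diagonal of 100 pixels is an arbitrary illustrative value, the function name is hypothetical, and the actual resampling (for example with an image library) is left out.

```python
import math


def scaled_size(width, height, target_diagonal):
    """Return (new_width, new_height, scale) so the crop's diagonal matches the target."""
    diagonal = math.hypot(width, height)
    if diagonal == 0:
        raise ValueError("image has zero size")
    scale = target_diagonal / diagonal
    # Keep the aspect ratio; never collapse a side to zero pixels.
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height, scale


# A 30x40 pixel crop has a diagonal of 50 pixels, so it is doubled in size.
new_w, new_h, factor = scaled_size(30, 40, target_diagonal=100)
```

Note that rescaling every cell to the same diagonal removes absolute size as a cue for the classifier, so size information, if wanted, would have to be supplied separately.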
The major advantage of using AI for effusion cytopathology is the ability to analyze large amounts of data quickly and accurately, allowing for more precise and comprehensive diagnosis. Additionally, the use of AI can help identify subtle changes in cell morphology or composition that may be missed by the human eye, providing valuable diagnostic information.

FUTURE PERSPECTIVES
AI-based cytopathology recognition can help clinicians screen for cancer, diagnose it early, predict prognosis and recurrence, and predict gene mutations. In DL, the network autonomously normalizes images to common standards and learns meaningful features from the standardized images during training. 61 This automatic feature learning shortens the time to output. 62 The advantages of AI-based cytopathologic examination can compensate for current clinical deficiencies and improve the performance of cytopathologists in cancer assessment. Thus, the adoption of AI-based cytopathology detection can make the diagnostic process more efficient and repeatable and improve its accuracy and precision. A growing number of studies have emphasized the potential of computer image analysis and AI to benefit the biomedical field. They can build new diagnostic tools that reduce cost and time and improve task repeatability, as shown in Table 1.
AI could be integrated into clinical workflows by assisting cytopathologists in the analysis of large volumes of data. With the help of AI, cytopathologists can quickly and accurately analyze vast amounts of data, identifying patterns and potential abnormalities that might otherwise be missed. Additionally, AI can assist in the integration of multiple sources of data, such as genomic, imaging, and clinical data, to provide a more comprehensive view of a patient's condition. The optimal use of AI in cytopathology will require ongoing collaboration between pathologists and AI developers, with a focus on developing algorithms that are accurate, clinically relevant, and ethically sound. In addition to its application to digital images, AI holds great promise for big data analysis in the medical field. With advancements in machine learning algorithms and the availability of large datasets, AI can help clinicians and researchers identify novel biomarkers, 63 understand disease mechanisms, 64 and develop personalized treatments. 65 Additionally, AI can facilitate the integration of multi-omics data and improve the accuracy of diagnosis and prediction. 66,67 It could even be used to discover or identify new drugs. [68][69][70] Although AI shows great promise to revolutionize medicine, technical, social and legal challenges lie ahead. Because AI depends heavily on large amounts of high-quality training data, care must be taken to obtain data that are representative of the target patient population. The lack of public datasets truly representative of clinical practice hinders the clinical application of AI-based algorithms for cytopathology recognition. 71 The generalizability of AI models cannot be readily assessed because of the variability of datasets from different sources. 72 Given the lack of public datasets, the limited generalizability of AI algorithms may be one of the most significant obstacles to their large-scale clinical implementation.
In addition to public datasets, data quality and model selection are particularly important. Unstructured data are often noisy, sparse and inconsistent and require dedicated management and cleaning before they can be fully utilized. 73,74 Studies have shown that complex models with well-annotated code are still not widely adopted. 75 The replicability of a model is essential for validation on external datasets and for the translational application of AI models in the clinic. Therefore, both data and models should be taken into account in the development of AI-based cytopathologic recognition methods.
DL is often described as a "black box" because DL models produce outputs that are learned through training rather than explicitly programmed. 76 This has led to research into explainable AI, which aims to provide a better understanding of how DL models function. 77 Unlike traditional diagnostic approaches, AI models may not provide clear explanations for their output, making it difficult for pathologists to understand how the diagnosis was reached. Understanding how an AI model automatically learns the characteristics of given image data enables additional mechanistic studies on the explainability and transparency of AI models and makes errors introduced during model training easier to correct, which will also make the clinical application of AI models more consistent with the opinions of clinicians. 78 Another challenge is the ethical and legal issues surrounding the use of AI in cytopathology. 79 These include issues related to patient privacy, data ownership, and liability in cases where the AI model produces incorrect diagnoses. Patients may not fully understand the implications of having their samples analyzed by AI models. It is important to ensure that patients are fully informed about the use of AI models in their care and have the opportunity to provide informed consent. 80 Another major concern is the potential for biases in the data used to train AI algorithms, which can perpetuate inequalities and disparities in healthcare outcomes. 81 If the training data used to develop an AI model are biased toward a particular demographic group, the resulting model may not perform as accurately on samples from other demographic groups, potentially leading to misdiagnoses and inadequate treatment.
Thus, clinicians and patients must be able to understand how these models arrive at their diagnoses and decisions, and there should be clear mechanisms for identifying and addressing errors or biases that may arise.
To overcome these challenges, enhanced efforts in medical data collection, labeling and processing are needed to improve the quality and availability of data. There is also a need to strengthen human-machine collaboration in AI algorithm design and development and to integrate doctors' clinical experience and knowledge into AI algorithms. In addition, research on the interpretability of AI algorithms needs to be strengthened so that doctors can understand the decision-making process and results of AI algorithms and cooperate better with AI to complete medical tasks. Continued research and collaboration between computer scientists, clinicians, and researchers will hold great potential for the future of AI in medical big data analysis.