Delphi expert consensus for whole slide imaging in thyroid cytopathology

Despite an increase in thyroid fine needle aspiration (FNA) and advances in whole slide imaging (WSI) adoption, digital pathology is still considered inadequate for primary diagnosis of these cases. Herein, we aim to validate the utility of WSI in thyroid FNAs employing the Delphi method strategy.


| INTRODUCTION
Although malignant neoplasms account for less than 1% of all space-occupying lesions of the thyroid, 1 tumours arising from this gland represent the most common endocrine malignancy. 2 The incidence of thyroid nodules has consistently increased, at least partially due to more incidentally discovered lesions during ultrasound examinations. 3 Despite significant improvements in ultrasound methods and molecular analysis, definitive diagnosis of thyroid nodules still relies on cytomorphological interpretation of aspirated material. Different diagnostic classification systems have been created, such as the widely recognised Bethesda System for Reporting Thyroid Cytopathology, 4 and the 5-tiered classification system routinely employed in Italy. 5 A cytological diagnosis is key in guiding clinical management. However, the cytological evaluation of thyroid fine needle aspirations (FNAs) remains challenging. 2
Adoption of digital pathology (DP), including whole slide imaging (WSI), could benefit the routine evaluation of thyroid nodule FNAs by introducing efficiency, automation and workload balancing, and potentially by facilitating the use of artificial intelligence (AI)-assisted tools. WSI involves scanning (digitising) an entire glass slide, thereby allowing pathologists to remotely navigate and even share this digital slide using monitors. Digital pathology and the application of AI have been broadly investigated in many fields of pathology, [6][7][8] and have now been approved for diagnostic use by several regulatory bodies. 9 Nevertheless, digitisation of cytology slides harbours some technical peculiarities, mainly the need for employing multiple scanning planes (Z-stacking) to focus on three-dimensional material, 10,11 which has hampered its widespread adoption. Even the updated College of American Pathologists (CAP) guidelines for the validation of WSI for primary diagnostic purposes considered cytopathology to be still "immature." 12,13
When evidence is lacking and research is limited, a putative solution is the Delphi method. This technique makes use of collective intelligence to generate a reliable consensus opinion in order to reach best practice guidance. This is accomplished by gathering a panel of acknowledged experts on a specific topic and asking them to respond to questionnaires and give their controlled feedback. 14 [16][17] Herein, we sought to employ the Delphi method to validate the adoption of WSI in thyroid cytopathology by gathering a broad panel of recognised experts and asking them to evaluate a heterogeneous collection of digitised thyroid FNAs.

| Cases acquisition and expert panel recruitment
All procedures performed in the present study involving human participants received institutional review board approval and were in accordance with the ethical standards of the relevant institutional and/or national research committee and with the Declaration of Helsinki. All patients gave their written informed consent to diagnostic procedures and treatment according to institutional rules for everyday clinical practice and experimental evaluations on archival tissue.
In order to validate the proposed method on a reliable cohort reflecting common challenges faced in routine practice, seven widely recognised Italian reference cytology centres were asked to gather, digitise, and then share FNA cases of thyroid nodules encompassing each of the six Bethesda reporting categories and each of the five Italian system categories. The original pathology diagnosis was considered the ground truth reference standard to guide the proper selection of the slides. The whole series consisted of 80 slides from 80 thyroid FNA cases, ranging from 9 to 14 per centre, and included direct smears (91%, 71/80 cases) and fewer liquid-based cytology (LBC) slides (9%, 9/80 cases). For each included case, the most representative slide according to the original light microscopy (LM) examination was chosen. Slides were stained with either haematoxylin and eosin (17%, 14/80), Diff-Quik (25%, 20/80), or the Papanicolaou method (56%, 46/80).
Results: High consensus was achieved for all parameters, with an overall average score of 4.27. The broad majority of items (84%) were ranked either 4 or 5 by each physician. Two badly scanned cases were responsible for more than half of the low-ranked (≤2) values (57%). Good to excellent (≥3) diagnostic confidence was reached in 95.2% of cases. For most cases (78%), WSI assessment was not limited by technical issues linked to the image acquisition process.

Conclusion:
This systematic Delphi study indicates broad consensus among participating physicians on the application of DP to thyroid cytopathology, supporting expert opinion that WSI is reliable and safe for primary diagnostic purposes.

KEYWORDS
consensus panel, Delphi method, digital pathology, thyroid cytopathology, validation, whole slide imaging

All slides were digitised with a Panoramic P1000 scanner at 40× magnification with Z-stacking combining 30 different focusing planes and then uploaded on a dedicated web platform provided by Epredia (Figure 1). The file sizes of the digital slides ranged from 308 MB to 3.2 GB. More than one pathologist per institution with recognised experience in dealing with DP was invited to join the study. Participants were able to access and navigate the scanned slides. Image viewer tools permitted end-users to annotate regions of interest, insert comments on images, use linear and square measurement tools, and take snapshots if needed.
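As a rough illustration of what combining several focal planes into a single in-focus image can involve, the Python sketch below fuses a small Z-stack pixel by pixel. It is not the scanner vendor's algorithm, which is not described here: the per-pixel Laplacian sharpness measure, the function names `sharpness` and `fuse_focal_planes`, and the toy 3×3 planes are all assumptions chosen for illustration.

```python
def sharpness(plane, y, x):
    """Absolute 4-neighbour Laplacian at (y, x); neighbours are clamped at borders.

    Assumed focus measure: larger values mean more local contrast.
    """
    h, w = len(plane), len(plane[0])
    up = plane[max(y - 1, 0)][x]
    down = plane[min(y + 1, h - 1)][x]
    left = plane[y][max(x - 1, 0)]
    right = plane[y][min(x + 1, w - 1)]
    return abs(up + down + left + right - 4 * plane[y][x])


def fuse_focal_planes(stack):
    """Fuse a Z-stack (list of equally sized 2D grey-level grids).

    For every pixel, keep the value from the plane with the highest
    sharpness at that position. Ties go to the first plane in the stack.
    Returns (fused_image, chosen_plane_index_map).
    """
    if not stack:
        raise ValueError("empty Z-stack")
    h, w = len(stack[0]), len(stack[0][0])
    fused = [[0] * w for _ in range(h)]
    chosen = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(stack)), key=lambda z: sharpness(stack[z], y, x))
            fused[y][x] = stack[best][y][x]
            chosen[y][x] = best
    return fused, chosen


# Hypothetical toy stack: plane 0 is featureless (out of focus),
# plane 1 contains a sharp central feature.
blurred = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
sharp = [[10, 10, 10], [10, 90, 10], [10, 10, 10]]
fused_image, plane_map = fuse_focal_planes([blurred, sharp])
print(fused_image)
print(plane_map)
```

A real pipeline would work on full-resolution colour tiles and usually smooths the plane-selection map to avoid seams; the sketch only shows the selection principle.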

| Delphi study procedure
This Delphi study was conducted over two rounds (Figure 2) via a series of two surveys matched with controlled opinion feedback. In Round 1 (exploration), before assessing the WSIs, all of the participants were asked to answer the following questions:
1. Which pre-analytic parameter is more likely to significantly affect the interpretation of FNA thyroid specimens?
2. Which morphological findings carry the most relevant meaning in the evaluation of FNA thyroid specimens (e.g. background, architectural growth pattern of thyroid cells, etc.)?
3. What are the main issues specifically related to WSI that potentially hamper the cytological assessment of these thyroid FNA specimens?
One of the investigators (M.S.) collected the answers to the Round 1 survey and applied them to build the Round 2 (analysis and validation) questionnaire items.Specifically, once the members of the panel had reviewed the WSIs on the aforementioned online platform, six investigators from the participating centres were asked to respond to the following questions, giving a quantitative score of their confidence to assess the digital slides on a Likert scale ranging from 1 (totally unable) to 5 (comparable to LM):

| Round 1
The answers to the first explorative round of the study by participants from all institutions are summarised in Table 1. Regarding the pre-analytic phase, all respondents indicated that either specimen fixation (57%) or the manual smearing technique (71%) would be the main factors potentially affecting their evaluation of slides. Blood contamination and inferior staining were also mentioned as confounding issues. For cytomorphology, the vast majority of pathologists considered preservation of architectural growth patterns of follicular cells (86%) and of their nuclear detail (71%) to represent key cytological features necessary to reliably interpret a thyroid FNA. Furthermore, cellularity of the smear, cellular background (e.g. lymphocytes), and characteristics of colloid material were all noted as relevant findings for the evaluation of thyroid FNAs. Specific to the digitalisation process, loss of the three-dimensional distribution of cells was the main concern for most of the participants (71%). Similarly, significant blood contamination (43%) and focus problems diminishing nuclear details (29%) were also considered to be problematic.

| Round 2
As mentioned, answers from   to LM about the slides' evaluation (Figure 4A). Higher rates were reached for determining global cellularity and colloid evaluation, respectively achieving 93% and 90% ≥4 values. Pathologists experienced more difficulty in assessing nuclear details, although less than 10% of the slides were scored ≤2 for this specific parameter (Figure 4B). Moreover, two poorly scanned slides accounted for more than half of these low-ranking rates (57%). When asked about rendering a final diagnosis on WSIs, the respondents reached good to excellent (≥3) rates of confidence in more than 95% of the cases (95.2%). Of note, the two aforementioned badly scanned slides were responsible for three-quarters (74%) of the low-ranked scores (≤2). Thus, excluding such smears, participants felt unable to assess the overall cytological diagnosis on WSIs in less than 2% of the cases.
With regard to the yes/no questions, in most cases (78%) the evaluation of smears was not significantly limited by any relevant issue related to the digital acquisition process (Figure 4C). A partial lack of nuclear detail was noted in about a third of the slides (34%), although this did not influence pathologists' ability to reach a definitive diagnosis. By contrast, blood contamination and loss of the three-dimensional perception of the smear were reported in a minority of the cases (20% and 11%, respectively) (Figure 4D).
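The summary figures reported for Round 2 (mean score, share of scores ≥4, ≥3 and ≤2, and the fraction of low scores traceable to specific slides) are simple descriptive statistics over the raters' Likert scores. A minimal Python sketch of how such figures could be derived follows; the function names and the toy ratings are hypothetical and represent neither the study data nor the authors' actual analysis.

```python
from statistics import mean


def summarise_likert(scores):
    """Return descriptive statistics for a flat list of 1-5 Likert scores."""
    if not scores:
        raise ValueError("no scores to summarise")
    if any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("Likert scores must be integers from 1 to 5")
    n = len(scores)
    return {
        "mean": mean(scores),
        "share_4_or_5": sum(s >= 4 for s in scores) / n,     # comparable to LM
        "share_3_or_more": sum(s >= 3 for s in scores) / n,  # good to excellent
        "share_2_or_less": sum(s <= 2 for s in scores) / n,  # low-ranked
    }


def low_score_share(ratings, flagged_cases, threshold=2):
    """Fraction of all low scores (<= threshold) that come from flagged cases.

    `ratings` maps a case identifier to the list of scores it received.
    """
    low_all = sum(s <= threshold for scores in ratings.values() for s in scores)
    low_flagged = sum(
        s <= threshold for case in flagged_cases for s in ratings[case]
    )
    return low_flagged / low_all if low_all else 0.0


# Hypothetical toy data: slide id -> one score per rater (NOT the study data).
ratings = {
    "slide_01": [5, 4, 5, 4, 5, 4],
    "slide_02": [4, 4, 3, 5, 4, 4],
    "slide_03": [2, 1, 2, 3, 2, 1],  # imagined poorly scanned slide
    "slide_04": [5, 5, 4, 2, 5, 4],
}
all_scores = [s for scores in ratings.values() for s in scores]
summary = summarise_likert(all_scores)
print(summary)
print(low_score_share(ratings, ["slide_03"]))
```

Keeping the scores keyed by slide makes it straightforward to repeat the summary after excluding individual slides, as was done for the two poorly scanned cases.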

| DISCUSSION
In recent years, DP has revolutionised the approach in pathology for both diagnosis and discovery. Nevertheless, broad adoption of WSI for primary diagnosis has been met with opposition by the pathology community. Indeed, only limited validation cohorts have been published regarding the use of WSI in cytopathology. In the present study, we employed a consensus-based Delphi method to compare WSI diagnostic ability to conventional LM. By reviewing a broad series of scanned thyroid FNAs, our expert panel achieved an average score of 4.27 for all evaluated parameters on a 1 to 5 Likert scale. Furthermore, reliable diagnostic confidence was reached by all respondents in 95.2% of the cases, which is in line with the CAP agreement threshold for validating a DP system. 12 Even though fair agreement between DP and LM in cytopathology has previously been reported, 18 a few considerations are worth mentioning about the data from the present study. First, in our series DP received wide consensus for each of the pivotal cytomorphological features identified by the experts, including preservation of follicular cell nuclear details. Second, consensus was not influenced by either the type of stain employed or the slide preparation (i.e. smear vs LBC preparation). This is interesting because direct smears are generally purported to be less suited to digitalisation than LBC slides. 19 Finally, our cohort was made up of a large number of patients, surpassing the CAP's proposed threshold of 60 cases. Of note, assessing pathologists' diagnostic confidence was one of the main goals of this study, in order to validate the routine use of WSI in thyroid cytopathology. Similarly designed studies are warranted to widen the currently available evidence on this subject, potentially embracing novel image analysis tools as well. 22 These considerations notwithstanding, our data validate the reliability and safety of DP and WSI for thyroid cytopathology diagnosis.

1. Are you able to evaluate the architectural growth pattern of thyroid cells?
2. Are you able to estimate the overall amount of thyroid cells?
3. Are you able to assess the nuclear details of thyroid cells?
4. Are you able to appraise the amount and quality of colloid material?
5. Are you able to focus on cells apart from follicular cells (e.g. lymphocytes)?
6. How is your overall confidence in establishing a definitive cytopathology diagnosis?
Moreover, in this second round of the Delphi procedure, physicians were also requested to answer these additional three yes/no questions:
1. Is blood contamination in WSIs a relevant diagnostic problem?
2. Does WSI evaluation prevent you from proficiently assessing the three-dimensional distribution of cytological material?
3. Are you concerned about focus-related issues (e.g. loss of nuclear detail, stratification of cells from different planes)?
All surveys were completed via an Excel form (Microsoft). Participants remained anonymous during both rounds. Two of the authors (S.M. and M.S.) collected and then analysed the response data.
Round 1 were gathered to build this phase of the survey, the results of which are detailed in Tables 2 and 3. Examples of WSIs highlighting different cytological features are provided in Figure 3. Investigators were fairly confident with interpreting WSIs, reporting an overall average score of 4.27 for all parameters, ranging from a 3.9 mean value for the assessment of nuclear details of follicular cells to a 4.69 average rate for the global estimation of smear cellularity. Regardless of the parameter, the broad majority of elements (84%) were ranked either 4 or 5 by each pathologist, indicating broadly comparable results

FIGURE 1 Platform for the digital sharing of whole slide images
FIGURE 2 Flowchart illustrating the Delphi study process. WSI, whole slide imaging
TABLE 1 Participants' answers to the first round of the survey
Columns: Centre no.; Pre-analytic phases affecting interpretation of FNAs; Key morphological findings of thyroid FNAs; WSI-related issues

FIGURE 3 (A) Diff-Quik stained slide showing follicular aggregates (original magnification 100×). (B) Microphotograph of a haematoxylin and eosin whole slide image with atypical cell aggregates revealing prominent nuclear grooves and pseudoinclusions (original magnification 400×). (C) Picture from a haematoxylin and eosin-stained case with large eosinophilic pools of colloid material (original magnification 200×). (D) Papanicolaou-stained slide highlighting plump follicular cells intermingled with fewer scattered small lymphocytes (original magnification 200×)

deals with slanting the slide surface 1 degree relative to the focal plane, concurrently spanning multiple layers, and then acquiring and fusing the digitised images to avoid defocus blur, ultimately obtaining high-quality files in restricted amounts of time. 21 Similarly, the latter method (also called "deep-focusing") consists of capturing images with the best focus of the cellular material located in various planes and then assembling these images into a comprehensive, larger final composite file where all the optimised areas are gathered in a single plane. 19

| CONCLUSION
In the present work, we have demonstrated the suitability of DP for the evaluation of thyroid FNAs, relying on the Delphi method. As internationally acknowledged guidelines are lacking for cytopathology, unlike for histopathology, we chose to leverage a panel of recognised experts on the topic and achieved excellent results across the entire range of key parameters identified. Of course, other

Abbreviations: FNA, fine needle aspiration; WSI, whole slide imaging.

TABLE 2 Single investigators' answers on a Likert scale for each evaluated parameter and global summary in the second round of the study
Columns: Are you able to evaluate the architectural growth pattern of thyroid cells?; Are you able to estimate the overall amount of thyroid cells?; Are you able to assess the nuclear details of thyroid cells?; Are you able to appraise the amount and quality of colloid material?; Are you able to focus other cells apart from thyroid ones?
20 However, as formulating a specific cytological diagnosis was beyond the aim of this article, we did not assess interobserver agreement for each of the shared cases. WSI acquisition problems are well known to hinder the widespread use of digital cytopathology. Whilst scan failure is a recognised technical issue for histology specimens, 20 this is particularly true for cytology smears because they contain three-dimensional cellular groups and often have obscuring material. Z-stacking (i.e. digitising slides in multiple focal planes) has been proposed as a putative solution. However, this process may lead to prolonged acquisition times and larger digital files. Alternative options include "slanted scanning" and "volumetric scanning." Briefly, the former approach