Quick Annotator: an open‐source digital pathology based rapid image annotation tool

Abstract
Image‐based biomarker discovery typically requires accurate segmentation of histologic structures (e.g. cell nuclei, tubules, and epithelial regions) in digital pathology whole slide images (WSIs). Unfortunately, annotating each structure of interest is laborious and often intractable even in moderately sized cohorts. Here, we present an open‐source tool, Quick Annotator (QA), designed to improve annotation efficiency of histologic structures by orders of magnitude. While the user annotates regions of interest (ROIs) via an intuitive web interface, a deep learning (DL) model is concurrently optimized using these annotations and applied to the ROI. The user iteratively reviews DL results to either (1) accept accurately annotated regions or (2) correct erroneously segmented structures to improve subsequent model suggestions, before transitioning to other ROIs. We demonstrate the effectiveness of QA over comparable manual efforts via three use cases. These include annotating (1) 337,386 nuclei in 5 pancreatic WSIs, (2) 5,692 tubules in 10 colorectal WSIs, and (3) 14,187 regions of epithelium in 10 breast WSIs. Efficiency gains in terms of annotations per second of 102×, 9×, and 39× were, respectively, witnessed while retaining f‐scores >0.95, suggesting that QA may be a valuable tool for efficiently fully annotating WSIs employed in downstream biomarker studies.

to the approximate width of the desired superpixel, and works well when set to the approximate width of the structure of interest. The nonnegative compactness value determines the regularity of the superpixel boundary: higher compactness encourages superpixels to retain their initial square shape, while lower compactness allows for greater boundary irregularity. For example, our epithelium use case employed a lower compactness setting due to the highly irregular boundaries of epithelial regions, whereas nuclei tend to be more consistently circular and thus have smoother boundaries. Lastly, setting a higher edge weight encourages the DL model to focus the loss function on incorrectly classified boundary pixels; increasing this weight is beneficial when clear boundaries are hard to distinguish.
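The interplay between superpixel width and compactness can be illustrated with the SLIC-style distance on which such superpixel methods are based. The following is a minimal NumPy sketch of that distance (the function name and parameterization are ours for illustration, not QA's actual implementation): higher compactness inflates the spatial term relative to the color term, so pixel assignments hew to the initial square grid, while lower compactness lets boundaries follow color edges.

```python
import numpy as np

def slic_distance(color_a, color_b, pos_a, pos_b, grid_step, compactness):
    """SLIC-style combined distance between a pixel and a superpixel center.

    grid_step is the initial superpixel width (the 'size' setting); a
    larger compactness up-weights spatial proximity, keeping superpixels
    close to their initial square shape.
    """
    d_color = np.linalg.norm(np.asarray(color_a) - np.asarray(color_b))
    d_space = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b))
    # Spatial distance is normalized by the grid step and scaled by compactness.
    return np.hypot(d_color, (compactness / grid_step) * d_space)

# Same pixel pair under two compactness settings: the spatial term
# dominates more as compactness grows.
d_lo = slic_distance([0.2, 0.1, 0.3], [0.25, 0.1, 0.3], (10, 10), (14, 10),
                     grid_step=16, compactness=1.0)
d_hi = slic_distance([0.2, 0.1, 0.3], [0.25, 0.1, 0.3], (10, 10), (14, 10),
                     grid_step=16, compactness=40.0)
print(d_lo < d_hi)  # True
```

With low compactness the color difference dominates, which is why irregular epithelial boundaries benefit from a lower setting.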

Section SM 2. Experiment setup and workflow
In this paper we focused on 3 histologic structures for segmentation: pancreatic nuclei, colorectal tubules, and breast epithelium (see Figure S1).
Each use case followed the workflow presented in Figure S3. However, each use case benefited from slight variations in each step, driven by histologic structure size, in order to optimize annotation efficiency (Section SM 3). All experiments were conducted on a Windows 10 desktop with an NVIDIA RTX 2060 8 GB GPU.

Section SM 3. Use case specific workflows and insights SM 3.1 Nuclei Case
We selected 5 pancreatic cancer WSIs, scanned at 40x, from the TCGA-PAAD dataset verified by Saltz's group [21]. These 5 WSIs were divided into 2,000 x 2,000 pixel tiles, and we selected 100 tiles from the generated ROIs. In accordance with the workflow presented in Figure S3, 20 nuclei tiles were uploaded into QA, a U-Net autoencoder was trained, and patches were plotted on the embedding plot.
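The tiling step can be sketched as follows. This is a hypothetical NumPy stand-in for QA's WSI tiling (a real pipeline would read the slide with a library such as OpenSlide): it cuts a slide-sized array into non-overlapping 2,000 x 2,000 tiles.

```python
import numpy as np

def tile_image(img, tile_size=2000):
    """Split an H x W x C array into non-overlapping square tiles,
    discarding any partial tiles at the right/bottom edges."""
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(img[y:y + tile_size, x:x + tile_size])
    return tiles

# A toy 4,000 x 6,000 "slide" yields a 2 x 3 grid of tiles.
slide = np.zeros((4000, 6000, 3), dtype=np.uint8)
tiles = tile_image(slide)
print(len(tiles))  # 6
```

The selected subset of such tiles (100 here) then forms the pool of ROIs presented to the annotator.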
Patches were then selected and manually annotated for 5 minutes. The DL prediction model was then trained, and its predictions were reviewed for modification or acceptance. The process iterated in batches of 20 tiles until all 100 tiles were completed.
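The edge weight described in SM 1 enters at this training step. Below is a pure-NumPy sketch of a boundary-weighted binary cross-entropy (names and signature are ours, not QA's API): pixels flagged as lying on structure boundaries have their loss term scaled up, pushing the model to resolve hard boundaries.

```python
import numpy as np

def edge_weighted_bce(pred, target, edge_mask, edge_weight=2.0, eps=1e-7):
    """Binary cross-entropy where boundary pixels (edge_mask == True)
    are scaled by edge_weight; interior pixels keep weight 1."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    weights = np.where(edge_mask, edge_weight, 1.0)
    return float(np.mean(weights * bce))

pred = np.array([[0.9, 0.4], [0.2, 0.8]])
target = np.array([[1.0, 1.0], [0.0, 1.0]])
edges = np.array([[False, True], [False, False]])  # one boundary pixel

plain = edge_weighted_bce(pred, target, edges, edge_weight=1.0)
weighted = edge_weighted_bce(pred, target, edges, edge_weight=4.0)
print(weighted > plain)  # True: the misclassified boundary pixel costs more
```

Raising the edge weight therefore concentrates gradient signal on boundary pixels, which is useful when structure outlines are hard to distinguish.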
In the first 5 minutes of nuclei annotation, even though no DL model was yet available, QA performed roughly twice as fast as manual segmentation using QuPath [10] (0.27 vs 0.14 nuclei per second) due to QA's superpixel functionality. Superpixels enabled one-click selection of a subset of nuclei, notably improving annotation efficiency. As the DL model began to produce better predictions with more training data, fewer modifications needed to be made before accepting the model's proposals. This corresponded to the jump in improvement observed in Figure 3A.
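As a worked check of the rates quoted above, the efficiency metric is simply structures annotated per second, and the quoted speedup follows directly:

```python
# Rates from the first 5 minutes of nuclei annotation (no DL model yet).
qa_rate = 0.27      # nuclei per second with QA's superpixels
manual_rate = 0.14  # nuclei per second with manual QuPath annotation

speedup = qa_rate / manual_rate
print(round(speedup, 1))  # 1.9 -> roughly twice as fast
```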

SM 3.2 Tubules Case
We selected 10 colorectal cancer WSIs from the TCGA-COAD dataset. These 10 WSIs were divided into 1,000 x 1,000 pixel tiles and downsampled to 10x magnification, from which 100 tubule-containing tiles were selected. To begin, 20 tiles were uploaded into QA, after which the same workflow as in the nuclei use case was employed. Figure 3B shows how efficiency changed over time as more tubules were annotated and DL performance improved. Performance fluctuations resulted from differences in WSI quality, with some tiles requiring additional correction. The superpixel feature continued to enable one-click selection of many tubules (Figure S1). Compared with nuclei annotation, tubule annotation efficiency converged faster and gave reliable suggestions with fewer annotated patches. The large difference in efficiency gain between nuclei and tubules (102× vs 9×, respectively) results from the fact that tubules occupy a larger area, so there are fewer of them per 1,000 x 1,000 tile and more time is spent transitioning between tiles.
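The magnification reduction above (40x scan to 10x working resolution) is a factor-of-4 downsample in each dimension. A minimal sketch, assuming nearest-neighbour striding (QA's actual resampling filter is not specified here):

```python
import numpy as np

def downsample(img, factor=4):
    """Nearest-neighbour downsample by an integer factor via striding."""
    return img[::factor, ::factor]

# A 4,000 x 4,000 region at 40x becomes a 1,000 x 1,000 tile at 10x.
region_40x = np.zeros((4000, 4000, 3), dtype=np.uint8)
region_10x = downsample(region_40x, factor=4)
print(region_10x.shape[:2])  # (1000, 1000)
```

Working at 10x keeps each tubule within a single tile while reducing the pixel count per tile sixteenfold.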

SM 3.3 Epithelium Case
We selected 10 WSIs, scanned at 40x, from an in-house estrogen receptor positive (ER+) breast cancer dataset; these were processed similarly to the tubule use case.
In the first 5 minutes of epithelium annotation, the bulk of the effort was spent manually delineating regions, as superpixel boundaries were not reliable (Figure S1, yellow arrow). This manual process was slower than in the other 2 use cases due to the epithelial compartment's intricate structure.
It appears that once a sufficient training set is created, coinciding with 246 annotated regions, the user starts to largely accept the DL suggestions. After this transition point, QA provides improvements in both efficiency and annotation precision. For example, QA was able to provide better pixel-level segmentations in delicate regions, which may be intractable for manual annotators (Figure S4). While QA recapitulates the annotation with high fidelity in less complex images (C, f-score = 0.89), in more complex regions (F, e.g., areas indicated with arrows, f-score = 0.69) the predictions QA generates appear to provide a level of precision beyond that achievable with human effort. QA employs an intuitive binary classification in which every pixel is labeled as either epithelial or non-epithelial; the latter class comprises all other classes in the image, for example stroma and the glass slide background.
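The pixel-level f-scores quoted above compare a predicted binary mask against a reference mask. A self-contained sketch of that computation (our own helper, not QA's code):

```python
import numpy as np

def f_score(pred_mask, true_mask, eps=1e-7):
    """Pixel-level F1: harmonic mean of precision and recall over a
    binary mask (epithelium = 1, everything else = 0)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    tp = np.logical_and(pred, true).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (true.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

# Toy masks: 3 epithelial pixels in the reference, 2 recovered.
true = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 0]])
print(round(f_score(pred, true), 2))  # 0.8
```

Because the metric is computed over all pixels, the non-epithelial class absorbs stroma, glass background, and any other tissue without needing separate labels.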