TagLab: AI‐assisted annotation for the fast and accurate semantic segmentation of coral reef orthoimages

Semantic segmentation is a widespread image analysis task; in some applications, it requires such high accuracy that it still has to be done manually, taking a long time. Deep learning‐based approaches can significantly reduce such times, but current automated solutions may produce results below expert standards. We propose TagLab, an interactive tool for the rapid labelling and analysis of orthoimages that speeds up semantic segmentation. TagLab follows a human‐centered artificial intelligence approach that, by integrating multiple degrees of automation, empowers human capabilities. We evaluated TagLab's efficiency in annotation time and accuracy through a user study based on a highly challenging task: the semantic segmentation of coral communities in marine ecology. In the assisted labelling of corals, TagLab increased the annotation speed by approximately 90% for nonexpert annotators while preserving the labelling accuracy. Furthermore, human–machine interaction has improved the accuracy of fully automatic predictions by about 7% on average and by 14% when the model generalizes poorly. Based on the experience gained through the user study, TagLab has been improved, and preliminary investigations suggest a further significant reduction in annotation times.

Once a suitable training data set has been prepared, current GPU performance allows for the rapid optimization of automatic recognition models. While fully automated solutions offer dramatic reductions in human effort, their accuracy is still lower than human experts can achieve for complex scenarios.
Such general considerations apply to the field of underwater monitoring. Large-area imaging is an increasingly common solution in the study of subtidal environments. Corals are framework-building species, and their growth is directly responsible for creating and maintaining coral reef habitats. Spatio-temporal analyses of seabed orthoimages increase the understanding of demographic patterns and the spatial dynamics of coral reef communities. Images are annotated (i.e., corals are outlined for fine-scale colony mapping) either with standard general-purpose photo editing software or with special-purpose marine image annotation software (a short review is given in Section 2.2). However, the application of artificial intelligence (AI)-based assisted tools remains marginal, as the required pixel-wise tracing accuracy is only achievable manually. Manual tracing is very time-consuming, as each square meter of imagery demands up to an hour of human effort. As large volumes of unprocessed imagery are already available and new imagery is continually being created, such human-driven data extraction limits the productivity of the analytical process.
Reef-building corals are an incredibly diverse group of organisms, consisting of around 850 species (Hoeksema & Cairns, 2019), and thus offer a unique set of challenges to the field of automatic image processing. Coral biology complicates automated segmentation because of the complexity and asymmetry of many coral growth forms and the considerable morphological variability within and among species. Importantly, as corals are relatively slow-growing (linear extension rates can be less than 1 cm/yr), the level of precision required to accurately document changes and colony evolution is exceptionally high. Poor visibility and floating particles further degrade image clarity in underwater data. These factors complicate the design of fully automatic semantic segmentation models for coral taxa, a task in which human experience remains central and irreplaceable.
TagLab implements a human-centric pipeline that has been proven to speed up the annotation work, retaining the accuracy of the manual approach and ensuring experts keep control of the annotation process.
The annotation pipeline comprises three steps: (1) AI-assisted/manual labelling, in which intelligent tools based on CNNs speed up annotation from scratch; (2) a learning pipeline to create, test, and use custom recognition models; and (3) a final editing/validation step, in which the expert can improve the automatic predictions. In terms of data analysis, TagLab integrates ad hoc image-processing and image-analysis tools, supports georeferencing, and interoperates with GIS software. The software includes the following original resources:
• an AI-based flexible annotation workflow, which, unlike similar software, allows the per-pixel editing of predictions (more details are given in Sections 2.2 and 3);
• an Edit Border tool, which facilitates the manual editing of complex boundaries (Section 3.1.2);
• the support of multichannel images, which enables multimodal co-registered data to be handled; for example, loading digital elevation models (DEMs) allows users to approximate the 3D surface area of coral colonies (see Section 4; a sketch of this computation follows below);
• a Multitemporal comparison tool, which automatically tracks the temporal evolution between segmented regions and allows for interactive visual inspections of the extracted data (see Section 4).
This paper reports a thorough evaluation of the improvements brought by TagLab, both in terms of annotation time (efficiency) and accuracy. This evaluation was carried out through a comprehensive user study conducted with the Scripps Institution of Oceanography (UCSD). Semiautomatic and automatic tools demonstrated their efficiency in speeding up the analysis of large coral reef images. In addition, the interactive editing of automatic predictions proved essential for achieving the high accuracy levels required in ecological studies.
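To make the DEM-based measurement concrete, the following is a minimal sketch of how the 3D surface area of a labelled colony could be approximated from a co-registered DEM. This illustrates the idea only; it is not TagLab's actual implementation, and it assumes a regular-grid DEM whose elevations share the same linear unit as the ground cell size:

```python
import numpy as np

def dem_surface_area(dem, mask, cell):
    """Approximate the 3D surface area of the masked region of a DEM.

    dem  : (H, W) array of elevations, co-registered with the label mask
    mask : (H, W) bool array marking the colony's pixels
    cell : ground size of one pixel; elevations must use the same unit

    Each grid cell is split into two 3D triangles whose areas are summed.
    """
    z00, z01 = dem[:-1, :-1], dem[:-1, 1:]
    z10, z11 = dem[1:, :-1], dem[1:, 1:]
    # Triangle areas via the cross-product formula, vectorized over the grid.
    a1 = 0.5 * cell * np.sqrt((z01 - z00) ** 2 + (z10 - z00) ** 2 + cell ** 2)
    a2 = 0.5 * cell * np.sqrt((z01 - z11) ** 2 + (z10 - z11) ** 2 + cell ** 2)
    keep = mask[:-1, :-1]  # count a cell if its top-left pixel is labelled
    return float(((a1 + a2) * keep).sum())
```

Summing planar pixel areas instead would systematically underestimate the surface of steep, rugose colonies, which is why the elevation term matters.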
TagLab is available on GitHub (https://github.com/cnr-isti-vclab/TagLab). Reducing the time required for postprocessing of coral reef imagery enables researchers to process increasingly large volumes of data, thus facilitating a greater capacity to understand and predict future changes to coral reef ecosystems. We closely collaborate with several marine research laboratories, continuously updating the software (see Section 6.4) and developing novel automated/assisted strategies to support digital underwater monitoring (Figure 1).

2 | RELATED WORK
The widespread use of supervised deep-learning solutions has recently led to the development of several software applications and algorithms that speed up the preparation of training data sets. In terms of the semantic segmentation task, many of these applications exploit weak supervision. Object areas are rapidly marked using points, scribbles, bounding boxes, or polygons. Starting from this partial information, an algorithm then generates segmentation masks.
This section provides a brief description of the most popular weakly supervised annotation methodologies, followed by an overview of the current tools developed for marine species annotation.

2.1 | Weakly supervised methods
Drawing a bounding box is a quick and intuitive task. Khoreva et al. (2017) use bounding boxes to extract an initial proposal of the object mask. The region outside the box is marked as the background, while an algorithm (Pont-Tuset & Van Gool, 2015) evaluates the inside area. A recursive training framework based on a CNN achieves the final prediction. Deep Grabcut (Xu et al., 2017) uses the bounding box as a soft constraint, designing a CNN which takes as its input an image concatenated with a "distance map" computed from the object bounding box.
One of the first interactive segmentation methods (Boykov & Jolly, 2001) involves the tracing of background and foreground scribbles over the object. The segmentation task is then formulated and solved as a graph cut problem. Other classic solutions, such as GrabCut (Rother et al., 2004), are based on energy minimization. Geodesic Star (Gulshan et al., 2010) exploits a weighted geodesic distance based on pixel statistics to obtain segmented regions starting from scribbles. ScribbleSup (Lin et al., 2016) uses a graph-based model in conjunction with a fully convolutional network (FCN) (Long et al., 2015) to propagate the scribble information to entire segmentation regions. The graph is built on a superpixel subdivision of the input image.
Many interactive methods output semantic regions starting from point clicks. Xu et al. (2016) integrate positive clicks in the foreground with negative clicks in the background in a learning scheme. A Euclidean distance transform converts the clicked points into two separate maps, which are then concatenated with the input image to feed an FCN. A graph cut optimization (Rother et al., 2004) applied to the FCN output leads to the final segmentation. Le et al. (2018) transform user clicks into an interaction map by expanding Gaussians centred on them, which then feeds an FCN (Long et al., 2015) to output a rough predicted mask. Finally, a standard geodesic path solver (Cohen, 2006) applied to the boundaries map refines the segmentation. A recent solution (Forte et al., 2020) built upon a U-Net (Ronneberger et al., 2015) architecture reaches an exceptional accuracy of between 95% and 99% mean intersection-over-union (mIoU), a measure of overlap between labelled regions, by using a large number of user clicks (around 20). An intermediate solution between bounding boxes and point clicks is clicking the object's extremes (top, left, bottom, and right). The efficiency of extreme points relative to bounding boxes has been demonstrated by Papadopoulos et al. (2017), who report a median time of 34.5 s for annotating an object with a bounding box: 25.5 s for drawing it and 9.0 s for confirming its correctness.
Picking extreme points is five times faster than drawing bounding boxes and requires only 7 s on average, thanks to its small cognitive workload. Moreover, the experiments of Papadopoulos et al. show that automatic recognition models trained on extreme-point annotations (Fast R-CNN, Girshick, 2015, for object detection, and DeepLab for object segmentation) perform better, which suggests that, with this paradigm, humans generally provide tighter bounding boxes around the objects. Maninis et al. (2018) propose Deep Extreme Cut (DEXTR), a CNN for interactive class-agnostic segmentation based on the extreme points paradigm. DEXTR follows the technical solutions used in DeepLabV3+ to achieve high-resolution results.
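Most of the click-based methods above share the same input encoding: the user's clicks become extra channels concatenated with the image. The following is a minimal sketch of the distance-map encoding in the style of Xu et al. (2016), assuming SciPy, with the truncation cap chosen for illustration:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def click_maps(shape, pos_clicks, neg_clicks, cap=255.0):
    """Encode foreground/background clicks as two Euclidean distance maps.

    shape      : (H, W) of the image
    pos_clicks : list of (row, col) foreground clicks
    neg_clicks : list of (row, col) background clicks
    Returns an (H, W, 2) array to concatenate with the RGB image,
    yielding the 5-channel input used by click-based FCNs.
    """
    maps = []
    for clicks in (pos_clicks, neg_clicks):
        seed = np.ones(shape, dtype=bool)
        for r, c in clicks:
            seed[r, c] = False  # distances are measured from the clicks
        d = distance_transform_edt(seed) if clicks else np.full(shape, cap)
        maps.append(np.minimum(d, cap))  # truncate large distances
    return np.stack(maps, axis=-1)
```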
Objects without holes can be precisely annotated using enclosing polygons, but as drawing a polygon typically requires many clicks (30–40), this is a highly time-consuming labelling method. Polygon-RNN (Castrejón et al., 2017) speeds up polygon tracing using a recurrent neural network (RNN). As soon as the user starts clicking points, the Polygon-RNN network processes the placed clicks and automatically predicts the next ones. This process has been found to speed up the generation of segmentations by a factor of 4.7 when tested on the Cityscapes data set (Cordts et al., 2016). Polygon-RNN++ (Acuna et al., 2018) follows Polygon-RNN, extracting features with a modified version of ResNet-50 to increase their resolution. An Evaluator Network then estimates the accuracy of the predicted polygon via reinforcement learning. Finally, the output is refined using a graph neural network (GNN).

2.2 | Annotation and segmentation solutions for marine organisms
In this section, we review the algorithms and software tools developed for the annotation and segmentation of marine organisms. The web platform CoralNet is a widely known AI-based solution for creating manual and assisted point-based annotations (Beijbom et al., 2015). Images are annotated directly in the web browser, and when a sufficient amount of data has been annotated, CoralNet trains a classifier and helps label the remaining images. Squidle+ (Friedman, 2017) is a cloud-based platform for annotating and georeferencing underwater visual data. It is extremely versatile in handling images, videos, and orthomosaics (as collections of tiles).

FIGURE 1 TagLab's main user interface splits into three main components: the central Working View, the Toolbar on the left, and a right area containing three panels: the Labels, the Region info, and the Map Viewer. The Working View covers the central part of the interface and visualizes the orthoimage with overlaid semantic annotations (colored polygons).
TagLab and Squidle+ follow different approaches. First, Squidle+ handles point-based annotation, while TagLab labels regions. Generally, point-based information is not sufficient for identifying the demographic drivers of change in coral communities (Edmunds & Riegl, 2020). Second, the AI-assisted part of Squidle+ implements an active learning approach: the interactive system asks the user for additional input to improve its classification performance. TagLab instead provides a nonrigid working pipeline, with assistive tools for the direct editing of automatic predictions. In terms of interactive tracing, DeepSegment (Andrew, 2018) adopts an image segmentation approach based on GrabCut (Rother et al., 2004) and superpixels (Achanta et al., 2010). Parameters must be tuned manually for each colony to achieve high accuracy. DeepSegment segments the entire image into small subregions, and the user must add semantics separately to each one, which is a time-consuming process. CoralSeg is another recent algorithm that exploits superpixels hierarchically to expand sparse labelling, thus obtaining a coherent semantic segmentation (Alonso et al., 2019). This algorithm has been successfully applied to repeatable surveys of benthic communities (Yuval et al., 2021). CoralMe (Blanchet, 2016) adapts the Geodesic Star convexity algorithm (Gulshan et al., 2010) to coral segmentation.
This algorithm takes an internal and an external sketched curve as input and returns an accurate outline of the colony's boundary. The two initial curves must already be close to the contours to be effective, making the process accurate but not fast. Biigle (Langenkämper et al., 2017) is a web-based image and video annotation software that allows collaboration between users. It integrates an instance segmentation CNN, Mask R-CNN (He et al., 2017), and, like TagLab, the fine-tuning of this network follows a human-in-the-loop approach, as detailed in Zurowietz et al. (2018). The main difference between this approach and our method is that the fine-tuning of the Mask R-CNN is achieved by accepting or discarding automatically generated proposals (a yes/no paradigm), while TagLab allows for the rapid creation of a data set for the fine-tuning of DeepLab V3+ from scratch (through assisted annotation) or by editing the obtained predictions and reusing them for training. The complete workflow of TagLab is described in the next section.

3 | TAGLAB: A HUMAN-CENTRIC AI APPROACH
Scientific applications usually involve specific image data containing uncommon objects and complex recognition tasks, which require deep field knowledge and high cognitive effort. Uncommon objects are usually underrepresented in machine learning benchmark data sets (which contain mostly everyday objects), thus limiting the potential of current CNN recognition models. In addition, automating complex recognition tasks with a supervised approach demands a massive amount of highly targeted training data.
Automatic labelling techniques can then fail to reach the accuracy levels achieved by experts (over 90%), and human-centric AI technologies that empower (rather than replace) human abilities are usually more successful than fully automated solutions. TagLab follows this principle by proposing the working pipeline illustrated in Figure 2. First, TagLab speeds up manual annotation through a combination of AI-assisted tracing algorithms and specialized tools, thus creating suitable training data sets (Step 1). Next, the user is guided through the fully automatic optimization of a custom semantic segmentation model (Step 2). The process starts with the preparation of the custom data set; then the user sets the learning hyperparameters using the train-your-network (TYN) feature and launches the model optimization. Once the optimization ends, the user evaluates the learning metrics (such as the confusion matrix and the mIoU), visualizes predictions on the test tiles, and decides whether to save the model. This model can then be used to infer predictions on new unlabelled orthoimages. After the automatic classification, the human expert can re-enter the annotation loop (Step 3) and correct the prediction errors with the editing tools, as in Step 1. Finally, TagLab offers several options for analysing the annotated images (Step 4). In addition, TagLab supports the import of color-coded images, allowing for the refinement of annotations/predictions inferred outside of TagLab.

FIGURE 2 TagLab's annotation pipeline consists of three steps: (1) the assisted annotation; (2) the learning pipeline, which guides users to optimize a custom semantic segmentation model; (3) the AI-assisted manual editing, where humans re-enter the annotation loop by correcting the automatic results using specialized tools. Additionally, TagLab integrates data analysis functionalities (4) accessible from different stages of the annotation process.
TagLab has been implemented in Python using the PyTorch framework.
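As a rough illustration of what a TYN-style optimization involves (Step 2), the following sketch fine-tunes a semantic segmentation network with PyTorch. TagLab uses DeepLab V3+; torchvision only ships DeepLab V3, which stands in for it here, and `loader`, the hyperparameters, and the ignore index are illustrative assumptions, not TagLab's actual values:

```python
import torch
from torch import nn
from torchvision.models.segmentation import deeplabv3_resnet50

def fine_tune(loader, num_classes, epochs=10, lr=1e-4, device="cuda"):
    """Sketch of a train-your-network-style fine-tuning loop.

    loader yields (image, label) batches: image (B, 3, H, W) float tensors
    and label (B, H, W) integer maps, with 255 marking unlabelled pixels.
    """
    model = deeplabv3_resnet50(weights_backbone="IMAGENET1K_V1",
                               num_classes=num_classes).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=255)
    model.train()
    for _ in range(epochs):
        for img, lbl in loader:
            opt.zero_grad()
            logits = model(img.to(device))["out"]  # (B, C, H, W)
            loss = loss_fn(logits, lbl.to(device).long())
            loss.backward()
            opt.step()
    return model
```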

3.1 | AI-assisted/manual labelling
TagLab handles the pixel-wise assisted/manual labelling of large orthoimages, possibly containing thousands of labelled regions. All of the interactive tools work at the full orthoimage resolution, and segmented regions are approximated and stored as polygons with subpixel accuracy.
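As an illustration of the mask-to-polygon conversion, the sketch below extracts region boundaries with OpenCV and simplifies them with Douglas-Peucker; this is not TagLab's actual routine, and the tolerance value is an assumption:

```python
import cv2
import numpy as np

def mask_to_polygons(mask, tolerance=0.8):
    """Convert a binary mask into simplified boundary polygons.

    mask      : (H, W) bool or 0/1 array
    tolerance : maximum deviation (in pixels, may be below 1) between
                the simplified polygon and the raster boundary
    """
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    polygons = []
    for contour in contours:
        approx = cv2.approxPolyDP(contour, tolerance, True)
        polygons.append(approx.reshape(-1, 2).astype(np.float64))
    return polygons
```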

3.1.1 | AI boundaries tracing
AI-based interactive annotation tools have two major advantages over standard image-processing algorithms for interactive segmentation. First, CNNs are content-aware: knowing what an object is allows them, in coral outlining for example, to distinguish between the internal and external regions of a colony. Second, no additional parameters need to be specified (as is required in DeepSegment).
TagLab integrates two interactive segmentation CNNs, both fine-tuned to work on coral shapes: the 4-clicks tool and the positive/negative clicks tool. Below we detail only the 4-clicks tool, as the second interactive CNN was introduced after the user study (see Section 6.4).
The 4-clicks tool implements a custom version of DEXTR (Maninis et al., 2018), which exploits the extreme points paradigm described in Section 2. This CNN was originally trained on two data sets, PASCAL VOC 2012 and the Semantic Boundaries Data set, specifically for semantic contour prediction. These data sets mainly contain everyday objects, so the original CNN tends to trace regular profiles. The version implemented in TagLab has been optimized to predict complex, jagged natural shapes after learning from a data set of 15,000 manually segmented coral instances. The Deep Extreme Cut network takes as input 4-channel data: the RGB image of the object and a heat map created by centring four Gaussians on the extreme points indicated by the user. To produce the training heat maps, we extracted the extreme points from each segmentation and then simulated the uncertainty induced by human annotation (it is hard for an annotator, however accurate, to pick the extreme points with exact pixel precision) by adding a random displacement within a range of 10 pixels around each extreme point. All of the network parameters were unfrozen during the fine-tuning. To avoid any forgetting effect, we set a learning rate ten times lower than that of the first training. The augmentation included both colour and geometry transformations. The optimized model achieved an accuracy of 0.967 and a mIoU of 0.853. In TagLab, the 4-clicks tool activates a cross cursor that helps the user place the coral extreme points. The CNN outputs a pixel-level mask, which TagLab converts into a dense polygonal line approximating the colony's boundaries with subpixel accuracy.
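The heat-map construction described above can be sketched as follows. It mirrors the description (four Gaussians with a random displacement of up to 10 pixels, as used when generating training inputs), but it is not TagLab's exact code and the Gaussian width is an assumed parameter:

```python
import numpy as np

def extreme_points_heatmap(shape, points, sigma=10.0, jitter=10, rng=None):
    """Build the fourth input channel for a DEXTR-style network.

    shape  : (H, W) of the image crop
    points : four (row, col) extreme points (top, bottom, left, right)
    jitter : maximum random displacement in pixels, simulating the
             imprecision of human clicks during training-data generation
    """
    rng = rng or np.random.default_rng()
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        r = np.clip(r + rng.integers(-jitter, jitter + 1), 0, h - 1)
        c = np.clip(c + rng.integers(-jitter, jitter + 1), 0, w - 1)
        g = np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g.astype(np.float32))
    return heat  # concatenated with the RGB crop -> 4-channel input
```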

3.1.2 | Advanced editing tools
The Edit Border and the Refinement tools are advanced editing options in TagLab. With Edit Border, the user sketches a curve near a region's boundary, and the curve snaps to the existing mask instantaneously (see Figure 3). This solution has proven to be more efficient for pixel-wise editing than click-based interactive refinement solutions, particularly when using a graphics tablet, given the high complexity of coral morphologies.
The Refinement tool automatically improves the segmentation accuracy under the constraint that the refined segments must remain close to the originals (Figure 3). It implements a custom version of the graph-cut segmentation algorithm (Boykov & Jolly, 2001). In graph-cut based approaches, separation curves are determined by a boundary term (usually related to the RGB image gradient) and by a regional term. Finally, the Divide Labels option avoids counting pixels belonging to overlapping regions twice, which can invalidate spatial analyses.
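The following sketch illustrates the refinement idea: a regional term ties the result to the original mask (enforcing the closeness constraint), while a boundary term makes cuts cheap across strong image gradients so the new boundary snaps to edges. It uses the PyMaxflow library as a stand-in solver with assumed band/weight parameters; it is not TagLab's implementation:

```python
import numpy as np
import maxflow  # PyMaxflow, used here as a stand-in graph-cut solver
from scipy.ndimage import distance_transform_edt

def refine_mask(gray, mask, band=10.0, lam=2.0):
    """Snap a rough binary mask to nearby image edges with a graph cut.

    gray : (H, W) float grayscale image
    mask : (H, W) bool initial segmentation
    band : distance (pixels) over which the regional preference saturates
    """
    # Regional term: signed distance to the input mask, clipped to [-1, 1].
    sd = distance_transform_edt(mask) - distance_transform_edt(~mask)
    fg = np.clip(sd / band, -1.0, 1.0)  # >0 prefers foreground
    gy, gx = np.gradient(gray.astype(float))
    grad = np.hypot(gx, gy)
    g = maxflow.Graph[float]()
    ids = g.add_grid_nodes(gray.shape)
    # Boundary term: separating pixels across weak gradients is expensive.
    g.add_grid_edges(ids, weights=lam * np.exp(-grad / (grad.mean() + 1e-6)),
                     symmetric=True)
    g.add_grid_tedges(ids, np.maximum(fg, 0), np.maximum(-fg, 0))
    g.maxflow()
    return ~g.get_grid_segments(ids)  # True = refined foreground
```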

3.2 | Learning pipeline
The high specificity of scientific data requires the creation of ad hoc classifiers tuned to custom data. TagLab, therefore, guides users through the optimization of custom recognition models via the TYN feature described above.

FIGURE 3 From left to right: a Pocillopora colony, the associated labelled polygon, the edit curve, the edited polygon, and the automatically refined polygon. The editing curves snap to the mask, allowing pixel-level editing operations. The automatic refinement uses a variant of the graph-cut algorithm to improve polygon adherence to the coral boundaries.

The Multitemporal comparison tool automatically tracks corresponding regions across orthoimages of the same plot over time, including colonies that, for example, died, split, and fuse. To our knowledge, this simple but effective visualization feature is not available in any other marine image analysis tool. The users can interactively correct any mismatches. As each orthoimage may contain thousands of colonies, this tool greatly simplifies the analysis of the temporal evolution of the benthos. Although the user study in this paper does not include this analysis tool, the functionality has already been adopted in a recent publication.

5 | USER STUDY
This study aimed to assess the performance of the TagLab assisted annotation pipeline (Step 1 in Figure 2) and of the automatic labelling plus editing pipeline (Steps 2 and 3 in Figure 2) by evaluating both the annotation time and the label accuracy. Six ecologists from the Scripps Institution of Oceanography were involved as annotators, completing the study in February 2020. All the materials of this user study (orthoimages, ground truth labels, label maps) are collected in the Supporting Information Material.

5.1 | Materials
The 10 orthoimages used as the training data set for the model optimization and the four orthoimages labelled during the user study were obtained from the 100 Island Challenge project (http://100islandchallenge.org), headed by the Scripps Institution of Oceanography, UC San Diego. The protocol for orthoimage creation is detailed in Kodera et al. (2020). To summarise, plots were imaged using a Nikon D7000 camera, capturing highly overlapping images per plot to create a single contiguous 3D model of each plot with the structure-from-motion software Agisoft Metashape (Agisoft, 2019). The dense cloud was then imported into the custom visualization platform Viscore (Petrovic et al., 2014) to create orthoprojections. Finally, scale bars and ground control points were deployed in the field to provide scaling and orientation of the 3D model relative to the ocean surface, which is required for subsequent orthorectification. The annotated training data set and the ground truth labelled maps for the user study were created following the procedure described in the next section.

5.2 | Methods
Ground-truth annotations were generated through mutual agreement by the two ecologists who lead the processing and analysis of the image-based data products at Scripps.

FIGURE 9 The MAL orthoimage (left), the associated ground truth label map (right), and the colour code used. Label colours are black for the Background, light blue for Porites, green for plating Montipora, olive green for encrusting Montipora, and pink for Pocillopora. Note that, in this orthoimage, no Porites is present.

The remaining ecologists formed the two subgroups of "annotation beginners" (indicated with U1 and U2 in the following) and "annotation experts" (indicated with U3 and U4 in the following). Beginners had ample experience of coral taxonomy and ecology but minimal experience with the Photoshop-based labelling workflow, while the experts were already comfortable with manual annotation using Photoshop. Both experts and beginners had no previous experience with TagLab. Before embarking on the user study, both groups received the same written instructions about using TagLab and practiced alone for about one hour. Each user performed each task on the four orthoimages in a randomly assigned order, to prevent systematic bias and to avoid, for example, the same image always occurring in the same task. We logged all the users' operations to evaluate the interactions with each specific tool and to estimate the annotation time. The annotation tasks assigned to the annotators are:
Task 1: Label the orthoimage following the Scripps' Photoshop pipeline and report the annotation times.
Task 2: Label the orthoimage manually in TagLab, without assisted tools.
Task 3: Label the orthoimage by using the assisted agnostic segmentation tool. Editing and refinement options are allowed.
The goal of Task 3 is to assess the improvement offered by the 4-clicks segmentation tool, considering both the reduction in labelling time (comparing the total time required to complete Task 3 relative to Tasks 1 and 2) and the labelling accuracy (of Task 3 relative to the ground truth label maps created by the head ecologists).
Task 4: Run the fully automatic classification and correct any outliers. This task evaluates whether editing the automatic prediction is more convenient than the assisted labelling; again, we evaluate both time and accuracy. We assessed the labelling quality of each task by calculating the accuracy and the mIoU of each label map compared to the ground truth. Additionally, we evaluated the per-class user agreement through Cohen's kappa coefficient (Schoening et al., 2016). At the end of Section 6.1 we also apply a voting scheme, with the two purposes of visualising the per-pixel agreement among different annotators and assessing the reliability of the user study, as the votes were derived from the labels produced in different tasks.
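All three quality measures can be computed from a single confusion matrix; a minimal sketch follows (the function name and layout are ours, not from the paper):

```python
import numpy as np

def seg_metrics(pred, gt, num_classes):
    """Overall accuracy, mIoU, and Cohen's kappa from two label maps."""
    pred, gt = pred.ravel(), gt.ravel()
    # cm[i, j] = number of pixels with ground truth i predicted as j.
    cm = np.bincount(num_classes * gt + pred,
                     minlength=num_classes ** 2).reshape(num_classes, -1)
    acc = np.diag(cm).sum() / cm.sum()
    iou = np.diag(cm) / (cm.sum(0) + cm.sum(1) - np.diag(cm))
    # Kappa corrects the observed agreement for chance agreement.
    pe = (cm.sum(0) * cm.sum(1)).sum() / cm.sum() ** 2
    kappa = (acc - pe) / (1 - pe)
    return acc, np.nanmean(iou), kappa
```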
Finally, we calculate TagLab's efficiency gain by estimating the per-pixel contour tracing speedup relative to Task 1. As the orthoimages can contain from a few to thousands of corals of differing shape complexity, the per-pixel tracing speed is a reliable and robust measure.

6 | RESULTS AND DISCUSSION
As a general consideration, the high quality and consistency of the results confirmed the reliability of the user study and the absence of systematic errors. In addition, the overall accuracy of each task relative to the ground truth was high and consistent among the users. Figure 10 shows the label maps of the MAL orthoimage produced by the four annotators in the different tasks.
6.1 | Accuracy and user agreement

Figure 11 gives the accuracy values calculated for each task, map, user, and class. The analysis of the vote maps reveals the classification reliability and the agreement among users. A vote map counts, for each pixel, how many of the four annotators assigned the same label, taking the maximum of these votes. Table 3 illustrates that, for each orthoimage, between 94% and 98% of pixels were classified as belonging to the same class by at least three users. Agreement labels, that is, the labelling obtained by voting, are built by considering the label that receives the maximum number of votes for each pixel. Figure 12 displays the vote map and the corresponding agreement labels for the MAL orthoimage (see the Supporting Information Material for the other orthoimages).
When comparing the per-pixel voting with the ground truth, we find excellent accuracy, greater than that obtained by single users (see Figure 11 and Table 2). The per-pixel voting maps are derived from different tasks, demonstrating that the annotators produced highly accurate labelling (close to the ground truth) independently of the tasks and tools used. This also demonstrates that the design of the user study does not suffer from any bias. The other voting maps are included in the Supporting Information Material. The voting maps of VM01 and VM03 confirm that the annotators disagree on entire regions of the encrusting Montipora.
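A vote map of this kind can be derived directly from the annotators' label maps; the sketch below (names are illustrative) returns both the per-pixel vote counts and the agreement labels:

```python
import numpy as np

def vote_map(label_maps):
    """Per-pixel agreement among annotators.

    label_maps : list of (H, W) integer label maps, one per annotator.
    Returns (votes, agreement): the number of annotators choosing the
    modal label at each pixel, and the modal label itself.
    """
    stack = np.stack(label_maps)  # (K, H, W)
    labels = np.unique(stack)
    counts = np.stack([(stack == lab).sum(axis=0) for lab in labels])
    votes = counts.max(axis=0)  # e.g., 4 = full agreement
    agreement = labels[counts.argmax(axis=0)]
    return votes, agreement
```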
6.2 | Work time analysis and TagLab efficiency

Figure 13 shows the time registered by each user in performing each task.
The overall times demonstrate that there is no advantage to using TagLab without the automatic or assisted tools when compared to the Photoshop workflow. However, this perceived lack of advantage has several caveats, including the users' unfamiliarity with TagLab.
The log file analysis reveals some periods of inactivity (see, e.g., user U4 in Figure 15). Even though users documented their breaks during the annotation, there are some discrepancies between the reported working times and the log files. However, since such differences are not great (about 10% on average, with a maximum of 20% for U2), and only the declared times are available for the Photoshop labelling (Task 1), we used the self-declared working times in our evaluations. As Figure 10 demonstrates, some users were more likely to separate colonies, while others were more likely to include several distinct colonies in the same polygon. This decision is typically based on whether a given patch of coral is represented by contiguous live tissue (Bak & Meesters, 1998), a difficult cognitive task even for those with considerable expertise. This effect is clear in the results, as some users produced more accurate label maps but required more time to do so. Figure 14 shows the contour tracing per-pixel speed. The automatic segmentation accuracy of VM03 was the lowest of the four maps (see Figure 11), particularly for the class Pocillopora. This is probably because VM03 contains Acropora, a coral class morphologically similar to Pocillopora that does not appear in the training data.

FIGURE 12 Vote map and agreement label for the MAL orthoimage. White corresponds to four agreed votes, light grey to three votes, red to two votes, and dark red to one. Almost all the pixels are white or light grey, highlighting the high agreement between users. The pixel labelling produced according to the maximum votes is very close to the ground truth (see also Table 3).

FIGURE 13 The table shows the self-reported times each user spent on each task for each map. The final row shows the total time spent by each user on each study. These times are used in the speed calculations, although the log file analysis reveals several periods of inactivity (lasting up to tens of minutes).

FIGURE 14 Stated times do not provide a measure of how fast each user produces segmentations, as each drew a different total length of outlines. The per-pixel speed provides a better understanding of the actual improvement introduced by TagLab. The (average) speed gain is evident in Task 3 (42.6%) and less so in Task 4 (12.1%). Annotation beginners, U1 and U2, benefitted from assisted annotation far more than the experts. This larger performance improvement for the beginners likely means that their limited experience with the Photoshop pipeline made them more adaptable in using TagLab.

6.3 | Tools' analysis and limitations
An in-depth analysis of the log files reveals that the cognitive workload for assigning the four points is moderate, even when coral shapes are complex. The mean time for this assignment ranges from 4 to 8 s, depending on the user. This is in line with the study of Papadopoulos et al. (2017), who reported an average time of 7 s to indicate an object's extremes.
The processing time varies with the corals' size but is around 3-4 s on the PCs used in this study (each equipped with a GeForce GTX 1080Ti). Our implementation of the 4-clicks tool does not support the editing of the polygons' boundaries, and, according to the logged times (Figure 15), manual editing greatly impacts the overall efficiency. Figure 15 gives a synopsis of the tools' usage.

6.4 | TagLab improvements

We evaluated the performance of the new release by re-proposing the assisted segmentation task to only two users, the ones who had segmented the two medium-difficulty orthomosaics, MAL and VM01, in Task 3. We asked them to repeat Task 3 using TagLab with the improved manual tracing tool (Task A) and TagLab with the improved manual tracing plus the new interactive CNN (Task B).
More than a year after the first experiment, we can assume there is no repetition bias with respect to Task 3; moreover, these annotators had not done any further labelling work with TagLab in the meantime. The users demonstrated slightly better performance in Task A, achieving an average annotation speed of 8.25 px/s compared to 6.93 px/s in the previous attempt (see Figure 14). They then achieved an average speed of 13.15 px/s in Task B.

7 | CONCLUSIONS AND FUTURE DIRECTIONS
The semantic segmentation of image-based landscapes composed of complex natural shapes demands novel workflows to ensure pixel-level accuracy. Fully automatic models that yield pixel-wise predictions and generalize perfectly on complex data and tasks are beyond the reach of current technologies. Therefore, experts must retain control over the annotation pipeline, which motivates the design of human-centric AI solutions that support them while providing a significant speed increase. TagLab fulfills this demand by offering several integrated functionalities that accelerate the production of robust annotated data.
We tested TagLab's potential on the spatial analysis of coral colonies on reefs, which is a challenging real-world scenario. We found that TagLab successfully sped up the coral colony tracing task, preserving a level of accuracy comparable to that of humans.
The results are surprisingly good, considering that the performance of TagLab's tools suffered from the users' lack of previous experience. TagLab's interactive segmentation feature, the 4-clicks tool, dramatically reduces the human effort in annotating complex objects from scratch (Step 1 of Figure 2) without affecting the segmentation accuracy. Our user study indicates that TagLab-assisted segmentation provides an annotation speed gain of about 42% on average (90% for nonexpert annotators). TagLab's interactive annotation efficiency increases further by combining the new interactive labelling solution, the positive/negative clicks tool, with the 4-clicks tool. According to the preliminary results in Section 6.4, assisted annotation with the current TagLab version leads to a total speed gain of +93%, halving the annotation time with respect to the manual Photoshop-based annotation pipeline.
The Train-Your-Network feature gives scientists access to automatic ad hoc model optimization, which represents a powerful resource for accelerating data extraction from imagery (Step 2 of Figure 2). To enhance this feature, in the next releases we will investigate further domain adaptation strategies for improving model generalization. When the automatic classification generalizes properly (as in the case of MAL), editing the automatic predictions (Step 3 of Figure 2) almost halves the assisted annotation time and reduces the manual time by two-thirds (for the same level of accuracy). However, when the automatic model generalizes poorly, the speed-up is limited by the large number of manual editing operations required, even though the interactive boundary-adjustment tool works well, particularly with a graphics tablet. The introduction of an instance segmentation network for coral taxa could further reduce editing times, thus improving Step 3 of the proposed workflow.
TagLab also performs data analysis; some of its functions, such as the Multitemporal Comparison tool, have proved extremely useful for extracting spatial information from orthoimages.
However, the current version of TagLab has limitations on the size of the orthoimages it can handle, which cannot exceed 32,000 × 32,000 pixels. Therefore, we are evaluating a multiresolution approach for managing larger images. TagLab has already been tested successfully in other application contexts, such as Architectural Heritage (Pavoni, Giuliani, et al., 2020). Here, the fully automatic plus editing annotation strategy (Steps 2 and 3) was found to be extremely efficient.