Automated segmentation of cell organelles in volume electron microscopy using deep learning

Recent advances in computing power have triggered the use of artificial intelligence for image analysis in the life sciences. To train these algorithms, a sufficiently large set of certified labeled data is required. The trained neural network is then capable of producing accurate instance segmentation results that then need to be re‐assembled into the original dataset: the entire process requires substantial expertise and time to achieve quantifiable results. To speed up the process, from cell organelle detection to quantification across electron microscopy modalities, we propose a deep‐learning based approach for fast automatic outline segmentation (FAMOUS) that combines organelle detection with morphological image processing and 3D meshing to automatically segment, visualize and quantify cell organelles within volume electron microscopy datasets. From start to finish, FAMOUS provides full segmentation results within a week on previously unseen datasets. FAMOUS was showcased on a HeLa cell dataset acquired using a focused ion beam scanning electron microscope, and on yeast cells acquired by transmission electron tomography.


Research Highlights
• Introducing a rapid, multimodal machine-learning workflow for the automatic segmentation of 3D cell organelles.
• Successfully applied to a variety of volume electron microscopy datasets and cell lines.
• Outperforming manual segmentation methods in time and accuracy.
• Enabling high-throughput quantitative cell biology.
Keywords: automated segmentation, cell biology, image analysis, neural network, volume electron microscopy

| INTRODUCTION

Imaging in life sciences is currently experiencing a boost, and imaging data are growing exponentially. Biological processes, ultrastructure, and molecules can now be visualized at unprecedented resolution in time, depth and scale (Walter, Mannheim, & Caruana, 2021). Large volumetric reconstructions of entire cells can be routinely achieved at nanometer resolution using volume electron microscopy (vEM). The quantitative analysis of such large amounts of data is the new bottleneck in biological projects. Within a decade, what used to be considered extremely large datasets (Höög et al., 2007), analyzed over the course of a PhD, has become routine to process (12 Gb of RAM is common on laptops). One important goal in vEM is to quantitatively annotate and segment the volume stacks to quantify organelle distributions and shapes and thus understand the structure-function relationship. Many diseases are associated with abnormal organelle morphologies and distributions within cells, including a growing number of neurodegenerative diseases, such as Alzheimer's disease (Zhu et al., 2013) or Lewy body dementia (Gagyi et al., 2012). EM visualizes ultrastructural details and rich contextual information based on protein/lipid or stain-density gradients. Not only are the structures of interest visible, but so is all membrane-delineated ultrastructural cell content.
The signal-to-noise ratio is low and, to date, organelles have mainly been deciphered from one another by the human eye, based on their membrane delineation, at great expense of time. As conventional segmentation schemes are often based on thresholds or manipulations of the image histogram, assuming that strong gradients match object boundaries, unsupervised binarization algorithms, such as minimum error thresholding, maximum entropy thresholding or Otsu's single-level method (Otsu, 1979), fail to reliably identify and segment organelles. In practice, automatic segmentations generated from thresholds or histogram manipulations usually require extensive manual post-editing to achieve the desired accuracy. Therefore, the segmentation of cell organelles is currently mainly performed manually using segmentation tools included in commercial software, such as AMIRA (Stalling et al., 2005) or Imaris (2022), or freeware tools, such as ImageJ/Fiji (Schneider et al., 2012; Schindelin et al., 2012), IMOD (Kremer et al., 1996), or Ilastik (Berg et al., 2019). For an entire HeLa cell imaged at 5 nm isovoxel resolution using a focused ion beam scanning electron microscope (FIBSEM), manual segmentation of the important organelles (such as mitochondria, nucleus, ER, and endosomes) takes several months if carried out by a single person, and requires comparative segmentation to cross-validate the results.
Progress in computational methods for the automatic segmentation of organelles in vEM has led to increasingly accurate results (Seyedhosseini et al., 2013), for example by training classifiers to detect supervoxels that most likely belong to the boundary of the segmentation target (Lucchi et al., 2012). While packages that use learning-based approaches are already available, such as Ilastik or CellProfiler, they usually do not allow training on new datasets, limiting their application to a specific and small range of datasets, or they require substantial expertise in image analysis.
For light-microscopy datasets (with their comparatively high signal-to-noise ratio), several deep-learning solutions for segmentation and quantification, such as cell detection or morphological measurements, have already been published (Carpenter et al., 2006; Hodneland et al., 2013; Ronneberger et al., 2015). Object detection is a technique that allows the computer to find the location (x and y coordinates, width and height) of a particular shape, or organelle, in an image. Instance segmentation takes this one step further and isolates the foreground pixels of the shape or organelle. U-Net (Ronneberger et al., 2015) was pioneering work in the field of instance segmentation that was initially applied to microscopy data. The U-shaped deep learning architecture is capable of capturing and generalizing high-level descriptors of image data as the information reaches the convolutional valley of the U. By concatenating this encoded data with the finer convolutional layers from higher levels, the network can reconstruct the boundary of the shape instance. The U-Net architecture is used as the backbone of 'Etch a Cell' (2022), a crowd-sourced approach to generate large quantities of labeled data.
To the best of our knowledge, only a few open-access approaches have been suggested for the comprehensive segmentation of all organelles in large volumetric EM datasets. The trainable WEKA segmentation toolkit (Arganda-Carreras et al., 2016) can train segmentation pipelines using generic hand-tailored image features. DeepEM3D (Zeng et al., 2017) aims at improving reproducibility while providing open access to deep-learning algorithms for image segmentation using a cloud-based setup that does not require a local GPU.
Other approaches focus on single imaging modalities, such as COSEM for the automated identification of all intracellular substructures within isotropic FIBSEM datasets (Heinrich et al., 2020), or on tools for the semiautomatic 3D segmentation of specific organelles, including mitochondria, or for neuron tracing. Last but not least, Ilastik 1.3.3 contains modules for pixel classification via training with simple brush strokes. This approach is designed for users without machine-learning expertise and may prove useful in simple segmentation scenarios where optimizing the training parameters yields little benefit. The very first commercial solutions have also been launched (ariadne.ai, 2022); these rely on the large in-house expertise of the segmenting scientists to edit the final model.
In summary, despite the urgent need in the vEM and structural biology communities, no quantitative segmentation workflow is available that has been proven successful for different biological single cells across volume EM modalities. To improve the quantitative performance of automated image segmentation of large volumetric datasets, we identify the need for a generic, accessible and tractable segmentation software that is assessed against the current gold standard of manual segmentation. YOLO (Bochkovskiy et al., 2020) is a 'you only look once' framework for deep learning that accurately performs image-based object detection in real time with minimal training data. It re-frames the object detection problem so that the model not only infers the category of the object, but also its position and size in the image at the same time.
We present our machine-learning pipeline and algorithm for the automated segmentation of organelles. We showcase the workflow for two different vEM approaches, FIBSEM of a HeLa cell and serial-section electron tomography of yeast cells using transmission EM (TEM), and quantitatively compare the results with manually segmented datasets as the current gold standard. Since it does not make any prior assumptions about the morphology of the organelles to be segmented, the pipeline can be easily applied to segment diverse organelles across cell types and modalities, including soft x-ray microscopy (Walter, Pereiro, & Maria, 2021). FAMOUS, although perfectible, achieves classification and localization accuracy comparable to manually segmented datasets, within a fraction of the time.

| RESULTS
The amount of data generated in vEM for the life sciences usually ranges from gigabytes to terabytes per dataset. It is practically impossible to manually segment the information content of a vEM dataset within the reasonable time frame of a publication, let alone to create meaningful statistics across cells. To automate image segmentation, we propose a simplified pipeline that exploits innovative neural-network based image analysis to deliver a full volume segmentation of cell organelles within a week.
First, all structures of interest within a limited subset of the data, i.e., only about 1% of the entire 3D stack, need to be accurately flagged manually. This annotation is used to train the image-recognition algorithm and automatically isolate the structures of interest using YOLO (see Section 4). This instance segmentation of the organelles is then fed into the open-access 3D rendering software Blender (Blender Online Community, 2021), where a final 3D segmentation of the exact outline of the organelles is created based on meshing to remove noise and smooth the membrane segmentations.
Blender is also used for the visualization of specific subsets of organelles within the vEM image stacks. After identification and classification, the organelles are segmented by applying conventional histogram-based filters to a cropped-out region and averaging the noise out. This computationally efficient pipeline uses parallel processing (GPU) on each cropped-out region; no large computing capacity is required. The workflow is illustrated in Figure 1.

| Detection and classification performance evaluation
To evaluate the performance of our automated segmentation pipeline (denoted as Automatic [green]) against the current ground-truth standard of manual segmentation by experienced cell biologists, the dataset was segmented twice manually by two independent experts (denoted as M1 [red] and M2 [blue]). To showcase the versatility of our FAMOUS pipeline, we applied the segmentation workflow to different cell types that were acquired using different vEM modalities. In both cases, the results were compared to manual segmentation results. The manual and automatic segmentation results are depicted per organelle class in Figure 2 for a HeLa cell that was acquired using a FIBSEM, and in Figure 3 for a yeast cell that was acquired using array tomography vEM. The visual comparison of these segmented datasets already indicates that the presented segmentation pipeline FAMOUS reliably identifies the organelles and yields similar results compared to the manual segmentations, in a fraction of the time.

FIGURE 1 Instance segmentation workflow for quantitative 3D cell organelle segmentation. Using YOLO, based on a training subset of about 1% of the entire 3D volume, the organelles are identified and isolated. Using conventional image processing, membrane discretization and noise removal in Blender, the organelles of each section of the dataset are then extrapolated into a 3D volume and repositioned into the original 3D raw dataset.
For a quantitative comparison between the two manually segmented stacks, and between the manually and the automatically segmented stacks, we computed the Jaccard index and compared the segmented volumes.
To establish the intervariability between two manual segmentations of two independent experts, we also compared their results with each other.This gave us insights into the deviation between two manual segmentations and served to benchmark the automatic segmentation.
The segmented organelles in the HeLa cell dataset were early endosomes, late endosomes, mitochondria, lysosomes and the nucleus; in the yeast dataset, the nucleus, mitochondria, Golgi, vacuoles, multivesicular bodies, and lipid droplets were segmented. The difference in the numbers of detected organelles was quantified for each organelle category, and each diverging label or misdetection was identified and analyzed further. Taking both manual workflows as the ground truth, and the FAMOUS detection as the comparison, we classified all organelles into:

1. Objects correctly identified by FAMOUS are considered True Positives (TP).
2. Objects inadequately identified by FAMOUS are False Positives (FP).
3. Objects identified in the manual workflow and not identified by FAMOUS are False Negatives (FN).
4. Objects detected by FAMOUS and missed by the manual workflow are True Negatives (TN).

In a few cases, FAMOUS wrongly identified one object as multiple objects that share the same space; the TP and FP values were adjusted accordingly to avoid multiple positive identifications of the same object. To compare the performance of FAMOUS on the macroscopic level (detection efficiency, identification and classification performance), we used the following criteria (a computational sketch is given below):

1. Precision: of all predicted objects of a class, how many were correctly predicted? Precision = TP/(TP + FP).
2. Sensitivity: of all actual objects of a class, how many were detected? Recall = TP/(TP + FN).
3. The Jaccard index as the similarity between the manual and automatic segmentation.

As can be appreciated in Tables 1 and 2, the overall precision, sensitivity and Jaccard indices achieved by FAMOUS are comparable with those achieved between the two gold standards of the experienced manual segmenters. For the FIBSEM dataset, the Jaccard indices as the similarity between the two manual segmentations (i.e., the ground-truth similarity) range from 100% for the nucleus, about 70% for early endosomes, lysosomes and mitochondria, to only 32% for late endosomes. The Jaccard indices of the comparison between manually and automatically segmented organelles lie within a similar range of 100% for the nucleus, about 79% for mitochondria (mean value between the two manual segmentations), about 63% for early endosomes, and about 61% for lysosomes. While there is more agreement between the manually and automatically segmented mitochondria than between the ground truths, the similarity index is slightly lower for lysosomes and early endosomes. The only striking difference between the automatic segmentation and the ground truth can be found for the late endosomes, which only show a mean Jaccard index of 9% compared to the low, but higher, 32% for the comparison of the two ground truths. This difference is addressed in the Discussion. As can be appreciated in Table 2, for the array tomography dataset of the yeast cell, the variability between the automatic segmentation and the manual segmentations and the intervariability between the two ground truths are nearly the same, ranging in both cases from 100% for the cell, nucleus and mitochondria segmentation, to about 50% for the Golgi, and about 30% for the lipid droplets.
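To make these criteria concrete, the following minimal sketch computes the three scores from per-class TP/FP/FN counts. It is an illustration of the formulas above, not code from the FAMOUS distribution, and the example counts are purely illustrative.

```python
# Minimal sketch: detection-level scores from per-class counts.
# The set-level Jaccard index is the intersection over union of the
# manual and automatic detection sets.

def precision(tp: int, fp: int) -> float:
    """Of all objects predicted for a class, how many were correct?"""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Of all ground-truth objects of a class, how many were found?"""
    return tp / (tp + fn) if tp + fn else 0.0

def jaccard(tp: int, fp: int, fn: int) -> float:
    """Set-level Jaccard index: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn) if tp + fp + fn else 0.0

# Example with made-up counts: 79 matched, 6 spurious, 10 missed objects.
print(precision(79, 6), recall(79, 10), jaccard(79, 6, 10))
```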

| Comparison of volumes, areas, and evaluation metrics
To evaluate our segmentation approach further, we conducted a volume comparison of each individual class (Figure 4). The total volume of all objects in an individual class was calculated for both the manual and the automatic segmentation and plotted to quantify differences at the whole volume scale. We explored the intersection value, that is, how much one unique object differs in its segmented properties (surface, periphery, center of mass, etc.) from one method to the other, and how this is distributed within each class. This was achieved using a Boolean union operator, which joins two objects into one while removing their intersection. The volume of the automatic workflow was subtracted from the total volume of both the automatic and manual workflows, thus providing the difference between the two workflow volumes. The volumes were calculated in μm³ (Table 3). The volume results depend on the correct classification of objects into their classes and on the position of the misclassified objects. As expected from the previous metrics, there is a very good volume overlap between all automatically and manually segmented organelles, which is in the range of that between the two manually segmented datasets. As outlined above, only the late endosomes were not faithfully assigned. Late endosomes are volumetrically the smallest class, and only a few misclassified organelles can create a large distortion in the total volume of the entire class, thus skewing the final numbers. The total volume distribution of the dataset per size is presented in Figure 4. The volume results are similar for manual and automatic segmentations for most organelles and size intervals. FAMOUS detected significantly fewer objects only at very small volumes (up to 0.02 μm³) for late endosomes and lysosomes, which, specifically due to the small number and size of the late endosomes, led to a significant volumetric deviation, and hence low Jaccard indices, between manual and automatic segmentation.
While the manual segmentation of the entire FIBSEM dataset took each segmenting scientist about 200 h, and that of the array tomography dataset 120 h, including visualization, our automated segmentation pipeline required 12 and 8 h, respectively, of actual (guided) input time by the user, including the preparation of a training set. Within about a sixteenth of the time, FAMOUS reliably automated the full-stack segmentation, visualization and quantification of an entire cell acquired by vEM, with an accuracy similar to the current gold standard. These time savings will become even greater once a statistically relevant number of cells under different conditions is acquired, as required for cell biology with meaningful quantification. The workflow thus substantially facilitates quantification and analysis in high-resolution structural biology and can be quickly reproduced as described in Section 4.
TABLE 1 Precision, recall and Jaccard index scores for the comparisons of the manually segmented datasets by two independent cell biologists, M1 and M2, with each other and with the YOLO-segmented dataset of the HeLa cell acquired using a focused ion beam scanning electron microscope. Note: Except for the late endosomes (see Section 3), the deep-learning based FAMOUS segmentation achieves Jaccard indices across all organelles that are similar to the Jaccard indices between the two independently created manual ground truths and lie within the intervariability of the two manually segmented datasets.

| DISCUSSION
In this paper, we have presented a novel automatic segmentation tool for vEM datasets across modalities that segments cell organelles as reliably as manual segmentation. An important issue that arose early is that the manual segmentation cannot be viewed as fully and exclusively representative of the actual ground truth, mainly due to human errors. There were instances where the automatic segmentation identified organelles accurately, but the manual segmentation did not classify the organelles in the same class as the automatic segmentation or missed them entirely (Figure 6). In such cases, the automatically segmented organelle was labeled as an error. These cases biased the final accuracy numbers of the automatic segmentation and can only be corrected by visual inspection. The subjective assessment of the expert who carries out the manual segmentation plays a significant role in the final results, meaning that different experts classify the same organelle into different classes, as quantified by the Jaccard indices below 1 between the two manually segmented datasets for almost all organelles. For a better illustration of such cases, a 3D mesh intersection with the slice was computed, after which an outline of the intersection was created. This is highlighted in Figure 5, where the automatically segmented outline is shown on the left in green, while the manually segmented outline is shown on the right in red. In addition, the automatic workflow identified organelles that the manual segmentation did not (Figure 6). The reverse situation is also present, where the automatic workflow failed to accurately identify organelles that the manual segmentation did. However, in these cases, the automatic workflow did not fail in recognizing that the organelle existed, but the organelle was identified as the wrong class. This issue only arises when two classes have similar visual features. In the FIBSEM dataset, the organelles that fall into this category are the late endosomes and lysosomes, which contributed to the low Jaccard index for the late endosomes between manual and automatic segmentations. However, it should be noted that the Jaccard index for the late endosomes is already very low even for the comparison of the two manually segmented datasets, which further highlights the lack of objectivity and reproducibility of the 'ground truth' for certain organelles and vEM modalities. It is interesting to note that even experienced scientists cannot unambiguously agree upon assigning organelle structures in a cell volume, which provides another argument why automation of the process (and hence objectifying it) is of utmost importance. For the array tomography yeast dataset, FAMOUS and the manual segmentation yield similar accuracy in the detection and segmentation of the organelles (compare Tables 1 and 2), when comparing the mean Jaccard indices for both manual and automatic segmentations.

TABLE 2 Precision, recall and Jaccard index scores for the comparisons of the manually segmented datasets by two independent cell biologists, M1 and M2, with each other and with the YOLO-segmented dataset of the yeast cells acquired using array tomography. Note: The deep-learning based FAMOUS segmentation achieves Jaccard indices across all organelles that are similar to, or even better than, the Jaccard indices between the two independently created manual ground truths and lie within the intervariability of the two manually segmented datasets.

FIGURE 7 Examples of fast automatic outline segmentation (FAMOUS) segmentation errors after visual inspection. FAMOUS identified the organelle as a late endosome (shown in red), while manual inspection showed that it is a mitochondrium (in green). This illustrates sporadic wrong classifications by FAMOUS in cases where the organelle was composed of a round part connected to an intricate, bridge-like protrusion.
FIGURE 5 Examples of manual segmentation errors. (a) Fast automatic outline segmentation (FAMOUS) correctly identified the organelle as a late endosome (green), whereas the manually segmenting expert assigned it as a mitochondrium (red). (b) FAMOUS correctly identified the organelle as a lysosome (green), whereas the manually segmenting expert assigned it as a late endosome (red). (c) FAMOUS correctly identified the organelle as a late endosome (green), whereas the manually segmenting expert assigned it as a lysosome (red).
FAMOUS mainly struggled with complex objects that were connected by small "bridges" between the larger, more rounded parts of the object (Figure 7). In these cases, the automatic segmentation sometimes identified every major part of the complex object as a separate entity and did not recognize them as a singular object, or mislabeled the organelle. The "bridge" parts of the objects proved problematic, as they are usually very thin and blend into the background pixels. However, the workflow was precise enough to detect each individual part of the complex object. The issue is easily remedied by visual inspection and by manually joining all separate parts of the complex object into one whole by adapting the filtering of the morphological image operations (see Section 4). However, automating this particular process has proven to be a difficult task, and as such it remains unsolved in this version of the workflow. This issue was only relevant in the yeast dataset, and it also explains the large number of small objects identified by FAMOUS but not by the manual workflow. Specifically, Golgi, multivesicular bodies and lipid droplets showed the mentioned issues, as these structures are complex and have intricate interconnections that FAMOUS did not always detect.
While the pipeline was showcased for the 3D segmentation of six different organelles in two different cell types using two different vEM modalities, we are confident that the automatic segmentation can be generalized to large-scale EM segmentations, including specific cell types in tissues or organoids. Such huge volumes of several mm³ can now be achieved at higher throughput using state-of-the-art vEM modalities, such as multibeam FIBSEM. In fact, the segmentation of various organelles in EM represents one of the more challenging segmentation problems due to the highly crowded and membrane-redundant environment at a low signal-to-noise ratio (which is why conventional segmentation schemes fail at this task).
We also assume that this pipeline can easily be adapted to fluorescence microscopy datasets. Apart from its versatility in tackling different segmentation challenges, the pipeline is easy and fast to implement: the YOLO and Blender frameworks are open-access freeware, and the corresponding segmentation scripts (see Section 4) can be run without prior knowledge of machine learning after a dedicated training session, which only uses a very small subset of the structures to segment.
The problem with any manual segmentation is the human factor. vEM image data usually consist of hundreds to thousands of images that need to be analyzed. Such work is usually done by students, who may get only a short briefing and whose judgment must be relied upon. Differences in performance are to be expected. Often, the biggest problem is not even the evaluation of the structures to be identified, but the completeness of the evaluation: many organelles are simply overlooked. Certainly, the efficiency of manual segmentation also depends on the equipment; a person with a high-quality graphics tablet will get better results than someone with a small screen and a computer mouse. We therefore consider it of utmost importance to 'objectify' the process of organelle segmentation for vEM datasets, and think that the FAMOUS pipeline is an important step towards a high-throughput, quantitative and standardized analysis of vEM datasets.

| FIBSEM sample preparation and data acquisition
HeLa cells were grown on a CryoCapsule (Heiligenstein et al., 2021) in DMEM culture medium containing 10% FBS for 3 days, then vitrified by high-pressure freezing using an HPM Live μ instrument. The samples were then freeze-substituted in dry acetone containing 1% H2O, 0.05% uranyl acetate, and 0.1% glutaraldehyde for 48 h at −90°C, warmed up to −45°C at a rate of +5°C/h, kept at −45°C for 5 h, rinsed in dry acetone (3 × 10 min) and impregnated in R221 resin (CryoCapCell, France) for 2 h each at 25%, 50%, and 75% in acetone. The temperature was raised to −20°C for the last impregnation in 100% R221 (overnight infiltration followed by a second step in 100% for 2 h prior to UV polymerization). UV polymerization was conducted for 48 h at −20°C, then the temperature was progressively raised to +20°C at a rate of 5°C/h, and UV irradiation was continued for 48 h at +20°C. The samples were then evaluated for ultrastructure preservation by transmission electron microscopy prior to analysis by FIBSEM.
FIBSEM data were collected using a Crossbeam 540 FIBSEM with Atlas 5 for 3-dimensional tomography acquisition (Zeiss, Cambridge). Prior to loading the sample into the SEM, the sample was sputter-coated with a 10 nm layer of platinum. The cell of interest was located by briefly imaging through the platinum coating at an accelerating voltage of 20 kV. On completion of the preparation for milling and tracking, images were acquired at 5 nm isotropic resolution throughout the region of interest, using a 10 μs dwell time. During acquisition, the SEM was operated at an accelerating voltage of 1.5 kV with a 1 nA current. The EsB detector was used with a grid voltage of 1200 V. Ion beam milling was performed at an accelerating voltage of 30 kV and a current of 700 pA. Prior to segmentation, the dataset was cropped, inverted, and registered (using the plugin 'Linear Stack Alignment with SIFT'; Schindelin et al., 2012). The volume of the final dataset was approximately 346.16 μm³ (1778 images, 10.22 μm × 3.81 μm × 8.89 μm).

| Yeast cell sample preparation and data acquisition
Saccharomyces cerevisiae cells were grown in YPD media (pH 6.5) with 2% glucose to an optical density (OD600) of 0.5. The medium was then removed by filtering the cells using a 0.22 μm filter (Ding et al., 1993), and the cells were high-pressure frozen in a Wohlwend Compact 3. The samples then underwent freeze substitution in a Leica AFS2 in 2% uranyl acetate in anhydrous acetone (made from a stock solution of 20% uranyl acetate in methanol) for 1 h at −90°C. This was followed by three washes in acetone (2 × 1 h and 1 × overnight) and stepwise embedding into Lowicryl HM20 resin at −50°C. The embedding steps were 20%, 40%, 50%, 80% Lowicryl HM20 for 2 h each, followed by two more pure resin steps, first overnight, then for 2 h, before final embedding in pure resin. Lastly, they were polymerized using UV light for 5 days whilst allowing the temperature to reach 20°C. The resulting sample blocks were sectioned using a Reichert Ultracut S into serial 350 nm sections and deposited onto formvar-coated copper slot grids, stained with 2% aqueous uranyl acetate (5 min) and Reynolds lead citrate (1 min). Gold fiducials (15 nm) were added onto both surfaces before imaging. Tomograms were acquired using an FEI TF30 at 300 kV (University of Colorado Boulder) on a Gatan OneView, at a pixel size of 0.86 nm. Dual-axis tomograms were acquired over a ±60° range at 1.5° increments. The resulting pixel size after reconstruction was 1.72 nm.

| Manual segmentation
To evaluate our automated segmentation approach, the same dataset was manually segmented using Amira 6.0, Thermo Fisher software (Stalling et al., 2005), using the brush tool and interpolation function of the segmentation editor. Organelles were identified based on their size, shape, and structure, mainly on the xy-sections, all along the Z axis. The orthoslice view was used to correct the Z-positioning of the labeling when necessary. Each segmented organelle was assigned to a morphological group. When the correct assignment was unclear, the orthoslice view was used to help the segmenting scientist. The final volume classes were exported as *.stl files for quantitative comparison with the automatically segmented organelles and further analysis. The entire manual segmentation and visual examination for the FIBSEM dataset alone took about 200 h per segmenting scientist.

| FAMOUS segmentation pipeline
On a volumetric set of 1800 successive layers of FIBSEM input images, we used the YOLO Mark user interface to define the object classes. We randomly took 20 images from the dataset and manually and tightly boxed out every compartment in each image according to the class/morphological group we expected the compartment to belong to. This preliminary work is the only manual input required from the end user and is achieved in about 3 h for 10 classes.
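For readers unfamiliar with the annotation format, the sketch below converts one tight pixel-space box into the Darknet/YOLO label convention (one line per box: class index, then box center and size, all normalized to the image dimensions). The helper name and example values are ours, not part of YOLO Mark.

```python
# Hypothetical helper: convert a tight pixel box into a Darknet/YOLO label line.
# YOLO expects "class x_center y_center width height", normalized to [0, 1].

def to_yolo_line(cls: int, x0: int, y0: int, x1: int, y1: int,
                 img_w: int, img_h: int) -> str:
    xc = (x0 + x1) / 2 / img_w      # normalized box center, x
    yc = (y0 + y1) / 2 / img_h      # normalized box center, y
    w = (x1 - x0) / img_w           # normalized box width
    h = (y1 - y0) / img_h           # normalized box height
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# One ".txt" file per training image holds one such line per boxed compartment.
print(to_yolo_line(3, 120, 200, 180, 260, img_w=1024, img_h=1024))
```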
We used this classification to train YOLOv4 to identify each individual compartment and assign it to a morphological group. This is the 'instance segmentation': every organelle is classified and boxed out for each single plane of the stack. Given that we know the layer number for any given 2D organelle instance and the distance in nanometers between layers, we can infer the exact 3D location of each organelle.
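As a simple illustration of this step, a 2D detection maps to an absolute 3D position as sketched below; the constants reflect the 5 nm isovoxel FIBSEM data but are placeholders rather than the pipeline's own code.

```python
# Sketch: map a 2D detection on layer z to nanometer coordinates.
# Assumes isotropic 5 nm voxels, as in the FIBSEM dataset used here.
PIXEL_SIZE_NM = 5.0   # in-plane sampling (x, y)
SLICE_STEP_NM = 5.0   # spacing between consecutive layers (z)

def detection_to_3d(layer: int, x_px: float, y_px: float):
    """Return the (x, y, z) position of a detection in nanometers."""
    return (x_px * PIXEL_SIZE_NM, y_px * PIXEL_SIZE_NM, layer * SLICE_STEP_NM)
```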
In addition, the workflow is fine-tuned to each morphological group to generate a cloud of points outlining the individual compartment based on a conventional image-processing pipeline. On each layer, each identified structure seeks out the structures located directly above and below itself and looks for correspondences in classes. A larger 3D cloud of points outlining the organelle is then repositioned into the original volume, and post-processing is used to smooth the 3D shapes, remove noise, patch holes and re-assemble the cell compartments. This hybrid method uses the YOLO network to classify and box out each compartment, then applies a lightweight conventional image-processing pipeline to accurately segment each compartment class. The expertise of the biologist is used to identify structures in a reasonable time frame, while the image analyst focuses on YOLO training and class segmentation, followed by 3D rendering ready for analysis. The processing power required is contained (one GPU on a workstation is sufficient), and accurate results are generated within a week for one type of dataset with minor input by the end user. The computer hardware used in the FAMOUS machine learning and image processing pipeline was a regular desktop Windows machine with 16 GB RAM (DDR3, CL16, 2133 MHz), an Intel i7 7700K with a clock speed of 4.2 GHz, and an NVIDIA GeForce GTX 1060 GPU with 6 GB VRAM.

| Image processing
YOLO is an object detection algorithm, meaning that it draws bounding boxes around positive examples of the object classes it is searching for, but it does not isolate the relevant pixels belonging to the object. We solved this problem using basic image-processing techniques. A series of morphological operations (erosion, dilation, Gaussian blurring and thresholding) was used to separate foreground and background pixels. Each class of organelle had a custom, yet similar (excluding lysosomes), procedure for extracting the pixels that belonged to the organelle in each identified region of interest (Algorithm 1).
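The exact operations of Algorithms 1-3 are class-specific; the sketch below only illustrates the general shape of such a per-ROI procedure using OpenCV, with kernel sizes and the thresholding method chosen for illustration rather than taken from the published pipeline.

```python
# Illustrative foreground extraction for one cropped YOLO detection (grayscale).
import cv2
import numpy as np

def extract_foreground(roi: np.ndarray) -> np.ndarray:
    """Return a binary mask of candidate organelle pixels within the ROI."""
    blurred = cv2.GaussianBlur(roi, (5, 5), 0)            # average noise out
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=1)          # remove speckles
    mask = cv2.dilate(mask, kernel, iterations=1)         # restore extent
    return mask
```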
We distinguished between early endosomes, which generally appear as light areas against a dark background, and late endosomes, mitochondria and nuclei, which appear as the opposite (Algorithm 2). It was difficult to consistently isolate the pixels pertaining to lysosomes morphologically, due to the nearly imperceptible difference between the foreground and background pixels. We therefore assumed that successfully detected lysosome pixels occupied the ellipse that best fits the bounding box of the YOLO-detected instance, as seen in Algorithm 3.
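A minimal sketch of that fallback, assuming the mask is built in the bounding-box frame (the function name and argument layout are illustrative):

```python
# Sketch: lysosome fallback mask, the ellipse inscribed in the YOLO box.
import cv2
import numpy as np

def ellipse_mask(box_w: int, box_h: int) -> np.ndarray:
    """Filled ellipse that best fits a box of size box_w x box_h."""
    mask = np.zeros((box_h, box_w), np.uint8)
    center = (box_w // 2, box_h // 2)
    semi_axes = (box_w // 2, box_h // 2)
    cv2.ellipse(mask, center, semi_axes, 0, 0, 360, 255, thickness=-1)
    return mask
```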

| Organelle composition from layers
The above-described method of extracting salient pixels from bounding boxes is not without fault, but it quickly yields usable 2D points that are assembled into point clouds in 3D space: For each of the 1800 FIBSEM input images, we have n sets of 2D points that correspond to pixels of individual organelle instances, as well as the class of each identified organelle. This information effectively gives us the 3D positions of each point of each organelle in the entire sample. Next, we joined the identified organelle slices between layers into individual, coherent 3D organelles. Each bounding box is assigned an ID number, where bounding boxes of organelles of the same class that meet the necessary criteria to form part of the same organelle are assigned the same ID number. Algorithm 4 describes this procedure. The resulting sets of 3D points are referred to as point clouds, since we do not yet have complete 3D organelles at this point. Techniques for cleaning noise and outliers are used to create the final set of point clouds. Point clouds are transformed into 3D shapes via the meshing procedure described below.

| Cleaning point cloud noise

The output of the network is a set of 3D points, known as a point cloud. Every point is described by four parameters: the x, y, z coordinates in 3D space, as well as the normal vector direction of the point. Creating watertight 3D objects from such point clouds requires the use of surface reconstruction algorithms. Such algorithms are extremely sensitive to noise and outliers in the data. The data were therefore preprocessed before the reconstruction was started: Each point in the point cloud can be described by the number of other points that surround it (neighboring points). Statistical analysis of the point clouds per class outputs an average distance to the neighboring points. Using this number as a threshold, points that do not meet the distance criterion are flagged as outliers and removed from the point cloud. This ensures that the sparsest parts of the point cloud are removed and will not influence the reconstruction. This step is done on a per-class basis and results in processed point clouds that can be used for further 3D reconstruction.

| 3D meshing

The 3D reconstruction was carried out in Blender (Blender Online Community, 2021), a free, open-source software for general work with 3D objects. A 3D object can be described as a set of points, edges and faces that define the shape of the object; a singular term for these building blocks is the object geometry. The number and distribution of these elements define the complexity and quality of the object itself.

FIGURE 8 Surface reconstruction and noise removal of the segmented organelles. The following steps were sequentially applied: (1) raw point cloud data; (2) point cloud after neighbor-based point removal; (3)-(5) 3D meshing through a three-step process of noise clearing using iterative re-meshing and Convex Hull operations; (6) cleaned point cloud after removal of the exterior points; (7) re-projection of a Convex-Hull based mesh onto the cleaned point cloud; (8) final result of the surface reconstruction of the organelle after applying a smoothing filter upon completion of the final projection.

Panel 1 in Figure 8 shows an example of the raw point cloud data that was generated by the workflow. Panel 2 shows the result of the initial, neighbor-based point removal; only the most extreme outliers in the point cloud were identified and removed. 3D meshing was achieved through a three-step process of noise clearing (panels 3-5). The first step in the 3D meshing reconstruction was the generation of a rough approximation of the point cloud surface as a 3D mesh using a Convex Hull operation. The Convex Hull of a set of points P represents the smallest convex set containing P, thus covering all the points of the point cloud with a 3D mesh. The Convex Hull trades precision for speed and is hence prone to creating undesirable 3D artifacts in the reconstructed mesh (panel 3 in Figure 8). To resolve this issue, a re-meshing algorithm was introduced. The process of re-meshing changes the geometric layout of an object without changing its shape. Panel 4 in Figure 8 illustrates the differences between the initial Convex Hull geometry and the re-meshed geometry. Improved geometry allows for more complex deformations of an object. We used Blender's voxel re-mesh implementation, which uses OpenVDB (Museth et al., 2013) to generate a new manifold mesh from the input geometry.
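For orientation, the two operations of this first step can be reproduced in Blender's Python API roughly as sketched below; the object name and voxel size are placeholders, and the published pipeline may drive Blender differently.

```python
# Sketch: rough surface approximation in Blender (bpy), assuming the point
# cloud was imported as a vertices-only mesh object named "organelle_pc".
import bpy
import bmesh

obj = bpy.data.objects["organelle_pc"]

# Step 1: Convex Hull - fast, rough cover of all points with a 3D mesh.
bm = bmesh.new()
bm.from_mesh(obj.data)
bmesh.ops.convex_hull(bm, input=bm.verts)
bm.to_mesh(obj.data)
bm.free()

# Step 2: OpenVDB-based voxel re-mesh for a manifold, evenly laid-out mesh.
remesh = obj.modifiers.new(name="VoxelRemesh", type='REMESH')
remesh.mode = 'VOXEL'
remesh.voxel_size = 0.05        # placeholder; depends on organelle scale
bpy.context.view_layer.objects.active = obj
bpy.ops.object.modifier_apply(modifier="VoxelRemesh")
```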
In the second step, the point cloud and the re-meshed Convex Hull were loaded into the same environment and overlaid on top of each other, as can be seen in panel 5. Next, depending on the object shape, either the rough approximation was scaled by a dynamically calculated amount (1-3% of the full scale), or the rough approximation was projected onto the point cloud before scaling. Object shapes whose points were distributed uniformly relative to the center of the object (i.e., all points at roughly the same distance from the center) used the former; other objects used the latter. Projecting a 3D mesh onto another object is the process whereby the geometry of the mesh is deformed to the shape of the object onto which the projection is performed, in a gift-wrapping manner. The point cloud itself served as the underlying object around which the 3D mesh was deformed. The re-meshing step ensured a successful projection, since the projection is directly dependent on the geometry layout of the object.
In either case, the rough approximation was scaled, and the point cloud was divided into interior and exterior points with respect to the Convex Hull approximation. The mesh projection was repeated, ignoring the exterior points and hence reducing noise that remained in the point cloud. A visualization of the resulting, cleaned point cloud is shown in panel 6.
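One way to implement the interior/exterior split is sketched below, under the assumption that the scaled hull is available as a vertex array; the original code may instead use Blender's own intersection tests.

```python
# Sketch: divide point-cloud points into interior and exterior points
# with respect to the (scaled) convex hull approximation.
import numpy as np
from scipy.spatial import Delaunay

def split_points(points: np.ndarray, hull_vertices: np.ndarray):
    """points: (N, 3); hull_vertices: (M, 3). Returns (interior, exterior)."""
    tri = Delaunay(hull_vertices)           # tetrahedralize the hull volume
    inside = tri.find_simplex(points) >= 0  # -1 marks points outside the hull
    return points[inside], points[~inside]
```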
In the final step, the Convex Hull of the cleaned point cloud was calculated again and re-meshed once more to create a denser geometry, that is, a geometry that can be deformed to a larger extent, thus allowing for a more detailed surface reconstruction. Once that step was completed, the mesh was projected onto the point cloud, as shown in panel 7.
The mesh was deformed to reflect most of the surface imperfections. However, sharp edges still existed on the mesh that did not accurately represent the local contour of the point cloud. We therefore applied a smoothing algorithm after the projection was completed. The final result of the reconstruction is shown in panel 8 of Figure 8.
FIGURE 2 Qualitative comparison between manually and automatically segmented organelles of a HeLa cell acquired using a focused ion beam scanning electron microscope. The manual segmentation of the first expert is marked in red, of the second expert in blue, and the deep-learning based segmentation using the presented fast automatic outline segmentation (FAMOUS) pipeline in green. The visual inspection of the automatically segmented results confirms that the organelles can be reliably identified and segmented by FAMOUS. (a) Segmented early endosomes; (b) segmented late endosomes; (c) segmented mitochondria; (d) segmented lysosomes; (e) segmented nucleus.

FIGURE 3 Qualitative comparison between manually and automatically segmented organelles of a yeast cell acquired using transmission electron microscope array tomography. The manual segmentation of the first expert is marked in red, of the second expert in blue, and the deep-learning based segmentation using the presented FAMOUS pipeline in green. The visual inspection of the automatically segmented results confirms that the organelles can be reliably identified and segmented by FAMOUS. (a) Segmented nucleus; (b) segmented mitochondria; (c) segmented Golgi; (d) segmented vacuole; (e) segmented multivesicular bodies; (f) segmented lipid droplets.
The evaluation was performed by visual inspection and additionally quantified by Jaccard indices and volume comparisons. The workflow (FAMOUS) can analyze and quantify an entire dataset of several terabytes within a few hours based on a small training dataset, i.e., in a fraction of the time required for manual segmentation. FAMOUS will hence significantly contribute to high throughput and automation in vEM, and help to push the field towards quantitative imaging and statistically solid results.

FIGURE 4 Volume comparison between the manual segmentations (M1 red, M2 blue) and the fast automatic outline segmentation (FAMOUS) segmentations (yellow) to study the similarity between the ground truth and FAMOUS. The number of detected and segmented objects is plotted against the volume range for: (a) early endosomes; (b) late endosomes; (c) mitochondria; (d) lysosomes; and (e) the total segmented organelle volume. The detected volume is similar for the ground truths and the automatic segmentations. Only at small scales, specifically for late endosomes and lysosomes, do the numbers differ significantly, which confirms the low Jaccard indices for the late-endosome class.

TABLE 3 Volume overlap between the manually segmented datasets by two independent cell biologists M1 and M2 and their overlap with the automatically segmented FIBSEM dataset of the HeLa cell. Note: The analysis confirmed the previous findings of the Jaccard indices: there was a good volume overlap between all manually and automatically segmented organelles (in the range of the overlap between the two ground truths), with the exception of the late endosomes, as discussed in Section 3.

FIGURE 6 Examples of organelles that had been missed by the manually segmenting expert. (a) Fast automatic outline segmentation (FAMOUS) correctly identified the organelles as early endosomes (in green), which were missed by the manual segmentation. (b) FAMOUS correctly identified the organelle as a lysosome (in green), which was missed by the manual segmentation.
Algorithm 4 Organelle composition from layers
input: images I1..1800, set of bounding boxes BB(layer, x, y, width, height, class)
output: set of object labels L
for all organelle bounding boxes BB …
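Since the body of Algorithm 4 is only summarized above, the following hedged Python sketch shows one plausible reading of the layer-linking loop: same-class boxes on consecutive layers that overlap sufficiently inherit the same organelle ID. The IoU criterion and its threshold are our assumptions, not the pipeline's published criteria.

```python
# Hedged sketch of Algorithm 4: link same-class boxes across layers by overlap.

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def link_layers(boxes_by_layer, threshold=0.3):
    """boxes_by_layer[z] = list of (cls, x, y, w, h); returns {(z, i): id}."""
    labels, next_id = {}, 0
    for z, boxes in enumerate(boxes_by_layer):
        for i, (cls, *box) in enumerate(boxes):
            labels[(z, i)] = None
            if z > 0:  # search the layer directly below for a same-class match
                for j, (pcls, *pbox) in enumerate(boxes_by_layer[z - 1]):
                    if pcls == cls and iou(box, pbox) >= threshold:
                        labels[(z, i)] = labels[(z - 1, j)]
                        break
            if labels[(z, i)] is None:       # no match: start a new organelle
                labels[(z, i)], next_id = next_id, next_id + 1
    return labels
```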