WatershedCounting3D: A New Method for Segmenting and Counting Punctate Structures from Confocal Image Data


  • Thomas J. Gniadek (corresponding author, thomas.gniadek@yale.edu)

    1. Department of Cell Biology, Yale University School of Medicine, 333 Cedar Street, PO Box 208002, New Haven, CT 06520-8002, USA
  • Graham Warren

    1. Department of Cell Biology, Yale University School of Medicine, 333 Cedar Street, PO Box 208002, New Haven, CT 06520-8002, USA



Current research in cell biology frequently uses light microscopy to study intracellular organelles. To segment and count organelles, most investigators have used a global thresholding method, which relies on homogeneous background intensity values within a cell. Because this is not always the case, we developed WatershedCounting3D, a program that uses a modified watershed algorithm to more accurately identify intracellular structures from confocal image data, even in the presence of an inhomogeneous background. We give examples of segmenting and counting endoplasmic reticulum exit sites and the Golgi apparatus.

Investigations into the mechanisms that regulate the function, morphology and biogenesis of many intracellular organelles often rely on data from 3D confocal microscopy (1). For cell biology in particular, validated markers are used to identify the organelles that they label. Once the image data are acquired, a step known as segmentation is performed to identify regions within each image that correspond to a labeled organelle or structure of interest. Identified organelles are then counted or analyzed further; see, for example, Jokitalo et al. (2). Because image data are stored as an array of intensity values, each of which represents the light intensity measured at one pixel (2D) or voxel (3D) within the image, segmentation can be carried out mathematically using a computer program that imports the image file and subsequently processes the raw intensity data.

Thresholding is one method that is widely used to analyze confocal image data. In practice, this involves the creation of a binary mask from image data by testing whether the value of each voxel is greater than a certain value, which is known as the threshold value (3). A binary mask is simply an assignment of either a value of True or False to each voxel within the image data, depending on whether the intensity value is greater (True) or less (False) than the threshold value. Voxels within the binary mask that have True values can then be grouped into regions of adjacent voxels whose values are also True. Each region is then either counted as an individual object or broken down further into subobjects. The simplest and most commonly used form of thresholding is known as global thresholding, which applies the same threshold value to all voxels within the image data set (3). This works well if each object being segmented contains voxels with intensity values greater than the global threshold and is completely surrounded by voxels with intensity values less than the threshold value.
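
The thresholding and grouping steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this work: the function name `count_thresholded_objects`, the nested-list layout `image[z][y][x]` and the use of 26-connectivity to define adjacent voxels are all assumptions made for the example.

```python
from collections import deque
from itertools import product


def count_thresholded_objects(image, threshold):
    """Count connected regions of voxels whose intensity exceeds `threshold`.

    `image` is a nested list indexed as image[z][y][x] (an assumed layout).
    Voxels are treated as adjacent if they touch by a face, edge or corner
    (26-connectivity), which is an assumption of this sketch.
    """
    shape = (len(image), len(image[0]), len(image[0][0]))
    nz, ny, nx = shape
    # Binary mask: True where the voxel is above the global threshold.
    mask = [[[image[z][y][x] > threshold for x in range(nx)]
             for y in range(ny)] for z in range(nz)]
    seen = set()
    count = 0
    for start in product(range(nz), range(ny), range(nx)):
        z, y, x = start
        if not mask[z][y][x] or start in seen:
            continue
        # A new, unvisited True voxel starts a new region; flood-fill it.
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            cz, cy, cx = queue.popleft()
            for dz, dy, dx in product((-1, 0, 1), repeat=3):
                nb = (cz + dz, cy + dy, cx + dx)
                if nb in seen:
                    continue
                inside = all(0 <= c < n for c, n in zip(nb, shape))
                if inside and mask[nb[0]][nb[1]][nb[2]]:
                    seen.add(nb)
                    queue.append(nb)
    return count


# Hypothetical one-row "stack": two bright voxels, a gap, one dimmer voxel.
example = [[[0, 9, 9, 0, 0, 7, 0]]]
n_low = count_thresholded_objects(example, 5)   # both objects exceed 5
n_high = count_thresholded_objects(example, 8)  # the dimmer object is lost
```

Raising the threshold from 5 to 8 in this toy example drops the dimmer object, which is the kind of sensitivity to a single global value that the text describes.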

However, when the density of objects is non-uniform, the intensity of background labeling often becomes inhomogeneous. An example is shown in Figure 1A where BSC-1 cells have been labeled for endoplasmic reticulum exit sites (ERES). These structures collect newly assembled cargo molecules for onward transport to the Golgi apparatus. In mammalian cells, there are several hundred such sites, dispersed throughout the cell (4,5). They are identified by the aggregation of coat protein II (COPII) on the endoplasmic reticulum (ER) membrane (6). Figure 1A shows an intensity profile through three such sites, two of which are close together. The background intensity between these two close objects is higher than the background between ERES that are spaced further apart. This inhomogeneity means that there is no global threshold value that will yield three objects. In the example shown, thresholding at the level indicated by either line A or line C yields two objects, while thresholding at the level of line B yields only one.
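
The argument about the intensity profile can be checked numerically on a toy example. The profile values below are invented for illustration and are not measured from Figure 1A; they simply reproduce the described situation of two close peaks separated by a raised background plus one fainter, isolated peak. The helper name `count_objects_1d` is likewise hypothetical.

```python
def count_objects_1d(profile, threshold):
    """Count runs of consecutive samples that are strictly above `threshold`."""
    count = 0
    inside = False
    for value in profile:
        above = value > threshold
        if above and not inside:
            count += 1  # entering a new run of above-threshold samples
        inside = above
    return count


# Invented profile: peaks of 90 and 95 separated by a background of 60,
# and an isolated peak of 40 on a background of 5.
profile = [5, 10, 90, 60, 95, 12, 5, 5, 40, 5]

# Try every integer threshold in the 0..99 range.
counts = {t: count_objects_1d(profile, t) for t in range(100)}
best = max(counts.values())
```

For this profile no threshold separates all three peaks: a low threshold merges the two close peaks, an intermediate one also loses the faint peak, and a high one splits the close peaks but has already discarded the faint one.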

Figure 1.

    Comparison between global thresholding and watershed segmentation. A) BSC-1 cells were fixed, permeabilized and labeled with antibodies to the COPII subunit Sec31p, followed by secondary antibodies tagged with Alexa 488. The image at top left shows discrete ERES, three of which are cut by the red line in the adjacent image, the corresponding 1D intensity profile plot being drawn below. Three possible global threshold values (A–C) are illustrated with blue lines drawn on the intensity profile to indicate the 1D segmented regions that would result. The corresponding 2D segmentation results are shown on the right. Note that there is no global threshold value which segments this 1D profile into three objects. B) The raw image and intensity profile data from A are shown with local maxima labeled with a red asterisk. A hypothetical watershed segmentation applied to this intensity profile and seeded at local minima of the inverted data (i.e. local maxima of the raw data) segments this profile into three regions (i–iii). Application of the WatershedCounting3D algorithm to the corresponding raw 3D confocal data shows, in the image on the right, that three segmented regions are obtained.

    In the field of image analysis, other thresholding methods have been devised to address the problem posed by such an inhomogeneous background (3,7). Some of these methods sample a subregion of the image to determine a threshold value that will be applied to voxels within that region. Such methods work well if variations in the background level are smooth. However, this is rarely the case in biology because the clustering of objects themselves often introduces variations in background levels. Variations in background staining can also be caused by the presence of marker molecules outside of the structure being segmented. This is partially solved by multiple component analysis and its related methods, which decompose an image using a set of basis functions. However, the choice of which basis functions to use will ultimately determine the number of individual structures identified (8). Thus, these methods fail to reproducibly count the number of organelles within a cell under different conditions.

    A method developed by Vincent and Soille, called segmentation by morphological watersheds, offers the means to address these segmentation issues (9). This method makes an analogy between raw image data and a topographic terrain map of the earth, where intensity values in the image data represent vertical elevation. Voxels are then grouped into regions that are analogous to water catch basins or watershed regions. Thus, each watershed region contains one local minimum along with all the points from which water would flow toward that minimum. If water would flow toward more than one minimum from a particular location, such as a ridgeline, then that point can be defined as a watershed line (or border) between two regions. The minimum of a watershed region is often referred to as the seed point of the region since it is the first identified point within a region.

    Fluorescence images from confocal microscopy are also analogous to a terrain map, but the structures of interest are the peaks, not the catch basins. To identify the peaks using a watershed algorithm, the intensity data need to be inverted before analysis, so that the local maxima become the local minima. For visualization purposes, the original intensity data are then shown with seed points defined at local maxima, which are equivalent to local minima of the inverted data. The robustness of this algorithm stems from the fact that each object is first identified by the presence of a seed point, without having to use a global threshold. A watershed algorithm separates the ERES that could not be segmented by the global thresholding method (Figure 1A,B).
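
The idea of seeding regions at local maxima can be illustrated in one dimension. The sketch below is a simplified steepest-ascent variant (each sample is assigned to the local maximum it reaches by repeatedly stepping to its brightest neighbor), which is equivalent in spirit to letting water flow downhill on the inverted data. It is not the immersion algorithm of Vincent and Soille and not the WatershedCounting3D implementation; the function name `watershed_1d`, the `floor` parameter and the profile values are assumptions for the example.

```python
def watershed_1d(profile, floor):
    """Assign each sample above `floor` to a local maximum by steepest ascent.

    Returns a list in which each entry is the index of the local maximum
    (seed) that the sample belongs to, or None for samples at or below
    `floor`. Plateaus are not handled specially in this sketch.
    """
    n = len(profile)
    labels = [None] * n
    for start in range(n):
        if profile[start] <= floor:
            continue  # treated as background, never assigned to a region
        i = start
        while True:
            best = i
            for j in (i - 1, i + 1):
                if 0 <= j < n and profile[j] > profile[best]:
                    best = j
            if best == i:
                break  # no brighter neighbor: i is a local maximum
            i = best
        labels[start] = i
    return labels


# Invented profile: two close peaks on a raised background, one faint peak.
profile = [5, 10, 90, 60, 95, 12, 5, 5, 40, 5]
labels = watershed_1d(profile, floor=5)
n_regions = len({label for label in labels if label is not None})
```

On this invented profile the three peaks end up in three separate regions, including the two close peaks that no single threshold could split.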

    Unfortunately, the basic watershed algorithm is not directly applicable to segmenting organelles from 3D confocal image data. One problem is that confocal image data have better resolution in the x-y direction compared with the z direction. Further work into watershed segmentation has addressed the anisotropic nature of confocal data, and these methods have been used to segment and classify nuclei in confocal data (10,11). However, the lack of a robust method for identifying individual organelles or a method for removing noise without applying a global threshold has prevented the widespread use of a watershed algorithm for segmenting intracellular organelles. Here, we have extended the original watershed algorithm and optimized it for segmenting and counting organelles from 3D confocal image data. As an illustration of this new method, it was applied to the 3D confocal data that contain the subregion shown in Figure 1. The corresponding subregion after application of this algorithm is shown on the right of Figure 1B with segmented objects drawn in blue. The image shows that this method can identify the three ERES that could not be discriminated by the global thresholding applied in Figure 1A.

    Results and discussion

    This article describes the development of a modified watershed segmentation algorithm used to automatically segment and count intracellular objects from confocal image data. The algorithm first defines seed points, each of which corresponds to an individual foreground object or noise within the image. Then, a watershed algorithm is used to separate seed points that correspond to foreground objects from those caused by background noise. The algorithm has been implemented in Java and is freely available as an ImageJ plug-in (12).

    Local maxima as seed points

    Ultimately, the problem of accurately segmenting and counting organelles becomes a problem of how to define the presence of an intracellular object based on the characteristics of an indirect marker. For many intracellular structures with sizes that are near the resolution limit of light microscopy, one solution is to assume that each local maximum in the image data represents one and only one object of interest or background noise. Because a local maximum is the first voxel to be associated with each object in an image, the maximum can be thought of as the seed point for the segmentation of each object. Local maxima are voxels whose intensity value is greater than the intensity values of voxels within some local, neighboring region. The local region dimensions can take into account the anisotropic nature of confocal image data and generally would be less than or equal to the minimum expected separation between objects. This assumption holds if the objects being segmented do not overlap below the resolution limit of the microscope and if the density of labeling with the marker is smooth within each object. Such assumptions work well for structures such as the ERES, where each local maximum corresponds to a local aggregation of COPII with a diameter of between 0.5 and 1 μm (13). For the remainder of this article, where examples of ERES and Golgi stacks will be considered, seed points will be assumed to be local maxima within the image data.
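
A possible way to find such seed points is sketched below. It is an illustration under stated assumptions rather than the plug-in's code: the image is a nested list indexed as `image[z][y][x]`, the local region is a box whose half-widths `radius = (rz, ry, rx)` can differ per axis to reflect anisotropic sampling, and a voxel counts as a maximum only if it is strictly brighter than every other voxel in the box. The name `local_maxima_3d` is hypothetical.

```python
from itertools import product


def local_maxima_3d(image, radius=(1, 1, 1)):
    """Return (z, y, x) positions strictly brighter than their box neighborhood.

    `radius` gives the half-width of the neighborhood along z, y and x, so
    an anisotropic region such as (1, 2, 2) can be used.
    """
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    rz, ry, rx = radius
    offsets = [o for o in product(range(-rz, rz + 1),
                                  range(-ry, ry + 1),
                                  range(-rx, rx + 1)) if o != (0, 0, 0)]
    maxima = []
    for z, y, x in product(range(nz), range(ny), range(nx)):
        centre = image[z][y][x]
        is_max = True
        for dz, dy, dx in offsets:
            zz, yy, xx = z + dz, y + dy, x + dx
            in_bounds = 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx
            if in_bounds and image[zz][yy][xx] >= centre:
                is_max = False  # a neighbor is at least as bright
                break
        if is_max:
            maxima.append((z, y, x))
    return maxima


# Toy 3 x 5 x 5 stack with two bright voxels two pixels apart in x and y.
stack = [[[0] * 5 for _ in range(5)] for _ in range(3)]
stack[1][1][1] = 9
stack[1][3][3] = 7

seeds_small = local_maxima_3d(stack, radius=(1, 1, 1))
seeds_large = local_maxima_3d(stack, radius=(1, 2, 2))
```

With the smaller region both bright voxels are kept as seeds, whereas the larger region merges them into one, which illustrates why the region should not exceed the minimum expected separation between objects.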

    To improve performance, the algorithm samples the margins of the image data to determine the mean and standard deviation of the image background. Subsequently, the analysis is restricted to voxel values greater than the background mean plus three times the background standard deviation. This cutoff is known as Grubbs' test and is used to quickly eliminate outliers of a normal distribution of values (14). This step significantly reduces the computation time of the program by eliminating most voxels that do not correspond to points within a cell.
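
The cutoff described above amounts to a short calculation. In the sketch below, the way the margins are sampled (a border of `margin` pixels around each z-slice) and the use of the population standard deviation are assumptions, since the text does not specify them; `background_cutoff` is a hypothetical name.

```python
from statistics import mean, pstdev


def background_cutoff(image, margin=1, k=3.0):
    """Return mean + k * SD of the voxels in the x-y border of every slice.

    `image` is indexed as image[z][y][x]. The border width `margin` and the
    population standard deviation are assumptions of this sketch.
    """
    samples = []
    for plane in image:
        ny, nx = len(plane), len(plane[0])
        for y in range(ny):
            for x in range(nx):
                on_border = (y < margin or y >= ny - margin or
                             x < margin or x >= nx - margin)
                if on_border:
                    samples.append(plane[y][x])
    return mean(samples) + k * pstdev(samples)


# Toy single-slice image: border values of 8 and 12, bright 2 x 2 interior.
toy = [[[8, 12, 8, 12],
        [12, 100, 100, 8],
        [8, 100, 100, 12],
        [12, 8, 12, 8]]]

cutoff = background_cutoff(toy)
# Only voxels above the cutoff would be passed on to later steps.
kept = [v for row in toy[0] for v in row if v > cutoff]
```

Here the border has a mean of 10 and a standard deviation of 2, so the cutoff is 16 and only the four interior voxels are retained for further analysis.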

    Classification of local maxima as background or foreground: counting ERES as an example

    The set of all local maxima within an image is made up of maxima caused by structures of interest and maxima caused by background noise. It is assumed that background maxima are caused by noise during image acquisition, non-specific staining by the dye or antibodies being used, or the presence of marker molecules in other structures at some small concentration (15). To form regions of influence around each local maximum, a watershed algorithm is applied to the image data using the local maxima as seed points. To separate watershed regions that represent structures of interest from regions represented by background, two general assumptions are made: that each structure of interest will in some way aggregate marker more strongly than any form of noise and that background noise will produce regions that tend not to exceed a certain minimum volume.

    Thus, we assume that the strength of marker aggregation within a watershed region is proportional to the maximum rate of change of marker concentration, also known as the gradient, within that region. In general, we observed that watershed regions that contained objects of interest also tended to contain sharper edges than regions made up of background noise. We find the gradient by calculating the maximum difference in intensity between a voxel and one of its 26-neighbor voxels, then dividing this by the spatial separation between the voxels. Again, this step takes into account the anisotropic nature of confocal data. We have found that noise is reduced by applying a mean filter prior to calculating this gradient (3). Next, the gradient value for each watershed region is set to the maximum gradient value that occurs within its borders.
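
The gradient measure can be sketched as follows. This is one reading of the description, stated as an assumption: for each voxel, the absolute intensity difference to each of its 26 neighbors is divided by the physical distance to that neighbor, and the largest such value is kept. The default voxel spacing (0.2 μm in z, 0.08 μm in x and y) is taken from the acquisition settings in Materials and methods; the mean filter is omitted, and the names `max_gradient` and `region_gradient` are hypothetical.

```python
from itertools import product
from math import sqrt


def max_gradient(image, z, y, x, spacing=(0.2, 0.08, 0.08)):
    """Largest distance-normalized intensity difference to a 26-neighbor.

    `image` is indexed as image[z][y][x]; `spacing` is the voxel size along
    z, y and x, so that anisotropic sampling is taken into account.
    """
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    sz, sy, sx = spacing
    best = 0.0
    for dz, dy, dx in product((-1, 0, 1), repeat=3):
        if (dz, dy, dx) == (0, 0, 0):
            continue  # skip the voxel itself (zero distance)
        zz, yy, xx = z + dz, y + dy, x + dx
        if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
            distance = sqrt((dz * sz) ** 2 + (dy * sy) ** 2 + (dx * sx) ** 2)
            difference = abs(image[z][y][x] - image[zz][yy][xx])
            best = max(best, difference / distance)
    return best


def region_gradient(image, voxels, spacing=(0.2, 0.08, 0.08)):
    """Gradient value of a region: the maximum over all of its voxels."""
    return max(max_gradient(image, z, y, x, spacing) for z, y, x in voxels)


# Toy stack: a single bright voxel directly above the centre voxel in z.
stack = [[[0] * 3 for _ in range(3)] for _ in range(3)]
stack[0][1][1] = 8

g_isotropic = max_gradient(stack, 1, 1, 1, spacing=(1.0, 1.0, 1.0))
g_coarse_z = max_gradient(stack, 1, 1, 1, spacing=(2.0, 1.0, 1.0))
```

Doubling the z spacing halves the gradient measured along z in this toy case, which is the effect of accounting for anisotropy.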

    To determine which watershed regions are due to noise, we rely on the observation that our structures of interest are above the resolution limit of confocal microscopy, but noise within the image is generally below this limit. Because we sampled at the Nyquist rate, which is below the resolution limit, background noise tended to produce watershed regions made up of a small number of voxels, which tended to be volumetrically smaller than the minimum resolvable volume (16). Therefore, we assume that a watershed region is most likely due to noise whenever the minimum resolvable volume cannot fit within it. Given our pixel sampling rate, we have used a 3 × 3 × 3 cube as this minimum volume. In this way, the total set of watershed regions is separated into likely background regions, which are smaller than the minimum volume, and likely foreground regions. Figure 2 shows a cell labeled for ERES using antibodies to the COPII marker, mammalian Sec31p (17). The upper right image shows all watershed regions (both foreground and background). Next, we use the assumption that the maximum gradient value for background regions will be less than that for foreground regions. This holds true if our structures of interest bind more strongly to our marker than background staining and noise, thus producing sharper edges with higher gradient values within the set of likely foreground objects. Next, a gradient cutoff is defined as the intersection of the histograms of likely foreground and likely background regions, as shown in the graph in Figure 2 (open arrowhead). Finally, watershed regions caused by noise are defined as those with a gradient value less than this cutoff value and the remaining watershed regions are defined as structures of interest. The lower right image of Figure 2 shows only the foreground watershed regions.
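
The two classification steps can be sketched as follows. The text does not give the exact procedure for locating the histogram intersection, so the rule used here is an assumption: both sets of gradient values are binned with a common bin width, and the cutoff is the lower edge of the first bin, at or above the peak of the background histogram, in which the foreground count exceeds the background count. The names `contains_cube` and `gradient_cutoff` and all numeric values are hypothetical.

```python
from itertools import product


def contains_cube(voxels, size=3):
    """True if a size x size x size cube of voxels fits inside the region."""
    voxels = set(voxels)
    offsets = list(product(range(size), repeat=3))
    # Try every voxel as the lowest corner of a candidate cube.
    return any(all((z + dz, y + dy, x + dx) in voxels
                   for dz, dy, dx in offsets)
               for z, y, x in voxels)


def gradient_cutoff(background, foreground, bin_width):
    """Estimate where the two gradient histograms cross (assumed rule).

    `background` and `foreground` are non-empty lists of non-negative
    gradient values for the likely background and foreground regions.
    """
    top = max(max(background), max(foreground))
    n_bins = int(top // bin_width) + 1
    bg = [0] * n_bins
    fg = [0] * n_bins
    for g in background:
        bg[int(g // bin_width)] += 1
    for g in foreground:
        fg[int(g // bin_width)] += 1
    start = bg.index(max(bg))  # begin at the background peak
    for b in range(start, n_bins):
        if fg[b] > bg[b]:
            return b * bin_width
    return n_bins * bin_width  # histograms never cross


# Invented gradient values for regions that failed or passed the cube test.
likely_background = [1, 1, 2, 2, 2, 3, 4]
likely_foreground = [4, 5, 6, 6, 7, 7, 7, 8]
cutoff = gradient_cutoff(likely_background, likely_foreground, bin_width=1)

# Regions with a gradient below the cutoff are classified as noise.
all_gradients = likely_background + likely_foreground
structures = [g for g in all_gradients if g >= cutoff]
```

Note that the final classification applies the cutoff to every region, so a large region with a weak gradient is still removed and a small region with a strong gradient is still kept.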

    Figure 2.

      Identifying foreground watershed regions. The top left image is a confocal z-slice of BSC-1 cells labeled for ERES prior to segmentation using the WatershedCounting3D algorithm. The upper right image is a z-slice of the segmentation result before removing background regions. The same z-slice, with background regions removed, is shown in the lower right. In the images on the right, segmented voxels are given a blue channel intensity value of 255, while the green channel retains the intensity value of the original data. The histogram plot shows the number of hypothetical foreground (blue) and hypothetical background (pink) regions that have a particular gradient value. The gradient value of the intersection of these two histograms (open arrowhead) represents the gradient cutoff value used to define the foreground regions illustrated in the lower right image.

      Early efforts to count the number of ERES per cell in mammalian cells were based on a global thresholding method (5). As shown in Figure 1A, it may not be possible to find a global threshold that does not undercount closely packed ERES or ignore faint and isolated ERES. However, as was shown in the blue-masked image in Figure 1B, our WatershedCounting3D algorithm does not suffer the same difficulty. Yet, this method will fail if the distance between the local maxima is less than the sampling interval of the confocal microscope, if a local maximum represents more than one structure of interest, or if the staining does not produce intensity gradients that are higher than gradients found in background noise. The lower right image in Figure 2 shows the result of our WatershedCounting3D algorithm at the level of the whole cell. This image represents a single z-slice from a 3D confocal stack labeled for ERES (green). Voxels classified as ERES using this algorithm are shown in blue.

      Segmenting and counting the Golgi

      After proteins exit from the ER, they pass through the cisternal stacks of the Golgi apparatus (18). In mammalian cells, the Golgi comprises multiple stacks linked together by non-compact zones into a ribbon-like structure next to the nucleus (19). This tight packing has made it difficult to determine the number of Golgi stacks using the global thresholding method because the background is inhomogeneous.

      To test the WatershedCounting3D algorithm, the effect of nocodazole on the Golgi ribbon was exploited. Nocodazole breaks down the ribbon into individual, dispersed stacks that are more readily counted (20,21). To compare cells with equivalent numbers of Golgi, BSC-1 cells were synchronized by shearing-off mitotic cells (22) and plated onto coverslips for 5 h, the last 2 h in the presence or absence of nocodazole. After fixation and labeling for Golgi using the marker GM130 (Figure 3, left panels), the Golgi were segmented using the WatershedCounting3D algorithm (Figure 3, right panels) and counted. The mean number of Golgi was 157 ± 11 in the absence of nocodazole (n = 14) and 142 ± 10 in its presence (n = 10). The similarity between these figures argues that the WatershedCounting3D algorithm can identify Golgi stacks even within the tightly packed ribbon.

      Figure 3.

        Golgi segmentation. BSC-1 cells were synchronized by shake-off, then fixed at 5 h after plating following treatment for 2 h with 200 ng/mL nocodazole or a dimethyl sulfoxide control. Cells were then permeabilized and labeled for Golgi (anti-GM130 with a Biomedia Cy5 secondary, red). Images show a single confocal z-slice before (left panels) and after (right panels) segmentation using the WatershedCounting3D algorithm. In the right panels, the blue channel intensity of segmented voxels is 255, while red channel intensity values remain unchanged. The inset numbers indicate the average number of Golgi/cell in control (n = 14) and nocodazole-treated cells (n = 10).

        Comparison between global thresholding and WatershedCounting3D

        As an additional demonstration of the usefulness of this new algorithm, it was compared with the most commonly used method, global thresholding. To do this, the number of ERES and Golgi were counted in cells that were synchronized and treated with or without nocodazole prior to fixation. Cells were then fixed, permeabilized and double-labeled for Golgi and ERES. Confocal images were analyzed using both methods, and the results are presented in Figure 4.

        Figure 4.

          Comparison between global thresholding and WatershedCounting3D. BSC-1 cells were synchronized by shake-off, then fixed at 13 h after plating following treatment for 2 h with 200 ng/mL nocodazole or a dimethyl sulfoxide control. Cells were then permeabilized and double-labeled for Golgi (anti-GM130 with a Biomedia Cy5 secondary) and ERES (anti-Sec31 with an Alexa 488 nm secondary). The number of ERES (left panel) and Golgi (right panel) were counted using either the global thresholding method (blue) or the WatershedCounting3D algorithm (red). 3D confocal images were analyzed using ImageJ. For each sample, n = 10. Error bars show the standard error of the mean.

          The left panel shows that the WatershedCounting3D algorithm identified more ERES than the global thresholding method in both the control and the nocodazole-treated cells. This further emphasizes the point made in Figure 1 that many ERES are clustered together and cannot be easily discriminated using global thresholding.

          The right panel shows that the WatershedCounting3D algorithm again counted about the same number of Golgi in both the control cells and those treated with nocodazole. In marked contrast, the global thresholding method counted approximately threefold fewer Golgi in the control cells than in the nocodazole-treated cells. This emphasizes the ability of the WatershedCounting3D algorithm to better discriminate closely associated Golgi stacks.

          Availability and extensibility

          We have implemented this algorithm as a Java-based ImageJ plug-in that is available for download under the GPL at http://www.WatershedCounting3D.com. ImageJ is a freely available, National Institutes of Health (NIH) funded, Java-based open-source imaging program that can be downloaded from http://rsb.info.nih.gov/ij/. Instructions for downloading and installing ImageJ can be found at the same URL.

          To use this ImageJ plug-in, simply install ImageJ, then place the WatershedCounting3D.jar file in the ImageJ/plugins folder. After restarting ImageJ, a menu item called WatershedCounting3D will appear under Plugins in the main ImageJ menu bar. Open a stack of raw image data to be analyzed, then select the WatershedCounting3D menu item to launch the plug-in. The WatershedCounting3D setup window, which is shown in Figure 5, will then appear.

          Figure 5.

            WatershedCounting3D setup frame. This image shows the screen-shot of the WatershedCounting3D setup frame, which is used to initialize and run the algorithm from ImageJ.

            Within the setup window, users can select which color channel of the input image to analyze, the dimensions to sample when defining local maxima and the format of the segmentation results. A visual mask of the results can be displayed either as a new stack of black and white images or as a colored mask overlaying the original image data. Users can also enter the resolution and pixel sampling rate of the image data, so that results can be exported in the proper spatial dimensions. Once the algorithm is complete, text statistics of the segmentation result will appear in a new window and a second image window will display the visual results in whatever format was chosen. The text results can be saved to a comma-separated text file that can be imported into many third-party spreadsheet and analysis programs. The visual results can be saved as a TIFF file using ImageJ and analyzed or rendered with the user's method of choice. More detailed instructions for all these procedures are available at www.WatershedCounting3D.com and a Help window is available in the plug-in.
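
As an illustration of why the sampling rate matters for exported results, the conversion from a voxel count to a physical volume can be written out. The default values are the sampling used in this study (0.08 μm per pixel in x-y and a 0.2 μm z-step, see Materials and methods); the function name `region_volume_um3` is hypothetical and this is not a description of the plug-in's export code.

```python
def region_volume_um3(n_voxels, xy_um=0.08, z_um=0.2):
    """Physical volume, in cubic micrometres, of a region of `n_voxels`."""
    return n_voxels * xy_um * xy_um * z_um


# The 3 x 3 x 3 minimum cube used for classification contains 27 voxels.
min_cube_volume = region_volume_um3(27)
# Physical extent of that cube along x, y and z, in micrometres.
min_cube_extent = (3 * 0.08, 3 * 0.08, 3 * 0.2)
```

With this sampling, the 3 × 3 × 3 minimum cube spans roughly 0.24 × 0.24 × 0.6 μm, or about 0.035 μm³.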

            Materials and methods

            Antibodies


            ERES were labeled using a rabbit anti-Sec31 antibody at 1:2000 dilution (17). An Alexa 488 nm goat anti-rabbit secondary (catalogue number A11034; Molecular Probes) was used at 1:100 dilution. The Golgi was labeled using a mouse anti-GM130 antibody (catalogue number 610823; BD Biosciences), which recognizes a domain at the C-terminus of GM130, at 1:100 dilution. A goat anti-mouse Cy5 secondary (catalogue number SJ25101; Biomedia) was then used at 1:100 dilution.

            Cell culture

            BSC-1 cells (ATCC CL 26) were grown in minimal essential medium (Invitrogen) with 10% fetal bovine serum (Gemini, Irvine, CA, USA) in addition to 100 U/mL each of penicillin and streptomycin (catalogue number 15140-122; Invitrogen) together with 2 mM L-glutamine (catalogue number 25030-081; Invitrogen) at 37°C under 5% CO2.

            Immunofluorescence labeling

            Cells were grown for 24 h on glass coverslips (catalogue number 12-545-80; Fisherbrand), then fixed in 10% formalin for 20 min, washed with PBS and permeabilized using 0.1% Triton-X-100 for 5 min. Residual formalin was blocked using a 4% solution of BSA for 15 min. Antibodies were diluted in 4% BSA in PBS and applied for 15 min. PBS washes were performed after each step. Coverslips were mounted onto slides using Gel/Mount Aqueous Mounting Medium with anti-fading agents (catalogue number M01; Biomeda).

            Confocal microscopy


            Confocal images were obtained using a Zeiss LSM 510 confocal microscope with a 100× NA 1.4 oil lens and 10× condenser. Images were acquired with a zoom setting of 1, x-y resolution of 0.08 μm per pixel, and z-step size of 0.2 μm. Binning was set to 1, averaging to 4, and pixel time to 1.6 μs per pixel. The pinhole in the 488-nm channel was set to 1 Airy unit, corresponding to an optical section thickness of 0.7 μm. For multichannel imaging, the pinholes of the non-488-nm channels were set to produce identical optical section thickness in all channels, while keeping the pinhole diameter at 1 Airy unit in the 488-nm channel. The gain was set to eliminate oversaturation and undersaturation. The laser power was 5% for the 488 nm and 15% for the 633 nm laser lines. Images were saved in 8-bit format.

            Cell synchronization

            BSC-1 cells were synchronized by mechanically shearing-off mitotic cells. In a manner similar to the mitotic shake-off method, mitotic cells were preferentially detached from the culture dish by passing pre-heated medium over a plate of adherent cells, and the medium was then collected along with the detached mitotic cells (22). The procedure was repeated at 30-min intervals to collect cells that had recently entered mitosis. Collected cells were plated onto Alcian Blue-coated coverslips.

            Drug treatment

            Nocodazole (catalogue number M1404 – 10 mg; Sigma) was applied to cells at a final concentration of 200 ng/mL. Stock solutions were prepared in dimethyl sulfoxide at a concentration of 0.2 mg/mL. Cells were treated with nocodazole for 2 h at 37°C under 5% CO2 prior to fixation.

            Programming


            All programming was performed using the Java programming language (Sun Microsystems, Santa Clara, CA, USA). The Eclipse integrated development environment was used to write, test and compile the program code (23). ImageJ libraries were used to construct an ImageJ plug-in.

            Acknowledgments


            We thank FS Gorelick for anti-Sec31p antibody and for useful discussion as well as M Maggioni for technical advice. Also, we thank CL de Graffenried and A Satoh for their critical review of the manuscript. This work was supported by NIH MSTP TG 5T32GM07205 to T. J. G. and an NIH grant to G. W.