A flexible and robust approach for segmenting cell nuclei from 2D microscopy images using supervised learning and template matching


  • Cheng Chen
    1. Center for Bioimage Informatics, Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
  • Wei Wang
    1. Center for Bioimage Informatics, Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    2. Department of Electronic and Information Engineering, School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
  • John A. Ozolek
    1. Department of Pathology, Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania 15201
  • Gustavo K. Rohde (corresponding author)
    1. Center for Bioimage Informatics, Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    2. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    3. Computational Biology Program, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
    Correspondence: Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213.


We describe a new supervised learning-based template matching approach for segmenting cell nuclei from microscopy images. The method uses examples selected by a user to build a statistical model that captures the texture and shape variations of the nuclear structures in a given dataset to be segmented. Segmentation of subsequent, unlabeled, images is then performed by finding the model instance that best matches (in the normalized cross correlation sense) local neighborhoods in the input image. We demonstrate the application of our method to segmenting nuclei from a variety of imaging modalities, and quantitatively compare our results to several other methods. Quantitative results using both simulated and real image data show that, while certain methods may work well for certain imaging modalities, our software is able to obtain high accuracy across all imaging modalities studied. Results also demonstrate that, relative to several existing methods, the template-based method we propose presents increased robustness: it better handles variations in illumination and in texture across imaging modalities, produces smoother and more accurate segmentation borders, and better separates clustered nuclei. © 2013 International Society for Advancement of Cytometry

Segmenting cell nuclei from microscopy images is an important image processing task necessary for many scientific and clinical applications due to the fundamentally important role of nuclei in cellular processes and diseases. Given the large variety of imaging modalities, staining procedures, experimental conditions, and so forth, many computational methods have been developed and applied to cell nuclei segmentation in 2D (1–9) and 3D images (10–14). Thresholding techniques (15, 16), followed by standard morphological operations, are among the simplest and most computationally efficient strategies. These techniques, however, are inadequate when the data contain strong intensity variations or noise, or when nuclei appear crowded in the field of view being imaged (7, 17). The watershed method is able to segment touching or overlapping nuclei. Direct use of watershed algorithms, however, can often lead to oversegmentation artifacts (6, 18). Seeded or marker-controlled watershed methods (2, 3, 5–7, 19, 20) can be utilized to overcome such limitations. We note that seed extraction is a decisive factor in the performance of seeded watershed algorithms: missing or spurious seeds can cause under- or oversegmentation. Different algorithms for extracting seeds have been proposed. In Ref.20, for example, seeds are extracted using a gradient vector field followed by Gaussian filtering. Jung and Kim (8) proposed to find optimal seeds by minimizing the residual between the segmented region boundaries and the fitted model. In addition, various postprocessing algorithms have been applied to improve segmentation quality. For example, morphological algorithms (e.g., dilation and erosion) (7) can be used iteratively to overcome inaccuracies in segmentation. In Ref.21, learning-based algorithms were used for discarding segmented regions deemed to be erroneous. Similar ideas using neural networks can be seen in Ref.22.
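To make the baseline concrete, a thresholding pipeline of the kind referenced above (a global Otsu threshold followed by morphological cleanup) can be sketched in a few lines. This is an illustrative NumPy/SciPy sketch, not the implementation of any cited work; the function names and the `min_area` parameter are ours:

```python
import numpy as np
from scipy import ndimage as ndi

def otsu_threshold(img, nbins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                       # cumulative class-0 weight
    w1 = w0[-1] - w0                           # class-1 weight
    m = np.cumsum(hist * centers)
    mu0 = m / np.maximum(w0, 1e-12)            # class-0 mean
    mu1 = (m[-1] - m) / np.maximum(w1, 1e-12)  # class-1 mean
    between = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance
    return centers[np.argmax(between)]

def segment_by_threshold(img, min_area=20):
    """Threshold, clean up with a morphological opening, drop tiny regions."""
    mask = img > otsu_threshold(img)
    mask = ndi.binary_opening(mask, structure=np.ones((3, 3)))
    labels, n = ndi.label(mask)
    areas = ndi.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.flatnonzero(areas >= min_area) + 1
    return np.isin(labels, keep), len(keep)
```

On clean, well-separated nuclei such a pipeline is often adequate; it is precisely under strong intensity variation or crowding that it breaks down, motivating the seeded watershed and model-based approaches discussed next.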

When nuclei do not appear clearly in the images to be segmented (e.g., when nuclear borders are not sharp enough or when a significant amount of noise is present), active contour-based methods (9, 11, 12, 23–27), especially those implicitly represented by level sets (11, 12, 27), have been proposed to overcome some of these limitations successfully. As is well known, the level set framework is well suited for accurate delineation of complicated borders and can be easily extended to higher-dimensional datasets. Ortiz De Solorzano et al. (11), for example, proposed an edge-based deformable model that utilizes gradient information to capture nuclear surfaces. Considering that strong gradients at object boundaries may be blurred, and that noise and intracellular structures may also show strong gradients, Mukherjee et al. (28) proposed a level set model that also incorporates a region term using likelihood information for segmentation of leukocytes with homogeneous regions. In segmenting cells in culture or in tissue sections, Dufour et al. (12) proposed a multilevel deformable model incorporating both a gradient term and a region term, adopted from the Chan and Vese model (29), to segment cells with ill-defined edges. In Ref.30, Yan et al. also proposed a similar multilevel deformable model to segment RNAi fluorescence cellular images of Drosophila. In Ref.27, Cheng and Rajapakse utilized the Chan and Vese model (29) to obtain the outer contours of clustered nuclei, using a watershed-like algorithm to separate the clustered nuclei. Similarly, Nielsen et al. (9) have described a method for segmenting Feulgen-stained nuclei using a seeded watershed method combined with a gradient vector flow-based deformable model (31). Considering that some nuclei may appear to overlap in 2D images, Plissiti and Nikou (32) proposed a deformable model driven by physical principles, helping to delineate the borders of overlapping nuclei. In Ref.33, Dzyubachyk et al. proposed a modified region-based level set model, which addresses a number of shortcomings in Ref.12 as well as speeds up computation. To reduce the large computational costs of variational deformable models, Dufour et al. (34) proposed a novel implementation of the piecewise constant Mumford–Shah functional using 3D active meshes for 3D cell segmentation.

Besides the methods mentioned earlier, several other approaches for segmenting nuclei based on filter design (35, 36), multiscale analysis (37), dynamic programming (38), Markov random fields (39), graph-based methods (40–42), and learning-based strategies (43–47) have been described. As new imaging modalities, staining techniques, and so forth are developed, however, many existing methods specifically designed for current imaging modalities may not work well. Later, we show that the application of some such methods can fail to detect adequate borders, or to separate touching or overlapping nuclei, under several staining techniques. Therefore, considerable resources have to be spent to modify existing methods (or to develop entirely new segmentation methods) to better suit new applications.

Here, we describe a generic nuclear segmentation method based on the combination of template matching and supervised learning ideas. Our goal is to provide a method that can be used effectively for segmenting nuclei for many different types of cells imaged under a variety of staining or fluorescence techniques. We aim to guarantee robust performance by allowing the method to "calibrate" itself automatically using training data, so that it will adapt itself to segmenting nuclei with different appearances (due, for example, to the staining techniques) and shapes. The method is also "constrained" to produce smooth borders. Finally, given that the objective function used in the segmentation process is the normalized cross correlation (NCC), the method is also able to better handle variations in illumination within the same image, as well as across images. We note that template matching-based methods have long been used for segmenting biomedical images. One prominent example is the brain segmentation tool often used in the analysis of functional images (48). When segmenting nuclei from microscopy images, contour templates have also been used (43, 44). Here, we utilize similar ideas with some adaptations. Our approach is semiautomated in that it first seeks to learn a template and statistical model from images delineated by the user. The model is built by estimating a "mean" template, as well as the deformations from the template to all other nuclei provided in the training step. After this step, any image of the same modality can then be segmented via a template-based approach based on maximization of the NCC between the template estimated from the training images and the image to be segmented. We describe the method in detail in the next section and compare it to several other methods applied to different datasets in the "Results" section. Finally, we note that our method is implemented in the MATLAB programming language (49). The necessary files can be obtained through contact with the corresponding author (G.K.R.).

Materials and Methods

Given the large variation in appearance of nuclei in microscopy images, a completely automated (unsupervised) approach for segmenting nuclei from arbitrary images may be difficult to obtain. We, therefore, focus on a semiautomated approach, depicted in Figure 1, where the idea is to first construct, from hand-delineated images, a statistical model for the mean texture and most likely variations of shape to be found in the dataset to be segmented. Segmentation of any image of similar type is then achieved by maximizing the NCC between the model and local image regions. Part A outlines the training procedure, whereby the user utilizes a simple graphical user interface to isolate several nuclei samples, which are then used to build the statistical model. Part B outlines the actual segmentation procedure, which first finds an approximate segmentation (seed detection) of an input image by matching the statistical model with the given image, and then produces a final segmentation result via nonrigid registration.

Figure 1.

Overview of nuclear segmentation approach. Part A outlines the training procedure, which utilizes sample nuclei manually identified by the user to build a statistical model for the texture and shape variations that could be present in the set of nuclei to be segmented. The model is then sampled to form a detection filter-bank. Part B outlines the actual segmentation procedure which utilizes the detection filter-bank to produce a rough segmentation, and then refines it using nonrigid registration based on the NCC.


As outlined in part A of Figure 1, we utilize a simple graphical user interface to enable an operator to manually delineate rectangular subwindows, each containing one nucleus sample, from an image of the modality he or she wishes to segment. Our system requires that each subwindow contain only one nucleus, and we recommend that the set of subwindows contain a variety of shapes and textures (small, large, bent, irregularly shaped, hollow, etc.), because more variation present in the input samples translates into more variation being captured by the model. We note that it is not necessary for the user to provide a detailed outline of the nucleus present in each window; a rectangular bounding box suffices. In our implementation, given N such rectangular subwindows, which can be of different sizes, we first pad each subwindow image by replicating the border elements so as to render all subwindows the same size (in terms of number of pixels in each dimension). The amount of padding applied to each subwindow is the amount necessary for that subwindow to match the size of the largest rectangular subwindow in the set. The set of subwindows is then rigidly aligned to one subwindow image from the set (picked at random) via a procedure described in earlier work (50). As a result, the major axes of the nuclei samples are aligned to the same orientation. In this case, we choose the NCC as the optimization criterion for measuring how well two nuclei align, and include coordinate inversions (image flips) in the optimization procedure.
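The rigid alignment step can be illustrated with a brute-force search over rotations and flips that maximizes the NCC. This is a simplified stand-in for the procedure of Ref.50, not a reproduction of it; the function names and the discrete angle grid are our assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def ncc(a, b):
    """Normalized cross correlation between two same-size images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def rigid_align(ref, img, angle_step=10):
    """Search rotations (and horizontal flips) of `img` that best match `ref`,
    scoring each candidate pose by NCC."""
    best, best_score = img, -np.inf
    for flip in (False, True):
        cand0 = img[:, ::-1] if flip else img
        for angle in range(0, 360, angle_step):
            cand = rotate(cand0, angle, reshape=False, mode='nearest')
            s = ncc(ref, cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score
```

A continuous optimizer over angle (and translation) would be more accurate than this discrete grid, but the sketch shows how flips enter the search as additional candidate poses.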

The set of N rigidly aligned subwindows, denoted $\{I_1, \ldots, I_N\}$ from now on, is then used to estimate a template that will represent an "average" shape as well as texture for this set. Several procedures can be used for this purpose. In this work, we choose the procedure outlined in Heitz et al. (51), where the idea is to iteratively deform all nuclear images (subwindows) toward a template image that is closest (in the sense of least deformation) to all other images in the set. Figure 2 contains a diagram depicting the procedure we use. The procedure depends on the computation of a nonrigid map that aligns two images $I_i$ and $I_j$ via $I_i(f(x)) \approx I_j(x)$, with $x$ an input coordinate in the image grid $\Omega$ and $f: \Omega \to \Omega$ a nonrigid mapping function. In our approach, the nonrigid registration is computed via maximization of the NCC cost function, which is described in detail in the Appendix. Given the ability to nonrigidly align two nuclear images, the template estimation procedure consists of choosing a subwindow image from the set at random and denoting it $I_T^{(1)}$. Then, starting with the iteration k = 1:

  • 1. Nonrigidly register $I_T^{(k)}$ to each subwindow image $I_i$ such that $I_T^{(k)}(f_i(x)) \approx I_i(x)$, for $i = 1, \ldots, N$.
  • 2. Calculate a temporary average shape template $\hat{I}_T(x) = I_T^{(k)}(\bar{f}^{-1}(x))$, with $\bar{f}(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x)$, and $\bar{f}^{-1}$ the inverse of the transformation function $\bar{f}$ (which we compute with MATLAB's "griddata" function).
  • 3. Compute the average texture on the same average shape template above by first registering each subwindow image in the set to $\hat{I}_T$ (i.e., $I_i(g_i(x)) \approx \hat{I}_T(x)$) and updating the template via $I_T^{(k+1)}(x) = \frac{1}{N}\sum_{i=1}^{N} I_i(g_i(x))$.
  • 4. Compute the sum of squared errors $E_k = \sum_{x \in \Omega} \big(I_T^{(k+1)}(x) - I_T^{(k)}(x)\big)^2$. If $E_k < \epsilon$, stop; otherwise set $k \leftarrow k + 1$ and go to step 1.
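The structure of this iteration (register, average the deformations, back-warp, re-average, test convergence) can be sketched in simplified form. For brevity, the sketch below replaces the nonrigid NCC registration with a toy translation-only registration, so it illustrates only the shape of the loop, not the actual deformable machinery; all names are ours:

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def register_translation(a, b, search=5):
    """Toy stand-in for nonrigid registration: find the integer shift of `a`
    that best matches `b` by brute-force correlation search."""
    best, best_s = (0, 0), -np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            s = (nd_shift(a, (dy, dx), order=1) * b).sum()
            if s > best_s:
                best, best_s = (dy, dx), s
    return np.array(best, dtype=float)

def estimate_template(images, tol=1e-4, max_iter=20):
    """Iterate: register the template to every image, recenter by the mean
    deformation, rebuild the template as the average of back-warped images."""
    template = images[0].astype(float).copy()
    for _ in range(max_iter):
        shifts = [register_translation(template, img) for img in images]
        mean_shift = np.mean(shifts, axis=0)   # "least deformation" center
        # map every image back onto the recentered template frame
        warped = [nd_shift(img, -(s - mean_shift), order=1)
                  for img, s in zip(images, shifts)]
        new_template = np.mean(warped, axis=0)
        sse = float(((new_template - template) ** 2).sum())
        template = new_template
        if sse < tol:                          # convergence test (step 4)
            break
    return template
```

In the translation-only case the "inverse of the mean deformation" is simply the negated mean shift; in the full method it must be computed numerically, as noted in step 2.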
Figure 2.

Diagram depicting training procedure.

The end result is an image $I_T$ that represents an average template (in the sense of both shape and texture), as well as a set of spatial transformations $f_1, \ldots, f_N$ that map each subwindow image to the final template via $I_i(f_i(x)) \approx I_T(x)$. We next apply the principal component analysis (PCA) technique (52) to derive a statistical model for the possible variations in the shape of the sample nuclei. We encode each spatial transformation $f_i$ as a vector of displacements via $d_i = [f_i(x_1) - x_1, \ldots, f_i(x_L) - x_L]^T$, with $L$ the number of pixels in each image. Thus, the mean and the covariance of the set of spatial displacements $\{d_1, \ldots, d_N\}$ are:

$$\bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i \qquad (1)$$
$$\Sigma = \frac{1}{N-1} \sum_{i=1}^{N} (d_i - \bar{d})(d_i - \bar{d})^T \qquad (2)$$

Using the PCA method, the principal deformation modes are given by the eigenvectors $v_p$ of the covariance matrix $\Sigma$ satisfying $\Sigma v_p = \lambda_p v_p$. A statistical model for the variations in shape is obtained by retaining the top eigenvalues and eigenvectors corresponding to 95% (this percentage chosen arbitrarily) of the variance in the dataset. This means that the number of eigenvectors used in each segmentation task (imaging modality) will depend on how much variability is present in the (training) dataset: where variability is large, more eigenvectors will be necessary; where it is small, fewer will be used. In all cases, the accuracy of the PCA reconstruction procedure is set to 95% (of the variance). The model can be evaluated by choosing an eigenvector $v_p$ and calculating $d = \bar{d} + b_p v_p$, where $b_p$ is a mode coefficient. The corresponding template is obtained by reassembling $d$ into a corresponding spatial transformation and warping the mean template $I_T$ with it.
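A minimal sketch of this shape-model construction, assuming the displacement vectors $d_i$ have already been computed and stacked as rows, might look as follows (illustrative only; names are ours):

```python
import numpy as np

def pca_modes(displacements, var_keep=0.95):
    """PCA of displacement vectors d_i (rows). Returns the mean, the top
    eigenvectors covering `var_keep` of the variance, and their eigenvalues."""
    D = np.asarray(displacements, dtype=float)
    mean = D.mean(axis=0)
    X = D - mean
    cov = X.T @ X / (len(D) - 1)              # covariance, Eq. (2)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]    # reorder to descending
    frac = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(frac, var_keep) + 1)
    return mean, vecs[:, :k], vals[:k]

def synthesize(mean, vec, b):
    """Evaluate the shape model for one mode: d = mean + b * v_p."""
    return mean + b * vec
```

For realistic image sizes the covariance of full displacement fields is large, and an SVD of the centered data matrix is the usual memory-friendly route; the eigendecomposition above keeps the sketch closest to Eqs. (1) and (2).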

In our approach, the statistical model is evaluated over a regularly sampled range of mode coefficients $b_p$. The result of this operation is a set of images obtained by deforming the mean template, representing nuclear configurations likely to be encountered in the data to be segmented. In addition, this set of images is augmented by including rotations (rotated every 30°, totaling seven orientations in our implementation) as well as variations in size (two in our implementation). Finally, we discard the top 1% and bottom 1% (in the sense of area) of the templates to avoid potentially segmenting structures that would be too small or too large to be considered as nuclei: templates that are too small may cause oversegmentation, whereas templates that are too large may merge nuclei that are close to each other. A data-dependent way of choosing this threshold is described in the "Discussion" section. Figure 1 (top right) contains a few examples of template images generated in this way for a sample dataset. We denote the set of template images generated in this way as the "detection filterbank," to be used as a starting point for the segmentation method described in the next subsection.
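The augmentation-and-pruning step can be sketched as follows. The rotation grid, scale factors, and the foreground-area proxy are chosen for illustration rather than taken from the paper:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def build_filterbank(templates, angles=range(0, 180, 30), scales=(0.9, 1.1),
                     area_trim=0.01):
    """Augment model templates with rotations and scale changes, then drop
    the smallest/largest `area_trim` fraction by foreground area."""
    bank = []
    for t in templates:
        for a in angles:
            r = rotate(t, a, reshape=False, mode='nearest')
            for s in scales:
                bank.append(zoom(r, s, order=1))
    # rough foreground-area proxy: pixels above the filter's own mean
    areas = np.array([(f > f.mean()).sum() for f in bank])
    lo, hi = np.quantile(areas, [area_trim, 1 - area_trim])
    return [f for f, a in zip(bank, areas) if lo <= a <= hi]
```

With a handful of model templates, a few rotations, and two scales, the filterbank stays small enough that exhaustive NCC matching (next subsection) remains practical.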

Segmenting the mean template

Our procedure requires that the mean template, estimated as described earlier, be segmented, in the sense that its foreground and background pixels are known. Although many automated methods could be considered for this step, we choose to utilize a rough contour manually provided by a user. The contour is then refined utilizing a level set approach (53). The advantage is that this process can be repeated until the user is satisfied with the segmentation result. Figure 3 shows the outline of the procedure.

Figure 3.

The mean template image must be segmented before the segmentation algorithm based on NCC maximization can be utilized. We utilize a semiautomated approach wherein a user draws an initial contour, and a level-set-based algorithm refines it to accurately match the template's borders.


Our segmentation algorithm is based on the idea of maximizing the NCC between the statistical model for a given dataset (whose construction is described in the previous subsection) and local regions in an input image to be segmented. The first step in such a procedure is to obtain an approximate segmentation of an input image, here denoted as $I$, by computing the NCC of the input image against each filter (template image) in the detection filterbank. To that end, we compute the NCC between each filter $W_p$, $p = 1, \ldots, P$, and the image to be segmented via:

$$\gamma_p(x) = \frac{\sum_{c=1}^{N_c} \sum_{y \in N(x)} \big( I_c(y) - \bar{I}_c(x) \big) \big( W_{p,c}(y - x) - \bar{W}_{p,c} \big)}{\sqrt{\sum_{c=1}^{N_c} \sum_{y \in N(x)} \big( I_c(y) - \bar{I}_c(x) \big)^2} \, \sqrt{\sum_{c=1}^{N_c} \sum_{y \in N(x)} \big( W_{p,c}(y - x) - \bar{W}_{p,c} \big)^2}} \qquad (3)$$

where $\bar{I}_c(x)$ is the mean of image channel $I_c$ over $N(x)$ and $\bar{W}_{p,c}$ is the mean of channel $c$ of filter $W_p$, with $N(x)$ denoting the neighborhood around $x$ of the same size as filter $W_p$. We note that $N_c$ is the number of channels in each image (e.g., one for scalar images and three for color images). A detection map denoted $M$ is computed as $M(x) = \max_p \gamma_p(x)$. We note that the value of the cross correlation function γ above is bound to be in the range [−1, 1]. We also note that the index $p$ that maximizes this equation also specifies the template $W_p$ that best matches the neighborhood $N(x)$, and is used later as a starting point for the deformable model-based optimization.

The detection map $M$ is mined for potential locations of nuclei using the following two principles: (1) only pixels whose intensities in $M$ are greater than a threshold μ are of interest; (2) the centers of detected nuclei must be at least a certain distance away from each other. This helps to prevent, for example, two potential locations from being detected within one nucleus, causing oversegmentation. These two principles can be implemented by first searching for the highest response in $M$. Subsequent detections must be at least a certain distance from the first. This is done by dilating the already detected nuclei (recall that the filtering step above defines not only regions where nuclei might be located but also the rough shape of each). This process is able to detect nuclei of different shapes, owing to the simulated templates of various shapes and orientations generated in the previous step, and it is repeated until all pixels in the thresholded detection map $M$ have been investigated. We note again that each detected pixel in $M$ has an associated best matching template from the detection filterbank. Therefore, this part of the algorithm provides not only the location of a nucleus but also a rough guess for its shape (see bottom middle of Figure 1) and texture.
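The detection step (NCC map plus greedy, distance-constrained peak picking) can be sketched as follows. The sketch uses a fixed minimum distance in place of the paper's template-dilation scheme, a single filter rather than the full filterbank, and a naive double loop rather than an FFT-based correlation; all names are ours:

```python
import numpy as np

def ncc_map(image, filt):
    """Slide `filt` over `image`, computing the NCC at every valid position."""
    fh, fw = filt.shape
    f = filt - filt.mean()
    fnorm = np.linalg.norm(f)
    H, W = image.shape
    out = np.full((H - fh + 1, W - fw + 1), -1.0)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + fh, x:x + fw]
            p = patch - patch.mean()
            d = np.linalg.norm(p) * fnorm
            if d > 0:
                out[y, x] = (p * f).sum() / d
    return out

def greedy_detect(M, mu=0.5, min_dist=8):
    """Accept detection peaks above `mu`, strongest first, suppressing any
    later peak that falls within `min_dist` of an accepted one."""
    peaks = []
    order = np.argsort(M.ravel())[::-1]        # strongest response first
    for idx in order:
        y, x = np.unravel_index(idx, M.shape)
        if M[y, x] < mu:
            break                              # everything below threshold
        if all((y - py) ** 2 + (x - px) ** 2 >= min_dist ** 2
               for py, px in peaks):
            peaks.append((y, x))
    return peaks
```

Because the NCC is computed against a zero-mean, unit-norm version of both patch and filter, the map is invariant to local affine changes in illumination, which is the property the text appeals to.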

Once an initial estimate for each nucleus in an input image is found via the procedure described earlier, the algorithm produces a spatially accurate segmentation by nonrigidly registering each approximate guess to the input image. The nonrigid registration nonlinearly adapts the borders of the detected template so as to accurately segment the borders of each nucleus in the input image. In addition, the nonrigid registration approach we use is also constrained to produce smooth borders. Details related to the nonrigid registration are provided in the Appendix. Rather than optimizing all guesses at once, which could lead to difficulties such as a large number of iterations in our gradient ascent-type strategy, each nucleus is segmented separately.

Segmenting touching nuclei

An important feature of our template matching approach is that it is capable of segmenting touching nuclei with only a small modification of the procedure described earlier. In our method, if two (or more) nuclei are detected to be close to each other (e.g., the closest distance between their best matching templates' borders is smaller than 10 pixels), these nuclei are regarded as being potentially in close proximity to each other. If so, their best matching templates obtained from the filterbank procedure above are grouped into one subwindow and then nonrigidly registered together to the corresponding subwindow in the real image, using the same optimization algorithm described in the Appendix. An example showing the segmentation of two nuclei in close proximity to each other is shown in Figure 1 (bottom row). The left part of this portion of the figure shows the initial estimates from the filterbank-based estimation of candidate locations. The result of the nonrigid registration-based estimation of the contours for each nucleus is shown at the bottom right corner of the same figure. The black contours indicate the borders of the best matching templates (the initial guesses), and the white lines delineate the final segmentation result after nonrigid registration.

Experiments Overview

Data acquisition

We demonstrate our system applied to several different cell nuclei datasets: (1) a synthetic dataset, BBBC004v1, generated with the SIMCEP simulation platform for fluorescent cell population images (54, 55); (2) two real cell datasets (U2OS cells and NIH3T3 cells) acquired with fluorescence imaging (56); and (3) a histopathology dataset obtained using thyroid tissue specimens with several different staining techniques. The primary goal for the simulated dataset is to obtain an accurate count of the number of nuclei in each field of view. Each simulated image contains 300 objects with different degrees of overlap probability (ranging from 0.00 to 0.60). The U2OS (48 images, each containing multiple nuclei) and NIH3T3 (49 images) cells were imaged with the Hoechst 33342 fluorescence signal, and the ground truth (including accurately delineated borders) is provided by experts (56). Of these, the U2OS dataset is more challenging, with nuclei tending to be more varied in shape and more clustered together. The intensity of the NIH3T3 images, however, is more nonuniform than that of the U2OS dataset. In addition, we apply our method to segmenting nuclei from histopathology images taken from tissue sections of thyroid specimens. Tissue blocks were obtained from the archives of the University of Pittsburgh Medical Center (Institutional Review Board approval #PRO09020278). Briefly, tissue sections were cut at 5-μm thickness from the paraffin-embedded blocks and stained using three techniques. The first is the Feulgen stain, which stains deoxyribonucleic acids only; if no counterstaining is performed, only nuclei are visible, demonstrating chromatin patterns as deep magenta hues, as shown in Figure 6a. The second is a silver-based technique that stains the intranuclear nucleolar organizing regions (NORs) (black intranuclear dots), counterstained with nuclear fast red, which uses kernechtrot to dye nuclear chromatin red (Figure 6b). The third is the same silver-based staining for NORs without counterstaining (Figure 6c). All images used for analysis in this study were acquired using an Olympus BX51 microscope equipped with a 100X UIS2 objective (Olympus America, Central Valley, PA) and a 2-megapixel SPOT Insight camera (Diagnostic Instruments, Sterling Heights, MI). Image specifications were 24-bit RGB and 0.074 μm/pixel resolution. More details pertaining to the image acquisition process for this dataset are available in Ref.57.

Experimental setup

We note that our system is able to work with grayscale (single channel) images as well as with color images. Equation (A1), in the Appendix, allows color images to be used, while the method can also be used to segment 3D data by defining the inner products and convolutions utilized in Eqs. (A1) and (A2) in three dimensions. In addition, we mention that for color images, each color channel (R, G, and B) is equally weighted in the approach we described earlier. This allows for segmentation even when the optimal color transformation for detecting nuclei is not known precisely (as is the case in many of the images shown). In cases where this information is known precisely, the approach we proposed can be used with only the color channel that targets nuclei, or with the image after the optimal color transformation. In each experiment, k sample nuclei (k is arbitrarily chosen as 20 in our experiments) were chosen by the authors for the training process. All but one of the parameters remained constant for all experiments. The percent of variance retained in the PCA analysis was set to 95%, the stopping tolerance for the calculation of the average template was held fixed, the step size κ in the gradient ascent procedure was set to 5×10⁴, and the scale number s in the multiscale strategy was set to 2. As for the smoothing parameter σ in the gradient ascent procedure, a higher σ value helps to smooth the contour, while a lower σ value helps to better capture the real border of nuclei; in this paper, σ was experimentally set to 1.5 (pixels). The only parameter that varied from dataset to dataset was the detection threshold μ: whereas a higher value of μ may miss some nuclei (e.g., out of focus), a lower value of μ may confuse noise and clutter for actual nuclear candidates. There are two ways to determine an appropriate value for the detection threshold μ. When the ground truth (e.g., manual delineation of nuclei) for the training images is provided, the μ value can be selected automatically by maximizing the Dice metric $D(\mu) = 2|GT \cap R_\mu| / (|GT| + |R_\mu|)$ (58) between the detections and the provided ground truth. Here, $|\cdot|$ counts the number of nuclei in the corresponding result, $GT$ corresponds to the ground truth, and $R_\mu$ corresponds to the nuclei detected with threshold μ. When ground truth is not available, an appropriate μ value has to be selected empirically by the user so as to detect most nuclei in the training images for each application or dataset. In the experiments shown later, ground truth was not used for selecting μ. Rather, μ was determined for each dataset based on empirical experimentation with a given field of view (containing multiple nuclei) from the corresponding dataset.
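The ground-truth-driven selection of μ can be sketched as follows, with a simple center-matching rule (a detection matches a ground-truth nucleus if it lies within a hypothetical `match_dist` radius) standing in for whatever matching criterion the authors used:

```python
import numpy as np

def dice(n_matched, n_gt, n_det):
    """Dice overlap on nucleus counts: 2|GT ∩ R| / (|GT| + |R|)."""
    return 2.0 * n_matched / (n_gt + n_det) if (n_gt + n_det) else 0.0

def select_mu(detect_fn, gt_centers, candidates, match_dist=10):
    """Pick the detection threshold maximizing Dice against ground-truth
    nucleus centers. `detect_fn(mu)` returns a list of detected centers."""
    def n_matched(dets):
        used, m = set(), 0
        for gy, gx in gt_centers:
            for j, (dy, dx) in enumerate(dets):
                if j not in used and \
                        (gy - dy) ** 2 + (gx - dx) ** 2 <= match_dist ** 2:
                    used.add(j)
                    m += 1
                    break
        return m
    best_mu, best_d = None, -1.0
    for mu in candidates:
        dets = detect_fn(mu)
        d = dice(n_matched(dets), len(gt_centers), len(dets))
        if d > best_d:
            best_mu, best_d = mu, d
    return best_mu
```

Raising μ prunes spurious detections (shrinking $|R_\mu|$) at the cost of missed nuclei (shrinking the intersection), so the Dice curve over μ typically has an interior maximum.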

For comparison, we choose several different types of algorithms commonly used for cell nuclei segmentation. These include the level set method [the Chan and Vese model (29)], an unsupervised learning method [color K-means (59)], and the direct seeded watershed method, which uses a shape-based method to separate clumped nuclei [implemented in CellProfiler (60)]. As the CellProfiler software (60) is only able to process 2D grayscale images, a typical choice is to convert the color histopathology image to grayscale by forming a weighted sum of the R, G, and B channels that keeps the luminance channel (0.2989R + 0.5870G + 0.1140B) (61). In addition, we take the general approach of normalizing all image data to fit the intensity range [0, 1] by scaling the minimum and maximum of each image (discounting outliers, set at 1% in our implementation). As the level set method and the K-means method may not be able to separate clumped nuclei very well, a common solution is to apply the seeded watershed algorithm to the binary masks produced by the level set and K-means methods, in which seeds are defined as the local maxima of the distance transforms of the binary masks (62). Note that H-dome maxima (62) are calculated on the distance-transformed images to prevent oversegmentation, and for different datasets, the H value is selected to give the best performance. These techniques were chosen as they are similar to several of the methods described in the literature for segmenting nuclei from microscopy images (12, 33). In the following sections, we show both qualitative and quantitative comparisons of these methods.
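The grayscale conversion and robust intensity normalization described above can be sketched directly; the weights below are the standard luminance coefficients (as used, e.g., by MATLAB's rgb2gray), which we assume match the conversion in (61):

```python
import numpy as np

def to_luminance(rgb):
    """Weighted sum of R, G, B approximating the luminance channel."""
    return rgb @ np.array([0.2989, 0.5870, 0.1140])

def normalize01(img, outlier_pct=1.0):
    """Rescale to [0, 1] using robust min/max that discount `outlier_pct`%
    of extreme pixels on each tail, then clip the outliers."""
    lo, hi = np.percentile(img, [outlier_pct, 100.0 - outlier_pct])
    if hi <= lo:
        return np.zeros_like(img, dtype=float)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

Clipping at robust percentiles rather than the raw min/max keeps a few hot or dead pixels from compressing the usable dynamic range of the normalized image.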


Qualitative Evaluation

In Figures 4–6, we compare the results for different types of datasets using different methods. In Figure 4, the results are obtained by different methods applied to the segmentation of synthetic nuclei with clustering probability set to 0.3. Note that we use green (in color), red, and yellow square dots to represent correct detections, missed detections, and spurious detections, respectively. In Figure 5, the first column shows the sample segmentations of the U2OS data (under uniform illumination), and the second column shows the sample segmentations of the NIH3T3 data (under heterogeneous illumination), in which the white contours delineate the borders of segmented nuclei. The first row of Figure 5 corresponds to results computed using the approach we described in this article, the second row corresponds to level-set-based method, the third row corresponds to color K-means-based method, and the fourth row corresponds to direct seeded watershed method. In addition, we show the hand-labeled images of U2OS data and NIH3T3 data as the ground truth separately in Figures 5i and 5j in the final row. In Figure 6, we show the segmentation results on sample histology images with different staining techniques, in which each column corresponds to a distinct staining technique (details have been described in the previous section), whereas each row corresponds to a distinct segmentation method (the row order is the same as Figure 5).

Figure 4.

Nuclei counting in synthetic images. Upper left: results of our template matching approach. Upper right: result obtained with level set method. Bottom left: results obtained with color K-means-based method. Bottom right: results obtained with seeded watershed method. Note that green square dots represent correct detections, red square dots represent missed detections, and yellow square dots represent spurious detections. [Color figure can be viewed in the online issue which is available at wileyonlinelibrary.com.]

Figure 5.

Nuclei detection and segmentation from different fluorescence images. Note that the improvements are pointed out by white arrows. First row: results obtained with our template matching approach. Second row: results obtained with level-set-based method. Third row: results of color K-means-based method. Fourth row: results of seeded watershed method. Last row: hand-labeled results as the ground truth. First column: results of U2OS fluorescence image under uniform illumination. Second column: results of NIH3T3 fluorescence image under heterogeneous illumination.

Figure 6.

Nuclei segmentation from histopathology images with different stainings. Note that the improvements are pointed out by black arrows. First row: results of our template matching approach. Second row: results of level-set-based method. Third row: results of color K-means-based method. Fourth row: results of seeded watershed method. [Color figure can be viewed in the online issue which is available at wileyonlinelibrary.com.]

From the comparison, we can see that when the nuclei are imaged clearly with sufficiently distinct borders (e.g., parts of the images in Figs. 4a and 5a), all of the methods tested achieve reasonable results. However, when noise or clutter is present, or when images are acquired under uneven illumination (intensity inhomogeneities can be seen in Figs. 5a and 5b), most methods fail to segment the nuclei well (Figs. 5d, 5f, and 5h). In comparison, our template matching approach still performs well on these images; the improvements are pointed out by white arrows in Figure 5. In addition, our template matching approach extends naturally to higher-dimensional data, such as three-channel "RGB" images (Figs. 6a–6c), as do the other algorithms, and achieves what can be visually confirmed as better segmentation results than the existing methods tested; the improvements are pointed out by black arrows in Figure 6. We note that in several locations (pointed out by arrows) our method performs better at segmenting cluttered nuclei, whereas the other methods often detect spurious locations as nuclei. Finally, our template matching approach is much more likely (because it is constrained to do so) to produce contours that are smoother and more realistic than those of the several other methods used for comparison.

Quantitative Evaluation

We used the synthetic dataset described earlier to calculate the average count produced by each method, and we also studied performance as a function of the clustering probability for this simulated dataset. The results are shown in Table 1, where C.A. refers to "count accuracy" and O.P. refers to the "overlap probability" of the data in each column. For the fluorescence microscopy data (U2OS and NIH3T3), we follow the same evaluation procedure documented in Ref. 56, including: 1) the Rand and Jaccard indices (RI and JI), which measure the agreement between two clusterings over all pairs (higher is better); 2) two spatially aware evaluation metrics, the Hausdorff metric and the normalized sum of distances (NSD) (smaller is better); and 3) counting errors: split, merged, added, and missing (smaller is better).
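As a reference for the first set of metrics, the pixelwise Rand and Jaccard indices can be computed as in the sketch below (a minimal illustration of the definitions only; the evaluation of Ref. 56 is performed on matched nuclei rather than raw binary masks, and the toy masks here are ours):

```python
import numpy as np

def jaccard_index(a, b):
    """|A intersect B| / |A union B| for two binary masks (higher is better)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def rand_index(a, b):
    """Fraction of pixel pairs on which two binary labelings agree,
    computed in closed form from the 2x2 contingency table."""
    a, b = a.ravel().astype(bool), b.ravel().astype(bool)
    comb2 = lambda m: m * (m - 1) // 2          # number of unordered pairs
    n11 = int(np.sum(a & b)); n10 = int(np.sum(a & ~b))
    n01 = int(np.sum(~a & b)); n00 = int(np.sum(~a & ~b))
    n = a.size
    together_both = comb2(n11) + comb2(n10) + comb2(n01) + comb2(n00)
    together_a = comb2(n11 + n10) + comb2(n01 + n00)   # pairs co-clustered in a
    together_b = comb2(n11 + n01) + comb2(n10 + n00)   # pairs co-clustered in b
    return (comb2(n) - together_a - together_b + 2 * together_both) / comb2(n)

seg = np.array([[1, 1], [0, 0]])   # toy segmentation mask
gt  = np.array([[1, 0], [0, 0]])   # toy ground-truth mask
```

For the toy masks above, both indices evaluate to 0.5, and both equal 1.0 for identical masks.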

Table 1. Nuclei counting accuracy

Algorithm                 C.A. (O.P. = 0)   C.A. (O.P. = 0.15)   C.A. (O.P. = 0.30)   C.A. (O.P. = 0.45)   C.A. (O.P. = 0.60)
Template Matching               99.8%              86.5%                84.7%                80.6%                76.2%
Level Set (29)                  99.4%              86.4%                83.1%                78.6%                71.2%
K-means (59)                   100.0%*             87.7%                84.3%                80.2%                72.5%
Seeded Watershed (19)           99.9%              91.0%*               88.1%*               84.7%*               78.4%*

Note: C.A., count accuracy; O.P., overlap probability. An asterisk marks the best performance in each column.

We also compare the results of the methods discussed earlier with those of two other methods, active masks (63) and a merging-based algorithm (3), as well as with a manual delineation result. In Table 2, for both the U2OS and NIH3T3 data, we can see that although the Hausdorff metric values are quite high for our template matching approach, most segmentation metrics are comparable to or better than those of many of the existing algorithms. Our segmentation result also performs better than the manual delineation result explored in Ref. 56 (JI and Split for the U2OS data; Split and Missing for the NIH3T3 data). More details pertaining to each method used in this comparison are available in Ref. 56. The high Hausdorff metric can be explained by two reasons: 1) some bright noise regions are detected (especially in the NIH3T3 dataset) and no morphological postprocessing is used in our template matching approach, and 2) we choose a relatively high threshold μ that discards some incomplete (small-area) nuclei attached to the image border in some images; these incomplete nuclei are, however, included in the manually delineated ground truth. The first reason may explain why the "Added" error for the NIH3T3 dataset is much higher than for the U2OS dataset. The second reason may also explain why, excluding the active masks method (63), our algorithm misses more of the U2OS cells (last column of the table). On the other hand, for the NIH3T3 image data, which contain intensity heterogeneities, the method we propose misses the fewest nuclei. We also note that, for the Rand and Jaccard indices, the NSD metric, and splitting errors, for both the U2OS and NIH3T3 datasets, our results are similar to or better than the best results produced by the other methods (excluding the manual delineation result).

Table 2. Quantitative comparison of nuclei segmentation (values reported as U2OS/NIH3T3)

Algorithm                  RI        JI         Hausdorff     NSD (×10)    Split       Merged     Added      Missing
Watershed (direct) (19)    91%/78%   1.9/1.6    34.9/19.3     3.6/3.7      13.8/2.9    1.2/2.4    2.0/11.6   3.0/5.5
Active Masks (63)          87%/72%   2.1/2.0    148.3/98.0    5.5/5.0      10.5/1.9    2.1/1.5    0.4/3.9*   10.8/31.1
Merging Algorithm (3)      96%/83%   2.2/1.9    12.9/15.9*    0.7/2.5      1.8/1.6     2.1/3.0    1.0/6.8    3.3/5.9
Level Set (29)             91%/81%   2.39/2.30  96.6/122.8    0.85/5.0     1.1/1.4     0.35/1.4   2.75/4.2   0.85/8.2*
K-means (59)               90%/78%   2.36/2.35  94.6/100.6    1.05/6.15    1.56/0.45   0.3/0.9*   2.6/2.75   1.6/17.4
Template Matching          95%/91%*  2.50/2.72  77.8/131.2    0.64/2.65*   0.58/0.51*  1.45/2.49  0.9/3.7    3.48/2.8

Note: RI, Rand index; JI, Jaccard index; NSD, normalized sum of distances. An asterisk marks the best performance for each metric (each column).

Finally, we also studied how the number of nuclei used for training affects the performance of the proposed method. This was done for both the U2OS and NIH3T3 datasets by randomly selecting nuclei (of different sample sizes) and then applying the method as described earlier. We found that several quantitative metrics, such as the Rand index and NSD, do not vary significantly when different amounts or types of nuclei samples are used (data omitted for brevity).

Summary and Discussion

We described a method for segmenting cell nuclei from several different modalities of images based on supervised learning and template matching. The method is suitable for a variety of imaging experiments given that it contains a training step that adapts the statistical model to the given type of data. In its simplest form, the method consists of building a statistical model for the texture and shape variations of the nuclei from the input of a user, and then segmenting arbitrary images by finding the instance in the model that best matches, in the sense of the NCC, local regions in the input images. We note that, for a given experimental setup, once the training operation is completed, the method is able to segment automatically any number of images from the same modality. We have demonstrated the application of the method to several types of images, and the results showed that the method can achieve comparable, and oftentimes better, performance compared with existing, specifically designed algorithms. Our main motivation was to design a method for segmenting nuclei from microscopy images of arbitrary types (scalar, color, fluorescence, different stainings, etc.). To our knowledge, ours is the first method to apply a template matching approach that includes texture and shape variations to accurately delineate nuclei from microscopy images. In addition, to our knowledge, ours is the first method to utilize a supervised learning strategy to build such a statistical model, including texture and shape variations in multiple channels, for detecting nuclei from microscopy images.

In a practical sense, our method provides three main contributions. First, its overall performance is robust across different types of data with little tuning of parameters. We have demonstrated this here by applying the exact same software (with the only difference in each test being the value of μ) to a total of six different imaging modalities and showing that the method performs as well as or better than all other methods we were able to compare against. The performance was compared quantitatively and qualitatively, using both real and simulated data, against a total of six alternative segmentation methods. Other, simpler, segmentation methods were also used for comparison, including several thresholding schemes followed by morphological operations; their results were not comparable to many of the methods shown here, so we have omitted them for brevity. Second, among the methods tested in this manuscript, ours is the only one (besides manual segmentation) capable of handling significant intensity inhomogeneities. This is because we utilize the NCC metric in the registration-based segmentation process, and the NCC metric is independent of the overall intensity of the local region of the image being segmented. Finally, among all methods tried, the template matching method we described produced noticeably smoother and more accurate borders with fewer spurious contours. This can be seen, for example, by close observation of Figure 6. The smoothness of the contours obtained by our method is primarily due to the fact that the statistical model we use includes only the main modes of variation in nuclear shape: typically size, elongation, and bending (in addition to rotation). Higher-order fluctuations in contours do occur in nuclei at times, but not as often as the modes already mentioned.
We note that the method is still flexible enough to accurately segment nuclei that do not conform to these main modes of variation, given the elastic matching procedure applied in the last step.
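The illumination invariance mentioned above is easy to verify numerically (a minimal sketch using the zero-mean form of the NCC; the gain and offset values simulating an illumination change are arbitrary):

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
template = rng.random((32, 32))
patch = template + 0.05 * rng.random((32, 32))  # noisy observation of the template
dimmed = 0.4 * patch + 0.2                      # local gain/offset, as under uneven illumination

# The match score is unchanged by the simulated illumination change
assert abs(ncc(template, patch) - ncc(template, dimmed)) < 1e-9
```

Because subtracting the mean removes the offset and the normalization cancels the gain, the score is identical for the original and the "dimmed" patch.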

We also note that our algorithm has several parameters, including the percentage of variance retained in the PCA analysis, ε in the calculation of the "average" template, the regularization parameter in the nonrigid registration procedure, and μ in the approximate segmentation procedure. The algorithm is not unduly sensitive to these, as the same fixed parameters were utilized in all six experiments (datasets) used in this article. The only parameter that was selected differently for each dataset was the detection threshold μ; when ground truth is available, we described a method to automatically choose the optimal μ for the given dataset. In addition, in our current implementation, we discard the top and bottom 1% (in size) of the generated templates, in an effort to reduce outlier detections. This percentage, too, could be made dataset dependent through a cross-validation procedure when precise ground truth is available.
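The size-based trimming step can be sketched as follows (a minimal illustration; the function name and toy areas are ours, with the 1%/99% cutoffs taken from the text):

```python
import numpy as np

def trim_templates_by_size(areas, low=1.0, high=99.0):
    """Discard generated templates whose area falls in the bottom or top 1%
    of the size distribution, to reduce outlier detections."""
    lo, hi = np.percentile(areas, [low, high])
    return [a for a in areas if lo <= a <= hi]

areas = list(range(1, 202))          # toy template areas 1..201
kept = trim_templates_by_size(areas) # extremes of the size distribution removed
```

With a cross-validation procedure, `low` and `high` could be tuned per dataset, as suggested above.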

Finally, it is important to describe the computational cost of our template matching approach. Our approach consists of a training stage and a testing stage; it is implemented in 64-bit MATLAB and was tested on a PC laptop (CPU: Intel Core i5, 2.30 GHz; memory: 8 GB). The computational time for training a statistical model (560 simulated templates) from 20 nuclei samples, for example, is about 1.6 h. Detecting and segmenting all cell nuclei (36 nuclei) from a fluorescence image takes about 20 min (about half a minute per nucleus). We note, however, that the computational time can often be significantly reduced by implementing the algorithm in a compiled language such as C. In addition, the computational time should be considered in the context of alternative segmentation methods capable of producing results (albeit not as accurate) on similar datasets; the level set algorithm by Chan and Vese, which is used in a variety of other nuclear segmentation methods, takes even longer (23 min) on the same image in our implementation (also in MATLAB). Finally, we note that the computational time of our algorithm can be decreased by utilizing a multiscale framework. That is, instead of performing the filtering-based detection in the original image space, we have also experimented with first reducing the size of the image (and templates) by two for the initial detection only (the remaining parts of the method utilized the full-resolution image). In this way, we were able to reduce the total computation time for the same field of view to roughly 10.6 min, and the accuracy of the final segmentation was not severely affected (data not shown for brevity).
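The coarse-to-fine detection idea can be sketched as follows (a minimal numpy illustration of two-scale NCC search, not the authors' MATLAB implementation; the image, template, and refinement-window sizes are arbitrary):

```python
import numpy as np

def downsample2(img):
    """Block-mean downsampling by a factor of two."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ncc(a, b):
    a = a - a.mean(); b = b - b.mean()
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / d) if d else 0.0

def best_match(img, tpl, rows, cols):
    """Exhaustive NCC search over the given candidate top-left positions."""
    th, tw = tpl.shape
    scores = {(r, c): ncc(img[r:r + th, c:c + tw], tpl) for r in rows for c in cols}
    return max(scores, key=scores.get)

rng = np.random.default_rng(1)
image = rng.random((64, 64))
template = rng.random((16, 16))
image[24:40, 20:36] = template          # plant the template at row 24, col 20

# Coarse pass at half resolution, then refinement in a small full-resolution window
img2, tpl2 = downsample2(image), downsample2(template)
r2, c2 = best_match(img2, tpl2, range(img2.shape[0] - 7), range(img2.shape[1] - 7))
rows = range(max(0, 2 * r2 - 2), min(image.shape[0] - 15, 2 * r2 + 3))
cols = range(max(0, 2 * c2 - 2), min(image.shape[1] - 15, 2 * c2 + 3))
location = best_match(image, template, rows, cols)
```

The coarse pass scans the half-resolution image, and only a small neighborhood of candidate positions is then evaluated at full resolution, which is where the speedup comes from.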
Future work will include improving the computational efficiency of this method by further investigating multiscale approaches, as well as faster optimization methods (e.g., conjugate gradient). Finally, we note again that the approach described above utilizes all of the color information contained in the training and test images. In cases where the nuclear stain color is known precisely, the approach can easily be modified to utilize only that color. In addition, many existing techniques for optimal color transformation (64) can be combined with our proposed approach in the future for better performance.


Appendix: Nonrigid Registration

Here, we describe the nonrigid registration algorithm used in both the training and the segmentation steps outlined earlier. Let $S(\mathbf{x})$ represent a target image (usually a raw image to be segmented) to which a source image (in our case the template) $T(\mathbf{x})$ is to be deformed such that $T(\mathbf{u}(\mathbf{x})) \approx S(\mathbf{x})$, with $\mathbf{u}(\mathbf{x})$ representing the warping function to be computed. We wish to maximize the square of the multichannel NCC between the two images:

$$\mathrm{NCC}^2(\mathbf{u}) \;=\; \sum_{c=1}^{N_{ch}} \left( \frac{\langle S_c,\, T_c \circ \mathbf{u} \rangle}{\lVert S_c \rVert \, \lVert T_c \circ \mathbf{u} \rVert} \right)^{2} \qquad \text{(A1)}$$

where $N_{ch}$ is the number of channels, $\langle A, B \rangle = \sum_{\mathbf{x}} A(\mathbf{x})\, B(\mathbf{x})$, and $\lVert A \rVert = \sqrt{\langle A, A \rangle}$, where the sum is computed over the (fixed) image grid. We note that maximizing the squared NCC is equivalent to maximizing the NCC. We choose the squared NCC, as it provides a more general framework in which both positive and negative cross correlations can be optimized to the same effect. Equation (A1) is maximized via steepest gradient ascent. Its gradient is given by:

$$\nabla_{\mathbf{u}} \mathrm{NCC}^2 \;=\; \sum_{c=1}^{N_{ch}} \frac{2\, f_c(\mathbf{u})}{\lVert S_c \rVert\, \lVert T_c \circ \mathbf{u} \rVert} \left( S_c \;-\; \frac{\langle S_c,\, T_c \circ \mathbf{u} \rangle}{\lVert T_c \circ \mathbf{u} \rVert^{2}}\, T_c \circ \mathbf{u} \right) \nabla T_c(\mathbf{u}), \qquad f_c(\mathbf{u}) = \frac{\langle S_c,\, T_c \circ \mathbf{u} \rangle}{\lVert S_c \rVert\, \lVert T_c \circ \mathbf{u} \rVert} \qquad \text{(A2)}$$

In practice, we convolve $\nabla_{\mathbf{u}} \mathrm{NCC}^2$ with a radially symmetric Gaussian kernel of variance $\sigma^2$ to regularize the problem. Optimization is conducted iteratively, starting with $\mathbf{u}^{(0)}(\mathbf{x}) = \mathbf{x}$ and updating $\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \kappa\, G_{\sigma} * \nabla_{\mathbf{u}} \mathrm{NCC}^2$, where $G_{\sigma}$ is the Gaussian kernel, $*$ represents the digital convolution operation, and $\kappa$ is a small step size. Optimization continues until the increase in the NCC value falls below a chosen threshold.
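The iteration can be illustrated in one dimension (a minimal numpy sketch of gradient ascent on the squared NCC of Eqs. (A1)/(A2) for a single channel; the bump signals, kernel width, and step size are all assumed for demonstration):

```python
import numpy as np

def ncc(a, b):
    # Single-channel NCC as in Eq. (A1): <a, b> / (||a|| ||b||)
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

def gaussian_kernel(sigma, radius):
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

# Target S and template T: the same bump, displaced by 5 grid points
x = np.arange(100, dtype=float)
S = np.exp(-((x - 45.0) ** 2) / 30.0)
T = np.exp(-((x - 40.0) ** 2) / 30.0)
dT = np.gradient(T)                      # spatial derivative of the template

u = x.copy()                             # u^(0)(x) = x (identity warp)
G = gaussian_kernel(3.0, 9)              # regularizing kernel G_sigma (assumed width)
kappa = 50.0                             # step size (assumed)

ncc_before = ncc(S, np.interp(u, x, T))
for _ in range(200):
    Tu = np.interp(u, x, T)              # template resampled through the warp
    dTu = np.interp(u, x, dT)
    f = ncc(S, Tu)
    nT = np.linalg.norm(Tu)
    # Gradient of NCC^2 with respect to u, as in Eq. (A2):
    g = (2 * f / (np.linalg.norm(S) * nT)) * (S - (S @ Tu) / nT**2 * Tu) * dTu
    u = u + kappa * np.convolve(g, G, mode="same")   # smoothed ascent step
ncc_after = ncc(S, np.interp(u, x, T))
```

After the iterations, the warp has pulled the template bump onto the target, so `ncc_after` exceeds `ncc_before`.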

In addition, we perform the maximization above in a multiscale framework. That is, we utilize a sequence of image pairs $(S_{\downarrow 4}, T_{\downarrow 4})$, $(S_{\downarrow 2}, T_{\downarrow 2})$, and $(S, T)$, where $T_{\downarrow 4}$ denotes the image $T$ downsampled by four (reduced to 1/16 of its size) after blurring, $T_{\downarrow 2}$ denotes the image $T$ downsampled by two (reduced to 1/4 of its size) after blurring, and $T$ denotes the original image being matched. The algorithm starts by obtaining an estimate for $\mathbf{u}$ (using the gradient ascent algorithm described above) using the images $(S_{\downarrow 4}, T_{\downarrow 4})$. The estimate of the deformation map $\mathbf{u}$ is then used to initialize the same gradient ascent algorithm using the images $(S_{\downarrow 2}, T_{\downarrow 2})$, and so on.