#### Image Pre-processing Methods

Images were acquired using a Leica TCS-4D (Leica Microsystems Inc., Bannockburn, IL) confocal microscope equipped with a krypton/argon laser. The typical image size is 512 × 512 × 20 voxels, and the sampling resolutions along the *x*, *y*, and *z* directions are 0.45, 0.45, and 1.0 μm, respectively. The file format is usually a series of TIFFs. One problem with the confocal image stack is the inevitable attenuation of light along the depth of the specimen (49, 63). The uneven illumination along the depth of the specimen results in spatial variation of light intensity across the image volume. Besides the optical problems, photobleaching of the specimen contributes to further degradation of the image signal (49). Photobleaching can be modeled as a first-order decay process and can therefore be corrected computationally.

Several sophisticated methods have been used to correct for depth-dependent signal attenuation (61, 62). Unfortunately, the computational cost of these methods is high enough to preclude routine use on a large scale. For the present work, a much simpler method was adopted, keeping in mind the need for computational speed to process the large number of images in this study. This method works by comparing each image slice with a reference slice in the stack. Denote the measured image intensity as **I** = (*I*_{1}, *I*_{2}, …, *I*_{n}), where *I*_{i} denotes the intensity of all pixels in the *i*th slice. Since background pixels are not of concern, only the foreground, i.e., visible pixels, are of interest; denote the intensity of the foreground pixels in the image stack as **I**_{o} = (*I*_{o1}, *I*_{o2}, …, *I*_{on}). These intensities can be estimated as *I*_{oi} = {*I*_{x} | *I*_{x} ∈ *I*_{i}, *I*_{x} > *v*_{t}}, where *v*_{t} is an intensity threshold determined from a gray-level histogram of the image, based on a measure of class separability (42). In other words, voxels that are brighter than the threshold represent an estimate of the image foreground. We denote the average intensity of foreground pixels in the *i*th slice as *v*_{oi} = Ī_{oi}. To restore the foreground intensity, we first set a reference average intensity, which is simply taken as the maximum of the *v*_{oi}, i.e., *v*_{os} = max(*v*_{oi}). This value is used to scale the pixel intensities in each image slice according to the formula *I*_{i} = *I*_{i} × (*v*_{os}/*v*_{oi}). After this scaling, the average foreground intensity is equal across slices. Note that the scaling ratio *v*_{os}/*v*_{oi} is independent of the foreground area of the specific slice.
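As a concrete illustration, the per-slice rescaling just described can be sketched in a few lines of NumPy. This is a sketch under the stated assumptions, not the authors' original code; the function name `correct_attenuation` and its arguments are chosen here for illustration.

```python
import numpy as np

def correct_attenuation(stack, v_t):
    """Rescale each slice so its mean foreground intensity matches that
    of the brightest slice (v_os = max over slices of v_oi).
    stack: 3D array indexed (z, y, x); v_t: precomputed threshold."""
    # mean foreground (> v_t) intensity of each slice; NaN if no foreground
    means = np.array([s[s > v_t].mean() if (s > v_t).any() else np.nan
                      for s in stack])
    v_os = np.nanmax(means)                  # reference intensity v_os
    out = stack.astype(float)                # astype returns a copy
    for i, v_oi in enumerate(means):
        if np.isfinite(v_oi):
            out[i] *= v_os / v_oi            # I_i <- I_i * (v_os / v_oi)
    return out
```

Note that, as in the text, the ratio *v*_{os}/*v*_{oi} is applied to the whole slice, so background pixels are scaled along with the foreground.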

Noise and other artifacts present in the image degrade the segmentation result, resulting in, for example, over-segmentation. Again, notwithstanding the availability of sophisticated, but computationally intensive algorithms (40), simpler and computationally efficient methods were adopted. Median filtering (50) is widely used as a noise reduction tool. A median filter with a kernel width of three was applied to each slice of the original 3D image to suppress the effects of shot noise, which is introduced by the photomultiplier tubes of confocal microscopes.
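A minimal pure-NumPy stand-in for the median filter with kernel width three, applied slice by slice, might look as follows; in practice a library routine would be used.

```python
import numpy as np

def median3(slice_2d):
    """3x3 median filter with edge replication, for suppressing shot noise."""
    h, w = slice_2d.shape
    p = np.pad(slice_2d, 1, mode='edge')
    # stack the nine shifted views and take the per-pixel median
    views = np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return np.median(views, axis=0)
```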

Separation of image background and foreground regions not only defines a broad area of interest but also reduces the ambiguity in the results due to uneven and dense background. Using the same threshold, *v*_{t}, it is also possible to smooth out variations in the background by setting all pixels below *v*_{t} to zero, leaving pixels with brightness above *v*_{t} unchanged.

Another type of preprocessing accounts for pixel misclassification errors due to factors such as imaging noise, random variations in staining, and presence of extraneous objects such as dust. For example, the thresholding process described above sometimes results in some small isolated artifactual objects, often due to the presence of dense noncellular matter in the background. To remove these artifacts, we use a minor region removal algorithm (43). First, all the connected components are identified in the thresholded image, and the sizes of all isolated objects are calculated. Objects smaller than a set threshold are considered to be artifactual, and their voxel intensity is changed to background (zero). Complementarily, it is also possible that some misclassified regions (holes) may lie entirely within an object of interest, such as a nucleus. To remove these holes, the minor region removal algorithms (43) are again applied to the binary logical complement of the thresholded gray image. In the complemented image, the background can be expected to be the largest “blob,” and holes are small island-like objects. If the objects are smaller than the predefined threshold, which is set empirically and manually by the user, assisted by the software system, then the intensity threshold *v*_{t} is assigned to the pixels belonging to those objects.
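The minor-region removal step can be sketched as follows. This is a 2D, 4-connected sketch; the cited algorithm (43) operates in 3D, and `remove_small_objects` is a name chosen here.

```python
import numpy as np
from collections import deque

def remove_small_objects(binary, min_size):
    """Label 4-connected foreground components by flood fill; components
    smaller than min_size are zeroed (treated as artifacts)."""
    lab = np.zeros_like(binary, dtype=int)
    out = binary.copy()
    cur = 0
    for seed in zip(*np.nonzero(binary)):
        if lab[seed]:
            continue
        cur += 1
        comp = [seed]
        lab[seed] = cur
        q = deque([seed])
        while q:
            y, x = q.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1] \
                        and binary[ny, nx] and not lab[ny, nx]:
                    lab[ny, nx] = cur
                    comp.append((ny, nx))
                    q.append((ny, nx))
        if len(comp) < min_size:
            for y, x in comp:
                out[y, x] = 0       # artifactual object -> background
    return out
```

Hole filling proceeds the same way on the logical complement of the thresholded image, except that small components are set to the threshold value *v*_{t} instead of zero.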

Finally, since the object and background have varying brightness levels in the original image, the following problem may occur: the boundary of a detected object (region of interest in the thresholded image) may be corroded, containing breaks, gulfs, or peninsulas that do not correspond to the physically-correct object boundary. Morphological filtering is an effective and widely used solution to this problem (22, 48). We chose the morphological “opening” operation, which achieves shape smoothing without the possible side effect of merging two separated objects. In detail, the “opening” of an image *I*(*x*,*y*,*z*) by a structuring element (also known as a kernel), denoted **K**, is written *I*(*x*,*y*,*z*) ○ **K**, and is defined as *I*(*x*,*y*,*z*) ○ **K** = (*I*(*x*,*y*,*z*) Θ **K**) ⊕ **K**, where Θ is the morphological erosion operator and ⊕ is the morphological dilation operator (48). Because 3D image stacks are usually anisotropic, i.e., the sampling distance along the axial dimension is larger than in the radial dimensions, the structuring element for dilation and erosion is chosen as a 3D kernel of size 3 × 7 × 7, as illustrated in Figure 3. Note that some small objects may appear in the final segmentation due to the opening operation. These are eliminated by postprocessing operations after segmentation, but before any further analysis such as FISH quantification.
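With a flat kernel, opening reduces to a minimum filter (erosion) followed by a maximum filter (dilation). A small 2D sketch, assuming a symmetric k × k kernel rather than the paper's anisotropic 3 × 7 × 7 one:

```python
import numpy as np

def flat_filter(img, k, op):
    """Grayscale erosion (op=np.min) or dilation (op=np.max) by a flat
    k x k structuring element, with edge replication at the borders."""
    h, w = img.shape
    p = np.pad(img, k // 2, mode='edge')
    views = np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])
    return op(views, axis=0)

def opening(img, k=3):
    """Morphological opening: erosion followed by dilation."""
    return flat_filter(flat_filter(img, k, np.min), k, np.max)
```

Opening removes protrusions smaller than the kernel while leaving larger plateaus intact, which is exactly the boundary-smoothing behavior described above.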

#### Connected Object Separation

As noted in the opening paragraphs of this article, the watershed algorithm is widely studied and used for efficient object separation. The term “watershed” comes from a graphic analogy attributed to Vincent and Soille (32). In this analogy, the gray-level image is treated as a topographic surface, and it is assumed that holes have been punched in each regional minimum. If the surface is “flooded” from these holes, the water progressively fills the “catchment basins” (i.e., the set of points on the surface whose steepest-slope paths reach a given minimum) of the image. At the end of this flooding procedure, each minimum is completely surrounded by “dams,” which delimit its associated catchment basin. The set of dams obtained in this way corresponds to the watersheds (called watershed surfaces in the 3D case) of the geophysical analogy, and provides a tessellation of the input image into its different catchment basins.

Several extensions of the watershed method have been described in the literature. For instance, Malpica et al. (36) proposed a method to segment nuclear clusters based on a 3D watershed algorithm, where two different image transformations and nuclear markers are presented to fit different types of clusters. This method is based on 2D image slices; nucleus cluster images are characterized by extreme complexity and variability, which limits the accuracy of 2D algorithms. Solorzano et al. (9) presented a 3D segmentation approach applied to cancerous specimens. The 3D confocal image is segmented into nuclear and background regions, and each nuclear region is classified by visual inspection. Objects classified as clusters are divided into individual nuclei using an automatic watershed algorithm.

Notwithstanding its popularity, the watershed algorithm has several limitations. These limitations arise from the fact that it relies on touching objects exhibiting a narrow “neck” in the region of contact. These “necklines” play a critical early role in estimating the number of objects in a given cluster. This process is notoriously error-prone. Considerable effort has been devoted to the design of algorithms for generating the correct set of “markers” to guide the object segmentation. The problem of determining the correct number of markers is inherently difficult, and is conceptually similar to the problem of automatically determining the number of groups in multidimensional statistical data (37–39).

The classical watershed algorithm also ignores important cues in the image. For instance, touching nuclei often exhibit prominent intensity gradients that can be interpreted and exploited to perform accurate object separation. The watershed algorithm does not have a built-in notion of object shape and size, i.e., it does not incorporate an object model that can provide additional cues for separating touching objects. Several attempts have been made to overcome some of the limitations of the watershed algorithm. One class of attempts has relied upon some form of modeling of the objects of interest, i.e., the nuclei. For instance, Roysam et al. (40), and Mackin et al. (41) modeled connected groups of nuclei as a cluster in the four-dimensional space comprised of the spatial dimensions (*x*,*y*,*z*), and the intensity dimension *I*(*x*,*y*,*z*). This type of model makes the weakest assumptions about the objects of interest. Ancin et al. (6) describe a more sophisticated modeling effort using a stronger set of modeling assumptions. In their work, the nuclei were modeled as blobs whose feature values (e.g., compactness and size) were within defined intervals. This model was used to verify whether or not a given image object represents a valid nucleus.

##### Computation of the gradient-weighted distance transform.

As noted above, the difficulty with watershed segmentation is that before applying it, one must check if the objects and their background are marked by a regional minimum, and if crest lines outline the objects. If not, one must transform the original image so that the contours to be calculated correspond to watershed lines, and the nuclear objects to catchment basins surrounded by them. To this end, two image transformations have been widely studied: distance transform and gradient transform. Distance transformation is purely geometrical, and accounts for the shape of objects. However, it is only good at dealing with regular shapes, either isolated or touching objects with bottleneck-shaped connections. The gradient transformation is intensity-based, assuming that internuclei gradients are higher than intranuclei gradients. As with all gradient-based operations, this transformation is sensitive to imaging noise, and usually results in over-segmentation. To overcome the above difficulties, we propose a combined image transformation called the “gradient-weighted distance transform”, which accounts for both geometric and intensity features.

Let **I** denote the preprocessed 3D image. We first compute its 3D intensity gradient, denoted **G**, as the difference between a pair of images, derived by dilation then erosion of the brightness value of the image. The structuring elements for dilation and erosion are chosen as shown in Figure 3. The problem of computing the 3D image gradient is a widely studied problem (51, 52). The main advantage of the morphological approach is computational efficiency, which is an important consideration for the application of interest.
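The morphological gradient described above is the voxel-wise difference between the dilated and eroded images. A 2D sketch with a flat kernel (the 3D case uses the anisotropic structuring element of Figure 3):

```python
import numpy as np

def flat_filter(img, k, op):
    """Erosion (op=np.min) or dilation (op=np.max) by a flat k x k kernel."""
    h, w = img.shape
    p = np.pad(img, k // 2, mode='edge')
    views = np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])
    return op(views, axis=0)

def morph_gradient(img, k=3):
    """Morphological gradient: dilation minus erosion."""
    return flat_filter(img, k, np.max) - flat_filter(img, k, np.min)
```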

In order to compute the geometric distance transform, the preprocessed image **I** is first binarized by intensity thresholding, using the automatically computed threshold *v*_{t} described earlier, to produce a binary volume **I**_{b} as follows:

- *I*_{b}(*x*,*y*,*z*) = 1 if *I*(*x*,*y*,*z*) > *v*_{t}, and 0 otherwise (1)

The geometric distance transform **D** is calculated over **I**_{b}, using the chamfer distance transform (47). This algorithm can be computed using voxel masks in three dimensions, as illustrated in Figure 4. These masks are parts of a 3 × 3 × 3 cube. Two passes over the volume **I**_{b} are carried out. The forward mask is swept over the volume left to right, top to bottom, and front to back. The backward mask is swept in the opposite direction. At each position, the sum of the local distance in each mask voxel and the value of the voxel it covers are computed, and the new value of the central voxel (labeled 0 in Fig. 4) is the minimum of these sums. In summary,

- *v*_{i,j} = min_{(*k*,*l*)} (*v*_{i+k,j+l} + *c*(*k*,*l*)) (2)

where *v*_{i,j} is the value of the pixel at position (*i,j*), and (*k,l*) is the position in the mask (the center being (0,0)). The local distance from the mask is denoted *c*(*k*,*l*) ∈ {*d*_{1},*d*_{2},*d*_{3},*d*_{4},*d*_{5}}, and is illustrated in Figure 4. Notice that these local distances already account for the anisotropy of **I**_{b}. Specifically, the unequal voxel size *δ*_{xy} and *δ*_{z} in the radial and axial dimensions are accounted for. Note that we use the Euclidean distance here instead of the optimal weights presented in published literature (47). Based on our tests, the optimal weight does not bring us any better results in terms of the final segmentation.
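A 2D sketch of the two-pass chamfer distance transform described above, with Euclidean local distances *d*_{1} = 1 and *d*_{2} = √2 (the 3D version adds the between-slice distances *d*_{3}–*d*_{5}, scaled by the axial voxel spacing):

```python
import numpy as np

def chamfer2d(binary, d1=1.0, d2=np.sqrt(2.0)):
    """Two-pass chamfer distance to the background for a 2D binary mask."""
    h, w = binary.shape
    D = np.where(binary, 1e9, 0.0)   # foreground starts "infinitely" far
    # forward pass: left-to-right, top-to-bottom
    for i in range(h):
        for j in range(w):
            for di, dj, c in ((-1, -1, d2), (-1, 0, d1), (-1, 1, d2), (0, -1, d1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    D[i, j] = min(D[i, j], D[ni, nj] + c)
    # backward pass: mirrored mask swept in the opposite direction
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            for di, dj, c in ((1, 1, d2), (1, 0, d1), (1, -1, d2), (0, 1, d1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    D[i, j] = min(D[i, j], D[ni, nj] + c)
    return D
```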

The geometric distance transform **D** and the gradient transform **G** must be combined into a single representation that captures the object separation cues available in the data. One challenge in this regard is the fact that these quantities are dissimilar, i.e., they are expressed in different units, and they can be normalized differently. The final result of the combining operation should be in distance units. These conflicting requirements are met by the following formula.

- **D′** = **D** × (*G*_{max} − **G**)/(*G*_{max} − *G*_{min}) (3)

where *G*_{min} and *G*_{max} are the minimum and maximum values of the gradient **G** needed for normalization. Note that the distance value **D′** is high at positions closer to the center of foreground objects, and in pixels with smaller gradient values. **D′** is smaller close to the boundary of the foreground objects, or where the gradient is relatively large. Intuitively, this captures the essential object separation cue that pixels with bigger gradient values tend to be on the boundary of an isolated object, or on the boundary between two touching objects. In practice, the watershed algorithm requires the inverse of this distance transformation. This inverse is denoted **T**, and is computed as follows:

- **T** = *S*_{g}(max(**D′**) − **D′**) (4)

where max(**D′**) is the global maximum within the distance image, and *S*_{g} represents a Gaussian smoothing operator. The smoothing operation is needed because the transformed image may contain tiny noise-caused intensity peaks, usually due to uneven cell staining. Before applying watershed segmentation, the background pixels obtained previously need to be set to max(**D′**) + 1, where the 1 is added to ensure that the background value is greater than the distance value anywhere within an object.
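Under the assumed forms of equations 3 and 4 (a distance map attenuated by the normalized gradient, then inverted), the combination can be sketched as follows; the Gaussian smoothing *S*_{g} is omitted from this sketch.

```python
import numpy as np

def gradient_weighted_transform(D, G):
    """Combine geometric distance D and gradient magnitude G.
    Assumed form: D is attenuated where the normalized gradient is
    large, so D' is high near object centres with low gradient; the
    watershed then runs on the inverted transform T."""
    g_min, g_max = G.min(), G.max()
    Gn = (G - g_min) / max(g_max - g_min, 1e-12)  # normalize to [0, 1]
    Dp = D * (1.0 - Gn)
    T = Dp.max() - Dp
    return Dp, T
```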

Figure 5 illustrates the effectiveness of the combined measure in equation 3. Figure 5a shows a sample image, with the nuclei indicated in blue, and the FISH signal displayed in red. Figure 5b is a surface plot of the geometric distance **D** for the region indicated by the white box in Figure 5a. Figure 5c is the result of combining the geometric and gradient measures **D** and **G** as in equations 3 and 4 above. It is clear that the combined transformation in Figure 5e is effective in discriminating touching nucleus clusters that do not have the characteristic bottleneck-shaped connecting pattern.

##### Enhanced 3D watershed algorithm.

Unfortunately, applying the watershed algorithm to the above-described transformed image can directly lead to oversegmentation, i.e., a single nucleus may be divided into multiple fragments. Several solutions have been proposed in the literature to address this well-known problem. Some authors have proposed marker-controlled segmentation (24, 36). In this method, singular markers are defined and imposed as minima on the transformed image. From these minima, the watershed algorithm will find the crest lines in the image by simulating a flooding process (32). In general, this process is difficult. As noted in the introductory paragraphs of this article, the problem of discovering singular markers has the same conceptual level of difficulty as the well-known and unsolved problem of estimating the number of clusters in statistical cluster analysis problems. For a specific application of interest, this problem can sometimes be reduced using a priori knowledge of the solution, when available. This is not straightforward, especially when dealing with noisy images, and when the objects to be detected are complex and varied in shape, size, and intensity. This is especially true when segmenting dense nucleus clusters. Another approach presented in the literature is hierarchical segmentation (23, 28). For example, Beucher (23) defines different levels of segmentation starting from a graphical representation of the images based on the mosaic image transform. Then the hierarchical segmentation is refined by means of a new algorithm called the “waterfall algorithm,” which allows the selection of minima and catchment basins of higher significance compared to their neighborhood. This approach reduces the oversegmentation considerably.

Another type of solution proposed to the above problem requires targeted postprocessing. Postprocessing is performed to find the final contours of the objects. Specifically, some merging techniques have to be used to eliminate the oversegmentation (7). The present work builds upon this methodology. Specifically, we have used a post-watershed merging approach using object model information. The 3D watershed algorithm is simply carried out on the gradient-weighted distance image **T** using the immersion simulation approach described by Vincent and Soille (32), without deliberate markers (in other words, local minima become markers). The following section describes this approach in more detail.

#### Model-Based Object Merging Methods

After the 3D watershed algorithm has been carried out using the gradient-weighted distance transform described above, undersegmentation can be nearly eliminated, but the problem of oversegmentation remains, as illustrated in Figure 6. To overcome this problem, some type of merging mechanism has to be introduced in the postprocessing step. Several techniques have been proposed in the literature. One possible method is to make use of hysteresis thresholding to filter noisy weak contours, representing the watershed lines between small regions. As pointed out by Najman and Schmitt (28), hysteresis thresholding produces nonclosed contours and barbs in the case of watershed. Adiga and Chaudhuri (7) presented a rule-based heuristic merging technique to reduce oversegmentation, by identifying the oversegmented objects based on size, and merging them with their parent nucleus. This method represents a significant advance, but can be improved upon. Its limitations arise from the fact that merging purely based on object size is prone to error, especially when segmenting objects with great variation in size. Second, a global size threshold is not easy to set in an automated and consistent manner. Finally, the merging rule does not account for the features of other objects in the image. The present work is similar in principle, but is built upon a richer model of the objects of interest.

Even the most sophisticated pre- and postprocessing techniques cannot overcome the inherent limitation of purely intensity-based methods, namely the assumption that segmentation can be carried out solely on the basis of information in the image itself. In practice, some form of prior knowledge can and must be incorporated into algorithms for automatic nucleus segmentation. This is the motivation for model-based segmentation. Different procedures have been proposed in the literature for representing and using prior knowledge in image analysis, such as deformable shape models (27, 46) and statistical models (44, 45).

Due to the wide variation of object shapes and presence of many touching objects, in this work we introduce a statistical modeling–based approach to break the watershed surface, and eliminate oversegmentation. The deformable-shape models are computationally slower, and thus less attractive for the application of interest.

##### 3D object feature selection.

Statistical shape-modeling methods depend upon the availability of parametric models to describe the nucleus objects. These parameters must be selected carefully in order to accurately characterize the nucleus objects, and discriminate outliers from real nucleus objects in an effective manner. The set of parameters must be rich enough to describe complex objects. A realistic strategy for estimating these parameters is for the user to specify examples of valid and invalid nucleus objects, and to perform supervised morphometry on these objects. In practice, the tedium and labor cost of specifying these examples is high enough to limit the number of examples. This in turn forces us to limit the number of object modeling parameters. In this work, our primary training data is cell nuclei from rat brain tissue, where there are about 100 nuclei in each image. We use only a few parameters, as described below. Note that not all these features are actually used for all images. Globally-optimal feature selection is a nontrivial task, and a definitive solution is outside the scope of this article.

Let the locations of the pixels in a cell nucleus be denoted **p** = {*p*_{0}, *p*_{1}, …, *p*_{n−1}}, where *p*_{i} = (*x*_{i}, *y*_{i}, *z*_{i}). The corresponding pixel intensity values are denoted **v** = {*v*_{0}, *v*_{1}, …, *v*_{n−1}}. The following 3D features are readily measured.

###### Volume.

The volume (size) of the object, *V*, is the total number of voxels inside the object, i.e., *V* = *n*.

###### Texture.

The simplest texture measure, denoted *T*, is the standard deviation of the intensities of all pixels inside the object:

- *T* = √((1/*n*) ∑_{i=0}^{n−1} (*v*_{i} − v̄)²) (5)

where v̄ denotes the average nucleus intensity.

###### Convexity.

The convexity, *S*, of an object is defined as the ratio of the object volume to the volume of the convex hull of the object. The convex hull of an object can be computed by a method known as Jarvis's march (53). The convexity is close to one for circular and elliptical objects, and less than one for concave objects.

###### Shape.

Let **Q** be the boundary pixels of the object. The shape feature, *U*, is defined as

- (6)

where | · | denotes the number of elements in a set.

To eliminate the effect of anisotropy on feature calculations, we use the following features computed from a 2D projection of the nucleus. Let **p′** = {*p′*_{0}, *p′*_{1}, …, *p′*_{k−1}} denote the *k* pixels that belong to the projected nucleus, where *p′*_{i} = (*x*_{i}, *y*_{i}) is the 2D location.

###### Circularity.

Let *p̄′* denote the center of the projected nucleus; then the distances between the pixels **p′** and the center can be described as **d** = ∥**p′** − *p̄′*∥. The circularity, *C*, is defined as

- *C* = d̄/*σ*_{d} (7)

where d̄ and *σ*_{d} denote the mean and standard deviation of the distances **d**.

###### Area.

The area, *A*, is the number of pixels of the 2D projected nucleus, i.e., *A* = *k*.

###### Mean Radius.

Let **R** be the vector of distances from the boundary pixels to the center *p̄′*. The mean radius, *R̄*, is defined as the average of **R**.

###### Eccentricity.

The eccentricity, *E*, is defined as the ratio of the major axis to the minor axis, and can be estimated by the ratio of the maximum to minimum radius **R**, i.e., *E* = max(**R**)/min(**R**).
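The 2D-projection features can be computed directly from pixel coordinates. The sketch below follows the definitions above; the circularity form used (mean over standard deviation of the radial distances) is an assumption consistent with the text, and `projection_features` is a name chosen here.

```python
import numpy as np

def projection_features(pts):
    """Features of a 2D-projected nucleus from its k x 2 pixel
    coordinates. Eccentricity follows the definition in the text
    (max/min boundary radius); the circularity form (mean / std of
    the radial distances d) is an assumption."""
    centre = pts.mean(axis=0)
    d = np.linalg.norm(pts - centre, axis=1)      # distances to centre
    ptset = set(map(tuple, pts))
    # boundary pixels: those with at least one 4-neighbour outside the set
    boundary = np.array([p for p in pts
                         if any((p[0] + dy, p[1] + dx) not in ptset
                                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))])
    R = np.linalg.norm(boundary - centre, axis=1)
    return {'area': len(pts),
            'mean_radius': R.mean(),
            'eccentricity': R.max() / R.min(),
            'circularity': d.mean() / d.std()}
```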

##### Statistical object model construction method.

The statistical object model is an *m*-dimensional Gaussian distribution defined on a vector of *m* features *X* = (*x*_{1}, *x*_{2}, …, *x*_{m}) drawn from the list above. The distribution requires the mean, denoted *X̄*, and the covariance matrix, denoted Σ_{X}. These parameters are estimated from a subset, **C**_{t}, of the objects produced by the watershed algorithm described above (denoted **C**).

The training set **C**_{t} is selected as follows. It is known that objects representing intact nuclei in these results are generally characterized by a relatively large value of volume *V*, convexity *S*, and circularity *C*. Based on these considerations, the training set can be constructed by placing thresholds on volume *V*, convexity *S*, and circularity *C*, as described below:

- **C**_{t} = {*c* ∈ **C** | *V* > *V̄* + *t*·σ_{V}, *S* > *S̄* + *t*·σ_{S}, *C* > *C̄* + *t*·σ_{C}} (8)

where *V̄*, *S̄*, and *C̄* are the mean values of object volume, convexity, and circularity, σ_{V}, σ_{S}, and σ_{C} are the corresponding standard deviations, and *t* is an empirically specified parameter that sets the degree of selectivity. Note that we remove all nuclei that are clipped by the image borders to eliminate their influence on training-object selection. Such partially imaged nuclei are also not of interest in our study, so they are excluded from further consideration, e.g., when we perform FISH analysis.

Based on the above Gaussian model, we can measure the confidence score for any given object *c* with feature *X*, using the Gaussian probability that the object feature fits the model, as follows (37):

- *P*(*X*) = (2π)^{−m/2} |Σ_{X}|^{−1/2} exp(−(1/2)(*X* − *X̄*)^{T} Σ_{X}^{−1} (*X* − *X̄*)) (9)
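The confidence score of equation 9 is simply the multivariate Gaussian density evaluated at the object's feature vector; a minimal sketch:

```python
import numpy as np

def confidence(x, mean, cov):
    """Multivariate Gaussian density used as the model-fit confidence
    score for a feature vector x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    m = x.size
    diff = x - mean
    expo = -0.5 * diff @ np.linalg.solve(cov, diff)
    return np.exp(expo) / np.sqrt((2 * np.pi) ** m * np.linalg.det(cov))
```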

##### Watershed surface breaking and object merging method.

To correct the oversegmentation produced by the watershed step, it is necessary to detect and break (eliminate) the false watershed surfaces and thereby merge nucleus objects. This is guided by a merging criterion based on a merging score derived from the confidence measure described above in equation 9.

Let **W** denote the set of watershed surfaces that separate adjacent 3D nucleus objects. As illustrated in Figure 7a, each watershed surface *w* ∈ **W** separates two touching nuclei, denoted *c*_{1} and *c*_{2}. We define the gradient of *w* as the average intensity gradient over all pixels in the watershed surface *w*, i.e., *γ*_{w} = (∑_{i∈w} γ_{i})/*n*, where *n* is the number of pixels in *w*. In the same manner, we define the intensity gradient *γ*_{c} for each nucleus object *c* by averaging the intensity gradients over all pixels in *c*. Let *c*_{w} denote the nucleus object formed by breaking *w* (in other words, merging the objects *c*_{1} and *c*_{2} separated by *w*). Then, we have:

- *c*_{w} = *c*_{1} ∪ *c*_{2} ∪ *w* (10)

Note that the pixels of the watershed surface *w* itself should also be merged into *c*_{w}. The confidence score of *c*_{w}, based on equation 9 above, is called the “merging score,” and denoted *S*_{c_{w}} in the following. Intuitively, the merging decisions are based on the following two observations: 1) the merging score *S*_{c_{w}} should be higher than the score of either nucleus before merging, i.e., *S*_{c_{1}} and *S*_{c_{2}}; 2) the gradient of *w* should be relatively large compared with the gradients of the nuclei *c*_{1} and *c*_{2}. This is based on the assumption that intranuclear gradients are smaller than internuclear gradients, which generally holds true. With these observations in mind, we calculate the following ratios:

- *R*_{1} = *S*_{c_{w}}/max(*S*_{c_{1}}, *S*_{c_{2}}), *R*_{2} = *γ*_{w}/max(*γ*_{c_{1}}, *γ*_{c_{2}}) (11)

The ratio *R*_{1} reflects the relative degree to which the nuclei match the statistical model before and after merging; it thus quantifies the confidence we have in breaking *w*. The higher *R*_{1} is, the more confident we are in merging *c*_{1} and *c*_{2}. The ratio *R*_{2} captures the intuition that a watershed surface with a high intensity gradient is likely to lie at the boundary of two touching nuclei. The higher *R*_{2} is, the less likely it is that *w* represents background pixels, and thus the more likely it is that *w* belongs to the interior of a nucleus, rather than *c*_{1} and *c*_{2} being two nuclei separated by *w*. The above two ratios are combined as follows into a single decision-making criterion:

- *R*_{1} × *R*_{2} > β (12)

where β is an empirical decision threshold (typical value 1.2).

Breaking the watershed surface *w* results in the merging of the two objects *c*_{1} and *c*_{2}. This procedure is repeated until no more watershed surfaces in **W** satisfy the condition in equation 12. Special attention needs to be given to nuclei that touch more than one object, as illustrated in Figure 7b. In this case, there are multiple candidate watershed surfaces that could be selected for breaking. Intuitively, we must assign a higher priority to the one that yields the greater merging score, i.e., break the watershed surface with the greatest merging score before the other candidate surfaces.
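Assuming the ratio forms stated above (a model-fit ratio and a gradient ratio, combined multiplicatively against the threshold β), the per-surface merging decision can be sketched as follows; the combination rule is an assumed form, and `should_merge` is a name chosen here.

```python
def should_merge(s_merged, s_c1, s_c2, g_w, g_c1, g_c2, beta=1.2):
    """Decide whether to break watershed surface w and merge c1, c2.
    s_*: model confidence scores (equation 9); g_*: mean intensity
    gradients. The combination R1 * R2 > beta is an assumed form."""
    r1 = s_merged / max(s_c1, s_c2)   # model fit improves on merging
    r2 = g_w / max(g_c1, g_c2)        # gradient of w vs. nucleus interiors
    return r1 * r2 > beta
```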

Let **W**_{c} denote the watershed surfaces that are adjacent to nucleus object *c*, and each *w* ∈ **W**_{c} separates *c* from its neighbors. The complete watershed surface breaking algorithm is described as follows.

##### Validation of the object features.

In order to obtain a measure of discriminative capability of the selected features, we adopted Fisher's discriminant ratio criterion (*FDR*) (37):

- *FDR* = (*u*_{1} − *u*_{2})²/(*σ*_{1}² + *σ*_{2}²) (13)

where *u*_{1}, *u*_{2} are mean values of the feature in two classes, and *σ*_{1}, *σ*_{2} are their corresponding standard deviations. Prior to calculating the *FDR*, we need to have nucleus class information available (similar to the training data set). In this work, all the nuclei identified by watershed segmentation described previously can be classified into two categories: a set of intact nuclei, which should not be merged during post-processing; and a set of nucleus fractures resulting from oversegmentation that need to be merged with their adjacent neighbors. To classify them, we first run the model-based watershed surface breaker using the features previously defined. At this stage, minimal manual editing may be performed to correct misclassifications, aided by a graphical user interface (GUI) that is described in the next section. Once we have class information for all nuclei, we can calculate *FDR* for each desired feature. Table 1 shows their average values, obtained by testing on a series of images.

Table 1. Discriminative Capability of Various Features as Measured by Fisher's Discriminant Ratio (FDR)

| Object feature | Fisher discriminant ratio (FDR) |
|---|---|
| Volume (3D) | 0.50 |
| Texture (3D) | 0.42 |
| Convexity (3D) | 0.53 |
| Shape (3D) | 0.37 |
| Circularity (2D) | 0.33 |
| Area (2D) | 0.25 |
| Mean radius (2D) | 0.25 |
| Eccentricity (2D) | 0.17 |
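Given the two class labelings (intact nuclei vs. oversegmented fragments), the FDR of equation 13 for a single feature reduces to a one-line computation:

```python
import numpy as np

def fdr(feat_class1, feat_class2):
    """Fisher's discriminant ratio for one feature measured over two
    classes of objects (arrays of per-object feature values)."""
    u1, u2 = np.mean(feat_class1), np.mean(feat_class2)
    v1, v2 = np.var(feat_class1), np.var(feat_class2)
    return (u1 - u2) ** 2 / (v1 + v2)
```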