The initial watershed segmentation described above correctly splits many of the multinuclear clusters, but some over-segmentation persists, necessitating an algorithm for merging fragments. Several methods have been proposed in the literature to merge the over-segmented objects using cues such as the intensity gradient at the touching border (37), the size of the objects (45), or a combination of other morphological features (10). Since a large number of merging possibilities exist, the need arises for an efficient algorithm for managing the merging process. In our prior article (22), we described a hierarchical algorithm that searches for optimal combinations of nuclear fragments (22, 31, 53) guided by a model of the nuclei. In this article, we extend this methodology to accommodate multiple models. The merging procedure (Fig. 2) searches over all the objects generated by initial segmentation. For each object, it builds a merging tree, forms the merging candidates, selects the most likely model, fits the model to the image data, and computes confidence scores. The final step selects the optimal subset of merging decisions.

###### Merging candidate generation

To efficiently carry out the merging procedure, we build a hierarchical merging tree based on the region adjacency graph (RAG) data structure (54). The details of this procedure are described in our earlier article (22); we provide a brief summary here. As illustrated in Figure 3A, two or more objects are neighbors if they share voxels along a common boundary. We define the notion of a Root Path Set (RPS) for each node object *v* at depth *d* of a tree, denoted RPS(*d*, *v*). For the root node, the RPS is denoted RPS(1, 1) and is trivial, consisting of only the root object. For any other node, the RPS consists of all objects along the path from the root node to that node, e.g., RPS(2, 4) = {1, 4}, and RPS(3, 3) = {1, 2, 3}. An object *u* is a neighbor of RPS(*d*, *v*) if *u* is a neighbor of at least one object in RPS(*d*, *v*), and *u* ∉ RPS(*d*, *v*). On the basis of the initial segmentation, we construct the RAG, in which each node is an object, and any two neighboring objects are connected by a link. For an object *r* that is being considered for merging (e.g., object no. 1 in Fig. 3A), we build a merging tree denoted *T*_{r} to obtain all the merging candidates. Initially, *T*_{r} contains only the object *r* as root, and the tree depth *d* = 1. Then, *T*_{r} is grown as follows: for each node *v* at depth *d* of *T*_{r}, find all the neighbors of RPS(*d*, *v*) from the RAG. For each neighbor *u*, add it to the tree as a child of *v*, with two exceptions: *u* ∈ RPS(*d*, *v*); or RPS(*d*, *v*) ∪ {*u*} = RPS(*d*, *v*′) ∪ {*u*′}, where *v*′ is another node that appeared at depth *d*, so that duplicate candidate sets are not generated. We then increment the depth *d* by 1, and repeat the above procedure until no more objects can be added to *T*_{r}. For example, at depth *d* = 2 of the tree illustrated in Figure 3A, the neighbors of RPS(2, 2) are objects no. 3, no. 4, and no. 5, so we add them as the children of object no. 2 at *d* = 3. Similarly, the neighbors of RPS(2, 3) are objects no. 2, no. 4, and no. 5, so we add them to the tree except for object no. 2, since the RPS {1, 2, 3} has already appeared via the previous operation, i.e., adding object no. 3 to RPS(2, 2).

To reduce computation, we limit the combinatorial tree growth by setting an upper bound on the size of the RPS. That is, the total number of voxels contained in all objects in an RPS must not exceed a prespecified threshold. A simple choice is to set the threshold to the maximum number of voxels that a single intact object can possibly contain. This size bound can therefore be fixed for any input image, independent of the initial segmentation used.

As a refinement to our previous work (22), we no longer impose a limit on the maximum depth that a merging tree can grow. The reason for the change is that the degree of over-segmentation generally varies widely depending on the initial segmentation, so a universal upper bound on the tree depth is ineffective. It has to be set to the maximum number of fragments that a single object can possibly contain in the initial segmentation, and any upper bound smaller than that will result in missed merging candidates. By removing the tree depth constraint and using the size bound instead, the depth of the merging tree is set adaptively, i.e., a more fragmented object will have a merging tree of greater depth. We have found this to be a better tradeoff that does not miss merging candidates.
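As an illustration, the tree-growing procedure with the RPS size bound can be sketched as follows. This is a minimal Python sketch, not the paper's implementation: the RAG is represented as a plain adjacency dictionary, each RPS as a set of object ids, and `max_voxels` stands in for the prespecified size threshold; all names are ours.

```python
def build_merging_tree(root, rag, sizes, max_voxels):
    """Enumerate merging candidates (Root Path Sets) for object `root`.

    rag        : dict mapping object id -> set of neighboring object ids
    sizes      : dict mapping object id -> voxel count of that object
    max_voxels : size bound on the total voxels in any RPS
    Returns the list of candidate RPSs (each a frozenset of object ids).
    """
    seen = set()                        # RPSs already generated (avoid duplicates)
    frontier = [frozenset([root])]      # depth 1: the root object alone
    candidates = [frozenset([root])]
    while frontier:
        next_level = []
        for rps in frontier:
            # neighbors of the RPS: neighbors of any member, excluding members
            nbrs = set().union(*(rag[v] for v in rps)) - rps
            for u in nbrs:
                new_rps = rps | {u}
                # size bound: total voxels must not exceed the threshold
                if sum(sizes[v] for v in new_rps) > max_voxels:
                    continue
                if new_rps in seen:     # duplicate RPS already in the tree
                    continue
                seen.add(new_rps)
                next_level.append(new_rps)
                candidates.append(new_rps)
        frontier = next_level           # depth d -> d + 1
    return candidates
```

Because the loop stops only when no new set can be added under the size bound, the tree depth adapts to the fragmentation of each object, as described above.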

###### Merging criteria

Once the merging candidates are computed, merging decisions can be made. To achieve this end, we need a statistical measure of confidence in a merging decision, i.e., a score measuring the likelihood that an object formed by merging several regions/fragments is an intact object, such as a nucleus. Since we are dealing with multiple types of objects concurrently, automatic model selection must also be performed.

*Object Model Selection:* To classify the objects using the trained models, we adopt Fisher's Linear Discriminant Analysis (LDA) (33). We denote the object models $M_1, \ldots, M_K$, where *K* is the total number of object classes in the given image. The feature vector of an object is denoted $\mathbf{x} \in \mathbb{R}^{m}$, where *m* is the feature dimension. The basic idea of LDA is to transform the object features into a new space, usually with a lower dimension *d* < *m*, so that the transformed data among these *K* classes are as well separated as possible. Specifically, we want to find a matrix **W** of size *m* × *d*, such that the transformed features $\mathbf{W}^{T}\mathbf{x}$ are well separated among the *K* classes, but are scattered in a small region within each of these classes. Mathematically, the objective function for finding **W** can be expressed as:

$$J(\mathbf{W}) = \frac{\left|\mathbf{W}^{T}\mathbf{S}_{B}\mathbf{W}\right|}{\left|\mathbf{W}^{T}\mathbf{S}_{W}\mathbf{W}\right|} \qquad (1)$$

where $\mathbf{S}_{B} = \sum_{i=1}^{K} N_{i}\,(\mathbf{m}_{i}-\mathbf{m})(\mathbf{m}_{i}-\mathbf{m})^{T}$ is the between-class scatter matrix, **m**_{i} is the average feature vector for class *i*, and **m** is the overall mean. The denominator term $\mathbf{S}_{W} = \sum_{i=1}^{K}\sum_{j=1}^{N_{i}} (\mathbf{x}_{j}^{(i)}-\mathbf{m}_{i})(\mathbf{x}_{j}^{(i)}-\mathbf{m}_{i})^{T}$ is the within-class scatter matrix, in which $\mathbf{x}_{j}^{(i)}$ is the *j*th feature vector in class *i*, and *N*_{i} is the total number of sample objects in class *i*. By maximizing the criterion in Eq. (1), the solution **W** is composed of the *d* largest eigenvectors of the matrix $\mathbf{S}_{W}^{-1}\mathbf{S}_{B}$, and the new dimension is *d* ≤ *K* − 1. For example, in the case of two classes, Fisher's LDA projects the original object features into a new one-dimensional space. Having computed **W** using training samples as above, we can transform any objects into the lower-dimensional space and classify them using a standard method, such as a Bayesian or *k*-nearest neighbor classifier (55). Figure 4 shows an example of LDA on one image containing two distinct object types—neurons and glia. The selected 2D features are intensity and texture. The transformed 1D feature provides class separation that is comparable to the 2D case.
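The LDA computation above (scatter matrices followed by an eigen-decomposition) can be sketched in plain NumPy as follows; this is a minimal illustration, and the function name is ours.

```python
import numpy as np

def fisher_lda(X, y, d):
    """Fit a Fisher LDA projection on features X (n x m) with integer
    class labels y; returns the m x d projection matrix W."""
    m_features = X.shape[1]
    m_all = X.mean(axis=0)
    Sb = np.zeros((m_features, m_features))   # between-class scatter
    Sw = np.zeros_like(Sb)                    # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - m_all)[:, None]
        Sb += len(Xc) * diff @ diff.T
        Sw += (Xc - mc).T @ (Xc - mc)
    # solution: the d largest eigenvectors of Sw^{-1} Sb
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:d]]
```

Projecting with `X @ W` then yields the lower-dimensional features; with two classes, `d = 1` reproduces the one-dimensional projection shown in Figure 4.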

*Merging Confidence Calculation*: Using the classification procedure described above, we can classify an object; the resulting class is denoted *c*. To measure the confidence score of a merging candidate **x**, we compute the probability that **x** belongs to the object model *M*_{c} using a Bayesian formula. Note that the feature vector **x** used here can be different from the feature vector used for classification, since the emphasis at this stage is on distinguishing fragments from intact nuclei; to simplify our discussion, we use the same notation throughout the paper. Since the variance of our selected object features varies considerably, we first normalize them such that every dimension has zero mean and unit standard deviation using the formula $\mathbf{x} \leftarrow (\mathbf{x}-\mathbf{u})/\boldsymbol{\sigma}$, applied element-wise, where **u** and **σ** denote the mean and standard deviation, respectively (for convenience, we use the same notation before and after feature normalization). At this stage, Principal Component Analysis (PCA) can also be applied to remove the dependences among these features and reduce the dimensionality. Let **x** denote the new transformed feature vector, and *c* the object class. Based on Bayes' rule (33), the merging confidence score can be expressed as:

$$P(M_{c}\mid\mathbf{x}) = \frac{p(\mathbf{x}\mid M_{c})\,P(M_{c})}{\sum_{k=1}^{K} p(\mathbf{x}\mid M_{k})\,P(M_{k})} \qquad (2)$$

where $P(M_{c})$ is the a priori probability of model *M*_{c}, and $p(\mathbf{x}\mid M_{c})$ is the class-conditional probability. The prior is preset based on the known relative abundance of the object classes.
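The feature normalization and optional PCA preprocessing described above can be sketched as follows; this is a minimal NumPy illustration, and the function names are ours.

```python
import numpy as np

def normalize_features(X):
    """Scale each feature dimension to zero mean and unit standard deviation."""
    u, s = X.mean(axis=0), X.std(axis=0)
    return (X - u) / s, u, s

def pca_transform(X, d):
    """Project features onto the top-d principal components to remove
    dependences among dimensions and reduce dimensionality."""
    # principal axes = eigenvectors of the covariance matrix,
    # obtained here via SVD of the centered data
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T
```

The same mean and standard deviation computed from the training samples would be reused to normalize any new object before scoring.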

To calculate $p(\mathbf{x}\mid M_{c})$, we previously used a parametric estimate assuming a multivariate Gaussian distribution (10, 22, 56), and estimated the distribution parameters from the training samples. In the present work, we eliminate the Gaussian assumption to permit greater generality in the modeling. Specifically, we adopt a non-parametric Parzen window method for estimating the probability density (57), as suggested by some authors (58, 59). For feature vector **x** and a sample size *N*, the estimated density function is given by:

$$\hat{p}(\mathbf{x}) = \frac{1}{N}\sum_{j=1}^{N}\frac{1}{h^{m}}\,\varphi\!\left(\frac{\mathbf{x}-\mathbf{x}_{j}}{h}\right)$$

where $\mathbf{x}_{j}$ is the *j*th sample, $\varphi(\cdot)$ is the Parzen window function, and *h* is the window width. It has been shown (57) that $\hat{p}(\mathbf{x})$ converges to the true density function as the sample size grows, i.e., $\hat{p}(\mathbf{x}) \to p(\mathbf{x})$ as $N \to \infty$, if the window function $\varphi$ and the width *h* are properly chosen. We use the smooth Gaussian window in this work, given by:

$$\varphi(\mathbf{z}) = \frac{1}{(2\pi)^{m/2}\,|\boldsymbol{\Sigma}|^{1/2}}\exp\!\left(-\tfrac{1}{2}\,\mathbf{z}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{z}\right)$$

where $\boldsymbol{\Sigma}$ denotes the covariance matrix of the *m*-dimensional random variable **z**. The window size *h* plays an important role in the estimate. When *h* is small, the influence of each training sample is limited to a small region. When *h* is larger, there is more overlap of the windows and the estimate is smoother. In this work, we set *h* to the distance from **x** to the *k*th nearest neighbor among all the sample points (32). Let $d_{1} \le d_{2} \le \cdots \le d_{N}$ denote the distances between **x** and the training samples in increasing order; then $h = d_{k}$. To reduce computation, we ignore distant samples, whose contribution to the density sum is negligible, during the density calculation. In summary, the overall posterior probability (2) can be written as follows:

$$P(M_{c}\mid\mathbf{x}) = \frac{\frac{1}{N_{c}}\sum_{j=1}^{N_{c}}\frac{1}{h^{m}}\,\varphi_{c}\!\left(\frac{\mathbf{x}-\mathbf{x}_{j}^{(c)}}{h}\right)P(M_{c})}{\sum_{k=1}^{K}\frac{1}{N_{k}}\sum_{j=1}^{N_{k}}\frac{1}{h^{m}}\,\varphi_{k}\!\left(\frac{\mathbf{x}-\mathbf{x}_{j}^{(k)}}{h}\right)P(M_{k})} \qquad (3)$$

where *N*_{c} is the sample size of class *c*, $\mathbf{x}_{j}^{(c)}$ is the *j*th feature vector in class *c*, and $\boldsymbol{\Sigma}_{c}$, the covariance matrix of class *c*, parameterizes the Gaussian window $\varphi_{c}$. The above probability estimate reflects the confidence of the object **x** being intact in its class, and it will be used as a score for region/fragment merging. Figure 3 shows a detailed illustration of the above based on actual data.
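Assuming a shared *k*-nearest-neighbor bandwidth and per-class Gaussian windows, the posterior of Eqs. (2) and (3) can be sketched in NumPy as follows; the function names and the small covariance regularization term are ours, not the paper's.

```python
import numpy as np

def gaussian_window(z, cov_inv, cov_det, m):
    """Smooth Gaussian Parzen window parameterized by a class covariance."""
    return np.exp(-0.5 * z @ cov_inv @ z) / ((2 * np.pi) ** (m / 2) * np.sqrt(cov_det))

def parzen_posterior(x, samples_by_class, priors, k=5):
    """Posterior P(M_c | x): per-class Parzen densities with a k-NN
    bandwidth h, combined with class priors via Bayes' rule.

    samples_by_class : list of (N_c x m) arrays of training feature vectors
    priors           : list of prior probabilities P(M_c)
    """
    m = len(x)
    all_samples = np.vstack(samples_by_class)
    # bandwidth: distance from x to its k-th nearest training sample
    h = np.sort(np.linalg.norm(all_samples - x, axis=1))[k - 1]
    scores = []
    for Xc, prior in zip(samples_by_class, priors):
        cov = np.cov(Xc.T) + 1e-6 * np.eye(m)   # regularized class covariance
        cov_inv, cov_det = np.linalg.inv(cov), np.linalg.det(cov)
        dens = np.mean([gaussian_window((x - xj) / h, cov_inv, cov_det, m)
                        for xj in Xc]) / h ** m
        scores.append(dens * prior)             # numerator of Eq. (3)
    scores = np.array(scores)
    return scores / scores.sum()                # normalize over all classes
```

A production version would additionally skip distant samples in the inner sum, as noted above, since their window values are negligible.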

The above merging procedure terminates after applying the merging step to each object generated by the initial segmentation. A flowchart description of this procedure is shown in Figure 2. Upon completion, the object classification results are also available as a valuable addendum to the output; object classification is often an important problem in its own right. In our experiments, both the segmentation and classification results are presented in the GUI for human observers to inspect and validate.