Topological analysis of the distribution of proteinaceous and nucleic acid components of the cell, in particular mammalian cell nuclei, is helpful in understanding cellular functions in the state of health versus disease (1–10). Correlations between the distribution of cellular proteins and/or fractions of nuclear DNA and certain diseases have allowed mammalian cells to be utilized as useful models in the search for appropriate disease treatment, in the context of systems biology (11, 12). With the availability of today's more advanced imaging approaches (including confocal laser scanning microscopy, two-photon excitation microscopy, high content cell imaging, and automated tissue scanning), high-resolution optical imaging has evolved into an essential tool for moving new chemical entities through the pharmaceutical discovery pipeline utilizing cell-based assays. The advantages of imaging for drug discovery are realized through the ability of high-resolution microscopic imaging to measure the spatial and temporal distribution of molecules and cellular components, which is vital to understanding the activity of drug targets at the cellular level. Thus, microscopic imaging applies to the preclinical stages of drug discovery for exploratory studies, target identification and validation, lead generation and optimization, and biomarker discovery (13). Drug efficiency can be measured by the uniformity of cellular response upon drug application, focusing on what percentage of cells in a population has reacted to the applied drug. More interestingly, compound effects can be evaluated by imaging changes in the distribution patterns of the relevant proteins and/or nucleic acid loci that function as drug targets. This new, cytomic approach (1, 2) is gaining momentum by decreasing attrition in the very costly process of drug development.

Epigenetic changes, such as DNA methylation and histone modification, play a key role in cellular differentiation (14–16). Aberrant global methylation patterns are associated with several cancer types. Methylation pattern imbalances in cancer cells include genome-wide hypomethylation and localized aberrant hypermethylation of CpG dinucleotides (CpG islands) in promoter regions of tumor suppressor genes (17, 18). The reversible nature of epigenetic aberrations constitutes an attractive therapeutic target, and epigenetic cancer therapy with demethylating agents has already been shown to be promising (19). Demethylating agents cause structural reorganization of the genome in cell nuclei, as they not only alter the DNA methylation load but also influence its spatial distribution (20, 21). Therefore, in a previous image-based cytometrical approach, we delineated MeC and overall DNA in AtT20 mouse pituitary tumor cells by means of immunofluorescence, and revealed significant differences in the patterns of MeC- and DAPI-derived signals between untreated cells and a subpopulation of cells treated with 5-AZA (22), a demethylating agent that has been reported to change methylation patterns on a genomic scale (23). Image-based assessment of DNA methylation patterns may thus provide a powerful technique for characterizing mammalian cells during differentiation and their status of health versus disease, as the underlying molecular processes involve large-scale chromatin reorganization, which is visible by light microscopy (24–29).

Today's advanced cellular imaging systems can produce multispectral two-dimensional (2D) and 3D data in quantities that often require machine vision support to assess and quantify the degree of individual cell similarity within an entire cell population based on cellular features. Topological analyses typically necessitate the segmentation of cellular regions of interest (ROI), including the entire cell and/or subcellular compartments such as the nuclei. This process involves the delineation of the ROI, recognition of residing patterns, and statistical quantification of these patterns with dedicated algorithms. So far, nuclear features have been analyzed in one of the following three ways: (i) comparing a known or unknown pattern with a reference pattern using statistical tests; (ii) classification of patterns through supervised learning, utilizing decision trees, support vector machines, and neural networks; or (iii) clustering, in which the distance between points in feature space is used as a discriminating factor (30). The features are measurements reflecting complete cellular or just nuclear morphology, fluorescence intensity, and texture. For example, Strovas et al. normalized the intensity of a green fluorescent protein variant expressed from the methylotrophy promoter (P_{mxaF}) in single cells of a *Methylobacterium extorquens* AM1 culture to cell size. This served as a descriptor of cell-to-cell heterogeneity in growth rate and gene expression in response to antibiotics (31). Knowles et al. measured protein distribution through radial bright features within nuclei to identify changes in tissue phenotype (32). Lin et al. employed linear discriminant analysis with nuclear models that were constructed from user-provided training examples to distinguish different cell types (33). Markovian and fractal features (34), Zernike moments, co-occurrence matrices (35), and features generated by Gabor transformation have been commonly used in recognizing subcellular structures (36). Yet, the sensitivity of texture features depends strongly on the optical system setup, such as focusing, image magnification, and object positioning. In the description of cellular structures, the textural, morphological, and intensity features are usually complementary.

Feature-based quantitative description of 3D nuclear architecture is employed in many biological and medical applications, ranging from *in situ* studies of DNA, protein localization and migration in living cells, and exploration of the structural aspects of cell division to investigations of the role of nuclear alterations in pathology (6–10, 37, 38). These approaches mostly consider the statistical distribution of a single target, a protein or DNA fragment (single gene copy or genomic region). In those cases, a reference pattern detected under specific conditions is usually defined and compared with protein/DNA distribution patterns that result from changes in culture conditions. However, image-based cytometry, which readily considers two or more parameters at the same time, would largely benefit from algorithms that can statistically assess patterns of multiple cellular targets. This is especially valuable in the discovery of pathways that can be targeted in drug discovery. Here, we report the development and application of a novel comparison-based approach that provides a statistical measurement on two classes of DNA, MeC and DAPI-positive global DNA, as nuclear targets. The algorithm compares the relative distribution of signals derived from these two targets (from two color channels), projects them onto scatter plots, and then measures the degree of similarity between the plotted signal distributions of cells within a population (22). This method offers a way to evaluate cellular response to external factors such as drugs and changes in culture conditions via dissimilarity assessment of relevant cellular structures.

Similarity between two data objects is perceived through measurement of the objects' proximity in a multi-dimensional space, and is used to express the objects' relationships within a cluster or between clusters obtained through a partitioning process. Distance and similarity measurements between objects forming a cluster have been defined as equivalent notions (39); however, appropriate metrics are required to identify objects with similar or dissimilar profiles. Commonly applied similarity measures can be organized into three groups according to object representation: (1) point-based, including Euclidean and Minkowski distances; (2) set-based, including Jaccard's, Tanimoto's, and Dice's (40) indices; and (3) probabilistic, with Bhattacharyya (41), Kullback-Leibler's, and correlation-based Mahalanobis (42) distances. In many practical applications the objects are described by discrete features, by which the similarity is assessed (39). Furthermore, sample homogeneity as a cluster quality measure can be perceived as an averaged pairwise object similarity (36, 39).
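To make the three groups concrete, the following minimal sketch implements one or two representatives of each using only the Python standard library. The function names and the handling of edge cases (empty sets, non-overlapping distributions) are our illustrative assumptions and are not taken from the cited references.

```python
import math


def minkowski(x, y, r=2.0):
    """Point-based: Minkowski distance of order r (r=2 gives the Euclidean distance)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def jaccard(a, b):
    """Set-based: Jaccard index |A & B| / |A | B|.

    Returning 1.0 for two empty sets is an assumption made for this sketch.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """Set-based: Dice index 2|A & B| / (|A| + |B|), with the same empty-set assumption."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def bhattacharyya(p, q):
    """Probabilistic: Bhattacharyya distance -ln(sum_i sqrt(p_i * q_i)).

    p and q are discrete distributions over the same states; the distance is
    reported as infinity when the two distributions do not overlap at all.
    """
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient) if coefficient > 0 else math.inf
```

The point-based and probabilistic functions return distances (larger means less alike), whereas the set-based indices return similarities (larger means more alike), which is one reason the choice of measure has to match the object representation.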

We utilized the Kullback-Leibler measure in our study; its background is introduced here. Let us consider a random discrete variable *X* with probability distribution *p* = {*p*_{i}}, where *p*_{i} is the probability for the system to be in the *i-*th state. The measure log(1/*p*_{i}) is called the unexpectedness or surprise (43). Two extreme states can occur: if *p*_{i} = 1, then the event is certain to happen, and if *p*_{i} ≈ 0, then the event is nearly impossible. Now, consider two discrete distributions *p* = {*p*_{i}} and *q* = {*q*_{i}}, where *p*_{i} and *q*_{i} are the probabilities of occurrence of the *i-*th state in a set of system states. The difference log(1/*q*_{i}) − log(1/*p*_{i}) defines the change of unexpectedness of the probability *p* with respect to probability *q*. Averaging this change of unexpectedness over *p*_{i} leads to:

$$
KL(p\,\|\,q) = \sum_{i} p_{i}\left[\log\frac{1}{q_{i}} - \log\frac{1}{p_{i}}\right] = \sum_{i} p_{i}\log\frac{p_{i}}{q_{i}} = H(p) + K(p,q) \tag{1}
$$

where *H*(*p*) = Σ_{i} *p*_{i} log *p*_{i} is the negative of Shannon's entropy (44) and *K*(*p*,*q*) = −Σ_{i} *p*_{i} log *q*_{i} is the measure of information referred to as inaccuracy (45). *KL*(*p*‖*q*) is nonnegative and delimited by the following constraints: Σ_{i} *p*_{i} = 1, and Σ_{i} *q*_{i} = 1.

The function *KL*(*p*‖*q*) is known as the Kullback-Leibler's divergence (46) of information linked to two probability distributions *p* and *q*. It is also a measure of how different two probability distributions over the same state space are. Typically, *p*_{i} represents data, observations, or a precisely calculated probability distribution, and *q*_{i} represents an “arbitrary” distribution, a model, a description, or an approximation of *p*_{i}. Following (46) it is assumed that: (i) 0 log(0/*q*_{i}) = 0; and (ii) terms in Eq. (1) where the denominator is zero are treated as undefined and are neglected in order to provide absolute continuity of *p*_{i} with *q*_{i}.
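The two conventions can be illustrated with a minimal sketch of the divergence for discrete distributions given as equal-length lists. The function name `kl_divergence` and the list-based interface are assumptions made for illustration; this is not the implementation used in the study.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i) in nats.

    p and q are sequences of probabilities over the same states.
    Convention (i): a term with p_i = 0 contributes 0.
    Convention (ii): a term with q_i = 0 is undefined and is neglected.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same states")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention (i): 0 * log(0 / q_i) = 0
        if qi == 0.0:
            continue  # convention (ii): zero denominator, term neglected
        total += pi * math.log(pi / qi)
    return total
```

For example, `kl_divergence([0.5, 0.5], [0.25, 0.75])` and `kl_divergence([0.25, 0.75], [0.5, 0.5])` give different values, which shows that the divergence is not symmetric in *p* and *q*. Note also that neglecting terms under convention (ii) means the result is no longer guaranteed to be nonnegative when *q* has zeros where *p* does not.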

The Kullback-Leibler's divergence can be used to measure the distance between various kinds of distributions (47). For instance, it has been employed in medical and systems biology applications including registration of image datasets (48), image segmentation (49), temporal analysis of gene expression (50), clustering of gene expression data (51), and similarity analysis of DNA sequences (52).

The assessment of object homogeneity is performed in two steps. First, distance-based similarity is measured between the combined 2D MeC/DAPI histogram of all nuclei and the histogram of each individual cell nucleus. Second, each nucleus (object) in the population is assigned to one of the predefined categories based on this similarity.
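The two steps can be sketched as follows, under several assumptions that are ours rather than the study's: each nucleus is given as paired lists of MeC and DAPI intensities, the number of bins, the category labels, and the threshold values are hypothetical placeholders, and the divergence is taken from the individual nucleus to the pooled population histogram.

```python
import math
from collections import Counter


def joint_histogram(mec, dapi, bins=4, max_value=256):
    """Count paired (MeC, DAPI) intensities on a bins x bins grid."""
    width = max_value / bins
    counts = Counter()
    for m, d in zip(mec, dapi):
        i = min(int(m / width), bins - 1)
        j = min(int(d / width), bins - 1)
        counts[(i, j)] += 1
    return counts


def normalize(counts):
    """Turn bin counts into a discrete probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a nucleus without any signal cannot be normalized")
    return {key: value / total for key, value in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) over dict-based distributions; bins where q is zero are skipped."""
    return sum(pv * math.log(pv / q[key])
               for key, pv in p.items()
               if pv > 0 and q.get(key, 0) > 0)


def assess_homogeneity(nuclei, thresholds=(0.5, 1.5)):
    """Step 1: divergence of each nucleus from the pooled 2D histogram.
    Step 2: assignment to a category by hypothetical thresholds."""
    labels = ("similar", "intermediate", "dissimilar")  # hypothetical labels
    histograms = [joint_histogram(mec, dapi) for mec, dapi in nuclei]
    pooled = Counter()
    for histogram in histograms:
        pooled.update(histogram)  # Counter.update adds the counts
    reference = normalize(pooled)
    results = []
    for histogram in histograms:
        divergence = kl_divergence(normalize(histogram), reference)
        category = labels[sum(divergence >= t for t in thresholds)]
        results.append((divergence, category))
    return results
```

Because the pooled histogram contains every nucleus, each bin occupied by an individual nucleus is also occupied in the reference, so no term has a zero denominator in this arrangement.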

Assessment of cell population homogeneity is not a trivial task, as it is constrained by the imaging modalities and the cell type itself. In a typical setting, the evaluation of cellular response to external factors such as drugs can be achieved by comparing the treated population with an untreated (reference) population. However, in this work we present a method to assess each population by itself, in isolation. These populations were analyzed *a posteriori* (i.e., without prior knowledge of relevant structural information). Our approach nevertheless also allows for a global assessment of cellular patterns among populations.

In 3D image analysis of nuclei, the segmentation of the nucleus and the quantification of residing features are the most vital components. A common scheme in existing approaches is the watershed algorithm followed by extraction of pertinent features (53–58). These solutions require the extraction of tens of features for clustering or classifier training before a pattern recognition task can be applied. Hence, an algorithm utilized for feature extraction and pattern recognition may be restricted by the morphology of a specimen, for which some features are redundant whereas others are irrelevant. Although some methods for cellular detection and segmentation have been proposed, a general-purpose system that can perform analysis and recognition tasks for a variety of confocal microscope images without necessitating an approach modification or system training (related to the target-specific applications) is still not available.

The main aim of this work is to develop a software system that can be robustly applied to the topological analysis of nuclear targets, such as MeC and DAPI, and that will provide useful parameters for the elucidation of epigenetic mechanisms as well as the evaluation of epigenetic drugs tested in cultured cell models. The algorithm developed combines three major tasks: (1) automated segmentation of nuclei in a cell population, (2) subsequent nuclear pattern extraction, and (3) distance-based statistical measurement of cell dissimilarity using Kullback-Leibler (K-L) divergence. This method draws on the strength of statistical evaluation of intra-nuclear MeC/DAPI patterns, which is especially valuable when cell population homogeneity is difficult to assess due to the lack of a standardized reference and sample size. In this study, we evaluate the potential of using an unsupervised 3D seeded watershed algorithm coupled with K-L divergence measurement to calculate the dissimilarity of mouse pituitary folliculostellate TtT-GF cell response to treatment with the demethylating agents 5-AZA and OCT. This response was quantitatively measured and displayed as the differential co-distribution of MeC/global DNA signals in treated and untreated cells. A comparison of K-L divergence with other commonly used similarity metrics demonstrates the superior performance of our method.