Computational analysis of cellular and subcellular structures aims to provide quantitative information (such as the measurement of physical quantities) that can be used to generate and test hypotheses related to normal and pathological eukaryotic cell characterization. Such studies have long been a major topic of biomedical research (see, for example, (1, 2)), and advances in microscope image acquisition systems and sophisticated image processing algorithms over the past decade have established computational analysis of cell images as an important component of cell biology research (3–6). Amongst many other interesting topics, image-based analysis of nuclear morphometry is a key problem due to the important roles that the cell nucleus plays in biology. Nuclear morphology and its associated changes have been studied in conjunction with cellular movements (7), cancer (8, 9), Hutchinson-Gilford progeria (10), as well as gene expression and protein synthesis (11), to name a few.

Both visual and computational approaches have been applied in characterizing nuclear morphology. For example, nuclear morphology can be visually rated on an objective scale of “normal” and “dysmorphic” (12), but this limits both reproducibility and the number of samples that can be tested. Alternatively, quantitative descriptors of nuclear morphology can be computed from images. Since it is difficult to fully control all physical and biological sources of variation in common experimental setups (e.g., cell cycle phase, focal plane position), most studies are statistical in nature: quantitative nuclear shape and size information is analyzed for significant, broad trends. This information is then analyzed in conjunction with different properties of cells or tissues with the goal of elucidating important relationships and increasing our understanding of fundamental biological concepts. To date, the vast majority of nuclear morphology studies have been based on the extraction of parameters related to shape and size and statistical analysis of their respective means, variation, and covariation (see (2, 8, 11, 13–16) for examples). While such approaches have produced useful results in distinguishing healthy and pathological tissues, as well as providing useful representations of shape distributions, important recent advances in the theory of shape statistics (17, 18) could improve the accuracy of such computations.

One of the key concepts arising from such theory is that shape spaces are inherently nonlinear and standard formulae often used for computing sample means, variances, etc. need to be modified to account for the nonlinearities. We use the following example to illustrate this concept. We first construct a distribution of shapes based on a medial axis parametric representation and show that the simple (Euclidean) average of medial axis coordinates does not necessarily represent the correct mean.

Let *a* represent a real valued random variable uniformly distributed in the closed interval [0,1/2]. A medial axis (a set of 2D coordinates representing a curve on the plane) is constructed based on the random variable *a* as

with *s* ∈ [0,*a*]. The boundary of each object is constructed by traveling a constant distance *d* in the normal direction from the medial axis *y*(*s*). Part A of Figure 1 shows a sampling of shapes created from such a model. Each shape is represented by the medial axis as well as its boundary. Morphometric studies aim to recover information about the shape distribution by extracting and analyzing information from tens, hundreds, or thousands of images containing the shapes of interest. Following this approach, one could be tempted to simply extract the medial axis model by fitting such a model to each shape. Note that for the purposes of this demonstration we do not consider algorithms for extracting medial axis representations, but rather assume these are given. Let *z^{k}*(*s*) represent the medial axis extracted from the *k*th shape. Assuming that the underlying geometry is a Euclidean vector space, an “average” medial axis is simply given by

$$\bar{z}(s) = \frac{1}{N} \sum_{k=1}^{N} z^{k}(s)$$

where *N* is the number of figures or shapes available. The Euclidean average of the medial axis distribution defined in Eq. (1) is shown in Figure 1. For comparison purposes, the known mean shape (defined by the medial axis representation in Eq. (1), with *s* ∈ [0,E{*a*}]) is also shown in Figure 1, part B. It is clear that the average shape computed by assuming a Euclidean vector space as the underlying geometry is incorrect; in fact, it produces a shape which cannot be represented using the model defined in Eq. (1). In portions where the medial axis is approximately linear, the Euclidean and the correct mean are close to each other. In parts where the medial axis does not closely approximate a straight line, however, the Euclidean average can produce large errors. This is because medial axis parameters are not elements of a Euclidean vector space, and therefore standard formulae for computing means, variances, covariances, etc., do not apply (19). In fact, it can be shown that the elements of medial axis representations belong to a nonlinear manifold (a Riemannian symmetric space), and standard statistical analysis methods such as principal component analysis (PCA) have to be modified to account for an appropriate notion of distance within the manifold (19). A more detailed explanation of this particular example is provided in the appendix, together with a description of the alternative method introduced below.
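The effect described above can be reproduced numerically with a minimal sketch. Since the curve of Eq. (1) is not reproduced here, the code assumes a stand-in medial axis (a unit-radius circular arc, implemented in the hypothetical function `medial_axis`), and it assumes that each extracted axis is resampled on a shared grid *t* ∈ [0,1] so that pointwise averaging is defined. Neither assumption is taken from the model above; they only serve to show how a pointwise Euclidean average can leave the family of curves that generated the data.

```python
import numpy as np

def medial_axis(s):
    """ASSUMED stand-in for Eq. (1): a unit-radius circular arc y(s)."""
    s = np.asarray(s, dtype=float)
    return np.stack([np.cos(2 * np.pi * s), np.sin(2 * np.pi * s)], axis=-1)

def sample_axes(a_values, n_points=50):
    """Return z^k(t) = y(a_k * t) on a shared grid t in [0, 1].

    The shared grid is an assumption made so that axes of different
    lengths can be averaged point by point. Output shape: (N, n_points, 2).
    """
    t = np.linspace(0.0, 1.0, n_points)
    return medial_axis(np.outer(a_values, t))

def euclidean_mean(axes):
    """Pointwise (Euclidean) average over the N sampled medial axes."""
    return axes.mean(axis=0)

def model_mean(expected_a, n_points=50):
    """Medial axis of the model evaluated at the mean parameter E{a}."""
    t = np.linspace(0.0, 1.0, n_points)
    return medial_axis(expected_a * t)

# a is uniformly distributed on [0, 1/2], so E{a} = 1/4.
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 0.5, size=2000)

axes = sample_axes(a)
z_bar = euclidean_mean(axes)      # Euclidean average of the medial axes
z_true = model_mean(0.25)         # axis generated by the mean parameter

# Distance between the two "means" at each grid point, and the distance
# of the Euclidean average from the origin (1 for every curve in the model).
gap = np.linalg.norm(z_bar - z_true, axis=1)
radius = np.linalg.norm(z_bar, axis=1)
```

In this sketch every sampled axis lies on the unit circle, so any curve the model can produce has `radius` equal to 1 everywhere. The Euclidean average drifts inside the circle as the arcs fan out, which mirrors the observation that the averaged shape cannot be represented by the generating model.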

The field of statistical shape analysis (17, 18) has long provided important tools for medicine and biology. We mention briefly a few of the major research directions in the area and their potential applications to nuclear morphometry. The landmark-based work pioneered by Kendall (20) and Bookstein (21) can yield valuable results, but it is not directly applicable to nuclei because corresponding landmarks between different nuclei are difficult to ascertain (although recent advances may circumvent such difficulties (22)). Shape analysis via medial axis representations (23) is also popular, and recent work by Fletcher et al. (19) has provided a mathematical basis for performing PCA based on medial axes extracted from image data. Medial axis representations, however, can be cumbersome to extract, especially for complex shapes (shapes with numerous “blobs” may require more than one medial axis) or for three-dimensional shapes.

An attractive alternative for statistical analysis of shapes is provided by the computational anatomy (CA) framework, where the goal is to quantify shape differences by analyzing the spatial transformations that map different elements of a population (24–26). Here the definition of a shape space is linked to the orbit of a template image (that is, the set of images composed through deformations of a template image) under smooth and invertible spatial transformations (diffeomorphisms). The framework can be extended to handle unlabeled landmarks, contours, as well as dense imagery in arbitrary dimensions and is therefore a viable candidate for modeling distributions of nuclear morphology. Here we show how tools derived from the CA framework can be used to characterize important features of nuclear shape. More specifically, using the large deformation metric mapping (LDMM) framework of Miller and coworkers (25, 27), combined with multidimensional scaling (MDS) (28), we offer methods for performing interpolation between two nuclear shapes, measuring “geodesic” distances between them, as well as computing the most representative (mean) shape from a distribution of nuclei. Although this methodology has been previously applied to brain imaging studies (see, for example, (29, 30)), we believe the work described here is the first to investigate the application of similar methods to nuclear morphology. Diffeomorphic methods have been recently applied to register nuclei in sets of either 2D or 3D images (41), but not as an approach to characterize nuclear shape distributions. Issues particular to nuclear morphology study, such as the lack of a standardized orientation, and initialization, are discussed. 
In addition, by combining classical MDS with distance measurements originating from the LDMM framework, we provide methods for estimating the intrinsic dimension (number of free parameters) of a nuclear shape distribution, as well as methods for visualizing its most significant variations. The combination of the LDMM and MDS frameworks constitutes a novel approach for characterizing the nonlinear properties of biological shape distributions and is in stark contrast to previous methods based on the analysis of deformation models using PCA (see, for example, (31)).
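To make the role of MDS concrete, the following is a minimal sketch of classical (Torgerson) MDS applied to a matrix of pairwise distances, together with one simple way of reading an intrinsic dimension off the eigenvalue spectrum. It is an illustration under stated assumptions rather than the procedure used in this work: the distances here are Euclidean distances between synthetic points standing in for LDMM distances between nuclei, and the function names `classical_mds` and `intrinsic_dimension`, as well as the 95% cumulative-eigenvalue threshold, are hypothetical choices.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical MDS: embed points so Euclidean distances approximate `dist`.

    Returns the embedding coordinates and all eigenvalues (descending).
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering turns squared distances into a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)       # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean distances, round-off) are clipped.
    top = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals

def intrinsic_dimension(eigvals, energy=0.95):
    """Smallest number of components whose eigenvalues reach `energy`
    of the total positive spectrum (ASSUMED heuristic, not from the text)."""
    positive = np.clip(eigvals, 0.0, None)
    cumulative = np.cumsum(positive) / positive.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

# Synthetic stand-in data: 30 points with two free parameters each.
rng = np.random.default_rng(1)
latent = rng.normal(size=(30, 2))
dist = np.linalg.norm(latent[:, None, :] - latent[None, :, :], axis=-1)

coords, eigvals = classical_mds(dist, n_components=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
dim = intrinsic_dimension(eigvals)
```

With genuinely Euclidean input, as in this synthetic case, the embedding reproduces the input distances and the spectrum has as many non-negligible eigenvalues as there are free parameters. With geodesic distances on a curved shape space, some eigenvalues may be negative, which is why the sketch clips them before taking square roots.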