From microbes to numbers: extracting meaningful quantities from images


For correspondence. E-mail; Tel. (+33) 1 40 61 38 91; Fax (+33) 1 40 61 33 30.


Light microscopy offers a unique window into the life and works of microbes and their interactions with hosts. Mere visualization of images, however, does not provide the quantitative information needed to reliably and accurately characterize phenotypes or test computational models of cellular processes, and is unfeasible in high-throughput screens. Algorithms that automatically extract biologically meaningful quantitative data from images are therefore an increasingly essential complement to the microscopes themselves. This paper reviews some of the computational methods developed to detect, segment and track cells, molecules or viruses, with an emphasis on their underlying assumptions, limitations, and the importance of validation.


Since the first observation of ‘animalcules’ by Antonie van Leeuwenhoek in 1676 through handcrafted lenses, the study of microorganisms has been intimately linked to the history of microscopy. The recording of these first images relied entirely on the investigator's ability to capture observations accurately in drawings. Modern microscopes, by contrast, provide faithful digital snapshots or movies containing far more detail than a human observer could possibly reproduce, and make it easy to observe hundreds or thousands of microorganisms simultaneously. Visual inspection is an important first step, but the information contained in images generally needs to be quantified, for several reasons. First, because biological phenotypes are rarely black or white, differences must be assessed statistically based on numeric data. Second, visual inspection of images generated by large-scale screening projects (Neumann et al., 2010) is unfeasible. Third, and more fundamentally, quantification is needed to confront observations with computational models of biological functions (Jaqaman and Danuser, 2006; Karr et al., 2012). For example, quantitative analyses of trajectories are required to determine whether the erratic displacements of a molecule under the microscope are due to free diffusion, are constrained by obstacles, or, on the contrary, are driven by an active process (Saxton and Jacobson, 1997). Thus, images must first be converted into biologically meaningful quantities, such as the number of bacteria in a colony, the lengths of filopodia, or the trajectory of a molecule. To some extent, such measurements can be extracted from the images manually, but the results are inaccurate, subjective and irreproducible, and the process is tiresome, expensive and does not scale well with the quantity of images. Automated computational methods, by contrast, offer the prospect of objective, reproducible, accurate, economical and scalable analysis.
In the following, we provide a non-technical (and non-exhaustive) overview of some methods to computationally extract biological measurements from light microscopy data. We focus on the extraction of simple and generic quantities such as cell shape, molecule positions, and trajectories. Although such measurements are often only intermediates in a longer processing pipeline (for example, trajectories may be further analysed to compute diffusion coefficients), their extraction is usually the most difficult and rate-limiting step.
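As an illustration of the kind of downstream analysis mentioned above, the following sketch estimates a diffusion coefficient D from a single 2D trajectory through its mean squared displacement (MSD), assuming free (unconfined, undirected) diffusion, for which MSD(τ) = 4Dτ. The function names and simulation parameters are illustrative assumptions, and localization errors, which add an offset to the MSD, are ignored.

```python
import numpy as np

def msd(track, max_lag):
    """Time-averaged mean squared displacement of a (T, 2) trajectory for lags 1..max_lag."""
    track = np.asarray(track, dtype=float)
    return np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in range(1, max_lag + 1)
    ])

def diffusion_coefficient(track, dt, max_lag=10):
    """Fit MSD(tau) = 4 D tau (2D free diffusion) with a least-squares line through the origin."""
    lags = dt * np.arange(1, max_lag + 1)
    m = msd(track, max_lag)
    slope = np.sum(lags * m) / np.sum(lags * lags)  # slope of best line through origin
    return slope / 4.0

# Simulate free 2D Brownian motion with a known D (arbitrary units, hypothetical values):
# each coordinate step has variance 2 D dt.
rng = np.random.default_rng(0)
D_true, dt, n_steps = 0.5, 0.1, 20000
steps = rng.normal(scale=np.sqrt(2 * D_true * dt), size=(n_steps, 2))
track = np.cumsum(steps, axis=0)
D_est = diffusion_coefficient(track, dt)
print(f"true D = {D_true}, estimated D = {D_est:.3f}")
```

Departures from a straight line through the origin (an MSD that levels off, or one that grows faster than linearly) are commonly used to diagnose confined or directed motion.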

In general, extracting quantitative biological data from an image means solving an ill-posed problem, i.e. a problem that has no unique solution unless additional information or assumptions, not contained in the image itself, are used to discriminate among otherwise equally valid solutions. Countless methods have been, and continue to be, developed in the field of computer vision, a rapidly growing subset of them specifically for biological applications. To a large extent, different analysis methods rest on different assumptions (explicit or not), whose validity strongly depends on the application and data set. We therefore highlight the main assumptions behind the methods and end with a brief discussion of the crucial task of validation.

Detecting and outlining cells

Perhaps the most straightforward goal when processing a cellular microscopy image is to identify and outline each individual cell. In image processing parlance, these tasks are called detection and segmentation respectively (these and other technical terms are defined in the Appendix, 'Some image processing terms used in this review'). Cell detection makes it possible, for example, to measure the growth of a bacterial microcolony, while segmentation makes it possible to measure cellular morphology, for example to identify red blood cells infected by malaria parasites, or to analyse cell polarization in chemotaxis or defects in cell division (Di Ruberto et al., 2002; Sliusarenko et al., 2011).

The simplest and fastest detection and segmentation technique is intensity thresholding (Fig. 1a), which implicitly assumes that the intensities of cell and background pixels have well-separated distributions (histograms). Unfortunately, this is rarely the case, because images are corrupted by noise, i.e. random intensity fluctuations due to, e.g., limited brightness or thermal electrons in cameras. Therefore, thresholding is often applied after a denoising step, e.g. blurring the image with a Gaussian filter, which spreads the intensity across neighbouring pixels. The classification of a pixel then no longer depends only on its own initial intensity, but also on that of its neighbourhood. The implicit assumption here is that objects and the background extend over several spatially connected pixels. However, even when images are well contrasted, thresholding-based segmentation can fail, as shown in Fig. 1a, where two Listeria bacteria in contact were segmented as a single object. Keeping touching objects apart is a recurrent problem in image segmentation. Region-growing methods such as the watershed transform can separate touching objects, but often lead to over-segmentation (objects are split into multiple pieces) and still require some cue in the image to locate the boundaries accurately, e.g. stronger image gradients. Such information is absent from the image in Fig. 1a. Yet our eyes have no difficulty in recognizing the two bacteria despite the lack of a visible boundary, because we have seen similar cells in isolation and therefore know their approximate shape and appearance. Template matching algorithms are a simple computational implementation of this kind of pattern recognition. However, in its basic form, template matching does not explicitly distinguish cells from the background and therefore fails when the image region containing the cell differs strongly from the template image, as in fact happens when cells touch (Fig. 1a).
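The denoise-then-threshold strategy, and its failure on touching cells, can be reproduced on a synthetic image. The sketch below assumes SciPy's ndimage module, a Gaussian filter of width 1 pixel and a crude automatic threshold halfway between the minimum and maximum smoothed intensities; all function names and parameters are hypothetical.

```python
import numpy as np
from scipy import ndimage

def threshold_segment(image, sigma=1.0, threshold=None):
    """Denoise with a Gaussian filter, threshold, and label connected regions.

    Returns (labels, n_objects). If no threshold is given, the midpoint between the
    smoothed image's minimum and maximum is used (a crude, hypothetical default).
    """
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma=sigma)
    if threshold is None:
        threshold = 0.5 * (smoothed.min() + smoothed.max())
    mask = smoothed > threshold
    labels, n_objects = ndimage.label(mask)  # 4-connectivity by default in 2D
    return labels, n_objects

def make_image(centres, shape=(64, 64), half_length=6, half_width=2, noise=0.2, seed=0):
    """Hypothetical test image: bright horizontal rectangles ('bacteria') plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    img = np.zeros(shape)
    for (r, c) in centres:
        img[r - half_width:r + half_width + 1, c - half_length:c + half_length + 1] = 1.0
    return img + rng.normal(scale=noise, size=shape)

# Two well-separated cells...
_, n_apart = threshold_segment(make_image([(20, 15), (45, 45)]))
# ...versus two cells touching end to end (cf. Fig. 1a).
_, n_touching = threshold_segment(make_image([(32, 20), (32, 33)]))
print(n_apart, n_touching)
```

With these settings, the well-separated pair yields two labelled objects while the touching pair yields a single one, mirroring the failure shown in Fig. 1a.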
Better results can be obtained with methods that incorporate explicit assumptions about cell shapes, independently of the background. As an example, we used a mathematical model of multiple fluorescent rods to detect several touching and overlapping Shigella bacteria. This was done via an optimization approach, in which the position and orientation parameters of the rods were iteratively changed until the modelled image closely resembled the real image (Fig. 1b) (Zhang et al., 2006). This approach is, however, restricted to rod-like objects and cannot be used for cells with more plastic shapes, such as amoebae (Fig. 1c). For these highly deformable cells, a broader class of optimization methods called deformable models is more appropriate. In these methods, a mathematical curve (or surface, for 3D images) is iteratively deformed under the action of several ‘forces’ until it captures the cell boundaries (Leymarie and Levine, 1993). Some of these forces are computed from the image itself and tend, for example, to attract the contour towards regions of high intensity gradients, which most likely correspond to cell boundaries. Other forces are derived from assumptions about the object shapes, for example that the cell contour is smooth, as for amoebae (Fig. 1c) (Zimmer et al., 2002), round, as for leucocytes (Ray et al., 2002), or filamentous, as for cytoskeletal filaments (Smith et al., 2010). Deformable models thus provide a flexible framework for incorporating assumptions about the segmented objects that suit the application of interest. A popular class of deformable models uses so-called ‘level set’ techniques, with which iteratively evolving contours can naturally split or merge. This topological flexibility entails several benefits, such as automatic initialization (Fig. 1d), automatic handling of cell divisions (see below), and comparatively easy extension to 3D images (Dufour et al., 2005; Li et al., 2008; Dzyubachyk et al., 2010).
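The rod-fitting idea can be sketched for a single rod, assuming a fixed rod length and Gaussian blur width (both hypothetical values) and a noise-free synthetic image. The parameters (x, y, θ, A) are adjusted by SciPy's Nelder-Mead minimizer until the sum of squared differences between model and image is minimal; this illustrates the principle only and is not the multi-rod implementation of Zhang et al. (2006).

```python
import numpy as np
from scipy.optimize import minimize

def rod_image(params, shape, length=10.0, sigma=1.5):
    """Model image of one fluorescent rod: a line segment of given length (pixels),
    blurred by a Gaussian of width sigma. params = (x, y, theta, A)."""
    x0, y0, theta, amplitude = params
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    ux, uy = np.cos(theta), np.sin(theta)
    # Project each pixel onto the rod axis and clamp to the segment's extent
    t = np.clip((xx - x0) * ux + (yy - y0) * uy, -length / 2, length / 2)
    # Distance from each pixel to its closest point on the segment
    dx, dy = xx - (x0 + t * ux), yy - (y0 + t * uy)
    return amplitude * np.exp(-(dx**2 + dy**2) / (2 * sigma**2))

def fit_rod(image, initial_guess):
    """Iteratively adjust (x, y, theta, A) until the model image best matches the data."""
    cost = lambda p: np.sum((rod_image(p, image.shape) - image) ** 2)
    result = minimize(cost, initial_guess, method="Nelder-Mead",
                      options={"xatol": 1e-4, "fatol": 1e-8, "maxiter": 4000})
    return result.x

true_params = (16.3, 15.6, 0.6, 100.0)          # hypothetical ground truth
image = rod_image(true_params, (32, 32))
x, y, theta, amplitude = fit_rod(image, initial_guess=(15.0, 17.0, 0.3, 80.0))
print(f"x={x:.2f}, y={y:.2f}, theta={theta:.2f}, A={amplitude:.1f}")
```

Because a rod is symmetric under a rotation by π, orientations should be compared modulo π; like any local optimizer, the fit also depends on a reasonable initial guess.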
However, a significant drawback of these methods is their relatively long computation time and the lack of any guarantee that the best solution (the global optimum) is found. Computation time can be reduced considerably by using explicitly represented models, e.g. triangular meshes (Dufour et al., 2011). Another promising approach, popularized in recent years, is provided by ‘graph cut’ methods, which combine efficiency and global optimality but have yet to be tested on a wide range of microscopy data sets (Shi and Malik, 2000; Daněk et al., 2009).

Figure 1.

Extracting cell positions, shapes and trajectories from images.

a. Segmentation of cells by intensity thresholding. Left: raw image of fluorescent Listeria bacteria. Right: segmentation result (each colour corresponds to a distinct segmented object). Note that the two touching cells (arrow) are not distinguished.

b. Detection and localization of cells by optimizing a model of multiple rods. From left to right: raw image of Shigella bacteria, raw image with detected rods shown as red bars, corresponding model image, and segmentation by thresholding for comparison.

c. Segmentation and tracking by active contours. A differential interference contrast image of Entamoeba histolytica cells is shown. Each superposed coloured curve is a distinct active contour. Inset: trajectories of five cells tracked by active contours.

d. Segmentation by level sets. A fluorescence microscopy image of three Plasmodium falciparum sporozoites is shown with superposed level set contours (red curves). The level set evolves from an automatically defined initialization (left) to the final segmentation (right, green curves).

e. Simulated images of a fluorescent rod with added noise (SNR as indicated).

f. Quantitative validation of the rod-detection algorithm on the simulated images. CDR indicates the rate of correct detection; RMSE indicates the root mean square error (a measure of accuracy) in estimates of the rod's centre (x,y), orientation (θ) and intensity (A).

g. Fluorescence image of a yeast nucleus (two projections of a 3D image stack). The bright green dot (arrow) corresponds to a single GFP-labelled chromatin locus. The nuclear envelope is also stained with GFP, while the nucleolus is labelled with mCherry (red). Scale bar, 1 μm.

h. Qualitative validation of a method to map the nuclear territories of chromosomal loci in yeast by automated analysis of images such as shown in panel g.

The heat maps show the positioning probability of the spindle pole body (SPB) and a ribosomal DNA gene (rDNA) in a co-ordinate system defined by the centre of the nucleus (centre of yellow dashed circle) and the centre of the nucleolus (centre of dotted red curve). The SPB is known to be embedded in the nuclear envelope, and the rDNA in the nucleolus, which is known to occupy a crescent-shaped region located opposite the SPB. The computed maps agree with this expectation, thus qualitatively validating the method. Panels b, e and f are reproduced from (Zhang et al., 2006) (Copyright © 2006, IEEE), panel c (inset) from Zimmer et al. (2002) (Copyright © 2002, IEEE) and panels g and h from (Berger et al., 2008).

Detecting and localizing molecules, viruses and other particles

Images of cells display a large variety of shapes and appearances because microscopes can reveal many of their details. Molecules or viruses, by contrast, are generally smaller than the resolution of standard light microscopes (∼ 0.2–0.3 μm). As a consequence, they are visible only as a fuzzy light spot, called the point spread function (PSF), which does not reflect any biological details but merely the optical properties of the microscope. This, however, greatly simplifies computational analysis, because for a spatially isolated subresolution object the image can be accurately described by a rather simple mathematical model. In this well-defined setting, algorithms can be rigorously designed to meet certain desired performance standards. For example, detecting a particle by thresholding the image blurred with the PSF can ensure that the probability of false detection does not exceed a given value. Once a particle has been detected, its position can be computed with an accuracy that is limited neither by the pixel size nor by the optical resolution, but only by the signal-to-noise ratio (SNR). This is typically done by an optimization procedure, in which the function approximating the PSF is iteratively displaced until it most closely matches the image, although non-iterative algorithms are also employed (Cheezum et al., 2001; Parthasarathy, 2012). Theory provides a fundamental limit to the localization accuracy, which can serve as a reference for assessing algorithms (Thompson et al., 2002; Ober et al., 2004). In practice, localization accuracies of about 1 nm have been achieved for single molecules, making it possible, for example, to resolve the individual steps of myosin along actin filaments (Yildiz et al., 2003).
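A minimal sketch of detection followed by subpixel localization, assuming the PSF is approximated by a 2D Gaussian (a common approximation, not the exact PSF) and using SciPy's Gaussian filter for detection and curve_fit for an unweighted least-squares fit. The spot parameters are arbitrary, and the fit ignores refinements such as maximum-likelihood estimation under Poisson noise.

```python
import numpy as np
from scipy import ndimage
from scipy.optimize import curve_fit

def gaussian_spot(coords, amplitude, x0, y0, sigma, background):
    """2D Gaussian approximation of the PSF, evaluated at (x, y) pixel coordinates."""
    x, y = coords
    return background + amplitude * np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2))

def localize(image, psf_sigma=1.3):
    """Detect the brightest spot and refine its position to subpixel precision."""
    # Detection: filtering with a PSF-sized Gaussian boosts the spot relative to noise
    smoothed = ndimage.gaussian_filter(image.astype(float), psf_sigma)
    row, col = np.unravel_index(np.argmax(smoothed), image.shape)
    # Localization: least-squares fit of the Gaussian model to the raw pixels
    yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    coords = np.vstack([xx.ravel(), yy.ravel()]).astype(float)
    p0 = [image.max() - np.median(image), col, row, psf_sigma, np.median(image)]
    popt, _ = curve_fit(gaussian_spot, coords, image.ravel().astype(float), p0=p0)
    return popt[1], popt[2]  # (x0, y0), not restricted to the pixel grid

# Simulated single molecule with Poisson (shot) noise; hypothetical parameters
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:21, 0:21]
truth = (10.37, 9.62)
expected = gaussian_spot((xx, yy), 500.0, truth[0], truth[1], 1.3, 10.0)
image = rng.poisson(expected)
x_est, y_est = localize(image)
print(f"true ({truth[0]:.2f}, {truth[1]:.2f}), estimated ({x_est:.2f}, {y_est:.2f})")
```

Repeating the simulation with fewer photons (a lower amplitude) shows the localization error growing as the SNR drops, which is the dependence described above.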
Another example where single particle localization algorithms are applicable is provided by genomic loci, which can be labelled and visualized individually and thus localized with high accuracy relative to other loci (labelled in a different colour) or to other landmarks in the cell (Fig. 1g). Combined with automated detection of thousands of cells, such algorithms have made it possible to map the territories of genes within the ∼ 2 μm long Caulobacter crescentus bacterium (Viollier et al., 2004) and within the similarly sized nucleus of the yeast Saccharomyces cerevisiae (Berger et al., 2008) (Fig. 1h). This led to high-resolution views of the spatial organization of the genome in these organisms despite their small size.
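A simplified sketch of the co-ordinate system described for Fig. 1h: each locus is expressed relative to an origin at the nuclear centre and an axis pointing towards the nucleolar centre, reduced to (position along the axis, distance from the axis) under the assumption of rotational symmetry around that axis, and loci from many cells are pooled into a normalized 2D histogram. Function names, bin settings and the simulated cells are hypothetical; this is not the published pipeline of Berger et al. (2008).

```python
import numpy as np

def to_nuclear_frame(locus, nucleus_centre, nucleolus_centre):
    """Express a locus position in a frame whose origin is the nuclear centre and whose
    axis points towards the nucleolus centre. Returns (along_axis, distance_from_axis)."""
    locus, nc, no = (np.asarray(v, dtype=float) for v in (locus, nucleus_centre, nucleolus_centre))
    axis = (no - nc) / np.linalg.norm(no - nc)
    rel = locus - nc
    along = rel @ axis
    # Assuming rotational symmetry around the axis, only the distance from it is kept
    perp = np.linalg.norm(rel - along * axis)
    return along, perp

def territory_map(cells, bins=20, extent=1.5):
    """Pool loci from many cells (each a (locus, nucleus, nucleolus) triple, in micrometres)
    into a 2D probability map; loci outside the chosen extent are ignored."""
    coords = np.array([to_nuclear_frame(*cell) for cell in cells])
    hist, _, _ = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins,
                                range=[[-extent, extent], [0, extent]])
    return hist / hist.sum()

# Hypothetical population: loci scattered around the nuclear centre of each cell
rng = np.random.default_rng(0)
cells = [(rng.normal(scale=0.5, size=3), np.zeros(3), np.array([0.4, 0.0, 0.0]))
         for _ in range(1000)]
heat = territory_map(cells)
```

The resulting array can be displayed as a heat map; aligning every cell on the same nucleus-nucleolus axis is what allows thousands of small nuclei to be combined into a single map.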

In most cases, of course, molecules of interest are not spatially isolated from each other, which apparently precludes the use of single particle localization. However, some of the most powerful microscopy techniques to emerge in recent years make it possible to construct images of dense molecular samples with more than 10-fold gains in spatial resolution over standard microscopy (Moerner, 2012) (review by J. Xiao, this issue). These methods crucially rely on the accurate computational localization of single photoswitchable dyes in long image sequences.
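The final rendering step of such localization-based techniques can be sketched as follows, assuming localizations (in nm) have already been obtained from many frames, for instance by fitting each isolated spot: each localization is simply accumulated into a grid whose pixels are much smaller than the optical resolution. The emitter positions, the 10 nm precision and the 10 nm pixel size are hypothetical values.

```python
import numpy as np

def render_localizations(xy_nm, pixel_nm=10.0, field_nm=400.0):
    """Bin single-molecule localizations (in nm) into an image with pixels much smaller
    than the diffraction limit. Returns a 2D count histogram (rows = y, columns = x)."""
    edges = np.arange(0.0, field_nm + pixel_nm, pixel_nm)
    counts, _, _ = np.histogram2d(xy_nm[:, 1], xy_nm[:, 0], bins=[edges, edges])
    return counts

# Two emitters 100 nm apart (closer than the ~250 nm optical resolution), each
# localized many times with ~10 nm precision
rng = np.random.default_rng(0)
emitters = np.array([[150.0, 200.0], [250.0, 200.0]])
locs = np.concatenate([e + rng.normal(scale=10.0, size=(1000, 2)) for e in emitters])
image = render_localizations(locs)
profile = image.sum(axis=0)  # intensity profile along x
```

The profile shows two well-separated peaks near 150 and 250 nm, whereas a conventional image of the same two emitters would show a single diffraction-limited blur.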

Extracting trajectories

Light microscopy makes it possible to monitor the dynamics of individual pathogens or molecules in vivo. Computational tracking of these moving objects can help us understand the mechanisms of, say, virus transport inside infected cells, bacterial chemotaxis, or protein diffusion inside bacteria (Berg and Brown, 1972; Deich et al., 2004; Brandenburg and Zhuang, 2007). A variety of tracking algorithms have been developed for this purpose (Sbalzarini and Koumoutsakos, 2005; Sergé et al., 2008; Smal et al., 2008; Godinez et al., 2009), often inspired by earlier work on radar data for military applications. In many cases, tracking is performed after an initial detection or segmentation step on all frames of the image sequence. Tracking then reduces to linking detected objects across consecutive frames, thereby constructing trajectories. This linking is rather straightforward for isolated objects such as a genomic locus (Sage et al., 2005), or when objects are well separated from one another and move slowly. Assuming that objects cannot move over large distances, each detected object can then simply be linked to its nearest neighbour among the objects detected in the next frame. This task can be complicated by detection errors, i.e. false detections or missed detections, caused by low SNR or by objects moving out of the imaging volume. Linking also becomes ambiguous when the density of objects is high and/or the temporal resolution is low, such that objects can move more than the typical inter-object distance from one frame to the next. One approach to reducing this ambiguity is to use appearance features that distinguish objects from one another and remain approximately constant, such as shape, orientation, intensity or texture (Rasmussen and Hager, 2001; Jaqaman et al., 2008). Unfortunately, this strategy fails for most subresolution objects such as single molecules, whose images are essentially indistinguishable from one another.
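The nearest-neighbour linking described above can be made somewhat more robust by solving all links between two frames jointly, so that two objects cannot claim the same detection. The sketch below assumes SciPy's linear_sum_assignment (a standard linear assignment solver) and a hypothetical maximum allowed displacement; it is a simplified frame-to-frame step, not the full method of any cited work (Jaqaman et al. (2008), for instance, also handle gaps, splits and merges).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def link_frames(prev_xy, next_xy, max_distance):
    """Link detections in consecutive frames by minimizing the total squared displacement,
    forbidding any link longer than max_distance. Returns sorted (i_prev, j_next) pairs;
    unlinked detections correspond to disappearances or newly appearing objects."""
    prev_xy, next_xy = np.asarray(prev_xy, float), np.asarray(next_xy, float)
    if len(prev_xy) == 0 or len(next_xy) == 0:
        return []
    dist = cdist(prev_xy, next_xy)
    # Forbidden links get a prohibitive but finite cost so that the solver always succeeds
    big = 1e6 * (max_distance ** 2 + 1.0)
    cost = np.where(dist <= max_distance, dist ** 2, big)
    rows, cols = linear_sum_assignment(cost)
    # Discard any forced assignment that exceeds the allowed displacement
    return sorted((int(i), int(j)) for i, j in zip(rows, cols) if dist[i, j] <= max_distance)

prev = [[0.0, 0.0], [10.0, 10.0]]
nxt = [[10.5, 10.0], [0.5, 0.2], [50.0, 50.0]]   # third detection is a new object
links = link_frames(prev, nxt, max_distance=3.0)
```

Here both objects are linked to their displaced copies, and the distant third detection is left unlinked, to be treated as the start of a new trajectory.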
The only information available for linking, then, is contained in the trajectories themselves. Thus, assumptions about the dynamics of the particles are needed to define the most plausible trajectories and to perform linking accordingly (Genovesio et al., 2006; Jaqaman et al., 2008; Meijering et al., 2012). In algorithms designed to track aircraft or missiles, these assumptions can rely on the laws of classical mechanics: because the inertia of these objects is large, their trajectories are mostly smooth and predictable over short time intervals. In most biological contexts, by contrast, inertia is negligible and Brownian motion, characterized by small jittery displacements, is ubiquitous. However, molecular motors or bacteria propelled by flagella obey other, more complex motion models. Because the type of motion is usually not well known a priori (determining it is often precisely the goal of tracking), sophisticated algorithms have been developed that assume multiple motion models between which particles can switch stochastically (Mazor et al., 1998; Genovesio et al., 2006; Li et al., 2008; Feng et al., 2011). Such algorithms attempt to learn the motion behaviour of a particle during periods when it can be tracked unambiguously (e.g. when the particle is alone in its vicinity) and assume similar behaviour when making linking decisions in more ambiguous situations (e.g. when the particle moves through a crowded region). One advantage of microscopy over radar is that tracking does not have to be performed in real time, but can be done off-line (after imaging is completed). As a consequence, algorithms can take advantage of the entire image sequence to optimize the trajectories. However, as the number of frames increases, the number of possible trajectories increases exponentially and computation times become unreasonably long.
Therefore, intermediate strategies are needed, which restrict the number of examined possibilities, trading off optimality against speed (Genovesio et al., 2006; Jaqaman et al., 2008).
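A toy version of the idea of learning a particle's motion behaviour can be sketched as follows, assuming just two hypothetical motion models (random walk and constant velocity) and choosing the one with the smaller one-step-ahead prediction error on an unambiguous stretch of track. Published multiple-model trackers instead combine probabilistic filters and switch between models stochastically (e.g. interacting multiple model filters; Mazor et al., 1998); the function names here are illustrative only.

```python
import numpy as np

def predict(history, model):
    """Predict the next position from past positions (array of shape (T, 2), T >= 2)."""
    if model == "brownian":   # no preferred direction: best guess is the last position
        return history[-1]
    if model == "directed":   # constant velocity: extrapolate the last displacement
        return history[-1] + (history[-1] - history[-2])
    raise ValueError(f"unknown motion model: {model}")

def select_motion_model(track, models=("brownian", "directed")):
    """Return the model with the smallest mean one-step-ahead prediction error on the
    track, together with the error of every model. The chosen model could then be used
    to predict positions (and gate candidate links) where linking is ambiguous."""
    track = np.asarray(track, dtype=float)
    errors = {}
    for m in models:
        errs = [np.linalg.norm(track[t] - predict(track[:t], m)) for t in range(2, len(track))]
        errors[m] = float(np.mean(errs))
    return min(errors, key=errors.get), errors

# A motor-like particle moving steadily, and a particle jittering back and forth
steady = [(2.0 * i, 0.0) for i in range(6)]
jitter = [(float(i % 2), 0.0) for i in range(6)]
print(select_motion_model(steady)[0], select_motion_model(jitter)[0])
```

For the steady track the constant-velocity prediction is exact, whereas for the jittering track extrapolating the last step overshoots, so the random-walk model is preferred.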

Tracking is not always performed independently of detection. For example, with deformable models, the detection or segmentation of an object in a time series often relies on an initial guess taken simply from the segmentation in the previous frame (Leymarie and Levine, 1993; Ray et al., 2002; Zimmer et al., 2002); similar strategies are sometimes used to track objects with template matching, where the template image is updated using the object's position computed in the previous frame. Another approach is to consider the entire movie as a single volume (i.e. treating time as an additional spatial dimension) and to find ‘tubes’ that traverse this volume from beginning to end while passing through regions most likely to contain objects (Bonneau et al., 2005; Padfield et al., 2009; Luengo-Oroz et al., 2012). One advantage of this approach is its relative tolerance to transient object disappearance; however, it may place stronger constraints on motion models and require better temporal resolution.
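The spatio-temporal volume idea can be illustrated crudely by labelling connected components in a thresholded (t, y, x) stack, so that each component is a 'tube' crossing the movie. This is a simplified sketch assuming that objects overlap from one frame to the next; unlike the cited methods, which search for optimal paths, plain connectivity does not tolerate an object vanishing for even one frame. The function name and test movie are hypothetical.

```python
import numpy as np
from scipy import ndimage

def tubes_from_stack(stack, threshold):
    """Treat a (t, y, x) movie as one 3D volume: voxels above threshold that are
    connected across space *and* time form 'tubes', each taken as one track."""
    labels, n_tubes = ndimage.label(stack > threshold, structure=np.ones((3, 3, 3)))
    tracks = []
    for k in range(1, n_tubes + 1):
        t, y, x = np.nonzero(labels == k)
        # One position per frame: centroid of the tube's cross-section in that frame
        tracks.append([(int(f), y[t == f].mean(), x[t == f].mean()) for f in np.unique(t)])
    return tracks

# Hypothetical movie: a 3x3 spot drifting one pixel per frame, plus a static spot
movie = np.zeros((5, 20, 20))
for f in range(5):
    movie[f, 4:7, 3 + f:6 + f] = 1.0
    movie[f, 14:17, 14:17] = 1.0
tracks = tubes_from_stack(movie, threshold=0.5)
```

Each returned track lists (frame, y, x) centroids; the drifting spot ends at x = 8 in the last frame while the static spot stays at (15, 15).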

Many tracking algorithms assume that individual objects remain single entities throughout the movie. However, viral capsids can disassemble, vesicles can merge, and cells divide. These possibilities are accounted for in methods that allow objects to split or merge (Genovesio et al., 2006; Jaqaman et al., 2008). Deformable models are among the methods well suited to tracking dividing cells, since they can handle topological changes either naturally (for level sets) or with the help of additional procedures (Zimmer et al., 2002; Dufour et al., 2005; 2011; Dzyubachyk et al., 2010). Because cells can divide but usually do not fuse, special constraints such as repulsive forces have been introduced to restrict the topological flexibility of level sets so that models can split but not merge (Dufour et al., 2005; Dzyubachyk et al., 2010). Methods that can track dividing cells have enabled the reconstruction of the lineage history of growing populations of bacteria (Wang et al., 2010; Sliusarenko et al., 2011), human stem cells (Li et al., 2008) or cells of the developing Caenorhabditis elegans embryo (Carranza et al., 2011).
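Split-but-not-merge bookkeeping of the kind needed for lineage reconstruction can be sketched with label overlap between consecutive segmented frames: each cell in frame t+1 is assigned to the single frame-t cell that covers most of it, so a parent may acquire two daughters (a division) but a daughter can never have two parents (no fusion). This overlap rule is an illustrative assumption, not the procedure of the cited works, and the function names are hypothetical.

```python
import numpy as np

def overlap_links(labels_t, labels_t1, min_overlap=0.5):
    """Link labelled cells in frame t to labelled cells in frame t+1 by pixel overlap.
    A cell in t+1 is assigned to the parent covering most of it (at least min_overlap
    of its area). Returns {parent_label: [child_labels]}."""
    links = {}
    for child in np.unique(labels_t1):
        if child == 0:                        # 0 = background
            continue
        under = labels_t[labels_t1 == child]  # frame-t labels beneath this child
        parents, counts = np.unique(under[under > 0], return_counts=True)
        if len(parents) and counts.max() >= min_overlap * under.size:
            links.setdefault(int(parents[np.argmax(counts)]), []).append(int(child))
    return links

def divisions(links):
    """Parents with two or more children are interpreted as divisions."""
    return {p: c for p, c in links.items() if len(c) >= 2}

# Hypothetical segmentations: one rod-shaped cell that has divided by the next frame
frame_t = np.zeros((15, 25), dtype=int)
frame_t[5:10, 5:21] = 1
frame_t1 = np.zeros_like(frame_t)
frame_t1[5:10, 5:12] = 1
frame_t1[5:10, 14:21] = 2
links = overlap_links(frame_t, frame_t1)
print(divisions(links))
```

Chaining such parent-to-children maps over all frames yields a lineage tree.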


Countless image analysis algorithms are now available, either commercially or as free open-source tools, often bundled into larger software platforms (Table 1) (Eliceiri et al., 2012). It is therefore not difficult to find and use tools for segmentation, tracking or other analyses. Unless the software crashes, it will always provide some quantitative results. The key question is whether (and to what extent) the results are reliable: automatically extracted numbers should not be automatically trusted. A related question is how well a specific algorithm performs compared with others. Validation and benchmarking are central issues that can be addressed with three complementary approaches.

Table 1. Some general-purpose image processing platforms used in biological imaging
Software name    Accessibility
ImageJ           Free, open-source
Fiji             Free, open-source
Icy              Free, open-source
CellProfiler     Free, open-source
BioImageXD       Free, open-source
Imaris           Commercial
Volocity         Commercial
MetaMorph        Commercial

Whenever possible, extracted data should be subjected to visual quality control by a human expert, for example by inspecting cell trajectories superimposed on the raw images. When this is not feasible for all images, it should be done on a representative, or at least randomly chosen, subset. Such inspection can reveal gross errors and help set processing parameters such as intensity thresholds. With more effort, manual analysis can provide an approximate ground truth allowing semi-quantitative validation, but it is not suited to determining an algorithm's performance accurately, for example detection rates in low-SNR images of single molecules.

Simulated images, on the other hand, provide a perfectly known ground truth and can be generated massively at negligible cost, making it possible to characterize algorithms rigorously and extensively using quantitative performance metrics. For example, artificial images such as those shown in Fig. 1e were used to quantify the detection rate and the localization accuracy of our rod detection method (Fig. 1f) (Zhang et al., 2006). By varying simulation parameters (such as the SNR, object shapes, speeds, etc.), users can more clearly define the conditions under which the method works well and those under which it fails, and quantify uncertainties. For example, simulation results in Fig. 1f indicate that efficient detection of fluorescent rods requires an SNR of at least 2, and that for SNR less than 5, random localization errors will exceed 10 nm. Such information can be very important when drawing biological interpretations from the extracted data. For example, neglecting random localization errors in the analysis of normally diffusing membrane proteins can lead to the erroneous conclusion that diffusion is hindered (Martin et al., 2002). Of course, simulations are only meaningful if they are realistic enough to capture the performance-limiting features of real images. A method that works well on simulated images may still fail on real images, but a method that fails on simulated images will almost always fail on real images too. Simulations thus provide necessary, but not sufficient, elements of validation. Validation and benchmarking of algorithms therefore benefit from tools that simulate realistic images (Boulanger et al., 2009; Rajaram et al., 2012).
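Metrics of the kind reported in Fig. 1f can be computed once estimated positions are matched to the simulated ground truth. The sketch below assumes a one-to-one matching (solved with SciPy's linear_sum_assignment) and a hypothetical distance tolerance beyond which a match counts as a miss; it reports the correct detection rate, the number of false detections and the RMSE of matched positions (a single Euclidean position error, whereas Fig. 1f reports x, y, θ and A separately).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def detection_metrics(true_xy, est_xy, tolerance):
    """Compare estimated positions with a known (e.g. simulated) ground truth.
    Returns (correct_detection_rate, false_detections, rmse_of_matched_positions)."""
    true_xy, est_xy = np.asarray(true_xy, float), np.asarray(est_xy, float)
    if len(true_xy) == 0 or len(est_xy) == 0:
        return 0.0, len(est_xy), float("nan")
    dist = cdist(true_xy, est_xy)
    rows, cols = linear_sum_assignment(dist)   # one-to-one matching
    ok = dist[rows, cols] <= tolerance          # matches beyond tolerance count as misses
    matched = dist[rows, cols][ok]
    cdr = float(ok.sum()) / len(true_xy)
    false_detections = len(est_xy) - int(ok.sum())
    rmse = float(np.sqrt(np.mean(matched ** 2))) if ok.any() else float("nan")
    return cdr, false_detections, rmse

# Hypothetical example: three true objects, four detections (one spurious)
truth = [[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]]
estimates = [[0.3, 0.4], [10.0, 0.0], [50.0, 50.0], [21.0, 0.0]]
cdr, n_false, rmse = detection_metrics(truth, estimates, tolerance=2.0)
print(cdr, n_false, round(rmse, 3))
```

Running such a function over many simulated images at each SNR level gives curves comparable to those of Fig. 1f.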

Because it is hard to guarantee that simulations are sufficiently realistic, the most useful validations are those based on real data. While fully quantitative validation is impossible in the absence of an exact ground truth, it is often possible to verify qualitative outcomes against a previously known phenotype. For example, to validate an algorithm that maps genomic loci in the yeast nucleus, we applied it to two nuclear structures whose approximate localizations were already known from electron microscopy and verified that the recovered maps agreed with expectations (Fig. 1h) (Berger et al., 2008). Mutants with previously characterized phenotypes can provide similarly important data for validation. Curated databases containing both raw images and the corresponding, previously known phenotypes are therefore a useful resource (Ljosa et al., 2012).

Absolute certainty that an automated system performs as expected is difficult, if not impossible, to reach. However, by combining the three validation approaches outlined above, users of image quantification methods can define reasonably well when to trust extracted numbers and when not to.


Computational methods that automatically extract meaningful quantitative information from raw images are playing increasingly central roles in modern biological research. Many tools already exist (Table 1), yet many more need to be developed to cope with the rapidly increasing complexity, variety and sheer mass of images produced by old and new microscopy techniques. Users should be aware of the capabilities of such tools but, at least as importantly, of their underlying assumptions and limitations and of the conditions under which they can, or should not, be applied. Some understanding of how methods work is important in this respect. For example, because the method used to detect Shigella bacteria in Fig. 1b was designed for rod-like shapes, it may yield unexpected results when applied to curved C. crescentus cells. Less obvious biases may arise when extracting the positions or trajectories of individual particles, for example if particles do not obey any of the assumed motion models. The extraction of quantitative information from noisy or incomplete images can be seen as an interpretation, which relies in part on prior information external to the data. Most extraction methods provide a unique interpretation, even when images are ambiguous, e.g. when many objects are tracked in a crowded volume, or when very dim molecules are localized. Even if this interpretation optimally reconciles the image with the assumed prior information, other interpretations may be nearly as valid (i.e. the optimization problem may have several minima of similar values). Ideally, extraction methods should provide a range of interpretations and assign each a probability based on prior knowledge, thus conveying the image's degree of ambiguity and the uncertainties associated with the extracted data.
Such methods would be very useful, allowing researchers to distinguish easily between the certain and the less certain, and helping to define which improvements in image acquisition are most needed to reduce data ambiguity. Although relevant approaches exist, they have yet to be translated into powerful and user-friendly tools adapted to biological imaging. Thus, much important work remains for developers of computational methods. In the meantime, users of existing algorithms should learn their power and limitations and take validation seriously. Under these conditions, automated image quantification can provide a powerful computational lens into the workings of life, as important for biological research as the microscopes themselves.


I thank S. Seveau, F. Frischknecht and N. Guillén for images in Fig. 1 and two anonymous referees for useful comments. Work in my group is supported by Institut Pasteur, Fondation pour la Recherche Médicale (Equipe FRM), Région Ile-de-France and Agence Nationale de la Recherche (Grants ANR-09-PIRI-0024, ANR-2010-BLAN-1222-02, 2010-INTB-1401-02).

Appendix: Some image processing terms used in this review

  • Detection: an object (e.g. a cell) is identified in one or more regions of an image.
  • Segmentation: the image is split into distinct regions (segments), corresponding to distinct objects in the image.
  • Tracking: trajectories of moving objects are reconstructed.
  • Intensity thresholding: a segmentation method in which a pixel is determined to be part of an object (rather than part of the background) if its intensity exceeds a threshold.
  • Template matching: a detection method that finds locations in an image most similar to a small target image (the template).
  • Watershed transform: a segmentation method where the image can be considered as a landscape (with altitude given by image intensity) in which an imaginary water level rises simultaneously from multiple low-altitude points; barriers are created where distinct lakes merge, defining the boundaries between segmented regions.
  • Optimization: a mathematical function depending on one or more variables (e.g. the unknown co-ordinates of a particle) is defined in such a way that low values of the function are obtained for the correct values of the variables (e.g. the sum of the squared differences between pixels of the actual image and a model image). The values of the variables that minimize this function are the solution of the optimization problem. They are usually found by progressively changing the variables until the function reaches a minimum (ideally the global minimum, i.e. the minimum with the lowest value).
  • Deformable models, active contours: a class of segmentation methods in which object shapes are represented by flexible curves or surfaces, which evolve in time until they coincide with object boundaries in the image. This evolution is determined by equations that are usually derived from an optimization function.
  • Level set: in some deformable model methods, the boundary is not represented explicitly by its co-ordinates, but implicitly through a function that can have positive or negative values depending on the position in the image. The model boundary is defined by the set of points where this function changes sign (zero level set).
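The implicit representation described in the last entry can be illustrated without any boundary evolution: two discs are encoded as the zero level set of a single function (the pointwise minimum of their signed distance functions), and the number of distinct regions changes as the discs approach, with no explicit contour surgery. This is a minimal sketch of the representation only, not a level set segmentation method; names and sizes are hypothetical.

```python
import numpy as np
from scipy import ndimage

def circle_sdf(xx, yy, cx, cy, r):
    """Signed distance to a circle: negative inside, positive outside, zero on the boundary."""
    return np.hypot(xx - cx, yy - cy) - r

def count_regions(separation, r=10.0, size=100):
    """Encode two discs as the zero level set of one function and count the resulting
    regions. The union of shapes is the pointwise minimum of their level set functions,
    so contours merge or split without any explicit bookkeeping."""
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    c = size / 2
    phi = np.minimum(circle_sdf(xx, yy, c - separation / 2, c, r),
                     circle_sdf(xx, yy, c + separation / 2, c, r))
    _, n = ndimage.label(phi < 0)  # object interior = points where phi is negative
    return n

print(count_regions(40.0), count_regions(10.0))
```

Discs 40 pixels apart give two regions, while overlapping discs 10 pixels apart give one; the same function simply changed values, which is the topological flexibility exploited by level set methods.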