Fluorescent microscope imaging technologies have developed at a rapid pace in recent years. High-throughput 2D fluorescent imaging platforms are now in wide use and are being applied on a proteome-wide scale. Multi-fluorophore 3D imaging of live cells is being used to give detailed information on localization and subcellular structure. Further, 2D and 3D video microscopy are giving important insights into the dynamics of protein localization and transport. In parallel with these developments, significant research has gone into developing new methodologies for quantifying and extracting meaning from the imaging data. Here we outline, and give entry points to the literature on, approaches to quantification such as segmentation, tracking, automated classification and data visualization. Particular attention is paid to the distinction between, and application of, concrete quantification measures such as the number of objects in a cell, and abstract measures such as texture.
Unlike imaging methodologies such as X-ray crystallography that are intrinsically analytic and mathematical, fluorescent microscopy has been slower to take advantage of, and to develop, novel methods of analysis and quantification. This may be partly because the results can immediately be ‘seen’, so quantification may not appear essential. The diversity of users, including cell biologists, cancer researchers, neuroscientists and plant biologists, also means that the relevant literature is scattered across specialist journals and publications unlikely to be read by the cell biologist.
Hence, while there are several widely known quantification methodologies such as co-localization analysis, fluorescence resonance energy transfer (FRET), fluorescence correlation spectroscopy (FCS), fluorescence recovery after photobleaching (FRAP) and fluorescence lifetime imaging microscopy (FLIM), the range of methods applicable to fluorescent imaging of cells is much broader than may be apparent. In response to recent advances in imaging technologies, new methods are being developed in automated classification, machine learning, image statistics, clustering, visualization, modelling, feature extraction, segmentation and object tracking, first to deal with the scale of the data becoming available but, more importantly, to find new ways to extract the information contained within these data sources and fully exploit their potential.
There is a wide range of reasons to quantify fluorescent imaging. One of the most important is the need to remove potential (unconscious) bias in data selection. A typical microscope slide may well contain upwards of 1000 cells, the majority of which will not be examined in detail when observing by eye, for instance, the localization of a protein. As well as introducing selection bias, this means that important data may be missed. Of those 1000 cells, a small proportion might exhibit a distinct localization or multiple localizations. If only 1–2% of the available data are sampled, such effects will in all likelihood be missed, even though they may have been the more interesting result. Similarly, quantifying large numbers of images gives the statistical power to detect subtle effects when comparing experiments. Upon stimulation of a pancreatic β-cell with glucose, there might be a 5% drop in the number of insulin granules in the cell as insulin is released into the extracellular environment; an effect that would be visually undetectable. With an automated granule counting assay, however, hundreds of cells might be quantified under a variety of treatments, and compounds that subtly change this response identified. More broadly, with whole proteome localization imaging now a reality (1), automated quantification and classification are becoming essential to deal with the growth in imaging data and remove the bottleneck of manual inspection. In the longer term, quantification is needed to enable the sorting, comparison and integration of the valuable data contained in the millions of fluorescent images now being generated each year. Just as database, search and quantification methodologies have added great value to the sequencing revolution, similar tools for imaging will extend the range of biological conclusions that can be drawn. Finally, fluorescent imaging is potentially a rich data source for mathematical modelling.
With the ability to observe and quantify multiple proteins simultaneously in a live cell context over time and under a range of conditions, there is now the data to begin to model and understand the systems biology of the cell.
The purpose here is to outline the main approaches and the progress being made in the analysis of subcellular imaging, to give entry points to the literature, and to identify some of the points at which further research is required. To a large degree, image analysis begins once an image set has been captured, and in the following a range of analysis options that might be applied to such an image set is described. However, there is a strong need for analysis options to be considered before the images are acquired. Firstly, as has been observed, ‘tweaking microscope settings for 5 min could save months of tweaking algorithms’ (2). More importantly, awareness of the analysis options changes the range of experiments that will be attempted and the conclusions that can be drawn. Further, simply posing the question ‘how could the difference be quantified?’ can give invaluable insights into the data.
Abstract and concrete image quantification
Within fluorescent image analysis there are presently two main approaches to quantification. The first, and best known, might be called concrete statistics. These include counting measures such as the number of structures in a cell, the volume occupied by a structure or the ratio of fluorescent intensity between regions. At the other end of the spectrum are abstract statistics that measure image content. These are abstract in the sense that they measure properties of an image such as texture or morphology, rather than counting things. One such set of image statistics is the Haralick texture measures (3), the essence of which is to quantify correlations between pixel intensities at a given distance and angular separation.
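To make the idea behind the Haralick measures concrete, the following is a minimal NumPy sketch (an illustration under stated assumptions, not the implementation of (3) or of any particular package). It builds a grey-level co-occurrence matrix for one pixel offset, where the offset encodes the distance and angle, and derives two of the Haralick statistics, contrast and energy. The function names, the symmetric normalisation and the assumption that intensities are already quantised to a few integer levels are choices made for this example.

```python
import numpy as np


def glcm(image, dy, dx, levels):
    """Symmetric, normalised grey-level co-occurrence matrix for the offset (dy, dx).

    Assumes `image` is already quantised to integers in [0, levels).
    """
    h, w = image.shape
    # Crop so that both a pixel and its partner at (dy, dx) lie inside the image.
    y0, y1 = max(0, -dy), h - max(0, dy)
    x0, x1 = max(0, -dx), w - max(0, dx)
    a = image[y0:y1, x0:x1]
    b = image[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (a.ravel(), b.ravel()), 1)  # tally each (i, j) intensity pair
    counts = counts + counts.T                     # count pairs in both directions
    return counts / counts.sum()


def haralick_contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbouring intensities differ."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


def haralick_energy(p):
    """Angular second moment, sum of p(i, j)^2: large for uniform textures."""
    return float(np.sum(p ** 2))


# Usage: a checkerboard (maximal local variation) versus a flat image.
checker = np.indices((8, 8)).sum(axis=0) % 2
flat = np.zeros((8, 8), dtype=int)
p_checker = glcm(checker, 0, 1, levels=2)   # horizontal neighbours, distance 1
p_flat = glcm(flat, 0, 1, levels=2)
print(haralick_contrast(p_checker), haralick_energy(p_checker))  # 1.0 0.5
print(haralick_contrast(p_flat), haralick_energy(p_flat))        # 0.0 1.0
```

In practice several offsets (for example four angles at one distance) are usually combined, and the statistics from each are stacked into the feature vector used for classification.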
The advantage of concrete measures is that it is immediately apparent what is being measured, so it is possible to make statements such as ‘there was a 50% reduction in the count under treatment with compound X’. However, the choice of concrete measures is typically based on the expectations of the researcher, and hence unexpected distinctions may be missed. In contrast, abstract measures such as texture make fewer assumptions and tend to be more generic, applying to a broader range of imaging. But while abstract statistics may distinguish a wider range of experiments, what the actual difference is can be less clear. In the next section the generation and application of concrete statistics are outlined, followed by a section on applications of abstract statistics.
Segmentation and quantification
Quantification from fluorescent imaging involves several stages, each of which may influence the results of the others. A typical workflow might include sample preparation, image acquisition, image filtering to remove noise (4) or background, region or edge detection, quantification and data analysis (Figure 1). A good overview of many of the issues at each step may be found in (5). Of these steps, segmentation (the process of partitioning an image into multiple regions, typically with the aim of identifying objects or boundaries) is one of the more challenging. Once an image is segmented, statistics such as the number of objects, object sizes and intensity ratios are typically straightforward to extract.
While segmentation in general is a mature field (many modern digital cameras, for instance, will automatically identify, select and correct ‘red-eye’ in portrait photographs), segmentation of fluorescent imaging of cells is still very much a developing research area. This is in part because of technical difficulties such as the relatively low signal-to-noise ratio of fluorescent imaging and photobleaching. But subcellular structures, and the recruitment of proteins to them, are also highly dynamic, with radical variations and changes in apparent morphology. Segmentation methods based on expectations about the morphology and light characteristics of the objects to be identified are therefore rarely applicable, except for regions such as the nucleus whose geometry is simple. Hence segmentation of cellular fluorescent imaging is largely based either on intensity threshold methods to select regions or on intensity difference methods to find edges.
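As a minimal sketch of the intensity threshold route, the following NumPy code picks a global threshold and then counts and sizes the resulting objects. The text does not prescribe a particular threshold rule; Otsu's method (choose the cut that maximises the between-class variance of the intensity histogram) is used here as one common, assumed choice, together with 4-connected labelling. The function names and synthetic image are hypothetical.

```python
import numpy as np
from collections import deque


def otsu_threshold(image, bins=256):
    """Otsu's method: the cut that maximises the between-class variance of the histogram."""
    hist, edges = np.histogram(image, bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                  # pixel count at or below each bin
    w1 = w0[-1] - w0                      # pixel count above each bin
    s0 = np.cumsum(hist * centres)
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalised between-class variance
    k = int(np.argmax(between))
    return edges[k + 1]                   # pixels >= this value are foreground


def label_components(mask):
    """Label 4-connected foreground regions by breadth-first flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    h, w = mask.shape
    n = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                n += 1
                labels[y, x] = n
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            queue.append((ny, nx))
    return labels, n


# Usage: two bright synthetic "structures" on a noisy background.
rng = np.random.default_rng(0)
img = 10 + rng.normal(0, 2, size=(40, 40))
img[5:10, 5:10] += 90
img[25:32, 20:27] += 90
labels, n = label_components(img >= otsu_threshold(img))
sizes = np.bincount(labels.ravel())[1:]   # pixel area of each object
print(n, sorted(sizes.tolist()))          # 2 [25, 49]
```

A single global threshold like this is exactly what breaks down under uneven background or touching objects, which is where the edge-based and watershed approaches come in.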
At the cell level, robust systems have been developed to automatically select individual cells from high-throughput 2D imaging, identify nuclear subregions and quantify proteins of interest within those regions to distinguish phenotypes (6). At the nuclear level, while improving nuclear selection from 2D imaging is still an active area of research (7), automated nuclei selection has been applied to areas such as cell cycle regulation (8) and to distinguishing proliferating and malignant cells (9). At a finer-grained level, considerable research has gone into segmenting and quantifying individual subcellular structures. Because of their relative structural simplicity, and hence their amenability to techniques such as ‘Mexican hat filtering’ (Figure 2), there has been some success in quantifying punctate structures such as endosomes, peroxisomes and nuclear speckles (10,11). Mexican hat filtering (or Laplacian of Gaussian filtering) is an edge detection method that can be ‘tuned’ by a scale parameter to detect edges at different scales. Another useful technique to separate objects based on the topology of the image is watershedding (Figure 2). In this, discrete regions are found by ‘flooding’ from intensity peaks, with neighbouring regions joined only if the ‘valley’ between them is sufficiently shallow. Such techniques are now standard tools in fluorescent image analysis packages such as ImageJ (see Table 2) and CellProfiler (12). For the reasons outlined above, there has been less success in segmenting non-punctate subcellular structures beyond thresholding, edge detection and watershedding schemes, although neurite segmentation is an exception (13). A wide range of methods, with references, can be found in Table 1 of (14).
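The sketch below shows how a Mexican hat (negated Laplacian of Gaussian) filter can be used to pick out punctate structures: the image is filtered at a scale matched to the spot size and local maxima of the response are reported. This is a minimal NumPy illustration under assumptions of our own (square kernel truncated at 3σ, reflect padding, a relative response threshold of 0.2), not the implementation used in (10,11) or in ImageJ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mexican_hat_kernel(sigma):
    """Negated Laplacian of Gaussian: positive centre, negative surround."""
    r = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    rr = (x ** 2 + y ** 2) / (2 * sigma ** 2)
    lap = (rr - 1) * np.exp(-rr) / (np.pi * sigma ** 4)  # analytic Laplacian of Gaussian
    lap -= lap.mean()        # force zero sum so a flat background gives zero response
    return -lap


def filter2d(image, kernel):
    """Correlate `image` with a square, odd-sized kernel (reflect padding at the borders)."""
    r = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(image, r, mode="reflect"), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def detect_spots(image, sigma, rel_threshold=0.2):
    """Return (row, col) of local maxima of the filtered image above a relative threshold."""
    resp = filter2d(image.astype(float), mexican_hat_kernel(sigma))
    padded = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    local_max = resp >= sliding_window_view(padded, (3, 3)).max(axis=(2, 3))
    return np.argwhere(local_max & (resp > rel_threshold * resp.max()))


# Usage: three Gaussian spots of width 2 pixels on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
img = np.zeros((64, 64))
for cy, cx in [(16, 16), (16, 45), (45, 30)]:
    img += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 2.0 ** 2))
spots = detect_spots(img, sigma=2.0)
print(spots.tolist())   # [[16, 16], [16, 45], [45, 30]]
```

Changing `sigma` is the ‘tuning’ referred to above: small values respond to fine detail and noise, larger values to broader structures.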
Table 1. A selection of commercial software tools for fluorescent image analysis and storage.
Each of the image analysis tools supports a wide range of applications such as segmentation, intensity quantification, tracking and co-localization for multidimensional fluorescent imaging, as well as specialized applications such as cell migration analysis, FRAP analysis and volume rendering for visualization.
Table 2. A selection of open source software tools for fluorescent image analysis and storage.
Imaging in two dimensions can be problematic for segmentation, as objects that apparently overlap may be spatially separated in the third dimension. 3D fluorescent imaging therefore offers both a more detailed view of subcellular structures and greater amenability to segmentation and quantification, and segmentation of 3D imaging is a developing area of research. Using segmentation techniques such as gradient flows and coupled active surfaces, nuclei may readily be segmented and quantified from 3D fluorescent imaging (15,16). Similarly, tools exist to count and quantify punctate structures in 3D imaging via watershedding techniques (17). Further examples and tools for segmentation and visualization of 3D fluorescent imaging may be found in (17,18) and references therein.
At present there is no universal solution to the segmentation of fluorescent imaging. For the microscopist, the usual approach is to experiment in software such as ImageJ that supports a range of methods. If simple approaches such as thresholding fail because of background intensity variation or highly clustered objects, then edge detection or watershedding methods might be tried. If these fail, a literature search may turn up software methods that have been specifically designed for the imaging of interest. In some cases, small changes in experimental protocol or image capture settings may improve segmentation results. In this way, fluorescent image segmentation is still an experimental science, involving an iterative process of testing and altering both computational and experimental methods.
Classification and testing for difference
In understanding the functions of the tens of thousands of proteins being found by the sequencing revolution, the most fundamental question is: what does the protein do? The first steps towards answering this are determining where the protein is in the cell and what it interacts with. Towards these, modern automated fluorescent microscopy offers an enormous depth and coverage of information: depth in that a single well may contain over a thousand cells that can be imaged in a few tens of seconds, and coverage in that whole proteomes may now be imaged. However, the number of images so obtained is overwhelming. In 2003, some 75% of the yeast proteome (4156 proteins) was screened and manually classified into 22 localizations (1). Further, it has been estimated that a complete human genome RNAi screen could be imaged in approximately 2 weeks, but would give rise to 10⁶ images (19).
As a consequence of the wide range of phenotypes, concrete image statistics are not well suited to the general problem of distinguishing subcellular imaging. Hence considerable effort has gone into abstract measures of fluorescent imaging. Conrad et al. (20) tested 448 different image features for their ability to distinguish images of subcellular localization and found that texture measures performed best across a range of phenotypic imaging; such measures form the foundation of the majority of current automated image classification systems. A common approach is to use a statistical classifier such as a neural network (21) or support vector machine (22). The classifier is first trained on the statistics of images of known (human-classified) localization, and is then used to classify images of unknown localization. Several groups (20,23), including my own (24) (Figure 3), have taken this approach and have shown that correct classification rates of up to 98% (24) can be obtained on images of the major subcellular localizations. Further, automated classification results have surpassed human accuracy (25) and have been applied to yeast proteome imaging (26). Similar approaches have been applied to 3D whole cell imaging with comparable results (25), and more specialized classifiers have been created to identify cell phase (27), mitotic patterns (28) and F-actin ruffles (29). Recently, facilities have been incorporated into the Cell Profiler Analyst software to interactively classify examples to train a machine learning algorithm that will then classify new examples (30).
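The train-then-classify workflow can be sketched in a few lines. For brevity this example swaps in a k-nearest-neighbour classifier rather than the neural networks or support vector machines cited above, and it uses synthetic feature vectors standing in for image statistics; the class names, feature counts and `k=5` are hypothetical choices, not values from (20–25).

```python
import numpy as np


def standardise(train, test):
    """Scale each feature by the training mean and standard deviation."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    return (train - mu) / sd, (test - mu) / sd


def knn_classify(train_x, train_y, test_x, k=5):
    """Assign each test vector the majority label among its k nearest training vectors."""
    tr, te = standardise(train_x, test_x)
    dist = np.linalg.norm(te[:, None, :] - tr[None, :, :], axis=2)  # (n_test, n_train)
    nearest = np.argsort(dist, axis=1)[:, :k]
    predictions = []
    for neighbour_labels in train_y[nearest]:
        labels, counts = np.unique(neighbour_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Usage with synthetic 3-feature "image statistics" for three hypothetical classes.
rng = np.random.default_rng(1)
names = np.array(["nuclear", "mitochondrial", "endosomal"])
centres = 8.0 * np.eye(3)                 # one informative feature per class


def sample(n_per_class):
    x = np.vstack([rng.normal(c, 1.0, size=(n_per_class, 3)) for c in centres])
    y = np.repeat(names, n_per_class)
    return x, y


train_x, train_y = sample(30)             # "human-classified" training images
test_x, test_y = sample(10)               # images of "unknown" localization
pred = knn_classify(train_x, train_y, test_x)
accuracy = float(np.mean(pred == test_y))
print(f"accuracy: {accuracy:.2f}")
```

Standardising the features matters because texture statistics can differ in scale by orders of magnitude; without it, a single large-valued feature would dominate the distances.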
One difficulty with automated classification is that organelle structure can vary widely between cell types, so classifiers usually need to be retrained for each cell type, although research is ongoing into removing this limitation (33). Another difficulty is that the subcellular localization classes, and representative training images for each, need to be chosen before training. Protein localization is often highly dynamic: a protein may exhibit multiple localizations, or localize to subdomains, at different times or at the same time, so localization is not necessarily clearly defined. Hence assigning the designation ‘endosomal’ may be technically correct but does not fully describe the situation. Automated classification thus to some extent fits an image into a predefined box that may not reflect the true diversity of a protein's expression.
To better capture the diversity of protein expression, attention is beginning to focus on clustering imaging using the statistical measures developed for classification. Here the aim is to find and group the principal patterns of expression in imaging of one or more proteins, in much the same way that sequence analysis and measures of sequence similarity may be used to define families of proteins. In (34), imaging of 188 clones of randomly tagged proteins in NIH 3T3 cells was found to group into 35 statistically significant clusters, or location patterns, using k-means clustering on the image statistics vectors. Applied on a genome-wide scale, this approach may reveal new patterns or families of proteins without depending on the choice of localization ‘boxes’ (33).
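For readers unfamiliar with k-means, a minimal NumPy sketch of Lloyd's algorithm on statistics vectors follows. It is not the procedure of (34), which also assessed statistical significance of the clusters; the deterministic farthest-point initialisation, the synthetic 2-feature data and the function name are assumptions made to keep the example small and reproducible.

```python
import numpy as np


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centres = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centres], axis=0)
        centres.append(x[np.argmax(d)])   # next centre: point farthest from existing ones
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centre.
        labels = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2).argmin(axis=1)
        # Update step: each centre moves to the mean of its members.
        updated = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
        if np.allclose(updated, centres):
            break
        centres = updated
    labels = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2).argmin(axis=1)
    return labels, centres


# Usage: three hypothetical expression patterns, 20 images each.
rng = np.random.default_rng(2)
true_group = np.repeat([0, 1, 2], 20)
x = np.vstack([rng.normal(m, 0.5, size=(20, 2)) for m in ([0, 0], [10, 0], [0, 10])])
labels, centres = kmeans(x, k=3)
for g in range(3):
    print(g, set(labels[true_group == g].tolist()))   # one cluster label per group
```

Unlike classification, nothing here uses the true groups; they are only printed to show that the recovered clusters coincide with them.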
A related question to identifying localization is detecting when localization has changed. A typical experiment would be to image a protein with and without co-expression of another protein to understand how they interact (35), or to image one or more proteins under a range of drug treatments to screen for active compounds (36,37). In such cases what matters is not so much the actual localization of the protein as whether it has been perturbed by an introduced interaction. Image statistics may be used to measure how ‘separated’ two experiments are. One approach is to examine the (statistical) neighbours of each image to determine whether they belong to the same class (38); by employing permutation testing, a p-value for the null hypothesis of no difference between experiments may then be generated. Similarly, in my own research, the distance between the mean statistics vectors for two experiments gave a measure of how separated the experiments were, and permutation testing was then employed to assign p-values for how unlikely that separation was under the null hypothesis (31). With this approach it was possible to differentiate 10 distinct localizations in HeLa cells and detect relatively subtle changes such as endosomal redistribution.
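The mean-vector permutation test can be sketched as follows. This is an illustrative reconstruction under assumptions (Euclidean distance between means, 2000 random relabellings, synthetic data), not the code used in (31). The idea is that, if there is no difference between experiments, shuffling which images belong to which experiment should produce separations as large as the observed one reasonably often.

```python
import numpy as np


def mean_separation(a, b):
    """Euclidean distance between the mean statistics vectors of two experiments."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """P-value for the null hypothesis that a and b come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mean_separation(a, b)
    pooled = np.vstack([a, b])
    n_a = len(a)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))      # shuffle experiment labels
        if mean_separation(pooled[idx[:n_a]], pooled[idx[n_a:]]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage: 40 images per experiment, 5 (hypothetical) image statistics each.
rng = np.random.default_rng(3)
control = rng.normal(0.0, 1.0, size=(40, 5))
treated = rng.normal(1.5, 1.0, size=(40, 5))
separation, p = permutation_p_value(control, treated)
print(f"separation = {separation:.2f}, p = {p:.4f}")
```

With real image statistics the features would normally be standardised first, as in the classification example, so that no single statistic dominates the distance.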
Imaging in time
Live cell fluorescent video microscopy offers a wealth of information on the dynamic organization of proteins and subcellular structures that is unavailable from static 2D and 3D imaging. With the addition of time, organelle dynamics as proteins are recruited, transported and expelled can be viewed in detail, and the passage through a cell of proteins and the structures they interact with can be readily observed. However, while spatial structures may easily be compared visually for differences in size and morphology, provided the differences are large enough, comparisons in time are more difficult, and hence quantification is essential to detect anything but the coarsest features of the image data.
As with segmentation, object tracking from fluorescent video microscopy presents many challenges. Objects may join, split, disappear, change direction or substantially change their morphology, and there are technical challenges such as photobleaching and compromises between spatial and temporal resolution. Typically, higher spatial resolution leads to better identification of the objects to be tracked, but reduces the temporal resolution and hence the ability to decide which object corresponds to which at distinct time-points. Further, depending on the markers used, the subcellular environment can appear complex and cluttered. Hence tracking algorithms developed in other research areas and adapted to fluorescent video microscopy tend to perform poorly (14), and considerable research has gone into designing algorithms specific to fluorescent imaging. Typical steps in object tracking are image acquisition, image filtering to enhance object detection, segmentation or object detection, and finally matching of objects at different time-points to create paths. One advantage tracking can have over other image quantification problems is that in most cases image filtering need only preserve the position of the detected object and not necessarily its structure. An excellent review of the approaches taken is given in (14).
The art of tracking lies in matching objects between images to create paths. At its simplest, an object is matched to the closest object in the successive image within a given radius (the expected maximum distance an object can move between frames). Variations allow objects to appear or disappear temporarily or permanently, or recast the matching as a global optimization problem, for instance minimizing the total path length of all objects. However, such approaches are likely to fail in environments in which the typical distance between objects is of the order of the distance an object may move between time-points. Technologies such as quantum dots (39) attempt to avoid this by introducing a few bright fluorescent dots to track. Tracking can be improved by incorporating assumptions about the objects tracked, such as maximum changes in velocity, morphology or size. With such models, objects can often be tracked in surprisingly complicated environments. For instance, in (40) complex networks of microtubules could be tracked by first filtering to enhance lines and then exploiting the fact that the tips of microtubules either grow or shorten. From such tracking, detailed statistics of microtubule behaviour could then be obtained.
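The simplest matching rule described above can be sketched as greedy nearest-neighbour linking: candidate pairs within the search radius are accepted shortest first, each detection is used at most once, and unmatched detections start new tracks. This is a minimal illustration with hypothetical function names and toy coordinates, not any specific published tracker.

```python
import numpy as np


def link_frames(prev_pts, next_pts, max_dist):
    """Greedy nearest-neighbour matching between two frames of detections.

    Candidate pairs closer than max_dist are accepted shortest first, each point
    being used at most once. Returns {index in prev_pts: index in next_pts}.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    next_pts = np.asarray(next_pts, dtype=float)
    if len(prev_pts) == 0 or len(next_pts) == 0:
        return {}
    d = np.linalg.norm(prev_pts[:, None, :] - next_pts[None, :, :], axis=2)
    pairs = np.argwhere(d <= max_dist)
    pairs = pairs[np.argsort(d[pairs[:, 0], pairs[:, 1]], kind="stable")]
    links, used_next = {}, set()
    for i, j in pairs.tolist():
        if i not in links and j not in used_next:
            links[i] = j
            used_next.add(j)
    return links


def build_tracks(frames, max_dist):
    """Chain frame-to-frame links into tracks; unmatched detections start new tracks."""
    tracks, active = [], {}   # active: detection index in previous frame -> track id
    for t, pts in enumerate(frames):
        links = link_frames(frames[t - 1], pts, max_dist) if t > 0 else {}
        came_from = {j: i for i, j in links.items()}
        new_active = {}
        for j, p in enumerate(pts):
            if j in came_from:
                tid = active[came_from[j]]
            else:
                tid = len(tracks)
                tracks.append([])
            tracks[tid].append((t, tuple(p)))
            new_active[j] = tid
        active = new_active
    return tracks


# Usage: two objects drifting slowly relative to their separation.
frames = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 1), (11, 11)]]
tracks = build_tracks(frames, max_dist=3)
for track in tracks:
    print(track)
# [(0, (0, 0)), (1, (1, 0)), (2, (2, 1))]
# [(0, (10, 10)), (1, (10, 11)), (2, (11, 11))]
```

If the two objects were only a pixel or two apart while moving a pixel per frame, several candidate pairs would fall inside the radius and the greedy choice could swap identities; that is the failure mode noted in the text.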
Another approach is to track features (without segmentation) rather than objects. In SpotTracker (41), a particle is tracked in complex environments by considering all possible paths and optimizing a cost function involving path smoothness, distance and passage through bright pixels. The particle is thus not segmented from the image; instead, the algorithm tracks a bright feature within certain constraints. This enabled telomeres to be accurately tracked despite potential confusion with the nuclear envelope, which also appeared in the imaging. Combinations of segmentation and feature tracking have also been successfully applied to automate lineage tracking up to the 350-cell stage in Caenorhabditis elegans (42).
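Considering ‘all possible paths’ is tractable with dynamic programming, because the best path ending at a pixel in frame t only depends on the best paths ending near it in frame t − 1. The sketch below is a simplified illustration of that idea, not SpotTracker's algorithm: it rewards bright pixels and penalises squared step length, but omits SpotTracker's smoothness (curvature) term. The function name, `max_step` and `step_weight` are hypothetical.

```python
import numpy as np


def track_bright_feature(frames, max_step=2, step_weight=0.1):
    """Minimum-cost path through a (T, H, W) movie by dynamic programming.

    Path cost = sum over frames of (-intensity at the visited pixel)
              + step_weight * (squared step length between frames).
    Every pixel is a candidate position, so nothing is segmented.
    """
    T, H, W = frames.shape
    yy, xx = np.mgrid[0:H, 0:W]
    cost = -frames[0].astype(float)               # best cost of a path ending at each pixel
    back = np.zeros((T, H, W, 2), dtype=int)      # predecessor pixel for backtracking
    for t in range(1, T):
        best = np.full((H, W), np.inf)
        for dy in range(-max_step, max_step + 1):
            for dx in range(-max_step, max_step + 1):
                py, px = yy - dy, xx - dx         # predecessor pixel for this move
                ok = (py >= 0) & (py < H) & (px >= 0) & (px < W)
                cand = np.full((H, W), np.inf)
                cand[ok] = cost[py[ok], px[ok]] + step_weight * (dy * dy + dx * dx)
                better = cand < best
                best[better] = cand[better]
                back[t][better] = np.stack([py, px], axis=-1)[better]
        cost = best - frames[t]
    y, x = np.unravel_index(np.argmin(cost), cost.shape)
    path = [(int(y), int(x))]
    for t in range(T - 1, 0, -1):                 # follow predecessors back to frame 0
        y, x = back[t, y, x]
        path.append((int(y), int(x)))
    return path[::-1]


# Usage: a faint particle moving diagonally through a cluttered background.
rng = np.random.default_rng(4)
T = 6
movie = rng.uniform(0.0, 0.3, size=(T, 20, 20))
for t in range(T):
    movie[t, 5 + t, 5 + t] = 1.0
path = track_bright_feature(movie)
print(path)   # [(5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
```

The weight on step length trades off following the brightest pixels against moving smoothly; setting it too high would make a stationary path through background clutter cheaper than following the particle.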
Possibly the most ambitious tracking to date was that developed to investigate the dynamics of promyelocytic leukemia nuclear bodies (PML NBs) in mitosis (43). In this work, human osteosarcoma cells (U2 OS) were imaged in 3D over time with a variety of marker proteins. Nuclei at distinct time-points were registered with each other by applying appropriate rotations and translations, the results segmented, and the PML NBs then tracked within each nucleus. This gave very detailed information on changes in the dynamics of PML NBs at different stages of mitosis and on their associations with mitotic proteins.
Quantifying over time
While tracking and counting objects over time gives invaluable information about the movement and dynamics of cells and subcellular structures, intensity information within tracked objects can also be exploited. Two examples are given here, one at the cellular and one at the subcellular level.
At the cell and multicellular level, automated tracking has been combined with automated classification to elucidate the phases and timing of mitosis (28). Multicell 3D image time series of the chromosomal marker histone 2B-enhanced green fluorescent protein (EGFP) were generated. These were automatically segmented into individual nuclei and tracked, and mitotic events were identified as points at which new tracks were initiated. Each nucleus at each time-point was also classified into one of seven cell cycle classes using automated texture-based classification techniques similar to those described earlier. This enabled high-throughput automated analysis of the duration of the phases of the cell cycle, and has the potential to be applied to high-throughput RNAi screens to explore the coordination of mitotic processes.
At the subcellular level, in my group's collaborations, 2D and 3D video microscopy has been used to study the role of 3-phosphoinositides in macropinocytosis (44). A typical experiment involves two fluorescent markers: dextran to fill and delineate the region of the macropinosome, and a marker such as GFP-2xFYVE to track phosphatidylinositol-3-phosphate (PI(3)P). The dextran channel is used to create a mask of the macropinosome and track its movements, and the average intensity of the PI(3)P channel within this mask is then calculated. In this way the rate of recruitment, the time of retention and the rate of expulsion of phosphoinositides from the macropinosome could be obtained automatically. Combinations of phosphoinositide markers could then be used to show quantitatively the order and timing of recruitment to, and expulsion from, the macropinosomes.
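The two-channel measurement described above reduces to a simple loop: threshold the volume marker to get a mask in each frame, then average the second channel inside that mask. The NumPy sketch below is a minimal illustration with a hypothetical fixed threshold and synthetic data, not the analysis pipeline of (44); the half-maximum window is one simple, assumed way of summarising retention time.

```python
import numpy as np


def masked_mean_profile(mask_channel, signal_channel, mask_threshold):
    """Mean signal inside the thresholded mask, frame by frame.

    mask_channel, signal_channel: arrays of shape (T, H, W) from the same movie.
    The mask is recomputed in every frame, so it follows a moving compartment.
    """
    profile = []
    for m, s in zip(mask_channel, signal_channel):
        mask = m >= mask_threshold
        profile.append(s[mask].mean() if mask.any() else np.nan)
    return np.array(profile, dtype=float)


def half_max_window(profile):
    """First and last frames at which the profile is at or above half its peak."""
    above = np.flatnonzero(profile >= 0.5 * np.nanmax(profile))
    return int(above[0]), int(above[-1])


# Usage: a synthetic vesicle (dextran disc) drifting downwards while a marker
# is recruited and then lost; the marker is bright outside the vesicle too.
T, H, W = 10, 32, 32
yy, xx = np.mgrid[0:H, 0:W]
schedule = np.array([0, 1, 2, 3, 4, 3, 2, 1, 0, 0], dtype=float)
dextran = np.zeros((T, H, W))
marker = np.full((T, H, W), 5.0)
for t in range(T):
    disc = (yy - (10 + t)) ** 2 + (xx - 16) ** 2 <= 16
    dextran[t][disc] = 1.0
    marker[t][disc] = schedule[t]
profile = masked_mean_profile(dextran, marker, mask_threshold=0.5)
print(profile.tolist())                                     # [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0, 0.0]
print(int(np.argmax(profile)), half_max_window(profile))    # 4 (2, 6)
```

Rates of recruitment and expulsion could then be estimated from the slopes of the profile on either side of the peak.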
Visual data representation
As high-throughput imaging and analysis become more commonplace, there is a need to develop a language of data representation and visualization to make sense of, and convey the meaning of, multidimensional data. New forms of data require new forms of representation. As noted by Edward Tufte, a pioneer in the field of data visualization, “At their best, graphics are instruments for reasoning about quantitative information” (45).
With many fields having utilized 3D and 4D imaging, numerous tools exist for surface and volume rendering (18), but techniques need to be adapted to visualizing the information of interest to the fluorescent microscopist. In the dense environment of the cell, the relative motility of even segmented and rendered objects can be difficult to ascertain when viewed as a movie. One approach to overcoming this is to use time as a spatial dimension. In (46), vesicles were segmented and tracked from 3D subcellular movies, and the dimensionality reduced by z-projection, giving a 2D image for each time-point. These were then visualized in 3D, with the third spatial dimension being the time-point of the movie. Stable vesicles hence appeared as long straight cylinders, while more motile vesicles showed greater curvature in the time dimension, enabling a fast visual assessment of the motility state of a large number of vesicles. In my group's work we have used similar techniques to visualize the growth and retraction of tubules from vesicles during endocytosis (47). The advantage of transforming the time dimension into a spatial one is that all of the data can be viewed and compared, and the relationships between objects and the timing of events seen at a glance.
In another visualization technique, borrowed from phylogenetics, statistics generated to quantify imaging have been used to define distances between images, and hence to generate ‘phylogenetic’ trees for imaging. In (9), this approach was used to cluster and create similarity trees for confocal images of breast epithelial cells, and in (34) a consensus subcellular localization tree was created for imaging from 126 wells of randomly tagged 3T3 cells. In this way it was possible to see the relationships between images; moreover, the hierarchical structure naturally created classes such as ‘punctate’ as unions of several localization classes. Along similar lines, my group is interested in comparing and reviewing high-throughput imaging, and towards this the iCluster high-throughput subcellular localization imaging visualization and clustering tool was developed (Figure 4) (31,32). In the software, large image sets from single or multiple experiments may be loaded, statistics generated for each image, and the images then mapped into two or three dimensions in such a way as to preserve the distances between their statistics vectors. Images that are statistically similar are thus spatially close and dissimilar images are distant, allowing the full range of patterns of expression in one or more experiments to be readily observed. Outliers and unusual cells are then easily detected, and differences between treated and untreated experiments can be seen as spatial separation.
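One standard way to map statistics vectors into two or three dimensions while approximately preserving their distances is classical multidimensional scaling. The sketch below uses it purely as an illustration; the text does not state which mapping iCluster uses, and the synthetic data and function names are assumptions. The example data are deliberately planar so that the 2D map reproduces the original distances exactly; real image statistics would only be approximated.

```python
import numpy as np


def classical_mds(x, dims=2):
    """Embed row vectors in `dims` dimensions, approximately preserving Euclidean distances."""
    n = len(x)
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)   # squared distance matrix
    j = np.eye(n) - np.ones((n, n)) / n                          # centring matrix
    b = -0.5 * j @ d2 @ j                                        # Gram matrix of centred points
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dims]                          # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def pairwise(x):
    """Matrix of Euclidean distances between all rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)


# Usage: 12 hypothetical 5-feature vectors that happen to lie in a 2D plane,
# so a 2D map can preserve their distances exactly.
rng = np.random.default_rng(5)
features = np.hstack([rng.normal(size=(12, 2)), np.zeros((12, 3))])
coords = classical_mds(features, dims=2)
print(np.allclose(pairwise(coords), pairwise(features)))   # True
```

Each row of `coords` gives the screen position of one image, so plotting image thumbnails at those positions produces the kind of similarity map described above.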
In any rapidly developing field requiring computational support, providing software to implement new methodologies is inevitably a significant problem. Several commercial solutions for the analysis of fluorescent imaging are now available and provide a wide range of functionality (Table 1). However, the rise of several large-scale open source projects such as ImageJ, Cell Profiler and the Open Microscopy Environment (Table 2) is now beginning to provide a powerful alternative. ImageJ provides functions for commonly performed image analysis tasks such as thresholding, particle detection, watershedding, region selection and intensity quantification; Cell Profiler (6) is principally for high-throughput image quantification and automated analysis; and the Open Microscopy Environment (12) supports data management for light microscopy and has components for storing, visualizing, managing and annotating microscopic images and metadata.
The importance of these open source projects is that they provide a high-quality common foundation and environment for sharing methods and tools in bio-image analysis that can be built upon and verified. Each provides a plug-in architecture or application programming interface that enables researchers to easily contribute new methods as they are developed, and to exploit a core library of functions and previously contributed plug-ins. For instance, ImageJ has some 500 contributed plug-ins and 300 macros. Macro creation and recording facilities allow even the novice user to establish and distribute analysis pipelines using combinations of functions.
Another significant advantage of the open source analysis and storage projects is interoperability. ImageJ, Cell Profiler and the Open Microscopy Environment are ‘aware’ of each other, and data may readily be moved from one to another to exploit the specific features of each. Similarly, the open programming interfaces mean that as other analysis or storage solutions are developed, these may easily be integrated with those already available.
Just as researchers build on, transmit and verify knowledge through publication, it is essential that analysis methods and software can be built upon, transmitted and verified through common open source projects such as have been described here.
Bio-image analysis is proving a powerful tool both for dealing with the volume of imaging arising from fluorescent microscopy and for maximizing the information extracted from these data sources. Much quantification of fluorescent imaging is currently used to distinguish between conditions: does the number of endosomes change under treatment with a compound? Does treatment change the velocity profile of the actin comet? From such data, interactions are inferred. The difficulty with this approach is that each protein or subcellular molecule is typically involved in a complex web of interaction networks, so determining which effect on the network gave rise to the observed change can be problematic. But as quantification methods become more commonplace and sophisticated, the next step is to use the quantified data in combination with mathematical modelling so that the interaction networks may be teased apart. A beautiful example of this is the fluorescent imaging and mathematical modelling of oscillations in nuclear factor κB (NF-κB) localization between nucleus and cytoplasm (49,50). Combining detailed fluorescent imaging data with modelling, it was shown that the NF-κB system is oscillatory and uses delayed negative feedback to direct nuclear-to-cytoplasmic cycling of transcription factors. In my group's research, the simple geometric information available from live cell video microscopy proved an extraordinarily rich source of information from which to build mathematical models and infer biologically relevant data (Figure 5). On the whole cell scale, research is beginning into generative models of subcellular localization, which first quantify the range of morphologies and spatial distributions of structures such as the plasma membrane, nucleus and lysosomes, and then use the statistics of those distributions to generate synthetic models of cells (51).
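To illustrate why delayed negative feedback produces oscillations, the following toy model integrates a single delay differential equation in which a quantity represses its own production after a time lag. It is emphatically not the NF-κB model of (49,50); the equation form, parameter values and function name are illustrative assumptions. For these parameters the delay is long enough that the steady state is unstable and the level oscillates, whereas a sufficiently short delay lets the same feedback settle to a steady state.

```python
import numpy as np


def simulate_delayed_feedback(alpha=10.0, K=1.0, n=4, gamma=1.0, tau=2.0,
                              dt=0.01, t_end=100.0, x0=0.5):
    """Euler integration of dx/dt = alpha / (1 + (x(t - tau) / K)^n) - gamma * x(t).

    x is produced at a rate repressed by its own level tau time units earlier
    (delayed negative feedback); the history before t = 0 is held at x0.
    """
    steps = int(round(t_end / dt))
    lag = int(round(tau / dt))
    x = np.empty(steps + 1)
    x[0] = x0
    for i in range(steps):
        x_delayed = x[i - lag] if i >= lag else x0
        production = alpha / (1.0 + (x_delayed / K) ** n)
        x[i + 1] = x[i] + dt * (production - gamma * x[i])
    return np.arange(steps + 1) * dt, x


t, x = simulate_delayed_feedback()
late = x[len(x) // 2:]                                   # discard the initial transient
peaks = int(np.sum((late[1:-1] > late[:-2]) & (late[1:-1] >= late[2:])))
print(f"{peaks} peaks, swing {late.min():.2f} to {late.max():.2f}")
```

Fitting models of this kind to quantified time courses, such as the masked intensity profiles from live cell imaging, is what allows hypotheses about feedback structure to be tested rather than only inferred.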
Once high-resolution spatial and temporal maps of cellular distribution combining multiple proteins have been created this will provide a foundation from which to model and understand the systems biology of the cell.
The author would like to thank Dr Rohan D. Teasdale (UQ) and Dr Markus C. Kerr (UQ) for their help in the editing and preparation of this manuscript.