The description and analysis of gene expression patterns are crucial for elucidating the physiological function of genes and for understanding the network of genetic interactions that underlies normal development. Knowing the expression patterns of large numbers of genes enables the identification of specific promoters, of marker genes useful for monitoring cells in a specific state, and of genes that are tightly coregulated. Although knowing the expression pattern of a gene is a prerequisite for understanding its biological role, this knowledge is incomplete even for most known genes. For the many thousands of uncharacterized genes, which inevitably include many involved in embryonic development, there are very few data at all. As a consequence, providing a general ability to compare the expression patterns of most genes remains a formidable challenge for computational and laboratory scientists alike.
At present, the spatio-temporal distribution of gene expression in vivo during development is most often determined using reporter gene systems (for example, green fluorescent protein fusions from gene traps) or by in situ hybridization to mRNA. The data from such experiments are generally images acquired by microscopy, and, in investigations of specific genes, these are inspected visually to extract the required information. Many model organism communities have amassed large collections of in situ images, some generated in small numbers for specific genes and others in large-scale screens. The challenge then is to provide effective and useful public access to these images.
Providing access to images on the basis of the expressed gene is both useful and relatively straightforward, and this has been implemented in several databases in different ways. Examples include the model organism-specific databases ZFIN (Sprague et al., 2003), WormBase (Rogers et al., 2008), MEPD (Henrich et al., 2005), Geisha (Bell et al., 2004; Darnell et al., 2007), GXD (Ringwald et al., 1997; Smith et al., 2007), BDGP (Tomancak et al., 2007), and Xenbase (Bowes et al., 2008), and, across multiple species, VisiGene (Kuhn et al., 2007), 4DXpress (Haudry et al., 2008), and quickImage (Gilchrist et al., 2008). The general area of bioimage informatics and related topics are covered in a recent review (Peng, 2008). Much of the following discussion is based on the analysis of embryonic developmental stages, but it may be applied in a similar manner to adult tissues and organs.
Retrieving images by developmental time points alone is easily achievable, as long as these data have been recorded and stored in a database, but is possibly of limited value, as for most time points the sheer numbers of retrieved images would overwhelm the user. This would be even more so for retrieval by view (dorsal, lateral, etc.) alone. The real potential lies in being able to query the image data on the specific localization of gene expression, i.e., to be able to retrieve all images with expression restricted to an organ, or other anatomical feature or features of interest, and, furthermore, to use these data to understand the colocalization or coexpression of genes.
This, however, requires annotation of some sort: marking up images according to the exact regions of gene expression. One alternative is to rank images by purely graphical content similarity, which would presumably yield useful results if staining and imaging conditions were very carefully controlled. Algorithms have been developed to achieve this (for example, Maree et al., 2007; see Muller et al., 2004, for a review). However, although such queries may be quite efficient, they do not constitute annotation and will probably not generalize across images from different projects.
There are currently two approaches to this problem of image annotation: (a) manual annotation by means of a detailed anatomy, or (b) computational annotation by means of image analysis. The underlying aim, however, is the same in both cases: to reduce the observed gene expression in each in situ image to a common underlying schematic, framework, or model, because once this is done, questions can be asked of the image collection as a whole in a systematic manner. The differences between the manual and computational approaches are striking, both in how they are implemented and in how the data can subsequently be treated.
Anatomy-based annotation is exemplified in zebrafish by the heroic efforts of Bernard Thisse and others, visualized through the ZFIN database (Sprague et al., 2003), and a similar approach has been used both for mouse brains in the Allen Brain Atlas (Lein et al., 2007) and for fruit fly embryos in the BDGP database (Tomancak et al., 2007). This is essentially a manual annotation process. It requires the prior definition of a sufficiently detailed descriptive anatomy, covering all the developmental stages of interest up to the adult organism, preferably with ontology-style relationships between the anatomical terms. It is then simply a matter of having an expert in the field review each image one by one, noting which anatomical regions show gene expression, and storing these data in a database.
Computational annotation has been implemented for Drosophila melanogaster in FlyExpress (Van Emden et al., 2006), which builds on work by Kumar (Kumar et al., 2002; Gurunathan et al., 2004) and others. A different approach to the same problem has been taken by Peng and colleagues (Peng et al., 2007; Zhou and Peng, 2007); interestingly, the authors note that they “separated the dorsal and lateral views manually and also adjusted the orientations of these images,” illustrating some of the problems of fully automating these processes. The EMAGE database (Christiansen et al., 2006) of mouse expression data takes a hybrid manual/computational approach, requiring the user to input tie-points manually so that the user's image can be computationally warped to fit the relevant mouse model before the expression data are extracted. The database also allows searching by a user-generated expression pattern.
For computational annotation, the aim is to develop a computer program that can annotate the images automatically. This generally involves several separate steps. First, the outline of the embryo and the position of key interior detail must be located within the image (segmentation). From this information, the developmental stage and the view or orientation need to be identified (identification). The outline and interior morphology must then be distorted to fit a standard template for the identified stage and view (registration). Finally, the actual regions of gene expression need to be isolated within the embryonic outline (annotation). The developmental stage and orientation of the imaged embryo are often carried with the image as meta-data, in which case the identification step may not be required. The accuracy of the initial analysis and registration steps will determine how reliably overlapping expression fields from different genes can be detected. However, these are complex computational problems that require significant development effort and meet with varying degrees of success, depending, among other things, on image quality, consistency of imaging conditions, and the complexity and variability of the organism's shape and layout.
The two approaches have some general advantages and disadvantages. Computational methods will probably require a great deal of initial work to develop, test, and verify, but, if successful, they will then process large numbers of images in a relatively short time and can easily be rerun as algorithms improve. Anatomy-based manual annotation is much slower, depends on the development of a sufficiently detailed set of anatomical terms, and is inaccurate where expression domains do not coincide with the defined borders of anatomical regions; re-annotation is unlikely ever to be contemplated. Anatomy-based annotation, however, generates data that are independent of view or orientation, easily allowing data from different views to be combined. If anatomical terms persist over multiple developmental stages, it is possible to extract temporal gene expression profiles and compare them with those of other genes. In contrast, the results of computational analysis of two-dimensional images will probably be harder to transform into, or compare with, other stages, views, or orientations.
Neither approach appeared attractive to us. An anatomical ontology had been partially developed for Xenopus, but we lacked the expert annotator effort needed to deal with the current collection of approximately 23,000 images, and we were also uncertain about the usefulness of an anatomy-based approach for the relatively featureless early embryonic stages. The computational approach appeared daunting, both because of the time it would probably take to develop a viable system and because of the wide variety and complexity of embryo shapes and the great range in quality, color, and intensity of the in situ stains in the existing images.
We hypothesized that a rapid manual annotation method could provide the basis for a suitable tool: one with a low reliance on expert knowledge, but able to exploit the human ability to recognize subtle morphological cues and to combine rotation about multiple axes with nonlinear distortion when comparing images. With such a tool, the job could be done in a reasonable amount of time by a small group of people with varying levels of expertise in Xenopus anatomy. Human annotators should also be able to deal with poor-quality images in which an in situ stain is nevertheless discernible. Those with more expertise in the field could help the less experienced gain confidence in interpreting the images correctly.
We built a simple Web-based annotation tool, XenMARK, that enables expression localization patterns to be transferred manually from the original images to schematic stage diagrams by direct markup. This yielded computational-style output, in the form of x–y coordinates of gene expression for each image, but with human interpretation of the position and extent of the expression pattern, normalized to the set of schematics. Loading the x–y coordinate data into a database enabled us to build a powerful and flexible tool for image searching and retrieval.
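One way such coordinate data might be held in a relational store is sketched below in Python, using the standard-library `sqlite3` module. This is a hypothetical schema, not XenMARK's actual one: the table and column names (`annotation`, `schematic_id`, `intensity`) are our own, the x–y coordinates are assumed to be grid-cell indices, and the two staining levels are assumed to be stored as 1 (light) and 2 (heavy).

```python
import sqlite3

# Hypothetical schema (not XenMARK's): one row per marked grid cell.
# intensity: 1 = light staining, 2 = heavy staining (assumed encoding).
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    image_id     TEXT    NOT NULL,
    schematic_id TEXT    NOT NULL,
    x            INTEGER NOT NULL,
    y            INTEGER NOT NULL,
    intensity    INTEGER NOT NULL CHECK (intensity IN (1, 2)),
    PRIMARY KEY (image_id, x, y)
);
CREATE INDEX IF NOT EXISTS idx_cell ON annotation (schematic_id, x, y);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_annotation(conn, image_id, schematic_id, cells):
    """Replace any existing markup for image_id with cells: {(x, y): intensity}."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM annotation WHERE image_id = ?", (image_id,))
        conn.executemany(
            "INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
            [(image_id, schematic_id, x, y, v) for (x, y), v in cells.items()],
        )


def load_annotation(conn, image_id):
    """Return (schematic_id, {(x, y): intensity}) for one image, or None."""
    rows = conn.execute(
        "SELECT schematic_id, x, y, intensity FROM annotation WHERE image_id = ?",
        (image_id,),
    ).fetchall()
    if not rows:
        return None
    return rows[0][0], {(x, y): v for _, x, y, v in rows}
```

Saving replaces an image's previous markup, which fits the re-annotation of early work described in the Methods; the index on `(schematic_id, x, y)` is there so that "which images are expressed at this cell?" lookups stay cheap.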
The annotation tool works in the following way. As input we have the in situ images, and also meta-data describing development stage and orientation for most of them. In addition we have a very rich series of schematic stage diagrams from Nieuwkoop and Faber (1956), complemented by carefully staged exemplar photographs from Will Graham in the laboratory of Barbara Lom (see the Xenopus Staging Table at http://www.bio.davidson.edu/people/balom) and some of our own images, from which we traced additional schematics to fill in the important gaps in the Nieuwkoop and Faber set. The schematics were carefully cropped and digitized to a moderate resolution. Data for all the images and schematics were stored in a database.
In operation, the browser interface displays the next in situ image to be annotated alongside the most likely schematic stage diagram, using the information in the database to select the schematic matching the stage and view of the image. The program superimposes a grid of (invisible) cells over the schematic diagram, which the annotator can mark up with typical “graphical” mouse movements and clicks. The annotator's task is to assess the position and extent of gene expression in the in situ image and to transfer this as faithfully as possible to the schematic, making mental allowance for any rotation, inversion, or more complex distortion required. Two colors are available, representing heavy and light staining, and the annotator can override the presented schematic by choosing a more appropriate one from the set, if available. Figure 1 illustrates the annotation tool in use for some typical in situ images, shows the relationship between the grid of cells and the schematics, and illustrates the sort of geometric transformations required of the annotators (see the stage 22 case). The grid size was chosen as a compromise between capturing fine detail, achieving reasonable data transfer rates when building the Web pages, and minimizing the time the annotator spent filling in cells. It was standardized at 40 cells across the height of the schematic.
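The cell grid can be sketched in Python as follows. Only the 40-cell height comes from the text; square cells, clamping of clicks on the schematic's edge, the 1/2 encoding of light and heavy staining, and the names `pixel_to_cell` and `GridAnnotation` are our assumptions for illustration.

```python
import math

GRID_ROWS = 40  # from the text: 40 cells across the schematic's height

LIGHT, HEAVY = 1, 2  # the tool's two staining colors (encoding assumed)


def grid_shape(width_px: int, height_px: int, rows: int = GRID_ROWS):
    """Return (rows, cols, cell_size_px), assuming square cells."""
    cell = height_px / rows
    cols = math.ceil(width_px / cell)
    return rows, cols, cell


def pixel_to_cell(px: float, py: float, width_px: int, height_px: int,
                  rows: int = GRID_ROWS):
    """Map a mouse position on the schematic to a (col, row) grid cell."""
    n_rows, n_cols, cell = grid_shape(width_px, height_px, rows)
    col = min(int(px // cell), n_cols - 1)  # clamp clicks on the right edge
    row = min(int(py // cell), n_rows - 1)  # clamp clicks on the bottom edge
    return col, row


class GridAnnotation:
    """Cells marked on one schematic: {(col, row): LIGHT or HEAVY}."""

    def __init__(self, width_px: int, height_px: int):
        self.width_px, self.height_px = width_px, height_px
        self.cells = {}

    def click(self, px: float, py: float, level: int = HEAVY):
        """Mark the cell under the pointer; a later click overwrites its level."""
        self.cells[pixel_to_cell(px, py, self.width_px, self.height_px)] = level

    def erase(self, px: float, py: float):
        """Clear the cell under the pointer, if it was marked."""
        self.cells.pop(pixel_to_cell(px, py, self.width_px, self.height_px), None)
```

Fixing the row count rather than the cell size means every schematic yields the same vertical resolution regardless of its pixel dimensions, which keeps annotations on different schematics roughly comparable.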
Many of the images came from large-scale in situ screens (Pollet et al., 2003; Rana et al., 2006; also from NIBB: http://xenopus.nibb.ac.jp), and it is possible that their quality and inclusion thresholds were lower than those for images from low-throughput, gene-specific projects. To allow for this, and to support different actions in the annotation process, several alternative image classifications were available to the annotators: Ubiquitous, Indistinct, No expression, Duplicated, and Problematic. The last two were used, respectively, to indicate (a) images for which another annotated image carried equivalent or better information, and (b) images that the annotator believed were too complex to deal with and wished to pass on for review by someone with more expertise.
We assembled a group of 21 annotators, including 2 high-level experts, in Evry, near Paris, for 4 days, during which we comfortably processed all of the ∼23,000 images. This allowed time for an initial phase in which annotation ground rules were established by group discussion, and for a final assessment session. The key to the success of this approach was speed, and much care was taken in designing the annotation tool to minimize the number of user operations. Images were queued in groups according to the sequence probed, as seeing all the images for a given gene together was a significant aid in deciding which images to annotate. Most images with expression patterns could be annotated in under a minute, and classification into the other categories generally took only a few seconds.
One of the most important results was the separation of the useful images (those containing distinct localized gene expression) from the less useful (those containing indistinct or no expression, or ubiquitous expression in the imaged view). By the end of the jamboree, we found that 5,172 images had been annotated with a localized gene expression pattern. The balance was made up of the less useful images, or those for which the expression pattern was substantially duplicated or better represented on a different view at the same stage. The annotated images were derived from 2,094 distinct sequence identifiers, but given the probable duplication between different in situ projects, the number of different genes for which we have data may be somewhat lower than this.
Assessing Consistency and Accuracy
We were aware that human annotators could make mistakes, and wished to have an estimate of the general confidence limits of this approach, and also to understand the most likely source of error, which would help guide future applications of the method. To this end, we included a test set of 16 images spanning the range of development stages to assess the consistency and accuracy of the annotation process. Each annotator was presented with the test set at some point after they became familiar with the annotation tool and the demands of the process. This was not blinded, although annotators were instructed to take no more or less care over these images than with the others that they marked up.
In general, we found a high degree of consistency between annotators, with most images annotated in the same way. At a finer level of detail, the differences arose mostly where annotators had chosen different cutoffs as the stain faded toward the edges of the expressed region. For one of the images, there was considerable disagreement over the stage selected, probably because the image had been incorrectly staged. There was also uncertainty over the orientation of some of the quasi-spherical embryos. Because this is difficult to quantify accurately, we prefer to present the actual test data to users of the database, from which they can draw their own conclusions about the quality of the annotation and the ways in which data might be lost. This can then help them make allowances for annotator error when searching for images. Figure 2 shows some of the test set as pairs of images and schematics, with overlaid heat maps summing the data for each image over the set of annotators. Red represents complete agreement, with colors shading toward blue as fewer annotations coincide. We can, however, obtain an approximate measure of confidence by counting the number of correct annotations, the number of usable but suboptimal annotations (slight mis-staging), and the number of incorrect annotations. This gives a pass rate of approximately 80% at minimum and 90% at maximum. We believe this is acceptable, especially given that many of the errors were false positives (faint stain marked up rather optimistically), which most users will simply ignore.
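The summed heat maps and the pass-rate bounds can be sketched in Python as below. Three things here are assumptions rather than details from the text: the linear blue-to-red color ramp, our reading of the pass rate (the minimum counts only correct annotations, while the maximum also accepts usable but suboptimal ones), and the function names. Any counts fed to these functions in examples are toy values, not the study's.

```python
from collections import Counter


def consensus_map(annotations):
    """annotations: one collection of (col, row) cells per annotator, for one image.

    Returns {cell: fraction of annotators who marked that cell}.
    """
    n = len(annotations)
    # set() guards against an annotator's cell being counted twice
    counts = Counter(cell for cells in annotations for cell in set(cells))
    return {cell: k / n for cell, k in counts.items()}


def agreement_color(fraction):
    """Assumed colormap: 1.0 (complete agreement) -> red, toward 0 -> blue."""
    f = max(0.0, min(1.0, fraction))
    return (round(255 * f), 0, round(255 * (1 - f)))


def pass_rate_bounds(correct, suboptimal, incorrect):
    """Minimum counts only correct annotations; maximum also accepts
    usable-but-suboptimal ones (our interpretation of the text)."""
    total = correct + suboptimal + incorrect
    return correct / total, (correct + suboptimal) / total
```

Normalizing by the number of annotators, rather than showing raw counts, keeps the color scale comparable between test images that were seen by different numbers of people.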
Data Retrieval Functions
Following annotation, we have x–y coordinate data for over 5,000 in situ images with identifiable localized expression patterns registered onto the set of schematic stage diagrams. This allows us to search effectively through the images looking for patterns of expression in several different ways. These data are publicly available through the XenMARK database, at: http://informatics.gurdon.cam.ac.uk/apps/XenMARK/.
As an initial basis for effective searching, we have reduced the data for each schematic diagram to a superimposed heat map, which shows the user clearly where the densest concentrations of data lie and may help shape the user's strategy in searching for images of interest. This is especially useful because the numbers of images at each developmental stage and view are very unevenly distributed. It also provides an overall snapshot of the variability of gene expression over the whole embryo at each stage. Figure 3 shows selected heat maps from stages 19–22 to illustrate this point.
The most straightforward retrieval method is to click on any point in a schematic heat map. The database will return images where the annotated expression pattern intersects the point clicked, and order them by the degree of localization around that point. The user can determine how deeply to search. This may give a rather general survey of images with expression around the point of interest; more informative searches can be run using expression patterns as the starting point for the search.
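The text does not define the “degree of localization” used to order point-query results. The Python sketch below assumes one plausible measure: the fraction of an image's marked cells lying within a given radius of the clicked cell. The measure, the `radius` default, and the function names are all hypothetical.

```python
import math


def localization_score(cells, point, radius=3.0):
    """Fraction of an image's marked cells within `radius` cells of `point`.

    An assumed measure of how tightly expression is localized around the click.
    """
    px, py = point
    near = sum(1 for x, y in cells if math.hypot(x - px, y - py) <= radius)
    return near / len(cells)


def point_query(index, point, radius=3.0, limit=None):
    """Return ids of images whose markup includes `point`, most localized first.

    index: {image_id: set of (col, row) cells} for one schematic.
    limit: how deep to search (None returns every hit).
    """
    hits = [(localization_score(cells, point, radius), image_id)
            for image_id, cells in index.items() if point in cells]
    hits.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by id
    ids = [image_id for _, image_id in hits]
    return ids if limit is None else ids[:limit]
```

Under this measure, an image whose only expression sits at the clicked point scores 1.0, while one with additional widespread expression elsewhere is pushed down the list; the `limit` parameter plays the role of the user-chosen search depth.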
Alternatively, the user can base the search on an expression pattern they define themselves. This search mode uses the annotation tool, allowing the user to define an expression pattern of interest by filling in cells on one of the schematics as if annotating an existing image. The database then searches for real expression patterns that match the user's pattern, ordering the results by degree of fit. Figure 4 shows two user-defined search patterns for the stage 13 embryo, posterior-dorsal view, and demonstrates the degree of discrimination achievable by this method. Similarly, the user can start from an already annotated in situ image, found by either of the previous search mechanisms, and use that image's annotation as the starting point for a search. Figure 5 shows the results of a search using an in situ image of a stage 30 embryo with expression in the somites.
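Ranking by degree of fit could be sketched in Python with the Jaccard index (intersection over union of marked cells). This is a stand-in measure chosen for illustration; the text does not say how XenMARK scores fit. Passing an existing image's cells as the query, and excluding that image, mimics the search-from-an-annotated-image mode.

```python
def jaccard(a, b):
    """Intersection over union of two sets of marked cells (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pattern_query(index, query, exclude=None, limit=10):
    """Rank annotated images by Jaccard similarity to a query pattern.

    index: {image_id: set of (col, row) cells} for one schematic.
    query: set of cells, either drawn by the user or taken from an image.
    exclude: image_id to leave out (e.g. the image the query came from).
    Images with no overlap at all are dropped.
    """
    scored = [(jaccard(query, cells), image_id)
              for image_id, cells in index.items()
              if image_id != exclude]
    scored = [(s, i) for s, i in scored if s > 0.0]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best fit first, ties by id
    return [(image_id, score) for score, image_id in scored[:limit]]
```

Because the union appears in the denominator, Jaccard penalizes both missing expression and extra expression; the next sketch of search models shows why that is only one of several reasonable choices.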
Our initial concerns about the effectiveness of the method were dispelled during the annotation period. Although we had allowed a week for the work, in the event a little less than 4 days were needed to review, and where required annotate, approximately 23,000 images. One of the more significant outcomes of the project was the separation of the images with clear localized gene expression from those with ubiquitous or, more problematically, no discernible expression, of which there were a large number. These “no expression” images mostly originated from large-scale screens (see earlier discussion); they are therefore difficult to accept as true negatives in this context, and we were largely happy to remove them from further consideration.
From a computational point of view, this approach may at first seem a little regressive. We are, however, confident that it was a pragmatic and effective way to tackle the problem, given limited (human) computational resources and large image collections that had already been virtually inaccessible for too long. The realization that human annotators have much to offer in the field of image analysis is not unprecedented; for example, the Stardust@Home project (http://stardustathome.ssl.berkeley.edu/) (Nelson, 2008), which searches for faint and rare tracks of interstellar dust particles, had, by March 2007, “used 23,000 ‘dusters’ [and] collectively searched nearly 40,000,000 images”.
Before embarking on this approach, we spent some time considering the use of an anatomical, text-based annotation method, but ultimately were won over by the simplicity of the method described above, and we knew that the coordinate-based data could readily be converted into anatomical terms at a later date if so required. Additionally, we know quite well that the boundaries of defined anatomical regions do not routinely coincide exactly with areas of gene expression, and the use of a set of anatomical terms as the primary recording vehicle would have introduced a degree of imprecision that would be irreversibly locked into the data. Anatomical terms are, however, critical for comparison with expression data from other model organisms, and we are currently working on an automated anatomy translator that will transform the x–y coordinate data generated here to text-based anatomical ontology terms. This will not only enable us to link to anatomy-based expression data for other model organisms, where anatomical terms are used in common, but will also allow us to extend searches laterally to other development stages, where the same anatomical terms persist.
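A minimal Python sketch of how coordinate data might be translated into anatomical terms follows. It assumes each schematic comes with a hypothetical label mask assigning grid cells to terms, and it is not the authors' translator, whose design is not described here. Reporting the fraction of each term's cells that are marked, rather than a yes/no call, retains some of the positional precision discussed above.

```python
from collections import Counter


def cells_to_terms(cells, label_mask, min_fraction=0.0):
    """Translate marked cells on one schematic into anatomical terms.

    label_mask: hypothetical {(col, row): term} for that schematic.
    Returns {term: fraction of the term's cells that are marked}, keeping
    only terms with at least one marked cell and coverage >= min_fraction.
    Marked cells that fall outside every labeled region are dropped.
    """
    term_size = Counter(label_mask.values())
    marked = Counter(label_mask[c] for c in cells if c in label_mask)
    coverage = {term: n / term_size[term] for term, n in marked.items()}
    return {t: f for t, f in coverage.items() if f >= min_fraction}
```

The dropped cells are exactly where an anatomy-only record would lose information: expression that does not coincide with any defined region simply vanishes, whereas the original coordinate data keep it.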
It is also clear that our annotators were not uniformly perfect and that mistakes were made. We now know which stages were tricky to annotate, and why (the early stages, with almost spherical embryos, and images not adhering to orientation conventions), which will help us in the future. Some annotators were a little too ready to see gene expression where it probably was not present, or confused stain with natural pigmentation, especially in poorly cleared later-stage embryos. On the whole, however, we do not consider these false positives to be a problem. This fits with the contemporary search engine philosophy, in which the user can reasonably be expected to do some of the work of separating the bona fide expression wheat from the chaff, especially when the alternative is no data to look through at all. Mis-staging is more problematic. It arises either where the original image is labeled with a broad conceptual stage (e.g., “tadpole”) covering several numerical stages, so that different annotators may choose different specific stages for markup, or where the annotator considers the image to have been mis-staged by its originators.
There is some room for improvement in the system, both for the annotation phase and also for the data retrieval. The Nieuwkoop and Faber stage diagrams (Nieuwkoop and Faber, 1956) have almost legendary status in the Xenopus community; nevertheless, there were some stages where the average real embryo seemed to have a significantly different shape, and we are considering evolving an improved, or more representative, set of schematics for this purpose. This may involve some re-annotation, but we are also working on methods to allow data to be transferred automatically from one schematic to another. Such methods will also help us allow for mis-staging by deliberately pulling in images from neighboring stages in response to a user's query.
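In its crudest form, transferring markup between schematics could be a proportional rescaling of one grid onto another. The Python sketch below uses inverse nearest-neighbor sampling, so that enlarging a pattern leaves no holes. It assumes both schematics share the same bounding box and ignores differences in shape, which a real method would have to handle by nonlinear warping; `transfer_cells` is a hypothetical name.

```python
def transfer_cells(cells, src_shape, dst_shape):
    """Map marked cells from a source grid to a destination grid of another size.

    src_shape, dst_shape: (cols, rows). Each destination cell samples the
    source cell under its centre (inverse nearest-neighbour mapping), so
    enlarging a pattern leaves no gaps. Assumes aligned bounding boxes.
    """
    src_cols, src_rows = src_shape
    dst_cols, dst_rows = dst_shape
    out = set()
    for col in range(dst_cols):
        src_col = int((col + 0.5) * src_cols / dst_cols)
        for row in range(dst_rows):
            src_row = int((row + 0.5) * src_rows / dst_rows)
            if (src_col, src_row) in cells:
                out.add((col, row))
    return out
```

Iterating over destination cells, rather than pushing each source cell forward, is what prevents gaps when the destination grid is larger; the cost is that fine detail can be lost when shrinking a pattern.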
An early observation while building the query retrieval system was that there are different models one might appeal to when looking for the best fit to another expression pattern (whether real or user-defined). For example, one might wish to see no expression at all in other areas of the embryo, or one might accept expression elsewhere so long as the search region itself is well bounded by regions of no expression. Ultimately, these choices will be offered to the user as “search models”. More complex queries to tease out systems-biology relationships, such as finding groups of coexpressed genes potentially under a common regulatory control mechanism, are beyond the scope of this initial application. The data are all there, however, and we will be working on this aspect of the search mechanism as soon as possible.
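The two search models described above can be sketched in Python as alternative scoring functions. Both formulas are our own illustration, not XenMARK's. The “strict” model multiplies the fraction of the query that is covered by the fraction of the image's expression that falls inside the query. The “bounded” model tolerates distant expression and penalizes only expression in the ring of cells immediately surrounding the query.

```python
def ring(region):
    """Cells touching `region` (8-neighbourhood) but not inside it."""
    return {(x + dx, y + dy)
            for x, y in region
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)} - set(region)


def model_score(cells, query, model="strict"):
    """Score an image's marked cells against a query region.

    strict:  coverage of the query x fraction of the image's expression
             inside the query (expression anywhere else is penalized).
    bounded: coverage of the query x fraction of the surrounding ring
             left unexpressed (distant expression is tolerated).
    """
    cells, query = set(cells), set(query)
    if not cells or not query:
        return 0.0
    overlap = len(cells & query)
    coverage = overlap / len(query)
    if model == "strict":
        return coverage * overlap / len(cells)
    if model == "bounded":
        border = ring(query)  # never empty for a non-empty query
        return coverage * (1 - len(cells & border) / len(border))
    raise ValueError(f"unknown search model: {model!r}")
```

With a single-cell query, an image that also expresses far away scores 0.5 under the strict model but 1.0 under the bounded one, while an image whose expression spills into an adjacent cell is penalized under both, illustrating how the choice of model reorders results.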
At this point, it is worth looking back to a review of this field by Bard (1997), written over 10 years ago, in which many of the issues addressed above, and in similar work, were cogently aired. Two points are striking. First, the author's thoughts about the development of the field now seem a little optimistic: we are still somewhat short of queryable expression data sets for all model organisms, complete with standard interoperability methods for cross-species queries. Second, the “real” problems often appear only once the initial application or database has been built, when issues of sustainability and ongoing annotation become important. In particular, there is the question of how to persuade individual scientists to submit small numbers of images when, in a sense, the images' value to them has largely expired on publication. We know that this depends partly on making the submission process as trivially easy as possible, but also on a ready perception that the resource offers real value and that a small effort by many individuals will benefit the whole community.
Database development will be continued, and we encourage the community to feed back ideas for extending and improving the functionality of the database. In addition, we are able to continue adding images to the database and annotating them, and would also encourage originators of published or unpublished images to contact us to get them added to the database. Details of how to do this can be found by following the link from the database login page.
In conclusion, this somewhat novel approach to gene expression image annotation has worked very well for us. With only a little up-front investment in database and browser interface building, and the enthusiastic involvement of a small group of Xenopus scientists, we were able to extract a large amount of useful information from over 23,000 in situ images, and use this to build a queryable database of Xenopus gene expression localization.
Most of the in situ images had already been loaded into our quickImage database; any new images were loaded by the method described previously (Gilchrist et al., 2008).
The annotation jamboree was held over 4 days in a suite of rooms with space and Internet connections for the group of annotators. There was an initial training and familiarization session covering both the software and, for some people, Xenopus anatomy. We met periodically as a group to discuss issues that arose during annotation and to set common annotation standards for the project. With plenty of time in hand, we decided to re-annotate all the images that had been done in the first session, on the grounds that what we had learned would improve these early annotations. The comparison test set of 16 images was selected to cover a wide range of stages and views, including views that did not exactly match any schematic. All the test images, with one marginal exception, had clear localized expression. Annotators were aware that they were doing the test set and were free to seek advice about the images, as they were for any other images.
We thank Peter Vize and Jeff Bowes at Xenbase for useful discussions about approaches to the problem and sharing data. We also thank Naoto Ueno and Hiroyo Nishide at NIBB for their help in ensuring access to the extensive NIBB image collection during their rebuilding program. We particularly thank those members of the Xenopus community who responded to our request for more images: Michela Ori, Stanislav Kremnyov, Gert Veenstra, Karel Dorey, Enrique Amaya, Martin Roth, Nancy Papalopulu, Eric Bellefroid, and Thomas Pieler. We thank Garland Science/Taylor & Francis, LLC, for permission to reproduce the Nieuwkoop and Faber stage schematics, and Barbara Lom and Will Graham for use of their staged images. This work has been supported in part by the European Commission funded Sixth Framework Programme for Research and Technological Development, Coordination Action X-OMICS and by core funding to the Wellcome Trust/Cancer Research UK Gurdon Institute provided by the Wellcome Trust (UK).