Abstract

The availability of both the Xenopus tropicalis genome and the soon-to-be-released Xenopus laevis genome provides a solid foundation for Xenopus developmental biologists. The Xenopus community has so far amassed expression data for ∼2,300 genes in the form of published images collected in Xenbase, the principal Xenopus research database. A few of these genes have been examined in both X. tropicalis and X. laevis, and the cross-species comparison has proven invaluable for studying gene function. A recently published work has yielded developmental expression profiles for the majority of Xenopus genes across fourteen developmental stages spanning the blastula, gastrula, neurula, and tail-bud. While this data was originally queried for global evolutionary and developmental principles, here we demonstrate its general use for gene-level analyses. In particular, we present the accessibility of this dataset through Xenbase and describe biases in the characterized genes in terms of sequence and expression conservation across the two species. We further indicate the advantage of examining coexpression for gene function discovery relating to developmental processes conserved across species. We suggest that the integration of additional large-scale datasets comprising diverse functional data into Xenbase promises to provide a strong foundation for researchers in elucidating biological processes, including the gene regulatory programs encoding development. genesis 50:186–191, 2012. © 2011 Wiley Periodicals, Inc.


INTRODUCTION

Xenopus is an efficient vertebrate model for genetic and developmental research, benefitting from several advantages over zebrafish and mice, including relatively affordable transgenesis (Hirsch et al., 2002). The early embryos are amenable to manipulation, thus enabling efficient microsurgery and microinjection (Sive et al., 2000). Historically, Xenopus laevis was the only Xenopus species used, but over the past two decades, Xenopus tropicalis has emerged as an excellent experimental model with powerful community resources (Kashiwagi et al., 2010).

While both X. tropicalis and X. laevis offer rapid rates of embryogenesis and large spawning numbers, each carries distinct advantages and disadvantages (Chalmers et al., 2005). The two species are easy to distinguish, since X. laevis is characterized by significantly larger embryos and adults, as well as more robust eggs and better-established husbandry practices. However, X. laevis has a substantial drawback as a genetic system: an allotetraploid genome with two related paralogs for most genes (Hellsten et al., 2007). X. tropicalis has a shorter generation time of 3–4.5 months, a potential for much greater spawning number (1,000–9,000 vs. 300–1,000), and is the only known Xenopus species with a diploid genome, which is almost half the size of that of X. laevis (Hirsch et al., 2002). Nevertheless, X. tropicalis and X. laevis exhibit nearly indistinguishable embryonic development (apart from the size differences) and nearly indistinguishable adult and larval morphologies, despite having undergone 30–90 million years of independent evolution (Evans et al., 2004).

Another advantage of X. tropicalis is the availability of a fully sequenced genome. While the full X. laevis genome is still awaiting publication, thousands of X. laevis mRNA and EST sequences are readily available. In our comparison of the coding regions of X. laevis and X. tropicalis, we found the sequences to be on average 90% identical (Fig. 1). In comparison, the X. laevis paralogs resulting from the genome duplication are more closely related to each other than the orthologs are (Fig. 1). The ∼90% identity between orthologs places the species pair at an intermediate genetic distance between such pairs as human–mouse (85%) and human–chimp (99%) (Consortium, 2005; Waterston et al., 2002). In this brief review, we consider the comparative study of this species pair as a research tool to understand developmental and evolutionary processes.

Figure 1. Identity between the coding sequences of Xenopus laevis and Xenopus tropicalis (light grey) and between Xenopus laevis paralogs (dark grey). Sequence identity was determined using blastn between the 11,095 pairs of the previously delineated orthologs (Yanai et al., 2011). Pairs of X. laevis paralogs were retrieved from a recent study (Hellsten et al., 2007) and compared as for the orthologs.
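
To make the identity measure in Figure 1 concrete, the following is a minimal sketch of how percent identity can be computed once two coding sequences have been aligned. It is an illustration only: blastn builds its own local alignment and reports its own identity value, whereas this hypothetical helper, percent_identity, assumes an alignment is already given and simply skips gap columns. The two fragments are made up and are not real Xenopus sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # gap column: not counted (an assumption of this sketch)
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (made up for illustration).
laevis_fragment = "ATGGCCAAGT"
tropicalis_fragment = "ATGGCTAAGT"
print(percent_identity(laevis_fragment, tropicalis_fragment))  # 90.0
```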

Whole Transcriptomic Comparisons Between X. tropicalis and X. laevis

Comparing the two Xenopus species has long been thought to be useful for unraveling conserved developmental processes. The first study to compare gene expression at the whole-transcriptome level used a DNA microarray designed for X. tropicalis genes and hybridized with RNA from both X. tropicalis and X. laevis (Chalmers et al., 2005). RNA from Stages 3 (four-cell stage) and 19 (neurula) was compared, and the gene expression detected with X. laevis RNA was shown to be congruent with that detected with X. tropicalis RNA. This data allowed for more efficient access to comparative gene expression data without the trouble of obtaining X. laevis-specific resources.

A similarly designed study focused on transcriptomes of Stages 10 and 40 as well as the egg, ovary, and liver, and found overall congruent changes in expression across species (Sartor et al., 2006). Notably, this study also reported a correlation between the sequence divergence and expression divergence of the two species (also see below). However, this method was limited by an inability to distinguish between technical variation due to faulty hybridization and gene expression divergence across species. Consequently, the use of single-species microarrays in comparative transcriptomics has been called into question (Chain et al., 2008). In contrast to the previous studies, Chain et al. compared two allotetraploid species (X. laevis and X. borealis), which may have introduced an additional confounding effect.

Recently, a study including some of us set out to compare the developmental transcriptomes of X. laevis and X. tropicalis (Yanai et al., 2011). In this study, we designed two custom microarrays, manufactured by Agilent, that are species-specific to X. laevis and X. tropicalis, respectively. To enable an appropriate comparison, the corresponding 60-mer probes were designed to be virtually identical (>90%), adapted only as needed to the respective genomes, and three probes were assigned to each ortholog pair. We were interested in comparing the timing and level of gene expression between the two species across development. We found that the timing of gene expression upregulation is very similar across species for most of the genes examined (i.e., little heterochrony). In contrast, variation in expression levels (heterometry) was far more prevalent than heterochrony. Interestingly, differences in the levels of gene expression were concentrated in the earliest embryonic stages and declined during embryogenesis. Furthermore, specific pathways related to the organisms' response to the environment were found to be associated with coherent changes in the timing of the onset of expression. In addition to these global analyses, the expression plots for individual genes are of interest to those comparing expression across the development of the two species.
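
The distinction between heterochrony (a shift in timing) and heterometry (a shift in level) can be illustrated with a small sketch. The definitions below are assumptions made for illustration and are not the measures used in Yanai et al. (2011), which are not described here: onset is taken as the first sampled stage at which expression reaches half of the profile's maximum, and heterometry as the mean log10 ratio across stages. All function names and the toy profiles are hypothetical.

```python
import math


def onset_stage(profile, fraction=0.5):
    """Index of the first sampled stage at which expression reaches
    `fraction` of the profile's maximum (a hypothetical onset definition)."""
    threshold = fraction * max(profile)
    for index, value in enumerate(profile):
        if value >= threshold:
            return index
    raise ValueError("no stage reaches the threshold")


def heterochrony(profile_a, profile_b, fraction=0.5):
    """Difference in onset, in sampled stages (positive: b starts later)."""
    return onset_stage(profile_b, fraction) - onset_stage(profile_a, fraction)


def heterometry(profile_a, profile_b):
    """Mean log10 ratio of b over a across stages (positive: b is higher).
    Assumes strictly positive expression values."""
    ratios = [math.log10(b / a) for a, b in zip(profile_a, profile_b)]
    return sum(ratios) / len(ratios)


# Toy profiles (made up): same timing, twofold difference in level.
species_a = [10, 10, 100, 100]
species_b = [20, 20, 200, 200]
print(heterochrony(species_a, species_b))  # 0 stages: no timing shift
print(heterometry(species_a, species_b))   # about 0.30, i.e. log10(2)
```

In this toy case the two profiles differ only in level, which mirrors the pattern reported above: little heterochrony alongside measurable heterometry.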

Comparison of the Expression Profiles of Developmental Genes

Gene expression profiles have been previously compared for several developmentally important genes, and here we review these for the neurogenin and noggin families. The neurogenins comprise a highly conserved family of basic helix-loop-helix (bHLH) proteins (neurog1, neurog2, and neurog3) essential for neuronal determination and cell-type specification (Ma et al., 1996). The three originated from successive gene duplications during vertebrate evolution. The corresponding neurogenin genes from the two species share 75–89% overall amino acid identity, as compared to 45–57% identity with mouse sequences (Nieber et al., 2009).

Figure 2a shows in situ expression profiles from a recent work (Nieber et al., 2009) for both X. laevis and X. tropicalis. In analyzing the expression profile of neurog1 in the two species, that study reported that expression was present throughout embryonic development following initial induction and gastrulation. Notably, expression appeared stronger in X. tropicalis, which was attributed to “relative flexibility in expression levels” as well as the difference in size between the two embryos. Figure 2b shows the temporal expression profile identified by the developmental time-course comparison. The time-course plot highlights the same differences seen in the in situ images: (1) X. tropicalis shows higher expression levels, and (2) there appears to be a slight heterochronic shift to earlier expression in X. laevis, which is also hinted at in the early-stage in situ images.

Figure 2. Comparative gene expression profiles across Xenopus laevis and Xenopus tropicalis. a. In situ gene expression comparison for neurog1 as determined in Nieber et al. (2009). Numbers indicate developmental stages. Top row is X. tropicalis. This image is taken from Nieber et al. (2009). b. Gene expression comparison from the developmental time-course microarray data (Yanai et al., 2011). Fourteen stages were sampled and the units of expression are in log10 scale. c,d. Expression profiles for noggin and noggin2 in the same format as b.

Another comparative study of gene expression profiles across X. laevis and X. tropicalis addressed the noggin genes (Fletcher et al., 2004), which encode a family of secreted proteins important for patterning throughout vertebrate development. Noggin was first identified in cDNA derived from an X. laevis embryo (Smith and Harland, 1992). As with the neurogenin family, noggin in X. laevis is induced and then expressed throughout development. The X. tropicalis and X. laevis noggin orthologs share 98% amino acid identity as well as a very similar pattern of expression (Fig. 2c). Noggin2 is a bone morphogenetic protein antagonist shown to be crucial for the induction of dorsal structures in the embryo. A comparison revealed similar expression profiles (Eroshkin et al., 2006), as also evidenced by the developmental time-course data (Fig. 2d).

Introduction of Transcriptome Profiles to Xenbase

Xenbase is an online database to aid in the study of X. laevis and X. tropicalis biology and genomics (Bowes et al., 2010). It is the main resource for developmental and genomic information in the Xenopus community, continuously integrating research from the literature, including images relevant to gene expression, anatomy, and developmental stages. As of 2011, Xenbase contained 2,316 genes with at least one expression image of the kind shown in Figure 2a for the neurog1 gene. We submitted the 11,095 expression plots of orthologs (Yanai et al., 2011) to Xenbase and we now report their integration. For example, the expression images for neurog1 now also include the time-course profile (Fig. 2b). Thus, a gene that was already associated with published expression images is now complemented with temporal expression plots. Moreover, ∼80% of the genes for which we submitted plots are now associated with their first expression image.

To appraise how beneficial the additional expression images are for the community, we asked whether detectable biases existed in terms of which genes were previously associated with expression images in Xenbase. We first looked at sequence similarity between X. laevis and X. tropicalis orthologs (Fig. 1). For each set of genes with a given sequence similarity (for example, 90% DNA sequence identity), we computed the fraction that had expression images. We found that orthologs with higher sequence similarity featured a higher fraction of genes with expression images (Fig. 3a). Because sequence conservation is a proxy for functional importance (via selective pressures), the trend shown in Figure 3a can be seen as a demonstration of the Xenopus community's prioritization of functionally important genes. This is likely a result of the method by which genes were isolated in the pregenomic era, in which low-stringency hybridization with genes from distant species required conservation.
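
The binning step described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the analysis code behind Figure 3a: the 5% bin width, the function name, and the toy input are all hypothetical, and each gene is represented simply as a pair of its percent identity and a flag for whether it has an expression image.

```python
from collections import defaultdict


def fraction_with_images_by_identity(genes, bin_width=5.0):
    """Map the lower edge of each identity bin to the fraction of its genes
    that have at least one expression image.

    `genes` is an iterable of (percent_identity, has_image) pairs.
    The bin width is an assumption; the ranges used in Figure 3a may differ.
    """
    totals = defaultdict(int)
    with_image = defaultdict(int)
    for identity, has_image in genes:
        lower_edge = bin_width * int(identity // bin_width)
        totals[lower_edge] += 1
        if has_image:
            with_image[lower_edge] += 1
    return {edge: with_image[edge] / totals[edge] for edge in sorted(totals)}


# Toy input (made up): higher identity, more genes with images.
toy_genes = [
    (82.0, False), (84.0, False),
    (88.0, False), (89.0, True),
    (92.0, True), (93.0, False),
    (97.0, True), (98.0, True),
]
print(fraction_with_images_by_identity(toy_genes))
# {80.0: 0.0, 85.0: 0.5, 90.0: 0.5, 95.0: 1.0}
```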

Figure 3. Biases of Xenopus research towards genes with higher conservation of sequence and expression. a. Genes with Xenbase expression images as a function of their sequence similarity. The plot indicates the fraction of genes with expression images in Xenbase for the shown sequence similarity ranges. b. Genes with Xenbase expression images as a function of their expression similarity. Expression divergences between X. laevis and X. tropicalis (Yanai et al., 2011) were sorted and split into equally populated bins. The fraction of genes in each bin with Xenbase images is shown.

Next, we examined how genes with expression images are distributed across different levels of expression conservation between the two species. An index of expression distance was previously introduced (Yanai et al., 2011); it is computed as the Euclidean distance between the expression plots of the two species. For example, neurog1 exhibits more expression divergence across the species than noggin or noggin2 (Fig. 2b–d). We partitioned the genes into equally populated bins of increasing expression similarity. For each bin, we computed the fraction of genes with expression images. We found that genes with high expression similarity show a higher fraction of genes with expression images (Fig. 3b). As with sequence similarity, this correlation likely reflects the notion that genes under stronger selection both evolve more slowly at the expression level and play a more important functional role, and have consequently been studied more closely, as judged by the information found in Xenbase. Overall, the biases shown in Figure 3 indicate that researchers have predictably focused on the more important genes, as judged by their expression and sequence conservation. However, even within this class of genes, thousands previously lacked images (Fig. 3), and temporal expression data are now available for them.
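
A minimal sketch of this procedure is given below: a Euclidean distance between two profiles, a split of the sorted genes into equally populated bins, and the fraction of genes with images per bin. It is written under assumptions that the text does not settle, in particular whether the profiles are raw, log-scaled, or normalized before the distance is taken. The function names and toy values are hypothetical.

```python
import math


def expression_distance(profile_a, profile_b):
    """Euclidean distance between two equal-length expression profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same stages")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def equally_populated_bins(items, n_bins):
    """Split an already sorted list into n_bins chunks whose sizes differ by at most one."""
    if not 0 < n_bins <= len(items):
        raise ValueError("n_bins must be between 1 and the number of items")
    base, extra = divmod(len(items), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(items[start:start + size])
        start += size
    return bins


def image_fraction_per_bin(genes, n_bins):
    """`genes` is a list of (distance, has_image) pairs. Bins run from the
    most divergent genes to the most similar ones."""
    ordered = sorted(genes, key=lambda gene: gene[0], reverse=True)
    return [
        sum(1 for _, has_image in chunk if has_image) / len(chunk)
        for chunk in equally_populated_bins(ordered, n_bins)
    ]


print(expression_distance([0, 0], [3, 4]))  # 5.0
toy_genes = [(3.0, False), (2.5, False), (1.0, True), (0.5, True)]
print(image_fraction_per_bin(toy_genes, 2))  # [0.0, 1.0]
```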

Finally, we reexamined the relationship between expression divergence and sequence divergence (Sartor et al., 2006) by comparing the expression divergence of the orthologous profiles with the sequence similarity. We found a very weak yet significant correlation between the two (R = 0.08, P < 10⁻¹³). Thus, overall, the selective forces acting on the evolution of sequence and expression are intertwined, though to a limited degree. Perhaps more extensive work on characterizing both the spatial and temporal expression will more accurately gauge expression divergence.
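
A correlation this weak can still be highly significant when the number of genes is large. The sketch below works through the arithmetic under two assumptions that should be read as such: that all 11,095 ortholog pairs entered the correlation (the number actually used is not stated here), and that the t distribution can be replaced by a normal approximation, which is reasonable only for large samples. The function name is hypothetical.

```python
import math


def correlation_significance(r, n):
    """t statistic and approximate two-sided P value for a Pearson r from n pairs.

    The P value uses the normal approximation to the t distribution,
    which is adequate only when n is large.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Assumed sample size: the 11,095 ortholog pairs (an assumption of this sketch).
t_stat, p_value = correlation_significance(0.08, 11095)
print(f"t = {t_stat:.2f}, P = {p_value:.1e}")
```

Under these assumptions the P value falls well below the reported bound of 10⁻¹³, even though R = 0.08 corresponds to well under 1% of the variance (R² = 0.0064) being shared.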

One important aspect of the transcriptomics data relates to the expression of gene duplicates, a problem that is made worse in X. laevis, which sustained a genome duplication following its last common ancestor with X. tropicalis. Because of the lack of a completed genome for X. laevis and the difficulty in resolving the expression of genes with near-identical sequences using expression microarrays, there was an inherent problem in measuring the expression of recent duplicates. In light of this, we adopted a strategy of selecting probes in conserved sequence regions, such that the measured expression of duplicates may be represented as the sum of their separate expressions. Thus, many interesting patterns of subfunctionalization, such as those recently revealed (Hellsten et al., 2007), are potentially masked in our profiles for duplicates.
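
The masking effect can be seen in a toy example. The sketch below assumes an idealized probe that captures both duplicates equally, so that the reported signal is the simple per-stage sum; real hybridization behavior is more complicated. The profiles are made up.

```python
def summed_signal(paralog_a, paralog_b):
    """Per-stage signal a shared probe would report if it captured both
    paralogs equally (an idealized assumption)."""
    return [a + b for a, b in zip(paralog_a, paralog_b)]


# Toy profiles (made up): one copy is expressed early, the other late.
early_copy = [8, 6, 2, 0]
late_copy = [0, 2, 6, 8]
print(summed_signal(early_copy, late_copy))  # [8, 8, 8, 8]
```

The two copies have complementary temporal profiles, yet their sum is flat, so a probe in a conserved region would give no hint of the division of expression between them.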

Gene Expression Clustering Toward Identifying Developmental Networks

Beyond examining the expression profile of particular genes, the temporal expression profiles can be used to identify new genes participating in specific pathways, based upon an operational assumption that genes with similar expression profiles are functionally related. Given a gene of interest, for example mix1, the expression profile can be analyzed and located in a specific cluster (Fig. 4a). Upon analyzing this cluster, other genes with similar expression profiles are identified, such as nodal1. Interestingly, the nodal1 gene may be functionally related to mix1 through its close paralog mix2 (Cao et al., 2008). Additionally, the well-studied gene brachyury1.1 is also a member of this cluster. All three have been studied, and thus known information can be applied to other, perhaps lesser known or unknown, genes in this cluster; for example, LOC100489561, currently only annotated as “sbk1-like.” Furthermore, sbk1 is also a member of this cluster (Fig. 4a), and the very similar expression profiles of these two genes may indicate that they are gene duplicates with conserved expression.
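
The idea of starting from a gene of interest and retrieving genes with similar profiles can be sketched as a simple correlation lookup. This is a simplified stand-in and not the Heyer clustering used for Figure 4: it ranks every other gene by its Pearson correlation with the query and keeps those above a threshold. The threshold of 0.9, the function names, and the gene labels and values in the toy table are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpressed_with(query, profiles, min_r=0.9):
    """Names of genes whose profile correlates with the query's at or above
    min_r, most similar first."""
    reference = profiles[query]
    scored = [
        (pearson(reference, profile), name)
        for name, profile in profiles.items()
        if name != query
    ]
    return [name for r, name in sorted(scored, reverse=True) if r >= min_r]


# Toy profiles over five stages (made-up labels and values).
toy_profiles = {
    "query_gene": [1, 5, 9, 4, 1],
    "cand_1": [2, 10, 18, 8, 2],  # same shape, higher level
    "cand_2": [9, 5, 1, 6, 9],    # opposite trend
    "cand_3": [1, 4, 9, 5, 1],    # similar shape
}
print(coexpressed_with("query_gene", toy_profiles))  # ['cand_1', 'cand_3']
```

Because Pearson correlation ignores absolute level, a lookup of this kind groups genes by the shape of their temporal profile, which suits the operational assumption stated above.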

Figure 4. Clustering of comparative gene expression profiles. a–d. Each plot shows a set of genes grouped by Heyer clustering, as presented previously (Yanai et al., 2011). The clustering was performed only on X. laevis profiles. Four expression clusters are shown. The unbroken lines correspond to the X. laevis profiles while the dashed lines show the superimposed X. tropicalis profiles. The full clustering is provided in Supporting Information Table S1.

Notably, goosecoid is also known to be a part of this network of genes; however, it was not clustered with them. Rather, goosecoid is in another cluster with a similar expression profile, particularly marked by upregulation at Stage 9 (Fig. 4b). Following this upregulation, the level of expression decreases at a slower rate than in the previously noted cluster (Fig. 4a). This suggests that when studying individual gene networks, a more appropriate clustering strategy might involve restricting the analysis to those stages in which the network is induced. This analysis highlights the important role of computational analysis in developmental transcriptome data. Figure 4c,d shows two additional expression clusters in which expression is often not particularly well conserved across the two species. This lack of conservation also provides important clues as to the functional component of a gene's expression profile (Yanai and Hunter, 2009).

To help in this search, we developed a platform-independent gene expression browser using the MATLAB environment. This browser provides simple access to the expression data through a graphical user interface. In addition to stepping forward and backward through the genes, any gene can be accessed directly by its Ensembl ID. Expression level is displayed as raw, log-transformed, or normalized. Each species is displayed at its own range of expression values so that clutches can be compared to each other. Additionally, both species and all clutches/probes are brought together on a common scale for comparison. In particular, this browser has the useful ability to load any subset of genes and efficiently click through them. The browser is available for download at http://kirschner.med.harvard.edu/Xenopus_Transcriptomics.html.

In conclusion, we have integrated a large-scale dataset for general use by the Xenopus community through the widely used Xenbase portal. This makes the dataset easily accessible and facilitates the study by the community of new genes, including those that are perhaps less conserved in sequence and expression. We suggest that integration of additional datasets, such as protein expression levels, protein–protein interactions, and gene regulatory linkages, will further help researchers in unraveling the underlying developmental gene networks in the Xenopus embryo.

Acknowledgements

The authors thank the Xenbase team, in particular Kevin Snyder, for their inclusion of the developmental time-course plots. They also thank Tamar Hashimshony for a critical reading of the manuscript and the two anonymous reviewers for valuable suggestions.

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename: DVG_20811_sm_suppinfotab1.tex
Size: 106K
Description: Supporting Information Table S1. Clustering supplement to Figure 4
