Large-scale expressed sequence tag (EST) analysis was used to demonstrate a number of dynamic changes in the global gene expression profile of the basal chordate Ciona intestinalis over the course of its development. The fertilized egg was found to store a great variety of maternal transcripts and, as development proceeds, the organism expresses a progressively smaller repertoire of genes. In addition, a significant portion of the genes involved in embryogenesis were downregulated during metamorphosis, at which point the adult appears to use a different set of genes to form its body. At least 25% of the genes involved in development were found to be used multiple times. This kind of information is essential for a comprehensive understanding of the overarching expression-control plan by which the basic chordate body is formed.
Most animals are indirect developers: they first develop from a unicellular fertilized egg into a multicellular larva, and then into an adult by metamorphosis. The number of genes present in most animals is estimated to be approximately 13 000–35 000, depending on the species (The C. elegans Sequencing Consortium 1998; Adams et al. 2000; International Human Genome Sequencing Consortium 2001; Venter et al. 2001; Aparicio et al. 2002; Dehal et al. 2002; Holt et al. 2002; Mouse Genome Sequencing Consortium 2002). How many genes need to be expressed at each stage of this complex developmental process? Are there rules governing the global spatial and temporal expression of developmental genes? For example, how many maternally expressed genes are used during the formation of both larva and adult? Or are different sets of genes used at each of these stages? The ascidian Ciona intestinalis provides a good experimental system for answering these fundamental questions.
Ascidians are marine invertebrate chordates found throughout the world. Their fertilized egg develops quickly into a tadpole larva, which possesses a small number of tissues, including epidermis, central nervous system, endoderm and mesenchyme in the trunk, and notochord and muscle in the tail (Satoh 1994, 2001). The configuration of the ascidian tadpole represents the simplest and most primitive chordate body plan (Satoh & Jeffery 1995; Corbo et al. 2001; Satoh 2003). Lineage tracing of embryonic cells has shown that ascidian embryogenesis is simple enough to be experimentally tractable for developmental biologists (Conklin 1905; Nishida 1987). In addition, the ascidian genome contains a basic set of genes with less redundancy than the vertebrate genome; for example, the most studied ascidian, C. intestinalis, is estimated to possess 15 852 protein-coding genes (Dehal et al. 2002). Because of these advantages, we conducted a large-scale EST analysis of C. intestinalis to explore global changes in gene expression during embryogenesis (Satou et al. 2002a). Despite extensive EST analyses of various animals, this type of analysis has rarely been carried out over the entire life cycle of a single species.
In a series of previous studies, we examined six different developmental stages of C. intestinalis: fertilized eggs (Nishikata et al. 2001), cleavage stage embryos (Fujiwara et al. 2002), gastrulae/neurulae (Satou et al. unpubl. data), tailbud stage embryos (Satou et al. 2001), larvae (Kusakabe et al. 2002), and young adults (Ogasawara et al. 2002). We hypothesized that maternal mRNA encoding molecules responsible for establishing the embryonic axis should be present in large quantities in the fertilized egg (Nishikata et al. 2001). In addition, because the epidermis, endoderm and muscle of the ascidian embryo differentiate autonomously under the control of maternal factors, mRNA associated with these processes are also likely to be stored in the egg. During cleavage, the zygote begins to express genes responsible for the first steps of embryonic cell specification and cell–cell communication (Fujiwara et al. 2002). In ascidian embryos, the developmental fates of almost all cell types are specified by the gastrula stage. By the tailbud stage, the epidermis, nervous system, endoderm, mesenchyme, notochord and muscle have formed, so the genes responsible for these processes must be expressed at this stage (Satou et al. 2001). The tadpole larva performs a variety of specific functions, and the genes responsible for these functions should be highly expressed at this stage (Kusakabe et al. 2002). Lastly, young adults consist of epidermis, body wall muscle, pharyngeal basket, endostyle, digestive system (including esophagus, stomach and intestine), a dorsal neural complex and a tubular heart, so the genes responsible for the formation and function of these adult organs and tissues should be highly expressed at this stage (Ogasawara et al. 2002). Here, we report the findings of a genome-wide analysis of global gene expression patterns throughout the life span of the ascidian C. intestinalis, from embryogenesis through metamorphosis. As a result of these studies, there are now few, if any, organisms for which more complete molecular resources are available than C. intestinalis (Dehal et al. 2002; Satou et al. 2002a,b; Satoh et al. 2003).
Materials and Methods
cDNA libraries and EST analyses
Ciona intestinalis were cultivated at the Maizuru Fisheries Research Station of Kyoto University, Maizuru, Kyoto, Japan. Eggs, embryos, larvae and young adults were handled as described previously (e.g. Satou et al. 2002a). Six cDNA libraries were constructed with poly(A)+ RNA isolated from fertilized eggs, 32–110-cell stage embryos, gastrulae/neurulae, tailbud embryos, larvae and young adults, respectively, as described by Satou et al. (2001). Each of the libraries was arrayed in 384-well plates with a Q-pix robot (Genetix, New Milton, Hampshire, UK).
EST sequences were determined by conventional procedures using an ABI3700 sequencer with BigDye terminators at the Academia DNA Sequencing Center, National Institute of Genetics, Mishima, Japan, directed by Dr Yuji Kohara.
The sequences of the 3′-most ends were compared with one another using FASTA software (Pearson & Lipman 1988) to group the cDNA clones into ‘clusters’. The threshold for clustering was a similarity score of 150 and 89% sequence identity, so the term ‘cluster’ is nearly synonymous with ‘gene’. However, because this threshold is stringent, cDNA that should have been grouped into the same cluster were, with some frequency (∼15%), instead split into multiple clusters. The term ‘EST count’ refers to the number of clones assigned to a cluster, which reflects the amount of the corresponding mRNA expressed. In this study, ‘EST count’ is used only for 3′-EST.
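The grouping step can be pictured as a graph problem: ESTs are nodes, a pairwise comparison that passes the thresholds is an edge, and each connected component is a cluster. The sketch below assumes single-linkage grouping with a union-find structure and treats both thresholds as inclusive (score ≥ 150 and identity ≥ 89%); the linkage rule is not stated in the text, and `cluster_ests` and the hit tuple layout are hypothetical.

```python
"""Group ESTs into clusters from pairwise FASTA-style hits (single-linkage sketch).

Assumption: two ESTs are linked when one comparison meets BOTH thresholds,
and clusters are the connected components of the resulting graph.
"""
from typing import Dict, Iterable, List, Tuple

MIN_SCORE = 150       # similarity-score threshold quoted in the text
MIN_IDENTITY = 0.89   # sequence-identity threshold quoted in the text


def cluster_ests(est_ids: Iterable[str],
                 hits: Iterable[Tuple[str, str, float, float]]) -> List[List[str]]:
    """Return sorted clusters of EST ids given (est_a, est_b, score, identity) hits."""
    parent: Dict[str, str] = {e: e for e in est_ids}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score, identity in hits:
        if score >= MIN_SCORE and identity >= MIN_IDENTITY:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two components

    groups: Dict[str, List[str]] = {}
    for e in list(parent):
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    hits = [("e1", "e2", 300, 0.97), ("e2", "e3", 160, 0.90), ("e4", "e5", 140, 0.99)]
    print(cluster_ests(["e1", "e2", "e3", "e4", "e5"], hits))
    # [['e1', 'e2', 'e3'], ['e4'], ['e5']]
```

In this toy input, e4 and e5 are nearly identical but their score falls below 150, so they remain in separate clusters. This illustrates how a stringent threshold can split ESTs from one gene into several clusters.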
A small percentage of cDNA clones are inevitably chimeras of two or more unrelated cDNA, an unavoidable experimental artifact. Despite careful handling, contamination between neighboring clones on an Escherichia coli plate also occasionally occurs. These artifacts were removed whenever possible during analysis (see below).
Annotation by gene ontology terminology
All of the 5′-EST were used for BLASTP searches against the SWISS-PROT/TrEMBL human proteome set released on 17 August 2002. To avoid errors due to contamination and other experimental artifacts, only cDNA clusters containing multiple EST were used. The BLAST results were then compared within each cDNA cluster, and only hits shared by multiple EST within a single cluster were retained. Among these, the hit with the lowest E-value was chosen as the best-hit human protein for that cDNA cluster. If the E-value of the best-hit protein was higher than 1e-15, the cDNA cluster was regarded as lacking significant similarity to known human proteins. Because proteins in the SWISS-PROT/TrEMBL human proteome are well annotated with gene ontology (GO) terms, the GO terms of the human best-hit proteins were assigned to their respective cDNA clusters.
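The best-hit selection rule above can be sketched as a short function. The sketch assumes each EST's BLAST hits are available as `(protein_id, e_value)` pairs. It also assumes that "shared by multiple EST" means the protein was hit by at least two ESTs of the same cluster. `best_hit` and the input layout are hypothetical and are not part of the original pipeline.

```python
"""Choose a best-hit human protein for one cDNA cluster (illustrative sketch)."""
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

E_VALUE_CUTOFF = 1e-15  # best hits above this count as "no significant similarity"


def best_hit(est_hits: Dict[str, List[Tuple[str, float]]]) -> Optional[str]:
    """est_hits maps EST id -> [(protein_id, e_value), ...] for a single cluster."""
    if len(est_hits) < 2:
        return None  # singleton clusters are skipped to limit contamination errors
    supporters: Dict[str, Set[str]] = defaultdict(set)  # protein -> ESTs hitting it
    best_e: Dict[str, float] = {}                       # protein -> lowest E-value
    for est, hits in est_hits.items():
        for protein, e in hits:
            supporters[protein].add(est)
            best_e[protein] = min(e, best_e.get(protein, float("inf")))
    shared = [p for p, ests in supporters.items() if len(ests) >= 2]
    if not shared:
        return None
    top = min(shared, key=lambda p: best_e[p])  # lowest E-value among shared hits
    return top if best_e[top] <= E_VALUE_CUTOFF else None


if __name__ == "__main__":
    cluster = {
        "est1": [("P1", 1e-40), ("P2", 1e-50)],  # P2 is hit by one EST only
        "est2": [("P1", 1e-30)],
    }
    print(best_hit(cluster))  # P1
```

The GO terms of the returned protein would then be copied onto the cluster. A cluster that returns `None` is treated as having no significant human homolog.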
Temporal expression group
The expression coefficients and the Euclidean distances between all possible pairs of clusters were calculated as described by Claverie (1999). A dendrogram was constructed from these Euclidean distances using the unweighted pair group method with arithmetic mean (UPGMA) implemented in PHYLIP (Felsenstein 1993), and was visualized with TreeExplorer (Kumar et al. 1994). Temporal expression groups were defined empirically from this dendrogram (red lines in Fig. 3A). Each of the resulting 26 temporal expression groups contains cDNA clusters with similar temporal expression profiles (Fig. 3B), indicating that the method grouped the cDNA clusters appropriately.
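The profile-clustering procedure can be sketched as follows. The exact expression-coefficient formula of Claverie (1999) is not reproduced here. As an assumed stand-in, each cluster's counts are divided by its library size, and the resulting frequencies are rescaled to sum to 1, so that distances compare profile shape rather than expression level. UPGMA itself is implemented as standard average linkage. `profiles`, `upgma` and `cut` are hypothetical helpers. The counts in the example are invented, and only the library sizes come from Table 1.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def profiles(counts: Dict[str, Sequence[int]],
             lib_sizes: Sequence[int]) -> Dict[str, List[float]]:
    """Per-library frequencies rescaled to sum to 1 (assumed expression coefficient)."""
    out = {}
    for cid, row in counts.items():
        freqs = [c / n for c, n in zip(row, lib_sizes)]
        total = sum(freqs)
        out[cid] = [f / total for f in freqs] if total else freqs
    return out


def upgma(prof: Dict[str, List[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Average-linkage (UPGMA) merges, returned as (cluster_a, cluster_b, distance)."""
    clusters: List[Cluster] = [(k,) for k in sorted(prof)]
    dist = {frozenset((a, b)): math.dist(prof[a[0]], prof[b[0]])
            for a, b in combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = tuple(pair)
        merged = a + b
        merges.append((a, b, dist[pair]))
        clusters.remove(a)
        clusters.remove(b)
        for c in clusters:
            # Size-weighted mean of old distances = mean over all member pairs.
            dist[frozenset((merged, c))] = (
                len(a) * dist[frozenset((a, c))] + len(b) * dist[frozenset((b, c))]
            ) / len(merged)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters.append(merged)
    return merges


def cut(merges, ids, height: float) -> List[Cluster]:
    """Groups obtained by cutting the dendrogram at `height`."""
    label = {k: (k,) for k in ids}
    for a, b, d in merges:  # UPGMA merge distances never decrease
        if d <= height:
            for k in a + b:
                label[k] = a + b
    return sorted({tuple(sorted(g)) for g in label.values()})


if __name__ == "__main__":
    LIB_SIZES = [29444, 29796, 23475, 31209, 24680, 29138]  # EG..AD, Table 1
    counts = {  # hypothetical EST counts for EG, CL, GN, TB, LV, AD
        "c1": [90, 40, 10, 5, 2, 1],
        "c2": [80, 45, 12, 4, 1, 2],
        "c3": [1, 1, 2, 5, 10, 80],
        "c4": [2, 1, 1, 6, 12, 70],
    }
    merges = upgma(profiles(counts, LIB_SIZES))
    print(cut(merges, counts, 0.3))  # [('c1', 'c2'), ('c3', 'c4')]
```

The cut height plays the role of the empirically placed red lines in Fig. 3(A). The value 0.3 is chosen only for the toy data.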
Detection of peaks in gene expression based on EST counts
Genes were considered upregulated or downregulated between successive time points when the difference between their EST counts was three or more. This threshold was chosen on the basis of the significance test described by Audic & Claverie (1997) to exclude random fluctuations from the analysis. For example, when comparing the two libraries with the fewest EST (gastrulae/neurulae and larvae), the probabilities that a difference between the EST counts of the two libraries resulted from random fluctuation were less than 23.8%, 11.6%, 7.4% and 5.4% for threshold values of 1, 2, 3 and 4, respectively. In predicting the number of ‘recycled’ genes, a higher threshold increases precision at the cost of sensitivity. A threshold of three predicted that 36% of genes are recycled, whereas a threshold of four predicted 23%. To balance precision and sensitivity, we report both values, with maximum error rates of 7.4% and 5.4% for thresholds of three and four, respectively.
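The Audic & Claverie (1997) test gives the probability of observing y tags of a transcript in one library, given that x tags were seen in another, after correcting for the two library sizes. The sketch below implements that published formula. The baseline count used to derive the percentages above is not stated, so the example uses a hypothetical x and makes no attempt to reproduce those values. `ac_prob` and `ac_upper_tail` are hypothetical helper names.

```python
import math


def ac_prob(x: int, y: int, n1: int, n2: int) -> float:
    """Audic & Claverie (1997): P(y | x) for libraries of sizes n1 and n2.

    p(y|x) = (n2/n1)^y * (x+y)! / (x! * y! * (1 + n2/n1)^(x+y+1)),
    evaluated in log space so that large counts do not overflow.
    """
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + r))
    return math.exp(log_p)


def ac_upper_tail(x: int, y: int, n1: int, n2: int) -> float:
    """P(Y >= y | x): chance that sampling alone yields a count of y or more."""
    return max(0.0, 1.0 - sum(ac_prob(x, k, n1, n2) for k in range(y)))


if __name__ == "__main__":
    N_GN, N_LV = 23475, 24680  # 3'-EST in the two smallest libraries (Table 1)
    x = 2                      # hypothetical EST count in the GN library
    for diff in (1, 2, 3, 4):
        print(diff, round(ac_upper_tail(x, x + diff, N_GN, N_LV), 3))
```

The tail probability falls as the required difference grows. This is the precision-versus-sensitivity trade-off that motivates reporting both the three-count and four-count thresholds.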
Results and Discussion
Global expression trends
The fertilized egg stores a large number of different maternal transcripts and, as development proceeds, the number of genes expressed decreases.
We constructed six cDNA libraries from fertilized eggs (EG), cleaving embryos (CL), gastrulae/neurulae (GN), tailbud embryos (TB), larvae (LV) and young adults (AD), and generated a large number of EST from them. Because the libraries were neither amplified nor normalized, the frequency of each cDNA clone is proportional to the abundance of its mRNA at that stage. As shown in Table 1, we obtained a total of 167 143 and 164 742 EST from the 5′- and 3′-ends of cDNA, respectively. The 3′-EST were compared with one another using FASTA, which separated the 164 742 EST into 17 221 groups, hereafter called (cDNA) clusters. Similarly, 29 444 3′-EST from the EG library were grouped into 8316 clusters, 29 796 EST from the CL library into 7823 clusters, 23 475 EST from the GN library into 6628 clusters, 31 209 EST from the TB library into 7487 clusters, 24 680 EST from the LV library into 4674 clusters, and 29 138 EST from the AD library into 6184 clusters (Table 1). The total number of cDNA clusters (17 221) is larger than the number of protein-coding genes (∼16 000) estimated from the draft genome sequence (Dehal et al. 2002). This is mainly due to unavoidable clustering errors (see Materials and Methods) and, to a lesser extent, to alternative splicing variants.
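The per-library figures above can be turned into a raw clusters-per-EST ratio with simple arithmetic. The sketch below uses only the Table 1 numbers quoted in the text, and `clusters_per_est` is a hypothetical helper. These raw ratios are only roughly comparable, because the number of new clusters grows more slowly than the number of ESTs. A smaller library, such as GN, therefore has an inflated ratio, which is why Figure 1(A) compares whole curves instead.

```python
# (clusters, 3'-EST) per library, copied from Table 1 as quoted in the text.
LIBRARIES = {
    "EG": (8316, 29444),
    "CL": (7823, 29796),
    "GN": (6628, 23475),
    "TB": (7487, 31209),
    "LV": (4674, 24680),
    "AD": (6184, 29138),
}


def clusters_per_est(libs: dict) -> dict:
    """Raw ratio of distinct clusters to 3'-EST for each library."""
    return {name: clusters / ests for name, (clusters, ests) in libs.items()}


if __name__ == "__main__":
    for name, ratio in clusters_per_est(LIBRARIES).items():
        print(f"{name}: {ratio:.3f}")
```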
*EG, fertilized eggs; CL, cleaving embryos; GN, gastrulae/neurulae; TB, tailbud embryos; LV, larvae; AD, young adults. cDNA libraries were constructed from RNA isolated from the six different stages. †The number of clusters was obtained after examining cluster overlap among the six stages.
Figure 1(A) shows the relationship between the number of 3′-EST and the number of clusters at each of the six developmental stages. One trend is apparent: the ratio of clusters to EST is highest in fertilized eggs and lowest in larvae, and it decreases gradually during embryonic development. This suggests that the fertilized egg expresses and/or stores the most varied population of mRNA, and that embryos and larvae express fewer and fewer mRNA species as development proceeds. Young adults also express a relatively small number of gene species, but the ratio increases slightly during metamorphosis, suggesting that a novel and/or different set of genes is used to form the adult body plan. As shown in Figure 1(A), the curves of cluster number against EST count nearly plateau but never quite reach saturation. Rarely expressed mRNA may therefore be missing from this and the following analyses. However, based on the gene annotation of the Ciona genome, the EST studied account for at least 85% of all genes (Dehal et al. 2002; Y. Satou et al. unpubl. data). This proportion is most likely large enough that adding data on the rare messages would not significantly affect the conclusions of the following analyses.
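A curve like the ones in Figure 1(A) can be traced by drawing ESTs one at a time and counting how many distinct clusters have appeared. The text does not say how the curves were computed. This sketch assumes a single random pass without replacement, which is a simple form of rarefaction. `discovery_curve` and the toy library are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def discovery_curve(labels: Sequence[str], step: int,
                    seed: int = 0) -> List[Tuple[int, int]]:
    """One random sampling pass: (ESTs drawn, distinct clusters seen) every `step` ESTs."""
    order = list(labels)
    random.Random(seed).shuffle(order)  # sample ESTs without replacement
    seen = set()
    curve = []
    for i, label in enumerate(order, start=1):
        seen.add(label)
        if i % step == 0 or i == len(order):
            curve.append((i, len(seen)))
    return curve


if __name__ == "__main__":
    # Toy library: two abundant clusters plus twenty singletons (rare messages).
    library = ["a"] * 50 + ["b"] * 30 + [f"rare{i}" for i in range(20)]
    print(discovery_curve(library, step=25)[-1])  # (100, 22)
```

Abundant clusters are found early, while singletons keep adding new clusters until the last draw. This is why such a curve flattens but does not fully saturate.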
Figure 1(B) shows the 10 clusters in each library with the largest relative EST counts (EST count of a cluster/total EST count of the library). The share of transcripts contributed by the most highly expressed genes increases as development proceeds: the average relative EST counts of the 10 most highly expressed genes were 4.5, 6.2, 8.7, 14.7, 20.5 and 17.7 per thousand in the EG, CL, GN, TB, LV and AD libraries, respectively. It is therefore reasonable to conclude that the fertilized egg contains the most varied population of transcripts, and that an increasingly less varied population of transcripts is expressed as embryonic development proceeds. This trend reverses somewhat at metamorphosis, reflecting the vast changes in global gene expression that accompany this stage.
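The statistic quoted above is the mean relative EST count of a library's n most abundant clusters, expressed per thousand ESTs. The sketch below computes it from a list of per-EST cluster labels. `top_n_per_thousand` is a hypothetical helper, and the input in the example is invented.

```python
from collections import Counter
from typing import Iterable


def top_n_per_thousand(labels: Iterable[str], n: int = 10) -> float:
    """Mean relative EST count (per thousand ESTs) of the n most abundant clusters."""
    counts = Counter(labels)
    total = sum(counts.values())
    top = counts.most_common(n)
    return 1000 * sum(c for _, c in top) / (len(top) * total)


if __name__ == "__main__":
    toy = ["a"] * 30 + ["b"] * 20 + ["c"] * 50  # 100 hypothetical ESTs
    print(top_n_per_thousand(toy, n=2))  # 400.0
```

A higher value means that a few transcripts account for a larger share of the library, which is the pattern reported for the LV and AD libraries.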
Populations of expressed genes change dynamically during development
To categorize the cDNA clusters by probable function, each cluster was screened against a human proteome that has been extensively annotated with gene ontology (GO) terms (Ashburner et al. 2000). Nearly 80% of Ciona proteins have been reported to have clear homologues in the human proteome (Dehal et al. 2002). For a more precise analysis, we used additional EST from six different adult tissue cDNA libraries, which contained a total of 164 742 3′-EST and 167 143 5′-EST. Even with these additional data, 5210 clusters still consisted of only one EST; these were excluded from the following analysis to avoid errors due to occasional contamination. Of the remaining 12 011 clusters (17 221 minus 5210), 4907 showed significant homology with human proteins. The GO terms of the best-hit human protein for each cluster were assigned to that cluster. With this homology-based automated annotation, we classified 4039 (33%) clusters into ‘biological process’ categories and 4442 (37%) clusters into ‘molecular function’ categories. Examples are shown in Table 2.
Table 2. Examples of cDNA clusters in each functional class
The number of clusters in each of these functional classes at the six stages of Ciona intestinalis development is shown in Figure 2(A), revealing marked changes over time in the number of genes expressed in each class. Figure 2(B) shows the relative EST counts of each class at the six stages, indicating how the genes are used qualitatively and quantitatively throughout ascidian development. For example, the number of different metabolic mRNA species decreases progressively during embryogenesis and increases slightly after metamorphosis. Because young adults express nearly 600 fewer metabolic genes than eggs (Fig. 2A) while having similar relative levels of metabolism-related transcripts (Fig. 2B), organisms at this stage likely produce higher levels of a less varied population of mRNA.
This approach yields equally informative data on the cell cycle. The repertoire of cell cycle-related cDNA clusters decreases at the larval stage (Fig. 2A), whereas the relative abundance of these transcripts decreases rapidly after the gastrula/neurula stage (Fig. 2B). This suggests that a wide variety of cell cycle genes are expressed abundantly in early embryogenesis, and that the relative abundance of cell cycle-related mRNA falls after the gastrula/neurula stage. Interestingly, the tailbud embryo maintains a large diversity of expressed cell cycle-related genes, even though the overall abundance of these transcripts continues to decrease at this stage.
Figure 2(A,B) also suggests that the abundance and variety of the repertoires associated with morphogenesis, cell adhesion and cell motility increase as embryonic development proceeds, especially after the gastrula/neurula stage. This is consistent with the growing importance of structure and motility as the organism develops.
At least 25% of development-related genes are used multiple times during development
To create an overview of the temporal gene expression profile of C. intestinalis, we grouped cDNA clusters with similar expression profiles. From the EST counts of the cDNA clusters, we calculated the expression coefficients and the Euclidean distances between all possible pairs of clusters, as described previously (Claverie 1999). This method provides a meaningful measure of expression-profile similarity that is independent of the total numbers of EST in the six libraries. We analyzed the 3671 clusters with 10 or more EST counts, because clusters with fewer counts are likely to be less reliable. Using the UPGMA method, we built a dendrogram from the calculated Euclidean distances (Fig. 3A) and defined 26 temporal expression (TE)-groups, which reflect the actual expression patterns of the genes quite well. The remaining 13 550 clusters, which had fewer than 10 EST counts, were also each assigned to a TE-group but were not used in the following analyses unless otherwise noted.
The TE-groups provide a wealth of information on the overall orchestration of gene expression at the organismal level. For example, TE-group 1 consists of cDNA clusters with abundant maternal expression and lower expression in later stages, whereas TE-group 12 genes are expressed mainly after metamorphosis. Figure 3(C) shows the relative abundance of the 26 TE-groups, with TE-group 1 being the major group. The second most highly expressed group is TE-group 5, which comprises genes expressed strongly in early embryogenesis but much less at the tailbud stage and thereafter. The third is TE-group 9, whose genes are expressed mainly in gastrulae/neurulae. The fourth is TE-group 17, which contains genes expressed mainly in larvae. The fifth is TE-group 12, which contains genes most highly expressed in young adults. Surprisingly, all five of these TE-groups share the same type of profile: a striking expression peak at one particular stage. In addition, TE-group 24 shows a strong expression peak at the tailbud stage. These results suggest that many developmental genes are expressed only once during C. intestinalis embryogenesis.
We therefore sought to quantify how many genes are ‘single-use’ and how many are used repeatedly during Ciona development. We defined ‘recycled genes’ as those used two or more times during development; by definition, TE-groups representing recycled genes have two or more peaks in their temporal expression profiles. As a rough estimate, 17 of the 26 TE-groups (numbers 02, 06, 07, 08, 10, 11, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23 and 26) represent recycled genes, and these TE-groups contain a total of 928 clusters (25% of 3671). The remaining 2743 clusters (3671 minus 928) belong to TE-groups whose genes are either expressed only once during development or expressed continuously; these clusters represent the single-use genes.
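The "two or more peaks" criterion can be made concrete with a small peak counter for a six-stage profile (EG, CL, GN, TB, LV, AD). In this sketch, a change between successive stages counts only if it reaches a threshold, following the three- or four-count rule in Materials and Methods. The following details are assumptions, since the text does not spell them out. A peak is a rise followed by a fall. A fall from the first stage counts as a maternal peak. A rise into the last stage counts as an adult peak. Counts are also assumed to be already normalized to comparable library sizes. `count_peaks` and `is_recycled` are hypothetical.

```python
from typing import Sequence


def count_peaks(counts: Sequence[float], threshold: float = 3) -> int:
    """Count expression peaks in a stage-ordered profile of (normalized) EST counts.

    Changes smaller than `threshold` between successive stages are ignored as noise.
    """
    steps = []
    for a, b in zip(counts, counts[1:]):
        if b - a >= threshold:
            steps.append(1)   # significant rise
        elif a - b >= threshold:
            steps.append(-1)  # significant fall
    if not steps:
        return 0  # flat or continuously expressed profile
    peaks = 1 if steps[0] == -1 else 0  # high at the first stage (e.g. maternal)
    peaks += sum(1 for s, t in zip(steps, steps[1:]) if s == 1 and t == -1)
    peaks += 1 if steps[-1] == 1 else 0  # still rising at the last stage
    return peaks


def is_recycled(counts: Sequence[float], threshold: float = 3) -> bool:
    """'Recycled' = two or more expression peaks during development."""
    return count_peaks(counts, threshold) >= 2


if __name__ == "__main__":
    print(count_peaks([20, 5, 5, 5, 5, 5]))   # 1: maternal only
    print(count_peaks([20, 5, 5, 5, 20, 5]))  # 2: maternal and larval
    print(round(100 * 928 / 3671, 1))         # 25.3 (% of clusters in recycled TE-groups)
```

Raising `threshold` from 3 to 4 removes borderline peaks. This mirrors the drop in the predicted recycled fraction from 36% to 23% reported for the EST-count approach.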
There is a slight possibility that some genes or clusters have a very narrow window of temporal expression that is missed when data are collected at only the six developmental stages examined here. Moreover, a gene expressed in one tissue at one developmental stage and in another tissue at another stage may be counted incorrectly as a single-use gene. The number of ‘single-use’ gene clusters is therefore most likely overestimated, and it seems reasonable to conclude that at least 25% of developmental genes are recycled (Fig. 4A).
To corroborate these data, we took a second approach, determining expression peaks directly from EST counts (see Materials and Methods). This approach gave comparable results, predicting that at least 23% (threshold of four) or 36% (threshold of three) of genes are recycled, with maximum error rates of 5.4% and 7.4%, respectively.
Microarray technology has recently been applied to determine the global gene expression profile of Drosophila melanogaster over the course of its life span (Arbeitman et al. 2002). In that study, a total of 4028 genes were assayed, 63.7% of which were found to be recycled during development. This apparent discrepancy with our estimate may be explained by an inherent source of underestimation in our study that stems from differences between the fly and ascidian life cycles. First, the Drosophila analysis included mature adults in addition to developing embryos. Second, in contrast to the biphasic life cycle of the ascidian, with its two body forms (larva and adult), Drosophila has three (larva, pupa and adult), and the Drosophila profiling study clearly showed two points of dynamic change in the profile, at the transitions between the three body forms. These differences in developmental pattern should be kept in mind when comparing the two studies. Of the 3671 Ciona cDNA clusters with 10 or more EST counts, 3321 were also present in the six libraries derived from mature adult tissues. Because we did not examine every tissue type, and because the gonad contains immature and mature eggs whose mRNA species largely overlap those of fertilized eggs, it is difficult to determine definitively which genes are used multiple times during the whole ascidian life cycle. Nevertheless, a considerable number of the genes used during Ciona development also appear to be used in adult tissues. With this in mind, the rate of recycling in Ciona may in fact be comparable to that in Drosophila.
Of the 17 221 cDNA clusters, 969 were detected at every stage of development. However, no cluster showed a clear ‘housekeeping’ expression pattern, characterized by expression of similar strength at every stage. In fact, most of the classical ‘housekeeping’ genes, defined as those that perform functions fundamental to the survival of every cell type, tended to be classified into TE-group 1. As described below, genes for the general transcription machinery, including RNA polymerase II, are abundantly expressed maternally. Genes encoding core glycolytic enzymes, including triosephosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, phosphoglycerate kinase and phosphoglyceromutase, were also all classified into TE-group 1, as were six of the eight core genes for enzymes of the TCA cycle. These mRNA are all abundantly supplied maternally and tend to decrease during development, suggesting that even these ‘housekeeping’ genes are developmentally regulated.
Multiple genes within the same functional category
One of the most informative results of this grouping concerns the genes for structural constituents of the cytoskeleton. As shown in Figure 4(B), almost a quarter of the genes in this functional class are recycled. The remaining genes are very unevenly distributed among the TE-groups: 31%, 14%, 9% and 11% of the genes in this class were assigned to TE-groups 17, 12, 24 and 1, respectively, while TE-groups 3, 4, 5 and 9 each contain 3%. TE-groups 1, 3, 4, 5 and 9 are characterized by expression during early embryogenesis, TE-groups 17 and 24 by expression in tailbud embryos and larvae, and TE-group 12 by expression in young adults. These results suggest that the Ciona genome contains three different sets of genes for structural constituents of the cytoskeleton, each coordinately regulated during the life cycle of the organism. For example, cDNA clusters 02317 and 00086 encode distinct β-tubulins and are classified into TE-groups 5 and 12, respectively; the former β-tubulin is therefore likely used mainly in early embryogenesis, and the latter mainly after metamorphosis. cDNA cluster 10847 corresponds to intermediate filament F, and clusters 00259 and 03380 to intermediate filament C. The former is classified into TE-group 12 and the latter two into TE-group 17 (these two cDNA clusters were split by an unavoidable clustering error but were manually confirmed to derive from a single gene). Thus, the type of intermediate filament used by the organism is likely switched at metamorphosis. Finally, cluster 08111 encodes an actin-related protein involved in the control of actin polymerization; this cluster was assigned to TE-group 13, a group of recycled genes.
Thus, the genes for structural components may be divided into four discrete groups: one (20%) expressed mainly during early embryogenesis, one (40%) expressed mainly at the tailbud and larval stages, one (15%) expressed mainly in young adults, and one (25%) used repeatedly at multiple stages. If this is the case, one would expect four overriding genetic mechanisms to produce these four distinct expression profiles.
Figure 4(C) shows the TE-group breakdown of the transcription factor genes: these genes are not over- or under-represented in any of the TE-groups. Thus, unlike the structural genes, which appear to be under coordinated global control, the transcription factor genes are likely regulated individually by more finely tuned and complex mechanisms.
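The text does not state how over- or under-representation was judged. One standard way to check whether a functional class is enriched or depleted in a TE-group is a hypergeometric test. This is a swapped-in technique, shown here only as an illustration. It asks how likely the observed number of class members would be if the TE-group's clusters had been drawn at random from all analyzed clusters. `hypergeom_tails` is hypothetical, and the numbers in the example call are invented.

```python
from math import comb
from typing import Tuple


def hypergeom_tails(k: int, group_size: int, class_size: int,
                    total: int) -> Tuple[float, float]:
    """Return (P[X >= k], P[X <= k]) for X ~ Hypergeometric(total, class_size, group_size).

    X counts members of a functional class (e.g. transcription factors) among
    `group_size` clusters drawn at random from `total` clusters.
    """
    denom = comb(total, group_size)

    def pmf(i: int) -> float:
        return comb(class_size, i) * comb(total - class_size, group_size - i) / denom

    lo = max(0, group_size - (total - class_size))
    hi = min(group_size, class_size)
    upper = sum(pmf(i) for i in range(max(k, lo), hi + 1))  # over-representation
    lower = sum(pmf(i) for i in range(lo, min(k, hi) + 1))  # under-representation
    return upper, lower


if __name__ == "__main__":
    # Hypothetical: 12 transcription factors in a TE-group of 150 clusters,
    # with 300 transcription factors among 3671 analyzed clusters.
    print(hypergeom_tails(12, 150, 300, 3671))
```

If neither tail is small for any TE-group, the class shows no systematic bias toward particular temporal profiles. This is the pattern described for the transcription factor genes.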
Coordinated and non-coordinated expression profiles: implications for functional coordination and non-coordination of genes
Next, we examined the relationships between the TE-groups and functional classes. Here, we describe several examples of coordinated or non-coordinated relationships between the temporal expression profiles and the actual functions of the genes.
Cell cycle genes As described above, the relative abundance of cell cycle-related transcripts was lower at the tailbud and later stages than in early embryogenesis. Do all cell cycle genes show a similar expression profile? Of the cell cycle genes, 69% fall into TE-groups 1–10, which represent genes with peak expression in early embryogenesis (from eggs to neurulae) (cf. Fig. 4B). The remaining 31% show at least one expression peak during later development, from the tailbud stage to the young adult. For example, MPF (a complex of cyclin B and cdc2), which is one of the most important cell cycle regulators during fertilization and early embryogenesis (reviewed by Doree & Marcel 2002), also acts more generally as an M-phase regulator, the M-Cdk complex, at later developmental stages. Ascidian cyclin B (cDNA cluster ID: 00734) and cdc2 (ID: 03068) are classified into TE-groups 5 and 1, respectively. Because both of these TE-groups are characterized by extensive gene expression during early embryogenesis, this again suggests the importance of this complex at that time. During early embryogenesis, the cell cycle proceeds without G1 or G2 phases, both of which become crucial later, during the final differentiation of tissues. Cyclin D is critical for progression through G1 phase (reviewed by Sherr & Roberts 1999), and its Ciona ortholog (ID: 00029) is classified into TE-group 24, whose genes peak in expression at the gastrula/neurula and tailbud stages. The anaphase-promoting complex (APC) is also known to act as a cell-cycle regulator in both the mature-type cell cycle and the embryonic-type cell cycle, which lacks G1/G2 phases (reviewed by Irniger 2002). The cdc20/APC complex degrades M-cyclin in both the embryonic and mature cell cycles, whereas in the mature cell cycle a complex of Hct1/Cdh1 and APC is required to keep M-cyclin levels low during G1 phase.
The ascidian Hct1/Cdh1 gene (ID: 01610) was classified into TE-group 26, which is characterized by expression in later developmental stages. This gene was not annotated automatically, because its best-hit protein was a fragment of human Hct1/Cdh1 that is not annotated in the human proteome used. The ascidian cdc20 gene (ID: 00064), by contrast, was classified into TE-group 1, because it is expressed most strongly during early embryogenesis. Thus, the temporal expression profiles of these genes agree well with their known roles in the embryonic and mature cell cycles.
Transcription factors The functional class of transcription factors includes gene-specific activators/repressors as well as the components of the basic transcriptional machinery. The basic transcriptional machinery includes RNA polymerase II and at least six factors: TFIID, TFIIB, TFIIA, TFIIE, TFIIF and TFIIH. Of these, RNA polymerase II (ID: 02357), TATA-binding protein (TBP, the main component of TFIID; ID: 04232), TFIIB (ID: 01088) and TFIIH (ID: 05391) are members of TE-group 1. The TFIIA γ-subunit (ID: 00908) and TFIIF β-subunit (ID: 05256) are members of TE-group 2, and the TFIIF α-subunit (ID: 02519) is in TE-group 3. All of these TE-groups have similar expression profiles, with peak expression in the early embryo. The gene for the TFIIA α- and β-subunits (ID: 01318) was classified into TE-group 17, which is characterized by expression in the tailbud embryo and larva. The TFIIE genes (α-subunit, 32402; β-subunit, 12684) were expressed at low levels and were therefore excluded from this analysis. Thus, apart from TFIIAα/β, these general transcription factors show similar expression profiles. The gene-specific activators/repressors, in contrast, display a wide variety of expression profiles. For example, TE-group 13, which contains a large number of transcription factors (Fig. 3B), accounts for 6.25% of the genes in this class (Fig. 4C). All of the transcription factors in this group are gene-specific activators or repressors, including cAMP-response element binding protein (CREB), hepatocyte nuclear factor 4-α (HNF4-α), nuclear factor NF-κB p105 subunit and Sox1.
Structural constituent of ribosome Most (68%) of the genes encoding ribosomal structural constituents were classified into TE-group 12. This TE-group contains nearly every gene encoding 40S and 60S ribosomal proteins, with only a few exceptions. One exception arose from a misclustering artifact, in which EST for a single gene were split into two cDNA clusters. Other exceptions were the genes for the 60S acidic ribosomal proteins P0 and P2, which were assigned to TE-groups 9 and 17, respectively. Although the specific functions of these acidic ribosomal proteins are not known in detail, yeast ribosomes lacking P1 and P2 proteins selectively translate a different subset of mRNA from ribosomes that contain these proteins (Remacha et al. 1995). Because these two acidic ribosomal protein genes are regulated differently from other ribosomal genes, and are expressed abundantly in gastrulae/neurulae and in tailbud embryos/larvae, respectively, these proteins may play a special role in regulating protein synthesis during embryonic development. The last major group of exceptions comprises the mitochondrial ribosomal protein genes, which were mainly assigned to TE-groups 1, 5, 8 and 9, characterized by abundant expression in early embryos.
Signal transduction genes – TGFβ signaling genes
A comparison of panels (A) and (B) of Figure 4 suggests that genes for the signal transduction machinery are neither over- nor under-represented in any TE-group, which implies that these genes serve a wide array of purposes. To understand how the expression of these genes is regulated, and what can be learned from this data set, we focused on the transforming growth factor (TGF)β signaling system, one of the most important signal transduction systems in animal development. We recently annotated all of the TGFβ signaling molecules in the C. intestinalis draft genome (Dehal et al. 2002; Hino et al. 2003). As summarized in Table 3, the genome encodes 10 TGFβ ligands (three of which could not be assigned to any vertebrate counterpart), three TGFβ type-I receptors, two TGFβ type-II receptors and five Smad proteins. Because not all of these cDNA clusters had 10 or more ESTs, our analysis was based on the TE-groups to which these genes were assigned.
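Figure 4 makes this comparison by eye. As a hedged illustration only, and not the procedure used in this study, the over- or under-representation of a functional class in one TE-group could be quantified as a fold enrichment together with a one-sided hypergeometric tail probability. The function names below are hypothetical.

```python
from math import comb


def enrichment(class_in_group, class_total, group_total, all_total):
    """Fold enrichment of a functional class in one TE-group.

    Ratio of the fraction of the class that falls in the group to the
    fraction of all clustered genes that falls in the group; 1.0 means
    neither over- nor under-represented.
    """
    return (class_in_group / class_total) / (group_total / all_total)


def hypergeom_upper_tail(k, class_total, group_total, all_total):
    """P(X >= k) for X ~ Hypergeometric(all_total, class_total, group_total).

    Models drawing group_total genes without replacement from all_total
    genes, of which class_total belong to the functional class.
    """
    upper = min(class_total, group_total)
    denom = comb(all_total, group_total)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    hits = sum(
        comb(class_total, i) * comb(all_total - class_total, group_total - i)
        for i in range(k, upper + 1)
    )
    return hits / denom
```

For instance, a class of 40 genes with 10 members in a TE-group of 100 genes, out of 1000 clustered genes, gives `enrichment(10, 40, 100, 1000)` of 2.5; values near 1 across all TE-groups would correspond to the pattern reported for signal transduction genes. These numbers are invented for illustration.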
Table 3. Ascidian TGFβ signaling genes and their expression.
EG, fertilized eggs; CL, cleaving embryos; GN, gastrulae/neurulae; TB, tailbud embryos; LV, larvae; AD, young adults. *These genes did not show clear orthology with any vertebrate TGFβ gene. †TE-groups in parentheses were assigned to cDNA clusters with fewer than 10 ESTs.
TGFβ ligands signal through a heteromeric complex of type-I and type-II receptors. The ascidian type-I receptors have not yet been confidently assigned to vertebrate counterparts, and no ascidian type-II BMP receptor was found in either the draft genome sequence or the present EST collection (Table 3). Therefore, primary structure alone cannot predict which combinations of type-I and type-II receptors are used, or which ligands each receptor combination binds.
‘Digital northern’ data based on EST counts indicate which combinations of type-I receptors, type-II receptors, ligands and intracellular Smad proteins are possible at each stage (Table 3). For example, the only possible receptor combination in tailbud embryos is type-I receptor-c with type-II receptor-a, whereas in young adults any combination of type-I receptor-b or -c with type-II receptor-a or -b is possible.
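The digital northern reasoning can be sketched as a presence filter followed by a Cartesian product: a receptor counts as present at a stage if its EST count reaches a threshold, and every pairing of present type-I and type-II receptors is a candidate complex. This is a minimal sketch; the detection threshold `min_count`, the function names and the EST counts are hypothetical, chosen only to reproduce the tailbud and young-adult patterns described above, and they are not the values in Table 3.

```python
from itertools import product


def detected(est_counts, min_count=1):
    """Genes whose EST count at one stage reaches min_count (assumed detection rule)."""
    return sorted(gene for gene, n in est_counts.items() if n >= min_count)


def possible_receptor_pairs(type1_counts, type2_counts, min_count=1):
    """Every (type-I, type-II) pairing in which both receptors are detected."""
    return list(product(detected(type1_counts, min_count),
                        detected(type2_counts, min_count)))


# Hypothetical EST counts that only mimic the presence/absence pattern in the text.
tailbud = possible_receptor_pairs({"I-a": 0, "I-b": 0, "I-c": 3},
                                  {"II-a": 2, "II-b": 0})
adult = possible_receptor_pairs({"I-a": 0, "I-b": 1, "I-c": 4},
                                {"II-a": 1, "II-b": 2})
print(tailbud)  # [('I-c', 'II-a')]
print(adult)    # [('I-b', 'II-a'), ('I-b', 'II-b'), ('I-c', 'II-a'), ('I-c', 'II-b')]
```

Because each library pools whole embryos or animals, co-occurrence at a stage shows only that a pairing is possible, not that the two receptors are expressed in the same cells.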
Because signaling partners are likely to be coordinately regulated, they would be expected to be assigned to the same TE-group, and several such cases appear in Table 3. For example, the BMP-2/4-related gene, the type-I receptor-a gene and smad1/5 all belong to TE-group 5. Similarly, ADMP, lefty/antivin, putative BMP5/7, type-I receptor-c and type-II receptor-a all belong to TE-group 9, which is characterized by prominent expression in gastrulae/neurulae. Although there is not yet experimental evidence for functional links among these factors, these temporally coordinated genes are likely to be under the control of a common regulatory system and may be functionally linked.
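Reading TE-groups as candidate co-regulated units amounts to inverting the gene-to-group assignments. The sketch below does this for the signaling genes named above; only those assignments come from the text, the dictionary and function names are illustrative, and sharing a TE-group is a hypothesis of co-regulation rather than evidence of function.

```python
from collections import defaultdict

# TE-group assignments stated in the text (a subset of Table 3).
te_group = {
    "BMP-2/4-related": 5,
    "type-I receptor-a": 5,
    "smad1/5": 5,
    "ADMP": 9,
    "lefty/antivin": 9,
    "putative BMP5/7": 9,
    "type-I receptor-c": 9,
    "type-II receptor-a": 9,
}


def co_grouped(assignments):
    """Invert a gene -> TE-group mapping into TE-group -> sorted gene list."""
    groups = defaultdict(list)
    for gene, group in assignments.items():
        groups[group].append(gene)
    # Sort groups by number and genes alphabetically for stable output.
    return {group: sorted(genes) for group, genes in sorted(groups.items())}


for group, genes in co_grouped(te_group).items():
    print(f"TE-group {group}: {', '.join(genes)}")
```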
Conversely, genes whose temporal expression profile closely matches that of a known pathway or system are likely to be involved in that system. For example, the gene corresponding to cluster 02118 has an expression profile strikingly similar to those of both the BMP-2/4-related gene and the type-I receptor-a gene (gene expression coefficients of 0.997 and 0.944, respectively). Although this gene shows no significant similarity to any known protein, it is a strong candidate for a component of this BMP pathway. Spatial expression data of the kind presented in our previous studies (Nishikata et al. 2001; Satou et al. 2001; Fujiwara et al. 2002; Kusakabe et al. 2002; Ogasawara et al. 2002), together with a functional assay for this gene, will be needed to test this hypothesis.
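The gene expression coefficient is not defined in this section. Assuming it is a Pearson correlation between the per-stage profiles of two genes, computed after normalizing EST counts by library size (both assumptions), it could be calculated as in this minimal sketch; the function names are hypothetical.

```python
from math import sqrt


def relative_frequency(counts, library_sizes):
    """Normalize per-stage EST counts by library size (assumed normalization)."""
    return [c / n for c, n in zip(counts, library_sizes)]


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)
```

For example, `pearson([1, 2, 3], [2, 4, 6])` returns 1.0: a profile and any positive rescaling of it are perfectly correlated, so the coefficient compares the shape of the temporal profile rather than absolute abundance, which suits genes expressed at very different levels.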
The detailed analysis of developmental gene expression in Ciona presented here highlights several previously unrecognized features of the overall gene expression blueprint. The fertilized egg was found to contain the largest repertoire of distinct transcripts, although this diverse mRNA population could reflect either de novo transcription or storage of maternal transcripts. As embryogenesis proceeds, the repertoire of expressed transcripts becomes progressively smaller. After metamorphosis, a wide variety of genes appears to be used to construct and maintain the adult body. Many trends in gene expression were observed over the life cycle, including dynamic changes in the regulation of cytoskeletal constituent genes in young adults. A considerable number of genes appeared to be expressed only once during development, whereas at least 25% of the genes studied are used repeatedly.
To fully understand the development of any complex organism, it will be necessary to map the activity of every gene in the genome in both space and time. Many techniques, including microarray analysis, have been applied to this problem (e.g. Arbeitman et al. 2002). Here we have presented one such approach, based on large-scale EST analysis, although the volume of data collected is too large to report in full here. Because our EST collection contains nearly all of the genes involved in the development of C. intestinalis, this study provides an unusually detailed picture of global gene regulation during the development of this basal chordate. In addition, we have performed more than 1000 in situ hybridizations of various genes at each of five developmental stages (eggs, cleaving embryos, tailbud embryos, larvae and young adults). Together, these data make C. intestinalis a premier system for comprehensive analysis of the mechanisms that control developmental gene expression.
One of the major challenges in genomics is to predict the function of genes about which little is known, and the present data may also be useful for this purpose. The function of an unannotated gene may be predicted if the gene shares a temporal and/or spatial expression pattern with a known group of genes. Data of this type may also be used to group previously unrelated genes into entirely novel systems. The basal chordate C. intestinalis thus provides an excellent experimental system in which to explore the complex nature of gene regulation in development.
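A minimal sketch of this guilt-by-association idea, restricted to temporal profiles: take the stage-by-stage mean profile of a group of known genes and rank unannotated genes by their Pearson correlation with it. The centroid approach, the cutoff `min_r = 0.9` and all names are assumptions for illustration, not the procedure used in this study; spatial patterns from in situ hybridization would need a separate comparison.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (None if either is flat)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def centroid(profiles):
    """Stage-by-stage mean of several equal-length profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def rank_candidates(known_profiles, candidates, min_r=0.9):
    """Return (gene, r) pairs with r >= min_r, highest correlation first.

    known_profiles: profiles of genes already assigned to a pathway.
    candidates: mapping from a (hypothetical) gene ID to its profile.
    """
    ref = centroid(known_profiles)
    hits = []
    for gene, profile in candidates.items():
        r = pearson(ref, profile)
        if r is not None and r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A high correlation only nominates candidates; as with cluster 02118, spatial and functional data are still needed to test them.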
We would like to thank Drs Yuji Kohara and Tadasu Sin-I for EST sequencing, and Chikako Imaizumi for technical assistance. This research was supported by Grants-in-Aid for Scientific Research from MEXT and JSPS, Japan to Y.S. and N.S. (14704070, 13044001 and 12202001). This research is also part of a CREST project, JST, Japan.