The specification and development of organs depend on the localized expression of specific classes of transcription factors. One such class is the family of tsh-related Zn-finger (tshz) transcription factors. The Drosophila teashirt (tsh) gene (Fasano et al.,1991), the founding member of the family, plays critical roles during the development of the fly embryo and adult: tsh is required to pattern the midgut (Mathies et al.,1994) and to specify the identity of the trunk ectoderm in the embryo (Roder et al.,1992). tsh is also required for the establishment of the proximodistal axis of adult appendages, such as wings and legs (Abu-Shaar and Mann,1998; Erkner et al.,1999; Azpiazu and Morata,2000; Casares and Mann,2000; Wu and Cohen,2000,2002), and for the specification of the eye (Pan and Rubin,1998; Singh et al.,2004; Bessa and Casares,2005). More recently, it has been shown that the tsh paralogue, tiptop (tio), albeit dispensable for Drosophila development, is functionally equivalent to tsh and can partially compensate for its loss in tsh mutants (Laugier et al.,2005; Bessa et al.,2009; Datta et al.,2009). tshz genes have also been identified in vertebrates. At least three tshz genes have been reported in mouse, chicken, and Xenopus (tshz1-3; Caubit et al.,2000; Long et al.,2001; Manfroid et al.,2004,2006; Koebernick et al.,2006; Onai et al.,2007) and one in zebrafish (tshz1; Wang et al.,2007). Molecularly, invertebrate and vertebrate tshz genes show similarity in their first three, widely spaced Zn-finger domains. Vertebrate tshz genes encode additional Zn-fingers and a vertebrate-specific homeodomain (Koebernick et al.,2006; Onai et al.,2007). Despite these molecular differences, the three murine tshz genes can function as the endogenous tsh when expressed in Drosophila (Manfroid et al.,2004).
These various studies show that vertebrate tshz genes are also expressed in dynamic and complex patterns in many developing organs, including the central nervous system, mesodermal derivatives (somites, pronephros), limbs, and branchial arches. However, the functional characterization of this gene family in vertebrates is still fragmentary. During early Xenopus development, tshz3 is required for dorso–ventral axis formation (Onai et al.,2007), while tshz3 knock-out mice show ureteral defects (Caubit et al.,2008).
In this study, we have further investigated the evolution of the tshz gene family and made use of the zebrafish to explore the developmental expression of two of its members, zebrafish tshz2 and tshz3b.
RESULTS AND DISCUSSION
Phylogenetic Analyses of tshz Genes
To establish a comprehensive phylogeny of the tshz gene family, a total of 119 nucleotide sequences from insects, cephalochordates, urochordates, and vertebrates were used for phylogenetic analyses (see the Experimental Procedures section). Because there is no multiple sequence alignment scheme that outperforms the rest in producing reliable phylogenetic trees (Essoussi et al.,2008), the possible effect of the choice of alignment scheme on inferences of sequence relationships must be taken into consideration. Therefore, we used four different alignment procedures; Supp. Fig. S1A shows the consensus tree of the four resulting consensus Bayesian trees (Supp. Fig. S1, which is available online). Supp. Fig. S1B shows the consensus Bayesian tree obtained when using the alignment that produced the largest number of filtered informative positions (the alignment generated with M-Coffee), here taken as a measure of the quality of the alignment. It should be pointed out that, as suggested by Notredame and coworkers (2000), only aligned amino acid positions with a high degree of confidence were used in the phylogenetic analyses (see the Experimental Procedures section). Zinc-finger domain positions represent approximately 45% of the positions used in the phylogenetic analyses. An abridged phylogeny of the tshz genes is shown in Figure 1.
All dipteran species analyzed other than Drosophila (Anopheles, Aedes, and Culex) show one tshz gene (Fig. 1). Coleoptera (Tribolium), Hymenoptera (Nasonia), and Hemiptera (Acyrthosiphon) species also show a single tshz gene (Fig. 1). The evolutionary scenario put forward by Shippy and colleagues (Shippy et al.,2008), of a relatively recent gene duplication in the lineage leading to Drosophila, seems the most logical explanation for such a pattern. Nevertheless, when three out of four alignment schemes are used, the retrieved phylogenies suggest one old gene duplication before the divergence of the Diptera, Hymenoptera, Coleoptera, and Neoptera, followed by multiple independent losses (Fig. 1 and Supp. Fig. S1). Taking into consideration the difficulties in getting a credible alignment when using highly divergent sequences, at present, the scenario put forward by Shippy et al. (2008) should still be viewed as the most likely hypothesis.
In all mammals and reptiles, there are three tshz genes. Furthermore, we found a single tshz gene in Urochordata and Cephalochordata species (Fig. 1) as well as in lamprey (this sequence was not used in the phylogenetic analyses because it is very incomplete; data not shown). Two rounds of genome duplication, leading to four tshz genes in a common ancestor of jawed vertebrates, one of which was lost before the diversification of the group, could thus explain the data. The hypothesis of two rounds of genome duplication (2R theory) was first advanced to explain the presence of four Hox clusters in mammals (Sidow,1996; Furlong and Holland,2002; Larhammar et al.,2002). Recent analyses based on a large number of genes have provided further support for the 2R theory (Sundstrom et al.,2008). The analyses presented in Figure 1 suggest that the first genome duplication generated a tshz1–2 gene and a tshz3–4 gene. The second round of duplication led to tshz1 and tshz2, as well as to tshz3 and tshz4 (the latter was subsequently lost). However, in the absence of additional data (like the presence of tshz4 in some species but not in others), partial genome duplications, rather than whole genome duplications, could equally well explain the tshz gene number in the different species. It should be noted that the phylogenetic position of the Urochordata and Cephalochordata tshz sequences is unexpected but is not well supported (Fig. 1 and Supp. Fig. S1). Two different tshz1 genes are reported in GenBank for Xenopus laevis. Taking this information into consideration, there are four tshz genes in amphibians. It is unclear whether both tshz1 genes are functional.
In all fish species analyzed except Danio rerio, there are five tshz genes. This observation is in agreement with an extra genome duplication event in the fish lineage and subsequent loss of one gene (3R hypothesis; see Sundstrom et al.,2008, for results based on a large number of genes that support this hypothesis).
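The gene numbers discussed above follow from simple bookkeeping, which can be sketched as below. This is only an illustration of the arithmetic under the whole-genome duplication scenario; the function name is hypothetical, and the sketch assumes that each loss removes a single gene after the duplications in question, which is a simplification of the possible evolutionary histories.

```python
def expected_gene_count(ancestral, duplications, losses):
    """Gene copies expected after whole-genome duplications followed by losses.

    Simplifying assumption: all losses are counted after the duplications.
    """
    return ancestral * 2 ** duplications - losses


# One ancestral gene, two rounds of duplication (2R), one loss (tshz4).
tetrapod_count = expected_gene_count(1, 2, 1)

# A further fish-specific round (3R) acting on those genes, then one more loss.
teleost_count = expected_gene_count(tetrapod_count, 1, 1)

print(tetrapod_count, teleost_count)  # 3 5
```

Partial genome duplications, as noted above, could yield the same counts, so the arithmetic alone does not discriminate between the two scenarios.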
Identification of the tshz1a, 1b, 2, 3a, and 3b Zebrafish Genes
As a step toward identifying the full complement of zebrafish (Danio rerio) tshz genes and their evolutionary relationships, we used the phylogenetic information given in Figure 1 plus synteny information. Although five tshz genes are observed in the other fish species, we found only four genes in zebrafish (tshz1a, tshz2, and two tshz3 genes). In the phylogenetic analyses shown in Figure 1, the two zebrafish tshz3 genes cluster with the tshz3a gene from other fishes. Nevertheless, in Oryzias latipes, Gasterosteus aculeatus, and Takifugu rubripes the tshz3b gene is flanked by the ZNF536 and ZNF507 genes. One of the two zebrafish tshz3 genes is flanked by ZNF536 and LOC560119 (there is no similarity between LOC560119 and ZNF507). Due to its proximity to ZNF536, we labeled that sequence as tshz3b (Fig. 1). In O. latipes, G. aculeatus, and T. rubripes, tshz3a is flanked by the gene coding for the RPB5-mediating protein and a novel gene of unknown function. The novel gene shows approximately 80% nucleotide similarity in the three fish species; thus, it is very likely the same gene. In ENSEMBL there is no gene annotated as an orthologue of this novel gene in zebrafish, and there is no tshz3 gene annotated in the vicinity of the gene coding for the RPB5-mediating protein (zgc:110109). However, there are many tshz expressed sequence tags (ESTs) from other species that map to the region where the tshz3a gene should be. Thus, we named that novel gene tshz3a. Finally, no tshz1b gene was found in the zebrafish. In O. latipes, G. aculeatus, and T. rubripes the following genes can be found in the vicinity of tshz1b: Gal1-R, MBP, ZNF236, ZNF516, (tshz1b), ZADH2, HAS2, ZHX2, and DERL1. However, there is a gap in the zebrafish genome sequence (estimated to be approximately 400 kb) between genes MBP and HAS2.
Therefore, this gap is the likely reason why tshz1b could not be found in zebrafish, but we predict that due to its syntenic conservation in other teleosts, the zebrafish tshz1b should be found in the same genomic region.
Zebrafish tshz2 and tshz3b RNA Expression
In the zebrafish, the expression of only one tshz gene has been described to date, tshz1 (renamed here tshz1a; Wang et al.,2007; Fig. 1). Here, we have studied in detail the expression of the tshz2 and tshz3 genes. While we could not detect any in situ hybridization signal using tshz3a probes at any time between 10 and 72 hours postfertilization (hpf; not shown), tshz2 and tshz3b are expressed dynamically in several tissues during development (Figs. 2–5). We detect maternal tshz2 and tshz3b mRNA in early embryos (2–8 hpf; Figs. 2, 3). tshz2 expression disappears around 10 hpf and reappears in 24 hpf embryos (Fig. 2E,F and Supp. Fig. S2A). At 24 hpf, tshz2 is expressed in pectoral fin buds, liver, olfactory placodes, hindbrain, and spinal cord (Fig. 2E,F and Supp. Fig. S2A). At this stage, the rostral limit in the hindbrain is difficult to define, as expression is weak (Fig. 2E,F and Supp. Fig. S2A), while expression increases in more posterior rhombomeres (Fig. 2E and sections in Supp. Fig. S2A). At the level of the spinal cord, tshz2 expression is restricted to a lateral cell cluster (Fig. 2E and Supp. Fig. S2A). At 36 hpf, expression is conspicuous in the olfactory placodes and hindbrain, and fades in the spinal cord (Fig. 2G,H and Supp. Fig. S3A,B). tshz2 is also expressed in branchial arches (ba) 4 and 5 at 24 and 36 hpf, although faint and transient expression in more anterior branchial arches can be detected at 24 hpf (Fig. 2F–H; Supp. Figs. S2B, S3B).
At the 10 hpf stage, tshz3b is strongly expressed in the hindbrain and posterior neural tube and also in the lateral mesoderm (Fig. 3E–H). In embryos doubly stained for pax6b and tshz3b, the rostral limit of tshz3b in the hindbrain lies posterior to that of pax6b (Fig. 3E,E′). Because the rostral limit of pax6b is rhombomere 1 (Puschel et al.,1992), tshz3b expression does not extend up to the anterior limit of the hindbrain. In the neural tube, tshz3b is expressed dorsally in the anterior part of its domain but extends along the whole dorsoventral extent more posteriorly (Fig. 3F–H). At 19 hpf, neural expression is detected in the olfactory placodes, midbrain, hindbrain, and spinal cord (Fig. 3I). In addition, tshz3b expression is detected in the tail bud and in the nascent somites (Fig. 3I). We do not detect variations in tail bud mesoderm expression among individuals of similar stage, suggesting that this expression pattern is not related to the segmentation clock in any obvious way. By 24 hpf, tshz3b expression fades from all domains except for the olfactory placodes and hindbrain (Fig. 3J,K). Colabeling with egr2/krox20 at this stage confirms that tshz3b expression extends from r3 to r6 (Fig. 3L). Of interest, at these early stages, the expression domains of tshz3b and tshz1a abut each other in the hindbrain (which we have confirmed in double in situ experiments; not shown), with tshz3b spanning r3–6 and tshz1a starting at r7 (Wang et al.,2007). At 36 hpf, tshz3b is expressed in the forebrain, midbrain (tectum opticum), and whole extent of the hindbrain, and in the pectoral fin buds (Fig. 3M,N and Supp. Fig. S4). Of interest, the neural tube expression of tshz3b changes dramatically from 24 to 48 hpf. Strong expression is gained in the tectum opticum at 48 hpf (compare Figs. 4I and 5I). In contrast, tshz3b expression is strong in ventral–lateral regions of the hindbrain at 24 hpf (Fig. 4P) and 36 hpf (Supp. Fig. S4A), but this expression becomes much fainter at 48 hpf (Fig. 5P).
Expression of tshz3b in branchial arches is first detected at 36 hpf in branchial arches 4 and 5 (Fig. 3M,N and Supp. Fig. S4B).
tshz, meis, and pax6a Are Transiently Coexpressed in the Hindbrain and in the Retina
In Drosophila, tshz, meis, and pax6 homologues (teashirt, homothorax, and eyeless, respectively) have been shown to be functionally related during eye development (Bessa et al.,2002; Peng et al.,2009). The meis and pax6 genes have been extensively studied in vertebrates, but a functional relation between them and tshz has not been described so far. In the zebrafish, meis and pax6 genes are known to be expressed in, and required for the development of, specific regions of the neural tube (Waskiewicz et al.,2001; Choe et al.,2002; Kleinjan et al.,2008; Stedman et al.,2009). To determine whether the domains of expression of tshz2 and tshz3b overlap the expression patterns of any of the meis or pax6 genes, we compared their expression patterns in sections of 24 hpf and 48 hpf zebrafish embryos by in situ hybridization using specific RNA probes.
At 24 hpf, tshz2 and tshz3b share expression with meis1, 2.1, and 2.2 in the olfactory placodes (Fig. 4A–G). In addition, tshz2 and tshz3b, meis1, 2.1, and 2.2, and pax6a and 6b are expressed in the hindbrain (Fig. 4A–G). However, the precise patterns of expression differ, and overlap in distinct domains. tshz2 and 3b are strongly expressed in ventral/lateral regions of the hindbrain where meis2.1 and 2.2 are also expressed (Fig. 4O–U). In addition, tshz2 is expressed along the dorso–ventral axis of the neural tube, where it overlaps the expression of pax6a, which labels the medial zone more strongly (Fig. 4). At 48 hpf, tshz2 is expressed in the ganglion cell and inner nuclear layers of the neural retina (Fig. 5H). meis2.1 and meis2.2 are expressed in the inner nuclear layer, while pax6b (and more weakly pax6a) is expressed in the ganglion cell and inner nuclear layers (Fig. 5K–N). However, this tshz2 expression pattern is transient, as we do not detect any neural expression in 72 hpf embryos (not shown). tshz3b expression in the tectum opticum is reminiscent of that of meis1 (Fig. 5I,J). In this region, it is likely that its expression overlaps with meis2.1 and 2.2 as well (Fig. 5K,L). Of interest, in more posterior regions of the neural tube (medulla oblongata), tshz2 and tshz3b expression is more reminiscent of that of meis2.1 than that of meis1 (Fig. 5O–R). Also, we have noted that at 36 hpf, meis2.1 and meis2.2 show expression in branchial arches 1–5 similar to that of tshz2 and tshz3b (not shown). These gene expression data indicate that tshz2 and tshz3b are likely coexpressed with meis and pax6 genes in different subdomains of the cephalic region of the nervous system, which might allow functional interactions among these genes.
A comprehensive phylogenetic analysis of the family, including invertebrate and vertebrate sequences, uncovered four tshz genes in the zebrafish genome: tshz1a, tshz2, tshz3a, and tshz3b, and predicts the existence of a fifth, tshz1b. Of these, we investigated in detail the developmental expression of tshz2 and tshz3b and compared their patterns with those of meis and pax6 genes. At 10 hpf, tshz3b is expressed in the prospective hindbrain and spinal cord and in the mesoderm, while tshz2 is not expressed. By 24 hpf, tshz3b is predominantly expressed in the hindbrain (rhombomeres 3 to 6) and in the olfactory placode, and tshz2 is expressed in the hindbrain and olfactory placode and also in the spinal cord, branchial arches, pectoral fin buds, and liver. At later stages, new domains of tshz3b expression are added in the tectum opticum, the branchial arches, and the fin buds, while tshz2 expression disappears from the spinal cord but appears in the tectum opticum and in the neural retina. When the expression patterns of tshz1a (Wang et al.,2007), tshz2, and tshz3b (this report) are globally compared, they show a significant degree of overlap. After 36–48 hpf, these genes are expressed in the olfactory placodes, in the hindbrain, and in the tectum opticum (this latter domain only shared by tshz1a and tshz3b). However, the developmental time at which each of these genes starts being expressed in these domains differs. In addition, the rostral limit of expression within the hindbrain also differs: that of tshz3b lies in rhombomere 3, while tshz1a starts in rhombomere 7. Also, these three paralogues are expressed in the pharyngeal arches, but in different ones. While tshz1a is detected in the first pharyngeal arch, tshz2 and tshz3b are expressed in the posterior branchial arches. Therefore, tshz1a, tshz2, and tshz3b share some domains of expression, reflecting their paralogy.
However, they also show significant spatial and temporal subfunctionalization, the true extent of which will only be uncovered by detailed comparative expression and functional analyses. In addition, we show that tshz2 and pax6a are coexpressed in the hindbrain at the 24 hpf stage and in the neural retina at the 48 hpf stage of development. When analyzed jointly, the expression patterns of the tshz, meis, and pax6 gene families define a complex set of coexpression domains in the developing zebrafish brain where their gene products have the potential to interact.
In this work, we used 119 nucleotide tshz sequences from insects, cephalochordates, urochordates, and vertebrates that are annotated as such in public databases or have significant similarity by BLAST (E-value <0.05) with known tshz sequences. This set of sequences represents a compromise between sequence inclusion and loss of information, because the inclusion of very incomplete sequences, or of sequences showing large deletions, produces many sites with alignment gaps, and such sites are not considered when using Bayesian methods of phylogenetic reconstruction (Ronquist and Huelsenbeck,2003), as performed here. Accession numbers are listed in Supp. Table S1.
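The similarity criterion above can be illustrated with a minimal sketch. The text does not state which BLAST program or output format was used; the sketch assumes the standard 12-column tabular output, in which the subject identifier is the second column and the E-value the eleventh. The function name and the two example hits are hypothetical and serve only as an illustration.

```python
import csv
import io

E_VALUE_CUTOFF = 0.05  # threshold stated in the text


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, e_value) pairs for hits with E-value below the cutoff.

    Assumes the standard 12-column BLAST tabular layout (an assumption,
    not something specified in the text).
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        e_value = float(row[10])
        if e_value < cutoff:
            hits.append((row[1], e_value))
    return hits


# Hypothetical hits, for illustration only.
example = (
    "query1\tseqA\t78.5\t400\t86\t0\t1\t400\t1\t400\t1e-50\t350\n"
    "query1\tseqB\t31.0\t60\t41\t1\t5\t64\t10\t69\t0.8\t25.4\n"
)
print(significant_hits(example))  # [('seqA', 1e-50)]
```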
Establishing the relationship of divergent sequences, such as the tshz sequences considered here, can be difficult because different multiple sequence alignment (MSA) algorithms can produce different alignments. Recently, Essoussi et al. (2008) showed that there is no single MSA tool that consistently outperforms the rest in producing reliable phylogenetic trees. Furthermore, Golubchik et al. (2007) showed that the absence of amino acid residues often leads to an incorrect placement of gaps in the alignments, even when the sequences were otherwise identical. Therefore, MSA algorithms are expected to perform worse when sequences differ in size, as is often the case when considering divergent sequences. Moreover, for a given alignment, not all amino acid positions will be aligned with equal confidence. Therefore, when using divergent sequences, it is advisable to use more than one alignment algorithm, to use only the amino acid positions that are well supported, and to compare the results, as performed here and described next.
Nucleotide sequences were translated and aligned at the amino acid level. The resulting alignment was used as a guide to produce the corresponding nucleotide alignment. The multiple alignment algorithms implemented in the following software were used: ClustalW (Thompson et al.,2002), M-Coffee (Thompson et al.,2002), T-Coffee (Notredame et al.,2000), and Muscle (Edgar,2004). Furthermore, as suggested by Notredame et al. (2000), we used only aligned amino acid positions with a score greater than 3. The number of aligned informative positions with a score greater than 3 obtained using ClustalW, M-Coffee, T-Coffee, and Muscle was, respectively, 107, 143, 121, and 87. It should be noted that the smaller sets of informative positions are not necessarily subsets of the larger sets.
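The filtering step described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: it assumes that a per-column confidence score is already available as a list of integers, and the function names and the toy alignment are hypothetical. The second function shows how retained amino acid columns map onto codon columns of the corresponding nucleotide alignment.

```python
MIN_SCORE = 3  # positions must score strictly greater than this value


def keep_confident_columns(alignment, column_scores, min_score=MIN_SCORE):
    """Drop alignment columns whose score is not above min_score.

    alignment: dict mapping sequence name -> aligned amino acid string
    column_scores: one integer score per alignment column (hypothetical input)
    Returns the filtered alignment and the indices of the retained columns.
    """
    kept = [i for i, score in enumerate(column_scores) if score > min_score]
    filtered = {
        name: "".join(seq[i] for i in kept) for name, seq in alignment.items()
    }
    return filtered, kept


def codon_columns(kept_protein_columns):
    """Map retained amino acid columns to the matching nucleotide columns."""
    return [3 * i + k for i in kept_protein_columns for k in range(3)]


# Toy alignment and scores, for illustration only.
toy_alignment = {"seq1": "MK-LV", "seq2": "MRALV", "seq3": "MK-IV"}
toy_scores = [9, 5, 1, 3, 8]

filtered, kept = keep_confident_columns(toy_alignment, toy_scores)
print(filtered)             # {'seq1': 'MKV', 'seq2': 'MRV', 'seq3': 'MKV'}
print(codon_columns(kept))  # [0, 1, 2, 3, 4, 5, 12, 13, 14]
```

Because each alignment program scores and places columns differently, the column sets retained for two programs need not be nested, which is consistent with the remark that the smaller sets are not necessarily subsets of the larger ones.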
The synteny information was obtained using Ensyntex (Fonseca et al.,2009), a Web-based application that allows the exploration of microsynteny in the regions surrounding a set of gene identifiers provided by the user. The dimension of the region and the number of genes considered were, respectively, 500 kb and 5 genes. Note that Ensyntex relies on Ensembl gene annotation (release 53) but does not use the synteny regions defined by Ensembl (available only for some organisms).
AB wild-type embryos were used. Embryonic stages are given as hours postfertilization (hpf) at 28.5°C and, for early stages, by number of somites.
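The kind of neighbourhood query described above can be sketched as follows. This is not the Ensyntex implementation; the function name, the input layout, and the toy coordinates are hypothetical, and the sketch assumes that the 500 kb and 5-gene limits apply to each side of the focal gene separately, which the text does not specify.

```python
REGION_BP = 500_000  # size of the region considered
MAX_GENES = 5        # number of genes considered


def flanking_genes(genes, focal, region_bp=REGION_BP, max_genes=MAX_GENES):
    """Return (upstream, downstream) neighbours of `focal` on one chromosome.

    genes: list of (name, start, end) tuples located on the same chromosome
    Assumption: both limits apply to each side of the focal gene separately.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    names = [gene[0] for gene in ordered]
    idx = names.index(focal)
    _, focal_start, focal_end = ordered[idx]
    # Keep the genes closest to the focal gene that lie within the region.
    upstream = [
        g[0] for g in ordered[:idx] if focal_start - g[2] <= region_bp
    ][-max_genes:]
    downstream = [
        g[0] for g in ordered[idx + 1:] if g[1] - focal_end <= region_bp
    ][:max_genes]
    return upstream, downstream


# Hypothetical coordinates, for illustration only.
toy_genes = [
    ("geneA", 100_000, 120_000),
    ("geneB", 700_000, 720_000),
    ("focal", 1_000_000, 1_050_000),
    ("geneC", 1_200_000, 1_230_000),
    ("geneD", 1_700_000, 1_720_000),
]
print(flanking_genes(toy_genes, "focal"))  # (['geneB'], ['geneC'])
```

Comparing the neighbour lists obtained in this way for different species is one way to judge whether a gene sits in a conserved genomic context.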
Probe Preparation, In Situ Hybridization, and Immunolabeling
tshz2 (IMAGp998k1511982Q), tshz3a (IMAGp998E1214819Q), and tshz3b (LLKMp964D0651Q) cDNAs were obtained from ImaGenes GmbH. Antisense RNA probes were prepared from the cDNAs using digoxigenin as label. The embryos were fixed, hybridized, sectioned, and stained as described (Tena et al., 2007).
We thank J.L. Gómez-Skarmeta and J.R. Martínez-Morales for comments on the manuscript. F.C. was funded by grants from the Spanish Ministry of Science and Innovation and Consolider “From Genes to Shape,” of which F.C. is a participant researcher. J.S.S. was funded by Fundação para a Ciência e a Tecnologia (Portugal) and is a GABBA PhD Program fellow of Universidade do Porto.