Since the molecular isolation of the Drosophila forkhead (fkh) gene (Weigel et al., 1989b), a large number of structurally related genes have been identified. All of these genes appear to encode transcription factors that share a ∼110-amino-acid domain of high sequence similarity, which serves as a DNA-binding domain (reviewed in Kaufmann and Knöchel, 1996). This domain has been termed the forkhead domain after the founding member of this protein family. Based upon its structural features, it is also called the winged helix domain. The tertiary structures of forkhead domains complexed with DNA have been determined for the mouse proteins HNF-3γ (Foxa3) and genesis (Foxd3; reviewed in Gajiwala and Burley, 2000). These studies demonstrated the presence of three α-helices in the N-terminal portion and two looping “wings” in the C-terminal portion, which are anchored by intervening β-sheets. Sequence-specific DNA contacts occur within helix 3, which binds in the major groove of the DNA as part of a helix-turn-helix structure, and within wing 2 at the C-terminus of the domain. Not surprisingly, helix 3 displays the highest degree of sequence conservation among forkhead domain family members.
In vitro binding site selection experiments and comparisons of known binding sites of several different forkhead domain proteins have identified a common 7-bp core recognition motif, 5′ (G/A)(T/C)(C/A)AA(C/T)A 3′. This motif is required but not sufficient for specific binding (Pierrou et al., 1994; Roux et al., 1995; Kaufmann et al., 1995; Perez-Sanchez et al., 2000; Biggs and Cavenee, 2001). Sequences flanking this core motif confer differential DNA-binding specificity, and the amino acid stretch between helix 2 and helix 3 of the forkhead domain appears to be important in dictating site-selective binding (Overdier et al., 1994). In addition, the binding of forkhead domains to DNA induces DNA bending (Pierrou et al., 1994; Gajiwala and Burley, 2000).
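The core motif is degenerate at four of its seven positions, so it can be written as a regular expression. The Python sketch below scans both strands of a sequence for matches. It is a minimal illustration, not the method used in the cited studies, and the function names are hypothetical. Because the core motif is required but not sufficient for binding, a hit only marks a candidate site. The flanking-sequence preferences discussed above are not modeled.

```python
import re

# Core motif 5'-(G/A)(T/C)(C/A)AA(C/T)A-3' written as a regular expression
CORE = "[GA][TC][CA]AA[CT]A"
# Lookahead wrapper so that overlapping matches are all reported
PATTERN = re.compile(f"(?=({CORE}))")
# Complement table; characters other than A, C, G, T (e.g. N) pass through unchanged
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_core_sites(seq):
    """Return sorted (start, strand, site) tuples for core-motif matches on both strands.

    start is the 0-based position of the site's leftmost base on the given (+) strand.
    """
    seq = seq.upper()
    n, k = len(seq), 7
    hits = [(m.start(), "+", m.group(1)) for m in PATTERN.finditer(seq)]
    for m in PATTERN.finditer(reverse_complement(seq)):
        # A match at position p of the reverse complement covers seq[n - p - k : n - p]
        hits.append((n - m.start() - k, "-", m.group(1)))
    return sorted(hits)


print(find_core_sites("CCGTCAACATT"))  # [(2, '+', 'GTCAACA')]
print(find_core_sites("AATGTTGACAA"))  # [(2, '-', 'GTCAACA')]
```

In a real analysis, each candidate would still need to be scored with the flanking-sequence preferences of the particular forkhead protein.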
Forkhead domain encoding genes have been widely conserved during evolution and are found in species as distant as yeast and humans, although sequence conservation among gene products from distantly related species is largely restricted to the forkhead domains. Recent phylogenetic analysis has led to the classification of forkhead domain proteins from chordate species into several defined subclasses, termed FoxA through FoxQ (Kaestner et al., 2000; http://www.biology.pomona.edu/fox.html).
Genetic and other functional studies have uncovered roles for forkhead domain encoding genes in the regulation of a variety of developmental and differentiation processes and in the control of metabolism and life span. They have also identified these genes as effectors of signal transduction events (for recent references, see Chen et al., 1996; Ogg et al., 1997; Brunet et al., 2001a, b; Cederberg et al., 2001; Kos et al., 2001; Kume et al., 2001; Mahlapuu et al., 2001; Sasai et al., 2001; Topczewska et al., 2001; Zaffran et al., 2001). Moreover, forkhead domain genes have been implicated in tumorigenesis and in several human congenital syndromes (Li and Vogt, 1993; Barr, 1997; Chatila et al., 2000; Crisponi et al., 2001; Karkkainen et al., 2001; Lai et al., 2001; Wildin et al., 2001).
To date, the functions of six Drosophila forkhead domain genes have been studied by genetic approaches. forkhead (fkh) itself acts to specify the identities of terminal embryonic regions, including prospective gut tissue, and to control salivary gland development (Weigel et al., 1989a, 1990; Andrew et al., 2000). The two sloppy paired genes slp1 and slp2 have functionally redundant roles in the segmentation of the ectoderm and mesoderm (Grossniklaus et al., 1992; Cadigan et al., 1994; Riechmann et al., 1997; Lee and Frasch, 2000) and in neuronal specification (Bhat et al., 2000). crocodile (croc) is involved in the establishment of anterior head structures (Häcker et al., 1995). jumeaux (jumu, aka domina and Dwhn) is required for the generation of asymmetric sibling identities during neuronal differentiation and for normal eye and wing morphogenesis (Cheah et al., 2000; Strodicke et al., 2000; Sugimura et al., 2000). Because it acts as a suppressor of position-effect variegation, jumu appears to function in the modification of chromatin structure. Finally, biniou (bin) has been shown to be essential for the formation of visceral mesoderm and midgut musculature as well as for normal gut morphogenesis (Zaffran et al., 2001). In addition to these six genes, for which mutations are available, four other genes have been isolated from Drosophila in a low-stringency hybridization screen, and their embryonic expression patterns have been analyzed (Häcker et al., 1992). Specifically, the FD2 gene was found to be expressed in the early posterior mesoderm and later in segmental mesodermal cell clusters. FD3, FD4, and FD5 were observed in specific patterns within the developing central and possibly the peripheral nervous system.
The publication of the sequence of the euchromatic portion of the Drosophila melanogaster genome (Adams et al., 2000) has made it possible to complete the survey of forkhead domain encoding genes in this genetic model organism. We show here that the Drosophila genome encodes a total of 17 forkhead domain genes. With four exceptions, the encoded proteins can be clearly assigned to one of the known subclasses of forkhead domain proteins. These subclasses were defined on the basis of sequence relationships among forkhead domains from chordate species. To provide some clues about their potential developmental functions, we present an overview of the embryonic expression patterns of seven previously uncharacterized members of this gene family.
Of the 17 Drosophila Forkhead Domain Genes, 13 Fall Into 10 Known Subclasses
Upon searching the genomic scaffold sequence of Drosophila melanogaster, we detected a total of 17 forkhead domain encoding genes. Six of them were named previously on the basis of their mutant phenotypes. The remaining ones are named according to their predicted cytological locations on the chromosomes. All 17 genes are also listed in the Genome Annotation Database of Drosophila (http://www.fruitfly.org/annot/), although for several of them automated annotation detected only partial forkhead domain sequences. In these cases, we assembled putative full-length forkhead domain sequences using similarity searches in the genomic and expressed sequence tag (EST) sequence databases, the locations of putative splice donor and acceptor sites, and reverse transcriptase-polymerase chain reaction (RT-PCR). All 17 Drosophila forkhead domain sequences were used as queries in BLASTP searches for mouse orthologs. Four Drosophila sequences that did not yield orthologous mouse protein sequences were used in TBLASTN searches of the human and Caenorhabditis elegans genomic sequence databases. These searches identified a C. elegans ortholog for fd102C but no human orthologs for any of the four.
Sequence comparisons with forkhead domains from the mouse and rat (Fig. 1) combined with phylogenetic analysis (Fig. 2) demonstrated that the forkhead domain subclasses A, B, C, D, F, G, K, N, O, and P are represented in the Drosophila genome. Unlike in vertebrates, most of the subclasses are represented by a single member in Drosophila, with the exception of subclasses B, G, and N, each of which has two members. In addition, Fd85E (FoxP) appears to produce two alternative splicing products, which differ in their encoded helix 3 and wing domains (Fig. 1). FoxE, FoxH, FoxI, FoxJ, FoxL, FoxM, and FoxQ appear not to be represented in the Drosophila genome, whereas the Drosophila genes fd3F, fd19B, fd64A, and fd102C cannot be easily grouped within any of the subclasses that are known in chordates. However, there is a close homolog of fd102C in the genome of C. elegans (Fig. 1).
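The subclass assignments above can be cross-checked with a short tally. The member counts per subclass are taken from the text, and FoxA through FoxQ are the chordate subclasses. The dictionary layout and variable names are illustrative only.

```python
# Number of Drosophila members per chordate-defined Fox subclass, as stated in the text
subclass_members = {
    "A": 1, "B": 2, "C": 1, "D": 1, "F": 1,
    "G": 2, "K": 1, "N": 2, "O": 1, "P": 1,
}
# Drosophila genes that cannot be grouped within any known chordate subclass
unassigned = ["fd3F", "fd19B", "fd64A", "fd102C"]

# Chordate subclasses FoxA..FoxQ
chordate_subclasses = [chr(c) for c in range(ord("A"), ord("Q") + 1)]
absent = [s for s in chordate_subclasses if s not in subclass_members]

assigned = sum(subclass_members.values())
print(len(subclass_members), assigned, assigned + len(unassigned))  # 10 13 17
print("".join(absent))  # EHIJLMQ
```

The output reproduces the figures used throughout this article: 10 of 17 chordate subclasses are represented, 13 genes are assigned, 17 genes are found in total, and subclasses E, H, I, J, L, M, and Q are absent.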
Novel Embryonic Expression Patterns of Drosophila Forkhead Domain Genes
fd68A mRNA is detected at high levels in preblastoderm embryos, which indicates a prominent maternal contribution, and its uniform distribution persists until embryonic stage 13 (Fig. 3a). At this stage, mRNA levels start to decline in all tissues except the central nervous system (CNS; Fig. 3b). From stage 14 onward, fd68A expression is restricted to the CNS, where it appears to be present in all cells (Fig. 3c).
A partial cDNA sequence of Ches-1-like was reported previously (Schlake et al., 2000); it was isolated with degenerate primers designed against mouse Foxn1 (Whn). Ches-1-like is more closely related to Foxn2 and Foxn3 (the mouse counterpart of human Ches1) than to Foxn1 (Fig. 1 and data not shown). The earliest expression of Ches-1-like occurs at stage 8 in the region of the head mesoderm that produces the hemocyte progenitors (Fig. 3d). Expression in these cells is very transient and ceases during delamination and migration of the hemocytes. During stage 10, a new pattern of expression begins in all cells of the salivary gland placodes (Fig. 3e). During stage 11 and early stage 12, expression at these sites disappears first in the portions of the salivary gland primordia that have invaginated (Fig. 3f,g). At stage 11, mRNA expression is also observed in the dorsal somatic mesoderm. During stage 12, prominent segmental expression is seen throughout the dorsoventral extent of the somatic mesoderm, which gives rise to the body wall musculature (Fig. 3f,g). Expression in the somatic mesoderm declines between stages 13 and 14, when a new site of expression appears in the anterior portion of the hindgut (Fig. 3h,i).
Although fd88A has a strong maternal contribution (Fig. 3j), fd88A mRNAs disappear completely during blastoderm stages before cellularization (Fig. 3k) and remain undetectable until stage 11. During germ band retraction (stage 12), expression appears in the endoderm and ectoderm. Levels are lower in the ventral portions of the thoracic segments and in the head segments, except for the labrum, which is strongly positive (Fig. 3l,m). There is also a cluster of fd88A-expressing cells near the developing proventriculus, which may be associated with the stomatogastric nervous system (Fig. 3l, white arrow). Expression in the epidermis of the trunk segments and in the endoderm persists into later stages (Fig. 3n).
fd85E transcripts appear to exist as two splice isoforms, fd85Ea and fd85Eb, which differ in their helix 3 and wing domains (Fig. 1). An fd85Eb-specific probe detects mRNA in the yolk cytoplasm from blastoderm stages until stage 13 (Fig. 3o,p, and data not shown). During late stage 12, expression also begins in the CNS, initially in segmental clusters along the ventral midline. After stage 13, cells throughout the CNS become positive for fd85Eb mRNA (Fig. 3q). A probe for the fd85Ea isoform produces an identical staining pattern in the CNS but does not stain the yolk cytoplasm (data not shown).
In preblastoderm stages, maternally supplied fd19B mRNA products are distributed uniformly throughout the embryo (Fig. 3r). However, in early syncytial blastoderm, maternal mRNAs disappear abruptly. At the same time, zygotic expression is initiated within an anterior domain between 0 and ∼30% egg length at higher levels than the previous maternal mRNA levels (Fig. 3s). Before cellularization, expression becomes restricted to a transverse stripe, which centers around 17% egg length. This stripe is also very transient and disappears before cellularization, with its longest persistence being observed along the dorsal circumference (Fig. 3t). During germ band elongation, weak expression is observed in a segmental pattern (Fig. 3u), while no expression is detected between stage 10 and the end of embryogenesis.
fd3F mRNA is contributed maternally, and maternal and/or zygotic transcripts are uniformly distributed in embryos until stage 12 (Fig. 3v). During mid-stage 12, the uniform mRNA levels decrease. At the same time, a spatially restricted pattern of expression appears in one lateral and one ventral cluster of cells within each trunk hemisegment, as well as in clusters within the embryonic head (Fig. 3w). After stage 12, expression occurs exclusively in these cell clusters. Based upon their position and arrangement, these clusters are likely to correspond to chordotonal sensory organs and their precursors (Fig. 3x).
fd102C is expressed strictly zygotically. During blastoderm stages, starting just before cellularization, a small anterior domain of expression is observed between 0 and ∼8% egg length. This domain includes the anlagen of the pharynx and anterior portions of the procephalic neuroectoderm. The ventral-most cells of this region, which give rise to the anterior midgut, are excluded (Fig. 3y). The descendants of the cells from this domain appear to maintain fd102C expression during later stages as they undergo morphogenetic rearrangements, and some of them delaminate to contribute neurons to the CNS (Fig. 3z). However, we cannot exclude that additional cells also begin to express fd102C during these later stages. During and after stage 14, fd102C expression is seen in a large number of neurons within the brain hemispheres and in dorsolateral pharyngeal areas (Fig. 3z′).
In this study, we have determined that the Drosophila genome encodes a total of 17 forkhead domain genes, and we have characterized the embryonic expression of 7 previously uncharacterized members of this gene family. Additional diversity of forkhead domain sequences appears to be achieved by alternative splicing. For example, alternative splicing of fd85E produces isoforms that differ significantly in the domains conferring DNA-binding specificity. Of interest, a previous report identified two variants of fd3F cDNAs that contain coding sequences only for the first helix and lack those for the second and third helices and the wing domains (Rouyer et al., 1997). The corresponding mRNAs, which oscillate in a circadian rhythm in adult brains, are therefore not likely to encode DNA-binding protein isoforms. Our in situ hybridizations used a probe that specifically recognizes the 3′ coding sequences required for DNA binding, and they did not address potential oscillations of the corresponding mRNAs. Nevertheless, our RT-PCR data demonstrate that fd3F variants encoding a full forkhead domain are indeed made in embryonic stages. The significance of isoforms derived from partially spliced variants, which could potentially have dominant negative activities, remains to be explored.
The Drosophila set of forkhead domain genes represents 10 of the 17 subclasses of such genes that are found in chordates (Kaestner et al., 2000). Most of the subclasses are represented only by a single gene in Drosophila. However, there are two members in each of the FoxB and FoxG subclasses, which appear to be derived from relatively recent tandem duplication events. These duplications have apparently preserved much of the regulation and, at least in the case of slp, also the genetic function of each of the “twin genes.” By contrast, the more strongly diverged sequences and completely different expression patterns of the two members of the FoxN subclass, jumu and Ches-1-like, as well as phylogenetic analysis, suggest that their duplication occurred before the subdivision of the Ecdysozoan and Deuterostomian lineages.
Four of the Drosophila forkhead domain genes do not fall into any of the known subclasses. However, fd102C has an ortholog in C. elegans, which suggests that it is a member of an additional subclass that was lost in an ancestor of the chordate lineage. It is possible that the other three genes have diverged from one of the known subclasses. Indeed, based upon its affinities with slp in terms of sequence and expression patterns, fd19B is likely a FoxG derivative.
In addition to the forkhead domain sequences, aspects of the expression patterns have been evolutionarily conserved (Table 1). Specifically, genes of the subclasses FoxB, FoxD, FoxG, and FoxK are expressed in the CNS in both Drosophila and vertebrate embryos. Likewise, the expression of FoxC and FoxF genes includes homologous tissues in the mesoderm of fly and vertebrate embryos (Table 1; Zaffran et al., 2001). Furthermore, it has been suggested that the Drosophila hindgut and the notochord in chordates, both of which express FoxA genes, are homologous structures (Kispert et al., 1994). The early somitic expression of mouse Foxn3 (Tribioli et al., 2002; therein called Foxn2) is reminiscent of the somatic mesodermal expression of Ches-1-like, whereas the presence of human FoxN2 in T-cells (Li et al., 1992) may reflect the expression of Ches-1-like in hematopoietic progenitor cells. However, for genes of the subclasses O and P, similarities in their expression patterns are less obvious. Finally, it will be interesting to determine whether Drosophila FoxO (fd88A) is a component of the Akt pathway during insulin signaling and the regulation of apoptosis, as are its vertebrate and C. elegans orthologs (Ogg et al., 1997; Brunet et al., 2001a).
Table 1. Chromosomal Map Positions and Expression Patterns(a)
(a) Summary of the chromosomal map positions and expression patterns of Drosophila forkhead domain genes and the major tissues of expression of their mouse orthologs. For more complete descriptions of previously published expression patterns, see the references listed in the text and at http://www.informatics.jax.org/menus/expression_menu.shtml. CNS, central nervous system; PNS, peripheral nervous system.
fd3F: Maternal; early uniform; later in chordotonal sensory precursors/organs
fd19B: Maternal; blastoderm: anterior domain; germ band elongation: segmental
fd64A: Blastoderm: termini; germ band elongation: segmental; posterior visceral mesoderm; somatic muscle progenitors
fd102C: Blastoderm: small anterior domain; late: brain hemispheres, pharynx
Representatives of seven known vertebrate forkhead domain subclasses (E, H, I, J, L, M, and Q) are absent from the Drosophila genome. This absence suggests that the corresponding functions of these types of genes either are not needed in this species or are supplied by unrelated genes. In particular, the absence of genes related to FoxH (FAST-1 and -2), which are implicated in TGF-β signaling (Whitman, 1998), argues for the existence of alternative mechanisms or components of TGF-β signal transduction.
Interestingly, two of the newly characterized genes, fd19B and fd102C, are expressed in nested domains within the prospective head region of the early embryo. It is therefore possible that they function as “head gap genes” during early segmentation and respond to different concentrations of the morphogen Bicoid, as was postulated previously (Driever and Nüsslein-Volhard, 1988). In particular, the maternal and zygotic pattern of fd19B expression is reminiscent of that of the zinc-finger encoding gap gene hunchback (hb; Tautz et al., 1987). These presumptive gap genes may have been missed in previous genetic screens for embryonic patterning mutants. They map to the X and fourth chromosomes, respectively, which were not covered as exhaustively by such screens. The developmental roles of these and other newly characterized genes can now be studied by reverse genetics, which we have initiated for the mesodermally expressed genes biniou, fd64A, and Ches-1-like (Zaffran et al., 2001, and unpublished data). Previous surveys have covered the Drosophila basic helix-loop-helix genes (Moore et al., 2000; Peyrefitte et al., 2001). Together with those surveys, our present survey of Drosophila forkhead domain genes provides important insights into how transcription factors encoded by large gene families may contribute to embryonic patterning and tissue development.
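The idea that nested anterior domains could reflect different Bicoid thresholds can be illustrated with a simple exponential-gradient model. The sketch below assumes C(x) = C0·exp(-x/λ), with x in fractions of egg length (EL) and an assumed decay length λ = 0.2 EL. Neither the model nor the value of λ comes from this study, and the printed thresholds are model outputs, not measurements. The only inputs taken from the text are the posterior boundaries of the blastoderm domains: ∼8% EL for fd102C and ∼30% EL for fd19B.

```python
import math

# Assumed decay length of the Bicoid gradient, in fractions of egg length (EL).
# Illustrative assumption only, not a value measured in this study.
LAMBDA = 0.20


def bicoid(x, c0=1.0, lam=LAMBDA):
    """Relative Bicoid concentration at position x (0 = anterior pole, 1 = posterior pole)."""
    return c0 * math.exp(-x / lam)


def is_nested(inner, outer):
    """True if interval inner = (start, end) lies within interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Anterior blastoderm domains from the text, as (start, end) in fractions of EL
domains = {"fd102C": (0.0, 0.08), "fd19B": (0.0, 0.30)}

for gene, (start, end) in domains.items():
    # Single-threshold model: the posterior boundary lies where C(x) equals the threshold
    print(f"{gene}: boundary at {end:.0%} EL, threshold ~{bicoid(end):.2f} x C0")

print("fd102C domain nested in fd19B domain:", is_nested(domains["fd102C"], domains["fd19B"]))
```

Under these assumptions, the smaller fd102C domain would require roughly threefold more Bicoid than the fd19B domain. Whether either gene actually responds to Bicoid in this way would have to be tested experimentally.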
Database Search and Phylogenetic Analysis
TBLASTN searches of the published Drosophila melanogaster scaffold sequence (Adams et al., 2000) and Drosophila EST database (Rubin et al., 2000) were performed with different forkhead domain query sequences from Drosophila and vertebrates. Sequences that were interrupted by introns were re-examined using the most closely related forkhead domain sequence as a query sequence, and based upon these alignments, the presence of splicing donor and acceptor sites was verified in the genomic sequence. If a donor or acceptor site was not present at the predicted position, the transition between exons was verified by RT-PCR and sequencing. All 17 Drosophila forkhead domain sequences were used as queries in BLASTP searches for mouse orthologs. Four Drosophila sequences that did not yield orthologous mouse protein sequences were used in TBLASTN searches of the human and C. elegans genomic sequence databases, which identified a C. elegans ortholog for fd102C but no human orthologs for any of these.
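The splice-site check described above can be sketched as follows. The sketch assumes the canonical GT...AG rule, in which an intron begins with GT at the donor site and ends with AG at the acceptor site, and it uses 0-based, end-exclusive coordinates on the sense strand. Non-canonical introns (for example, GC-AG) also exist, which is one reason exon junctions lacking a consensus site were confirmed by RT-PCR. The function name and the example sequence are hypothetical.

```python
def has_canonical_splice_sites(genomic, intron_start, intron_end):
    """Return True if genomic[intron_start:intron_end] follows the GT...AG rule.

    Coordinates are 0-based and end-exclusive on the sense strand (an assumption
    of this sketch; GenBank coordinates are 1-based and inclusive).
    """
    intron = genomic[intron_start:intron_end].upper()
    # Shortest span that can hold both a GT donor and an AG acceptor dinucleotide
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Hypothetical example: exon 1 + intron + exon 2
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTTTCAG", "GCTTAA"
genomic = exon1 + intron + exon2
start = len(exon1)
end = start + len(intron)

print(has_canonical_splice_sites(genomic, start, end))          # True
print(has_canonical_splice_sites(genomic, start - 1, end - 1))  # False: boundaries shifted by 1 nt
```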
ClustalX (Jeanmougin et al., 1998) was used for sequence alignments and to construct neighbor-joining (N-J) trees, using 1,000 bootstrap trials and a random number generator seed value of 100. The resulting tree was drawn with the TreeView application (Page, 1996).
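The Python sketch below illustrates the core agglomeration step of the neighbor-joining method on a precomputed distance matrix. It is a minimal textbook implementation, not ClustalX's code. It omits distance correction and the bootstrap resampling of alignment columns. The example matrix is a standard five-taxon textbook example and does not contain forkhead data. The internal node labels are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted neighbor-joining tree.

    names: list of taxon labels; matrix: symmetric list-of-lists of pairwise distances.
    Returns a list of (parent, child, branch_length) edges; internal nodes are
    labeled "node0", "node1", ... (hypothetical labels for this sketch).
    """
    d = {(a, b): matrix[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix
        r = {a: sum(d[a, k] for k in nodes) for a in nodes}
        # Choose the pair that minimizes the Q criterion: (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the new internal node u to the two joined nodes
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last three nodes to a central internal node
    a, b, c = nodes
    center = f"node{next_id}"
    edges += [
        (center, a, (d[a, b] + d[a, c] - d[b, c]) / 2),
        (center, b, (d[a, b] + d[b, c] - d[a, c]) / 2),
        (center, c, (d[a, c] + d[b, c] - d[a, b]) / 2),
    ]
    return edges


# Standard five-taxon textbook example (additive distances), not forkhead data
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, dist)
for parent, child, length in tree:
    print(parent, child, length)
```

A bootstrap analysis, as run in ClustalX, would repeat this construction on distance matrices computed from alignments whose columns are resampled with replacement. It would then count how often each grouping recurs.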
Genomic DNA and cDNA Isolation
Genomic sequences spanning forkhead domain encoding regions were amplified by PCR from genomic DNA of a w1118 strain and cloned into the pCR II vector (Invitrogen). PCR products included the following sequences: fd3F, an 816-bp fragment from nucleotide (nt) 14483 to 15298 of GenBank entry AE003429 [gi:10728454]; fd19B, 838 bp from nt 134764 to 133927 of AE002611 [gi:10729259]; fd85E alternative exon A (plus part of the preceding common exon), 881 bp from nt 22260 to 21380 of AE003684 [gi:10726416]; fd85E alternative exon B, 579 bp from nt 21124 to 20546 of AE003684 [gi:10726416]; fd102C, 1696 bp from nt 63883 to 65578 of AE003847 [gi:10728143]; Ches-1-like, 1244 bp from nt 173852 to 172609 of AE003441 [gi:7290819]. Primer sequences are available upon request. For fd88A, probes were made from EST clones LD05569 and LD45950, and for fd68A, from EST clone LD16137.
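The fragment sizes listed above follow from the inclusive GenBank coordinates. When the start coordinate exceeds the end coordinate, the fragment lies on the opposite strand relative to the orientation of the GenBank entry. A short Python check reproduces each reported length; the helper name is hypothetical.

```python
def fragment_length(start_nt, end_nt):
    """Inclusive length of a fragment given GenBank coordinates, in either orientation."""
    return abs(end_nt - start_nt) + 1


# (start, end, reported length in bp) as listed in the text
fragments = {
    "fd3F": (14483, 15298, 816),
    "fd19B": (134764, 133927, 838),
    "fd85E exon A": (22260, 21380, 881),
    "fd85E exon B": (21124, 20546, 579),
    "fd102C": (63883, 65578, 1696),
    "Ches-1-like": (173852, 172609, 1244),
}

for name, (start, end, reported) in fragments.items():
    # "+" = same orientation as the GenBank entry, "-" = opposite orientation
    strand = "+" if end >= start else "-"
    print(f"{name}: {fragment_length(start, end)} bp ({strand}), reported {reported} bp")
```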
For fd3F, in which the predicted helix 2 and helix 3 coding sequences are separated by a large intron, embryonic mRNA was reverse transcribed with SMART RACE (BD Biosciences Clontech) and a fragment spanning both portions of the forkhead domain coding sequence was PCR amplified and sequenced.
In Situ Hybridization
In situ hybridizations with digoxigenin-conjugated antisense riboprobes were performed as described by O'Neill and Bier (1994).
NOTE ADDED IN PROOF
Several recent reports, including Junger et al. (2003), Kramer et al. (2003), and Puig et al. (2003), support an involvement of fd88A (FoxO) in the insulin receptor pathway.