The combination of optical clarity of its embryos, their rapid development, and fecundity has made the zebrafish the most tractable vertebrate in which to take a genetic approach to development (Kimmel, 1989; Driever et al., 1996; Haffter et al., 1996). A powerful tool that has boosted research in all of the major model organisms is transgenesis, the ability to insert foreign DNA into the genome of one's favorite laboratory creature. This procedure is most often done for the purpose of expressing specific gene products in a controlled manner, for example, the tissue-specific expression of “reporter” genes such as β-galactosidase (lacZ) or green fluorescent protein (GFP) to mark specific tissues. In zebrafish, such transgenesis is a routine undertaking, and it is now fairly simple to express either reporter or bioactive gene products in a desired tissue-specific manner (Shafizadeh et al., 2002). However, the high-throughput nature of zebrafish allows another use of transgenesis: rather than controlling expression of a specific gene in a predetermined way, it is possible to use transgenes as a screening tool to probe and manipulate the genome. The two most prominent examples of this approach are insertional mutagenesis and gene trapping/enhancer detection (Fig. 1).
Insertional mutagenesis is an alternative to chemical mutagenesis whereby the insertion of exogenous DNA of known sequence into the genome is used as the mutagenic agent (Fig. 1). This strategy eliminates the laborious task of positional cloning (necessary for chemical mutagenesis, which generally causes point mutations), as the genomic DNA flanking the inserted element can be easily cloned and sequenced, greatly simplifying the search for the mutated gene.
Gene traps and enhancer detection vectors are transgenes that will express a reporter gene under the regulatory control of the gene into (or near) which they insert. Thus, animals carrying such insertions (trap lines) can both provide information about the spatiotemporal regulation of individual genes or groups of genes and serve as lines in which different cell types, tissues, or even chromosomes are visibly marked. Enhancer detection vectors contain a basal (or low-activity) promoter upstream of the reporter gene, which should be susceptible to the influence of enhancer elements near where the vector inserts (Bellen et al., 1989; Gossler et al., 1989; Fig. 1). Because some enhancer elements can operate at a great distance (e.g., Spitz et al., 2003), this feature may in some cases allow for a fairly large target range within the genome for an insertion to be activated by such enhancers. Gene traps, on the other hand, lack a promoter; instead they require insertion into a transcribed gene, so that they are expressed from that gene's own promoter (Friedrich and Soriano, 1991; Skarnes et al., 1992; Fig. 1). Most commonly, gene traps include a splice acceptor upstream of the reporter gene, so that the transgene will be spliced into the endogenous message upon inserting into an intron of a gene; this method will create a fusion transcript, which can potentially encode the reporter gene product. Thus, while enhancer detection may be more efficient, as there should be a larger target size for reporter activation, gene traps have the advantage that the trapped gene can often be identified unambiguously, and they are more likely to be mutagenic.
Each of these transgenesis applications requires the ability to make very large numbers of transgenic lines, as well as an efficient way to detect and recover lines carrying the desired insertions. For example, most insertions will not disrupt any gene, let alone one whose mutation leads to a detectable phenotype (Gaiano et al., 1996a). For an insertional mutagenesis screen, one must both make a large number of insertions, and have a way to identify those which cause phenotypes. Similarly, the power of either gene traps or enhancer detection events is in generating a large number of them, so that lines exist with a great diversity of expression patterns, and expression information is gathered for many genes. In recent years, a variety of techniques have been brought to fruition to achieve these goals. These large-scale approaches will likely have, in combination with the zebrafish genome sequence and genome-wide gene expression studies, a substantial impact on the understanding of vertebrate, including human, development, disease susceptibility, and evolution.
Injection of either plasmid DNA or bacterial artificial chromosomes (BACs) into the cytoplasm of one-cell stage embryos (Stuart et al., 1988; Culp et al., 1991; Jessen et al., 1999) remains the most frequently used method for making transgenic lines to express a gene of interest, such as GFP (Amsterdam et al., 1995). Furthermore, BAC injection is the only method amenable to delivering very large constructs, which may be required to include all of the cis-acting elements of a given gene and faithfully reflect its endogenous expression (Jessen et al., 1998, 1999). However, the efficiency of producing germ line transgenics by this method is generally low, approximately 5–20% (Stuart et al., 1988; Culp et al., 1991). The coinjection of the meganuclease I-SceI with plasmid DNA has been reported to increase this transgenesis rate (Thermes et al., 2002) but does not achieve the rates required for insertional mutagenesis or production of a large number of gene trap lines. Furthermore, these goals are best achieved with insertional elements that will integrate into the genome as a single copy of the vector at any given insertion site without causing chromosomal rearrangements or deletions. Transgenes generated by DNA microinjection often integrate as concatemers of many tandem copies (Stuart et al., 1988) and may cause chromosomal rearrangements; in fact, the only reported insertional mutation generated by DNA microinjection likely caused a chromosomal rearrangement resulting in the mutation of more than one gene (Cretekos and Grunwald, 1999).
Retroviruses and transposons are agents specialized in integrating into the host genome and lend themselves to these goals, as they integrate efficiently without causing rearrangements. Transposon-based vectors have been used extensively for mutagenesis in invertebrates (Cooley et al., 1988; Zwaal et al., 1993; Anderson, 1995; Bellen et al., 2004), and retroviral vectors have been used to generate thousands of gene trap lines in murine embryonic stem cells (Wiles et al., 2001; Stanford et al., 2001), many of which prove to be mutagenic when mice are produced from these lines and bred to homozygosity. Thus, provided that they produce transgenic lines in zebrafish efficiently enough, these vectors are likely to be ideal for both gene trapping and insertional mutagenesis.
Retroviruses infect a cell by means of an interaction between a protein in their envelope membrane and a molecule (usually a protein) on the membrane of the target cell. Once inside the target cell, the RNA genome is reverse-transcribed, and the resulting DNA, called a provirus, integrates into the host genome. Several proteins encoded by the virus are required for infection, including the envelope, reverse transcriptase, and integrase proteins. Retroviral vectors have a genome that includes the cis-acting sequences necessary to promote their reverse transcription and integration but do not encode the viral proteins. These viruses are produced in mammalian cell lines in which the viral proteins are supplied in trans. Thus, the provirus integrates into the DNA of infected cells but cannot replicate to produce additional virus. Retroviral vectors can be engineered to express reporter genes in place of the viral proteins.
The only retroviral vector system currently in use in zebrafish uses a genome and two genes derived from the Moloney murine leukemia virus (MoMuLV) and the envelope glycoprotein (G) from vesicular stomatitis virus (VSV). As a well-studied retrovirus, MoMuLV is a very good vector system for making insertions, but its own envelope protein restricts infection to mammalian cells because it interacts with a species-specific cell surface protein. This problem can be circumvented by a procedure called pseudotyping, essentially the replacement of the envelope protein with another one conferring a different host range. VSV has an extensive host range, likely because its envelope protein (VSV-G) interacts with phospholipids on the target cell membrane rather than a specific protein (Schlegel et al., 1983). Although VSV is not a retrovirus and thus would not itself be useful for generating insertions, a MoMuLV vector pseudotyped with VSV-G can infect cells from a wide variety of species, including zebrafish (Burns et al., 1993). An additional feature provided by the VSV-G protein is increased stability of viral particles, which allows the titer of the virus (i.e., the number of infectious virus particles in a given volume) to be increased by up to 1,000-fold through ultracentrifugation (Burns et al., 1993; Yee et al., 1994). This is of critical importance in the efficient generation of zebrafish germ line transgenics (Kawakami et al., 2005).
When VSV-G pseudotyped retrovirus is injected among the cells of a midblastula-stage zebrafish embryo (500–2,000 cells), many of the cells become independently infected, producing a mosaic organism in which different cells harbor proviral insertions at different chromosomal sites. However, for the purposes of making stable transgenics, the only relevant cells are those that will give rise to the germ line. At the time of injection, there are four cells destined to become the germ line (primordial germ cells, PGCs), and this population will increase to approximately 30 over the time course of infection (Yoon et al., 1997). Integration events in these PGCs will be passed on to the progeny of the injected fish (Lin et al., 1994; Gaiano et al., 1996b); however, because the germ lines are mosaic, an integration event that occurred in any given PGC will only be present in approximately 1 to 20% of the offspring (Gaiano et al., 1996b; Chen et al., 2002; Ellingsen et al., 2005; A. Amsterdam, unpublished data). The higher the titer of the injected virus, the more likely that any given PGC will be infected, and with very high titer virus, every PGC can be infected multiple times. Founders injected with such virus can transmit over 30 different insertions and produce individual offspring with as many as 10 to 15 independently segregating insertions (Amsterdam et al., 1999; Chen et al., 2002).
One possible disadvantage of the pseudotyped retroviral system is that it is not always straightforward to generate virus with sufficiently high titer. The standard method begins with a producer cell line that expresses the MoMuLV genes. The viral genome is then introduced either by transient transfection or by selection of a stable transfectant clone. Finally, the VSV-G gene is transiently transfected, and virus is harvested from the medium over the next few days; because the VSV-G protein is toxic to the cells, virus-producing cells cannot be maintained, so a new transfection must be performed for every virus preparation. There can be great variation in viral titer from preparation to preparation, even when a virus with the same genome is being produced repeatedly. Although the titer of the virus can be determined on tissue culture cells, this determination is not always indicative of how well the virus will infect embryonic cells; whereas a low titer on tissue culture cells is certain to predict a low infection rate in the embryo, a high titer on tissue culture cells does not guarantee a high infection rate. Thus, the best way to determine whether a virus stock is suitable is to directly test the number of integrations in injected embryos, using a DNA-based assay (Amsterdam et al., 1999; Amsterdam and Hopkins, 2004). A more significant problem, however, is that not all vectors (i.e., different sequences in the virus) can be produced at high titer, for reasons that are neither entirely clear nor predictable. Thus, trying to design new vectors with additional features (e.g., expression of a reporter gene or gene trapping) often involves the testing of many constructs before finding one that works (Chen et al., 2002; A. Amsterdam and T.S. Becker, unpublished data).
In a manner analogous to retroviral vectors, transposon vectors use the cis-acting elements from naturally occurring transposons in conjunction with the gene normally encoded by the transposon supplied in trans. For transposons, this gene encodes transposase, an enzyme that catalyzes both the excision of the vector from a plasmid (or a pre-existing genomic location) and integration into the host genome. As with the case of retroviral vectors, different sequences can be engineered into the vector in place of the transposase gene to express a gene of interest or serve as a gene trap. Unlike retroviruses, transposons do not infect cells from the outside; the plasmid containing the transposon vector and mRNA encoding the transposase are coinjected into one-cell-stage embryos. As the early divisions of the zebrafish embryo are very rapid, injected embryos are mosaic for integration events as in the case of retroviral infection; therefore, injected fish transmit any given insertion to less than 50% of their progeny.
There are three different transposon vectors/transposases that have been shown to integrate into the zebrafish genome and generate germ line transgenics. Vectors based upon the Caenorhabditis elegans Tc3 transposon were able to transpose into the zebrafish germ line, albeit only at rates comparable to microinjection of plasmid DNA (Raz et al., 1998). The Sleeping Beauty transposon is based on a reconstruction of an ancient fish transposon; evolutionarily acquired mutations in both the cis-acting sequences and the transposase gene were “fixed” by comparing sequences from multiple salmonid species to make both a functional transposase gene and a vector that would transpose in its presence (Ivics et al., 1997). When vectors and transposase are injected, approximately one third of the injected fish transmit the integrated vector. Whereas most founders only transmit a single insertion, some founders transmit over five different independently segregating insertions (Davidson et al., 2003). The highest transgenesis rates for transposons have been achieved with the Tol2 system from the medaka fish (Kawakami et al., 2000, 2004). Half of the injected founder fish transmitted the vector to their progeny, and on average, nearly six different insertions were transmitted by each founder.
While the frequency of transgenesis with the transposon systems is lower than the best that has been achieved with the retroviral vectors, there are two possible advantages to using transposons. First, the transposon vectors appear to be more easily amenable to engineering than the retroviral vectors; altering the sequences within the vector does not dramatically alter the efficiency of transposition, although as a general rule, the frequency of transposition decreases as the size of the vector increases (Izsvak et al., 2000; Davidson et al., 2003). Second, it is possible that new insertions could be generated efficiently by remobilization of existing insertions. All of the germ line transgenics that have been reported to date were generated by the coinjection of vector-bearing plasmids and transposase mRNA, transposing the vector from the plasmid to the host genome. However, for each of the transposons, it has been demonstrated that stable insertions can be excised again by injection of transposase (Raz et al., 1998; Kawakami et al., 2004; Balciunas et al., 2004), and Tol2 has been shown to reintegrate at new genomic sites (Kawakami et al., 2004; Parinov et al., 2004). Although the frequency with which lines with new insertion sites can be generated by remobilization is currently lower than making them de novo, improvements in this direction might allow for very efficient production of transposon lines similar to that seen with P elements in Drosophila (Cooley et al., 1988).
The only insertional agents that have efficiently produced insertional mutants to date are the retroviral vectors (Gaiano et al., 1996a; Amsterdam et al., 1999, 2004a; Golling et al., 2002). Approximately 1 in 80 insertions, when bred to homozygosity, produces an embryonic visible (and usually lethal) phenotype. Although this rate is too inefficient to conduct a large-scale screen by breeding one insertion at a time, by using the ability of founders to transmit multiple insertions to individual F1 progeny, multiple insertions can be screened simultaneously. By crossing F1 fish with multiple insertions and screening the progeny of multiple F2 sibling crosses, one can screen an average of approximately 12 insertions per F2 family, allowing the recovery of approximately one insertional mutation per seven families screened (Amsterdam et al., 1999; Amsterdam and Hopkins, 2004). A slight complication to screening multiple insertions per family is that, for each recovered mutation, linkage analysis must be performed to identify which insertion is responsible for the phenotype, but this analysis is easily done by Southern analysis on DNA from the parents of the phenotypic and wild-type clutches as well as mutant embryos (Amsterdam et al., 1999). The genomic sequence adjacent to the mutagenic provirus (junction sequence) can then be cloned by inverse polymerase chain reaction (PCR) or linker-mediated PCR. This method has been used to generate over 500 insertional mutants with recessive embryonic phenotypes (Amsterdam et al., 2004a). Whereas in approximately 4% of the mutants more than one insert appears linked to the phenotype (because two or more proviruses integrated near each other), in the rest of the cases, genomic sequence adjacent to the mutagenic insertion could be determined, and to date, in over 90% of these mutants, the disrupted gene has been identified.
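The screening arithmetic in the preceding paragraph can be checked with a short calculation. The following is a minimal sketch, not part of the original screen design: the function names are hypothetical, and it assumes that every insertion has the same, independent chance of being mutagenic.

```python
# Back-of-the-envelope yield of a multi-insert, three-generation screen.
# Assumption (illustrative): each insertion is mutagenic independently
# with the same probability, here about 1 in 80 as quoted in the text.

def expected_mutants_per_family(inserts_per_family: float,
                                mutagenic_fraction: float) -> float:
    """Expected number of mutagenic insertions screened in one F2 family."""
    return inserts_per_family * mutagenic_fraction


def families_per_mutant(inserts_per_family: float,
                        mutagenic_fraction: float) -> float:
    """Average number of F2 families screened per recovered mutation."""
    return 1.0 / expected_mutants_per_family(inserts_per_family,
                                             mutagenic_fraction)


if __name__ == "__main__":
    # Figures from the text: ~12 insertions per family, ~1/80 mutagenic.
    n = families_per_mutant(inserts_per_family=12, mutagenic_fraction=1 / 80)
    print(f"about {n:.1f} families per mutation")
```

With 12 insertions per family, the estimate comes out between six and seven families per mutation, in line with the figure of roughly one mutation per seven families quoted above. Breeding one insertion at a time would, under the same assumption, require on the order of 80 families per mutation.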
Approximately 60% of the mutagenic insertions reside in either the promoter, 5′ untranslated region (UTR), or first intron, whereas approximately 20% are in coding exons and 20% in downstream introns or 3′ UTR (Table 1). The apparent preference for insertions at the 5′ end of the gene could be due to insertion preference of the retrovirus, as has been suggested in mice (Scherdin et al., 1990; Mooslehner et al., 1990), human tissue culture cells (Wu et al., 2003), and the cloning of a large number of unselected viral insertions in zebrafish (G. Golling and T. Thacker, personal communication, see below). It is also possible that insertions at the 5′ end may be somewhat more likely to be mutagenic than insertions further downstream in the gene. For example, there are two insertional alleles of smoothened (smu) in which the virus is in the first coding exon and in which no mRNA can be detected (Chen et al., 2001); a third allele in which the insertion is in an intron further downstream has a less-severe phenotype (Wilbanks et al., 2004).
Table 1. Location of Proviral Insertions Under Different Forms of Selection
>3 kb from gene
<3 kb upstream of gene
<3 kb downstream of gene
G. Golling and T. Thacker, personal communication; numbers in parentheses are the percentage of the subset of inserts within genes, for the purpose of comparison with insertional mutagenesis.
Amsterdam et al. (2004a), and A. Amsterdam, unpublished data.
Insertional mutants have not been ascribed to a given gene if they are “outside” of the gene except in a few cases confirmed either by analysis of gene expression in mutants or by non-complementation to other known mutants in that gene.
Unlike chemical mutagenesis, which by causing point mutations has the potential to create hypomorphic or neomorphic alleles by amino acid substitution, insertional mutagenesis generally works by reducing or abrogating gene expression, although there are exceptions. In most cases in which RNA expression in insertional mutants has been analyzed, levels were reduced anywhere from fivefold to undetectable levels (Golling et al., 2002). Thus, in some cases proviral insertions cause hypomorphic rather than null alleles. However, some mutations cause exon skipping instead of down-regulation of expression. For example, there are three insertional alleles of the vHNF-1 gene, two in the first or second exons, which appear to be null alleles, and one in the fifth intron, which leads to two different splice variants, skipping either the fourth or third and fourth exons (Sun and Hopkins, 2001). This allele is predicted to make a truncated protein, and in fact has a less-severe phenotype. Truncated proteins might also be produced by another mechanism by the virus used in this screen; the virus contains a splice-in, splice-out, frameshift-producing gene trap cassette (Chen et al., 2002; see below). When the virus inserts in an intron in the correct orientation, it is possible for the previous splice donor to splice to this exon in the provirus, and splice out to the next endogenous exon, thus creating a frameshift mutation and presumably a truncated protein. However, this mechanism of mutation appears to be far less common than transcriptional down-regulation (Golling et al., 2002; A. Amsterdam, unpublished data).
The large-scale insertional mutagenesis screen recovered mutants in approximately 25% of all of the genes that can mutate to a recessive visible embryonic phenotype, indicating that only approximately 1,400 genes (approximately 5% of all zebrafish protein-coding genes) are uniquely required during development (Amsterdam et al., 2004a). This estimate is primarily based on the fact that the identified genes include roughly 25% of certain classes of “housekeeping” genes, which one would expect to be essential (e.g., ribosomal proteins, tRNA synthetases), as well as 25% of the genes that have been identified by positional cloning of ENU-induced mutations. An extrapolation of the allele distribution from the insertional screen (number of genes hit once, twice, and so on) is also consistent with this screen having reached 25% saturation.
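The logic of the saturation estimate can be illustrated with a simple model. The sketch below is an assumption-laden illustration rather than the published analysis: it treats every gene as equally mutable and the number of alleles per gene as Poisson-distributed, which real screens only approximate. The function names and the gene count passed in the usage lines (350, chosen because it is 25% of 1,400) are hypothetical.

```python
import math

# Illustrative model (an assumption, not the published method): hits per
# gene follow a Poisson distribution with the same mean for every gene.

def poisson_mean_from_saturation(saturation: float) -> float:
    """Mean hits per gene such that P(at least one hit) equals saturation."""
    if not 0.0 < saturation < 1.0:
        raise ValueError("saturation must lie strictly between 0 and 1")
    return -math.log(1.0 - saturation)


def fraction_hit_exactly(k: int, mean_hits: float) -> float:
    """Poisson probability that a gene is hit exactly k times."""
    return math.exp(-mean_hits) * mean_hits ** k / math.factorial(k)


def estimate_total_genes(genes_hit: int, saturation: float) -> float:
    """Total number of mutable genes implied by the genes hit so far."""
    return genes_hit / saturation


if __name__ == "__main__":
    lam = poisson_mean_from_saturation(0.25)
    once = fraction_hit_exactly(1, lam)
    twice = fraction_hit_exactly(2, lam)
    print(f"mean hits per gene: {lam:.3f}")
    print(f"expected ratio of genes hit twice to genes hit once: {twice / once:.3f}")
    print(f"implied total: {estimate_total_genes(350, 0.25):.0f} genes")
```

Under this model, the ratio of genes hit twice to genes hit once is half the mean number of hits per gene, so an observed allele distribution gives an independent handle on how far a screen is from saturation.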
Because nearly all of the genes mutated in this screen were identified, rather than only pursuing mutants with especially interesting phenotypes, this strategy provided an unbiased survey of the genes required for zebrafish embryonic development. These genes are far more likely to have homologs in invertebrate or unicellular organisms than genes chosen at random. Furthermore, the C. elegans and Saccharomyces cerevisiae homologs of these genes are far more likely themselves to be essential genes than worm or yeast genes at large. This finding suggests that a conserved core group of essential genes has been maintained throughout evolution, a conclusion that could not have been reached without the unbiased view of the genetic requirements for vertebrate development made possible by insertional mutagenesis.
The insertional mutagenesis screen described above was rather labor intensive, requiring both the identification of high insert density fish and three generations of breeding. It is possible that, at least for some phenotypes, insertional mutagenesis could be conducted much more efficiently using gynogenetic methods, such as the screening of early pressure diploid (Beattie et al., 1999) or haploid (Walker, 1999) embryos. One possibility would be to take advantage of the germline mosaicism of founder fish, directly screening their progeny. This method would allow the screening of dozens of insertions per clutch, without the selection of high insert number fish or additional breeding generations. Because of the germline mosaicism, only a small percentage of the embryos will inherit any given insertion, so the screen would have to be sensitive enough to detect only one or a few mutants per clutch, without too high a false-positive rate. In a pilot screen in which the haploid progeny of approximately 300 mosaic F0 fish were screened for both brain pattern formation by in situ hybridization at tail bud stage and brain morphology at 32 hours postfertilization (hpf), six insertional mutants were recovered (Wiellette et al., 2004). Alternatively, if a higher proportion of mutant embryos were required to be sure of the phenotype, one could screen the progeny of multi-insert F1 fish, selected as in the three-generation screen. Although this method would add the work and resources to raise all F1 families and select the high insertion number F1 fish, one could screen around 10 inserts per clutch and it would still be one less generation to breed and many fewer clutches to screen than the diploid F3 screen described above. Either of these methods has the potential to approach saturation for a given phenotype.
Retrovirus-mediated insertional mutagenesis can also be used for reverse genetic approaches. Whereas the phenotype-directed screen was able to recover mutants in a large proportion of the genes required for embryonic development, and a further scaling up of that screen could possibly recover mutants and clone the genes for nearly all of them, most genes will not have an embryonic phenotype when mutated. Some may have adult phenotypes or have phenotypes only under certain environmental or experimental conditions. Such mutations might be more likely than those with embryonic phenotypes to provide insight into (and animal models for) diseases such as obesity, diabetes, neurodegeneration, or heart disease. Additionally, it may be desirable to generate double mutants of genes known to be partially functionally redundant. Golling and colleagues at Znomics, Inc., are using a retroviral vector to create a library of over a million insertions that should represent at least one insertion in every gene. The sperm from 40,000 founder males will be cryogenically preserved, and DNA samples from sperm from these founders will be used to clone and sequence the DNA adjacent to every insert transmitted by each founder. These sequences will be used to determine which inserts appear to be in, and thus possibly disrupt, genes. To date, approximately 900 junction sequences have been determined, and nearly 400 of them appear to be within or very near a gene. Thus, if every sperm sample contains ∼30 insertions, each might contain disruptions in up to 13 different genes (G. Golling and T. Thacker, personal communication). Ultimately, for any given gene, it should be possible to identify one or more sperm samples that contain an insert in that gene. 
Because the germ lines of the founder fish are highly mosaic, only 3–20% of the progeny produced by in vitro fertilization with those sperm samples will contain the desired insert, but these can be easily identified by PCR on DNA from fin clips, using one primer in the virus and one in the specific junction sequence. Once this library is constructed, it should be much easier to recover a mutation in any given gene of interest than by using the existing reverse genetic approach, by which ENU-induced mutations in a given gene are recovered from a library of mutagenized fish by sequencing the exons from all of the samples in the library (Wienholds et al., 2002; Stemple, 2004).
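The numbers behind the sperm library can be reproduced in a few lines. This is a minimal sketch with hypothetical function names. The 95% confidence threshold used to decide how many progeny to genotype is an assumption added for illustration and does not appear in the text. It also assumes that each embryo inherits the insert independently with the stated probability.

```python
import math

def genes_hit_per_sample(inserts_per_sample: float,
                         junctions_sequenced: int,
                         junctions_in_genes: int) -> float:
    """Expected number of gene-associated inserts in one sperm sample."""
    return inserts_per_sample * junctions_in_genes / junctions_sequenced


def progeny_to_genotype(carrier_fraction: float,
                        confidence: float = 0.95) -> int:
    """Smallest n such that at least one of n progeny carries the insert
    with probability >= confidence (independent inheritance assumed)."""
    if not 0.0 < carrier_fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fractions must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) /
                     math.log(1.0 - carrier_fraction))


if __name__ == "__main__":
    # Figures from the text: 40,000 founders, ~30 inserts per sample,
    # ~400 of ~900 junction sequences in or near a gene.
    print("library size:", 40_000 * 30, "insertions")
    print("genes hit per sample: "
          f"{genes_hit_per_sample(30, 900, 400):.1f}")
    # Germ line mosaicism: 3-20% of progeny carry a given insert.
    print("fin clips needed at 3% transmission:", progeny_to_genotype(0.03))
    print("fin clips needed at 20% transmission:", progeny_to_genotype(0.20))
```

At the low end of the transmission range, roughly a hundred fish would need to be genotyped to be reasonably sure of finding a carrier, whereas at the high end about a dozen would suffice.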
Of course, the presence of an insert in or near a gene does not guarantee that the gene will necessarily be affected, but with several different insertions to choose from within the library, it is likely that at least one will do so. An analysis of the locations of randomly cloned inserts that have been found within genes shows a similar distribution to that of the mutagenic inserts (Table 1, compare numbers in parentheses for randomly cloned inserts to the numbers for mutagenic inserts). Both are heavily biased toward either the first exon or first intron, suggesting that these insertions are both common and are likely to impair gene expression. Insertions in downstream introns may be less likely to be mutagenic, as the proportion of downstream insertions in introns rather than exons is much higher among the random insertions than the mutagenic set. In any case, the effect of any given insertion on a gene can be determined by reverse transcriptase-PCR on fish homozygous for the mutation.
High-efficiency transgenic techniques are also useful for the generation of a large number of enhancer detection lines, where in each line the expression of a reporter gene (preferably one that can be viewed live as a single copy insertion, such as GFP) is under the transcriptional control of tissue-specific enhancer(s), completely or partially recapitulating the expression pattern of a given gene. A large collection of such lines is valuable for several reasons. Each line provides a different population of marked cells or tissues that can be useful in observing the normal development of specific organs and structures. Any given line can also be used to analyze the effects of other genes on the marked tissue, either by crossing mutants into the enhancer detection line, overexpressing genes in it, or knocking down genes in it using antisense morpholinos (Nasevicius and Ekker, 2000). The development of the marked tissues can then be compared between experimental situation and wild-type, thus facilitating the analysis of the influence of a gene on a particular cell type or tissue. Also, such lines will be very useful starting points for mutagenesis screens looking for defects in specific tissues or cell types that are not easily observed morphologically, such as small groups of neurons in the central nervous system. Another use is as a visible locus marker: an insertion in or near a gene can be crossed to a mutation in that same gene, so that the reporter-expressing locus is in trans to the mutation. As long as the insertion is near enough to the gene in question to prevent recombination between the transgene and the mutant copy of the gene on the other chromosome, the transgene can act as a “green balancer” as used in fly genetics (Casso et al., 2000); all progeny lacking reporter activity should be homozygous for the mutation. 
Finally, in so far as the expression of the reporter in a given line recapitulates the expression of an identifiable gene, a large collection of enhancer detection lines can substantially contribute to our knowledge of the dynamic expression pattern of a large number of genes.
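The requirement that a "green balancer" insertion lie close to the gene can be made quantitative. The sketch below is an illustration under stated assumptions, not a description of any published protocol: it assumes an incross of two fish that each carry the transgene in trans to the mutation, a single recombination fraction r between transgene and gene that is the same in both parents, and a fully penetrant reporter. The function names are hypothetical.

```python
from itertools import product

def gamete_frequencies(r: float) -> dict:
    """Gamete classes from a parent carrying the transgene (T) on the
    wild-type (+) chromosome and the mutation (m) on the transgene-free (-)
    chromosome, with recombination fraction r between the two loci."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return {
        ("T", "+"): (1 - r) / 2,  # parental
        ("-", "m"): (1 - r) / 2,  # parental
        ("T", "m"): r / 2,        # recombinant
        ("-", "+"): r / 2,        # recombinant
    }


def fraction_homozygous_among_dark(r: float) -> float:
    """Fraction of reporter-negative progeny that are homozygous mutant."""
    gametes = gamete_frequencies(r)
    dark = 0.0
    dark_and_mutant = 0.0
    for ((t1, a1), p1), ((t2, a2), p2) in product(gametes.items(), repeat=2):
        if t1 == "-" and t2 == "-":      # no transgene from either parent
            dark += p1 * p2
            if a1 == "m" and a2 == "m":  # mutant allele from both parents
                dark_and_mutant += p1 * p2
    return dark_and_mutant / dark


if __name__ == "__main__":
    for r in (0.0, 0.01, 0.05):
        print(f"r = {r}: {fraction_homozygous_among_dark(r):.4f}")
```

Under these assumptions the result reduces to (1 - r)^2: with no recombination every reporter-negative embryo is homozygous mutant, and the sorting degrades gradually as the insertion moves farther from the gene.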
Both transposons and retroviruses have been used to generate several enhancer detection lines using GFP (Balciunas et al., 2004; Parinov et al., 2004), YFP (Ellingsen et al., 2005), or dsRed (Becker et al., unpublished results) as the reporter gene. The lacZ gene has not been used, as it has often proved to be an unsuitable reporter in zebrafish, likely due to frequent inactivation after germ line passage, especially in retroviral vectors (Gaiano et al., 1996b; and N. Gaiano, M. Allende, T.S. Becker, and N. Hopkins, unpublished results). In broad outline, the screening procedure is the same in all cases: founders are produced that have mosaic germ lines, and their progeny are screened directly for fluorescence (Fig. 2A). Because the germ lines of the founders are mosaic, and individual fish may transmit multiple insertions, 1–20% of the F1 embryos should inherit any given insertion and express fluorescence in a similar pattern, and embryos with more than one pattern might arise from the same founder. Insertion lines are established by outcrossing pattern-expressing F1 fish, and DNA from F1 fish with the same pattern is analyzed to identify which insertion is common to all the fish with that expression pattern. Different vector systems have different efficiencies in enhancer detection frequency, efficiency of producing pattern lines, and ease of identification of the trapped gene, depending on whether insertions are found as single copies (Table 2).
Table 2. Efficiencies of Transgenesis, Trap Activation, and Generation of Pattern Lines by Various Enhancer Detection and Gene Trap Vectors
Columns: Transgenesis rate (average no. of inserts per transmitting founder); Pattern lines per insert; Founders screened per pattern line
The Sleeping Beauty transposon system has been used as a vector for delivering an enhancer detector in both medaka fish (Grabher et al., 2003) and in zebrafish (Balciunas et al., 2004). In both cases, a promoter was used that was capable of driving ubiquitous expression, cska in the medaka and ef1α in the zebrafish; the results from each experiment were fairly similar. In the zebrafish experiments, one third of the founders produced progeny that strongly expressed GFP ubiquitously, a proportion comparable with previous experiments with the Sleeping Beauty transposon (Davidson et al., 2003). Thus, to a rough approximation, most insertion events with this vector express GFP ubiquitously. However, approximately 10% of the founders that produced GFP-expressing progeny yielded embryos that expressed GFP in a tissue-restricted pattern. Each of the nine observed patterns was distinct, although most of them were restricted to ectoderm- and neural crest-derived tissues such as the central nervous system, cranial sensory ganglia, and pharyngeal arches. It is notable that, unlike some enhancer detection vectors that do not express the reporter at all (or express it at very low levels) in the absence of integration at a site under the activating control of an enhancer, most of the transgenic lines in this system express the reporter ubiquitously; observable detection events are those in which regulatory elements near the insertion site restrict an otherwise ubiquitous expression pattern and, thus, would seem to detect silencing, rather than enhancing, cis-regulatory elements. Assuming an average transgenesis rate of two to three insertions per transmitting founder (Davidson et al., 2003), this rate means that at most 3–5% of the insertions fell under control of such a restrictive element.
In the medaka experiments, because GFP expression from the cska promoter was fairly weak, it was possible to detect enhancer activity in the absence of silencing the promoter in other cells, and 12% of integration events produced a specific pattern (Grabher et al., 2003). The overall yield of detected patterns was 1 per 40 founders screened for the zebrafish, 1 in 27 for medaka.
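The per-insertion estimate for the zebrafish Sleeping Beauty vector can be reproduced with simple arithmetic. The sketch below assumes, for illustration only, that each pattern-yielding founder carries a single restricted insertion among its two to three transmitted insertions; the function name is hypothetical.

```python
def per_insert_rate(founder_fraction, inserts_per_founder):
    """Rough fraction of insertions yielding a restricted pattern.

    Assumption (not from the source): one pattern-producing insertion
    per founder that yields a tissue-restricted pattern.
    """
    return founder_fraction / inserts_per_founder

# About 10% of GFP-transmitting founders gave a tissue-restricted pattern.
pattern_founder_fraction = 0.10

# Two to three insertions per transmitting founder (Davidson et al., 2003).
for inserts in (2, 3):
    rate = per_insert_rate(pattern_founder_fraction, inserts)
    print(f"{inserts} inserts per founder: {rate:.1%} of insertions")
```

With two insertions per founder the estimate is 5%, and with three it is roughly 3%, matching the 3–5% range quoted in the text.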
Another enhancer detection vector was delivered with the Tol2 transposon system and used a partial promoter from the epithelial-expressed keratin8 gene (Parinov et al., 2004). All of the transgenic progeny of injected embryos expressed GFP in the skin epithelia. Among the progeny of approximately two thirds of the transgene-transmitting founders, some fish showed a further expression pattern in addition to skin expression. Again, all of the expression patterns found were unique, and the 28 expression patterns observed covered many different tissues derived from all embryonic germ layers, although the most common expressing tissue was the nervous system. Estimating that GFP-positive F1 pools contained three insertions on average, approximately 25% of insertion events resulted in expression patterns. This frequency is approximately five times higher than that observed with the ef1α-promoter-containing vector, and the difference may arise because this vector allows enhancers to boost expression in a tissue-specific manner, whereas the other requires that an otherwise ubiquitous expression pattern be restricted. Overall, one GFP expression pattern was identified for every eight founders screened.
The most efficient production of enhancer detection lines has been achieved by the use of a retroviral vector. As the viral long terminal repeats, which drive viral transcription in mammals, lack promoter function in zebrafish, an enhancer detection vector was developed by placing a proximal GATA2 promoter in front of YFP within the virus. This previously characterized promoter was shown to have very low, if any, activity on its own (Meng et al., 1997), and, therefore, required activation to be detected. The vector was initially tested in zebrafish tissue culture cells where approximately 14% of integrations resulted in fluorescent cells (Ellingsen et al., 2005). The viral vector was then injected into zebrafish embryos and was found to be activated at a similar rate after passing through the fish germline; 95 independent transgenic lines of fish with tissue specific fluorescent protein expression were established from approximately 700 transmitted proviral insertions. An ongoing large-scale screen has resulted in the isolation of close to 1,000 enhancer detection lines with early tissue-specific reporter expression to date (Becker et al., unpublished results).
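As a quick check of the statement that germ line activation occurred at a rate similar to that seen in cultured cells, the reported counts can be compared directly. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative.

```python
# Counts reported for the retroviral GATA2-YFP vector (Ellingsen et al., 2005).
pattern_lines = 95            # lines with tissue-specific expression
transmitted_insertions = 700  # approximate number of proviral insertions

germline_rate = pattern_lines / transmitted_insertions
cell_culture_rate = 0.14      # approximate fraction of fluorescent integrations

print(f"germ line activation rate: {germline_rate:.1%}")
print(f"cell culture activation rate: {cell_culture_rate:.0%}")
```

The germ line figure works out to a little under 14%, in line with the tissue culture estimate.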
Reporter gene expression in all 95 transgenic lines was activated within the first day of embryogenesis, and in the majority of cases, reporter gene expression was driven according to the spatiotemporal specificity of genes found nearby (Ellingsen et al., 2005; Fig. 2B). The genomic locations of the corresponding retroviral insertions were analyzed and a transcriptional unit (either a known gene or a Genscan or Ensembl predicted gene also represented by at least one expressed sequence tag) was found within 15 kb in around 80% of the 65 cases that could be mapped to the (as yet unfinished) zebrafish genome assembly. For the other fifth of the insertions, the nearest gene was up to 220 kb away. Interestingly, a large proportion of the enhancer detection integrations map to previously known developmental genes, including members of the pax, hox, sox, pou, otx, emx, zinc finger, and basic helix–loop–helix transcription factor gene families as well as nontranscription factor genes involved in embryonic development (Ellingsen et al., 2005; Fig. 2B).
The endogenous RNA expression patterns of the genes adjacent to the insert closely match the YFP reporter gene expression in all but 1 of 23 cases examined; this matching was true even for inserts 30–220 kb from the neighboring gene. Although certain aspects of the endogenous pattern are sometimes found to be missing from the enhancer detection lines (Ellingsen et al., 2005), the matches are generally very close, and one can, therefore, conclude that enhancer detection with this vector generates patterns according to cis-regulatory elements controlling genes near the insertion site and, therefore, that enhancer detection in zebrafish produces results equivalent to those found in the fruit fly (reviewed by Bellen, 1999). However, sometimes an insertion was found to be between two genes and actually recapitulated the expression pattern of the one further away. Additionally, due to incomplete annotation of the zebrafish genome, not all genes have been identified, so the gene truly nearest to a given insertion might not be noticed. These facts emphasize the need to confirm expression patterns for genes in the vicinity of each insertion site. An increasing number of expression patterns are also available online at www.zfin.org (Thisse et al., 2004), and this database is expected to expand to all zebrafish genes expressed in the embryo.
There are some important differences between zebrafish and the fly enhancer detection approaches: First, the activation frequency in the fly is 65%, whereas in fish, it is 15% to 25%. Second, in the fly, only approximately 10% of the detection lines show specific patterns as opposed to ubiquitous expression (e.g., Bellen, 1999), whereas in the fish, this patterning is found for almost all of them. The third major difference from the fly is that enhancer detection integrations in the fish are sometimes much farther from the gene whose pattern the reporter recapitulates. These differences may reflect that, in the fly, P-element insertions are strongly biased toward the 5′ end of genes and that these vectors are efficiently activated by the regulatory elements of all kinds of genes. In vertebrates, intergenic distances are larger, and it may be that genes with highly regulated expression patterns have stronger enhancers that act over longer distances (e.g., Sagai et al., 2005). Thus, while fewer insertions may be activated in the fish than in the fly, they may be more likely to be expressed in a tissue-specific pattern and might often be farther from the gene whose enhancer is being used.
It is probably too early for a direct comparison of enhancer detection between viral insertions and the transposon systems because they presumably have different properties relating to where in the genome they integrate, both globally as well as locally within any given gene. Furthermore, all examples mentioned here use different promoters, and this difference may have an influence on what types of genes are recovered from a screen, as has been observed in Drosophila (e.g., Bellen, 1999). For example, it appears that the Tol2 enhancer detection vector is activated at approximately twice the frequency of the retroviral vector, but it is not clear whether this disparity results from the vector or whether it is influenced by the basal promoter used. Lastly, the numbers of insertions recovered and characterized to date are still too small for a definitive comparison. For instance, a significant fraction of enhancers detected with a viral vector and a GATA2 promoter appear to be regulating “developmental” genes, i.e., transcriptional regulators, receptors, or other previously described genes involved in embryonic development (Ellingsen et al., 2005), whereas in the transposon-mediated enhancer detection approaches, fewer of the insertions have been mapped to the genome, and those that have do not show a bias toward “developmental” genes. Nevertheless, all enhancer detection approaches noted a preponderance of tissue-specific expression patterns, perhaps reflecting that developmental genes have powerful enhancers, which in effect make these genes and their environment “large targets” for such approaches.
The mutagenic rate of these integrations has not been determined. Selection for enhancer detection does not particularly bias toward mutagenic insertions; over half of them were found “outside” of the gene, and a far lower proportion of enhancer detection insertions were found in the first exon or first intron than was seen in the insertional mutagenesis distribution (Table 1). Nonetheless, 5% of the enhancer detection insertions were in exons or splice sites and, hence, should disrupt the messenger RNA of the gene, and a fraction of the insertions in the core promoter or in introns will also likely affect gene expression. However, it is currently unknown how many genes that are developmentally regulated such that they would produce enhancer detection patterns will result in a detectable phenotype when mutated, so the actual frequency of mutagenesis by enhancer detection is still unknown. At least one of the retroviral enhancer detection insertions, into the core promoter of ptc1, is mutagenic, resulting in early larval lethality (Becker, unpublished observations). In insertional screens in Drosophila using enhancer detectors, insertion lines were selected through either observation of expression or phenotypic analysis. In several screens (e.g., Kania et al., 1995; Salzberg et al., 1997; Deak et al., 1997), it was found that, while one screening method alone could produce more than 200 potentially interesting strains for a given tissue (e.g., expression in the nervous system vs. mutations affecting patterning in the nervous system), fewer than 10 of these were identified by both methods (reviewed by Bellen, 1999). It may well be that, in zebrafish as well, insertional mutagenesis and enhancer detection will prove to be complementary technologies.
Unlike enhancer detection vectors, gene traps require that the insertion land within the transcription unit of the gene. Most commonly, gene traps contain a splice acceptor followed by a reporter gene so that, if they land in the intron of a gene, they will be spliced into the message. This also means that they must insert in the correct orientation. Additionally, while the reporter gene has its own ATG, if it lands downstream of the initiation codon of the gene into which it has inserted, the reporter gene's ATG may not be efficiently used, and if it is out of frame with the upstream initiation codon, the reporter protein may not be expressed. Thus, there are several reasons to expect that a gene trap vector should be less efficient (i.e., have a lower frequency of insertions leading to detectable reporter gene activity) than enhancer detection. However, there are several advantages, as explained below.
Retroviral gene trap vectors, which have worked very well in murine embryonic stem (ES) cells (Friedrich and Soriano, 1991; Wiles et al., 2001; Stanford et al., 2001), have not yet been productive in zebrafish. Many attempts to make pseudotyped gene trap vectors with lacZ or GFP as a reporter that efficiently infected the zebrafish germ line have failed due to an inability to produce these viruses at sufficient titer (N. Gaiano, A. Amsterdam, and N. Hopkins, unpublished data). The retrovirus used in the insertional mutagenesis screen described above contains a variation of a gene trap (Chen et al., 2002). Rather than a typical gene trap where the reporter gene is preceded by a splice acceptor and followed by a polyadenylation signal, this trap is a small exon preceded by a splice acceptor and followed by a splice donor. Thus, when this provirus integrates in the intron of a gene in the correct orientation, this exon should be incorporated into the message, but should not terminate it; the inserted exon would cause a frameshift, presumably resulting in either a truncated protein or a loss of message due to nonsense-mediated mRNA decay. The trap exon encodes the FLAG epitope in all three reading frames, but unfortunately, antibodies against this peptide do not work very well in whole-mount staining on zebrafish embryos and could not be used for detection of gene trap events (Chen et al., 2002). Trap events can be detected among pools of embryos by 5′-rapid amplification of cDNA ends (RACE) and were found at a rate of approximately one per injected founder, thus approximately 1 per 25 inserts (Chen et al., 2002), but the lack of a visible reporter gene precludes the discovery of the spatial expression pattern of the trapped genes. 
Nonetheless, these insertions are likely to be mutagenic, and it would be possible to identify the trapped genes in a large number of F1 pools, and in a manner analogous to the insertion library described above, cryopreserve sperm samples corresponding to each trapped gene.
A more successful gene trap approach has been the use of a Tol2 transposon vector (Kawakami et al., 2004). The vector includes a promoterless GFP gene preceded by a splice acceptor, and approximately 1 in every 12 inserts results in the expression of GFP in a specific pattern. As approximately half of the injected embryos become founders that transmit approximately six different inserts on average, useful trap lines are found at a rate of one per four tested founders. As with the enhancer detection lines, all of the 36 reported expression patterns were different, and a wide variety of tissues were represented. Many of the trap insertions were verified to be true gene trap events, as 5′-RACE was used to demonstrate the presence of a fusion transcript containing the 5′ end of an endogenous gene spliced accurately to the splice acceptor of the GFP transgene. In other cases, 5′-RACE failed to yield such a fusion transcript, likely due to the inefficiency of the 5′-RACE protocol. For these cases, it is not known at this time if the reporter activity represents true gene trapping or possibly enhancer detection by means of the use of a cryptic promoter in the vector.
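The yield of one trap line per four tested founders follows from the three figures quoted above. The sketch below multiplies them with exact fractions; it assumes the three quantities are independent averages, which is a simplification, and the variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted for the Tol2 gene trap (Kawakami et al., 2004).
founder_fraction = Fraction(1, 2)   # injected fish that transmit inserts
inserts_per_founder = 6             # average inserts per transmitting founder
activation_rate = Fraction(1, 12)   # inserts giving a specific GFP pattern

# Expected pattern lines per tested fish, treating the averages as independent.
lines_per_tested_fish = founder_fraction * inserts_per_founder * activation_rate
fish_per_line = 1 / lines_per_tested_fish

print(lines_per_tested_fish)  # expected pattern lines per tested fish
print(fish_per_line)          # tested fish needed per pattern line
```

The product is one quarter of a pattern line per tested fish, that is, about four fish tested per useful trap line.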
One in twelve Tol2 gene trap insertions is activated. This frequency is approximately twice that observed with the retroviral gene trap described above, although that frequency may have been underestimated because traps were detected only by 5′-RACE, which may sometimes fail. As expected, the frequency is approximately three times lower than that of the Tol2 enhancer detection vector. However, because the transgenesis rate with the gene trap appears to be higher, the efficiency of identifying trap lines is actually better with the gene trap (see Table 2). The retroviral enhancer detection vector is still a bit more efficient than the Tol2 gene trap, finding a trap pattern in one of three founders rather than one of four.
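For orientation, the screening yields mentioned in the text for the different vectors can be gathered and ranked. This sketch uses only the founders-per-pattern-line figures quoted in the preceding paragraphs; it is not a reconstruction of Table 2, and the dictionary labels are shorthand introduced here.

```python
# Founders screened per pattern line, as quoted in the text.
founders_per_pattern_line = {
    "Sleeping Beauty, ef1a promoter (zebrafish)": 40,
    "Sleeping Beauty, cska promoter (medaka)": 27,
    "Tol2 enhancer detection, keratin8 promoter": 8,
    "Tol2 gene trap": 4,
    "Retroviral enhancer detection, GATA2 promoter": 3,
}

# Rank from most to least efficient (fewest founders needed per line).
ranked = sorted(founders_per_pattern_line.items(), key=lambda item: item[1])

for vector, founders in ranked:
    print(f"{vector}: 1 pattern line per {founders} founders")
```

Because the vectors differ in promoter, delivery method, and scoring criteria, the ranking is only a rough guide and not a controlled comparison.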
While the efficiency of generating trap lines with new patterns is somewhat lower with the Tol2 gene trap than retroviral enhancer detection, the gene trap may have some other advantages. For one, it may be easier to reliably determine which gene has been trapped. As with the case of enhancer detection, identification of the trapped gene first requires the determination of which of several inserts is actually the trap. This determination can be achieved either by outcrossing the transgenics until only the activated transgene is left or by correlating a single insertion with reporter expression. However, not only can cloning of the junction fragment aid in the identification of the gene into which the transposon has inserted, 5′-RACE can be used as well. For example, Kawakami et al. (2004) cloned junction fragments for 16 of the trap lines, and using the current assembly and annotation of the zebrafish genome, one could only confidently identify the trapped gene in a couple of them. However, 5′-RACE was able to identify the genes trapped in half of the lines. For most of these, the exon preceding the insertion site could in fact be found in the genome assembly near the insertion site, but they were not annotated as part of genes; often they were 5′ UTR exons. Thus, not only does the option of using 5′-RACE aid in the identification of the trapped genes, such data are useful in the annotation of the genome.
Additionally, the gene trap may more faithfully mimic the expression pattern of the trapped gene. Reporter activity in some of the enhancer detection lines (produced by any of the three vectors) is observed in only a subset of the expression pattern of the corresponding endogenous genes. This finding may indicate that in these cases reporter gene expression only reflects the influence of a subset of the cis-acting elements that control expression of the endogenous gene or misses specific interactions between an enhancer and a specific promoter. Whereas a comparison of fluorescence with endogenous gene expression has not been reported for any of the Tol2 gene trap lines, because the reporter is part of a fusion transcript with the endogenous gene, it seems likely that reporter activity will closely recapitulate its expression, as is seen in mouse (Wiles et al., 2001; Stanford et al., 2001).
Finally, the gene trap vectors are more likely than the enhancer detection vectors to be mutagenic. Although some of the enhancer detection insertions might affect expression of the endogenous gene, most probably will not. However, the gene trap should mutate the gene by incorporating into the mRNA and truncating it prematurely. This will only be true to the degree that the gene does not splice around the gene trap, as can be observed sometimes in murine ES cell gene traps (e.g., Gasca et al., 1995). Thirty-six Tol2 gene trap insertions were bred to homozygosity, and none displayed an embryonic phenotype (Kawakami et al., 2004). As perhaps as few as 5% of zebrafish genes will give an embryonic phenotype when mutated (Amsterdam et al., 2004a), it is likely that too few lines have been analyzed to know how efficient these gene traps will be as mutagens. In one Tol2 gene trap line, the amount of full-length mRNA for the trapped gene is approximately fourfold less than wild-type in embryos homozygous for the insertion (Kawakami et al., 2004). Whether most Tol2 gene trap lines will disrupt the expression of the endogenous gene more or less than this remains to be seen. If these gene traps are not generally found to significantly decrease the expression of the genes into which they have integrated, it is possible that new vectors with splicing enhancers (Tanaka et al., 1994; Chen et al., 2002) might be more effective by reducing the frequency of splicing around the gene trap exon.
The technologies reviewed above have created a wonderful resource of both mutants and useful transgenic lines and will surely continue to do so. Furthermore, both the mutants and the trap lines have yet to be fully exploited. Thus, while new screens will surely identify more insertional mutants, and the production of additional enhancer detection and gene trap lines will continue, there is much to be done with the lines already in existence.
The current insertional mutant collection represents approximately one quarter of all of the genes essential for vertebrate embryonic development, and should thus include mutations in approximately a quarter of all genes that can affect any given developmental process, provided that the overall phenotype is morphologically visible. However, while many mutants affecting various developmental processes are surely present in the collection, they may not have been recognized yet as such. The patterning and development of many different tissues is best illuminated either by sectioning or by staining with various antibodies, in situ hybridization markers, or other reagents. Thus, whereas the characterization of the phenotypes is somewhat rudimentary at present, numerous “shelf screens” are being conducted on the collection, identifying the subset of mutants affecting the development of a host of structures. Thus, a substantial portion of the genes required for the proper formation of all of these structures will be identified. For example, approximately a dozen of the mutants lead to the development of cystic kidney, and because the identity of all of the genes was known, it was clear that many, if not most, of them are involved in the production or maintenance of primary cilia (Sun et al., 2004). This demonstrated that perturbation of these cilia is the central pathway in embryonic kidney cystogenesis. In another screen, histological examination of the retina identified approximately forty genes required for proper eye development (Gross et al., 2005). Thus, the authors were able to identify the types of genes whose mutation led to different eye phenotypes. Another screen of the collection identified seven mutants with large livers; although some of these may be defective in cell or organ growth control, three of them appear to be models for different human liver diseases (Sadler et al., 2005).
In addition, the existing collection can also be used to monitor the long-term effects of heterozygosity of these genes in adults, as mutations that have embryonic phenotypes in the homozygous state might predispose to adult disease in the heterozygous state and, thus, identify new genes in disease pathways and/or provide animal models for human diseases. For example, many genes whose mutation in the heterozygous state is known to predispose mammals to cancer (tumor suppressor genes) cause prenatal death in mice in the homozygous state (Jacks, 1996). Thus, among a large collection of recessive embryonic lethal mutations, one might expect to find mutations in which heterozygotes are more prone to develop cancer. An initial survey of the heterozygous mutants revealed the surprising fact that mutation in any of a dozen ribosomal protein genes leads to a predisposition to the formation of an otherwise rare tumor type (Amsterdam et al., 2004b). As another example, whereas homozygous mutation of the vHNF1 gene in fish specifically affects both kidney and pancreas development (Sun and Hopkins, 2001), heterozygous mutation of the vHNF1 gene can predispose people to either kidney disease or diabetes (Horikawa et al., 1997; Nishigori et al., 1998). Thus, it might not be surprising if heterozygotes of insertional mutants were predisposed to a variety of diseases that might be screenable in adult fish.
Besides the further exploration of the existing insertional mutants, new insertional mutagenesis screens, perhaps using gynogenetic methods for increased efficiency and targeting specific phenotypes, might reach saturation. Furthermore, it may be possible to screen for postembryonic phenotypes as well. For example, early pressure diploids could be raised past embryogenesis and screened for physiological phenotypes as juveniles or adults. Similarly, insertional mutants in selected genes, as identified from the insertion library method, could be screened for postembryonic phenotypes, or used as the starting population for “sensitized” screens, which would aim to identify mutations that interact synthetically with the initial mutation.
The various gene trap and enhancer detection systems, especially the large-scale enhancer detection screen in Bergen, have provided material for a multiplicity of laboratories worldwide. Transgenic fish carrying insertions in or near known genes, so-called “marker lines,” can be used for fine mapping of expression in live or fixed embryos and even in adult fish, for instance in the brain, and for detailed fate mapping, where reporter expression during gastrulation provides reference domains used to guide the application of lineage tracers. Transgenes expressed in specific tissues can be used for time-lapse imaging to elucidate cell migration patterns and tissue differentiation at the single-cell level. Insertions in or near receptor genes can be used as live readouts for their ligands; for example, an insertion in the ptc1 gene can reflect hedgehog signalling, and as more of the insertions are mapped, there will surely be enhancer detection lines representing many different signaling pathways. Additionally, these transgenic lines provide easily visualized markers to analyze phenotypes of mutations by introducing the marked chromosome into a mutant background, as well as assessing the effect of morpholino injections or the application of small molecules to specific mutations.
Additionally, insertions in or near novel genes, or near known genes whose role or even expression has not been explored, are very useful in providing candidates for gene discovery. The strategy here is to collect many transgenic lines with expression in the tissue of interest, map the genomic locations of the activated insertions, confirm expression patterns by in situ hybridization, and finally explore the function of the identified gene by morpholino injection directly into the transgenic line, thus allowing real-time analysis of gene function at high resolution. Another possible utility of a large number of reporter-expressing lines would be to express either Gal4 or the cre or FLP recombinases in a variety of patterns, allowing for the expression of bioactive genes in specific tissues by crossing such lines to transgenics where the gene of interest is either controlled by Gal4-UAS (Scheer and Campos-Ortega, 1999) or floxed (Langenau et al., 2005). These methods will add a multitude of new possibilities to investigate genes and the genome of this fast-advancing, easy-to-grow laboratory vertebrate.
Finally, the use of gene traps could be combined with insertional mutagenesis in several ways. First, double mutants could be made between trap lines with similar expression patterns, where either trap alone failed to produce a visible phenotype, as a way to identify genes that are genetically redundant. Alternatively, it might be possible to select for mutagenic traps in genes with particular characteristics, such as transmembrane proteins or genes that are regulated by certain growth or differentiation factors, as has been done in murine ES cells (Scherer et al., 1996; Mitchell et al., 2001). Selection of traps in such genes could be carried out in tissue culture cells and transgenic fish generated by nuclear transfer (Lee et al., 2002).
The authors thank Greg Golling for sharing data in advance of publication, and Kavitha Becker for help with the figures. A.A. was supported by a grant from the NIH, and T.S.B. was supported by grants from the Sars Centre, the National Programme in Functional Genomics (FUGE) in Norway, by the European Commission as part of the ZF-Models Integrated Project in the 6th Framework Programme, and by the University of Bergen.