Keywords:

  • embryonic stem cell;
  • gene trap;
  • knockout mice;
  • mouse;
  • mutagenesis

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Materials and methods
  5. Results
  6. Discussion
  7. Acknowledgments
  8. References
  9. Supporting Information

Gene trapping in embryonic stem (ES) cells is a proven method for large-scale random insertional mutagenesis in the mouse genome. We have established an exchangeable gene trap system, in which a reporter gene can be exchanged for any other DNA of interest through Cre/mutant lox-mediated recombination. We isolated trap clones, analyzed trapped genes, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp]. The number of registered ES cell lines was 1162 on 31 August 2013. We also established 454 mouse lines from trap ES clones and deposited them in the mouse embryo bank at the Center for Animal Resources and Development, Kumamoto University, Japan. The EGTC database is the most extensive academic resource for gene-trap mouse lines. Because we used a promoter-trap strategy, all trapped genes were expressed in ES cells. To understand the general characteristics of the trapped genes in the EGTC library, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathway analysis and found that the EGTC ES clones covered a broad range of pathways. We also used Gene Ontology (GO) classification data provided by Mouse Genome Informatics (MGI) to compare the functional distribution of genes in each GO term between trapped genes in the EGTC mouse lines and total genes annotated in MGI. We found the functional distributions for the trapped genes in the EGTC mouse lines and for the RefSeq genes for the whole mouse genome were similar, indicating that the EGTC mouse lines had trapped a wide range of mouse genes.


Introduction

In mouse, mutagenesis is a relevant and widely used strategy to elucidate gene function in vivo. Gene trapping in ES cells is a powerful and efficient tool for inducing insertional mutations and identifying responsible (trapped) genes (Brickman et al. 2010). Gene trap vectors contain a splice acceptor (SA) and/or a donor sequence and a reporter/selection marker gene. When the trap vector is integrated into an endogenous gene, a fusion transcript between the endogenous gene and the reporter or selection marker gene is produced. In many cases, the insertion of a gene trap vector results in gene disruption. The trapped (disrupted) gene can be determined easily using rapid amplification of cDNA 5′-ends (5′RACE), inverse polymerase chain reaction (PCR), or plasmid rescue. The mouse genome database can be searched using BLAT (BLAST-like Alignment Tool) to quickly identify trapped genes and their chromosomal localizations (UCSC Genome Browser [http://genome.ucsc.edu/cgi-bin/hgGateway]).

Large-scale gene trap projects have been performed by several academic research groups, and international collaborations have been organized through the International Gene Trap Consortium (IGTC) [http://www.genetrap.org] (Skarnes et al. 2004; Nord et al. 2006). Sequence tags of the trapped genes deposited in the National Center for Biotechnology Information (NCBI) Genome Survey Sequence Database (dbGSS) by each research group were downloaded and subjected to the IGTC identification and annotation pipeline. IGTC contains over 120 000 ES trap lines covering about half the protein-coding genes in mouse. These trap lines are freely available to the scientific community. However, it is difficult to cover all genes via gene trapping, because of the existence of hot spots of gene-trap integrations. To mutagenize all the protein-coding genes, the International Knockout Mouse Consortium (IKMC) was organized, which allowed many knockout ES cell lines to be produced. The knockout ES cell lines are provided on request. Thus, gene-trapped and gene-targeted ES resources are now useful and important tools for researchers in the life sciences.

We developed an exchangeable gene trap method (Araki et al. 1995, 1997, 2002, 2006, 1999, 2010; Taniwaki et al. 2005) that was optimized for post-insertional modification using a Cre/mutant lox recombination system. Our trap vector carries an SA and the promoterless β-galactosidase/neomycin-resistance fusion gene (β-geo), with three stop codons located before the β-geo start codon. Therefore, this vector is highly selective for integrations into the introns adjacent to the exon containing the start codon (Taniwaki et al. 2005).

In this study, we used a new version of the trap vector, analyzed more than 2000 trap clones, and constructed the database for Exchangeable Gene Trap Clones (EGTC) [http://egtc.jp], which will contribute to the IGTC project. In addition, we established mouse lines from gene trap ES cells, and registered the lines with CARD R-BASE, the database for cryopreserved embryos at Kumamoto University [http://cardb.cc.kumamoto-u.ac.jp/transgenic/index.jsp]. Currently, 454 mouse lines are available as frozen embryos or live animals.

Both the EGTC trap clones and most of the IGTC trap clones were obtained using the promoter-trap vector with the β-geo gene as the selection/reporter gene. Therefore, only genes that are expressed in ES cells can be trapped with β-geo vectors, and this restriction might affect the diversity of gene function among the trapped genes. Several reports have examined the distribution of integration sites in trapped genes or expression levels in ES cells; however, until now, the functional diversity of trapped genes has not been investigated. Here, we examined the diversity of trapped genes in EGTC using the KEGG (Kyoto Encyclopedia of Genes and Genomes) PATHWAY collection of pathway maps (graphical diagrams) that represent the molecular interaction and reaction networks, and the Gene Ontology (GO) classification provided by Mouse Genome Informatics (MGI) [http://www.informatics.jax.org] (Bult et al. 2013). We show that the EGTC ES clones and mouse lines cover a wide range of mouse genes.

Materials and methods

ES cell lines

Five ES cell lines were used in EGTC. KTPU10 and KTPU8 are feeder-free ES cell lines derived from the TT2 ES cell line, which was established from the F1 embryo of C57BL/6 and CBA (Yagi et al. 2013). KMB6-6 is a feeder-free ES cell line established from C57BL/6J, and KAB6 is a feeder-dependent ES cell line established from B6(Cg)-Tyrc−2J/JCard. The Mol/MSM-1 ES cell line is derived from the MSM/Ms line (Araki et al. a).

Cell culture and electroporation

The ES cell lines were grown as described previously (Nakahara et al. 2013). For electroporation with gene trap vectors, SpeI-digested plasmid DNA was used. ES cells were suspended in 0.8 mL of phosphate-buffered saline (PBS), electroporated using a Bio-Rad Gene Pulser set at 800 V and 3 μF, and then fed with medium supplemented with 180–200 μg/mL G418 after 48 h. Selection was maintained for 7 days, and the colonies were picked, expanded, and stocked.

Plasmids

The trap vectors pU-21, pU-21B, pU-21T, and pU-21W (Fig. 1) were used in this study. The vectors were constructed from pU-17 (Taniwaki et al. 2005), which carries the intron-lox71 – SA – β-geo – loxP – polyadenylation (pA) – lox2272-plasmid vector, by adding several modifications.

Figure 1. Schematic representations of the gene trap vectors. The pU-21T vector has been described previously (Nakahara et al. 2013). The four trap vectors contain 1.8 kb of an intron and a splice acceptor (SA) sequence from the mouse En-2 gene, the β-geo gene, FRT site (F), and a polyadenylation signal (pA). lox71 (left-element mutant lox) has 5-bp mutations in the inverted repeat region, lox2272 (heterospecific lox) has 2-bp mutations in the spacer region, and lox511 (heterospecific lox) has a single mutation in the spacer region. A lox71 site is located within the intron sequence, and loxP, lox2272, and lox511 sites are downstream of the β-geo, pA, and pSP73 vector sequences, respectively. The vectors all have a loxP-pA cassette flanked by FRT sites. (a) The pU-21 and pU-21B vectors have three stop codons in-frame with the ATG of β-geo. (b) The pU-21T and pU-21W vectors have three stop codons in all three frames upstream of the ATG of β-geo. The β-geo-less CpG in pU-21W resulted from the fusion of β-galactosidase and neomycin phosphotransferase with reduced CpG motifs.

In the pU-21 vector, the cryptic SA sequence at the 5′-end of the intron sequence was deleted, the pA signal from the SV40 T-antigen was added to the β-geo gene, and two Flp recombinase target (FRT) sites were placed before the loxP and lox2272 sites. The pU-21B vector was constructed from pU-21 by moving the position of the lox71 site to 300-bp upstream of the SA site to decrease the deletion rate of the lox71 site during the vector integration event. In pU-21T, the En-2 exon sequence of pU-21B was changed to have three stop codons in each reading frame. In pU-21W, positions 2290–4940 of pU-21T were replaced with positions 2529–5154 of pWHERE (InvivoGen, France) to reduce the number of CpG motifs.

The pU-21, pU-21B, pU-21T, and pU-21W sequence data have been submitted to the DDBJ/EMBL/GenBank databases under accession numbers AB212855, AB255647, AB255648, and AB427140, respectively.

Analysis of trap clones and annotation of trapped genes

Southern blotting, 5′RACE, inverse PCR, and plasmid rescue were performed as described previously (Nakahara et al. 2013). The DNA sequences obtained by 5′RACE, inverse PCR, and plasmid rescue were first compared to the trap vector sequences, vector sequences were removed, and the sequences were submitted to the NCBI dbGSS. The sequences were compared with major repetitive sequences, namely, the genomic 45S rRNA sequences, L1 repetitive sequences, and early transposon (ETn) sequences. Sequences that did not match any of the repetitive sequences were analyzed using the BLAT (BLAST-like Alignment Tool) search at UCSC Genome Bioinformatics [http://genome.ucsc.edu] and the BLAST search against nr-nt (non-redundant all database including daily updates) at GenomeNet [http://www.genome.jp/tools/blast/] or NCBI [http://blast.ncbi.nlm.nih.gov/Blast.cgi], to identify the trapped genes. Then the gene name and gene symbol were determined according to the MGI annotations. When there was no corresponding gene in MGI, the gene name annotated by NCBI or UCSC was used. When there was no corresponding gene in MGI, NCBI, or UCSC but expressed sequence tags (ESTs) obtained by 5′RACE existed in UCSC, the trapped genes were classified as “EST”. When there was no corresponding gene or EST, the trapped genes were classified as “New”. The annotation results were confirmed by reverse transcription (RT)-PCR using gene specific primers and anti-sense primers located downstream of the RT primer used for first-strand cDNA synthesis.
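
The annotation priority described above (MGI first, then NCBI or UCSC, then "EST", then "New") can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, its arguments, and the "Known" label are hypothetical and are not part of the EGTC pipeline.

```python
from typing import Optional, Tuple


def classify_trapped_sequence(
    mgi_symbol: Optional[str],
    ncbi_or_ucsc_name: Optional[str],
    has_matching_est: bool,
) -> Tuple[str, Optional[str]]:
    """Return (class, name) following the annotation priority in the text.

    All argument names are hypothetical; they stand for the outcome of the
    BLAT/BLAST searches, which are not performed here.
    """
    if mgi_symbol:
        # A corresponding gene exists in MGI: use the MGI annotation.
        return ("Known", mgi_symbol)
    if ncbi_or_ucsc_name:
        # No MGI gene, but NCBI or UCSC annotates one.
        return ("Known", ncbi_or_ucsc_name)
    if has_matching_est:
        # No annotated gene, but an expressed sequence tag matches.
        return ("EST", None)
    # Neither a gene nor an EST corresponds to the sequence.
    return ("New", None)
```

The ordering of the checks is what matters here: a sequence is labeled "EST" or "New" only after every gene-level source has been ruled out.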

Production of chimeric mice and establishment of mouse lines

Chimeric mice were produced by aggregation of ES cells with 8-cell embryos of ICR mice (Nippon Clea) or C57BL/6 females (Nippon Clea) mated with BDF1 males (Nippon Clea). Chimeric male mice were mated with C57BL/6 females to obtain F1 heterozygotes. The DNA from the F1 heterozygous males was subjected to Southern blotting to confirm that the band pattern matched that of DNA from the original trap ES clones. Then, the F1 males were deposited to the embryo and sperm bank at the Center for Animal Resources and Development (CARD) in Kumamoto University, and sperm as well as two-cell embryos of the trap lines have been cryopreserved. CARD R-BASE is a resource database for cryopreserved embryos, and is open to the scientific community [http://cardb.cc.kumamoto-u.ac.jp/transgenic/index.jsp]. A supply system of embryos has been established in CARD and trap mouse lines can be supplied on request.

Animal care was in accordance with the guidelines for animal and recombinant DNA experiments of Kumamoto University. Animal studies were approved by the Ethics Committee of the Center for Animal Resources and Development, Kumamoto University.

KEGG pathway analysis

The KEGG web service [http://www.genome.jp/kegg/kegg2.html] was used for pathway analysis. KEGG was developed by the Kanehisa Laboratories in the Bioinformatics Center of Kyoto University and the Human Genome Center of The University of Tokyo. KEGG2 is a main entry site to the KEGG web service. When a trapped gene mapped to a KEGG gene, we checked to see if KEGG pathway information was available for that gene. When the trapped gene was mapped to a KEGG pathway, we examined the first-level and second-level categories of each pathway using the KEGG PATHWAY database [http://www.genome.jp/kegg/pathway.html]. The number of trapped genes that mapped to each second-level category was counted, and the numbers of genes in second-level categories that belonged to the same first-level category were summed.
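
The counting step can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name and input dictionaries are hypothetical, the gene-to-pathway mapping is assumed to have been retrieved from KEGG beforehand, and counting a gene once per second-level category (even when it maps to several pathways in that category) is an assumption.

```python
from collections import Counter


def count_kegg_categories(gene_to_pathways, pathway_to_category):
    """Count trapped genes per second-level and first-level KEGG category.

    gene_to_pathways:    dict, gene -> iterable of pathway identifiers
    pathway_to_category: dict, pathway -> (first_level, second_level)
    Both inputs are hypothetical, pre-fetched lookups.
    """
    second_counts = Counter()
    for pathways in gene_to_pathways.values():
        # Assumption: a gene contributes once to each second-level category.
        categories = {
            pathway_to_category[p] for p in pathways if p in pathway_to_category
        }
        second_counts.update(categories)

    # First-level totals are the sums of their second-level counts, so a gene
    # found in two second-level categories is counted twice at the first level.
    first_counts = Counter()
    for (first_level, _second_level), n in second_counts.items():
        first_counts[first_level] += n
    return second_counts, first_counts
```

Because the first-level totals are sums over second-level categories, they can exceed the number of distinct genes.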

GO analysis

The MGI Gene Ontology Browser [http://www.informatics.jax.org/searches/GO_form.shtml] was used for GO classification. The “MGI Gene Detail” page has “Gene Ontology (GO) classification”, which links to the “Gene Ontology Classifications” page. All GO terms related to the gene of interest are listed in “Tabular Form” in this page. For example, the Fto gene that is trapped in the Ayu21-T95 clone has 32 annotations (GO terms): 21 GO terms for biological process, one GO term for cellular component, and 10 GO terms for molecular function. Each GO term (classification term) is linked to the “Gene Ontology Browser Term Detail” page (Fig. S1).

We performed two kinds of analyses: (i) GO term direct assay; and (ii) second-level category assay, independently in each of the three GO categories, biological process, cellular component, and molecular function. For the GO term direct assay, all the GO terms annotated to each trapped gene were listed, and the number of genes for each GO term was counted.
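
The GO term direct assay amounts to a per-term tally of genes. The sketch below assumes a hypothetical dictionary of gene symbols mapped to their GO terms within one category; the function name and the tie-breaking by term name are illustrative choices, not taken from the study.

```python
from collections import Counter


def go_term_direct_assay(gene_to_terms):
    """Return [(term, n_genes, percent)] sorted by descending gene count.

    gene_to_terms: dict, gene -> iterable of GO terms in one GO category
    (hypothetical input). Percentages use the number of annotated genes as
    the denominator, as in the tables of this study.
    """
    n_genes = len(gene_to_terms)
    counts = Counter()
    for terms in gene_to_terms.values():
        counts.update(set(terms))  # a gene counts once per term
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, n, round(100.0 * n / n_genes, 1)) for term, n in ranked]
```

For example, a term annotated to 61 of 272 genes is reported as 22.4%.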

The biological process, cellular component, and molecular function categories contained 23, 20, and 23 second-level categories, respectively. GO terms usually have multiple path trees, and the number of path trees for the “adipose tissue development” GO term is shown in a representative Gene Ontology Browser page (Fig. S1). In each of the categories, the second-level category is positioned at the top of the path tree. For the second-level category assay, the second-level category in each path tree was listed and redundant second-level categories within one EGTC clone were removed. Redundant categories arise because one gene usually has multiple GO terms and one GO term has multiple path trees. Then, the number of trapped lines that contributed to each second-level category was counted.
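
The second-level category assay can be sketched in the same spirit. Here each GO term is assumed to come with a list of path trees, each written as a list whose first element is the second-level category; this representation, the function name, and the inputs are hypothetical and would have to be built from the Gene Ontology Browser pages.

```python
from collections import Counter


def second_level_category_assay(clone_to_terms, term_to_paths):
    """Count trap clones contributing to each second-level GO category.

    clone_to_terms: dict, clone -> iterable of GO terms of its trapped gene
    term_to_paths:  dict, GO term -> list of path trees, each a list whose
                    first element is the second-level category
    Both inputs are hypothetical, pre-assembled lookups.
    """
    counts = Counter()
    for terms in clone_to_terms.values():
        categories = set()
        for term in terms:
            for path in term_to_paths.get(term, []):
                if path:
                    categories.add(path[0])
        # The set removes redundant categories within one clone, which arise
        # because a gene has several GO terms and a term has several paths.
        counts.update(categories)
    return counts
```

Collecting the categories in a set before counting is what implements the removal of redundant second-level categories within one clone.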

To compare the GO term distribution pattern between the EGTC clones and the whole mouse genome, the number of RefSeq genes annotated to each second-level category was examined using the MGI Gene Ontology Browser on 30 July 2013.

Results

Generation of the EGTC library

In this study, we used four trap vectors containing the SA and β-geo gene (Fig. 1). pU-21 and pU-21B have three stop codons, all of which are in-frame with the β-geo gene in the SA sequence. We have shown previously that pU-17, which carries the same SA sequence, was highly selective for integrations in the introns adjacent to the exon containing the start codon (Taniwaki et al. 2005). In pU-21T and pU-21W, three stop codons were positioned to correspond to all three frames (Fig. 2a) in order to completely terminate possible upstream translations. Theoretically, it is expected that the integration site of the vector is restricted to the upstream region of the start codon of a trapped gene. Thus, these Stop-ATG vectors were designed to act as promoter trap vectors. All the trap vectors carry two mutant lox sites and a wild-type loxP site for post-modification of the trap allele through Cre-mediated cassette exchange.

Figure 2. Integration sites of Stop-ATG vectors into protein-coding trapped genes. (a) Schematic representation of the trap vector pU-21T and the sequence at the junction of the SA and β-geo gene. pU-21T contains 1.8-kb of an intron and a splice acceptor (SA) sequence from the mouse En2 gene, the β-geo gene, and a polyadenylation signal (pA). (b) Distribution of integration sites of the trap vectors based on the exon–intron structure of 778 protein-coding trapped genes. Exon +1 contains the start codon of the trapped genes. “Intron −1” means the upstream intron just before the ATG exon. “Intron +1” means the downstream intron just after the ATG exon.

The trap vectors were introduced into ES cells through electroporation, and trap clones were isolated and stocked. We selected the trap clones that retained all three lox sites and single-copy integration by PCR and Southern blotting, and then analyzed the trapped genes in each clone by 5′RACE to identify the exon sequence fused to SA-β-geo. For the several trap clones in which 5′RACE failed, we analyzed their genomic DNA by inverse PCR or plasmid rescue to identify the genomic sequences flanking the trap vector. We found that most of the 5′RACE-failed clones trapped the rRNA gene (Nakahara et al. 2013); therefore, we did not analyze these flanking genomic sequences further.

The sequences obtained by 5′RACE, inverse PCR, and plasmid rescue were submitted to the dbGSS. By the end of June 2013, we had submitted a total of 1162 GSSs to dbGSS. Of these, 89.1% are known genes, 4.6% are ESTs, and 6.4% are new genes, and the chromosomal localizations of 976 of the GSSs were determined by BLAT searches on the UCSC Genome Browser. The results from these analyses, including the 5′RACE sequences, names of trapped genes, synonyms, gene symbols, chromosomal locations, genome maps, accession numbers of the GSSs in dbGSS, and homology search results, are available in the EGTC database [http://egtc.jp].

Integration of Stop-ATG vectors in introns around the ATG-containing exon

As mentioned above, the new trap vectors, pU-21, pU-21B, pU-21T, and pU-21W, should work as promoter-trap vectors. We examined the integration sites in 778 protein-coding trapped genes in the context of their exon–intron structures. We found that in 60% of the trap clones (467 clones) the vectors were integrated into the upstream intron just before the ATG exon or into the downstream intron just after the ATG exon, as shown in Figure 2b. The number of trap clones in which the vector was integrated in introns +4 downstream of the ATG exon was only 42 (5.4%). Thus, our Stop-ATG vectors showed a strong bias for integration into the 5′-regions of the trapped genes.
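
As a quick arithmetic check of the proportions quoted above, the percentages can be recomputed from the clone counts. The helper name is hypothetical; the counts are those given in the text.

```python
def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


TOTAL_PROTEIN_CODING = 778  # protein-coding trapped genes examined

# Integrations in the introns flanking the ATG exon (467 clones)
near_atg = percent(467, TOTAL_PROTEIN_CODING)    # 60.0
# Integrations far downstream of the ATG exon (42 clones)
far_downstream = percent(42, TOTAL_PROTEIN_CODING)  # 5.4
```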

In a previous study, we suggested a possible reason why trap vectors are integrated into the upstream intron just before the ATG exon or into the downstream intron just after the ATG exon (Taniwaki et al. 2005). Briefly, upstream AUG codons and open reading frames (uAUG/uORF) are common features of mRNA that act mainly in the negative control of translation from the main AUG codon (Morris & Geballe 2000; Kozak 2002). It has been estimated that about half of human mRNAs have uAUG/uORF (Suzuki et al. 2000). Leaky scanning and reinitiation mechanisms of ribosomes allow the downstream main AUG codons to be accessed by the translation machinery (Kozak 2002). Although uAUG/uORF diminish translation of the main ORF, it was reported that approximately 40% of ribosomes were able to initiate twice and approximately 25% were able to initiate three times (Wang & Rothnagel 2004). Because an ORF started from an endogenous AUG could act as a uORF in “downstream-initiated” trap clones, it can be assumed that translation of β-geo is initiated at a somewhat lower level than that of the endogenous trapped gene, but at a level sufficient for the acquisition of G418 resistance.

Chromosomal distribution of trapped genes

We compared the distribution of the gene trap alleles on the mouse chromosomes in the EGTC and IGTC trap clones and found that the chromosomal distribution patterns were similar (Fig. 3). This result is reasonable, because most of the trap vectors used in IGTC clones were also promoter-trap types carrying the β-geo gene. Next, we examined the distribution of genes in the whole mouse genome based on the RefSeq genes (Fig. 3). We found that there were a high number of gene trap insertions into chromosomes that contained a high number of RefSeq genes such as chromosomes (Chr) 2, 4 and 11, as reported previously (Hansen et al. 2003). The clear exception was Chr X, because of the direct induction of the null mutation of Chr X genes in male ES cells. Another exception was Chr 7, the most gene-rich mouse chromosome with 3194 genes or 8.9% of all mouse genes. The cause of low trap efficiency in Chr 7 is unclear, but the ratio of genes expressed in undifferentiated ES cells might be less than in other chromosomes.

Figure 3. Distribution of trapped genes on the mouse chromosomes in the Exchangeable Gene Trap Clones (EGTC) and International Gene Trap Consortium (IGTC) trap clones. The red bar shows the distribution of trapped genes registered in the EGTC database. The blue bar shows the distribution of trapped genes listed in the IGTC database. For comparison, the green bar shows the distribution of RefSeq genes from the National Center for Biotechnology Information (NCBI) mouse genome assembly (build 36) and patch release 2 (GRCm38.p2) provided by Genome Reference Consortium [http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/data/].

KEGG pathway analysis of the EGTC cell lines

Because our gene-trap system is based on promoter trapping, only genes that are expressed in ES cells are trapped, and this selection might reduce the diversity of gene functions in the EGTC resource. To determine whether there were any particular functional tendencies in the trapped genes, we used the KEGG web server. Eight hundred of the 1162 genes registered in the EGTC database could be annotated to KEGG GENES, a collection of protein-coding genes. To assign functions to the trapped genes, we used KEGG PATHWAY, which is a collection of manually drawn pathway maps that represent knowledge on molecular interaction and reaction networks. The total number of KEGG pathways is 352, and each KEGG pathway is classified into one of 42 second-level categories, which are grouped into six first-level categories: “1. Metabolism”, “2. Genetic Information Processing”, “3. Environmental Information Processing”, “4. Cellular Processes”, “5. Organismal System”, and “6. Human Diseases”.

Among the 800 EGTC genes that mapped to KEGG genes, 289 were assigned to KEGG pathways. We counted the number of KEGG pathways in each second- and first-level category to determine the functional tendencies of the trapped genes (Table S1). The numbers of trap clones that could be assigned KEGG pathways in each of the first- and second-level categories are shown as a cumulative bar chart in Figure 4. The most frequently occurring first-level category was “6. Human Diseases” (255 lines), with “6.1 Cancers: Overview” (65 lines) and “6.2 Cancers: Specific types” (66 lines) as the major second-level categories. This finding suggests that many of the genes that were expressed in the ES cells were involved in cancers. The most frequent second-level category of all was “3.2 Signal transduction” (130 lines), implying the importance of signal transduction pathways in maintaining the undifferentiated state of ES cells. In total, EGTC trapped genes were found in almost all the pathways (41/43 second-level categories, Table S1); thus, we concluded that there was no apparent gene function bias among the trapped genes.

Figure 4. Distribution of trap clones in the Exchangeable Gene Trap Clones (EGTC) cell lines assigned Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in the first- and second-level categories. The six first-level categories are shown along the horizontal axis. The red number at the top of each bar is the total number of embryonic stem (ES) cell lines mapped to the corresponding first-level category. The cumulative bars show the proportion of second-level categories included within each first-level category. The number of ES cell lines that mapped to each second-level category is in parentheses. Each second-level category is indicated by its code number. 1.0: Global map, 1.1: Carbohydrate metabolism, 1.2: Energy metabolism, 1.3: Lipid metabolism, 1.4: Nucleotide metabolism, 1.5: Amino acid metabolism, 1.6: Metabolism of other amino acids, 1.7: Glycan biosynthesis and metabolism, 1.8: Metabolism of cofactors and vitamins, 1.9: Metabolism of terpenoids and polyketides, 1.11: Xenobiotics biodegradation and metabolism, 1.12: Reaction module maps, 2.1: Transcription, 2.2: Translation, 2.3: Folding, sorting and degradation, 2.4: Replication and repair, 3.1: Membrane transport, 3.2: Signal transduction, 3.3: Signaling molecules and interaction, 4.1: Transport and catabolism, 4.2: Cell motility, 4.3: Cell growth and death, 4.4: Cell communication, 5.1: Immune system, 5.2: Endocrine system, 5.3: Circulatory system, 5.4: Digestive system, 5.5: Excretory system, 5.6: Nervous system, 5.7: Sensory system, 5.8: Development, 5.9: Environmental adaptation, 6.1: Cancers: Overview, 6.2: Cancers: Specific types, 6.4: Neurodegenerative diseases, 6.5: Substance dependence, 6.6: Cardiovascular diseases, 6.7: Endocrine and metabolic diseases, 6.8: Infectious diseases: Bacterial, 6.9: Infectious diseases: Viral, and 6.10: Infectious diseases: Parasitic.

Production and deposition of the EGTC mouse lines

The IGTC gene trap and IKMC knockout resources are both libraries of ES cell clones. Users who want to obtain a mutant mouse line have to start by producing chimeric mice. This step is time-consuming and costly, and there is always a risk of failure of germ line transmission. To avoid these problems, we have produced a resource of gene-trap mouse lines. For the first pU-21 and pU-21B trap clones, we used all the trap clones that we judged to be single-copy integrations for chimeric mouse production. For the later pU-21T and pU-21W trap clones, we chose trap clones in which 5′RACE was successful and in which the trapped genes were not repetitive elements or rRNA genes, and we also eliminated redundant trap clones. The chimeric mice that we obtained were mated with C57BL/6 females, and the F1 males were deposited in CARD R-BASE. A summary of how the trap ES clones and mouse lines were established is given in Table S2. The 454 trap mouse lines that we obtained make up the largest gene-trap mouse library collection in IGTC. All the trap mouse lines are available to the scientific community on request.

GO analysis of the EGTC mouse lines

Because the number of trapped genes that could be assigned KEGG pathways was limited (113 genes among 454 trap lines) (Table S1), we used the GO classification provided by MGI to analyze the trapped genes in the EGTC mouse lines and found that 313 of the trapped genes were annotated with at least one GO term. GO has a directed acyclic graph structure and each term is positioned in a hierarchical structure with relationships to one or more other terms in the same domain (and sometimes to other domains) as shown in the example in Figure S1. GO is organized under three categories: biological process, cellular component, and molecular function, and 272, 288, and 275 of the trapped genes had GO terms in each of the categories, respectively.

We performed a GO term direct assay in which we counted the number of trapped genes that were annotated to each GO term. The results of the GO term direct assay for biological process are listed in Table 1. The 272 trapped genes had 1757 GO terms associated with them, indicating that the EGTC mouse lines cover quite a variety of biological processes. The three most frequently found biological process GO terms were “transcription, DNA-dependent” (61 lines, 22.4% of the 272 mouse lines), “regulation of transcription, DNA-dependent” (60 lines, 22.1%) and “transport” (29 lines, 10.7%), reflecting the high transcription activity in ES cells. The results of the GO term direct assay for cellular component are listed in Table 2. The 288 trapped genes had 1092 GO terms associated with them. Just over half (50.7%) of the trapped genes were annotated to “nucleus”, which corresponded to the high percentage of GO terms involved in transcription in biological process. The results of the GO term direct assay for molecular function are listed in Table 3. The 275 trapped genes had 1275 GO terms associated with them. The most frequently found molecular function GO terms were “protein binding” (88 lines, 32.0% of the 275 mouse lines), “metal ion binding” (73 lines, 26.5%), “DNA binding” (50 lines, 18.2%), and “nucleotide binding” (47 lines, 17.1%).

Table 1. Gene ontology (GO) term direct assay for biological process of annotated trapped genes in the Exchangeable Gene Trap Clones (EGTC) mouse lines. The top 18 GO terms are listed (272 mouse lines, 1757 GO terms)
Rank | Biological process GO term | Mouse lines | %
1 | Transcription, DNA-dependent | 61 | 22.4
2 | Regulation of transcription, DNA-dependent | 60 | 22.1
3 | Transport | 29 | 10.7
4 | Multicellular organismal development | 24 | 8.8
5 | Negative regulation of transcription from RNA polymerase II promoter | 23 | 8.5
6 | Phosphorylation | 18 | 6.6
7 | Cell differentiation | 17 | 6.3
7 | Positive regulation of transcription, DNA-dependent | 17 | 6.3
7 | Protein phosphorylation | 17 | 6.3
10 | Metabolic process | 16 | 5.9
10 | Chromatin modification | 16 | 5.9
12 | Cell cycle | 15 | 5.5
13 | mRNA processing | 14 | 5.1
14 | Negative regulation of transcription, DNA-dependent | 13 | 4.8
14 | Positive regulation of transcription from RNA polymerase II promoter | 13 | 4.8
14 | Protein transport | 13 | 4.8
17 | Apoptotic process | 12 | 4.4
17 | Oxidation-reduction process | 12 | 4.4

Table 2. Gene ontology (GO) term direct assay for cellular component of annotated trapped genes in the Exchangeable Gene Trap Clones (EGTC) mouse lines. The top 18 GO terms are listed (288 mouse lines, 1092 GO terms)
Rank | Cellular component GO term | Mouse lines | %
1 | Nucleus | 146 | 50.7
2 | Cytoplasm | 127 | 44.1
3 | Membrane | 97 | 33.7
4 | Integral to membrane | 59 | 20.5
5 | Plasma membrane | 52 | 18.1
6 | Intracellular | 33 | 11.5
7 | Mitochondrion | 28 | 9.7
8 | Cytoskeleton | 25 | 8.7
9 | Golgi apparatus | 23 | 8.0
10 | Cytosol | 20 | 6.9
11 | Cell junction | 16 | 5.6
12 | Nucleolus | 15 | 5.2
13 | Protein complex | 14 | 4.9
14 | Endoplasmic reticulum | 13 | 4.5
14 | Perinuclear region of cytoplasm | 13 | 4.5
16 | Extracellular region | 12 | 4.2
16 | Neuronal cell body | 12 | 4.2
16 | Transcription factor complex | 12 | 4.2

Table 3. Gene ontology (GO) term direct assay for molecular function of annotated trapped genes in the Exchangeable Gene Trap Clones (EGTC) mouse lines. The top 20 GO terms are listed (275 mouse lines, 1275 GO terms)
Rank | Molecular function GO term | Mouse lines | %
1 | Protein binding | 88 | 32.0
2 | Metal ion binding | 73 | 26.5
3 | DNA binding | 50 | 18.2
4 | Nucleotide binding | 47 | 17.1
5 | Transferase activity | 34 | 12.4
6 | Zinc ion binding | 33 | 12.0
7 | ATP binding | 32 | 11.6
8 | Nucleic acid binding | 27 | 9.8
8 | RNA binding | 27 | 9.8
10 | Hydrolase activity | 22 | 8.0
11 | Kinase activity | 20 | 7.3
12 | Sequence-specific DNA binding transcription factor activity | 19 | 6.9
13 | Sequence-specific DNA binding | 18 | 6.5
14 | Chromatin binding | 15 | 5.5
15 | Protein homodimerization activity | 14 | 5.1
16 | Catalytic activity | 13 | 4.7
16 | Protein kinase activity | 13 | 4.7
18 | Ligase activity | 12 | 4.4
18 | Protein domain specific binding | 12 | 4.4
18 | Protein heterodimerization activity | 12 | 4.4

Although the GO term direct assays indicated that the trapped genes had a wide range of functions, the large numbers and broad distribution of the GO terms made it difficult to estimate the general tendency of gene function in the EGTC mouse lines. Moreover, the GO term direct assay is subject to redundant counting of genes because of the highly hierarchical structure of GO. Because the same GO term can appear multiple times in the same hierarchical tree, a simple comparison of the numbers of genes annotated to a single GO term might not accurately reflect the distribution of gene function.

To better understand the functional distribution of trapped genes in the EGTC mouse lines, we performed a second-level category assay and compared the trapped genes in the mouse lines with the RefSeq genes (i.e. all the mouse genes) in the mouse genome assembly. We defined the second-level categories as the GO terms positioned directly under the top of the path tree in each of the three ontologies (see Fig. S1 for an example). We counted the number of trapped genes in the mouse lines and the number of RefSeq genes that contained each second-level term. Figure 5 shows the results of the second-level category assay, with the percentage of genes associated with each GO term represented by heat maps. In all three ontologies, the two most frequent second-level categories were identical between the trapped genes in the EGTC mouse lines and the RefSeq genes, indicating that there was no particular bias in the trapped genes in the EGTC lines. In addition, the low-frequency categories (under 1%) were also identical between the trapped genes and the RefSeq genes. All the data related to the second-level categories and GO terms are listed in Tables S3–S5. These results show that the functional distribution of trapped genes in the EGTC mouse lines is similar to that of the RefSeq genes for the whole mouse genome.
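The second-level category assay can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hierarchy below is a simplified, hypothetical fragment loosely modeled on the Fig. S1 example, the gene names are placeholders, and counting each gene once per category is an assumption about how redundancy was avoided.

```python
from collections import defaultdict

# Hypothetical, simplified fragment of a GO hierarchy: child term -> parent terms.
PARENTS = {
    "developmental process": ["biological process"],
    "metabolic process": ["biological process"],
    "anatomical structure development": ["developmental process"],
    "tissue development": ["anatomical structure development"],
    "adipose tissue development": ["tissue development"],
    "lipid metabolic process": ["metabolic process"],
}
ROOT = "biological process"


def second_level(term, parents=PARENTS, root=ROOT):
    """Return the terms directly under the root on any path from `term` to the root."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for parent in parents.get(current, []):
            if parent == root:
                found.add(current)    # `current` sits just under the root
            else:
                stack.append(parent)  # keep climbing this path
    return found


def count_by_category(annotations):
    """Count genes per second-level category, each gene at most once per category."""
    counts = defaultdict(int)
    for terms in annotations.values():
        categories = set()
        for term in terms:
            categories |= second_level(term)
        for category in categories:
            counts[category] += 1
    return dict(counts)


# Placeholder annotations (gene -> GO terms), for illustration only.
annotations = {
    "geneA": ["adipose tissue development", "lipid metabolic process"],
    "geneB": ["tissue development", "adipose tissue development"],
}
result = count_by_category(annotations)
print(result)
```

In this toy example geneB is annotated to two terms in the same path tree but is counted only once under "developmental process", which is the redundancy that a direct count of GO terms would not remove.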

Figure 5. Gene ontology (GO) second-level category assay of the Exchangeable Gene Trap Clones (EGTC) mouse lines, with heat maps showing the percentage of genes associated with each GO term. (a) Biological process had 23 associated second-level category GO terms. A total of 272 EGTC mouse lines and 24 969 RefSeq genes were annotated to a term under biological process. (b) Cellular component had 20 associated second-level category GO terms. A total of 288 EGTC mouse lines and 24 780 RefSeq genes were annotated to a term under cellular component. (c) Molecular function had 21 associated second-level category GO terms. A total of 275 EGTC mouse lines and 24 367 RefSeq genes were annotated to a term under molecular function. Because most genes were associated with several GO terms and path trees, the sum of the percentages can exceed 100%. The heat maps indicate the frequency of occurrence of each second-level category in six bins: 0–1%, 1–5%, 5–10%, 10–20%, 20–40%, and 40–100%.
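As a rough illustration of the six heat-map bins in the legend, the sketch below assigns a percentage to a bin. The legend does not say which bin a boundary value such as exactly 1% belongs to; here it is assumed to fall into the higher bin. The function name and constants are hypothetical.

```python
from bisect import bisect_right

# Upper edges (in percent) of every bin except the last, taken from the legend.
BIN_EDGES = [1, 5, 10, 20, 40]
BIN_LABELS = ["0–1%", "1–5%", "5–10%", "10–20%", "20–40%", "40–100%"]


def heat_bin(percent):
    """Return the heat-map bin label for a percentage between 0 and 100.

    Assumption: a value lying exactly on an edge is placed in the higher bin.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return BIN_LABELS[bisect_right(BIN_EDGES, percent)]


for value in (0.5, 1, 7.3, 45):
    print(value, "->", heat_bin(value))
```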

Features of the EGTC database

From the EGTC home page [http://egtc.jp], there are entry points to six menus: “About EGTC”, “Database Access”, “System”, “Related Links”, “Contact Us”, and “Topics”. The “Database Access” page allows users to search for a specific clone using either “Search By Keyword” or “Search By Homology”. Users can search the EGTC database by key word (gene name, gene symbol, chromosome number, accession number, CARD ID, or EGTC ID) or by sequence homology. The “Database Access” page also displays information about the database under “Statistics of Database”. An example of the search results using the key word “solute carrier family”, with the output ordered by Card ID, is given in Figure 6. Nine mouse lines that matched the search criteria are listed in the output. Clicking on an EGTC ID (21–128 in this example) displays an information page with details about the selected trap clone. The information on this page is displayed in five sections: information about the trapped gene, information about other clones with the same trapped gene, information about the trap clone, homology search results, and information about the mouse line. In the section on the trapped gene, links are provided to other sites that contain information about the gene. In this example, the links are to “UCSC Genome Browser (cover whole gene)”, “MGI Gene Detail”, “NCBI Gene”, “KEGG GENES”, “IGTC Gene Annotation”, “EST Profile of NCBI UniGene”, and “NCBI UniGene”. In the section on the trap clone, the external links are to “NCBI GSS” via the accession number, to “UCSC Genome Browser (cover GSS sequence)”, and to “IGTC Cell Line Annotation”. In the section on the mouse line, there are two links, one to “CARD R-BASE” via the CARD ID and the other to “International Mouse Strain Resource (IMSR)”. Furthermore, the result of the BLAT search for the GSS sequence against the Mouse July 2007 Assembly on the UCSC Genome Browser is displayed in the section on the trap clone. In this example, the trap vector pU-21 was integrated in the first intron of the solute carrier family 7 (cationic amino acid transporter, y+ system), member 2 (Slc7a2) gene on chromosome 8.

Figure 6. Results of a key word search of the Exchangeable Gene Trap Clones (EGTC) database. (1) Top page of the EGTC database [http://egtc.jp]. (2) Database Access page. In this example, the key word was “solute carrier family” and the output was ordered by Card ID. (3) Upper part of the information page for the Ayu21-128 clone. (4) Lower part of the detailed information page for the Ayu21-128 clone. (5) BLAT search result of GSS sequence (5′-RACE product) of the Ayu21-128 clone against the Mouse July 2007 Assembly on the UCSC Genome Browser.

All the results reported here have been integrated into the EGTC database, which has the following features:

  1. Promoter trap using Stop-ATG vectors: The trap vector showed a strong bias for integration into the 5′-regions of the trapped genes. This is a desirable characteristic for obtaining null mutations.
  2. Post-insertional modification: The EGTC trap clones are capable of post-insertional modification. This feature is discussed in detail later.
  3. Production of mouse lines: Although the number of registered ES cell lines is not large, we have established more than 450 mouse lines and deposited them in CARD R-BASE. These mouse lines are ready for use by the scientific community.
  4. Information about mouse lines: The EGTC database makes information available not only on trapped genes but also on mouse lines. The EGTC IDs are linked to an information page with details about the selected trap clone. The “Mouse Information” section on this page displays the following information: “CARD ID”, “Strain Name”, “Internal Code”, and “Description”.

Discussion

We developed an exchangeable gene trap system, performed gene trapping, identified trapped genes in 1162 trap ES clones, and established 454 EGTC mouse lines. The information about the trap clones, including the GSS sequences, annotations of the trapped genes, insertion sites, and links to other databases, has been organized in the EGTC database. In addition, the GSSs of the EGTC trap clones are available in the IGTC, MGI, IMSR, and UCSC Genome Browser databases. Thus, the EGTC database is widely accessible to the scientific community, and all EGTC mouse lines can be obtained from the CARD R-BASE resource at Kumamoto University upon request. More than 50 groups have used EGTC mouse lines, and some of their research results have been published (Hoshii et al. 2007; Vanhoutteghem et al. 2009; Yamashita et al. 2009; Ito et al. 2010; Kim et al. 2011a,b; Park et al. 2012; Cid et al. 2013; Kappei et al. 2013; Nakahara et al. 2013). In these papers, the phenotypes of the following mouse lines were analyzed: Ayu21-18 (trapped gene: Bnc2), Ayu21-81 (Kcnk5/TASK-2), Ayu21-127 (Lgr4), Ayu21-T2 (Msi2), Ayu21-T93 (Ccdc55/NSrp70), Ayu21-T346 (Hmbox1/Hot1), Ayu21-W34 (Cadm1/Igsf4), and Ayu21-B6T44 (Cd99). In a previous study, we also discussed the usefulness of the EGTC database to the IGTC project (Araki et al. 2009b). At that time (June 2008), a total of 469 ES cell lines were registered in EGTC, and 170 mouse lines, including the eight mouse lines listed above, had been established and deposited in CARD. In the present study, the number of registered ES cell lines in EGTC was 1162 on 31 August 2013, and 454 mouse lines had been established and deposited in CARD. Furthermore, we reported the usefulness of Mol/MSM-1 ES cells established from MSM/Ms mice (Nakahara et al. 2013); as a result, Ayu21-MT1–Ayu21-MT142 (94 ES cell lines) were registered in EGTC, and 26 mouse lines were established and deposited in CARD. Consequently, 599 (1162 − 469 − 94) ES cell lines and 258 (454 − 170 − 26) mouse lines are reported here for the first time.
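The counts of newly reported lines follow from subtracting the previously reported lines from the totals registered on 31 August 2013. A short Python check of that arithmetic, using only the numbers stated above (the variable names are illustrative):

```python
# Totals registered in EGTC / deposited in CARD on 31 August 2013.
es_total, mouse_total = 1162, 454
# Lines already reported as of June 2008 (Araki et al. 2009b).
es_2008, mouse_2008 = 469, 170
# Lines derived from Mol/MSM-1 ES cells (Nakahara et al. 2013).
es_msm, mouse_msm = 94, 26

new_es = es_total - es_2008 - es_msm
new_mouse = mouse_total - mouse_2008 - mouse_msm

print(new_es, new_mouse)  # 599 258
```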

Although the IKMC and IGTC have established a huge number of mutant ES cell lines (Skarnes et al. 2011), the number of mutant mouse lines that are available is limited (1006 lines in KOMP CSD, 551 lines in KOMP Regeneron, 1013 lines in EUCOMM/EUCOMMTools, and four lines in NorCOMM, as of 8 September 2013). We therefore believe that the EGTC trap mouse lines are a useful mouse resource.

We analyzed the distribution of trapped gene functions using KEGG PATHWAY and found that the EGTC clones covered a broad range of pathways even though the trapped genes were restricted to those expressed in ES cells (Fig. 4). However, several KEGG pathways, such as “Biosynthesis of Other secondary metabolites”, “Sensory system”, “Immune diseases”, and “Endocrine and metabolic diseases”, were rarely found among the trapped genes, suggesting that these pathways may not be active in ES cells.

A GO term direct assay of trapped genes in the EGTC mouse lines revealed that the number of GO terms exceeded 1000 in the three ontologies and covered a wide range of functions. Although functions related to transcription seemed to be more frequent, it was difficult to detect any particular tendencies. The second-level category assay, which compared the distribution patterns of GO terms between the trapped genes in the EGTC clones and the RefSeq sequences from the whole mouse genome, clearly showed that the distribution of gene functions in the trapped genes and RefSeq genes was similar. Therefore, we concluded that the promoter trap strategy in ES cells was useful in capturing a wide range of gene functions.

As described previously (Araki et al. 2009b), because our trap vectors have several mutant lox sites, the EGTC trap clones are capable of post-insertional modification (Fig. 1). In over 90% of the EGTC trap clones, the Stop-ATG vectors were inserted between the −3 and +3 introns of the trapped genes. This property is very useful when post-insertional modifications are to be performed. For example, we produced Cre-driver mouse lines using the EGTC trap clones. In addition, recombination between lox71 and loxP by Cre recombinase induces deletion of the SA and the β-geo gene, resulting in recovery of transcription from the endogenous gene. Thus, the relationship between the gene disruption and the phenotype observed in trap mice can be confirmed.

Recently, many new genome editing technologies have been reported. They include the zinc-finger nuclease (Urnov et al. 2005; Miller et al. 2007) and transcription activator-like effector nuclease (TALEN) (Miller et al. 2011) technologies, and the CRISPR/Cas systems (Cong et al. 2013; Mali et al. 2013; Wang et al. 2013). Simple knock-outs can be made more easily and directly using microinjection into the pronucleus. However, knock-in of a gene still requires ES cell culture, and our exchangeable gene trap system using Cre-mediated recombination is useful as a tool for both knock-out and knock-in.

Acknowledgments

This work was supported by Grant-in-Aid for Scientific Research (KAKENHI) Priority Areas “Integrative Research Toward the Conquest of Cancer” (17012018 to KY), (S) (No. 21220010 to KY), (B) (No. 19300149 to KA), (B) (No. 23310135 to KA), (B) (No. 20300146 to MA), and (B) (No. 23300159 to MA) from the Japan Society for the Promotion of Science (JSPS). We thank T. Keida, N. Koga, Y. Kimachi, K. Hori, Y. Tsuruta, Y. Sakumura, K. Haruna, R. Minato, and T. Egami for their technical assistance.

References

  • Araki, K., Araki, M., Miyazaki, J. & Vassalli, P. 1995. Site-specific recombination of a transgene in fertilized eggs by transient expression of Cre recombinase. Proc. Natl Acad. Sci. USA 92, 160–164.
  • Araki, K., Araki, M. & Yamamura, K. 1997. Targeted integration of DNA using mutant lox sites in embryonic stem cells. Nucleic Acids Res. 25, 868–872.
  • Araki, K., Araki, M. & Yamamura, K. 2002. Site-directed integration of the cre gene mediated by Cre recombinase using a combination of mutant lox sites. Nucleic Acids Res. 30, e103.
  • Araki, K., Araki, M. & Yamamura, K. 2006. Negative selection with the Diphtheria toxin A fragment gene improves frequency of Cre-mediated cassette exchange in ES cells. J. Biochem. 140, 793–798.
  • Araki, K., Imaizumi, T., Sekimoto, T. & Vassalli, P. 1999. Exchangeable gene trap using the Cre/mutated lox system. Cell. Mol. Biol. (Noisy-le-grand) 45, 737–750.
  • Araki, K., Okada, Y., Araki, M. & Yamamura, K. 2010. Comparative analysis of right element mutant lox sites on recombination efficiency in embryonic stem cells. BMC Biotechnol. 10, 29.
  • Araki, K., Takeda, N., Yoshiki, A., Obata, Y., Nakagata, N., Shiroishi, T., Moriwaki, K. & Yamamura, K. 2009a. Establishment of germline-competent embryonic stem cell lines from the MSM/Ms strain. Mamm. Genome 20, 14–20.
  • Araki, M., Araki, K. & Yamamura, K. 2009b. International Gene Trap Project: towards gene-driven saturation mutagenesis in mice. Curr. Pharm. Biotechnol. 10, 221–229.
  • Brickman, J. M., Tsakiridis, A., To, C. & Stanford, W. L. 2010. A wider context for gene trap mutagenesis. Methods Enzymol. 477, 271–295.
  • Bult, C. J., Eppig, J. T., Blake, J. A., Kadin, J. A., Richardson, J. E. & Mouse Genome Database Group. 2013. The mouse genome database: genotypes, phenotypes, and models of human disease. Nucleic Acids Res. 41, D885–D891.
  • Cid, L. P., Roa-Rojas, H. A., Niemeyer, M. I., Gonzalez, W., Araki, M., Araki, K. & Sepulveda, F. V. 2013. TASK-2: a K2P K(+) channel with complex regulation and diverse physiological functions. Front. Physiol. 4, 198.
  • Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P. D., Wu, X., Jiang, W., Marraffini, L. A. & Zhang, F. 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823.
  • Hansen, J., Floss, T., Van Sloun, P., Fuchtbauer, E. M., Vauti, F., Arnold, H. H., Schnutgen, F., Wurst, W., Von Melchner, H. & Ruiz, P. 2003. A large-scale, gene-driven mutagenesis approach for the functional analysis of the mouse genome. Proc. Natl Acad. Sci. USA 100, 9918–9922.
  • Hoshii, T., Takeo, T., Nakagata, N., Takeya, M., Araki, K. & Yamamura, K. 2007. LGR4 regulates the postnatal development and integrity of male reproductive tracts in mice. Biol. Reprod. 76, 303–313.
  • Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T. & Reya, T. 2010. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768.
  • Kappei, D., Butter, F., Benda, C., Scheibe, M., Draskovic, I., Stevense, M., Novo, C. L., Basquin, C., Araki, M., Araki, K., Krastev, D. B., Kittler, R., Jessberger, R., Londono-Vallejo, J. A., Mann, M. & Buchholz, F. 2013. HOT1 is a mammalian direct telomere repeat-binding protein contributing to telomerase recruitment. EMBO J. 32, 1681–1701.
  • Kim, H. R., Jeon, B. H., Lee, H. S., Im, S. H., Araki, M., Araki, K., Yamamura, K., Choi, S. C., Park, D. S. & Jun, C. D. 2011a. IGSF4 is a novel TCR zeta-chain-interacting protein that enhances TCR-mediated signaling. J. Exp. Med. 208, 2545–2560.
  • Kim, Y. D., Lee, J. Y., Oh, K. M., Araki, M., Araki, K., Yamamura, K. & Jun, C. D. 2011b. NSrp70 is a novel nuclear speckle-related protein that modulates alternative pre-mRNA splicing in vivo. Nucleic Acids Res. 39, 4300–4314.
  • Kozak, M. 2002. Pushing the limits of the scanning mechanism for initiation of translation. Gene 299, 1–34.
  • Mali, P., Yang, L., Esvelt, K. M., Aach, J., Guell, M., Dicarlo, J. E., Norville, J. E. & Church, G. M. 2013. RNA-guided human genome engineering via Cas9. Science 339, 823–826.
  • Miller, J. C., Holmes, M. C., Wang, J., Guschin, D. Y., Lee, Y. L., Rupniewski, I., Beausejour, C. M., Waite, A. J., Wang, N. S., Kim, K. A., Gregory, P. D., Pabo, C. O. & Rebar, E. J. 2007. An improved zinc-finger nuclease architecture for highly specific genome editing. Nat. Biotechnol. 25, 778–785.
  • Miller, J. C., Tan, S., Qiao, G., Barlow, K. A., Wang, J., Xia, D. F., Meng, X., Paschon, D. E., Leung, E., Hinkley, S. J., Dulay, G. P., Hua, K. L., Ankoudinova, I., Cost, G. J., Urnov, F. D., Zhang, H. S., Holmes, M. C., Zhang, L., Gregory, P. D. & Rebar, E. J. 2011. A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol. 29, 143–148.
  • Morris, D. R. & Geballe, A. P. 2000. Upstream open reading frames as regulators of mRNA translation. Mol. Cell Biol. 20, 8635–8642.
  • Nakahara, M., Tateyama, H., Araki, M., Nakagata, N., Yamamura, K. & Araki, K. 2013. Gene-trap mutagenesis using Mol/MSM-1 embryonic stem cells from MSM/Ms mice. Mamm. Genome 24, 228–239.
  • Nord, A. S., Chang, P. J., Conklin, B. R., Cox, A. V., Harper, C. A., Hicks, G. G., Huang, C. C., Johns, S. J., Kawamoto, M., Liu, S., Meng, E. C., Morris, J. H., Rossant, J., Ruiz, P., Skarnes, W. C., Soriano, P., Stanford, W. L., Stryke, D., Von Melchner, H., Wurst, W., Yamamura, K., Young, S. G., Babbitt, P. C. & Ferrin, T. E. 2006. The International Gene Trap Consortium Website: a portal to all publicly available gene trap cell lines in mouse. Nucleic Acids Res. 34, D642–D648.
  • Park, H. J., Byun, D., Lee, A. H., Kim, J. H., Ban, Y. L., Araki, M., Araki, K., Yamamura, K., Kim, I., Park, S. H. & Jung, K. C. 2012. CD99-dependent expansion of myeloid-derived suppressor cells and attenuation of graft-versus-host disease. Mol. Cells 33, 259–267.
  • Skarnes, W. C., Rosen, B., West, A. P., Koutsourakis, M., Bushell, W., Iyer, V., Mujica, A. O., Thomas, M., Harrow, J., Cox, T., Jackson, D., Severin, J., Biggs, P., Fu, J., Nefedov, M., De Jong, P. J., Stewart, A. F. & Bradley, A. 2011. A conditional knockout resource for the genome-wide study of mouse gene function. Nature 474, 337–342.
  • Skarnes, W. C., Von Melchner, H., Wurst, W., Hicks, G., Nord, A. S., Cox, T., Young, S. G., Ruiz, P., Soriano, P., Tessier-Lavigne, M., Conklin, B. R., Stanford, W. L. & Rossant, J. 2004. A public gene trap resource for mouse functional genomics. Nat. Genet. 36, 543–544.
  • Suzuki, Y., Ishihara, D., Sasaki, M., Nakagawa, H., Hata, H., Tsunoda, T., Watanabe, M., Komatsu, T., Ota, T., Isogai, T., Suyama, A. & Sugano, S. 2000. Statistical analysis of the 5′ untranslated region of human mRNA using “Oligo-Capped” cDNA libraries. Genomics 64, 286–297.
  • Taniwaki, T., Haruna, K., Nakamura, H., Sekimoto, T., Oike, Y., Imaizumi, T., Saito, F., Muta, M., Soejima, Y., Utoh, A., Nakagata, N., Araki, M., Yamamura, K. & Araki, K. 2005. Characterization of an exchangeable gene trap using pU-17 carrying a stop codon-beta geo cassette. Dev. Growth Differ. 47, 163–172.
  • Urnov, F. D., Miller, J. C., Lee, Y. L., Beausejour, C. M., Rock, J. M., Augustus, S., Jamieson, A. C., Porteus, M. H., Gregory, P. D. & Holmes, M. C. 2005. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature 435, 646–651.
  • Vanhoutteghem, A., Maciejewski-Duval, A., Bouche, C., Delhomme, B., Herve, F., Daubigney, F., Soubigou, G., Araki, M., Araki, K., Yamamura, K. & Djian, P. 2009. Basonuclin 2 has a function in the multiplication of embryonic craniofacial mesenchymal cells and is orthologous to disco proteins. Proc. Natl Acad. Sci. USA 106, 14432–14437.
  • Wang, H., Yang, H., Shivalila, C. S., Dawlaty, M. M., Cheng, A. W., Zhang, F. & Jaenisch, R. 2013. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910–918.
  • Wang, X. Q. & Rothnagel, J. A. 2004. 5′-untranslated regions with multiple upstream AUG codons can support low-level translation via leaky scanning and reinitiation. Nucleic Acids Res. 32, 1382–1391.
  • Yagi, T., Tokunaga, T., Furuta, Y., Nada, S., Yoshida, M., Tsukada, T., Saga, Y., Takeda, N., Ikawa, Y. & Aizawa, S. 1993. A novel ES cell line, TT2, with high germline-differentiating potency. Anal. Biochem. 214, 70–76.
  • Yamashita, R., Takegawa, Y., Sakumoto, M., Nakahara, M., Kawazu, H., Hoshii, T., Araki, K., Yokouchi, Y. & Yamamura, K. 2009. Defective development of the gall bladder and cystic duct in Lgr4-hypomorphic mice. Dev. Dyn. 238, 993–1000.

Supporting Information

Filename: dgd12116-sup-0001-FigS1.pdf (format: application/PDF; size: 195K)

Fig. S1. Example of the Gene Ontology (GO) hierarchical structure. The second-level category, as defined in the present study, is shown. The GO term “adipose tissue development” has six path trees, and a part of one path tree is shown. In this path tree, the category is “biological process”, and “developmental process”, positioned just under it, is the second-level category.

Filename: dgd12116-sup-0002-TableS1.docx (format: application/PDF; size: 71K)

Table S1. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of the trapped genes in the Exchangeable Gene Trap Clones (EGTC) cell lines and mouse lines. The numbers of cell and mouse lines in each second-level category are shown. The numbers of cell and mouse lines in each first-level category are shown in bold.

Filename: dgd12116-sup-0003-TableS2.docx (format: application/PDF; size: 16K)

Table S2. Establishment of the trap embryonic stem (ES) clones and mouse lines.

Filename: dgd12116-sup-0004-TableS3-S5.xlsx (format: application/PDF; size: 491K)

Table S3. Gene ontology (GO) terms and Second Level Categories for Biological process related to the trapped genes in the Exchangeable Gene Trap Clones (EGTC). No.1 represents the number of EGTC mouse lines related to each Second Level Category. No.2 represents the number of EGTC mouse lines related to each GO term.

Table S4. Gene ontology (GO) terms and Second Level Categories for Cellular component related to the trapped genes in the Exchangeable Gene Trap Clones (EGTC). No.1 represents the number of EGTC mouse lines related to each Second Level Category. No.2 represents the number of EGTC mouse lines related to each GO term.

Table S5. Gene ontology (GO) terms and Second Level Categories for Molecular function related to the trapped genes in the Exchangeable Gene Trap Clones (EGTC). No.1 represents the number of EGTC mouse lines related to each Second Level Category. No.2 represents the number of EGTC mouse lines related to each GO term.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.