Medicago truncatula is one of the model species for legume studies. In an effort to develop legume genetics resources, > 21 700 Tnt1 retrotransposon insertion lines have been generated.
To meet fast-growing needs in functional genomics, two reverse genetics approaches have been established: web-based database searching and PCR-based reverse screening. More than 840 genes have been reverse screened using the PCR-based approach over the past 6 yr to identify mutants in these genes. Overall, a success rate of c. 84% (705 genes) was achieved in identifying mutants with at least one Tnt1 insertion, of which c. 50% (358 genes) had three or more alleles.
To demonstrate the utility of the two reverse genetics platforms, two mutant alleles were isolated for each of the two floral homeotic MADS-box genes, MtPISTILLATA and MtAGAMOUS. Molecular and genetic analyses indicate that Tnt1 insertions in exons of both genes are responsible for the defects in floral organ development.
In summary, we have developed two efficient reverse genetics platforms to facilitate functional characterization of M. truncatula genes.
Medicago truncatula is one of the model species for legume genetics, genomics and functional genomics studies. Over the last two decades, M. truncatula has been utilized for studies in many research areas, including plant nutrition, metabolism, growth and development, plant–microbe interactions, and plant–environment interactions.
With the completion of gene-rich region genome sequencing and annotation (Young et al., 2011), functional characterization of thousands of M. truncatula genes is needed. The function(s) of the vast majority of plant genes may not be correctly predicted just by DNA and/or protein sequence homology. Because of the ambiguity associated with homology-based function assessment, high-throughput approaches are needed to identify gene function(s). Genome-wide transcript profiling studies, protein–protein interaction studies, and comprehensive metabolomic and proteomic studies provide major correlative data and help to gain knowledge of physiological processes, but they are insufficient to assign exact function(s) for individual genes. An efficient reverse genetics platform is needed for dissecting M. truncatula gene function(s).
Utilization of mutants has long been a reliable approach to decipher gene functions through both forward and reverse genetics strategies (Wu et al., 2005). The availability of large-scale mutant populations, generated by EMS, T-DNA insertion, transposon-tagging, gamma radiation or fast-neutron bombardment, has greatly accelerated functional genomics studies in model plant species (Bevan & Walsh, 2005). Phenotype-driven forward genetics, which historically has played an irreplaceable role in the development of modern genetics, is limited for high-throughput gene function analyses. Genetic map-based gene cloning is still a time-consuming and challenging task even in model plants such as Arabidopsis and rice. In the post-genomic era, with the large amount of sequence information generated by genome projects for Arabidopsis and other plant species, reverse genetics with high-throughput sequence-based screening strategies has enabled researchers to functionally characterize hundreds of genes in a relatively short timespan. Although the recently developed VIGS provides a novel alternative tool for reverse genetics, it is not applicable or efficient in many plant species including M. truncatula. Insertional mutagenesis and EMS mutagenesis-based targeting induced local lesions in genomes (TILLING) dominate the reverse genetics platforms in many plant species. Of all the publicly available mutant collections, a large portion was generated by T-DNA or transposons, while a relatively small portion was generated by chemical mutagenesis and analyzed by TILLING (Martienssen, 1998; Krysan et al., 1999; Bennetzen, 2000; Okamoto & Hirochika, 2000; Parinov & Sundaresan, 2000; Sussman et al., 2000; Courtial et al., 2001; Yamazaki et al., 2001; Fladung et al., 2004; Stepanova & Alonso, 2006). 
T-DNA has been successfully used as a powerful insertional mutagen in generating gain-of-function and loss-of-function mutants in Arabidopsis (Weigel et al., 2000; Alonso et al., 2003) and rice (An et al., 2005; Larmande et al., 2008; Fu et al., 2009). The introduction of known sequences in T-DNA mutagenesis makes it easy to recover the flanking sequences and thus to identify insertions in genes. However, because T-DNA insertions occur at low frequency, a large population is required to achieve saturation mutagenesis. For plant species with relatively large genomes and lacking high-throughput in planta transformation, large-scale mutant generation by T-DNA insertion mutagenesis is not practical. Medicago truncatula is one of these species (Brocard et al., 2006, 2008; Scholte et al., 2002).
In the past 6 yr, we have regenerated > 21 700 insertional mutant lines in M. truncatula using Tnt1, a well-characterized tobacco retrotransposon (Grandbastien et al., 1989; d'Erfurth et al., 2003; Tadege et al., 2008). The high efficiency of Tnt1 transposition during tissue culture resulted in multiple Tnt1 inserts in single regenerated M. truncatula lines (from 4 to 50 with an average of 25 insertions per genome; Tadege et al., 2008). This feature of Tnt1 enables us to reach near-saturation mutagenesis of the M. truncatula genome with a relatively small number of mutant lines. The value of the Tnt1-tagged mutant population has already been proven in forward genetics screening (d'Erfurth et al., 2003; Benlloch et al., 2006; Tadege et al., 2008; Wang et al., 2008). To make the most efficient use of these Tnt1 mutants, we developed two reverse genetics approaches. One is PCR-based DNA pool screening and the other is a direct database BLAST search. Both methods were successful in identifying insertion mutants in M. truncatula.
Materials and Methods
Seed scarification and germination
Medicago truncatula Gaertn. seeds (all M. truncatula mutant lines were generated at the Samuel Roberts Noble Foundation) were treated in concentrated sulfuric acid for 8 min, washed thoroughly with H2O, sterilized for 8–10 min in 30% commercial bleach with 0.01% Tween-20, and then washed 3–5 times with autoclaved H2O. The sterilized seeds were placed on wet filter paper or 0.5× MS media and kept at 4°C in the dark for 7–10 d, and then transferred to a 25°C culture room for germination. Seedlings were transferred into 1-gallon pots and grown to maturity in the glasshouse.
Genomic DNA pooling
Approximately 21 700 Tnt1 insertion lines in M. truncatula were generated. Genomic DNA from individual lines was extracted as described previously (Tadege et al., 2008; Cheng et al., 2011). A simple one-dimensional pooling strategy was used to pool the g-DNAs into three levels (Fig. 1). The DNA pooling strategy was as follows: mix 100 μl of genomic DNA from each of 10 individual samples to make a 1-ml mini pool (M-pool); mix 500 μl of genomic DNA from each of 10 M-pools to make a 5-ml pool (P-pool); mix 2 ml of genomic DNA from each of five P-pools to make a 10-ml super pool (S-pool). Therefore, one S-pool contains five P-pools, 50 M-pools and 500 individual lines. To date, we have pooled 42 S-pools, 210 P-pools and 2100 M-pools, which contain 21 000 individual lines. Because the S-pool and P-pool DNA samples are the most frequently used, they were further purified to make them more stable during storage: the DNAs were extracted twice with phenol : chloroform : iso-amyl alcohol (25 : 24 : 1) followed by chloroform : iso-amyl alcohol (24 : 1). The DNAs were precipitated with 1/10 volume of sodium acetate and an equal volume of 2-propanol, washed with 75% ethanol, dried and dissolved in the original volume of water. P-pool and S-pool DNA samples were aliquoted in 500 μl each. One aliquot of each P-pool and S-pool was kept at 4°C for routine use, and the remaining aliquots of P-pools and S-pools, M-pools and individual DNA samples were stored at −20°C.
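The pooling arithmetic above can be sketched in a few lines of Python. Only the pool sizes (10 lines per M-pool, 10 M-pools per P-pool, five P-pools per S-pool) come from the text; the zero-based numbering, the assumption that lines are pooled consecutively, and the function names pool_address and pool_counts are illustrative and not part of the original protocol.

```python
# Pool sizes taken from the text; everything else is a hypothetical illustration.
LINES_PER_M_POOL = 10
M_POOLS_PER_P_POOL = 10
P_POOLS_PER_S_POOL = 5

LINES_PER_P_POOL = LINES_PER_M_POOL * M_POOLS_PER_P_POOL  # 100 lines
LINES_PER_S_POOL = LINES_PER_P_POOL * P_POOLS_PER_S_POOL  # 500 lines


def pool_address(line_index):
    """Return zero-based (S-pool, P-pool, M-pool) indices for a zero-based line index.

    Assumes lines are pooled consecutively, which the source does not state.
    """
    if line_index < 0:
        raise ValueError("line_index must be non-negative")
    return (
        line_index // LINES_PER_S_POOL,
        line_index // LINES_PER_P_POOL,
        line_index // LINES_PER_M_POOL,
    )


def pool_counts(n_lines):
    """Number of complete (S, P, M) pools that n_lines fill."""
    return (
        n_lines // LINES_PER_S_POOL,
        n_lines // LINES_PER_P_POOL,
        n_lines // LINES_PER_M_POOL,
    )
```

With 21 000 pooled lines, pool_counts(21000) gives the 42 S-pools, 210 P-pools and 2100 M-pools reported above.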
Tnt1 primer design
Three forward primers and three reverse primers were designed for Tnt1 (Fig. 1b; Supporting Information Table S1). Primers Tnt1-F, Tnt1-F1 and Tnt1-R, Tnt1-R1 were used for PCR screening, whereas primers Tnt1-F2 and Tnt1-R2 were used for PCR product sequencing.
Gene-specific primer design
Two pairs of gene-specific primers (GSPs) were designed based on the genomic sequence of the gene (Fig. 1b; Table S1). If a genomic sequence was not available, the GSPs were designed based on the cDNA or EST sequences. In this case, it is possible that primers designed from cDNA or EST may span an exon/intron junction and fail to amplify the genomic fragment during the screening. We used these basic guidelines for GSP design: primer length is 22–24 bp with 9–11 G/C to match the melting temperatures of Tnt1 primers; there are no G or C clusters longer than 4 in the primers; there is one G or C in the last two nucleotides at the 3′ end of each primer. If a gene is larger than 4 kb, it can be split into two or more fragments, each no larger than 3.5 kb, and two pairs of primers are designed for each fragment. The primers for multi-fragment genes are designed in such a way that the first fragment overlaps at least 50 bp with the second fragment, the second fragment overlaps at least 50 bp with the third fragment and so on. The amplification efficiency of gene-specific primers was tested by PCR using both wild-type A17 (the reference genome) and R108 (the ecotype in which Tnt1 lines were generated) genomic DNA as templates before proceeding to the reverse screening of DNA pools.
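The GSP design guidelines lend themselves to an automated check. The following is a minimal sketch, not the authors' tool: the function name check_gsp is hypothetical, a 'cluster' is assumed to mean any run of consecutive G and/or C bases, and 'one G or C in the last two nucleotides' is assumed to mean exactly one. The example sequence in the usage note is made up.

```python
import re


def check_gsp(primer):
    """Return a list of guideline violations for a candidate gene-specific primer."""
    seq = primer.upper()
    problems = []
    if not 22 <= len(seq) <= 24:
        problems.append("length not in 22-24 nt")
    gc = sum(seq.count(base) for base in "GC")
    if not 9 <= gc <= 11:
        problems.append("G/C count not in 9-11")
    # Assumption: a 'cluster' is any run of consecutive G and/or C bases.
    if re.search(r"[GC]{5,}", seq):
        problems.append("G/C run longer than 4")
    # Assumption: 'one G or C in the last two nucleotides' means exactly one.
    if sum(base in "GC" for base in seq[-2:]) != 1:
        problems.append("last two 3' bases do not contain exactly one G/C")
    return problems
```

For instance, check_gsp("ATGCATTGCAAGTCTAGCATGA") returns an empty list under these assumptions, whereas a primer beginning with a run of five G bases would be flagged.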
PCR reactions and product analysis
Primary PCR (1st PCR)
ExTaq (Takara Bio Inc., Shiga, Japan) was used for all PCR reactions. The PCR master mixture was prepared according to the manufacturer's protocol with 1 μM of GSP primer (GSP-F or GSP-R) and 0.25 μM of Tnt1 primer (Tnt1-F or Tnt1-R). Aliquots of 37 μl of the mixture were placed into each PCR tube and 3 μl of super-pool DNA added to each. PCR was run using a touch-down program as follows: 95°C for 5 min; five cycles of 94°C for 30 s, 60°C for 30 s and 72°C for 2.5 min; five cycles of 94°C for 30 s, 57.5°C for 30 s and 72°C for 2.5 min; 25 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 2.5 min; 72°C for 5 min; hold at 10°C.
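As an illustration, the touch-down program can be written down as a small data structure, which makes the cycle count and the programmed run time easy to check. The temperatures, durations and cycle numbers are transcribed from the text; the representation and the function names are assumptions made for this sketch, and ramp times between temperatures are ignored.

```python
# Touch-down program transcribed from the text: (cycles, [(temp_C, seconds), ...]).
TOUCHDOWN_PROGRAM = [
    (1, [(95.0, 300)]),                           # initial denaturation
    (5, [(94.0, 30), (60.0, 30), (72.0, 150)]),   # annealing at 60 C
    (5, [(94.0, 30), (57.5, 30), (72.0, 150)]),   # annealing at 57.5 C
    (25, [(94.0, 30), (55.0, 30), (72.0, 150)]),  # annealing at 55 C
    (1, [(72.0, 300)]),                           # final extension
]


def amplification_cycles(program):
    """Count cycles in stages that have denature/anneal/extend steps."""
    return sum(cycles for cycles, steps in program if len(steps) == 3)


def hold_time_minutes(program):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    seconds = sum(cycles * sum(t for _, t in steps) for cycles, steps in program)
    return seconds / 60
```

Under this reading the program runs 35 amplification cycles and takes at least 132.5 min of programmed hold time per round, before the final 10°C hold.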
Secondary PCR (nested-PCR)
After the first PCR, the PCR products were diluted 50 times and then 2-μl aliquots of diluted 1st PCR products were used as the template for the nested-PCR. The nested-PCR reaction mixture was prepared with 0.25 μM of GSP (GSP-F1 or GSP-R1) and Tnt1 nested primers (Tnt1-F1 or Tnt1-R1); 38-μl aliquots of the PCR mixture were put in each tube. The PCR was run as for primary PCR. Then 10-μl aliquots of PCR products from individual nested-PCRs were separated side-by-side on 1% agarose gels. The remaining 30 μl nested-PCR products from an S-pool showing positive bands were purified using the QIAquick PCR purification kit (Qiagen) following the manufacturer's protocol, except that the products were eluted in 30 μl of H2O. The concentration of the PCR product was measured using a Nanodrop spectrometer (Nanodrop Technologies Inc., Wilmington, DE, USA). Purified PCR products were sequenced using primer Tnt1-F2 or Tnt1-R2 depending on the primers used for the nested PCR reaction. The sequences of PCR products were aligned with the sequences of genes-of-interest to confirm the gene-specific insertion(s) and to reveal the exact insertion site(s). If only cDNA/EST/TC sequences are available for a target gene and the insertion is in an intron region, the insertion-flanking sequences will not form a contig with cDNA/EST/TC sequences. In this case, the insertion will be missed.
Database architecture and content
The mutant database was constructed using a Fedora Linux system and maintained using MySQL, a relational database management system. The data collected in the current database includes 10 884 Tnt1 mutant lines, which were developed by the Noble Foundation and collaborators in European groups. Currently 44 238 flanking sequence tags (FSTs) from 3436 mutant lines were generated and are BLAST-searchable in this database.
In order to identify the positions in the Medicago genome for the FSTs, all FSTs were compared against the Medicago genomic sequence (Mt v3.5) using BLASTN. The Medicago genomic sequences used for BLAST were downloaded from the Medicago resources website (http://www.medicagohapmap.org/?genome). FSTs were mapped to Medicago pseudo-chromosomes if the BLASTN results met the following criteria: (1) > 80% identity; (2) ratio of the high scoring segment pair (HSP) length to the FST length > 90%; (3) expected values < 0.01; (4) HSP starting at < 15 bp from the beginning of an FST. If an FST had several hits in the Medicago genome, we chose the hit with the highest bit score value. The BLAST results for these mapped FSTs can be viewed as a hyperlinked HTML page. The criteria for identifying insertion sites are not very stringent: the mapped positions are intended as a useful reference for insertion positions in the Medicago genome, and users need to verify the exact insertion sites.
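The four mapping criteria and the best-hit rule can be expressed as a simple filter. This is a sketch under stated assumptions rather than the database's actual code: the Hit class and its field names are hypothetical, the HSP start is treated as a 1-based position within the FST, and the criteria are applied before the highest bit score is chosen (the text does not say in which order the two steps were performed).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTN high-scoring segment pair; field names are hypothetical."""
    percent_identity: float
    hsp_length: int
    fst_length: int
    evalue: float
    query_start: int  # assumed 1-based start of the HSP within the FST
    bit_score: float


def passes_criteria(hit):
    """Apply the four mapping criteria listed in the text."""
    return (
        hit.percent_identity > 80.0
        and hit.hsp_length / hit.fst_length > 0.90
        and hit.evalue < 0.01
        and hit.query_start < 15
    )


def best_mapping(hits):
    """Return the passing hit with the highest bit score, or None if none passes."""
    passing = [h for h in hits if passes_criteria(h)]
    return max(passing, key=lambda h: h.bit_score, default=None)
```

A hit that covers only half of the FST is rejected by criterion (2) even if its bit score is the highest, so best_mapping falls back to the best hit that satisfies all four criteria.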
In addition to the FSTs, photos and phenotype descriptions for most of those mutant lines are also included in the database after phenotypic screening. In total 11 544 pictures for 10 158 lines and 1175 pieces of phenotypic description information are available and can be searched by line numbers or key words.
In order for researchers to view alternate annotations in nearby genomic regions of mapped mutant insertions, we incorporated a platform-independent web application, GBrowse, developed by Stein et al. (2002) as the Generic Model Organism System Database Project (GMOD; http://www.gmod.org). The GMOD GBrowse viewer in combination with the MySQL database is used to store, search and display annotations and other features of the genome aligned to the genomic sequence. We downloaded all predicted gene models from IMGAG and mapped FSTs, Tentative Consensus (TC) sequences and Affymetrix probesets to the Medicago genome by BLAST search for constructing GBrowse in this database. The information for all these mapped features was imported into GBrowse for visualization. Thus, the GBrowse web pages are organized alongside features including predicted gene models, TCs, FSTs and Affymetrix probesets. To select a more specific region of the genome, users can enter either a precise sequence range or any valid Medicago identifier (e.g. gene model name, TC number, FST or probeset ID) in the ‘Landmark or Region’ box. GBrowse then fetches a highlighted feature or a region of the genome specified by the user's search criteria. In addition, we have generated a gene expression atlas (Benedito et al., 2008) that provides a global view of gene expression in all major organ systems of Medicago, with special emphasis on nodule and seed development. To allow researchers to view the expression profile of a gene-of-interest in the region where mutant insertions are located, the gene expression information collected in the gene expression atlas was also integrated into our website through GBrowse. Users may examine the gene expression levels and patterns in plots via the link on an Affymetrix probeset bar presented on the GBrowse web pages.
The BLAST program, including BLASTN, TBLASTN and TBLASTX, is integrated into this mutant database. Users can use either nucleotide sequences or protein sequences as queries to search the FST database for mutant identification. The search results will be displayed with the corresponding ID of mutant lines as well as the alignments of homologous sequences. If one or more FSTs match the gene sequence, seeds can be ordered online from our website (http://medicago-mutant.noble.org/mutant/).
RNA extraction and RT-PCR
Inflorescence shoots were collected from both heterozygous and homozygous plants of mtpi and mtag, and wild-type R108 plants. Total RNA was extracted using Tri-Reagent (Gibco-BRL Life Technologies, Grand Island, NY, USA) and treated with Turbo DNase I (Ambion, Austin, TX, USA). For RT-PCR, 3-μg samples of total RNA were used for reverse transcription using SuperScript III Reverse Transcriptase (Invitrogen) with oligo(dT)20 primer. Two microlitres of 1 : 20 diluted cDNA were used as template for each 30-μl PCR reaction. MtACTIN2 was used as a standard control. Gene-specific primers used for RT-PCR are listed in Table S1.
Searching the flanking sequence tags (FSTs) database
Based on the genome size and average gene size, it has been estimated that 20 000 Tnt1 insertion lines with an average of 25 insertions per line are required to reach c. 90% saturation of the Medicago truncatula genome (Tadege et al., 2008). From 2003 to 2011, 21 700 Tnt1 lines were generated at the Samuel Roberts Noble Foundation. As described earlier (Tadege et al., 2008), we have been recovering the flanking sequence tags (FSTs) from the regenerated Tnt1 lines using thermal asymmetric interlaced (TAIL)-PCR (Liu et al., 1995, 2005) to create an FST database (http://medicago-mutant.noble.org/mutant). Currently, the database includes c. 44 000 FSTs from c. 3400 regenerated Tnt1 lines. This FST database can be searched using the sequence of the gene to study. Ideally, one uses the genomic sequence of the gene to search the FST database. If only the coding sequence is available it can also be used. In the latter case insertions in the intronic regions will not be detected. Even though we estimate an average of c. 25 Tnt1 insertions per line (Tadege et al., 2008), the FSTs recovered by TAIL-PCR are far fewer than 25 per line due to the limitation of the method itself and high cost of the Sanger sequencing. We are in the process of exploring a high-throughput sequencing approach to recover FSTs in a more efficient way. Once the approach becomes practical, we will recover FSTs from all 21 700 lines. By that time, the number of total FSTs will be dramatically increased and the probability of finding an insertion in a given gene by BLAST search in the FST database will be greatly enhanced. Even in the current low-coverage FST database, we tested FST BLAST search using genomic sequences of 50 candidate genes and found at least one insertion in 16 genes (data not shown).
PCR-based reverse screening in Tnt1 insertion population
The genomic DNA (g-DNA) from 21 700 Tnt1 lines was extracted and pooled into three-level pools using a one-dimensional pooling strategy (Fig. 1a; see the 'Materials and Methods' section). The pooled g-DNA system – representing 42 super pools (S-pools), 210 pools (P-pools), 2100 mini-pools (M-pools), and 21 000 individual genomic DNAs – was used for reverse screening.
Before the PCR-based screening, two pairs of gene-specific primers (GSP) were designed for every gene of interest (Fig. 1b). The specificity and efficiency of primers were tested using wild-type (R108) genomic DNA. Three primers were designed on both ends of Tnt1 (Fig. 1b). To begin the PCR screening, combinations of gene-specific primers with Tnt1 primers were used to selectively amplify the Tnt1-tagged gene-of-interest amplicons from the S-pools. A schematic screening procedure is shown in Fig. 2. There are four primer combinations for screening each gene of interest: GSP-F with Tnt1-F, GSP-F with Tnt1-R, GSP-R with Tnt1-F, and GSP-R with Tnt1-R. First, combinations of GSP-F with Tnt1-F and GSP-F with Tnt1-R were used to screen the S-pools. Theoretically, any insertions within the region covered by GSP-F and GSP-R should be recovered using these two primer combinations. However, in some circumstances, specific amplification fails if the insertion site is far from the GSP-F and a large amplicon may not be efficiently amplified or if the GSP-F does not closely match with Tnt1 primers under the given PCR conditions. Second, if no specific amplicon was detected, combinations of GSP-R with Tnt1-F and GSP-R with Tnt1-R were then used to re-screen the S-pools.
For each screening of the S-pools, two rounds of PCR amplification were carried out with a touch-down PCR program (see the 'Materials and Methods' section for details). The genomic DNA from S-pools was used as templates for the first round of PCR amplification and 50-fold diluted first round PCR products were used as templates for the second round of PCR (nested-PCR) amplification. Normally, no specific amplicons from the first round PCR were visible in agarose gels; the true positives which give bright bands in agarose gels were observed in the nested-PCR amplification (Fig. 3a). Positive PCR products were purified and sequenced for identity confirmation and insertion site identification. If the gene-specific insertion(s) from the S-pool screening were confirmed, further screening was sequentially performed in the corresponding lower level pools using the same primer combination that is used in the S-pool screening.
There are five P-pools in one S-pool. When P-pools were screened, the same touch-down program was used for two rounds of PCR amplification, except for adjusting the extension time depending on the size of the confirmed S-pool PCR products. True positive products from the P-pool screening, which showed the same size as the one from the S-pool screening, were found in one of the five P-pools (Fig. 3b). Usually it is not necessary to sequence the PCR products from P-pools. After the product was obtained in a specific P-pool, the screening was followed in the corresponding 10 M-pools. PCR results from the first round amplification were checked to see if the same size PCR product was found in one of the 10 M-pools (Fig. 3c). If no clear PCR product with the expected size was obtained, then nested-PCR was carried out to obtain the positive product in a particular M-pool. The screening was continued in the corresponding 10 individual lines as described above (Fig. 3d). The final PCR product was purified and sequenced to re-confirm the identity of the PCR product to the original one from the S-pool. This allows identification of one or more Tnt1 insertion lines for a single gene-of-interest.
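The top-down screening described above can be sketched as a small simulation that counts how many DNA templates have to be tested for one primer combination. The predicate is_positive_line is a hypothetical stand-in for a confirmed PCR band in an individual line, lines are numbered from zero and assumed to be pooled consecutively, and a pool is treated as positive whenever any of its lines is positive (an idealization that ignores failed amplifications).

```python
def screen(is_positive_line, n_s_pools=42):
    """Walk S-pool -> P-pool -> M-pool -> line, testing only members of positive pools.

    Returns (positive line indices, number of templates tested).
    """
    def pool_positive(start, size):
        # Idealized pool result: positive if any member line is positive.
        return any(is_positive_line(i) for i in range(start, start + size))

    tested = 0
    hits = []
    for s in range(n_s_pools):                      # every S-pool is screened
        tested += 1
        if not pool_positive(s * 500, 500):
            continue
        for p in range(s * 5, s * 5 + 5):           # five P-pools per S-pool
            tested += 1
            if not pool_positive(p * 100, 100):
                continue
            for m in range(p * 10, p * 10 + 10):    # 10 M-pools per P-pool
                tested += 1
                if not pool_positive(m * 10, 10):
                    continue
                for line in range(m * 10, m * 10 + 10):  # 10 lines per M-pool
                    tested += 1
                    if is_positive_line(line):
                        hits.append(line)
    return hits, tested
```

In this idealized model a single insertion line is located after testing 42 + 5 + 10 + 10 = 67 templates, rather than 21 000 individual DNAs; each template may still require both the primary and the nested PCR.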
Modification of the standard screening procedure to identify insertions in multiple genes
Because individual genomic DNA was extracted from the original regenerated lines (R0), the availability and amount of the g-DNA is limited and not reproducible. To maximize the utilization of the pooled DNA, we developed a multiple-gene screening method. In this modified method, three GSP primers from three different genes, in combination with Tnt1-F or Tnt1-R primers, were used in each PCR reaction in the S-pool screening; that is, three genes were screened at the same time (Fig. 4). Positive PCR products were purified and sequenced to identify Tnt1 insertions in a specific gene and to specify insertion sites in given genes. If mixed products were detected in any specific S-pools, individual nested-PCR was performed using the single nested GSP primer with Tnt1-F1 or R1 in the reaction. The screening procedure in lower pools was the same as in the standard method.
In order to compare the screening efficiency for target genes between the two methods, three genes (Gene A, B and C) were screened in 24 super pools as described above (Fig. 4). Using the standard screening method, 18 positive products for three genes were detected in 14 out of the 24 S-pools with individual GSP-F and Tnt1-F primer combination (FF; Fig. 4a, lower panels), and nine products in nine out of the 24 S-pools with GSP-F and Tnt1-R primers (FR; Fig. 4b, lower panels). Using the multiple-gene screening method, 11 single products and one mixed product (S36FF) were detected in 12 out of the 24 S-pools with FF primers (Fig. 4a, upper panel), and eight products in eight out of 24 S-pools with FR primers (Fig. 4b, upper panel). The nested-PCR was re-run in S36 with individual primer pairs to obtain two single products. Most of the products (21 out of 27) that were amplified by the single-gene method were detected in the corresponding pools by the multiple-gene screening method; however, all positive products resulting from the multiple-gene screening method were detected by the standard method. Three products, two from FF primers and one from FR, were missed in S17, S30 and S23 pools using the multiple-gene screening method. Products for more than one gene were detected in four S-pools (S23, S25, S34 and S36) using the single-gene method, whereas mixed products were detected in one pool (S36) and only small-sized products were amplified in the other three pools (S23, S25, and S34). Sequence analysis showed that 20 flanking sequences from the multiple-gene method were identical to corresponding products from the standard method, and one sequence (S25FF) had mixed readings, indicating a mixture of products, B-S25FF and C-S25FF, was amplified in the same S-pool. The nested-PCR was re-run in S25 with individual primer pairs and two single products were obtained. 
We further tested several additional sets of three genes using both methods and similar results were obtained, indicating that most Tnt1 insertions were detected by the multiple-gene screening method which significantly conserves the S-pool DNA and also saves time and resources. Therefore, the multiple-gene screening method has been used as the default method for the PCR-based reverse screening.
Furthermore, we compared the screening success rates of the two methods. Eighty-one genes were screened in 14–18 S-pools by the single-gene screening method. One or more Tnt1 insertions were identified in 69 genes, a success rate of c. 85%. In the same 14–18 S-pools, 280 genes were screened by the multiple-gene screening method, and one or more Tnt1 insertions were identified in 226 genes, a success rate of c. 80% (Table 1). However, the success rate of the multiple-gene method increased as the total number of S-pools increased (Table 1). In 2011, 235 genes were screened in 36 S-pools using the multiple-gene method. Two hundred and three genes were detected with one or more Tnt1 insertions, a success rate of 86%. More recently, 89 genes were screened in 42 S-pools. Seventy-eight genes were detected with one or more Tnt1 insertions, a success rate of 87.6% (Table 1). In summary, the multiple-gene screening strategy described here allowed us to screen Tnt1 insertion lines for three genes in 42 S-pools with 500 or fewer PCR reactions and achieve a success rate of 87% (Table 1).
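The success rates quoted above follow directly from the gene counts. A minimal sketch of the arithmetic is given below; the counts are those reported in the text, while the labels and the function name success_rate are chosen for illustration.

```python
# (label, genes screened, genes with >= 1 Tnt1 insertion), as reported in the text.
SCREENS = [
    ("single-gene, 14-18 S-pools", 81, 69),
    ("multiple-gene, 14-18 S-pools", 280, 226),
    ("multiple-gene, 36 S-pools", 235, 203),
    ("multiple-gene, 42 S-pools", 89, 78),
]


def success_rate(screened, with_insertion):
    """Percentage of screened genes with at least one insertion, to one decimal."""
    return round(100.0 * with_insertion / screened, 1)


for label, screened, found in SCREENS:
    print(f"{label}: {found}/{screened} = {success_rate(screened, found)}%")
```

The computed values (85.2%, 80.7%, 86.4% and 87.6%) agree with the rounded rates given in the text.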
Table 1. Effect of Medicago truncatula Tnt1 insertion population size on screening success rates (rows include: genes without Tnt1 insertion; success rate, %)
Analysis of Tnt1 insertion frequency with the screened genes
So far > 840 genes, requested from 75 different laboratories in 15 different countries, have been screened for Tnt1 insertions. Out of these genes, 655 genes have a gene identification number (ID) and are distributed to the eight pseudo-chromosomes (Mt3.5; Fig. S1). For the remaining 185 genes, some have genomic sequences without gene IDs, whereas some only have cDNA or EST sequences. These genes were not mapped to the pseudo-chromosomes.
Tnt1 insertion preference
The smallest gene we screened for an insertion thus far was 486 bp, whereas the largest gene was 25.4 kb. We analyzed the correlation between the gene size and the screening success rate. All 840 screened genes were classified into five groups based on their sizes: < 1.0, 1.0–4.0, 4.0–7.0, 7.0–10.0 and > 10.0 kb. The screening success rates for each group were 77.9%, 86.3%, 80.8%, 79.1% and 54.8%, respectively (Table 2). The success rates for the first four groups, covering gene sizes up to 10.0 kb, were similar (77.9–86.3%). Surprisingly, when the gene size was > 10.0 kb, the screening success rate dropped dramatically, even though large genes were divided into multiple 3–4 kb fragments to cover all the gene regions. For small genes, Tnt1 insertions were sometimes identified in the promoter or 3′ UTR regions.
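The size classification can be reproduced with a simple binning function. The class boundaries and the reported success rates come from the text; the handling of genes that fall exactly on a boundary is an assumption (they are assigned to the upper class here), and the names below are illustrative.

```python
import bisect

# Class boundaries in kb, from the text; labels mirror the five groups.
BOUNDARIES_KB = [1.0, 4.0, 7.0, 10.0]
LABELS = ["< 1.0", "1.0-4.0", "4.0-7.0", "7.0-10.0", "> 10.0"]
# Reported screening success rates (%) for the five classes.
REPORTED_RATE = dict(zip(LABELS, [77.9, 86.3, 80.8, 79.1, 54.8]))


def size_class(gene_length_bp):
    """Assign a gene to one of the five size classes.

    Assumption: a gene exactly on a boundary goes to the upper class;
    the source does not say how such cases were handled.
    """
    return LABELS[bisect.bisect_right(BOUNDARIES_KB, gene_length_bp / 1000.0)]
```

For example, the smallest screened gene (486 bp) falls in the "< 1.0" class and the largest (25.4 kb) in the "> 10.0" class, for which the reported success rate is 54.8%.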
Table 2. Effect of gene size on the screening success rate of Medicago truncatula Tnt1 mutants
In order to determine whether the transcription rate can influence the insertion frequency, we searched the Medicago Gene Expression Atlas and compared the overall expression level of four sets of ten genes in the following groups: (1) c. 3.0 kb with eight or more insertions, (2) > 7.0 kb with eight or more insertions, (3) c. 3.0 kb without insertions, and (4) > 7.0 kb without insertions. No apparent correlations between the expression levels and the insertion frequency were observed among the four groups.
Miyao et al. (2003) showed that the rice retrotransposon Tos17 inserts more frequently in kinases and resistance genes. The 840 genes screened for Tnt1 insertion fall into different functional categories: nucleotide binding, protein binding, transcription factors, metabolic enzymes, transporters, kinases, hypothetical proteins, etc. The screening success rates for individual functional categories were analyzed. Results showed that the success rates ranged from the highest (88.6%) in kinase genes to the lowest (82.7%) in nucleotide binding genes (Table S2). All of them were close to the overall success rate of 84%, indicating that Tnt1 insertion may not favor any specific categories of gene functions. However, a larger number of insertions need to be analyzed to further confirm the above statement.
Multiple Tnt1 insertions
Multiple Tnt1 insertion alleles for a single gene were frequently detected during the screening. To examine whether screening for mutants in larger genes results in more Tnt1 insertion alleles, we analyzed the correlation between the gene size and the alleles obtained (Table 3). The result shows that > 50% of the screened genes with sizes of 1–4 kb have three or more Tnt1 insertion alleles recovered. The rates are 40.2% and 37% for genes smaller than 1.0 kb and between 4.0 and 10.0 kb, respectively, indicating that the highest rate of multiple insertions was observed in medium-sized genes. The rate is only 7% for genes larger than 10 kb. Taken together, the screening results of 840 genes indicate that Tnt1 may prefer to insert in medium-sized genes and does not favor large genes.
Table 3. Effect of gene size on the insertion number in Medicago truncatula Tnt1 mutants (columns: gene size classes from < 1.0 kb to > 10.0 kb; rows: genes with more than 3 Tnt1 insertions, total genes screened, multiple insertion percentage)
The difference between the Tnt1 insertion numbers among the 840 genes is noteworthy. Amongst these genes, no Tnt1 insertion alleles were identified for 142 genes, whereas eight or more insertion alleles were found for 68 genes. Among the 68 genes with eight or more insertion alleles, some genes had as many as 20 Tnt1 insertion alleles. Analysis of genes with no-insertion and multiple-insertion alleles indicates that both classes of genes are distributed across all eight pseudo-chromosomes and fall into different functional categories. To examine whether Tnt1 insertion cold spots or hot spots exist, we mapped the genes with no insertion and genes with multiple insertion alleles, represented by green or red dots, respectively, to the eight pseudo-chromosomes, and no aggregated green or red dots were observed (Fig. S1). The size of most genes with multiple insertion alleles ranged between 1.0 and 7.0 kb and none was larger than 9.0 kb. However, the screening results did not exclude the possibility that some genes with large genomic size might have eight or more Tnt1 insertion alleles in the whole insertion population because the screening was initially carried out in 24 S-pools with primer combinations of GSP-F and Tnt1-F or Tnt1-R. Furthermore, if multiple insertion alleles were detected for a specific gene, no further screening with other primer combinations was carried out in the remaining super pools. Therefore, our default screening procedure does not necessarily maximize the recovery of Tnt1 insertions in a gene in the entire Tnt1 insertion population.
Among the genes with multiple Tnt1 insertion alleles, 10–20 Tnt1 insertion alleles were obtained for some genes. To check whether Tnt1 insertions in the alleles were clustered in certain region(s) of the individual genes, we examined Tnt1 insertion sites of alleles in 10 individual genes. These results indicated that the Tnt1 insertions were randomly distributed in all 10 genes examined (data not shown).
A case study
Screening for insertion mutants for genes-of-interest
To compare the two reverse genetics approaches and demonstrate the utility of our Tnt1 mutant collection for gene function analyses, we selected two homeotic MADS-box genes, MtPISTILLATA (MtPI) and MtAGAMOUS (MtAG). MtPI is a class B MADS-box gene and is required for petal and stamen identity specification during flower development. Mutation of PI leads to loss of petal and stamen identity in M. truncatula and Arabidopsis (Goto & Meyerowitz, 1994; Kramer et al., 1998; Benlloch et al., 2009). Arabidopsis AGAMOUS (AtAG) is a class C MADS-box gene controlling stamen and carpel development in Arabidopsis (Gomez-Mena et al., 2005). Although MtAG has not been characterized before, an EST (accession number: CX539597) for MtAG, whose deduced protein shares 70% sequence identity with Arabidopsis AGAMOUS, was obtained from the M. truncatula EST database. We first searched the FST database for Tnt1 insertion lines in MtPI and MtAG. We did not find a true hit for MtAG but found one insertion line (mtpi-3, NF5477) for MtPI. In mtpi-3, the Tnt1 is inserted in the first intron of MtPI and homozygous plants did not exhibit visible phenotypes. We then designed two sets of primers for each gene (MtPI-F and MtPI-F1, MtPI-R and MtPI-R1, MtAG-F and MtAG-F1, MtAG-gR and MtAG-gR1) based on the MtPI genomic sequence (2181 bp) and the MtAG EST sequence (1150 bp). Combined with Tnt1 forward and reverse primers, PCR-based reverse screening was carried out in the Tnt1 insertion population as described above. We identified two insertion lines, NF13337 (mtpi-1) and NF5318 (mtpi-2), for MtPI and two insertion lines, NF10148 (mtag-1) and NF13380 (mtag-2), for MtAG. Sequence alignment revealed that NF5318 has a Tnt1 insertion at the 1935th bp (in the 5th exon) and NF13337 has an insertion at the 1112th bp (in the 4th exon) from the ATG of MtPI (Fig. 5a). NF10148 and NF13380 harbor Tnt1 insertions at the 272nd bp and the 260th bp from the ATG of MtAG, respectively (Fig. 6a).
Molecular characterization of the insertion mutants
For the MtPI insertion lines, four individual plants (plant numbers 1, 2, 3 and 5) from NF13337 and three plants (plant numbers 2, 3 and 5) from NF5318 were found to contain the Tnt1 insertion (Fig. 5b). Of these plants with Tnt1 insertions, two plants (plant numbers 2 and 5) from NF13337 and one (plant number 3) from NF5318 were homozygous for the Tnt1 insertion (Fig. 5c). Gene expression analyses indicated that no MtPI transcript was detected in the homozygous Tnt1 mutant plant of NF5318 (plant number 3), whereas a slightly smaller transcript of MtPI was detected in homozygous mutant plants of NF13337 (plant numbers 2 and 5; Fig. 5d). Sequence analysis of the smaller transcript indicated that there was a 45-bp deletion between the 4th and 5th exons of MtPI in the mtpi-1 mutant. The 45-bp fragment, which is located between the Tnt1 insertion and the 4th intron, was spliced out together with the inserted Tnt1 and the 4th intron, leaving a smaller MtPI transcript. Because the deletion is in frame, the deduced protein of the truncated MtPI transcript is 15 aa shorter than the wild-type MtPI protein. For the MtAG Tnt1 insertion lines NF10148 and NF13380, the progeny in which the Tnt1 insertion was detected by PCR are shown in Fig. 6(b), and the identification of homozygous Tnt1 mutant plants is shown in Fig. 6(c). The transcript of MtAG was undetectable in homozygous Tnt1 insertion plants of both alleles (Fig. 6d).
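The reading-frame argument for the 45-bp deletion can be checked with a few lines of arithmetic. The helper name below is hypothetical and the sketch only tests divisibility by three; it does not model the MtPI sequence itself.

```python
def deletion_effect(deleted_bp: int):
    """Return (in_frame, codons_lost) for a deletion within coding sequence.

    A deletion preserves the reading frame only if its length is a
    multiple of three; codons_lost is None for a frameshift.
    """
    in_frame = deleted_bp % 3 == 0
    codons_lost = deleted_bp // 3 if in_frame else None
    return in_frame, codons_lost


# The 45-bp deletion described for the mtpi-1 transcript.
in_frame, aa_lost = deletion_effect(45)
```

For 45 bp the deletion is in frame and removes 15 codons, matching the 15-aa shortening of the deduced protein stated in the text.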
Morphological characterization of the insertion mutants
Both alleles of mtpi and mtag Tnt1 insertion mutants displayed defects in flower development (Fig. 7). Other siblings from the segregating population of both alleles were either wild-type or heterozygous for the PI locus or the AG locus but contained all other unlinked insertions (homozygous or heterozygous) of the parent lines. However, these siblings showed no flower phenotypes. The mutant mtpi-1, which has a smaller transcript of MtPI, exhibited a weak flower phenotype. Petals in the second whorl were partially transformed into yellow-greenish sepal-like structures, whereas stamens were normal looking and sepals and carpels remained normal (Fig. 7b,c). However, mutant mtpi-2, which has no MtPI transcripts detected, showed severe flower defects with loss of floral organ identity in the second and third whorls. Petals were completely transformed into five green sepal-like structures. Stamens of mutant mtpi-2 in the third whorl developed into carpel-like structures with ovules inside (Fig. 7d,e). No seeds were produced in either mutant. MtAG mutants mtag-1 and mtag-2 exhibited a similar but weak phenotype in carpel development. The first and second whorls of sepals and petals were the same as in wild-type flowers. Stamens in the third whorl were normal looking. Only carpels in the inner whorl were defective with two stigmas formed, un-fused carpel edges and exposed ovules (Fig. 7g–j). Flowers were generally sterile, although occasionally a few seeds were produced in mtag-2. This preliminary phenotypic analysis of these two flower mutants is in agreement with the roles of these genes in controlling floral identity in other plant species and demonstrates the usefulness of our Tnt1 mutant collection.
In this paper we demonstrated two successful and efficient reverse genetics approaches for utilizing the M. truncatula mutant population: PCR-based reverse screening and web-based FST database searching. The combined approaches enable us to quickly find Tnt1 insertions in genes of interest. The PCR-based screening approach allows researchers to focus on a small number of genes of interest. The modified multiple-gene screening method enables us to screen three genes in 42 super pools, which include 21 000 individual Tnt1 insertion lines, with c. 500 PCR reactions in 3–4 wk. The success rate of the PCR screening method is c. 86%, which is comparable to the success rate in Arabidopsis (Krysan et al., 1999; Stepanova & Alonso, 2006). Furthermore, multiple insertion alleles, which are randomly distributed across the gene coding regions, can be obtained for c. 50% of medium-sized genes. The availability of multiple insertion alleles, on the one hand, makes it possible to choose insertion lines with suitable insertion sites (e.g. in exons); on the other, it allows loss-of-function phenotypes of genes of interest to be confirmed quickly without pursuing time-consuming transformation experiments for gene complementation.
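The scale of a screening run can be worked through as a rough sketch. The line, super-pool and gene counts come from the text; the assumption that each gene is tested with four primer combinations per super pool (a gene-specific forward or reverse primer paired with a Tnt1 forward or reverse primer, with GSP-R as a hypothetical label for the reverse gene-specific primer) is ours, and the actual protocol may count reactions differently.

```python
import itertools

LINES = 21_000        # individual Tnt1 insertion lines (from the text)
SUPER_POOLS = 42      # super pools screened (from the text)
GENES_PER_RUN = 3     # genes screened together (from the text)

# Assumption: every gene-specific primer is paired with every Tnt1 primer.
gene_primers = ["GSP-F", "GSP-R"]
tnt1_primers = ["Tnt1-F", "Tnt1-R"]
combos = list(itertools.product(gene_primers, tnt1_primers))

lines_per_pool = LINES // SUPER_POOLS
reactions = SUPER_POOLS * GENES_PER_RUN * len(combos)
```

Under these assumptions each super pool holds 500 lines and a three-gene run needs 504 reactions, which is in line with the c. 500 PCR reactions quoted above.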
Insertion bias, in which insertions prefer genic regions and the insertion density is closely correlated with the gene density, has been reported by several authors (Alonso et al., 2003; Miyao et al., 2007; Tadege et al., 2008). It is assumed that if insertions are randomly distributed in gene-rich regions, gene size should be positively correlated with insertion frequency; that is, the larger the gene, the higher the probability of insertion (Krysan et al., 1999; Li et al., 2006; Tadege et al., 2008). Our screening data indicated that Tnt1 prefers medium-sized genes, as evidenced by the high insertion frequency and the larger number of insertion alleles recovered in medium-sized genes. For large genes (> 4.0 kb), however, a negative correlation was observed between gene size and Tnt1 insertion frequency. How gene size affects the insertion probability is largely unknown. It has been proposed that transcriptional activity is a determinant of target site preference, and that gene length is negatively correlated with overall expression level: highly expressed genes tend to be small because of selective forces that favor minimizing the energy and time spent in transcription (Castillo-Davis et al., 2002; Camiolo et al., 2009). Under this view, large genes would have low transcriptional activity and therefore a low insertion rate. Similar observations were reported for T-DNA insertions in Arabidopsis and Tos17 insertions in rice (Alonso et al., 2003; Miyao et al., 2003). However, one analysis of T-DNA insertion site distribution patterns in Arabidopsis indicated that lack of detectable transcriptional activity is only one of the reasons for the absence of insertions in genes (Li et al., 2006). In our analysis, no correlation between expression level and insertion frequency was observed. Therefore, the effect of transcriptional activity on insertion frequency remains elusive.
In summary, the Tnt1-tagged M. truncatula mutant population generated at the Noble Foundation is an invaluable resource for the research community. This mutant collection has already been widely used and will be as irreplaceable for researchers working on legume biology as the SALK T-DNA collections are for Arabidopsis. The currently available 21 700 lines represent over 525 000 independently distributed Tnt1 insertions in the M. truncatula genome. Judging from the efficiency of reverse screening and the genome saturation probability, the current 21 700 lines may have insertions in c. 85–90% of the M. truncatula genome. However, the FST database currently hosts only c. 44 000 FSTs from c. 3400 lines. Until a high-throughput FST sequencing approach is developed and the number of FSTs is dramatically increased, direct BLAST searching of the FST database has only a small chance of finding an insertion line for a gene of interest. The reverse screening of DNA pools will therefore still play a significant role in reverse genetics of M. truncatula. When large-scale high-throughput FST sequencing is performed on the mutant collection, it will be possible to map most (if not all) Tnt1 insertions on the Medicago genome. Finding an insertional mutant in a gene of interest will then be a matter of checking the website (http://medicago-mutant.noble.org/mutant/) and ordering the seeds from a system analogous to that for the SALK T-DNA lines. The development of an FST database covering the majority of the Tnt1 insertions in the population will thus represent a very valuable tool for the research community.
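The population figures above can be related to one another with a simple random-insertion estimate. The sketch below is illustrative only: it assumes insertions land uniformly across the genome, a genome size of about 500 Mb and a 2-kb target gene, none of which are stated in this section. Because Tnt1 prefers genic regions, the uniform model is a simplification rather than the calculation behind the c. 85–90% estimate.

```python
def hit_probability(gene_bp: int, n_insertions: int, genome_bp: int) -> float:
    """Probability that at least one of n uniformly random insertions
    falls inside a gene of the given size."""
    return 1.0 - (1.0 - gene_bp / genome_bp) ** n_insertions


LINES = 21_700            # from the text
INSERTIONS = 525_000      # from the text
GENOME_BP = 500_000_000   # assumption: genome of roughly 500 Mb
GENE_BP = 2_000           # assumption: a medium-sized 2-kb gene

insertions_per_line = INSERTIONS / LINES
p_hit = hit_probability(GENE_BP, INSERTIONS, GENOME_BP)
```

With these assumed values the population carries about 24 insertions per line, and a 2-kb gene has a probability of roughly 0.88 of being hit at least once.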
We would like to thank Kuihua Zhang for plant care and seed curation, Shulan Zhang for assistance with flanking sequence recovery and Janie Gallaway for organizing forward screening. This work was supported by the Samuel Roberts Noble Foundation and, in part, by NSF plant genome grants (DBI 0703285 and IOS 1127155) and by the European Union (EU FP6-GLIP project FOOD-CT-2004-506223).