We report here the molecular and phenotypic features of a library of 31 562 insertion lines generated in the model japonica cultivar Nipponbare of rice (Oryza sativa L.), called Oryza Tag Line (OTL). Sixteen thousand eight hundred and fourteen T-DNA and 12 410 Tos17 discrete insertion sites have been characterized in these lines. We estimate that 8686 predicted gene intervals (i.e. one-fourth to one-fifth of the estimated rice nontransposable element gene complement) are interrupted by sequence-indexed T-DNA (6563 genes) and/or Tos17 (2755 genes) inserts. Six hundred and forty-three genes are interrupted by both T-DNA and Tos17 inserts. The high quality of the sequence indexation of the T2 seed samples was ascertained by several approaches. Field evaluation under agronomic conditions of 27 832 OTL has revealed that 18.2% exhibit at least one morphophysiological alteration in the T1 progeny plants. Screening 10 000 lines for altered response to inoculation by the fungal pathogen Magnaporthe oryzae identified 71 lines (0.7%) developing spontaneous lesions (lesion simulating disease mutants) and 43 lines (0.4%) exhibiting an enhanced disease resistance or susceptibility. We show here that at least 3.5% (four of 114) of these alterations are tagged by the mutagens. The presence of allelic series of sequence-indexed mutations in a gene among OTL that exhibit a convergent phenotype clearly increases the chance of establishing a linkage between alterations and inserts. This convergence approach is illustrated by the identification of the rice ortholog of AtPHO2, the disruption of which causes a lesion-mimic phenotype owing to an over-accumulation of phosphate, in nine lines bearing allelic insertions.
The release of the high-quality genome sequence of cultivated rice of Asian origin (Oryza sativa L.), the main cereal for human consumption (Matsumoto et al., 2005), has motivated large international efforts to set up tools and resources for inactivating gene function in this model graminaceous system. These tools encompass either sequence-specific inactivation through post-transcriptional (Warthmann et al., 2008; Wesley et al., 2001) or transcriptional (Li et al., 2011) gene silencing and gene targeting (Iida and Terada, 2005), or random disruption through physical (Bruce et al., 2009; Wu et al., 2005), chemical (Till et al., 2007) or insertional mutagenesis (for a review, see Krishnan et al., 2009). In rice, efficient insertional mutagens allowing genome-wide mutagenesis include the endogenous ty-1 copia retroelement Tos17 (Miyao et al., 2003), the introduced maize Ac/Ds (Chin et al., 1999) and En/Spm (Kumar et al., 2005) transposon systems, and the T-DNA of Agrobacterium tumefaciens (Jeon et al., 2000).
Random insertion of a known DNA fragment (insertion mutagenesis) allows high-throughput PCR-based recovery and sequencing of the genomic regions flanking the insertion sites, and therefore the precise determination of their position on the chromosome pseudomolecules, which can be displayed in web-accessible databases. Implementation of in silico reverse genetics has been an extremely powerful tool to decipher gene function in Arabidopsis, where 385 000 flanking sequence tags (FSTs) are available online (http://signal.salk.edu/cgi-bin/tdnaexpress) (O’Malley and Ecker, 2010). An important requisite for that approach is an accurate indexation of the lines by their FST and the availability of seeds in sufficient quantities for distribution. However, both FST characterization and seed multiplication are complex, error-prone processes involving multiple steps that can each be a source of contamination or mislabelling. It is therefore important to assess the quality of the library by carrying out quality checks as well as by drawing on feedback from users. So far, the insertion mutagenesis effort in rice has allowed the generation of 500K insertion lines and the release of 200K indexed insertion sites in public databases (Krishnan et al., 2009). Although still insufficient with regard to the rice genome size, the current insertion coverage already makes it possible to find several alleles in the same gene sequence, notably thanks to the strong insertion bias of Tos17, which generates series of allelic insertions (Miyao et al., 2003; Piffanelli et al., 2007). One can take advantage of this property to determine whether convergent phenotypes are observed among the lines carrying allelic insertions in a given sequence. This would give an additional hint that the phenotype is attributable to the disruption of a gene and is worth further ‘wet’ verification.
Large-scale field evaluation of rice insertion lines has revealed altered phenotypes at a high frequency in Tos17 (Miyao et al., 2007), Ac/Ds (Kolesnik et al., 2004), and T-DNA (Chern et al., 2007) lines, which are also documented in public databases (http://tos.nias.affrc.go.jp/~miyao/pub/tos17/index.html.en, http://rmd.ncpgr.cn/, http://trim.sinica.edu.tw, http://oryzatagline.cirad.fr). The forward genetics approach could be extremely powerful for the discovery of novel gene functions, but unfortunately the frequency of linkage between the phenotype and the insertion mutagen generally remains extremely low (5%) (Nonomura et al., 2003). In rice, this approach has led to the identification of genes, mainly in Tos17 mutant lines and for traits relatively easy to screen such as viviparity (Agrawal et al., 2001) or panicle fertility (Nonomura et al., 2003, 2004). Successful examples of the disruptive action of the T-DNA leading to a conspicuous phenotype are, however, scarce, but the presence of a promoter trap and/or an activation tag carried by the T-DNA may additionally lead to reporter-mediated gene detection (Jung et al., 2003; Lee et al., 2004a,b) and observation of dominant phenotypes (Chern et al., 2007; Jeong et al., 2002; Mori et al., 2007), respectively. Nevertheless, the recent development of next-generation sequencing technologies may offer a ‘second life’ to interesting but still untagged mutations, because sequencing the genome of a mutant of particular interest has now become affordable for a research laboratory (Sabot et al., 2011).
Within the framework of the plant genomics collaborative programme Génoplante, we have generated through Agrobacterium-mediated transformation of seed embryo calluses of the temperate japonica cultivar Nipponbare (Sallaud et al., 2003) a library of 31 562 insertion lines carrying T-DNA (Sallaud et al., 2004) and Tos17 (Piffanelli et al., 2007) inserts. The inserted T-DNA contained either a gusA enhancer trap (C. Gay and E. Guiderdoni, unpublished data) or a GAL4:GFP enhancer trap (Johnson et al., 2005), acting as a gene detector when inserted inside or in the vicinity of a gene. In a long-term collaborative effort with the International Center for Tropical Agriculture (CIAT, Cali, Colombia), T1 progenies of the insertion lines have been field-evaluated under agronomic conditions in Cali, Colombia, for the collection of phenotypes and seed propagation. Most of the T1 lines have been successfully harvested in bulk, and their T2 seeds are available upon request. Plants exhibiting a mutant phenotype have also been harvested individually, and their seeds can be delivered as well (http://oryzatagline.cirad.fr).
A subset of the collection has been evaluated more thoroughly to detect specific alterations in grain development and in response to inoculation by the fungal pathogen Magnaporthe oryzae. All this phenotypic information, based on the trait ontology (TO) nomenclature, is now gathered in the Oryza Tag Line database (Larmande et al., 2008). The Oryza Tag Line database is linked through sequence information to the genome navigator OrygenesDB (Droc et al., 2006, 2009). The direct link from the phenotype to the sequence through the line ID allows back-and-forth searches and the implementation of in silico reverse and forward strategies to focus the effort of gene discovery. Lines exhibiting convergent TO phenotypes can be investigated for the genes interrupted by characterized insertions, while allelic series of insertions in the same gene sequence can be analysed to highlight convergent phenotypes.
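The two search directions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the line and gene identifiers, and the function names are hypothetical placeholders and do not reflect the actual schema of the Oryza Tag Line database or OrygenesDB.

```python
from collections import defaultdict

# Hypothetical records: (line_id, interrupted_gene, trait_term).
# All identifiers are placeholders, not entries from a real database.
RECORDS = [
    ("line_01", "gene_A", "lesion mimic"),
    ("line_02", "gene_A", "lesion mimic"),
    ("line_03", "gene_A", "plant height"),
    ("line_04", "gene_B", "lesion mimic"),
    ("line_05", "gene_C", "plant height"),
    ("line_06", "gene_C", "plant height"),
]


def convergent_series(records, min_lines=2):
    """Reverse direction: start from genes and keep (gene, trait) pairs
    supported by at least `min_lines` independent lines."""
    groups = defaultdict(set)
    for line_id, gene, trait in records:
        groups[(gene, trait)].add(line_id)
    return {key: sorted(lines) for key, lines in groups.items()
            if len(lines) >= min_lines}


def genes_for_trait(records, trait):
    """Forward direction: start from a trait and list the genes
    interrupted in the lines that show it."""
    genes = defaultdict(set)
    for line_id, gene, term in records:
        if term == trait:
            genes[gene].add(line_id)
    return {gene: sorted(lines) for gene, lines in genes.items()}


print(convergent_series(RECORDS))
print(genes_for_trait(RECORDS, "lesion mimic"))
```

In this toy data set, a gene hit in a single line (gene_B) is reported by the forward search but is not retained as a convergent series, which is the filtering effect the convergence approach relies on.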
The objective of this study was threefold: (i) to estimate the genome coverage by the indexed inserts in the library and assess their quality, (ii) to have a first assessment of the frequency of linkage between the mutagens and the observed mutations and (iii) to determine whether in silico forward and reverse genetics could enhance the chance of identifying linkage between genes and phenotypes for agronomic traits.
The rice community has the joint objective of deciphering the function of most of the agronomically important genes by the year 2020 (Zhang et al., 2008), an effort much needed to unravel the evolution of trait determinism and the molecular bases of monocot- and crop-specific traits. To that end, sequence-indexed insertion libraries represent a helpful resource. To determine whether the effort engaged more than a decade ago has been sufficient, it is necessary to assess the quality and the genome coverage of the sequence-indexed inserts present in the current international collections.
The quality of the association between T2 seed bags and FSTs (more than 80%), deduced from their use in reverse genetics, from resequencing of the FSTs, and from the observation of known phenotypes, indicates the high reliability of our collection. A similar rate (76%) of FST reconfirmation has been observed in two important Arabidopsis T-DNA libraries (O’Malley and Ecker, 2010).
One can estimate that altogether 8686 different rice genes, representing one-fourth to one-fifth of the estimated gene complement, are interrupted by a sequence-indexed insert. Nevertheless, it is likely that not all disruptions result in gene K.O.s, because some insertions, notably in intron or 3′UTR regions, are inefficient at preventing translation. However, information deduced from Southern blot analysis suggests that the library actually contains 100 000 and 45 000 Tos17 and T-DNA insertion loci, respectively. Even though the number of lines needed to reach genome saturation by inserts follows an exponential rather than a linear relation (Krishnan et al., 2009), and mutagens may exhibit a strong insertion bias towards certain genes that limits their coverage, this indicates that a much larger set of rice genes can potentially carry an insert in an insertion library of rather limited size like OTL. Therefore, upon observation of a phenotype in a line, the step following the search for sequence-indexed inserts in predicted genes is to survey, through DNA blot analysis, the cosegregation with the altered trait of the other T-DNA and Tos17 insertions residing in the line.
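The non-linear relation between the number of inserts and genome saturation can be illustrated with the textbook model P = 1 - (1 - x/G)^n, where n is the number of inserts, x the target size and G the genome size. The sketch below is not the calculation used in this study: it assumes uniformly random insertion (which, as noted above, neither Tos17 nor T-DNA satisfies), and the genome size and gene length are illustrative assumptions. Only the insert counts are taken from the text.

```python
import math

# Illustrative assumptions, not values from this study:
GENOME_BP = 389_000_000   # approximate rice genome size
GENE_BP = 3_000           # hypothetical length of a target gene

# Insert counts taken from the text:
INDEXED_SITES = 16_814 + 12_410      # characterized T-DNA + Tos17 sites
SOUTHERN_LOCI = 100_000 + 45_000     # Tos17 + T-DNA loci estimated by Southern blot


def hit_probability(n_inserts, target_bp, genome_bp):
    """Probability that at least one of n uniformly random inserts
    lands in a target of size target_bp: 1 - (1 - x/G)^n."""
    return 1.0 - (1.0 - target_bp / genome_bp) ** n_inserts


def inserts_needed(p, target_bp, genome_bp):
    """Smallest number of uniformly random inserts giving probability >= p."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - target_bp / genome_bp))


for label, n in [("sequence-indexed sites", INDEXED_SITES),
                 ("loci estimated by Southern blot", SOUTHERN_LOCI)]:
    print(f"{label}: n = {n}, P(hit) = {hit_probability(n, GENE_BP, GENOME_BP):.2f}")

print("inserts needed for P = 0.95:", inserts_needed(0.95, GENE_BP, GENOME_BP))
```

Because the probability saturates, each additional insert contributes less than the previous one, which is why coverage cannot be extrapolated linearly from the number of lines.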
To harness the full potential of the library, one can use PCR surveys in DNA pools (Hirochika, 2001; Lee et al., 2003) using gene- and element-specific primers. However, the organization of the T-DNA, the integration of backbone sequences, and the GC content of the target genomic region may represent important limitations. Nowadays, mutant genome sequencing represents a promising alternative and moreover has the potential to reveal hidden lesions in the genome such as point mutations, structural rearrangements, and insertions of other elements mobilized during the transformation/regeneration process (Sabot et al., 2011). Resequencing the genome of Arabidopsis and rice regenerants has recently shown that point mutation is the major source of somaclonal variation (Jiang et al., 2012; Miyao et al., 2012).
Range of phenotypic variation
Field evaluation of insertion lines has been conducted for seed propagation and description of the collection by phenotypic records. The latter could be extended to the behaviour of the insertion lines under various environmental constraints (drought, salinity, high and low temperatures, etc.). Overall, 18.2% of the lines exhibited an alteration in at least one of the observed traits. Some of the lines accumulated up to 12 variant traits, some of which likely result from the pleiotropic effect of certain mutations. Comparison with previous field evaluations of the 50 000 NIAS Tos17 Nipponbare library (Miyao et al., 2007), 22 000 TRIM T-DNA activation tagging Tainung67 library (Chern et al., 2007), and >100 000 lines of the RMD T-DNA ZhongHua11 library (Zhang et al., 2006) indicates that variation is observed in the same categories and subcategories of traits, although with variable frequency. For some traits such as EDS/EDR (0.4%) and LSD/NEC (0.7%) mutants, we found frequencies of variation similar to those in other mutant populations. For example, Wu et al. (2005) found 0.18% mutants affected for blast resistance and 0.74% lesion-mimic phenotypes in an IR64 deletion mutant collection. The altered traits were also those found to exhibit a wide range of variation in plants regenerated from germinal and somatic tissue, cell, and protoplast cultures, which were extensively evaluated from the 1970s to the 1990s to harness the potential of somaclonal variation in rice breeding. The frequency, range, and favourable vs. unfavourable nature of the variation were found to be genotype dependent (Sukekiyo and Kimura, 1991). The duration and procedure of tissue culture are also important factors influencing the frequency of variation, as illustrated by the known accumulation of Tos17 copies in cells over time in culture (Hirochika et al., 1996).
Differences in frequencies observed between the insertion line libraries may therefore be genotype and tissue culture procedure dependent, but may also result from the use of T-DNAs that create additional lesions (abortive and nonabortive insertions) and may carry an activation tag (creating dominant mutations and having the potential to produce a phenotype when inserted in genes that perform a redundant function).
The value of insertion mutant libraries in forward genetics screens for gene discovery depends on the answer to an important, recurring question: What is the frequency of phenotypic variations tagged by the mutagens? We have mentioned above that phenotypic variation may have many causes such as DNA/histone methylation, point mutations, deletions, larger structural rearrangements, or mobilization of other transposable elements. The only published information so far for rice insertion mutagenesis has long been that of the Tos17 NIAS library, for which this frequency was estimated at 5% (Nonomura et al., 2003). Using specific screens for three different traits (grain development, appearance of spontaneous lesions, and response to inoculation by a fungal pathogen), we show here that the efficiency of tagging depends on the trait, likely on its amenability to somaclonal variation, and probably on the robustness of the phenotypic screen undertaken. As the segregation analyses between inserts and the altered trait were carried out through DNA blot analysis, all the inserts containing the probe sequence were surveyed simultaneously. Whereas attempts to establish linkage were unsuccessful for grain-associated phenotypes, they were more successful for the two other traits. For the EDS/EDR/LSD/NEC phenotypes, we found in a first assessment that 19 phenotypes of 52 and 27 phenotypes of 109 could be due to Tos17 insertions and T-DNA, respectively (Table 4). These numbers likely overestimate the tagging frequency because, in some cases, linkage analysis was performed on only a small number of plants. When considering only the cases where linkage analysis relies on more than five plants, the maximum tagging efficiency is 12 of 109 (11%) for the T-DNA and 11 of 52 (21%) for Tos17.
Conversely, a minimum tagging efficiency can be estimated by considering that, of the 43 EDR/EDS and 71 LSD/NEC phenotypes analysed, two EDS phenotypes (original lines AKJH07 and ALKE03) and two LSD/NEC phenotypes (APIE05 and AICG07) are likely due to T-DNA or Tos17 insertions. Quite similar tagging efficiencies were found for the T-DNA (2.7%) and for Tos17 (1.9%) (Table 4). Overall, our data suggest that at least four phenotypes of 114 tested (3.5%) are due to an insertion element in these specific screens. The AKJH07/AEWH07 allelic mutations likely identify a NAC transcription factor. Such genes have already been shown to be required for disease resistance (Wang et al., 2009) or abiotic stress tolerance in rice (Hu et al., 2006). In our case, the dwarfism associated with these mutations (data not shown) could also be responsible for the increased susceptibility, as blast susceptibility is highly dependent on the developmental stage (Ribot et al., 2008). In contrast, the ALKE03/AJVE08 allelic pair in the putative NADP-malic enzyme does not display any obvious morphological change. The involvement of this gene as a positive regulator of disease resistance is consistent with a previous report (Parker et al., 2009) that NADP-malic enzyme activity increases upon rice blast infection and that this activity seems to be suppressed during compatible interaction. The two other phenotypes potentially tagged are due to mutations in known genes (SPL7: Yamanouchi et al. (2002) and OsPHO2: Bari et al. (2006)).
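The tagging efficiencies quoted above follow directly from the reported counts, as the short Python check below shows. The counts are those given in the text; the dictionary labels and the `percent` helper are our own naming, introduced only for this illustration.

```python
def percent(tagged, tested):
    """Tagging efficiency in percent, rounded to one decimal place."""
    return round(100.0 * tagged / tested, 1)


# (tagged phenotypes, phenotypes tested), as reported in the text.
COUNTS = {
    "first assessment, Tos17": (19, 52),
    "first assessment, T-DNA": (27, 109),
    "maximum (more than five plants), T-DNA": (12, 109),
    "maximum (more than five plants), Tos17": (11, 52),
    "minimum, both mutagens": (4, 114),
}

# The 114 phenotypes of the minimum estimate are the 43 EDR/EDS
# plus the 71 LSD/NEC phenotypes analysed.
assert 43 + 71 == 114

for label, (tagged, tested) in COUNTS.items():
    print(f"{label}: {tagged}/{tested} = {percent(tagged, tested)}%")
```

The gap between the first-assessment and minimum figures reflects how strongly the estimate depends on the number of plants used in the linkage analysis.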
Convergence of hints increases the chance of wet validation of tagging
To increase the chance of establishing linkage between phenotypic variation and insertional mutagens, it might be important to rely on a convergence of hints. Such convergence can be the coincidence of enhancer trapping–mediated reporter gene detection and phenotype in a given organ (such as the seed) or the coincidence of phenotypes in independent lines that proved to contain allelic sequence-indexed insertions in a given gene. The latter has been illustrated by the lesion-mimic phenotype observed in nine lines, which was found to result from KO allelic insertions in the PHO2 gene, which turned out to be a hot spot for Tos17 inserts.
This example stresses the need to link phenotypic databases through a common vocabulary to fully harness the potential of that information for focusing molecular validation in forward genetics gene discovery studies. Phenotypic information resulting from field observations or specific screens is so far gathered in distinct web-accessible databases (http://tos.nias.affrc.go.jp/miyao/pub/tos17/index.html.en, http://rmd.ncpgr.cn/, http://trim.sinica.edu.tw, http://oryzatagline.cirad.fr). An important step that remains to be accomplished is to link these databases through the use of a common vocabulary to describe the altered traits. To that end, the controlled vocabularies of the Plant Ontology Consortium (Jaiswal et al., 2005) for describing mutant phenotypes appear the most suitable to ensure that future cross-referencing between different databases is possible. Using the hierarchical architecture of TO would make it possible to start a cross-database search with a broad term (e.g. plant morphology) to establish a first list of lines from different insertion libraries that exhibit a convergent phenotype. Refining the search using narrower TO terms (e.g. plant height) and further examining the function of the genes interrupted in this set of lines (e.g. GA-related) may provide valuable indications on the pathways involved in the elaboration of the altered trait.
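A minimal Python sketch of such a broad-to-narrow search is given below. It assumes a simplified tree in which each term has a single parent, whereas a real ontology may be a graph with several parents per term. The terms "plant morphology" and "plant height" come from the example above; the other terms, the parent-child relations, the database labels and the line identifiers are hypothetical and are not taken from TO or from any of the databases cited.

```python
# Hypothetical fragment of a trait hierarchy: child term -> parent term.
PARENT = {
    "plant height": "plant morphology",
    "leaf shape": "plant morphology",
    "culm length": "plant height",
}

# Hypothetical annotations pooled from several databases:
# (database label, line identifier, trait term).
ANNOTATIONS = [
    ("db_1", "line_01", "culm length"),
    ("db_2", "line_77", "plant height"),
    ("db_2", "line_78", "leaf shape"),
    ("db_3", "line_90", "grain size"),
]


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or descends from it in PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def search(term):
    """Return (database, line) pairs annotated with `term` or any narrower term."""
    return [(db, line) for db, line, annotated in ANNOTATIONS
            if is_a(annotated, term)]


broad = search("plant morphology")   # first, inclusive list of lines
narrow = search("plant height")      # refined list
print(broad)
print(narrow)
```

A shared vocabulary is what makes the `is_a` test meaningful across databases: without it, the same alteration recorded under different free-text labels would not be retrieved by a single query.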
Nowadays, rice insertion libraries represent more than 500 000 lines and 240 000 sequenced inserts (Krishnan et al., 2009). Although it allows the identification of an insertion in a majority of rice genes, owing to the insertional bias of the mutagens towards gene-rich regions, the number of sequence-indexed inserts remains largely insufficient and lags far behind that of Arabidopsis (385 000 insertion sites), the genome of which is one-third the size of that of rice. Even with such extensive coverage, it has to be kept in mind that 12.2% of the Arabidopsis genes still remain devoid of insertion (O’Malley and Ecker, 2010).
The 30 000 lines of the OTL library may potentially contain a larger number of lesions in the rice genome, covering many more genes, but their full molecular characterization is hardly achievable using conventional PCR-based methods. A promising perspective is the full-genome sequencing of thousands of mutants, which will soon become an affordable effort and would not only reveal virtually all the insertion sites of the known mutagens but also characterize additional hidden lesions (Zuryn et al., 2010). Such an effort would also largely unravel the causes of somaclonal variation residing at the nucleotide level.
This work was supported by several grants of the French plant genomics collaborative programme Génoplante (OsCrR1, OsCrGF, Osmu2, B1, and M1 projects) and of the National Research Agency (ANR) (ANR-GNP-05086G CAGRILL project). The technical assistance of Loic Fontaine, Rémy Michel, Christian Chaine, Frédéric Salles, Murielle Boumbou-Portefaix, Rosie Sevilla, Lucette Gracia, Laurent Rosso, Jérôme Veyret, Carole Maisonneuve, Nicolas Cennes, Anne Laure Latrilhe, Sylvie M’Bello, Vanessa Guerin and Véronique Chalvon is gratefully acknowledged.