In the post-genomic sequencing era, an expanding portfolio of genomic technologies has been applied to the study of gene function. Reverse genetics approaches that provide targeted inactivation of genes identified by sequence analysis include TILLING (for Targeting Local Lesions IN Genomes). TILLING searches the genomes of mutagenized organisms for mutations in a chosen gene, typically single base-pair substitutions. This review covers practical aspects of the technology, ranging from building the mutagenized population to mutation discovery, and discusses possible improvements to current protocols and the impact of new genomic methods for mutation discovery in relation to the future of the TILLING approach.
In the post-genomic sequencing era, an expanding portfolio of genomic technologies has been applied to the study of gene function. Prominent among these are reverse genetics approaches that provide targeted inactivation of genes identified by sequence analysis. These include TILLING (for Targeting Local Lesions IN Genomes), which can be thought of as a ‘reverse’ counterpart to what is probably the most common tool for traditional genetic analysis: chemical mutagenesis and screening. TILLING searches the genomes of mutagenized organisms for mutations in a chosen gene, typically single base-pair substitutions. Since the first applications of this general strategy in Arabidopsis thaliana (McCallum et al., 2000) and Drosophila melanogaster (Bentley et al., 2000), TILLING has emerged as a robust approach whose advantages have been described in several publications and reviews (e.g. Gilchrist and Haughn, 2005; Stemple, 2004). Briefly, TILLING provides a range of mutant alleles and is potentially applicable to any organism that can be effectively mutagenized. Notably, TILLING can be applied to species for which genomic resources are limited.
The purpose of this review is to inform actual and potential users about practical aspects of the technology, ranging from building the mutagenized population to mutation discovery. We also discuss possible improvements to current protocols and the impact of new genomic methods for mutation discovery in relation to the future of the TILLING strategy.
Detection of induced mutations and rare polymorphisms
Mutation detection strategies fall into two classes, depending on whether the task is to determine the presence of known lesions or to discover previously unknown ones (Henikoff and Comai, 2003). Screening for the presence of known mutations or polymorphisms is the simpler task, because screening reagents can be customized to the individual lesion. For example, when a derived cleaved amplified polymorphic sequence (dCAPS) PCR primer has been designed to detect a specific single base difference (Neff et al., 2002), it is a straightforward matter to use it for accurate PCR-based genotyping on many individuals. However, the identity of the mutation must be known in advance.
Mutation discovery is a more challenging task. Previously unknown mutations are typically detected by resequencing (Nickerson et al., 2001). However, heterozygous polymorphisms appear as overlapping peaks, and it is challenging to distinguish these from sequencing errors and ambiguities, leading to high rates of false positive calls for automated trace analysis programs (Weckx et al., 2005). For the discovery of common alleles, resequencing should suffice, both because redundancy can confirm what otherwise might be interpreted as ambiguity, and because it is reasonably efficient to sequence every template if a large fraction have the same base change. However, redundancy and efficiency are directly proportional to allele frequency, such that resequencing is not expected to identify alleles that are present at <5% in a population. Thus, methods that are efficient for discovery of rare induced mutations should become increasingly competitive with resequencing as heterozygous allele frequencies decrease.
Relative to full sequencing, methods for mutation discovery provide incomplete information about the lesion being detected. Many methods for mutation discovery detect altered conformations of heteroduplexes between annealed wild-type and mutant strands. For example, denaturing HPLC (Underhill et al., 1997) can reproducibly detect all possible mismatches in heteroduplexed fragments based on the effect of DNA double helix destabilization on hydrophobicity, but it cannot identify the location, identity or number of mismatches on the fragment. This drawback is inherent to other methods that only identify a fragment with a mismatch by an altered property, such as denaturing gradient gel electrophoresis (DGGE) (Alharbi et al., 2005; Li et al., 2002) and single-strand conformational polymorphism analysis (SSCP) (Gross et al., 1999). These methods are therefore typically followed by full sequencing of the mutant fragment. Methods that identify the location and number of mismatches on a fragment, such as high-throughput TILLING (Colbert et al., 2001), require only partial single-pass sequencing to complete the discovery process. For some applications, the incomplete information present in mutation detection readouts will suffice, providing a fingerprint of an allele or genotype that can be used for genetic analysis or mapping. In the case of TILLING, fingerprints result from the use of high-resolution gels for cleaved fragment detection with nearly single base-pair resolution over >1 kb for multiple mismatches.
Mutagenesis protocols for TILLING and other discovery methods are no different from traditional mutagenesis procedures that have been standard tools of geneticists for several decades (Henikoff et al., 2004). This means that a typical forward genetic screen for phenotype generates a mutant population that can be TILLed to discover mutations in a gene of interest. In planning a phenotypic screen, it is often worthwhile to take into account the potential value of the mutagenized population for reverse genetics, and several considerations for establishing a TILLING population are discussed below.
Current technologies for mutation and polymorphism discovery
The enormous success of automated DNA sequencing continues to dominate genomics, and this is likely to continue for the foreseeable future. Replacement of gel-based sequencers with capillary sequencers removed the last impediments to a fully automated sequencing pipeline, resulting in nearly complete sequencing of the human genome (Lander et al., 2001; Venter et al., 2001). Resequencing using current capillary technology is clearly the dominant large-scale platform for the discovery of at least common single nucleotide polymorphisms (SNPs), and we expect further incremental increases in throughput and decreases in cost.
The large installed capacity and high cost of electrophoresis-based sequencers at genome centers, in shared-use facilities and in individual laboratories mean that, even if a new technology proves to be faster, more accurate and cheaper, the transition from these instruments is likely to take years. Nevertheless, new non-electrophoretic technologies promise to replace electrophoresis-based methods in the foreseeable future. For the purposes of this review, we will only consider current technologies that have the potential of becoming available to individual researchers, either as protocols that can be carried out in their own laboratories or as services. However, it should be realized that the future of mutation and SNP discovery depends on technological advances that are difficult to predict.
One technology that has potential for high-throughput SNP and mutation discovery is mass spectrometry (MS), which is familiar to most molecular biologists because it has become the central technology for proteomic analysis. Sequenom (San Diego, CA, USA) has developed a nucleic acid counterpart that is very effective for large-scale genotyping (Buetow et al., 2001). MS technologies have been rapidly improving for proteomics, and we anticipate that these improvements, together with the further development of protocols based on nucleic acid fragmentation, will result in higher throughput and lower cost for this alternative to full resequencing. While attractive for genotyping, it is not clear that MS is cost-competitive for mutation discovery, which requires a much higher level of accuracy and can be limited by PCR costs.
Another promising technology is pyrosequencing, a fundamentally simple technique that measures the release of a pyrophosphate with addition of a base during each round of DNA synthesis (Langaee and Ronaghi, 2005). By adding the four complementary bases one at a time, the sequence beyond the primer site can be determined by measuring the amount of pyrophosphate released in each round. An inherent advantage of pyrosequencing over conventional sequencing is that no size fractionation is needed. Pyrosequencing is constrained by the number of DNA synthesis rounds that can be practically obtained in a single reaction, and until recently its use has been limited to small-scale genotyping by extension of primers in microtiter plate assays. However, an automated pyrosequencer has recently been introduced by 454 Life Sciences, Inc. (Branford, CT, USA) for large-scale parallel pyrosequencing with impressive throughput and performance (Margulies et al., 2005). This makes pyrosequencing the most likely successor to the Sanger DNA polymerase method for large-scale shotgun sequencing. The fact that pyrosequencing involves simply monitoring a reaction makes it also an attractive method for genotyping applications. But, because of the high accuracy required for SNP and mutation discovery, the extent to which pyrosequencing will be competitive for these tasks remains to be seen.
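The base-by-base readout described above can be illustrated with a toy simulation. This is a minimal sketch, not the base-calling method of any commercial instrument: it assumes an idealized, noise-free signal that is exactly proportional to the number of bases incorporated in each round, and the flow order "TACG" and the function names are hypothetical choices made for illustration.

```python
from itertools import cycle, islice


def simulate_flowgram(read, flow_order="TACG", n_flows=16):
    """Return (base, signal) pairs for an idealized pyrosequencing run.

    `read` is the strand being synthesized. In each round one nucleotide
    is offered; the signal is the number of consecutive positions that
    incorporate it (0 if the next base does not match).
    """
    signals = []
    pos = 0
    for base in islice(cycle(flow_order), n_flows):
        run = 0
        # A homopolymer run is incorporated within a single round.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


def decode_flowgram(signals):
    """Rebuild the sequence by repeating each offered base `signal` times."""
    return "".join(base * run for base, run in signals)


if __name__ == "__main__":
    flows = simulate_flowgram("TTAGC", n_flows=8)
    print(flows)
    print(decode_flowgram(flows))
```

In real data the signal for a homopolymer run is not an exact integer, which is one reason the accuracy needed for mutation discovery is harder to reach than this idealized model suggests.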
A key feature of the 454® pyrosequencing machine (454 Life Sciences, Inc.) is that amplicons are produced and sequenced on micron-sized beads in an emulsion, resulting in substantial improvements in cost and throughput relative to current microplate-based sequencing. This nanoscale technology has also been used for the ‘polony’ sequencing strategy, in which bead-bound amplicons are hybridized successively to short labeled oligonucleotides. Very high specificity is achieved by ligation, which only occurs for perfectly matched duplexes (Shendure et al., 2005). Each round of ligation amounts to 26 nucleotides of sequence, so polony sequencing requires a much greater level of oversampling than Sanger sequencing (or pyrosequencing) in order to obtain comparably high accuracy. With an estimated cost saving of ninefold, it appears that polony sequencing, like pyrosequencing, will be a strong contender for shotgun resequencing tasks. However, the amount of oversampling required to obtain sufficient accuracy for discovery of rare mutations means that it is unlikely that polony sequencing will become competitive with current TILLING technologies for years to come. Nevertheless, the advantages of performing reactions on beads in an emulsion are so compelling that we would not be surprised if a technology using this system, but better designed for mutation discovery, is introduced in the near future.
Other promising alternatives to full resequencing for SNP and mutation discovery rely on directly comparing mutant to wild-type DNA strands by hybridization to arrayed oligonucleotide probes or by analysis of annealed heteroduplexed products. Hybridization-based assays detect the difference in duplex stability between a perfectly matched double helix and a mismatched one (Cutler et al., 2001). These assays can be performed on microarrays. Like their electronic counterparts, Affymetrix, Inc. (Santa Clara, CA, USA) photolithography-based microarrays continue to improve with incremental reductions in feature size, and they have become a standard for genotyping applications. Other technologies can produce programmable arrays. Nimblegen, Inc. (Madison, WI, USA) produces maskless photolithographic arrays, which can be programmed to synthesize custom oligonucleotides on a small enough scale to make custom designed microarrays affordable for individual researchers and small consortia (Nuwaysir et al., 2002). Similarly, Agilent (Palo Alto, CA, USA) synthesizes oligonucleotides on glass slides using ink-jet technology (Hughes et al., 2001). In addition to microarrays, match/mismatch assays can be performed on immobilized oligonucleotide-loaded beads using the Illumina (San Diego, CA, USA) system (Gunderson et al., 2005).
Like DNA sequencing, confident detection of heterozygotes on microarrays and beads is inherently challenging, and prior knowledge of the SNP is important. Nevertheless, common SNP genotyping on microarrays has been highly successful (Cutler et al., 2001). However, for SNP and mutation discovery, the identity of the mismatch is unknown, requiring that all possible deleterious mismatches are individually represented on the array. This greatly increases the number of oligonucleotides required for detection and concomitantly decreases the signal-to-noise ratio; together with the difficulty in detecting heterozygotes it does not appear that the efficiency of genotyping will soon translate into efficient SNP and mutation discovery. This is not necessarily an insurmountable limitation, but, unless the cost of these services decreases dramatically, it is not clear that they will be competitive with other technologies in the near future.
Most other alternatives to full resequencing detect mismatches in re-annealed PCR products (Henikoff and Comai, 2003). Whereas perfect match/mismatch detection used for hybridization analysis depends on destabilizing short oligomers, denaturing HPLC (dHPLC) can detect slight differences between duplexes as long as several hundreds of base pairs (Underhill et al., 1997). This is accomplished by resolving heteroduplexes from homoduplexes by HPLC chromatography when they are maintained just below the overall melting temperature. Reductions in hydrophobicity result from local denaturation around single-base mismatches, causing reduced retention of mismatched duplexes on a reverse-phase column. In a similar manner, mismatched bases subject to enhanced local denaturation can also be resolved as they pass through an electrophoretic gel by DGGE. Automated instruments are available for both dHPLC and DGGE (Li et al., 2002; Premstaller et al., 2001).
Mismatches can also be discovered in several other ways. Conformational changes caused by SNPs can be detected on single-stranded molecules resolved on denaturing gradient gels by SSCP (Gross et al., 1999); however, this method has not been shown to detect all types of mismatches in different sequence contexts. Another method is based on mismatch-specific binding by proteins that have evolved for this purpose. For example, the Escherichia coli MutS protein specifically recognizes mismatched bases as part of the mismatch repair pathway, and this is the basis for a mismatch detection kit for use on heteroduplexes (Wagner and Dean, 2000). Mismatches in heteroduplexes have been detected by cleavage at the mispaired site using chemical reagents (Smooker and Cotton, 1993), although the most widely used mismatch cleavage reagents are enzymes (Babon et al., 2003; Till et al., 2004a,b). Among these are bacteriophage resolvases, which have evolved to recognize and cleave Holliday junctions. Purified resolvases effectively cleave single-base-pair mismatches of all types, and both T4 endonuclease VII and T7 endonuclease I have been used to detect cleaved end-labeled products displayed on electrophoretic gels (Youil et al., 1996). Because they allow the precise localization of multiple mismatches on a fragment, enzymatic cleavage reagents are especially suitable for high-throughput mutation detection.
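The way cleaved end-labeled products localize a mismatch can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any published analysis pipeline: it assumes that the two ends of the amplicon carry distinguishable labels, so that a single cut yields one labeled fragment from each end and the two sizes sum to roughly the full amplicon length. The function name, the tolerance parameter and the example sizes are hypothetical.

```python
def locate_mismatches(amplicon_len, left_sizes, right_sizes, tol=10):
    """Pair fragments labeled at opposite ends and estimate cut positions.

    A left-labeled fragment of size L and a right-labeled fragment of
    size R are treated as products of the same cut when L + R is within
    `tol` bases of the amplicon length. The position is reported as the
    mean of the two independent estimates, measured from the left end.
    """
    sites = []
    used = set()
    for left in sorted(left_sizes):
        for j, right in enumerate(right_sizes):
            if j in used:
                continue
            if abs(left + right - amplicon_len) <= tol:
                used.add(j)
                sites.append((left + (amplicon_len - right)) / 2)
                break
    return sites


if __name__ == "__main__":
    # Two complementary pairs plus one unpaired band (size 500).
    print(locate_mismatches(1000, [300, 720], [705, 275, 500]))
```

Requiring a complementary partner for each band is one simple way to discard bands that appear in only one channel; the tolerance would need to reflect the sizing accuracy of the gel system actually used.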
The most popular enzymes for mismatch cleavage detection are members of the S1 nuclease family (Desai and Shankar, 2003). Both S1 and mung bean (Vigna radiata) nucleases have been used for this purpose since the mid-1970s (Shenk et al., 1975). However, the reaction conditions used were not ideal for cleaving single-base-pair mismatches, and use of these enzymes for single-base mismatch detection has been limited. Importantly, Yeung and colleagues showed that a member of the S1 nuclease family purified from celery (Apium graveolens) (CEL I) can reliably cleave mismatches on either strand on the 3′ side, and they went on to show that this cleavage activity is present in extracts from many plant sources (Oleykowski et al., 1998; Yang et al., 2000). We later showed that crude extracts from celery are sufficiently enriched in mismatch cleavage activity that they can be used for high-throughput applications without further purification (Till et al., 2004a,b). After clarification and dialysis, celery juice is aliquoted and stored frozen at −80°C. The high yields of enzyme activity that we obtained from celery juice relative to yields obtained after purification led us to more broadly investigate the basis for mismatch cleavage activity. CEL I shares with S1 and mung bean nucleases all critical residues needed for catalysis (Desai and Shankar, 2003). By modifying pH, temperature and salt conditions, we found that mismatch cleavage by commercially available mung bean nuclease was similar to that obtained using purified CEL I, Surveyor® (Transgenomic, Omaha, NE, USA; described below) and celery juice extract. S1 and plant endonucleases are secreted glycoproteins with single-strand specific ribonuclease and deoxyribonuclease activities thought to degrade nucleic acids for nutrient utilization. Kroeker et al. (1976) originally proposed that the ability of these enzymes to specifically cleave small loops results from transient destabilization of the duplex, causing a bulge with partially single-stranded character. Thus, as is the case for bacteriophage resolvases, mismatches appear to be detected by members of the S1 nuclease family based on an accidental feature of enzymes that have evolved an active site cleft able to accommodate only single-stranded nucleic acids.
Enzymatic mismatch cleavage for high-throughput detection
Our group has introduced step-by-step protocols that use mismatch cleavage enzymes for single-base mutation detection on a production scale (Till et al., 2003a). These protocols were first implemented by the Seattle TILLING Project (STP) for screening ethyl methanesulfonate (EMS)-mutagenized Arabidopsis populations produced for this purpose (Till et al., 2003b). Thus far, the project has delivered >6000 EMS-induced mutations to the Arabidopsis community (http://tilling.fhcrc.org:9366/arab/status.html). Because TILLING is a general strategy for reverse genetics, the project was expanded to include production-scale TILLING of other organisms for which suitably mutagenized lines were developed. STP has also recently begun a high-throughput TILLING service for Drosophila melanogaster (Fly-TILL: http://tilling.fhcrc.org:9366/fly), and a service for maize (Zea mays) in collaboration with the Maize TILLING Project at Purdue University (Lafayette, IN, USA; http://genome.purdue.edu/maizetilling/).
The high-throughput TILLING protocol used by STP has also been adapted for SNP discovery, referred to as ‘ecotilling’, because it was first applied to the identification and mapping of Arabidopsis ecotypes (Comai et al., 2004). The much higher frequency of common SNPs between genetically heterogeneous individuals (typically 0.1% per base pair between genomes) compared with the frequency of induced mutations means that ecotilling is not likely to be the most efficient genotyping method, although ecotilling patterns might be used to follow segregating alleles in large populations. Ecotilling is especially well suited for the discovery of uncommon SNPs that are typically overlooked by full sequencing. These can potentially be identified and roughly mapped in DNA pools and easily confirmed by single-pass DNA sequencing. Ecotilling should be most efficient for screening of hundreds to thousands of individuals for rare SNPs, because screening ∼1.5-kb fragments in pools on microtiter plates allows confident detection and mapping that cannot be practically achieved by full sequencing.
Would I do this on my own?
The choice of whether or not to carry out TILLING in one's own laboratory depends on the scale of the project that is envisioned and on the availability of resources that are needed to perform TILLING efficiently. For small-scale mutation or polymorphism discovery, there are numerous systems, some of which are available as kits. For example, Transgenomic markets a Surveyor® kit that uses mismatch cleavage by CEL I to identify mutations in double-stranded DNA fragments using an agarose gel assay (Qiu et al., 2004). This and other mutation detection kits can be applied to small-scale projects, such as screening for mutations and polymorphisms in single genes. However, the cost of these kits must be compared with the cost of full sequencing, given the ever-increasing DNA sequencing capacity that is widely available.
Once the number of gene targets and the size of a population reach a point at which kits or full DNA sequencing becomes too expensive or impractical, then it is worthwhile to consider TILLING (Gilchrist and Haughn, 2005). This raises the question of what resources are needed for TILLING. Instruments such as PCR cyclers and heating blocks are found in typical molecular biology laboratories; however, the capability of performing automated polyacrylamide gel analysis might not be readily available. If suitable gel-running instruments are present in a well-equipped laboratory or in an institutional shared resource, then TILLING is an especially attractive option, because it will not require a major equipment purchase. For example, Can-TILL (Vancouver, BC, Canada; http://www.botany.ubc.ca/can-till/) provides ecotilling services to Canadian crop and forestry researchers using LI-COR (Lincoln, NE, USA) gel analyzers at the University of British Columbia (Gilchrist et al., 2006). As capillary-based instruments have generally replaced slab gel-based instruments for DNA sequencing, one might find that existing slab gel analyzers are underutilized and available to be used for TILLING.
Most reagents needed for TILLING are routinely available. However, the CEL I enzyme used for mismatch cleavage only became commercially available in late 2003 when Transgenomic began marketing the Surveyor® kit (Qiu et al., 2004). Until that time, we had received many requests for CEL I endonuclease, which we had purified for our own use based on a modification of a method reported by Anthony Yeung and co-workers (Yang et al., 2000). Requests were forwarded to Yeung, who generously made available purified enzyme to researchers interested in performing TILLING on their own. As mentioned previously, we showed that celery juice extract and commercially available mung bean nuclease provide alternative cleavage enzyme sources (Till et al., 2004a,b). Although our study suggested that a variety of commercial and homemade sources of enzyme can be used for TILLING, this conclusion does not necessarily extend to other reported applications of these enzymes, such as double-stranded cleavage (Sokurenko et al., 2001).
With suitable equipment and affordable cleavage enzymes available, the cost of TILLING and ecotilling becomes comparable to that of other PCR-based applications, such as amplified fragment length polymorphism (AFLP) analysis (Vos et al., 1995). In some cases, a mutagenized population that has been produced for phenotypic screening can also be used for TILLING, making this an attractive option (Henikoff and Comai, 2003). Nevertheless, high-throughput TILLING remains challenging because of the many steps involved, beginning with sample preparation and ending with sequencing of the mutations discovered. To address this problem, STP has established a workshop program funded by the National Science Foundation (NSF) Plant Genome Research Project. Two-day TILLING workshops are intended to familiarize participants with the latest in TILLING technology and procedures (http://tilling.fhcrc.org:9366/files/Workshops.html). These workshops also serve as clearing houses for relevant information and for discussions of technical and logistic problems at all stages of the process. Although the actual screening process is similar for nearly all organisms, the availability of genomic sequence, the ploidy of the organism and the optimal size of a population are all factors that should be considered before a decision is made to embark on a TILLING project, and frequent STP workshops provide a forum for discussion. Thus far, >100 participants have attended workshops in groups of three to five individuals interested in establishing TILLING on their own.
TILLING is also facilitated by custom software developed by our group with funding from the NSF Division of Biological Infrastructure. CODDLE (for Codons Optimized to Detect Deleterious LEsions; http://www.proweb.org/coddle/) is a free Web-based tool that serves as a ‘front end’ for the TILLING process (Till et al., 2003b). CODDLE obtains genomic and protein-coding information from publicly available or user-supplied sources and uses it to identify optimal regions and choose primers for amplification. CODDLE has a large variety of input options that make it suitably flexible both for TILLING services and for individuals who perform TILLING on their own. PARSESNP (for Project Aligned Related Sequences and Evaluate SNPs; http://www.proweb.org/parsesnp/) is a free Web-based tool that evaluates mutations and polymorphisms, providing mapping information, predictions of deleterious effects on an encoded protein, changed restriction sites and other information to facilitate phenotypic analysis (Taylor and Greene, 2003). Like CODDLE, PARSESNP is a general purpose tool that can analyze mutation or polymorphism information that is not generated by TILLING or ecotilling.
TILLING gel detection
Although our own experience has been exclusively with LI-COR 4200 and 4300 series slab gel analyzers, and all of our protocols are customized for these instruments, other slab and capillary instruments should also be adaptable for TILLING (Perry et al., 2003; Augustin et al., 2005). Note, however, that older single-channel LI-CORs are unsuitable, and four-color instruments might not provide sufficient spectral separation to allow the use of all four dye labels. Also, while the ease of loading capillary instruments and their high throughput make them potentially highly desirable for TILLING, they are relatively costly. Furthermore, the physical separation of tracks in capillaries, which makes these instruments superior to slab gels for sequencing, may be a disadvantage for TILLING, which currently relies on the background patterns present in all lanes to identify novel bands (Colbert et al., 2001).
We have recently introduced a software tool that facilitates the analysis of TILLING gel images (Zerr and Henikoff, 2005). GelBuddy is a freely available Windows or Mac program that uses background banding, a reproducible feature of PCR using end-labeled primers, to automatically call lanes and calibrate based on molecular weight (http://www.proweb.org/gelbuddy/index.html). GelBuddy allows interactive band calling and creates a file that can be parsed for automated database entry. This makes GelBuddy especially valuable for ecotilling, where the sheer number of polymorphic bands that need to be entered makes the manual process of entering lanes and mobilities into a database tedious and potentially error-prone. STP and other TILLING and ecotilling services have adopted GelBuddy, which has also been adapted for AFLP gel images.
High-throughput TILLING services
Efficiency and economy of scale can be achieved by proper planning and implementation of high-throughput TILLING services. A TILLING service or shared resource consists of several components (Figure 1): a user interface through which orders for the TILLING of a target gene are designed and placed; an informatics and database system that follows each order and organizes the data generated at each step of TILLING informing the user of the outcome; a DNA collection that has been quality-tested, standardized in concentration of all samples, and cross-indexed to lines available at a stock center; a high-throughput laboratory that PCR-amplifies the target gene, discovers the mutations, and sequence-validates them. More specifically, the high-throughput method uses five successive steps: PCR amplification, mismatch cleavage by CEL I digestion, gel analysis, examination of the gels to detect CEL I-cut products, and sequence determination of each mutation. The last step consists in turn of three steps: PCR amplification of the target gene from the putative mutant individual, sequencing reaction, and trace analysis. The STP has developed several efficiency and cost-cutting measures (Figure 1), some of which are similar to those implemented in high-throughput sequencing, such as the use of 384-well plates for PCR reactions. Slab gels must be manually cast and loaded by insertion of a membrane comb with 96 robotically spotted samples. In addition, the STP has found that the set-up, running and maintenance of complex liquid-handling robotics require highly specialized technicians, which increases complexity and decreases flexibility. Automated liquid handling at the STP is limited instead to the use of high-efficiency 96- and 384-tip pipettors and of comb-loading robots. The technical personnel employed are typically at the postgraduate level and are partially specialized in different tasks according to skills and seniority. 
Coordination is achieved by monitoring the flow of orders through the steps of the method and adjusting the focus to remove slowdowns and logjams. Strict procedural protocols are emphasized to ensure reproducibility. Weekly meetings are used to review procedures, to troubleshoot, and to discuss operational adjustments. A strength of this set-up is that multiple organisms can be processed using only minor adjustments.
The STP offers TILLING services for Arabidopsis and Drosophila melanogaster. In addition it has collaborated in the development of a TILLING service for maize with Cliff Weil at Purdue University. TILLING services for rice (Oryza sativa), soybean (Glycine max), and tomato (Lycopersicon esculentum) are currently being explored.
Producing a mutagenized population suitable for TILLING requires careful consideration of several factors, including the genetic structure of the target population, the choice of mutagen and method of application, the mode of sampling, the preparation and quality of DNA and the pooling strategy. The genetic structure can be simple when highly homozygous, inbreeding species are considered. Heterozygosity and dioecy are complicating, but not insurmountable factors (Draper et al., 2004; Slade and Knauf, 2005; Wienholds et al., 2003). To minimize variation, one should choose a single parent (or as few as possible) from which several thousand progeny can be produced in a single or multiple generations. Whenever possible, one should choose the line or accession for which the most sequencing information is available. Mutagens that induce point mutations with high efficiency are desirable. Among alkylating agents, EMS alkylates G residues resulting in transition mutations and it is widely used because it is reliable and a high density of mutations is often obtained. Mutagenic treatments that induce deletions result in fewer changes per genome and are unsuitable for TILLING because they require large population sets and the number of small deletions, which could be detected easily by the CEL I assay (Comai et al., 2004), is unknown (Li et al., 2001). Another method of mutagenesis that has not yet been used for TILLING is the induction of genetic lesions that compromise fidelity of replication or DNA repair. This strategy is effective in bacteria and could potentially be useful in eukaryotes (Goldsby et al., 2002; Tago et al., 2005).
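The consequence of choosing a mutagen that produces mainly transitions can be made concrete by listing the positions in a coding sequence where a single EMS-type change would create a stop codon. The sketch below assumes that alkylation of G yields G/C-to-A/T transitions, seen on the coding strand as G-to-A or C-to-T, and uses the standard genetic code. It is only loosely in the spirit of the region selection that a tool such as CODDLE performs and is not that tool's algorithm; the function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code
EMS_TRANSITIONS = {"G": "A", "C": "T"}  # coding-strand view of G/C -> A/T


def ems_stop_sites(cds):
    """List (position, codon, mutant_codon) for changes that create a stop.

    `cds` is an in-frame coding sequence; positions are 0-based offsets
    into `cds`. Existing stop codons and trailing partial codons are
    skipped.
    """
    cds = cds.upper()
    hits = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue
        for offset, base in enumerate(codon):
            new_base = EMS_TRANSITIONS.get(base)
            if new_base is None:
                continue  # A and T are not targets in this simple model
            mutant = codon[:offset] + new_base + codon[offset + 1:]
            if mutant in STOP_CODONS:
                hits.append((start + offset, codon, mutant))
    return hits


if __name__ == "__main__":
    for pos, codon, mutant in ems_stop_sites("ATGTGGCAATAA"):
        print(pos, codon, "->", mutant)
```

Counting such sites along a gene gives a rough idea of which stretches are most likely to yield truncation alleles from a transition-inducing mutagen; missense changes, which are more numerous, are ignored here.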
The intensity of mutagenesis that is applied to the target genome is an important component of a TILLING strategy. One factor affecting the outcome appears to be the genetic make-up of the target organism, which results in similar treatments producing widely different mutation densities in different species. Mutagenesis is usually applied in a manner that produces some level of lethality in the treated organismal stage (e.g. seed) while at the same time allowing sufficient fertility. Arabidopsis can be mutagenized to a satisfactory level (producing 1 mutation/170 kb of DNA) without excessive lethality of the treated seed. Increasing the concentration of EMS produces increasing sterility in M1 plants. By contrast, rice mutagenized to produce about 50% seed lethality results in a much lower mutation density (about 1 mutation/Mb of DNA; Wu et al., 2005) and a similar experience has been reported for barley (Caldwell et al., 2004). Because these organisms are all diploid, it seems reasonable to hypothesize that physiological differences underlie the enhanced sensitivity of certain species to lethality caused by alkylating agents. Indeed, the published mutation densities determined by TILLING differ as much as 40-fold. (Note that the numbers provided are only approximate because of different assumptions often used in the calculations.) The highest mutation density was obtained for hexaploid wheat (1 mutation/25 kb), followed by tetraploid wheat (1 mutation/40 kb) (Slade et al., 2005), Drosophila melanogaster (1 mutation/150 kb) (Winkler et al., 2005), Arabidopsis (1 mutation/170 kb) (Greene et al., 2003), zebrafish (1 mutation/230 kb to 1 mutation/500 kb) (Draper et al., 2004; Wienholds et al., 2003), maize (1 mutation/500 kb) (Till et al., 2004a,b), rice (1 mutation/500 kb) (Wu et al., 2005) and barley (1 mutation/Mb) (Caldwell et al., 2004).
These data suggest a possible role for polyploidy, but not genome size, in conferring tolerance to high mutation density, a finding that is not surprising (Stadler, 1929). In conclusion, information on the molecular basis for variations in response to mutagenic treatments in plants is lacking and it would be useful for improving TILLING populations.
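As a rough check on the densities quoted above, the following Python sketch converts them to mutations per Mb and computes the spread between the extremes. The dictionary and function names are illustrative, zebrafish is left out because it was reported as a range, and the rice figure follows the list above (1 mutation/500 kb). As already noted, all of the values are approximate.

```python
# Approximate kb of DNA per induced mutation, copied from the list above.
# Zebrafish (reported as a range, 230-500 kb) is omitted for simplicity.
KB_PER_MUTATION = {
    "hexaploid wheat": 25,
    "tetraploid wheat": 40,
    "Drosophila melanogaster": 150,
    "Arabidopsis": 170,
    "maize": 500,
    "rice": 500,
    "barley": 1000,
}


def mutations_per_mb(kb_per_mutation):
    """Convert 'one mutation per N kb' into mutations per Mb."""
    return 1000 / kb_per_mutation


def fold_range(densities):
    """Ratio between the sparsest and the densest population."""
    return max(densities.values()) / min(densities.values())


if __name__ == "__main__":
    # List species from the densest to the sparsest population.
    for species, kb in sorted(KB_PER_MUTATION.items(), key=lambda kv: kv[1]):
        print(f"{species:24s} {mutations_per_mb(kb):5.1f} mutations/Mb")
    print(f"fold difference: {fold_range(KB_PER_MUTATION):.0f}")
```

The ratio between barley and hexaploid wheat reproduces the 40-fold spread mentioned in the text.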
The method of mutagenesis is critical and affects the way the resulting population is sampled. Mutagenesis is typically carried out by soaking seed in the mutagen of choice. This method targets both parental genomes. A time-saving alternative is the treatment of pollen, which is the method of choice for maize (Till et al., 2004a,b). Individuals arising from mutagenized seed (the M1 generation) are unsuitable for TILLING because they are chimeric, so the M2 generation must be produced. By contrast, M1 individuals produced by fertilization with mutagenized pollen each originate from a single zygotic cell and can be TILLED (Figure 2). Although mutations are only induced in the paternally contributed genome, all can be potentially sampled, whereas 25% of induced mutations in treated seed are lost in the meiosis leading from the M1 to the M2. As a result, the number of mutations potentially sampled in M2s derived from seed-treated M1s is only 50% higher than that in M1 plants produced from mutagenized pollen. In addition, because all of the latter individuals are heterozygous for the mutations, the families derived from selfing, which are inventoried as seed, will always segregate for any mutant trait, facilitating genetic analysis.
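The 50% figure follows from simple bookkeeping, sketched below in Python under simplifying assumptions that are ours: mutation load is counted in arbitrary units of one per mutagenized haploid genome, each induced mutation is heterozygous in the M1, and a single selfed M2 is sampled per M1. The function name is illustrative.

```python
from fractions import Fraction


def mutations_sampled(genomes_mutagenized, fraction_retained):
    """Relative number of induced mutations that can still be sampled.

    genomes_mutagenized: haploid genomes exposed to the mutagen
                         (2 for seed treatment, 1 for pollen treatment).
    fraction_retained:   share of those mutations present in the
                         individual that is actually screened.
    """
    return genomes_mutagenized * fraction_retained


# Seed treatment: both parental genomes are hit, but a selfed M2 lacks a
# given heterozygous M1 mutation with probability 1/4.
seed_m2 = mutations_sampled(2, Fraction(3, 4))

# Pollen treatment: only the paternal genome is hit, and the M1 itself is
# screened, so nothing is lost to meiosis.
pollen_m1 = mutations_sampled(1, Fraction(1))

excess = seed_m2 / pollen_m1 - 1

if __name__ == "__main__":
    print(f"seed-treated M2: {seed_m2}, pollen-treated M1: {pollen_m1}")
    print(f"excess in seed-treated M2: {float(excess):.0%}")
```

Doubling the number of mutagenized genomes and then losing a quarter of the mutations leaves a relative load of 3/2, hence the 50% excess.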
Sampling of the DNA is best carried out in the M2s of seed-treated populations and in the M1s of pollen-treated populations. In the following we will refer to seed populations for simplicity. The Arabidopsis population routinely used for production TILLING consists of 3072 fertile M2 individuals, but the target population for each species will vary, depending on the mutation rate. The population resource could be made more valuable by providing phenotypic information, so that mutations discovered by TILLING can be associated in silico to phenotypes in the corresponding lines (Menda et al., 2004; Wu et al., 2005). Often, it is more practical to test a smaller population before entering production TILLING by carrying out a pilot TILLING experiment involving approximately 800 individuals and five or more target genes. Each sampled individual should be independent to avoid resampling the same mutagenized chromosomes, and this is achieved by deriving a single M2 from each M1. If the M2s are not available, sampling can entail pooling multiple individuals of an M3 family to reconstruct their M2 parent. This method, however, requires careful balancing of all sib contributions to the pool to avoid skewing the representation.
Purity of the DNA preparation is also important: DNA suitable for TILLING is typically of good average size (at least 15 kb) and stable under standard storage conditions. Additionally, it has been the experience of the STP that not all of the preparations meeting these criteria that have been provided by our collaborators are ideal for TILLING. In some cases, a high background of aberrant PCR products is obtained, which reduces the overall efficiency of the process. Thus, it is important to test each DNA purification method and to standardize DNA preparations accordingly. Individual DNA preparations are mixed to achieve the desired pooling arrangement. The STP now uses eightfold pools that are arranged in a bidimensional scheme so that display of the same mutations in two pools identifies the mutant individual. Importantly, satisfactory pooling can only be achieved if each individual DNA is present at nearly the same concentration.
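The logic of a bidimensional pooling scheme can be illustrated with a short Python sketch. It assumes, purely for illustration, that 64 individuals are laid out on an 8 x 8 grid and that each row and each column forms one eightfold pool. The exact plate layout used by the STP is not described here, and the function and sample names are hypothetical.

```python
POOL_SIZE = 8  # eightfold pools, as in the text


def build_pools(individuals):
    """Arrange 64 individuals on an 8 x 8 grid and pool by row and by column."""
    if len(individuals) != POOL_SIZE ** 2:
        raise ValueError("this sketch expects exactly 64 individuals")
    grid = [
        individuals[r * POOL_SIZE:(r + 1) * POOL_SIZE]
        for r in range(POOL_SIZE)
    ]
    row_pools = [list(row) for row in grid]
    col_pools = [
        [grid[r][c] for r in range(POOL_SIZE)] for c in range(POOL_SIZE)
    ]
    return row_pools, col_pools


def identify_mutant(row_pools, col_pools, positive_row, positive_col):
    """Return the one individual shared by a positive row and column pool."""
    shared = set(row_pools[positive_row]) & set(col_pools[positive_col])
    (individual,) = shared  # a row and a column intersect in exactly one cell
    return individual


if __name__ == "__main__":
    samples = [f"M2_{i:02d}" for i in range(64)]  # hypothetical sample names
    rows, cols = build_pools(samples)
    print(identify_mutant(rows, cols, positive_row=2, positive_col=5))
```

Each individual appears in exactly one row pool and one column pool, so 16 pools cover 64 individuals, and a mutation displayed in one pool of each dimension points to a single individual without a separate deconvolution step.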
Sequence resources and PCR target design
A crucial resource for TILLING is the availability of sequence information enabling the design of primers that target DNA sequences encoding the protein or domain of interest and allowing robust PCR amplification. At first sight this seems a requirement that is easy to satisfy, as at least some sequence information is available for most experimental organisms or can easily be generated. Nevertheless, problems arise in amplifying a fragment suitable for TILLING because of incomplete genomic sequence information. In organisms for which only partial genomic sequence information is available, primers can hybridize to unknown paralogous and homeologous sequences and amplify spurious products. In the case of organisms for which the whole genome has been sequenced, it is simple to verify that the chosen primers can hybridize efficiently only to the target. The problem of collateral targets is particularly acute in the case of polyploids. Consider, for example, the case of tetraploids: if all four copies of a gene present in the genome amplify in a PCR reaction, an eightfold pool represents 32 alleles instead of 16. Thus, the pooling scheme used for TILLING a polyploid should take this dilution into account. In the case of allopolyploids, the problem of defining a target is even more acute because slightly diverged copies are expected for each gene. If the sequence of all homeologous loci of interest is not known, each individual amplification product must be characterized to determine if it is heterogeneous. Sequencing the amplified product can reveal gross heterogeneity, but may not detect sequences present in a small relative concentration that can still complicate the mismatch cleavage assay. If the sequence of all related genes is known, the degree of divergence between homeologous genes may allow specific PCR amplification. For example, gene-specific primers were required for optimal TILLING of the waxy loci of wheat (Slade et al., 2005). 
Thus, TILLING of genomes that are not sequenced in their entirety is possible, but will be less efficient because of the need to verify the outcome of PCR amplification.
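The dilution effect described above for polyploids is simple arithmetic, shown here as a Python sketch. The function names are illustrative, and the idea of shrinking the pool to keep the number of alleles constant is only one possible adjustment, not a documented protocol.

```python
from fractions import Fraction


def alleles_in_pool(individuals, copies_amplified):
    """Number of allele copies contributing template to one pooled PCR."""
    return individuals * copies_amplified


def mutant_fraction(individuals, copies_amplified):
    """Share of template carrying a mutation present once in one individual."""
    return Fraction(1, alleles_in_pool(individuals, copies_amplified))


def pool_size_for(target_alleles, copies_amplified):
    """Largest pool that keeps the allele count at or below target_alleles."""
    return target_alleles // copies_amplified


if __name__ == "__main__":
    # Diploid, two copies amplified: 16 alleles per eightfold pool.
    print(alleles_in_pool(8, 2), mutant_fraction(8, 2))
    # Tetraploid, all four copies amplified: 32 alleles per eightfold pool.
    print(alleles_in_pool(8, 4), mutant_fraction(8, 4))
    # Pool size that restores 16 alleles when four copies amplify.
    print(pool_size_for(16, 4))
```

Under these assumptions a single mutant allele makes up 1/32 of the template instead of 1/16, and a fourfold pool would bring the tetraploid case back to the diploid level.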
The mismatch-cleavage detection method used in TILLING screens for mutations in a window corresponding to an amplified PCR product of up to 1.5 kb in size. The choice of a window must take into account the gene model, the effect on codons and splice junctions (EMS alkylates G residues), and sequence conservation of the encoded protein. The task of designing primers suitable for TILLING is simplified by CODDLE, which can fetch, analyze and design primers interactively. For example, after a user enters the cDNA and genomic sequence for the gene of interest in the CODDLE interface, protein alignment blocks derived from a reverse PSI-BLAST search (Altschul et al., 1997) are returned. In this way, conserved protein residues are identified for the user, who can accept the default window choice or customize the search for alignment blocks by exploring alternatives that are provided as links. Once protein homology has been identified, CODDLE proceeds to choose a target window that maximizes the probability of recovering missense mutations and truncations based on properties of the mutagen and the organism and on user preferences. A graphical display of the search results is provided, indicating the region within the amplicon where mismatch detection is sensitive; in practice, this excludes the first 80 bases on each end (Till et al., 2003a). The user can accept CODDLE's choices or alter them interactively. Finally, primer pairs are chosen by CODDLE using the PRIMER3 algorithm (Rozen and Skaletsky, 2000) with default parameters for PCR amplification that are suitable for TILLING. Typically, the challenge of PCR-amplifying the target fragment increases in difficulty with increasing genome size and with deviations from the optimal G + C content. Organism-specific procedures and rules can be devised to address these problems.
For example, rice amplicons are relatively GC-rich, necessitating the addition of the helix-destabilizing solvent tetramethylene sulfoxide to amplify difficult templates (Chakrabarti and Schutt, 2002).
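To make the window constraints concrete, the Python sketch below computes the part of an amplicon in which mismatch detection is sensitive, using the 80-base end exclusion quoted above, and lists the G and C positions inside it as candidate EMS targets. This is not CODDLE's algorithm: the function names are hypothetical, and treating every G or C in the window as equally likely to be hit is a simplification.

```python
END_MARGIN = 80  # bases at each end where mismatch detection is insensitive


def sensitive_interval(amplicon_length, margin=END_MARGIN):
    """Return (start, end) of the sensitive region, 0-based, end exclusive."""
    if amplicon_length <= 2 * margin:
        raise ValueError("amplicon too short to contain a sensitive region")
    return margin, amplicon_length - margin


def ems_target_positions(sequence, margin=END_MARGIN):
    """0-based positions of G or C within the sensitive region.

    EMS alkylates G; a G on the opposite strand appears as a C on the
    strand given here, so both letters are counted.
    """
    start, end = sensitive_interval(len(sequence), margin)
    return [i for i in range(start, end) if sequence[i].upper() in "GC"]


if __name__ == "__main__":
    start, end = sensitive_interval(1500)
    print(f"sensitive region: {start}-{end} ({end - start} bp)")
    toy = "A" * 80 + "GATC" + "A" * 80  # toy amplicon, not a real gene
    print(ems_target_positions(toy))
```

For a maximal 1.5 kb amplicon this leaves 1340 bp of screenable sequence, which is one reason to centre the window on the most informative coding region.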
The fast pace of development in genomic technologies has the potential of making any method obsolete in a relatively short period of time. In the case of TILLING, obsolescence will occur if a general technology for easy modification of target genes becomes available. Such technology, homologous gene replacement, is routine in yeast, but is tedious in other organisms, such as mouse, Drosophila melanogaster and Physcomitrella patens, and is not easily transferred to other systems. Improvements, such as the selective enzymatic cleavage of chromosomal loci, may become widely applicable and achieve efficient homologous recombination in recalcitrant systems (Urnov et al., 2005). The need to regenerate an individual from a single cell after selection, however, would still limit the rapid application of this new method to several important species. Therefore, it seems likely that TILLING will remain a valuable strategy for the foreseeable future.
Advances in genomic technology, such as improved SNP detection or sequencing, will make TILLING easier and simpler. A good candidate for TILLING improvement is resequencing. Incremental and quantum improvements in sequencing efficiency should make it more cost effective. A new, apparently powerful method, however, may not be directly transferable to TILLING. For example, improvements that allow very efficient sequencing of randomly generated genomic or cDNA fragments (Margulies et al., 2005; Shendure et al., 2005) may not allow sufficiently accurate resequencing of the same gene segment in thousands of individuals without considerable modification. In the distant future it may be possible to sequence the whole ORFeome of an individual cheaply enough that all mutations in protein-coding regions of a few thousand mutagenized individuals can be determined. TILLING would then become an in silico process that uses extensive sequencing databases to identify suitable variation in the organism of choice. Because the cost of such a resource would be considerable, it would be advantageous to find genetic or cultural means to immortalize the M2 individuals to allow long-term utilization by the scientific community. Finally, improvements in mutagenesis could go a long way toward making TILLING efficient in most organisms by allowing the production of populations of individuals that display a suitable mutation density.
In conclusion, TILLING extends the long-established practice of using existing variation for functional genetic discovery. Its dependence on technology that discovers sequence variation means that forthcoming advances in SNP detection and in DNA sequencing can enhance TILLING. As a result, we can expect TILLING to continue to evolve in the coming years.
The Seattle TILLING Project is supported by grants from the Plant Genome Research Program and Arabidopsis 2010 Initiative of the National Science Foundation, from the Genome Program of the US Department of Agriculture-National Research Initiative, and from the Rockefeller Foundation.