1. T-RFLP is an established tool for high-throughput studies of microbial communities, which can, with care and practical validation, be enhanced to aid identification of specific organisms in a community by associating T-RFs from experimental runs with predicted T-RFs from a set of existing sequences. A barrier to this approach is the laborious process of selecting diagnostic restriction enzyme(s) for further validation.
2. Here, we describe the directed terminal restriction analysis tool (DRAT), a software tool that aids the design of directed terminal-restriction fragment length polymorphism (DT-RFLP) strategies to separate DNA targets based on restriction enzyme polymorphisms. The software assesses multiple user-supplied DNA sequences, ranks restriction endonucleases by their ability to separate targets and provides summary information including the length of diagnostic terminal restriction fragments. A worked example suggesting enzymes that uniquely separate selected arbuscular mycorrhizal fungal groups is presented.
3. This tool greatly facilitates identification of diagnostic restriction enzymes for user-designated groups within complex populations and provides expected product sizes for all designated groups.
Terminal restriction fragment length polymorphism (T-RFLP) has been widely used in microbial ecology for differentiation of communities (Liu et al. 1997; Moeseneder et al. 1999) and for comparison of relative phenotype richness and structure of communities (LaMontagne, Schimel, & Holden 2003). Amplification with fluorescent primers is followed by digestion with restriction enzymes to produce labelled terminal-restriction fragments (T-RFs) that are separated by high-resolution capillary electrophoresis on automated DNA sequencers using internal size standards (Avaniss-Aghajani et al. 1994). The patterns of T-RFLP for a sample can be converted into numerical form and compared between samples using a variety of statistical approaches (Kent et al. 2003; Osborne et al. 2006).
A logical extension is the use of T-RFLP analysis to identify specific organisms in a community by associating T-RFs from experimental runs with predicted T-RFs from a database of existing sequences (Braker et al. 2001; Kaplan et al. 2001). This process requires careful validation that has, for example, been undertaken with phytoplasmas (Hodgetts et al. 2007) and free-living nematodes (Donn et al. 2011). We refer to this as DT-RFLP (for ‘Directed T-RFLP’) because the design of the PCR primers and the selection of restriction enzyme are carefully directed to allow identification of diagnostic T-RFs for specific target species or groups within samples.
Here, we present a software tool, directed terminal restriction analysis tool (DRAT), to aid selection of restriction enzyme(s) that differentially identify targeted species/groups within complex communities based on user-supplied sets of sequences. DRAT is a command-line application written in Perl and C++. It runs on MS Windows, where it requires Cygwin and Perl, and also on Mac OS X and Linux. DRAT is available for non-commercial use free of charge from http://www.hutton.ac.uk/drat along with a tutorial, application examples and a troubleshooting guide. The main advantage of DRAT is that it is applicable to any organism/gene target and can be directed to search for single enzymes, or combinations of enzymes within a single digestion, that yield diagnostic T-RFs. It can also use input files with multiple examples of each type, allowing intra-group variation to be assessed. DRAT is aimed at users designing diagnostic or monitoring tools for specific species or ‘functional groups’ within complex communities. All output files can be viewed in Microsoft Excel.
Sequences to be analysed are input in FASTA format, with a specific naming convention in which the first three characters of the sequence name designate the target group. As DRAT is designed to search user-submitted sequences for suitable restriction enzymes, full-length query sequences (including primer) must be submitted to avoid misleading output. The command line for DRAT allows selection of the minimum acceptable distance (bp) between diagnostic T-RFs, the maximum number of restriction enzymes to be combined in a single digest, and the enzyme(s) to be assessed (a single enzyme or all enzymes in the REBASE enzyme data file, http://rebase.neb.com/rebase/rebase.html).
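The naming convention can be illustrated with a minimal Python sketch that reads FASTA records and groups them by the first three characters of the sequence name. This is not DRAT code (DRAT reads its input in Perl); the function names, record names and sequences below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def group_by_prefix(records, prefix_len=3):
    """Map each group designator (first characters of the name) to its records."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[name[:prefix_len]].append((name, seq))
    return dict(groups)


# Hypothetical records; names and sequences are invented for illustration.
example = StringIO(
    ">AAA_clone1\nACGTCATGAC\nGTTA\n"
    ">AAA_clone2\nACGTCATGACGTTT\n"
    ">BBB_clone1\nTTCATGACGA\n"
)
groups = group_by_prefix(read_fasta(example))
for designator, members in sorted(groups.items()):
    print(designator, [name for name, _ in members])
```

Supplying several records per designator, as for AAA here, is what allows intra-group variation to be examined.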
Directed terminal restriction analysis tool consists of two components that run sequentially: a Fragment Length Generation Program (FLGP) written in Perl and an Enzyme Scoring Program (ESP) written in C++. The FLGP uses libraries from Tisdall (2003) to read the FASTA input file and a REBASE restriction enzyme data file. The FLGP uses standard regular expression matching (Friedl 2006) to discover restriction sites in each sequence for each of the REBASE enzymes (unless a single enzyme is specified), creates a table of the 5′ and 3′ terminal fragment lengths, writes this table to a delimited text file and calls the ESP, passing the text file as a parameter. DRAT does not recognise IUPAC ambiguity codes and thus will not identify a restriction site containing ‘n’ or any other non-specific nucleotide. The ESP reads the terminal fragment lengths table and filters out isoschizomers. The remaining enzymes or enzyme combinations are then scored for their ability to resolve groups using their unique fragment length combinations.
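The site search described for the FLGP can be illustrated with a minimal Python sketch (DRAT itself implements this step in Perl). The function name, the example sequences and the enzyme entry (recognition sequence CATG, cut after the first base) are assumptions made for illustration; DRAT takes its enzyme data from the REBASE file. The sketch searches the forward strand only, which is sufficient for a palindromic site, and matches sites literally, so a site interrupted by an ambiguous base is not found.

```python
import re


def terminal_fragment_lengths(sequence, site, cut_offset):
    """Return (5' length, 3' length) of the terminal fragments, or None if uncut.

    site       -- recognition sequence written with unambiguous bases only
    cut_offset -- number of bases of the site left on the 5' side of the cut
    """
    sequence = sequence.upper()
    # A lookahead lets overlapping occurrences of the site be found as well.
    pattern = re.compile("(?=" + re.escape(site.upper()) + ")")
    cuts = [m.start() + cut_offset for m in pattern.finditer(sequence)]
    if not cuts:
        return None
    five_prime = cuts[0]                    # fragment carrying the forward label
    three_prime = len(sequence) - cuts[-1]  # fragment carrying the reverse label
    return five_prime, three_prime


# Hypothetical 19 bp sequence with two CATG sites (positions 4 and 12, 0-based).
print(terminal_fragment_lengths("AAAACATGGGGGCATGTTT", "CATG", 1))  # (5, 6)
# An ambiguous base inside the site prevents a literal match.
print(terminal_fragment_lengths("AAAACNTGGG", "CATG", 1))  # None
```

Only the cut nearest each labelled end matters for T-RFLP, which is why the sketch keeps the first and last cut positions and discards internal fragments.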
The ESP uses a novel heuristic algorithm to score the resolving power of an enzyme or enzyme combination. Briefly, for all sequences within a group, distances are calculated to all other sequences in that group to give the intra-group distances. Then, distances are calculated between each sequence within a group and the sequences of the other groups, yielding inter-group distances. If, for any inter-group sequence comparison, the distance is less than the specified minimum distance threshold, that group combination is considered irresolvable for that enzyme or enzyme combination and the maximum inter-group resolving score is decremented. For details of the algorithm, see Appendix S1 in Supporting Information.
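The thresholding idea in this description can be sketched in Python. This is an illustration only, not the ESP algorithm, whose details are given in Appendix S1: it assumes a single enzyme, one terminal fragment per sequence and a distance defined as the absolute difference in fragment length. The function name, the returned fields and the fragment lengths are hypothetical.

```python
from itertools import combinations


def score_enzyme(fragments, min_dist):
    """Score one enzyme from a {group: [terminal fragment lengths]} mapping."""
    # Intra-group distances: every pair of sequences inside the same group.
    intra = {
        group: [abs(a - b) for a, b in combinations(lengths, 2)]
        for group, lengths in fragments.items()
    }
    pairs = list(combinations(sorted(fragments), 2))
    resolved = len(pairs)  # start from the maximum possible score
    failures = []
    for g1, g2 in pairs:
        # Inter-group distances: each sequence of g1 against each sequence of g2.
        inter = [abs(a - b) for a in fragments[g1] for b in fragments[g2]]
        if any(d < min_dist for d in inter):
            resolved -= 1  # this group pair cannot be resolved
            failures.append((g1, g2))
    return {"total": len(pairs), "resolved": resolved,
            "failures": failures, "intra": intra}


# Hypothetical fragment lengths (bp), invented for illustration.
example = {
    "AAA": [200, 200],
    "BBB": [150, 151],
    "CCC": [120, 120],
    "DDD": [152, 152],
}
result = score_enzyme(example, min_dist=4)
print(result["total"], result["resolved"], result["failures"])
```

With these invented lengths, four groups give six pairwise combinations, and only the BBB and DDD pair falls below the 4 bp threshold, so five combinations are counted as resolved.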
There are three primary outputs from the DRAT tool; these in combination allow selection of enzyme combinations for further validation and are as follows:
1 The .scores file, which ranks the top n enzymes (or enzyme combinations if more than one enzyme is selected for use in combination) by ability to produce diagnostic T-RFs, has four components:
A The total number of possible group combinations.
B Success: the number of pairwise group combinations each diagnostic enzyme or combination can distinguish.
C Where groups cannot be resolved, this gives the percentage of sequence pairs that drive this failure.
D The fidelity of fragment lengths within each group.
This file suggests highly performing enzymes in a simple summary form.
2 Individual files produced for each of the diagnostic enzymes, demonstrating the resolving power for each group and their fidelity.
This summarises the strengths and weaknesses of each enzyme.
3 The .cuts file provides the predicted T-RFs for the diagnostic enzymes allowing checking of the suitability of digests for further validation and to ensure that selected digestions can be resolved with the intended sequencer.
From these outputs, it is a simple matter to select suitable candidate restriction enzyme(s) for practical testing. Whilst DT-RFLP is potentially a powerful tool for population analysis and monitoring, care must be taken during the design to avoid problems caused by, for example, enzymes with proximal restriction sites (Avis, Dickie, & Mueller 2006). As with all new applications, DT-RFLP tools require thorough validation with clones or type strain DNA to confirm suitability. Donn et al. (2011) have used DRAT to aid the design and validation of a DT-RFLP approach for the analysis of free-living soil nematodes.
As a simple worked example of the application of DRAT and to test the efficacy of the software, we have used data and clones published by Öpik et al. (2008) relating to a community structure analysis of boreal forest arbuscular mycorrhizal fungi. A subset of these cloned small-subunit ribosomal RNA gene (SSU rDNA) fragments and associated sequences generated by Öpik et al. (2008) was analysed by DRAT to identify diagnostic enzymes to separate artificially imposed groups. DT-RFLP analysis was also performed on the associated clones using the top-scoring enzyme suggested by DRAT to confirm the predicted T-RFs.
Using grouping based on relatedness (Fig. 1), sequences were compiled into a single .fasta format file named amtest.fasta, with the group designators AAA, BBB, CCC and DDD. The group designation must form the first three letters of the sequence name to be recognised by DRAT. The .fasta file is submitted to the DRAT program using the following command line:
"perl drat.pl -fasta=amtest -maxenz=1 -mindist=4 -topn=10 -sense=f -enzfile=bionetc.512 -enzname=all"; a description of parameters is given in Table 1.
Table 1. Explanation of the components of the command line used to submit sequences to DRAT
Parameter | Description
fasta | The name of the fasta file containing sequences (with group-specific names)
maxenz | The maximum number of enzymes to try in combination
mindist | The minimum distance threshold, in base pairs (bp), that is required to resolve peaks
topn | The number of enzymes to report
sense | Whether f (forward) 5′, r (reverse) 3′ or b (both) fragments will be scored
enzfile | The name of the file containing the enzyme cut data
enzname | All or the name of a specific enzyme to test
The .scores output file from our example data set is given in Table 2 and shows that, for this example, CviAII, FatI and Hin1II are able to fully resolve the four species groups in the supplied input file with a minimum of 4 bp between the diagnostic peaks. The remaining seven enzymes either produce diagnostic fragments under the selected minimum distance or produce two or more fragments within groups. All output files for this data set and full explanations are given in Appendix S2 (Supporting information).
Table 2. The .scores file. This is a summary table of the top N enzymes (top 10 in our example) ranked according to ability to fully resolve all groups. The total number of group combinations is given, followed by the number of groups that each enzyme (or set of enzymes) can resolve. Next, the average percentage of sequences that fail to be resolved by the enzyme is given, followed by the average intra-group fidelity for the groups that each enzyme can resolve
Total group combinations | Average percent sequence fails | Average group fidelity
The expected T-RF sizes predicted by DRAT (see Appendix S2 for the predicted T-RFs) were compared with experimental results from DT-RFLP after a fragment of the SSU-RNA region was amplified from clones representing each of the sequence groups (A–D) using the primers NS31 (Simon, Lalonde, & Burns 1992), labelled with FAM, and AM1 (Helgason et al. 1998). Two microlitres of purified SSU-RNA clone fragment was amplified as described (Öpik et al. 2008) before digestion with CviAII and separation as described by Uibopuu et al. (2009).
To simulate the presence of PCR product from more than one group in a single sample, we also combined groups to produce artificially mixed samples before digestion. Examples of the experimental output analysed in GeneMarker (Softgenetics, State College, PA, USA) compared to the DRAT prediction are shown in Fig. 2.
This example demonstrates the ability of DRAT to suggest suitable enzymes to resolve groups in a simple artificial case. Further exhaustive validation would be required prior to application in a real experimental situation. For instance, the difference between expected and observed T-RF sizes (between 0·2 and 4·4 bp, Table 3), whilst within the previously reported range (Kitts 2001; Kaplan & Kitts 2003; Bukovoská et al. 2010), needs to be understood during the full validation stage of any DT-RFLP application. A more complete example including practical validation can be found in Donn et al. (2011), where DRAT was used to aid the design of an approach to separate nematode groups, followed by a complete validation of the suggested digest strategies to select an optimum solution both theoretically and practically.
Table 3. Expected (from the .cuts output from DRAT, see Appendix S2) and observed T-RFs produced by digestion with CviAII for each of the individual groups (A–D). Also shown are the T-RFs expected and observed when PCR products from two groups are combined before digestion with CviAII, simulating a natural mixed population
Sample | Expected T-RFs (bp) | Observed T-RFs (bp)
Combined A + C | 106 + 168 | 103·1 + 166·8
Combined B + C | 106 + 115 | 103·1 + 110·6
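As a worked check of the size differences discussed above, the following Python sketch subtracts the observed from the expected T-RF sizes for the two combined samples in Table 3. It assumes that expected and observed values are listed in the same order within each sample; the variable and function names are hypothetical.

```python
# Expected (DRAT .cuts prediction) and observed T-RF sizes in bp,
# taken from the two combined samples listed in Table 3.
combined = {
    "A + C": {"expected": [106, 168], "observed": [103.1, 166.8]},
    "B + C": {"expected": [106, 115], "observed": [103.1, 110.6]},
}


def size_offsets(expected, observed):
    """Return expected minus observed for each T-RF, rounded to 0.1 bp."""
    return [round(e - o, 1) for e, o in zip(expected, observed)]


offsets = {
    name: size_offsets(sizes["expected"], sizes["observed"])
    for name, sizes in combined.items()
}
for name, values in offsets.items():
    print(name, values)
# A + C [2.9, 1.2]
# B + C [2.9, 4.4]
```

The largest offset in these two samples, 4.4 bp, corresponds to the upper end of the 0·2 to 4·4 bp range quoted in the text, and it exceeds the 4 bp minimum distance used in the worked example, which is one reason such offsets need to be characterised during validation.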
Directed terminal restriction analysis tool can aid the rapid identification of suitable restriction enzyme(s) for the design of DT-RFLP protocols that, following validation, can be applied to detect specific targets within complex populations. The tool simplifies the design of new applications of DT-RFLP, including the analysis of groups consisting of phylogenetically diverse members.
This work was supported by the Scottish Government Rural and Environment Research and Analysis Directorate (RERAD, Workpackages 1.7 and 3.3) and by the Biotechnology and Biological Sciences Research Council (grant number BBS/S/K/2004/11271).