Directed terminal restriction analysis tool (DRAT): an aid to enzyme selection for directed terminal-restriction fragment length polymorphisms




1. T-RFLP is an established tool for high-throughput studies of microbial communities, which can, with care and practical validation, be enhanced to aid identification of specific organisms in a community by associating T-RFs from experimental runs with predicted T-RFs from a set of existing sequences. A barrier to this approach is the laborious process of selecting diagnostic restriction enzyme(s) for further validation.

2. Here, we describe directed terminal restriction analysis tool (DRAT), a software tool that aids the design of directed terminal-restriction fragment length polymorphism (DT-RFLP) strategies, to separate DNA targets based on restriction enzyme polymorphisms. The software assesses multiple user-supplied DNA sequences, ranks optimal restriction endonucleases for separating targets and provides summary information including the length of diagnostic terminal restriction fragments. A worked example suggesting enzymes uniquely separating selected arbuscular mycorrhizal fungal groups is presented.

3. This tool greatly facilitates identification of diagnostic restriction enzymes for user-designated groups within complex populations and provides expected product sizes for all designated groups.


Terminal restriction fragment length polymorphism (T-RFLP) has been widely used in microbial ecology for differentiation of communities (Liu et al. 1997; Moeseneder et al. 1999) and for comparison of relative phylotype richness and structure of communities (LaMontagne, Schimel, & Holden 2003). Amplification with fluorescent primers is followed by digestion with restriction enzymes to produce labelled terminal-restriction fragments (T-RFs) that are separated by high-resolution capillary electrophoresis on automated DNA sequencers using internal size standards (Avaniss-Aghajani et al. 1994). The T-RFLP patterns for a sample can be converted into numerical form and compared between samples using a variety of statistical approaches (Kent et al. 2003; Osborne et al. 2006).

A logical extension is the use of T-RFLP analysis to identify specific organisms in a community by associating T-RFs from experimental runs with predicted T-RFs from a database of existing sequences (Braker et al. 2001; Kaplan et al. 2001). This process requires careful validation, which has been undertaken with, for example, phytoplasmas (Hodgetts et al. 2007) and free-living nematodes (Donn et al. 2011). We refer to this as DT-RFLP (for ‘Directed T-RFLP’) because the design of the PCR primers and the selection of restriction enzymes are carefully directed to allow identification of diagnostic T-RFs for specific target species or groups within samples.

A number of tools have been developed to assist in analysis of T-RFLP data or in the selection of enzymes. In silico digestions are possible with MICA (Kent et al. 2003), TAP T-RFLP (Marsh et al. 2000) and TReFID (Rösch & Bothe 2005) but are limited to ribosomal genes. REMA (Szubert et al. 2007), ARB (Dicke, Kolb, & Braker 2005) and TRiFLe (Junier, Junier, & Witzel 2008) perform in silico digests of user-submitted sequences but treat every sequence as unique and therefore require significant additional searches to confirm a species- or group-specific T-RF. REPK (Collins & Rocap 2007) allows taxonomic rank discrimination but works on a restricted enzyme database and requires multiple individual digestion steps.

Here, we present a software tool, directed terminal restriction analysis tool (DRAT), to aid selection of restriction enzyme(s) to differentially identify targeted species/groups within complex communities based on user-supplied sets of sequences. DRAT is a command-line application written in Perl and C++. It runs on MS Windows, where it requires Cygwin and Perl, and also on Mac OS X and Linux. DRAT is available free of charge for non-commercial use, along with a tutorial, application examples and a troubleshooting guide. The main advantage of DRAT is that it is applicable to any organism/gene target and can be directed to search for enzymes, or combinations of enzymes within a single digestion, that yield diagnostic T-RFs. It can also use input files with multiple examples of each type, which allows intra-group variation to be assessed. DRAT is aimed at users designing diagnostic or monitoring tools for specific species or ‘functional groups’ within complex communities. All output files can be viewed in Microsoft Excel.

Program overview

Sequences to be analysed are input in FASTA format, with a specific naming convention in which the first three characters of each sequence name designate its target group. As DRAT is designed to search user-submitted sequences for suitable restriction enzymes, full-length query sequences (including primers) must be submitted to avoid misleading output. The command line for DRAT allows selection of the minimum acceptable distance (bp) between diagnostic T-RFs, the maximum number of restriction enzymes to be combined in a single digest and the enzyme(s) to be assessed (a single enzyme or all enzymes in the REBASE enzyme data file).
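The naming convention can be illustrated with a short Python sketch. It is not part of DRAT (which is written in Perl and C++); the function name `read_fasta_groups` and the sequence names in the example are hypothetical, and the only behaviour taken from the text is that the first three characters of each sequence name designate the group.

```python
from collections import defaultdict


def read_fasta_groups(text):
    """Parse FASTA text into {group: {sequence name: sequence}}.

    The group is the first three characters of the sequence name,
    following the naming convention described in the text.
    """
    groups = defaultdict(dict)
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            groups[name[:3]][name] = ""
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            groups[name[:3]][name] += line.upper()
    return dict(groups)


# Hypothetical input: two sequences in group AAA, one in group BBB.
example = """>AAA_clone1
ACGTCATGAA
TTGG
>AAA_clone2
ACGTCATGAATTGG
>BBB_clone1
ACGTTTTTCATGGG
"""

groups = read_fasta_groups(example)
print(sorted(groups))
```

Keeping several sequences per group, rather than one representative, is what later allows intra-group variation to be examined.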

Directed terminal restriction analysis tool (DRAT) consists of two components that run sequentially: a Fragment Length Generation Program (FLGP) written in Perl and an Enzyme Scoring Program (ESP) written in C++. The FLGP uses libraries from Tisdall (2003) to read the FASTA input file and a REBASE restriction enzyme data file. Using standard regular expression matching (Friedl 2006), it locates the restriction sites of each REBASE enzyme (or of a single specified enzyme) in each sequence, creates a table of the 5′ and 3′ terminal fragment lengths, writes this table to a delimited text file and calls the ESP, passing the text file as a parameter. DRAT does not recognise IUPAC ambiguity codes and thus will not identify a restriction site containing ‘n’ or any other non-specific nucleotide. The ESP reads the terminal fragment lengths table and filters out isoschizomers. The remaining enzymes or enzyme combinations are then scored for their ability to resolve groups using their unique fragment length combinations.
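The fragment-length step can be sketched in Python as follows. This is an illustration under stated assumptions, not the FLGP itself: the recognition sites and cut offsets in `ENZYMES` are illustrative values that should be checked against the REBASE file in use, only the top-strand cut position is considered, and an uncut sequence is assumed to yield its full length at both ends (how DRAT reports uncut sequences is not stated in the text).

```python
import re

# Illustrative recognition sites and top-strand cut offsets (bases from the
# start of the site); check these against the REBASE file actually used.
ENZYMES = {
    "CviAII": ("CATG", 1),  # C^ATG
    "FatI": ("CATG", 0),    # ^CATG
    "Hin1II": ("CATG", 4),  # CATG^
}


def terminal_fragments(seq, site, cut_offset):
    """Return (forward, reverse) terminal fragment lengths in bp.

    Assumption: a sequence with no site returns its full length twice.
    """
    seq = seq.upper()
    # The lookahead also finds overlapping sites. Only literal A/C/G/T
    # matches count, so a site interrupted by 'N' is not detected.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    if not cuts:
        return len(seq), len(seq)
    # Forward fragment ends at the first cut; reverse starts at the last.
    return cuts[0], len(seq) - cuts[-1]


# Hypothetical 20 bp sequence with two CATG sites.
demo = "AAAACATGTTTTTTCATGAA"
for enzyme, (site, offset) in ENZYMES.items():
    print(enzyme, terminal_fragments(demo, site, offset))
```

In this sketch the three enzymes share a recognition site but differ in cut position, so they give different terminal fragment lengths for the same sequence.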

The ESP uses a novel heuristic algorithm to score the resolving power of an enzyme or enzyme combination. Briefly, for each sequence within a group, distances are calculated to all other sequences in the same group to give the intra-group distances. Then, distances are calculated between each sequence within a group and the sequences of the other groups, yielding inter-group distances. If, for any inter-group sequence comparison, the distance is less than the specified minimum distance threshold, that group combination is considered irresolvable for that enzyme or enzyme combination and the maximum inter-group resolving score is decremented. For details of the algorithm, see Appendix S1 in Supporting Information.
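A minimal Python sketch of the scoring idea is given below. It follows only the brief description above and is not the ESP algorithm of Appendix S1: here "distance" is assumed to be the absolute difference between terminal fragment lengths, every group is assumed to hold at least one sequence, and the function name and fragment lengths are hypothetical.

```python
from itertools import combinations


def score_enzyme(fragments, mindist):
    """Score one enzyme from {group: [fragment length per sequence]}.

    Returns (total group pairs, resolved group pairs, intra), where intra
    maps each group to its largest within-group length difference.
    """
    intra = {g: max(v) - min(v) for g, v in fragments.items()}
    pairs = list(combinations(sorted(fragments), 2))
    resolved = len(pairs)  # start from the maximum possible score
    for a, b in pairs:
        closest = min(abs(x - y) for x in fragments[a] for y in fragments[b])
        if closest < mindist:
            resolved -= 1  # this pair of groups cannot be resolved
    return len(pairs), resolved, intra


# Hypothetical fragment lengths (bp) for four groups.
lengths = {
    "AAA": [168, 168],
    "BBB": [115, 116],
    "CCC": [106, 106],
    "DDD": [117, 117],
}
print(score_enzyme(lengths, mindist=4))
```

With a 4 bp threshold, groups BBB and DDD in this made-up data lie within 1 to 2 bp of each other, so five of the six group pairs are resolved.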


There are three primary outputs from the DRAT tool; these in combination allow selection of enzyme combinations for further validation and are as follows:

  1. The .scores file, which ranks the top n enzymes (or enzyme combinations, if more than one enzyme is selected for use in combination) by ability to produce diagnostic T-RFs, has four components:
     A. The total number of possible group combinations.
     B. Success: the number of pairwise group combinations each diagnostic enzyme or combination can distinguish.
     C. Where groups cannot be resolved, the percentage of sequence pairs that drive this failure.
     D. The fidelity of fragment lengths within each group.
     This file suggests high-performing enzymes in a simple summary form.

  2. Individual files produced for each of the diagnostic enzymes, demonstrating the resolving power for each group and their fidelity.
     These summarise the strengths and weaknesses of each enzyme.

  3. The .cuts file, which provides the predicted T-RFs for the diagnostic enzymes, allowing the suitability of digests to be checked before further validation and ensuring that selected digestions can be resolved with the intended sequencer.
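How such summary values might be derived and ranked can be sketched in Python. The definitions used here are assumptions made for illustration, not those of DRAT: the failure percentage is taken as the share of cross-group sequence pairs closer than the minimum distance, fidelity as the share of sequences in a group with the most common fragment length, and the ranking order (success first, then fewer failures, then higher fidelity) is a guess. The enzyme names are placeholders.

```python
from collections import Counter


def percent_failing_pairs(lengths_a, lengths_b, mindist):
    """Percentage of cross-group sequence pairs closer than mindist bp."""
    pairs = [(x, y) for x in lengths_a for y in lengths_b]
    failing = sum(1 for x, y in pairs if abs(x - y) < mindist)
    return 100.0 * failing / len(pairs)


def group_fidelity(lengths):
    """Assumed definition: percentage of sequences with the modal length."""
    modal_count = Counter(lengths).most_common(1)[0][1]
    return 100.0 * modal_count / len(lengths)


def rank_enzymes(rows, topn):
    """Rank (enzyme, success, mean percent fails, mean fidelity) tuples."""
    ordered = sorted(rows, key=lambda r: (-r[1], r[2], -r[3]))
    return ordered[:topn]


# Placeholder enzymes and values, for illustration only.
rows = [
    ("EnzX", 5, 50.0, 100.0),
    ("EnzY", 6, 0.0, 100.0),
    ("EnzZ", 6, 0.0, 75.0),
]
print(rank_enzymes(rows, topn=2))
```

Reporting the failure percentage alongside the success count helps distinguish an enzyme that fails because of a single outlying sequence from one that cannot separate two groups at all.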

From these outputs, it is a simple matter to select suitable candidate restriction enzyme(s) for practical testing. Whilst DT-RFLP is potentially a powerful tool for population analysis and monitoring, care must be taken during the design to avoid problems caused by, for example, enzymes with proximal restriction sites (Avis, Dickie, & Mueller 2006). As with all new applications, DT-RFLP tools require thorough validation with clones or type strain DNA to confirm suitability. Donn et al. (2011) have used DRAT to aid the design and validation of a DT-RFLP approach for the analysis of free-living soil nematodes.

Application example

As a simple worked example of the application of DRAT and to test the efficacy of the software, we have used data and clones published by Öpik et al. (2008) relating to a community structure analysis of boreal forest arbuscular mycorrhizal fungi. A subset of these cloned small-subunit ribosomal RNA gene (SSU rDNA) fragments and associated sequences generated by Öpik et al. (2008) was analysed by DRAT to identify diagnostic enzymes to separate artificially imposed groups. DT-RFLP analysis was also performed on the associated clones using the top-scoring enzyme suggested by DRAT to confirm the predicted T-RFs.

Using grouping based on relatedness (Fig. 1), sequences were compiled into a single FASTA-format file named amtest.fasta, with the group designators AAA, BBB, CCC and DDD. The group designation must form the first three letters of the sequence name to be recognised by DRAT. The .fasta file is submitted to the DRAT program using the following command line:

“Perl -fasta=amtest -maxenz=1 -mindist=4 -topn=10 -sense=f -enzfile=bionetc.512 -enzname=all”; a description of parameters is given in Table 1.

Figure 1.

 Phenogram resulting from alignment of Glomeromycota sequences used in this study (Öpik et al. 2008). Sequences are of the small-subunit ribosomal RNA gene (SSU rDNA) fragment between the NS31 and AM1 primers, and numbers indicate bootstrap values over 1000 iterations. The phylogenetic analysis was carried out following sequence alignment in ClustalW2, and the tree was created in Topali V2 (Milne et al. 2008) using the F84 + Gamma Rates Model/Neighbor Joining model. Groups A–E were artificially created for this study.

Table 1.   Explanation of the components of the command line used to submit sequences to DRAT
Parameter   Description
fasta       The name of the FASTA file containing sequences (with group-specific names)
maxenz      The maximum number of enzymes to try in combination
mindist     The minimum distance threshold in base pairs (bp) that is required to resolve peaks
topn        The number of enzymes to report
sense       Which fragments will be scored: f (forward, 5′), r (reverse, 3′) or b (both)
enzfile     The name of the file containing the enzyme cut data
enzname     “all” or the name of a specific enzyme to test
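The parameters in Table 1 can be illustrated with a small Python sketch that reads tokens of the form `-name=value`. It is not DRAT's own argument handling; the function name `parse_drat_args`, the type conversions and the error messages are assumptions made for the example.

```python
def parse_drat_args(argv):
    """Parse '-name=value' tokens into a dict, using the Table 1 names."""
    known = {
        "fasta": str, "maxenz": int, "mindist": int, "topn": int,
        "sense": str, "enzfile": str, "enzname": str,
    }
    options = {}
    for token in argv:
        name, sep, value = token.lstrip("-").partition("=")
        if not sep or name not in known:
            raise ValueError("unrecognised option: " + token)
        options[name] = known[name](value)  # int() rejects non-numeric text
    # Only validate 'sense' when it was supplied.
    if options.get("sense", "f") not in ("f", "r", "b"):
        raise ValueError("sense must be f, r or b")
    return options


# The option values used in the worked example.
argv = ("-fasta=amtest -maxenz=1 -mindist=4 -topn=10 "
        "-sense=f -enzfile=bionetc.512 -enzname=all").split()
opts = parse_drat_args(argv)
print(opts["mindist"], opts["topn"], opts["enzname"])
```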

The .scores output file from our example data set is given in Table 2 and shows that, for this example, CviAII, FatI and Hin1II are able to fully resolve the four species groups in the supplied input file with a minimum of 4 bp between the diagnostic peaks. The remaining seven enzymes either produce diagnostic fragments separated by less than the selected minimum distance or produce two or more fragments within groups. All output files for this data set and full explanations are given in Appendix S2 (Supporting Information).

Table 2.   The .scores file. This is a summary table of the top n enzymes (top 10 in our example) ranked according to ability to fully resolve all groups. The total number of group combinations is given, followed by the number of group combinations that each enzyme (or set of enzymes) can resolve. Next, the average percentage of sequences that fail to be resolved by the enzyme is given, followed by the average intra-group fidelity for the groups that each enzyme can resolve
Enzyme   Total group combinations   Success   Average percent sequence fails   Average group fidelity

The expected T-RF sizes predicted by DRAT (see Appendix S2 for the predicted T-RFs) were compared with experimental results from DT-RFLP. A fragment of the SSU rDNA region was amplified from clones representing each of the sequence groups (A–D) using the primers NS31 (Simon, Lalonde, & Burns 1992), labelled with FAM, and AM1 (Helgason et al. 1998). Two microlitres of purified SSU rDNA clone fragment was amplified as described (Öpik et al. 2008) before digestion with CviAII and separation as described by Uibopuu et al. (2009).

To simulate the presence of PCR product from more than one group in a single sample, we also combined groups to produce artificially mixed samples before digestion. Examples of the experimental output analysed in GeneMarker (Softgenetics, State College, PA, USA) compared to the DRAT prediction are shown in Fig. 2.

Figure 2.

 Output from GeneMarker (Softgenetics) showing example traces for each of the individual groups (A–D) and for artificially mixed samples (E–F). The y-axis indicates fluorescence intensity; the x-axis indicates T-RF size (bp). The size of each primary T-RF is given. Sample E contained groups A + C, and sample F contained groups B + C. The secondary peak present in the output for group D is believed to be the result of incomplete digestion of the PCR product.

This example demonstrates the ability of DRAT to suggest suitable enzymes to resolve groups in a simple artificial case. Further exhaustive validation would be required prior to application in a real experimental situation. For instance, the difference between expected and observed T-RF sizes (between 0·2 and 4·4 bp, Table 3), whilst within the previously reported range (Kitts 2001; Kaplan & Kitts 2003; Bukovská et al. 2010), needs to be understood during the full validation stage of any DT-RFLP application. A more complete example including practical validation can be found in Donn et al. (2011), where DRAT was used to aid the design of an approach to separate nematode groups, followed by a complete validation of the suggested digest strategies to select an optimum solution both theoretically and practically.

Table 3.   Expected (from the .cuts output from DRAT, see Appendix S2) and observed T-RFs produced by digestion with CviAII for each of the individual groups (A–D). Also shown are the T-RFs expected and observed when PCR products from two groups are combined before digestion with CviAII, simulating a natural mixed population
Group            Expected (bp)   Observed (bp)
Combined A + C   106 + 168       103·1 + 166·8
Combined B + C   106 + 115       103·1 + 110·6
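The offsets between expected and observed sizes for the combined samples can be worked through with a few lines of Python. The values are the combined-sample rows of Table 3; the function name `size_offsets` is hypothetical, and rounding to 0·1 bp is an assumption made to match the precision of the observed sizes.

```python
# Combined-sample rows from Table 3: (sample, expected bp, observed bp).
TABLE3 = [
    ("A + C", (106, 168), (103.1, 166.8)),
    ("B + C", (106, 115), (103.1, 110.6)),
]


def size_offsets(expected, observed):
    """Expected minus observed T-RF size per peak, rounded to 0.1 bp."""
    return [round(e - o, 1) for e, o in zip(expected, observed)]


for sample, expected, observed in TABLE3:
    print(sample, size_offsets(expected, observed))
# A + C [2.9, 1.2]
# B + C [2.9, 4.4]
```

The largest offset in these rows, 4·4 bp for the 115 bp fragment, is the upper end of the range quoted in the text.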


Directed terminal restriction analysis tool can aid the rapid identification of suitable restriction enzyme(s) for the design of DT-RFLP protocols that, following validation, can be applied to detect specific targets within complex populations. The tool simplifies the design of new applications of DT-RFLP, including the analysis of groups consisting of phylogenetically diverse members.


This work was supported by the Scottish Government Rural and Environment Research and Analysis Directorate (RERAD, Workpackages 1.7 and 3.3) and by the Biotechnology and Biological Sciences Research Council (grant number BBS/S/K/2004/11271).