Detection of phytopathogens of the genus Dickeya using a PCR primer prediction pipeline for draft bacterial genome sequences

Authors


E-mail: ian.toth@hutton.ac.uk

Abstract

This study used a novel computational pipeline to exploit draft bacterial genome sequences in order to predict, automatically and rapidly, PCR primer sets for Dickeya spp. that were unbiased in terms of diagnostic gene choice. This pipeline was applied to 16 draft and four complete Dickeya genome sequences to generate >700 primer sets predicted to discriminate between Dickeya at the species level. Predicted diagnostic primer sets for both D. dianthicola (DIA-A and DIA-C) and ‘D. solani’ (SOL-C and SOL-D) were validated against a panel of 70 Dickeya reference strains, representative of the known diversity of this genus, to confirm primer specificity. The classification of the four previously sequenced strains was re-examined and evidence of possible misclassification of three of these strains is presented.

Introduction

Dickeya spp. are plant pathogenic bacteria that cause major economic losses to ornamental and crop plants worldwide (Laurila et al., 2010; Ngadze et al., 2010; Stead et al., 2010; Toth et al., 2011). Currently, six species of Dickeya are defined, resulting from the recent reclassification of Erwinia chrysanthemi: D. chrysanthemi, D. paradisiaca, D. dadantii, D. dianthicola, D. dieffenbachiae and D. zeae (Samson et al., 2005). Real-time quantitative PCR (qPCR) followed by sequencing of the recA and dnaX ‘housekeeping’ genes has routinely been used for identification of these species (Parkinson et al., 2009; Sławiak et al., 2009; Stead et al., 2010) and has also led to the identification of new groups within the Dickeya genus, some of which may constitute new species (Parkinson et al., 2009; Sławiak et al., 2009; Laurila et al., 2010). A potentially new Dickeya species with the proposed name ‘D. solani’ has emerged as a significant threat to potato production in Europe, and in some countries it has overtaken the more established bacteria Pectobacterium atrosepticum and D. dianthicola in terms of losses to disease (Toth et al., 2011). The ability to identify Dickeya species reliably, and so monitor their presence and distribution, will be critical to controlling these pathogens and their impact.

Molecular diagnostic techniques such as qPCR and microarray technologies have extended and improved diagnostic capabilities for the identification of microorganisms, including phytopathogens (Mumford et al., 2006; Kim et al., 2008; Pelludat et al., 2009; López et al., 2010). qPCR offers the possibility of rapid, quantifiable and sensitive diagnostics on infected plant tissues. This technique is dependent on the availability of primer sets that can be used to amplify sequences that distinguish between target organisms. The design of discriminatory primer sets has often focused on a gene or set of genes common to the target organism or organisms that the experimenter aims to detect. Such targets include intergenic transcribed spacer (ITS) regions (Grisham et al., 2007), 16S–23S rDNA (Schaad et al., 2002), plasmid DNA (Salm & Geider, 2004), ‘housekeeping’ genes such as tox–argK (Schaad et al., 2007), gyrB (Weller et al., 2007) and lrp (Cubero & Graham, 2005), and virulence genes such as pthA (Mavrodieva et al., 2004) and hrpF (Berg et al., 2006).

Diagnostic amplification of DNA from members of the Dickeya genus by conventional PCR has been based on the sequence of the pel gene (Nassar et al., 1996) and, for qPCR, on a deletion in the 16S–23S rDNA ITS region (J. G. Elphinstone & N. M. Parkinson, unpublished data). Other useful primers with specificity to all economically important pectolytic Dickeya and related Pectobacterium species are based on a conserved region of the 16S rRNA gene (Toth & Hyman, 1999). Primers with specificity to P. atrosepticum, based on a unique sequence identified by subtractive sequence hybridization (De Boer & Ward, 1995), enable differentiation of this bacterium from Dickeya spp., even where disease progression is symptomatically similar. However, there are as yet no diagnostic primers available to discriminate amongst Dickeya species. There is evidence that strains of ‘D. solani’ and other Dickeya spp. infect a range of host plants, making it difficult to determine disease epidemiology and distribution. There is also considerable variability in aggressiveness amongst Dickeya spp. (Toth et al., 2011). Thus, there is an urgent need for rapid, accurate detection and identification of these pathogens to identify outbreaks, to monitor distribution and spread, to facilitate greater understanding of their biology, and to enable the development and application of appropriate control measures. This requires the design of improved diagnostic primers that are able to discriminate between species of Dickeya.

A number of software packages and primer design strategies have previously been proposed to design PCR primers that identify particular bacteria or groups of bacteria (e.g. Listeria monocytogenes, Zhang et al., 2010; Salmonella enterica serovar Typhimurium, Kim et al., 2006; and Vibrio parahaemolyticus, Zhu et al., 2009). Typically, these primer design packages are applied to gene sequences, rather than whole genomes. However, the increased availability and falling cost of high-throughput bacterial genome sequencing has made the use of whole genome sequences more feasible. The objectives of this study were to (i) use a bioinformatic pipeline to identify diagnostic primer sets that distinguish amongst Dickeya strains and species, paying particular attention to discrimination between ‘D. solani’ and D. dianthicola isolates, because of their current economic importance on potato in Europe (Toth et al., 2011); and (ii) validate the predicted primer sets against a panel of Dickeya isolates from the UK’s National Collection of Plant Pathogenic Bacteria (NCPPB), chosen to be representative of the known diversity of this genus.

Materials and methods

Genome sequences

The sequenced genomes of D. dadantii Ech586 (NC_013592), D. dadantii Ech703 (NC_012880) and D. zeae 1591 (NC_012912) were downloaded from the GenBank repository at ftp://ftp.ncbi.nih.gov/genomes in January 2010. The sequence of D. dadantii Ech3937 was downloaded from ASAP at the University of Wisconsin in January 2010 (http://asap.ahabs.wisc.edu/software/asap/).

Genomic DNA (gDNA) was extracted from 16 other strains of Dickeya spp. (Table 1) using the QIAGEN Genomic DNA Purification Procedure following the manufacturer’s instructions. The gDNA samples from each of these strains were sequenced using Roche/454 technology at the University of Liverpool, UK to produce single-read data at approximately 14× coverage. The reads from each of the 16 strains were assembled using the newbler assembly package v. 2.3 (Roche Diagnostics Corporation; http://www.454.com/) with default settings to obtain draft genome assemblies. All sequenced strains, their identifiers for this study and GenBank accession numbers are indicated in Table 1.

Table 1. Sequenced Dickeya strains, identifiers, sources and species classifications
| Organism | Abbreviation | Source | Accession | Initial classification | Final classification (a) | Note |
| --- | --- | --- | --- | --- | --- | --- |
| Dickeya chrysanthemi | Dch3533 | This study | NCPPB 3533 | chrysanthemi | chrysanthemi | |
| Dickeya chrysanthemi bv. chrysanthemi | Dch402 | This study | NCPPB 402 | chrysanthemi | chrysanthemi | Type strain |
| Dickeya dadantii Ech586 | Dda586 | GenBank NC_013592 | UW586 | dadantii | zeae | |
| Dickeya dadantii Ech703 | Dda703 | GenBank NC_012880 | UW703 | dadantii | paradisiaca | |
| Dickeya dadantii Ech3937 | Dda3937 | ASAP ECH3937 (v6b) | 3937 | dadantii | dadantii | |
| DUC-2 | DdiMK7 | This study | MK7 | Unknown | Unknown1 | |
| Dickeya dianthicola | Ddi453 | This study | NCPPB 453 | dianthicola | dianthicola | Type strain |
| Dickeya dianthicola | DdiIPO980 | This study | IPO980 | dianthicola | dianthicola | |
| Dickeya dianthicola | Ddi3534 | This study | NCPPB 3534 | dianthicola | dianthicola | |
| ‘Dickeya solani’ DUC-1 | DsoIPO2222 | This study | IPO2222 | solani | solani | Proposed type strain |
| ‘Dickeya solani’ DUC-1 | DsoMK16 | This study | MK16 | solani | solani | |
| Dickeya dieffenbachiae | Ddf2976 | This study | NCPPB 2976 | dieffenbachiae | dieffenbachiae | Type strain |
| Dickeya paradisiaca | Dpa2511 | This study | NCPPB 2511 | paradisiaca | paradisiaca | Type strain |
| Dickeya zeae | Dze1591 | GenBank NC_012912 | UW1591 | zeae | chrysanthemi | |
| Dickeya zeae | DzeP7246 | This study | Fera RW192 | zeae | zeae | |
| Dickeya zeae | Dze3531 | This study | NCPPB 3531 | zeae | zeae | |
| Dickeya zeae | Dze2538 | This study | NCPPB 2538 | zeae | zeae | Type strain |
| Dickeya zeae | DzeMK19 | This study | MK19 | zeae | zeae | |
| Dickeya sp. | DunkP7247 | This study | Fera 7247 | Unknown | Unknown2 | |
| ‘Dickeya solani’ | DunkMK10 | This study | MK10 | solani | solani | |

(a) Revised classification scheme, following recA phylogenetic analysis.

Primer prediction pipeline

The primer design pipeline is described fully in Pritchard et al. (2012). A configuration file specifying the location of the input sequences of each isolate in FASTA format, and the species classification of that isolate, was provided to the primer design script (available for download at https://github.com/widdowquinn/find_differential_primers). A single pseudochromosome was compiled for the set of draft contigs belonging to each isolate, and coding sequence (CDS) prediction carried out on this using prodigal (Hyatt et al., 2010) where necessary. One thousand primer sets were predicted for each isolate pseudochromosome using ePrimer3 (Rice et al., 2000), retaining only primer sets that lay wholly within a predicted CDS, as these regions were expected to be conserved. Functional annotation of the CDS to which primers were designed was not carried out. primersearch (Rice et al., 2000) was used to identify cross-amplification of each set of primers against the pseudochromosome for each isolate. A filter for off-target amplification was implemented by blasting primer sets against sequenced Enterobacteriaceae (but not Dickeya) genomes. Primer sets were classified according to their ability to amplify all Dickeya input sequences or specific Dickeya species. The results were written to summary output files.
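The classification step at the end of the pipeline can be illustrated with a minimal sketch. It assumes that the cross-amplification search has already reduced each primer set to the set of isolates it is predicted to amplify. The function name `classify_primer_set` and the label strings are hypothetical and are not taken from the published script.

```python
from typing import Dict, Set


def classify_primer_set(amplified: Set[str],
                        species_of: Dict[str, str],
                        source: str) -> Set[str]:
    """Label one primer set by the isolates it is predicted to amplify.

    amplified  -- isolates with predicted amplification (hypothetical input)
    species_of -- isolate identifier -> species classification
    source     -- isolate whose pseudochromosome the primers were designed on
    """
    labels = set()
    # 'Universal': every input isolate is predicted to amplify.
    if amplified == set(species_of):
        labels.add("universal")
    # Strain-specific: only the source isolate amplifies.
    if amplified == {source}:
        labels.add("strain-specific")
    # Species-specific: exactly the isolates sharing the source's species.
    same_species = {iso for iso, sp in species_of.items()
                    if sp == species_of[source]}
    if amplified == same_species:
        labels.add("species-specific")
    return labels
```

Under this sketch, a species represented by a single isolate receives both the strain-specific and species-specific labels for the same primer set, so the two counts coincide for such isolates.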

Thermodynamic and sequence constraints were applied during the prediction process to design sets of amplicons of length 50–150 bp. Primer sets contained flanking primer sequences only, or flanking primer sequences and an internal oligomer for postamplification hybridization. The specifications for the primer flanking regions and internal probes are indicated in Table 2.

Table 2. Design parameters for flanking regions and internal probes of each Dickeya primer set generated using ePrimer3
| Property | Amplification primers | Hybridization probe |
| --- | --- | --- |
| Length | 20 bp (optimal) | 13–30 bp (range) |
| Tm | 58–60°C (range), 59°C (optimal) | 68–70°C (range) |
| GC, % | 30–80 | 30–80 |
| Notes | No more than two G+C in last five nt at 3′ end; avoid runs of identical nucleotides; fewer than four consecutive G bases; should not amplify any other enterobacteria | No G at 5′ end; avoid runs of identical nucleotides; fewer than four consecutive G bases; avoid six or more consecutive A bases; avoid G at 3′ end; avoid two or more CC dinucleotides in middle of probe; avoid G at position 2 at 5′ end |
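The sequence-based notes for amplification primers in Table 2 can be expressed as a simple filter. The sketch below is illustrative only: in the study these constraints were applied through ePrimer3, not through code of this form. The Tm criterion is omitted because it depends on a thermodynamic model, and the threshold `max_run` for "runs of identical nucleotides" is an assumed value that the table does not specify.

```python
import re


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_primer_notes(seq: str, max_run: int = 4) -> bool:
    """Check an amplification primer against the sequence notes of Table 2.

    max_run is a hypothetical threshold: runs of more than max_run
    identical nucleotides are rejected.
    """
    seq = seq.upper()
    # GC content must lie in the 30-80% range.
    if not 30.0 <= gc_percent(seq) <= 80.0:
        return False
    # No more than two G+C in the last five nucleotides at the 3' end.
    if sum(base in "GC" for base in seq[-5:]) > 2:
        return False
    # Fewer than four consecutive G bases.
    if "GGGG" in seq:
        return False
    # Avoid long runs of any identical nucleotide.
    if re.search(r"(.)\1{" + str(max_run) + r",}", seq):
        return False
    return True
```

A filter of this kind only screens candidate oligonucleotides; predicted specificity still depends on the cross-amplification search against the other genomes.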

Normalized count of cross-amplifying primer sets

The count of primer sets capable of cross-amplification was obtained for each combination of source and target organism, and used to generate a normalized count as an estimate of sequence distance for the comparison: $d_{q,t} = 1 - (c_q + c_t)/(n_q + n_t)$, where $c_q$ and $c_t$ are the counts of cross-amplifying primers from the query (q) and target (t) genomes respectively, and $n_q$ and $n_t$ are the corresponding total counts of primers for each sequence. The values of $d_{q,t}$ form a difference matrix that can be used to construct a cladogram of relationships between the sequenced genomes. The neighbor package from the phylip software suite (Felsenstein, 1989) was used to produce a rooted cladogram by UPGMA on the basis of this matrix for the 20 Dickeya genomes (Fig. 1).
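As a minimal sketch, the normalized count defined above can be computed as follows. The data layout is an assumption made for illustration: `cross[q][t]` is taken to hold the number of primer sets designed on genome q that are predicted to amplify genome t, and `totals[q]` the number of primer sets designed on q. Neither name comes from the pipeline itself.

```python
from typing import Dict, Tuple


def normalized_distance(c_q: int, c_t: int, n_q: int, n_t: int) -> float:
    """d(q,t) = 1 - (c_q + c_t) / (n_q + n_t)."""
    return 1.0 - (c_q + c_t) / (n_q + n_t)


def distance_matrix(cross: Dict[str, Dict[str, int]],
                    totals: Dict[str, int]) -> Dict[Tuple[str, str], float]:
    """Build the symmetric difference matrix for all genome pairs."""
    matrix = {}
    for q in totals:
        for t in totals:
            if q == t:
                matrix[(q, t)] = 0.0  # a genome is at zero distance from itself
            else:
                matrix[(q, t)] = normalized_distance(
                    cross[q][t], cross[t][q], totals[q], totals[t])
    return matrix
```

Because the numerator sums the counts in both directions, the resulting matrix is symmetric, which is what a UPGMA clustering such as that performed by neighbor requires.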

Figure 1.

 Cladogram of 20 Dickeya isolates constructed by UPGMA from a difference matrix of normalized cross-amplifying primer set count. Clades are coloured and named according to the presence of a type strain. All clades that contain a type strain are distinct at a branch length of 0·14. For organism nomenclature see Table 1.

Phylogenetic reconstruction

A putative recA gene from each of the draft genome sequences was identified as the best match to the RecA protein sequence from the related organism P. atrosepticum (GenBank accession: NC_004547), using tblastn (blast 2.2.22+, Camacho et al., 2009). The resulting recA sequences, and the P. atrosepticum nucleotide sequence, were aligned using t_coffee with default settings (Notredame et al., 2000).
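Selection of the best tblastn match can be sketched as below, assuming the search was run with the standard 12-column tabular output of BLAST+ (`-outfmt 6`). The output format actually used in the study is not stated, and the function name `best_hit` is hypothetical.

```python
import csv
import io
from typing import Optional

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit(tabular_text: str) -> Optional[dict]:
    """Return the hit with the highest bit score, or None if there are no hits."""
    best = None
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"], hit["send"] = int(hit["sstart"]), int(hit["send"])
        # In tblastn output, sstart > send indicates a reverse-strand match.
        hit["strand"] = "+" if hit["sstart"] <= hit["send"] else "-"
        if best is None or hit["bitscore"] > best["bitscore"]:
            best = hit
    return best
```

The subject coordinates and strand of the best hit identify the region of the draft assembly to extract as the putative recA sequence.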

Phylogenetic reconstructions were carried out on this alignment using the topali package (Milne et al., 2009) with bootstrapped neighbour-joining and maximum likelihood methods. Neighbour-joining was performed with an F84+G model (ts/tv = 4·58, α = 0·15), and maximum likelihood reconstruction used RAxML (Stamatakis, 2006) with the GTR mixed model, free parameters being estimated by RAxML. One hundred bootstrap trees were generated for both methods.

Primer set evaluation

Specificity of real-time qPCR assays was determined using 70 Dickeya isolates (Table S1), selected as phylogenetically representative on the basis of recA sequence diversity (Parkinson et al., 2009). Reference strains of related genera were also included. Bacteria were cryoprotected at −80°C (Protect Bacterial Preservers®, Technical Service Consultants Ltd) and cultured in nutrient broth.

DNA extraction and purification

Cultured cells were resuspended in 1 mL water to an optical density of 0·1 (λ = 650 nm), pelleted by centrifugation at 9000 g for 5 min, resuspended in 300 μL 6% Chelex 100, heated at 56°C for 20 min, boiled at 100°C for 8 min and chilled on ice. Purified DNA in aqueous supernatant solution was removed after centrifugation at 20 000 g for 5 min.

Primer selection

A preliminary evaluation of 15 randomly selected primer sets from a total of 276 predicted specific assays was initially used to identify candidates for real-time PCR. Primers that demonstrated the required specificity and which generated a single amplicon in conventional PCR from template DNA purified from a small panel of 13 representative Dickeya strains were carried forward from this set (Table S2).

Real-time qPCR evaluation

Two primer/probe sets (DIA-A and DIA-C) with apparent D. dianthicola specificity were selected for further evaluation in real-time qPCR. A further two primer/probe sets (SOL-C and SOL-D) were similarly selected with apparent ‘D. solani’ specificity. A further single primer/probe set (DIC-D) was selected with apparent universal specificity for detection of all Dickeya species. These assays were further compared with the following three previously identified qPCR primer/probe sets: (i) ECH with universal specificity to the Dickeya genus, based on a deletion in the 16S–23S rDNA intergenic transcribed spacer (ITS) region (J. G. Elphinstone & N. M. Parkinson, unpublished); (ii) PEC with specificity to all economically important pectolytic Dickeya and Pectobacterium species, based on a conserved region of the 16S rRNA gene (Toth & Hyman, 1999); and (iii) ECA with species specificity to P. atrosepticum, based on sequence identified by subtractive hybridization (De Boer & Ward, 1995). Further comparisons were made in conventional PCR with the commonly used ADE primer set based on the Dickeya genus-specific pel gene sequence (Nassar et al., 1996).

Real-time qPCR conditions

Real-time qPCR assays were performed using TaqMan® (Applied Biosystems) technology on an ABI7500 real-time PCR system. Standard TaqMan® cycling conditions were: 30 s at 48°C, 10 min at 95°C, followed by 40 cycles of 15 s at 95°C and 1 min at 55°C. Reaction volumes of 25 μL 1× TaqMan® buffer contained, in final concentrations, 300 nM primers, 100 nM probe, 0·1% MgCl2, 200 μM dNTPs, 0·63 U TaqGold (Roche) and 1 μL (4%) prepared DNA template. Primer sequences are shown in Table 3. Probes were covalently labelled at the 5′-terminal nucleotide with the FAM (6-carboxyfluorescein) reporter dye and at the 3′-terminal nucleotide with the TAMRA (tetra-methylcarboxyrhodamine) quencher dye.

Table 3. Dickeya primers and probes used in validation studies
| Assay | Expected specificity | Forward primer + reverse primer + probe (where used) |
| --- | --- | --- |
| DIC-A | Dickeya | GAATAGGTGCCTTTGCCATTGGCATTCTGGGTGGTAAGTT |
| DIC-B | Dickeya | TTCGTAGCATGACCGCTTACGGTTTCCAGATAACGCTGGT |
| DIC-C | Dickeya | CTGTGCCAGATTACGCACTTCAATATCCTGGCACTGAACG |
| DIC-D | Dickeya | GAATAGGTGCCTTTGCCATTCGGTCAGCGTACCAAGACTAGTTTGCGCGGACCCTTACGG |
| DIC-E | Dickeya | TTGCCAACCGTTTGTACACTTATCACCACCGCGTACATTT |
| SOL-A | ‘Dickeya solani’ | TCGGGGATGAACTCTTACTGCAACCAATGTTTGAGGATCG |
| SOL-B | ‘Dickeya solani’ | GCTTAAGGCAATTCCACACACTGCTGATAGGTTGCAGGAA |
| SOL-C | ‘Dickeya solani’ | GCCTACACCATCAGGGCTATACACTACAGCGCGCATAAACCCAGGCCGTGCTCGAAATCC |
| SOL-D | ‘Dickeya solani’ | GCCTACACCATCAGGGCTATCACTACAGCGCGCATAACTCCAGGCCGTGCTCGAAATCC |
| SOL-E | ‘Dickeya solani’ | GCCTACACCATCAGGGCTATACTACAGCGCGCATAAACTG |
| DIA-A | Dickeya dianthicola | GGCCGCCTGAATACTACATTTGGTATCTCTACGCCCATCAATTAACGGCGTCAACCCGGC |
| DIA-B | Dickeya dianthicola | CCTACCGAATCCAGACGAATTGGACAAGATTGCTGGGATA |
| DIA-C | Dickeya dianthicola | CCAACGATTAGTCGGATCTTAGTTGGTGCCAGGTTGGTATCGACGTATGGGACGGTCGC |
| DIA-D | Dickeya dianthicola | TCCAGTTTGGCACAATGAATATTTCCGTTGGCAACAATA |
| DIA-E | Dickeya dianthicola | CGTCAGCAGTAGCAGGACATCAAGCCACTTCGTCATCAGT |
| PEC | Dickeya and Pectobacterium | GTGCAAGCGTTAATCGGAATGCTCTACAAGACTCTAGCCTGTCAGTTTTCTGGGCGTAAAGCGCACGCA |
| ECH | Dickeya | GAGTCAAAAGCGTCTTGCGAACCCTGTTACCGCCGTGAACTGACAAGTGATGTCCCCTTCGTCTAGAGG |
| ECA | Pectobacterium atrosepticum | CGGCATCATAAAAACACGCCCCTGTGTAATATCCGAAAGGTGGACATTCAGGCTGATATTCCCCCTGCC |
| ADE | Dickeya | GATCAGAAAGCCCGCAGCCAGATCTGTGGCCGATCAGGATGGTTTTGTCGTGC |

Primer and probe sequences predicted by the pipeline (DIC-(A-E), SOL-(A-E), DIA-(A-E)), and in current use for Dickeya diagnostics (PEC, ECH, ECA, ADE). Predicted primers in the table were selected randomly from the complete set of predicted primers with the named specificity.

Results

Reclassification of isolates

A bioinformatic pipeline was constructed that implements the primer design strategy described in Materials and methods. Using this pipeline, whole and draft Dickeya genome sequences were systematically analysed to generate, for each genome, a large data set of thermodynamically plausible primer sets with defined criteria. Four complete and 16 draft genome assemblies of Dickeya spp. were used as input to the pipeline, with species classification as indicated in the Initial classification column of Table 1. This resulted in prediction of very few species-specific primer sets and extensive, systematic predicted cross-amplification between genomes that were attributed to different species. The extent of predicted cross-amplification was unexpected and, in order to investigate whether this was the result of an incorrect initial classification of the bacteria, the genomes were clustered on the basis of a normalized count of cross-amplifying primer sets, as described in Materials and methods, as a proxy for overall genomic similarity.

The resulting cladogram (Fig. 1) showed that clades containing type strains are distinct from each other at a cumulative branch length of approximately 0·14 (it should be noted that the sequence for D. dadantii type strain NCPPB 898 was not available at the time of the study). Each of the draft sequences with initial classifications determined by recA analysis (Parkinson et al., 2009) clustered with the expected type strain or otherwise formed a coherent grouping. The unclassified strains DunkP7247 and ‘D. dianthicola’ DdiMK7 (unidentified clade in recA phylogeny) failed to cluster with any of the type strains, as expected. However, three of the four completely sequenced genomes did not cluster as expected. The completely sequenced D. dadantii isolates Dda3937, Dda703 and Dda586 clustered with the D. dieffenbachiae, D. paradisiaca and D. zeae type strains, respectively, while the completely sequenced D. zeae Dze1591 clustered with the D. chrysanthemi type strain.

To investigate further the species classifications for each genome, a phylogeny of the 20 Dickeya isolates was estimated using maximum likelihood on the basis of their recA sequence (Fig. 2). A putative recA gene from each draft sequence was identified as the best tblastn match to the RecA protein sequence from the related organism P. atrosepticum (GenBank accession NC_004547). For the four completely sequenced Dickeya genomes the recA gene from the published annotation was used. The recA sequences of Dda703 and the D. paradisiaca type strain were found to be identical, as were those of Dze1591 and the D. chrysanthemi strain NCPPB 3533 (Dch3533), and Dda3937 and the D. dieffenbachiae type strain. Two other sets of recA sequences were also found to be identical: the three putative ‘D. solani’ strains (DsoMK16, DsoIPO2222, DunkMK10) and three D. dianthicola strains (DdiIPO980, Ddi453, Ddi3534). The recA sequence of Dda586 clustered with the D. zeae strains.

Figure 2.

 Maximum likelihood phylogenetic reconstruction on the recA gene from 20 Dickeya isolates, with Pectobacterium atrosepticum as an out-group. Clades are coloured and named according to the presence of a type strain, as in Figure 1. The phylogram recapitulates the topology of the cladogram constructed from cross-amplifying primer count.

The topologies of phylogenetic reconstructions produced by neighbour-joining and maximum likelihood approaches recapitulated each other and the cladogram produced from cross-hybridization data drawn from the whole genome of each isolate (Figs 1 and 2, Figs S1 and S2), with the exception of placing the divergence of DunkP7247 (unassigned species) either immediately before or after the D. paradisiaca clade, and the divergence of the ‘D. solani’ clade. The position of the ‘D. solani’ clade divergence also affected the location of the branch to the unassigned strain DdiMK7 (Fig. 2). All reconstructions, bootstrap scores, and relative branch lengths support reclassification of three of the four previously sequenced Dickeya genomes, as shown in the Final classification column of Table 1. The single exception is D. dadantii 3937 (Dda3937), whose position in this topology is consistent with a previously published recA topology (Parkinson et al., 2009).

Primer set prediction

The four complete and 16 draft genome assemblies of Dickeya spp. were then used as input to the primer prediction pipeline, with species classifications as indicated in the Final classification column of Table 1. Four runs of primer design were performed to generate primers with and without a hybridization oligo, and either screening or not against a blast database comprising all chromosome and plasmid sequences from all Enterobacteriaceae in the January 2010 GenBank database at ftp://ftp.ncbi.nih.gov/genomes/bacteria. All other pipeline settings were left at defaults, except for the number of primer sets to return, which was set to 1000. Table 4 indicates the counts of designed primers that were able to discriminate at the strain, species and genus levels under each of the four input settings.

Table 4. Counts of predicted diagnostic primer sets for Dickeya spp.
| Identifier | Classification | Amplification only: strain-specific | Amplification only: species-specific | Amplification only: ‘universal’ | With hybridization oligo: strain-specific | With hybridization oligo: species-specific | With hybridization oligo: ‘universal’ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dch3533 | chrysanthemi | 0 (0) | 136 (76) | 9 (0) | 0 (0) | 139 (10) | 9 (0) |
| Dch402 | chrysanthemi | 65 (52) | 120 (69) | 12 (0) | 57 (10) | 118 (14) | 10 (0) |
| Dze1591 | chrysanthemi | 0 (0) | 116 (69) | 8 (0) | 0 (0) | 117 (9) | 8 (0) |
| DzeP7246 | zeae | 24 (16) | 128 (74) | 5 (0) | 21 (1) | 129 (13) | 5 (0) |
| Dze3531 | zeae | 17 (10) | 111 (63) | 5 (0) | 16 (0) | 117 (10) | 6 (0) |
| Dze2538 | zeae | 13 (8) | 114 (61) | 9 (0) | 12 (0) | 113 (8) | 9 (0) |
| DzeMK19 | zeae | 21 (17) | 113 (63) | 5 (0) | 22 (3) | 112 (9) | 6 (0) |
| Dda586 | zeae | 36 (22) | 127 (66) | 5 (0) | 30 (4) | 131 (7) | 5 (0) |
| Dda703 | paradisiaca | 0 (0) | 342 (183) | 10 (0) | 0 (0) | 347 (39) | 11 (0) |
| Dpa2511 | paradisiaca | 4 (3) | 342 (186) | 10 (0) | 4 (0) | 346 (34) | 12 (0) |
| Dda3937 | dadantii | 41 (24) | 39 (26) | 12 (0) | 39 (4) | 38 (7) | 11 (0) |
| Ddf2976 | dadantii | 51 (26) | 26 (18) | 16 (0) | 46 (5) | 26 (5) | 15 (0) |
| Ddi453 | dianthicola | 10 (9) | 75 (59) | 7 (0) | 10 (2) | 75 (9) | 8 (0) |
| DdiIPO980 | dianthicola | 24 (16) | 87 (74) | 6 (0) | 23 (1) | 82 (9) | 6 (0) |
| Ddi3534 | dianthicola | 23 (10) | 79 (67) | 8 (0) | 22 (4) | 76 (12) | 9 (0) |
| DsoIPO2222 | solani | 0 (0) | 57 (36) | 10 (0) | 0 (0) | 56 (9) | 8 (0) |
| DsoMK16 | solani | 0 (0) | 55 (41) | 13 (0) | 0 (0) | 52 (11) | 11 (0) |
| DunkMK10 | solani | 0 (0) | 57 (40) | 15 (0) | 0 (0) | 54 (6) | 13 (0) |
| DdiMK7 | Unknown1 | 117 (83) | 117 (83) | 12 (1) | 112 (19) | 112 (19) | 12 (0) |
| DunkP7247 | Unknown2 | 312 (162) | 312 (162) | 12 (1) | 310 (36) | 310 (36) | 12 (0) |

The number of predicted diagnostic primer sets specific to strain, to species, or universal to the Dickeya genus, when the pipeline is run to produce primers with and without a postamplification hybridization oligo. Numbers in parentheses are the counts of predicted primer sets when blastn-screened against sequenced members of the Enterobacteriaceae outside the Dickeya genus to eliminate potentially cross-amplifying primers.

Table 4 also indicates that the sampling process produced approximately the same number of predicted diagnostic primer sets (and therefore approximately the same number of cross-hybridizing primer sets) for each strain, whether or not a hybridization oligo was included, although the primer sequence sets differed in each case. This supports the earlier use of a normalized cross-hybridization count as a proxy for overall genomic sequence similarity.

The proportion of thermodynamically plausible primer sets at a given level of specificity varied for each strain, e.g. from approximately 2·5% in Ddf2976 to 30% in DunkP7247, for species-specific primers. Species-specific primers predicted to amplify all strains in a species classification, but not any strains from other species classifications, were found in all analyses. The number of predicted species-specific primer sets for a strain varied widely from five to approximately 350, depending on pipeline settings and strain.
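The proportions quoted above follow directly from the counts in Table 4. The short sketch below assumes the denominator is the 1000 primer sets requested per genome; whether every run returned exactly 1000 sets is not stated, so the default value of `designed` is an assumption.

```python
def percent_of_designed(count: int, designed: int = 1000) -> float:
    """Percentage of designed primer sets that fall in a specificity class."""
    return 100.0 * count / designed


# Species-specific counts (amplification only, unscreened) from Table 4.
ddf2976 = percent_of_designed(26)     # Ddf2976
dunkp7247 = percent_of_designed(312)  # DunkP7247
```

On this assumption the two values are about 2·6% and 31%, in line with the approximate range given in the text.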

Primer set validation

The predicted specificity of species-specific primer sets was tested using qPCR assays against 70 Dickeya isolates that were representative of genus diversity on the basis of recA sequencing (Table 5 and Table S1; Parkinson et al., 2009).

Table 5. Specificity of real-time qPCR assays predicted by the pipeline compared with existing qPCR and conventional PCR assays
| Test species | Isolates tested | DIC | DIA-A | DIA-C | SOL-C | SOL-D | PEC | ECH | ECA | ADE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| D. dianthicola | 7 | 7 | 7 | 7 | 0 | 0 | 7 | 7 | 0 | 7 |
| D. solani (DUC-1) | 16 | 16 | 0 | 0 | 16 | 16 | 16 | 16 | 0 | 16 |
| DUC-2 | 5 | 5 | 0 | 0 | 0 | 0 | 5 | 5 | 0 | 5 |
| DUC-3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| D. dadantii | 11 | 11 | 0 | 0 | 1 | 1 | 11 | 11 | 0 | 11 |
| D. dieffenbachiae | 6 | 6 | 0 | 0 | 0 | 0 | 6 | 6 | 0 | 6 |
| D. chrysanthemi bv. chrysanthemi | 7 | 7 | 0 | 0 | 0 | 0 | 7 | 7 | 0 | 7 |
| D. chrysanthemi bv. parthenii | 3 | 3 | 0 | 0 | 0 | 0 | 3 | 3 | 0 | 3 |
| D. paradisiaca | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| D. zeae | 11 | 11 | 0 | 0 | 0 | 0 | 11 | 11 | 0 | 11 |
| New Dickeya species level clade (I) | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| New Dickeya species level clade (II) | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| Pectobacterium atrosepticum | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| P. carotovorum subsp. carotovorum | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| P. betavasculorum | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| P. carotovorum subsp. odoriferum | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| P. wasabiae | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Pantoea agglomerans | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Brenneria quercina | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Erwinia amylovora | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Values are the number of isolates detected by each assay. Validation of five predicted diagnostic primer sets against 70 bacterial isolates from the NCPPB using conventional and real-time qPCR. Non-zero counts indicate positive results (critical threshold Ct <23). Predicted primer sets DIA-A and DIA-C are confirmed to be specific to isolates of D. dianthicola. Predicted primer sets SOL-C and SOL-D are confirmed to be specific to isolates of ‘D. solani’, with the exception of one of 11 isolates (NCPPB 3065) predicted to be D. dadantii by recA sequencing. The predicted ‘universal’ primer set DIC is confirmed to be universal, but not specific, to all Dickeya isolates tested. The specificities of existing primer sets PEC, ECH, ECA and ADE are also confirmed.

To investigate whether primer sets could discriminate between D. dianthicola and ‘D. solani’, 15 primer sets were selected randomly for testing on the basis of whether they were predicted to be specific for D. dianthicola (five primer sets, DIA-A to DIA-E), for ‘D. solani’ (five primer sets, SOL-A to SOL-E) or common (or ‘universal’) to all Dickeya spp. (five primer sets, DIC-A to DIC-E). These 15 primer sets were each tested in conventional PCR for generation of a single amplicon from template DNA purified from a small panel of 13 representative Dickeya strains (Table 3 and Table S2). All but one of the primer sets amplified DNA from the target bacteria in conventional PCR with annealing temperatures of 55°C, the exception being one primer set (DIA-D) with predicted specificity to D. dianthicola. The predicted absolute specificity of the primer sets was confirmed for two out of five primer sets for D. dianthicola (DIA-A and DIA-C) and for two out of five primer sets for ‘D. solani’ (SOL-C and SOL-D). All five primer sets predicted to be common to Dickeya spp. amplified all 13 Dickeya reference strains. Five primer sets in total (the two primer sets each with confirmed specificity to D. dianthicola and ‘D. solani’, and the ‘universal’ primer set DIC-D) were selected for more extensive evaluation in qPCR assays.

Specificity of primer/probe sets in qPCR

Five qPCR assays using the DIC-D, DIA-A, DIA-C, SOL-C and SOL-D primer sets were tested against 70 Dickeya strains and were found to be highly sensitive and efficient, detecting as few as 7·9 (±6·8) viable target cells in aqueous suspensions (equivalent to 0·054 (±0·006) pg purified target DNA per reaction). These assays were compared with the three qPCR primer/probe sets in common use: ECH, which is universal to the Dickeya genus, based on a deletion in the 16S–23S rDNA intergenic transcribed spacer (ITS) region (J. G. Elphinstone & N. M. Parkinson, unpublished data); PEC, which is specific to all known economically important pectolytic Dickeya and Pectobacterium species, based on a conserved region of the 16S rRNA gene (Toth & Hyman, 1999); and ECA, which is specific to P. atrosepticum, based on sequence identified by subtractive hybridization (De Boer & Ward, 1995). Further comparisons were made in conventional PCR with the commonly used ADE primer set based on the Dickeya genus-specific pel gene sequence (Nassar et al., 1996). The amplification results are given in Table 5. Mean critical threshold (Ct) values for positive qPCR tests were all <23 (Table S3).

Table 5 indicates that, as predicted, assays DIA-A and DIA-C amplified DNA from all tested isolates of D. dianthicola but not from any other of the Dickeya taxa or related genera. Similarly, assays SOL-C and SOL-D showed specificity to all tested isolates of ‘D. solani’, with the single exception of one (unsequenced) false positive isolate (NCPPB 3065), which was identified as D. dadantii according to a recA sequence phylogeny (Parkinson et al., 2009). Using the DIC-D assay, DNA was amplified from all Dickeya isolates included in the study, as predicted. However, this primer set was neither predicted nor found to be specific to Dickeya, and it also amplified DNA from other enterobacterial strains, including those from the genera Brenneria, Erwinia, Pantoea and Pectobacterium. The specificity of the ECH assay to Dickeya was confirmed, as was the specificity of the ADE primers to all species of the Dickeya genus, with the single exception of the reference strain of D. paradisiaca (NCPPB 2511). The specificities of the ECA primers to P. atrosepticum, and of the PEC assay to economically important pectolytic soft rot bacteria (Dickeya and Pectobacterium spp.) were also confirmed.
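The counts in Table 5 can also be summarized as diagnostic sensitivity and specificity. These summary statistics are not reported in the study; the sketch below derives them from the table on the assumption that all 78 isolates listed (70 Dickeya and eight from related genera) form the test panel for each assay.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of target isolates detected
    specificity = tn / (tn + fp)  # fraction of non-target isolates not detected
    return sensitivity, specificity


# SOL-C: 16 of 16 'D. solani' isolates detected; one positive among the
# 62 non-target isolates (NCPPB 3065), taken from Table 5.
sol_c = sensitivity_specificity(tp=16, fn=0, fp=1, tn=61)

# DIA-A: 7 of 7 D. dianthicola isolates detected; no positives among
# the 71 non-target isolates.
dia_a = sensitivity_specificity(tp=7, fn=0, fp=0, tn=71)
```

On these counts both assays detect every target isolate, and the single D. dadantii false positive reduces the specificity of SOL-C to 61/62.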

Discussion

The primer prediction pipeline developed for this study can take as input incomplete and unordered draft bacterial genome assemblies, making it particularly useful for exploiting the output of next-generation sequencing of bacteria for diagnostic testing. Manual primer design is a time-consuming and laborious job, and the availability of a computational pipeline that predicts diagnostic primers and checks for potential cross-hybridization will be a useful aid to molecular epidemiological studies. The accuracy of the pipeline predictions was confirmed by generating diagnostic primer sequences that discriminate between species of the bacterial genus Dickeya, and verifying these primer sequences for PCR-based diagnostic identification of the economically important potato pathogens D. dianthicola and ‘D. solani’.

To identify primer sets that are representative of a subgroup of genomes, and not biased by a single genome, a representative set of sequences should be available for that group, comprising more than one example where possible. This was achieved in the current study for both D. dianthicola and ‘D. solani’. Generation of many draft genomes by next-generation sequencing remained sufficiently costly at the time of study to place an economic limit on the number of examples that could be produced. Some over-prediction of diagnostic primers and their performance was therefore to be expected.

Classification of Dickeya spp. is typically carried out using a polyphasic approach, which includes fatty acid profiling, repetitive sequence PCR and phylogenetic identification based on several partial gene sequences, but primarily on the basis of the recA and dnaX gene sequences (Sławiak et al., 2009; Stead et al., 2010). In this study, sequences were initially clustered by the count of predicted cross-amplifying primer sets for each strain as a proxy for whole-genome similarity, followed by phylogenetic reconstruction based on the recA gene sequences of each of the 20 Dickeya strains used here. The results suggested that the sequenced strains previously identified as D. dadantii Dda703 (GenBank: NC_012880) and Dda586 (GenBank: NC_013592) may in fact be strains of D. paradisiaca and D. zeae, respectively (Table 1, Fig. 2). Furthermore, the sequenced strain previously identified as D. zeae Dze1591 (GenBank: NC_012912) may be a strain of D. chrysanthemi. More broadly, it may be appropriate, in the context of the rapidly increasing numbers of bacterial genome sequences being generated, to assign annotation quality or evidence codes to taxonomic classification in sequence databases in a similar fashion to those used and proposed for protein functional annotation (Schnoes et al., 2009). In addition, since the present study was performed, D. dieffenbachiae has been reclassified as a subspecies of D. dadantii (Brady et al., 2012), supporting the decision here to place isolates Ddf2976 and Dda3937 into the same classification (Table 4).
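The initial clustering step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the pipeline output has been reduced to a mapping from each primer set to the strains it is predicted to amplify. The strain names, primer-set identifiers, function names, threshold and the use of single-linkage grouping are all illustrative assumptions, not the procedure or data used in this study.

```python
from collections import Counter
from itertools import combinations


def shared_primer_counts(amplifies):
    """Count, for each pair of strains, the primer sets predicted to amplify both."""
    counts = Counter()
    for strains in amplifies.values():
        for a, b in combinations(sorted(strains), 2):
            counts[(a, b)] += 1
    return counts


def single_linkage_groups(strains, counts, threshold):
    """Group strains linked by at least `threshold` shared primer sets."""
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), n in counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(g) for g in groups.values())


# Toy input (hypothetical): primer set id -> strains it is predicted to amplify.
toy = {
    "p1": {"strainA", "strainB"},
    "p2": {"strainA", "strainB"},
    "p3": {"strainA", "strainB", "strainC"},
    "p4": {"strainC", "strainD"},
    "p5": {"strainC", "strainD"},
    "p6": {"strainD"},
}
counts = shared_primer_counts(toy)
groups = single_linkage_groups(
    ["strainA", "strainB", "strainC", "strainD"], counts, threshold=2
)
print(groups)
```

In this toy input, strains that share many cross-amplifying primer sets fall into the same group, which is the sense in which the count serves as a proxy for whole-genome similarity.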

The proportion of predicted diagnostic primer sets varied, as would be expected, with the number of organisms sharing the same classification in the input sequence set, and the degree of divergence of those organisms, as indicated in Figure 2. For example, DunkP7247 and DdiMK7 are the only representatives of their species classification (unknown1 and unknown2), so in each case their strain- and species-specific primer sets are identical and large. In contrast, D. paradisiaca isolates Dda703 and Dpa2511 have the greatest number of predicted species-specific primer sets and, in the recA analysis, the longest branch length since the common ancestor with the remaining Dickeya spp. (other than DunkP7247; Fig. 2). Their recA genes are identical, however, and the number of strain-specific primer sets is correspondingly very small.
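The relationship between group size and the number of strain- or species-specific primer sets can be seen in a small sketch. It assumes, as a working definition that may differ from the pipeline's own, that a primer set is strain-specific if it amplifies exactly one input strain, and species-specific if it amplifies every member of one species and nothing else. All names and data below are hypothetical.

```python
def classify_primer_sets(amplifies, species_of):
    """Assign primer sets to strain-specific, species-specific and universal classes."""
    all_strains = set(species_of)
    members = {}
    for strain, species in species_of.items():
        members.setdefault(species, set()).add(strain)

    strain_specific = {s: set() for s in all_strains}
    species_specific = {sp: set() for sp in members}
    universal = set()
    for pid, hits in amplifies.items():
        # Strain-specific: exactly one input strain is amplified.
        if len(hits) == 1:
            strain_specific[next(iter(hits))].add(pid)
        # Species-specific: all members of one species, and no other strain.
        for sp, strains in members.items():
            if hits == strains:
                species_specific[sp].add(pid)
        # Universal: every input strain is amplified.
        if hits == all_strains:
            universal.add(pid)
    return strain_specific, species_specific, universal


# Toy input (hypothetical): one lone strain and two near-identical strains.
species_of = {"lone1": "unknown1", "twinA": "spX", "twinB": "spX"}
amplifies = {
    "p1": {"lone1"},
    "p2": {"lone1"},
    "p3": {"twinA", "twinB"},
    "p4": {"twinA", "twinB"},
    "p5": {"twinA", "twinB"},
    "p6": {"lone1", "twinA", "twinB"},
}
strain_sp, species_sp, universal = classify_primer_sets(amplifies, species_of)
print(strain_sp["lone1"] == species_sp["unknown1"])  # lone representative
print(len(strain_sp["twinA"]), len(species_sp["spX"]))
```

For the lone representative, the strain-specific and species-specific sets coincide by construction. For the two near-identical strains, every distinguishing primer set amplifies both, so the species-specific class is populated while the strain-specific class is empty.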

In all four output data sets, the number of predicted species-specific primer sets was greater than the number of strain-specific or genus-specific primer sets. This indicates a high degree of diagnostically useful sequence similarity within the species groupings used in this study, and supports the proposed reclassification of three of the completely sequenced genomes.

Discarding those predicted primers and oligos that made BLAST matches against sequenced members of the Enterobacteriaceae reduced the number of predicted diagnostic primer sets significantly in all cases, and only one set of ‘universal’ primers was retained after BLAST screening. However, this is to be expected: enterobacterial sequences were not used to constrain the initial predictions, so primer sets that are common to all Dickeya are also likely to be common to related genera. In the absence of BLAST screening against Enterobacteriaceae, primers that uniquely amplify the source sequence (within the set of input sequences) were predicted for all but six strains. These six isolates without predicted strain-specific primers exhibit a very high degree of sequence similarity to at least one other strain in the input sequences. Specifically, Dze1591 and Dch3533 have identical recA genes; DsoIPO2222, DsoMK16 and DunkMK10 are three putative ‘D. solani’ strains that have identical recA sequences; and Dda703, which was reclassified in this study as D. paradisiaca, shares an identical recA gene with Dpa2511. The observed similarity of recA sequences is thus recapitulated by the primer design results.
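The effect of the screening step on each class of primer set can be sketched as a simple set operation. The sketch assumes that the identifiers of primer sets with at least one off-target match have already been collected (for example, from parsed BLAST output, which is not shown). The class names, identifiers and counts are hypothetical and do not reproduce the results of this study.

```python
def screen_by_class(classes, off_target):
    """Drop primer sets with an off-target match; report counts before and after."""
    retained = {name: pids - off_target for name, pids in classes.items()}
    summary = {
        name: (len(classes[name]), len(retained[name])) for name in classes
    }
    return retained, summary


# Toy input (hypothetical): primer set ids grouped by predicted class.
classes = {
    "strain": {"s1", "s2", "s3"},
    "species": {"sp1", "sp2", "sp3", "sp4"},
    "universal": {"u1", "u2", "u3"},
}
# Hypothetical ids of primer sets that matched a non-target sequence.
off_target = {"s3", "sp4", "u1", "u2"}

retained, summary = screen_by_class(classes, off_target)
for name, (before, after) in summary.items():
    print(f"{name}: {before} -> {after}")
```

Screening after prediction, as here, only removes candidates. Including outgroup sequences at the prediction stage would instead constrain which candidates are generated in the first place.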

The effect on classifier performance of having a small number of sequence representatives for a subgroup or species can be seen in Table 4. The sequenced Dda703 and Dpa2511 isolates appear to share a recent common ancestor. As a result, the majority of the primers predicted to distinguish D. paradisiaca from other Dickeya spp. would be expected to be common to both strains. This was seen in practice as the pipeline predicted the greatest number of species-specific primer sets (more than 300), but no strain-specific primer sets, for these isolates. Similarly, the ‘D. solani’ isolates are also highly similar and have identical recA sequences. Correspondingly, no strain-specific primer sets were predicted for this species by the pipeline.

The ‘universal’ Dickeya genus-specific primers (DIC-(ABCDE)) tested in the validation process did not pass the Enterobacteriaceae BLAST screen, and so were not expected to differentiate members of the Dickeya genus from other genera. In future work, Dickeya genus-specific primers will be designed for diagnostic use by incorporating the sequenced genomes of Enterobacteriaceae as outgroups in the primer design process. These primers do, however, demonstrate that the pipeline is able to identify universal primers within a set of test sequences, as well as those that meet the criteria of a specific species or group.

It was essential to test predicted diagnostic primers against known examples from the NCPPB, both to evaluate the ability of the primer prediction pipeline to generate diagnostic primers, and to establish which of the predicted primer sets would be useful in the field. Primers diagnostic for both D. dianthicola (DIA-A and DIA-C) and ‘D. solani’ (SOL-C and SOL-D), tested against 70 Dickeya strains, were found to be specific to the target species (with the exception of a single strain of D. dadantii, which was amplified using both sets of ‘D. solani’ primers; Table 5). The reason for this amplification result is currently being investigated, but the single false positive result from over 70 validation strains indicates that this pipeline is able to produce robust diagnostic primer pairs for D. dianthicola and ‘D. solani’.

Acknowledgements

We wish to thank members of the Centre for Genomics Research at the University of Liverpool for sequencing 16 of the 20 Dickeya strains used in the study. The study was funded by the Scottish Government’s Rural and Environment Science and Analytical Services Division (RESAS) (CR/2007/02) and the Agriculture and Horticulture Development Board (AHDB) through the Potato Council (R437).
