Cloning of prokaryotic genes by a universal degenerate primer PCR


Editor: David Studholme

Correspondence: Liyan Ping, Department of Bioorganic Chemistry, Max Planck Institute for Chemical Ecology, Hans-Knoell-Street 8, Jena 07745, Germany. Tel.: +49 3641 57 1214; fax: +49 3641 57 1202; e-mail:


We developed a PCR approach, hereafter named SD-PCR, that uses a hexameric degenerate primer reflecting the Shine–Dalgarno (SD) sequence of prokaryotic transcripts. In standard PCR, the two primers are usually designed to be as similar as possible in length and melting temperature, whereas SD-PCR pairs a single long gene-specific primer with a much shorter universal degenerate primer. The approach can be used for PCR walking to clone either the upstream or the downstream region of a known sequence. We have successfully applied the method to template DNAs of different GC contents and to complex mixtures containing a large excess of contaminating DNA.


Cloning and sequencing full-length prokaryotic genes from a known short gene fragment is generally a challenging task for molecular microbiologists. Because some bacterial proteins (Bibb & Buttner, 2003) and peptide antibiotics (Corvey et al., 2003) are posttranslationally processed, the sequenced peptides seldom cover the complete target molecule. DNA fragments amplified with primers based on such peptide sequences will therefore usually not cover the complete gene. The same holds true for DNA fragments amplified with primers based on conserved domains of related genes (Shen et al., 1993). In some cases, the complete sequence can be retrieved from genomic databases, but the genomes of most prokaryotes are unlikely to become available in the near future, given their enormous diversity and the cost of comprehensive sequencing. The c. 5000 bacterial species currently described in the literature represent only 0.5–1% of the estimated existing microbial diversity (Hacker et al., 2003), and screening of environmental samples constantly reveals novel genes from uncultivated species (Brady et al., 2004). However, cloning complete gene sequences by the traditional approach, namely Southern hybridization on subgenomic libraries, is very time-consuming. It is also not always feasible, because it requires large amounts of pure DNA, and detection sensitivity can be very low if the probes are short. PCR, in contrast, can amplify fragments from <1 ng of DNA (Cheung & Nelson, 1996), even when the template DNA is a mixture from different organisms (Piel, 2002). Here, we propose a simple PCR strategy to clone the flanking sequences of an incomplete gene fragment, even from highly diluted or complex genomic DNA mixtures.

The mRNAs of highly expressed prokaryotic genes generally contain within their 5′-untranslated region a ribosome-binding site called the Shine–Dalgarno (SD) sequence, usually located seven to nine nucleotides upstream of the AUG start codon (Gold et al., 1981). Because prokaryotic transcripts are not posttranscriptionally modified (e.g., spliced), the position of the SD sequence relative to the downstream gene is the same on the mRNA as on the genomic DNA (supporting Fig. S1). It should therefore be possible to amplify the 5′-end of prokaryotic genes directly from genomic DNA by PCR, using a degenerate primer corresponding to the SD sequence paired with a gene-specific primer. As discussed below, this approach turned out to have much broader applications than we initially expected.

Degenerate primers designed from protein sequences or from conserved regions of related genes have been widely used to amplify both novel and common genes (Shen et al., 1993), including orthologous genes in phylogenetically diverse populations (Hamann et al., 1999). Although many strategic variations have been developed (Cheung & Nelson, 1996), to our knowledge no practical approach that uses the SD consensus sequence to design a degenerate primer for cloning structural genes has yet been reported.

Materials and methods

Bacterial strains and template DNAs

Microbacterium arborescens SE14 (Ping et al., 2007), Providencia rettgeri, Ochrobactrum sp., and Bacillus pumilus (unpublished data) were isolated in our laboratory. Escherichia coli EPI100 was purchased from Epicentre (Madison). Genomic DNA was extracted from E. coli EPI100 grown at 37 °C in Luria–Bertani medium and from the other bacterial species grown at 37 °C in brain heart infusion broth (Sigma). To remove contaminating RNA, the purified DNA solutions were treated with RNAse A (0.1 μg mL−1 of DNA solution) for 1 h at room temperature. The DNA was then further purified by phenol/chloroform extraction, run on a 0.8% agarose gel, and gel-purified with the GFX Purification Kit (Amersham Biosciences).

Bacterial genomic DNA was tested both in mixtures with other bacterial DNAs and in dilution series with eukaryotic DNA, each mixture independently. For the bacterial mixture, equal amounts of five bacterial genomic DNAs (E. coli EPI100, M. arborescens SE14, P. rettgeri, Ochrobactrum sp., and B. pumilus) were combined. Eukaryotic DNA from budding yeast (Saccharomyces cerevisiae), a plant (Arabidopsis thaliana), and an insect (Spodoptera exigua) was mixed separately with decreasing amounts of bacterial genomic DNA (eukaryotic : bacterial ratios of 1 : 1, 50 : 1, 100 : 1, and 500 : 1, w/w). Approximately 10 ng of target DNA was normally used per PCR amplification, regardless of the amount of ‘contaminating’ DNA.
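To make the mixing ratios concrete, the short Python sketch below converts them into absolute DNA amounts. It assumes, as stated above, that about 10 ng of target bacterial DNA is held constant per reaction and that the ratios are eukaryotic : bacterial (w/w); the function names are illustrative only.

```python
# Illustrative arithmetic for the template mixtures described above.
# Assumptions: ~10 ng of target (bacterial) DNA per reaction is held constant,
# and ratios are contaminating (eukaryotic) : target DNA, w/w.
TARGET_NG = 10.0
RATIOS = [1, 50, 100, 500]


def contaminant_mass_ng(target_ng: float, ratio: float) -> float:
    """Mass of contaminating DNA (ng) needed for a given w/w ratio."""
    return target_ng * ratio


def target_fraction(target_ng: float, ratio: float) -> float:
    """Fraction of the total template mass contributed by the target DNA."""
    total = target_ng + contaminant_mass_ng(target_ng, ratio)
    return target_ng / total


if __name__ == "__main__":
    for r in RATIOS:
        c = contaminant_mass_ng(TARGET_NG, r)
        f = target_fraction(TARGET_NG, r)
        print(f"{r:>3} : 1  ->  {c:7.0f} ng contaminant, target = {100 * f:.2f}% of template")
```

At the highest ratio, a reaction would thus contain about 5 μg of contaminating DNA, with the target contributing roughly 0.2% of the template mass.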

PCR amplification and sequencing

Several Taq polymerases were compared: Platinum Polymerase (Hot-Start, Invitrogen), AccuPrime Polymerase (High Fidelity, Hot-Start Mix; Invitrogen), TripleMaster Polymerase (High Fidelity mix, Eppendorf), Qiagen Taq polymerase (Qiagen), and Taq polymerase [New England Biolabs (NEB)]. Reaction buffers and conditions were adapted to the respective enzyme. In general, each 50-μL reaction contained 0.4 μM gene-specific primer (Table 1), 6.0 μM Sdu primer (the primer targeting the SD sequence), 2 or 2.5 U Taq polymerase, 300 μM dNTPs, and the buffer supplied by the manufacturer. MgCl2 (2.5 mM) was added if not already included in the buffer. Template DNA was denatured at 94 °C for 3 min, followed by 35 cycles of 94 °C for 45 s, 55 °C for 30 s, and 72 °C for 2 min, and a final elongation step at 72 °C for 5 min. With the NEB Taq polymerase, the same cycling conditions were used, except that extension was performed at 75 °C for 2 min.
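The reaction set-up above can be turned into a pipetting scheme with the usual C1V1 = C2V2 relation. This Python sketch is illustrative only: the final concentrations come from the text, but all stock concentrations, the 1-μL template volume, and the reading of 300 μM as the concentration of each dNTP are our assumptions. It also sums the hold times of the cycling program (ramping excluded).

```python
from dataclasses import dataclass

REACTION_UL = 50.0


@dataclass
class Component:
    name: str
    stock: float  # stock concentration (same unit as `final`)
    final: float  # final concentration in the reaction

    def volume_ul(self, reaction_ul: float = REACTION_UL) -> float:
        # C1 * V1 = C2 * V2  =>  V1 = C2 * V2 / C1
        return self.final * reaction_ul / self.stock


# Final concentrations follow the Methods; all stock concentrations are hypothetical.
COMPONENTS = [
    Component("10x buffer", stock=10.0, final=1.0),                      # fold
    Component("gene-specific primer", stock=10.0, final=0.4),            # uM
    Component("Sdu primer", stock=100.0, final=6.0),                     # uM
    Component("dNTPs (assumed per dNTP)", stock=10_000.0, final=300.0),  # uM
    Component("MgCl2 (if not in buffer)", stock=25.0, final=2.5),        # mM
]
TAQ_UNITS = 2.5           # 2 or 2.5 U per reaction in the text
TAQ_STOCK_U_PER_UL = 5.0  # hypothetical enzyme stock
TEMPLATE_UL = 1.0         # hypothetical template volume


def pipetting_scheme(reaction_ul: float = REACTION_UL) -> dict[str, float]:
    """Volumes (uL) per reaction; water makes up the remainder."""
    vols = {c.name: c.volume_ul(reaction_ul) for c in COMPONENTS}
    vols["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["template DNA"] = TEMPLATE_UL
    vols["water"] = reaction_ul - sum(vols.values())
    return vols


def hold_time_s(cycles: int = 35, denature: int = 45, anneal: int = 30,
                extend: int = 120, initial: int = 180, final: int = 300) -> int:
    """Total hold time of the cycling program in seconds (ramping excluded)."""
    return initial + cycles * (denature + anneal + extend) + final


if __name__ == "__main__":
    for name, vol in pipetting_scheme().items():
        print(f"{name:<28} {vol:5.2f} uL")
    print(f"hold time: {hold_time_s() / 60:.1f} min")
```

With these assumed stocks, 3 μL of Sdu and 2 μL of gene-specific primer are pipetted per reaction, and the cycling program holds temperatures for roughly 2 h in total.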

Table 1. Oligonucleotide primers

Primer  Sequence (5′→3′)        Tm* (°C)    Target sequence
Sdu     (A/C)(A/T/G/C)GGAG      0–8.1       SD-complementary-like sequence
Sd-1    I(A/C)(A/T/G/C)GGAG     6.1–23.1    SD-complementary-like sequence
Sd-2    II(A/C)(A/T/G/C)GGAG    17.6–32.6   SD-complementary-like sequence
Sd-3    III(A/C)(A/T/G/C)GGAG   26.8–40.2   SD-complementary-like sequence
Sd-4    IIII(A/C)(A/T/G/C)GGAG  34.4–46.4   SD-complementary-like sequence
Asmr3   GTTCCAGTGCGCCTGCTTGC    62.3        M. arborescens SE14 aah gene
Asmr4   CGACCGGAGTGAGGAACTGC    60.2        M. arborescens SE14 aah gene

*Melting temperature (Tm) was calculated for 50 mM NaCl. I, deoxyinosine.
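Because the degenerate positions in Table 1 are written as base choices, it can help to enumerate the actual oligonucleotide species in each SD primer. The Python sketch below rewrites the Table 1 notation in IUPAC codes (M = A/C, N = A/C/G/T; this substitution is ours) and keeps I, deoxyinosine, as a single universal base, so the inosine tails add length but no synthesis variants.

```python
from itertools import product

# IUPAC codes used here: M = A or C, N = any base. "I" (deoxyinosine) is a
# single universal base, so it contributes no additional sequence variants.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "N": "ACGT", "I": "I"}

# Table 1 primers rewritten in IUPAC notation (our notation, not the paper's).
SD_PRIMERS = {
    "Sdu": "MNGGAG",
    "Sd-1": "IMNGGAG",
    "Sd-2": "IIMNGGAG",
    "Sd-3": "IIIMNGGAG",
    "Sd-4": "IIIIMNGGAG",
}


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer (5'->3')."""
    return ["".join(bases) for bases in product(*(CODES[b] for b in primer))]


if __name__ == "__main__":
    for name, seq in SD_PRIMERS.items():
        variants = expand(seq)
        print(f"{name}: {len(seq)} nt, {len(variants)} variants, e.g. {variants[:3]}")
```

Each of the five SD primers is therefore a mixture of the same eight core sequences; they differ only in the number of 5′ deoxyinosines.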

Custom primers were synthesized by Invitrogen. Amplifications were performed as a two-step PCR procedure: the first-round PCR product was diluted 10-fold, and 1 μL of the dilution was used directly as the template for a second round of amplification. The amplified DNA was separated on an agarose gel, purified, and cloned into the pCR2.1-TOPO vector of the TOPO TA Cloning Kit (Invitrogen, the Netherlands) according to the manufacturer's instructions. E. coli TOP10 or ELECTROMAX DH5α-E cells (both Invitrogen) were used for transformation.
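The two-step procedure can be checked with simple dilution arithmetic. The Python sketch below assumes that both rounds are 50-μL reactions, as in the general PCR set-up, and that fresh primers are added in the second round; the helper names are illustrative.

```python
def carried_fraction(dilution: float, transfer_ul: float,
                     first_round_ul: float = 50.0) -> float:
    """Fraction of the first-round reaction transferred into the second round."""
    return (transfer_ul / dilution) / first_round_ul


def carried_concentration(conc: float, dilution: float, transfer_ul: float,
                          second_round_ul: float = 50.0) -> float:
    """Final concentration in round 2 of a component carried over from round 1."""
    return conc / dilution * transfer_ul / second_round_ul


if __name__ == "__main__":
    frac = carried_fraction(dilution=10, transfer_ul=1)
    sdu = carried_concentration(6.0, dilution=10, transfer_ul=1)  # uM Sdu from round 1
    print(f"{100 * frac:.1f}% of the round-1 reaction is carried over")
    print(f"carried-over Sdu: {sdu:.3f} uM (assumed 6.0 uM freshly added in round 2)")
```

Under these assumptions, only 0.2% of the first-round reaction, and about 0.012 μM of first-round Sdu, reach the second round.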

Plasmid minipreparation was performed according to standard procedures. Clones were sequenced by GATC Biotech AG (Konstanz, Germany) or in our own sequencing facility with an ABI 3730xl Sequencer using BigDye chemistry (PE Applied Biosystems Inc., Weiterstadt, Germany). Sequences were analyzed with the Lasergene dnastar software package (DNASTAR Inc., Madison, WI), and edited manually. Genes were identified using blast services at the NCBI (


Our original idea was to exploit the SD sequence of prokaryotic genes to amplify the sequence downstream of it, i.e., the 5′ part of structural genes; hence the name SD-PCR. An alignment of SD sequences from different organisms (Gold et al., 1981; Dale et al., 1995; Sazuka & Ohara, 1996; Lin & Tseng, 1997) revealed that the internal GGAG core was highly conserved. This tetranucleotide was therefore chosen as the 3′-end of the primer. Because prokaryotic mRNA is not posttranscriptionally spliced (Gold et al., 1981), such a primer should anneal to the corresponding sequence on the template (transcribed) strand of the double-stranded genomic DNA. The 5′-end of the primer is degenerate: the position immediately preceding GGAG is fully degenerate (all four deoxyribonucleotides), and the position further upstream is either A or C, because these two bases occur most frequently there. The primer is henceforth designated Sdu, for Shine–Dalgarno sequence-like universal primer. SD-like sequences are not restricted to the upstream regions of structural genes but are distributed throughout the genome. Theoretically, an Sdu-annealing site in a given orientation occurs on average every 512 bp (4⁴ × 2) of prokaryotic genomic DNA. Based on this consideration, the gene-specific primers in this work were mostly chosen at random: half were oriented toward the SD site and half in the opposite direction, although a gene-specific primer located in the first half of the target gene and oriented toward the SD site is more promising in practice (see Discussion).
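The 512-bp estimate follows from the primer design: in a random sequence with equal base frequencies, the A/C position matches with probability 1/2, the fully degenerate position always matches, and each of the four fixed GGAG bases matches with probability 1/4. The Python sketch below computes this and scans a sequence for perfect Sdu matches on both strands. It is a rough illustration under our assumptions: it ignores mismatch tolerance and real base composition (e.g., the GC-rich M. arborescens genome).

```python
import re

# Sdu = (A/C)(A/T/G/C)GGAG. On the top strand, a forward site reads MNGGAG;
# a site where Sdu anneals to the top strand appears as its reverse complement, CTCCNK.
FWD = re.compile(r"(?=[AC][ACGT]GGAG)")
REV = re.compile(r"(?=CTCC[ACGT][GT])")


def site_probability() -> float:
    """Chance that a given position matches Sdu in one orientation (equal base frequencies)."""
    return (2 / 4) * (4 / 4) * (1 / 4) ** 4


def sdu_sites(seq: str) -> tuple[list[int], list[int]]:
    """0-based start positions of perfect Sdu matches in each orientation."""
    s = seq.upper()
    forward = [m.start() for m in FWD.finditer(s)]
    reverse = [m.start() for m in REV.finditer(s)]
    return forward, reverse


if __name__ == "__main__":
    p = site_probability()
    print(f"p = {p}, i.e. one site per {1 / p:.0f} bp per orientation")
    print(sdu_sites("AAGGAGTTTTCTCCAT"))  # ([0], [10])
```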

Because deoxyinosine can base pair with all four naturally occurring deoxyribonucleotides (Shen et al., 1993), we tested whether tailing the 5′-end of the primer with deoxyinosines enhances its annealing (Table 1). For convenience, Sdu and its deoxyinosine-tailed derivatives are hereafter collectively called SD primers. The amplification pattern depended on the gene-specific primer used. Although in some cases the number of deoxyinosines made no difference (Fig. 1b), in general the more deoxyinosines the primer carried, the more likely smaller fragments were amplified (Fig. 1). As expected, additional deoxyinosines also increased the chance of random priming (Fig. 1, arrows). Not all random-priming products are useless, however; some, such as those in Fig. 1b–d, are quite reproducible and might also be used for gene cloning. Notably, although SD-like sequences occur throughout the genome, PCRs with an SD primer alone, even with four deoxyinosines at the 5′-end, did not produce any visible fragment (data not shown).

Figure 1.

 Representative results obtained with SD primers carrying different numbers of deoxyinosines at the 5′-end. The leftmost lane of each gel contains the DNA ladder. The number of deoxyinosines is shown on the corresponding lanes. The gene-specific primers paired with the SD primers are (a) Em01, (b) Ec02, (c) Et03, and (d) Et04. Bands produced by nonspecific priming are indicated by arrows.

For unknown reasons, certain enzymes, such as the recombinant Taq polymerase from NEB, produced only one or two clear bands per reaction. Which of the potential amplicons is actually produced in significant amounts probably depends on the first few PCR cycles (Fig. 1). With ‘hot-start’ enzymes, such as Platinum Polymerase from Invitrogen, more mispriming bands were observed (Fig. 2), whereas AccuPrime (Fig. 3c) and the TripleMaster enzyme blend (Fig. 3d) generally gave results between these extremes.

Figure 2.

 Amplification of DNA fragments with nested gene-specific primers. (a) SD-PCR results with the gene-specific primers Enf1 (lane 1), Enf2 (lane 2), and Enf3 (lane 3), which are nested and amplify the downstream sequence. Correctly amplified bands are indicated by white arrows. (b) Diagram of the result in (a). Hollow arrows show the arrangement of the nuoC, nuoE, and nuoF genes; numbers indicate positions on the Escherichia coli K12 genome. The gene-specific primers designed from nuoC are depicted as solid triangles, and three sequences complementary to primer Sdu in nuoE and nuoF as hollow triangles. Detected DNA fragments are shown as solid lines, with the corresponding primers listed on the left; line width corresponds to the abundance of the PCR products. (c) The aah gene fragment amplified from Microbacterium arborescens SE14. The Sdu primer is oriented in the same direction as transcription of the aah gene. The corresponding gene-specific primers are Asmr3 (lane 4) and Asmr4 (lane 5). Correct amplification products are indicated by arrows. A detailed analysis of this result is shown in supporting Fig. S2.

Figure 3.

 SD-PCR results with complex DNA mixtures as templates. Correct amplicons confirmed by sequencing are indicated by arrows. (a) Products of the first amplification of the two-step PCR procedure; (b) products of the second amplification. Lanes from left to right: marker, 1 ng Escherichia coli DNA (lane 1), 1 ng E. coli DNA with 1 ng yeast DNA (lane 2), 1 ng E. coli DNA with 50 ng yeast DNA (lane 3), and 1 ng yeast DNA (lane 4). The gene-specific primer is Eem619 (see Table 1). Note that in (a) the smear (bottom of the gel) increases with the amount of yeast DNA, and that it has disappeared in (b). (c) Amplification of E. coli DNA fragments in the presence of admixed DNA from other organisms: a mixture of five different bacterial DNAs (lane 5) and E. coli DNA mixed with plant DNA (lanes 6 and 7) at plant : bacterial DNA ratios (w/w) of 100 : 1 (lane 6) and 500 : 1 (lane 7). The gene-specific primer is Enf2. (d) Amplification of Microbacterium arborescens gene fragments with the gene-specific primer Asmr3 from M. arborescens DNA mixed with plant DNA (lane 8) and with insect DNA (lane 9).

Because the SD primers are highly degenerate, the selectivity of the PCR is determined by the gene-specific primer. Although some gene-specific primers with annealing temperatures around 55 °C work well (Figs 1a, b and 2a), a longer primer with a higher melting temperature should increase selectivity; we therefore also tested long primers with melting temperatures of up to 65 °C, which often worked better (Fig. 1). The SD primers also still worked very well when extension was performed at 75 °C (data not shown), a condition that could further increase the selectivity of SD-PCR.

Nested primers are very useful for eliminating mispriming products. Three nested primers were designed from the nuoC gene in the complete E. coli K12 genome (NCBI, NC_000913). They have the same orientation as nuoC and therefore amplify toward the 3′-end of the gene rather than the upstream region (Fig. 2a). Sequencing showed that the actual SD primer annealing sites are located in two other structural genes, nuoE and nuoF (Fig. 2b). Two further nested primers were designed from the genomic sequence of M. arborescens, whose DNA has a GC content of 70%, much higher than that of E. coli DNA (50%). SD-PCR also worked well on this GC-rich template (Fig. 2c).

To determine whether SD-PCR is also applicable to complex template mixtures, we prepared a bacterial genomic DNA mixture composed of equal amounts of DNA from five bacterial species, as well as dilution series of E. coli and M. arborescens DNA in eukaryotic genomic DNA from budding yeast (S. cerevisiae), an insect (S. exigua), and a plant (A. thaliana). With these complex templates, several alternative cycling conditions were tested to optimize the performance and specificity of the PCR. Preliminary experiments indicated that adding deoxyinosines at the 5′-end of the SD primers dramatically decreases their ability to amplify specific products from complex DNA (data not shown); the complex-template reactions were therefore performed with SD primers carrying no deoxyinosine (Fig. 3a and b) or one deoxyinosine (Fig. 3c and d).

Pure E. coli DNA was first mixed with yeast DNA (Fig. 3a and b). With E. coli-specific primers, SD-PCR amplified no product from the pure yeast DNA control (Fig. 3b), whereas the same primer pair generated two correct bands from the E. coli DNA control. Even in the presence of a 50-fold excess of ‘junk’ yeast DNA, the correct E. coli gene fragment could still be amplified. Within this range, the amount of junk DNA seemed to have no significant influence on the result of SD-PCR, except that in the first round, more junk DNA generated more smear (Fig. 3a). After dilution and the second round of SD-PCR, the smear disappeared (Fig. 3b).

When a template composed of equal amounts of DNA from five bacterial species (E. coli EPI100, M. arborescens, P. rettgeri, Ochrobactrum sp., and B. pumilus) was used, the E. coli-specific fragment was successfully amplified (Fig. 3c). The same held for templates composed of bacterial DNA mixed with A. thaliana or S. exigua DNA; the correct fragment was amplified even when the bacterial DNA was diluted with a 500-fold excess of plant DNA. In all reactions, the target sequence was correctly amplified, and sequencing showed that the fragments obtained from the bacterial DNA mixture and from bacterial DNA diluted with eukaryotic genomic DNA were identical to those obtained from pure bacterial DNA.


Because of its simplicity and sensitivity, SD-PCR is a significant improvement over traditional methods for amplifying unknown genomic fragments from prokaryotic DNA. Although the Sdu primer is much shorter than standard primers, its base pairing with the complementary template is sufficiently stable to ensure efficient priming under the PCR conditions tested here. Extension temperatures of up to 75 °C did not significantly affect SD-PCR efficiency. Where the Sdu primer does not anneal at a functional SD sequence, for example during amplification toward 3′-ends, it simply serves as a degenerate helper primer. Based on this observation, octameric primers with any combination of nucleotides in the 4⁴ × 2 permutation should yield similar results. When designing such alternative primers, exchanging G for C (or vice versa) is preferable to other substitutions, because it preserves strong base pairing and thus annealing capability. As a practical compromise, one or two deoxyinosines at the 5′-end of SD primers can enhance primer hybridization; these primers are therefore recommended when the PCR template is pure DNA. On the other hand, deoxyinosine on the primer significantly impairs amplification from complex templates, so SD primers with no or one deoxyinosine are preferred in that case.

This kind of degenerate-primer PCR can, in theory, amplify the flanking region of any known sequence, producing a series of fragments of different lengths. By designing successive (step-up) gene-specific primers from each newly obtained sequence, the known sequence can be extended step by step without limit. The approach can thus be used for PCR walking, with the advantage of omitting steps such as digestion, linker ligation (Yan et al., 2003), and biotin capture (Rosenthal et al., 1991). We have also successfully applied this method to eukaryotic cDNAs (data not shown).
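The PCR-walking idea can be explored in silico before ordering primers. The Python sketch below lists the product lengths expected for one gene-specific primer on a known template, under simplifying assumptions of our own: perfect matches only, a single binding site for the gene-specific primer, inosine tails ignored, and a 4-kb default ceiling that mirrors the product-size limit for a 2-min elongation cited in this paper. The primer and templates in the usage lines are hypothetical.

```python
import re

# Perfect-match Sdu sites on the top strand (Table 1): forward MNGGAG,
# reverse complement CTCCNK. Mismatch tolerance and inosine tails are ignored.
SDU_FWD = re.compile(r"(?=[AC][ACGT]GGAG)")
SDU_REV = re.compile(r"(?=CTCC[ACGT][GT])")
SDU_LEN = 6
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def predict_products(template: str, gsp: str, max_len: int = 4000) -> list[int]:
    """Expected SD-PCR product lengths (bp) for one gene-specific primer (gsp).

    Assumes the gsp binds once, perfectly, and that each product ends at a
    perfect Sdu site facing the gsp within `max_len` bp.
    """
    t, p = template.upper(), gsp.upper()
    sizes = []
    i = t.find(p)
    if i >= 0:
        # gsp matches the top strand, anneals to the bottom strand and extends
        # rightwards; the partner Sdu must anneal to the top strand further
        # right, where the top strand reads CTCCNK.
        for m in SDU_REV.finditer(t, i + len(p)):
            size = m.start() + SDU_LEN - i
            if size > max_len:
                break
            sizes.append(size)
    k = t.find(revcomp(p))
    if k >= 0:
        # gsp anneals to the top strand and extends leftwards; the partner Sdu
        # must lie further left, where the top strand reads MNGGAG.
        end = k + len(p)
        for m in SDU_FWD.finditer(t, 0, k):
            size = end - m.start()
            if size <= max_len:
                sizes.append(size)
    return sorted(sizes)


if __name__ == "__main__":
    gsp = "TTTTTGGGGGAAAAACCCCC"  # hypothetical primer
    t1 = "TTTT" + gsp + "T" * 30 + "CTCCAT" + "T" * 10
    print(predict_products(t1, gsp))  # [56]
```

In a real walk, the end of each sequenced product would supply the next gene-specific primer.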

Although SD-PCR uses only one gene-specific primer, it is fundamentally different from low-stringency single-specific-primer PCR for gene expression fingerprinting, which uses a single primer alone (Carvalho et al., 2003). Although the Sdu primer is highly degenerate, the probability that two oppositely oriented Sdu-binding sites occur within a 4-kb DNA stretch is only 0.38%. The product of a PCR with a 2-min elongation step is normally shorter than 4 kb (Carballeira et al., 1990). When the SD primer was used alone, even with up to four deoxyinosines, no band was amplified from E. coli genomic DNA.

The high degeneracy of Sdu substantially reduces the effective concentration of each individual sequence variant. In our experiments, the Sdu concentration was therefore 15-fold higher than that of the gene-specific primer (6.0 vs. 0.4 μM). Furthermore, Sdu is complementary to the 3′-end of 16S rRNA (Shine & Dalgarno, 1974), so contaminating rRNA could capture the primer, and the success of SD-PCR depends on complete removal of RNA from the DNA sample. For this reason, we used both RNAse digestion and further purification to remove trace amounts of contaminating RNA. On the other hand, additional Sdu-binding sites from non-target DNA in the template are tolerated under the conditions tested, as demonstrated with mixed bacterial DNA and with bacterial DNA diluted with other DNAs (Fig. 3).
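The effect of degeneracy on primer concentration can be illustrated with a few lines of arithmetic. The Python sketch below assumes that all eight Sdu variants are synthesized in equal amounts (synthesis bias is ignored) and writes Sdu in IUPAC notation (M = A/C, N = any base), which is our notation rather than the paper's.

```python
# Per-variant concentration of a degenerate primer, assuming all variants are
# synthesized in equal amounts (an assumption; real syntheses can be biased).
CODES = {"A": 1, "C": 1, "G": 1, "T": 1, "M": 2, "N": 4, "I": 1}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by an IUPAC-coded primer."""
    n = 1
    for base in primer:
        n *= CODES[base]
    return n


def per_variant_uM(total_uM: float, primer: str) -> float:
    """Concentration of each sequence variant, given the total primer concentration."""
    return total_uM / degeneracy(primer)


if __name__ == "__main__":
    sdu_total, gsp = 6.0, 0.4  # uM, from the PCR set-up in Methods
    each = per_variant_uM(sdu_total, "MNGGAG")  # Sdu in IUPAC notation
    print(f"Sdu: {degeneracy('MNGGAG')} variants at {each:.2f} uM each")
    print(f"per-variant Sdu : gene-specific primer = {each / gsp:.2f}")
```

Under this assumption, each Sdu variant is present at about 0.75 μM, i.e., only about 1.9-fold the gene-specific primer concentration rather than 15-fold.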

The SD sequence is highly conserved in prokaryotes. An alignment of 120 translation initiation sites from Synechocystis sp. PCC6803 revealed that half of them contain the SD core sequence within the preceding 25 nucleotides (Sazuka & Ohara, 1996), while only 18 of 116 genes from a Xanthomonas species lack it (Lin & Tseng, 1997). The SD-complementary sequence on template DNA provides an ideal site for SD-PCR, especially for cloning highly expressed genes, whose SD sequences are more conserved. Furthermore, SD sequences do not occur in eukaryotic genes, which provides some additional selectivity when the target DNA is contaminated with eukaryotic DNA. SD-PCR can also be applied to cloning chloroplast genes directly from total plant DNA, because most chloroplast genes possess SD-like sequences in their 5′-untranslated regions (Hirose & Sugiura, 2004). Although mitochondria are considered bacterial descendants, no functional SD sequences have been found in plant, yeast, or mammalian mitochondria (Green-Willms et al., 1998; Koc & Spremulli, 2002; Hazle & Bonen, 2007). In addition, most plant mitochondrial genomes exist in complex physical forms, resulting in mixed subgenomic populations that often show variable gene order and acquire new regulatory signals. This selectivity therefore probably does not apply to cloning mitochondrial genes by SD-PCR.

Because the SD primers are highly degenerate, mispriming is unavoidable, especially with complex DNA templates. In our experiments, we always separated the PCR products by electrophoresis, purified all major bands, and identified the correct amplicons by sequencing. In most cases (more than 70%), the dominant band was the expected fragment, but in some cases the correct products were not markedly more abundant than some of the background mispriming bands. Even when all minor bands must be cloned and sequenced, the method remains cost- and time-efficient compared with other strategies, and this drawback can be greatly reduced by using nested gene-specific primers (Fig. 2).

The SD primers can amplify fragments from template DNAs with a broad range of GC contents. Although this work focused on E. coli DNA, the method has been successfully applied to amplify an unknown gene from M. arborescens SE14 genomic DNA, a GC-rich template (Ping et al., 2007). Furthermore, we extensively tested the potential of SD-PCR for cloning prokaryotic genes from DNA samples ‘contaminated’ with other DNAs (Fig. 3). With mixtures of bacterial DNAs as well as with prokaryotic DNA diluted in a large excess of eukaryotic genomic DNA, we confirmed that SD-PCR can be used for PCR walking from problematic template mixtures. If the gene-specific primer targets a sequence shared by related bacterial species, orthologous genes can be amplified in a single SD-PCR reaction from a DNA mixture of those species, and the orthologs can then be profiled by subsequent sequencing. Conversely, if only one ortholog is wanted, it should be possible to design a primer specific enough to discriminate between orthologs, because under stringent conditions a specific primer tolerates no more than two mismatched base pairs. Alternatively, a second round of PCR with a nested primer can solve the same problem.
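The two-mismatch rule of thumb can be turned into a simple screen for candidate primers. This Python sketch counts position-by-position mismatches against pre-aligned ortholog sites. The ortholog sequences are hypothetical, the tolerance of two is taken from the text, and the sketch deliberately ignores the fact that mismatches near the 3′-end usually matter more than those near the 5′-end.

```python
def mismatches(primer: str, site: str) -> int:
    """Number of mismatched positions between a primer and an aligned, equal-length site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be aligned and of equal length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


def discriminates(primer: str, target: str, orthologs: list[str],
                  max_tolerated: int = 2) -> bool:
    """True if the primer matches its target perfectly and every ortholog site
    carries more mismatches than the assumed tolerance."""
    return mismatches(primer, target) == 0 and all(
        mismatches(primer, site) > max_tolerated for site in orthologs
    )


if __name__ == "__main__":
    asmr3 = "GTTCCAGTGCGCCTGCTTGC"             # Table 1
    close_ortholog = "GTTCCAGTGCGCCTGCTTGA"    # hypothetical, 1 mismatch
    distant_ortholog = "ATTCCAGTGCGCCTGCTAGA"  # hypothetical, 3 mismatches
    print(discriminates(asmr3, asmr3, [close_ortholog]))    # False
    print(discriminates(asmr3, asmr3, [distant_ortholog]))  # True
```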

Finally, it should be emphasized that the specificity of SD-PCR depends largely on the gene-specific primer, whose design is critical to the success of amplification. Besides being unique within the genome(s) concerned, the gene-specific primer should not show any complementarity to the SD primers. With poorly designed primers, amplification was unstable: correct products were obtained in one reaction but not in a replicate, or no product was obtained at all. When working with complex DNA templates, several gene-specific primers should be tested.
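The requirement that the gene-specific primer show no complementarity to the SD primers can be screened computationally. The Python sketch below is one simple, assumed way to do so, not a criterion used in this study: it flags perfect Sdu matches in either orientation within the primer and checks whether the primer's 3′-terminal bases could pair with any Sdu variant. The 4-nt tail length and the second test primer are hypothetical choices.

```python
import re

# Sdu (Table 1) in both orientations; perfect matches only.
SDU_FWD = re.compile(r"(?=[AC][ACGT]GGAG)")  # Sdu itself
SDU_REV = re.compile(r"(?=CTCC[ACGT][GT])")  # reverse complement of Sdu
SDU_VARIANTS = [m + n + "GGAG" for m in "AC" for n in "ACGT"]
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def sdu_conflicts(gsp: str, tail: int = 4) -> list[str]:
    """List possible Sdu interactions of a gene-specific primer (gsp)."""
    p = gsp.upper()
    issues = []
    if SDU_FWD.search(p):
        issues.append("contains an Sdu-matching stretch")
    if SDU_REV.search(p):
        issues.append("contains a stretch complementary to Sdu")
    # The 3'-terminal bases can pair (antiparallel) with an Sdu variant if
    # their reverse complement occurs within that variant.
    three_prime = revcomp(p[-tail:])
    if any(three_prime in v for v in SDU_VARIANTS):
        issues.append(f"3'-terminal {tail} nt can pair with Sdu")
    return issues


if __name__ == "__main__":
    print(sdu_conflicts("GTTCCAGTGCGCCTGCTTGC"))  # Asmr3 (Table 1): []
    print(sdu_conflicts("GATTACAGGAGTTCAGC"))     # hypothetical primer
```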

The SD sequence has previously been exploited for PCR fingerprinting. A 13-mer primer based on the SD sequence has been used for differential display of etiologic variants of Staphylococcus aureus (Cuny & Witte, 1996; Netto dos Santos et al., 2001), and 14-mer (Fleming et al., 1998) or even longer (Puskas & Bottka, 1994) SD-based primers have been used in arbitrarily primed PCR for RNA profiling. However, to our knowledge, this is the first study to use the conserved SD sequence to amplify and clone bacterial genes. The method should be useful in studies where only partial gene sequences are available and where large amounts of pure bacterial template DNA cannot be obtained.


We thank Dr Nicolas Delaroque, Prof. Joern Piel, and Prof. Erika Kothe for helpful discussion.