A reliable amplification technique for the characterization of genomic DNA sequences flanking insertion sequences

Authors


Corresponding author. Tel.: +33 (1) 45 68 88 77; Fax: +33 (1) 45 68 88 43; E-mail: cguilhot@pasteur.fr

Abstract

A simple and efficient ligation-mediated PCR (LMPCR) is described for amplifying DNA adjacent to known sequences. The method uses one primer specific for the known sequence and a second specific for a synthetic linker ligated to restricted genomic DNA. Perkin-Elmer AmpliTaq Gold polymerase is used to minimize non-specific primer annealing and amplification. This LMPCR method was successfully applied to isolate DNA sequences flanking mobile elements present in mycobacterial mutants generated by transposon mutagenesis.

1 Introduction

The isolation of a gene of unknown sequence, identified by a phenotype or tagged by the insertion of a mobile element, is often a long and labor-intensive procedure. A standard approach consists of constructing and screening a genomic library, followed by the isolation, subcloning and sequencing of putative positive clones.

Direct cloning of an ‘unknown’ DNA located outside a known sequence is possible using alternative PCR-based strategies. One approach, termed inverse PCR, involves restriction of genomic DNA and ligation of the free ends under conditions that favor circularization of the DNA. The final step is the amplification using oligonucleotides complementary to the 5′ and 3′ ends of the known sequences [1]. However, the circularization step is unreliable and concatemers are not uncommonly produced.

Another method, based on ligation-mediated single-sided PCR, has most commonly been used for the analysis of methylated cytosines and for in vivo footprinting [2, 3]. This method uses linker ligation and subsequent amplification with primer pairs recognizing the linker and genomic sequences. Nested PCR is usually required to obtain specificity. Modifications of this ligation-mediated PCR have been described for cloning unknown sequences [4, 5] or for generating polymorphism patterns useful for epidemiologic studies [6–8].

We have extended the application of the ligation-mediated PCR (LMPCR) strategy to mycobacteria (Mycobacterium smegmatis and Mycobacterium tuberculosis) to permit direct amplification, sequencing and cloning of sequences tagged by DNA insertional mutagenesis.

2 Materials and methods

2.1 Strains and plasmids

The bacterial strains and plasmids used in this study are listed in Table 1. Plasmid p1D10 was constructed by inserting a 3.7 kb Sau3A DNA fragment of M. simiae into pBluescript II. Plasmid pA5 was constructed by cloning the purified PCR product A5 into the pGEM vector (Promega) according to the manufacturer's protocol. Eight mutant clones of M. smegmatis mc²155 were obtained by Tn611 transposition. This transposon, derived from Tn610 isolated from M. fortuitum FC1, is the first composite transposon for which transposition has been demonstrated in mycobacteria [9]. Its transposition mechanism is replicative and results in the formation of cointegrates: the whole delivery vector is integrated and an additional copy of the insertion sequence is synthesized [9].

Table 1. Bacterial plasmids and strains used in this study

| Strain or plasmid | Characteristics |
|---|---|
| Strains | |
| M. simiae | CIPT(a) 104 102 0001 |
| M. smegmatis mc²155::Tn610 Nos. 1–8 | M. smegmatis mutants (Tn610 transposition) |
| M. tuberculosis::Tn5367 Nos. 1–30 | M. tuberculosis mutants (Tn5367 transposition) |
| Plasmids | |
| p1D10 | Insertion of a 3.7 kb Sau3AI fragment from M. simiae in pBluescript II KS |
| pA5 | Insertion of the amplicon A5 in pGEM (Promega) |

(a) CIPT, Collection Institut Pasteur Tuberculose, France.

Thirty clones of M. tuberculosis (1–30) obtained by transposition of Tn5367 were also investigated [10]. Tn5367 is a derivative of IS1096 that does not form a cointegrate as a result of the transposition event. When the mobile element hops onto the chromosome, the delivery vector is lost.

2.2 Linkers and primers

The oligonucleotides used as linkers and primers are described in Table 2. The SalI linker was constructed by annealing two non-phosphorylated oligonucleotides, Salpt and Salgd. This linker was designed to ligate to the SalI cohesive ends thus eliminating the SalI restriction site after ligation. Salpt and Salgd oligonucleotides were mixed in equimolar amounts in 1×PCR buffer (10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.001% (w/v) gelatin). This mix, placed in a thermal cycler, was heated to 80°C and the two oligonucleotides were allowed to anneal by slowly cooling the mixture to 4°C over 1 h. The primer G124 was specific for a target gene of M. simiae. The primers oligo G and oligo F were specific for IS6100 and were directed outward from this element. The primer ISada was specific for the inverted repeats (IR) of IS1096 and was directed to amplify the junction of the IS.

Table 2. Oligonucleotides used in the study and their sequences

| Primer(a) | Sequence (5′ to 3′) |
|---|---|
| Salgd | TAG CTT ATT CCT CAA GGC ACG AGC |
| Salpt | TCG AGC TCG TGC |
| G124 | CGA ACA AAA TGG GGG TAA GG |
| Oligo F | AAG AAT TCA TCG TTC CGT CCG TCC AAT CTC C |
| Oligo G | GAG CGA CAG CCT ACC TCT GAC T |
| ISada | TTT GAG CTC TAC ACC GTC AAG TGC GAA GAG C |

(a) Salgd and Salpt: linker primers, SalI restriction site in bold, complementary sequence of the linker underlined; G124: M. simiae primer; Oligo F and G: IS6100 primers; ISada: IS1096 primer.
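The linker design described in Section 2.2 can be checked directly from the sequences in Table 2. The following is a minimal sketch, not part of the original protocol: the function names are illustrative, and the SalI cut position (G^TCGAC, leaving a 5′ TCGA overhang) is taken as known. It tests whether the 3′ portion of Salpt pairs with the 3′ end of Salgd, and whether ligating Salgd to a SalI end regenerates a SalI site.

```python
# Linker oligonucleotides from Table 2 (5' to 3', spaces removed).
SALGD = "TAGCTTATTCCTCAAGGCACGAGC"
SALPT = "TCGAGCTCGTGC"
SALI_SITE = "GTCGAC"     # SalI recognition site, cut as G^TCGAC
SALI_OVERHANG = "TCGA"   # 5' overhang left by SalI

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_length(long_strand, short_strand, overhang):
    """Count the bases of short_strand, downstream of its 5' overhang,
    that pair with the 3' end of long_strand (0 if they do not pair)."""
    if not short_strand.startswith(overhang):
        return 0
    paired = short_strand[len(overhang):]
    if long_strand.endswith(reverse_complement(paired)):
        return len(paired)
    return 0


def ligated_junction(long_strand, site):
    """Sequence formed when the 3' end of the long linker strand is joined
    to the 5' end of a genomic strand cut by SalI (which starts with TCGAC)."""
    return long_strand + site[1:]


if __name__ == "__main__":
    print("Paired bases:", duplex_length(SALGD, SALPT, SALI_OVERHANG))
    junction = ligated_junction(SALGD, SALI_SITE)
    print("SalI site regenerated:", SALI_SITE in junction)
```

With the Table 2 sequences, the base preceding TCGAC at the junction is C rather than G, which is consistent with the statement that the SalI site is eliminated after ligation.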

2.3 Preparation of PCR template

Aliquots of mycobacterial genomic or plasmid DNA (0.2–0.3 μg) were incubated in a total volume of 20 μl containing 10 U of SalI (Boehringer Mannheim) in 1× NEB4 buffer (BioLabs) for 1 h at 37°C. The digested DNA was ligated to the SalI linker by incubating a mixture containing 5 μl of digested DNA, the SalI linker (25 pmol), 10 U of T4 DNA ligase (Gibco BRL) and 5× ligase buffer (final volume 20 μl) for 1 h at 16°C. After ligation, the T4 DNA ligase was inactivated by incubation at 65°C for 10 min. The samples were then digested for 15 min at 37°C with 5 U of SalI, 2 μl of 10× buffer H (Boehringer Mannheim) and 2.5 μl of water, to cleave any remaining restriction sites resulting from partial genomic digestion or regenerated through ligation.

2.4 PCR amplification

PCR was performed in a standard reaction mixture of 50 μl. Template DNA (5 μl of a 1/10 dilution for genomic DNA or a 1/100 dilution for plasmid DNA) was added to the reaction mixture, which contained 1 U of AmpliTaq ‘Gold’ DNA polymerase (Perkin-Elmer Cetus, Norwalk, CT), deoxynucleoside triphosphates (200 μM each), and 5 μl of dimethyl sulfoxide in PCR buffer. The Salgd primer (1 μM) and a specific primer (oligo F, oligo G, G124, or ISada) were added and the DNA was denatured by incubating the mixture at 95°C for 9 min. Amplification was achieved using 35 cycles of PCR (95°C for 30 s, 55°C for 30 s and 72°C for 90 s), followed by a final extension at 72°C for 10 min. Amplified products were separated by standard horizontal gel electrophoresis in a 1.5% agarose gel in TBE buffer (90 mM Tris, 90 mM boric acid, 2 mM EDTA) and were stained using ethidium bromide. PCR products were purified using the Nucleotrap kit (Macherey-Nagel, Düren, Germany).
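For planning purposes, the cycling program and the dimethyl sulfoxide concentration can be worked out from the figures given above. The following sketch is illustrative only: the constant and function names are hypothetical, and the calculation counts hold times alone, ignoring the temperature ramping of the thermal cycler.

```python
# Cycling parameters as stated in Section 2.4 (hold times only).
INITIAL_DENATURATION_S = 9 * 60   # 95 °C for 9 min
CYCLES = 35
CYCLE_STEPS_S = (30, 30, 90)      # 95 °C, 55 °C, 72 °C
FINAL_EXTENSION_S = 10 * 60       # 72 °C for 10 min


def program_minutes(initial_s, cycles, steps_s, final_s):
    """Total hold time of the program in minutes (ramp times not included)."""
    return (initial_s + cycles * sum(steps_s) + final_s) / 60


def percent_v_v(component_ul, total_ul):
    """Volume fraction of one component, expressed as % (v/v)."""
    return 100 * component_ul / total_ul


if __name__ == "__main__":
    minutes = program_minutes(
        INITIAL_DENATURATION_S, CYCLES, CYCLE_STEPS_S, FINAL_EXTENSION_S
    )
    print(f"Program hold time: {minutes} min")
    print(f"DMSO: {percent_v_v(5, 50)} % (v/v)")
```

Under these assumptions the program holds for 106.5 min in total, and 5 μl of dimethyl sulfoxide in a 50 μl reaction corresponds to 10% (v/v).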

2.5 Sequencing of the PCR products

Sequencing was performed on an ABI 373 DNA sequencer with the Taq DyeDeoxy Terminator Cycle sequencing kit (Perkin-Elmer, Applied Biosystems Inc.).

3 Results and discussion

The PCR process used for the amplification of the 3′ flanking region adjacent to a known sequence (‘gene A’) is summarized in Fig. 1. An asymmetric, non-phosphorylated, double-stranded linker was ligated by its long Salgd strand to the 5′ phosphate cohesive ends of SalI-digested genomic or plasmid DNA. The short strand (Salpt) of the linker was not ligated during this step because the oligonucleotide lacks a phosphate at its 5′ end. The Salpt-Salgd duplex is stable under ligation conditions (16°C), but not at PCR temperatures (>55°C). Therefore, the Salpt oligonucleotide was separated from the DNA template during the heat denaturation step of the PCR and did not reanneal to the DNA during amplification. An oligonucleotide specific for the target gene and Salgd are then used as primers in a PCR reaction. During the first cycle, only products synthesized from the primer specific for the tagged gene are generated. This newly synthesized DNA is then used as a template in the subsequent cycles. Consequently, the fragment between the target sequence and the linker is selectively amplified.
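The temperature dependence of the Salpt-Salgd duplex can be illustrated with a rough melting temperature estimate. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is a coarse approximation for short oligonucleotides and was not used in the original work; the 8-base paired region is derived from the Table 2 sequences, and the function name is hypothetical.

```python
# Portion of Salpt (Table 2) that pairs with the 3' end of Salgd,
# i.e. Salpt without its 5' TCGA overhang.
SALPT_PAIRED = "GCTCGTGC"
LIGATION_TEMP_C = 16
ANNEALING_TEMP_C = 55


def wallace_tm(seq):
    """Rough Tm estimate in °C: 2 per A/T plus 4 per G/C (Wallace rule).
    Only an approximation, intended for short oligonucleotides."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    tm = wallace_tm(SALPT_PAIRED)
    print(f"Estimated Tm of the paired region: {tm} °C")
    print("Above ligation temperature:", tm > LIGATION_TEMP_C)
    print("Below annealing temperature:", tm < ANNEALING_TEMP_C)
```

By this estimate the paired region melts at roughly 28°C, between the ligation temperature and the annealing temperature, which agrees qualitatively with the behavior described above. The actual value depends on salt and oligonucleotide concentrations.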

Figure 1.

LMPCR strategy. A linker is ligated to SalI restricted DNA fragments. PCR specifically amplifies the region downstream of gene A using primers A and Salgd.

It should be noted that preventing amplification between two linkers is key to the success of LMPCR. Such undesired amplification could arise if two complementary DNA fragments hybridize during the annealing step of the PCR. However, this is unlikely since each fragment represents only a minute part of the initial mixture. Another reported cause of non-specific amplification is extension of the annealed primer during sample preparation [11]. To prevent these undesired amplifications, we tested the AmpliTaq Gold polymerase. In this formulation the polymerase is inactive until heated and could thus minimize undesirable extensions due to non-specific primer annealing. A specific DNA fragment of M. simiae, present on plasmid p1D10 and in genomic DNA, was used as a target to compare the amplification specificity of the standard Taq and AmpliTaq Gold polymerases under the same reaction conditions (Fig. 2). The AmpliTaq Gold polymerase generated the expected 800 bp fragment from both DNA templates. In contrast, the standard Taq polymerase amplified all three SalI fragments when p1D10 was used as the target. This observation suggested that amplification between two linkers occurred with the standard Taq polymerase. When M. simiae genomic DNA was used as a template, a smear was observed that obscured any distinct band amplified during the PCR. The smear possibly represents single-stranded DNA produced by non-specific amplification, or amplification between linkers on the multitude of digested DNA fragments generated during template preparation. In conclusion, under the conditions used, the AmpliTaq Gold polymerase prevented the generation of these spurious amplification products.

Figure 2.

LMPCR amplification with different DNA polymerases. Lanes 1 and 8: 1 kb ladder. Lanes 2–4: Amplicons obtained using AmpliTaq Gold polymerase. Lanes 5–7: Amplicons obtained with standard AmpliTaq polymerase. Lanes 2 and 5: p1D10 plasmid. Lanes 3 and 6: M. simiae DNA. Lanes 4 and 7: M. simiae DNA without linker.

The validity of the LMPCR procedure for the characterization of genomic sequences flanking transposons was first determined on M. smegmatis mutants obtained by Tn611 transposition. Fig. 3C shows a schematic illustration of the cointegrate obtained during Tn611 transposition. The length of the cointegrate (20.7 kb) and the presence of three insertion sequences make identification of the insertion site tedious by other techniques such as inverse PCR or cloning of the junctions. Two oligonucleotides, oligo F and oligo G, were designed to amplify both sides of the insertion site. Three targets for each of the IS6100-derived oligo F and oligo G are present on the cointegrate. During LMPCR, one and two amplicons are generated with oligo F and oligo G, respectively (Fig. 3A,B). Other internal targets are not amplified because the distance from the primer (oligo F or oligo G) to the nearest SalI site tagged with the linker is too great. Using primers oligo F and Salgd, a one-sided flanking region was amplified for seven out of eight samples, ranging in size from 100 bp to 2.8 kb (Fig. 3A). For one sample (lane 3) no amplification was observed; this would be expected if the distance between the integration site and the first upstream SalI restriction site was too large for efficient amplification. As expected, when oligo G and Salgd were used as primers, two amplicons were generated. One constant amplicon (ca. 1159 bp) was observed for all eight samples (Fig. 3B), whereas additional bands of different lengths were generated for seven out of eight DNA templates. The constant DNA fragment represents a primer target inside the cointegrate. The two one-sided insertion sites not amplified (Fig. 3A, lane 2, B, lane 8) using the SalI linker were obtained in another LMPCR run using a BamHI-adapted linker (data not shown).
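The dependence of amplicon size on the position of the nearest SalI site can be sketched for a known sequence. The example below is a simplified illustration under stated assumptions: the primer is taken to match the top strand exactly and to extend rightward, only the first downstream SalI site is considered, and the toy template and primer are invented for demonstration. The expected product covers the region from the 5′ end of the primer through the GTCGA retained at the cut end, plus the sequence copied from Salgd.

```python
SALGD = "TAGCTTATTCCTCAAGGCACGAGC"   # long linker strand (Table 2)
SALI_SITE = "GTCGAC"                 # cut as G^TCGAC


def predicted_amplicon_length(template, primer, linker=SALGD, site=SALI_SITE):
    """Expected LMPCR product length in bp, or None if the primer or a
    downstream restriction site is not found on the given strand."""
    start = template.find(primer)
    if start == -1:
        return None
    site_pos = template.find(site, start + len(primer))
    if site_pos == -1:
        return None
    # Genomic part: from the primer 5' end through 'GTCGA' of the cut site.
    genomic_part = site_pos + len(site) - 1 - start
    # The linker strand ligated to the cut end is copied in the first cycle.
    return genomic_part + len(linker)


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy_primer = "ACGTACGTAC"
    toy_template = "TT" + toy_primer + "A" * 50 + SALI_SITE + "GG"
    print(predicted_amplicon_length(toy_template, toy_primer))
```

For the toy template this gives 65 bp of template sequence plus the 24-base linker, 89 bp in total. When no SalI site lies within an amplifiable distance, no product would be expected, as observed for some of the junctions described above.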

Figure 3.

LMPCR amplification of eight M. smegmatis::Tn610 mutants. A: Lane 1: PstI-digested lambda DNA molecular marker. Lanes 2–9: Amplicons obtained by LMPCR with primers oligo F and Salgd. B: Lane 1: PstI-digested lambda DNA molecular marker. Lanes 2–9: Amplicons obtained by LMPCR with primers oligo G and Salgd of the same eight strains as for A. C: Schematic representation of cointegrate obtained by Tn611 transposition in M. smegmatis. Primers (oligo F and oligo G) used in LMPCR are shown as arrows. Amplicons obtained by LMPCR are represented by the heavy solid lines.

To demonstrate the specificity of LMPCR, one sample, A5 (Fig. 3A, lane 5), was inserted into the pGEM vector and sequenced. Sequence analysis showed the oligo F sequence and the end of the insertion sequence at one side of the insert, and the Salgd linker sequence at the opposite side (data not shown).

The validity of the previous results was also extended to M. tuberculosis and to the characterization of the genomic DNA sequences flanking the Tn5367 elements in insertion mutants [10]. During Tn5367 transposition, one copy of the mobile element is inserted in the chromosome and the delivery vector is lost. Our objective was to target the two junctions simultaneously. Therefore, to reduce the number of PCRs to be performed, we designed a primer corresponding to the inverted repeats (IR) of Tn5367. This oligonucleotide differs from the right IR at 2 bp, but this does not prevent its hybridization. Thirty insertion mutants of M. tuberculosis were analyzed using the LMPCR method (Fig. 4). In 12 cases, two amplification products were obtained. The smaller one tended to be produced in higher yield than the larger, suggesting that the competition between the two fragments is unfavorable to the larger product. The two PCR products obtained with four of these clones were gel purified and sequenced using Salgd as a primer. Sequence analysis revealed that the two fragments corresponded to the two junctions of the transposon with the chromosome. In 12 other cases, only one junction was amplified, perhaps because the other fragment was too long. For just five clones, no amplification was obtained. In these cases, the use of another linker (adapted to the BamHI restriction site) allowed the isolation of at least one of the two junctions. Overall, junctional sequences could successfully be analyzed by LMPCR for 29 out of 30 clones.

Figure 4.

LMPCR amplification of 15 M. tuberculosis::Tn5367 mutants. Lane 1: PstI-digested lambda DNA. Lanes 2–16: Amplicons obtained from 15 different M. tuberculosis insertional mutants by LMPCR with primers Salgd and ISada.

In conclusion, the LMPCR approach described above provides a rapid method for obtaining sequence information from the insertion site in insertional mutant libraries. When the genome sequence is available (as will soon be the case for M. tuberculosis H37Rv), this information is sufficient to locate the insertion site precisely on the chromosome. Otherwise, the sequence obtained by LMPCR allows the design of amplification primers for further chromosome walking. Alternatively, hybridization probes can be isolated from the amplification products after the linker has been removed by digestion.

The specificity of the LMPCR is determined by the characteristics of the synthetic linker, the choice of primers, and the use of a DNA polymerase preparation that reduces the amount of non-specific amplicons generated. Under these conditions, nested PCR to enhance the specificity of the amplification was superfluous. This approach greatly simplifies the analysis of insertional mutants and avoids the tedious cloning steps inherent in standard techniques.

Acknowledgements

This work was supported by a European Economic Community (EEC) Biotech program grant (BIO-CT92-0520), NIH Grant AI 35207 and the Institut Pasteur. G.P. received financial support from the Centre Hospitalier Universitaire Vaudois, Ciba-Geigy-Jubiläums-Stiftung, and Merck-Sharp and Dohme-Gibret SA.
