Keywords:

  • 16S large ribosomal subunit;
  • amplicon sequencing;
  • cytochrome c oxidase subunit I;
  • DNA barcoding;
  • high-throughput sequencing;
  • insect;
  • MoTASP;
  • multiplex identifier

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Materials and methods
  5. Results
  6. Discussion
  7. Acknowledgements
  8. References
  9. Data Accessibility
  10. Supporting Information

High-throughput sequencing (HTS) of PCR amplicons is becoming the method of choice to sequence one or several targeted loci for phylogenetic and DNA barcoding studies. Although the development of HTS has allowed rapid generation of massive amounts of DNA sequence data, preparing amplicons for HTS remains a rate-limiting step. For example, HTS platforms require platform-specific adapter sequences to be present at the 5′ and 3′ ends of the DNA fragment to be sequenced. In addition, short multiplex identifier (MID) tags are typically added to allow multiple samples to be pooled in a single HTS run. Existing methods to incorporate HTS adapters and MID tags into PCR amplicons are either inefficient, requiring multiple enzymatic reactions and clean-up steps, or costly when applied to multiple samples or loci (fusion primers). We describe a method to amplify a target locus and add HTS adapters and MID tags via a linker sequence using a single PCR. We demonstrate our approach by generating reference sequence data for two mitochondrial loci (COI and 16S) for a diverse suite of insect taxa. Our approach provides a flexible, cost-effective and efficient method to prepare amplicons for HTS.


Introduction

High-throughput sequencing (HTS) of PCR amplicons (amplicon sequencing) is a common method for sequencing one or several targeted loci in large numbers of samples simultaneously. This approach is regularly employed in targeted resequencing and DNA barcoding projects (e.g. Sønstebø et al. 2010; Hajibabaei et al. 2011; O'Neill et al. 2013) to survey biodiversity at multiple levels. Currently available HTS platforms (Illumina, Roche 454, Ion Torrent and SOLiD) require adapter sequences to be present at the 5′ and 3′ ends of the DNA fragment to be sequenced. In addition, short [4–10 base pair (bp)] sample-specific multiplex identifier (MID) tags are typically added to allow multiple samples to be pooled in a single HTS run (Binladen et al. 2007; Meyer & Kircher 2010).

Long primer constructs containing HTS adapters, MID tags and locus-specific primers (fusion primers) can be used to generate HTS-ready amplicons with a single PCR and are commonly employed for this reason (e.g. Sønstebø et al. 2010). However, the cost of fusion primers can be prohibitive, particularly when targeting multiple genomic loci, because a separate primer pair is required for each combination of MID and locus-specific primer (e.g. targeting 10 loci in 20 samples requires 200 primer pairs). An alternative is to ligate HTS adapters and MID tags to amplicons (Meyer & Kircher 2010; O'Neill et al. 2013); however, this requires multiple enzymatic and clean-up steps, with concomitant labour costs and risks.

Bybee et al. (2011) and de Cárcer et al. (2011) demonstrated the potential of using a linker sequence at the 5′ end of the locus-specific primer to attach HTS adapters and MID tags in a modular fashion using a second round of PCR. We extend this approach by amplifying the target locus and attaching adapters and MID tags via a linker sequence within a single PCR. To demonstrate the efficiency of the method, we use it to generate COI and 16S rDNA reference sequences for a diverse set of insect taxa.
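
The difference in oligonucleotide requirements between fusion primers and a linker-based design can be illustrated with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the MID tag is carried by one of the two tag primers, so that a single untagged tag primer is shared by all samples.

```python
def fusion_primer_pairs(n_loci: int, n_samples: int) -> int:
    """Fusion primers: one primer pair per (locus, MID) combination."""
    return n_loci * n_samples


def modular_oligos(n_loci: int, n_samples: int) -> dict:
    """Linker-based design (assumption: MID on one tag primer only)."""
    return {
        "locus_specific_pairs": n_loci,  # linker + locus-specific primer
        "mid_tag_primers": n_samples,    # adapter + MID + linker
        "shared_tag_primers": 1,         # adapter + linker, no MID
    }


# The example from the text: 10 loci in 20 samples.
print(fusion_primer_pairs(10, 20))
print(modular_oligos(10, 20))
```

Under these assumptions the number of fusion primer pairs grows with the product of loci and samples, whereas the modular design grows with their sum.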

Materials and methods

Insects were collected in the Adelaide Hills (South Australia) using a Malaise trap on 2–3 November 2012 and stored in 100% ethanol. Drymaplaneta communis (Blattodea) was hand-collected in Adelaide on 1 December 2012 and stored in 100% ethanol. Specimens were identified to family level using existing morphological keys. DNA was extracted from 12 specimens using a modified version of the Canadian Centre for DNA Barcoding plate extraction method (Ivanova et al. 2006, 2007). A DNA extract from Ischnura heterosticta (Odonata) used in a previous study (D. Green, unpublished data) was also included, giving 13 extracts in total.

We adapted the multiplex-ready PCR method (Hayden et al. 2008a) to amplify a target locus and attach adapters and MID tags in a single PCR prior to HTS. Multiplex-ready PCR uses the principle of M13-tailed primers to add a fluorophore of choice for microsatellite and SNP genotyping in a two-stage PCR. Forward and reverse locus-specific primers are modified to include generic, noncomplementary nucleotide sequences at their 5′ ends that act as primer-binding sites in the second stage of PCR. In the first stage, the locus-specific primers amplify the target loci. In the second stage, universal primers (tagF and tagR) tagged with a fluorophore amplify the first-stage products to a detectable level. The tag primers have a lower annealing temperature than the locus-specific primers, which restricts their involvement to the second stage. To add MID tags and HTS adapters to PCR amplicons (Fig. 1), the fluorophore at the 5′ end of the tagF primer was replaced by the Ion Torrent Primer P1-key, and the tagR primer was modified to include the Primer A-key at the 5′ end, followed by a 7-bp MID sequence (Meyer & Kircher 2010).

Figure 1. Schematic representation of modular tagging of amplicons using a single PCR (MoTASP) for high-throughput sequencing (HTS). In the first cycles of PCR, linker sequences (Link) are attached to amplicons via the locus-specific primers (LSP, A and B). In later cycles, HTS adaptors (Ad) and MID tags are attached to amplicons via the linker sequences (C and D).

Forward and reverse linker sequences (corresponding to the tagF and tagR primer sequences) were added to the 5′ end of locus-specific primers targeting the mitochondrial COI gene and 16S rDNA (Table 1), and loci were PCR amplified using a standardized thermal cycling protocol (Hayden et al. 2008b; Appendix S1, Supporting information). PCR was performed in a 12-μL reaction mixture containing 10 ng genomic DNA, 60 nM each of the forward and reverse locus-specific primers and 75 nM each of the tagF and tagR primer constructs, with all other reaction conditions as described in Hayden et al. (2008b; Appendix S1, Supporting information). Each insect extract was amplified with a tagR primer construct containing a unique MID tag (Table 1), with the same set of 13 MID tags used for the 16S and COI loci.

Table 1. PCR primers used in this study. Amplicon lengths are given excluding primer sequences. Linker sequences, italicized in the original, are ACGACGTTGTAAAA (tagF) and CATTAAGTTCCCATTA (tagR). The position of the multiplex identifier (MID) tag is shown as [NNNNNNN].
Name | Locus | Primer sequence (5′–3′) | Length (bp) | Reference
tagF_C1-J-1709 | COI | ACGACGTTGTAAAAAATTGGWGGWTTYGGAAAYTG | 133 | Simon et al. (2006)
tagR_C1-N-1843d | COI | CATTAAGTTCCCATTAGMWARWGGWGGRTAWACWGTTCA | | Zhang & Hewitt (1997)
tagF_Ins16S-1F | 16S | ACGACGTTGTAAAATRRGACGAGAAGACCCTATA | 156 | L. J. Clarke and A. Cooper, unpublished
tagR_Ins16S-1Rshort | 16S | CATTAAGTTCCCATTAACGCTGTTATCCCTAARGTA | | L. J. Clarke and A. Cooper, unpublished
ionA_MID_tagR | | CCATCTCATCCCTGCGTGTCTCCGACTCAG[NNNNNNN]CATTAAGTTCCCATTA | |
ionP1_tagF | | CCTCTCTATGGGCAGTCGGTGATAGAGACACGACGTTGTAAAA | |
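
The modular structure of the primer constructs in Table 1 can be made explicit by assembling them from their parts. The following is a minimal sketch: the sequences are copied from Table 1, but the function names and the example MID (ACGTACG) are hypothetical and are not taken from the study.

```python
# Sequences copied from Table 1.
TAG_F = "ACGACGTTGTAAAA"      # forward linker (tagF)
TAG_R = "CATTAAGTTCCCATTA"    # reverse linker (tagR)
# Portion of ionA_MID_tagR that precedes the MID tag.
ION_A_KEY = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"
ION_P1_TAGF = "CCTCTCTATGGGCAGTCGGTGATAGAGACACGACGTTGTAAAA"

MID_LENGTH = 7  # 7-bp MID tags were used in this study


def linker_primer(linker: str, locus_specific: str) -> str:
    """Attach a linker to the 5' end of a locus-specific primer."""
    return linker + locus_specific


def mid_tag_primer(mid: str) -> str:
    """Build an adapter + MID + tagR linker construct (hypothetical helper)."""
    if len(mid) != MID_LENGTH or set(mid) - set("ACGT"):
        raise ValueError("MID must be 7 unambiguous bases")
    return ION_A_KEY + mid + TAG_R


# Locus-specific part of tagF_C1-J-1709, i.e. the Table 1 entry minus the linker.
c1_j_1709 = "AATTGGWGGWTTYGGAAAYTG"
print(linker_primer(TAG_F, c1_j_1709))
print(mid_tag_primer("ACGTACG"))  # example MID only, not from the study
```

Because the linker is the only part shared between the two kinds of construct, a new locus requires only a new linker-tailed primer pair, and a new sample only a new MID tag primer.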

PCR products were purified by polyethylene glycol (PEG)/NaCl precipitation with a final concentration of 13% (w/v) PEG, using Sera-Mag carboxylate-modified magnetic speed-beads (Thermo Scientific, Waltham, MA, USA) as the solid phase (DeAngelis et al. 1995; Lundin et al. 2010). Purified products for each locus were quantified using the Qubit® dsDNA HS assay on a Qubit® 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and combined in equimolar ratios to create two libraries (COI and 16S). The size distribution and concentration of each library were assessed on an Agilent 2200 TapeStation using High Sensitivity D1K ScreenTape and reagents (Agilent Technologies, Santa Clara, CA, USA). Where necessary, a second PEG precipitation (9% PEG) was performed to remove primer artefacts. Emulsion PCR and Ion Sphere™ Particle enrichment were conducted on an Ion OneTouch™ System (Life Technologies) using the Ion OneTouch™ 200 Template Kit v2 DL according to the manufacturer's protocol. Each library was sequenced on an Ion Torrent Personal Genome Machine™ (PGM) Sequencer using the Ion PGM™ 200 Sequencing Kit and Ion 314™ semiconductor sequencing chips (Life Technologies).
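
Equimolar pooling from fluorometric concentrations can be sketched as follows. This is an illustration rather than the procedure used here: it relies on the common approximation of 660 g/mol per base pair of double-stranded DNA, and the sample names, concentrations, amplicon length and target amount are placeholder values.

```python
def molarity_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration to nM (approx. 660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def pooling_volumes(concs_ng_per_ul: dict, length_bp: int, target_fmol: float) -> dict:
    """Volume (uL) of each product that contains target_fmol of amplicon.

    1 nM equals 1 fmol/uL, so volume = target_fmol / molarity.
    """
    return {
        sample: target_fmol / molarity_nM(conc, length_bp)
        for sample, conc in concs_ng_per_ul.items()
    }


# Placeholder values for illustration only.
qubit_readings = {"sample_01": 2.0, "sample_02": 4.0}  # ng/uL
for sample, volume in pooling_volumes(qubit_readings, 300, 20.0).items():
    print(f"{sample}: {volume:.2f} uL")
```

The amplicon length passed to the conversion should include the adapters, MID tag, linkers and primers, because these contribute to the mass measured by the assay.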

We developed a customized pipeline to process HTS reads. The script fastx_barcode_splitter.pl from the FastX toolkit (version 0.0.13; http://hannonlab.cshl.edu/fastx_toolkit/) was used to sort reads by MID, matching tags at the beginning of each read and allowing zero mismatches (--bol --mismatches 0). The quality of reads assigned to each MID was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Cutadapt v1.2.1 (Martin 2011) was used to trim linker sequences and locus-specific primers with a maximum error rate of 0.33 (-e 0.3333) and up to five trimming passes per read (-n 5), to trim low-quality read ends (-q 20), and to discard reads shorter than 25 bp (-m 25) or longer than 220 bp (-M 220). For each locus, trimmed reads were mapped to a reference sequence from the same order or family using Geneious 6.0.3 (Drummond et al. 2011), and a consensus sequence was generated by visual inspection of the aligned reads with a minimum of 50-fold coverage. BLAST searches (blastn algorithm, no filter for low complexity or human repeats, search restricted to insect sequences) were used to identify the most similar sequences in the NCBI nucleotide database (Altschul et al. 1990).
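
The logic of the first and last filtering steps can be sketched in a few lines. The code below is not the pipeline used in this study and does not call the FastX toolkit or Cutadapt; it only illustrates exact-match sorting of reads by a tag at the beginning of the read, followed by a length filter with the same bounds (25–220 bp). The function names, MID sequences and reads are hypothetical, and removing the tag after matching is a choice made for this sketch.

```python
def demultiplex(reads, mids):
    """Assign each read to the MID it starts with (zero mismatches)."""
    bins = {name: [] for name in mids}
    unmatched = []
    for read in reads:
        for name, tag in mids.items():
            if read.startswith(tag):
                # Strip the tag so downstream steps see only the insert.
                bins[name].append(read[len(tag):])
                break
        else:
            # No tag matched exactly at the start of the read.
            unmatched.append(read)
    return bins, unmatched


def length_filter(reads, min_len=25, max_len=220):
    """Keep reads whose length lies within [min_len, max_len]."""
    return [r for r in reads if min_len <= len(r) <= max_len]


# Hypothetical 7-bp MIDs and toy reads.
mids = {"MID01": "ACGAGTG", "MID02": "TCTCTAT"}
reads = ["ACGAGTG" + "A" * 30, "TCTCTAT" + "C" * 10, "GGGGGGG" + "T" * 40]
bins, unmatched = demultiplex(reads, mids)
kept = {name: length_filter(seqs) for name, seqs in bins.items()}
print({name: len(seqs) for name, seqs in kept.items()}, len(unmatched))
```

Requiring an exact match means that a read with a single sequencing error in its tag is left unassigned rather than risked in the wrong bin, which is consistent with the unmatched reads reported for each library.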

Results

A total of 511 502 raw reads were obtained for 16S and 329 814 for COI. The number of reads per MID ranged from 9464 to 52 747 for 16S (mean ± SD = 37 567 ± 14 710; 23 129 reads did not match any of the 13 MID tags) and from 7232 to 59 673 for COI (24 761 ± 12 879; 7918 unmatched; Table S2, Supporting information). Average Q-values for reads assigned to each MID ranged from 24.2 to 26.8 for 16S (mean ± SD = 25.6 ± 0.8) and from 26.3 to 29.9 for COI (27.9 ± 1.0; Table S2, Supporting information). After primer trimming, consensus building and BLAST search, the hit with the smallest E-value for each 16S consensus sequence corresponded to the correct insect order in all cases, and to the correct family for 12 of 13 taxa (Table S1, Supporting information). The best BLAST hit for the 13 COI consensus sequences corresponded to the correct order for 10 taxa, with the two hymenopteran sequences retrieving BLAST hits for Lepidoptera. Sanger sequencing using C1-J-1709 and C1-N-1843d (Table 1) or the standard barcoding primers (LCO1490 and HCO2198; Folmer et al. 1994) was used to verify that the three COI consensus sequences retrieving BLAST hits to the incorrect order were accurate (100% pairwise identity between consensus and Sanger sequence in each case).

Discussion

Our modular approach to add HTS adapters and MID tags to amplicons in a single PCR provides a simpler and more efficient method than previous protocols (Bybee et al. 2011; de Cárcer et al. 2011). Incorporating adapters and MID tags in a single, closed-tube reaction reduces the number of clean-up steps required and the potential for contamination. Changing the linker sequence (tagF or tagR) attached to the locus-specific primer also provides a simple means to change the read direction if desired. The use of a standardized thermal cycling protocol simplifies PCR optimization, such that only locus-specific primer concentration requires optimizing.

We found that adapting the multiplex-ready PCR protocol to add HTS adapters and MID tags to amplicons required higher concentrations of locus-specific primers than the original application of amplifying microsatellite or SNP loci (Hayden et al. 2008a). Hayden et al. (2008a,b) recommend testing concentrations of locus-specific primers between 20 and 80 nM to optimize amplification. We have previously found that low concentrations (20–40 nM) were most suitable for amplifying microsatellite loci using standard tagF and tagR primers (e.g. Clarke et al. 2011). When HTS adapters and MID tags were added to the tagF and tagR primers, low concentrations of locus-specific primers led to large primer artefacts (ca. 100 bp) that needed to be removed prior to sequencing. Higher concentrations of locus-specific primer (60–80 nM) led to significant reductions in primer artefacts, presumably because more PCR product was present after the first stage of amplification, reducing the extent of primer–primer interactions in the second stage.

These experiments were intended as a proof of concept study; however, by utilizing bioinformatics and HTS capabilities, the number of samples and loci sequenced in a single run could be greatly increased. Although we sequenced the 16S and COI loci on separate Ion 314 chips, multiple loci could be pooled on the same chip using the same MID tags and separated post hoc by locus-specific primer sequence. Based on the minimum number of reads for a MID in this study (ca. 7200), a sequencing depth of 100-fold for each MID could be obtained by pooling 900 samples, or a combination of samples and loci, for example, 10 loci for 90 samples, on a single 314 chip. An order of magnitude more samples could be pooled using the Ion 316 sequencing chip, with a further 10-fold increase available using the Illumina MiSeq (v2 reagents). Although scaling up our approach to such an extent would require a very large number of unique MIDs, software capable of designing several thousands of unique MIDs is now available (Faircloth & Glenn 2012; Costea et al. 2013). Furthermore, the number of primer constructs containing a unique MID could be reduced for large numbers of samples by incorporating MID tags adjacent to the reverse HTS adapter (in this case, the Ion Torrent Primer P1-key). As long as full-length reads are obtained, reads can be separated by the combination of MID tags at the beginning and end of each sequence, greatly increasing the number of unique MID combinations possible. Alternatively, MID tags could be added to the linker-LSP construct and reads sorted based on the combination of two MID tags in a similar fashion.
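
The pooling estimate and the gain from combining two MID tags can be worked through numerically. The sketch below assumes that the figure of ca. 900 samples follows from scaling the minimum per-MID read count across the 13 MIDs used here; this derivation is our reading of the estimate rather than a stated calculation, and the function names are hypothetical.

```python
def max_pooled_units(min_reads_per_mid: int, n_mids: int, target_depth: int) -> int:
    """Number of sample-locus units reaching target_depth on one chip.

    Assumes every MID yields at least min_reads_per_mid reads in a run
    of n_mids samples, and that reads are then spread evenly.
    """
    return (min_reads_per_mid * n_mids) // target_depth


def dual_mid_combinations(n_forward: int, n_reverse: int) -> int:
    """Unique sample labels available when a MID sits at both read ends."""
    return n_forward * n_reverse


units = max_pooled_units(7200, 13, 100)
print(units)                          # units at 100-fold depth
print(units // 10)                    # samples if 10 loci are pooled per sample
print(dual_mid_combinations(30, 30))  # labels from 30 + 30 MID primers
```

With MID tags at both ends, the number of primer constructs grows with the sum of the forward and reverse tags while the number of distinguishable samples grows with their product.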

The modular tagging of amplicons using a single PCR (MoTASP) method can potentially be applied to many loci. We have applied the MoTASP method to several other loci to date, including plant trnL, vertebrate 12S rDNA and alternative sites within the COI and 16S loci (L. J. Clarke & P. Czechowski, unpublished data). We have not yet applied this approach to single-copy nuclear genes, but we expect the method to work because multiplex-ready PCR was originally designed to amplify and fluorescently tag nuclear microsatellites (Hayden et al. 2008a). Advances in HTS leading to increased read lengths (for example, 400-bp kits are now available for the Ion Torrent PGM) will increase the number of loci to which this method can be applied and the amount of data that can be generated, and in turn the taxonomic resolution, for any given locus. In our experience, the most critical factor limiting the application of MoTASP is the annealing temperature of the locus-specific primers. Low annealing temperatures (<50 °C) for the locus-specific primers prevent amplification of the target locus prior to amplification with the linker primers. We have observed improved amplification of some loci by reducing the annealing temperature in the first phase of the thermal cycling protocol (see Appendix S1, Supporting information); however, some loci still failed to amplify. Locus-specific primers with low annealing temperatures could be redesigned (e.g. increased length or GC content) to facilitate amplification with this protocol.

In this study, we have demonstrated a novel method to attach HTS adapters and MID tags to amplicons using a single PCR and used the same set of MID tags to generate reference sequences for two loci commonly used in systematic, barcoding and phylogenetic studies. Modular attachment of HTS adapters and MID tags provides several advantages over the use of fusion primers. A modular approach permits straightforward transfer between HTS platforms by changing the HTS adapter and MID tag primer combined with the locus-specific primer. Furthermore, HTS adapter and MID tag primers can be applied to any locus, allowing transfer between experiments, projects or laboratory groups, representing a substantial cost reduction compared with ordering large numbers of unique fusion primers. MoTASP requires the same number of PCR and clean-up steps as a standard fusion primer approach; hence, laboratory costs are comparable between the two methods. In conclusion, our approach provides a flexible, cost-effective and efficient method to prepare amplicons for high-throughput sequencing.

Acknowledgements

Thanks to Renate Faast and Oliver Wooley for supplying insect specimens, Douglas Green for supplying DNA extracts and John Jennings, Gary Taylor and Remko Leijs for morphological identification. Bastien Llamas helped design the bioinformatic pipeline. We thank the ARC for funding.

References

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410.
  • Binladen J, Gilbert MTP, Bollback JP et al. (2007) The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One, 2, e197.
  • Bybee SM, Bracken-Grissom H, Haynes BD et al. (2011) Targeted amplicon sequencing (TAS): a scalable next-gen approach to multilocus, multitaxa phylogenetics. Genome Biology and Evolution, 3, 1312–1323.
  • de Cárcer DA, Denman SE, McSweeney C, Morrison M (2011) Strategy for modular tagged high-throughput amplicon sequencing. Applied and Environmental Microbiology, 77, 6310–6312.
  • Clarke LJ, Mackay DA, Whalen MA (2011) Isolation of microsatellites from Baumea juncea (Cyperaceae). Conservation Genetics Resources, 3, 113–115.
  • Costea PI, Lundeberg J, Akan P (2013) TagGD: fast and accurate software for DNA tag generation and demultiplexing. PLoS One, 8, e57521.
  • DeAngelis MM, Wang DG, Hawkins TL (1995) Solid-phase reversible immobilization for the isolation of PCR products. Nucleic Acids Research, 23, 4742–4743.
  • Drummond AJ, Ashton B, Buxton S et al. (2011) Geneious v5.6.3. Available from http://www.geneious.com.
  • Faircloth BC, Glenn TC (2012) Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One, 7, e42543.
  • Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3, 294–299.
  • Hajibabaei M, Shokralla S, Zhou X, Singer GAC, Baird DJ (2011) Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos. PLoS One, 6, e17497.
  • Hayden MJ, Nguyen TM, Waterman A, Chalmers KJ (2008a) Multiplex-Ready PCR: a new method for multiplexed SSR and SNP genotyping. BMC Genomics, 9, 80.
  • Hayden MJ, Nguyen TM, Waterman A, McMichael GL, Chalmers KJ (2008b) Application of multiplex-ready PCR for fluorescence-based SSR genotyping in barley and wheat. Molecular Breeding, 21, 271–281.
  • Ivanova NV, deWaard JR, Hebert PDN (2006) An inexpensive, automation-friendly protocol for recovering high-quality DNA. Molecular Ecology Notes, 6, 998–1002.
  • Ivanova NV, deWaard JR, Hebert PDN (2007) CCDB protocols, glass fiber plate DNA extraction. Available from http://ccdb.ca/docs/CCDB_DNA_Extraction.pdf.
  • Lundin S, Stranneheim H, Pettersson E, Klevebring D, Lundeberg J (2010) Increased throughput by parallelization of library preparation for massive sequencing. PLoS One, 5, e10029.
  • Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, 10–12.
  • Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harbor Protocols, doi:10.1101/pdb.prot5448.
  • O'Neill EM, Schwartz R, Bullock CT et al. (2013) Parallel tagged amplicon sequencing reveals major lineages and phylogenetic structure in the North American tiger salamander (Ambystoma tigrinum) species complex. Molecular Ecology, 22, 111–129.
  • Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT (2006) Incorporating molecular evolution into phylogenetic analysis, and a new compilation of conserved polymerase chain reaction primers for animal mitochondrial DNA. Annual Review of Ecology, Evolution and Systematics, 37, 545–579.
  • Sønstebø JH, Gielly L, Brysting AK et al. (2010) Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Molecular Ecology Resources, 10, 1009–1018.
  • Zhang D-X, Hewitt GM (1997) Assessment of the universality and utility of a set of conserved mitochondrial COI primers in insects. Insect Molecular Biology, 6, 143–150.

L.C. led the writing, L.C. and P.C. conducted laboratory work and analysed the results, J.S. designed the bioinformatic analysis and pipeline, M.S. and A.C. helped conceive the experiments and all authors contributed to editing the manuscript.

Data Accessibility

Contigs used to generate consensus sequences from HTS data (BAM files), consensus sequences (FASTA file) and the bioinformatic pipeline to process HTS reads using the FastX toolkit and Cutadapt are available on DataDryad (doi:10.5061/dryad.0f9n0).

Supporting Information

Filename | Format | Size | Description
men12162-sup-0001-TableS1-S2.pdf | application/PDF | 157K | Table S1 Morphological and DNA sequence identification for each specimen in this study. Table S2 Number of HTS reads assigned and mean Q-values for each MID.
men12162-sup-0002-AppendixS1.pdf | application/PDF | 69K | Appendix S1 MoTASP PCR protocol.
