The total community genomic DNA (gDNA) from permafrost was extracted using four commercial DNA extraction kits. The gDNAs were compared using quantitative real-time PCR (qPCR) targeting 16S rRNA genes and bacterial diversity analyses obtained via 454 pyrosequencing of the 16S rRNA (V3 region) amplified in single or nested PCR. The FastDNA® SPIN (FDS) Kit provided the highest gDNA yields and 16S rRNA gene concentrations, followed by MoBio PowerSoil® (PS) and MoBio PowerLyzer™ (PL) kits. The lowest gDNA yields and 16S rRNA gene concentrations were from the Meta-G-Nome™ (MGN) DNA Isolation Kit. Bacterial phyla identified in all DNA extracts were similar to those found in other soils and were dominated by Actinobacteria, Firmicutes, Gemmatimonadetes, Proteobacteria, and Acidobacteria. Weighted UniFrac and statistical analyses indicated that bacterial community compositions derived from FDS, PS, and PL extracts were similar to each other. However, the bacterial community structure from the MGN extracts differed from that of the other kits, exhibiting higher proportions of easily lysed β- and γ-Proteobacteria and lower proportions of Actinobacteria and Methylocystaceae important in carbon cycling. These results indicate that gDNA yields differ between the extraction kits, but reproducible bacterial community structure analysis may be accomplished using gDNAs from the three bead-beating lysis extraction kits.
Permafrost or permanently frozen ground underlies approximately 2.3 × 10⁷ km² or 23.9% of the exposed land area in the Northern Hemisphere (Zhang et al., 2008). Despite the harsh freezing conditions within permafrost, this environment is inhabited by a diverse microbial community estimated at 10⁶–10⁸ cells g⁻¹ in permafrost (Vishnivetskaya et al., 2000; Steven et al., 2007) versus 10⁸–10⁹ cells g⁻¹ in the upper active layer (Kobabe et al., 2004). Traditionally, active layer and permafrost microbial communities have been studied using cultivation-dependent techniques (Vorobyova et al., 1997; Vishnivetskaya et al., 2000; Ozerskaya et al., 2009; Steven et al., 2009). With the general acceptance that the bulk of microorganisms in an environment are unculturable (Alexander, 1977), DNA-based assays have become routinely used in microbial ecology and environmental studies. Within the last few years, the application of molecular biology and genomics techniques has also been extended to permafrost ecosystems (Vishnivetskaya et al., 2006; Yergeau et al., 2010; Mackelprang et al., 2011). Attempts in applying metagenomic approaches to soils and sediments have been hampered by technical challenges as highlighted recently by Hazen et al. (2013). These include the extraction of representative, unbiased genetic material from organisms with varying cell wall compositions and differentially accessible DNA (Yergeau et al., 2010; Mackelprang et al., 2011), as well as the inherent DNA heterogeneity and uneven spatial distribution of microorganisms within soils (Torsvik et al., 1990; Curtis et al., 2002).
The total microbial biomass concentration in the Arctic active layer and permafrost is comparable to that of temperate soils (Vishnivetskaya et al., 2006; Hansen et al., 2007; Steven et al., 2007; Wagner et al., 2009), albeit previous studies have highlighted the low extraction yield of the total gDNA (Hansen et al., 2007; Yergeau et al., 2010). The quality and quantity of the nucleic acids are extremely important for in-depth analyses of microbial community structure (Hazen et al., 2013). Therefore, the use of an optimal DNA extraction protocol is crucial. Two main approaches exist for the isolation of microbial DNA from soil: (1) the separation of soil microorganisms from soil particles prior to cell lysis and DNA recovery (Bakken, 1985), and (2) the method of direct lysis of microorganisms in the soil samples, followed by DNA extraction (Ogram et al., 1987; de Lipthay et al., 2004). Numerous studies have described different approaches to cell lysis including freezing and thawing, sonicating, boiling, grinding in liquid N2, bead-beating, and/or cell wall fracturing via osmotic stress, sodium dodecyl sulfate treatment, or lysozyme addition to improve DNA extraction from a wide variety of environmental sources (e.g. soils, sediments, compost and fecal matter; Ogram et al., 1987; Zhou et al., 1996; de Lipthay et al., 2004; Carrigg et al., 2007; Yang et al., 2007; Ariefdjohan et al., 2010). Each of these techniques has both positive and negative aspects with regard to cell–particle separation, lysis, and DNA recovery parameters.
The objective of this study was to evaluate the effectiveness of four commercial DNA extraction kits and two polymerase chain reaction (PCR) amplification procedures on the resultant composition of bacterial community as inferred through 16S rRNA amplicon sequences. Each DNA extraction method was evaluated with respect to (1) cell lysis efficiency; (2) gDNA yield; (3) PCR amplification of isolated gDNA; and (4) reproducibility of community profiles generated by 454 pyrosequencing of 16S rRNA genes. The 454 pyrosequences for 24 samples were comparatively analyzed using taxonomy-, phylogeny-, and OTU-based approaches. The experimental findings from this work will enable other researchers to choose the best DNA extraction technology (and/or PCR amplification procedure) to accommodate their specific needs when analyzing microbial community diversity of permafrost soil and other soils from various regions of the world.
Materials and methods
Location of study site and sampling
The study site was located on an ice-wedge polygonal terrain near the McGill Arctic Research Station, Axel Heiberg Island (AHI), Nunavut, Canada (79°24′54.4″N, 90°44′35.6″W). One-meter-long cores were collected in May of 2011, when the ground was frozen, using sterile core liners and a rotary drill without any drilling fluids to avoid bacterial or chemical contamination. Drill tailing material, a homogenized mixture of permafrost and active layer soils (hereafter called AHI-PAL), was collected for this study. Sample AHI-PAL was frozen at −20 °C at the time of collection and kept frozen during transport to the University of Tennessee, Knoxville. In the laboratory, sample AHI-PAL was stored at −20 °C until processing. The pH of the sample was 5.5, the water content was c. 20% (w/w), and total organic carbon was c. 1% (w/w).
DNA extraction and quantification
The total gDNA was extracted from 0.5 to 1.0 g (wet weight) of sample AHI-PAL using four commercially available DNA extraction kits (Table 1). The following kits were evaluated (the capital letters indicate the kit designation in tables and figures):
MGN – Meta-G-Nome™ DNA Isolation Kit (Epicentre Biotechnologies, Madison, WI).
FDS – FastDNA® SPIN Kit for Soil (MP Biomedicals, Irvine, CA), followed by purification on a Qiagen column (Qiagen, Inc., Valencia, CA).
The gDNA extractions were accomplished following the manufacturer's protocols (Table 1). The gDNA concentration and quality were assessed by measuring absorbance at 260 and 280 nm using a spectrophotometer (NanoDrop2000™; Thermo Scientific Inc., West Palm Beach, FL) and Quant-iT™ PicoGreen® dsDNA Kit (Molecular Probes, Inc., Eugene, OR). For FDS, the gDNA yields were determined before and after additional DNA purification using a QIAquick® PCR Purification Kit (Qiagen Inc., Valencia, CA). The DNA extractions were carried out in triplicate for each kit, and subsequent analyses were performed on the 12 individual extracts.
Quantification of bacterial 16S rRNA gene copy number by qPCR
Bacterial 16S rRNA genes were amplified using the universal bacterial primers 1055f (5′-ATGGCTGTCGTCAGCT-3′; Ferris et al., 1996) and 1392r (5′-ACGGGCGGTGTGTAC-3′; Lane, 1991), along with the TaqMan® (Applied Biosystems Inc., Foster City, CA) probe 16STaq1115 (5′-(6-FAM)-CAACGAGCGCAACCC-(TAMRA)-3′; Harms et al., 2003). The qPCR reactions of total volume of 25 μL contained Absolute Blue™ Quantitative PCR mix (Thermo Scientific Inc., Waltham, MA), 15 pmol of each primer, 6.25 pmol of TaqMan® probe, 4–10 ng of gDNA, or dilutions of Escherichia coli DNA as a standard (from 25 to 2.5 × 10⁶ copies of the 16S rRNA gene). The PCR program used was 3 min at 50 °C, 10 min at 95 °C, 45 cycles at 95 °C for 30 s, and 50 °C for 60 s. A control qPCR spike reaction using 10⁵ copies of the standard was performed for every gDNA sample to detect sample-specific inhibition. The mean values of 16S rRNA gene copies per gram of AHI-PAL soil and the standard deviations were calculated from triplicate reactions for each gDNA extract. The 16S rRNA gene copies were converted to the number of cells per gram of soil on the assumption that the average 16S rRNA gene copy number per bacterial cell is 3.6 (Klappenbach et al., 2001).
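The standard-curve quantification described above can be illustrated with a minimal sketch. It assumes a linear relationship between the threshold cycle (Ct) and log10 of the copy number; the Ct values, the function names, and the tenfold spacing of the standards between 25 and 2.5 × 10⁶ copies are hypothetical and are not taken from the study.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in an unknown reaction."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical standards: tenfold dilutions spanning 25 to 2.5e6 copies.
standards = [25 * 10 ** i for i in range(6)]
# Hypothetical, idealized Ct values (slope -3.3, intercept 40).
standard_cts = [40.0 - 3.3 * math.log10(c) for c in standards]

slope, intercept = fit_standard_curve(standards, standard_cts)
sample_ct = 40.0 - 3.3 * math.log10(2500)   # hypothetical unknown sample
estimated_copies = copies_from_ct(sample_ct, slope, intercept)
```

In practice the copies per reaction would then be scaled by the template mass and the gDNA yield per gram of soil before dividing by 3.6 to obtain cells per gram.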
Cell counting by fluorescent in situ hybridization
For each fluorescent in situ hybridization (FISH) probe, 0.5 g of soil (AHI-PAL) was transferred into a sterile 2-mL microcentrifuge tube with 4% (w/v) paraformaldehyde in phosphate-buffered saline (PBS) pH 7.4, thoroughly vortexed, and fixed at 4 °C overnight. The pellets were washed three times with fresh PBS and sequentially dehydrated with ethanol of increasing concentrations 50%, 80%, and 98% (v/v) for 5 min at room temperature. The sample was hybridized with 300 μL of 35% (v/v) formamide hybridization buffer pH 8.0 and an equal volume of probe working solution (50 ng μL⁻¹ of Tris-EDTA) for 2 h at 46 °C. The following probes were used: ARC915 tagged with Alexa-555 (5′-GTGCTCCCCCGCCAATTCCT-3′) targeting archaeal 16S rRNA genes; EUB338 tagged with Alexa-594 (5′-GCTGCCTCCCGTAGGAGT-3′) targeting bacterial 16S rRNA genes; and EUK516 tagged with Alexa-633 (5′-ACCAGACTTGCCCTCC-3′) targeting eukaryal 18S rRNA genes. Hybridized samples were washed with 300 μL of washing buffer for 30 min at 48 °C and resuspended in 500 μL of cold, sterile, double-distilled water. The resulting fluorescent signal was detected using a Leica SP2 confocal laser scanning microscope (Wetzlar, Germany). Image analysis was performed using Nikon Elements AR 3.2 software (Melville, NY) as described (Biggerstaff et al., 2006).
Genome weight estimation
The total microbial community genome weight was calculated using bacterial and eukaryotic cell counts and an average genome weight of 4.05 fg for a single bacterial cell (Ellenbroek & Cappenberg, 1991) and 13.59 fg for a single eukaryote cell, which was calculated from average genome size for 19 members of the yeast class Saccharomycetes found at the AHI study site and assuming that 1 pg of DNA represents 0.978 × 10⁹ bp (Dolezel et al., 2003).
Lysis efficiency determination using mCherry and qPCR
An E. coli strain containing a cloned sequence of mCherry (Lagendijk et al., 2010) was used for this purpose. The mCherry plasmids were constructed by cloning a synthetic mCherry sequence (GenScript) of 711 bp into high copy number TOPO 4.0 vector (up to 150 copies per cell; Invitrogen®; Life Technologies, Grand Island, NY). The mCherry marker confers a red phenotype to the E. coli cells that have taken up the mCherry plasmid(s), thereby allowing simple and direct cell quantification. The E. coli cells carrying mCherry plasmid were propagated in Luria–Bertani (LB) broth plus 50 μg mL⁻¹ kanamycin until the cell density reached 900 cells μL⁻¹ as determined using a BD Accuri™ C6 flow cytometer (Becton Dickinson, Franklin Lakes, NJ). A spike of 25 μL of mCherry E. coli, which equates to 2.25 × 10⁴ cells, was added to 0.5 g of AHI-PAL soil before the total gDNA extraction. The qPCR assay was designed to detect the mCherry plasmid DNA using primers mCher405f (5′-CTCCGACGGCCCCGTAATGC-3′) and mCher536r (5′-TCGTAGTGGCCGCCGTCCTT-3′) and probe mCher470Taq (5′-CCGAGGACGGCGCCCTGAAGGGCGA-3′ FamBhq1; Biosearch Technologies Inc., Novato, CA). The qPCR assay was performed using the standard protocol as described previously (Layton et al., 2006). Briefly, the Absolute Blue™ QPCR mix (Thermo Scientific, Waltham, MA) was employed, and the program began with an initial temperature of 95 °C for 15 min (to activate the Taq polymerase) and used an annealing temperature of 60 °C for 45 s. The amplification efficiency of the qPCR reactions was determined as part of this same protocol, using the slope of the logarithmic standard curve and the qPCR efficiency calculator provided by Thermo Scientific (http://www.finnzymes.com/java_applets/qpcr_efficiency.html).
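Two of the calculations in this paragraph can be sketched briefly. The first reproduces the spike size from the stated volume and cell density. The second uses the commonly applied relation E = 10^(−1/slope) − 1 between the standard-curve slope and amplification efficiency; whether the cited online calculator uses exactly this form is an assumption, and the function names are illustrative.

```python
def spiked_cells(volume_ul, density_cells_per_ul):
    """Number of cells added to the soil in the spike."""
    return volume_ul * density_cells_per_ul

def amplification_efficiency(slope):
    """Efficiency from the slope of Ct versus log10(copies).

    A slope near -3.32 corresponds to a doubling per cycle (efficiency ~1.0).
    Assumes the common relation E = 10**(-1/slope) - 1.
    """
    return 10 ** (-1.0 / slope) - 1.0

cells_added = spiked_cells(25, 900)                  # 25 uL at 900 cells per uL
ideal_efficiency = amplification_efficiency(-3.321928)
```

Here `cells_added` evaluates to 22 500, matching the 2.25 × 10⁴ cells stated in the text.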
GS 454 FLX pyrosequencing
The hypervariable V3 region (c. 200 bp) of the bacterial 16S rRNA gene was amplified directly from gDNA (single PCR amplification) or re-amplified from a 16S rRNA gene fragment (nested PCR amplification). These amplification procedures were followed by 454 pyrosequencing, accordingly.
Single PCR amplification
The V3 region was amplified utilizing forward (5′-ACTCCTACGGGAGGCAGCAG-3′) and reverse (5′-TTACCGCGGCTGCTGGCAC-3′) primers, targeting positions 330–350 and 507–526 of the 16S rRNA gene of E. coli ATCC 8739, respectively. Both primers contained adaptor sequences required for GS 454 FLX pyrosequencing and an additional 10-bp tag sequence as a sample identifier. All samples were sequenced in a single run. The PCRs (50 μL) consisted of 1× buffer, 0.3 μM of each primer, 0.2 mM of dNTPs, 5% final dimethyl sulfoxide (DMSO), c. 10 ng μL⁻¹ of gDNA (as template), and 1.5 U of high fidelity Taq polymerase (FastStart High Fidelity PCR System; Roche Diagnostics, Inc., Indianapolis, IN). The gDNA was denatured (94 °C, 3 min) and then amplified (30 cycles of 94 °C, 15 s; 55 °C, 45 s; 72 °C, 1 min) with a final extension of 8 min at 72 °C.
Nested PCR amplification
Almost complete 16S rRNA gene fragments were amplified from the gDNA using universal bacterial primers 27f-YM (5′-AGAGTTTGAT(C/T)(C/A/T)TGGCTCAG-3′) and 1492r (5′-GGTTACCTTGTTACGACTT-3′), which target the positions 8–27 and 1492–1510 of the E. coli 16S rRNA gene, respectively (Lane, 1991; Frank et al., 2008). The gDNA was denatured (95 °C, 5 min) and amplified (30 cycles of 95 °C, 1 min; 55 °C, 1 min; 72 °C, 2 min) with a final extension of 8 min at 72 °C. The resulting 16S rRNA gene fragments were purified, and the V3 region was re-amplified as described above.
Positive PCR amplification was confirmed by electrophoresis on 1.5% (w/v) UltraPure™ agarose (Life Technologies) gels stained with ethidium bromide (0.5 μg mL⁻¹). A 1-kb DNA ladder (Life Technologies) was used as a molecular weight standard. Gels were visualized by a UV transilluminator EPI-Chemi Darkroom (UVP Laboratory Products LLC, Upland, CA), and images were processed using AlphaImager v.5.5 computer software and stored as TIFF files.
The 24 V3 amplicon samples were purified using the Agencourt AMPure™ solid-phase paramagnetic bead technology (Agencourt Bioscience Corporation, Beverly, MA). The purity, concentration, and size of the PCR amplicons were estimated using DNA 1000 chips and an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Waldbronn, Germany). Sequencing reactions were performed on a GS 454 Life Sciences Genome Sequencer FLX (Branford, CT) using Titanium chemistry (Roche Diagnostics, Inc.).
Analyses of the bacterial community structure
The 24 libraries of raw 454 sequences (c. 445 Mb) were analyzed independently using the Ribosomal Database Project (rdp) pyrosequencing pipeline (taxonomy-based approach), mg-rast (phylogeny-based approach), and mothur (OTU-based approach). Statistical analyses were performed for each pair of means using Student's t-test of program jmp pro v.9.0.
Taxonomy-based approach using rdp pyrosequencing pipeline
fasta files (raw reads) were initially processed through the rdp pyrosequencing pipeline (Cole et al., 2009). A total of 274 311 sequences were sorted by sample identifiers; the tag and 16S primer sequences were trimmed off, and low-quality sequences (using minimum average exp. quality score of 20, maximum number of N's = 0, minimum sequence length = 100) were removed resulting in 164 246 high-quality sequences. Hierarchical taxonomy was assigned to pyrosequences using the rdp naïve Bayesian rRNA Classifier version 2.5 (Wang et al., 2007) at a confidence cutoff of 50% (Claesson et al., 2009). For each sample, the abundance of bacterial phyla was expressed as the percentage of total sequences.
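The filtering thresholds and the abundance calculation above can be sketched as follows. This is not the rdp pipeline itself: it assumes the average quality is a simple arithmetic mean of per-base Phred scores, and the function names and example reads are hypothetical.

```python
from collections import Counter

def passes_quality_filter(seq, quals, min_avg_q=20, max_n=0, min_len=100):
    """Apply the three thresholds quoted in the text to one read."""
    if len(seq) < min_len:
        return False
    if seq.upper().count("N") > max_n:
        return False
    if not quals or sum(quals) / len(quals) < min_avg_q:
        return False
    return True

def phylum_percentages(phylum_labels):
    """Express each phylum as a percentage of all classified reads."""
    counts = Counter(phylum_labels)
    total = sum(counts.values())
    return {phylum: 100.0 * n / total for phylum, n in counts.items()}

# Hypothetical reads: one good, one containing an N, one too short.
good = passes_quality_filter("ACGT" * 30, [30] * 120)
with_n = passes_quality_filter("ACGT" * 29 + "ACGN", [30] * 120)
short = passes_quality_filter("ACGT" * 10, [30] * 40)

# Hypothetical classifier output for four reads.
abundance = phylum_percentages(["Actinobacteria"] * 3 + ["Firmicutes"])
```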
Library comparison and phylogeny-based visualization in mg-rast
The 164 246 sequences processed through the rdp's pyrosequencing pipeline were uploaded to mg-rast (Meyer et al., 2008) resulting in 150 388 sequences (91.6%) containing rRNA genes and 13 858 sequences (8.4%) without rRNA genes. The mg-rast website was used to compare the 24 amplicon libraries using the ‘best hit classification’ function with rdp as the annotation source, a maximum E value cutoff of 10−5, a minimum percent identity of 97%, and a minimum alignment length of 50 bp. A composite taxonomic tree was generated by combining the annotated bacterial sequences from 24 libraries (i.e. 51 702 unique sequences) using the parameters described above, followed by visualization using the ‘tree’ function. Phylogenetic data downloaded from mg-rast were imported into stamp v2.0.0 (release candidate 6; Parks & Beiko, 2010) for additional statistical analyses using anova.
Sequence analyses using mothur
The processing of the 454 pyrosequences using mothur v.1.28.0 started with the flowgram data (SFF files; Schloss et al., 2009). The sequences contained in three SFF files were sorted into groups by sequence identifiers. Identifiers and primers were removed by running trim.flows, shhh.flows, and trim.seqs. The resulting fasta, names, and groups files were merged to carry out comparative analyses across samples. The standard operating procedure (http://www.mothur.org/wiki/Schloss_SOP) was followed. Potential chimeras were identified using the chimera.uchime command and removed from further analyses. A 97% identity in the 16S rRNA gene sequence was employed to group sequences into the same operational taxonomic unit (OTU). The phylogenetic relationships were further analyzed using the Jaccard index, the Yue and Clayton measure, and the unweighted and weighted UniFrac algorithms to determine differences between replicates and DNA isolation kits. The phylogenetic relationship between microbial communities derived from different gDNA samples was visualized using Venn diagrams, tree viewer, and principal coordinates analysis (PCoA) functions.
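To make the two OTU-based similarity measures concrete, the following sketch computes a Jaccard similarity from OTU presence or absence and the Yue and Clayton theta from relative abundances. It is an illustration written from the textbook definitions, not mothur's implementation, and the OTU tables are hypothetical.

```python
def jaccard_similarity(a, b):
    """Shared OTUs divided by all OTUs observed in either community."""
    otus_a = {otu for otu, n in a.items() if n > 0}
    otus_b = {otu for otu, n in b.items() if n > 0}
    union = otus_a | otus_b
    if not union:
        return 0.0
    return len(otus_a & otus_b) / len(union)

def yue_clayton_theta(a, b):
    """Theta = sum(p*q) / (sum(p^2) + sum(q^2) - sum(p*q)).

    p and q are the relative abundances of each OTU in the two communities.
    """
    total_a = sum(a.values())
    total_b = sum(b.values())
    otus = set(a) | set(b)
    p = {o: a.get(o, 0) / total_a for o in otus}
    q = {o: b.get(o, 0) / total_b for o in otus}
    pq = sum(p[o] * q[o] for o in otus)
    pp = sum(v * v for v in p.values())
    qq = sum(v * v for v in q.values())
    return pq / (pp + qq - pq)

# Hypothetical OTU count tables for two libraries.
lib1 = {"otu1": 10, "otu2": 10}
lib2 = {"otu2": 10, "otu3": 10}
j = jaccard_similarity(lib1, lib2)
theta = yue_clayton_theta(lib1, lib2)
```

Jaccard ignores abundance, whereas theta weights each OTU by its relative abundance, which is why the two measures can rank the same pair of libraries differently.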
The total number of microbial cells determined by FISH and confocal laser scanning microscopy in the AHI-PAL soil was 1.1 × 10⁹ cells g⁻¹ of soil. Bacteria comprised the largest portion of the population at 9.7 × 10⁸ cells g⁻¹ of soil (89% of total cell count), followed by eukaryotes at 1.2 × 10⁸ cells g⁻¹ of soil (11%). Archaea were not detected by the FISH probes. As individual cells were detected optically, the ‘detection limit’ for FISH was dependent on the granularity and the opacity of the sample, as well as the number of scanned microscopic images. The total microbial community genome weight, therefore, was estimated to be 5.5 μg g⁻¹ of soil, the genome weight of the bacterial population was 3.9 μg g⁻¹ of soil, and the genome weight of the eukaryotes was 1.6 μg g⁻¹ of soil.
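The genome weight estimate can be checked with a short calculation that multiplies the FISH cell counts by the per-cell genome weights given in the methods. The sketch below uses the rounded values quoted in the text, so the total comes out slightly above the reported 5.5 μg; the function and variable names are illustrative.

```python
FG_PER_UG = 1e9           # 1 microgram = 1e9 femtograms
BP_PER_PG = 0.978e9       # base pairs per picogram of DNA (value from the text)

def genome_weight_fg(genome_size_bp):
    """Convert a genome size in bp to its weight in femtograms."""
    return genome_size_bp / BP_PER_PG * 1000.0   # 1 pg = 1000 fg

def community_dna_ug_per_g(cells_per_g, fg_per_cell):
    """DNA weight (ug per g soil) contributed by one group of cells."""
    return cells_per_g * fg_per_cell / FG_PER_UG

bacterial_ug = community_dna_ug_per_g(9.7e8, 4.05)     # about 3.9 ug per g
eukaryotic_ug = community_dna_ug_per_g(1.2e8, 13.59)   # about 1.6 ug per g
total_ug = bacterial_ug + eukaryotic_ug
```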
Yield and quality of the total gDNA
The yield of the gDNA determined using the PicoGreen assay was highest for the FDS kit (2.1 μg g⁻¹ of soil), followed by the PS and PL kits (0.9 μg g⁻¹ of soil), with the MGN kit showing the lowest value (0.06 μg g⁻¹ of soil). The extraction protocols also influenced gDNA purity and reliability, which subsequently resulted in differences in PCR amplification. Based upon the intensity of PCR amplification fragments visualized on agarose gel (not shown), gDNAs from the PS, PL, and MGN kits all gave consistent and reliable products, whereas no amplification was evident with the crude gDNA extract from the FDS kit. Further purification of the FDS kit-generated gDNA extracts with the QIAquick® Purification Kit, however, increased their quality and reliability but lowered the gDNA yield by > 50%. As evidenced from PCR amplification, the additional purification step was essential to improve gDNA quality; therefore, this step was incorporated as part of the standard extraction protocol for the FDS kit.
Efficiency of DNA extraction kit methods
The DNA extraction efficiency, expressed as the percentage of the total gDNA extracted relative to the FISH-estimated total weight of nucleic acids for the AHI-PAL sample (i.e. 5.5 μg g⁻¹), ranged widely from 1% to 38%, with an average of 18 ± 16%. The gDNA recovery was the highest with the FDS kit (38 ± 2%), followed by the PS kit (16 ± 8%), the PL kit (15 ± 7%), and finally the MGN kit (1 ± 0.4%). The variance in the gDNA extraction efficiency was minimal with the FDS kit and greatest with the PS kit (Table 2).
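The recovery percentages follow from dividing each kit's yield by the FISH-based estimate of total DNA. The sketch below applies that ratio to the mean yields quoted in the text, using the 5.53 μg g⁻¹ denominator given in the Table 2 footnote. The published values were computed per replicate, so small differences from the means used here are expected (for example, PS and PL share a mean yield but are reported as 16% and 15%).

```python
FISH_TOTAL_DNA_UG_PER_G = 5.53   # estimated total gDNA in the soil (from the text)

def recovery_percent(yield_ug_per_g, total_ug_per_g=FISH_TOTAL_DNA_UG_PER_G):
    """Percentage of the FISH-estimated DNA recovered by an extraction."""
    return 100.0 * yield_ug_per_g / total_ug_per_g

# Mean yields (ug per g soil) quoted in the text for each kit.
mean_yields = {"FDS": 2.1, "PS": 0.9, "PL": 0.9, "MGN": 0.06}
recoveries = {kit: recovery_percent(y) for kit, y in mean_yields.items()}
```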
Table 2. Cell counts, lysis efficiencies, and crude and final DNA yields for each of the DNA extraction kits
Relative Standard Deviation (RSD), %, is shown in parentheses.
The bracketed letters [a–c] describe the means with [a > b > c] to signify different means calculated for each pair using Student's t-test (program jmp pro v.9.0). Means not having the same letter are considered significantly different from one another.
gDNA yield after purification of crude extracts (10.18 ± 1.42 μg g⁻¹ of soil) with Qiagen® columns.
Percentage of DNA recovery for each replicate was calculated as 100 × gDNA yield/5.53 μg g⁻¹ of soil (i.e. estimated total gDNA weight in the soil as determined by FISH).
Bacterial cell concentration was derived from the 16S rRNA gene copy number obtained by qPCR divided by 3.6.
Percentage efficiency of bacterial cell lysis was calculated as 100 × qPCR-derived bacterial cell concentration/9.7 × 10⁸ cells g⁻¹ of soil (i.e. the total bacterial cell count estimated by fluorescence microscopy).
Bacterial cell counts, estimated from 16S rRNA copy numbers, were highest (2.2 ± 0.3 × 10⁸ cells g⁻¹ of soil) for the FDS kit and lowest (7 ± 6 × 10⁵ cells g⁻¹ of soil) for the PL kit. Consequently, the lysis efficiency, based upon bacterial cell counts determined by fluorescence microscopy (i.e. 9.7 × 10⁸ cells g⁻¹ of soil) and by qPCR, was the highest with the FDS kit (23 ± 6%), followed in order by the PS kit (5 ± 5%), MGN kit (0.3 ± 0.2%), and PL kit (0.07 ± 0.10%).
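The lysis efficiency combines two steps: converting 16S rRNA gene copies to cells with the 3.6 copies-per-cell assumption, and dividing by the microscopy count. A minimal sketch is given below. The copy number passed in is back-calculated from the reported FDS cell count purely for illustration and is not a measured value from the study.

```python
COPIES_PER_CELL = 3.6        # average 16S rRNA gene copies per cell (from the text)
FISH_BACTERIA_PER_G = 9.7e8  # bacterial cells per g soil by microscopy (from the text)

def cells_from_copies(copies_per_g, copies_per_cell=COPIES_PER_CELL):
    """Estimate lysed bacterial cells per gram from 16S gene copies per gram."""
    return copies_per_g / copies_per_cell

def lysis_efficiency_percent(cells_per_g, fish_cells_per_g=FISH_BACTERIA_PER_G):
    """Percentage of microscopically counted cells recovered as qPCR signal."""
    return 100.0 * cells_per_g / fish_cells_per_g

# Illustrative input: copies per g implied by the reported FDS count of 2.2e8 cells.
fds_copies_per_g = 2.2e8 * COPIES_PER_CELL
fds_efficiency = lysis_efficiency_percent(cells_from_copies(fds_copies_per_g))
pl_efficiency = lysis_efficiency_percent(7e5)   # reported PL cell count
```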
Efficiency of gDNA recovery by seeded approach
The gDNA recovery and bacterial cell lysis efficiency were also evaluated through a seeded approach with mCherry E. coli, using the two extraction kits that gave the best lysis efficiency for bacterial cells (Table 2), namely the FDS kit and the PS kit. This evaluation showed superior extraction efficiency with the PS kit. The efficiency of the gDNA recovery was calculated from the mCherry plasmid copy numbers isolated from the seeded AHI-PAL soil and from the E. coli biomass. The yield of genomic DNA from E. coli was lower using the FDS kit in comparison with the PS kit, suggesting that the FDS kit was less efficient in isolating DNA from low biomass cultures (e.g. the 2.3 × 10⁴ cells per extraction) or caused more denaturation of DNA by its mechanical bead-beating aspect. The gDNA extraction efficiencies (based on the measurement of mCherry plasmid copies) for the FDS kit and PS kit were 84% and 98%, respectively.
Comparison of DNA extraction kits based on the bacterial community composition and diversity
The impact of the gDNA extraction kit procedure on soil microbial community composition and microbial diversity was investigated using a 454 pyrosequencing approach. Amplification of the V3 region of the 16S rRNA gene directly from gDNA (single amplification) or re-amplification from 16S rRNA gene fragments (nested amplification) resulted in 24 libraries. All 24 libraries were sequenced in a single run using 454 GS FLX Titanium chemistry (Roche Diagnostics, Inc.) and yielded a total of 97.4 Mb of sequencing data. These libraries were derived from the same soil sample (AHI-PAL) and were expected to show identical bacterial communities; however, they showed variations in community composition when analyzed based on taxonomic assignment, phylogenetic relationship, and OTU clusters.
Taxonomy approach using rdp
The raw fasta files containing 274 311 sequences were processed through the rdp pyrosequencing pipeline yielding 164 246 (60%) high-quality sequences, with an average library size of 5300 ± 2800 and 7100 ± 3400 sequences for single and nested amplification, respectively. Based on hierarchical taxa assignment using rdp naïve Bayesian rRNA Classifier (Wang et al., 2007), sequences were assigned to 16 bacterial phyla and ‘unclassified bacteria’ (Fig. 1). In all sample libraries, five phyla (i.e. Acidobacteria, Actinobacteria, Firmicutes, Gemmatimonadetes, and Proteobacteria) were the most abundant members of the soil bacterial community, comprising 96.8–99.6% of the sequences, and Bacteroidetes, Cyanobacteria, and TM7 were detected in all gDNA extracts at low abundances (0.04–1.8%). The other seven phyla were detected in < 75% of extracts (Fig. 1). The bacterial communities obtained using the MGN kit (with single amplification) and the PL kit (with nested amplification) were different from that obtained using the FDS and the PS kits (Fig. 1). The FDS and the PS kits produced similar percentages for bacterial phyla and smaller differences among triplicates, suggesting that these kits gave more consistent, reproducible bacterial community analysis (Table 3).
Table 3. Sequence library characteristics and diversity indexes with respect to each of the DNA extraction kits
Relative standard deviation (RSD), %, is shown in parentheses.
Percentage was calculated from shared mean among triplicate OTUs and number of OTUs in each triplicate.
The equation for calculating coverage for a single sample (X) is Cx = 1 − (nx/N), where nx is the number of OTUs, and N is the total number of individuals in the sample. A lower value for coverage indicates a higher number of OTUs or unique sequences in the population.
The Chao estimator augments the number of OTUs observed (Sobs) by a term that depends only on the observed number of singletons (a) and doubletons (b): Chao = Sobs + a²/(2b).
Reciprocal of Simpson's index; higher numbers represent greater diversity.
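The Chao estimator and the reciprocal of Simpson's index described in these footnotes can be sketched from a list of per-OTU sequence counts. The Chao function follows the formula given above. The Simpson function assumes the finite-sample form D = Σn(n − 1)/[N(N − 1)], which is a common choice but is not stated in the text. The counts are hypothetical.

```python
def chao_estimate(otu_counts):
    """Chao = Sobs + a^2 / (2b), with a singletons and b doubletons."""
    observed = [n for n in otu_counts if n > 0]
    s_obs = len(observed)
    a = sum(1 for n in observed if n == 1)
    b = sum(1 for n in observed if n == 2)
    if b == 0:
        raise ValueError("formula is undefined when there are no doubletons")
    return s_obs + (a * a) / (2.0 * b)

def inverse_simpson(otu_counts):
    """Reciprocal of Simpson's D, assuming D = sum(n(n-1)) / (N(N-1))."""
    observed = [n for n in otu_counts if n > 0]
    total = sum(observed)
    d = sum(n * (n - 1) for n in observed) / (total * (total - 1))
    return 1.0 / d

# Hypothetical library: 7 OTUs, 3 singletons, 2 doubletons, 15 sequences.
counts = [5, 3, 1, 1, 1, 2, 2]
chao = chao_estimate(counts)
inv_simpson = inverse_simpson(counts)
```

For this example the estimator adds 9/4 to the seven observed OTUs, and the reciprocal Simpson index equals the number of OTUs that an evenly distributed community of the same dominance would contain.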
Annotation and sequence storage through the mg-rast web server
For these analyses, the 24 amplicon libraries separated by the rdp pipeline were uploaded and annotated. A phylogenetic analysis of sequences from the single amplification revealed the existence of 152 phylotypes at the family level and demonstrated that phylotypes were represented differently in various extractions (array of colors), with 26 (17.1%) phylotypes being present in all 12 amplicon libraries, whereas another 19 (15.8%) phylotypes were unique (one color; Supporting information, Fig. S1).
Additional statistical analysis of the single amplification phylogenetic profiles using stamp showed significant (P < 0.05) differences in the relative abundance of 22 families between MGN and all other (FDS, PS, and PL) kits (Fig. 2). The MGN kit resulted in higher relative proportions of β-Proteobacteria, for example Burkholderiaceae (P = 0.018), and γ-Proteobacteria, for example Chromatiaceae (P = 0.022), and in lower proportions of seven families of Actinobacteria (P ≤ 0.04), two families of Acidobacteria (P < 0.002), and α-Proteobacteria, for example Methylocystaceae (P < 0.0006). Comparison of FDS, PS, and PL phylogenetic profiles did not show any significant (P > 0.05) differences.
OTU-based approach using mothur
The removal of PyroNoise-affected sequences and trimming resulted in a total of 212 320 sequences, 88 214 of which were unique. After alignment and additional trimming of sequences to the same overlap region, the number of sequences decreased to 200 236, of which 62 223 were unique. The removal of chimeras further decreased the number of sequences to 186 167, with 34 512 unique sequences. Two libraries derived from the PL kit (with nested amplification) contained c. 1.5% of Escherichia/Shigella sequences in comparison with other libraries; these sequences were assumed to be contaminants and were deleted from further analyses. Analysis of variance (t-test) was used to assess differences in the number of sequences and the number of OTUs between the DNA extraction kits and/or the amplification methods. The number of sequences obtained for each amplicon library indicated that there were significant differences between the single and nested amplification (P = 0.01). The number of sequences obtained in the amplicon libraries was higher for nested amplification (9416 ± 3153) than for single amplification (5955 ± 3105), the variability possibly reflecting differences in the titration of the libraries for the 454 pyrosequencing. In contrast, there were no significant differences between the number of OTUs identified in the amplicon libraries obtained from single and nested amplification (P > 0.1). Analysis of means revealed that only amplicon libraries from the MGN kit (nested amplification) had a significantly lower average than the overall average of 1149 OTUs (P < 0.04). The other seven averages were within the 5% risk decision limits, indicating that their performance can be assumed to be similar (Fig. 3b). When the diversity parameters (i.e. Chao and 1/Simpson) for microbial communities obtained from DNA extraction kits were compared, there were no significant differences between FDS, PS, and PL kits (P > 0.1), but the MGN kit resulted in a less diverse community (P < 0.01).
With respect to single versus nested amplification, there were no significant differences (P > 0.05) among the kits for these same diversity parameters. These results suggested that the type of DNA extraction performed can impact the community diversity analysis profile and that the MGN kit appears to recover fewer types of bacteria desorbed from the soil particles by the washing procedure (Table 3 and Fig. 3).
As shown in the Venn diagrams (Fig. 3a), the number of unique and shared OTUs varied between replicates and combinations of DNA isolation kit and amplification approach. However, differences between the percentages of shared OTUs (Table 3) were not statistically significant (P > 0.1) for these four kits. Comparison of samples based on the number of OTUs in each combination of DNA isolation kit and amplification approach using the Jaccard index resulted in the similarity between two communities in the range of 6–23% (Fig. 4a). The similarity between replicates increased when the total number of OTUs and the relative abundance of each OTU in each community were included in the calculation (Fig. 4b). OTUs obtained after single amplification of FDS, PS, and PL extracts formed separate clusters from nested amplification (except replicate PSN-1) clusters with 53% similarity. Both single (except replicate MGNN-3) and nested amplification of MGN extracts clustered together and showed 43% similarity to other kits (Fig. 4b). Sequences from triplicate soil extracts obtained with the FDS kit, followed by both single and nested amplifications exhibited high reproducibility with difference < 9%.
The unweighted and weighted UniFrac followed by principal coordinate analysis were used to explore further the degrees of similarity between sequences from the different combinations of DNA isolation kits and amplification approaches. The percentage of variance explained by the first two coordinates was 18.3% for unweighted (Fig. 4c) and 67.7% for weighted (Fig. 4d) UniFrac. In both analyses, axis 1 accounted for 0.67 of the total variation and correlated with the isolation kit, with the exception of one of the triplicates from the PS kit (nested amplification). Axis 2 accounted for the rest of the variation and correlated with the amplification approach (single or nested). Comparison of unweighted and weighted UniFrac results showed that the same types of organisms were amplified when using different extraction kits, but that the relative abundance of these organisms differed from one kit to another. Triplicates from the FDS, PS, and PL kits were all generally situated close to each other, whereas the MGN kit generated a substantially different microbial community profile based upon these analyses.
Comparison of performance and cost-effectiveness among DNA extraction kits
Based on the performance criteria listed in Table 4, the DNA extraction kits were ranked with the FastDNA® SPIN Kit for Soil first, followed by the PowerSoil® Kit, the PowerLyzer™ Kit, and finally the Meta-G-Nome™ DNA Isolation Kit. Given the differences in the cost of the extraction kits, it is worth noting that the total cost per sample preparation favors both the PS and PL kits ($4.58) compared with the FDS kit ($6.85) and the MGN kit ($13.00; Table 1). This can be important to researchers choosing an extraction kit that minimizes the cost per sample while maximizing the number of samples for statistical robustness.
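The trade-off between per-sample cost and sample number can be made concrete with a short calculation. The per-sample costs are those quoted above. The budget of $500 and the function name are hypothetical, chosen only for illustration. Costs are held in integer cents to avoid floating-point rounding.

```python
# Cost per sample preparation in US cents, as quoted in the text (Table 1).
COST_PER_SAMPLE_CENTS = {"FDS": 685, "PS": 458, "PL": 458, "MGN": 1300}


def samples_within_budget(budget_cents, cost_per_sample_cents):
    """Number of whole sample preparations affordable for a given budget."""
    return budget_cents // cost_per_sample_cents


# Hypothetical extraction budget of $500 (an assumption, not from the study).
BUDGET_CENTS = 500 * 100

affordable = {
    kit: samples_within_budget(BUDGET_CENTS, cost)
    for kit, cost in COST_PER_SAMPLE_CENTS.items()
}
for kit, n in affordable.items():
    # Complete triplicate sets matter when replicates are required per sample.
    print(f"{kit}: {n} extractions ({n // 3} complete triplicate sets)")
```

For this assumed budget the calculation gives 109 extractions for the PS or PL kit, 72 for the FDS kit, and 38 for the MGN kit.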
Table 4. Comparative ratings of the DNA extraction kits based on several performance criteria
The rating of each kit was based on a scale from 1 to 4, where 1 is the best and 4 is the worst.
DNA extraction kit trade name: FastDNA® SPIN Kit for Soil; Meta-G-Nome™ DNA Isolation Kit
Performance criteria: 16S rRNA copy number; variation between replicates; variation between single and nested amplifications; number of sequences; number of OTUs; number of shared OTUs
Permafrost has been characterized as a matrix in which the microbial community is difficult to analyze. Previous studies (Vishnivetskaya et al., 2006; Yergeau et al., 2010) have described low DNA yields from subsurface permafrost sediments, in spite of bacterial population sizes comparable with other upper soil horizons. In this study, the efficiency of cell lysis, DNA yield, and microbial community composition (via pyrosequencing of 16S rRNA genes) were evaluated to determine an extraction procedure that effectively detects the representative community from permafrost and permafrost-affected soils. A recent review by Hazen et al. (2013) summarized the limitations and biases associated with sample collection, processing, nucleic acid extraction, PCR amplification, sequencing, and microarray analyses in microbial ecology studies. In this study, the same soil sample (AHI-PAL) was used throughout the work to minimize biases that could result from sample collection and processing. Of the three variables examined, namely nucleic acid extraction, PCR amplification, and sequencing, the largest differences in cell concentration and microbial diversity measurements were attributable to the DNA extraction kit.
Our study determined that the MGN kit differed the most from the other kits, giving the lowest values for the total gDNA yield, number of bacterial cells (as estimated by qPCR), number of OTUs, and number of shared OTUs. The microbial community profile obtained from the MGN kit also differed from the microbial community profiles of the other kits. Therefore, the MGN kit, which is based on physical separation of microorganisms from soil particles, followed by cell lysis and DNA recovery, did not adequately detect representative characteristics of bacterial communities from permafrost soil. The kits that use a bead-beating extraction technique gave higher yields of crude gDNA. A substantial difference among them was that gDNA extracts from the PS and PL kits could be used in PCR amplifications immediately, whereas gDNA from the FDS kit needed additional purification. Previous work on the optimization of qPCR using gDNA extracts obtained with the FDS kit showed that crude gDNA had to be diluted to be suitable for PCR amplification (Harms et al., 2003). In this study, both the PS and PL kit extracts showed robust PCR amplification without inhibition, whereas qPCR inhibition was noted in the undiluted FDS samples. In comparison with the other kits, the yield of gDNA obtained with the FDS kit had the lowest variation among triplicates, and consequently the FDS kit had the best overall lysis efficiency.
The 16S rRNA gene sequences obtained by 454 pyrosequencing were analyzed using three different tools (i.e. the RDP pyrosequencing pipeline, MG-RAST, and mothur) with the goal of discerning similarities and dissimilarities in the microbial community structure resulting from the use of different kits (same amplification procedure) or different amplification procedures (same DNA extraction kit). The high reproducibility and reliability of 454 pyrosequencing in the recovery of complex microbial communities was recently demonstrated (Pilloni et al., 2012). The RDP pyrosequencing pipeline, while easy to use, provides limited statistical analyses. Using the RDP taxonomy approach, the highest variations in percentages of different bacterial phyla were obtained with the MGN kit (with single amplification) and the PL kit (with nested amplification). The MG-RAST pipeline was used to uncover phylogenetic relationships. However, comparisons can be skewed because unidentified species (which represent ≥ 20% of sequences) are not considered by this technique. Therefore, mothur was also used to compare sequences (clustered as OTUs at the 97% similarity level) and to analyze community composition using diversity indexes. The microbial community structure identified in the DNA extracts from the MGN kit differed significantly (P < 0.05) from that of the other extraction kits by having lower numbers of OTUs. Interestingly, although the amount of DNA obtained from the FDS, PS, and PL kits varied considerably, there was no significant difference between these kits in the number of OTUs (P > 0.1) or shared OTUs (P > 0.1). The percentage of shared OTUs between three replicates ranged from 14.2% to 25.5%, which is in agreement with a recent study (Zhou et al., 2011) showing an average OTU overlap of 26.6% for two replicates and 13.3% for three replicates.
As the presence/absence of OTUs may not reflect the real similarities in community composition and structure (Lemos et al., 2012), the bacterial communities from both single and nested amplifications were compared based upon phylogenetic assessment. This showed that 11.2% of phylotypes were shared among all samples, 11.2% were unique to one sample, and 77.6% were present in varying numbers of samples. The irregularity in the relative abundance of different phylotypes may be attributed to several types of bias, including incomplete extraction of microbial DNA from soil (Feinstein et al., 2009), bacterial genome size and the number of 16S rRNA genes in the genome (Farrelly et al., 1995), and biases associated with PCR (Pinto & Raskin, 2012).
The unweighted UniFrac analysis, which does not account for differences in relative abundances, indicated that there were no significant differences among the phylogenetic lineages obtained from different DNA extracts. In contrast, the weighted UniFrac, which adds relative abundance to the phylogenetic information, was more informative and detected differences in the microbial community composition derived from the MGN DNA extraction kit. Principal coordinate analysis built upon weighted UniFrac indicated that microbial communities identified in the AHI-PAL soil extracts from the FDS, PS, and PL kits clustered together based on single or nested amplification and separately from the MGN kit.
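The difference between the two UniFrac variants can be seen in a toy calculation. The sketch below assumes a hypothetical four-OTU tree and hypothetical read counts. It uses the standard definitions: unweighted UniFrac is the fraction of observed branch length leading to only one community, and raw (non-normalized) weighted UniFrac sums branch lengths weighted by the difference in the proportion of reads below each branch. It is an illustration only, not the implementation used in the study.

```python
# Hypothetical tree given as (branch length, OTUs descending from the branch).
# Equivalent Newick: ((A:1.0,B:1.0):0.5,(C:1.0,D:1.0):0.5);
BRANCHES = [
    (1.0, {"A"}), (1.0, {"B"}), (0.5, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (0.5, {"C", "D"}),
]


def reads_below(branch_otus, counts):
    """Total reads assigned to the OTUs that descend from a branch."""
    return sum(counts.get(otu, 0) for otu in branch_otus)


def unweighted_unifrac(counts_a, counts_b, branches=BRANCHES):
    """Branch length unique to one community / branch length observed in either."""
    unique = observed = 0.0
    for length, otus in branches:
        in_a = reads_below(otus, counts_a) > 0
        in_b = reads_below(otus, counts_b) > 0
        if in_a or in_b:
            observed += length
        if in_a != in_b:
            unique += length
    return unique / observed if observed else 0.0


def weighted_unifrac(counts_a, counts_b, branches=BRANCHES):
    """Raw weighted UniFrac; both communities must contain at least one read."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each community needs at least one read")
    return sum(
        length * abs(reads_below(otus, counts_a) / total_a
                     - reads_below(otus, counts_b) / total_b)
        for length, otus in branches
    )


# Hypothetical profiles: x and y contain the same OTUs in different proportions.
profile_x = {"A": 90, "B": 5, "C": 5}
profile_y = {"A": 5, "B": 5, "C": 90}
profile_z = {"A": 50, "B": 50}

print(unweighted_unifrac(profile_x, profile_y))  # same lineages present
print(weighted_unifrac(profile_x, profile_y))    # abundances differ strongly
print(unweighted_unifrac(profile_x, profile_z))  # lineage C missing from z
```

Profiles x and y contain the same lineages, so their unweighted distance is zero, yet their weighted distance is large because the dominant OTU differs. This mirrors the situation described above, in which only the abundance-aware measure separates the profiles.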
The statistically significant differences in microbial community compositions between the MGN and other kits are due to the higher relative proportions of easily lysed β- and γ-Proteobacteria, such as Burkholderiaceae and Chromatiaceae, and lower proportions of Actinobacteria, which have rigid cell walls. Other taxa showing reduced proportions in the MGN extracts included Acidobacteria (Solibacteraceae, Acidobacteriaceae), δ-Proteobacteria (Polyangiaceae, Desulfuromonadales, Myxococcaceae), and α-Proteobacteria (Methylocystaceae). In general, the Actinobacteria, Acidobacteria, and Proteobacteria are dominant phyla found in active layer and permafrost communities around the world (Zhou et al., 1997; Vishnivetskaya et al., 2006; Gilichinsky et al., 2007, 2008; Steven et al., 2007; Wilhelm et al., 2011) and play major roles in the decomposition of organic matter and nutrient cycling in polar ecosystems with limited trophic complexity (Yergeau et al., 2009). More specifically, in the Canadian High Arctic, the α-Proteobacteria family Methylocystaceae may play a critical role in methyl and methane oxidation in the active layer (Martineau et al., 2010). Thus, interpretations derived from MGN profiles would indicate diminished carbon cycling capability and methane oxidation potential compared with interpretations derived from FDS, PS, or PL profiles. Importantly, a lower methane oxidation potential would suggest a greater global warming potential from permafrost thawing.
In summary, the choice of DNA extraction kit influenced the efficiency of DNA extraction, microbial cell quantification, and the resultant interpretation of the microbial community composition. This is in agreement with other studies showing that the DNA extraction protocol influences the abundance and phylogenetic relation of the indigenous soil bacteria as detected by DGGE analysis (de Lipthay et al., 2004) and by ribosomal intergenic spacer analysis (Martin-Laurent et al., 2001). For this permafrost/active layer soil, we conclude that both the FDS and PS kits provide similar microbial community structure results with minimal variation among triplicates (best precision). The gDNA yield obtained from the FDS kit, however, was higher than that from the PS kit, which may be important for cell quantification using qPCR. Differences in microbial community structure between bead-beating lysis (FDS, PS, and PL kits) and lysis after the physical separation of microbial cells from soil particles (MGN kit) were seen using either RDP or mothur. Overall, the findings of this study provide a holistic assessment of both the benefits and pitfalls of DNA extraction kits and give guidance to other investigators in the choice of technology best suited for their application, taking into account cell lysis efficacy, gDNA recovery, microbial diversity, processing time, and cost-effectiveness.
This research was funded by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Genomic Science Program (DE-SC0004902). Oak Ridge National Laboratory (ORNL) is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR. The work of undergraduate students (J.R.M. and A.W.R.) was supported by the University of Tennessee, Department of Undergraduate Research.