Rapid quantification and taxonomic classification of environmental DNA from both prokaryotic and eukaryotic origins using a microarray

Authors

  • Todd Z. DeSantis,

    1. Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 70A-3317, Berkeley, CA 94720, USA
  • Carol E. Stone,

    1. Defence Science and Technology Laboratory, Porton Down, Salisbury, Wiltshire, SP4 0JQ, UK
  • Sonya R. Murray,

    1. Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 70A-3317, Berkeley, CA 94720, USA
  • Jordan P. Moberg,

    1. Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 70A-3317, Berkeley, CA 94720, USA
  • Gary L. Andersen

    Corresponding author
    1. Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 70A-3317, Berkeley, CA 94720, USA

  • Edited by A. Oren

*Corresponding author. Tel.: +1 510 495 2795; fax: +1 510 486 7152, E-mail address: glandersen@lbl.gov

Abstract

A microarray has been designed using 62,358 probes matched to both prokaryotic and eukaryotic small-subunit ribosomal RNA genes. The array categorized environmental DNA into specific phylogenetic clusters in under 9 h. Known quantities of rRNA gene copies from distinct organisms were added to a background of DNA generated from natural outdoor aerosols, producing hybridization intensity scores that correlated well with their concentrations (r = 0.917). Reproducible differences in microbial community composition were observed when the genomic DNA extraction method was altered. Notably, gentle extractions produced peak intensities for Mycoplasmatales and Burkholderiales, whereas vigorous disruption produced peak intensities for Vibrionales, Clostridiales, and Bacillales.

1 Introduction

Interest in methods for accurate quantification and identification of biodiversity in complex samples has increased markedly, particularly in the areas of environmental, clinical and food microbiology [1–3]. Analysis of nucleic acids isolated from such matrices has revealed a broader diversity of organisms compared to more traditional culture-based techniques [4]. Molecular approaches aimed at broad phylogenetic detection rely upon classifying heterogeneous nucleic acids into groups of similar sequences. Popular methods apply a polymerase chain reaction (PCR) to rRNA genes or reverse transcription (RT)-PCR to rRNA. In either case, the resulting mixed amplicons can be quickly, but coarsely, typed into anonymous groups by restriction fragment length polymorphisms (RFLP) [5], single-strand conformation polymorphisms (SSCP), or temperature/denaturant gradient gel electrophoresis (T/DGGE) [6]. Subsequent sequencing allows both application of taxonomic nomenclature to the groups and an estimation of the relative abundance of the post-PCR nucleic acid types [7,8] but requires additional labor. Before sequencing, the individual DNA types must be physically isolated either by cloning or by multiple gel extractions. The ideal method would provide rapid taxonomic classification as well as nucleic acid quantification. To achieve this goal, a high-density photolithography microarray displaying 62,358 oligonucleotides complementary to diverse rRNA sequences was manufactured [9].

In this study, we demonstrated the ability of the small-subunit (SSU) microarray to quantify known concentrations of mixed DNA amplicon types without physically separating the components. Quantitative array interpretation was validated by documenting the correlation between hybridization intensities and independently measured DNA concentrations. Lastly, the effect of DNA extraction method upon the resulting PCR amplicon pool was investigated as a useful example of a perceived community shift detectable by the SSU microarray.

2 Materials and methods

2.1 Microarray description

The SSU Chip was originally designed to simultaneously categorize broad groups of organisms in complex environmental samples by targeting multiple probes to identify phylogenetic clusters containing related rRNA gene sequences [9]. Briefly, Escherichia coli SSU rRNA positions 1409 to 1491, close to the 3′ end of the gene, were obtained from 3286 SSU rRNA sequences of both prokaryotic and eukaryotic origins [10]. The region used is bounded on both ends by universally conserved segments that can be used as PCR priming sites to amplify any SSU rRNA gene [11,12]. Of the possible overlapping 20-mers composing each sequence, approximately half were complemented and synthesized on the array surface as perfectly matching (PM) probes. Paired with each PM probe was a control probe (mismatch, MM) in which the oligonucleotide was identical to the PM except that the 11th nucleotide was a base not complementary to the targeted sequence. Thus, the MM served as a control for nonspecific hybridization. A PM and its corresponding MM are termed a “probe pair”. Oligonucleotides were synthesized by a photolithographic method at Affymetrix Inc. (Santa Clara, CA, USA) directly onto a glass surface.
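The probe-pair construction described above can be sketched in Python. This is a minimal illustration rather than the authors' design pipeline: the function names are hypothetical, keeping every second window (`step=2`) is only a stand-in for "approximately half", and using the Watson-Crick complement as the substituted 11th base is an assumption, since the text states only that the base does not complement the target.

```python
# Sketch: tile a target into overlapping 20-mers and build PM/MM probe pairs.
# Assumptions (not from the paper): every second window is kept (step=2), and
# the MM base is the Watson-Crick complement of the PM base at position 11.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
PROBE_LENGTH = 20
MISMATCH_INDEX = 10  # the 11th nucleotide, counted from zero


def reverse_complement(seq):
    """Return the strand that would hybridize to `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_probe(pm):
    """Copy a PM probe, replacing only its 11th base."""
    swapped = pm[MISMATCH_INDEX].translate(COMPLEMENT)
    return pm[:MISMATCH_INDEX] + swapped + pm[MISMATCH_INDEX + 1:]


def probe_pairs(target, step=2):
    """Return (PM, MM) pairs for every `step`-th overlapping 20-mer of `target`."""
    pairs = []
    for start in range(0, len(target) - PROBE_LENGTH + 1, step):
        pm = reverse_complement(target[start:start + PROBE_LENGTH])
        pairs.append((pm, mismatch_probe(pm)))
    return pairs


# First 27 bases of the Mycoplasma neurolyticum standard listed in Table 1.
example_target = "TTGTACACACCGCCCGTCACCCATGGG"
for pm, mm in probe_pairs(example_target):
    print(pm, mm)
```

Each printed pair differs at exactly one position, so any signal on the MM probe can be read as an estimate of nonspecific binding for its PM partner.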

Updated prokaryotic and eukaryotic sequences were obtained from the Prokaryotic Multiple Sequence Alignment [13] and RDP v8.1 [14], respectively. The resulting set of 8903 non-redundant sequences was considered the putative amplicons that could be generated by the two primers and ranged in length from 69 bp (salt marsh clone AF286040) to 181 bp (Thermoanaerobacter acetoethylicus X69336).

A clustering of the sequences was sought to enable each sequence of the cluster to be complementary to a set of PM probes. Putative amplicons were placed in the same cluster when they shared common 16-mers. The resulting 1973 clusters (http://greengenes.llnl.gov/16S_cgi/download/G1/SSU_SeqDescByOTU.txt) were considered as operational taxonomic units (OTU) representing all 91 known prokaryotic orders and 14 of 30 known eukaryotic phyla (or equivalent high ranking taxa). Each OTU was assigned a set of at least eight 20-mer probes perfectly matching (PM) the sequences within the OTU. Presumed cross-hybridizing probes were those 20-mers that contained a central 16-mer matching sequences in more than one OTU [15]. As each PM probe was chosen, it was paired with a control 20-mer (mismatching, MM), identical in all positions except the eleventh base. The MM probe did not contain an internal 16-mer complementary to sequences in any OTU. The mean G + C content of the PM probes in a probe set was 59.7% with a standard deviation of 8.2%. The probe selection method was validated against previously published data [9] to confirm that the array could differentiate prokaryotic orders and eukaryotic phyla.
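The cross-hybridization rule above (a 20-mer whose central 16-mer matches sequences in more than one OTU) can be expressed as a short check. The sketch below assumes each candidate is given in the same sense as the target sequences, takes "central" to mean two bases trimmed from each end, and uses invented toy sequences and hypothetical function names; it does not reproduce the published probe selection software.

```python
from typing import Dict, List, Set


def central_16mer(window: str) -> str:
    """Return the middle 16 bases of a 20-mer (two bases trimmed from each end)."""
    if len(window) != 20:
        raise ValueError("expected a 20-mer")
    return window[2:18]


def otus_matched(window: str, otus: Dict[str, List[str]]) -> Set[str]:
    """Names of OTUs with at least one sequence containing the central 16-mer."""
    core = central_16mer(window)
    return {name for name, seqs in otus.items() if any(core in s for s in seqs)}


def is_cross_hybridizing(window: str, otus: Dict[str, List[str]]) -> bool:
    """True when the central 16-mer occurs in more than one OTU."""
    return len(otus_matched(window, otus)) > 1


# Toy data (hypothetical sequences, not taken from the array design).
CORE = "CGTTAGCCGATAGGCT"
SHARED = "AA" + CORE + "TT"       # core also occurs in OTU_B
UNIQUE = "ACCTGAAGTCGGATCTTGCA"   # occurs only in OTU_A
TOY_OTUS = {
    "OTU_A": ["GG" + SHARED + "CC" + UNIQUE],
    "OTU_B": ["TC" + CORE + "GA"],
}

print(is_cross_hybridizing(SHARED, TOY_OTUS))
print(is_cross_hybridizing(UNIQUE, TOY_OTUS))
```

Under these assumptions the shared window would be flagged and the unique window would remain eligible as an OTU-specific probe.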

2.2 Aerosol collection

Six-hour time samples of airborne particles were collected onto borosilicate filters (Pall Gelman Laboratory, Ann Arbor, MI, USA) using an Andersen Hi-Vol air sampler in Southern England from 1995 to 1996. Circular cores 18 mm in diameter were removed aseptically using a flamed cork borer.

2.3 DNA extraction

Filter cores were homogenized in bead tubes according to a published method [16] using a Fast Prep 120 agitator (Qbiogene, Carlsbad, CA, USA). Tubes were subjected to bead beating for 0, 5, 20, 45, or 450 s. DNA from the mixture was isolated using a spin column (MoBio Laboratories, Solana Beach, CA, USA) according to manufacturer instructions. Specific reagent compositions are available in the Supplementary Documentation.

2.4 Target preparation, hybridization, scanning

Partial length 16S rRNA gene amplicons were generated by PCR from the DNA extract according to previously described conditions [9]. Universal primers CcompLong (TTGTACACACCGCCCGTCA, E. coli positions 1390 to 1408) and PC5B (TACCTTGTTACGACTT, E. coli positions 1507 to 1492) [11] putatively amplified all varieties of SSU rRNA genes. Typically 10^12 amplicon molecules were spiked with known concentrations of synthetic SSU amplicons according to Table 2. The mixture was partially digested and biotin labeled as previously described [9]. Hybridization, washing, staining and scanning were performed according to the conditions detailed in the prior study [9]. Altogether, less than 9 h was required.

Table 2.  Quantification of synthetic SSU rRNA gene segments by A260 and by hybridization intensity

DNA sequence type         DNA pool (a)  Conc. by A260 (pM) (b)  HybScore (a.u.) (c)
Mycoplasma neurolyticum   1             85                      7621
                          2             164                     8763
                          3             6                       889
                          4             15                      1285
                          5             36                      4948

Oenococcus oeni           1             4                       137
                          2             10                      304
                          3             24                      2610
                          4             57                      4905
                          5             111                     7680

Saprospira grandis        1             45                      1622
                          2             108                     4368
                          3             209                     5229
                          4             7                       513
                          5             19                      1051

Fervidobacterium nodosum  1             12                      686
                          2             29                      3563
                          3             68                      4972
                          4             132                     5052
                          5             5                       389

Caulobacter vibrioides    1             201                     8265
                          2             7                       747
                          3             18                      1262
                          4             44                      4800
                          5             104                     6516

  (a) Each DNA pool contained one aliquot of each internal standard added to 18 μL of products from 30 cycle universal SSU PCR upon genomic DNA extracted from air samples.

  (b) Pre-mixture rRNA gene segment concentration derived by A260 expressed in picomolar (pM) units as the final concentration of internal standard in the hybridization solution.

  (c) Hybridization intensities (HybScore) are in arbitrary units (a.u.).

2.5 Probe set scoring

Probe pairs scored as positive were those producing distinctly higher fluorescence intensity from the perfectly matched probe (PM) than the mismatched control (MM). The thresholds for defining a distinctive difference are available in the Supplementary Documentation. An OTU was considered present in the sample when all of its assigned probe pairs were positive. A hybridization score (HybScore) was calculated in arbitrary units (a.u.) for each probe set as the average of the PM minus MM intensity differences across the probe pairs in a given probe set. When phylogenetically summarizing chip results to the order or phyla, the probe set producing the highest HybScore of the order/phyla was used.
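The scoring rules in this section can be sketched as follows. The two thresholds in the code are hypothetical placeholders, because the real values are given only in the Supplementary Documentation; the function names and intensity values are likewise invented for illustration.

```python
from statistics import mean

# Hypothetical thresholds: the actual criteria are in the Supplementary
# Documentation and are NOT reproduced here.
MIN_RATIO = 1.3         # PM must be at least this multiple of MM
MIN_DIFFERENCE = 130.0  # PM must exceed MM by at least this many units


def pair_is_positive(pm, mm, min_ratio=MIN_RATIO, min_difference=MIN_DIFFERENCE):
    """A probe pair is positive when PM is distinctly brighter than MM."""
    return pm >= min_ratio * mm and (pm - mm) >= min_difference


def hyb_score(pairs):
    """Average PM minus MM difference across the probe pairs of one probe set."""
    return mean([pm - mm for pm, mm in pairs])


def otu_present(pairs):
    """An OTU is called present only if every one of its probe pairs is positive."""
    return all(pair_is_positive(pm, mm) for pm, mm in pairs)


def taxon_score(probe_sets):
    """Summarize an order or phylum by its highest-scoring probe set."""
    return max(hyb_score(pairs) for pairs in probe_sets.values())


# Toy (PM, MM) intensities for two probe sets of one hypothetical order.
toy_order = {
    "set_1": [(1200, 300), (900, 250), (1500, 400)],
    "set_2": [(500, 480), (700, 200), (650, 210)],
}
for name, pairs in toy_order.items():
    print(name, round(hyb_score(pairs), 1), otu_present(pairs))
print("order-level HybScore:", round(taxon_score(toy_order), 1))
```

Requiring every pair to be positive makes the presence call conservative, while the averaged HybScore dampens the influence of any single probe.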

2.6 Selection of rRNA gene segments for internal standards

SSU rRNA gene segments used as internal standards were chosen because they were presumed to be absent from environmental samples. Data (not shown) from 45 previous environmental sample analyses were screened to identify probe sets that did not produce strong hybridization signals. All probe sets which consistently produced HybScores in the bottom 1% were recorded. The five sequences selected from diverse genera are listed in Table 1 and were synthesized by Sigma–Genosys (St. Louis, MO, USA).

Table 1.  Synthetic SSU rRNA gene segments used for internal standards.

Description               Accession number  Synthesized sequence (5′ to 3′)  % G + C sequence  % G + C probes
Mycoplasma neurolyticum   M23944            TTGTACACACCGCCCGTCACCCATGGG      50.0              45.4
                                            AGTTGGTAATACCCGAAGATGGTTAGT
                                            TAACCTCGGAGGCAACTATCTAAGGTA
                                            GGACTGAGACGGGAAGACGTAACAAG
                                            GTA

Oenococcus oeni           AB054808          TTGTACACACCGCCCGTCACATGGGAG      50.9              50.8
                                            TCGGAAGTACCCAAAGTCGCTTGGCTA
                                            ACTTTTGAGGCCGGTGCCTAAGGTAAA
                                            ATCGATGACTGGGAAGACGTAACAAG
                                            GTA

Saprospira grandis        M58795            TTGTACACACCGCCCGTCAAGTCATGG      51.8              50.9
                                            GAGTCGTGCCTGAAGATGGTGACCTTA
                                            CCAGGAGCTATCTAGGGTAAACCTGGT
                                            GACAGGCACAAGGAAGACGTAACAAG
                                            GTA

Fervidobacterium nodosum  M59177            TTGTACACACCGCCCGTCACGCCACCC      58.2              53.8
                                            GAGTTGCGGGCACCCGAAGACGGTAA
                                            TGCTTAGGCATACCGTTGAGGGAACGT
                                            GGTGAGGGGGACGTAAGACGTAACAA
                                            GGTA

Caulobacter vibrioides    X03428            TACACACCGCCCGTCACAGTTGGACTT      56.4              58.5
                                            TACCCGAGCGCTAGCTCCTAACCTGCT
                                            AAGGGGGCTAGGCTGATTCTGGTAGGG
                                            CCGACTGACTGGGAAGACGTAACAAG
                                            GTA

2.7 Latin Square

Environmental SSU amplicons derived from a mix of 20 aerosol filter samples (DNA extracted using 45-s method) were equally divided into 6 aliquots of 18 μL each. Five dilutions of each synthetic oligonucleotide were quantified using A260 as measured with a Duo 640 (Beckman Coulter, Fullerton, CA) and were added to the environmental SSU amplicons according to Table 2. To distinguish any sequence-specific effects, the concentrations of the internal standards were rotated for each array using a Latin Square format [17]. The target mixtures were fragmented, labeled and hybridized. The intensity data was normalized for array-to-array comparison to create an average HybScore of 2500 a.u.
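The array-to-array normalization mentioned above is not described in detail. One simple reading, shown here as an assumption, is a linear rescaling of each array so that its mean HybScore equals 2500 a.u.; the function name and the toy scores are hypothetical.

```python
TARGET_MEAN = 2500.0  # a.u., the average HybScore stated in the text


def normalize_array(scores, target_mean=TARGET_MEAN):
    """Linearly rescale one array's HybScores so that their mean equals target_mean.

    Assumption: a single multiplicative factor per array. The paper does not
    say whether the scaling was linear or applied to every probe set.
    """
    current_mean = sum(scores) / len(scores)
    factor = target_mean / current_mean
    return [score * factor for score in scores]


# Toy HybScores from two hypothetical arrays of differing overall brightness.
array_a = [400.0, 1800.0, 3200.0, 5600.0]
array_b = [200.0, 900.0, 1600.0, 2800.0]

for raw in (array_a, array_b):
    scaled = normalize_array(raw)
    print([round(value, 1) for value in scaled])
```

In this toy case the second array is exactly half as bright as the first, so both rescale to the same values, which is the behavior a between-array normalization is meant to provide.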

2.8 Simulated community shift

Six cores were removed by a sterile 18-mm diameter cork borer from one filter containing a 6-h air sample from 27 October 1995 at the Lizard Peninsula, mainland England's most southern point. One core was washed with 2 mL sterile water for fungal spore identification using a light microscope (magnification, 400×). DNA was extracted from five cores using various durations of bead beating (see above). Three separate PCR reactions from each DNA preparation were pooled. Each pool was split for four hybridization replicates. As a negative control, replicate PCR was performed without aerosol DNA and products were applied to the array. A phylogenetic order was considered present if at least one probe set corresponding to the order was scored as present. None of the phylogenetic orders listed in Fig. 3 contained any probe sets scored as present among the four negative controls. HybScores were log2 transformed and were used for clustering by array and by OTU using CLUSTFAVOR v6.0 [18] via the UPGMA method and the correlation distance function. The same transformed HybScores were compared with a one-way ANOVA to determine the significance of observed changes.
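The two statistical ingredients named above, the correlation distance used for clustering and the one-way ANOVA, can be written out in a few lines. This sketch does not reproduce CLUSTFAVOR or the UPGMA joining step, it stops at the F statistic without computing a p-value, and the HybScores it uses are invented for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation_distance(a, b):
    """1 minus the Pearson correlation of two equally long score profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical HybScores for one order: four replicates per bead-beating time (s).
raw_scores = {
    0: [310, 290, 305, 298],
    5: [420, 450, 433, 441],
    20: [900, 870, 935, 910],
    45: [2100, 1980, 2050, 2150],
    450: [1950, 2010, 1890, 2040],
}
log2_groups = [[math.log2(v) for v in values] for values in raw_scores.values()]
print("F statistic:", round(one_way_anova_f(log2_groups), 1))
```

A large F statistic indicates that the differences between extraction conditions are large relative to the scatter among replicate hybridizations.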

Figure 3.

Heat map display of hybridization intensities from probes corresponding to 14 orders of prokaryotes and 3 phyla of eukaryotes detected in an aerosol sample using the microarray. Each column in the image represents 1 of 4 replicates for each of 6 conditions. Each row displays the log2 HybScore for a single order/phylum across 24 array experiments. Color gradations are from green to black to red, denoting low to moderate to high intensity, respectively. Horizontal dendrogram was created by clustering rows by correlation distance with branch lengths corresponding to the joining step. For each row, order and class are listed unless otherwise specified. The bead-beating durations corresponding to the peak HybScore for each row are in the right-most column. Peaks not supported as significant by ANOVA (p > 0.0001) are labeled as “n.s.”.

3 Results and discussion

3.1 Simultaneous quantitative response to diverse rRNA gene sequences

We hypothesized that increases in the concentration of OTU-specific DNA sequences would correlate with greater hybridization intensity from that OTU's probe set. Various concentrations of distinct rRNA gene types were hybridized to the array in rotating combinations using a 5 by 5 Latin Square design [17]. The Latin Square was useful for reducing any confounding effect caused by one OTU-specific DNA concentration upon another in the same experiment. Briefly, the Latin Square table used for assessing the quantitative response contained five columns, each representing a single hybridization experiment, and five rows representing the variable concentration of an individual standard across the five experiments. The table was arranged so that no identical columns and no identical rows existed. Specifically, each of five distinct DNA standards was tested once in each concentration category (in picomolar, 4–7, 10–19, 24–45, 57–108, 111–209) concurrent with testing the other standards occupying the remaining concentration categories within the same hybridization solution (Table 2). The Latin Square approach has been successfully applied to oligonucleotide arrays used in quantitative mRNA profiling by adding known concentrations of anti-sense transcripts to a complex cRNA background to establish the relationship between concentration and hybridization intensity [19]. Here, each DNA sequence was individually quantified by UV absorbance then pooled in defined combinations according to Table 2. Each DNA pool in the Latin Square experiment also contained a heterogeneous pool of aerosol rRNA gene PCR amplicons from four locations in England.
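The Latin Square property described above can be checked directly from the concentrations in Table 2. In the sketch below, the concentration values are as read from Table 2 of this text, the category bounds are the five picomolar ranges quoted in the paragraph, and the function names are hypothetical.

```python
# Concentration categories in pM, as quoted in the text.
CATEGORIES = [(4, 7), (10, 19), (24, 45), (57, 108), (111, 209)]

# Concentration (pM by A260) of each standard in DNA pools 1 to 5,
# as read from Table 2.
CONC_BY_POOL = {
    "Mycoplasma neurolyticum": [85, 164, 6, 15, 36],
    "Oenococcus oeni": [4, 10, 24, 57, 111],
    "Saprospira grandis": [45, 108, 209, 7, 19],
    "Fervidobacterium nodosum": [12, 29, 68, 132, 5],
    "Caulobacter vibrioides": [201, 7, 18, 44, 104],
}


def category(conc):
    """Index of the concentration category that contains `conc`."""
    for index, (low, high) in enumerate(CATEGORIES):
        if low <= conc <= high:
            return index
    raise ValueError("concentration %r is outside every category" % conc)


def is_latin_square(grid):
    """True when every row and every column holds each symbol exactly once."""
    expected = set(range(len(grid)))
    rows_ok = all(len(row) == len(grid) and set(row) == expected for row in grid)
    cols_ok = all(set(col) == expected for col in zip(*grid))
    return rows_ok and cols_ok


grid = [[category(c) for c in concs] for concs in CONC_BY_POOL.values()]
for name, row in zip(CONC_BY_POOL, grid):
    print(name, row)
print("Latin Square:", is_latin_square(grid))
```

Rows correspond to standards and columns to hybridization pools, so a valid square means that no standard repeats a category and no pool contains two standards in the same category.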

Table 2 details the experimental setup and the resulting HybScores for the probe sets corresponding to the internal standards. HybScores were plotted against their measured concentrations on a log2–log2 scale in Fig. 1. With a sample size of 25 (5 internal standards, 5 concentrations each) and 23 degrees of freedom, the correlation coefficient of 0.917 is significant at the 99% confidence level (α = 0.01). Thus, it can be concluded that a strong linear relationship existed between the concentration of a spiked-in DNA type and its hybridization intensity. It has been reported that microbial genes can be quantified using a microarray [20,21], but here we report, for the first time to our knowledge, a quantitative method using an environmental background where the multiple analytes are from diverse phylogenetic classes and are measured simultaneously.
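The correlation reported above can be recomputed from Table 2 as a rough check. The sketch below uses the 25 concentration and HybScore pairs as read from Table 2 of this text, applies the log2 transform to both axes, and compares the resulting t statistic with the tabulated two-tailed critical value for 23 degrees of freedom at α = 0.01; the variable and function names are hypothetical, and the result should land close to, though not necessarily exactly on, the published r.

```python
import math

# (concentration in pM by A260, HybScore in a.u.) for pools 1 to 5,
# as read from Table 2.
TABLE_2 = {
    "Mycoplasma neurolyticum": [(85, 7621), (164, 8763), (6, 889), (15, 1285), (36, 4948)],
    "Oenococcus oeni": [(4, 137), (10, 304), (24, 2610), (57, 4905), (111, 7680)],
    "Saprospira grandis": [(45, 1622), (108, 4368), (209, 5229), (7, 513), (19, 1051)],
    "Fervidobacterium nodosum": [(12, 686), (29, 3563), (68, 4972), (132, 5052), (5, 389)],
    "Caulobacter vibrioides": [(201, 8265), (7, 747), (18, 1262), (44, 4800), (104, 6516)],
}

T_CRITICAL = 2.807  # tabulated two-tailed critical t for df = 23, alpha = 0.01


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


points = [pair for series in TABLE_2.values() for pair in series]
log_conc = [math.log2(conc) for conc, _ in points]
log_hyb = [math.log2(hyb) for _, hyb in points]

r = pearson_r(log_conc, log_hyb)
df = len(points) - 2
t_stat = r * math.sqrt(df / (1.0 - r * r))

print("n =", len(points), "df =", df)
print("r =", round(r, 3), "t =", round(t_stat, 2))
print("significant at alpha = 0.01:", t_stat > T_CRITICAL)
```

Pooling all five standards into one regression, as done here, treats sequence-specific effects as noise, which is the situation the Latin Square rotation is designed to expose.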

Figure 1.

Log2–log2 plot of the hybridization intensities of probe sets complementary to internal standard DNA against their concentrations as determined independently by A260. Concentrations are expressed in picomolar (pM) units and HybScores are in arbitrary units (a.u.). Color-coded circles indicate internal standard DNA used to produce the linear regression shown with a solid line. The correlation coefficient (r) was 0.917, exceeding the 99% (α = 0.01) confidence level. The G + C content of each sequence and its corresponding probes are shown.

Although it would be difficult to test all probe sets on the SSU microarray for their concentration-versus-intensity relationship, the linear results from the five distinctive spike-ins, with G + C contents ranging from 48.4% to 58.5%, suggest that concentrations of multiple environmental ribosomal nucleic acid types may be simultaneously differentiated. This allows the microarray to measure population shifts in complex environments by comparing the relative fluorescent intensities from identical probes among different experiments, especially when the hybridization scores are not dependent on a sole probe but on an average from a probe set [22]. Even when a particular probe set's attributes are suspected to vary from those of the DNA standards, a change in the HybScore from experiment to experiment indicates a change in relative abundance of the amplicon concentration in the sample. As an example, a high G + C probe set, outside of the range of those tested in this experiment, may give a higher intensity reading than a lower G + C probe set at an identical concentration, but the linear response of the individual probes within the probe set will identify changes in the specific sequence concentration when other samples are tested.

When attempting to quantify environmental amplicons using both cloning-and-sequencing and the SSU microarray, we had previously reported a lack of correlation between the numbers of clones from a phylogenetic taxon and the corresponding hybridization score [9]. Junca and Pieper [23] have documented an analogous discrepancy when sampling environmental PCR products by cloning versus SSCP. It is possible that the cloning process itself is limited due to non-random selection from a heterogeneous pool when amplicons are non-uniform in length [24] or form variable secondary structures [25] or by the large amount of sampling required to identify clones from amplicons of low abundance [26]. Conditions exist which could confound accurate concentration determination by the microarray, such as non-uniform probe melting temperatures [27] or secondary structures [28], but these conditions do not appear to be problematic when using sets of probes, such as the five sets tested.

3.2 Simulated community shift

It has previously been demonstrated that genomic DNA extraction technique can impact perceived community composition [16,29]. This understanding was exploited to simulate a shift in community structure using environmental material collected by an air sampler. From microscopic fungal identification, 135 ascospores per cubic meter with morphology resembling Cladosporium (phylum Ascomycota) were found in a UK sample from the Lizard Peninsula. Because ascospores are generally recalcitrant to DNA extraction, their presence in the sample was considered advantageous for testing whether increasing the severity of the extraction protocol would alter the SSU microarray detection of the corresponding amplicons. Environmental DNA was extracted from replicate cores of a single filter by bead-beating for 0, 5, 20, 45 or 450 s to observe differences in the resulting rRNA gene PCR amplicon pool. It was hypothesized that the differences among the amplicon pools could be reproducibly detected using the SSU microarray. Replicate hybridizations of each PCR pool to the SSU microarray allowed comparison of the amplicon diversity generated from each method. As expected, UPGMA statistical analysis of the array results clustered replicate experiments (Fig. 2), providing evidence that the SSU microarray measurements are reproducible. Fig. 3 lists the 14 prokaryotic orders and three eukaryotic phyla detected in at least one of the five experimental conditions. These 17 taxa were clustered according to the correlation of their HybScores across 24 microarrays (20 experimental, 4 negative controls). The largest cluster, comprising Vibrionales, Flavobacteriales, Clostridiales, Rhizobiales, Lactobacillales, Bacillales, Acholeplasmatales, Deferribacterales, Ascomycota and Basidiomycota, was characterized by peak intensities in the 45 or 450-s amplicon community. Mycoplasmatales, Burkholderiales, Sphingomonadales, Sphingobacteriales, and Phaeophyceae (brown algae) all produced their largest HybScores in the 0 or 5 s treatment. In most orders, a single peak was found. For instance, Mycoplasmatales DNA was in the highest concentration after 5 s of bead-beating and steadily decreased with increasing bead-beating duration. Using ANOVA, 15 intensity patterns displayed hybridization scores that varied significantly for at least one of the conditions (p < 0.0001).

Figure 2.

Dendrogram produced from cluster analysis of 24 (6 conditions by 4 replicates) SSU microarray experiments. Each experiment is labeled by duration of bead-beating and replicate number. Input to clustering was log2 HybScores.

These observations indicate that the SSU microarray has the ability to rapidly detect alterations in environmental PCR products and that laboratory methodology can influence perceived community structure. The results presented in Fig. 3 can aid in developing extraction protocols for future work when considered collectively with other factors such as the homogeneity of distribution of microorganisms within a sample and the desired post-extraction DNA fragment size.

When compared to the clone library approach, the SSU microarray has the potential advantage of increased sensitivity. Typical clone libraries which sample only hundreds of amplicons may overlook the low abundance taxa whereas the entire PCR products are exposed to the probes on the microarray allowing detection of amplicons which are orders of magnitude less abundant than the dominant ones.
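The sampling argument above can be made concrete with a small calculation. Assuming clones are drawn independently and without cloning bias (an idealization), the chance that a taxon at relative abundance f is absent from a library of n clones is (1 - f)^n. The abundance and library sizes below are hypothetical values chosen only for illustration.

```python
def probability_missed(relative_abundance, clones_sampled):
    """Chance that no sampled clone belongs to a taxon of the given abundance.

    Assumes independent, unbiased draws from the amplicon pool.
    """
    return (1.0 - relative_abundance) ** clones_sampled


# Hypothetical taxon making up 0.1% of the amplicon pool.
abundance = 0.001
for library_size in (100, 300, 1000):
    missed = probability_missed(abundance, library_size)
    print(library_size, "clones: missed with probability", round(missed, 3))
```

Under these assumptions a library of a few hundred clones would miss such a taxon more often than not, which is the limitation the paragraph contrasts with exposing the whole amplicon pool to the array.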

In theory, the contents of the hybridization solution could be from a direct genomic DNA or rRNA isolation from the environment [30]; however, amplicons were used here due to the limited aerosol biomass collected. It is recognized that variation in rRNA gene copy number and preferential PCR amplification can lead to erroneous conclusions concerning the in situ community structure [31–33]. Nevertheless, the responsiveness of the SSU microarray to multiple changes in analyte concentrations within an environmental background demonstrated a necessary advance toward the goal of high-throughput ecological monitoring.

4 Conclusions

Although investigation of microbial diversity based on nucleic acids is not free from bias, the SSU microarray can aid the researcher in observing types and amounts of DNA corresponding to small-subunit ribosomal genes within a sample more rapidly than the serial cloning approach. As demonstrated here, even something as simple as altering a DNA extraction protocol may have a profound effect on microbial diversity measurements. An advantage of the SSU microarray is that multiple parameters can be reproducibly and statistically assayed in measuring population shifts within a microbial community.

Using a Latin Square, the known concentrations of five divergent rRNA gene types were differentiated within a complex background of environmental PCR products using the SSU microarray. A strong relationship (r = 0.917) was found between the concentration of the internal standards and the hybridization scores they produced. It was concluded that the SSU microarray was able to produce a quantitative response to the five DNA standards applied, implying that environmental analytes can be likewise quantified provided the probe sets have similar properties. This SSU microarray methodology is the first to our knowledge to produce a simultaneous quantitative assessment of multiple rRNA gene types from diverse phylogenetic orders in the presence of an environmental background.

As an illustration of the utility of the SSU microarray, PCR products from simulated community shifts were directly compared using hybridization scores from 21 taxa. Reproducible differences were found between the PCR products, which can facilitate protocol development when targeting specific taxa in future studies.

Acknowledgments

The authors are grateful to Dr. Eoin Brodie for manuscript consultations. This work was performed under the auspices of the US Department of Energy by the University of California, Lawrence Berkeley National Laboratory under Contract No. DE-AC03-76SF00098 and was funded in part by the Chemical and Biological National Security Program NN-22 for the Department of Energy and Hazardous Materials Response Unit of the Federal Bureau of Investigation.

Appendix A Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.femsle.2005.03.016.
