Quantifying microbial communities with 454 pyrosequencing: does read abundance count?


Anthony S. Amend, Fax: (510) 642 4995; E-mail: a.amend@berkeley.edu


Pyrosequencing technologies have revolutionized how we describe and compare complex microbial communities. In 454 pyrosequencing data sets, the abundance of reads pertaining to taxa or phylotypes is commonly interpreted as a measure of genic or taxon abundance, useful for quantitative comparisons of community similarity. Potentially systematic biases inherent in sample processing, amplification and sequencing, however, may alter read abundance and reduce the utility of quantitative metrics. Here, we examine the relationship between read abundance and biological abundance in a sample of house dust spiked with known quantities and identities of fungi along a dilution gradient. Our results show differences in read abundance of one order of magnitude among species. Precision of quantification within species along the dilution gradient varied, with R² ranging from 0.96 to 0.54. Read-quality-based processing stringency profoundly affected the abundance of one species containing long homopolymers, in a manner biased by read orientation. Order-level composition of background environmental fungal communities determined from pyrosequencing data was comparable with that derived from cloning and Sanger sequencing and was not biased by read orientation. We conclude that read abundance is approximately quantitative within species, but between-species comparisons can be biased by innate sequence structure. Our results show a trade-off between sequence quality stringency and quantification. Careful consideration of sequence processing methods and community analyses is warranted when testing hypotheses using read abundance data.
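As an illustration of the within-species precision measure quoted above, the following minimal sketch computes R² for a least-squares line relating read abundance to known input quantity along a dilution gradient. The function name, the log10 transformation and all numeric values are assumptions made for illustration only; they are not the data or the exact procedure of this study.

```python
import math


def r_squared(x, y):
    """R^2 of a simple least-squares line y = a + b*x.

    For a one-predictor linear fit with an intercept, R^2 equals the
    squared Pearson correlation between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values for one spiked species: known input quantity at each
# dilution step and the number of reads assigned to that species.
input_quantity = [1, 10, 100, 1000, 10000]
read_counts = [3, 21, 150, 1800, 12000]

# Log-transforming both axes is an assumption of this sketch.
log_input = [math.log10(q) for q in input_quantity]
log_reads = [math.log10(r) for r in read_counts]

print(f"R^2 = {r_squared(log_input, log_reads):.3f}")
```

Running the same calculation separately for each spiked species would give one R² per species, which is the kind of within-species comparison the range 0.96 to 0.54 summarizes.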