Next-generation sequencing for molecular ecology: a caveat regarding pooled samples

Authors

  • Eric C. Anderson,

    Corresponding author
    1. Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Santa Cruz, CA, USA
    2. Department of Applied Math and Statistics (SOE2), University of California, Santa Cruz, CA, USA
  • Hans J. Skaug,

    1. Department of Applied Math and Statistics (SOE2), University of California, Santa Cruz, CA, USA
    2. Department of Mathematics, University of Bergen, Bergen, Norway
  • Daniel J. Barshis

    1. Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Santa Cruz, CA, USA

Abstract

We develop a model based on the Dirichlet-compound multinomial distribution (CMD) and the Ewens sampling formula to predict the fraction of SNP loci that will appear fixed for alternate alleles between two pooled samples drawn from the same underlying population. We apply this model to next-generation sequencing (NGS) data from Baltic Sea herring recently published by Corander et al. (2013, Molecular Ecology, 2931–2940), and show that there are many more fixed loci than expected in the absence of genetic structure. However, we show through coalescent simulations that the degree of population structure required to explain the fraction of alternatively fixed SNPs is extraordinarily high, and that the surplus of fixed loci is more likely a consequence of limited representation of individual gene copies in the pooled samples than of population structure. Our analysis signals that the use of NGS on pooled samples to identify divergent SNPs warrants caution. With pooled samples, it is hard to diagnose when an NGS experiment has gone awry. Especially when NGS data on pooled samples are of low read depth with a limited number of individuals, it may be worthwhile to temper claims of unexpected population differentiation from pooled samples, pending verification with more reliable methods or stricter adherence to recommended sampling designs for pooled sequencing (e.g. Futschik & Schlötterer 2010, Genetics, 186, 207; Gautier et al., 2013a, Molecular Ecology, 3766–3779). Analysis of the data and diagnosis of problems are easier and more reliable (and can be less costly) with individually barcoded samples. Consequently, for some scenarios, individual barcoding may be preferable to pooling of samples.
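
The effect of limited gene-copy representation can be illustrated with a small Monte Carlo sketch. This is not the Dirichlet-CMD and Ewens sampling formula model described in the abstract; it is a simpler stand-in built on assumptions that are ours alone: allele frequencies are drawn uniformly on (0, 1), each pool is represented by `k` effectively contributing gene copies, and `depth` reads per locus are drawn with replacement from those copies. All function and parameter names are hypothetical.

```python
import random


def pool_appears_fixed(p, k, depth, rng):
    """Return 'A' or 'B' if every read from one pool shows that allele, else None.

    p     -- population frequency of allele A (assumed shared by both pools)
    k     -- number of gene copies that effectively contribute DNA (assumption)
    depth -- number of reads observed at the locus
    """
    # Sample the k contributing gene copies from the population.
    copies_a = sum(rng.random() < p for _ in range(k))
    frac_a = copies_a / k
    # Sample reads with replacement from the contributing copies.
    reads_a = sum(rng.random() < frac_a for _ in range(depth))
    if reads_a == depth:
        return "A"
    if reads_a == 0:
        return "B"
    return None


def fraction_alternately_fixed(k, depth, n_loci, seed=0):
    """Fraction of simulated loci at which two pools from the SAME population
    appear fixed for alternate alleles."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_loci):
        p = rng.random()  # assumed uniform allele-frequency spectrum
        first = pool_appears_fixed(p, k, depth, rng)
        second = pool_appears_fixed(p, k, depth, rng)
        if first is not None and second is not None and first != second:
            hits += 1
    return hits / n_loci


if __name__ == "__main__":
    for k in (2, 10, 100):
        frac = fraction_alternately_fixed(k=k, depth=10, n_loci=5000)
        print(f"k={k:3d} contributing copies: {frac:.4f} alternately fixed")
```

Under these assumptions, the fraction of apparently fixed loci is far larger when only a few gene copies contribute to each pool than when many do, even though both pools are drawn from a single population with no structure.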
