Population genomics from pool sequencing

Authors

  • Luca Ferretti,

    Corresponding author
    1. Center for Research in Agricultural Genomics (CRAG), UAB, Bellaterra, Spain
  • Sebastián E. Ramos-Onsins,

    1. Center for Research in Agricultural Genomics (CRAG), UAB, Bellaterra, Spain
  • Miguel Pérez-Enciso

    1. Center for Research in Agricultural Genomics (CRAG), UAB, Bellaterra, Spain
    2. Department of Animal Science and Food, Faculty of Veterinary, Universitat Autonoma de Barcelona, Bellaterra, Spain
    3. Institut Català de Recerca i Estudis Avancats (ICREA), Barcelona, Spain

Abstract

Next-generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θ_W, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, the McDonald-Kreitman and HKA tests, and F_ST, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics varies with read depth and sample size in different evolutionary scenarios. As an application, we reanalyse sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budgets, for the study of communities of individuals that are hard to isolate, or for autopolyploid species.
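
For orientation, the two simplest frequency-spectrum statistics named in the abstract have standard textbook forms for individually sequenced samples. The sketch below illustrates those classical forms only; it is not the pool-corrected estimators that the paper derives, and it applies no correction for sequencing errors or ascertainment bias. The function names and the input layout (an unfolded site frequency spectrum stored as a list) are assumptions made for this example.

```python
from typing import Sequence


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs: Sequence[float], n: int) -> float:
    """Classical Watterson estimator: S / a_n.

    sfs[i-1] is the number of segregating sites at which the derived
    allele is carried by exactly i of the n sequences (i = 1 .. n-1).
    This layout is an assumption of the sketch.
    """
    segregating_sites = sum(sfs)
    return segregating_sites / harmonic(n)


def pairwise_diversity(sfs: Sequence[float], n: int) -> float:
    """Classical nucleotide pairwise diversity from the same spectrum.

    A site with derived count i differs in i * (n - i) of the
    n * (n - 1) / 2 possible pairs of sequences.
    """
    pairs = n * (n - 1) / 2.0
    return sum(xi * i * (n - i) for i, xi in enumerate(sfs, start=1)) / pairs


if __name__ == "__main__":
    # Hypothetical spectrum for n = 4 sequences, shaped like the neutral
    # expectation xi_i = theta / i; both estimators then agree.
    spectrum = [6, 3, 2]
    print(watterson_theta(spectrum, 4), pairwise_diversity(spectrum, 4))
```

Both functions return per-region values; dividing by the number of sites analysed gives per-site estimates. When the spectrum follows the neutral expectation the two estimates coincide, and Tajima's D is built from their difference.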
