Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics

Authors

  • B. Nevado,

    Corresponding author
    1. Centre for Research in Agricultural Genomics, Campus UAB, Bellaterra, Spain
    2. Universitat Autònoma de Barcelona, Bellaterra, Spain
    Current affiliation:
    1. Campus UAB—Edifici CRAG, Bellaterra, Barcelona, Spain
  • S. E. Ramos-Onsins,

    1. Centre for Research in Agricultural Genomics, Campus UAB, Bellaterra, Spain
  • M. Perez-Enciso

    1. Centre for Research in Agricultural Genomics, Campus UAB, Bellaterra, Spain
    2. Universitat Autònoma de Barcelona, Bellaterra, Spain
    3. Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Abstract

Decreasing costs of next-generation sequencing (NGS) experiments have opened a wide range of genomic questions to study in nonmodel organisms. However, designing experiments and analysing NGS data from less well-known species is challenging because of the lack of genomic resources. In this work, we investigate the performance of alternative experimental designs and bioinformatics approaches in estimating variability and neutrality tests based on the site-frequency spectrum (SFS) from individual resequencing data. We pay particular attention to the challenges faced in the study of nonmodel organisms, above all the absence of a species-specific reference genome, although phylogenetically close genomes are assumed to be available. We compare the performance of three alternative bioinformatics approaches: genotype calling, genotype–haplotype calling and direct estimation without calling genotypes. We find that relying on genotype calls provides biased estimates of population genetic statistics at low to moderate read depth (2–8×). Genotype–haplotype calling returns more accurate estimates irrespective of the divergence to the reference genome, but requires moderate depth (8–20×). Direct estimation without calling genotypes returns the most accurate estimates of variability and of most SFS tests investigated, including at low read depth (2–4×). Studies without a species-specific reference genome should thus aim for low read depth and avoid genotype calling whenever individual genotypes are not essential. Otherwise, aiming for moderate to high depth at the expense of the number of individuals, and using genotype–haplotype calling, is recommended.
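
As an illustration of what SFS-based variability estimates and neutrality tests look like in practice, the sketch below computes Watterson's theta, nucleotide diversity (pi) and Tajima's D from an unfolded SFS. These are standard textbook estimators chosen here as examples; the abstract does not list which statistics the authors evaluated, and this is not their pipeline. The function name `sfs_summary` and the input layout are assumptions made for this example, and the input is taken to be an SFS that is already known without error, which is precisely the part that low read depth makes difficult.

```python
from math import sqrt


def sfs_summary(sfs):
    """Return (theta_w, pi, tajima_d) from an unfolded site-frequency spectrum.

    Assumed layout (hypothetical, for this sketch): sfs[i - 1] is the number
    of segregating sites whose derived allele appears in i of the n sampled
    sequences, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1            # number of sampled sequences
    s = sum(sfs)                # number of segregating sites

    # Harmonic sums used by Watterson's estimator and by Tajima's D.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    # Watterson's theta: segregating sites scaled by a1.
    theta_w = s / a1

    # Nucleotide diversity: mean number of pairwise differences.
    pi = sum(c * 2.0 * i * (n - i) for i, c in enumerate(sfs, start=1))
    pi /= n * (n - 1)

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    var = e1 * s + e2 * s * (s - 1)
    # D is undefined when there is no variation (or n is too small).
    tajima_d = (pi - theta_w) / sqrt(var) if var > 0 else float("nan")
    return theta_w, pi, tajima_d


# Example: 5 sequences, SFS proportional to 1/i (the neutral expectation).
theta_w, pi, d = sfs_summary([12, 6, 4, 3])
print(theta_w, pi, d)
```

When the SFS follows the neutral expectation (counts proportional to 1/i), the two theta estimators agree and D is zero; an excess of singletons pulls pi below Watterson's theta and makes D negative. Errors in the inferred SFS, such as missed rare variants at low depth, therefore shift these statistics directly.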