Restriction site-associated DNA sequencing, genotyping error estimation and de novo assembly optimization for population genetic inference

Authors

  • A. Mastretta-Yanes,

    Corresponding author
    1. Centre for Ecology, Evolution and Conservation, School of Biological Sciences, University of East Anglia, Norwich, UK
    • Correspondence: Alicia Mastretta-Yanes, Fax: +34 922 260135; E-mail: A.Yanes@uea.ac.uk

  • N. Arrigo,

    1. Department of Ecology and Evolution, Biophore Building, University of Lausanne, Lausanne, Switzerland
  • N. Alvarez,

    1. Department of Ecology and Evolution, Biophore Building, University of Lausanne, Lausanne, Switzerland
  • T. H. Jorgensen,

    1. Department of Bioscience, Aarhus University, Universitets Parken, Aarhus, Denmark
  • D. Piñero,

    1. Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico, DF, Mexico
  • B. C. Emerson

    1. Centre for Ecology, Evolution and Conservation, School of Biological Sciences, University of East Anglia, Norwich, UK
    2. Island Ecology and Evolution Research Group, Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), La Laguna, Tenerife, Canary Islands, Spain

Abstract

Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, yet the estimation and reporting of genotyping error rates have received only limited attention. Here we use individual sample replicates, which are expected to yield identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
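The replicate-based idea can be illustrated with a minimal sketch. Because two replicates of the same individual should carry identical genotypes, any disagreement between them is counted as error. The definitions below are assumptions for illustration and may differ from those used in the study: the SNP error rate is taken as the fraction of SNPs called in both replicates whose genotypes disagree, and the locus error rate as the fraction of loci recovered in only one of the two replicates. The function names and the dictionary/set input format are hypothetical and do not correspond to Stacks output formats.

```python
from typing import Dict, Optional, Set, Tuple

# A diploid genotype as a pair of alleles; None marks a missing call.
Genotype = Optional[Tuple[str, str]]


def snp_error_rate(rep_a: Dict[str, Genotype], rep_b: Dict[str, Genotype]) -> float:
    """Hypothetical: fraction of SNPs called in both replicates whose genotypes differ."""
    compared = mismatched = 0
    for snp_id in rep_a.keys() & rep_b.keys():
        ga, gb = rep_a[snp_id], rep_b[snp_id]
        if ga is None or gb is None:
            continue  # assumption: missing calls are excluded, not scored as errors
        compared += 1
        if sorted(ga) != sorted(gb):  # genotypes are unordered, so A/G equals G/A
            mismatched += 1
    return mismatched / compared if compared else float("nan")


def locus_error_rate(loci_a: Set[str], loci_b: Set[str]) -> float:
    """Hypothetical: fraction of loci recovered in only one of the two replicates."""
    union = loci_a | loci_b
    return len(loci_a ^ loci_b) / len(union) if union else float("nan")


# Toy replicate pair (invented values, for illustration only).
rep1 = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": None}
rep2 = {"snp1": ("G", "A"), "snp2": ("C", "T"), "snp3": ("T", "T")}
print(snp_error_rate(rep1, rep2))  # 0.5: snp1 agrees, snp2 disagrees, snp3 skipped
print(locus_error_rate({"L1", "L2", "L3"}, {"L1", "L2", "L4"}))  # 0.5
```

Treating missing calls as "not compared" keeps the SNP rate focused on miscalled genotypes; loci that drop out of one replicate are captured separately by the locus rate.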
