Keywords:

  • allele dropout;
  • allele frequency;
  • FST;
  • heterozygosity;
  • next-generation sequencing;
  • RAD markers;
  • single-nucleotide polymorphisms

Abstract

Inexpensive short-read sequencing technologies applied to reduced-representation genomes are revolutionizing genetic research, especially population genetic analysis, by allowing massive numbers of single-nucleotide polymorphisms (SNPs) to be genotyped in large numbers of individuals and populations. Restriction site–associated DNA (RAD) sequencing is a recent technique based on the characterization of genomic regions flanking restriction sites. One of its potential drawbacks is polymorphism within the restriction site itself, which makes it impossible to observe the associated SNP allele (i.e. allele dropout, ADO). To investigate the effect of ADO on genetic variation estimated from RAD markers, we first mathematically derived measures of the effect of ADO on allele frequencies as a function of different parameters within a single population. We then used RAD data sets simulated under a coalescent model to investigate the magnitude of the biases induced by ADO in estimates of expected heterozygosity and FST under a simple demographic model of divergence between two populations. We found that ADO tends to cause genetic variation to be overestimated both within and between populations. Assuming a mutation rate per nucleotide between 10^-9 and 10^-8, this bias remained low for most of the studied combinations of divergence time and effective population size, except for large effective population sizes. Averaging FST values over multiple SNPs, for example by sliding-window analysis, did not correct ADO biases. We briefly discuss possible solutions for filtering out the most problematic cases of ADO, using read coverage to detect markers with a large excess of null alleles.
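As a rough illustration of how ADO can shift an allele frequency at a single locus, the sketch below computes the frequency observed among chromosomes that still yield a RAD locus. It is not the derivation referred to in the abstract. The function names, the example values, and the assumption that every null (dropped-out) chromosome carried the same SNP allele are all hypothetical and chosen only for illustration.

```python
def observed_frequency(p_true, null_freq):
    """Frequency of SNP allele A among chromosomes that still yield a RAD locus.

    Illustrative assumption: every chromosome lost to ADO carried allele A,
    so null_freq cannot exceed p_true.
    """
    if not (0.0 <= null_freq <= p_true <= 1.0) or null_freq >= 1.0:
        raise ValueError("need 0 <= null_freq <= p_true <= 1 and null_freq < 1")
    # Null chromosomes are removed from both the A count and the total count.
    return (p_true - null_freq) / (1.0 - null_freq)


def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


# Hypothetical example values: A is the major allele and 10% of all
# chromosomes carry a mutated restriction site on an A background.
p_true = 0.8
null_freq = 0.1
p_obs = observed_frequency(p_true, null_freq)
h_true = expected_heterozygosity(p_true)
h_obs = expected_heterozygosity(p_obs)
print(f"true p = {p_true:.3f}, observed p = {p_obs:.3f}")
print(f"true He = {h_true:.3f}, observed He = {h_obs:.3f}")
```

In this example the null allele sits on the background of the major allele, so the observed frequency moves toward 0.5 and expected heterozygosity rises. If the null allele sat on the minor-allele background instead, the same formula would give a lower heterozygosity, so this single-locus sketch shows only the mechanism and not the net bias reported from the simulated data sets.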