SNP development from RNA-seq data in a nonmodel fish: how many individuals are needed for accurate allele frequency prediction?



Single nucleotide polymorphisms (SNPs) are rapidly becoming the marker of choice in population genetics due to a variety of advantages relative to other markers, including higher genomic density, data quality, reproducibility and genotyping efficiency, as well as ease of portability between laboratories. Advances in sequencing technology and methodologies to reduce genomic representation have made the isolation of SNPs feasible for nonmodel organisms. RNA-seq is one such technique for the discovery of SNPs and development of markers for large-scale genotyping. Here, we report the development of 192 validated SNP markers for parentage analysis in Tripterygion delaisi (the black-faced blenny), a small rocky-shore fish from the Mediterranean Sea. RNA-seq data for 15 individual samples were used for SNP discovery by applying a series of selection criteria. Genotypes were then collected from 1599 individuals from the same population with the resulting loci. Differences in heterozygosity and allele frequencies were found between the two data sets. Heterozygosity was lower, on average, in the population sample, and the mean difference between the frequencies of particular alleles in the two data sets was 0.135 ± 0.100. We used bootstrap resampling of the sequence data to predict appropriate sample sizes for SNP discovery. As cDNA library production is time-consuming and expensive, we suggest that using seven individuals for RNA sequencing reduces the probability of discarding highly informative SNP loci, due to lack of observed polymorphism, whereas use of more than 12 samples does not considerably improve prediction of true allele frequencies.
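The bootstrap reasoning above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0/1/2 genotype coding, and the example genotypes are all hypothetical, and the polymorphism formula assumes allele copies are drawn independently (random mating). The sketch resamples k individuals with replacement from a discovery panel, measures how far the resulting allele frequency estimate falls from a reference population frequency, and computes the chance that a locus appears polymorphic in k diploid individuals.

```python
import random


def allele_freq(genotypes):
    """Frequency of the alternate allele from diploid genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def bootstrap_error(genotypes, pop_freq, k, n_boot=1000, seed=0):
    """Mean absolute difference between pop_freq and the allele frequency
    estimated from k individuals resampled with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(genotypes) for _ in range(k)]
        total += abs(allele_freq(sample) - pop_freq)
    return total / n_boot


def prob_polymorphic(p, k):
    """Probability that both alleles are seen among the 2k allele copies of
    k diploid individuals. Assumes independent draws at frequency p."""
    return 1 - p ** (2 * k) - (1 - p) ** (2 * k)


if __name__ == "__main__":
    # Hypothetical discovery panel of 15 individuals at one locus.
    panel = [0, 0, 1, 1, 2, 0, 1, 0, 1, 2, 0, 1, 1, 0, 0]
    reference = allele_freq(panel)  # stand-in for a population frequency
    for k in (3, 7, 12, 15):
        err = bootstrap_error(panel, reference, k)
        seen = prob_polymorphic(reference, k)
        print(f"k={k:2d}  mean abs error={err:.3f}  P(polymorphic)={seen:.3f}")
```

Running the loop over several values of k shows the general pattern the abstract describes: the error shrinks with more individuals, but with diminishing returns. The specific thresholds of seven and 12 individuals come from the authors' data and are not reproduced by this toy example.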