Volume 16, Issue 3
Resource Article

Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids

Paul D. Blischak

Corresponding Author

Department of Evolution, Ecology and Organismal Biology, Ohio State University, 318 W. 12th Avenue, Columbus, OH, 43210 USA

Correspondence: Paul D. Blischak, Fax: +1 614 292 2030; E‐mail: blischak.4@osu.edu
Laura S. Kubatko

Department of Evolution, Ecology and Organismal Biology, Ohio State University, 318 W. 12th Avenue, Columbus, OH, 43210 USA

Department of Statistics, Ohio State University, 1958 Neil Avenue, Columbus, OH, 43210 USA

Andrea D. Wolfe

Department of Evolution, Ecology and Organismal Biology, Ohio State University, 318 W. 12th Avenue, Columbus, OH, 43210 USA

First published: 26 November 2015
Citations: 30

Abstract

Despite the increasing opportunity to collect large‐scale data sets for population genomic analyses, high‐throughput sequencing has seen little application to populations of polyploids. This is due in large part to the difficulty of determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high‐throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction‐enzyme‐based techniques such as GBS and RADseq) to allele frequency estimation in a unified inferential framework, using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated for tetraploid, hexaploid and octoploid populations under varying numbers of sampled individuals and levels of sequencing coverage, both to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences about autopolyploids beyond allele frequency estimation, by providing assessments of model adequacy and estimates of heterozygosity.
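The central idea of the abstract, treating each individual's allele dosage as an unobserved variable and summing over it instead of calling genotypes, can be illustrated with a small sketch. This is not the polyfreqs implementation. It assumes a simplified model: each genotype is a binomial draw of ψ (`ploidy`) allele copies with reference‐allele frequency p, reference‐read counts are binomial given the genotype with a symmetric sequencing‐error rate `err`, and p has a uniform prior. The posterior is evaluated on a grid for transparency; the package itself may use a different estimation procedure (for example MCMC). All function names and the example read counts are hypothetical.

```python
import math


def read_ref_prob(g, ploidy, err):
    """Probability that one read carries the reference allele when the
    individual has g reference copies out of `ploidy` (assumes a symmetric
    sequencing-error rate `err`)."""
    q = g / ploidy
    return q * (1.0 - err) + (1.0 - q) * err


def binom_pmf(k, n, prob):
    """Binomial probability mass function."""
    return math.comb(n, k) * prob**k * (1.0 - prob) ** (n - k)


def locus_log_likelihood(p, ref_counts, total_counts, ploidy, err=0.01):
    """Log-likelihood of reference-allele frequency p at one biallelic SNP.

    Each individual's genotype (0..ploidy reference copies) is unobserved,
    so its probability is summed out rather than called."""
    ll = 0.0
    for r, t in zip(ref_counts, total_counts):
        lik = sum(
            binom_pmf(g, ploidy, p)                              # P(genotype | p)
            * binom_pmf(r, t, read_ref_prob(g, ploidy, err))     # P(reads | genotype)
            for g in range(ploidy + 1)
        )
        ll += math.log(lik)
    return ll


def posterior_on_grid(ref_counts, total_counts, ploidy, err=0.01, n_grid=999):
    """Normalized posterior of p on an interior grid under a uniform prior."""
    grid = [(i + 1) / (n_grid + 1) for i in range(n_grid)]
    logs = [locus_log_likelihood(p, ref_counts, total_counts, ploidy, err) for p in grid]
    top = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(ref_counts, total_counts, ploidy, err=0.01):
    """Posterior mean of p, used here as a simple point estimate."""
    grid, post = posterior_on_grid(ref_counts, total_counts, ploidy, err)
    return sum(p * w for p, w in zip(grid, post))


if __name__ == "__main__":
    # Hypothetical tetraploid read counts for five individuals at one SNP.
    ref = [12, 0, 7, 15, 3]
    tot = [15, 10, 14, 15, 12]
    print(round(posterior_mean(ref, tot, ploidy=4), 3))
```

Because no genotype is ever fixed, an individual with few reads contributes a flat, uninformative likelihood rather than a possibly wrong dosage call, which is one way to see why adding individuals can matter more than adding coverage per individual.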

