Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids
Abstract
Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, high-throughput sequencing has seen little application to the study of polyploid populations. This is due in large part to the difficulty of determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction enzyme-based techniques such as GBS and RADseq) to allele frequency estimation in a unified inferential framework, using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software support inferences beyond the estimation of allele frequencies for autopolyploids, by providing assessments of model adequacy and estimates of heterozygosity.
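The idea of summing over genotype uncertainty can be illustrated with a minimal Python sketch for a single biallelic locus. It is not the polyfreqs implementation or its API. It rests on assumptions that the abstract does not state: genotypes (reference allele copy numbers) are drawn as Binomial(ploidy, p) given the allele frequency p, reference read counts are Binomial given the genotype, and there is a fixed per-read error rate `eps`. It also replaces the hierarchical Bayesian inference with a simple grid search for the value of p that maximizes the marginal likelihood. All function and parameter names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def marginal_log_lik(p, ref_reads, total_reads, ploidy, eps=0.01):
    """Log-likelihood of allele frequency p with genotypes summed out.

    Assumptions (illustrative only): genotype g ~ Binomial(ploidy, p),
    reference reads ~ Binomial(total, p_read), where p_read mixes the
    true dosage g / ploidy with a per-read error rate eps.
    """
    log_lik = 0.0
    for r, t in zip(ref_reads, total_reads):
        site = 0.0
        for g in range(ploidy + 1):
            dosage = g / ploidy
            p_read = dosage * (1.0 - eps) + (1.0 - dosage) * eps
            # Weight the read likelihood by the genotype probability.
            site += binom_pmf(g, ploidy, p) * binom_pmf(r, t, p_read)
        log_lik += math.log(site)
    return log_lik


def grid_estimate(ref_reads, total_reads, ploidy, eps=0.01, steps=199):
    """Return the grid value of p with the highest marginal likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(
        grid,
        key=lambda p: marginal_log_lik(p, ref_reads, total_reads, ploidy, eps),
    )
```

For example, `grid_estimate([0, 10, 5, 5], [10, 10, 10, 10], ploidy=4)` takes the reference and total read counts of four tetraploid individuals and returns a point estimate of the reference allele frequency without ever fixing any individual's genotype.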