Volume 20, Issue 4
RESOURCE ARTICLE

jackalope: A swift, versatile phylogenomic and high‐throughput sequencing simulator

Lucas A. Nell

Corresponding Author

E-mail address: lucas@lucasnell.com

Department of Integrative Biology, University of Wisconsin, Madison, WI, USA

First published: 22 April 2020

Abstract

High‐throughput sequencing (HTS) is central to the study of population genomics and has an increasingly important role in constructing phylogenies. Choices in research design for sequencing projects can include a wide range of factors, such as sequencing platform, depth of coverage and bioinformatic tools. Simulating HTS data better informs these decisions, as users can validate software by comparing output to the known simulation parameters. However, current standalone HTS simulators cannot generate variant haplotypes under even somewhat complex evolutionary scenarios, such as recombination or demographic change. This greatly reduces their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope that simply and efficiently simulates (i) sets of variant haplotypes from a reference genome and (ii) reads from both Illumina and Pacific Biosciences platforms. Haplotypes can be simulated using phylogenies, gene trees, coalescent‐simulation output, population‐genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single, paired‐end or mate‐pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing and optical/PCR duplicates. It can read reference genomes from fasta files and can simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows and Linux systems.

DATA AVAILABILITY STATEMENT

jackalope is open source, under the MIT licence. The stable version of jackalope is available on CRAN (https://CRAN.R‐project.org/package=jackalope), and the development version is on GitHub (https://github.com/lucasnell/jackalope). The documentation can be found at https://jackalope.lucasnell.com. The version used in this paper was 1.1.0. Code for the example usage, validation and performance is available on GitHub at https://github.com/lucasnell/jlp_ms.
