DnaSAM: Software to perform neutrality testing for large datasets with complex null models

Authors

  • ANDREW J. ECKERT,

    1. Section of Evolution and Ecology, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA
    2. Center for Population Biology
    • 1

      These authors contributed equally to this work.

  • JOHN D. LIECHTY,

    1. Department of Plant Sciences, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA
    • 1

      These authors contributed equally to this work.

  • BRANDON R. TEARSE,

    1. Department of Plant Sciences, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA
  • BARNALY PANDE,

    1. Department of Plant Sciences, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA
  • DAVID B. NEALE

    1. Department of Plant Sciences, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA

Correspondence: David B. Neale, Fax: 530-754-9366; E-mail: dbneale@ucdavis.edu

Abstract

Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. Historically, high-throughput generation of DNA sequence data has been the bottleneck for data processing and experimental inference. Advances in marker technologies have largely solved this problem. The limiting step is now computational, because most molecular population genetic software supports only gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that combines high-throughput processing of multiple sequence alignments with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data, along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which can incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics, with associated probability values under a user-specified null model, stored in easy-to-manipulate text files.
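The abstract describes a two-step workflow: compute a neutrality statistic from an alignment, then locate it in a null distribution generated by coalescent simulation. The Python sketch below illustrates that idea under stated assumptions; it is not DnaSAM's code. It uses Tajima's D as an example statistic (the abstract does not say which statistics DnaSAM reports), and a minimal pure-Python standard neutral coalescent (constant population size, no recombination, infinite sites, theta set to Watterson's estimate) stands in for ms, which supports much richer models. The function names (`diversity_stats`, `tajimas_d`, `simulate_neutral`, `neutrality_test`) and the use of one-tailed simulation proportions as probability values are hypothetical choices made for illustration.

```python
import itertools
import math
import random


def diversity_stats(alignment):
    """Return (n, S, pi) for equal-length, gap-free aligned sequences."""
    n = len(alignment)
    # S: alignment columns carrying more than one distinct character
    seg_sites = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    # pi: mean number of differences over all n*(n-1)/2 sequence pairs
    diffs = sum(
        sum(a != b for a, b in zip(x, y))
        for x, y in itertools.combinations(alignment, 2)
    )
    return n, seg_sites, diffs / (n * (n - 1) / 2)


def tajimas_d(alignment):
    """Tajima's D; returns None when the alignment has no segregating sites."""
    if len(alignment) < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    n, s, pi = diversity_stats(alignment)
    if s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals falling in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_neutral(n, theta, rng):
    """Standard neutral coalescent (constant size, no recombination, infinite sites).

    Time is in units of 2N generations: with k lineages the waiting time is
    exponential with rate k(k-1)/2, and each lineage accumulates mutations at
    rate theta/2 per unit time (theta = 4N*mu). Returns n haplotypes as strings
    of '0' (ancestral) and '1' (derived).
    """
    lineages = [frozenset([i]) for i in range(n)]  # samples below each lineage
    sites = []  # one entry per mutation: the set of samples carrying it
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)
        for lineage in lineages:
            sites.extend([lineage] * poisson(theta / 2 * t, rng))
        i, j = rng.sample(range(k), 2)  # pick two lineages to coalesce
        merged = lineages[i] | lineages[j]
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return ["".join("1" if s in site else "0" for site in sites) for s in range(n)]


def neutrality_test(alignment, n_reps=1000, seed=1):
    """Return (observed D, lower-tail p, upper-tail p) against simulated null D values."""
    observed = tajimas_d(alignment)
    if observed is None:
        raise ValueError("no segregating sites; D is undefined")
    n, s, _ = diversity_stats(alignment)
    theta = s / sum(1 / i for i in range(1, n))  # Watterson's estimator
    rng = random.Random(seed)
    null = [tajimas_d(simulate_neutral(n, theta, rng)) for _ in range(n_reps)]
    null = [d for d in null if d is not None]  # drop replicates with S = 0
    if not null:
        raise ValueError("every simulated replicate had S = 0")
    p_lower = sum(d <= observed for d in null) / len(null)
    p_upper = sum(d >= observed for d in null) / len(null)
    return observed, p_lower, p_upper


if __name__ == "__main__":
    aln = ["ATGCATGCAA", "ATGCATGCTA", "ATGGATGCAA", "ATGCATCCAA", "ATGCATGCAA"]
    d, p_lo, p_hi = neutrality_test(aln, n_reps=2000)
    print(f"Tajima's D = {d:.3f}; P(D_sim <= D) = {p_lo:.3f}; P(D_sim >= D) = {p_hi:.3f}")
```

The toy alignment has three singleton sites, so its D is negative. The reported proportions say how often the simulated neutral model produces a value at least that extreme in each direction. Replacing `simulate_neutral` with replicates generated by ms would be the natural route to the complex null models (recombination, bottlenecks) the abstract mentions.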
