De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set

Authors

  • Sonal Singhal

    Corresponding author
    • Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
    • Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, CA, USA

Correspondence: Sonal Singhal, E-mail: singhal@berkeley.edu

Abstract

High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to query genomic-scale variation quickly and cheaply. Despite the increasing ease of obtaining such data, using them effectively still poses notable challenges, especially for researchers working with organisms that lack a high-quality reference genome. At every stage of analysis, from assembly to annotation to variant discovery, researchers must distinguish technical artefacts from the biological realities of their data before they can draw inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyse these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas for improvement, and propose ways to minimize these errors. I find that, with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.

Ancillary