• annotation
• de novo assembly
• suture zones
• transcriptomes
• variant discovery


High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using them effectively still poses notable challenges, especially for those working with organisms that lack a high-quality reference genome. At every stage of analysis – from assembly to annotation to variant discovery – researchers have to distinguish technical artefacts from the biological realities of their data before they can make inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyse these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas for improvement, and propose ways to minimize errors. I find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms.