De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set
Article first published online: 18 FEB 2013
© 2013 Blackwell Publishing Ltd
Molecular Ecology Resources
Volume 13, Issue 3, pages 403–416, May 2013
How to Cite
Singhal, S. (2013), De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set. Molecular Ecology Resources, 13: 403–416. doi: 10.1111/1755-0998.12077
- Issue published online: 12 APR 2013
- Manuscript Accepted: 22 DEC 2012
- Manuscript Revised: 13 DEC 2012
- Manuscript Received: 7 NOV 2012
- National Science Foundation
- Museum of Vertebrate Zoology Wolff Fund
- Society for the Study of Evolution
Fig. S1: Pipeline used in this work, annotated to show (1) different approaches tested [pink], (2) the approach used for the final analysis [blue], and (3) scripts used, as named in the DataDryad package [green].
Fig. S2: A. Phylogeny of the lineages studied in this work. Boxes indicate contacts studied; the top percentage is the mitochondrial divergence between lineages and the bottom percentage is the nuclear divergence. B. A map of the Australian Wet Tropics, with all identified contact zones represented by black lines. Contacts of interest in this study are labelled.
Fig. S3: Phred quality scores along a read; the top graph shows quality prior to cleaning and filtering, and the bottom graph shows quality after cleaning.
Fig. S4: Identified mismatches between reads from a randomly selected individual and the reference sequence, A. expressed in raw numbers and B. as a density distribution.
Fig. S5: Correlation between contig length and coverage for a randomly selected final assembly.
Fig. S6: Correlation between contig length and polymorphism for a randomly selected final assembly.
Fig. S7: Gene ontology for annotated contigs for a randomly selected lineage, with respect to cellular component, biological process and molecular function.
Fig. S8: Identifying unannotated contigs from a randomly selected assembly, as identified from a BLAST search to the NCBI ‘nr’ nucleotide database.
Fig. S9: Correlation in coverage between homologous, annotated contigs for a randomly selected lineage pair.
Fig. S10: Summary of SNPs found in a randomly selected lineage pair, annotated with respect to SNP and coding type.
Fig. S11: Correlation in sequence divergence (top row) and in inferred dN/dS ratios (bottom row) for homologs from a randomly selected lineage pair, for three methods of homolog discovery: annotation, in which contigs that share the same annotation are inferred to be homologous; BLAST, in which reciprocal best-hit BLAST is used to identify homologs; and SNP methods, in which variant information is used to reconstruct one homolog with respect to another.
Table S1: Individuals included in this study and their associated locality data; individuals are accessioned at the Museum of Vertebrate Zoology at the University of California, Berkeley.
Table S2: Quality-control filters applied to the raw data and their rates, summarized across seven lineages.
Table S3: Number of contigs annotated according to different reference databases for a randomly selected assembly.
Table S4: Prevalence in assemblies of chimerism (percentage of contigs that appeared to consist of multiple genes misassembled together) and of stop codons (percentage of contigs that had nonsense mutations), summarized across seven lineages both before and after the data were run through the annotation pipeline.
Table S5: Number of annotated contigs that have a given coverage for each individual; shown for one randomly selected lineage pair.
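Note on the reciprocal best-hit approach named in the caption for Fig. S11: the idea can be illustrated with a minimal sketch. This is not the author's script from the DataDryad package. The function names, the `(query, subject, score)` tuple layout and the toy contig identifiers are assumptions made for illustration only. Two contigs are called homologs only if each is the other's top-scoring hit.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, score), e.g. a bit score
# parsed from tabular BLAST output. The layout is an assumption for this sketch.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject.

    Ties keep the first hit encountered, because the comparison is strict.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data with made-up contig names and scores.
a_vs_b = [("A1", "B1", 200.0), ("A1", "B2", 150.0),
          ("A2", "B2", 90.0), ("A3", "B1", 80.0)]
b_vs_a = [("B1", "A1", 198.0), ("B1", "A3", 75.0),
          ("B2", "A2", 95.0), ("B2", "A1", 60.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In the toy data, A3's best hit is B1, but B1's best hit is A1, so A3 is left without an inferred homolog. This strictness is the usual trade-off of the reciprocal criterion: fewer pairs, but less risk of pairing paralogs.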