A de novo transcriptome assembly of Lucilia sericata (Diptera: Calliphoridae) with predicted alternative splices, single nucleotide polymorphisms and transcript expression estimates
Article first published online: 29 JAN 2012
© 2012 The Authors. Insect Molecular Biology © 2012 The Royal Entomological Society
Insect Molecular Biology
Volume 21, Issue 2, pages 205–221, April 2012
How to Cite
Sze, S.-H., Dunham, J. P., Carey, B., Chang, P. L., Li, F., Edman, R. M., Fjeldsted, C., Scott, M. J., Nuzhdin, S. V. and Tarone, A. M. (2012), A de novo transcriptome assembly of Lucilia sericata (Diptera: Calliphoridae) with predicted alternative splices, single nucleotide polymorphisms and transcript expression estimates. Insect Molecular Biology, 21: 205–221. doi: 10.1111/j.1365-2583.2011.01127.x
- Issue published online: 8 MAR 2012
Information S1. The postprocessing algorithm used to assemble the Lucilia sericata transcriptome.
Information S2–S19. The results of Lucilia sericata transcriptome assemblies under varying k-mer lengths k and k-mer coverage cut-offs c. Each assembly is designated by its k_c thresholds. S2–S7 correspond to 25_3 assemblies, S8–S13 correspond to 25_5 assemblies, and S14–S19 correspond to 25_10 assemblies.
A complete set of assembly files designated as k_c_splicing_graphs.fa is provided (S3, S9, S15). The splicing graphs are represented in an annotated FASTA format, in which each potentially nonlinear structure is given as a collection of nodes, with connecting edge information embedded within the node names. Different splicing graphs are separated by blank lines. Each node name is given as >NODE_u:v1,v2,...,vp, where u is the ID of the current node and u → v1, u → v2, ..., u → vp are edges in the splicing graph. The node name is followed by reads per kilobase of node per million reads (RPKM) estimates for each library. The order of RPKM values is: embryo, third instar (feeding), third instar (postfeeding), early pupation, mid pupation, adult female, adult male, subtraction, salivary gland.
The k_c_transposable.txt files (S4, S10, S16) designate the graphs in an assembly that were found by hidden Markov models to contain transposable element (TE) related sequences. These files provide the identity of the TE related sequence, the graph ID (NODE) number, the P-value for the best BLAST hit to Drosophila melanogaster (below 10⁻⁷) and the BLAST hit itself (if found). Files labelled k_c_unique_aael.txt (S5, S11, S17) or k_c_unique_agap.txt (S6, S12, S18) provide information regarding genes that are uniquely detected in Aedes aegypti or Anopheles gambiae (P-value below 10⁻⁷). Files labelled k_c_unique_stages.txt (S7, S13, S19) identify graphs that were uniquely detected in specific RNA-Seq libraries derived from different immature developmental stages or different adult sexes. Files labelled k_c_salivary.txt (S2, S8, S14) indicate transcripts detected in salivary glands. Graphs with a * next to their ID information were only detected in third instar RNA-Seq libraries. Information on BLAST hits to D. melanogaster has been provided for transcripts detected during development.
Note that NODE numbers are specific to assembly conditions. Note also that only raw assemblies with k-mer lengths of 25 are reported here because of space limitations. The rest can be found at http://faculty.cse.tamu.edu/shsze/postprocess.
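The annotated FASTA layout described above for the k_c_splicing_graphs.fa files can be read with a short parser. The following is a minimal sketch in Python, not the authors' own tooling. It assumes that the RPKM values follow the node name on the header line and are separated by whitespace, and that a node without outgoing edges is written either as >NODE_u: or >NODE_u; neither detail is confirmed by the description. The names Node, parse_header, read_splicing_graphs and the DEMO records are hypothetical and invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List

# Library order as stated in the description of the splicing graph files.
LIBRARIES = [
    "embryo",
    "third instar (feeding)",
    "third instar (postfeeding)",
    "early pupation",
    "mid pupation",
    "adult female",
    "adult male",
    "subtraction",
    "salivary gland",
]


@dataclass
class Node:
    node_id: str            # u in ">NODE_u:v1,v2,...,vp"
    successors: List[str]   # v1..vp, i.e. the edges u -> v
    rpkm: Dict[str, float]  # RPKM estimate keyed by library name
    sequence: str = ""


def parse_header(line: str) -> Node:
    """Parse one header line.

    ASSUMPTION: the header looks like ">NODE_u:v1,v2 r1 r2 ... r9",
    with whitespace between the node name and each RPKM value.
    """
    name, *values = line[1:].split()
    label, _, edges = name.partition(":")
    node_id = label[len("NODE_"):] if label.startswith("NODE_") else label
    successors = [v for v in edges.split(",") if v]
    rpkm = dict(zip(LIBRARIES, map(float, values)))
    return Node(node_id, successors, rpkm)


def read_splicing_graphs(lines: Iterable[str]) -> Iterator[List[Node]]:
    """Yield one list of nodes per splicing graph (graphs end at blank lines)."""
    graph: List[Node] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if graph:
                yield graph
                graph = []
        elif line.startswith(">"):
            graph.append(parse_header(line))
        elif graph:
            graph[-1].sequence += line  # sequence may span several lines
    if graph:
        yield graph


# Invented records, for illustration only (not taken from the real files).
DEMO = """\
>NODE_1:2,3 4.0 0 0 0 0 0 0 0 0
ACGT
>NODE_2: 1.5 0 0 0 0 0 0 0 0
GG
>NODE_3 0 0 0 0 0 0 0 0 2.0
TT

>NODE_4 0 0 0 0 0 0 1.0 0 0
AAAA
CC
"""

graphs = list(read_splicing_graphs(DEMO.splitlines()))
```

For a real file, the same function could be called as read_splicing_graphs(open(path)), where path points to one of the .fa files listed below. Because NODE numbers are specific to assembly conditions, node IDs parsed from one k_c assembly should not be compared with those from another.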
Information S20. A list of the specific nodes found in the clusters of gene expression observed over the development of Lucilia sericata, as shown in Fig. 4. CLUSTER identifies the cluster in Fig. 4. NODE designates the node numbers from the L. sericata transcriptome that are included in that cluster. The last two columns give the FlyBase gene numbers and gene names for any node that produced BLAST hits to the Drosophila melanogaster genome. Only the best hit to a Drosophila gene is reported.
Information S21. Lists of the numbers of graphs expressed in each Illumina library, those unique to certain libraries and those differentially expressed (10× and 4×) amongst select libraries. Note that these are unreplicated comparisons. Results are presented per assembly, where k_c designates k-mer length k and k-mer coverage cut-off c. This file also summarizes which of 80 known Drosophila embryonic patterning genes were found (and not found) in the Lucilia sericata transcriptome.
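As a rough illustration of what a 10× or 4× difference between two libraries means in terms of RPKM, the sketch below flags a graph when the ratio of its RPKM values in two libraries reaches a chosen fold threshold in either direction. This is not the authors' procedure: the function names are hypothetical, and the pseudocount added to both values (to avoid division by zero when a graph is absent from one library) is an assumption introduced here.

```python
def fold_change(rpkm_a: float, rpkm_b: float, pseudocount: float = 1.0) -> float:
    """Ratio of two RPKM values.

    ASSUMPTION: a pseudocount is added to both values so that a graph
    with zero RPKM in one library does not cause a division by zero.
    """
    return (rpkm_a + pseudocount) / (rpkm_b + pseudocount)


def differentially_expressed(rpkm_a: float, rpkm_b: float, threshold: float) -> bool:
    """True if the two libraries differ by at least `threshold`-fold in either direction."""
    ratio = fold_change(rpkm_a, rpkm_b)
    return ratio >= threshold or ratio <= 1.0 / threshold


# Invented values, for illustration only.
high_vs_low = differentially_expressed(100.0, 1.0, threshold=10)
similar = differentially_expressed(5.0, 4.0, threshold=4)
```

Because the comparisons are unreplicated, a flag of this kind indicates only a large observed difference and carries no statistical significance.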
Information S22. The numbers of splicing graphs derived from mRNA that contained transposable element (TE) related domains. Results are presented per assembly, where k_c designates k-mer length k and k-mer coverage cut-off c. Information regarding the sequence graphs containing these TE sequences can be found in Supporting Information S2–S19.
|Filename||Size|Description|
|IMB_1127_sm_FileS1.docx||27K||Supporting info item|
|IMB_1127_sm_FileS2.txt||45K||Supporting info item|
|IMB_1127_sm_FileS3.fa||35899K||Supporting info item|
|IMB_1127_sm_FileS4.txt||5K||Supporting info item|
|IMB_1127_sm_FileS5.txt||7K||Supporting info item|
|IMB_1127_sm_FileS6.txt||4K||Supporting info item|
|IMB_1127_sm_FileS7.txt||99K||Supporting info item|
|IMB_1127_sm_FileS8.txt||49K||Supporting info item|
|IMB_1127_sm_FileS9.fa||27580K||Supporting info item|
|IMB_1127_sm_FileS10.txt||3K||Supporting info item|
|IMB_1127_sm_FileS11.txt||5K||Supporting info item|
|IMB_1127_sm_FileS12.txt||2K||Supporting info item|
|IMB_1127_sm_FileS13.txt||27K||Supporting info item|
|IMB_1127_sm_FileS14.txt||52K||Supporting info item|
|IMB_1127_sm_FileS15.fa||18920K||Supporting info item|
|IMB_1127_sm_FileS16.txt||1K||Supporting info item|
|IMB_1127_sm_FileS17.txt||4K||Supporting info item|
|IMB_1127_sm_FileS18.txt||1K||Supporting info item|
|IMB_1127_sm_FileS19.txt||9K||Supporting info item|
|IMB_1127_sm_FileS20.txt||1497K||Supporting info item|
|IMB_1127_sm_FileS21.doc||70K||Supporting info item|
|IMB_1127_sm_FileS22.doc||35K||Supporting info item|