Information S1. The postprocessing algorithm used to assemble the Lucilia sericata transcriptome.

Information S2–S19. Results of the Lucilia sericata transcriptome assemblies obtained with different k-mer lengths k and k-mer coverage cut-offs c. Each assembly is designated by its parameters as k_c. S2–S7 correspond to the 25_3 assembly, S8–S13 to the 25_5 assembly, and S14–S19 to the 25_10 assembly.

The complete assembly files, named k_c_splicing_graphs.fa, are provided as S3, S9 and S15. The splicing graphs are represented in an annotated FASTA format in which each potentially nonlinear structure is given as a collection of nodes, with the connecting edge information embedded in the node names. Different splicing graphs are separated by blank lines. Each node name has the form >NODE_u:v1,v2,...,vp, where u is the ID of the current node and u→v1, u→v2, ..., u→vp are edges in the splicing graph. The node name is followed by an estimate of the number of reads per kilobase of node per million reads (RPKM) for each library, in this order: embryo, third instar (feeding), third instar (postfeeding), early pupation, mid pupation, adult female, adult male, subtraction, salivary gland.

The k_c_transposable.txt files (S4, S10, S16) identify the graphs in each assembly that were found by hidden Markov models to contain transposable element (TE) related sequences. These files give the identity of the TE-related sequence, the graph ID (NODE) number, the P-value of the best BLAST hit to Drosophila melanogaster (below 10⁻⁷) and the BLAST hit itself (if found). Files labelled k_c_unique_aael.txt (S5, S11, S17) or k_c_unique_agap.txt (S6, S12, S18) provide information on genes that are uniquely detected in Aedes aegypti or Anopheles gambiae, respectively (P-value below 10⁻⁷). Files labelled k_c_unique_stages.txt (S7, S13, S19) identify graphs that were uniquely detected in specific RNA-Seq libraries derived from different immature developmental stages or from different adult sexes.

Files labelled k_c_salivary.txt (S2, S8, S14) list the transcripts detected in salivary glands. Graphs marked with * next to their ID were detected only in third instar RNA-Seq libraries. Information on BLAST hits to D. melanogaster is provided for transcripts detected during development. Note that NODE numbers are specific to the assembly conditions. Note also that, because of space limitations, only raw assemblies with a k-mer length of 25 are reported here; the remaining assemblies are available at http://faculty.cse.tamu.edu/shsze/postprocess.
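The node-name convention for the k_c_splicing_graphs.fa files can be read programmatically. The Python sketch below parses such a file into a list of graphs, each a list of nodes with their outgoing edges and per-library RPKM values. It rests on assumptions not stated in this description: that the nine RPKM values follow the node name separated by whitespace, that a node with no outgoing edges is written as >NODE_u: with an empty edge list, and that sequence lines follow each header as in ordinary FASTA. The names Node, parse_header, read_splicing_graphs and the short library labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional

# RPKM columns in the order given above; the short labels are hypothetical.
LIBRARIES = [
    "embryo",
    "L3_feeding",
    "L3_postfeeding",
    "early_pupation",
    "mid_pupation",
    "adult_female",
    "adult_male",
    "subtraction",
    "salivary_gland",
]


@dataclass
class Node:
    node_id: int
    successors: List[int]  # v1..vp for the edges u -> v1, ..., u -> vp
    rpkm: Dict[str, float]
    sequence: str = ""


def parse_header(header: str) -> Node:
    """Parse '>NODE_u:v1,...,vp r1 ... r9' (whitespace-separated RPKM is assumed)."""
    name, *rpkm_fields = header.lstrip(">").split()
    if not name.startswith("NODE_"):
        raise ValueError(f"unexpected node name: {name!r}")
    if len(rpkm_fields) != len(LIBRARIES):
        raise ValueError(f"expected {len(LIBRARIES)} RPKM values in {header!r}")
    node_part, _, edge_part = name[len("NODE_"):].partition(":")
    successors = [int(v) for v in edge_part.split(",") if v]  # empty list if no edges
    rpkm = dict(zip(LIBRARIES, map(float, rpkm_fields)))
    return Node(int(node_part), successors, rpkm)


def read_splicing_graphs(lines: Iterable[str]) -> Iterator[List[Node]]:
    """Yield each splicing graph as a list of Nodes; blank lines separate graphs."""
    graph: List[Node] = []
    current: Optional[Node] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                graph.append(current)
            current = parse_header(line)
        elif line:
            if current is not None:
                current.sequence += line  # sequence lines, as in plain FASTA
        else:  # a blank line closes the current graph
            if current is not None:
                graph.append(current)
                current = None
            if graph:
                yield graph
                graph = []
    if current is not None:
        graph.append(current)
    if graph:
        yield graph
```

Under these assumptions, list(read_splicing_graphs(open("IMB_1127_sm_FileS3.fa"))) would return one list of nodes per splicing graph. The strict count of RPKM values is deliberate: if the real layout differs from the assumed one, the parser fails loudly instead of silently misassigning values to libraries.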

Information S20. A list of the nodes in each of the gene expression clusters observed over the development of Lucilia sericata (Fig. 4). CLUSTER identifies the cluster in Fig. 4. NODE gives the node numbers from the L. sericata transcriptome that belong to that cluster. The last two columns give the FlyBase gene numbers and gene names for any node that produced BLAST hits to the Drosophila melanogaster genome; only the best hit to a Drosophila gene is reported.

Information S21. Lists of the numbers of graphs expressed in each Illumina library, the numbers unique to certain libraries and the numbers differentially expressed (10× and 4×) among selected libraries. Note that these are unreplicated comparisons. Results are presented per assembly, where k_c designates the k-mer length k and the k-mer coverage cut-off c. This file also contains a summary of the 80 known Drosophila embryonic patterning genes that were, or were not, found in the Lucilia sericata transcriptome.
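The 10× and 4× differential expression categories and the "unique to a library" counts can be illustrated with a simple ratio test on per-library RPKM values. The Python sketch below is an assumption about how such calls could be made, not the procedure used to produce S21: the pseudocount that guards against division by zero, the detection cut-off min_rpkm and the function names differential_call and unique_library are all hypothetical. Because the comparisons are unreplicated, it applies only fold-change thresholds and no statistical test.

```python
from typing import Dict, Optional, Sequence, Tuple


def differential_call(
    rpkm_a: float,
    rpkm_b: float,
    thresholds: Sequence[float] = (10.0, 4.0),
    pseudocount: float = 0.01,  # hypothetical; avoids dividing by a zero RPKM
) -> Tuple[Optional[float], Optional[str]]:
    """Return (largest fold threshold met, 'a' or 'b' for the higher library)."""
    ratio = (max(rpkm_a, rpkm_b) + pseudocount) / (min(rpkm_a, rpkm_b) + pseudocount)
    higher = "a" if rpkm_a > rpkm_b else "b"
    for t in sorted(thresholds, reverse=True):
        if ratio >= t:
            return t, higher
    return None, None  # below every threshold: not called differentially expressed


def unique_library(
    rpkm_by_library: Dict[str, float],
    min_rpkm: float = 0.0,  # hypothetical detection cut-off
) -> Optional[str]:
    """Return the only library in which a graph is detected, or None."""
    detected = [lib for lib, value in rpkm_by_library.items() if value > min_rpkm]
    return detected[0] if len(detected) == 1 else None


# Example with made-up RPKM values for a single graph.
print(differential_call(50.0, 4.0))  # -> (10.0, 'a')
print(differential_call(2.0, 10.0))  # -> (4.0, 'b')
print(unique_library({"embryo": 0.0, "adult_male": 3.2, "adult_female": 0.0}))  # -> adult_male
```

Reporting only the largest threshold met means a graph counted as 10× is not also counted as 4×; if the original tables count the categories cumulatively, the loop would instead collect every threshold met.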

Information S22. The numbers of splicing graphs derived from mRNA that contained transposable element (TE) related domains. Results are presented per assembly, where k_c designates the k-mer length k and the k-mer coverage cut-off c. Information on the splicing graphs containing these TE sequences can be found in Supporting Information S2–S19 (specifically the k_c_transposable.txt files S4, S10 and S16).

Filename                 Format  Size    Description
IMB_1127_sm_FileS1.docx  docx    27K     Supporting info item
IMB_1127_sm_FileS2.txt   txt     45K     Supporting info item
IMB_1127_sm_FileS3.fa    fa      35899K  Supporting info item
IMB_1127_sm_FileS4.txt   txt     5K      Supporting info item
IMB_1127_sm_FileS5.txt   txt     7K      Supporting info item
IMB_1127_sm_FileS6.txt   txt     4K      Supporting info item
IMB_1127_sm_FileS7.txt   txt     99K     Supporting info item
IMB_1127_sm_FileS8.txt   txt     49K     Supporting info item
IMB_1127_sm_FileS9.fa    fa      27580K  Supporting info item
IMB_1127_sm_FileS10.txt  txt     3K      Supporting info item
IMB_1127_sm_FileS11.txt  txt     5K      Supporting info item
IMB_1127_sm_FileS12.txt  txt     2K      Supporting info item
IMB_1127_sm_FileS13.txt  txt     27K     Supporting info item
IMB_1127_sm_FileS14.txt  txt     52K     Supporting info item
IMB_1127_sm_FileS15.fa   fa      18920K  Supporting info item
IMB_1127_sm_FileS16.txt  txt     1K      Supporting info item
IMB_1127_sm_FileS17.txt  txt     4K      Supporting info item
IMB_1127_sm_FileS18.txt  txt     1K      Supporting info item
IMB_1127_sm_FileS19.txt  txt     9K      Supporting info item
IMB_1127_sm_FileS20.txt  txt     1497K   Supporting info item
IMB_1127_sm_FileS21.doc  doc     70K     Supporting info item
IMB_1127_sm_FileS22.doc  doc     35K     Supporting info item

Please note: Neither the Editors nor Wiley Blackwell are responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.