De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae)


Daniel B. Sloan, Department of Ecology and Evolutionary Biology, West Campus, Yale University, West Haven, CT 06516, USA, Fax: 203-737-3109; E-mail:


Members of the angiosperm genus Silene are widely used in studies of ecology and evolution, but available genomic and population genetic resources within Silene remain limited. Deep transcriptome (i.e. expressed sequence tag or EST) sequencing has proven to be a rapid and cost-effective means to characterize gene content and identify polymorphic markers in non-model organisms. In this study, we report the results of 454 GS-FLX Titanium sequencing of a polyA-selected and normalized cDNA library from Silene vulgaris. The library was generated from a single pool of transcripts, combining RNA from leaf, root and floral tissue from three genetically divergent European subpopulations of S. vulgaris. A single full-plate 454 run produced 959 520 reads totalling 363.6 Mb of sequence data with an average read length of 379.0 bp after quality trimming and removal of custom library adaptors. We assembled 832 251 (86.7%) of these reads into 40 964 contigs, which have a total length of 25.4 Mb and can be organized into 18 178 graph-based clusters or ‘isogroups’. Assembled sequences were annotated based on homology to genes in multiple public databases. Analysis of sequence variants identified 13 432 putative single-nucleotide polymorphisms (SNPs) and 1320 simple sequence repeats (SSRs) that are candidates for microsatellite analysis. Estimates of nucleotide diversity from 1577 contigs were used to generate genome-wide distributions that revealed several outliers with high diversity. All of these resources are publicly available through NCBI and/or our website and should provide valuable genomic and population genetic tools for the Silene research community.
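
The assembly figures reported above imply a few derived summary statistics that can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the function name `summarize` and the dictionary keys are hypothetical, and the calculation assumes that 1 Mb means 10^6 bases and that SNP density can be approximated by dividing the SNP count by the total contig length.

```python
def summarize(total_reads, assembled_reads, contigs, contig_bases, isogroups, snps):
    """Derive summary ratios from reported assembly counts.

    All inputs are the headline numbers from the abstract; the ratios are
    illustrative and were not reported by the authors in this form.
    """
    return {
        # Share of quality-trimmed reads incorporated into contigs
        "percent_assembled": 100.0 * assembled_reads / total_reads,
        # Mean contig length in bp (assumes 1 Mb = 1e6 bases)
        "mean_contig_length_bp": contig_bases / contigs,
        # Average number of contigs per graph-based cluster (isogroup)
        "contigs_per_isogroup": contigs / isogroups,
        # Putative SNPs per kilobase of assembled contig sequence
        "snps_per_kb": snps / (contig_bases / 1000.0),
    }


stats = summarize(
    total_reads=959_520,
    assembled_reads=832_251,
    contigs=40_964,
    contig_bases=25.4e6,  # 25.4 Mb, assuming 1 Mb = 1e6 bases
    isogroups=18_178,
    snps=13_432,
)

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The first ratio reproduces the 86.7% of reads reported as assembled; the remaining values are rough descriptive ratios only, since contig lengths and SNP counts vary widely among contigs.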