• genome sequencing;
  • de novo sequence assembly;
  • Pseudomonas syringae;
  • Bioinformatics;
  • Illumina;
  • Solexa


Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. edena generated smaller contigs (N50 was 4192 nucleotides) and comparable error rates. ssake and vcakeyielded shorter contigs with very high error rates. Assembly of paired-end sequence data carrying 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from about 40 × deep sequence data sets. These draft assemblies are useful for exploring an organism's proteomic potential, at a very economic low cost.