Sigma factors in a thousand E. coli genomes

Authors

  • Helen Cook

    1. Department of Systems Biology, Center for Biological Sequence Analysis, The Technical University of Denmark, Lyngby, Denmark
  • David W. Ussery

    Corresponding author
    1. Department of Systems Biology, Center for Biological Sequence Analysis, The Technical University of Denmark, Lyngby, Denmark
    Current affiliation:
    1. Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
    • For correspondence. E-mail dave@cbs.dtu.dk; Tel. (+1) (865) 574 8201; Fax (+1) (865) 574 3555.


Summary

Everyone working with bacterial genomics is familiar with the phrase ‘too much data’. In this Genome Update, we discuss two methods for helping to deal with this explosion of genomic information. First, we introduce the concept of calculating a quality score for each sequenced genome, and second, we describe a method to quickly sort through genomes for a particular set of protein families. We apply these two methods to all of the current Escherichia coli genomes available in the National Center for Biotechnology Information database. Of the 2074 E. coli/Shigella genomes listed (June 2013), fewer than half (983) are of sufficient quality to use in comparative genomic work. Unfortunately, even some of the ‘complete’ E. coli genomes are in pieces, whereas a few ‘draft’ genomes are of good quality. Six of the seven known sigma factors in E. coli strain K-12 are extremely well conserved; the iron-regulating sigma factor FecI (σ19) is missing in most genomes. Surprisingly, the E. coli strain CFT073 genome does not encode a functional RpoD (σ70), which is essential; this apparent absence is likely due to poor genome assembly/annotation. We also find a possible novel sigma factor present in more than a hundred E. coli genomes.
