Repeats found in H. halophilus.

Fig. S1. General genome features. The horizontal axis represents the genome from left to right, with a scale given in Mb. From top to bottom are plots of: (a) cumulative GC-skew, (b) rRNA operons, (c) clusters of tRNA genes, (d) %GC, shown only where a 1 kb window deviates by more than 2.5 SD from the average, and (e) variation in tetramer composition (TETRA), where darker colour indicates more prominent deviation. For clusters of tRNA genes, the number of tRNAs within each cluster is given next to the cluster. Marked above the %GC deviations are the rRNA operons, which have a higher %GC than average, a plasmid-like low-GC region, and a low-GC region which preferentially contains hypothetical proteins (HY region). Underlying data were computed as described in Dyall-Smith and colleagues (2011).
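As a rough illustration of panels (a) and (d), the sketch below computes a cumulative GC-skew curve and flags windows whose %GC lies more than 2.5 SD from the average. It is not the procedure of Dyall-Smith and colleagues (2011). Several details are assumptions made for the example: the windows do not overlap, the skew is taken as (G - C)/(G + C) per window, and the SD is taken over all windows. The function names are hypothetical.

```python
from statistics import mean, pstdev


def window_stats(seq, window=1000):
    """Return (%GC, GC-skew) for each non-overlapping window (assumed layout)."""
    stats = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        gc_percent = 100.0 * (g + c) / window
        # Assumed skew definition: (G - C) / (G + C); 0 if the window has no G or C.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        stats.append((gc_percent, skew))
    return stats


def cumulative_skew(stats):
    """Running sum of the per-window GC-skew values."""
    total, curve = 0.0, []
    for _, skew in stats:
        total += skew
        curve.append(total)
    return curve


def gc_outliers(stats, n_sd=2.5):
    """Return (window index, %GC) for windows deviating by more than n_sd SD."""
    values = [gc for gc, _ in stats]
    avg, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return [(i, gc) for i, gc in enumerate(values) if abs(gc - avg) > n_sd * sd]
```

For a sequence held in a string `genome`, `gc_outliers(window_stats(genome))` lists the windows that would be drawn in a panel like (d), and `cumulative_skew(window_stats(genome))` gives the curve for a panel like (a).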

Fig. S2. MEGAN analysis. MEGAN (Huson et al., 2007) was used to compare the protein set from Halobacillus halophilus to that of all completely sequenced genomes present in NCBI as of October 2010 as maintained in the MiGenAS system (Rampp et al., 2006). Results from protein BLAST were subjected to the least common ancestor algorithm of MEGAN, which assigns each protein either to a given genome or to a higher rank in the NCBI taxonomy. Genomes to which at least five proteins were assigned are represented at the right by named circles. The number of assigned proteins is specified and the circle size increases with the number of assigned proteins. At higher taxonomic levels, the right number indicates the total number of proteins assigned and the left number indicates the proteins newly assigned at this level. Oceanobacillus iheyensis is the species to which most proteins are assigned. About half of the proteins from Halobacillus are assigned to Bacillaceae. Proteins assigned to genomes with fewer than five assigned proteins are collected as ‘not assigned’. ‘No hit’ indicates that BLAST results, if any, did not reach the applied cut-off values. The analysis included a small number of spurious ORFs which are not considered to code for protein, e.g. because they are located on the strand opposite to genes.
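The least common ancestor step described above can be sketched as follows. This is a simplified illustration, not MEGAN's implementation: it assumes that only BLAST hits passing the cut-off values are passed in, and the lineages in `LINEAGES` are abbreviated examples written for this sketch rather than data from the analysis. The function and variable names are hypothetical.

```python
# Abbreviated, illustrative lineages (root first); not taken from the analysis.
LINEAGES = {
    "Oceanobacillus iheyensis": [
        "Bacteria", "Firmicutes", "Bacilli", "Bacillales",
        "Bacillaceae", "Oceanobacillus", "Oceanobacillus iheyensis",
    ],
    "Bacillus subtilis": [
        "Bacteria", "Firmicutes", "Bacilli", "Bacillales",
        "Bacillaceae", "Bacillus", "Bacillus subtilis",
    ],
    "Staphylococcus aureus": [
        "Bacteria", "Firmicutes", "Bacilli", "Bacillales",
        "Staphylococcaceae", "Staphylococcus", "Staphylococcus aureus",
    ],
}


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, each listed root first."""
    lca = None
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            lca = ranks[0]
        else:
            break
    return lca


def assign_protein(hit_genomes):
    """Assign one protein given the genomes of its hits that passed the cut-offs."""
    if not hit_genomes:
        return "no hit"
    return lowest_common_ancestor([LINEAGES[g] for g in hit_genomes])
```

A protein whose only hit is in Oceanobacillus iheyensis is assigned to that genome, whereas a protein with hits in both Oceanobacillus iheyensis and Bacillus subtilis moves up to Bacillaceae, which is how assignments at higher taxonomic levels arise.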

Table S1. rRNA polymorphisms in H. halophilus.

Table S2. Frequently occupied COGs in H. halophilus.

Table S3. Co-occupancy of COGs in H. halophilus.

Table S4. COGs which are occupied in most Bacillaceae but in only a few other organisms.

Table S5. COGs which are not occupied or only rarely occupied in Bacillaceae.

Table S6. Average pI difference between proteins from H. halophilus and from other Bacillaceae.

emi2770_sm_FigS1.jpg (223K): Supporting info item
emi2770_sm_FigS2.jpg (99K): Supporting info item
emi2770_sm_Data.doc (519K): Supporting info item

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.