Appendix S1. Experimental procedures.

Fig. S1. Comparative phylogenetic analysis of 16S rRNA gene sequences and six conserved sporulation-related genes (spo0A, spoIVB, spoVAC, spoVAD, spoVT and gpr) (spore proteome) for 27 spore-forming Firmicutes with a complete genome sequence reported and annotated. Alignments were constructed with MAFFT (Katoh et al., 2005) or Muscle (Edgar, 2004) using default parameters. Multiple-FASTA alignments were converted to Phylip format with the seqret program from the EMBOSS package (Rice et al., 2000). Phylogenies were constructed from Phylip-formatted alignments with PhyML (Guindon and Gascuel, 2003), using default parameters, except the following: JTT+Γ substitution model for proteins and GTR+Γ model for nucleic acids; four substitution rate categories; estimation of the shape parameter, the proportion of invariant sites and the transition/transversion ratio (for nucleotides). Trees were processed (re-rooting, extracting topology and plotting) with the Newick Utilities (Junier and Zdobnov, 2010). Bootstrap values (percentage over 1000 samplings) are shown at the nodes of the trees.
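
The non-default PhyML settings listed above can be sketched as a command line. This is a minimal sketch that assumes the short options of the PhyML 3.x command-line interface (-i, -d, -m, -c, -a, -v, -t, -b); the PhyML version and exact invocation used for this figure are not stated here, and the helper name phyml_command and the file names are hypothetical.

```python
def phyml_command(alignment_path, datatype, bootstraps=1000):
    """Build a PhyML argument list for a Phylip-formatted alignment.

    Assumes PhyML 3.x short options; this is an illustration, not the
    authors' actual invocation.
    """
    if datatype not in ("aa", "nt"):
        raise ValueError("datatype must be 'aa' (protein) or 'nt' (nucleotide)")
    # JTT for proteins, GTR for nucleic acids, as stated in the legend.
    model = "JTT" if datatype == "aa" else "GTR"
    cmd = [
        "phyml",
        "-i", alignment_path,   # Phylip-formatted alignment
        "-d", datatype,         # data type: aa or nt
        "-m", model,            # substitution model
        "-c", "4",              # four substitution rate categories
        "-a", "e",              # estimate the gamma shape parameter
        "-v", "e",              # estimate the proportion of invariant sites
        "-b", str(bootstraps),  # number of bootstrap replicates
    ]
    if datatype == "nt":
        # Request estimation of the transition/transversion ratio.
        cmd += ["-t", "e"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(phyml_command("spo0A_proteins.phy", "aa")))
    print(" ".join(phyml_command("16S_rRNA.phy", "nt")))
```

The function only builds the argument list; it could be passed to subprocess.run once PhyML is installed and the options have been checked against the installed version.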

Fig. S2. Phylogenetic reconstruction (above) and conservation profiles (below) for sequences of the stage 0 sporulation protein Spo0A. Conservation plots were made with the plotcon program from EMBOSS. This is a sliding-window program that computes a weighted average of the similarity scores for all residue pairs in each window. We used the default window size of four residues.
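
The sliding-window idea behind the conservation plots can be illustrated with a simplified sketch. It is not plotcon's algorithm: plotcon scores residue pairs with a substitution matrix and applies weights, whereas this sketch assumes a plain identity score (1 for identical residues, 0 otherwise) and an unweighted mean. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def column_similarity(column):
    """Mean pairwise identity in one alignment column (gaps never match)."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    return sum(a == b and a != "-" for a, b in pairs) / len(pairs)


def conservation_profile(alignment, window=4):
    """Average the per-column similarity over a sliding window.

    `alignment` is a list of equal-length aligned sequences. One value is
    returned per window position.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    per_column = [
        column_similarity([seq[i] for seq in alignment]) for i in range(length)
    ]
    return [
        sum(per_column[i:i + window]) / window
        for i in range(length - window + 1)
    ]


if __name__ == "__main__":
    toy_alignment = ["ACDEF", "ACDEG", "ACDQF"]  # hypothetical sequences
    print(conservation_profile(toy_alignment, window=4))
```

With a window of four residues, each plotted value summarizes four consecutive alignment columns, so short windows keep the profile sensitive to local drops in conservation.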

Fig. S3. Alignment of the spo0A genes of Sulfobacillus acidophilus and Alicyclobacillus acidocaldarius Tc41 against spo0A of Bacillus subtilis 168. The two regions shown correspond to the forward primer 166f (left) and the reverse primer 748r (right) described in this study. Asterisks indicate 100% identity. Exclamation points highlight mismatches with the primer sequence.

Fig. S4. Cladogram of spo0A sequences from sediment of Lake Geneva extracted with protocols 1 (blue), 2 (yellow) and 3 (red). The nucleotide sequences were clustered into putative OTUs (identity of > 97%) with the QIIME package using the Uclust method (Caporaso et al., 2010), and a representative of each OTU was used to build the phylogeny. Phylogenies were constructed from Phylip-formatted alignments with PhyML (Guindon and Gascuel, 2003), using default parameters. The trees were re-rooted, condensed according to DNA extraction protocol, and displayed with the Newick Utilities (Junier and Zdobnov, 2010). Each branch represents a cluster of OTUs of > 97% sequence similarity. The closest relatives of the environmental sequences from the indirect extractions (protocol 3) were identified by protein BLAST (Altschul et al., 1997) of the translated protein sequences against a reference database of 581 Spo0A protein sequences from the InterPro site (Mulder et al., 2002). Classes of closest relative are shown in color with indication of the identity ranges [< 65% identity (−), 65–74% (<), 75–84% (∼), 85–94% (#), > 95% (+)]. A, Bacillus amyloliquefaciens; B, B. methanolicus; C, Geobacillus sp. (strain WCH70); D, B. cereus subsp. cytotoxis (strain NVH 391-98); E, B. thuringiensis; F, Geobacillus thermodenitrificans (strain NG80-2); G, B. atrophaeus (strain 1942); H, B. subtilis; I, B. mycoides; J, B. pseudofirmus (strain OF4); K, Lysinibacillus sphaericus (strain C3-41); L, Brevibacillus laterosporus; M, Brevibacillus brevis (strain 47); N, Thermincola potens (strain JR); O, Desulfotomaculum acetoxidans (strain ATCC 49208); P, Desulfosporosinus orientis (strain ATCC 19365); Q, Thermosediminibacter oceani (strain ATCC BAA-1034); R, Syntrophobotulus glycolicus (strain DSM 8271); S, Heliobacterium modesticaldum (strain ATCC 51547); T, Clostridium clariflavum (strain DSM 19732); U, B. cereus; V, C. thermocellum; W, C. cellulovorans (strain ATCC 35296); X, C. cellulolyticum (strain ATCC 35319); Y, C. botulinum; Z, C. ljungdahlii (strain ATCC 55383); AA, C. perfringens; AB, C. sporogenes; AC, Alkaliphilus metalliredigens (strain QYMF); AD, A. oremlandii (strain OhILAs); AE, Desulfotomaculum kuznetsovii (strain DSM 6115); AF, Geobacillus sp. (strain Y412MC10); AG, Paenibacillus polymyxa; AH, P. mucilaginosus (strain KNP414).
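
The clustering of sequences into OTUs at > 97% identity can be illustrated with a toy greedy, representative-based scheme. It is only a sketch of the general idea and not the Uclust implementation, which relies on its own search heuristics and alignments; here the sequences are assumed to be pre-aligned and of equal length, and the function names and toy data are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_cluster(sequences, threshold=0.97):
    """Assign each sequence to the first cluster whose representative it matches.

    A sequence joins a cluster when its identity to the representative is
    strictly greater than `threshold`; otherwise it founds a new cluster and
    becomes that cluster's representative.
    """
    clusters = []  # list of (representative, members) tuples
    for seq in sequences:
        for representative, members in clusters:
            if identity(seq, representative) > threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


if __name__ == "__main__":
    # Hypothetical 100-nt toy sequences.
    toy = ["A" * 100, "A" * 98 + "CC", "A" * 96 + "CCCC", "A" * 96 + "CCCG"]
    for representative, members in greedy_cluster(toy):
        print(representative[-6:], len(members))
```

In a scheme of this kind the result depends on input order, because the first sequence of each cluster serves as its representative.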

Fig. S5. Cladogram of spo0A sequences from sediment of Lake Baikal extracted with protocols 1 (blue), 2 (yellow) and 3 (red). Each branch represents a cluster of OTUs of > 97% sequence similarity. Closest relatives are shown in letters around the tree together with identity ranges [< 65% identity (−), 65–74% (<), 75–84% (∼), 85–94% (#), > 95% (+)]. For classes, see the legend of Fig. S4 and the following: AI, B. megaterium; AJ, B. licheniformis; AK, B. megaterium (strain DSM 319); AL, C. haemolyticum; AM, Paenibacillus sp. (strain JDR-2); AN, B. cellulosilyticus (strain ATCC 21833); AO, Sulfobacillus acidophilus (strain TPY); AP, S. acidophilus (strain ATCC 700253); AQ, Desulforudis audaxviator (strain MP104C); AR, C. butyricum; AS, C. kluyveri (strain ATCC 8527).
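
The identity-range symbols used around the trees can be expressed as a small lookup. The legend gives integer ranges and leaves exactly 95% unassigned, so this sketch assumes class boundaries at 65, 75, 85 and 95% and places 95% and above in the top class; the function name is hypothetical.

```python
# (exclusive upper bound in percent, symbol) for each class below the top one.
# Boundaries for fractional values are an assumption of this sketch.
IDENTITY_CLASSES = [
    (65.0, "−"),
    (75.0, "<"),
    (85.0, "∼"),
    (95.0, "#"),
]
TOP_SYMBOL = "+"


def identity_symbol(percent_identity):
    """Return the legend symbol for a BLAST percent identity (0 to 100)."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    for upper_bound, symbol in IDENTITY_CLASSES:
        if percent_identity < upper_bound:
            return symbol
    return TOP_SYMBOL


if __name__ == "__main__":
    for value in (60, 70, 80, 90, 99):
        print(value, identity_symbol(value))
```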

Table S1. List of genome sequences from the 27 endospore-forming Firmicutes used in this study. Complete and draft genome sequences were downloaded from the Comprehensive Microbial Resource (CMR, 24.0 data release) and Integrated Microbial Genomes (IMG, 3.0) websites. Protein and nucleotide sequences of spore-related genes were obtained by searching for the role category/function 'sporulation and germination' (CMR) and 'sporulating' (IMG). Additional information on all retrieved genomes was obtained from the GenBank database. Clas = taxonomical classification; B = Bacilli; C = Clostridia; T° = temperature range; M = mesophile; T = thermophile; P = psychrophile; H = hyperthermophile; Sp. Genes = number of sporulation-related genes. The number of sporulation-related genes was retrieved from the available genome annotations.

Table S2. Orthologous genes found after bi-directional BLAST of the sporulation-related genes common to 27 genomes of endospore-forming Firmicutes. Protein lengths indicated for Bacillus subtilis as a reference were obtained from Stragier and Losick (1996).
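
A common way to call orthologs from a bi-directional BLAST is the reciprocal-best-hit criterion, sketched below. Whether this exact criterion, or any additional score or coverage cut-off, was applied for Table S2 is an assumption; the hit tuples (query, subject, bit score), the gene identifiers and the function names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, e.g. parsed
    from tabular BLAST output.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in forward.items()
        if backward.get(gene_b) == gene_a
    )


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    a_vs_b = [("A_spo0A", "B_001", 410.0), ("A_spo0A", "B_002", 95.0),
              ("A_gpr", "B_003", 300.0)]
    b_vs_a = [("B_001", "A_spo0A", 405.0), ("B_003", "A_other", 310.0),
              ("B_003", "A_gpr", 298.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data only the first pair is retained, because the best hit of B_003 in the reverse search is a different gene.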
