Fig. S1. Phylum distribution of protein-coding genes. The taxonomic composition of each metagenomic dataset used in this study was assessed with the phylogenetic distribution tool of the IMG/M database, which displays the phylum distribution of protein-coding genes in each metagenome based on their best BLASTp match. Proteins displaying less than 60% identity are excluded from this analysis.


Fig. S2. Distribution of secretion systems in metagenomic datasets. The x axis represents the number of predicted bacterial protein-coding genes in each metagenomic dataset. The y axis represents the average number of COGs per metagenome for multi-component secretion systems (A–D) and the number of COGs for single-component secretion systems (E–H). Panels A, B, C and D show the distribution of T2SS, T3SS, T4SS and T6SS, respectively, while panels E, F, G and H show the distribution of T1SS, T5aSS, T5bSS and T5cSS, respectively.


Fig. S3. Length of contigs carrying YscT (A) and IglA (B) protein-coding sequences.


Table S1. Metagenomic datasets used in this study.


Table S2. COGs used in this study. COG identifiers corresponding to specific protein families of each secretion system were selected according to the literature (Delepelaire, 2004; Henderson et al., 2004; Cianciotto, 2005; Cornelis, 2006; Alvarez-Martinez and Christie, 2009; Boyer et al., 2009). The number of COGs used to estimate the distribution of each secretion system family is indicated in parentheses. The total number of COGs in all metagenomic datasets examined is also indicated.


Table S3. Taxonomic distribution of secretion systems in genome sequences. The average number of secretion systems per genome was calculated for each bacterial phylum (and for each class within the Proteobacteria) using all the specific COG identifiers listed in Table S2. The asterisk indicates that multiple COG identifiers were selected for the estimation of T5aSS.


Table S4. Distribution of secretion systems within each metagenomic dataset.

