
Filename (size) and description
emi2852-sup-0001-si.tif (231K)

Fig. S1. Isolation of 106 plant-associated strains.

A. Geographical locations where GMB isolates were sampled. ‘Location 6’ encompasses locations outside England; the yellow dot marks the location of our laboratory.

B. Time of sampling. The asterisk denotes strains that were isolated on the same day in the same field.

C and D. Distribution of GMB isolates according to (C) the geographical location of sampling and (D) the plant sampled. In (C) and (D), ‘mixed’ refers to isolation from post-harvest mixed salad samples containing salad from different locations or sources. In (B), ‘unknown’ refers to a strain (GMB103) whose isolation history is unknown.

emi2852-sup-0002-si.tif (327K)

Fig. S2. Phylogenetic tree of Escherichia sp. strains. A maximum-likelihood phylogenetic tree was calculated from concatenated sequences of internal fragments of eight housekeeping genes from GMB, ECOR and other Escherichia sp. strains. Sequences from the Escherichia sp. strains were generated in a recent genome sequencing project (Luo et al., 2011) and retrieved from the NCBI website. GMB56 and GMB57 are clones of GMB46.
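The concatenation step can be illustrated with a minimal Python sketch. It assumes each housekeeping-gene fragment has already been aligned across strains and is stored in a dictionary keyed by gene name and then strain; the gene and strain names in the example are hypothetical, padding strains that lack a gene with gaps is an assumption rather than the procedure used here, and the maximum-likelihood tree inference itself is left to dedicated phylogenetics software.

```python
def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments into one concatenated alignment per strain.

    alignments: dict mapping gene name -> {strain: aligned sequence}
    gene_order: list fixing the order in which gene fragments are joined
    Strains missing a gene are padded with gaps ('-') of that gene's length.
    """
    strains = sorted({s for gene in gene_order for s in alignments[gene]})
    parts = {s: [] for s in strains}
    for gene in gene_order:
        seqs = alignments[gene]
        lengths = {len(seq) for seq in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to a common length")
        gene_len = lengths.pop()
        for s in strains:
            parts[s].append(seqs.get(s, "-" * gene_len))
    return {s: "".join(p) for s, p in parts.items()}


# Hypothetical two-gene example; a real analysis would join all eight fragments.
example = {
    "geneA": {"GMB1": "ATG-C", "ECOR1": "ATGAC"},
    "geneB": {"GMB1": "GGT", "ECOR1": "GGA", "GMB2": "GGC"},
}
print(concatenate_alignments(example, ["geneA", "geneB"]))
# {'ECOR1': 'ATGACGGA', 'GMB1': 'ATG-CGGT', 'GMB2': '-----GGC'}
```

Fixing the gene order once keeps every position of the concatenated alignment homologous across strains, which is what the tree-building program expects.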

emi2852-sup-0003-si.tif (65K)

Fig. S3. Distribution of C-source utilization correlation coefficients (r²) within and between phylogroups. Box plots show the distribution of r² correlation coefficients for each pairwise comparison between phylogroups, as displayed in the correlation matrix from Fig. S4. Coloured box plots represent intra-phylogroup comparisons; unfilled box plots represent inter-phylogroup comparisons.
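One way to obtain the r² values summarized in these box plots is sketched below in Python. It assumes r² is the squared Pearson correlation between two strains' growth values across the same ordered set of C-sources; the strain names, profiles and phylogroup labels are hypothetical. Each unordered pair of strains contributes one value to the box for its pair of phylogroups.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_by_phylogroup_pair(profiles, phylogroup):
    """Group pairwise r^2 values by the (unordered) pair of phylogroups.

    profiles: strain -> growth values, one per C-source, same order for all strains
    phylogroup: strain -> phylogroup label
    Keys with identical labels, e.g. ('A', 'A'), are intra-phylogroup comparisons.
    """
    result = {}
    for a, b in combinations(sorted(profiles), 2):
        key = tuple(sorted((phylogroup[a], phylogroup[b])))
        result.setdefault(key, []).append(pearson_r(profiles[a], profiles[b]) ** 2)
    return result


# Hypothetical growth profiles over four C-sources.
profiles = {
    "s1": [0.1, 0.5, 0.9, 0.3],
    "s2": [0.2, 0.6, 1.0, 0.4],
    "s3": [0.9, 0.5, 0.1, 0.6],
}
groups = {"s1": "A", "s2": "A", "s3": "B1"}
r2 = r2_by_phylogroup_pair(profiles, groups)
intra = [v for k, vals in r2.items() if k[0] == k[1] for v in vals]
inter = [v for k, vals in r2.items() if k[0] != k[1] for v in vals]
```

Pooling the `intra` and `inter` lists in this way gives the two distributions that can then be compared between groups.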

emi2852-sup-0004-si.tif (27K)

Fig. S4. Inter- and intra-phylogroup correlation of C-source utilization profiles. Box plots show the distribution of r² correlation coefficients for inter- and intra-phylogroup correlations. Asterisks indicate a statistically significant difference between distribution means, as determined by an unpaired t-test with Welch's correction (***P < 0.0001).
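The test statistic behind this comparison can be sketched in Python with the standard library only. The sketch computes Welch's unequal-variance t statistic and the Welch–Satterthwaite degrees of freedom; the two input lists are hypothetical stand-ins for the intra- and inter-phylogroup r² values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch–Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a); variance() divides by n - 1
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(round(t, 4), round(df, 4))  # -1.7321 4.4118
```

The two-sided P-value then comes from a t distribution with `df` degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call if SciPy is available.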

emi2852-sup-0005-si.tif (112K)

Fig. S5. Phenotypes used to define the ‘plant association index’ (PAi) differ between E. coli phylogroups.

A. Nutritional ability, as indicated by the average combined growth on the 18 C-sources most significantly associated with variation between ECOR and GMB, as shown in Table S1.

B. Biofilm formation at 28°C for 72 h.

C and D. (C) Growth yields reached after 24 h on sucrose and (D) on pHPA. Asterisks indicate statistically significant differences between distributions, as determined by Dunn's multiple comparison test following Kruskal–Wallis (KW) tests (*P < 0.05; **P < 0.001; ***P < 0.0001). The P-value of the corresponding KW test is indicated below the title of each graph.
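The omnibus step of this analysis can be sketched in Python as a Kruskal–Wallis H statistic, using average ranks for tied values and the usual tie correction. The group values in the example are hypothetical, and Dunn's post hoc comparisons are not shown.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal–Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks of this group's members
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 4))  # 7.2
```

H is referred to a χ² distribution with k − 1 degrees of freedom (k groups) to obtain the P-value; `scipy.stats.kruskal` returns the same tie-corrected statistic together with that P-value.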

emi2852-sup-0006-si.tif (80K)

Fig. S6. Empirical definition of a statistical threshold for positive carbon source utilization. (A) Histogram (red columns) of OD600 values across 26 non-utilized carbon sources; the black line is a kernel density estimate of the probability density function. (B) The corresponding cumulative distribution function, showing the values for the 5% and 1% tails. See Experimental procedures for details.
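A minimal Python sketch of this kind of empirical threshold follows. It assumes a Gaussian kernel with a normal-reference (rule-of-thumb) bandwidth, which may differ from the kernel and bandwidth described in Experimental procedures, and the OD600 readings in the example are hypothetical. The 5% and 1% tail cut-offs are the values at which the estimated cumulative distribution function reaches 0.95 and 0.99.

```python
import math
from statistics import stdev


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth for a Gaussian kernel: 1.06 * s * n^(-1/5)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)


def kde_cdf(x, data, h):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each observation."""
    return sum(0.5 * (1 + math.erf((x - xi) / (h * math.sqrt(2)))) for xi in data) / len(data)


def kde_upper_threshold(data, tail, h=None, tol=1e-9):
    """Value above which a fraction `tail` of the KDE's mass lies (found by bisection)."""
    if h is None:
        h = normal_reference_bandwidth(data)
    lo, hi = min(data) - 10 * h, max(data) + 10 * h  # CDF is ~0 at lo and ~1 at hi
    target = 1 - tail
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(mid, data, h) < target:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical OD600 readings on non-utilized carbon sources.
background = [0.05, 0.08, 0.10, 0.12, 0.07, 0.09, 0.15, 0.06]
t95 = kde_upper_threshold(background, 0.05)
t99 = kde_upper_threshold(background, 0.01)
print(t95, t99)
```

Because the KDE's CDF is monotonic, bisection is a simple and robust way to invert it; a reading above the chosen cut-off would then be scored as positive utilization.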

emi2852-sup-0007-si.doc (111K)

Table S1. Individual GMB strain information. For each GMB strain, the location and date of isolation are provided. Locations correspond to those indicated on the map in Fig. S1. The table is ordered by phylogroup. *‘Mixed’ indicates isolation from post-harvest mixed salad samples containing salad from different locations or sources.

emi2852-sup-0008-si.doc (238K)

Table S2. Morphotype and Plant Association Index (PAi) for plant- and host-associated E. coli isolates. The PAi was calculated for the 173 strains (nECOR = 72; nGMB = 101) used in this study. The list is ordered by PAi, to which a heatmap colour scale has been applied for clarity (green, high PAi; red, low PAi; yellow, midpoint). Morphotypes on Congo red-containing agar are also indicated. See text and Experimental procedures for more details.

emi2852-sup-0009-si.doc (38K)

Table S3. Cross-validation success rates in PLS-DA using ECOR and GMB as groups, for model dimensions up to 10. The maximum cross-validation success rate is highlighted and corresponds to three PLS-DA dimensions.
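How a cross-validation success rate can be computed for each model dimension, and the highest one picked, is sketched below in Python. This is not PLS-DA: as a hypothetical stand-in it uses leave-one-out cross-validation with a nearest-centroid classifier restricted to the first d features, the cross-validation scheme behind these tables may differ, and all data are invented.

```python
def loo_success_rate(X, y, fit, predict):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


def nearest_centroid_fit(d):
    """Hypothetical stand-in model (NOT PLS-DA): class centroids on the first d features."""
    def fit(X, y):
        centroids = {}
        for label in set(y):
            rows = [x[:d] for x, lab in zip(X, y) if lab == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return centroids
    return fit


def nearest_centroid_predict(centroids, x):
    # zip() truncates x to the centroid length, i.e. to the first d features
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def best_dimension(X, y, max_dim=10):
    """Success rate for d = 1..max_dim; ties go to the smaller (simpler) model."""
    rates = {
        d: loo_success_rate(X, y, nearest_centroid_fit(d), nearest_centroid_predict)
        for d in range(1, max_dim + 1)
    }
    best = max(rates, key=lambda d: (rates[d], -d))
    return best, rates


# Invented data: feature 0 is uninformative, feature 1 separates the groups.
X = [[0.0, 0.0], [1.0, 0.1], [0.5, 0.05], [0.1, 1.0], [0.9, 1.1], [0.5, 0.95]]
y = ["ECOR", "ECOR", "ECOR", "GMB", "GMB", "GMB"]
best, rates = best_dimension(X, y, max_dim=3)
print(best, rates)
```

Breaking ties in favour of fewer dimensions is one reasonable convention for reporting a single "maximum" when several model sizes reach the same rate.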

emi2852-sup-0010-si.doc (107K)

Table S4. C-sources differentially used by plant- and host-associated E. coli isolates. Only C-sources used by more than 5% of strains (63/95 carbon sources) are shown; the table is ordered by the P-values of a Mann–Whitney–Wilcoxon test used to identify C-sources differentially used by ECOR and GMB strains. Red text indicates C-sources found to be statistically significant after Bonferroni correction; blue highlighting indicates C-sources whose utilization differs by more than 20% between ECOR and GMB strains (utilization being the percentage of strains reaching OD600 > 0.63 after 24 h at 37°C with the given compound as the sole C-source).
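The three criteria in this legend can be sketched in Python as follows. The sketch assumes a two-sided Mann–Whitney–Wilcoxon test using the large-sample normal approximation with a tie correction (exact P-values or a continuity correction may have been used instead), reads the 20% criterion as a difference of 20 percentage points in the fraction of strains above OD600 0.63, and leaves the number of tests for the Bonferroni correction as a parameter; all values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann–Whitney U test (normal approximation, tie-corrected, no continuity correction)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    n = n1 + n2
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def fraction_utilizing(od600_values, threshold=0.63):
    """Fraction of strains whose OD600 exceeds the utilization threshold."""
    return sum(od > threshold for od in od600_values) / len(od600_values)


def bonferroni_significant(p, n_tests, alpha=0.05):
    """Bonferroni: compare each P-value with alpha divided by the number of tests."""
    return p < alpha / n_tests


# Hypothetical OD600 values for one C-source.
ecor = [0.10, 0.70, 0.20, 0.80]
gmb = [0.70, 0.90, 0.65, 0.20]
u, p = mann_whitney_u(ecor, gmb)
highlight = abs(fraction_utilizing(ecor) - fraction_utilizing(gmb)) > 0.20
print(u, p, highlight)
```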

emi2852-sup-0011-si.doc (42K)

Table S5. Cross-validation success rates in PLS-DA using E. coli phylogroups as groups, for model dimensions up to 10. The maximum cross-validation success rate (74.4%) is highlighted and corresponds to seven PLS-DA dimensions. In Fig. 3, only two dimensions are shown for clarity, corresponding to a 62.5% success rate. Phylogroup E was excluded from the analysis because of its small sample size (n = 5).

emi2852-sup-0012-si.doc (39K)

Appendix S1. Supplementary methods.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.