CRISPR1 analysis of naturalized surface water and fecal Escherichia coli suggests common origin

Abstract
Clustered regularly interspaced short palindromic repeats (CRISPRs) are part of an acquired bacterial immune system that functions as a barrier to exogenous genetic elements. Since naturalized Escherichia coli are likely to encounter different genetic elements in aquatic environments compared to enteric strains, we hypothesized that such differences would be reflected within the hypervariable CRISPR alleles of these two populations. Comparison of CRISPR1 alleles from naturalized and fecal phylogroup B1 E. coli strains revealed that the alleles could be categorized into four major distinct groups (designated G6–G9), and all four allele groups were found among both naturalized and fecal strains. The distribution of CRISPR G6 and G8 alleles was similar among strains of both ecotypes, while naturalized strains tended to have CRISPR G7 alleles rather than G9 alleles. However, since CRISPR G7 alleles were not specific to naturalized strains, they would not be useful as a marker for identifying naturalized strains. Notably, CRISPR alleles from naturalized and fecal strains also had similar spacer repertoires. This indicates a shared history of encounter with mobile genetic elements and suggests that the two populations were derived from common ancestors.


Introduction
Although the existence of naturalized E. coli that persist and multiply in aquatic environments is supported by an increasing number of studies (Ishii and Sadowsky 2008), the fundamental question remains whether these naturalized E. coli represent an autochthonous population (i.e., self-sustaining in the absence of fecal input) or whether they are environmentally selected fecal contaminants. Although some strains associated primarily with aquatic environments have been shown to be phylogenetically divergent from E. coli, such as those from cryptic clades III-V (Walk et al. 2009; Clermont et al. 2011), most studies suggest that naturalized aquatic populations are dominated by phylogroup B1 strains (Ratajczak et al. 2010; Berthe et al. 2013; Tymensen et al. 2015). However, phylogroup B1 strains are also abundant in the feces of certain livestock and wildlife (Higgins et al. 2007), making it difficult to determine whether naturalized E. coli populations are truly autochthonous. To address this, we compared the genetic relatedness of CRISPR (clustered regularly interspaced short palindromic repeats) arrays of naturalized and fecal phylogroup B1 E. coli.
CRISPRs canonically serve as part of an adaptive bacterial immune system against foreign nucleic acids (Horvath and Barrangou 2010). They consist of partially palindromic direct DNA repeats separated by spacers that are often derived from foreign genetic elements. The spacers can serve as templates for RNA-mediated interference with the exogenous genetic elements, thereby limiting horizontal gene transfer. Bacteria continually acquire CRISPR spacers from attacking foreign mobile genetic elements (Yosef and Qimron 2015), ultimately generating a hypervariable spacer repertoire among different strains. This variability has been leveraged for use in genetic typing (Delannoy et al. 2012). Although CRISPR immunity does not appear to be highly active in present-day E. coli (Touchon et al. 2011), CRISPR alleles ostensibly reflect historical encounters with exogenous genetic elements. Since naturalized E. coli populations have likely encountered different genetic elements in aquatic environments compared to enteric strains residing in the intestine, we hypothesized that these two populations would have different CRISPR spacer repertoires owing to the incorporation of different foreign genetic elements present in their respective environments.

Experimental Procedures
Strains
E. coli strains were obtained from a previously established collection of surface water, sediment, and fecal strains that were isolated from the Milk River watershed in Alberta, Canada (Tymensen et al. 2015) (Table 1). The naturalized strains included those from the ET-1 clade, along with other phylogroup B1 strains, and represented several clonal genotypes (based on accessory gene profiles) that were either specific to or numerically more abundant (i.e., overrepresented) in surface water and sediment compared to feces. Fecal strains were largely from cattle, as cattle were the predominant contributors of fecal contamination in the watershed; however, several ET-1 clade strains from other livestock and wildlife were also included, since few cattle ET-1 clade strains were present in the isolate collection.
CRISPR sequencing and analysis
CRISPR1 arrays of each E. coli strain were amplified by PCR using primers flanking the iap (C1Fw, GTTATGCGGATAATGCTACC) and cas2 (C1Rev, CGTAYYCCGGTRGATTTGGA) genes, as previously described (Touchon et al. 2011). Forward and reverse sequencing of the PCR products was conducted by Functional Biosciences (Madison, WI). Consensus sequences were assembled using the Staden package v3.3 (Hinxton, UK) (Staden et al. 2003). Sequences were submitted to GenBank with accession numbers KT821503 to KT821545.
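The C1Rev primer contains degenerate IUPAC positions (Y = C or T; R = A or G), so it represents a small pool of concrete oligonucleotides rather than a single sequence. The following is a minimal Python sketch that enumerates this pool; the function name `expand_degenerate` and the reduced code table are illustrative assumptions and are not part of the published protocol.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes: only the symbols that occur
# in the two primers quoted above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",  # purine
    "Y": "CT",  # pyrimidine
}

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

c1_fw = "GTTATGCGGATAATGCTACC"   # no degenerate positions
c1_rev = "CGTAYYCCGGTRGATTTGGA"  # two Y positions and one R position

variants = expand_degenerate(c1_rev)
print(len(variants))  # 2 * 2 * 2 = 8 concrete sequences
```

The forward primer has no degenerate positions, so `expand_degenerate(c1_fw)` returns the primer itself as a single-element list.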
Analysis of CRISPR sequences was performed using the CRISPRdb database and CRISPRtionary tool (Grissa et al. 2007). Spacers were automatically numbered, with each distinct spacer being assigned a different number. A two-base mismatch was allowed when assigning distinct spacers. CRISPR1 sequences of representative reference strains from the E. coli reference (ECOR) collection and each of the four CRISPR sequence groups, G6 to G9 (as previously identified by Touchon et al. 2011), were obtained from GenBank.
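The spacer numbering step can be illustrated with a short Python sketch. It assumes that "a two-base mismatch was allowed" means that spacers of equal length differing at no more than two positions receive the same number. This is a simple greedy scheme written for illustration, not the CRISPRtionary implementation, and the sequences are invented and much shorter than real spacers.

```python
def hamming(a, b):
    """Count mismatched positions; return None for unequal lengths."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def number_spacers(spacers, max_mismatch=2):
    """Give each spacer the number of the first earlier representative
    within max_mismatch mismatches, or else the next unused number."""
    representatives = []  # representatives[i] carries number i + 1
    numbers = []
    for spacer in spacers:
        for idx, rep in enumerate(representatives):
            distance = hamming(spacer, rep)
            if distance is not None and distance <= max_mismatch:
                numbers.append(idx + 1)
                break
        else:
            # No representative was close enough: start a new number.
            representatives.append(spacer)
            numbers.append(len(representatives))
    return numbers

# Hypothetical 12-base sequences, for illustration only.
toy_spacers = [
    "ACGTACGTACGT",
    "ACGTACGTACGA",  # 1 mismatch to the first spacer
    "TTTTGGGGCCCC",  # unrelated sequence
    "ACGAACGAACGA",  # 3 mismatches to the first spacer
]
print(number_spacers(toy_spacers))  # [1, 1, 2, 3]
```

Because the scheme is greedy, the numbering depends on the order in which spacers are read; a clustering-based assignment would avoid that at the cost of more code.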

MLST
Phylogenetic reconstruction was based on the seven-gene multilocus sequence typing (MLST) protocol described elsewhere (Shigatox.net). Forward and reverse sequences for each of the seven genes (aspC, clpX, fadD, icdA, lysP, mdh, and uidA) were assembled using the Staden package v3.3 (Hinxton, UK) (Staden et al. 2003). New alleles and sequence types were submitted to the Shigatox EcMLST database (Qi et al. 2004). For each E. coli strain, consensus sequences for each of the seven genes were concatenated and aligned using MUSCLE (default parameters) (Edgar 2004) as implemented in MEGA5 (Tamura et al. 2011). The alignment was imported into SplitsTree4 v.4.13.1 (Tübingen, Germany) and analyzed according to the Neighbour-Net algorithm (default parameters) (Huson and Bryant 2006). MLST data of representative E. coli strains from the ECOR collection, clade ET-1, and several additional non-phylogroup B1 strains from the Milk River were included in the analysis for comparison purposes (Table S1). MLST sequences for reference strains from the ECOR collection were obtained from the Shigatox EcMLST database. Dr. Seth Walk (Montana State University, Bozeman, MT) kindly provided the MLST sequences for the reference clade ET-1 strains.

Results and Discussion
CRISPR1 arrays of 56 fecal and naturalized E. coli strains from a previous study (Tymensen et al. 2015) were sequenced (see Table 1). CRISPR sequence analysis identified a total of 177 distinct spacers among the surface water and fecal E. coli strains (Fig. 1). Common spacers were defined as those present in two or more strains. Among naturalized strains, 68 of the 74 common spacers were also present in fecal strains, indicating that common spacer repertoires were largely similar (Table S2). This was contrary to our hypothesis. The spacers were arranged as 40 different alleles, with only one allele (ST35) shared by naturalized and fecal strains (Fig. 1). Allelic variation among naturalized strains was largely due to spacer deletion: alleles from naturalized strains were similar to those of fecal strains, but were missing spacers. The remaining variation could be attributed to the presence of strain-specific spacers in eight of the 21 alleles.
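The comparison of common spacers can be sketched in Python as a set operation. The strain names and spacer numbers below are invented toy data, not the study data, and the helper `common_spacers` is an illustrative name; the sketch only shows how "common" (present in two or more strains) and "shared with fecal strains" could be computed from numbered alleles.

```python
from collections import Counter

def common_spacers(alleles, min_strains=2):
    """Return spacers present in at least min_strains of the strains.

    alleles maps a strain name to its list of spacer numbers."""
    counts = Counter()
    for spacers in alleles.values():
        counts.update(set(spacers))  # count each spacer once per strain
    return {spacer for spacer, n in counts.items() if n >= min_strains}

# Hypothetical toy alleles (spacer numbers), for illustration only.
naturalized = {"N1": [1, 2, 3, 4], "N2": [1, 2, 4, 5], "N3": [2, 3, 6]}
fecal = {"F1": [1, 2, 7], "F2": [3, 8]}

common = common_spacers(naturalized)           # {1, 2, 3, 4}
fecal_spacers = set().union(*fecal.values())   # {1, 2, 3, 7, 8}
shared = common & fecal_spacers                # {1, 2, 3}

print(f"{len(shared)} of {len(common)} common spacers also occur in fecal strains")
```

In the study itself the corresponding figures were 68 of 74, or about 92%.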
CRISPR alleles from naturalized strains contained an average of 11 ± 3 spacers per allele (mean ± SD), compared to 13 ± 5 spacers per allele for fecal strains (Fig. 2). This difference was not statistically significant (P = 0.08, Mann-Whitney rank sum test). Using spacer count as a proxy for CRISPR activity (Gophna et al. 2015), it appears that naturalized and fecal strains have similar immunity to exogenous DNA. Reduced CRISPR immunity may facilitate the acquisition of environmentally adaptive genetic traits through the uptake of foreign DNA. This mechanism has been proposed to facilitate acquisition of virulence factors among pathogenic strains (Toro et al. 2014; Garcia-Gutierrez et al. 2015). While the data from the current study suggest that CRISPR-mediated immunity (or lack thereof) does not play a major role in environmental adaptation among naturalized strains, this interpretation should be viewed cautiously, as the number of naturalized and fecal strains used in this study was relatively small. It is therefore recommended that future studies in other watersheds include larger numbers of strains, particularly given the tremendous diversity of CRISPR1 alleles observed among E. coli strains.
Fig. 1. Graphical representation of CRISPR1 alleles from fecal and naturalized E. coli. Each spacer is represented by a square (direct repeats not shown). Identical spacers found in two or more strains have identical numbers and colors, while strain-specific spacers are white. Alleles were aligned with the most ancient spacers on the left. Gaps were introduced to improve spacer alignment. Isolate source and CRISPR sequence types (ST) are shown. Spacers were grouped according to four different spacer repertoire relatedness groups (G6 to G9, as previously defined by Touchon et al. 2011). 'MANY' includes strains ARDMR007, −017, −078, −090, −094, −098, −101, −102, −106, −108, −110, and −117. Reference strains were from the ECOR collection or Touchon et al. 2011 (strains 518, R379, 725, and R410). cs, cliff swallow; co, cow; hs, horse; sw, surface water/sediment; sh, sheep; de, deer; IS, insertion sequence.
Previous examination of E. coli CRISPRs indicates that certain alleles are predominantly associated with specific phylogroups (Touchon et al. 2011). Among phylogroup B1 strains, four major distinct groups of alleles, with almost completely different spacer repertoires, have been previously identified (herein referred to as G6 to G9, as designated previously). The majority of CRISPR alleles from the current study belonged to one of the four groups (Fig. 1), and all four allele groups were found among both naturalized and fecal strains. Four alleles were uncategorized.
To examine CRISPR alleles in the context of phylogenetic relatedness, the phylogeny was reconstructed based on MLST. Several strains from other major phylogroups were also included in the reconstruction (Table S1). Among the phylogroup B1 strains, three distinct clades, including the previously identified naturalized ET-1 clade (Walk et al. 2007), were observed (Fig. 3). Similar phylogenetic structure has been reported among phylogroup B1 strains from soil (Bergholz et al. 2011), although no attempt was made to determine whether the clades in the current study correspond to those of the previous study. Conspicuously, the different CRISPR allele groups tended to be conserved among strains of the same phylogenetic clade. For example, all strains with G7 alleles belonged to the ET-1 clade, while seven of the nine strains with G6 alleles clustered in a second phylogroup B1 clade, and half of the strains with G9 alleles clustered in a third phylogroup B1 clade. Some strains did not group with their respective clade. This is likely due to genetic recombination, which is noted to be especially common among phylogroup B1 strains (Almendros et al. 2014).
[Figure caption fragment] ... (naturalized, circle; fecal, square), and the CRISPR relatedness group (G6, red; G7, green; G8, purple; G9, teal; ungrouped, black; Touchon et al. 2011). Representative phylogroup B1 ECOR strains and clade ET-1 strains (denoted by the prefix TW) were also included.
Looking specifically at the distribution of the different CRISPR alleles, it is noteworthy that the majority of strains with CRISPR G7 alleles (13 of 19, or 68%) were naturalized. Conversely, the majority of strains with CRISPR G9 alleles (11 of 16, or 69%) were of fecal origin. CRISPR G6 and G8 alleles were evenly distributed among naturalized and fecal strains. This indicates that while naturalized strains are genetically diverse, there was a bias toward naturalized strains having CRISPR G7 alleles rather than G9 alleles (P = 0.04, Fisher's exact test). This is consistent with the notion that the ET-1 clade (in which all CRISPR G7 alleles are found) is a naturalized clade found in aquatic environments (Walk et al. 2007). Regardless of the bias, CRISPR G7 alleles were not specific to naturalized strains, and are therefore not useful as a marker for identifying naturalized strains.
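The G7 versus G9 comparison can be illustrated with a small Python sketch of a two-sided Fisher's exact test. The 2 x 2 table is an assumption derived from the counts quoted above (G7: 13 naturalized and 6 fecal; G9: 5 naturalized and 11 fecal); the text does not state the exact table or whether the reported test was one- or two-sided. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(k):
        # Unnormalized probability of k in the top-left cell (exact integers).
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(total, col1)

# Rows: G7, G9. Columns: naturalized, fecal (assumed table, see above).
p_value = fisher_exact_two_sided(13, 6, 5, 11)
print(f"P = {p_value:.2f}")  # P = 0.04
```

Under these assumptions the sketch gives a value that rounds to 0.04, in line with the reported P value. Integer weights are compared rather than floating-point probabilities so that the "at least as extreme" comparison is exact.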

Conclusion
Despite the tremendous genetic diversity among the strains of both ecotypes, the observation that fecal and naturalized E. coli strains possess largely similar CRISPR spacer repertoires suggests that these strains likely have a shared history of encounter with exogenous genetic elements. The most parsimonious explanation is that the strains were derived from common ancestral lineages and/or were from the same fecal sources. Likewise, the MLST data indicate that naturalized phylogroup B1 strains are not genetically divergent from fecal strains, but rather appear to represent a continuum within the global E. coli population. From a practical perspective, the large variation among individual strains will preclude the use of CRISPRs as typing markers for identifying naturalized populations.

Supporting Information
Additional supporting information may be found in the online version of this article:
Table S1. Isolation source, sequence type (ST), and MLST data for additional non-phylogroup B1 strains analyzed in this study.
Table S2. Characteristics of CRISPR alleles from naturalized and fecal E. coli strains.