Faecal pollution contains a rich and diverse community of bacteria derived from animals and humans, many of which might serve as alternatives to the traditional enterococci and Escherichia coli faecal indicators. We used massively parallel sequencing (MPS) of the 16S rRNA gene to characterize microbial communities from wastewater treatment plant (WWTP) influent sewage from 12 cities geographically distributed across the USA. We examined members of the Clostridiales, which included the families Clostridiaceae, Lachnospiraceae and Ruminococcaceae, for their potential as sewage indicators. Lachnospiraceae was one of the most abundant groups of faecal bacteria in sewage, and several Lachnospiraceae high-abundance sewage pyrotags occurred in at least 46 of 48 human faecal samples. Clone libraries targeting Clostridium coccoides (C. coccoides) in sewage samples demonstrated that Lachnospiraceae-annotated V6 pyrotags encompassed the previously reported C. coccoides group. We used oligotyping to profile the genus Blautia within Lachnospiraceae and found oligotypes comprised of 24 entropy components that showed patterns of host specificity. These findings suggest that indicators based on Blautia might have the capacity to discriminate between different faecal pollution sources. Development of source-specific alternative indicators would enhance water quality assessments, leading to improved ecosystem health and reduced human health risk from waterborne disease.
Faecal pollution contains a broad array of microorganisms from animals and humans, the majority of which are faecal anaerobes (Franks et al., 1998; Eckburg et al., 2005; Ley et al., 2008). However, water quality surveillance relies upon a small subset of easily culturable facultative anaerobes, such as Escherichia coli (E. coli) or enterococci. These bacteria commonly occur in both animals and humans, thereby providing no information as to the source of faecal pollution. Faecal pollution remains a major source of water quality impairment of rivers, streams and coastal waters (USEPA, 2009). Receiving waters in watersheds often collect inputs from upstream rural and agricultural land and downstream urbanized regions, making it difficult to estimate relative contributions of various faecal pollution sources (sewage, agricultural animals, wildlife, etc.). Despite ambiguity about their source, the detection of faecal indicator bacteria commonly leads to advisories or closures of coastal beaches (USEPA, 2012).
Previous studies demonstrate that diet, host factors and host–microbe co-evolution shape the composition of gut microbiota (Ley et al., 2008; Sekelja et al., 2011; Shanks et al., 2011). If these processes have major influences on community structure, then characterization of host-associated microbiota might identify organisms that can serve as host-specific indicators of faecal pollution. Within the order Bacteroidales, traditional molecular methods have identified host-specific species and phylotypes. Terminal restriction fragment length polymorphism (TRFLP) and/or family-specific cloning and sequencing of nearly full-length 16S rRNA genes by Sanger technologies have identified diagnostic phylotypes for humans, cows, pigs, dogs, etc. (Bernhard and Field, 2000a; Dick et al., 2005; Fogarty and Voytek, 2005; Kildare et al., 2007; Fremaux et al., 2009). Two established human-specific Bacteroides assays (Bernhard and Field, 2000b; Kildare et al., 2007) target the V2 region of very closely related phylotypes. Subtractive hybridization and genomic enrichment of the metagenome have identified candidate alternative indicators for host-specific assays, including a human-specific assay that targets an unidentified enzyme of a Bacteroides spp. (Shanks et al., 2007). While Bacteroidales is perhaps the most studied taxonomic group for the development of alternative indicators, a few indicators have also been described for Bifidobacteriaceae (Bonjoch et al., 2004; Gomez-Donate et al., 2012), and recently Lachnospiraceae (Newton et al., 2011).
In recent years the Human Microbiome Project (HMP) has generated large molecular data sets that provide new information about the complexity of microbial community composition in humans. Bacteroidales and Clostridiales are among the most abundant faecal anaerobes and the most diverse (Robinson et al., 2010). These studies demonstrate large interpersonal variation (Turnbaugh et al., 2009; Robinson et al., 2010; Lozupone et al., 2012), which makes it difficult to identify the most common and abundant host-specific microbes across human populations. Only a few studies have used analysis of treated or untreated sewage to search for novel indicators of human faecal pollution (McLellan et al., 2010; Wery et al., 2010). Sewage effectively represents a random, composite sampling of tens of thousands to millions of individuals, which circumvents the issue of individual variability in identifying common microorganisms in the human population.
Constancy of Bacteroidales and Clostridiales microbial communities has been reported across different wastewater treatment plants (WWTPs) (McLellan et al., 2010; Wery et al., 2010) and over a 2-year period in a single plant (McLellan et al., 2010). This paper examines the population structure of Clostridiales, and in particular Lachnospiraceae, a robust group of organisms that commonly occur in the gut of humans and other animals. Here, we characterize 38 sewage samples collected over a 4-year period from one city and 11 sewage samples collected from diverse geographic regions to demonstrate that surveys of sewage allow us to describe microbial community structure in the human population. We identified a complex array of unique Lachnospiraceae V6 pyrotags that appear to represent abundant, human-specific microbial populations. These Lachnospiraceae taxa have the potential to serve as alternative indicators of sewage that can differentiate between human and non-human faecal pollution sources.
Distribution of Clostridiales in sewage, human, cattle and chickens
Table 1 lists the sewage and host samples that we included in this study. The family level composition of Clostridiales in sewage influent samples generally was similar to a composite data set of 48 human faecal samples; however, sewage reflected a higher relative abundance of Peptostreptococcaceae and Veillonellaceae (Fig. 1). Sequences annotated as Lachnospiraceae made up the majority of Clostridiales sequences in both the sewage and human faecal samples. Lachnospiraceae comprised 5.6% of the total microbial community in Milwaukee sewage samples and, on average, 6.2% of the total microbial community in sewage from multiple cities. Cattle faecal samples contained significantly higher Ruminococcaceae compared with sewage or humans (P < 0.05). Chickens had a low relative abundance of Clostridiales, most of which mapped to Lachnospiraceae.
Table 1. Sewage, human, cattle and chicken data sets used in this study
No. of samples
No. of high-quality bacterial reads
No. of high-quality Clostridiales reads
a. The cities and states included Seattle (WA), Duluth (MN), Rutland (VT), Albany (NY), Crystal Lake (IL), Clarksburg (WV), Elk Grove (CA), Tulsa (OK), Las Vegas (NV), Tallahassee (FL) and Kihei (HI).
High-abundance Clostridiales pyrotags in sewage and humans
More than half of the top 30 most abundant Clostridiales pyrotags in sewage represented the family Lachnospiraceae, and abundance patterns were similar to the human faecal data set (Fig. 2). These abundant Lachnospiraceae pyrotags were rarely present in the cow or chicken faecal data sets. Ruminococcaceae, specifically pyrotags classified as Faecalibacterium sp., were also among the most abundant in the sewage and human faecal data sets, but many of these pyrotags were also present in the cow faecal data set. The top two most abundant V6 pyrotags in sewage resolved to Lachnospiraceae and matched with 100% identity to uncultured Lachnospiraceae A1-86 (pyrotags annotated as Roseburia) and Blautia wexlerae (pyrotags annotated as Blautia) respectively (Fig. 2). The majority of abundant pyrotags in the sewage data set, but not present in the human data set, were from the Clostridiaceae, Veillonellaceae and Peptostreptococcaceae families.
Relating Lachnospiraceae pyrotags to the Clostridium coccoides group
We generated 2018 near full-length 16S rRNA gene sequences from a subset of sewage samples using a primer set targeting the C. coccoides group. A total of 307 sequences were unique and 305 of those were classified to the family Lachnospiraceae, mapping to unclassified Lachnospiraceae (33.1%), Blautia (26%) and Lachnospiraceae Incertae sedis (24.0%). Only two clones were not classified as Lachnospiraceae, instead mapping to Veillonellaceae, genus Anaeroarcus. All of the 30 most abundant Lachnospiraceae pyrotags in the sewage and human faecal data sets matched (had 100% identity to) at least one of the cloned sequences (Fig. 2). The great majority of cloned sequences (92%) matched a Lachnospiraceae pyrotag, but only 44% of the unique pyrotags matched one of our cloned sequences. Given the depth of sequencing (62 092 Lachnospiraceae annotated pyrotags compared with 2018 cloned sequences), we expected that lower-abundance pyrotags would not be represented in the cloned library. Given an equal sequencing depth for the two sequencing methods, we estimate that 87% of the Lachnospiraceae pyrotag data set would be represented by the clone library.
Identification of core Lachnospiraceae in humans and sewage
The top three most abundant Lachnospiraceae sewage pyrotags (representing 15.5% of sewage Lachnospiraceae pyrotags) occurred in all 48 human faecal samples (Fig. 3), and the top eight most abundant Lachnospiraceae sewage pyrotags (representing 27.1% of sewage Lachnospiraceae pyrotags) occurred in at least 46 of 48 individuals, although the relative abundance of these pyrotags in any individual human faecal sample was highly variable (Table S1). For example, the most abundant Lachnospiraceae sewage pyrotag, which corresponded to a Roseburia (Fig. 2) and accounted for 6.1% of the Lachnospiraceae pyrotags in sewage, represented the most abundant Lachnospiraceae pyrotag in 29 of the 48 human faecal samples, averaging 13% of the Lachnospiraceae. However, this pyrotag was not among the top Lachnospiraceae in seven of the individuals and averaged less than 0.3% of the Lachnospiraceae recovered. The second most abundant Lachnospiraceae sewage pyrotag mapped to the genus Blautia (Fig. 2) and was highly abundant in 19 of 48 individuals with distribution patterns among human faecal samples that paralleled the Roseburia-classified pyrotag.
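The prevalence figures reported here (for example, a pyrotag occurring in at least 46 of 48 individuals) amount to counting, for each pyrotag, the number of samples in which it has at least one read. The following is a minimal Python sketch of that tally using invented toy counts; the function name `prevalence` and the tag labels are hypothetical and are not part of the study's pipeline.

```python
from typing import Dict, List


def prevalence(sample_counts: List[Dict[str, int]]) -> Dict[str, int]:
    """Return, for each pyrotag, the number of samples with at least one read."""
    seen: Dict[str, int] = {}
    for counts in sample_counts:
        for tag, n_reads in counts.items():
            if n_reads > 0:
                seen[tag] = seen.get(tag, 0) + 1
    return seen


# Toy data (invented): read counts per pyrotag in three faecal samples.
samples = [
    {"tagA": 120, "tagB": 3},
    {"tagA": 5, "tagC": 40},
    {"tagA": 60, "tagB": 0, "tagC": 2},
]
result = prevalence(samples)
```

In this toy example `tagA` is present in all three samples even though its read count varies widely, which mirrors the distinction drawn above between consistent occurrence and variable relative abundance.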
In Milwaukee sewage, the top five most abundant Lachnospiraceae pyrotags exhibited stable rank abundance patterns across multiple samples collected over multiple years (2005 and 2007–2009) (Fig. 4). Pyrotags further down the rank abundance distribution exhibited more rank variability over temporal scales, but were always relatively dominant in sewage influent. We also compared pyrotags from Milwaukee sewage (averaged across the 38 samples) with 11 sewage samples from across the USA. The rank abundance of Lachnospiraceae pyrotags in Milwaukee sewage was highly correlated with the rank abundance in 11 other cities (rho = 0.8441, P < 0.001). Only one of 11 cities (Tulsa) had a disparate pattern, but the correlation was still significant (rho = 0.4103, P < 0.001). Table S2 summarizes individual WWTP correlations based upon comparisons of ranked abundance of Lachnospiraceae V6 pyrotag sequences.
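The rho values above are rank correlations between the abundance profiles of two sewage data sets. As an illustration only, the sketch below computes a Spearman-type rho as the Pearson correlation of average ranks, using the Python standard library. The text does not state the software used or how ties were handled, so the tie handling here (mean rank for tied values), the function names and the toy abundances are all assumptions.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy relative abundances (invented) of the same five pyrotags in two cities.
city_a = [0.061, 0.050, 0.044, 0.030, 0.010]
city_b = [0.055, 0.058, 0.040, 0.020, 0.015]
rho = spearman_rho(city_a, city_b)
```

The sketch returns only the coefficient; a significance test such as the P-values quoted above would need an additional step that is not shown.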
Network analysis of Clostridiales
Network analysis was used to examine the distribution of the 400 most abundant Clostridiales V6 pyrotags from four faecal sources: humans, sewage, cattle and chickens (Fig. 5). When considering all Clostridiales taxa in sewage, we found Lachnospiraceae was the family with the most pyrotags shared between humans and sewage and these pyrotags were rarely present in the cow and chicken data sets. Overall 31.0% of Lachnospiraceae pyrotags in sewage overlapped with humans, 10.4% overlapped with cattle and 0.5% overlapped with chickens (Table 2). Ruminococcaceae was less specific; 18.5% of the sewage pyrotags overlapped with cattle. The number of unique Clostridiaceae pyrotags was approximately 10-fold lower than what was found with Lachnospiraceae, and the vast majority (94.5%) were unique to sewage. Further, there was more overlap between sewage Clostridiaceae pyrotags and cows and chickens than there was with the human faecal samples.
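The overlap percentages in this paragraph can be read as the fraction of unique sewage pyrotags that also appear in a given host data set. A minimal Python sketch of that calculation follows. It assumes that "overlap" means simple presence of the identical sequence in both data sets (the text does not state a read-count threshold), and the tag names and the function `percent_shared` are hypothetical.

```python
from typing import Set


def percent_shared(sewage_tags: Set[str], host_tags: Set[str]) -> float:
    """Percentage of unique sewage pyrotags that also occur in the host data set."""
    if not sewage_tags:
        return 0.0
    return 100.0 * len(sewage_tags & host_tags) / len(sewage_tags)


# Toy sets of unique pyrotag sequences (invented labels).
sewage = {"s1", "s2", "s3", "s4"}
human = {"s1", "s2", "x1"}
cattle = {"s2"}

shared_with_human = percent_shared(sewage, human)
shared_with_cattle = percent_shared(sewage, cattle)
```

Because the denominator is the number of unique sewage pyrotags, a family with few unique pyrotags can show large percentage swings from a handful of shared sequences.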
Table 2. Clostridiales sewage pyrotags shared with humans, cattle and chickens
Family (n unique pyrotags in sewage)
Sewage pyrotags shared with other sources
Lachnospiraceae (n = 8933)
Ruminococcaceae (n = 5156)
Clostridiaceae (n = 770)
Oligotypes within Blautia
To explore the level of host specificity within the genus Blautia we used oligotyping, a supervised computational method that can detect very subtle nucleotide variations among closely related taxa, thereby facilitating the identification of closely related but distinct organisms that may not be detected by taxonomic classification or de facto 3% clustering methods (Eren et al., 2011). Oligotyping analysis of 152 730 V6 reads that mapped to Blautia from our four source data sets revealed a total of 108 oligotypes. Figure 6 shows the distribution of oligotypes among samples. Some oligotypes exhibited remarkable host specificity. These oligotypes occurred only in chicken, only in cattle or only in human and sewage samples, indicating that V6 pyrotags could be used for faecal source identification. Figure 7 illustrates eight of the host-specific oligotypes and their abundance in individual samples. Table 3 lists the total counts and their parent V6 reads.
Table 3. Full-length V6 reads for oligotypes shown in Fig. 7
Count column shows the total number of reads represented by a given oligotype.
Lachnospiraceae are candidates for alternative indicators
Traditional faecal indicators can detect faecal pollution, but they fall short for identifying causes of poor water quality (Field and Samadpour, 2007; Stewart et al., 2008). Yet the efficient use of our limited resources for mitigation and, ultimately, reduction of human health risks requires information about the specific sources of faecal pollution. With the rapid advances in sequencing technologies, it is now possible to characterize microbial communities and their structure in great depth and interrogate these data sets to identify new host-specific indicators of faecal pollution. Ideally, host-specific indicators will: (i) represent abundant taxa in the host of interest, thereby maximizing sensitivity for detection, (ii) not occur in other hosts and thus provide specificity and (iii) prove to be robust over a large geographic region (NRC, 2004).
Clostridiales, a major group within the human gut microbiome, has been largely underexplored for identification of human-specific indicators (McLellan et al., 2010; Wery et al., 2010). In humans, the major families within Clostridiales include Lachnospiraceae, Ruminococcaceae and, to a lesser extent, Clostridiaceae (Eckburg et al., 2005). Lachnospiraceae is estimated to comprise 19% to 50% of faecal microbiota (Hayashi et al., 2002; 2006; Hold et al., 2002; Matsuki et al., 2004; Rajilic-Stojanovic et al., 2009; Gosalbes et al., 2011). Initial investigations of Lachnospiraceae suggest this group contains many organisms that could be used as faecal indicators (McLellan et al., 2010; Newton et al., 2011). Our profiling of untreated sewage using V6 sequencing revealed Lachnospiraceae was one of the most numerically dominant taxonomic groups, despite the overprinting of non-faecal bacteria that comprised nearly 85% of the total community (McLellan et al., 2010). From these data, we developed a qPCR assay that targets the second most abundant Lachnospiraceae (matching Blautia wexlerae) in our Milwaukee sewage samples and designated this gene sequence as Lachno2. We then demonstrated that the Lachno2 qPCR assay correlated to a previously described qPCR assay targeting a human Bacteroides sp. and provided evidence of chronic human faecal pollution in surface waters (Newton et al., 2011).
The evolving nomenclature for the Clostridiales makes it difficult to relate current and past studies of community structure and diversity. Within Clostridiales, Collins and colleagues (1994) proposed Clostridium clusters I to XIX. The C. coccoides group is analogous to Clostridium cluster XIVa (Collins et al., 1994; Matsuki et al., 2002), which consists of up to 20 different genera including Anaerostipes, Butyrivibrio, Clostridium, Roseburia and Ruminococcus to name a few (Hayashi et al., 2006; Liu et al., 2008). Several of these named genera have been reclassified into the recently described genus Blautia (Liu et al., 2008). Cloning with the previously described primer set targeting the C. coccoides group (Matsuki et al., 2002) generated clones representing much of the diversity in our Lachnospiraceae-annotated V6 pyrotags, including the most dominant pyrotags present in sewage and humans (Fig. 2). The general abundance patterns between clones and pyrotags were very similar, and the sequencing depth of the clone library appeared to be the largest factor limiting our ability to capture the Lachnospiraceae diversity identified in the pyrotags. Overall, our results suggest that the Lachnospiraceae pyrotags described in this study are analogous to the previously reported C. coccoides group and Clostridium cluster XIVa and include unclassified Lachnospiraceae not previously described.
Abundant sewage Lachnospiraceae pyrotags represent core human faecal microbiota
The high individual diversity but consistent metabolic pathways of the human gut microbiota suggest the presence of a microbiota functional core rather than a phylogenetic core (Eckburg et al., 2005; Turnbaugh et al., 2009; Robinson et al., 2010; Lozupone et al., 2012). Multiple studies have identified genera, phylogroups or phylotypes (represented by sequences or OTUs) that consistently occur in humans (Rajilic-Stojanovic et al., 2009; Tap et al., 2009; Qin et al., 2010; Turnbaugh et al., 2010; Sekelja et al., 2011). However, the prevailing thought is that a core set of microbial species (present in all humans) is unlikely to exist for the human gut (Lozupone et al., 2012). Sewage contains faecal microbiota from thousands to millions of people and captures the phylotypes, OTUs, etc. that are the most common among a human population. Therefore, the most abundant and most consistently present faecal microbes in sewage are likely to represent what could be considered a core gut community. Members of Lachnospiraceae (i.e. Clostridium cluster XIVa) are among the most frequently identified as ‘core’ gut microbes (Sekelja et al., 2011; Lozupone et al., 2012).
In sewage, we found the same Lachnospiraceae high-abundance pyrotags in all 12 cities' WWTP influents, which suggests a cosmopolitan distribution for some gut bacteria in the human population of the USA. The most abundant of these shared Lachnospiraceae pyrotags were particularly stable in terms of rank relative abundance (i.e. most abundant, second most abundant, etc.) over a 4-year period in Milwaukee's sewage influent. Instead of representing a binary relationship (either stable or not stable), there was a continuum of decreasing stability, where increasingly lower ranked pyrotags (i.e. lower overall relative abundance among Lachnospiraceae in sewage) showed increasing variability in rank abundance (Fig. 4). This pattern suggests to us that some pyrotags are present in a large percentage of Milwaukee's population and could be considered core for this city, while other pyrotags occur in a smaller percentage of people, which leads to their increased variability. Sekelja and colleagues (2011) found that ‘core’ phylogroups were more stable over time than non-core, which supports our hypothesis that the highest-abundance pyrotags in sewage represent core microbiota.
Surveying sewage for abundant and specific indicators is useful, but such characterizations could have much broader applications. Sewage represents trends in a particular human population that cannot be readily observed by sampling a limited number of individuals. We have found that WWTP influent displays a more consistent pyrotag profile than that of individual human faecal samples (Fig. 6; Newton et al., 2011). In the present study, the relative abundance of the eight most abundant Clostridiales pyrotags in sewage was highly variable in 48 human faecal samples, which is consistent with other reports describing the abundance patterns of ‘core’ members (Qin et al., 2010; Turnbaugh et al., 2010; Sekelja et al., 2011). Interestingly, individuals whose microbiota was dominated by Bacteroides were more likely to have only minor representation of high-abundance sewage Lachnospiraceae (Table S1). Rather, the predominant Lachnospiraceae present in these individuals were those that were more rare across the averaged human population represented in sewage. This result suggests sewage profiles provide a benchmark for norms in a human population. Simple averaging of highly diverse individuals would not be sufficient to establish this same benchmark. We suggest sewage may be used to observe microbial community patterns in the human population that are linked to population level statistics such as age, health or dietary habit.
The role of Lachnospiraceae in humans and host-specific patterns
We focused our efforts on Lachnospiraceae because of its diversity and high abundance in sewage and humans. Further, a large number of pyrotags within the family Lachnospiraceae were found in both sewage and human faecal samples, but not in cattle or chicken faeces, suggesting that Lachnospiraceae might serve important functional roles specific to humans. Ongoing research suggests this may be the case. It is thought that Lachnospiraceae taxa have an important role in maintaining gut homeostasis (Frank et al., 2007) and are involved in human metabolism as butyrate producers (Sekelja et al., 2011). Lachnospiraceae taxa also appear to be important for the exclusion of pathogens (Reeves et al., 2012). Given the depth of Lachnospiraceae diversity observed in this study and by others, it remains unclear to what extent cultured strains represent the functional diversity of this group (Hayashi et al., 2002), particularly in relation to traits accounting for host specificity. To understand the functional role fulfilled by Lachnospiraceae, and whether or not these roles contribute to host specificity, cultured organisms that represent the range of diversity in the natural population are needed.
In contrast to Lachnospiraceae, the Ruminococcaceae and Clostridiaceae families did not show as much promise as groups that harboured large numbers of indicator organisms that could be used to detect human faecal pollution. Humans and cows and, to a lesser extent, chickens shared many Ruminococcaceae pyrotags including several high-abundance pyrotags present in the sewage data set (Fig. 2). In a previous survey of farm animals and humans, the Clostridium leptum group (i.e. Clostridium group IV, encompassed within Ruminococcaceae) was constantly present and exhibited low variability in abundance between humans and a number of animals including rabbits, goats, horses, sheep, cows and pigs (Furet et al., 2009). This same study demonstrated C. coccoides levels distinguished humans from the majority of these same sources, but not pigs (Furet et al., 2009). The genus Clostridium and other Clostridiaceae occurred at relatively low abundance in the human, cow and chicken data sets (Fig. 1). There were, however, four abundant sewage pyrotags that did not occur in the human, cattle or chicken data sets, suggesting Clostridiaceae or Clostridium sp. may serve as indicators for animals not tested in this study. Alternatively, these pyrotags may represent non-host-associated, free-living organisms. Additional examination of the occurrence in humans and analysis of more animal samples would clarify the range of host specificity among these Clostridiales taxa.
Small sequence variations in the 16S rRNA gene indicate host specificity
Large data sets of short-read sequences generated by massively parallel sequencing (MPS) show promise for identification of indicators to track faecal pollution sources; however, sensitive approaches are needed to discriminate among organisms with closely related 16S rRNA gene sequences but different ecological characteristics. Aggregating sequences into groups (OTUs) can reduce resolution to the point that the full suite of potential candidates cannot be identified. In this study and others by our lab we found unique V6 pyrotags that represented ecologically meaningful populations. For example, a single base pair change in pyrotags within the genus Acinetobacter mapped to two populations whose relative abundance fluctuated seasonally but inversely in urban sewer infrastructure (VandeWalle et al., 2012). In this study, we found several pyrotags (i.e. unique V6 sequences) that appeared in sewage and humans but not in chickens or cows, while a pyrotag with only a slight sequence variation to the human-specific pyrotag was present in one of the animal faecal data sets (Table 3). The host distribution, abundance patterns and relationship to phylogenetically distinct 16S rRNA genes (Figs 2 and 4) suggest that the pyrotags represent ecologically relevant phylotypes. Host-associated phylotypes, distinguished by small variations in 16S rRNA gene sequences, also have been observed in the studies of Bacteroidales (Dick et al., 2005; Jeter et al., 2009).
In this study, oligotyping was a useful tool for systematically identifying small sequence changes corresponding to within-genus sequence-based groupings that differentiated host organism microbial communities. By utilizing Shannon entropy to identify nucleotide locations of high variation among very closely related taxa, oligotyping can elaborate the differences among samples with respect to the chosen taxon. Patterns of host-associated sequence types may not be easily inferred from phylogenies, as many of our identified host associations were represented by only a few nucleotide substitutions and thus would be represented by divergence only at the tips of phylogenetic representations of communities. In this study we targeted Blautia for oligotyping specifically because the second most abundant Lachnospiraceae pyrotag (Lachno2) appeared specific to humans and was classified to this genus (Newton et al., 2011). The great diversity that is concealed within Blautia but revealed by oligotyping suggests that there may be other genera in the Lachnospiraceae family that can be used for further identification of host-specific indicators. Oligotyping shows great promise for being able to confidently distinguish a large range of hosts' microbes typically found in environmental samples.
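To make the entropy idea concrete, the sketch below computes Shannon entropy for each column of a set of aligned, equal-length reads, keeps the highest-entropy positions, and labels each read by the bases at those positions. This is a deliberately simplified illustration in Python. It omits the supervised, user-guided selection of components described for the published method (Eren et al., 2011), and the function names and toy reads are invented for this example.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_entropy(reads: List[str], pos: int) -> float:
    """Shannon entropy (bits) of the bases observed at one alignment column."""
    counts = Counter(read[pos] for read in reads)
    total = len(reads)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def oligotypes(reads: List[str], n_components: int) -> Tuple[Counter, List[int]]:
    """Label each read by its bases at the n_components highest-entropy columns."""
    length = len(reads[0])
    entropies = [column_entropy(reads, p) for p in range(length)]
    ranked = sorted(range(length), key=lambda p: entropies[p], reverse=True)
    chosen = sorted(ranked[:n_components])  # keep columns in sequence order
    labels = Counter("".join(read[p] for p in chosen) for read in reads)
    return labels, chosen


# Toy aligned reads (invented); columns 1 and 3 are the only variable positions.
reads = ["ACGTA", "ACGTA", "ATGTA", "ATGCA"]
counts, positions = oligotypes(reads, n_components=2)
```

Here the two variable columns are selected and the four reads collapse into three labels ("CT", "TT" and "TC"), showing how a few informative positions can separate sequences that would fall into a single cluster at a 3% threshold.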
Clustering of MPS data (i.e. OTUs) is frequently used to mitigate artefacts caused by sequencing errors (Huse et al., 2010). In this study, we did not use clustered sequence data; instead, we chose the more stringent criterion of unique sequence groups for all analyses. Sequence errors primarily affect MPS data by artificially increasing the number of sequence types present in a sample. Since our primary objective was to identify common pyrotags or oligotypes across multiple samples and, in this case, many different sequencing runs, the confounding issue of random errors increasing sample diversity is not likely to have had a large effect on our analysis. Moreover, by examining exact sequences and through oligotyping, we identified important patterns related to small sequence variations that would have been missed if cluster analysis had been used.
Applications for detecting sources of faecal pollution
In previous source tracking studies that focused on the C. coccoides group (i.e. Lachnospiraceae) as an indicator of human faecal pollution, either PCR was used as a faecal detection method (Bonkosky et al., 2009), or abundance patterns among faecal groups were used to distinguish humans from farm animals (Furet et al., 2009). In this study, MPS provided a higher resolution of the Clostridiales, particularly Lachnospiraceae, community structure leading to the identification of hundreds of candidate host-specific indicators. However, only a limited range of host animals was examined; therefore, more rigorous validation is needed to determine the extent of candidate indicator host specificity. In addition, these results only reflect microbiota data collected from humans, sewage and animals within the USA and thus, may or may not reflect sewage signature potential in other countries. Identification of potential candidates, or candidate genera, as in the case of Blautia, will streamline this process so that a combination of MPS approaches and targeted qPCR could be used to validate these alternative indicators.
As a first tier assessment of faecal pollution sources, distinguishing human sources from non-human sources is important because human faeces is a major reservoir for human pathogens. We found Lachnospiraceae to be the most promising group for identification of human host-specific indicators among Clostridiales. Multiple pyrotag and oligotype sequences identified in this study appear to be ecologically distinct and warrant further investigation. In contrast to Lachnospiraceae, Ruminococcaceae pyrotags were more commonly shared among host sources, particularly between humans and cattle, reducing specificity. Clostridium was neither commonly abundant nor frequently human-specific, which would impair sensitivity and specificity. Applications using an MPS approach have provided unprecedented insights into the population structure of human faecal communities and have documented high diversity among common taxa. Due to transient colonization of multiple hosts and equivalent niches among hosts, it is unlikely that a single indicator will be exclusively specific to a single host source and have appropriate sensitivity for quick detection methods. Rather, future studies should consider using a suite of indicators, which are more likely to provide the specificity and sensitivity needed to profile contaminated water samples (Wu et al., 2010; Newton et al., 2013). The ecology of indicator organisms post release into the studied environment and the use of different taxonomic groups (e.g. Gram-positive vs. Gram-negative organisms) covering a range of persistence times in that environment should be considered and could be used to discern recent and past contamination events. As technology advances, such approaches will move from the research arena to improved tools for water quality assessments that are necessary for more efficiently addressing pollution concerns.
We suggest a particularly powerful approach would be to identify and then incorporate a suite of general and host-specific phylotypes into platforms such as phylochips (Wu et al., 2010) or other rapid sequence profilers that can characterize the community of impacted surface waters.
Analysis of 454 pyrosequencing data from sewage, humans, cattle and chickens
Three previously published data sets and one new data set with five chicken faecal samples were used to assess Clostridiales population structure. A total of 132 samples were used for analysis; their sources and previous studies are shown in Table 1. The chickens were from the same farm collected in Athens, GA at the US Department of Agriculture research facility. After collection, samples were frozen immediately and shipped on ice to the EPA. Upon arrival in the lab, the samples were stored at −80°C until time of DNA isolation (< 6 months). The DNA from five chicken faecal samples was sequenced as described previously (Shanks et al., 2011). Briefly, we amplified the V6 hypervariable region of the 16S rRNA coding region using a mixture of five fused primers at the 5′ end of the V6 region (E. coli positions 967–985) and four primers at the 3′ end (E. coli positions 1046–1028) to capture the breadth of diversity of rRNA sequences represented in molecular databases (Sogin et al., 2006; Huber et al., 2007). We amplified libraries from at least three independent PCR cocktails for each sample to minimize the impact of potential early-round PCR errors. Amplicons were prepared and sequenced using the Roche Genome Sequencer GS-FLX according to the Roche standard protocols.
The relative abundance of Clostridiales and the family composition in the different sources (sewage, humans, cattle and chickens) were determined using data sets of all bacterial tags normalized to the smallest data set. Approximately 1.38 million Clostridiales pyrotags were parsed from these data sets and used to assess the population structure and phylogenetic relationships of matched reference sequences and cloned representatives. For comparisons of pyrotags distributed among samples for network analysis, data were normalized to the amount of Clostridiales in each sample.
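The two normalizations described above can be illustrated with a minimal sketch. The text does not state whether normalization to the smallest data set was done by proportional scaling or by random subsampling; the sketch assumes proportional scaling, and the function names and data layout (a dictionary of tag counts per sample) are hypothetical.

```python
def normalize_to_smallest(counts_by_sample):
    """Scale every sample so its tag counts sum to the smallest sample total.

    Assumption: proportional scaling (not random subsampling).
    counts_by_sample maps sample name -> {tag: read count}.
    """
    totals = {s: sum(tags.values()) for s, tags in counts_by_sample.items()}
    target = min(totals.values())
    return {
        s: {tag: n * target / totals[s] for tag, n in tags.items()}
        for s, tags in counts_by_sample.items()
    }


def relative_to_group(tags, group_tags):
    """Express counts of group members as fractions of the group total.

    Used here to mimic normalization to the amount of Clostridiales
    in a single sample; group_tags is the set of tags assigned to the group.
    """
    group_total = sum(n for tag, n in tags.items() if tag in group_tags)
    if group_total == 0:
        return {}
    return {tag: n / group_total for tag, n in tags.items() if tag in group_tags}
```

With this layout, a sample holding 40 reads is scaled down to match a 20-read sample, while `relative_to_group` rescales only the tags assigned to the group of interest so that they sum to one.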
Phylogenetic tree reconstruction and heatmap
The 30 most abundant Clostridiales in sewage and the 30 most abundant in human faecal samples (12 sequences were shared between both sources) were chosen for construction of a heatmap. Only one representative in the family Clostridiaceae was among the 30 most abundant in sewage or humans, so nine additional pyrotags representing the most abundant Clostridiaceae were added to the heatmap data set. Peptostreptococcaceae and Veillonellaceae are not commonly found in humans and their abundance patterns in sewage suggest some members of these families are free-living (McLellan et al., 2010); therefore, Peptostreptococcaceae was not included in the tree but Veillonellaceae was included for a point of reference in the phylogenetic analyses.
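As a rough illustration of the selection above, the union of the two top-30 lists contains 30 + 30 − 12 = 48 pyrotags, and the nine additional Clostridiaceae bring the heatmap set to 57. The sketch below reproduces that logic on arbitrary input. It assumes the additional family members are ranked by pooled abundance across sewage and human samples, which the text does not specify; all names are hypothetical.

```python
def top_n(abundance, n):
    """Return the n most abundant tags (ties broken alphabetically)."""
    ranked = sorted(abundance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


def heatmap_set(sewage, human, family_of, n=30,
                extra_family="Clostridiaceae", n_extra=9):
    """Union of the top-n tags from each source plus extra family members.

    Assumption: the extra tags are the most abundant members of
    extra_family by pooled (sewage + human) abundance that were not
    already selected.
    """
    selected = set(top_n(sewage, n)) | set(top_n(human, n))
    pooled = {
        tag: sewage.get(tag, 0) + human.get(tag, 0)
        for tag in set(sewage) | set(human)
        if family_of.get(tag) == extra_family and tag not in selected
    }
    return selected | set(top_n(pooled, n_extra))
```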
A reference ARB database of all near full-length 16S rRNA gene sequences representing the families Lachnospiraceae, Clostridiaceae, Ruminococcaceae and Veillonellaceae was downloaded from the Silva database (Pruesse et al., 2007) (May 2010). The subset of abundant pyrotags used in the heatmap was added and aligned to the ARB database using the FAST_ALIGNER tool (Ludwig et al., 2004), before the sequences were heuristically adjusted using the rRNA secondary structure as a guide. Near full-length sequences that were identical to the tag sequences were identified and used for phylogenetic analysis. A mask trimming sequences to an equal length was applied before we used the ARB neighbour-joining algorithm for phylogenetic reconstruction.
A network analysis was implemented to visualize the relationship of the top 400 most abundant Clostridiales pyrotags from the average of 38 Milwaukee sewage influent samples. The network was generated with Cytoscape version 2.7 (Shannon et al., 2003) by implementing an edge-weighted spring-embedded model (Eades, 1984). Sample sources: Milwaukee sewage influent, human faeces, cow faeces and chicken faeces are represented by averages of the samples included in each data set. See Table 1 for data set details. Lines in the network indicate a pyrotag was present in the connected data set, where the thickness of the line represents the relative abundance of the pyrotag in that data set.
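The edge definition described above (an edge wherever a pyrotag is present in a data set, weighted by its relative abundance there) can be sketched as follows. This is only an illustration of how such an edge list might be assembled before import into a network tool; the function names, the dictionary layout and the choice of a (source, pyrotag, weight) tuple are assumptions, not the procedure used in the study.

```python
def mean_profile(samples):
    """Average relative abundance of each pyrotag across samples.

    samples is a non-empty list of {tag: relative abundance} dicts;
    a tag absent from a sample counts as zero.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    tags = set().union(*samples)
    return {t: sum(s.get(t, 0.0) for s in samples) / len(samples) for t in tags}


def build_edges(profiles_by_source, reference_source, top_n=400):
    """List (source, tag, weight) edges for the top tags of the reference.

    The top_n tags are ranked by abundance in the reference source
    (e.g. averaged sewage influent); an edge is emitted for every
    source in which the tag has non-zero mean abundance.
    """
    ref = profiles_by_source[reference_source]
    top = sorted(ref, key=lambda t: (-ref[t], t))[:top_n]
    edges = []
    for source, profile in profiles_by_source.items():
        for tag in top:
            weight = profile.get(tag, 0.0)
            if weight > 0:
                edges.append((source, tag, weight))
    return edges
```

The resulting weights would map to line thickness, and pyrotags absent from a data set simply produce no edge to it.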
Identification of core microbiota in humans
To assess if abundant sewage pyrotags represented core phylotypes in humans, Lachnospiraceae-annotated V6 pyrotags were selected from the normalized total bacterial data set and the number of individuals in which a particular pyrotag occurred was counted and plotted against the abundance in sewage. We also compared the abundance rank to determine if the high-abundance sewage pyrotags were also most abundant in individuals. We defined the top eight ranked pyrotags as ‘high abundance’ because this cut-off encompassed 27.1% of the total Lachnospiraceae pyrotag sewage data set and these pyrotags were found in at least 46 of 48 individuals. The eight top ranked pyrotags in humans corresponded to 30% of the total Lachnospiraceae pyrotags in the composite data set of individuals.
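A minimal sketch of the two quantities used to define the 'high abundance' cut-off, namely the share of the sewage data set covered by the top-ranked pyrotags and the number of individuals in which each one occurs, is given below. The data layout and function names are hypothetical, and presence is assumed to mean a count greater than zero.

```python
def prevalence(tag, individuals):
    """Number of individuals in which the tag was detected (count > 0)."""
    return sum(1 for ind in individuals if ind.get(tag, 0) > 0)


def high_abundance_summary(sewage_counts, individuals, top_k=8):
    """Summarize the top_k sewage pyrotags.

    Returns the ranked top_k tags, the fraction of all sewage reads
    they account for, and the lowest prevalence among them.
    """
    ranked = sorted(sewage_counts, key=lambda t: (-sewage_counts[t], t))
    top = ranked[:top_k]
    total = sum(sewage_counts.values())
    share = sum(sewage_counts[t] for t in top) / total
    min_prevalence = min(prevalence(t, individuals) for t in top)
    return top, share, min_prevalence
```

Applied to the Lachnospiraceae data described here with `top_k=8`, the share and minimum prevalence would correspond to the reported 27.1% and 46 of 48 individuals.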
Clostridium coccoides clone libraries
Clone libraries were constructed in order to generate longer sequences (∼ 900 bp) from the family Lachnospiraceae that could be compared with V6 pyrotag data (∼ 60 bp). Lachnospiraceae was PCR amplified from five sewage influent samples from two Milwaukee WWTPs (SS and JI) on three separate dates (18 August 2008, 19 November 2008 and 22 April 2009) using a combination of two forward primers, the previously published group-specific primer g-CcocF (Matsuki et al., 2002) and a second primer with two base pair changes that included unclassified Lachnospiraceae, designated BF-063 with the sequence 5′ AAGTGACGGTACCTGAATAA 3′. These forward primers in a 3:1 ratio, respectively, were paired with the universal reverse primer (1492R) so that the amplification product included the V6 region. PCR product was purified using the QIAGEN PCR purification kit (Qiagen, Valencia, CA). Products were cloned into the pCR2.1 vector using the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). Plasmid DNA was isolated using a manual method adapted to a 96-well microtitre plate format (Sambrook and Russell, 2001). Sequencing was carried out from the M13R primer using the ABI Big Dye Terminator Kit (Applied Biosystems, Foster City, CA) on an ABI Prism 3700xi (Applied Biosystems, Foster City, CA), which generated on average 800 bp reads. Sequences were trimmed for quality using PHRED (Ewing and Green, 1998), vector sequence was removed and sequences shorter than 500 bp were excluded from further analyses. A total of 2070 sequences were generated from the five clone libraries, with 2018 high-quality sequences used for comparisons after quality filtering and removal of chimeras identified by Mallard (Ashelford et al., 2006). Sequences flagged by Mallard were verified using Chimera Check (Cole et al., 2003). Clones were then compared against the pyrotags using BLAST to determine what percentage of the Lachnospiraceae family represented by the pyrotags was also represented by our clones.
For oligotyping analysis, we used 152 730 quality-controlled V6 reads from 132 samples that were identified as Blautia by GAST (Huse et al., 2008). Reads were aligned with PyNAST (DOI 10.1093/bioinformatics/btp636) using the GreenGenes (DeSantis et al., 2006) gold standard 16S rRNA gene sequence templates for Blautia. Following the entropy analysis, oligotyping was performed with 24 components using version 0.6 of the oligotyping pipeline (available from http://oligotyping.org). To reduce noise, we required that each oligotype: (i) appear in at least three samples, (ii) account for more than 1% of the reads in at least one sample and (iii) have a most abundant unique sequence represented by at least 30 reads. After removal of oligotypes that did not meet these criteria, the analysis retained 140 804 reads (92.19% of the original reads). Oligotyping analysis identified 108 oligotypes, 93 of which perfectly matched sequences in NCBI's nr database over their entire length.
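The three noise-reduction criteria can be expressed as a simple predicate. The sketch below is not the oligotyping pipeline itself; it is an illustrative restatement under the assumption that criterion (ii) is evaluated against the total number of reads analysed in each sample. The function name, parameter names and data layout are hypothetical.

```python
def passes_noise_filters(sample_counts, sample_totals, top_unique_seq_reads,
                         min_samples=3, min_percent=1.0, min_top_unique=30):
    """Apply the three noise criteria to one oligotype.

    sample_counts: {sample: reads assigned to this oligotype}
    sample_totals: {sample: total reads analysed in that sample} (assumed denominator)
    top_unique_seq_reads: reads of the oligotype's most abundant unique sequence
    """
    present = [s for s, n in sample_counts.items() if n > 0]
    # (i) the oligotype must appear in at least min_samples samples
    if len(present) < min_samples:
        return False
    # (ii) it must exceed min_percent of the reads in at least one sample
    max_percent = max(100.0 * sample_counts[s] / sample_totals[s] for s in present)
    if max_percent <= min_percent:
        return False
    # (iii) its most abundant unique sequence needs at least min_top_unique reads
    return top_unique_seq_reads >= min_top_unique


# Fraction of reads retained after filtering, from the counts reported in the text.
retained_percent = round(100 * 140804 / 152730, 2)
```

Here `retained_percent` evaluates to 92.19, consistent with the reported proportion of reads retained.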
Student's t-test was used to assess the significance of differences in family abundance among host sources. Spearman rank correlations were calculated in R (R Development Core Team, 2012).
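For readers unfamiliar with the statistic, a Spearman rank correlation is the Pearson correlation of the ranks of two variables, with tied values given their average rank. The analyses here were run in R; the sketch below is a plain Python restatement of the definition for illustration only, with hypothetical function names.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)
```

This could be applied, for example, to the abundance rank of a pyrotag in sewage versus its rank in the composite human data set.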
Sequence data submission
Clostridium coccoides cloned sequences from libraries are deposited under GenBank Accession Numbers JX228967–JX230954. Other sequences already published in Newton and colleagues (2011) from these same libraries are deposited under JF826248 to JF826279. Pyrotag sequences from chickens and sewage from geographically dispersed cities, as well as previously published data for sewage samples, cattle and human faecal samples (Table 1), are available through VAMPS (http://www.vamps.mbl.edu).
This work was supported by the grants 1R21AI076970-02 and 1R01AI091829-01A1 to S. L. M. and NSF/BDI 0960626 to S. M. H. We would like to thank Giles Goetz for bioinformatics support in the clone library comparisons and Morgan Schroeder for assistance with network analysis. Information has been subjected to the US EPA's peer and administrative review and has been approved for external publication. Any opinions expressed in this paper are those of the author(s) and do not necessarily reflect the official positions and policies of the US EPA. Any mention of trade names or commercial products does not constitute endorsement or recommendation for use.