Application of Illumina next-generation sequencing to characterize the bacterial community of the Upper Mississippi River


  • C. Staley
    1. BioTechnology Institute, University of Minnesota, St. Paul, MN, USA
  • T. Unno
    1. BioTechnology Institute, University of Minnesota, St. Paul, MN, USA
    Current affiliation: Applied Bioinformatics Lab (ABL), Jeju National University, Jeju-si, Jeju-do, Republic of Korea
  • T.J. Gould
    1. BioTechnology Institute, University of Minnesota, St. Paul, MN, USA
    2. Biology Program, University of Minnesota, Minneapolis, MN, USA
  • B. Jarvis
    1. BioTechnology Institute, University of Minnesota, St. Paul, MN, USA
    2. Biology Program, University of Minnesota, Minneapolis, MN, USA
    Current affiliation: Department of Biology, University of Wisconsin-Platteville, Platteville, WI, USA
  • J. Phillips
    1. Biology Program, University of Minnesota, Minneapolis, MN, USA
  • J.B. Cotner
    1. Department of Ecology, Evolution and Behavior, University of Minnesota, Saint Paul, MN, USA
  • M.J. Sadowsky (corresponding author)
    1. BioTechnology Institute, University of Minnesota, St. Paul, MN, USA
    2. Department of Soil, Water and Climate, University of Minnesota, St. Paul, MN, USA
    Correspondence: Michael J. Sadowsky, BioTechnology Institute, University of Minnesota, 140 Gortner Lab, 1479 Gortner Avenue, St. Paul, MN 55108, USA.



Aims

A next-generation, Illumina-based sequencing approach was used to characterize the bacterial community at ten sites along the Upper Mississippi River to evaluate shifts in the community potentially resulting from upstream inputs and land use changes. Furthermore, methodological parameters including filter size, sample volume and sample reproducibility were evaluated to determine the best sampling practices for community characterization.

Methods and Results

Community structure and diversity in the river was determined using Illumina next-generation sequencing technology and the V6 hypervariable region of 16S rDNA. A total of 16 400 operational taxonomic units (OTUs) were observed (4594 ± 824 OTUs per sample). Proteobacteria, Actinobacteria, Bacteroidetes, Cyanobacteria and Verrucomicrobia accounted for 93·6 ± 1·3% of all sequence reads, and 90·5 ± 2·5% belonged to OTUs shared among all sites (n = 552). Among nonshared sequence reads at each site, 33–49% were associated with potentially anthropogenic impacts upstream of the second sampling site. Alpha diversity decreased with distance from the pristine headwaters, while rainfall and pH were positively correlated with diversity. Replication and smaller filter pore sizes minimally influenced the characterization of community structure.


Conclusions

Shifts in community structure are related to changes in the relative abundance, rather than presence/absence of OTUs, suggesting a ‘core bacterial community’ is present throughout the Upper Mississippi River.

Significance and Impact of the Study

This study is among the first to characterize a large riverine bacterial community using a next-generation-sequencing approach and demonstrates that upstream influences and potentially anthropogenic impacts can influence the presence and relative abundance of OTUs downstream resulting in significant variation in community structure.


Introduction

The Mississippi River has the largest watershed in the United States, running approximately 4000 km from its headwaters at Lake Itasca, Minnesota, to the Gulf of Mexico and draining an area of approximately 2·9 million km² (Pereira and Hostettler 1993). The river is used as a major source of drinking water by many cities along its length and for a variety of transportation, recreational, industrial and agricultural purposes. It is also known to be impacted by a variety of point and nonpoint sources of pollution, which contribute chemicals (e.g. herbicides, pesticides), nutrients, pharmaceuticals, heavy metals and nonindigenous bacteria (Rada et al. 1990; Pereira and Hostettler 1993; Topp et al. 2008). These contributions likely alter the bacterial community structure within the water column significantly, even over short distances. Furthermore, differences in pollutant sources (e.g. agricultural runoff or municipal wastewater discharge) may alter the presence or abundance of specific taxa in a source-dependent manner (Zampella et al. 2007; Tu 2011). Determination of upstream communities and their influence on the operational taxonomic units (OTUs) present in downstream bacterial communities may allow for the identification of OTUs that reflect specific sources of pollution along the river (Unno et al. 2011).

Characterization of the river's microbiota via 16S rDNA cloning and sequence analysis provides an assessment of bacterial community structure and diversity and has recently emerged as a powerful tool to examine bacterial communities in a variety of matrices, including soils (Liles et al. 2003), marine waters (Venter et al. 2004; Sogin et al. 2006) and freshwater environments (Lindstrom et al. 2005; Winter et al. 2007; Mueller-Spitz et al. 2009).

Next-generation sequencing (NGS) methods have greatly increased sequencing throughput via the use of massively parallel sequencing (Sogin et al. 2006). Amplification of small but highly variable regions of the 16S rDNA (e.g. the V3, V5 or V6 hypervariable regions) has resulted in extremely deep sequencing of bacterial communities. This has allowed for the identification of rare populations in low abundance in bacterial communities that may account for functional diversity and ecosystem stability (Kysela et al. 2005; Sogin et al. 2006). These NGS methods have been successfully utilized to characterize bacterial communities in marine waters (Brown et al. 2009), soils (Jones et al. 2009), wastewater (Sanapareddy et al. 2009) and several human microbiomes (Hamady and Knight 2009; Peterson et al. 2009). In contrast, however, determination of freshwater community structures has largely been limited to PCR-based amplicon surveys of clone libraries (Gløckner et al. 2000; Cottrell et al. 2005; Henriques et al. 2006; Mueller-Spitz et al. 2009) with NGS methods being used only recently (Ghai et al. 2011; Oh et al. 2011; Fortunato et al. 2012).

Previous characterization of freshwater microbiota has shown that bacterial communities are generally similar to each other, but differ significantly from bacterial communities in soils or in marine waters (Zwart et al. 2002; Oh et al. 2011). The major, globally distributed, phyla that are indigenous to freshwater bacterial communities include the Proteobacteria (α, β and γ), Bacteroidetes, Cyanobacteria, Actinobacteria, Verrucomicrobia and Planctomycetes (Zwart et al. 2002; Ghai et al. 2011; Oh et al. 2011). Factors such as temperature, predation, flow rate and other biotic and abiotic factors are hypothesized to alter community structure and function (Crump and Hobbie 2005; Matz and Kjelleberg 2005; Pernthaler 2005; Moss et al. 2006). While community structure has been reported to vary significantly, both temporally and spatially, even within the same watershed (Henriques et al. 2006; Mueller-Spitz et al. 2009), the specific factors driving these differences are not well understood (Henriques et al. 2006; Newton et al. 2006; Mueller-Spitz et al. 2009). For example, a recent study of community structure in Lake Michigan found that temporal and spatial changes resulted in considerably greater variation in structure than did depth (Mueller-Spitz et al. 2009). Furthermore, the effect of sample volumes on the estimated community structure and the reproducibility of that community from replicate samples remain understudied (Zinger et al. 2012).

In the study reported here, we assessed the community structure at ten sites along the Upper Mississippi River, and associated tributaries, from its relatively pristine source at Lake Itasca to La Crescent, near the southern border of Minnesota. Bacterial community composition was characterized using Illumina-based sequencing of the V6 hypervariable region of 16S rDNA. Community structure at each site was hypothesized to be influenced by upstream communities as well as by inputs of nonindigenous bacteria. Furthermore, we hypothesized that these nonindigenous bacteria could be differentiated from a common riverine community among all sites and could be indicative of anthropogenic impacts. In addition, the community structure estimated from replicate, smaller volume (1 l) samples over a short time period was evaluated to determine whether replicate samples revealed similar community structure at a given sampling site. The number and types of OTUs able to escape capture by our filtration method were also investigated to address potential methodological biases in community characterization. To our knowledge, this study is among the first to employ high-throughput, Illumina-based sequencing to characterize the phylogenetic diversity of freshwater bacterial communities among multiple sites along a large river, starting with its relatively pristine headwaters. Results of this study provide a more complete assessment of the impact of human activity on the Upper Mississippi River with potential implications for monitoring and improving water quality. In addition, this study helps to improve our understanding of the influence of sampling practices on community characterization in a complex riverine system.

Materials and methods

Sample collection

Water from the Mississippi River was sampled in summer 2010 at eight sites from near the headwaters at Lake Itasca to near La Crescent, MN (Fig. 1, Table 1). Two other rivers, the Minnesota River at Site 5 and the St. Croix River at Site 8, were also sampled near their confluences with the Mississippi River. At each site, 40 l of near-surface river water was collected approximately 1·8 m from the shore, transported in 20 l carboys and stored <24 h at 15°C prior to filtration. At the time of sample collection, water temperature and pH were recorded at each sampling site, and antecedent 72 h rainfall data were obtained from the National Weather Service (Table 1). In addition, flow rates at five dams (Coon Rapids Dam, Lower St. Anthony Falls Dam, Ford Dam, Lock and Dam 2, and Lock and Dam 5; locations shown in Fig. 1) were obtained from the U.S. Army Corps of Engineers website (Table 2).

Table 1. Site locations, sampling dates, proportion of surrounding land use and physicochemical properties at sample sites

| Site | Site name | Date | Latitude | Longitude | Distance (km)ᵃ | Forested (%) | Urbanized (%) | Agricultural (%) | Rainfall 72 h (mm)ᵇ | Rainfall 48 h (mm)ᵇ | Rainfall 24 h (mm)ᵇ | Cumulative rainfall (mm)ᵇ | Temp (°C) | pH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Itasca | May 19 | 47°20·8′N | 95°10·9′W | 0 | 70·8ᶜ | 2·2 | 8·7 | 0·00 | 0·00 | 0·00 | 0·00 | 17 | 8·0 |
| 2 | St. Cloud | June 15 | 45°32·8′N | 94°8·7′W | 263 | 12·8 | 18·8 | 55·3ᶜ | <0·25 | 3·05 | 2·54 | 5·59 | 18 | 7·8 |
| 3 | Clearwater | June 15 | 45°25·2′N | 94°2·5′W | 271 | 20·0 | 7·3 | 55·8ᶜ | <0·25 | 3·05 | 2·54 | 5·59 | 18 | 7·9 |
| 4 | Twin Cities | June 8 | 44°54·2′N | 93°11·4′W | 311 | 9·2 | 79·7ᶜ | 1·5 | <0·25 | 0·25 | 21·34 | 21·59 | 21 | 8·3 |
| 5 | Minnesota River | June 8 | 44°53·1′N | 93°10·4′W | NAᵈ | 11·8 | 73·1ᶜ | 2·9 | <0·25 | 0·25 | 21·34 | 21·59 | 21 | 8·1 |
| 6 | Confluenceᵉ | June 8 | 44°55·1′N | 93°7·7′W | 313 | 9·2 | 77·8ᶜ | 3·8 | <0·25 | 0·25 | 21·34 | 21·59 | 20 | 8·1 |
| 7 | Hastings | June 6 | 44°44·7′N | 92°51·0′W | 330 | 12·1 | 12·2 | 55·5ᶜ | 24·38 | 0·00 | 4·06 | 28·44 | 21 | 8·2 |
| 8 | St. Croix River | June 6 | 44°44·9′N | 92°48·3′W | NA | 13·4 | 12·1 | 58·3ᶜ | 24·38 | 0·00 | 4·06 | 28·44 | 19 | 7·0 |
| 9 | Rochester | June 6 | 44°44·7′N | 92°47·9′W | 332 | 12·7 | 11·4 | 58·7ᶜ | 24·38 | 0·00 | 4·06 | 28·44 | 21 | 7·1 |
| 10 | La Crescent | June 2 | 43°51·4′N | 91°18·2′W | 401 | 37·7ᶜ | 14·8 | 25·3 | 5·33 | 0·00 | 0·00 | 5·33 | 16 | 9·1 |

ᵃ Distance from the headwaters at Lake Itasca.
ᵇ Rainfall that occurred 24, 48 or 72 h prior to sampling or cumulative rainfall over a 3-day period prior to sampling.
ᶜ Marked values (underlined in the original table) indicate the land-use category to which the site was assigned based on the highest percentage.
ᵈ Not applicable as sampling site is not located on the Mississippi River.
ᵉ Confluence of the Mississippi and Minnesota Rivers.

Table 2. Flow rate (cubic metres per second) at five dams along the Upper Mississippi River on the dates of sampling

| Sampling site | Coon Rapids Dam | Lower St. Anthony Falls Dam | Ford Dam | Lock and Dam 2 | Lock and Dam 5 |
|---|---|---|---|---|---|
| St. Cloud | 227·10 | 192·55 | 241·51 | 586·16 | 974·10 |
| Twin Cities | 144·13 | 134·51 | 170·38 | 322·81 | 685·27 |
| Minnesota River | 144·13 | 134·51 | 170·38 | 322·81 | 685·27 |
| St. Croix River | 152·63 | 167·07 | 158·06 | 382·28 | 688·10 |
| La Crescent | 165·94 | 179·81 | 174·49 | 402·10 | 821·19 |

Figure 1.

Map of sampling sites and dams at which flow rate was assessed. Sites 1–4, 6, 7, 9 and 10 are located on the Mississippi River. Site 5 is located on the Minnesota River and Site 8 on the St. Croix River. Letters represent dams at which flow rate was measured: (A) Coon Rapids, (B) Lower Saint Anthony Falls, (C) Ford, (D) Lock and Dam 2 and (E) Lock and Dam 5. Shading represents forested areas.

To assess the reproducibility of estimated community structure among replicates, five 1 l samples were collected approximately 1 min apart from Site 4 in sterile 1 l bottles. The smaller volume used for these samples still resulted in mean replicate coverage of 99·2 ± 0·05%. To determine which microbes could evade capture by our filtration methodology (see below), a 40 l sample collected from Site 1 was used to examine the impact of membrane filtration pore size on bacterial diversity estimates.

Sample processing and DNA extraction

DNA was extracted from water samples using Epicentre's Metagenomic DNA Isolation Kit for Water (Madison, WI, USA), with slight modifications to the protocol recommended by the manufacturer. Water samples were prefiltered through sterile cheesecloth and sequentially pumped through 90-mm-diameter P5 filters (Whatman Inc., Piscataway, NJ, USA) to remove debris and organisms ≥5–10 μm. The filtered water was subsequently pumped through six to eight 142 mm diameter, 0·45-μm polyethane sulfonate filters (Pall Co., Port Washington, NY, USA) to trap microbial cells (the number of filters was adjusted as necessary due to clogging).

Each 0·45-μm-pore-size filter was cut into half immediately following filtration, and cells from each filter half were elutriated by vortexing in a 50-ml conical tube containing 2 ml of 0·1% sodium pyrophosphate buffer, pH 7·0, containing 0·2% Tween 20 (PP) for 3 min at room temperature. Cell suspensions from both 50-ml tubes for each filter half were aliquoted into three 1·6-ml microcentrifuge tubes, centrifuged at 13 300 g for 3 min and the supernatants discarded. The process was repeated for each filter half (i.e. two separate vortexing steps for each filter half), and the cell suspensions from approximately 3–4 whole filters were concentrated over three microcentrifuge tubes (pellets represented 6–7 l of water). Cells were stored frozen as pellets at −80°C. One pellet per site was used for 16S rDNA preparation. Cell pellets were resuspended in 600 μl TE buffer, separated into duplicate 300-μl samples, and DNA was purified using the Metagenomic DNA Isolation Kit for Water (Epicentre Biotechnologies, Madison, WI, USA) following the manufacturer's instructions. The DNA from each split sample was suspended in TE buffer and combined.

For 1 l replicate samples (n = 5), only one 0·45-μm filter was necessary for filtration and, for each replicate, cell pellets were collected in a single microcentrifuge tube. To determine the effect of filter pore size on community estimation, 40 l of water was filtered through 0·45-μm filters as described above, and the flow-through was subsequently filtered through 47-mm diameter, 0·22-μm pore-size mixed cellulose ester filters (EMD Millipore, Billerica, MA, USA). The 0·45-μm-pore-size filters were processed as described above. Two 0·22-μm pore-size filters were placed in 50-ml tubes, and bacteria were elutriated as described above. A total of 6 l of flow-through was filtered, and the cell suspension was pelleted in a single microcentrifuge tube.

PCR and Illumina sequencing

The V6 region of the 16S rDNA was amplified by PCR using a cocktail of five modified 967F primers (5′-CNACGCGAAGAACCTTANC, 5′-CAACGCGAAAAACCTTACC, 5′-CAACGCGCAGAACCTTACC, 5′-ATACGCGARGAACCTTACC and 5′-CTAACCGANGAACCTYACC) and the 1046R reverse primer (5′–ID-CGACRRCCATGCANCACCT), where ID is a barcoded Illumina adapter sequence including a six-base multiplexing identification barcode unique to each sampling site (Bartram et al. 2011). The cocktail of forward primers was used to increase the number of taxa that matched primer sequences, as has been previously described for Archaea (Teske and Sorensen 2008). Amplicons were extracted from 2% agarose gels and purified using the QIAquick® Gel Extraction kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Purified amplicons were pooled in equal concentrations and paired-end sequenced (2 × 150) on an Illumina MiSeq platform at the University of Minnesota Genomics Center (Saint Paul, MN, USA). Sequencing results are available through GenBank BioProject PRJNA189273.
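The degenerate positions in the forward primer cocktail (N, R and Y) each stand for several bases, so each primer represents a family of exact sequences. The sketch below, which is illustrative and not part of the original workflow, expands the five forward primers using the standard IUPAC nucleotide codes and counts mismatches between a primer and the start of a read. The function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# The five modified 967F forward primers listed in the text (5' to 3').
FORWARD_COCKTAIL = [
    "CNACGCGAAGAACCTTANC",
    "CAACGCGAAAAACCTTACC",
    "CAACGCGCAGAACCTTACC",
    "ATACGCGARGAACCTTACC",
    "CTAACCGANGAACCTYACC",
]


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def mismatches(primer, read_prefix):
    """Count positions where the read base is not permitted by the primer."""
    pairs = zip(primer.upper(), read_prefix.upper())
    return sum(base not in IUPAC[code] for code, base in pairs)


if __name__ == "__main__":
    for primer in FORWARD_COCKTAIL:
        print(primer, len(expand_primer(primer)))
```

A mismatch count of this kind is one way to apply a rule such as discarding reads that differ from the primer by more than 1 nt; how the published pipeline implemented that rule internally is not specified here.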

Processing of sequence data

Illumina sequence reads were processed using mothur version 1.27.0 (Schloss et al. 2009). Sequence reads were trimmed of primer sequences and multiplexing barcodes specific to each site. Sequences that differed by >1 nt from primer sequences, had at least one ambiguous base, had >8 nt homopolymers, and those which had a quality score <35 in a window of 50 nt were removed. All sequences with abundances <2 were also removed. Chimeras were also removed using UCHIME (Edgar et al. 2011). Sequences were aligned against the SILVA database (Pruesse et al. 2007), and classification of OTUs at 97% identity using furthest neighbour clustering was done using mothur and the Ribosomal Database Project (RDP) taxonomic database release 9 (Cole et al. 2009). To control for differences in numbers of sequence reads at each site, while still capturing as much genetic diversity as possible, the number of sequence reads in each sample was normalized by randomly subsampling to the number of reads in the sample with the fewest reads (280 197 sequences for 2010 samples; 232 141 for replicates and 516 344 reads for filter-size comparison).
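The screening and normalization steps described above were carried out in mothur; the following is only a minimal Python sketch of the same logic, intended to make the criteria concrete. It assumes that the quality rule means the mean score in any 50-nt sliding window must not fall below 35, and all function names and the fixed random seed are hypothetical. Primer matching, chimera removal, alignment and clustering are not reproduced.

```python
import random
import re
from collections import Counter

# One base followed by 8 or more copies of itself, i.e. a homopolymer > 8 nt.
HOMOPOLYMER = re.compile(r"(.)\1{8,}")


def passes_filters(seq, quals, window=50, min_avg_q=35):
    """Return True if a trimmed read survives the per-read screens.

    Assumption: the quality criterion is applied as a sliding-window mean.
    """
    if not seq or len(seq) != len(quals):
        return False
    seq = seq.upper()
    if "N" in seq:                  # at least one ambiguous base
        return False
    if HOMOPOLYMER.search(seq):     # homopolymer longer than 8 nt
        return False
    span = min(window, len(quals))  # reads shorter than the window use one window
    for start in range(len(quals) - span + 1):
        if sum(quals[start:start + span]) / span < min_avg_q:
            return False
    return True


def drop_rare(reads, min_abundance=2):
    """Remove sequences observed fewer than min_abundance times."""
    counts = Counter(reads)
    return [read for read in reads if counts[read] >= min_abundance]


def subsample_to_minimum(samples, seed=0):
    """Randomly subsample every sample to the depth of the smallest one."""
    rng = random.Random(seed)
    depth = min(len(reads) for reads in samples.values())
    return {name: rng.sample(reads, depth) for name, reads in samples.items()}
```

Subsampling without replacement to the smallest library keeps the comparison among sites on an equal footing, at the cost of discarding reads from deeper libraries.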

Land-use analyses

As several OTUs were suspected to be linked to specific land-use practices, post hoc analyses were performed to assess surrounding land use at sampling sites. Land use of a 322 km² area surrounding each sampling site was investigated using the National Land Cover Dataset (NLCD2006) (Fry et al. 2011). Images of the sample sites within the NLCD2006 overlay layer were extracted from the Multi-Resolution Land Characteristics Consortium Viewer at a scale of 1″:157 000″. The ImageJ plugin Color-inspector 3D was used to quantify the area of each land-use type by counting colour-coded pixels (Table 1).
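The pixel-counting step can be illustrated with a short sketch. It is not the ImageJ procedure itself: the image is represented as a nested list of RGB tuples, and the colour legend below is a placeholder rather than the actual NLCD2006 palette. Colours outside the legend are pooled as 'other', which is one reason the three land-use percentages for a site need not sum to 100.

```python
from collections import Counter

# Hypothetical legend: these RGB values are placeholders, not the NLCD2006 colours.
LEGEND = {
    (0, 128, 0): "forested",
    (255, 0, 0): "urbanized",
    (255, 255, 0): "agricultural",
}


def land_use_percentages(pixels, legend=LEGEND):
    """Return the percentage of pixels in each land-use class.

    pixels is a list of rows, each row a list of (R, G, B) tuples.
    """
    counts = Counter(legend.get(rgb, "other") for row in pixels for rgb in row)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def primary_land_use(percentages):
    """Assign the site to the named class with the highest percentage."""
    named = {cls: pct for cls, pct in percentages.items() if cls != "other"}
    return max(named, key=named.get)
```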

Statistical analyses

All statistical analyses were conducted at α = 0·05. Calculations for Shannon, nonparametric Shannon, and Simpson diversity indices were performed using mothur version 1.27.0 (Schloss et al. 2009). Weighted and unweighted UniFrac calculations were performed to assess differences among sites based on phylogenetic information (Lozupone and Knight 2005). In addition, distance matrices between samples were calculated using the Bray–Curtis measure of dissimilarity (Bray and Curtis 1957) and were used for nonmetric multidimensional scaling (NMDS) (Kenkel and Orloci 1986) and analysis of molecular variance (amova), performed using mothur (Schloss et al. 2009). Spearman rank correlations were calculated to examine relationships between pH, temperature, rainfall, flow rate, percentage of surrounding land use and abundance of specific lineages. These calculations were performed using SPSS Statistics software v. 19.0 (IBM, Armonk, NY, USA).
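For readers unfamiliar with the indices named above, the following sketch shows how the Shannon index, the Simpson index and Bray–Curtis dissimilarity can be computed from OTU read counts. It is an illustration rather than the mothur implementation. The Simpson index is given in its finite-sample form, for which lower values indicate higher diversity; other software may report 1 − D or 1/D instead.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples.

    a and b map OTU identifiers to read counts; 0 means identical
    composition and 1 means no shared OTUs.
    """
    otus = set(a) | set(b)
    difference = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    return difference / (sum(a.values()) + sum(b.values()))
```

A pairwise matrix of `bray_curtis` values across sites is the kind of distance matrix that ordination methods such as NMDS take as input.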


Results

Overview of bacterial communities in the Mississippi River

For 2010 samples, >8·8 × 10⁶ Illumina sequence reads were obtained, at a read length of 151 nt before trimming for quality. Of these, approximately 6·5 × 10⁶ (~70%) reads met quality criteria, and 2·8 × 10⁵ reads per site were used for analysis, after subsampling to control for site-specific differences in read number. Sample coverage was estimated at 99·2 ± 0·2% at each site. A total of 16 400 OTUs were observed among all samples with 4594 ± 824 mean OTUs identified in individual samples. Of these, 97·03% were classified as Bacteria, 2·94% as Archaea and 0·03% could not be assigned.
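The text does not state which coverage estimator was used. Assuming it is Good's coverage, a common choice, the calculation is one minus the fraction of reads that belong to OTUs observed only once. The short sketch below shows that calculation; the function name is hypothetical.

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads).

    Assumption: this is the estimator behind the reported sample coverage.
    """
    total_reads = sum(otu_counts)
    singletons = sum(1 for count in otu_counts if count == 1)
    return 1.0 - singletons / total_reads
```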

Classified OTUs belonged to 32 phyla among all sites and several dominant phyla were observed and included the Proteobacteria, Actinobacteria, Bacteroidetes, Cyanobacteria and Verrucomicrobia (Figs 2 and 3). Only seven OTUs could not be assigned to a phylum. The five most abundant phyla accounted for 93·6 ± 1·3% of sequences among all sites. Only 12·3 ± 2·5% of OTUs at each site were found at all ten sampling sites; however, shared OTUs accounted for 90·5 ± 2·5% of sequence reads among all sampling sites. Of the sequence reads for unshared (termed ‘nonubiquitous’) OTUs, 82·1 ± 6·2% belonged to the five most abundant phyla. Therefore, a minority of sequence reads from all sites (<10%) accounted for nearly 90% of phylogenetic variation among sites, which was observed primarily among the most abundant phyla. The less abundant phyla accounted for 0·13 ± 0·00063% of sequence reads among all sites (Fig. 3).
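The contrast drawn above, between the small share of OTUs found at every site and the large share of reads those OTUs contain, can be computed from a site-by-OTU count table. The sketch below is an illustration under an assumed data layout (a dictionary of sites, each mapping OTU identifiers to read counts). The function name is hypothetical.

```python
def core_summary(table):
    """Identify OTUs present at every site and summarize their weight.

    table maps site name -> {OTU id: read count}. Returns the set of shared
    OTUs and, per site, the percentage of that site's OTUs and of its reads
    that belong to the shared set.
    """
    present = {
        site: {otu for otu, n in counts.items() if n > 0}
        for site, counts in table.items()
    }
    shared = set.intersection(*present.values())
    summary = {}
    for site, counts in table.items():
        total_reads = sum(counts.values())
        shared_reads = sum(counts.get(otu, 0) for otu in shared)
        summary[site] = {
            "pct_otus_shared": 100.0 * len(shared) / len(present[site]),
            "pct_reads_shared": 100.0 * shared_reads / total_reads,
        }
    return shared, summary
```

In a toy table where one abundant OTU occurs everywhere and several rare OTUs are site-specific, `pct_otus_shared` is low while `pct_reads_shared` is high, which is the same pattern reported for the river.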

Figure 2.

Distribution of the 15 most abundant phyla among sampling sites with 280 197 sequence reads per sample. Legend entries: Less abundant phyla; Aquificae; Crenarchaeota; Gemmatimonadetes; Armatimonadetes; Chloroflexi; Acidobacteria; Euryarchaeota; Fusobacteria; Firmicutes; OD1; Verrucomicrobia; Cyanobacteria; Bacteroidetes; Actinobacteria and Proteobacteria.

Figure 3.

Distribution of the 18 less abundant phyla among total sequence reads. Legend entries: Caldiserica; Thermodesulfobacteria; Thermotogae; OP11; Synergistetes; Lentisphaerae; Tenericutes; Spirochaetes; Deferribacteres; Chlorobi; Chlamydiae; Chrysiogenetes; Elusimicrobia; unclassified; Deinococcus-Thermus; TM7; WS3 and Planctomycetes.

Evaluation of downstream community structure

To assess the effect of upstream communities on the composition of bacterial communities downstream, nonubiquitous OTUs were presumed to be introduced into the river near the most upstream site at which they were first identified. Based on these associations, the percentage of sequence reads associated with a particular site of introduction was tracked throughout the main branch of the Mississippi River (Fig. 4). At sites downstream of the urbanized areas (e.g. Sites 3–9: Clearwater to Rochester), ≥10% of sequence reads belonged to nonubiquitous OTUs, and the majority of these sequence reads were introduced near St. Cloud. At the confluence of the Mississippi and Minnesota Rivers (Site 6), the number of nonubiquitous sequence reads dropped by nearly 6% of total sequence reads compared to the Twin Cities site (Site 4), immediately upstream.
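The assignment rule described above, in which each nonubiquitous OTU is attributed to the most upstream site at which it is detected, can be sketched as follows. This is an illustration under an assumed data layout (sites mapping OTU identifiers to read counts, plus an upstream-to-downstream site order), not the authors' code. Here ubiquity is evaluated only over the sites supplied, and the function names are hypothetical.

```python
def first_upstream_site(table, site_order):
    """Map each nonubiquitous OTU to the most upstream site where it occurs.

    table maps site name -> {OTU id: read count}; site_order lists the sites
    from upstream to downstream. OTUs present at every listed site are
    treated as ubiquitous and left unassigned.
    """
    present = [{o for o, n in table[s].items() if n > 0} for s in site_order]
    ubiquitous = set.intersection(*present)
    origin = {}
    for site, otus in zip(site_order, present):
        for otu in otus:
            if otu not in ubiquitous and otu not in origin:
                origin[otu] = site
    return origin


def reads_by_origin(table, site, origin):
    """Percentage of one site's total reads attributed to each origin site."""
    total = sum(table[site].values())
    pct = {}
    for otu, n in table[site].items():
        if n > 0 and otu in origin:
            pct[origin[otu]] = pct.get(origin[otu], 0.0) + 100.0 * n / total
    return pct
```

Applying `reads_by_origin` to each main-branch site in turn gives the kind of stacked breakdown summarized in Fig. 4.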

Figure 4.

Distribution of nonubiquitous operational taxonomic units (OTUs) among total sequence reads at sampling sites along the main branch of the Mississippi River. OTUs were associated with the most upstream site at which they were identified. Legend entries: La Crescent; Rochester; St. Croix; Hastings; Confluence; MN River; Twin Cities; Clearwater; St. Cloud and Itasca.

When flow rates throughout the river were correlated with the proportions of nonubiquitous sequence reads from each site, a weak positive correlation was observed only for sequence reads associated with OTUs from Itasca (Site 1), and none of the relationships for Itasca was significant at α = 0·05 (r = 0·408–0·575, P = 0·082–0·241). The number of sequence reads associated with St. Cloud (Site 2) or the Twin Cities (Site 4) was significantly negatively correlated with flow throughout the portion of the river studied (r = −0·773 to −0·835, P ≤ 0·009 for Site 2; r = −0·613 to −0·838, P = 0·059 at Lower St. Anthony Falls, P ≤ 0·026 for all other dams for Site 4). In addition, the numbers of sequence reads associated with Clearwater (Site 3) were negatively correlated with flow rate at Lock and Dam no. 2 and no. 5 in the southern portion of the study area (r = −0·637, P = 0·048 and r = −0·654, P = 0·040, respectively).

Community diversity and physicochemical parameters

Community diversity was similar among sites (Table 3); however, richness (number of OTUs) was negatively correlated with distance from the headwaters at Itasca (Site 1; r = −0·786, P = 0·021). While 72 h antecedent rainfall (rainfall prior to the sampling date) was negatively correlated with richness (r = −0·742, P = 0·014), 48 h antecedent rainfall was positively correlated with the number of OTUs (r = 0·816, P = 0·004). Rainfall was not correlated with the relative abundance of individual phyla. Water pH was correlated with only the Simpson diversity index (r = 0·790, P = 0·007) and percentage of surrounding agricultural area (r = −0·681, P = 0·030).
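The correlations reported here are Spearman rank correlations, which were computed in SPSS. As an illustration of the statistic itself, the sketch below ranks each variable (tied values share their average rank) and then computes the Pearson correlation of the ranks. It does not compute P values, and it will fail with a division by zero if either variable is constant. The example input uses distance and temperature for the eight Mississippi River sites as read from Table 1.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Main-branch sites 1-4, 6, 7, 9 and 10 from Table 1.
    distance_km = [0, 263, 271, 311, 313, 330, 332, 401]
    temperature_c = [17, 18, 18, 21, 20, 21, 21, 16]
    print(round(spearman(distance_km, temperature_c), 3))
```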

Table 3. Diversity indices for all sites

| Site | Number of operational taxonomic units | Shannon | NP_Shannonᵃ | Simpson |
|---|---|---|---|---|

ᵃ Nonparametric Shannon index.


Correlative analysis relating physicochemical parameters to the relative abundance of the five most abundant phyla identified from all sites (Proteobacteria, Actinobacteria, Bacteroidetes, Cyanobacteria and Verrucomicrobia) revealed that pH was positively correlated with the relative abundance of Cyanobacteria (r = 0·675, P = 0·032). Temperature was inversely correlated with the relative abundance of Bacteroidetes at all sites (r = −0·778, P = 0·008), but was moderately positively correlated with the relative abundance of Verrucomicrobia (r = 0·634, P = 0·049). Furthermore, the relative abundance of Bacteroidetes was positively correlated with river flow at all dams (r = 0·648–0·798, P ≤ 0·043). Water temperature was significantly negatively correlated with flow rate at all dams (r = −0·774 to −0·735, P ≤ 0·015) and was positively correlated with 24 h and cumulative antecedent rainfall (r = 0·800–0·848, P ≤ 0·005). No other relationships between water physicochemical parameters, diversity indices, surrounding land use and relative taxonomic abundance were significant.

Comparison of bacterial community structure

Weighted UniFrac analyses revealed significant differences in bacterial community structure between each of the three sites in the Twin Cities metropolitan area (Sites 4, 5 and 6) and Site 8, located on the St. Croix River (P = 0·030, 0·024 and 0·014, respectively). Similarly, the Minnesota River site (Site 5) and the site at the confluence of the Mississippi and Minnesota Rivers (Site 6) also had significantly different structures when compared by weighted UniFrac analysis (P = 0·039). Furthermore, community structure at Rochester (Site 9), located at the confluence of the St. Croix River with the Mississippi River, was significantly different from that seen at all other sites (P < 0·001). Among the less abundant phyla, the relative abundance of candidate division TM7 was significantly negatively correlated with distance from the headwaters (r = −0·838, P = 0·009). No difference in bacterial community structure was observed by unweighted UniFrac analysis among all sampling sites (P = 1).

Impact of land use on bacterial community structure

To test the hypothesis that specific land-use practices impact community structure, sites were grouped into forested, urbanized or agricultural sites based on their primary surrounding land use (Table 1). These groupings were supported by anova, where the percentage of surrounding land use was significantly different (P ≤ 0·002) among all groups. For example, sites grouped as urbanized (sites 4, 5 and 6) had a significantly greater percentage of urbanized land surrounding the site than those in the forested or agricultural groups. Community structure among land usage groups (pairwise comparisons) was significantly different by weighted UniFrac analysis (P ≤ 0·002). However, the relative abundance of the five most abundant phyla did not differ significantly among all sites (P = 0·072–0·638).

Nonmetric multidimensional scaling was also performed on the total community structure followed by amova to determine whether surrounding land use could be linked to differences in communities (Fig. 5). Clustering of the three urban sites (Sites 4, 5 and 6) was observed, and these sites grouped apart from other sites along axis 2. The agricultural sites (Sites 2, 3, 8 and 9) also clustered together. Forested sites did not group based on community structure, and the more pristine Mississippi River headwater Itasca site was located apart from the others. The Hastings site (Site 7) was also distantly related to other sampling sites. Despite separation of Hastings as well as forested sites, grouping of sites according to predominant surrounding land use was significant at P = 0·07.

Figure 5.

Nonmetric multidimensional scaling plot of community composition at all sites. Open symbols represent sites surrounded primarily by forest, solid symbols were primarily urbanized, and shaded symbols were primarily surrounded by agricultural areas. The lowest stress is 0·197 and r² = 0·84.

Replicate sample comparisons

A total of 7429 OTUs were observed among all five 1 l replicate samples, and 44·4–62·1% of these OTUs were found in each individual replicate (mean of 3842 ± 478 OTUs per sample). Among total sequence reads for replicates, a mean of 98·2 ± 0·6% belonged to OTUs that were shared among all five samples. The majority of OTUs were classified among the Proteobacteria, Bacteroidetes and Actinobacteria; however, even at the phylum level, shifts in relative abundance were apparent among replicate samples (e.g. Proteobacteria accounted for 28·6–50·8% of sequence reads per replicate). Differences in community composition between replicates were further supported by weighted UniFrac analysis that revealed a significant difference in the compositions of replicates 2 and 4 (P = 0·048) and replicate 5 compared to any other sample (P < 0·001).

Filter-size analyses

A total of 4968 OTUs were identified in a water sample filtered through a 0·45-μm filter followed by filtration of the flow-through through a 0·22-μm filter. OTUs unique to the 0·45-μm-filtered fraction accounted for 39·7% of total OTUs, while those unique to the 0·22-μm-filtered sample accounted for only 13·6%. Shared OTUs in communities captured from both filter sizes accounted for 99·7 and 98·7% of total sequence reads in the 0·22-μm fractions and 0·45-μm fractions, respectively. Among the most abundant phyla identified, Actinobacteria had a higher relative abundance (61·0 vs 36·3% of total sequence reads) in the 0·22-μm sample compared to the 0·45-μm sample; Firmicutes also had greater abundance at 0·22 μm than 0·45 μm. The two most abundant families that were unique to the 0·22-μm sample were the Bacillaceae making up 0·06% of total reads and Microbacteriaceae comprising 0·04% of total reads. Proteobacteria, Bacteroidetes and Synergistetes were more abundant in the 0·45-μm sample compared to the 0·22-μm sample by at least 1% of total sequence reads; Cyanobacteria were also more abundant in the 0·45-μm sample compared to 0·22-μm sample, but were still a minority of sequence reads.


In this study, we examined the bacterial community structures at ten sites along the Upper Mississippi River in Minnesota using Illumina NGS technology. To our knowledge, this is among the first studies to characterize a large freshwater riverine bacterial community using an NGS technique. Community characterization using NGS revealed considerably higher diversity than was previously reported for a riverine community using clone libraries or DGGE community profiling (Shannon index 2·6–3·2; Simpson index 0·075–0·115) (Cottrell et al. 2005; Liu et al. 2012). The use of NGS allowed for a nearly complete analysis of the bacterial community at all sites. However, 4% of identified OTUs could not be assigned to families, and 18% could not be assigned to the genus level. It is possible that these unclassified bacteria are members of the ‘rare biosphere’ and may help account for differences in community structure among sites, as has been previously suggested (Sogin et al. 2006). A previous study using Illumina NGS targeting the V4/V5 region demonstrated a higher percentage (>30%) of unclassified genera in a human faecal sample, and this was attributed to low sequence quality (Claesson et al. 2010). Improved taxonomic classification in the current study may result from increased sequence quality as the sequencing chemistry improves, from removal of singleton sequence reads from the data set, or from targeting the V6 region rather than V4/V5. The use of new Illumina chemistry providing 2 × 250-bp read length will greatly enhance our ability to resolve community composition at finer taxonomic levels and may better reveal specific lineages associated with different upstream influences on the river.
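For readers less familiar with the diversity indices cited above for the earlier clone-library and DGGE studies, the sketch below computes both from a vector of OTU read counts. It assumes the natural-log Shannon index, H′ = −Σ p_i ln p_i, and the Simpson dominance form, D = Σ p_i², in which lower values indicate higher diversity; this section does not state which formulations the cited studies used, so these choices and the function names are assumptions.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson_dominance(counts: Sequence[int]) -> float:
    """Simpson dominance D = sum(p_i ** 2); lower D means a more even community."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)


# Toy community of four equally abundant OTUs (invented counts)
even_counts = [25, 25, 25, 25]
h = shannon_index(even_counts)       # equals ln(4) for four equally abundant OTUs
d = simpson_dominance(even_counts)   # equals 1/4 for four equally abundant OTUs
```

Both indices depend on sequencing depth, so comparisons across samples are usually made after the read counts have been brought to a common depth.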

The necessity for replication, especially in studies employing emerging technologies, is receiving increasing attention in the literature (Prosser 2010). However, in addition to replication, sample volume remains an important and relatively understudied concern in estimating community structure, particularly of ecosystems containing a large number of microhabitats (Zinger et al. 2012). In the present study, a large sample volume was used to capture changes in the presence and abundance of taxa due to heterogeneity in the water column. Rivers, as well as other aquatic ecosystems, have demonstrated seasonal variation in community structure (Crump and Hobbie 2005; Gilbert et al. 2009; Fortunato et al. 2012), so an important concern in the study design here was to collect samples over a large spatial scale (>400 km) that would be indicative of the riverine community over a short time period. Treating samples as spatial replicates throughout the Upper Mississippi River, we identified a bacterial assemblage that appears spatially stable over >400 km, changing minimally throughout the river.

The majority of sequence reads at each site were assigned to a relatively small number of OTUs present among all sites, suggesting that there is a ubiquitous core bacterial community throughout the study area. These OTUs were classified primarily as α- and β-Proteobacteria, Actinobacteria, Bacteroidetes, Cyanobacteria and Verrucomicrobia, which have been previously described as ubiquitous freshwater lineages using clone-based library approaches (Zwart et al. 2002; Cottrell et al. 2005; Newton et al. 2011). Shifts in community structure among sites were attributed to changes in the presence and relative numbers of less abundant OTUs, as well as shifts in the relative abundance of the dominant lineages. The majority of the nonubiquitous OTUs identified in this study were likely introduced from runoff, sediment resuspension, sewage effluent or other anthropogenic impacts on the river but were displaced by indigenous riverine microbiota as the river moved downstream, as indicated by a decline in diversity with distance from the pristine headwaters. Follow-up studies at these sites in future years will allow for analysis of temporal variation within the study area and may further support relationships between surrounding land-use practices and bacterial community structure along the river.

In addition to upstream impacts on community composition, rainfall, temperature and pH were also found to significantly affect diversity among sampling sites as well as the relative abundances of dominant phyla. Factors such as pH and temperature have been previously demonstrated to affect bacterial community composition among lake communities (Lindstrom et al. 2005). Heterogeneity in environmental parameters has been shown to be more strongly related to differences in community composition than is spatial distance (Horner-Devine et al. 2004). In the present study, sites were sampled over approximately a 1-month period. The different sampling dates, in addition to water chemistry and distance, may have also resulted in differences in bacterial communities, as was seen over short time periods among replicate samples. Sites that were sampled on the same date (e.g. Sites 5 and 6) had significantly different community structures, and no relationship was observed relating sites by either sampling date or geographical location.

Sample volume and filter size may also have influenced the bacterial community characterized among samples. Currently, no standards for water sampling exist, and the choice of sample volume and filter size, the latter largely determined by the desired volume, varies by study and research group (Zinger et al. 2012). Filters of 0·45-μm pore size were chosen for this study to efficiently filter 40 l of water after prefiltering, which also affected the bacterial community structure characterized. Our finding that approximately 99% of sequence reads were shared in bacterial fractions captured on the 0·45-μm and 0·22-μm filters suggests that either filter size is acceptable to characterize the most abundant free-living taxa. However, a small but potentially important percentage of OTUs were unique to the 0·22-μm-filtered fraction, suggesting that minor taxa are missed using the 0·45-μm pore size. Furthermore, prefiltering, which is essential for filtration of large volumes of water, removed larger aggregates that are recognized as important sources of phylogenetic and functional microbial diversity (Grossart 2010), and the extent to which micro-organisms in aggregate fractions may influence the overall assemblage throughout the river was not evaluated in the current study. These considerations highlight the necessity of developing a more comprehensive strategy to assess the total bacterial assemblage through all size-fractions in the water column, potentially through analysis of micro-organisms trapped on a variety of filter sizes rather than using a single, standardized pore size.

Characterization of bacterial communities in the Upper Mississippi River using Illumina-based NGS suggests a persistent ‘core bacterial community’ ubiquitous within the study area. This study is among the first to examine the influence of upstream bacterial communities on those downstream in a riverine system. A recent study identified signatures in bacterial communities associated with sewer, generic faecal, and human-specific faecal contamination (Newton et al. 2013). In the current study, a mean of 0·06% ± 0·03% of OTUs belonged to families associated with generic faecal contamination, which is roughly one-tenth of the level of faecal contamination observed throughout Lake Michigan (Newton et al. 2013). This lower level may reflect less faecal contamination in the Mississippi River or geographical differences in the community composition of faecal sources; however, these results suggest that further investigation of the communities characterized as well as those from faecal sources may lead to the identification of source-specific OTUs that could be useful for water quality monitoring, as has been previously suggested (Unno et al. 2010). The continually decreasing costs of NGS and the increasing availability of data similar to that presented here offer exciting opportunities to better understand the relationship between bacterial communities and anthropogenic activity in aquatic environments and may lead to improved understanding of water quality and effective monitoring practices.


Funding for this project was provided, in part, by the American Recovery and Reinvestment Act of 2009 (ARRA) and the Minnesota Environment and Natural Resources Trust Fund as recommended by the Legislative-Citizen Commission on Minnesota Resources (LCCMR). This work was carried out in part using computing resources at the University of Minnesota Supercomputing Institute.

Conflict of interest

No conflict of interest declared.