Overview of the taxonomic makeup
The quantity and length distribution of the retrieved 16S rRNA gene fragments were described elsewhere (Yilmaz et al., 2011), while the number of taxa at each rank level, overall taxonomic composition, and other details regarding the taxonomic classification can be found in the supplementary material (Table S6, S7, and S8).
The overall assessment of the taxonomic makeup agrees with previous studies on the GOS metagenome (Rusch et al., 2007; Biers et al., 2009; Yooseph et al., 2010), as well as with expectations for ocean surface microbial communities (Fuhrman & Hagström, 2008). Compared to previous assessments of the GOS metagenome taxonomic makeup, we observed more sequences belonging to the Candidate divisions. For example, only Candidate division OD1 is reported in the study by Biers and colleagues (Biers et al., 2009), whereas we additionally report the occurrence of TM7, WS3, OP3, SR1, and OP10. Clone sequences belonging to these divisions have been isolated from a wide variety of sources, including sludge, soil, human or other host tissues, deep-sea sediments, lakes, and biofilms (Hugenholtz et al., 1998; Pace, 2009). In the GOS metagenome, the distribution of all of these divisions except Candidate division OD1 was limited to coastal, estuarine, brackish, hypersaline, and freshwater environments. Their absence from surface open ocean waters is congruent with previous observations. Their presence in estuarine and coastal waters could indicate anthropogenic inputs, considering their prevalence in wastewater- and sludge-type environments, while their prevalence in fresh, brackish, and hypersaline waters is in line with metabolic capabilities that potentially differ from those of surface ocean communities. Candidate division OD1 was the most widespread candidate division within the GOS metagenome; in addition to the previously listed locations, this taxon was also observed in ocean waters at sites GS000a, GS114, and GS117 (Fig. S2). Although OD1 is environmentally widespread, our survey of previous isolation sources did not encounter any other surface ocean clones.
Community structures at different taxonomic rank levels
Spatial and temporal patterns, along with ecological coherence of higher bacterial and archaeal taxonomic ranks, have been discussed previously (Philippot et al., 2009, 2010). Although the GOS metagenome is composed only of surface water samples, the ‘surfaces’ sampled span a variety of contrasting habitats, such as estuary vs. open ocean, or hypersaline vs. freshwater. We used this diversity of habitats to examine how well habitat differences are reflected in community structures composed of bacterial and archaeal taxa at different rank levels.
The sites × species matrix for ordination analysis consisted of standardized relative abundances at five different rank levels. The phylum level consisted of 33 distinct taxa, with 12 313 sequences classified into these taxa. The class level had 55 distinct taxa, with 12 222 sequences; the order level had 107 distinct taxa and 12 049 sequences; the family level had 201 distinct taxa and 10 616 sequences; and finally, the genus level had 363 distinct taxa and 4270 sequences.
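As an illustration of how such a matrix could be prepared, the following is a minimal sketch and not the procedure used in this study. It assumes that "standardized" means dividing each taxon by its maximum across sites, and it uses Bray-Curtis as the dissimilarity index; neither choice is stated in the text, and all function names are hypothetical.

```python
def relative_abundances(counts):
    """Convert a sites x taxa count matrix (list of rows) to per-site proportions."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


def standardize_by_taxon_max(matrix):
    """Divide each taxon (column) by its maximum across sites.

    ASSUMPTION: the text does not say which standardization was applied;
    division by the column maximum is used here only as an example.
    """
    n_taxa = len(matrix[0])
    col_max = [max(row[j] for row in matrix) for j in range(n_taxa)]
    return [
        [row[j] / col_max[j] if col_max[j] else 0.0 for j in range(n_taxa)]
        for row in matrix
    ]


def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two sites (ASSUMED index)."""
    numerator = sum(abs(a - b) for a, b in zip(site_a, site_b))
    denominator = sum(a + b for a, b in zip(site_a, site_b))
    return numerator / denominator if denominator else 0.0


# Toy example: two sites, three taxa (made-up counts).
counts = [[8, 2, 0], [2, 2, 4]]
standardized = standardize_by_taxon_max(relative_abundances(counts))
dissimilarity = bray_curtis(standardized[0], standardized[1])
```

The pairwise dissimilarities computed this way are the kind of input an NMDS ordination starts from.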
At every taxonomic rank level, the NMDS analysis showed that certain sites have markedly different community structures (Fig. 1). Specifically, a recurring trend was a halo of coastal (GS013), estuary (GS011–GS012), hypersaline (GS033), freshwater (GS020), mangrove (GS032), fringing reef (GS025), and some open ocean (GS000a–c) sites surrounding a cluster of mainly open ocean sites. In addition, another set of coastal/estuary (GS002–GS010), warm seep (GS030), coral reef (GS048), and coastal upwelling (GS031) sites was to some extent distinguishable from the open ocean cluster. The relative distances between sites were not always the same at different ranks. For example, the GS011–GS012 pair was placed at varying distances from each other, but the two sites nevertheless retained their general distinctness from the rest. Another example is the GS013 and GS025 pair, which is clearly different from the rest of the sites but placed conspicuously close together at the phylum rank level. The taxon composition and relative abundances change with each rank level; therefore, such changes in configuration are expected. The NMDS ordinations have high stress values, although these are still within acceptable limits (Kruskal, 1964). Other multivariate analysis methods may also be suitable; however, NMDS has the advantage of being a nonparametric method and therefore does not assume that species respond linearly to environmental gradients.
Figure 1. Panel figure showing NMDS analysis for each taxonomic rank level. NMDS is an iterative search for ranking and placement of n entities (samples) in k dimensions (ordination axes) that minimizes the stress of the k-dimensional configuration. The ‘stress’ value is a measure of departure from monotonicity in the relationship between the dissimilarity (distance) in the original p-dimensional space and that in the reduced k-dimensional ordination space (Clarke, 1993; Ramette, 2007). NMDS is therefore used to find a configuration in a given number of dimensions that preserves rank-order dissimilarities in species composition as closely as possible, such that distance along an NMDS axis corresponds to relative difference in community composition. The axes (NMDS1 and NMDS2) are arbitrary and just represent a framework for sampling site points; however, they are scaled so that one unit means doubling of community dissimilarity. The community dissimilarities were calculated based on taxa-standardized relative abundances. For visibility, the ‘GS’ prefix was omitted from sampling site names. Stress values are indicated at the top-right corner of each figure, whereas the ranks are indicated at the bottom-left corner. Sampling sites mentioned in Results and discussion are highlighted in red.
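To make the notion of stress concrete, the following minimal sketch computes Kruskal's stress (formula 1) for a given configuration. It is an illustration under stated assumptions, not the software used in this study: the function names are hypothetical, ties in the dissimilarities are not treated specially, and the monotone fit uses a plain pool-adjacent-violators pass.

```python
import math


def pava(values):
    """Least-squares non-decreasing fit (pool-adjacent-violators algorithm)."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def kruskal_stress(dissimilarities, ordination_distances):
    """Stress-1: departure of ordination distances from a monotone
    relationship with the original dissimilarities.

    Both arguments are flat lists over the same site pairs.
    """
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    d = [ordination_distances[i] for i in order]
    d_hat = pava(d)  # monotone regression of distances on dissimilarity rank
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return math.sqrt(numerator / denominator)
```

A configuration whose distances increase in the same order as the dissimilarities gives a stress of zero; any rank reversal raises it.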
Because a systematic classification of habitat types can extend these observations further, all NMDS plots were annotated with Environment Ontology (EnvO) biome, feature, and material terms (Fig. 2). The three levels of EnvO terms describe the habitat of the sampling sites with increasing granularity. The first-level biome (e.g. large lake or estuarine) broadly establishes the system that defines the scope of potential ecological inputs that a biological entity may be subjected to. An environmental feature (e.g. atoll, bay) describes a range of biotic and abiotic entities and phenomena that are more local to that entity than its biome. Finally, material (e.g. coastal water, ocean water) is understood as the substance immediately surrounding that entity and acting as the primary transmitter of ecological forces to and from it.
Figure 2. Panel figure showing NMDS analysis for each taxonomic rank level. The configuration of sites is the same as in Fig. 1; however, sites are now annotated with EnvO terms at three different levels: biome, feature, and material. Rows indicate different rank levels, whereas columns indicate different sets of terms. Legends at the bottom of each column show the color and shape code of EnvO terms. Goodness-of-fit of term levels to the ordination is indicated by the R² values, and the significances by asterisk symbols (0 < P *** < 0.001 < P ** < 0.01 < P * < 0.05 < P < 0.1 < P < 1).
All EnvO term levels produced significant correlations with the ordinations; however, biome and material terms overall produced 1.5–2 times higher correlations than feature terms. Although contrasting biome or material types, such as lake vs. oceanic, or hypersaline vs. estuarine water, were distinguishable at all rank levels, meaningful associations of identical terms started to appear at the class rank and improved at lower rank levels. At the phylum and class levels, for example, oceanic epipelagic zone and neritic epipelagic zone biomes, or coastal water and (open) ocean water sites, were intermixed, whereas at the order level these two biomes were more clearly separated. Even so, the two clusters were not fully separated, and some neritic/coastal sites overlapped with the oceanic cluster. This overlap arises because most of these sites were sampled around islands and were heavily mixed with open ocean waters. Therefore, although the ontologically correct annotation would be ‘neritic epipelagic zone biome’ or ‘coastal water’, the community composition resembles that of the open ocean.
Two of the estuarine biome sites were placed together on all rank-level ordinations; however, a third one grouped with neritic biomes, apart from the other two. This deviation can be explained by locality: the former two sites are located in Delaware Bay and Chesapeake Bay, respectively, whereas the third site is listed as Bay of Fundy. These estuaries differ drastically owing to the higher anthropogenic influences at Delaware and Chesapeake Bays (Lotze et al., 2006). This suggests that sites with the same habitat type may show differences at high- and low-level taxon ranks owing to external influences and/or mixing of water masses.
A number of other biomes were sampled during the GOS expedition, namely marine coral reef, marine reef, warm seep, and marginal sea. The ordinations did not reveal these biomes as being different from oceanic and neritic sites, although specific groups of bacteria are known to be associated with corals (Rohwer et al., 2002; Pantos et al., 2003; Bourne & Munn, 2005), and a low similarity between Pacific Ocean and Caribbean Sea samples has been observed previously (Lee & Fuhrman, 1991). A possible explanation is that the habitat annotations are misleading and that the prominent character of these sites is that of neritic or oceanic biomes rather than reef or marginal sea biomes. The EnvO feature annotations offer another clue supporting this argument: the sample feature of the majority of these biomes is photic zone, and with this annotation, they cluster with other sites sharing this feature.
In summary, applying ontological annotations to the sampling sites gave context to the community structure differences and revealed the ecological structuring of the high-level taxa. Our in silico observations, along with previous in situ and in silico evidence (Fierer et al., 2007; Philippot et al., 2009; Zinger et al., 2011), support the view that higher taxonomic rank levels such as phylum or class provide enough information to distinguish between highly contrasting habitat types. Hence, phyla or classes can be used as indicator taxa to identify specific habitats. However, at the interface of two habitats, such as coastal vs. (open) ocean water, more resolution is necessary for discriminatory power. A basic example is the case of Betaproteobacteria; at phylum level, this class, which prefers low-salinity (brackish) waters, is counted as Proteobacteria, leading to a poor ordination. Even so, the ordinations do not provide clear-cut habitat clusters, and a certain amount of fuzziness is observed even at genus level.
To test whether the distribution of bacterial and archaeal taxa is related to environmental conditions, we fitted vectors and nonparametrically smoothed surfaces of seven environmental variables (Virtanen et al., 2006), which were obtained both in situ and by interpolation. The variable vectors and fitted surfaces are to be interpreted together as follows (see Fig. 3): the vector arrow points in the direction of most rapid change in the environmental variable, that is, the direction of the gradient, and the length of the arrow is proportional to the correlation between ordination and variable. A planar fitted surface indicates that the response of the community to the variable is linear, and the surface R² will be equal or close to the R² of the vector. If the response is nonlinear, R² for the surface will be higher than for the vector. For example, if the R² values for the temperature vector and fitted surface are equal, this would imply that temperature has a direct effect on the bacterioplankton community, for example by causing higher metabolic rates or death/dormancy in members of the community, hence changing the community structure. If the response is nonlinear, then the effect of the environmental variable on the community structure will be indirect, for example a high nutrient situation providing, via phytoplankton exudates, the excess concentration of dissolved organic matter needed for bacterial growth (Larsson & Hagström, 1979).
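The vector part of this interpretation can be sketched as an ordinary least-squares fit of one environmental variable on the two ordination axes. The code below is a minimal illustration, not the fitting routine used in this study; the function name is hypothetical, and the smoothed surface (which would require a nonparametric smoother) is not reproduced.

```python
import math


def fit_vector(axis1, axis2, variable):
    """Least-squares fit of an environmental variable on two ordination axes.

    Returns (direction, r_squared): a unit vector pointing along the fitted
    linear gradient, and the R² of the linear (planar) fit.
    Assumes the axes are not collinear and the variable is not constant.
    """
    n = len(variable)
    mean1, mean2, mean_v = sum(axis1) / n, sum(axis2) / n, sum(variable) / n
    x = [a - mean1 for a in axis1]
    y = [a - mean2 for a in axis2]
    z = [a - mean_v for a in variable]
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxz = sum(a * c for a, c in zip(x, z))
    syz = sum(b * c for b, c in zip(y, z))
    szz = sum(c * c for c in z)
    # Solve the 2 x 2 normal equations for the regression coefficients.
    det = sxx * syy - sxy * sxy
    b1 = (sxz * syy - syz * sxy) / det
    b2 = (syz * sxx - sxz * sxy) / det
    r_squared = (b1 * sxz + b2 * syz) / szz  # explained / total sum of squares
    norm = math.hypot(b1, b2)
    direction = (b1 / norm, b2 / norm) if norm > 0 else (0.0, 0.0)
    return direction, r_squared


# Toy ordination scores (made up) and two variables: one linear, one not.
nmds1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
nmds2 = [1.0, -1.0, 0.0, 1.0, -1.0]
linear_dir, linear_r2 = fit_vector(nmds1, nmds2, [3 * v + 1 for v in nmds1])
curved_dir, curved_r2 = fit_vector(nmds1, nmds2, [v * v for v in nmds1])
```

In this toy case the linear variable is fully captured by the vector, whereas the curved one is not captured at all; a smoothed surface would be needed to describe the latter, which is the situation in which the surface R² exceeds the vector R².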
Figure 3. Panel figure showing NMDS analysis for each taxonomic rank level with fitted environmental variable vectors and nonparametric surfaces. Rows indicate different rank levels, whereas columns indicate different variables. All variable values were z-score standardized prior to vector and surface fitting; hence, the isocline values reflect the z-scores. Goodness-of-fit of vectors and surfaces is again indicated by the R² values, R² linear and R² surface, respectively, and significances by asterisks (0 < P *** < 0.001 < P ** < 0.01 < P * < 0.05 < P < 0.1 < P < 1).
Of the seven variables, only three, namely temperature, salinity, and chlorophyll a concentration, produced significant correlations with community structures (Fig. 3). It is surprising that the nutrients did not significantly correlate with the microbial community structure, as the microbial community contributes substantially to the production of inorganic nutrients via remineralization in surface waters (Azam et al., 1983). Furthermore, it has been suggested that silicate regeneration in the oceans is controlled by bacterial dissolution of diatom frustules (Bidle & Azam, 1999), implying that there could be a link between silicate concentration and bacterioplankton community composition. Nevertheless, individual correlations of certain taxa with these nutrients cannot be dismissed, although they do not seem to affect the ‘big picture’.
Temperature produced significant correlations at all rank levels except phylum and class. The strength of the correlation was high at order level (0.394), decreased at family level (0.213), and increased again at genus level (0.313). In all cases, the response of the community structure to temperature was nonlinear, as indicated by higher surface correlations. This finding both supports and contrasts with previous studies; Pommier et al. (2007) and Fuhrman et al. (2008) showed that bacterial richness is linearly and positively correlated with temperature and suggested a direct effect of temperature on bacterioplankton diversity through enzyme kinetics. Our results support the view that temperature is an important determinant of community composition, but because of the nonlinear effects observed, secondary variables acting together with temperature are worth considering. For example, studies demonstrate that metabolic activity at low temperatures requires higher concentrations of specific substrates. Further, the relationship between temperature and respiration, expressed as Q10, suggests an exponential dependence (Wiebe et al., 1992; Pomeroy & Wiebe, 2001).
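The exponential character of the Q10 relationship can be seen from its standard definition, the factor by which a rate changes for a 10 °C rise in temperature. The following minimal sketch uses hypothetical function names and made-up numbers, not data from this study.

```python
def q10_from_rates(rate_low, temp_low, rate_high, temp_high):
    """Q10 coefficient from two rates measured at two temperatures (°C)."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


def rate_at(temperature, reference_rate, reference_temperature, q10):
    """Rate predicted at a temperature, given a reference rate and a Q10 value."""
    return reference_rate * q10 ** ((temperature - reference_temperature) / 10.0)


# Made-up example: a respiration rate that doubles between 10 and 20 °C.
q10 = q10_from_rates(1.0, 10.0, 2.0, 20.0)
predicted = rate_at(30.0, 1.0, 10.0, q10)
```

With a Q10 of 2, each further 10 °C step doubles the rate again, so equal temperature increments produce multiplicative rather than additive changes.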
Salinity was a significant variable at phylum, order, and family levels, and the correlation was highest at order level (0.417). As with temperature, the effect of salinity was nonlinear. A number of other studies have established the importance of salinity for bacterial community composition, both in different aquatic environments (Nold & Zwart, 1998) and on a global scale (Lozupone & Knight, 2007). The nonlinear effect observed concurs with the wide range of physiological effects that salinity can have (e.g. on membrane potential and transport systems). Additionally, the indirect relationship could be due to a secondary influence originating from mixing of water masses, which would affect both salinity and community composition.
Chlorophyll a concentration correlations were significant at all levels except phylum and were the strongest among the three variables. Additionally, a linear effect was observed at class level, although this effect changed to nonlinear at lower rank levels. As chlorophyll a concentration is an indicator of phytoplankton biomass, and as heterotrophic bacterioplankton depend on phytoplankton products and remains, this is not an unexpected outcome. In fact, chlorophyll a concentration was found to be a determinant of seasonal and annual community composition dynamics (Fuhrman et al., 2006; Gilbert et al., 2011).
Temperature, salinity, and chlorophyll a concentrations are environmental variables with known effects on structuring the marine bacterial and archaeal communities. This study confirms those previous local observations on a global scale and underlines the environmental effects on ecological structuring of high taxon levels.
Controls on geographic distribution of marine clades
These correlations can provide novel indications about the relationships between variable gradients and taxa, which is especially useful for taxa with few or no cultured members. For example, clades BD1-5 and TM6 (phylum level) can be described as brackish-preferring clades, whereas RF3 appears at extreme salinity levels (Fig. S2). Clade BD1-5 was found both at GS033 (marine salt marsh biome) and at GS032 (low salinity, 29.47 PSU); it thus occurs at both extremes of the salinity gradient, but its higher abundance at the lower end of the gradient supports a brackish preference. Effects of salinity on the metabolic capabilities of bacterioplankton have been observed previously, especially in river and estuarine systems (Bouvier & del Giorgio, 2002; Langenheder et al., 2003), potentially explaining specific environment preferences. At class level, the SAR202 clade lies at the lower end of the chlorophyll gradient (Fig. S3), in accordance with previous observations that this clade has its abundance maximum at the lower boundary of the deep chlorophyll maximum layer (Giovannoni et al., 1996). Other members of the Chloroflexi phylum (Keppen et al., 2000) have anoxygenic photosynthesis capacity, which could also be the case for the SAR202 clade. However, previous studies indicate that oxygenic and anoxygenic phototrophic bacteria co-occur in the euphotic zone (Kolber et al., 2001), implying a different strategy than anoxygenic phototrophy.
Individual taxa were selected for correlation with environmental factors. Significant correlations were found only for a few groups and mainly for temperature, salinity, and chlorophyll a concentration values (Table S9), which were also demonstrated in the NMDS analyses. Although not all taxa are considered in this analysis, the selected taxa here are dominant members of bacterioplankton communities, and they could be the ones driving the community structure patterns observed in NMDS analyses.
Latitudinal distribution patterns were also investigated for the same selected taxa in an effort to uncover additional factors that may influence their distribution. Although some deviations were evident, the latitudinal distributions revealed two distinct types of generalized patterns; one having two abundance peaks at temperate and tropical regions (pattern 1 – Fig. 4a and c) and one having a tropical peak (pattern 2 – Fig. 4b and d). Moreover, these two patterns were observed for phylogenetically diverse groups of organisms, as well as for lower and higher abundance groups (Fig. 4 and Fig. S4). Examples of pattern 1 included surface 1 subgroup of SAR11 clade, NS4 clade of Flavobacteria, and SAR86 clade of Gammaproteobacteria, while examples of pattern 2 included Prochlorococcus, Synechococcus, SAR202 clade of Chloroflexi, and OM27 clade of Deltaproteobacteria. When comparing the latitudinal distribution of taxa in patterns 1 and 2 with global distribution of net primary production (Behrenfeld & Falkowski, 1997), we observed that pattern 1 had peaks in both high primary production areas and oligotrophic areas, while pattern 2 only peaked in oligotrophic gyre regions (Fig. 4e).
Figure 4. Latitudinal distribution patterns for selected marine taxa. The relative abundances are calculated as described in the materials and methods section, and the resulting value is then multiplied by 10⁵. Relative abundances for latitude intervals are the sum of all relative abundances from GOS sites within that interval. (a) and (b) represent lower abundance taxa, while (c) and (d) show higher abundance taxa. Standard deviations are indicated by gray bars and are calculated for the sites within a given interval. The ‘*’ symbol indicates fewer than two sites within the interval and therefore no standard deviation. Connector lines are added to emphasize the trends and do not imply continuity. (e) A global ocean net primary productivity map for 2004, overlaid with GOS sites (white dots). Productivity values are expressed as mg C m⁻² day⁻¹.
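The aggregation described in this caption can be sketched as follows. This is a minimal illustration, not the code behind the figure: the function name and the 10-degree interval width are hypothetical, and the scaling factor is read as 10⁵, which is an assumption about the caption.

```python
import math
from collections import defaultdict
from statistics import stdev

SCALE = 10 ** 5  # ASSUMPTION: scaling factor read from the caption as 10^5


def bin_by_latitude(sites, interval_width=10.0):
    """Summarize one taxon's relative abundance per latitude interval.

    sites: iterable of (latitude_in_degrees, relative_abundance) pairs,
           with southern latitudes negative.
    interval_width: width of each interval in degrees (hypothetical value).
    Returns {interval_lower_bound: {"sum": ..., "sd": ..., "n": ...}},
    where "sd" is None for intervals with fewer than two sites
    (the cases marked with '*' in the figure).
    """
    groups = defaultdict(list)
    for latitude, abundance in sites:
        lower_bound = math.floor(latitude / interval_width) * interval_width
        groups[lower_bound].append(abundance * SCALE)
    summary = {}
    for lower_bound, values in sorted(groups.items()):
        summary[lower_bound] = {
            "sum": sum(values),
            "sd": stdev(values) if len(values) >= 2 else None,
            "n": len(values),
        }
    return summary


# Made-up example: three sites, two of them in the same interval.
example = bin_by_latitude([(12.0, 0.001), (15.5, 0.003), (-3.0, 0.002)])
```

Intervals with a single site report no standard deviation, mirroring the caption's note on the ‘*’ symbol.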
The observations for pattern 2 are consistent with previous observations for photoautotrophic bacterioplankton, where models of the global distribution of Prochlorococcus were also found to peak in oligotrophic gyre regions (Johnson et al., 2006; Follows et al., 2007) (Fig. 4d). Synechococcus did not show lower abundances in the interval between 0° and 4°S (southern Pacific upwelling area), coinciding with previous observations for this organism (Rocap et al., 2002) (Fig. 4d). Finally, this distribution pattern follows the general notion that these two organisms are dominant in tropical and subtropical areas. Other taxa of unknown characteristics that fit this pattern may have similar metabolic capabilities, that is, photoautotrophy, with a dominant presence in tropical and subtropical regions. For example, members of the SAR324 clade from the mesopelagic zone have recently been found to be capable of chemolithoautotrophy (Swan et al., 2011), supporting the possibility of photoautotrophy in the surface-dwelling SAR324 clade members observed in pattern 2 (Fig. 4b).
The peaks in both high primary production areas and oligotrophic areas in pattern 1 may reflect the convergent metabolic strategies exhibited by phylogenetically diverse bacterioplankton within this pattern. Autotrophic and mixotrophic functionalities, as well as the implications of proteorhodopsin presence for different groups of marine bacterioplankton, have been reviewed previously (Fuhrman & Steele, 2008). Based on these reviews, we suggest that in temperate high primary production areas, these bacterioplankton exhibit a truly heterotrophic strategy and live on organic material produced by phytoplankton (Herndl et al., 2008). In oligotrophic areas, they may depend on photoheterotrophy or mixotrophy. Because our definitions of taxa are quite broad, and there could be many subclades within all the clades considered, it will be interesting to investigate whether different subclades of certain taxa are specific to the individual locations considered in this study. This may reveal more habitat-specialized functional groups within broader clades, although it will require a study with a higher sequencing coverage.