In the last decade, systems biology has emerged as a holistic scientific discipline aimed at elucidating emergent properties from large datasets describing highly complex interactions. It aims to understand how complex biological and ecological processes arise from interactions occurring across different scales of biological organisation (e.g. from DNA/RNA/protein through to cells, organisms and up to community structure and function) and environmental features (e.g. from nutrients and soil moisture up to local and global climatic conditions). Systems biology is inherently interdisciplinary (Karsenti 2012) and requires a wide range of approaches. These include graphical network models that integrate high-information-content data streams and identify putatively interacting variables, multifactorial experiments to examine the causality and directionality of interactions, and theoretical and simulation models to explore the dynamical responses of populations and communities to these interactions.
The methods used to identify associations between variables in large data sets may vary with the questions being asked, and several general properties of ecosystems need to be considered. For example, complex systems can display nonlinear behaviours with time lags and thresholds (tipping points). Many components may display only weak to moderate coupling (McCann et al. 1998), and there may be built-in flexibility and redundancy (Wright et al. 2012). Further, many species share similar abiotic environments, which can lead to correlations and apparent synchrony among non-interacting species (Sugihara et al. 2012).
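The last point, that shared environments can create apparent associations, can be made concrete with a small simulation. In the minimal sketch below, all variables are hypothetical. Two species that never interact both track a shared driver, standing in for something like soil moisture. Their raw correlation is strong, but the partial correlation after linearly controlling for the measured driver falls to near zero. Partial correlation is a simple, standard check and is not the approach advocated by Sugihara et al. (2012), who argue that correlation-based methods can mislead in nonlinear systems. It also removes only the linear effects of drivers that were actually measured.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(42)
# Hypothetical shared driver (e.g. a soil-moisture gradient) across 200 samples.
moisture = [rng.gauss(0, 1) for _ in range(200)]
# Two non-interacting species: each responds to moisture plus independent noise.
species_a = [2.0 * m + rng.gauss(0, 0.5) for m in moisture]
species_b = [1.5 * m + rng.gauss(0, 0.5) for m in moisture]

print(round(pearson(species_a, species_b), 2))                       # strongly positive
print(round(partial_correlation(species_a, species_b, moisture), 2))  # close to zero
```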
Network analysis has identified extensive phylogenetic and functional trait associations among soil bacteria, both generally (Barberan et al. 2011) and in response to disturbance (Zhou et al. 2011), as well as specific antagonistic effects (Prasad et al. 2011). As in other systems, work describing microbial association networks suggests that they follow a small-world, scale-free model. Such a network has a few highly connected nodes (‘hubs’) and many nodes with only one or a few connections, so that the distribution of connections per node follows a power law (Janssen et al. 2006; Wright et al. 2012). In such networks, high-abundance organisms are not necessarily highly interconnected. Keystone species are those with high connectivity relative to their abundance, or those that provide critical links between nodes nested in otherwise disparate local networks. Resilience is therefore directly related to the degree of disturbance required to ‘fracture’ the service network into disconnected subnets, and functionality may be maintained if a disturbance removes only low-connectivity nodes. Resilience thus depends on network topology, including network diameter and various measures of connectivity; for instance, high network density can facilitate rapid recolonisation following disturbance (Janssen et al. 2006). Scale-free networks are generally robust to the random removal of nodes (Janssen et al. 2006); however, targeted removal of hubs or keystone species will result in rapid fragmentation. Hence, the presence of keystone species may actually decrease resilience (Levin 2001). Interestingly, food-web networks may be among the few naturally occurring networks that do not follow the small-world, scale-free topology (Dunne et al. 2002). Food webs have been used for several decades in ecology and are based on observations of trophic interactions and energy flow. However, such networks generally have fewer nodes than the types of microbial association network generated using molecular data that we discuss here.
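The contrast between random failure and targeted attack can be illustrated with a small simulation. The sketch below makes two assumptions: the network is synthetic, and it is grown by preferential attachment (the Barabási–Albert model) as a stand-in for a real association network. It removes the same fraction of nodes either at random or in order of decreasing degree, then compares the size of the largest connected component that remains. The network size, the attachment parameter `m` and the removal fraction are arbitrary illustrative choices. The code uses plain Python rather than a network library.

```python
import random
from collections import defaultdict, deque

def barabasi_albert(n, m, seed=0):
    """Grow an undirected graph by preferential attachment (Barabasi-Albert)."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    # Start from a small complete core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears once per incident edge, so a uniform draw from
    # this list selects nodes with probability proportional to degree.
    targets = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend([new, t])
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

def attack(adj, fraction, targeted, seed=1):
    """Remove a fraction of nodes, either hubs first or uniformly at random."""
    k = int(fraction * len(adj))
    if targeted:
        removed = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    else:
        removed = random.Random(seed).sample(list(adj), k)
    return largest_component(adj, removed)

g = barabasi_albert(500, 2)
print("random removal:  ", attack(g, 0.2, targeted=False))
print("targeted removal:", attack(g, 0.2, targeted=True))
```

Removing hubs first typically shatters the network into small fragments, whereas removing the same number of randomly chosen (mostly low-degree) nodes leaves a large connected core.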
As with other complex biological networks, such as those describing cellular or neurological processes, understanding soil association networks requires experimental verification of the strength, direction and reliability of large numbers of potential positive and negative biotic and abiotic interactions (collectively, the soil interactome). This goal is not currently feasible (Koch 2012), but it is one that the field is moving towards. Under laboratory conditions, species in mixed communities have been observed to adapt to perturbation more readily than when individual community components were challenged in isolation (Lawrence et al. 2012). Identifying groups of components that behave as a single functional module would therefore reduce the number of associations needing evaluation, improving the prospects of gaining a mechanistic understanding of system processes (Koch 2012). Modular redundancy may be identified using network diagrams (Wright et al. 2012), and redundant modules may facilitate a ‘species change-over’ in community composition.
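The scale of this reduction follows from simple arithmetic. Among n components there are n(n − 1)/2 possible pairwise associations, each of which could in principle be positive or negative. The short sketch below compares that count for individual components with the count for the same data grouped into modules. The component and module counts are hypothetical and were chosen only for illustration.

```python
def pairwise_associations(n: int) -> int:
    """Number of unordered pairs among n components: n(n - 1) / 2."""
    return n * (n - 1) // 2

# Hypothetical sizes: 1,000 OTUs plus 10 environmental variables,
# versus the same components collapsed into 25 functional modules.
n_components = 1_000 + 10
n_modules = 25

print(pairwise_associations(n_components))  # 509545
print(pairwise_associations(n_modules))     # 300
```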
Network re-analysis of bacterial saltmarsh communities
Bowen et al. (2011) examined the response of microbial communities to experimental nutrient loading in saltmarsh sediments where widespread ecological responses to pollution had previously been documented. These responses included shifts in macrofaunal communities and altered ecosystem functions, including increases in %C, %N and bacterial production, and alterations in sediment redox chemistry (Bowen et al. 2011 and references therein). However, Bowen et al. (2011) found that microbial community composition was remarkably resistant to increased nutrient loading despite these functional changes, suggesting a decoupling of microbial community structure and biogeochemical processes.
The statistical methods used by Bowen et al. (2011) examined changes in the relative abundance of each operational taxonomic unit (OTU) individually, either across paired experimental treatment and control samples or over four time points in temporal experiments. An alternative approach is to examine associations between OTUs and environmental parameters across all samples. Using a subset (approximately 36% of the total community composition) of the same data analysed by Bowen et al. (2011), we identified co-occurrence relationships among bacteria and environmental variables across all marsh sediment samples (Supp. Methods) and generated an association network (Fig. 2a). Here, network nodes are either bacteria (OTUs defined by 16S rRNA gene sequence identity) or environmental factors. Edges are significant co-occurrence relationships, that is, pairs of nodes whose values across the 24 samples were associated more strongly than expected by chance, as determined by the local similarity analysis algorithm (Ruan et al. 2006). Co-occurrence associations could be positive or negative, and linear or nonlinear.
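A minimal sketch of a local similarity score in the spirit of Ruan et al. (2006) is given below. It is not the implementation used for Fig. 2. Each series is standardised. The published method uses a rank-based normal-score transform, but this sketch assumes simple z-scores for brevity. Dynamic programming then finds the contiguous run of samples, optionally shifted by up to `max_delay` positions, with the largest summed product of standardised values. This is done separately for positive and for negative association. Significance is judged by permuting one series. The OTU vectors in the example are hypothetical.

```python
import random
import statistics

def zscore(values):
    """Standardise a series (assumption: z-scores instead of LSA's rank-normal transform)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def local_similarity(x, y, max_delay=0):
    """Signed local similarity score between two equally long series."""
    x, y = zscore(x), zscore(y)
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]  # best positive run ending at (i, j)
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]  # best negative run ending at (i, j)
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue  # outside the allowed delay window
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    score = max(best_pos, best_neg) / n
    return score if best_pos >= best_neg else -score

def lsa_pvalue(x, y, max_delay=0, n_perm=999, seed=0):
    """Permutation p-value: how often a shuffled y scores at least as strongly."""
    rng = random.Random(seed)
    observed = abs(local_similarity(x, y, max_delay))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(local_similarity(x, shuffled, max_delay)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical abundances of two OTUs across 24 samples.
otu_a = [float((7 * i) % 24) for i in range(24)]
otu_b = [v + (1.5 if i % 2 else -1.5) for i, v in enumerate(otu_a)]
print(local_similarity(otu_a, otu_b), lsa_pvalue(otu_a, otu_b))
```

Because the score is local, it can pick up an association that holds over only part of an ordered sample series. A negative score indicates a negative association.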
The topology of the resulting network informs questions regarding ecosystem function. The network conforms to the scale-free model (Barabási & Oltvai 2004; Chaffron et al. 2010) (Fig. 2). Most of the environmental parameters (green triangles) are correlated with each other but not with members of the bacterial community (Fig. 2a). Hence, changes in environmental factors would not be expected to influence microbial community structure: the bacterial community appears resistant to environmental change, as observed by Bowen et al. (2011). The network also supports several general observations that provide avenues for further hypothesis testing. For example, (1) levels of N and P fertilisation were associated only with measured environmental levels of C and N, not with any microbial node. In addition, (2) rates of N₂ fixation were associated with some environmental variables (C:N, N) and were also positively associated with the Chromatiales (known N₂ fixers), implicating them as potentially important nitrogen fixers in this system.
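Whether a degree distribution is consistent with a power law is usually checked more formally than by eye. A minimal sketch of one widely used estimator is given below. It is not necessarily the procedure behind Fig. 2. The estimator is the discrete maximum-likelihood approximation α ≈ 1 + n / Σ ln(kᵢ / (k_min − ½)), taken over the n nodes with degree kᵢ ≥ k_min. The edge list is a hypothetical toy example. A fuller analysis would also choose k_min from the data and compare the fit against alternative distributions such as the log-normal, because an exponent estimate alone does not establish that a network is scale-free.

```python
import math
from collections import Counter

def degree_sequence(edges):
    """Count connections per node from an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return list(deg.values())

def powerlaw_alpha(degrees, k_min=1):
    """Approximate discrete MLE of the power-law exponent for degrees >= k_min (k_min >= 1)."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)

# Hypothetical edge list: one hub plus a short chain.
edges = [("hub", f"otu{i}") for i in range(8)] + [("otu0", "otu8"), ("otu8", "otu9")]
print(powerlaw_alpha(degree_sequence(edges), k_min=1))
```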
Clearly, some of these results could have been obtained from other statistical approaches routinely used in ecology, such as PERMANOVA and indicator species analysis. However, these approaches are used more generally to identify environmental drivers or shifts in community composition, and they do not explicitly identify the biotic linkages between species that we explore here. Network analysis does not result in a large reduction of the dimensionality of the data, as occurs in distance-based statistical approaches, nor does it require a priori knowledge or designation of sample status. Networks allow observation and exploration of the data underlying more traditional methods and are inherently useful for developing hypotheses and research directions.
The overall degree of connectivity (the average number of connections per node) may prove informative about the complexity of biogeochemical transformations or general ecosystem stability. A node may have a large number of neighbours but low betweenness if it associates only with its immediate neighbours or module and not with other modules (see below). Betweenness centrality is therefore an important indicator of which organisms act as intermediaries between groups of other organisms, because nodes with high betweenness have a large influence on ‘information’ flow through the network. Such nodes may have fewer connections but mediate more associations. Analysis of the network highlighted one node in particular as displaying high betweenness centrality (Fig. 2b). This node represents candidate bacterial division WS3 (pink diamond). Although it does not have the most neighbours, its position within the network suggests that it provides an important link between the connections of many other organisms (Fig. 2c). Interestingly, this node is not one of the most abundant in the network. It therefore appears possible that WS3, about which little is known, acts as a keystone species in this environment. This highlights the ability of the network approach to reveal the potential importance of numerically non-abundant or cryptic groups.
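Betweenness centrality counts, for each node, the number of shortest paths between other pairs of nodes that pass through it, with ties shared fractionally. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. It applies the algorithm to a hypothetical toy network in which two tightly knit groups are joined only through a single low-degree node labelled "bridge", loosely mimicking the position of WS3 in Fig. 2c. It then divides betweenness by a hypothetical relative abundance to flag nodes that are highly connected relative to their abundance. That ratio is an illustrative assumption and not the criterion used for Fig. 2b.

```python
from collections import deque

def betweenness(adj):
    """Brandes betweenness centrality for an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)  # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies from the farthest nodes inwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice

# Hypothetical toy network: two triangles linked only through "bridge".
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "bridge"), ("bridge", "b1")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
# Hypothetical relative abundances; "bridge" is the rarest node.
abundance = {"a1": 0.20, "a2": 0.15, "a3": 0.15, "b1": 0.20,
             "b2": 0.12, "b3": 0.12, "bridge": 0.06}
keystone_score = {v: bc[v] / abundance[v] for v in adj}
print(max(bc, key=bc.get))                          # bridge
print(max(keystone_score, key=keystone_score.get))  # bridge
```

In this toy network, "bridge" has only two neighbours, fewer than "a1" or "b1", yet it lies on all nine shortest paths between the two groups.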
Modules are highly interconnected network regions whose nodes have fewer connections outside the module than inside it. Although the entire network in Fig. 2a highlights co-occurrence and not necessarily interaction, modular associations are more likely to reflect actual interactions (Zhou et al. 2011). Modules could originate from many sources (resource partitioning, ecological niche overlap, convergent evolution, phylogenetic relatedness) and could be important for system stability (Olsen et al. 2007). We highlight two modules identified from the saltmarsh data set.
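The verbal definition of a module can be made quantitative with Newman's modularity, Q = Σ_c [L_c / m − (d_c / 2m)²]. Here m is the total number of edges, L_c is the number of edges inside module c, and d_c is the summed degree of its nodes. Q is high when a partition places more edges inside modules than expected at random for the same degrees. The minimal sketch below only scores a given partition of a hypothetical toy network. Module detection itself involves searching for partitions that maximise Q, and this sketch makes no assumption about which search algorithm was used for Fig. 2.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """Newman's modularity Q for an undirected, unweighted graph and a node partition."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside module c
    degree_sum = defaultdict(int)  # summed degree of the nodes in module c
    for u, v in edges:
        degree_sum[module_of[u]] += 1
        degree_sum[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            internal[module_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy network: two triangles joined by a single edge.
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "b1")]
split = {node: node[0] for node in ["a1", "a2", "a3", "b1", "b2", "b3"]}
lumped = dict.fromkeys(split, "all")
print(modularity(edges, split))   # 5/14, approximately 0.357
print(modularity(edges, lumped))  # 0.0
```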
The first module (Fig. 2d) consists of two sub-modules displaying strong negative co-occurrence between them. These sub-modules represent organisms that appear in distinct subsets of marsh samples, highlighting patchiness in community composition and potentially in functionality. Such patchiness may reflect functional differences between marshes, or it may result from modular redundancy, wherein different groups of co-occurring organisms maintain a common functionality. The positive relationships among 11 Alphaproteobacteria taxa (red circles) may signify that functionally similar, closely related taxa do not compete with each other in this environment and may act synergistically. A similar pattern has been found in the human gut microbiome, where closely related taxa are positively correlated, whilst functionally similar but distantly related taxa appear to compete and are negatively correlated (Faust et al. 2012). The second module (Fig. 2e) is in fact an extension of members of the first module. It links two groups of organisms involved in the nitrogen and sulphur cycles: photosynthetic Rhodospirillaceae and nitrifying Nitrosococcus are linked with the denitrifying Sinobacteria and several nodes representing the sulphate-reducing Desulfobulbaceae.
In network theory, overlap in modular structure, such as that arising from species that belong to multiple modules, is a critical feature allowing effective ‘information transfer’ between processes occurring independently. Whether modularity increases resilience in environmental networks has long been debated, mostly with respect to food-web networks, and remains ‘theoretically unclear and empirically controversial’ (Ruiz-Moreno et al. 2006). For example, it has been argued that modularity may act to retain the impact of disturbances within ‘compartments’ (Kokkoris et al. 2002). It has also been argued that modularity is favoured as an adaptation to short-term rather than long-term disturbances (Ruiz-Moreno et al. 2006). In microbial systems, simplifying the vast complexity of interactions by identifying modular components that correspond to functionally coherent units is an important step towards predictive models and can uncover useful biomarkers for effective monitoring. However, even for defined functions such as nitrification, linking composition to function is exceptionally difficult. Many organisms have multiple modes of metabolism, so their presence may not indicate the same functional trait across multiple samples, and taxonomic co-occurrence patterns could be misleading. For example, ammonia oxidisers (AO) can consume ammonia by reducing oxygen (classic nitrification) or by reducing nitrite (nitrifier denitrification). Further, some AO can opportunistically oxidise methane or live heterotrophically, oxidising suitable organic carbon substrates (heterotrophic nitrification) (Arp & Stein 2003). It has therefore been suggested that ammonia oxidation could be thought of as a metabolic ‘niche’, rather than regarding AO communities as filling a soil ‘niche’ dictated by available nutrients and energy. In this metabolic niche, the critical concept is how an organism utilises oxidised or reduced N.
Rather than treating taxonomic networks as the principal factor determining community response, one can envision a nitrogen gene network, linked to the metabolism of reduced nitrogen species, that determines community response to disturbance. A similar view has been discussed for functional gene assemblies more generally (Burke et al. 2011).
We have discussed interactions within microbial communities and how these link to shifts in abiotic factors. However, these communities also exist within higher levels of ecological organisation, so interactions with other organisms such as plants can have profound effects on microbial community response to disturbance. Below, we focus on plant–soil symbiont interactions and examine the impacts of disturbance on these associations.