Our understanding of the controls on mycorrhizal fungal species distribution and community organization is still in its infancy – especially when compared with that of the more mature fields of plant and animal community ecology and biogeography – largely because of the historical difficulty of gathering species distribution information. This challenge, which arises from the paucity of morphological characters for distinguishing mycorrhizal fungi, is magnified by high diversity, particularly in ectomycorrhizal fungal communities. Although some regional models of ectomycorrhizal sporocarp–environment relationships have been developed (e.g. Tyler, 1985; Hansen, 1988, 1989; Rydin et al., 1997), sporocarps represent a biased subsample of the below-ground community (Gardes & Bruns, 1996). The advent of molecular tools has allowed us to move forward with many detailed below-ground mycorrhizal community analyses (Horton & Bruns, 2001). Many of these analyses have been linked with experiments, gradients and chronosequences, increasing our understanding of environmental controls on species distribution and abundance. Although an essential first step, these studies are mostly carried out at a local scale, yielding a highly fragmented picture of species distributions and their relationships with the environment that cannot be extrapolated to other sites.
We believe that, in addition to the aforementioned local approach, there is much to be gained by wedding regional- to continental-scale mycorrhizal fungal community characterizations, environmental measurements and the best new modelling approaches to develop a more general understanding of the relationship between mycorrhizal fungal communities and their environment. This approach would allow us to begin to develop species– or community–environment predictive models sufficiently accurate that for any site we could predict the potential pool of dominant fungal taxa (recognizing that stochastic processes will probably determine the actual pool). While ambitious, this approach is essential in order to predict species–environment relationships beyond a narrow set of sites.
A major motivation for this effort is the high rate of human-accelerated environmental change, including elevated atmospheric ozone (O3), CO2, nitrogen (N) deposition, climate change and land use/land cover change (Vitousek et al., 1997; Cubasch et al., 2001; Tilman & Lehman, 2001). It is clear that these changes can affect mycorrhizal fungal species, but it is also clear that we do not yet have data sets sufficiently saturated, or models sufficiently powerful, to determine the exact nature, timing and spatial pattern of fungal community responses. Given that mycorrhizal fungi are phylogenetically and functionally diverse, consume a significant portion of global terrestrial production, play a critical role in nutrient cycling and food webs, and exhibit high sensitivity to environmental change, the ability to predict such community responses is critical for conserving fungal diversity and maintaining ecosystem processes.
Given these concerns, an efficient way to focus our efforts on obtaining community information would be to optimize sampling and experimental designs to address the effects of human-accelerated environmental change on mycorrhizal fungal communities at a global scale. This would achieve several related objectives. First, it would allow us to develop saturated databases of fungal community composition, structure and spatio-temporal dynamics in relation to variable resources and conditions. Second, it would provide a baseline against which to measure the effects of future environmental change. Third, it would permit us to determine where and how fungal communities are presently responding to environmental change. Last, it would identify sites with large components of unidentified fungi that could be foci for much-needed investigation by fungal taxonomists (Korf, 2005).
In order to accomplish this, we must understand how communities of fungi change in response to all the key anthropogenic and natural environmental drivers. This requires the development of quantitative models of species–environment relationships built on several key elements: appropriate study designs, community data, environmental data, and models. In the following sections we describe some initial considerations in bringing these elements together.
Appropriate study designs
Experiments vs gradients
Species–environment response functions cannot be derived from experimental studies involving only two levels of a perturbation, unless those functions are known a priori to be linear. Multilevel experiments or gradient studies are necessary for determining the shape of a response curve. However, once we move beyond the local scale, multilevel experiments become difficult to fund and manage, making sampling of replicate gradients or related stratified sampling techniques the only viable alternative for generating large-scale species–environment relationships. Combining multilevel experiments at a strategic subset of sites with large-scale gradient-based sampling could provide the greatest information and insights.
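The point about two-level designs can be made concrete with a small numerical sketch. The two response functions below are invented for illustration only: a linear response and a hump-shaped (unimodal) response that happen to give identical values at the two treatment levels. A two-level design cannot tell them apart, whereas adding intermediate levels exposes the curvature.

```python
# Two hypothetical response functions of a taxon's relative abundance to an
# environmental variable x (e.g. a perturbation level). Values are invented.
def linear(x):
    return 0.40 - 0.02 * x


def unimodal(x):
    # Hump-shaped response peaking at x = 4; equals linear() at x = 0 and x = 10.
    return 0.40 + 0.08 * x - 0.01 * x ** 2


def indistinguishable(levels, f, g, tol=1e-9):
    """True if f and g predict the same value at every sampled level."""
    return all(abs(f(x) - g(x)) <= tol for x in levels)


two_level = [0.0, 10.0]                   # e.g. control vs. a single treatment
multilevel = [0.0, 2.5, 5.0, 7.5, 10.0]   # multilevel experiment or gradient

print(indistinguishable(two_level, linear, unimodal))   # True
print(indistinguishable(multilevel, linear, unimodal))  # False
```

Three levels are the minimum needed to detect any curvature; more levels are needed to distinguish among nonlinear shapes.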
Using gradients to tie data collection to environmental change
Most environmental changes are spatially variable (e.g. Galloway & Cowling, 2002; Chandra et al., 2003), and multiple change agents can be correlated. By identifying where gradients of environmental change are steepest we can define areas of greatest interest for investigation. To maximize our potential to determine the community response to diverse environmental changes, we should sample across multiple types of gradients (climate, pollution, land use, disturbance, etc.), and incorporate sites that break down correlations between multiple change agents (e.g. between O3 and N deposition). Some environmental changes – notably, elevated CO2 – are less amenable to gradient analysis, because they are relatively uniform at the global scale. Although there is some possibility of using localized natural or anthropogenic gradients of CO2 (e.g. Rillig et al., 2000), experimental approaches will probably play a larger role in developing response functions to CO2.
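One way to operationalize "incorporating sites that break down correlations" is sketched below, assuming a pool of candidate sites with known O3 and N deposition values. The greedy selection rule and all site values are hypothetical illustrations, not a published site-selection protocol: starting from gradient sites where the two agents covary perfectly, it repeatedly adds the candidate that most reduces the absolute Pearson correlation between them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def abs_r(sites):
    """|r| between O3 and N deposition across a list of (O3, N) sites."""
    o3, ndep = zip(*sites)
    return abs(pearson(o3, ndep))


def add_decorrelating_sites(base, candidates, k):
    """Greedily add k candidate sites, each time choosing the one that
    most reduces |r| between the two change agents."""
    selected, pool, added = list(base), list(candidates), []
    for _ in range(k):
        best = min(pool, key=lambda s: abs_r(selected + [s]))
        selected.append(best)
        pool.remove(best)
        added.append(best)
    return selected, added


# Hypothetical sites as (O3 in ppb, N deposition in kg N ha-1 yr-1); invented numbers.
gradient_sites = [(30, 5), (40, 10), (50, 15), (60, 20)]   # agents covary
candidates = [(60, 5), (30, 20), (45, 12), (35, 8), (55, 18)]

print(round(abs_r(gradient_sites), 2))   # 1.0
selected, added = add_decorrelating_sites(gradient_sites, candidates, k=2)
print(sorted(added))                     # [(30, 20), (60, 5)]
print(round(abs_r(selected), 2))         # 0.05
```

In practice more than two change agents would be involved, and a multivariate criterion (for example, on the full correlation matrix) would replace the pairwise |r| used here.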
Appropriate species distribution data
The development of DNA-based molecular tools has led to an explosion of investigations into mycorrhizal fungal community ecology (Horton & Bruns, 2001). This development holds great promise, but in order to maximize our ability to use these data to build general predictive models, several requirements must be met.
Species distribution/abundance data across sites must be comparable. This requires that mycorrhizal fungal identity be established using a common metric, the most useful being internal transcribed spacer (ITS) ribosomal DNA sequences (Horton & Bruns, 2001; Kõljalg et al., 2005). Sequence data are preferable to other approaches because they reduce the ambiguity of species identifications, allowing comparison among sites and studies. The ITS provides sufficient variation to discriminate at approximately the species level and is readily amplified from small amounts of material using primers of varying specificity. Other ribosomal DNA regions, such as portions of the large subunit (LSU) and small subunit (SSU), are useful for the phylogenetic placement of unknowns when ITS sequences are uninformative because of insufficiently saturated databases (Horton & Bruns, 2001), but these regions lack the taxonomic resolution needed for species-level modelling.
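Comparing sequence data across sites requires a consistent rule for grouping ITS sequences into approximately species-level units. The sketch below shows greedy centroid clustering at a fixed similarity threshold. The 97% threshold is a commonly used convention assumed here, not a value from the text, and identity is approximated from Levenshtein edit distance rather than from a proper pairwise alignment; the sequences are toy strings, not real ITS data.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a, b):
    """Approximate fractional identity derived from edit distance."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid within the threshold,
    otherwise make it the centroid of a new operational taxonomic unit."""
    centroids, labels = [], []
    for s in seqs:
        for idx, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                labels.append(idx)
                break
        else:  # no centroid similar enough: s founds a new OTU
            centroids.append(s)
            labels.append(len(centroids) - 1)
    return centroids, labels


base = "ACGT" * 10                          # toy 40-bp "sequence", not real ITS
variant1 = base[:10] + "A" + base[11:]      # 1 substitution -> 97.5% identity
variant2 = "G" * 10 + base[10:]             # many differences -> new OTU

centroids, labels = greedy_otus([base, variant1, variant2])
print(labels)  # [0, 0, 1]
```

Because results depend on input order and on the threshold, a shared clustering rule (and shared reference sequences) would matter as much as the sequencing itself for cross-site comparability.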
Processing samples from a large-scale sampling program would require high-throughput sequencing approaches, such as those used by the Fungal Metagenomics Project (Senkowsky, 2006). The rice genome required over 7 million sequences (Goff et al., 2002) and the human genome over 27 million (Venter et al., 2001), and sequencing costs continue to decline. Generating several million sequences to characterize the global diversity, distribution and response to environmental change of one of the most important classes of mutualists therefore seems both achievable and reasonable.
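A back-of-envelope calculation shows how "several million sequences" could arise. The numbers of sites, samples per site and sequences per sample below are hypothetical design parameters chosen only for illustration; the genome totals are the lower bounds cited above.

```python
# Hypothetical design parameters (not from the text); adjust to a real plan.
N_SITES = 1_000            # sites spread across environmental-change gradients
SAMPLES_PER_SITE = 25      # soil cores or root samples per site
SEQS_PER_SAMPLE = 96       # e.g. one 96-well plate of clones per sample

# Genome-project totals cited in the text (lower bounds: "over" these numbers).
RICE_GENOME_SEQS = 7_000_000
HUMAN_GENOME_SEQS = 27_000_000

total = N_SITES * SAMPLES_PER_SITE * SEQS_PER_SAMPLE
print(f"{total:,} sequences")                                    # 2,400,000 sequences
print(f"{total / RICE_GENOME_SEQS:.2f} x rice genome effort")    # 0.34 x rice genome effort
print(f"{total / HUMAN_GENOME_SEQS:.2f} x human genome effort")  # 0.09 x human genome effort
```

Under these assumptions the whole program would need roughly a third of the sequencing effort reported for the rice genome.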
Consistent high-throughput methods must be used
Consistent sampling methods would improve the quality of a global data set. The choice of method will depend on whether the study focuses on ectomycorrhizal fungi alone or on all mycorrhizal fungi. Unlike other mycorrhizal fungi, ectomycorrhizal fungi are typically monodominant on root tips (i.e. a single fungus usually dominates each tip), permitting sorting of tips into morphotypes followed by DNA analysis. This permits the characterization of frequency, biomass and number of root tips of different taxa (Horton & Bruns, 2001). Caution must be used in interpreting these data, because each root tip does not represent a separate individual (Taylor, 2002). In addition, this approach is susceptible to lumping species of similar morphology during sorting, and can be labor-intensive.
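As an illustration of the frequency and root-tip metrics mentioned above, the sketch below summarizes hypothetical morphotype tallies from four soil cores. Frequency is taken here as the proportion of cores in which a morphotype occurs, and relative abundance as its share of all sorted tips; both definitions and all tallies are assumptions for illustration. Consistent with the caution above, tip counts are treated as a measure of colonization, not of numbers of individuals.

```python
from collections import Counter

# Hypothetical tallies: soil core -> {morphotype: number of ectomycorrhizal root tips}
cores = {
    "core1": {"MT_A": 40, "MT_B": 10},
    "core2": {"MT_A": 25, "MT_C": 25},
    "core3": {"MT_B": 30, "MT_C": 10, "MT_A": 10},
    "core4": {"MT_A": 50},
}


def summarize(cores):
    """Per-morphotype frequency (share of cores occupied) and relative
    abundance (share of all root tips). Tips are not individuals."""
    tips, occurrences = Counter(), Counter()
    for tally in cores.values():
        for taxon, n in tally.items():
            if n > 0:
                tips[taxon] += n
                occurrences[taxon] += 1
    total_tips = sum(tips.values())
    return {
        taxon: {
            "frequency": occurrences[taxon] / len(cores),
            "relative_abundance": tips[taxon] / total_tips,
        }
        for taxon in sorted(tips)
    }


summary = summarize(cores)
for taxon, metrics in summary.items():
    print(taxon, metrics)
```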
By contrast, all mycorrhizal fungi can be sampled via random sampling of individual mycorrhizal root tips followed by polymerase chain reaction (PCR)-based identification (e.g. Peter et al., 2001; Parrent et al., 2006), although a cloning step is required for most nonectomycorrhizal types. This approach is compatible with presence–absence or frequency-based metrics of abundance.
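With random sampling of individual root tips, a taxon's frequency is a binomial proportion, so its sampling uncertainty can be reported alongside the estimate. The sketch below uses the Wilson score interval, one standard choice for binomial proportions that we swap in here for illustration; the tip counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical: 18 of 120 randomly sampled root tips yielded taxon X.
tips_sampled, tips_with_taxon = 120, 18
low, high = wilson_interval(tips_with_taxon, tips_sampled)
print(f"frequency = {tips_with_taxon / tips_sampled:.3f}, "
      f"95% CI {low:.3f} to {high:.3f}")
```

The width of this interval gives a rough guide to how many tips per site are needed before frequency differences among sites can be resolved.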
Similarly, bulk DNA extraction from pooled mycorrhizal root tips is viable for all classes of mycorrhizae. Although soil or hyphae can also be extracted, these will contain a higher proportion of nonmycorrhizal fungi than roots, and so are more appropriate for analysis of the total soil fungal community. These mixtures can then be subjected to PCR, separated and sequenced. PCR is a very powerful approach, but results for mixtures are subject to amplification bias, limited sensitivity for rare or divergent sequences, and chimera formation, all of which need to be taken into account during sampling and analysis.
Current PCR-based approaches used for analyzing DNA mixtures are semiquantitative. A common approach is the cloning and sequencing of PCR products. Methods for high-throughput cloning and sequencing are rapidly evolving (e.g. Hutchison et al., 2005; Metzker, 2005) and could be adapted for large-scale community analysis. The major drawback of this approach is the redundant sequencing of dominant taxa required to obtain sequences of rarer taxa, but costs of sequencing are dropping quickly enough that this is less of an issue.
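The cost of redundant sequencing can be quantified under a simple sampling model. Assuming that clones are drawn independently in proportion to template abundance (i.e. ignoring the PCR biases noted above), a taxon with relative abundance p is detected at least once among n clones with probability 1 - (1 - p)^n. The sketch solves for the number of clones needed to reach a target detection probability; the 95% target is an arbitrary illustrative choice.

```python
import math


def detection_probability(p, n):
    """P(a taxon with relative abundance p appears at least once among n clones),
    assuming independent draws proportional to abundance."""
    return 1.0 - (1.0 - p) ** n


def clones_needed(p, target=0.95):
    """Smallest n with detection_probability(p, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


for p in (0.5, 0.1, 0.01):
    print(p, clones_needed(p))
# 0.5 5
# 0.1 29
# 0.01 299
```

Detecting a taxon that makes up 1% of a mixture thus requires roughly 60 times more clones than detecting a co-dominant one, which is why falling sequencing costs matter so much for this approach.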
Another commonly used class of approaches is slab gel electrophoresis-based separation, such as temperature or denaturing gradient gel electrophoresis (TGGE and DGGE, respectively), followed by sequencing of unique fragments (Anderson & Cairney, 2004). These approaches are generally labor-intensive and therefore relatively low throughput. Potential, but as yet untapped, high-throughput analogs of these methods use either capillary electrophoresis (CE) or denaturing high-performance liquid chromatography (DHPLC) combined with automatic fraction collectors (e.g. Berka et al., 2003; Domann et al., 2003). DHPLC is presently commercially available (e.g. Domann et al., 2003), but it is not widely used. Although CE has the potential for parallel processing via capillary arrays, which could greatly accelerate throughput (Berka et al., 2003), such a system is not yet commercially available.
As an alternative to sequencing approaches, community microarrays are under development (Anderson & Cairney, 2004; DeSantis et al., 2005; Sessitsch et al., 2006) that hybridize target DNA with a high-density array of thousands of probes, providing a rapid evaluation of whole-community composition and a semiquantitative estimate of abundance from hybridization intensity. If technical challenges are overcome and the cost per microarray chip becomes reasonable, this would permit very rapid characterization of large numbers of samples, making it possible to take more replicate samples per site and achieve a high sampling density that would improve modelling efforts. The main disadvantage is that species not included in the array will be missed, making the approach less valuable in systems where many community members are unknown. Thus, the microarray approach would be most useful after intensive high-throughput sequencing-based approaches have generated sufficiently saturated sequence databases.
Appropriate environmental data
Scale affects choice of predictor variables
The scale of investigation will affect the environmental variable selection. In local models, variables such as disturbance or land use history, host community, soil pH and nutrients, host nutrition, parent material, slope and aspect are likely to be important. As the scale of investigation expands, additional variables, such as temperature, precipitation and biogeographic constraints (e.g. endemism), will probably emerge as significant variables. Some of these data will be readily available in geographic information system (GIS)-based data sets, but other data must be collected on site.
Distal vs proximal variables
Variable choice affects both model quality and data collection costs. An important choice to make is between distal and proximal variables. Distal variables are farther removed from, and hence do not act directly on, the dependent variable. By contrast, proximal variables are closer to, and hence may directly act on, the dependent variable. In Fig. 1 we present an example of selected distal and proximal variables that could be used in characterizing the community response to components of changing atmospheric chemistry.
There may be advantages and disadvantages of using distal vs proximal variables in modelling species distribution and abundance. The main advantage of distal variables is that they are usually easier to measure or estimate, and are often available as GIS layers. For example, latitude and longitude, topography, geology, climate, N deposition, atmospheric O3, and foliar N might be much easier to measure or model (e.g. Smith et al., 2002) than soil moisture, soil N, soil texture, or below-ground carbon (C) allocation by the host tree. When distal variables are easier to measure and highly correlated with proximal variables it will be advantageous to use the distal variable. However, in some cases distal variables will be poorly correlated with the proximal variable. Two examples illustrate the complexities involved in variable choice: below-ground C allocation; and soil N.
Most models of plant C allocation suggest that below-ground C allocation is a function of plant C gain and nutrient status (Le Roux et al., 2001). The response of the fungal community to environmental changes, such as N deposition, CO2 and O3, could depend very much on complex interactions among host nutrition, C gain and below-ground C allocation, although the exact nature of these interactions and their effect on mycorrhizal fungal communities is at present poorly understood. Although it would be ideal to measure below-ground C allocation directly, these measurements are notoriously difficult to make (Giardina et al., 2005), so below-ground C allocation would have to be either ignored or modelled using easier-to-measure, but more distal, variables, such as foliar nutrients, tree growth and atmospheric chemistry. In this case, it becomes important to incorporate C allocation models that capture below-ground allocation dynamics and can be appropriately parameterized across a broad range of species.
In contrast, the proximal variable soil N (e.g. extractable mineral pools, organic horizon C : N) is relatively easy to measure and appears to be a good predictor of ectomycorrhizal species or genus abundance (Lilleskov et al., 2001, 2002). Given that soil N is a complex product of multiple distal variables (e.g. N deposition, site history, soil type, biota, climate), during model parameterization it would be preferable to measure soil N directly, rather than attempting to model it using distal variables. However, efforts to extend these predictions beyond sampled sites would still require input from biogeochemical models that use distal predictors to estimate soil N at unsampled locations (e.g. Rowe et al., 2005).
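The rule of thumb in the preceding paragraphs (use the distal variable when it is easier to obtain and highly correlated with the proximal one, otherwise measure the proximal variable) can be written as a simple screening step. The 0.8 correlation cutoff and all site values below are hypothetical; in the second invented region, soil N is shaped by factors other than deposition, as can happen when site history or soil type dominates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def choose_predictor(distal, proximal, threshold=0.8):
    """Return 'distal' if the distal variable tracks the proximal one closely
    enough (|r| >= threshold, an arbitrary cutoff) to serve as a surrogate."""
    return "distal" if abs(pearson(distal, proximal)) >= threshold else "proximal"


# Hypothetical values at six sites in each of two regions (invented).
n_deposition = [3, 6, 9, 12, 15, 18]            # distal, e.g. from a GIS layer
soil_n_region_a = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]  # proximal, tracks deposition
soil_n_region_b = [4.0, 1.5, 5.5, 2.0, 3.5, 2.5]  # proximal, shaped by other factors

print(choose_predictor(n_deposition, soil_n_region_a))  # distal
print(choose_predictor(n_deposition, soil_n_region_b))  # proximal
```

A correlation estimated within one set of sites need not hold at other sites or scales, so such a screen would have to be repeated as the study area expands.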
An additional problem with certain distal (indirect) variables (e.g. elevation) is that predictions using them become worse as the scale of studies expands (Guisan & Zimmermann, 2000), limiting their utility in more general, large-scale models.
Class and continuous variables
Once we have determined the most relevant predictors for characterizing species–environment relationships, we need to determine the most appropriate way to measure them. Some predictors, such as host identity, are clearly class variables. Others, such as soil pH or N, are clearly continuous variables. However, many factors can be conceptualized as either class or continuous variables (e.g. disturbance, host community or substrate). When possible, it is more useful for defining response functions to conceptualize and measure variables in a continuous manner. For example, rather than specifying stands as disturbed or undisturbed, more useful metrics would be related continuous variables such as time since disturbance, forest floor biomass or host species biomass. Similarly, host biomass data are preferred to host presence/absence data. If necessary, continuous data can always be converted to class data, but not vice versa.
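The asymmetry between continuous and class data can be illustrated with the disturbance example. The stand ages and the 30-year cutoff below are hypothetical: continuous time since disturbance can be collapsed into classes at any cutoff, at any time, but once only the classes are recorded the original values cannot be recovered.

```python
# Hypothetical stand ages since last disturbance (years); invented values.
years_since_disturbance = [2, 8, 15, 40, 85, 120]


def to_class(years, cutoff=30):
    """Collapse a continuous variable into two classes at an arbitrary cutoff."""
    return "disturbed" if years < cutoff else "undisturbed"


classes = [to_class(y) for y in years_since_disturbance]
print(classes)
# ['disturbed', 'disturbed', 'disturbed', 'undisturbed', 'undisturbed', 'undisturbed']

# A different cutoff can be applied later only because the continuous data were kept.
print([to_class(y, cutoff=10) for y in years_since_disturbance])

# Six distinct values collapse to two classes; the mapping cannot be inverted.
print(len(set(years_since_disturbance)), len(set(classes)))  # 6 2
```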
Appropriate models
Most current models focus on predicting individual species distributions rather than whole communities (Guisan & Zimmermann, 2000; Austin, 2002). This derives not only from the relative simplicity of modelling individual species, but also from our understanding, derived largely from plant ecology, of the individualistic nature of species assemblages (Gleason, 1926), as evidenced by their independent assortment along environmental gradients (Whittaker, 1967) and lability of community composition over time and space (Davis, 1981).
Community models either conflict with or extend this theoretical formulation, depending on the approach used. In community modelling, (1) whole communities can be characterized and then modelled as a function of environmental variables, (2) multiple species can be modelled simultaneously as a function of environmental variables, or (3) the outputs of multiple individual species–environment models can be assembled into a community prediction (Ferrier & Guisan, 2006). The first (and least ‘Gleasonian’) class does not allow individualistic species responses, does not provide individual species maps, and cannot extrapolate beyond known communities. The third class should do best at modelling individualistic species responses and defining individual species distributions. The first and second classes have a variety of strengths; for example, they can rapidly analyze large numbers of species and perform well when species are encountered infrequently (Ferrier & Guisan, 2006), as in ectomycorrhizal fungal community sampling. The most appropriate methods for modelling mycorrhizal fungal communities will have to be determined by comparative analysis of different approaches, but will probably derive from the second or third class of models.
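The third class of community model can be sketched in its simplest form: take occurrence probabilities predicted at one site by separate single-species models and combine them. The probabilities below are hypothetical model outputs, and the 0.5 cutoff for the predicted species list is an arbitrary illustrative choice. Summing the probabilities gives expected richness by linearity of expectation, without requiring species to occur independently.

```python
def assemble(site_predictions, threshold=0.5):
    """Combine per-species occurrence probabilities from separate
    single-species models into a community-level prediction for one site."""
    # Expected richness = sum of occurrence probabilities (linearity of expectation).
    expected_richness = sum(site_predictions.values())
    # A thresholded species list; the cutoff is an arbitrary choice.
    members = sorted(t for t, p in site_predictions.items() if p >= threshold)
    return expected_richness, members


# Hypothetical outputs of four single-species models at one site.
site = {"taxon_1": 0.9, "taxon_2": 0.6, "taxon_3": 0.3, "taxon_4": 0.05}
richness, members = assemble(site)
print(round(richness, 2), members)  # 1.85 ['taxon_1', 'taxon_2']
```

Note that the thresholded list (two taxa) and the expected richness (1.85) need not agree, which is one reason the choice of assembly rule would itself need comparative testing.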
Static vs dynamic models
Most models used to predict distribution and abundance of species are static (Guisan & Zimmermann, 2000). Static models predict current distribution in relation to environmental variables, assuming equilibrium conditions. The major advantages of static models are that they are relatively easy to build, parameterize and test, and are therefore favored for large-scale species distribution modelling efforts. One of the simplest classes of static models is ordinary linear regression. Other static models allow more flexibility in modelling species–environment relationships (e.g. generalized linear models, generalized additive models, ordination methods, regression and classification tree analysis) (Guisan & Zimmermann, 2000).
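As an example of a flexible static model, the sketch below fits a logistic generalized linear model with a quadratic term, a common way of allowing a unimodal response of presence/absence to a gradient. For self-containment it is fitted by plain gradient ascent on the log-likelihood rather than by the iteratively reweighted least squares used in most statistics packages; the learning rate, iteration count and presence/absence data are all hypothetical.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_quadratic_logistic(xs, ys, lr=0.5, iters=5000):
    """Fit logit(p) = b0 + b1*z + b2*z**2 (z = standardized x) by gradient
    ascent on the mean Bernoulli log-likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]
    b = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for z, y in zip(zs, ys):
            feats = (1.0, z, z * z)
            p = sigmoid(b[0] + b[1] * z + b[2] * z * z)
            for k in range(3):
                grad[k] += (y - p) * feats[k] / n
        b = [bk + lr * gk for bk, gk in zip(b, grad)]
    return b, mean, sd


def predict(model, x):
    b, mean, sd = model
    z = (x - mean) / sd
    return sigmoid(b[0] + b[1] * z + b[2] * z * z)


# Hypothetical presence (1) / absence (0) of one taxon, two samples at each x.
xs = [x for x in range(11) for _ in range(2)]
ys = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

model = fit_quadratic_logistic(xs, ys)
b, mean, sd = model
optimum = mean - sd * b[1] / (2 * b[2])  # vertex of the fitted parabola, in x units
print(f"b2 = {b[2]:.2f} (negative => unimodal), optimum near x = {optimum:.1f}")
print(f"P(present) at x=5: {predict(model, 5):.2f}; at x=0: {predict(model, 0):.2f}")
```

A negative quadratic coefficient indicates a hump-shaped response, and the vertex gives an estimate of the environmental optimum; both are interpretable only under the equilibrium assumption.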
However, the equilibrium assumption may not be valid when modelling fungal communities in a changing environment. Nonequilibrium conditions arise in response to naturally dynamic conditions (e.g. disturbance, climate change), but human-accelerated environmental changes may increase disequilibria. For example, it appears that there may be significant lags in the ectomycorrhizal fungal below-ground community response to elevated N deposition (Lilleskov, 2005). Lags of this sort could lead to poor static model parameterization.
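How a lag can mislead a static model is shown by a minimal sketch that assumes a first-order relaxation: after a step change in N deposition, a community metric closes a fixed fraction of the gap to its new equilibrium each year. The model form, the time scale (tau = 10 years) and the metric values are hypothetical illustrations, not estimates from Lilleskov (2005).

```python
def simulate_lagged_response(c0, c_eq, tau, years, dt=1.0):
    """Discrete-time first-order relaxation: each time step the community
    metric closes a fraction dt/tau of the gap to its new equilibrium."""
    c = c0
    trajectory = [c]
    for _ in range(int(round(years / dt))):
        c += (c_eq - c) * dt / tau
        trajectory.append(c)
    return trajectory


# Hypothetical: relative abundance of N-sensitive taxa is 0.6 before a step
# increase in N deposition and 0.2 at the new equilibrium; tau = 10 yr.
traj = simulate_lagged_response(c0=0.6, c_eq=0.2, tau=10.0, years=50)

for year in (0, 5, 10, 25, 50):
    print(year, round(traj[year], 3))
```

A static model calibrated on a snapshot taken a few years after the change would treat the still-transient value as the equilibrium response to the new deposition level, underestimating the eventual community shift.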
By contrast, dynamic models can address nonequilibrium processes, such as succession, changing soil chemistry, changing below-ground C allocation and climate change. Forest ecologists have long used dynamic models to predict spatio-temporal dynamics in species distribution and community structure and composition (e.g. Urban et al., 1991; Carey, 1996; Gao et al., 1996; He et al., 2002; Gratzer et al., 2004), and simple dynamic models have been explored for fungal communities (e.g. Halley et al., 1994). However, to structure and parameterize these models correctly requires much more information than for static models, and so they are rarely parameterized for species distribution modelling at large scales. Characterizing the spatio-temporal dynamics of mycorrhizal fungal species assemblages in relation to multiple variables across a broad range of environments would be extremely challenging, requiring data that are not easily obtainable.
To deal with these difficulties, a viable two-pronged approach would be first to build and test static species distribution and abundance models. If these show serious deficiencies that are probably the result of disequilibria, we can then work towards parameterizing dynamic models based on the results of experimental, gradient, chronosequence and longitudinal studies.
Although it is tempting to throw up our hands given the complexity of this challenge, we believe that the attempt should be made to begin to build global data sets and predictive species/community models, recognizing that this will be an iterative process, involving continual improvement of tools, data and models (Fig. 2). An efficient approach to providing high cross-comparability of both species and environmental data would be to develop a research consortium that uses a mutually agreed upon sampling scheme to achieve maximum coverage for minimum effort, similar to the community effort that supported the Deep Hypha project (http://ocid.nacse.org/research/deephyphae/projects.php). The price of not acting now will be a lost opportunity to define baseline species distribution data in the face of rapid global change. We have touched on a few issues. The key next steps are rallying a diverse group of researchers to collaborate in this process, and finding the resources to support large-scale data collection and modelling efforts. The time to take these steps has come.
We thank Dr Andy Taylor for the invitation to give the talk at the 5th International Conference on Mycorrhiza, 23–27 July 2006, Granada, Spain, on which this paper was based.