Co-occurrence based assessment of species habitat specialization is affected by the size of species pool: reply to

Authors


*Correspondence author. E-mail: zeleny@sci.muni.cz

Summary

  1. Fridley et al. (2007) introduced a technique of species habitat specialization assessment based on co-occurrence analysis of large species-plot matrixes, with a continuous metric (θ value) intended to reflect relative species niche width.
  2. They used simulated data in order to demonstrate the functionality of the new method. I repeated their simulation and introduced three alternative scenarios with various patterns of species pool size along a simulated gradient. Results indicated that the co-occurrence based estimation of species niche width is dependent on the size of species pool at the position of species optima. This relationship was also revealed in an analysis of a real data set with Ellenberg indicator values as surrogates for environmental gradients.
  3. I introduced a modification of the original algorithm, which corrects the effect of the species pool on the estimation of species niche width: the beta diversity measure based on additive partitioning was replaced with the multiplicative Whittaker's beta. Even after this, the method can satisfactorily recover the real pattern of species specialization only for unsaturated communities with a linear relationship between local and regional species richness.
  4. Synthesis. This paper corrects the algorithm for co-occurrence based estimation of species specialization, introduced by Fridley et al. (2007), which was sensitive to the changes in species pool size along environmental gradients.

Introduction

In their recent work, Fridley et al. (2007) introduced a novel technique to assess habitat generalists and specialists, based on analysis of co-occurrence data extracted from large vegetation data sets. The theory is simple and straightforward: for species occupying many different habitats – generalists – the rate of species turnover among plots in which they occur will be relatively high, while for species restricted to specific habitats – specialists – the species turnover rate will be relatively low, simply because they consistently occur with a limited number of other species. A continuous metric of habitat specialization proposed by Fridley et al. (2007) is called ‘theta’ (θ) and its calculation is based on a measure of beta diversity among the plots with given species. The θ value should be an estimate of species niche width. However, given that real vegetation data are used to calculate θ, the results will reveal realized, not fundamental, species niche, and their validity will be limited to the data set used for analysis. The main advantage of this method is that there is no need for information about the ecological gradient and species position along this gradient. Instead, only a sufficiently large data set of vegetation plots and an algorithm written in R by Fridley et al. (2007) is required. Fridley et al. (2007) also tested the effectiveness of the proposed θ metric using a simulation of species abundance along a single gradient with known species niche widths. They concluded that the method recovers the simulated pattern of species niche widths and it is fairly robust considering sampling bias and various shapes of species response curves. Taking these results as proof of the ability of the θ metric to recover the real pattern, Fridley et al. (2007) analyzed vegetation data sets together with species trait databases and interpreted species habitat specialization by means of species life histories.

The original simulation algorithm of Fridley et al. (2007) assumes that the optima of simulated species response curves are situated along the environmental gradient in a random fashion (see Table 1 p. 711 of their paper), which (with a low number of simulated species response curves and sampled plots) results in a relatively even pattern of species pool size along the gradient. However, numerous studies (for example Aarssen & Schamp 2002; Ewald 2003; Peet et al. 2003; Hájek et al. 2007) have reported uneven patterns of species pool size along gradients, indicating that changes in species pool size along gradients must be taken into account. Here, I define the size of the species pool at a particular position along the gradient (or within a particular section of it) as the number of species that have a non-zero probability of occurrence at that position (or within that section). This local species pool is thus a subset of the overall species pool, which includes all species in the study.
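The definition of species pool size can be made concrete with a minimal sketch. It assumes, purely for illustration, that each species has a non-zero probability of occurrence only within an interval centred on its optimum; the function name `pool_size` and the three example species are hypothetical and do not come from the original simulation.

```python
def pool_size(position, optima, widths):
    """Count species with non-zero probability of occurrence at `position`.

    Assumption (illustrative): species i can occur only strictly inside
    the interval optima[i] +/- widths[i] / 2.
    """
    return sum(
        1
        for opt, width in zip(optima, widths)
        if abs(position - opt) < width / 2
    )


# Three hypothetical species on a gradient running from 0 to 100.
optima = [20.0, 50.0, 55.0]
widths = [10.0, 40.0, 20.0]

for x in (20, 50, 90):
    print(x, pool_size(x, optima, widths))
# prints:
# 20 1
# 50 2
# 90 0
```

Evaluating `pool_size` over a grid of positions would give a curve of the kind used to describe how the species pool changes along the gradient.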

In this paper, I address the questions of how and why changes of species pool size along environmental gradients affect the results of the co-occurrence based estimation of habitat specialization. First, using a modification of the simulation algorithm of Fridley et al. (2007), I show that with uneven patterns of species pool size along the gradients the original algorithm does not satisfactorily recover the simulated pattern of species niche widths. An identical effect is also documented by analysis of an extensive vegetation data set. Subsequently, I propose a modification of the original algorithm, which is less affected by the variation in species pool size along the gradients. Finally, I discuss the conditions under which this method gives reliable estimates of species niche widths.

Relationship between estimated θ value, actual niche width and species pool size

In their simulation, Fridley et al. (2007) compared four scenarios focused on shape of species response curves (symmetrical or skewed) and sampling bias (samples distributed along the gradient randomly or strongly biased to one end). I used the same simulation algorithm, but modified the assumption about the distribution of species response curves along the gradient in order to produce an uneven pattern of species pool size along this gradient, and asked the following question: how will the uneven pattern of species pool size affect the results of the co-occurrence method of habitat specialization assessment?

From the simulation scenarios used by Fridley et al. (2007), I selected the one with symmetrical species response curves and samples randomly distributed along the gradient. To alter the pattern of species pool size changes along the gradient, I introduced three alternative scenarios with various distributions of species optima along the gradient: (i) species optima distributed randomly along the gradient, identical to the original algorithm (Fig. 1a), (ii) species optima concentrated in the central part of the gradient (Fig. 1c), and (iii) species optima distribution strongly skewed towards one end of the gradient (Fig. 1e). While Fridley et al. (2007) in their simulation used 50 species and 500 plots, I increased these numbers to 300 species and 3000 plots. Technical aspects of the algorithm modification are described in Appendix S1 in Supporting Information. All calculations in this paper have been carried out in the R program (R Development Core Team 2007).
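The three scenarios differ only in how species optima are placed along the gradient. The sketch below is not the algorithm of Appendix S1, which is not reproduced here; it only illustrates the idea, using a uniform distribution and two beta distributions as assumed stand-ins. The function name `simulate_optima`, the gradient length and the beta parameters are all hypothetical.

```python
import random


def simulate_optima(n_species, scenario, gradient_length=100.0, seed=0):
    """Draw species optima along a gradient of the given length.

    The distributions are illustrative assumptions:
      "random"       -> uniform along the gradient
      "concentrated" -> symmetric beta(4, 4), peaked at the midpoint
      "skewed"       -> beta(1.5, 6), piled up near the low end
    """
    rng = random.Random(seed)
    draws = {
        "random": lambda: rng.random(),
        "concentrated": lambda: rng.betavariate(4, 4),
        "skewed": lambda: rng.betavariate(1.5, 6),
    }
    if scenario not in draws:
        raise ValueError(f"unknown scenario: {scenario}")
    draw = draws[scenario]
    return [gradient_length * draw() for _ in range(n_species)]


for scenario in ("random", "concentrated", "skewed"):
    optima = simulate_optima(300, scenario)
    mean = sum(optima) / len(optima)
    print(f"{scenario:>12}: mean optimum = {mean:.1f}")
```

With 300 species, the skewed scenario should give a mean optimum well below the gradient midpoint, while the concentrated scenario should give optima clustered around it.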

Figure 1.

Changes in the size of species pool along the gradient for three simulation scenarios, differing by the distribution pattern of species optima along the gradient: species optima distributed randomly along the gradient (a, b), species optima concentrated along the gradient midpoint (c, d) and distribution of species optima skewed towards one end of the gradient (e, f). Figures in the left column show distribution of species optima along the gradient, figures in the right column represent the resulting pattern of changes in species pool size along the gradient. The size of the species pool at a particular gradient position corresponds to the number of species with non-zero probability of occurrence.

Figure 1 illustrates the effect of the three simulation scenarios, which differ in the distribution of species optima, on changes in species pool size along the gradient. The species pool at a particular position on the gradient is defined as the number of species with non-zero probability of occurrence at this position. Even in the case of random distribution of species optima along the gradient (Fig. 1a), the species pool shows a hump-back shape with a maximum close to the gradient midpoint (Fig. 1b). The hump-back shape is even more pronounced if species optima are concentrated in the central part of the gradient (Fig. 1c,d). The skewed distribution of species optima along the gradient (Fig. 1e) results in a similarly skewed response in species pool size (Fig. 1f).

The degree to which the estimated θ value reflects the actual niche width and species pool is shown in Fig. 2. The estimated θ value is best correlated with the actual niche width in the case of random distribution of species optima along the gradient (Fig. 2a, R² = 0.83), while in the case of skewed distribution of species optima the correlation is rather weak (Fig. 2e, R² = 0.35). The less the θ value reflects the actual niche width, the more it correlates with the size of the species pool at the given position along the gradient: while in the first scenario (random distribution of species optima) the species pool explains 35% of variability (Fig. 2b), in the second and third scenarios (non-random distribution of species optima) it explains around 60% (Fig. 2d,f). θ values correlate with species pool size even when species optima are randomly distributed because the species pool follows a hump-back shape along the gradient, a pattern which could be interpreted as a mid-domain effect: species ranges overlap increasingly toward the centre of a bounded domain, in this case an environmental gradient (e.g. Colwell & Lees 2000). An identical mechanism is perhaps responsible for the decline of the species pool near the end of the gradient in the case of skewed distribution of species optima (Fig. 1f).

Figure 2.

Results of three simulation scenarios, differing in the distribution of species optima along the gradient: species optima distributed randomly along the gradient (a, b), species optima concentrated along the gradient midpoint (c, d), and distribution of species optima skewed towards one end of the gradient (e, f). Figures in the left column display the correlations between the calculated θ value and simulated species niche width, with circle sizes corresponding to the size of species pool at the position of the given species optimum. Figures in the right column show the correlations between the calculated θ value and size of species pool at the particular gradient position. Simulation models are identical to those used in Fig. 1.

In Fig. 2a, c and e, the size of the species pool at the position of the given species optimum is proportional to the size of the circle for that species. Especially in scenarios with a non-random distribution of species optima along the gradient, the larger circles tend to be above the regression line, and smaller circles below. This means that for species with optima in the part of the gradient with a larger species pool, the θ value tends to overestimate the real species niche width, while for species with optima in the part of the gradient with a smaller species pool the real niche width will be underestimated.

Analysis of real vegetation data with Ellenberg indicator values as surrogates for environmental gradients

The following example illustrates the effect of species pool size on the estimation of species niche width using the method of Fridley et al. (2007). Let us suppose that the Ellenberg indicator values for vascular plants (EIVs, Ellenberg et al. 1992) can be taken as satisfactory surrogates for basic environmental factors. Most Central European species have been assigned to one of 9 (or 12) ordinal classes of EIVs for moisture, nutrients, soil reaction, temperature, continentality and light according to the position of their ecological optima along these gradients. Classes of particular EIVs are known to contain different numbers of species (e.g. Aarssen & Schamp 2002; Ewald 2003), which reflects changes in species pool size along particular gradients. EIVs indicate the position of species optima along gradients but contain no information on niche width. As such, the number of species in particular EIV classes underestimates the real species pool size at a given position on the gradient. However, in this study, relative changes in species pool size along the gradient are of main importance, and these are reasonably reflected by the numbers of species assigned to particular EIV classes. Using extensive vegetation data sets, I calculated the θ value for more than 700 species in the list of EIVs and tried to answer the question: how does the estimated θ value for species in a given EIV class depend on the size of the species pool (number of species) of this class?

I used a data set of 43 807 phytosociological relevés, which carry information about co-occurrence patterns of more than 2200 species in small plots (4–400 m², depending on vegetation type; Chytrý & Otýpková 2003). This data set results from geographically stratified resampling (Knollová et al. 2005) of more than 80 000 relevés stored in the Czech National Vegetation Database (Chytrý & Rafajová 2003) and contains all vegetation types recorded in the Czech Republic over the past 90 years. For all species occurring in at least 20 relevés (altogether 1428 species), I calculated θ values using the algorithm proposed by Fridley et al. (2007). For 705 of these species with assigned EIVs, I plotted their θ values against EIVs for soil reaction, nutrients and moisture (Fig. 3a,c,e). Afterwards, I plotted median θ values for species in particular EIV classes against the number of species in these classes (Fig. 3b,d,f).
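The last step, summarizing θ by EIV class, amounts to a simple grouping operation. The sketch below shows one way it could be done; the function name `median_theta_by_class` is hypothetical and the input values are made up for illustration only, not taken from the study.

```python
from collections import defaultdict
from statistics import median


def median_theta_by_class(records):
    """Summarize theta per Ellenberg class.

    records: iterable of (eiv_class, theta) pairs, one per species.
    Returns {eiv_class: (number_of_species, median_theta)}.
    """
    by_class = defaultdict(list)
    for eiv_class, theta in records:
        by_class[eiv_class].append(theta)
    return {
        c: (len(thetas), median(thetas))
        for c, thetas in sorted(by_class.items())
    }


# Made-up values for illustration only; not data from the study.
toy = [(3, 40.0), (3, 44.0), (5, 52.0), (5, 60.0), (5, 58.0), (7, 47.0)]
for eiv_class, (n, med) in median_theta_by_class(toy).items():
    print(eiv_class, n, med)
# prints:
# 3 2 42.0
# 5 3 58.0
# 7 1 47.0
```

Plotting the second element of each tuple against the first gives the kind of class-level comparison described above.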

Figure 3.

Relationship between co-occurrence based estimate of niche width (θ) of species, calculated from a large matrix of 43 807 relevés, and Ellenberg indicator values for (a) reaction, (c) nutrients and (e) moisture. The corresponding figures in the right column (b, d, f) display the correlation between the median θ value, calculated for species from particular classes of the given Ellenberg values, and the number of species in these classes (numbers displayed in the plot correspond to the class numbers). The axes with species numbers in (b) and (f) are log scaled.

The resulting pattern is quite clear: except for a few marginal classes (class 9 for soil reaction and class 1 for moisture) there is a positive relationship between median θ value and the number of species in particular EIV classes. Interpretation of these results is identical to the interpretation of the simulation results: species occurring in EIV classes with more species (i.e. at the position of the gradient with the larger species pool) have systematically higher θ values. With the method of Fridley et al. (2007) they would be misinterpreted as more generalist than species occurring in classes with fewer species.

Main pitfalls of the method, possible corrections and conditions of use

The critical aspect of the algorithm proposed by Fridley et al. (2007) seems to be the selection of a beta diversity measure based on ‘additive partitioning’ of diversity components. The method of co-occurrence based estimation of species specialization compares beta diversities among groups of plots, and each of these groups can be derived from species pools of different size. Therefore, the beta diversity measure used must be independent of the size of the species pool. As discussed below, a beta diversity measure based on additive partitioning does not fulfil this criterion.

Figure 4.

Effect of local–regional species richness relationship (a) on the calculation of beta diversity based on (b) additive partitioning of diversity components and (c) multiplicative partitioning using Whittaker's beta. In all figures, the solid line represents an unsaturated community with a linear relationship between regional and local species richness, while the dashed line represents a saturated community with a curvilinear local–regional richness relationship.

The conceptual model in Fig. 4 shows the effect of the local–regional species richness relationship on the calculation of beta diversity using additive partitioning and multiplicative partitioning approaches. Suppose that we study an unsaturated community (Fig. 4a, solid line) with a linear relationship between local and regional species richness (e.g. Srivastava 1999), which can be expressed by the following equation:

µ(α) = kγ    (eqn 1)

where µ(α) is local species richness (mean alpha diversity), γ is regional species richness (or size of species pool), and k is the slope of the relationship between local and regional species richness. Beta diversity βa based on additive partitioning of diversity (e.g. Veech et al. 2002) is expressed as:

βa = γ − µ(α)    (eqn 2a)

and Whittaker's beta diversity βw (Whittaker 1960) as:

βw = γ/µ(α)    (eqn 3a)

By combining eqn 1 with eqns 2a and 3a we get:

βa = γ − µ(α) = γ − kγ = γ(1 − k)    (eqn 2b)
βw = γ/µ(α) = γ/(kγ) = 1/k    (eqn 3b)

For a community with a linear relationship between local and regional species richness, k will be constant. A beta diversity measure based on additive partitioning (Fig. 4b, solid line) will increase with the size of species pool (Gaston et al. 2007), while Whittaker's multiplicative measure of beta diversity (Fig. 4c, solid line) will not be affected by the changing size of species pool (Srivastava 1999).
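A short numerical check of eqns 2b and 3b may help here. The sketch below assumes an arbitrary constant slope k = 0.25 and three arbitrary pool sizes; the function names are hypothetical, and the values are chosen only to make the arithmetic easy to follow.

```python
def additive_beta(gamma, mean_alpha):
    """Eqn 2a: beta_a = gamma - mean alpha."""
    return gamma - mean_alpha


def whittaker_beta(gamma, mean_alpha):
    """Eqn 3a: beta_w = gamma / mean alpha."""
    return gamma / mean_alpha


k = 0.25  # hypothetical constant slope of the local-regional relationship
for gamma in (40, 80, 160):
    mean_alpha = k * gamma  # eqn 1
    print(gamma,
          additive_beta(gamma, mean_alpha),
          whittaker_beta(gamma, mean_alpha))
# prints:
# 40 30.0 4.0
# 80 60.0 4.0
# 160 120.0 4.0
```

The additive measure grows in proportion to γ (30, 60, 120), whereas Whittaker's beta stays at 1/k = 4 regardless of pool size.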

Therefore, I modified the original algorithm of the θ calculation proposed by Fridley et al. (2007), replacing the additive measure of beta diversity with Whittaker's beta (Appendix S2). I re-ran the simulation described at the beginning of this paper, particularly the third scenario with species optima distribution strongly skewed towards one end of the gradient (Fig. 1e). The results are shown in Fig. 5a,b. Compared to the result of the original simulation (Fig. 2e,f), the correlation between simulated niche widths and beta diversity measure became significantly stronger (R² = 0.75 vs. original R² = 0.35) and the correlation between species pool and beta diversity became weaker (R² = 0.15 vs. 0.62). I used this modified algorithm for θ calculation and for recalculation of the analysis with real vegetation data and EIVs. The results showed there was no correlation between estimated θ value and number of species in a particular EIV class (not shown).
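The difference between the two variants can be sketched in a few lines. This is not the R code of Fridley et al. (2007) or of Appendix S2, neither of which is reproduced here. It is a minimal illustration that assumes θ is the mean beta diversity over repeated random subsamples of plots containing the focal species. The function name `theta`, the defaults `n_plots=20` and `n_reps=100`, and the choice to count the focal species itself in both γ and α are assumptions.

```python
import random


def theta(plots, focal, measure="additive", n_plots=20, n_reps=100, seed=0):
    """Co-occurrence based specialization estimate for one focal species.

    plots   : list of sets, each holding the species recorded in one plot
    measure : "additive"  -> gamma - mean(alpha)   (eqn 2a)
              "whittaker" -> gamma / mean(alpha)   (eqn 3a)
    n_plots, n_reps : subsample size and number of repeats (assumed values)
    """
    if measure not in ("additive", "whittaker"):
        raise ValueError(f"unknown measure: {measure}")
    rng = random.Random(seed)
    with_focal = [p for p in plots if focal in p]
    if len(with_focal) < n_plots:
        raise ValueError("focal species occurs in too few plots")
    betas = []
    for _ in range(n_reps):
        sample = rng.sample(with_focal, n_plots)
        gamma = len(set().union(*sample))          # species in the subsample
        mean_alpha = sum(len(p) for p in sample) / n_plots
        if measure == "additive":
            betas.append(gamma - mean_alpha)
        else:
            betas.append(gamma / mean_alpha)
    return sum(betas) / n_reps


# Toy data: "spec" always grows with the same two companions,
# "gen" with a different companion in every plot.
plots = [{"spec", "a", "b"} for _ in range(5)]
plots += [{"gen", f"c{i}"} for i in range(5)]

print(theta(plots, "spec", n_plots=5, n_reps=10))                # 0.0
print(theta(plots, "gen", n_plots=5, n_reps=10))                 # 4.0
print(theta(plots, "gen", "whittaker", n_plots=5, n_reps=10))    # 3.0
```

In this toy example the specialist has no turnover among its plots, while the generalist accumulates a new companion in each plot, so it receives the higher value under either measure.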

Figure 5.

Results of the simulation with a left skewed distribution of species optima along the gradient, where additive beta diversity measure was replaced with multiplicative Whittaker's beta. Figures in the left column show the relationship between simulated species niche width and Whittaker's beta, with the size of the circles corresponding to the size of the species pool at the position of a given species optimum. Figures in the right column show the relationship between the species pool size and Whittaker's beta. Figures (a) and (b) are the result of a simulation using the original algorithm with a curvilinear local–regional species richness relationship (Fig. 4a, dashed line). Figures (c) and (d) are based on the modified simulation algorithm with a linear local–regional species richness pattern (Fig. 4a, solid line).

However, as is obvious from the distribution of circle sizes in Fig. 5a, the effect of species pool size has not been removed completely, as the smaller circles are still found below the regression line and the larger circles above (circle size increases with the size of the species pool). The reason for this lies in the original simulation algorithm, which is responsible for the projection of simulated species pool sizes into the species richness of individual plots. This algorithm randomly assigned a given number of individuals (around 100) to species according to a simulated probability of species occurrence at the given gradient position, and more than one individual could be assigned to the same species. This results in a nonlinear shape of the local–regional species richness relationship (Fig. 4a, dashed line), which is typical for saturated communities (Srivastava 1999). In this relationship, the slope k of the local–regional species richness relationship is not constant, but is a function of species pool size:
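The saturation produced by assigning a fixed number of individuals to species can be reproduced with a small Monte Carlo sketch. It simplifies the original simulation by assuming that all species in the pool are equally likely to receive an individual; the function name `local_richness` and the pool sizes are hypothetical.

```python
import random


def local_richness(pool_size, n_individuals=100, rng=random):
    """Assign individuals to species at random and count distinct species.

    Assumption: every species in the pool is equally likely, which
    simplifies the occurrence probabilities of the original simulation.
    """
    drawn = {rng.randrange(pool_size) for _ in range(n_individuals)}
    return len(drawn)


rng = random.Random(1)
for gamma in (10, 50, 200, 800):
    reps = [local_richness(gamma, rng=rng) for _ in range(200)]
    mean_alpha = sum(reps) / len(reps)
    print(f"pool={gamma:4d}  mean richness={mean_alpha:6.1f}  "
          f"k={mean_alpha / gamma:.2f}")
```

Because a plot with about 100 individuals cannot hold more than about 100 species, mean local richness levels off as the pool grows and the ratio k = µ(α)/γ declines, which is the curvilinear pattern described above.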

k = f(γ)    (eqn 4)

Modified equations for additive (eqn 2b) and Whittaker's (multiplicative; eqn 3b) beta diversity measures will be:

βa = γ(1 − k) = γ(1 − f(γ))    (eqn 2c)
βw = 1/k = 1/f(γ)    (eqn 3c)

which means that for a curvilinear pattern of the local–regional richness relationship, both additive and multiplicative measures of beta diversity will be affected by the size of species pool (Fig. 4b,c, dashed line; see also Srivastava 1999). If we modify the original simulation algorithm by replacing the curvilinear local–regional richness relationship with a linear one (see Appendix S3), the effect of species pool size on the estimation of niche width will disappear (Fig. 5c,d).
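Eqns 2c and 3c can be evaluated for one concrete choice of f(γ). The sketch below assumes that N individuals are assigned to γ equally likely species, in which case the expected share of the pool present in a plot is 1 − (1 − 1/γ)^N. This form of f(γ) and the function name `k_saturated` are illustrative assumptions, not part of the original simulation.

```python
def k_saturated(gamma, n_individuals=100):
    """f(gamma) under equal occurrence probabilities (an assumption):
    expected share of the pool present among n_individuals random draws."""
    return 1.0 - (1.0 - 1.0 / gamma) ** n_individuals


for gamma in (50, 200, 800):
    k = k_saturated(gamma)
    beta_a = gamma * (1.0 - k)   # eqn 2c
    beta_w = 1.0 / k             # eqn 3c
    print(f"gamma={gamma:4d}  k={k:.2f}  "
          f"beta_a={beta_a:6.1f}  beta_w={beta_w:5.2f}")
```

Under this assumption k falls as γ increases, so both the additive and the multiplicative measure rise with pool size, in line with the dashed lines of Fig. 4b,c.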

Conclusions

The algorithm for estimating species niche width from co-occurrence data, as introduced by Fridley et al. (2007), is sensitive to the variation in species pool size along the gradient, resulting in strongly biased estimates of species niche width. I propose a modification of this algorithm, which reduces the effect of the species pool size. It replaces the original beta diversity measure based on additive partitioning of diversity components with Whittaker's beta diversity, which is a multiplicative measure. However, even after this modification, the method will satisfactorily recover the real pattern of species specialization only for unsaturated communities, i.e. those with a linear relationship between local and regional species richness. As available empirical evidence indicates that unsaturated communities prevail in nature (e.g. Cornell & Lawton 1992; Caley & Schluter 1997; Lawton 1999), the method of Fridley et al. (2007) with the modification proposed here will give reliable results for most studies.

Acknowledgements

This paper benefited from valuable comments by Ching-Feng Li and Milan Chytrý (Department of Botany and Zoology, Masaryk University, Brno). I also appreciated the recommendations of Sándor Bartha and an anonymous reviewer on an earlier version of this paper. The study was supported by the long-term research plan MSM 0021622416.
