1. Manipulative experiments are often used to identify causal links between biodiversity and productivity in terrestrial and aquatic habitats.
2. Most studies have identified an effect of biodiversity, but their interpretation has stimulated considerable debate. The main difficulty lies in separating the effect of species richness from the effects of changes in the identity and relative density of species.
3. Various experimental designs have been adopted to circumvent these problems in the analysis of biodiversity. Here I show that these designs may not be able to maintain the probability of type I error at the nominal level (α = 0·05) under a true null hypothesis of no effect of species richness when effects of the density and identity of species are present.
4. Alternative designs are proposed to discriminate unambiguously the effects of identity and density of species from those due to the number of species. Simulations show that the proposed designs may control type I errors better when effects of density and identity of species are also present. These designs are flexible enough to be useful in the experimental analysis of biodiversity in various assemblages and under a wide range of environmental conditions.
Introduction
Understanding how changes in biodiversity affect the quality of human life is of great concern to ecologists, policy-makers and the general public (Chapin et al. 1998). This requires a better understanding of the extent to which biodiversity is causally related to productivity and other aggregate properties of natural systems (Naeem et al. 1994; Hector et al. 1999; Emmerson et al. 2001; Tilman et al. 2001). A large number of empirical and theoretical studies have addressed these issues in the past decade (Tilman 1999; Loreau 2000; Naeem & Wright 2003). These studies have considerably advanced our understanding of the role of biological diversity in maintaining fundamental ecological processes. While generating novel understanding, experiments have also drawn attention to the difficulties of identifying the causal mechanisms that underlie the effects of changes in biodiversity (Aarssen 1997; Huston 1997; Wardle 1999a, 1999b). A widely discussed problem is that of separating effects due to changes in the number of species (or other taxonomic categories) from effects due to changes in the identity of species. Positive effects of biodiversity may be due to the addition of complementary functional traits as more species accumulate in a system (complementarity effect), to an increased probability of including species with particularly important traits in species-rich assemblages (selection effect), or to both mechanisms simultaneously (Huston 1997; Lepš et al. 2001). A less debated, although explicitly recognized, problem is that of controlling for effects associated with changes in the density (or biomass) of species across manipulated gradients of species number (Hector 1998). Biodiversity is a general term that includes all these effects.
Here the term species richness (SR) is used to indicate complementarity or facilitative effects (Huston & McBride 2002) due simply to an increasing number of species. Identity of species (ID) is used to indicate species-specific traits (e.g. size) of individual species that may have disproportionate effects on response variables. Density of species (DE) is used to indicate the importance of the density or biomass of individual species for response variables. Attempts to separate these effects have generated various kinds of experimental design for analyses of biodiversity (Hector 1998; Allison 1999; Tilman 2001; Naeem 2002; Mikola et al. 2002).
When analyses focus primarily on hypotheses about effects of complementarity, it is necessary to mitigate the influence of possible effects of ID (Huston 1997). A widely accepted way to achieve this is to use hierarchical designs in which replicated random assemblages are nested within each level of SR (Hooper & Vitousek 1997; Hector et al. 1999; Cottingham et al. 2001; Tilman 2001; Downing & Leibold 2002). Creating an experimental gradient of SR, however, also creates differences among treatments in either overall density or the relative density of species. The usual way to control for changes in density is to reduce the relative abundance of each species as SR increases, so that overall density remains constant across treatments. This is the approach of substitutive designs, a kind of replacement series experiment widely used in studies of biodiversity (Hector 1998; Jolliffe 2000).
The use of one or both of these procedures requires considerable attention to the specific hypotheses addressed by the experiment, and to whether each approach is appropriate for the system under investigation. The efficacy of nested designs in separating potential effects of ID may depend on the distribution of relevant traits across species. In principle, assemblages with skewed distributions may be more problematic than those in which traits are distributed more evenly across species. The use of hierarchical designs in which replicated assemblages are nested within levels of SR is, in general, a sound procedure to test hypotheses about the effect of increasing numbers of species over and above possible effects of ID. However, a potential problem with this approach, rarely considered in experiments, is that the number of possible selections is not constant across levels of SR. For example, with levels of two, four and eight species and with three replicate assemblages within each level, there are six, 12 and 24 species chosen, respectively. That is, there is still a greater chance for the largest level of SR to include more species with ‘important’ traits than the other levels.
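The imbalance in the number of species drawn can be quantified with a hypergeometric argument. The Python sketch below makes three illustrative assumptions, none of which come from the study: a pool of 40 species, 2 of which carry an "important" trait, and species for the three replicate assemblages at each level drawn without replacement from that pool. The function name `p_includes_important` and all numbers are likewise illustrative.

```python
from math import comb


def p_includes_important(pool_size: int, n_important: int, n_drawn: int) -> float:
    """Probability that n_drawn species, sampled without replacement from a pool
    of pool_size species, include at least one of n_important species."""
    if n_drawn > pool_size:
        raise ValueError("cannot draw more species than the pool contains")
    # Hypergeometric: P(no important species) = C(N - K, m) / C(N, m)
    p_none = comb(pool_size - n_important, n_drawn) / comb(pool_size, n_drawn)
    return 1.0 - p_none


# Hypothetical pool of 40 species, 2 of them 'important'; three replicate
# assemblages at each level of SR (2, 4 and 8 species).
for richness in (2, 4, 8):
    drawn = 3 * richness  # 6, 12 and 24 species drawn per level
    print(richness, drawn, round(p_includes_important(40, 2, drawn), 3))
```

Under these assumptions, the chance that a level contains at least one important species rises from about 0·28 (six species drawn) to 0·52 (12) and 0·85 (24).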
Similarly, prioritizing control of overall density in substitutive designs may be appropriate if changes in relative density have negligible effects on the targeted response variable(s). In contrast, if changes in relative density are important, substitutive designs may not be able to isolate an effect of SR per se. Suppose that density-dependent processes operate in a system so that species A has a negative effect on the response variable at a density of two units per experimental plot (the density of A in a low-diversity treatment), whereas it has no effect at a density of one unit (the density of A in a high-diversity treatment). Under this scenario, an apparent effect of diversity would be detected when, in fact, the effect is driven entirely by changes in the relative density of species.
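The scenario can be made concrete with a toy calculation. The Python sketch below assumes a constant overall density of four units per plot, split equally among species. It also assumes a hypothetical threshold rule under which species A lowers the response by 10 units only when its density reaches two units. These values and the function names are illustrative, not taken from any experiment.

```python
def per_species_density(total_density: float, richness: int) -> float:
    """Substitutive design: overall density is fixed and shared equally among species."""
    return total_density / richness


def response_with_threshold(base: float, density_a: float,
                            threshold: float = 2.0, penalty: float = 10.0) -> float:
    """Hypothetical rule: species A lowers the response by `penalty`
    only when its density reaches `threshold` units per plot."""
    return base - penalty if density_a >= threshold else base


TOTAL_DENSITY = 4.0  # hypothetical constant overall density per plot
for richness in (2, 4):
    d_a = per_species_density(TOTAL_DENSITY, richness)
    # The model has no term for species richness, yet the response differs.
    print(richness, d_a, response_with_threshold(50.0, d_a))
```

With two species, A is at two units and the response drops to 40. With four species, A is at one unit and the response stays at 50. The apparent diversity contrast is therefore produced entirely by relative density.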
Negative density-dependent effects are biologically realistic and may occur under a wide range of conditions. Examples include effects of early colonists on the establishment of later arrivals (Robinson & Edgemon 1988; Drake 1991), consumer–resource interactions (Barkai & McQuaid 1988) and apparent competition (Rand 2003). If negative effects target species that are particularly important in driving the functioning of the system (however defined), then substitutive designs, in which negative density-dependent interactions fade away as levels of SR increase, will confound effects of SR with those of DE.
In this paper, effects of ID and DE are modelled to explore the extent to which they can impair the ability of commonly used experimental designs to detect unequivocal effects of SR. Spurious effects were quantified as inflated probabilities of type I error under a true null hypothesis of no effect of SR. A modelling approach was also used to assess how robustly alternative designs separate the effects of ID and DE. The results of these analyses served as the basis for a general framework for the design of experiments on biodiversity.
Materials and methods
Separating the effects of SR and ID
The effects of ID were examined with Monte Carlo experiments in which the number of species was manipulated according to a hierarchical design, with random assemblages nested within levels of SR and with biomass as the response variable. The procedure generated values of biomass comparable with those obtained in manipulations of vascular plants, according to the following linear model:
$$y_{ijr} = \mu + S_i + A(S)_{j(i)} + \varepsilon_{r(ij)} \qquad (1)$$
where $y_{ijr}$ is the $r$th observation within the $ij$th combination of the other factors, $\mu$ is the general mean of the population ($\mu$ = 50), $S_i$ is the effect of the $i$th level of SR ($i$ = 1, …, 3), $A(S)_{j(i)}$ is the effect of the $j$th assemblage ($j$ = 1, …, 8) nested in the $i$th level of SR, and $\varepsilon_{r(ij)}$ is the error term associated with observation $y_{ijr}$ ($r$ = 1, …, 5 replicates). Some simulations represented the realistic condition in which only a few species have the life-history traits needed to make a relevant contribution to the response variable. In these simulations, the contribution of each species was drawn from a log-normal distribution with mean = 2 and standard deviation = 2·5. This was contrasted with the condition in which traits are evenly distributed across species, in which contributions were drawn from a uniform distribution in the range 0–50. The $A(S)_{j(i)}$ terms were obtained by summing these random draws over the appropriate number of species within each level of diversity. Errors were generated from a normal distribution with mean = 0 and variance = 300. Simulations were done under a true null hypothesis of no effect of SR ($S_i$ = 0 for every $i$).
To examine the extent to which effects of ID depend on the number of species used, two sets of simulations were run: one with two, four and eight species and the other with six, 12 and 24 species. These figures span the range of values used in real experiments. There were 1000 simulated experiments for each combination of levels of SR, number of replicate assemblages, and skewed vs. uniform distribution of traits. Data from each experiment were transformed to natural logarithms (as is commonly done in analyses of real experiments). They were then analysed with a two-factor nested anova, with the effect of diversity tested over the mean square of assemblage. The probability of type I error (at α = 0·05) was estimated as the proportion of the 1000 simulated experiments in which the null hypothesis of no effect of diversity was rejected.
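The Monte Carlo procedure described above can be sketched in standard-library Python. Several details are assumptions of this sketch, not of the original study:

- The log-normal parameters are treated as log-scale parameters, as in Python's `random.lognormvariate`.
- Species draws are weighted by relative density 1/s, so the expected assemblage effect is the same at every level of SR and the null hypothesis stays true on the raw scale.
- Simulated values are floored at 1 before the log transformation.

With three levels of SR, the numerator of the F test has 2 degrees of freedom. For this case the F distribution has a closed-form upper tail, (1 + 2F/ν)^(−ν/2), where ν is the denominator df, so no statistics library is needed.

```python
import math
import random


def f2_upper_tail(f: float, df_den: int) -> float:
    """P(F > f) for an F variate with 2 numerator df: (1 + 2f/df_den)^(-df_den/2)."""
    return (1.0 + 2.0 * f / df_den) ** (-df_den / 2.0)


def simulate_experiment(richness_levels, n_assemblages, rng, skewed=True,
                        n_reps=5, mu=50.0, error_var=300.0):
    """One experiment under a true null of no SR effect (S_i = 0).

    Returns log-transformed data indexed as data[level][assemblage][replicate].
    """
    data = []
    for s in richness_levels:
        level = []
        for _ in range(n_assemblages):
            if skewed:  # few species contribute a lot (log-scale parameters assumed)
                draws = [rng.lognormvariate(2.0, 2.5) for _ in range(s)]
            else:       # traits spread evenly across species
                draws = [rng.uniform(0.0, 50.0) for _ in range(s)]
            # ASSUMPTION: each species is weighted by its relative density 1/s,
            # so the expected assemblage effect is the same at every level of SR.
            assemblage_effect = sum(draws) / s
            reps = []
            for _ in range(n_reps):
                y = mu + assemblage_effect + rng.gauss(0.0, math.sqrt(error_var))
                reps.append(math.log(max(y, 1.0)))  # hypothetical floor: log needs y > 0
            level.append(reps)
        data.append(level)
    return data


def nested_anova_p(data) -> float:
    """p-value for SR tested over the assemblage(SR) mean square."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    if a != 3 or b < 2:
        raise ValueError("sketch assumes 3 levels of SR and >= 2 assemblages per level")
    cell_means = [[sum(reps) / n for reps in level] for level in data]
    level_means = [sum(level) / b for level in cell_means]
    grand = sum(level_means) / a
    ms_sr = b * n * sum((m - grand) ** 2 for m in level_means) / (a - 1)
    df_asm = a * (b - 1)
    ms_asm = n * sum((m - level_means[i]) ** 2
                     for i, level in enumerate(cell_means) for m in level) / df_asm
    return f2_upper_tail(ms_sr / ms_asm, df_asm)


def type_i_error_rate(richness_levels, n_assemblages, skewed,
                      n_sims=1000, alpha=0.05, seed=1) -> float:
    """Proportion of simulated experiments rejecting the true null at alpha."""
    rng = random.Random(seed)
    rejections = sum(
        nested_anova_p(simulate_experiment(richness_levels, n_assemblages, rng,
                                           skewed=skewed)) < alpha
        for _ in range(n_sims))
    return rejections / n_sims


if __name__ == "__main__":
    for levels in ((2, 4, 8), (6, 12, 24)):
        for skewed in (True, False):
            label = "log-normal" if skewed else "uniform"
            print(levels, label, type_i_error_rate(levels, 4, skewed))
```

Running the script prints estimated type I error rates for both sets of SR levels and both trait distributions. The numbers depend on the assumptions listed above and are not meant to reproduce Fig. 3.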
Separating the effects of SR and DE
The effects of DE were investigated using a Monte Carlo procedure similar to that employed to investigate those of ID. The effect size of density was added by imposing a proportional increase to the mean value of the response variable for every 50% reduction in relative density of species. Proportional changes were in the range 0·2–1, to simulate increases between 20 and 100% of the response variable. Adjustments to relative densities are required in substitutive designs in order to maintain a constant overall density across levels of SR. Thus a reduction of 50% is necessary when the number of species is doubled (e.g. from two to four species per experimental unit). Simulated experiments consisted of three levels of SR (six, 12 and 24 species) and four replicate assemblages nested within each level of SR (n = 5 replicate units). All other details are the same as those described in the analysis aimed at separating the effects of SR and ID.
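The density effect in a substitutive series can be written as a multiplier on the mean response. The sketch assumes that each 50% reduction in relative density (that is, each doubling of SR) multiplies the mean by (1 + e), where e is the proportional effect size (0·2–1 in the simulations). Whether the original simulations compounded the effect in this way is an assumption here, as is the function name `density_multiplier`.

```python
import math


def density_multiplier(richness: int, base_richness: int, effect_size: float) -> float:
    """Multiplier on the mean response in a substitutive series.

    ASSUMPTION: every 50% reduction in relative density (each doubling of SR
    relative to base_richness) multiplies the mean by (1 + effect_size).
    """
    halvings = math.log2(richness / base_richness)
    if not halvings.is_integer():
        raise ValueError("levels of SR are assumed to double from one to the next")
    return (1.0 + effect_size) ** halvings


MU = 50.0
for s in (6, 12, 24):
    relative_density = 1.0 / s  # each species' share of the constant overall density
    print(s, round(relative_density, 4), round(MU * density_multiplier(s, 6, 0.2), 2))
```

For µ = 50 and e = 0·2, the expected means at six, 12 and 24 species are about 50, 60 and 72, even though every S_i is zero. This is the spurious SR effect that a substitutive design cannot separate from DE.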
To be meaningful, experiments on biodiversity must deal simultaneously with the effects of ID and DE. To achieve this, two modifications were made to the basic nested design discussed so far. The first modification was a constrained random selection of species: once a group of species had been selected to create a low level of SR, these species also appeared in the higher levels, together with new, randomly drawn species. This procedure has already been used in some biodiversity experiments (Hector et al. 1999; Downing & Leibold 2002), but the extent to which it can separate the effects of ID from those of SR has never been explored.
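Constrained random selection can be sketched as follows. The hypothetical function `constrained_assemblages` shuffles a species pool once and takes species from the front. Each higher level of SR therefore keeps all species of the level below and adds new ones drawn without replacement. The letter pool and the levels in the usage lines are illustrative.

```python
import random


def constrained_assemblages(pool, richness_levels, rng):
    """One replicate under constrained random selection.

    Species chosen for a lower level of SR are kept at every higher level; only
    the additional species are drawn, at random and without replacement, from
    the species not yet used.
    """
    remaining = list(pool)
    rng.shuffle(remaining)  # random order = random draws without replacement
    chosen, result = [], {}
    for s in sorted(set(richness_levels)):
        need = s - len(chosen)
        if need > len(remaining):
            raise ValueError("species pool too small for the requested levels")
        chosen = chosen + remaining[:need]
        remaining = remaining[need:]
        result[s] = tuple(chosen)
    return result


species_pool = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # hypothetical pool
rng = random.Random(42)
for replicate in range(1, 4):
    print(replicate, constrained_assemblages(species_pool, (2, 4, 8), rng))
```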
The second modification was derived from experimental designs developed as alternatives to the substitutive design in studies of competition (Underwood 1978, 1984). It consisted of introducing density (or biomass) explicitly into the design as a factor. The new design therefore included SR and DE as fixed, crossed treatments, and assemblage as a random factor nested within SR and crossed with DE (Fig. 1). The relevant test to tease apart the effect of SR from possible effects of DE was provided by the SR × DE interaction. The null hypothesis of no effect of SR is rejected whenever the effect of simultaneously increasing density and number of species differs from the effect of increasing density alone (Fig. 2).
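The logic of the SR × DE test can be illustrated on a table of cell means. The sketch computes the interaction effects (cell mean − SR mean − density mean + grand mean). If density adds the same amount at every level of SR, all interaction effects are zero. A density effect that depends on SR gives non-zero interaction effects. The cell means are invented for illustration.

```python
def interaction_effects(cell_means):
    """S x D interaction effects from a table of cell means.

    cell_means[i][k] is the mean response at level i of SR and level k of
    density; each effect is cell - SR mean - density mean + grand mean.
    """
    n_sr, n_de = len(cell_means), len(cell_means[0])
    sr_mean = [sum(row) / n_de for row in cell_means]
    de_mean = [sum(cell_means[i][k] for i in range(n_sr)) / n_sr for k in range(n_de)]
    grand = sum(sr_mean) / n_sr
    return [[cell_means[i][k] - sr_mean[i] - de_mean[k] + grand for k in range(n_de)]
            for i in range(n_sr)]


# Invented cell means (rows: 3 levels of SR; columns: low and high density).
additive = [[50, 60], [55, 65], [58, 68]]      # density adds 10 at every level of SR
non_additive = [[50, 60], [55, 70], [58, 78]]  # density effect grows with SR
print(interaction_effects(additive))
print(interaction_effects(non_additive))
```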
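Monte Carlo simulations were run to investigate the performance of the proposed design when challenged by the simultaneous effects of DE and ID, according to the following linear model:
$$y_{ijkr} = \mu + S_i + A(S)_{j(i)} + D_k + (S \times D)_{ik} + (D \times A(S))_{kj(i)} + \varepsilon_{r(ijk)} \qquad (2)$$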
where $y_{ijkr}$ is the $r$th observation within any given combination of the other factors, $\mu$ is the general mean of the population ($\mu$ = 50), $S_i$ is the effect of the $i$th level of SR ($i$ = 1, …, 3), $A(S)_{j(i)}$ is the effect of assemblage $j$ ($j$ = 1, …, 4) nested in the $i$th level of SR, $D_k$ is the effect of density $k$ ($k$ = 1, 2), and $\varepsilon_{r(ijk)}$ is the error term associated with any given observation ($r$ = 1, …, 5 replicates). The remaining terms are the interactions among the specified effects. The details of these simulations were similar to those described in the previous subsection. Analysis of data, however, was based on the linear model in equation 2. The frequency of significant F tests was determined for the S × D interaction and for the main effect D. Factors S and D were fixed, whereas A(S) was treated as a random effect. Both the S × D interaction and the main effect D were tested over the D × A(S) term.
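A minimal standard-library Python sketch of these simulations follows, under a true null for SR (no S_i and no S × D terms). As in the earlier sketch, the structure of the ID effect is an assumption: each assemblage effect is the mean of log-normal draws. The additive density effect (`density_effect` × µ at the higher density) is also assumed. No log transformation is applied, so the F test is exact under normal errors. S × D has (3 − 1)(2 − 1) = 2 numerator df, so the closed-form F(2, ν) upper tail is used, with ν = a(b − 1)(d − 1) = 9 for the D × A(S) denominator.

```python
import math
import random


def f2_upper_tail(f: float, df_den: int) -> float:
    """P(F > f) for an F variate with 2 numerator df (closed form)."""
    return (1.0 + 2.0 * f / df_den) ** (-df_den / 2.0)


def simulate_factorial(rng, richness=(6, 12, 24), n_assemblages=4,
                       density_effect=0.5, n_reps=5, mu=50.0, error_var=300.0):
    """data[i][j][k] = replicates for SR level i, assemblage j, density k.

    True null for SR: no S_i and no S x D terms. ID enters through assemblage
    effects and DE through an additive density main effect (both assumptions).
    """
    sd = math.sqrt(error_var)
    data = []
    for s in richness:
        level = []
        for _ in range(n_assemblages):
            # Identity effect: relative-density-weighted log-normal draws (assumed).
            a_eff = sum(rng.lognormvariate(2.0, 2.5) for _ in range(s)) / s
            level.append([[mu + a_eff + density_effect * mu * k + rng.gauss(0.0, sd)
                           for _ in range(n_reps)]
                          for k in (0, 1)])  # k = 0: low density, 1: high density
        data.append(level)
    return data


def sxd_p_value(data) -> float:
    """Test the S x D interaction over the D x A(S) mean square."""
    a, b, d, n = len(data), len(data[0]), len(data[0][0]), len(data[0][0][0])
    m = [[[sum(r) / n for r in asm] for asm in lev] for lev in data]  # m[i][j][k]
    m_ij = [[sum(cell) / d for cell in lev] for lev in m]             # over density
    q = [[sum(m[i][j][k] for j in range(b)) / b for k in range(d)] for i in range(a)]
    sr_mean = [sum(q[i]) / d for i in range(a)]
    de_mean = [sum(q[i][k] for i in range(a)) / a for k in range(d)]
    grand = sum(sr_mean) / a
    ss_sd = b * n * sum((q[i][k] - sr_mean[i] - de_mean[k] + grand) ** 2
                        for i in range(a) for k in range(d))
    ss_da = n * sum((m[i][j][k] - m_ij[i][j] - q[i][k] + sr_mean[i]) ** 2
                    for i in range(a) for j in range(b) for k in range(d))
    df_sd, df_da = (a - 1) * (d - 1), a * (b - 1) * (d - 1)
    if df_sd != 2:
        raise ValueError("closed-form p-value assumes 3 levels of SR and 2 densities")
    return f2_upper_tail((ss_sd / df_sd) / (ss_da / df_da), df_da)


def sxd_type_i_error(n_sims=1000, alpha=0.05, seed=1, **design):
    """Proportion of simulated experiments in which S x D is significant."""
    rng = random.Random(seed)
    hits = sum(sxd_p_value(simulate_factorial(rng, **design)) < alpha
               for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    for effect in (0.2, 0.6, 1.0):
        print(effect, sxd_type_i_error(density_effect=effect))
```

Additive ID and DE effects cancel in the interaction contrast. In this sketch the estimated rate therefore stays near 0·05 whatever value is given to `density_effect`.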
Results
With only a single assemblage for each level of SR, the rate of type I error ranged between 0·795 (simple system: two, four and eight species) and 0·972 (complex system: six, 12 and 24 species) for species drawn from a log-normal distribution (Fig. 3a,b). This problem was greatly reduced by including two replicate assemblages in the design, but the probability of rejecting a true null hypothesis was still larger than the nominal value of 0·05. Surprisingly, increasing the number of assemblages beyond two raised the rate of type I error again. In contrast, the rate of rejection of the true null hypothesis was always very close to the nominal value of 0·05 when species were drawn from a uniform distribution.
Simulations showed inflated probabilities of type I error in the presence of effects of DE (Fig. 3c). This inflation was more pronounced when species were drawn from the log-normal distribution, that is, when effects of ID were also present.
The proposed design was able to control for type I errors under a true null hypothesis of no effect of SR, in the presence of additional effects of ID and DE (Fig. 4). The relevant test, the S × D interaction, was insensitive to changes in the effect size of density, which was properly absorbed by the corresponding term in the analysis.
Discussion
Simulations indicated that, in systems where traits are unevenly distributed across species, nesting replicated assemblages within levels of SR may not control for the effect of ID. Instead, the problem became more severe as the number of replicated assemblages increased, because more species are drawn for the treatment with the largest level of SR. Huston & McBride (2002) obtained similar results in simulated experiments, although in that study the patterns were driven by a specific mechanism of facilitative interactions among species. These results are at odds with the need to use large numbers of replicated assemblages in real experiments to increase statistical power for the main effect of diversity. This problem can be addressed by selecting species according to a constrained random selection method, rather than completely at random (Hector et al. 1999). When this restriction was introduced into the proposed design, which included four replicated assemblages in each level of SR, the rate of type I error was maintained at the nominal level of α = 0·05 (Figs 3b and 4).
Simulations indicated that substitutive designs are not appropriate to tease apart an effect of SR in the presence of density-dependent effects. Separating these two sources of variability requires factorial designs where density and SR are crossed factors. An effect of SR is identified if the effect of increasing density (or biomass) due to the addition of new species is significantly larger than the effect of increasing density (or biomass) alone (with the addition of individuals of pre-existing species). This design effectively controlled for possible effects of DE in the simulated experiments, as shown by the significance of the corresponding term in the analysis (Fig. 4).
Despite the advantages discussed above, the proposed design may still have limitations. In particular, species that appear in the most diverse assemblages do not necessarily occur under low levels of SR (e.g. species Z, P, M, R and L in Fig. 1). Thus effects of ID could still dictate the outcome of this kind of experiment, although there was no evidence for this in the simulations. A second issue is the efficiency of the proposed design when used to examine the influence of more than two levels of SR. These experiments may become very expensive as more levels of SR are included, because each level must be crossed with two or more levels of density.
Addressing these additional issues requires further elaboration of the basic design and the definition of a general framework for the experimental analysis of biodiversity. This can be summarized in the following steps: (1) define the levels of the factors SR, density and assemblage; (2) construct replicated assemblages for the lowest level of SR by sampling species randomly from a common pool of species; (3) construct replicated assemblages for a second level of SR by adding new, randomly drawn species to the original assemblages; (4) create as many levels of SR as required, each time adding new randomly drawn species to pre-existing assemblages in the immediate lower level of SR; (5) use the same combinations of species that served to increase SR at a given level to create new assemblages for the lowest level of SR, ensuring that all species occur in conditions of both high and low SR; (6) create different densities for each assemblage.
As an example of the procedure, consider an experiment with three levels of SR (three, six and nine species) crossed with two levels of density (nine and 15 units) and two replicated assemblages nested within each combination of SR and density (step 1 above; Fig. 5). The first two assemblages at the lowest level of SR consist of the random assembly of species C, S, T and B, F, R, respectively (step 2). The two assemblages at the intermediate level of SR are obtained by adding two new sets of randomly drawn species (L, Z, P and I, M, K) to those of the first and second assemblages of low SR, respectively (step 3). The two assemblages with nine species are created by adding two other sets of species (D, W, Q and Y, G, O) to the first and second assemblages of intermediate SR, respectively (step 4). Two new treatments of low SR, consisting of two replicated assemblages each, are then created with those sets of species that served to increase SR from three to six species and from six to nine species (step 5). At this stage the experiment includes three sets of assemblages at the lowest level of SR (three species each), one set at the intermediate level (six species), and one set at the highest level (nine species), with two replicated assemblages nested within each set. All combinations of species can then be established at the chosen densities to create the factorial structure (step 6; Fig. 5).
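Steps 2–6 applied to the worked example can be sketched in Python. The species letters and densities are those given in the text. Labelling the treatments a–e, with d and e as the six- and nine-species treatments, is an assumption about the layout of Fig. 5. The tuple format of the experimental units is purely illustrative.

```python
from itertools import product

# Species sets from the worked example; two replicate assemblages in each.
LOW = [("C", "S", "T"), ("B", "F", "R")]    # step 2: lowest level of SR
ADD_1 = [("L", "Z", "P"), ("I", "M", "K")]  # step 3: added to reach six species
ADD_2 = [("D", "W", "Q"), ("Y", "G", "O")]  # step 4: added to reach nine species


def build_treatments(low, add_1, add_2):
    """Return treatments (assumed labels a-e), each a list of replicate assemblages."""
    six = [base + extra for base, extra in zip(low, add_1)]
    nine = [base + extra for base, extra in zip(six, add_2)]
    return {
        "a": low,    # three species (step 2)
        "b": add_1,  # three species: sets that raised SR from 3 to 6 (step 5)
        "c": add_2,  # three species: sets that raised SR from 6 to 9 (step 5)
        "d": six,    # six species (step 3)
        "e": nine,   # nine species (step 4)
    }


def experimental_units(treatments, densities=(9, 15), n_reps=5):
    """Step 6: establish every assemblage at every density, n_reps times."""
    units = []
    for (label, assemblages), density in product(treatments.items(), densities):
        for assemblage_id, species in enumerate(assemblages, start=1):
            for rep in range(1, n_reps + 1):
                units.append((label, assemblage_id, density, rep, species))
    return units


treatments = build_treatments(LOW, ADD_1, ADD_2)
units = experimental_units(treatments)
print(len(units))
for label, assemblages in treatments.items():
    print(label, ["".join(a) for a in assemblages])
```

Five treatments, each with two assemblages, at two densities and with n = 5 give 100 experimental units. Every species in the nine-species assemblages also occurs in one of the three-species treatments.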
This design satisfies the logical requirements needed to tease apart the effect of SR from those of ID and DE. Because all species contributing to the intermediate and high levels of SR are also represented in the lower level, this design should have increased capacity to discriminate between the different components of biodiversity, compared with previous formulations. The statistical analysis of this design, however, requires considerable care in the way the total sums of squares are partitioned among the various sources of variation. A possible option is to specify sets of a priori contrasts that reflect specific hypotheses of interest (Table 1). Thus the mean square of the density × treatment interaction would be divided into three components: two fixed effects with one degree of freedom reflecting interactions between density and the contrasts of three species vs. others, and six vs. nine species, respectively, plus a random component measuring the interaction between density and the treatments with three species (Fig. 5a–c), with two degrees of freedom (Table 1). The latter comparison tests whether there are particular species (or combinations of species) within the less diversified assemblages that might be responsible for apparent effects of ID in tests contrasting increasing levels of SR. If this is not the case, then F ratios that test for the contrast between three species vs. other levels of SR can be constructed using the mean square of the term density × assemblage. If assemblages of low diversity are very different from each other, indicating that important species have been selected, a test of this contrast should be conducted with the mean square of the interaction between density and the treatments with three species as the denominator for F (Table 1). This would test for an effect of SR in more diversified assemblages, over and above possible effects of ID or SR already present in the less diversified assemblages.
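One possible set of a priori contrasts for the five treatments is sketched below. The "three species vs. others" and "six vs. nine species" contrasts follow the text. The two 1-df contrasts among the three-species treatments are one arbitrary choice spanning that 2-df component, and their sums of squares add to it. Treatment labels a–e are assumed. The four contrasts are mutually orthogonal, so their sums of squares add up to the treatment sum of squares.

With two densities, the density × treatment interaction can be partitioned the same way. The same functions are applied to each treatment's density difference (high minus low), with a weight of half the number of observations per treatment × density cell. All numeric values are invented.

```python
# Treatments (assumed labels): a-c three species, d six species, e nine species.
CONTRASTS = {
    "three species vs. others": (2, 2, 2, -3, -3),
    "six vs. nine species": (0, 0, 0, 1, -1),
    # One arbitrary pair of 1-df contrasts spanning the 2-df component
    # "among treatments with three species".
    "among three-species treatments (i)": (1, -1, 0, 0, 0),
    "among three-species treatments (ii)": (1, 1, -2, 0, 0),
}


def contrast_ss(means, coeffs, weight):
    """SS for a contrast among equally replicated means: w * (sum c*m)^2 / sum c^2."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return weight * estimate ** 2 / sum(c * c for c in coeffs)


def treatment_ss(means, weight):
    """Total SS among the means: w * sum (m - mean)^2."""
    grand = sum(means) / len(means)
    return weight * sum((m - grand) ** 2 for m in means)


# Main effect of treatment: each treatment mean is based on
# 2 assemblages x 2 densities x 5 replicates = 20 observations.
treatment_means = [5.1, 4.8, 5.3, 6.0, 6.4]  # invented values
parts = {name: contrast_ss(treatment_means, c, 20) for name, c in CONTRASTS.items()}
print(parts, sum(parts.values()), treatment_ss(treatment_means, 20))

# Density x treatment interaction with two densities: the same contrasts on each
# treatment's density difference (high minus low), weight = n_cell / 2, where
# n_cell = 2 assemblages x 5 replicates = 10 observations per cell.
density_differences = [0.4, 0.5, 0.3, 0.9, 1.2]  # invented values
inter = {name: contrast_ss(density_differences, c, 10 / 2) for name, c in CONTRASTS.items()}
print(inter, sum(inter.values()), treatment_ss(density_differences, 10 / 2))
```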
Table 1. anova from the hypothetical experiment illustrated in Fig. 5
Source of variation
Expected mean square
Denominator for F
n, number of replicate units.
a, levels of factor assemblage(treatment) (a = 2).
d, levels of factor density (d = 2).
t, levels of factor treatment (t = 5).
This test should be done only if the interaction density × three species vs. others is not significant, and assumes that is equivalent to . In this case, and if the term among treatments with three species can be eliminated from the model (not significant at α = 0·25; Winer et al. 1991), the test can be done using assemblage(treatment) as the denominator for F.
This test assumes that is equivalent to . In this case, and if the interaction density × treatments with three species can be eliminated from the model, the test can be done using the term density × assemblage(treatment) as the denominator for F.
This term can be compared with other terms resulting from the decomposition of density × assemblage(treatment) to test for the variance-reduction effect if species are sampled with replacement. Similar tests can be done by decomposing assemblage (treatment) if interactions with density are not significant. See text for further details.
Examining interactions between SR and density in factorial experiments does not require these variables to be independent in nature. Indeed, SR and density (or biomass) are likely to be correlated in many natural settings. Factorial experiments artificially break the correlation between two or more predictor variables. This is a technical requirement for analysing their possible relationships, which is done through tests of interaction terms. Thus independence of the manipulated factors is necessary for correct inference from the experiment, but it does not imply that the predictor variables must be independent in the real world.
A word of caution is also necessary on the nature of the contrast between six and nine species. This comparison may still combine effects of SR and ID, because three of the species that contribute to the treatment with the largest SR (D, W, Q and Y, G, O, respectively, for the two replicate assemblages in Fig. 5) are not represented in the treatment with six species. Although possible effects of ID should be controlled for by the treatments with three species, there is still the logical possibility that these effects become apparent only with particular combinations of species that are not represented at the lowest level of SR (e.g. species D in Fig. 5 becomes important only if it occurs together with species L due to, say, facilitative effects). A possible way to control for this effect would be to include replicate treatments with six species, as it is done for the lowest level of SR. In addition to increasing the costs of the experiment, this choice does not necessarily solve the problem because treatments with six and nine species would differ not only in terms of SR, but also in terms of the frequency with which each species appears in the different treatments. Because species would appear more than once in treatments with six species and just once in the treatment with nine species, the contrast between these treatments may include effects of both SR and ID.
Other features of the procedure require additional discussion. Steps 3 and 4 can be accomplished by selecting species either with or without replacement. Sampling with replacement produces the problem known as the ‘variance-reduction effect’, discussed at length by Huston (1997) and Huston & McBride (2002). Specifically, if species are sampled with replacement, similarity among assemblages will increase with increasing levels of SR. Consequently, assemblages made up of few species will be more variable than those including many species. This effect is likely to be more severe if species are drawn from a limited pool.
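The increase in similarity with SR can be illustrated by drawing replicate assemblages from a hypothetical pool of 30 species and reporting their mean pairwise Jaccard similarity. Here "with replacement" is taken to mean that every assemblage is drawn from the full pool, so species can recur across assemblages. "Without replacement" removes the species already used before the next draw. Pool size, number of assemblages and the choice of the Jaccard index are assumptions of this illustration.

```python
import random
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Proportion of shared species between two assemblages."""
    return len(a & b) / len(a | b)


def mean_similarity(pool_size, richness, n_assemblages, with_replacement, rng):
    """Mean pairwise Jaccard similarity among replicate assemblages of one level.

    with_replacement=True: each assemblage is drawn from the full pool, so species
    can recur across assemblages. False: species already used are removed first.
    """
    pool = list(range(pool_size))
    assemblages = []
    for _ in range(n_assemblages):
        chosen = set(rng.sample(pool, richness))
        assemblages.append(chosen)
        if not with_replacement:
            pool = [sp for sp in pool if sp not in chosen]
    pairs = list(combinations(assemblages, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def expected_similarity(pool_size, richness, n_assemblages, with_replacement,
                        n_runs=500, seed=1):
    """Average of mean_similarity over repeated random draws."""
    rng = random.Random(seed)
    return sum(mean_similarity(pool_size, richness, n_assemblages,
                               with_replacement, rng)
               for _ in range(n_runs)) / n_runs


for s in (2, 4, 8):
    print(s, round(expected_similarity(30, s, 3, True), 3),
          expected_similarity(30, s, 3, False))
```

With replacement, similarity rises with richness. For two random s-species sets from a pool of N species, the expected number of shared species is s²/N. Without replacement, similarity is zero at every level.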
Sampling without replacement, in contrast, generates distinct assemblages within all levels of SR (Fig. 5). The decision on how to select species, however, should not be dictated solely by the variance-reduction effect. Sampling without replacement may be inappropriate, and may not be feasible in systems with few species. This procedure may also prevent the inclusion of assemblages that are common in natural conditions, so sampling with replacement may add realism to the experiment. Furthermore, the influence of the variance-reduction effect can be investigated explicitly for the proposed design by comparing the mean squares of the density × assemblage interaction terms across levels of SR (Table 1). Formal comparisons of these terms can be done with appropriate F ratios (Underwood 1997).
In terms of costs and logistical constraints, the proposed design would not require much more effort than is commonly invested in classical experiments on biodiversity. The specific example considered here (Fig. 5) would consist of 100 experimental units with n = 5, which is a tractable number even with the limited resources commonly available for this kind of study. Admittedly, as it stands, the design does not enable comparisons across several levels of SR. This can be accomplished by replicating the first treatment of low SR (treatment a in Fig. 5) twice. The cost would be 20 extra units (with n = 5), but this implementation would enable independent tests of variation among assemblages with three species, as well as analysis of differences among assemblages with three, six and nine species. Alternatively, an additional level of SR can be added to the experiment at the cost of 40 extra units (20 for the new level of SR and 20 to compensate for the addition of new species).
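The unit counts quoted above follow from multiplying treatments, assemblages per treatment, densities and replicates. The small hypothetical helper `n_units` makes the arithmetic explicit, assuming two assemblages per treatment, two densities and n = 5 as in the worked example.

```python
def n_units(n_treatments, assemblages_per_treatment=2, n_densities=2, n_reps=5):
    """Experimental units = treatments x assemblages x densities x replicates."""
    return n_treatments * assemblages_per_treatment * n_densities * n_reps


base = n_units(5)                # treatments a-e of the worked example
replicate_a = n_units(6) - base  # one extra copy of treatment a
extra_level = n_units(7) - base  # new SR level plus its new low-SR treatment
print(base, replicate_a, extra_level)  # prints 100 20 40
```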
This design also guarantees true replication at all levels of diversity. That is, each assemblage would be established several times. In other experiments, by contrast, species are selected randomly for every unit, so identical sets of species may occur only within high levels of SR (discussed by Huston & McBride 2002; Schmid et al. 2002). Because the design introduces an additional level of replication, estimates of error variance should not be affected by the variance-reduction effect discussed above, making the data more tractable with conventional analytical techniques.
As is often the case in ecological experiments, statistical power may also be a problem in the proposed framework. Some of the key tests to detect effects of biodiversity may have low power, particularly if these tests use the interaction between density and the treatments with three species as the denominator for F, rather than the density × assemblage term that would involve a larger number of degrees of freedom. The judicious allocation of resources and the use of optimization procedures would assist in improving the efficiency of these experiments, as is the case for other complex experimental designs (Benedetti-Cecchi 2001).
Finally, though the discussion so far has focused on anova, it is worth noting that experimental designs based on the proposed framework can also be analysed using a regression-type approach. This would enable the analysis of trends and the exploration of response surfaces across several levels of SR and density of species. These analyses would provide estimates of parameters that could be used for modelling purposes, strengthening the dialogue between empirical and theoretical approaches to the study of biodiversity (Kinzig et al. 2001).
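As an illustration of the regression-type approach, the sketch fits the response surface y = b0 + b1·SR + b2·D + b3·SR·D by ordinary least squares. It solves the normal equations with Gauss–Jordan elimination so that only the standard library is needed. It deliberately ignores the random assemblage term, which a real analysis would have to model, for example with a mixed model. The example data are generated without noise from known coefficients to show that the fit recovers them.

```python
def fit_response_surface(sr, density, y):
    """OLS fit of y = b0 + b1*SR + b2*D + b3*SR*D.

    Builds the normal equations (X'X) b = X'y and solves them by Gauss-Jordan
    elimination with partial pivoting; returns [b0, b1, b2, b3].
    """
    rows = [[1.0, s, d, s * d] for s, d in zip(sr, density)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design is rank deficient for this model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Illustrative noise-free data from known coefficients on the crossed grid of
# the worked example (SR = 3, 6, 9; density = 9, 15; n = 5).
true_b = [2.0, 0.5, 1.5, -0.1]
grid = [(s, d) for s in (3, 6, 9) for d in (9, 15) for _ in range(5)]
sr = [s for s, _ in grid]
de = [d for _, d in grid]
y = [true_b[0] + true_b[1] * s + true_b[2] * d + true_b[3] * s * d for s, d in grid]
print(fit_response_surface(sr, de, y))
```

A non-zero b3 corresponds to the SR × density interaction of the anova formulation.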
The proposed designs may offer several advantages over previous experimental analyses of biodiversity. First, they provide the opportunity to identify effects of SR unambiguously, even in the presence of effects of DE. This is accomplished by the test of the interaction between SR and density. Second, they control successfully for effects of ID while still enabling a test of the hypothesis that identity of species is important. This is provided by the test of the nested term assemblage and by variation among replicated assemblages at the lowest level of SR (either in interaction with density or in isolation). Third, they preserve the flexibility of more traditional designs to examine effects of different categorizations of biodiversity simultaneously (Hector et al. 1999; Emmerson et al. 2001). For example, number of species could be manipulated at each of several trophic levels to test the hypothesis that loss of biodiversity may have different effects depending on whether producers, consumers or decomposers disappear from the system.
It is important to note that the results of the present study cannot determine whether past experiments on biodiversity have been influenced by the biases discussed, or the extent to which these problems may affect real studies. For example, effects of ID may be more pronounced in simulations than under natural settings, where the pool of available species (and therefore the number of traits that can be sampled) is finite. The results presented here show that current designs do not solve either previously recognized problems or less appreciated biases in the experimental analysis of biodiversity. Disagreements about the relationship between biodiversity and the functioning of natural systems largely reflect a lack of consensus on how these relationships have been investigated. Along with the identification of novel analytical procedures (Allison 1999; Loreau & Hector 2001; Petchey 2003), the development of more robust experimental designs is an important step towards improving analyses of causal effects of biodiversity. The general framework proposed in this paper is intended to contribute to these issues by providing ecologists with additional tools to investigate the relative importance of richness, density and identity of species in maintaining fundamental ecological processes.
Acknowledgements
I thank M. J. Anderson, F. Micheli and the anonymous reviewers for helpful comments and criticism on the manuscript. This paper is based on material presented at the High-Level Scientific Conference ‘Biodiversity of Coastal Marine Ecosystems: A Functional Approach to Coastal Marine Biodiversity’, supported by the European Commission. The completion of this paper benefited from discussions with colleagues at kick-off meetings of the MARBEF (Marine Biodiversity and Ecosystem Functioning) network of excellence.