*Daniel P. Faith, Australian Museum, 6 College St., Sydney, NSW, 2010, Australia. E-mail: email@example.com
Biodiversity assessment requires that we use surrogate information in practice to indicate more general biodiversity patterns. ‘ED’ refers to a surrogates framework that can link species data and environmental information based on a robust relationship of compositional dissimilarities to ordinations that indicate underlying environmental variation. In an example analysis of species and environmental data from Panama, the environmental and spatial variables that correlate with an hybrid multi-dimensional scaling ordination were able to explain 83% of the variation in the corresponding Bray Curtis dissimilarities. The assumptions of ED also provide the rationale for its use of p-median optimization criteria to measure biodiversity patterns among sites in a region. M.B. Araújo, P.J. Densham & P.H. Williams (2004, Journal of Biogeography, 31, 1) have re-named ED as ‘AD’ in their evaluation of the surrogacy value of ED based on European species data. Because lessons from previous work on ED options consequently may have been neglected, we use a corroboration framework to investigate the evidence and ‘background knowledge’ presented in their evaluations of ED. Investigations focus on the possibility that their weak corroboration of ED surrogacy (non-significance of target species recovery relative to a null model) may be a consequence of Araújo et al.'s use of particular evidence and randomizations. We illustrate how their use of discrete ED, and not the recommended continuous ED, may have produced unnecessarily poor species recovery values. Further, possibly poor optimization of their MDS ordinations, due to small numbers of simulations and/or low resolution of stress values, appears to have provided a possibly poor basis for ED application and, consequently, may have unnecessarily favoured non-corroboration results.
Consideration of Araújo et al.'s randomizations suggests that acknowledged sampling biases in the European data have not only artefactually promoted the non-significance of ED recovery values, but also artefactually elevated the significance of the recovery values of competing species surrogates. We conclude that little credence should be given to the comparisons of ED and species-based complementarity sets presented in M.B. Araújo, P.J. Densham & P.H. Williams (2004, Journal of Biogeography, 31, 1), unless the factors outlined here can be analysed for their effects on results. We discuss the lessons concerning surrogates evaluation emerging from our investigations, calling for better provision in such studies of the background information that can allow (i) critical examination of evidence (both at the initial corroboration and re-evaluation stages), and (ii) greater synthesis of lessons about the pitfalls of different forms of evidence in different contexts.
As workers in biodiversity conservation planning, we think about biodiversity as presenting us with the challenge of trying to ‘know the unknowable’. It is not possible in practice to know all the species present in any one place, let alone know all those in a given region in which biodiversity conservation planning is to take place. This recalls the early characterization of biodiversity as a symbol for our lack of knowledge (Wilson, 1988). Biodiversity conservation consequently presents us with the problem of dealing with ‘option values’. Option values relate to unknown, future, values, and imply that any measure of variation/diversity itself corresponds to a measure of value (e.g. Faith, 2003a). This perspective demands attention to what might be called the ‘hard’ problem of biodiversity assessment – quantifying patterns over all of biodiversity – and perhaps less attention to the ‘soft’ problem of quantifying those components of biodiversity that we happen to value right now.
The ‘hard’ problem requires that we use surrogate information in practice to indicate more general biodiversity patterns. Planning decisions, such as allocation of economic incentives (e.g. Faith et al., 2003), will often use estimated ‘complementarity’ values (marginal gains in the amount of biodiversity represented/protected when adding a site to an existing set of sites); effective surrogates must provide complementarity values that indicate or ‘predict’ the complementarity values we would obtain if we could measure all of biodiversity (Faith and Walker, 1996a; Faith, 1996).
It is not possible (at least here) to summarize the extensive literature on the evaluation of various biodiversity surrogates (see Margules & Pressey, 2000; Sarkar & Margules, 2002, for entry to some of the literature). The extent of this overwhelming, somewhat disconnected, literature perhaps provides a lesson in itself. Arguably, there has not been much synthesis of the lessons (if any) learned along the way. One reason may be that re-discovery and re-naming of surrogates approaches and tests has promoted a sense of novelty and progress, but in fact has done little for synthesis. But another fundamental reason for lack of synthesis may be that, in trying to develop ways to use surrogates to get to know the ‘unknowable’, there has been little attention to fundamental issues of epistemology and ‘growth of knowledge’. This may be because biodiversity assessment is a young discipline, or perhaps it arises because biodiversity conservation is so multi-disciplinary – extending from theory to policy, and from genes to global scales, and so on.
We can contrast this experience with that in another, closely related, biological subdiscipline – systematics – in which philosophy of science has played a big role (sometimes to its detriment; see Hull, 1999). Here again, the problem is to try to know the ‘unknowable’ (the true phylogeny of any group of taxa arguably will never be known), and so systematists have addressed questions about the nature of their evidence statements and their tests for phylogenetic hypotheses. While adoption of a standard Popperian falsificationist perspective on the nature of evidence arguably has not been successful in systematics (Faith & Trueman, 2001), our position is that Popperian philosophy remains useful, in systematics and elsewhere, through a shift in focus to the more constructive process of corroboration. Corroboration assessment simply asks, ‘is this supposed good evidence that we have observed for our hypothesis something that quite probably could have been observed even without the hypothesis?’. Thus, corroboration assessment looks for plausible alternative explanations for the evidence based on what Popper calls our ‘background knowledge’. We have some degree of corroboration for an hypothesis only when our best efforts to ‘explain away’ evidence nevertheless suggest that the evidence looks quite improbable without considering the hypothesis (for examples in systematics and other areas, see Faith, 2004).
The corroboration assessment process naturally is all about scepticism: not only the hypothesis may be questioned and re-evaluated, but also the evidence and the background knowledge. It is this constructive scepticism that we will turn to below in examining the tests and evidence presented in Araújo et al. (2004) for the evaluation of a specific surrogates strategy, ‘ED’ (Faith & Walker, 1993, 1996a,b,c).
First, we will briefly review the ED surrogates approach (and try to circumvent potential confusions caused by Araújo et al.'s re-naming of ED as ‘AD’). We then examine the specific evidence relating to ED surrogacy that was introduced by Araújo et al. (2004), showing how careful consideration of their non-corroboration findings suggests weaknesses in the evidence for ED surrogacy that was put forward. We will also examine their implicit background knowledge – based on randomization – and discuss weaknesses in this aspect of their evaluations as well.
What is ‘ED’?
‘ED’ stands for ‘environmental diversity’ and refers to a specific surrogates framework (Faith & Walker, 1993, 1994, 1996a,b,c; Faith, 1994, 2003b) based on (i) robustness of compositional dissimilarities and ordinations to an assumed model of unimodal species’ responses to environmental gradients, and (ii) the consequent use of p-median (and related) optimization criteria to measure biodiversity. The ED method typically uses species surrogate data as a starting point (Faith & Walker, 1996b; Faith & Ferrier, 2002a; Faith, 2003b and references within), but, as the name suggests, sees the critical link to a pattern that reflects underlying ‘environmental diversity’ as the basis for using those species data to make assertions about biodiversity more generally.
Environmental diversity is one example of a general pattern framework for exploring biodiversity (‘PD’ or phylogenetic diversity, is another example; see, for example, Faith et al., 2004). In this pattern approach, ‘objects’ of interest form a pattern, allowing inferences about diversity at some lower level of ‘features’. For ED, the objects (typically) are sites (geographical places), and pattern-relationships among these objects, as represented by an ordination pattern, are assumed to indicate relative numbers of underlying features, corresponding to species or similar units. The pattern (the ordination or sometimes the matrix of pairwise dissimilarities/distances among sites), interpreted as reflecting underlying environmental variation, provides predictions of the degree of biodiversity complementarity of a site to any given set of sites. A site that is environmentally similar to sites already in a set is expected to have lower complementarity. Because this similarity is not calculated by directly observing the features of interest, but only by examining the pattern, this is a form of ‘pattern-based complementarity’.
Critical to ED is the way in which ‘environmentally similar’ is interpreted. The general unimodal response model (for brief review, see Faith, 2003b) leads directly to the use of the p-median criterion (Faith & Walker, 1993, 1996b). Identifying the set of p sites (out of all N) that has the best ED value is an example of a ‘p-median’ problem. A p-median in general calculates the distances from all ‘demand points’ to their nearest ‘locality sites’. For the ED method, the demand points can be derived by defining each of the original N sites as a point that must be represented (the ‘discrete p-median’ version of ED, used, for example, on the original dissimilarities matrix or distances estimated from an ordination or other analysis; Faith & Walker, 1993, 1994). The ‘continuous p-median’ version of ED is based on the assumption that the demand points can be found throughout the continuous environmental space. Faith & Walker (1996b) demonstrate that only for this continuous case is there a clear link from the unimodal model to an interpretation of the p-median as counting-up numbers of species. This link depends on a robustness argument and the consequent rationale for (i) a specific choice of biotic/compositional dissimilarities among sites (Bray Curtis dissimilarities as most robust; see Faith et al., 1987) and (ii) the choice of hybrid multi-dimensional scaling (‘HMDS’; Faith et al., 1987) as the ordination method that links dissimilarities to environmental space.
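The discrete p-median criterion described above can be illustrated with a minimal Python sketch. The coordinates and the function name `p_median_value` are hypothetical and stand in for ordination output; this is not the diversity-ed implementation. Each site acts as its own demand point, and the criterion sums each demand point's distance to its nearest selected site, so a lower value indicates better expected representation.

```python
import math

def p_median_value(demand_points, selected_sites):
    """Sum, over demand points, of the distance to the nearest selected site."""
    return sum(
        min(math.dist(x, s) for s in selected_sites)
        for x in demand_points
    )

# Hypothetical 2-D ordination coordinates: three sites in one region of the
# space and two sites in another.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Discrete ED: every site is itself a demand point.
one_region = p_median_value(sites, [sites[0], sites[1]])    # both picks in one region
both_regions = p_median_value(sites, [sites[0], sites[3]])  # one pick per region

print(one_region, both_regions)
```

Selecting one site from each region gives the lower p-median value, which is the behaviour the criterion is meant to reward.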
When the goal is representation of species, the model and associated robust methods imply that representation is maximized if and only if the (continuous) p-median is used (for proof see Faith & Walker, 1996b). However, when other factors are introduced, the model implies modified criteria. For example, when we assign probabilities (of expected species persistence or ‘presence’) to sites (see Faith, 1995; Faith & Walker, 1996c), the p-median, which strictly depends on nearest neighbours, is relaxed, and the total estimated diversity now depends on summation over ordered nearest neighbours. Suppose for any demand point, $x$, we are given ordered nearest neighbour sites $i, j, k, \ldots$ with probabilities of absence or extinction (of member species) equal to $p_i$, $p_j$, $p_k$, etc. Let $d_{xi}$ = distance or dissimilarity of $x$ to site $i$, etc. Then the expected number of species lost, among all those associated with demand point $x$, is given by:

$$L_x = d_{xi} + p_i\,(d_{xj} - d_{xi}) + p_i p_j\,(d_{xk} - d_{xj}) + \cdots$$

and the total probabilistic-ED value is equal to the sum over all demand points, $x$, of $L_x$ values. When probabilities are all 1 or 0 (a site is ‘selected’ or not) the formula is consistent with p-median versions of ED. For example, if the probability of extinction of member species at site $i$ is 1 (it was not ‘selected’) and at $j$ is 0 (it was ‘selected’), then the formula behaves as if $j$ is the nearest (selected) neighbour of demand point $x$ in the calculation of the p-median for ED. Probabilistic-ED is useful, for example, when the ‘environmental’ space is geographic space (‘geo-sampling’ of Faith & Walker, 1994), the probabilities are derived from species–area or accretion curves, and the goal is to identify new survey sites to increase overall regional survey representation of species.
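As a hedged illustration of the ordered nearest-neighbour summation, the sketch below assumes the expected loss for one demand point takes the form L_x = d_xi + p_i(d_xj − d_xi) + p_i p_j(d_xk − d_xj) + …, truncated at the farthest listed site. That form is an assumption chosen to reproduce the 0/1 behaviour described above; the function name and the example numbers are hypothetical.

```python
def expected_loss(neighbours):
    """Expected loss for one demand point.

    neighbours: list of (distance, p_absent) pairs, one per site, where
    p_absent is the probability that member species are absent or extinct.
    """
    loss = 0.0
    prob_all_nearer_absent = 1.0  # probability that every nearer site has failed
    previous_distance = 0.0
    for distance, p_absent in sorted(neighbours):
        # The increment between successive neighbours counts only if all
        # nearer sites have failed.
        loss += prob_all_nearer_absent * (distance - previous_distance)
        prob_all_nearer_absent *= p_absent
        previous_distance = distance
    return loss

# Site i (distance 1) not selected (p = 1); site j (distance 3) selected (p = 0).
print(expected_loss([(1.0, 1.0), (3.0, 0.0), (5.0, 0.5)]))  # 3.0, the distance to j
```

With all probabilities equal to 0 or 1 the result reduces to the distance to the nearest selected site, which is the p-median contribution of that demand point.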
Applications of ED for identifying survey sites were pioneered by Ferrier and colleagues in NSW, Australia, and elsewhere (Ferrier, 2002). This applied context has led to important extensions of ED. One is the improved provision of predicted dissimilarities (and distances) between all sites in a region, as a way to address the long-standing problem of biotic data that is not available at all sites of interest in a given region. This method builds on the idea of regression-type modelling of known compositional dissimilarities using environmental distances. As a modification of strictly linear dissimilarity/distance regression, ‘generalized dissimilarity modelling’ (GDM) is a nonlinear method allowing for non-constant rates of turnover of species abundance along gradients (Ferrier, 2002).
Ferrier et al. (2002) argued that such ‘predictions from modelling of compositional dissimilarity could be used more directly to prioritise and select conservation areas, by employing techniques such as the environmental-diversity (ED) approach…’ S. Ferrier and colleagues (unpublished data) recently have applied GDM and a variant of ED at the global scale, illustrating how biotic data from a wide variety of sources, when modelled through GDM, can provide dissimilarities/distances among all sites, which then are interpreted using ED methods.
While ED may use dissimilarities (or distances) directly, through the discrete version of ED (Faith & Walker, 1993, 1994; Faith & Walker, 1996a,b,c), the predicted dissimilarities or distances information from the GDM approach may be used most effectively in the continuous ED method. We present a simple example application in the following section, and will refer to it again in later discussion of Araújo et al.'s analyses.
The Panama example
A dramatic example of the strong links, provided by GDM, from environmental data to dissimilarities is the analysis (Faith & Ferrier, 2002a,b) of biotic (plant species) and environmental data from Panama. Here, we extend these results to illustrate how ED can enable use of this information in conservation planning.
As background, Duivenvoorden et al. (2002) examined environmental and spatial explanations of beta diversity for these Panama data (see also Condit et al., 2002) and concluded that the environmental data was a poor explanator of beta diversity: ‘59% of the variation in species dissimilarity remained unexplained by either distance or environment’. Faith & Ferrier (2002a,b) argued that the poor explanation was a consequence of the restrictive assumption of a uniform linear relationship between dissimilarity and distance. They used GDM to explore the prediction of dissimilarities based on environmental (and spatial) distances, and found in contrast that species turnover (expressed as Bray Curtis dissimilarities among sites) in Panama can be explained well by a combination of spatial and environmental distances. An impressive 83% of variation in Bray Curtis dissimilarities could be explained by a GDM model applied to the same data that had been used as a prime example of poor explanatory power of environmental variables.
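For readers unfamiliar with the dissimilarity measure being modelled, the following minimal sketch computes the standard Bray Curtis dissimilarity between two sites from species abundance vectors. The abundance values are hypothetical and are not taken from the Panama data.

```python
def bray_curtis(abundances_a, abundances_b):
    """Bray Curtis dissimilarity between two sites' species abundance vectors."""
    total = sum(abundances_a) + sum(abundances_b)
    if total == 0:
        raise ValueError("both sites are empty; dissimilarity is undefined")
    # Sum of absolute abundance differences, scaled by total abundance.
    difference = sum(abs(a - b) for a, b in zip(abundances_a, abundances_b))
    return difference / total

site_1 = [1, 0, 3]  # hypothetical abundances of three species
site_2 = [0, 2, 3]
print(bray_curtis(site_1, site_2))
```

The value ranges from 0 (identical abundances) to 1 (no species shared).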
In demonstrating that realistic nonlinear dissimilarity/distance models can account for beta diversity patterns, these results highlight the prospects for using available environmental data in combination with available species’ survey data (e.g. from museum collections), in a way that overcomes the weaknesses of using either information on its own (see also Faith & Walker, 1996b; Ferrier, 2002). In Panama, a next step might be to calculate an ordination space based on estimated dissimilarities or distances among all sites. As an initial exploration of this step, Faith & Ferrier (2002b) used the calculated dissimilarities to create an HMDS ordination for ED analysis, applying patn software (Belbin, 1994). Based on this ordination, we present here (Fig. 1) a simplified ED analysis for the Panama plant survey data, using only the first two dimensions of the six dimensional HMDS space.
As preparation for the continuous version of ED, the boundaries of the demand points must be defined in the environmental (ordination) space, typically using convex hull options or user defined boundaries (Faith & Walker, 1994, 1996b, discuss procedures for considering assumptions about the boundaries of the space and weighting schemes applied to demand points addressing possible variation in species richness over the space). In Fig. 1a, diamonds represent a hypothetical example of the designation of demand points in the space. In Fig. 1b, 100 Panama sites are shown as dots in this HMDS space, where larger dots indicate the first five sites selected under the simple stepwise selection option in diversity-ed v. 2.1 (for worked examples of site selection, including non-‘greedy’ algorithms, see Faith & Walker, 1994).
Figure 1c shows the curve tracing successive gains in expected species representation, corresponding to reduction in ED's p-median value, as the five sites are selected. This is a simple form of a trade-offs curve for planning in which cost of additional sites may be balanced against biodiversity gains (for examples, see Faith, 1995). Here, ED may be used without any targets (e.g. percentage targets) set for biodiversity; instead, the total budget, sensitivity analysis, or other factors may indicate the final set. Using such curves, future planning analyses in Panama might identify cost-effective conservation options based on filling gaps in ordination space. Because the ED approach provides complementarity estimates, in practice, actual applications may focus less on sets of sites and more on identifying sites of high ‘ED-endemism’– sites having high complementarity even when all other sites are already selected (the ‘complementarity hotspots’ defined by Faith & Walker, 1996a).
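The stepwise selection and the resulting trade-offs curve can be sketched as follows. This is a simplified illustration under stated assumptions, not the diversity-ed algorithm: demand points are placed on a regular grid over a hypothetical two-dimensional ordination space, the candidate coordinates are invented, and sites are added greedily so as to minimize the p-median value at each step.

```python
import math

def p_median_value(demand_points, selected):
    """Sum of distances from each demand point to its nearest selected site."""
    return sum(min(math.dist(x, s) for s in selected) for x in demand_points)

def greedy_select(demand_points, candidates, n_pick):
    """Add, at each step, the candidate giving the lowest p-median value."""
    selected, curve = [], []
    remaining = list(candidates)
    for _ in range(n_pick):
        best = min(
            remaining,
            key=lambda c: p_median_value(demand_points, selected + [c]),
        )
        selected.append(best)
        remaining.remove(best)
        curve.append(p_median_value(demand_points, selected))
    return selected, curve

# Continuous-style demand points: a 5 x 5 grid over the (assumed) bounded space.
demand = [(float(i), float(j)) for i in range(5) for j in range(5)]
candidates = [(0.5, 0.5), (0.6, 0.4), (3.5, 3.5), (2.0, 2.0), (0.5, 3.5)]

selected, curve = greedy_select(demand, candidates, 3)
print(selected)
print(curve)  # p-median value after each added site
```

The `curve` list plays the role of the trade-offs curve: each added site can only lower (or leave unchanged) the p-median value, and the size of each drop is the complementarity of the site added at that step.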
The Panama analyses using GDM and ED illustrate the potential for bringing together environmental and available biotic data, and so serving the original stated goal for ED of making ‘best-possible’ use of both kinds of information (Faith & Walker, 1996b).
Araújo et al.'s re-naming of ED
While ED applications, particularly to survey design, have been pursued, there has been little formal evaluation of ED after Ferrier & Watson (1997) and Faith & Walker (1996a). Araújo et al. (2001, 2004) therefore provide welcome evaluation efforts in their ED analyses of European biotic and environmental data. However, the reader may have difficulty linking their described methodology to the existing ED literature. The first problem is that Araújo et al. (2004) appear to have used ‘ED’ as a general term to refer to analyses based on any form of environmental variation. The second problem is that they refer to the method that they explore, which is equivalent to ED, as ‘AD’, stating that ‘The assemblage diversity (AD) approach used in this paper is in effect the biotic version of the environmental-diversity framework (ED) proposed by Faith & Walker (1996b)’. This might be interpreted wrongly to mean that AD is being introduced as a novel, biotic, extension of ED. The brief review of ED above demonstrates that ED, in typically using biotic data for its p-median optimization, subsumes their specific approach.
It is ironic that, in a paper whose primary focus is on applying and evaluating the ED method, the method was given another name, and the ‘ED’ term was used instead to refer to other things. Names aside, we might expect that the consequent weak connections to previous ED literature might obscure previous lessons. For example, Araújo et al. propose alternatives to HMDS and Bray Curtis dissimilarities with no discussion of the robustness derived from these specific choices as critical to ED. Further, proposals are made, without reference to past work, for ED strategies that in fact are already being applied. For example, Araújo et al.'s ‘future prospects’ include ‘calculating p-median solutions in the original dissimilarity matrices’, considering ‘hybrid approaches where available biological data are combined with environmental descriptions of areas’, and a ‘candidate strategy deserving appropriate evaluation is the use of GDM’. All of these ED strategies have been explored in the past and continue to be applied (see above discussion).
More important are possible misunderstandings in Araújo et al.'s approach to evaluation of ED. We argue below that there are indeed serious weaknesses in their evaluation of ED, and that these can be understood through corroboration issues arising from their choice of evidence and background knowledge.
Corroboration and evidence
We can interpret Araújo et al.'s measures of ED's degree of representation (recovery) of target species as evidence for an hypothesis that ED has surrogacy value. Further, their comparison with recovery achieved from randomized selection of sites can be seen as one way to potentially ‘explain away’ supposed positive evidence from some observed level of recovery for ED. For example, a given recovery level might be shown to be not so improbable (say > 0.05) even when selecting sites at random.
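A randomization comparison of this kind can be sketched in a few lines. The code below is a generic illustration, not Araújo et al.'s procedure: the site-by-species data, the function names and the number of random draws are all hypothetical. It estimates how often a random set of the same size recovers at least as many target species as the observed set.

```python
import random

def species_recovered(selected_sites, site_species):
    """Number of distinct target species present in the selected sites."""
    return len(set().union(*(site_species[s] for s in selected_sites)))

def randomization_p_value(observed, site_species, n_sites, n_draws=999, seed=0):
    """Proportion of random site sets recovering at least `observed` species."""
    rng = random.Random(seed)
    all_sites = list(site_species)
    at_least_as_good = sum(
        species_recovered(rng.sample(all_sites, n_sites), site_species) >= observed
        for _ in range(n_draws)
    )
    # Count the observed set itself as one member of the reference distribution.
    return (at_least_as_good + 1) / (n_draws + 1)

# Hypothetical data: four sites and the target species recorded in each.
site_species = {
    "s1": {"a", "b"},
    "s2": {"a", "b"},
    "s3": {"c"},
    "s4": {"d", "e"},
}
observed = species_recovered(["s1", "s4"], site_species)
print(observed, randomization_p_value(observed, site_species, n_sites=2))
```

A large p-value means that the observed recovery is easily 'explained away' by chance selection, so that little corroboration is gained.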
Araújo et al. found that ED recovery was sometimes significant, sometimes not. But this finding of only weak corroboration (or non-corroboration) properly may lead to doubts about the evidence put forward, or the nature of the alternative explanations (background knowledge), as much as about the hypothesis itself. We begin with some concerns about the presented evidence for ED, arising from Araújo et al.'s use of discrete ED.
Discrete vs. continuous ED
One limitation of the Araújo et al. study appears to lie in the use of a less effective version of ED, based on the discrete p-median (see also Faith & Walker, 1996b and discussion in Faith, 2003b). Use of a less effective implementation of ED clearly would affect the capacity to draw conclusions about the true performance of the method.
The contrast between the two versions of ED can be seen using a hypothetical HMDS ordination space (Fig. 2). In this space, different geographical places (sites) are represented as dots; the 10 large dots indicate already protected sites and small dots are candidate sites for selection. Suppose that one of these candidates is to be selected so as to maximize expected gain in species representation. The contrast between discrete and continuous ED can be seen in the differences in ratings of site ‘A’ vs. sites around ‘B’ in the space.
We used the software package, diversity-ed (version 2.1; Faith & Walker, 1994) to perform both versions of ED analysis on these data. Continuous ED (Fig. 2a) recognizes A as filling a large representation gap. It has an ED complementarity of 222 units as compared with complementarity values ranging from 10 to 23 units for the best sites within the B region of the space. In contrast (Fig. 2b), under the discrete ED method, site A is not recognized as having high complementarity. Instead, several sites within B are identified by this method as the ones providing the greatest increase in representation.
A conclusion is that, while ED sees the environmental space as reflecting turnover among species, geographical space can disrupt this information when the discrete version of ED is used as in Araújo et al.'s (2004) study. That is, any extensive geographical duplication of essentially the same portion of environmental space causes a bias through the consequent redundancy of p-median demand points. This problem is related to the weakness of discrete ED documented in Faith & Walker (1996b), who recommended continuous over discrete ED.
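The redundancy effect can be reproduced with a deliberately simple, hypothetical one-dimensional example (the numbers below are invented and are unrelated to Fig. 2). One protected site sits at position 5 on an environmental gradient running from 0 to 10, candidate A sits alone at 9, and region B consists of 20 geographical duplicates at position 4. Complementarity is taken as the reduction in the p-median value when a candidate is added.

```python
def p_median_value(demand, selected):
    """Sum of distances from each demand point to its nearest selected site."""
    return sum(min(abs(x - s) for s in selected) for x in demand)

def complementarity(demand, selected, candidate):
    """Reduction in the p-median value gained by adding one candidate site."""
    return p_median_value(demand, selected) - p_median_value(
        demand, selected + [candidate]
    )

protected = [5.0]
site_a = 9.0               # isolated candidate in an unrepresented part of the gradient
cluster_b = [4.0] * 20     # 20 duplicate candidates close to the protected site

# Discrete ED: the sites themselves are the demand points.
discrete_demand = protected + [site_a] + cluster_b
# Continuous ED: demand points spread evenly along the whole gradient.
continuous_demand = [float(g) for g in range(11)]

gains = {
    "discrete": {
        "A": complementarity(discrete_demand, protected, site_a),
        "B": complementarity(discrete_demand, protected, cluster_b[0]),
    },
    "continuous": {
        "A": complementarity(continuous_demand, protected, site_a),
        "B": complementarity(continuous_demand, protected, cluster_b[0]),
    },
}
print(gains)
```

In this toy case the discrete version rates a site in B above A (20 vs. 4 units) purely because B's duplicates each count as a demand point, whereas the continuous version rates A above B (10 vs. 5 units).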
This problem may be a serious one for some real-world data. We recently examined the Australian Museum's skink (the lizard family Scincidae) distribution data as part of a larger ED study. In NSW, Australia, this collections data base consists of 83 species over c. 3500 grid cells in NSW. We found that 1875 of these cells were identical to one or more others in their species composition, with large numbers of duplications occurring in the more arid western portion of the state. This is in accord with the finding (Faith et al., 2003) of large numbers of near-identical cells based on environmental data.
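A count of this kind is straightforward to reproduce for any grid-cell data set. The sketch below uses hypothetical cell identifiers and species lists (not the skink data) and counts the cells whose species composition is identical to that of at least one other cell.

```python
from collections import Counter

def count_duplicated_cells(cell_species):
    """Number of cells sharing an identical species list with another cell."""
    signatures = {cell: frozenset(spp) for cell, spp in cell_species.items()}
    counts = Counter(signatures.values())
    return sum(1 for sig in signatures.values() if counts[sig] > 1)

# Hypothetical grid cells and the species recorded in each.
cells = {
    "c1": {"sp_a", "sp_b"},
    "c2": {"sp_b", "sp_a"},
    "c3": {"sp_a"},
    "c4": {"sp_c"},
    "c5": {"sp_c"},
}
print(count_duplicated_cells(cells))  # 4
```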
This cautionary tale does not mean that we would never see a case where a cluster of sites in some environmental space is highly heterogeneous in species composition, such that repeated sampling from the cluster would increase representation. Of course, such a case implies that space has not modelled turnover in species composition well. The in-principle possibility of poor models can never be ruled out. This situation will arise, for example, when a small number of environmental variables are used to derive a low-dimensional space and key environmental variables are missing [see discussion of Araújo et al.'s (2001) analysis in Faith, 2003b].
We have highlighted a weakness of discrete ED that may mean that ED-based recovery of species in the Araújo et al. study was less than it would have been under the recommended continuous version of ED, so unnecessarily biasing results towards non-corroboration. The next section considers ordination results from Araújo et al.'s study and casts further doubt on the evidence underlying their weak corroboration findings.
If input Bray Curtis dissimilarities are not well modelled by ordinations, then ED applied to that poor ordination (or its implied distances when discrete ED is used) will provide only poor evidence for the surrogacy hypothesis.
Our brief review of ED indicated that the ED model depends upon the robustness gained from using HMDS, Bray Curtis dissimilarities and the continuous p-median. Evidence for ED performance therefore logically is linked to these factors. Araújo et al. did not use HMDS. However, our conjecture is that the use of NMDS (‘non-metric’ MDS) may have had a relatively small effect on the success of the ordination models. NMDS would present problems if a poor starting configuration were used, from which NMDS iterations cannot ‘escape’. But it is not possible to judge from the Araújo et al. analyses whether this should be a concern, as analysis details on starting configurations, number of iterations, stopping rules and so on were not provided.
We will focus then on Araújo et al.'s results from the NMDS ordinations as approximations to HMDS and ignore their results based on the less robust ordination methods (for background discussion on other methods see Minchin, 1987). In this context, we suggest re-consideration of one of Araújo et al.'s major conclusions. They argued, ‘we showed that using a particular implementation of the assemblage-diversity idea for one assemblage could provide a suboptimal representation even for species belonging to the assemblage being sampled’, but in fact NMDS-based ED produced significant departure from random in every one of these cases (see their Table 3).
The NMDS-based ED therefore produced some encouraging results, but is the evidence presented for ED as good as it could have been? We have argued that the use of discrete ED raises concerns. Our other main concern is in the search for optimal MDS solutions. There are two optimality issues: the optimality of the solution for any nominated dimensionality and the optimality of the choice of number of dimensions. We explore these issues by examining three aspects of the presented results that raise concerns.
(1) No difference in ‘stress’ (MDS ‘badness-of-fit’ of dissimilarities to output distances) was found on two occasions between two- and three-dimensional solutions: ‘the addition of a third axis did not always improve the method's ability to represent the original ranked distances in ordination space. This was the case for breeding birds and mammals that showed constant minimum stress values after 10 simulations for both the 2- and 3-D ordinations’ (emphasis added). Such a result is exceedingly improbable in correct MDS analyses. As an indication of the more typical MDS results that can be expected, for the Panama analyses we see that three dimensions further increases explanatory power, with an associated lower stress value (Fig. 3). Data sets with more species and more sites, as for the European data, would be expected to produce even greater stress reductions.
(2) In the table in their Appendix 1, they refer to 10 iterations used for each MDS. This number would be much too small for anything but a trivial data set. Our guess is that this is a misrepresentation of the analyses. What appears to have been the number of random starts or ‘simulations’ (see above) may have wrongly been called ‘iterations’. Unfortunately, this possible clarification does not escape difficulties. Only 10 random starts may have been used in the search for the optimal MDS solution; this might be adequate for typical MDS analyses with about 100 sites (although even for the Panama analysis of 100 sites we sometimes required 100 random starts), but is unlikely to find good solutions with 2500 sites. For example, in analyses of insect species distribution data for about 400 sites in the wet tropics of Queensland, Australia, (K. Richardson et al., unpublished data) the number of random starts was important. For 10 HMDS runs, the lowest observed stress was 0.3240, but in 100 runs, the lowest stress was 0.3163. Using only 10 runs missed the more optimal solution.
(3) A low degree of resolution of MDS stress differences, to only two decimal places, is reported by Araújo et al., suggesting that alternative MDS solutions are not well discriminated.
There may be an interaction between the small number of runs and the low resolution of stress values. Returning to the Queensland example, we note that if only two decimal places had been used to resolve stress, we would have been happy with the less optimal solution. Further, this stress value would have been seen in nine of the 10 runs, incorrectly suggesting that no additional runs were needed.
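The effect of reporting precision can be checked directly with the two Queensland stress values quoted in the text (0.3240 from 10 runs and 0.3163 from 100 runs). The short sketch below only formats those two numbers; the variable names are illustrative.

```python
stress_10_runs = 0.3240   # lowest stress found in 10 HMDS runs
stress_100_runs = 0.3163  # lowest stress found in 100 HMDS runs

two_places = (f"{stress_10_runs:.2f}", f"{stress_100_runs:.2f}")
four_places = (f"{stress_10_runs:.4f}", f"{stress_100_runs:.4f}")

print(two_places)   # the two solutions look identical
print(four_places)  # the better solution is distinguishable
```

At two decimal places both values are reported as 0.32, so the less optimal solution cannot be told apart from the better one.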
Not only does this suggest that the optimum for two-dimensional NMDS runs may not have been found, but also it helps explain why on occasion three dimensions showed the same ‘constant’ stress value. With this low resolution, stress would provide misleading indication of constancy, even while additional dimensions in fact would have improved explanation of dissimilarities. Put another way, if we cannot detect stress value improvement with additional dimensions when we have, say, 100 species, then we have not measured stress with enough resolution. This will be more critical for larger data sets. The number of sites in Araújo et al.'s study was about 2500, producing about 3,100,000 dissimilarities used to compute stress. With many sites, a dramatic change in one site might correspond to a relatively small numerical change in stress.
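The figure of about 3,100,000 dissimilarities follows from the number of distinct site pairs, n(n − 1)/2. The sketch below checks that arithmetic for roughly 2500 sites and also shows, as a rough indication only, how small a share of those pairs involves any single site. The function names are illustrative.

```python
def n_pairs(n_sites):
    """Number of distinct pairwise dissimilarities among n_sites sites."""
    return n_sites * (n_sites - 1) // 2

def share_of_pairs_involving_one_site(n_sites):
    """Fraction of all pairwise dissimilarities that involve a given site."""
    return (n_sites - 1) / n_pairs(n_sites)

print(n_pairs(2500))                            # 3123750
print(share_of_pairs_involving_one_site(2500))  # 0.0008, i.e. 2 / 2500
```

Because any one site contributes to well under 0.1% of the dissimilarities entering the stress calculation, a marked change in that site's position need not shift stress by much, which is why coarse rounding of stress is more hazardous for large data sets.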
We conclude that some combination of a low number of random starts, combined with low resolution of stress values, may well have created less optimal solutions in one or more of the NMDS runs in the Araújo et al. study. Naturally, ED will not be able to recover diversity patterns if key ordination dimensions are missing or obscured. As a contrast, consider an ordination summary for the Panama analyses (Table 1). It shows how increased HMDS dimensions for these data pick up correlates with measured environmental variation (Faith & Ferrier, 2002b, discuss evaluation of the number of significant dimensions). Such a link is of course critical. These environmental variables (combined with geographical distance) accounted for a very large portion of variation in Bray Curtis dissimilarities among sites (Faith & Ferrier, 2002a,b). Our explanation of biotic turnover would be much restricted if we were to use only two dimensions (and would be even worse if the two-dimensional solution was not well optimized).
Table 1. Projections of maximum correlation for five environmental variables in six-dimensional hybrid multi-dimensional scaling ordination space
Projections are given by coefficients along each dimension, with those values greater in absolute value than 0.5 highlighted in bold. Last column gives overall product–moment correlation (n = 100) of each variable along its projection.
E-W, measure of position of site along east–west direction; N-S, measure of position of site along north–south direction; Rain, mm total annual rainfall; Alt, altitude above sea level; Age, forest age. For further information see Faith & Ferrier (2002a,b) and supplementary material within Condit et al. (2002).
Both the optimality of the two-dimensional solutions and the exploration of additional dimensions are doubtful in Araújo et al.'s study. We conclude that the evidence presented by Araújo et al. for ED surrogacy may have unnecessarily favoured non-corroboration results because of both the use of discrete ED and the use of ordinations providing a possibly poor explanation of input dissimilarities.
Corroboration and randomizations
We now turn to ‘background knowledge’ and the role of randomizations in evaluating ‘improbability’ of evidence. Evidence for ED surrogacy, based on its recovery of some target set of species, has in the past been contrasted with a ‘chance alone’ scenario, for example in the early ED study by Ferrier & Watson (1997). A degree of Popperian corroboration can be claimed when we have evidence that could not have been simply a product of chance (see Faith, 2004).
An interesting aspect of Araújo et al.'s study is that their weak evidence based on discrete ED appears also to have made it more likely that chance alone could match the result, delivering low corroboration. In cases such as that in Fig. 2, where there are many near-duplicates (geographical redundancy) of sites, discrete ED samples of sites will have similarities to random selection of sites – a cluster of, say, 1000 near-identical sites (out of, say, 2500) frequently will be sampled by chance and also will be sampled well by discrete ED. In contrast, application of continuous ED would produce sets of sites different from both discrete ED and random selection. Thus, expected recovery of species based on continuous ED should not only be better than that for discrete ED (for the reasons discussed above), but the results should also be more distinct from random. The weakness of discrete ED may make it more like random selection, while the strength of continuous ED may make it less like random selection.
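The contrast between the two p-median variants can be illustrated with a toy example. The sketch below is not the ED software. It assumes a hypothetical one-dimensional environmental gradient, an exhaustive search over all candidate sets, and a regular grid of demand points as a stand-in for continuous environmental space. The discrete variant is taken here to use the sites themselves as demand points.

```python
from itertools import combinations


def p_median_cost(selected, demand):
    """Sum, over demand points, of the distance to the nearest selected site."""
    return sum(min(abs(x - s) for s in selected) for x in demand)


def best_set(sites, demand, p):
    """Exhaustive p-median: the p sites that minimise p_median_cost."""
    return min(combinations(sites, p),
               key=lambda sel: p_median_cost(sel, demand))


def n_in_cluster(selected, upper=0.2):
    """How many selected sites fall inside the near-duplicate cluster."""
    return sum(1 for s in selected if s < upper)


# Hypothetical gradient on [0, 1]: 20 near-duplicate sites plus 8 spread sites.
cluster = [k / 100 for k in range(20)]     # 0.00, 0.01, ..., 0.19
sparse = [k / 10 for k in range(3, 11)]    # 0.3, 0.4, ..., 1.0
sites = cluster + sparse
grid = [k / 50 for k in range(51)]         # demand points over the whole space

p = 4
discrete = best_set(sites, sites, p)       # demand points are the sites
continuous = best_set(sites, grid, p)      # demand points cover the space
expected_random = p * len(cluster) / len(sites)

print("discrete  :", discrete, n_in_cluster(discrete))
print("continuous:", continuous, n_in_cluster(continuous))
print("random expectation in cluster:", round(expected_random, 2))
```

In this configuration the discrete criterion is pulled towards the cluster because each near-duplicate site counts as a separate demand point, which is also what drives random selection towards the cluster. The continuous criterion weights the gradient evenly and so should spend fewer of its picks there.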
Geographical redundancy is one way that clusters of sites in ordination space can occur, but another way in which clusters of sites may occur is artefactually through sampling biases – sites with few species may cluster together for MDS and other methods. Faith (2003b) raised concerns about unequal sampling effort for the European data, pointing out that sites selected by ED in the Araújo et al. (2001) study sometimes would not have their true complement of species recorded, so artefactually deflating recovery values. Araújo et al. (2003) acknowledged that there is inconsistent sampling for the European data, saying ‘…a more realistic bias is that of clustering in recording effort…some southern European countries…are known to have a more limited tradition of natural history, hence less intensive and carefully designed surveys than some central and northern European countries…Poorly sampled areas and taxa are found in particular areas, rather than across the entire European environmental gradient’.
Thus, we can expect in some places poor sampling across all taxonomic groups, and in other places good sampling across all taxonomic groups. What outcome do we expect under these circumstances? Araújo et al. (2003) go on to say, ‘There is no evidence that a clustered bias in recording effort should affect recovery of species with ED more than it does with random data sets’. We agree that ED and random selection will respond to this problem similarly, for the case of discrete ED. We therefore see this issue as relevant to the application of discrete ED to ordinations in Araújo et al. (2004). Discrete ED and random selection will sample these clusters of species-poor cells in accord with their frequency, selecting sites on occasion that artefactually are credited with little species recovery. Discrete ED selection of sites will to some extent mimic aspects of random selection.
In contrast, for species-based complementarity sets, the bias will be the other way; sites with members of one group will be more likely to have members for other groups as well, as an artefact of the acknowledged sampling biases (Faith, 2003b, discussed a similar problem arising in the study of Lund & Rahbek, 2002). Complementarity sets in the study of Araújo et al. (2004) therefore artefactually will have looked good relative to discrete ED and random selection. We conclude that sampling artefacts will simultaneously make recovery rates for discrete ED look like random sets (and so non-significant) and recovery rates for complementarity sets look significant.
Thus, the apparent corroboration for species surrogates under complementarity sets selection appears to depend on evidence that, upon closer inspection, to some extent can be ‘explained away’. At the same time, the presented evidence for ED evaluation may have been unnecessarily weak.
ED surrogacy, when properly applied, might be corroborated through randomizations. But what is the proper form of evidence for such randomizations? We note that when the evidence is recovery of species based on selection of sets of a fixed size, as in Araújo et al.'s study, significance means little, as it depends on the nominated size of the set selected. Any surrogate, even a perfect one, will produce a non-significant result at some point as the set size increases. Thus, conclusions drawn from randomizations, as recommended by Araújo et al. (2004), may be misleading, even without sampling artefacts.
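The dependence of significance on set size can be sketched with a small randomization. The example below assumes hypothetical presence data (30 sites, 40 species, random occurrences) and uses greedy complementarity on the target species themselves as a stand-in for a ‘perfect’ surrogate. It is a simplified illustration, not the randomization procedure of Araújo et al.

```python
import random


def recovered(selected, occ):
    """Number of distinct species present in the selected sites."""
    return len(set().union(*(occ[s] for s in selected)))


def greedy_order(occ):
    """Order sites by greedy complementarity on the target species."""
    remaining = set(range(len(occ)))
    seen = set()
    order = []
    while remaining:
        best = max(sorted(remaining), key=lambda s: len(occ[s] - seen))
        order.append(best)
        seen |= occ[best]
        remaining.remove(best)
    return order


def p_value(k, order, occ, rng, n_random=200):
    """Share of random sets of size k recovering at least as many species."""
    observed = recovered(order[:k], occ)
    hits = sum(
        recovered(rng.sample(range(len(occ)), k), occ) >= observed
        for _ in range(n_random)
    )
    return hits / n_random


rng = random.Random(0)
n_sites, n_species = 30, 40
# Hypothetical data: each species occurs at each site with probability 0.15.
occ = [{sp for sp in range(n_species) if rng.random() < 0.15}
       for _ in range(n_sites)]
order = greedy_order(occ)

for k in (2, 5, 10, 20, 30):
    print(k, p_value(k, order, occ, rng))
```

When the set size equals the number of sites, every random set recovers every species, so the randomization cannot separate even this idealized surrogate from chance. The same loss of power sets in gradually at smaller set sizes once random sets begin to recover most species.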
A better approach may be that developed by Ferrier & Watson (1997), in which sets of all sizes are examined. But we suggest consideration also of a form of evidence that may better reflect the challenges in practice faced by any biodiversity surrogate – the prediction of overall biodiversity complementarity values. After all, few complete computer-generated sets of sites are ever selected in practice; instead, decisions often will face assessments of the marginal gains provided by individual sites (see Faith et al., 2003). Evaluation of the degree of prediction of overall complementarity values therefore has been promoted as a basis for evidence for surrogacy (Faith & Walker, 1996a; Faith, 1996): ‘For a ‘given’ set of protected areas, the complementarity value of an additional area(s) is calculated for the nominated surrogate information and also for the test data…The correspondence between the predicted complementarity values from the surrogate and the ‘real’ complementarity values is assessed over a range of given sets and additional areas’ (Faith, 1996). Randomizations applied to this form of evidence hold promise (possibly combined with other interpretive approaches described in Faith, 2003b, for evaluating ‘effect size’). Ongoing corroboration assessments of continuous ED, based on a variety of forms of evidence, hopefully will reveal strengths and weaknesses of this surrogacy strategy.
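The ‘predicting complementarity’ form of evidence can be sketched as follows. The example rests on several assumptions made for illustration only: a one-dimensional gradient in integer units, hypothetical species that each occupy an interval of the gradient (a crude unimodal response), and the reduction in a continuous p-median cost as the surrogate's predicted complementarity for an additional site. The function names are ours and do not refer to any existing software.

```python
import math


def species_at(x, centres=range(0, 101, 5), radii=(5, 10, 15, 20)):
    """Hypothetical species (centre, radius) whose interval covers position x."""
    return {(c, r) for c in centres for r in radii if abs(x - c) <= r}


def true_gain(candidate, given):
    """'Real' complementarity: species at the candidate not held by the given set."""
    held = set().union(*(species_at(s) for s in given))
    return len(species_at(candidate) - held)


def ed_cost(selected, demand=range(101)):
    """Continuous p-median cost over a grid of demand points on the gradient."""
    return sum(min(abs(p - s) for s in selected) for p in demand)


def ed_gain(candidate, given):
    """Predicted complementarity: drop in p-median cost from adding the candidate."""
    return ed_cost(given) - ed_cost(list(given) + [candidate])


def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


given = [50]                    # a 'given' set of protected sites
candidates = range(0, 101, 10)  # possible additional sites
predicted = [ed_gain(c, given) for c in candidates]
actual = [true_gain(c, given) for c in candidates]

for c, pg, ag in zip(candidates, predicted, actual):
    print(c, pg, ag)
print("correlation:", round(pearson(predicted, actual), 3))
```

A fuller assessment would repeat this over a range of given sets and additional sites, as in the quoted passage, and could then compare the observed correspondence with that obtained under randomization.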
Of course, it remains the case that a very good surrogates strategy on occasions will not receive corroboration from a given data set. A danger then is that this evidence, simply taken at ‘face value’ as a discouraging result, can be selectively used (with other, positive, evidence perhaps ignored) by those trying to make a case against that surrogates strategy (e.g. Brooks et al., 2004, refer to the supposed negative results on ED surrogacy in Araújo et al., 2001, but do not refer to the positive evidence for ED in Faith, 2003b).
Non-corroboration based on any one data set sometimes may be important as an opportunity to learn about the pitfalls to be found in different kinds of evidence and background knowledge. For example, results from that earlier ED study (Araújo et al., 2001) raised doubts about the evidence, based on the type of ED that was used, the derivation of the environmental space, and the treatment (or not) of sampling biases (Faith, 2003b).
This scepticism about evidence cuts both ways – a corroboration finding of course does not mean that the surrogacy hypothesis is ‘true’. As was the case for the apparent corroboration for species surrogates in Araújo et al. (2004), closer inspection can always reveal other factors that may account for the apparent good evidence.
We have recommended a sceptical view of the evidence for surrogates presented in Araújo et al. (2004). However, while there may be little in the way of firm conclusions to draw about surrogacy from their study, it is rich in lessons about the process of corroboration assessment of surrogacy hypotheses. One lesson is that the corroboration and re-evaluation processes can be frustrated by lack of information. While Araújo et al. (2003), in responding to Faith (2003b), called for a ‘quest for evidence’, what is needed also is a quest for corroboration assessment. Evidence is only part of the story, and on its own means little in support of a given hypothesis of surrogacy. But corroboration assessment is not easy, as a corroboration framework demands lots of background information in order to probe the evidence (both at the initial corroboration and subsequent re-evaluation stages). In the present context, we needed to know much more about how the ordinations were derived, more about the sampling biases, more about the possible effects of nominated set sizes, and so on.
A related lesson is that we can, over the course of different studies, accumulate experience about the pitfalls of certain kinds of evidence for surrogates like ED (as long as links with the past are not lost through re-naming). In this way, we can, in new studies, better put forward evidence for a surrogate that is not easily explained away. For ED, experience now suggests extreme caution in the use of evidence based on poor ordination models, discrete p-median, and species data with sampling biases. This learning process may provide the needed synthesis that is lacking in much of biodiversity assessment.
G. Carter assisted in provision of Australian Museum skink collections data as part of a larger project on ED analyses of these collections in NSW, Australia. We thank the Editor of the Journal of Biogeography for the invitation to submit a response to Araújo et al.'s paper. We thank K. Richardson, Kristen Williams, and the Cooperative Research Centre for Tropical Rainforest Ecology and Management for funding. Colleagues at NCEAS have collaborated on extending the ‘predicting complementarity’ approach.
Daniel P. Faith is a Principal Research Scientist at the Australian Museum, Sydney. His research interests are in the theory and applications of quantitative biodiversity assessment, extending from the scale of genes to whole countries. Research interests also focus on the best-possible use of Museum collections in regional biodiversity assessment, and on the links to sustainability and economics. A phylogenetic component of his biodiversity research is based on investigations of ‘phylogenetic diversity’ and conservation. Other work in phylogenetics concerns development and application of phylogenetic methods, philosophy of science and editorial work for Systematic Biology (see web pages at http://amonline.net.au/systematics).
Simon Ferrier is a Principal Research Officer in the NSW Department of Environment and Conservation. His research interests include biodiversity conservation planning and assessment, including development of landscape-based methods and development of biodiversity surrogates strategies that combine environmental and biotic data.
Paul Walker is the Group Leader of ‘Urban and Regional Development Futures’ within CSIRO, Sustainable Ecosystems, Canberra. His research interests include application of systems approaches to assessing options for future growth and development in Australian communities. He is co-developer with D.P. Faith of diversity software for conservation planning, including ed, pd and target.