A successful community-level strategy for conservation prioritization


  • Anni Arponen,

    Metapopulation Research Group, Department of Biological and Environmental Sciences, University of Helsinki, PO Box 65, FIN-00014 Helsinki, Finland
      *Correspondence author. E-mail: anni.arponen@helsinki.fi
  • Atte Moilanen,

    Metapopulation Research Group, Department of Biological and Environmental Sciences, University of Helsinki, PO Box 65, FIN-00014 Helsinki, Finland
  • Simon Ferrier

    New South Wales Department of Environment and Climate Change, PO Box 402, Armidale, NSW 2350, Australia



  1. Regions of the world with the highest biodiversity and greatest conservation needs are often also data poor. An effective surrogate strategy would be invaluable for conservation prioritization in such regions. Large-scale environmental data are readily available, but the effectiveness of environmental surrogate strategies for conservation planning has not been confirmed. In this study, we compare a range of such strategies.
  2. Environmental surrogacy is based on the idea that by covering a wide range of environmental conditions, one also achieves high species representation. The effectiveness of this strategy may be enhanced by using community-level modelling techniques, in conjunction with the best available biological (species distribution) data, to calibrate the relationship between environmental gradients and community richness and composition. We develop a novel approach, called maximization of complementary richness, which accounts for gradients in species richness and for non-constant rates of compositional turnover in environmental space.
  3. We show that our novel technique can achieve notably higher species representation than previous approaches. Simple strategies, such as direct use of environmental data alone or environmental clustering, achieved species representation only slightly better than random selection of sites. P-median selection from ordinations of community composition had intermediate performance. The performance of our new maximization of complementary richness technique was closer to the representation levels of optimal reserve networks than to those of random site selection.
  4. Synthesis and applications. We found three components critical for success: a good model for community turnover (compositional dissimilarity), a good model for species richness, and a selection procedure that appropriately uses both turnover and richness information. By taking these into account, one can achieve reasonable levels of species representation. We conclude that surrogates based on community-level modelling are a highly promising strategy for cost-effective conservation prioritization.


Lack of data on species distributions is a major problem in conservation planning. Extensive surveys are often impossible because of limited resources and time, so shortcuts must be taken. Environmental variables are probably the most easily and cheaply available surrogates. Remote sensing data are also becoming more readily available at increasing resolution, and numerous biotic and abiotic variables have been mapped extensively, even globally (Global Soil Data Task Group 2000; NOAA Satellite and Information Service 2007). An effective surrogate strategy for conservation planning based on such data would be invaluable, especially in poorly surveyed, biologically diverse regions (Myers et al. 2000).

Surrogate strategies employing environmental data alone are based on the idea that by selecting conservation areas covering a wide range of environmental conditions, one achieves comprehensive species coverage, including for those species with scarce or missing observational data. The effectiveness of such strategies may be enhanced by using community-level modelling techniques (Ferrier & Guisan 2006), in conjunction with the best available biological data (species distribution data), to calibrate the relationship between environmental gradients and community richness and composition. Applications of environmental and community-level surrogate strategies to conservation planning have produced both encouraging (Fox & Beckley 2005; Sarkar et al. 2005; Trakhtenbrot & Kadmon 2005) and discouraging results (Araujo et al. 2001; Araujo, Densham & Williams 2004; Bonn & Gaston 2005). This is at least partly because the many available surrogate strategies (Supplementary Material Table S1) are bound to give different results.

The surrogate strategies discussed below differ fundamentally. For example, they may (i) use only environmental data or also species distribution data, (ii) use species richness information, (iii) model community turnover in environmental space, (iv) use linearly or nonlinearly scaled environmental space, and (v) classify sites before selection or operate directly in environmental space. Despite these differences, their performance can be compared objectively using simulated data. We compare a range of simpler methods with the complementary richness maximization technique that we develop. Figure 1 summarizes different approaches to the use of environmental and species data in conservation prioritization.

Figure 1.

Schematic of different types of data (diamond boxes on the left), different modelling approaches (square boxes), and different conservation area selection approaches (ovals) with examples of specific techniques in italic. Thick arrows connect different types of data to modelling approaches, and thin arrows connect raw data or modelled data to area selection approaches. Arrows are black for the combinations we used in our analyses, and white for methods we did not explore.

Using environmental data alone in conservation prioritization

Environmental data are usually much cheaper, easier and faster to acquire than biological distribution data. The simplest implementation of the principle of environmental surrogacy is to select sites evenly across environmental gradients without relating them to species distributions. Data requirements are minimal, but variation in species composition may not be captured to any reasonable extent: if variables are not selected, scaled and weighted appropriately, variation in environmental space may not translate into variation in community composition. This general principle of environmental representation has been employed in many studies aiming at even coverage of, e.g. different land-cover classes, habitat types or ecoregions (Belbin 1993; Faith et al. 2001). Environmental representation can be a goal in itself, but in environmental surrogate studies the surrogates are expected to reflect species distributions, so that the conservation area network covers as many species as possible, following the principle of complementarity (Pressey et al. 1993).

An even selection of areas across environmental gradients can be achieved, for example, by using the p-median (see Methods; Faith & Walker 1996). Another alternative is to use statistical classification techniques, such as cluster analysis, to derive groups of similar sites, from which sites are then selected evenly or randomly (Belbin 1993; Kirkpatrick & Brown 1994; Trakhtenbrot & Kadmon 2005). In this process, information is inevitably lost because differences between sites within the same cluster are ignored (Faith & Walker 1996; Ferrier 2002); at worst, one may by chance pick, from neighbouring clusters, sites that are in fact very close to each other in environmental space. A related strategy is to select sites from a complete tree hierarchy using an analogue of a phylogenetic diversity measure, which maximizes the length of the sub-tree spanned by the selected sites and thereby maximizes their environmental distinctiveness (Faith & Walker 1996; Woinarski, Price & Faith 1996).

Using species modelling or community modelling in conservation prioritization

Environmental variables can be used in combination with biological distribution data to model how species distributions actually relate to the environmental variables. There are two alternative strategies: modelling distributions of individual species or modelling community composition. Modelling of individual species distributions is a relatively common strategy (reviewed by Elith et al. 2006), which unfortunately has high data demands when applied to many species. Also, rare species that should influence conservation planning frequently need to be excluded due to insufficient observational data for fitting a good model.

Another alternative is community-level modelling, which can be used in several ways for conservation prioritization. These techniques fall into three broad categories (Ferrier & Guisan 2006). The first is ‘Assemble first, predict later’, in which biological data are first subjected to some form of ordination or classification to produce community-level attributes or entities that are then modelled in relation to environmental predictors. The second is ‘Predict first, assemble later’, in which individual species are first modelled as a function of environmental variables and the predicted distributions are then used for classification, ordination or aggregation. However, once species distributions have been modelled separately, those distributions could equally well be used directly in complementarity-based conservation area selection. The third is ‘Assemble and predict together’, in which all species are modelled simultaneously with environmental variables. Besides classifying the data into distinct categories (site or species groups), all these strategies can also operate directly in continuous environmental space, avoiding the loss of resolution associated with classification. Heuristically, sites are selected far apart from each other in environmental space to maximize species representation, on the assumption that distant sites differ more in community composition than sites that are close together.

A significant advantage of both the ‘Assemble first, predict later’ and ‘Assemble and predict together’ approaches is that they can accommodate rare species with scarce records, provided these respond to environmental variables similarly to the common species, because they deal with community composition as a whole, incorporating interactions between species, rather than relying on the records of individual species. A technique called Generalized Dissimilarity Modelling (‘GDM’, Ferrier et al. 2007) performs particularly well in extrapolating community turnover to communities whose species composition differs from that of the communities used for modelling, and is therefore an especially promising method for conservation planning in data-poor regions (Ferrier & Guisan 2006).

In this study, we develop a novel approach, called maximization of complementary richness (MCR), for using community modelling in conservation prioritization. The proposed approach differs from direct selection from environmental gradients in that it calibrates the relative influence of environmental variables with species data. It differs from richness-based, scoring-type selection of conservation areas in that it accounts for the similarity of communities between selected areas. It differs from previous uses of ordinations in that it uses information on both the ecological similarity and the species-richness gradients (Gauch 1973) of communities.

To evaluate the performance of our complementary richness maximization technique, we examine an increasingly complex series of surrogate approaches, starting from direct selection from environmental gradients and environmental cluster analysis, continuing to variants of community modelling approaches. For each class of methods, we explain the conditions under which they could work, and when they are likely to fail. We use simulated data to objectively compare the performance of these methods in a system with fully known properties. We compare use of environmental surrogates and community modelling-based approaches to random selection and optimal selection from full species information, which give us bounds for poor and good performance, respectively.

Material and methods

We compared a series of modelling and site-selection techniques using simulated data. A summary of the combinations of analyses that we performed is provided in Table 1.

Table 1.  Combinations of modelling and area selection techniques used in our analyses

| General approach | Specific modelling technique | Selection strategy | Abbreviation |
|---|---|---|---|
| Only environmental data | | | |
| Multidimensional space formed by environmental gradients | | Discrete p-median | DIR |
| Environmental classification | UPGMA clustering | Random selection from within each cluster | CLU |
| Environmental and biological data | | | |
| ‘Assemble and predict together’ | stepacross + CAPSCALE | Discrete p-median | CAP + pm |
| | stepacross + CAPSCALE | Discrete p-median corrected for richness | CAP + pm + R |
| | GDM | Discrete p-median | GDM + pm |
| | GDM | Direct maximization of complementary richness using local search and stochastic optimization, implicitly assuming equal richness across the study area | GDM-MCR + no richness |
| | GDM | Discrete p-median corrected for richness | GDM + pm + R |
| | GDM | Discrete p-median corrected for unique richness | GDM + pm + UR |
| | GDM | Direct maximization of complementary richness using local search and stochastic optimization | GDM-MCR + R |

Simulated data

We generated simulated presence–absence data for our comparison: a grid of 160 × 160 cells defined by two environmental variables, R (‘rainfall’; y-axis) and T (‘temperature’; x-axis), both scaling linearly from 0 to 1 across the grid. We randomly placed 3000 hypothetical species in this space. Niche width of individual species and average species richness in a cell were made to depend on position in the grid to mimic patterns of varying species richness and community turnover. We used only data from an 80 × 80 cell subsection in our analyses, to mimic a study area that is part of a wider landscape with greater variation in environmental variables. This subsection was positioned towards the lower right corner of the space, leaving 60-cell-wide margins on the sides where species ranges were large and 20-cell-wide margins on the sides where ranges were small. Consequently, some marginal species occurred at the edges of the subsection with most of their range outside it (Fig. 2a), and some of the 3000 species occurred only outside or at the edge of the 80 × 80 area. We randomly selected 200 grid cells within the 80 × 80 subsection, which became the study sites used for conservation area selection. These 200 sites never contained all 3000 species; on average, 2299 species occurred in each data set, with variable numbers of occurrences. We generated 10 replicate data sets with the same statistical properties but different random allocations of species to grid cells and different random sets of cells as study sites.

Figure 2.

Data characteristics. (a) A random sample of species as 1-SD range circles. Ranges at the upper left corner have a radius three times as large as those in the lower right corner, indicating a gradient in community turnover. Density of species (number of circles) increases to the right. (b) Number of species in the cell (species richness) resulting from the combined effect of species density and range sizes, increasing towards the upper right corner of the space. (c) Unique species richness (range-size weighted richness) depends on both richness and turnover of community composition, and it describes the contribution of an individual cell to total diversity.

The end-product of our randomization procedure is shown in Fig. 2: average niche width varies across the landscape (Fig. 2a), resulting in varying community turnover (beta-diversity). Absolute species richness varies across the region, increasing towards one corner of the area (Fig. 2b). Variable species richness and turnover combine to generate a different gradient of unique species richness (Fig. 2c), or ‘range-size weighted richness’, the summed fractions of species distributions occurring in the cell (Usher 1986).
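As a minimal sketch of the two richness measures in Fig. 2b,c, the hypothetical helper `richness_and_unique_richness` below takes a boolean sites × species presence matrix and returns plain richness and range-size weighted unique richness, i.e. for each cell the sum, over species present, of 1/(number of cells occupied by that species). It counts range sizes over whatever cells are passed in; supplying the full simulated grid rather than a subset is an assumption left to the caller.

```python
import numpy as np


def richness_and_unique_richness(occ):
    """occ: boolean array (n_cells, n_species) of presences.

    Returns (richness, unique_richness) per cell, where unique richness is the
    summed fraction of each present species' range that falls in the cell.
    """
    occ = np.asarray(occ, dtype=float)
    richness = occ.sum(axis=1)
    range_size = occ.sum(axis=0)              # number of cells occupied per species
    weights = np.divide(1.0, range_size, out=np.zeros_like(range_size),
                        where=range_size > 0)
    unique_richness = occ @ weights           # sum of 1/range size over species present
    return richness, unique_richness
```

Each species contributes fractions that sum to one across its range, so unique richness summed over all cells equals the number of species present; this is why it measures each cell's share of total diversity.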

For the approaches that also used biological data, we assumed that one-quarter of the presence–absence matrix [one-half of the species (~1100) at one-half of the sites (100)] was available for fitting the community turnover and richness models. Environmental variables were assumed to be known for all sites, which allowed species richness to be predicted for all 200 sites and community similarity for all pairs of sites.
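The sampling design described above can be sketched as follows. The helpers `sample_study_sites` and `fitting_subset` are hypothetical, and three details are assumptions not stated in the text: grid rows index R with row 0 at low rainfall (so ‘lower right’ means low R and high T), the 200 sites are drawn without replacement, and the fitting halves of sites and species are drawn uniformly at random, with species restricted to those recorded at one or more of the 200 sites.

```python
import numpy as np


def sample_study_sites(n_sites=200, sub=80, row_offset=20, col_offset=60, seed=0):
    """Draw n_sites distinct (row, col) cells from an 80 x 80 subsection of the
    160 x 160 grid. Assumed layout: rows index R (row 0 = low rainfall), columns
    index T; offsets leave 20-cell margins at low R / high T and 60 elsewhere."""
    rng = np.random.default_rng(seed)
    rows, cols = np.meshgrid(np.arange(row_offset, row_offset + sub),
                             np.arange(col_offset, col_offset + sub), indexing="ij")
    cells = np.column_stack([rows.ravel(), cols.ravel()])
    return cells[rng.choice(len(cells), size=n_sites, replace=False)]


def fitting_subset(occ_sites, seed=0):
    """Indices of half of the sites and half of the recorded species, i.e. the
    quarter of the presence-absence matrix used for model fitting."""
    rng = np.random.default_rng(seed)
    n_sites = occ_sites.shape[0]
    recorded = np.flatnonzero(occ_sites.any(axis=0))    # species present at >= 1 site
    fit_sites = rng.choice(n_sites, size=n_sites // 2, replace=False)
    fit_species = rng.choice(recorded, size=len(recorded) // 2, replace=False)
    return np.sort(fit_sites), np.sort(fit_species)
```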

We randomized species presence using the following procedure, repeating it until 3000 species had been placed in the space. (i) Draw a random (R, T) centre point for a species distribution. (ii) Apply rejection sampling (Press et al. 1992) with a uniformly distributed random number $r \sim U(0, 1)$: accept the proposed centre point only if $r < 0.01 + 0.99\,[T^{6}/(0.2 + T^{6})]$, and otherwise return to step (i). This causes more species to be centred in high-temperature areas. (iii) Place the species distribution around (R, T). The probability of presence of the species in the surrounding cells follows an unnormalized two-dimensional normal distribution with standard deviation $\sigma = 3 + (4.0/140.0)(140 - 160T + 160R)\,a$ cells, where $a \sim U(4, 5)$. Thus, species niche width (and number of occurrences in the area) depends on position in the grid (Fig. 2a).
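A minimal Python sketch of steps (i) to (iii) follows. It assumes that T and R take cell-centre values on [0, 1], that σ is evaluated at the species' centre point, and that presence in each cell is an independent Bernoulli draw whose probability is the unnormalized Gaussian value; these are our reading of the procedure, not code from the study, and the function names are hypothetical. The full 3000 × 160 × 160 boolean array takes roughly 77 MB, so a smaller `grid` or species count is convenient for experimentation.

```python
import numpy as np


def accept_threshold(T):
    """Step (ii): acceptance threshold for a proposed centre; rises steeply with T."""
    return 0.01 + 0.99 * T**6 / (0.2 + T**6)


def range_sd(T, R, a):
    """Step (iii): standard deviation (in cells) of the range around centre (R, T)."""
    return 3 + (4.0 / 140.0) * (140 - 160 * T + 160 * R) * a


def simulate_species(n_species=3000, grid=160, seed=0):
    """Boolean presence array of shape (n_species, grid, grid).
    Axis 1 indexes R (rainfall), axis 2 indexes T (temperature)."""
    rng = np.random.default_rng(seed)
    centres = (np.arange(grid) + 0.5) / grid            # cell-centre values on [0, 1]
    R_grid, T_grid = np.meshgrid(centres, centres, indexing="ij")
    occ = np.zeros((n_species, grid, grid), dtype=bool)
    for s in range(n_species):
        # (i)-(ii): propose centres until one passes the rejection test.
        while True:
            R_c, T_c = rng.uniform(0.0, 1.0, size=2)
            if rng.uniform(0.0, 1.0) < accept_threshold(T_c):
                break
        # (iii): unnormalized 2-D Gaussian presence probability around the centre.
        sd = range_sd(T_c, R_c, rng.uniform(4.0, 5.0))
        dist2 = ((R_grid - R_c) ** 2 + (T_grid - T_c) ** 2) * grid**2   # squared cells
        occ[s] = rng.random(dist2.shape) < np.exp(-dist2 / (2.0 * sd**2))
    return occ
```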

Using environmental data alone in site selection

As the simplest strategy, we selected sites evenly from the original environmental space formed by the rainfall and temperature gradients with the discrete p-median approach explained below (‘direct selection’, Table 1). We did not use any scaling or weighting for the variables.

The second method was UPGMA cluster analysis (unweighted pair group method using arithmetic averages; Sokal & Michener 1958) on the environmental variables. We cut the clustering tree at the level giving the required number of clusters and randomly selected one site from within each cluster. Random selection was repeated 10 times for each network size and for each of the 10 replicate data sets, and reported performances are averages over all 100 runs.
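The clustering-based selection can be sketched without external libraries. The function below implements UPGMA directly (average linkage on Euclidean distances between the sites' environmental values) and stops merging once k clusters remain; because UPGMA merge heights never decrease, this is equivalent to cutting the tree at the level with k clusters. The function names are hypothetical, and Euclidean distance on unscaled variables is an assumption consistent with the unscaled direct selection described above.

```python
import numpy as np


def upgma_clusters(X, k):
    """UPGMA (average linkage) on Euclidean distances between rows of X,
    merging until k clusters remain. Returns a list of arrays of site indices."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        a, b = np.unravel_index(np.argmin(D), D.shape)
        a, b = int(min(a, b)), int(max(a, b))
        na, nb = len(clusters[a]), len(clusters[b])
        # Average-linkage update: size-weighted mean of the two clusters' distances.
        merged = (na * D[a] + nb * D[b]) / (na + nb)
        D[a, :] = merged
        D[:, a] = merged
        D[a, a] = np.inf
        D = np.delete(np.delete(D, b, axis=0), b, axis=1)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [np.array(c) for c in clusters]


def select_one_per_cluster(X, k, rng):
    """Randomly pick one site from each of the k UPGMA clusters."""
    return np.array([rng.choice(c) for c in upgma_clusters(X, k)])
```

Repeating the random draw, as in the study's 10 repeats per network size and data set, amounts to calling `select_one_per_cluster` several times with the same `rng`.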

Using environmental and biological data together

We compared two community-level modelling strategies of the type ‘Assemble and predict together’, in which biological and environmental data are used simultaneously. In our first strategy, we used the ‘stepacross’ procedure in R (Oksanen 2007) to correct for the nonlinear relationship between compositional and ecological dissimilarity: it transforms the presence–absence version of Bray–Curtis dissimilarities into more realistic estimates of ecological dissimilarity. These values were then used to perform a Constrained Analysis of Principal Coordinates (CAPSCALE routine in R; Legendre & Anderson 1999; Anderson & Willis 2003; Oksanen 2007). Compared with traditional canonical analyses, such as redundancy analysis (RDA) or constrained correspondence analysis (CCA; descriptions in Legendre & Legendre 1998), this approach has the advantage of accommodating any dissimilarity measure through the use of principal coordinates analysis (PCoA) as an intermediate step, while also taking into account the correlation structure among variables in the response data (species occurrences). Technically, a PCoA is performed on the dissimilarity matrix, and RDA is applied to the resulting principal coordinates. The resulting ordination can be used for conservation area selection as described in the section ‘Techniques for selecting the sites’.
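The first step of this strategy can be illustrated with a short sketch: presence–absence Bray–Curtis (Sørensen) dissimilarity, followed by a shortest-path correction in the spirit of vegan's `stepacross` with `path = "shortest"` and `toolong = 1`. This is a re-implementation for illustration (Floyd–Warshall over the pairs whose dissimilarity is below 1), not vegan's code, and it may differ in details such as numerical tolerances; the CAPSCALE ordination step is not reproduced. Sites with no species at all would give undefined dissimilarities and are assumed absent, and pairs with no connecting path stay infinite.

```python
import numpy as np


def bray_curtis_pa(occ):
    """Presence-absence Bray-Curtis (Sorensen) dissimilarity between sites (rows):
    1 - 2a / (2a + b + c), where a = number of shared species."""
    occ = np.asarray(occ, dtype=float)
    shared = occ @ occ.T                       # a for every pair of sites
    rich = occ.sum(axis=1)
    return 1.0 - 2.0 * shared / (rich[:, None] + rich[None, :])


def stepacross_shortest(d, toolong=1.0):
    """Replace dissimilarities >= toolong (no shared species) by the shortest path
    through intermediate sites; dissimilarities below toolong are kept as they are."""
    d = np.asarray(d, dtype=float)
    too_long = d >= toolong
    g = np.where(too_long, np.inf, d)          # graph keeps only informative links
    np.fill_diagonal(g, 0.0)
    for k in range(len(g)):                    # Floyd-Warshall all-pairs shortest paths
        g = np.minimum(g, g[:, k, None] + g[None, k, :])
    return np.where(too_long, g, d)
```

For example, for four sites along a gradient holding species {0, 1}, {1, 2}, {2, 3} and {3, 4}, the raw dissimilarities from the first site to the third and fourth are both 1, and the correction turns them into 1.0 and 1.5, restoring the ordering along the gradient.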

The second community-level modelling strategy, GDM (Ferrier et al. 2007), is very different. It is a matrix regression method that uses pairwise compositional dissimilarities between sites as the response variable and site-pair-specific differences in environmental measures as explanatory variables. It models spatial patterns of turnover in community composition, i.e. beta-diversity. It takes into account two types of nonlinearities: the non-linear relationship between dissimilarity measures and ecological dissimilarity, and variation in turnover rate along environmental gradients. The relevant feature of GDM is that it produces a scaling of environmental space and a nonlinear link function, which predicts community dissimilarity between any pair of sites. This information can be used in an ordination, or for direct site selection using some optimization strategy. We used Bray–Curtis dissimilarities as inputs.
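To make the scaling of environmental space and the nonlinear link function concrete, the sketch below predicts pairwise dissimilarity in the form used by GDM (Ferrier et al. 2007): each environmental variable is passed through a monotone transformation, the absolute differences of transformed values are summed with an intercept to give η, and dissimilarity is 1 − exp(−η). GDM fits I-spline transformations; here a piecewise-linear function with non-negative increments stands in for them, and the knots, increments and intercept are hypothetical inputs rather than fitted values.

```python
import numpy as np


def monotone_transform(x, knots, increments):
    """Piecewise-linear, non-decreasing stand-in for a GDM I-spline transformation.
    knots: increasing breakpoints; increments: non-negative rise between knots."""
    heights = np.concatenate([[0.0], np.cumsum(increments)])
    return np.interp(x, knots, heights)


def gdm_predict(X, splines, intercept):
    """Predicted dissimilarity 1 - exp(-eta) for every pair of sites, where
    eta = intercept + sum_v |f_v(x_iv) - f_v(x_jv)|.
    X: (n_sites, n_vars); splines: one (knots, increments) pair per variable."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([monotone_transform(X[:, v], *splines[v])
                         for v in range(X.shape[1])])    # rescaled environmental space
    eta = intercept + np.abs(Z[:, None, :] - Z[None, :, :]).sum(axis=-1)
    return 1.0 - np.exp(-eta)
```

The transformed coordinates (`Z` inside the function) are the rescaled environmental space: a steep segment of a transformation marks a part of the gradient with rapid compositional turnover. Note that the predicted dissimilarity of a site with itself is 1 − exp(−intercept), not zero.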

Both CAPSCALE and GDM have advantages compared to traditional canonical analyses, as they correct for the well-known problem that biological dissimilarity indices have a non-linear relationship with ecological dissimilarity: when sites share no species, their dissimilarity is always one regardless of their ecological dissimilarity (Gauch 1973; Faith, Minchin & Belbin 1987). For example, consider three sites – two tropical forests and a grassland site – that share no species with each other. The grassland is ecologically more different from either of the forests than the forest sites are from each other, yet all their pairwise dissimilarities are one according to a compositional dissimilarity measure. This is especially problematic when beta-diversity, or spatial turnover in species composition, is high, as in our data.

We also modelled species richness and unique richness using generalized additive models (Hastie & Tibshirani 1986) implemented in R with d.f. = 8, where the response variable was species richness (or unique species richness) at each of the 100 sites used in model fitting, and rainfall and temperature were the explanatory variables. We used the models to predict species richness across the environmental space. Richness or unique richness information was either used in site selection or not, depending on the method variant (Table 1).

Techniques for selecting the sites

We selected sets of 2, 4, 9, 16, 25, 36, 49 and 64 sites from each of the 10 replicate data sets, using each combination of modelling and selection techniques.

P-median selection is a commonly used technique adopted from facility location science for selecting locations evenly across a space. It is essentially a heuristic approximation to maximizing dissimilarities between sites. The objective is to select p ‘facilities’ such that the summed distance from all ‘demand points’ to their nearest facility is minimized (e.g. distances from households to fire stations). These p facilities (conservation areas) are thus local medians of the demand points. Two versions, discrete and continuous p-medians, have been used in environmental surrogate studies. The discrete p-median considers only true, existing localities (unprotected areas) as demand points, whereas the continuous version treats the entire space as filled with demand points and spreads the selected facilities (conservation areas) further apart, regardless of how many true demand points lie close to each facility. Because our sites were randomly distributed across a rectangular area of environmental space, the choice between the discrete and continuous p-median should make little difference to the results. We chose the discrete version because our explicit aim is to cover the species in our 200 demand points using environmental surrogates. With the continuous p-median, the shape of the ordination in multidimensional space would rather arbitrarily influence the spatial distribution of demand points and therefore produce arbitrary differences in the results; the discrete version thus provides a more objective comparison between modelling techniques. We used POPSTAR software for calculating the p-medians (Resende & Werneck 2006).

We developed approaches for selecting sites from environmental space, specifically aiming at maximizing complementarity. The p-median is likely to be suboptimal because it ignores species richness gradients in the space; one would rather select two dissimilar sites with high richness than two equally dissimilar but species-poor sites. To correct for species richness in p-median selection, we utilized the modelled richness distribution in the selection procedure so that the density of selected sites in the space was directly proportional to richness. This was achieved by weighting the pairwise distances by species richness (or unique richness) of the demand site. Weighting species-rich parts of the space higher has been suggested (Faith & Walker 1996) but not commonly applied.
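A simple discrete p-median heuristic covering both the plain and the richness-corrected variants is sketched below. The study used POPSTAR (Resende & Werneck 2006), a more sophisticated hybrid heuristic; this sketch swaps in a plainer greedy construction followed by single-facility swaps, so it illustrates the objective rather than reproducing that software. The richness correction is expressed as a demand weight `weights[i]` multiplying the distance from demand site i to its nearest selected site, which is our reading of weighting the pairwise distances by the richness (or unique richness) of the demand site.

```python
import numpy as np


def pmedian(D, p, weights=None, max_passes=100):
    """Discrete p-median heuristic: choose p of the n sites as facilities so that
    the (optionally weighted) sum of distances from each demand site to its
    nearest facility is as small as this heuristic can find.

    D[i, j]: distance between sites i and j (every site is both a demand point
    and a candidate facility). weights[i]: demand weight of site i.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    def cost(sel):
        return float(w @ D[:, sel].min(axis=1))

    # Greedy construction: repeatedly add the facility that lowers the cost most.
    selected = []
    for _ in range(p):
        best = min((j for j in range(n) if j not in selected),
                   key=lambda j: cost(selected + [j]))
        selected.append(best)

    # Improvement: swap one facility for a non-selected site while that helps.
    current = cost(selected)
    for _ in range(max_passes):
        improved = False
        for pos in range(p):
            for j in range(n):
                if j in selected:
                    continue
                trial = selected.copy()
                trial[pos] = j
                c = cost(trial)
                if c < current - 1e-12:
                    selected, current, improved = trial, c, True
        if not improved:
            break
    return sorted(selected), current
```

With `weights=None` every demand site counts equally, as in the plain p-median variants; passing modelled richness or unique richness gives the ‘+ R’ and ‘+ UR’ variants. `D` can hold distances in raw environmental space, in an ordination, or in GDM-transformed space, depending on the variant.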

We also used GDM-modelled community dissimilarity and species richness directly with a technique that we call maximization of complementary richness, GDM-MCR. Ferrier et al. (2004) described how the fraction of species covered can be calculated from richness and site pair-specific community dissimilarity information. We extend this idea to practical conservation area selection by stochastic optimization. We used an objective function

$$\sum_{i=1}^{N} r_i\, c_i \tag{1}$$

where N is the number of sites in the landscape, $r_i$ is the (modelled) species richness at site i, and $c_i$ is the fraction of the species at site i that are covered, either by the site itself if it is selected or by selected sites with similar species composition. In calculating $c_i$, the community similarity between the selected sites and the rest of the sites in the landscape has to be accounted for:

$$c_i = 1 - \prod_{j=1}^{p} \left(1 - \frac{s_{ik}}{r_i}\right) \tag{2}$$

where k is the index of the jth of the p selected sites and $s_{ik}$ is the predicted number of species shared by sites i and k. $s_{ik}$ can be obtained by rearranging the Bray–Curtis dissimilarity measure: $s_{ik} = (1 - d_{ik})(r_i + r_k)/2$, where $d_{ik}$ is the predicted community dissimilarity between sites i and k.
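The coverage objective can be sketched in a few lines of Python. The sketch assumes that equation 2 has the product form $c_i = 1 - \prod_j (1 - s_{ik}/r_i)$ over the selected sites, which matches the independence assumption discussed in the next paragraph, and that modelled richness is positive at every site. Two further choices are our own: the fraction $s_{ik}/r_i$ is clipped to [0, 1] (it can exceed 1 when $r_k$ is much larger than $r_i$), and selected sites are given full coverage, following the statement that a selected site covers its own species. The stepwise correction of equation 3 is not included, and the function names are hypothetical.

```python
import numpy as np


def shared_species(r, d):
    """s_ik = (1 - d_ik)(r_i + r_k)/2: predicted shared species from richness and
    Bray-Curtis dissimilarity (the rearrangement given in the text)."""
    return (1.0 - d) * (r[:, None] + r[None, :]) / 2.0


def coverage(r, d, selected):
    """c_i under the assumed product form c_i = 1 - prod_j (1 - s_ik / r_i)."""
    r = np.asarray(r, dtype=float)
    s = shared_species(r, np.asarray(d, dtype=float))
    # Fraction of site i's species found in each selected site k, clipped to [0, 1].
    frac = np.clip(s[:, selected] / r[:, None], 0.0, 1.0)
    c = 1.0 - np.prod(1.0 - frac, axis=1)
    c[selected] = 1.0          # a selected site covers all of its own species
    return c


def mcr_objective(r, d, selected):
    """Equation 1: sum over all N sites of r_i * c_i (expected species covered)."""
    return float(np.asarray(r, dtype=float) @ coverage(r, d, selected))
```

Here `d` is the predicted (e.g. GDM) Bray–Curtis dissimilarity matrix and `r` the modelled richness; passing a constant `r` corresponds to the GDM-MCR + no richness variant.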

Equation 2 assumes that the selected sites contribute to the coverage of demand points independently of each other, which is not true. Consider a demand point D that is partially covered by facility $F_1$. When facility $F_2$ is added to the solution, it will cover a substantial fraction of the uncovered species at D only if $F_1$ and $F_2$ are themselves highly dissimilar; if $F_1$ and $F_2$ are identical, $F_2$ will not cover any new species at D. Consequently, optimal use of richness and turnover information would require the full partial correlation structure of community similarity. We did not attempt to solve this hard optimization problem here, but applied a simple stepwise correction to alleviate it by accounting for the similarity between $F_1$ and $F_2$: when adding a new facility k to the solution under evaluation, we modify $s_{ik}$ in equation 2 to account for the similarity between it and the sites already added to the solution, $j = 1, \dots, k-1$,

image(eqn 3)

The MCR selection problem is nonlinear and cannot be solved by linear programming. We used the following procedure. Starting from a random set of sites, local improvement was applied iteratively: a selected site was chosen at random, 50 new sites were proposed as replacements for it, and if an improved solution was found, it was accepted. After 100 unsuccessful attempts to relocate a selected site, the local search was considered to have converged. To find further improvements, the local search was followed by a stochastic search procedure, simulated annealing, which has been used extensively in conservation prioritization (Possingham, Ball & Andelman 2000). Thirty thousand rounds of simulated annealing were run, and in each round the algorithm attempted to relocate a randomly selected site. This simple optimization technique appeared to converge reliably. However, global optimality of our solutions cannot be guaranteed; if anything, our estimate of the performance of the GDM-MCR strategy is therefore an underestimate. The search procedure could later be refined to be more efficient and better suited to large data sets.
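The two-phase search can be sketched generically for any objective over p-site subsets, such as the coverage objective of equation 1. Several details are assumptions because the text does not specify them: in the local search the best of the 50 proposed replacements is taken if it improves the objective, the 100 unsuccessful attempts are counted consecutively, and the simulated annealing uses a hypothetical geometric cooling schedule from `t0` to `t_end` with Metropolis acceptance. The temperature values would need scaling to the objective's units (here, numbers of species).

```python
import math

import numpy as np


def optimise_subset(objective, n, p, rng, proposals=50, patience=100,
                    sa_rounds=30000, t0=1.0, t_end=1e-3):
    """Maximise objective(list_of_site_indices) over subsets of p sites out of n.
    Phase 1: local search. Phase 2: simulated annealing (hypothetical schedule)."""
    sel = [int(j) for j in rng.choice(n, size=p, replace=False)]
    val = objective(sel)

    # Phase 1: pick a selected site at random, propose replacements for it, keep
    # the best proposal if it improves; stop after `patience` failures in a row.
    failures = 0
    while failures < patience:
        pos = int(rng.integers(p))
        outside = np.setdiff1d(np.arange(n), sel)
        if outside.size == 0:
            break
        cands = rng.choice(outside, size=min(proposals, outside.size), replace=False)
        trials = [sel[:pos] + [int(j)] + sel[pos + 1:] for j in cands]
        vals = [objective(t) for t in trials]
        k = int(np.argmax(vals))
        if vals[k] > val:
            sel, val, failures = trials[k], vals[k], 0
        else:
            failures += 1

    # Phase 2: simulated annealing, one relocation attempt per round.
    best_sel, best_val = list(sel), val
    for t in range(sa_rounds):
        temp = t0 * (t_end / t0) ** (t / max(sa_rounds - 1, 1))
        outside = np.setdiff1d(np.arange(n), sel)
        if outside.size == 0:
            break
        pos = int(rng.integers(p))
        trial = list(sel)
        trial[pos] = int(rng.choice(outside))
        new = objective(trial)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new >= val or rng.random() < math.exp((new - val) / temp):
            sel, val = trial, new
            if val > best_val:
                best_sel, best_val = list(sel), val
    return sorted(best_sel), best_val
```

The best solution seen during annealing is returned, so the annealing phase can only improve on the local-search result.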

Solution evaluation

Our aim in using these surrogate strategies was to represent as many species as possible, which is generally called the maximum coverage problem in the systematic conservation planning literature. We therefore evaluated the selected conservation area networks by counting, from the original full data matrices, the true number of species represented in them. Our standard for poor performance was the average performance of randomly selected sets of sites. Our standard for good performance was that of a complementarity-based site selection algorithm applied to the full data matrices, assuming complete knowledge of species’ distributions. Site selection from the species matrices was done using the technique of Arponen et al. (2005), which employs forward and reverse stepwise heuristics and simulated annealing.
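Evaluation against the full data can be sketched as follows: count the species with at least one occurrence in the selected sites, and express a method's score relative to the random and optimal benchmarks. The linear rescaling (0 = random, 1 = optimal) is our reading of Fig. 3b, the function names are hypothetical, and the number of random draws in `random_baseline` is a hypothetical parameter.

```python
import numpy as np


def species_represented(occ, selected):
    """Number of species occurring in at least one selected site (true data)."""
    occ = np.asarray(occ, dtype=bool)
    return int(occ[selected].any(axis=0).sum())


def scaled_performance(s_method, s_random, s_optimal):
    """Linear rescaling: 0 = mean random selection, 1 = optimal selection."""
    return (s_method - s_random) / (s_optimal - s_random)


def random_baseline(occ, k, rng, reps=100):
    """Mean species count over `reps` random sets of k sites."""
    n = len(occ)
    return float(np.mean([species_represented(occ, rng.choice(n, size=k, replace=False))
                          for _ in range(reps)]))
```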


We examined the performance of environmental and community-level surrogate strategies in representing species in conservation area networks. On average, surrogate strategies performed better than random selection of sites, and as the selected networks became very large, nearly all species were protected with any strategy (Fig. 3a). There were consistent and large differences in the relative performances of methods, ranging from scarcely better than random to 75% of the way from random towards optimal (Fig. 3b). The rank order of the strategies was reasonably consistent across network sizes (Fig. 3).

Figure 3.

Species richness in networks selected with the different techniques. Network size is on x-axis. (a) Performance shown as absolute numbers of species. (b) Performance scaled between that of random selection and optimal selection from full sites × species data matrix. Values given are means across 10 replicate data sets.

The methods using environmental data alone – clustering and direct selection from the environmental space using a p-median – are much closer to random than to optimal solutions (Fig. 3), with a mean performance of 10–15% of the optimal-random difference. Inclusion of biological data does not automatically improve performance over selection from environmental data alone: selecting sites from ordinations using the p-median (CAP + pm or GDM + pm) performed worse than methods with no biological data. This happens because these methods emphasize areas of the environmental space with high community turnover rates, and if species richness gradients do not coincide with turnover gradients, selection from ordinations can even do worse than random.

The first major jump in performance comes when a correction for richness is applied by weighting the p-median selection by richness or unique richness of demand points. These strategies achieve a performance of 30–60% (Fig. 3).

A second major jump in performance is achieved by switching from the p-median to a selection strategy that does not assume that species richness decreases towards the edges and that allows the species at one demand point to be covered by multiple facilities, thereby acknowledging that some degree of community similarity may extend rather far across environmental space. This is demonstrated by the pairwise comparisons GDM + pm vs. GDM-MCR + no richness, and GDM + pm + R vs. GDM-MCR + R. Within each pair, the selections use identical data and identical richness and turnover models, and differ only in the selection strategy: discrete p-median vs. direct maximization of species coverage (equation 1). In both cases, direct maximization of species coverage is by far superior. The highest overall performance is achieved with GDM-MCR + R, at 65–83% of the optimal-random difference.

Figure 3 shows only the mean performances of the methods, and it is conceivable that high variation between replicate data sets could make the choice between methods unreliable. Supplementary Table S2 (Supplementary Material) demonstrates that this is not so. In summary, all GDM variants with a richness correction outperform all p-median methods that do not consider richness in nearly 100% of cases. GDM-MCR (with richness) outperforms all other methods in all cases, except for GDM-MCR + no richness at 64 conservation areas and a single replicate of GDM + pm + UR.

The patterns of selected sites vary widely (Fig. 4). Selecting sites directly from the original environmental space, using either the p-median or environmental cluster analysis, resulted in an even, grid-like selection across the space (Fig. 4a). GDM + pm shifts the selected sites slightly towards parts of the space with high turnover (Fig. 4b). Including species richness information moved the selected sites towards the upper right corner, where richness is higher (Fig. 4d–f; compare with Fig. 2). The GDM-MCR + R selection (Fig. 4g) resembles the optimal solutions (Fig. 4h), with selected sites close to the edges and concentrated in the corners of high richness and high unique richness.

Figure 4.

Locations of sets of 16 selected sites in environmental space, shown summed across the 10 replicate data sets. The selection method is indicated in italic for each panel.


The strategy used to apply environmental or community-level surrogates to conservation prioritization has a considerable effect on the quality of the outcome: performance ranges from worse than random to near-optimal (Fig. 3). We have developed a new strategy for community modelling-based conservation area prioritization. We found three components critical for success: a good model for community turnover (compositional dissimilarity), a good model for species richness, and a selection procedure that appropriately uses both turnover and richness information. Environmental classification may fail because it does not relate environmental information to species richness and turnover. Selection from ordinations may fail if it does not account for richness gradients, and may even give a worse-than-random result if high species turnover occurs in a species-poor region of the scaled environmental space. Richness alone will not suffice, because of the danger of choosing species-rich but compositionally identical sites, ignoring complementarity. In summary, models of both richness and community turnover are indispensable components of a successful surrogate strategy.
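A toy presence-absence example illustrates why richness alone does not suffice. Ranking sites by species count picks two rich but nearly identical sites, whereas greedy complementarity adds, at each step, the site contributing the most species not yet represented. The site names and species lists are hypothetical, and the sketch works on known species lists rather than on modelled richness and turnover, so it only illustrates the principle.

```python
def richness_only(sites, k):
    # Rank sites by species count alone, ignoring overlap between sites.
    return sorted(sites, key=lambda s: len(sites[s]), reverse=True)[:k]


def greedy_complementarity(sites, k):
    # Repeatedly add the site with the most species not yet represented.
    chosen, covered = [], set()
    for _ in range(k):
        best = max((s for s in sites if s not in chosen),
                   key=lambda s: len(sites[s] - covered))
        chosen.append(best)
        covered |= sites[best]
    return chosen


def species_represented(sites, chosen):
    return len(set().union(*(sites[s] for s in chosen)))


sites = {"A": {1, 2, 3, 4, 5, 9}, "B": {1, 2, 3, 4, 5}, "C": {6, 7, 8}}
print(species_represented(sites, richness_only(sites, 2)))
print(species_represented(sites, greedy_complementarity(sites, 2)))
```

Here the richness ranking selects A and B (6 species), while complementarity selects A and C (9 species).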

Shortcomings of the p-median selection

Regarding site selection methods, we have shown that the commonly used p-median selection has methodological shortcomings. There are four reasons why a p-median applied to an ordination may fail: (i) maximizing species representation requires not merely sites that are as different as possible from each other, but sites that together contain the most species, which requires taking differences in species richness into account; (ii) an even representation of the environmental space is not effective if the rate of species turnover varies across the space; (iii) the p-median represents species well only if the centres of species’ distributions lie within the environmental space, not outside it; (iv) the p-median assumes that each demand point is covered by a single facility. All these issues apply to both the discrete and the continuous p-median, but not to the complementary richness maximization technique.

The edge-effect problem [point (iii)] could potentially be addressed by defining the planning region, from which protected locations are selected, as a subset of a more extensive region (both environmentally and geographically) across which demand points are drawn for the p-median analysis. This, however, requires that any biological information used in the analysis, such as species richness, is also available across the larger region. Moreover, if conservation planning also takes place in the adjoining regions, selecting sites at the edges of both neighbouring areas will result in overlapping species coverage; in this case, the basic p-median would be more effective.
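One way to picture the extended-region remedy for the edge effect is an exhaustive discrete p-median in which candidate sites come only from the planning region, while demand points span a wider region. The sketch uses hypothetical one-dimensional toy coordinates, unit weights and brute-force enumeration with `itertools.combinations`, which is feasible only for tiny problems; the function name is illustrative.

```python
import itertools
import math


def exact_pmedian(demand_points, weights, candidate_sites, p):
    """Exhaustive discrete p-median; candidates may be a subset of the demand region.

    Returns the indices (into candidate_sites) of the best p sites and their cost.
    """
    best_set, best_cost = None, math.inf
    for combo in itertools.combinations(range(len(candidate_sites)), p):
        cost = sum(w * min(math.dist(d, candidate_sites[j]) for j in combo)
                   for d, w in zip(demand_points, weights))
        if cost < best_cost:
            best_set, best_cost = combo, cost
    return best_set, best_cost


planning = [(x, 0) for x in range(0, 11)]    # candidate sites: planning region only
extended = [(x, 0) for x in range(-5, 16)]   # demand points: wider region

print(exact_pmedian(planning, [1] * len(planning), planning, 2))
print(exact_pmedian(extended, [1] * len(extended), planning, 2))
```

In this toy case, demand restricted to the planning region yields interior sites, whereas the extended demand region selects the two outermost candidates (0 and 10), because demand beyond the boundary is no longer ignored.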

The p-median ignores the possibility that one demand point may be partially covered by several facilities [in reality, most sites share at least some species; point (iv)]. Consequently, the p-median tends to position selected sites fairly regularly inside the environmental space. In contrast, maximization of complementary richness places selected sites closer to the edges and corners of the environmental space, which is more efficient, because the biodiversity features of sites in the centre of the environmental space may already be covered by the sites at its edges. How far this holds depends on the rate of community turnover across the environmental space. If substantial community similarity extends through the entire region of environmental space, sites at the edges will cover features in the centre. If the community turns over several times across the environmental space, sites also need to be placed around its centre.
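The dependence on turnover rate can be illustrated for a single central demand point lying midway between two edge sites. The sketch assumes a negative-exponential similarity with a hypothetical `scale` parameter (large scale means slow turnover) and assumes that partial coverage by several sites combines independently; it contrasts this with p-median-style crediting by the nearest site only.

```python
import math


def multi_site_credit(similarities):
    # Share of the demand point's species covered by at least one selected site,
    # assuming each site covers an independent share equal to its similarity.
    uncovered = 1.0
    for s in similarities:
        uncovered *= 1.0 - s
    return 1.0 - uncovered


def centre_coverage(distance, scale, n_edge_sites=2):
    """Coverage of a central point by edge sites at equal environmental distance."""
    s = math.exp(-distance / scale)
    nearest_only = s  # p-median view: one facility serves the demand point
    combined = multi_site_credit([s] * n_edge_sites)
    return nearest_only, combined


print(centre_coverage(1.0, scale=5.0))  # slow turnover: centre largely covered
print(centre_coverage(1.0, scale=0.5))  # fast turnover: centre poorly covered
print(multi_site_credit([0.5, 0.5]))
```

With slow turnover the two edge sites jointly cover most of the centre, so no central site is needed; with fast turnover they do not, and a site near the centre becomes worthwhile.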

Common problems with community modelling and its application to conservation planning

A number of additional issues in the use of community models may have contributed to the mixed conclusions of earlier studies, and these should be considered in future applications:

  1. Aim to include all key environmental variables, at a resolution appropriate for the problem at hand, and to use a sufficiently large number of dimensions in the ordination (Ferrier et al. 2002).
  2. Species distributions may not be in equilibrium with environmental variables, especially given ongoing climate change and habitat degradation and loss, and may therefore be poorly modelled by such variables (Kirkpatrick & Brown 1994; Ferrier 2002). Incorporating geographical distance, paleoclimatic modelling or dispersal cost surfaces may help account for the influence of historical events and population processes (Ferrier 2002; Steinitz et al. 2005).
  3. Avoid ordination techniques whose assumptions about the form of species’ responses to environmental gradients do not match your data (e.g. RDA, which assumes linear responses, and CCA, which assumes unimodal responses).
  4. Do not assume that a selection based on modelling of one species group captures variation in other groups, as relationships between environmental variables and species vary between taxonomic groups (Hortal & Lobo 2005). This is equivalent to the unresolved issue of taxonomic surrogacy.
  5. Keep in mind that sampling bias in biological data could have unexpected consequences for the modelling of community turnover and species richness.

Limitations of our approach

In this comparison of surrogate strategies, we have ignored several complications that may be relevant in real planning cases. These include the connectivity of selected sites (Cabeza 2003), stochastic habitat loss and availability rates (Pressey, Watts & Barrett 2004), climate change (Williams et al. 2005), genetic effects (Lesica & Allendorf 1995), and the need to cater specifically for focal species. Accounting for such complications will be methodologically challenging, and is only worth the effort if the underlying surrogate strategy is sound.

In our example, we illustrated the principle of complementarity maximization using only two environmental variables. In real-world planning, the number of variables could be much larger; this extension is straightforward, because an arbitrary number of variables can be used as explanatory variables in the models for richness and dissimilarity.
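To show that additional variables simply add terms to a dissimilarity model, here is a simplified GDM-style prediction. GDM links compositional dissimilarity to environmental differences through d = 1 − exp(−η), with η a sum of contributions from each variable; real GDMs fit monotone I-spline transformations of each variable, whereas this sketch assumes linear terms with hypothetical non-negative coefficients. It is not the GDM R code referred to in this paper.

```python
import math


def gdm_style_dissimilarity(x_i, x_j, coefficients, intercept=0.0):
    """Predicted compositional dissimilarity between two sites.

    eta = intercept + sum_k b_k * |x_ik - x_jk|; dissimilarity = 1 - exp(-eta).
    Linear per-variable terms are an assumption standing in for fitted I-splines.
    Works for any number of environmental variables.
    """
    if not (len(x_i) == len(x_j) == len(coefficients)):
        raise ValueError("one coefficient per environmental variable is required")
    if any(b < 0 for b in coefficients):
        raise ValueError("coefficients must be non-negative")
    eta = intercept + sum(b * abs(a - c) for b, a, c in zip(coefficients, x_i, x_j))
    return 1.0 - math.exp(-eta)


# Hypothetical sites described by five environmental variables.
print(gdm_style_dissimilarity([0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0.1] * 5))
print(gdm_style_dissimilarity([0, 0, 0, 0, 0], [2, 2, 2, 2, 2], [0.1] * 5))
```

The corresponding similarity, exp(−η), is the quantity a coverage-based selection would use.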

Even the simplest environmental surrogate strategies usually perform better than random selection of sites (Table S2), but when biological data are available they should be used, because strategies that account for turnover and species richness gradients are clearly superior. It remains unclear, however, how strongly the quality and quantity of biological data influence the results: could environmental variables alone be better than scarce and biased biological data? These effects would preferably be explored with multiple large, high-quality empirical data sets. Alternatively, one could introduce systematic error rates into simulated data to gauge how the methods perform in a worst-case scenario.
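Introducing error rates into simulated data could look like the sketch below, which degrades a site-by-species presence-absence matrix with fixed false-absence and false-presence probabilities. The rates, the function name and the uniform-error assumption are hypothetical; a spatially biased scenario would instead vary the rates among sites.

```python
import random


def degrade_records(presence, false_absence_rate, false_presence_rate, seed=None):
    """Return a copy of a site-by-species 0/1 matrix with observation errors added.

    Each true presence is dropped with probability false_absence_rate, and each
    true absence becomes a presence with probability false_presence_rate.
    """
    rng = random.Random(seed)
    degraded = []
    for row in presence:
        new_row = []
        for value in row:
            if value == 1:
                new_row.append(0 if rng.random() < false_absence_rate else 1)
            else:
                new_row.append(1 if rng.random() < false_presence_rate else 0)
        degraded.append(new_row)
    return degraded


true_matrix = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
print(degrade_records(true_matrix, false_absence_rate=0.3, false_presence_rate=0.05, seed=42))
```

Refitting the richness and turnover models to degraded matrices at increasing error rates, and repeating the selection, would show how quickly each strategy's performance declines.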

Management implications and availability of the methods

The novel method described in this study is a clear improvement over previously used surrogate strategies and can therefore be recommended to replace them in real-world conservation planning applications. The techniques are not restricted to the selection of new protected areas; they apply to any kind of site prioritization, such as identifying sets of sites for restoration or for further biodiversity surveys.

For implementing these analyses, R functions for fitting GDMs (Ferrier et al. 2007) are freely available from http://www.biomaps.net.au/gdm/. The richness models can be fitted with a variety of statistical modelling techniques (Elith et al. 2006), such as GLM or GAM, which are freely available as standard functions in the R statistical package. A forthcoming study (J.R. Leathwick, personal communication) will explain how a particular arrangement of input files and analysis options allows large-scale, high-resolution reserve selection using complementary richness to be implemented with the publicly available Zonation framework and software (Moilanen et al. 2005; Moilanen & Kujala 2006).
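In R, a richness GLM would typically be fitted with `glm()` and `family = poisson`. Purely to show what such a model estimates, here is a minimal pure-Python Newton-Raphson fit of a Poisson GLM with a log link and a single environmental predictor; the function name, the one-predictor restriction and the data are hypothetical, and a real analysis would use several predictors and established software.

```python
import math


def fit_poisson_richness(x, y, iterations=50, tol=1e-10):
    """Fit log(E[richness]) = b0 + b1 * x by Newton-Raphson (Poisson GLM, log link).

    Needs at least two distinct x values and a positive mean richness.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the mean richness
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian) for the canonical log link.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("predictor has no variation")
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical richness counts along one environmental gradient.
gradient = [0, 1, 2, 3, 4, 5]
richness = [2, 3, 5, 8, 13, 20]
print(fit_poisson_richness(gradient, richness))
```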

There are numerous other approaches for using environmental or community-level surrogates, many of which would probably have performed only moderately well, and no single study can examine all their combinations. Nevertheless, we have demonstrated that surrogate approaches based on community modelling, when used properly, perform substantially better than earlier techniques. We therefore conclude that community-level surrogates hold great promise for conservation planning, especially in species-rich and data-poor regions of the world, where data may suffice for community-level modelling but not for modelling the distributions of many species.


This work was financed by LUOVA – Finnish School in Wildlife Biology, Conservation and Management (AA) and the Academy of Finland project 1206883 (AM). We thank Professor Juha Alho for statistical advice, Glenn Manion for help with GDM runs, and Dr Brendan Wintle and two anonymous reviewers for comments on the manuscript.