On the limitations of graph-theoretic connectivity in spatial ecology and conservation

Authors


Correspondence author. E-mail: atte.moilanen@helsinki.fi

Summary

1. Applications of graph-theoretic connectivity are increasing at an exponential rate in ecology and conservation. Here, limitations of these measures are summarized.

2. Graph-theoretic connectivity measures are fundamentally limited as they require specification of a habitat quality threshold to allow definition of habitat patches (nodes). Frequently, a second threshold (critical dispersal distance) is applied in the identification of graph edges.

3. Graph-theoretic measures are poorly applicable to large-scale, high-resolution, grid-based data that describe distributions of species in habitats of varying quality.

4. Graph-theoretic connectivity primarily concerns the emigration-immigration component of spatial population dynamics. Therefore, it cannot alone answer questions about the regional population size, resilience or persistence of a focal species.

5. Synthesis and applications: Conservation managers in particular should appreciate these limitations before applying graph-theoretic analysis to spatial conservation planning.

Introduction

Connectivity is a fundamental variable in spatial ecology and biodiversity conservation (e.g. Hanski 1998; Calabrese & Fagan 2004; Crooks & Sanjayan 2006; Kindlmann & Burel 2008), and improving connectivity has been the most prevalent proposed solution for conservation under climate change (Heller & Zavaleta 2009). Therefore, it is no surprise that every year, thousands of publications develop new connectivity-based analyses or apply connectivity as part of an ecologically based analysis. Here, I comment on a branch of connectivity research that has been gaining popularity at an exponential rate during the past decade, that is, graph-theoretic connectivity (Urban & Keitt 2001; Pascual-Hortal & Saura 2006; Minor & Urban 2008; Urban et al. 2009). Whereas in 2000 there were just two citations to graph-theoretic connectivity in ecology and conservation, in 2010 there were >450, with the citation rate doubling approximately every 1.5 years (Fig. 1). With enthusiasm on the rise, it is hard to locate a unified critical perspective on the applicability of graph-theoretic connectivity. Such a perspective is provided here. But, first, what are graphs, really?

Figure 1.

 ISI Web-of-Science search on April 22, 2011, for topic = (‘graph-theory’ OR ‘graph-theoretic’) AND (connectivity OR connectedness OR isolation OR dispersal OR migration) AND (ecology OR population OR conservation OR biodiversity). From four papers and 31 citations in 2005, research volume has grown to 27 papers and 458 citations in 2010.

Graphs originate from mathematics and computer science. A graph defines relationships between entities; the entities are called nodes and the relationships between them edges. Graphs are used most naturally when there are relatively few edges between nodes, in which case the graph is easy to visualize and computationally efficient algorithms can be applied to it (Urban et al. 2009; Rayfield, Fortin & Fall 2010). Clear visualization, compact standard representation, efficient computation, and a centuries-long mathematical background make graphs appealing structures for applied sciences, including ecology. One example of profitable import of graph-theoretic methodology is connectivity motivated by electric circuit theory, where the ecologically relevant feature is that alternative connectivity pathways are accounted for (McRae et al. 2008), an improvement over analysis based on least cost path distances. For more background about graphs, see the comprehensive and well-illustrated review of Newman (2003). An excellent summary from the fields of ecology and conservation is given by Urban et al. (2009) and reviews from other biological disciplines by May (2005) and Proulx, Promislow & Phillips (2005).

Integral to the concept of a graph is that it must be possible to define its nodes and edges – something highly relevant below. This is easily carried out in engineering systems, such as energy supply, transportation, or communications networks, where nodes represent supply and demand points and edges represent actual physical connections between nodes. In spatial ecology, the nodes are habitat patches and edges are interpreted as connectivity between patches (Urban & Keitt 2001).

Limitations of graph-theoretic connectivity

The limitations of graph-theoretic connectivity are structured below around four themes:

  • 1 A multitude of measures with uncertain ecological relevance and novelty value.
  • 2 Thresholding leads to significant loss of information.
  • 3 Computational limitations in application to high-resolution GIS grids.
  • 4 Overemphasis on relevance of landscape connectivity.

A multitude of measures with uncertain ecological relevance and novelty value

How novel is graph-theoretic connectivity exactly? A graph can be represented by a matrix in which the entry on row i and column j indicates the strength of the connection between patches i and j. If nodes have a low average number of connections, the matrix can be efficiently stored as a sparse matrix, and easily visualized as a network of circles connected with lines.

Full pair-wise connectivity matrices have been used to represent connectivity in metapopulation studies earlier than graphs have been used in ecology and conservation (e.g. Hanski 1994), a connection recognized by some of the literature of graph-theoretic connectivity (Urban & Keitt 2001; Urban et al. 2009). Consider also buffer (a.k.a. neighborhood) measures of connectivity, in which a distance is specified and connectivity is assumed to occur within this distance. Such measures have been extensively used in metapopulation studies (see Moilanen & Nieminen 2002 for an early review) and in statistical species distribution modeling (Elith & Leathwick 2009), where they measure quantities such as ‘the amount of forest within a 500-m radius around the focal grid cell’. There is no meaningful difference between buffer measures and critical distances used to construct graph-based representations of a landscape: both lead to an effectively identical characterization of local connections in the landscape – of course different things can subsequently be performed with this description of the landscape.
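
The equivalence between buffer measures and critical-distance graphs can be made concrete with a minimal sketch. The patch coordinates, patch areas and the 2-km threshold below are hypothetical values chosen only for illustration; the point is that the thresholded pair-wise matrix, its sparse edge list and an area-based buffer measure all derive from the same distance matrix.

```python
import numpy as np

# Hypothetical patch centroids (km) and areas (ha); not from any real landscape.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5], [5.0, 5.0]])
areas = np.array([10.0, 4.0, 6.0, 8.0])
critical_distance = 2.0  # assumed threshold (km)

# Full pair-wise Euclidean distance matrix between patches.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Graph view: binary adjacency matrix, 1 where two distinct patches
# lie within the critical distance of each other.
adjacency = (dist <= critical_distance).astype(int)
np.fill_diagonal(adjacency, 0)

# Sparse storage of the same graph: keep only the connected pairs.
edges = [(i, j) for i in range(len(coords)) for j in range(i + 1, len(coords))
         if adjacency[i, j]]

# Buffer view: total area of the other patches within the same distance.
buffer_area = adjacency @ areas
```

Here the fourth patch has no edges and, equivalently, a buffer value of zero; what differs between the two traditions is only what is subsequently done with this description.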

The language of graph-theoretic connectivity is not fully ecologically pertinent – a recent review about graph theory in spatial analysis (Urban et al. 2009) defines, among others, the following technical terms with variably clear ecological relevance: arc, characteristic path length, centrality, closeness, clustering, community structure, component, degree centrality, diameter, digraph, directed graph, dual graph, least cost path, least cost link, link, link weight, minimum planar graph, minimum spanning tree, multigraph, node, node betweenness, node failure, order, path, planar graph, regular graph, scale free graph, small world graph, spanning tree, subgraph, walk, and value. One might opine that the already abundant terminology of structural, functional, regional, landscape, and metapopulation connectivity would benefit from a consolidation of operational terms rather than from a proliferation of them. Fundamentally, connectivity is about (i) how the patches are defined, (ii) how potential connections between patches are calculated, and (iii) what is done with this description of the landscape. Sensible connectivity measures – whatever they are called – should yield very similar results when based on the same structural description (i and ii) of the landscape. Instead of labeling an analysis as metapopulation, landscape ecological, landscape connectivity, graph theoretic, statistical distribution modeling, spatial population viability analysis (PVA), or something else, it would be more profitable to concentrate on the operational structure of the analysis: how are habitat area, habitat quality, spatial considerations, multiple species (or whatever biodiversity features), temporal dynamics, and analysis resolution handled?

Thresholding leads to significant loss of information

Use of graphs relies on the ability to define nodes and edges. In ecology, this implies that it must be possible to delineate habitat patches and to define connections between them. Natural delineation of patches may be feasible, for example when forest fragments occur in an agri-urban habitat matrix. In contrast, assume a semi-continuous forest landscape with varying tree species composition and human impacts; delineating patches becomes less easy and requires application of thresholding to divide the landscape into habitat patches and other habitat types. This requirement to classify the landscape into habitat and non-habitat, frequently applied across spatial ecology in island biogeography, metapopulation studies, and landscape ecology (Chetkiewicz, Clair & Boyce 2006), is a fundamental limitation, in particular for multi-species analysis.

A second thresholding may occur when connections (graph edges) are delineated. Frequently, graph edges are defined via a critical distance inside which patches are connected and outside which no connections exist (see e.g. Pascual-Hortal & Saura 2006 for comparisons). Effectively, such a threshold equates to a dispersal kernel that is a step function of distance, which does not correspond ideally to ecological reality: dispersing individuals do not suddenly drop dead when hitting an invisible critical distance. Not all applications of graph-theoretic connectivity apply critical distances; several studies have used pair-wise ‘probability of connectivity’ matrices that can be constructed either via the use of a declining-by-distance dispersal kernel or via more complicated path-type calculations (Urban & Keitt 2001; Saura & Pascual-Hortal 2007). In fact, the PC-index of Saura & Pascual-Hortal (2007), based on a pair-wise connectivity matrix, was the only graph-theoretic measure out of nine that satisfied 13 given conditions required for the sensible behavior of a connectivity metric.
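
The contrast between a critical distance and a declining-by-distance kernel can be sketched numerically. The negative exponential form, the function names and all parameter values below are assumptions made for illustration; other declining kernels would serve equally well.

```python
import numpy as np

def step_kernel(d, critical_distance):
    """Critical-distance rule: weight 1 inside the threshold, 0 outside."""
    return (np.asarray(d, dtype=float) <= critical_distance).astype(float)

def exponential_kernel(d, mean_dispersal_distance):
    """One assumed declining-by-distance form: exp(-d / mean distance)."""
    return np.exp(-np.asarray(d, dtype=float) / mean_dispersal_distance)

# Hypothetical inter-patch distances (km), two of them straddling the threshold.
distances = np.array([0.5, 1.9, 2.1, 4.0])
step = step_kernel(distances, critical_distance=2.0)
smooth = exponential_kernel(distances, mean_dispersal_distance=2.0)
```

Under the step kernel the patches at 1.9 km and 2.1 km fall on opposite sides of the threshold (weights 1 and 0), whereas the exponential kernel assigns them nearly identical weights.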

Problems with thresholds compound when going from single to multi-species analysis. Different species have different habitat requirements and dispersal behavior, implying much work and significant simplification when thresholds for habitat quality and dispersal distances are defined for up to tens of thousands of species. It is a common technique in graph-theoretic analysis that measures of network structure are recalculated for a range of critical distances, facilitating identification of distances where the structure of the network changes rapidly (e.g. Urban et al. 2009; Bodin & Saura 2010). Such an approach is feasible for a few species, but becomes arduous and hard to summarize with many species that inhabit different networks.

Furthermore, twice thresholded graphs could be applied within the context of systematic conservation planning, where conservation targets are effectively defined as minimum thresholds to species occurrence levels (representation), connectivity and other quantities (Margules & Sarkar 2007). One then does applied conservation on a thrice thresholded system where the objectives (targets) and the ecological model (habitat quality, connectivity distances) have all been thresholded, resulting in a somewhat black-and-white view of the world.

Computational limitations in application to high-resolution GIS grids

Statistical species distribution modeling (Elith & Leathwick 2009) is typically carried out on high-resolution GIS grids, where explanatory variables represent factors such as topography, temperature, rainfall, and vegetation cover. Such data are becoming available at very high resolutions, for example, the global Landsat data and the European Corine data both have spatial resolutions in the order of tens of metres. Landscapes may be small-scale and fragmented in industrialized countries, and the average sizes of individual properties may be in the order of hectares – implying that both ecological analysis and conservation management ideally should happen at spatial resolutions of hectares (or less) for them to correspond to realities of ecology and land ownership. Such resolutions imply national-scale analysis on grids consisting of tens of millions of grid cells of information per species or ecosystem type. At the time of the review of Urban et al. (2009), it appeared that some more complex graph-theoretic analyses were only possible for landscapes of some thousands of nodes, implying a need to simplify landscape descriptions via thresholding.

This deficiency would matter little if there were no alternatives applicable to data such as those described above. However, it turns out that arbitrary species-specific kernel-based declining-by-distance connectivity measures that do not require habitat quality thresholding or specification of a critical distance can indeed be computed on large grids, within a species (Moilanen & Wintle 2007), between distributions (Rayfield, Moilanen & Fortin 2009), or between many partially similar habitat types (Lehtomäki et al. 2009; Leathwick et al. 2010) and used in spatial conservation. These computations rely on the application of the fast Fourier transform technique, described by Moilanen (2005), to enable connectivity transforms for grids of tens of millions of elements. Kernel-based connectivity techniques can also be used for calculating connectivity values for use in subsequent graph-based processing.
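
The general idea of a kernel-based connectivity transform on a grid can be sketched as an FFT-based convolution. This is a generic illustration, not the implementation of Moilanen (2005) or of any particular software; the function name, the negative exponential kernel, the inclusion of the focal cell itself (kernel weight 1 at distance zero) and all parameter values are assumptions.

```python
import numpy as np

def kernel_connectivity(quality, cell_size, mean_dispersal_distance):
    """Connectivity of each cell = kernel-weighted sum of quality over all cells."""
    rows, cols = quality.shape
    # Kernel offsets cover every possible displacement between two grid cells.
    dy = np.arange(-(rows - 1), rows) * cell_size
    dx = np.arange(-(cols - 1), cols) * cell_size
    dist = np.hypot(dy[:, None], dx[None, :])
    kernel = np.exp(-dist / mean_dispersal_distance)

    # Zero-pad to the full linear-convolution size so the FFT does not wrap
    # around the grid edges.
    shape = (rows + kernel.shape[0] - 1, cols + kernel.shape[1] - 1)
    spectrum = np.fft.rfft2(quality, s=shape) * np.fft.rfft2(kernel, s=shape)
    full = np.fft.irfft2(spectrum, s=shape)

    # Crop the part of the full convolution aligned with the original grid.
    return full[rows - 1:2 * rows - 1, cols - 1:2 * cols - 1]

# Hypothetical continuous habitat quality on a 200 x 300 grid of 100-m cells.
rng = np.random.default_rng(0)
quality = rng.random((200, 300))
connectivity = kernel_connectivity(quality, cell_size=100.0,
                                   mean_dispersal_distance=500.0)
```

No habitat quality threshold and no critical distance enter the calculation; the cost grows roughly as n log n in the number of grid cells rather than with the number of cell pairs.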

Some graph algorithms, originating from computer science, are exact and guaranteed to find optimal results, implying intensive computation in analysis when there are many nodes. Take, as an example, the least cost path (LCP), which is the shortest possible path between two nodes. Computationally intensive graph-theoretic methods are frequently associated with LCP calculation and application (Urban et al. 2009). Nevertheless, recent results suggest that LCP distances may approximate functional connectivity poorly (Palmer, Coulon & Travis 2011). This may be because LCP analysis is sensitive to the definition of resistance values used (Rayfield, Fortin & Fall 2010) and because it does not acknowledge the existence of multiple alternative paths and their effect on functional connectivity (see Fig. 3 for an illustration). Another category of connectivity methods, analysis based on mechanistic movement rules, can both approximate connectivity in an ecologically realistic manner and account for multiple alternative movement paths in a natural manner (Ovaskainen et al. 2008; Watts & Handley 2010; Palmer, Coulon & Travis 2011). In summary, it is presently not clear that graph-based analysis is the best available compromise between ecological realism and computational tractability.
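
That an LCP is blind to alternative routes can be shown with a minimal sketch using Dijkstra's algorithm on a small resistance grid. The two landscapes, the resistance values and the convention that a step costs the resistance of the cell entered are hypothetical choices for illustration only.

```python
import heapq

def least_cost_path(resistance, start, goal):
    """Dijkstra on a 4-neighbour grid; a step costs the resistance of the cell entered."""
    rows, cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + resistance[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return float("inf")

# Hypothetical landscapes: uniformly permeable versus a single narrow corridor
# flanked by cells the species avoids (resistance 100).
open_matrix = [[1.0] * 5 for _ in range(3)]
corridor = [[100.0] * 5, [1.0] * 5, [100.0] * 5]

lcp_open = least_cost_path(open_matrix, (1, 0), (1, 4))
lcp_corridor = least_cost_path(corridor, (1, 0), (1, 4))
```

Both landscapes return the same LCP cost, although one offers many alternative routes and the other exactly one; any difference in functional connectivity between them is invisible to the LCP distance.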

Figure 2.

Spurious results can arise from the use of critical distances. (a) A patch network with three clusters that are separated by more than the critical distance. (b) Extension of the network by adding habitat to each cluster – the clusters remain separated by the critical distance. (c) Addition of two stepping stones connects the entire network according to the critical distance. A spatial population model utilizing a declining-by-distance dispersal kernel would see network (b) as the most stable alternative, while a graph-theoretic connectivity measure might conclude that the addition of stepping stones is critically important for connectivity.

Overemphasis on relevance of landscape connectivity

The frequent co-occurrence of landscape connectivity and graph-theoretic connectivity in the ecological literature implies that graph-theoretic connectivity is seen as a vehicle for finding an improved measure of landscape connectivity (Bunn, Urban & Keitt 2000; Urban & Keitt 2001; Pascual-Hortal & Saura 2006; Estrada & Bodin 2008; Minor & Urban 2008). This concept is one of the holy grails of landscape-scale conservation in that it should directly indicate the resilience of the species across the landscape, thereby providing a fundamental measure for conservation planning. Recent reviews have commented on the continuing lack of a clear operational definition for (landscape) connectivity (Kindlmann & Burel 2008; Heller & Zavaleta 2009); here I will remark on the fundamental role of connectivity in population ecology.

The basic equation of population ecology states that change in population size equals births − deaths + immigration − emigration. Connectivity only informs about the immigration and emigration components of this equation. [That is, unless the semantic meaning of connectivity is significantly expanded from the original to encompass a PVA-like evaluation of persistence.] Thus, no measure of connectivity, whatever it is called, can give a fully reliable estimate of the persistence, extinction risk, or resilience of a species at a regional scale. For comparison, measures such as metapopulation capacity (Hanski & Ovaskainen 2000), ‘rapid evaluation of metapopulation persistence’ (Drielsma & Ferrier 2009), or spatial population viability analyses integrate analysis of spatial pattern with additional information about population sizes, birth and death rates, thereby providing more direct information about regional persistence.
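
Written out, with symbols that are assumptions introduced here for illustration (N for population size, and B, D, I and E for the numbers of births, deaths, immigrants and emigrants over one time step), the balance equation reads:

```latex
N_{t+1} - N_t = B_t - D_t + I_t - E_t
```

A connectivity measure bears only on the terms I and E; the terms B and D depend on habitat amount and quality, which a standalone connectivity analysis does not estimate.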

Figure 2 illustrates a case where a spatial population model and a connectivity-focused measure might plausibly disagree about what would be the best conservation strategy. A connectivity measure, such as the maximum connected subgraph size, utilizing the critical distance shown in the figure would conclude that the network, unsatisfactorily, consists of three separate clusters (maximum connected subgraph size = 1/3). Addition of two stepping stones (Fig. 2c) would, apparently advantageously, connect the entire network with distances shorter than the critical distance. In contrast, a spatial population model that uses a declining-by-distance kernel and accounts for births, deaths, and patch sizes would see relatively little influence from the stepping stones: if the small stepping stone is within colonization distance from the larger clusters, then direct dispersal between the clusters will be possible as well (although at a lower per-capita rate), and the small patches will not provide a significant increase to the total amount of habitat available for the species. This was noticed already by Urban & Keitt (2001): ‘A small stepping-stone patch might be important to traversability without contributing substantially to overall productivity or dispersal flux’; see also Bodin & Saura (2010). A population model would prefer addition of three large patches to the system (Fig. 2b) – these would add significant habitat, improve the persistence of the clusters, and therefore also increase the number of migrants moving between clusters. But, put into numbers, how does a 10% addition in area compare with a tripling of connectivity? A situation conceptually similar to Fig. 2c was found in an empirical study by Urban (2005). A likewise spurious result could be produced by varying the habitat quality threshold used in the definition of habitat patches (nodes), with a lower threshold apparently improving landscape connectivity and resilience because of both an apparent increase in habitat amount and reduced distances between patches.
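
The stepping-stone comparison described above can be reproduced with a toy network. All coordinates, areas, the critical distance and the kernel parameter are hypothetical, and the area-weighted kernel sum is only a crude stand-in for a measure that responds to habitat amount; it is not a population model.

```python
import numpy as np

def pairwise_distances(coords):
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def largest_component_fraction(coords, critical_distance):
    """Share of patches in the largest connected subgraph under a critical distance."""
    linked = pairwise_distances(coords) <= critical_distance
    unvisited, largest = set(range(len(coords))), 0
    while unvisited:
        stack, size = [unvisited.pop()], 0
        while stack:  # depth-first search over one component
            node = stack.pop()
            size += 1
            new = [j for j in np.flatnonzero(linked[node]).tolist() if j in unvisited]
            unvisited.difference_update(new)
            stack.extend(new)
        largest = max(largest, size)
    return largest / len(coords)

def kernel_connectivity(coords, areas, mean_dispersal_distance):
    """Area-weighted sum of exp(-d / mean distance) over all ordered patch pairs."""
    weights = np.exp(-pairwise_distances(coords) / mean_dispersal_distance)
    np.fill_diagonal(weights, 0.0)
    areas = np.asarray(areas, dtype=float)
    return float(areas @ weights @ areas)

# (a) three clusters of three unit-area patches, 8 km apart edge to edge
base = [(x + dx, 0.0) for x in (0.0, 10.0, 20.0) for dx in (-1.0, 0.0, 1.0)]
base_areas = [1.0] * 9
# (b) one additional unit-area patch in each cluster
more, more_areas = base + [(x, 1.0) for x in (0.0, 10.0, 20.0)], base_areas + [1.0] * 3
# (c) two small stepping stones between the clusters
step, step_areas = base + [(5.0, 0.0), (15.0, 0.0)], base_areas + [0.1] * 2

frac_base, frac_more, frac_step = (
    largest_component_fraction(c, critical_distance=5.0) for c in (base, more, step))
conn_base, conn_more, conn_step = (
    kernel_connectivity(c, a, mean_dispersal_distance=2.0)
    for c, a in ((base, base_areas), (more, more_areas), (step, step_areas)))
```

In this toy case the largest-component measure stays at 1/3 when habitat is added to the clusters and jumps to 1 with the stepping stones, whereas the kernel-based sum increases far more with the added habitat than with the stepping stones.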

Figure 3.

Illustrating the relevance of accounting for multiple alternative paths. The shortest path between two locations is the same in panels (a) and (b). However, a dispersing individual will not know in advance which path (A or B) to take, leading to functional connectivity being lower in (b) than in (a). This effect only becomes apparent in connectivity analysis when multiple paths between patches are accounted for. Likewise, the per-capita dispersal rate between the two patches would be higher in (c) than in (d). In the figure, breeding habitat is in black, white is suitable for dispersal and gray areas are habitat avoided by the species.

Discussion

The purpose of the present discussion has been to pointedly state limitations of graph-theoretic connectivity – its benefits and promises, including a mathematically coherent framework with efficient computational implementations and transfer of knowledge from engineering sciences, have been summarized before (Saura & Pascual-Hortal 2007; Minor & Urban 2008; Urban et al. 2009; Zetterberg, Mörtberg & Balfors 2010). Fundamentally, graph-theoretic connectivity is best suited for analysis of naturally patchy landscapes, with connections defined via critical distances. Contrastingly, at least in conservation, there is a need for high-resolution, multi-species analyses that allow for species-specific consideration of variable habitat quality and connectivity.

With respect to the role of connectivity in general, the area of habitat suitable for the species and quality of those habitats are the two primary variables defining the landscape-scale maximal carrying capacity for a species (Hodgson et al. 2009). Spatial arrangement, coming next, influences how much of this carrying capacity is utilized. The influence of habitat quality may be generally larger than that of spatial pattern (Travis & Dytham 1999; Hodgson et al. 2011). Consequently, simplification of the landscape structure, thresholding of habitat quality, reduction in analysis resolutions, or other similar operations required to fit analysis into any connectivity framework should be treated with caution when working on real-world conservation management.

To conclude, and setting semantics aside, it is useful to bear in mind the following operationally relevant considerations when working with connectivity: Is connectivity considered as a part of a full population model, or is connectivity measured and applied as a standalone entity? How, if at all, are patches defined? Is patch quality information retained? How are distances between patches calculated? Is distance defined and scaled in a species-specific manner? Are alternative pathways between sites considered? How is connectivity aggregated across a patch network?

Acknowledgements

I thank the Academy of Finland centre of excellence programme 2006–2011, grant 213457, and the ERC-StG grant 260393 (GEDA), for support.
