Mihai Valcu, Department Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Eberhard Gwinner Strasse, 82319 Seewiesen, Germany. E-mail: email@example.com
Aim To introduce rangeMapper, an R package for the study of the macroecological patterns of life-history traits, and demonstrate its capabilities using three case studies. The first case study addresses an important topic in conservation biology: biodiversity hotspots. Specifically, we investigate the congruence between global hotspots of three parameters that describe avian diversity: species richness, endemic species richness and relative body mass diversity. The second case study investigates a topic of relevance for macroecology: the inter-specific relationship between range size and body size for avian assemblages, and how it varies geographically. The third case study tackles a methodological problem in macroecology: the influence of map resolution on statistical inference, i.e. the question of whether and how the relationship between species richness and body mass varies with map resolution.
Innovation rangeMapper offers a tight integration of spatial and statistical tools for macroecological projects and it relies on a high-performance database engine which makes it suitable for managing projects using a large number of species. rangeMapper's architecture follows closely the concepts described by Gaston et al. (2008 Journal of Biogeography, 35, 483–500) and its flexibility allows for both complex data manipulation procedures and easy implementation of new functions. By choosing case studies to cover various technical and conceptual issues we demonstrate rangeMapper's capabilities to address a wide array of questions.
Understanding patterns of biological diversity is a major goal of macroecological research. In particular, over the last decade substantial effort has been invested in explaining geographical variation in species richness (e.g. Gaston, 2000; Davies et al., 2007). In addition to species diversity, functional trait diversity has now been recognized as an equally important component of biological diversity (e.g. Petchey & Gaston, 2006). Indeed, with the increased availability of life-history data compiled for whole taxonomic groups, the first studies on life-history traits at a global level have recently been undertaken (e.g. Jetz et al., 2008; Olson et al., 2009).
Global patterns of species richness and life-history trait distributions are likely driven by multiple factors, at multiple levels of organization (from population to assemblage level) and at multiple spatial scales (from regional to global) (Gaston et al., 2008). This makes macroecology inter-disciplinary in both its approach and the tools it uses (Blackburn, 2004; Smith et al., 2008). For example, a typical macroecology project (e.g. Hurlbert & Jetz, 2007; Olson et al., 2009; Fritz & Purvis, 2010; Powney et al., 2010) requires a tight integration of geospatial species range vector data with (1) life-history information and (2) satellite remote-sensing ecological and climatological raster data. The conceptual and technical complexity of such a project depends further on whether the analysis is at the inter-specific or the assemblage level. For an inter-specific analysis, the parameters required are the life-history traits of interest for each species and the phylogenetic relationships between species, linked with spatial characteristics of the species ranges (e.g. range size or range shape) and the key environmental variables within the ranges. In contrast, for an assemblage-level analysis, the geographical ranges of all species are partitioned into a regular grid, and each ‘assemblage’ is defined as the community of species that occurs within each cell. Each assemblage can then be described by its richness (i.e. the count of all species in a grid cell) and the life-history characteristics of its species community. Environmental variables at each grid cell are then typically used as predictor variables for species richness and/or life-history traits.
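The assemblage-level bookkeeping described above can be illustrated with a minimal Python sketch. It is not rangeMapper code: the species names, cell identifiers and function names are hypothetical, and each range is assumed to have already been intersected with the grid.

```python
from collections import defaultdict

# Hypothetical toy data: each species maps to the ids of the grid cells
# that its breeding range overlaps.
species_cells = {
    "sp_a": {1, 2, 3},
    "sp_b": {2, 3},
    "sp_c": {3, 4},
}

def assemblages(species_cells):
    """Invert species -> cells into cell -> set of species (the assemblage)."""
    by_cell = defaultdict(set)
    for species, cells in species_cells.items():
        for cell in cells:
            by_cell[cell].add(species)
    return dict(by_cell)

def richness(species_cells):
    """Species richness is the number of species in each cell's assemblage."""
    return {cell: len(spp) for cell, spp in assemblages(species_cells).items()}

cell_richness = richness(species_cells)
print(cell_richness)
```

Any per-cell summary of a life-history trait (for example, the median body mass of the species in a cell) follows the same pattern, with the species set of each cell looked up in a trait table.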
The analyses described above require adequate statistical models that allow for spatial autocorrelation and/or phylogenetic control. Although various statistical tools already exist along with spatial and database management support, these are often difficult to integrate under the same computing platform. Thus researchers typically have to switch between various computer programs that are loosely interconnected at best and often function as a black box (i.e. when not open source).
Here we introduce rangeMapper (http://cran.r-project.org/package=rangeMapper), a versatile framework for the study of macroecological patterns of life-history traits that can be used to answer a large array of questions in both fundamental ecological research and conservation biology. rangeMapper is an open source extension for R (R Development Core Team, 2010), built using R's comprehensive database (James, 2010) and spatial classes support (Pebesma & Bivand, 2005; Bivand et al., 2008; Hijmans & Etten, 2010). Macroecological projects can be performed at both inter-specific and assemblage levels, and tools for connecting the two approaches are provided. rangeMapper further allows a straightforward integration of the many statistical tools existing in R.
In this paper we first describe the concept behind rangeMapper and introduce its general capabilities. Second, we apply rangeMapper to three case studies chosen to cover various technical and conceptual issues. For each case study, we provide a brief introduction to the topic, a description of the method, the rangeMapper results and a brief discussion. These examples are based on a comprehensive dataset of the geographical breeding distributions of more than 8000 avian species. The case studies are accompanied by reproducible examples using the life-history traits and the breeding range distribution of the New World wrens (Troglodytidae), a dataset which is bundled with the package (see Appendices S1–S5 in the Supporting Information).
MATERIALS AND METHODS
The rangeMapper general framework
rangeMapper adopts a modular framework (Fig. 1) where each project is partitioned into several steps. This versatility ensures that it is relatively straightforward at each step to plug in various statistical models of any degree of complexity. This mechanism allows us to implement both (1) range structure indexes (e.g. range shape; Pigot et al., 2010) and (2) measures of environmental parameters (e.g. mean primary production or elevation range). Environmental parameters can be computed either within the range of each taxon or at each grid cell (with the support from the raster and rgdal packages; Hijmans & Etten, 2010; Keitt et al., 2010). Finally, the modular framework allows the use of a wide array of statistical models computed at each canvas cell, from simple analyses (count, average) to rather complex models applied at each pixel (e.g. generalized linear models).
A ‘subsetting’ mechanism used for map building is an additional strength of rangeMapper. That is, subsets of life-history traits, range traits, assemblage traits (i.e. pixel traits) or any combination of those can be easily defined, ensuring a tight connection between inter-specific and assemblage levels.
rangeMapper projects are hosted on disk in SQLite databases, so most computations are not memory limited. Moreover, the robust SQLite engine allows efficient management of large projects. For example, creating a global map of median body mass of 8434 species using a canvas with a grain of 50 km² takes 2.1 min on a four-core 2.8 GHz Intel Xeon running 64-bit Ubuntu Linux 10 with 11.8 GB of physical memory.
For users without knowledge of the R scripting language, a cross-platform graphical user interface (GUI) is provided for most tools (Valcu & Dale, 2011). Finally, rangeMapper is built using S4 classes (Chambers, 2008) and can therefore be easily extended.
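To make the database-backed design and the subsetting idea concrete, the following Python sketch stores a toy project in SQLite and maps median body mass per cell, optionally restricted to a subset of species. The table names, column names and the `min_mass` parameter are assumptions made for illustration; the text does not describe rangeMapper's actual schema.

```python
import sqlite3
import statistics
from collections import defaultdict

# Hypothetical schema; ":memory:" keeps the example self-contained,
# whereas a file path would give an on-disk project.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ranges (species TEXT, cell_id INTEGER);
    CREATE TABLE traits (species TEXT PRIMARY KEY, body_mass_g REAL);
""")
con.executemany("INSERT INTO ranges VALUES (?, ?)", [
    ("sp_a", 1), ("sp_a", 2), ("sp_b", 2), ("sp_c", 2), ("sp_c", 3),
])
con.executemany("INSERT INTO traits VALUES (?, ?)", [
    ("sp_a", 10.0), ("sp_b", 20.0), ("sp_c", 90.0),
])

def median_mass_per_cell(con, min_mass=0.0):
    """Median body mass per cell, using only species of at least min_mass."""
    rows = con.execute("""
        SELECT r.cell_id, t.body_mass_g
        FROM ranges AS r JOIN traits AS t ON t.species = r.species
        WHERE t.body_mass_g >= ?
        ORDER BY r.cell_id
    """, (min_mass,))
    masses = defaultdict(list)
    for cell_id, mass in rows:
        masses[cell_id].append(mass)
    return {cell: statistics.median(vals) for cell, vals in masses.items()}

all_species = median_mass_per_cell(con)
heavy_only = median_mass_per_cell(con, min_mass=15.0)
```

Here the subset is expressed as a `WHERE` clause on a life-history trait. A subset on range traits or cell traits would filter the corresponding table in the same way.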
We collected body mass data of 8434 bird species from the CRC handbook of avian body masses (Dunning, 2008). When multiple entries per species were available we used median body mass as the species value. We digitized breeding ranges (i.e. the geographical extent of occurrence of each species in the reproductive season) from various sources (Cramp & Simmons, 1977–1994; Brown et al., 1982–2004; Marchant & Higgins, 1990–2006; del Hoyo et al., 1992–2010; Ridgely & Tudor, 2009) onto a 720 × 360 unprojected pixel template of the earth and converted these raster images into vector files. The hotspot congruence analysis and the ‘range size–body mass’ analysis were performed using a rangeMapper project with an equal area projection canvas approximating to a 1° scale.
Case study 1: congruence of different biodiversity hotspots
Biodiversity ‘hotspots’ are a central concept in conservation biology (Orme et al., 2005; Ceballos & Ehrlich, 2006) because they form the foundation for establishing global conservation priorities. However, because there is an ongoing debate about which specific biodiversity measure is most relevant, it is important to understand the extent to which different types of hotspots overlap.
We identified the global hotspots of three parameters describing avian diversity: total species richness, endemic species richness and relative body mass diversity. This was done by generating maps for species richness (total number of species in each canvas cell), endemic species richness (number of species with the 25% smallest breeding ranges present in each canvas cell, e.g. Orme et al., 2005) and relative body mass diversity (coefficient of variation of log10 body mass in each cell; Fritz & Purvis, 2010). The maps of endemic species richness and relative body mass diversity can be found in Appendix S1. We defined hotspots for each of these measures as the richest 5% of grid cells (Fig. 2). We then measured the congruence between hotspots by the extent of spatial overlap, i.e. the percentage of canvas cells which met the definition of two or three of the hotspots (Orme et al., 2005; Ceballos & Ehrlich, 2006).
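The hotspot definition and the overlap measure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of Appendix S2: the function names are hypothetical, ties at the 5% threshold are broken arbitrarily, the sample standard deviation is used for the coefficient of variation, and the overlap is expressed relative to the pooled set of hotspot cells, a denominator the text does not specify.

```python
import math
import statistics

def cv_log10_mass(masses):
    """Coefficient of variation of log10 body mass (needs >= 2 species)."""
    logs = [math.log10(m) for m in masses]
    return statistics.stdev(logs) / statistics.mean(logs)

def hotspots(cell_values, top_fraction=0.05):
    """Ids of the richest cells, i.e. the top fraction of cells by value."""
    n_top = max(1, round(len(cell_values) * top_fraction))
    ranked = sorted(cell_values, key=cell_values.get, reverse=True)
    return set(ranked[:n_top])

def overlap_percent(*hotspot_sets):
    """Cells shared by all hotspot sets, as a percentage of the pooled cells.

    The choice of denominator is an assumption of this sketch.
    """
    shared = set.intersection(*hotspot_sets)
    pooled = set.union(*hotspot_sets)
    return 100.0 * len(shared) / len(pooled)

# Hypothetical maps on a 40-cell canvas, so the top 5% is two cells.
richness_map = {cell: cell for cell in range(40)}
endemism_map = {cell: (0 if cell == 38 else cell) for cell in range(40)}

hot_richness = hotspots(richness_map)
hot_endemism = hotspots(endemism_map)
overlap = overlap_percent(hot_richness, hot_endemism)
```

Passing three hotspot sets to `overlap_percent` gives the cumulative overlap of all three measures.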
We found a very low spatial overlap (0.9%) between hotspots of species richness and relative body mass diversity (Fig. 2). The overlap between hotspots of endemic species and relative body mass diversity was also low (2.3%, Fig. 2). The overlap between hotspots of endemic species and species richness was 3% and the cumulative overlap between the three sets of hotspots was only 0.15%.
Our results suggest that the three avian diversity hotspots considered here are virtually independent. This replicates previous results showing little congruence between hotspots of species richness and endemism (Orme et al., 2005; Ceballos & Ehrlich, 2006). Adding relative body mass diversity as a complementary biodiversity measure did not change the overall picture, suggesting that spatial patterns exhibited by various aspects of biodiversity are determined by different mechanisms.
This case study illustrates the use of rangeMapper for identifying biodiversity hotspots (see Appendix S2 for a reproducible R code example using the wrens dataset). Using rangeMapper's flexible subsetting mechanism, this example can be further refined. For example, by changing the subset definitions to incorporate only certain groups of species, we could investigate hotspot congruency for particular taxonomic clades, different functional groups or different habitat classes.
Case study 2: geographical variation in the relationship between range size and body size
The relationship between species range size and average body size is a classic topic in macroecology (e.g. Gaston & Blackburn, 1996, 2000). The relationship is positive across many taxa, but a few studies report a negative relationship or no relationship. Gaston & Blackburn (1996) suggested that negative relationships are artefacts because the likelihood of finding a negative (or no) correlation between range size and body size is higher when the scale (i.e. the extent) of the study is too small to encompass all the geographical ranges of the studied species. Alternatively, however, global-scale spatial variation in the range size–body size relationship itself may exist. This is not far-fetched, because spatial variation in similar relationships has been documented. For example, the generally positive correlation between range size and local abundance does not apply for an entire biogeographical area, even though it is one of the most robust findings in macroecology (Symonds & Johnson, 2006).
Here, we investigated global spatial variation in the slope of the relationship between range size and body mass for avian assemblages. For each grid cell, we estimated the slope of the range size–body mass regression using a robust regression (Venables & Ripley, 2002) whereby both range size and body mass were log-transformed and standardized with z-scores (i.e. scaled and centred). Compared with ordinary least squares regression, this approach yields slope estimates that are less sensitive to outliers and to violations of the model assumptions.
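The per-cell computation can be sketched in Python as follows. Note the substitution: the Theil-Sen estimator (median of pairwise slopes) is swapped in here as a simple, self-contained robust estimator, and it is not necessarily the robust regression the authors used. The function names and the toy assemblage are hypothetical.

```python
import math
import statistics
from itertools import combinations

def zscores(values):
    """Centre and scale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def theil_sen_slope(x, y):
    """Median of all pairwise slopes; a stand-in robust estimator."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return statistics.median(slopes)

def cell_slope(body_mass, range_size):
    """Slope of log range size on log body mass, both as z-scores."""
    x = zscores([math.log10(m) for m in body_mass])
    y = zscores([math.log10(r) for r in range_size])
    return theil_sen_slope(x, y)

# Hypothetical assemblage of one grid cell: body mass (g) and range size (km^2).
mass = [10, 20, 40, 80, 160]
area = [1e4, 2e4, 4e4, 8e4, 1e3]  # the last species is an outlier
slope = cell_slope(mass, area)
```

Applying `cell_slope` to the assemblage of every grid cell yields a map of slopes. In this toy cell, the single outlying species does not reverse the sign of the estimated slope.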
Our analysis revealed strong geographical variation in the slope of the range size–body mass regression despite the fact that 98.9% of the slopes were positive (Fig. 3a). Relatively strong range size–body mass regression slopes were confined to certain geographical areas. Moreover, after controlling for multiple testing only 52.2% of the grid cells contained a statistically significant range size–body mass regression (Fig. 3b). Interestingly, among the four avian studies reviewed by Gaston & Blackburn (1996) the two studies that reported negative or no relationships (3 and 4 in Fig. 3b) examined areas where the positive relationship was non-significant or negative, while the two studies reporting positive relationships (1 and 2 in Fig. 3b) were from areas where the positive relationship was strong and statistically significant.
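The text does not state which multiple-testing correction was applied across grid cells. As one common option, and purely as an assumption of this sketch, the Benjamini-Hochberg false discovery rate procedure can be applied to the per-cell p-values as shown below; the cell ids and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests significant under the Benjamini-Hochberg procedure.

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value is <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant

# Hypothetical p-values of the slope test in five grid cells.
cell_p = {101: 0.001, 102: 0.012, 103: 0.045, 104: 0.2, 105: 0.025}
flags = benjamini_hochberg(list(cell_p.values()))
significant_cells = {cell for cell, flag in zip(cell_p, flags) if flag}
percent_significant = 100.0 * len(significant_cells) / len(cell_p)
```

The share of significant cells computed this way corresponds to the kind of percentage reported above, although the authors' correction method may differ.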
This case study illustrates the use of rangeMapper for statistical modelling at grid cell level (see Appendix S3 for a reproducible example using the wrens dataset). Although here we used a robust regression slope, other statistical models can be incorporated equally easily in rangeMapper. For example, one could use the slope of a mixed model (Pinheiro & Bates, 2000) (see Appendix S3) that included higher taxonomic groups as random effects and would thus allow for a certain degree of phylogenetic correction.
Case study 3: the influence of grid size on the relationship between species richness and body size
Spatial patterns of species richness can be strongly dependent on the chosen resolution (i.e. grid cell size). When the spatial resolution is not adequate, species richness obtained from geographical ranges is a poor estimator of true species richness (Hurlbert & Jetz, 2007). Moreover, the effect size of predictors of species richness can change with varying spatial resolution (Rahbek & Graves, 2001; Davies et al., 2007). Therefore it is reasonable to predict that the relationships between species richness and life-history variables will also depend on the spatial resolution. For example, Olson et al. (2009) showed that species richness was a strong predictor of variation in median body size in an assemblage-level spatial multiple regression using a 1° resolution (see Fig. 4a,b).
Here we examined the relationship between median body mass and species richness at the assemblage level using 100 different spatial resolutions. We used rangeMapper projects set on equal area projection canvases ranging from 50 to 550 km² (i.e. from c. 0.45 to c. 4.5°) with median body mass (log10-transformed) as the dependent variable and species richness (square-root transformed) as the only predictor. To account for spatial autocorrelation we used simultaneous autoregressive models (SAR) (Bivand et al., 2008).
The strength of the body size–species richness relationship, quantified as the slope of a SAR model, varied considerably across the 100 different spatial resolutions. We confirmed that the SAR slope is negative for the entire resolution range, but found it to decrease dramatically with increasing grid cell size (i.e. decreasing map resolution; Fig. 4c). Typically, such analyses are done for three to four (Davies et al., 2007) or sometimes 10 resolutions (Rahbek & Graves, 2001). However, because rangeMapper easily allowed us to use a large number of spatial resolutions, we could additionally resolve a nonlinear dependence of the SAR slope on the spatial resolution (Fig. 4d).
This case study exemplifies a more advanced use of rangeMapper because it requires the command line interface or batch processing, while the first two case studies can be performed entirely from the GUI. However, the compact scripting language makes it possible to implement complex analyses with relatively few commands. In Appendix S4 we show how to iteratively build up projects of increasing resolution and extract the parameters of interest (i.e. the slope of the body size–species richness regression) associated with each level of resolution.
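The iterative workflow can be outlined with a short Python sketch. It is not the code of Appendix S4 and makes two loud simplifications: an ordinary least squares slope stands in for the SAR model (so spatial autocorrelation is ignored), and coarser canvases are obtained by merging cells of a fine grid by an integer factor rather than by rebuilding a project. All names and data are hypothetical.

```python
import math
import statistics
from collections import defaultdict

def rect(r0, r1, c0, c1):
    """Hypothetical rectangular range: a set of (row, col) fine-grid cells."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

def coarsen(species_cells, factor):
    """Re-map each range onto a grid whose cells are factor times larger."""
    return {sp: {(r // factor, c // factor) for r, c in cells}
            for sp, cells in species_cells.items()}

def cell_table(species_cells, body_mass):
    """Per cell: (sqrt species richness, log10 median body mass)."""
    members = defaultdict(list)
    for sp, cells in species_cells.items():
        for cell in cells:
            members[cell].append(body_mass[sp])
    return [(math.sqrt(len(m)), math.log10(statistics.median(m)))
            for m in members.values()]

def ols_slope(points):
    """Ordinary least squares slope; a stand-in for the SAR slope."""
    xs, ys = zip(*points)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

def slopes_by_resolution(species_cells, body_mass, factors):
    """Rebuild the cell table at each resolution and collect the slope."""
    return {f: ols_slope(cell_table(coarsen(species_cells, f), body_mass))
            for f in factors}

# Toy data on an 8 x 8 fine grid: one widespread heavy species, three small ones.
ranges = {
    "large_1": rect(0, 8, 0, 8),
    "small_1": rect(0, 4, 0, 4),
    "small_2": rect(0, 2, 0, 6),
    "small_3": rect(2, 6, 0, 2),
}
mass_g = {"large_1": 1000.0, "small_1": 10.0, "small_2": 20.0, "small_3": 15.0}
slopes = slopes_by_resolution(ranges, mass_g, factors=[1, 2, 4])
```

Richness and median body mass are recomputed from the ranges at every resolution rather than averaged from the finer map, because a species counted in several fine cells must be counted only once in the coarse cell that contains them.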
The techniques presented in this case study can be combined with the ones described for case study 1 to investigate another important methodological problem in macroecology – the influence of range size on statistical inference (Jetz & Rahbek, 2002; Tello & Stevens, 2010). Appendix S5 shows the influence of range size on the body mass–species richness regression slope using the case study on the wrens.
rangeMapper is an open-source front-end R package for macroecological studies, designed to serve as an interface between the spatial and the statistical tools offered through the R environment. Using three case studies that cover various technical and conceptual issues, together with a dataset of the global geographical distribution of more than 8000 bird species, we demonstrated rangeMapper's capabilities to address a wide array of questions.
When you use rangeMapper, we ask you to cite R together with this publication followed by the package version you used.
We thank Allen Hurlbert and three anonymous referees for helpful comments on an earlier version of this manuscript.
Mihai Valcu is a behavioural ecologist with a strong interest in spatial ecology. His research focuses on spatial aspects of avian reproductive behaviour and life-history strategies.
James Dale is a senior lecturer in ecology at Massey University. His research mostly focuses on social behaviour in animals, with an emphasis on communication, sexual selection, individual recognition and reproductive strategies.
Bart Kempenaers is a behavioural ecologist interested in the diversity of life-history traits. He is director of the Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology.