### Abstract

The species–area relationship (SAR) is one of the most fundamental tools in ecology. After almost a century of quantitative ecology, however, a single “best SAR model” remains elusive, and substantial uncertainty about the best-fitting SAR model is frequently observed. Recent research has called for this uncertainty to be addressed, and a multimodel SAR framework has been devised. Here we introduce the mmSAR R-package, a flexible and scalable implementation of the multimodel SAR framework for species–area datasets, and provide some examples of its use. The package provides functions for fitting SAR models, performing model selection, and building multimodel SARs.

One of the oldest and most ubiquitous patterns recognized in ecology is the increase in species richness (*S*) with increasing sampling area (*A*): the species–area relationship (SAR). The SAR has been mystifying ecologists for more than 150 years (De Candolle 1855, MacArthur and Wilson 1967, Connor and McCoy 1979, Drakare et al. 2006, Southwood et al. 2006) and its modelling remains a central issue for theoretical ecologists and conservationists (Rosenzweig 1995, Smith 2010). Inference about the SAR is essential in the wide range of conservation applications that require the comparison of diversity patterns among regions that differ in area, such as global-scale conservation priority-setting schemes (Brooks et al. 2006, Lamoreux et al. 2006, Wilson et al. 2007). In theoretical studies, SARs are considered to be fundamental properties of biological systems. They are, for example, explained in terms of species abundances and the spatial distribution of individuals (He and Legendre 2002, Martin and Goldenfeld 2006), and they constitute a cornerstone of macroecological investigations (Šizling and Storch 2004, Drakare et al. 2006). Since Arrhenius (1921), the SAR has mainly been modelled using a power law (*S*=*cA*^{z}, where *c* and *z* are constants to be estimated). Despite this historical hegemony, however, several studies have highlighted other functional forms for SARs (Gleason 1922, Coleman et al. 1982, Lomolino 2000, Tjørve 2003, 2009). Moreover, quantitative studies focusing on comparisons among models have indicated that the power law SAR is not ubiquitous (Connor and McCoy 1979, Flather 1996, Stiles and Scheiner 2007), stressing the importance of testing the relative fit of different models in SAR analyses (Smith 2010). Furthermore, recent analyses have often demonstrated substantial uncertainty in selecting the best SAR model for a given dataset (Stiles and Scheiner 2007, Guilhaumon et al. 2008).
The multi-model selection framework (Burnham and Anderson 2002) is an approach that can account for such uncertainties in inferring the SAR, allowing the investigator to perform inferences while incorporating variability in both model selection and parameter estimation (multimodel SARs; Guilhaumon et al. 2008).

Here, we introduce the mmSAR R-package for the free and open-source R software (R Development Core Team 2009). mmSAR is a flexible and scalable implementation of the multimodel SAR framework for species–area datasets. It provides three main functionalities: fitting several relevant SAR models, selecting among this set of models, and averaging the SAR predictions obtained from the different models to establish a consensus inference with robust confidence intervals. The present software note describes the different components of the multimodel SAR framework, as well as their implementation in the mmSAR R-package (Fig. 1). We illustrate the framework with the results of an analysis of a species–area dataset for the plants of the Galapagos Islands (Preston 1962). Users interested in the methodological details of the multimodel SAR framework are referred to Guilhaumon et al. (2008).

### Models

For a given dataset, a multimodel SAR inference is made using the predictions of several non-linear regression models simultaneously. Obtaining a consistent set of models is one of the most important challenges in information-theoretic analyses (Burnham and Anderson 2002). mmSAR proposes a comprehensive set of SAR models (Table 1), including five convex models (power, exponential, negative exponential, Monod and rational function) and three sigmoid models (logistic, Lomolino, and cumulative Weibull). The set includes convex, sigmoid, asymptotic, and non-asymptotic functions, thus encompassing the various shapes attributed to SARs in the literature. Note that the linearized forms (via logarithmic transformations) of the power and exponential models, which require using log(*S*) in place of *S*, were not implemented in mmSAR, because a transformed response would preclude comparisons across the entire set of models. In mmSAR, models are implemented as R objects, so new non-linear SAR models can easily be specified by the user and added to the available collection.

Table 1. Functional forms for the SAR implemented in mmSAR. In these equations, *S* and *A* represent, respectively, species richness and area, while *c*, *z*, *f* and *d* are fitted parameters. The parameter *d* is an upper asymptote, except for the rational function, for which the upper asymptote is *z*/*d*.

| Name | Code | Formula | Number of parameters | Shape | Asymptotic nature |
|---|---|---|---|---|---|
| Power | Power | *S*=*cA*^{z} | 2 | Convex | No |
| Exponential | Expo | *S*=*c*+*z*log(*A*) | 2 | Convex | No |
| Negative exponential | Negexpo | *S*=*d*(1−exp(−*zA*)) | 2 | Convex | Yes |
| Monod | Monod | *S*=*d*/(1+*cA*^{−1}) | 2 | Convex | Yes |
| Rational function | Ratio | *S*=(*c*+*zA*)/(1+*dA*) | 3 | Convex | Yes |
| Logistic | Logist | *S*=*d*/(1+exp(−*zA*+*f*)) | 3 | Sigmoid | Yes |
| Lomolino | Lomolino | *S*=*d*/(1+*z*^{log(*f*/*A*)}) | 3 | Sigmoid | Yes |
| Cumulative Weibull | Weibull | *S*=*d*(1−exp(−*zA*^{f})) | 3 | Sigmoid | Yes |
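
To make the functional forms of Table 1 concrete, the sketch below writes the eight models as plain Python functions. It is an illustration only and does not reproduce mmSAR's R objects: the dictionary name `SAR_MODELS` and its keys are chosen for this example, log is assumed to be the natural logarithm, and the Lomolino entry is read as *d* divided by (1 + *z* raised to log(*f*/*A*)).

```python
import math

# Each function maps an area A (> 0) to a predicted richness S, using the
# parameter names of Table 1 (c, z, d, f). Names and keys are illustrative
# assumptions, not identifiers taken from the mmSAR package.
SAR_MODELS = {
    "power":    lambda A, c, z: c * A ** z,
    "expo":     lambda A, c, z: c + z * math.log(A),        # natural log assumed
    "negexpo":  lambda A, d, z: d * (1 - math.exp(-z * A)),
    "monod":    lambda A, d, c: d / (1 + c / A),
    "ratio":    lambda A, c, z, d: (c + z * A) / (1 + d * A),
    "logist":   lambda A, d, z, f: d / (1 + math.exp(-z * A + f)),
    "lomolino": lambda A, d, z, f: d / (1 + z ** math.log(f / A)),
    "weibull":  lambda A, d, z, f: d * (1 - math.exp(-z * A ** f)),
}

# Asymptotic models approach d for very large areas, whereas the rational
# function approaches z / d, as stated in the table caption.
large_area = 1e12
monod_limit = SAR_MODELS["monod"](large_area, d=50.0, c=3.0)
ratio_limit = SAR_MODELS["ratio"](large_area, c=1.0, z=6.0, d=2.0)
```

Evaluating the asymptotic models at a very large area is a quick way to check that a parameterization behaves as the caption describes.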

### Model fitting

mmSAR performs non-linear regressions, obtaining model parameter estimates by minimizing the residual sum of squares with an unconstrained Nelder–Mead optimization algorithm. Assuming normality of the observations, this approach produces maximum likelihood estimates of the model parameters (Burnham and Anderson 2002). To avoid numerical problems, such as local minima, and to speed up convergence, the starting values passed to the optimization algorithm are chosen carefully. For directly interpretable parameters, corresponding values in the dataset are used (e.g. the observed maximum of species richness in the case of an asymptote); otherwise the standard procedures described by Ratkowsky (1983, 1990) are implemented. Finally, mmSAR gives the option to provide custom starting values, allowing users to implement exhaustive searches for best fits. We provide example fits of the eight SAR models implemented in mmSAR to the Galapagos Islands dataset in Fig. 2A1–A8.
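
The fitting principle described above can be sketched in a few lines of Python. This is a simplified, hand-written Nelder–Mead search for illustration, not the optimizer used by mmSAR; the function names, the synthetic noise-free data and the starting values `[1.0, 0.5]` are all assumptions made for the example.

```python
def rss(params, model, area, richness):
    """Residual sum of squares of `model` evaluated at `params`."""
    return sum((s - model(a, *params)) ** 2 for a, s in zip(area, richness))


def nelder_mead(f, x0, step=0.1, tol=1e-12, max_iter=5000):
    """Minimise f with a basic unconstrained Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):  # build the initial simplex around x0
        vertex = list(x0)
        vertex[i] += step * (abs(vertex[i]) or 1.0)
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [2 * c - w for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [3 * c - 2 * w for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [(c + w) / 2 for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex towards the best one
                simplex = [best] + [[(b + x) / 2 for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


def power(A, c, z):
    return c * A ** z


# Synthetic, noise-free data generated from c = 3 and z = 0.25.
area = [1.0, 10.0, 100.0, 1000.0, 10000.0]
richness = [power(a, 3.0, 0.25) for a in area]

# Hypothetical starting values; mmSAR derives its own as described above.
c_hat, z_hat = nelder_mead(lambda p: rss(p, power, area, richness), [1.0, 0.5])
```

Because the search is unconstrained and only finds a local minimum, the result depends on the starting point, which is why well-chosen starting values matter.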

### Model selection

The information-theoretic framework for model selection is based on the evaluation of multiple working hypotheses (Burnham and Anderson 2002). Each competing hypothesis is represented by a different model, and the evaluation consists of estimating, for each model, the probability that it is the best at explaining the data. In mmSAR, these probabilities are expressed as Akaike weights (Burnham and Anderson 2002) derived from information criteria (IC) such as the Akaike information criterion (AIC), its correction for small-sample bias (AICc), and the Bayesian information criterion (BIC). AIC and other model selection criteria that estimate Kullback–Leibler information are used widely in the ecological literature, but other criteria such as the BIC are also commonly used to carry out model selection (see Burnham and Anderson 2002 for a review of model selection and multimodel inference). AIC and BIC do not share the same conceptual bases and penalize model dimension differently (BIC tends to select models with fewer parameters than AIC). Although the results of (mm)SAR analyses are generally robust to the criterion used for model selection (Guilhaumon et al. 2008), mmSAR implements both Kullback–Leibler and Bayesian strategies for model selection. For a fitted model *i*, its weight *w*_{i} is given by:

$$w_i = \frac{\exp(-\Delta_i/2)}{\sum_{j=1}^{M} \exp(-\Delta_j/2)} \qquad (1)$$

where *M* is the number of models in the set and *Δ*_{i} is defined as *Δ*_{i}=*IC*_{i}−*IC*_{min}, with *IC*_{min} the IC value for the best model.

Akaike weights provide a straightforward way of interpreting the IC value of each model as a model likelihood, and they form the basis of multimodel inference. For the Galapagos Islands dataset, the best-fitting model was the exponential, but three other models (power, negative exponential, and Monod) had almost equivalent probabilities of being the best at explaining the data (AICc Akaike weights in Fig. 2B). The four remaining models (rational function, logistic, Lomolino, and cumulative Weibull) had negligible likelihoods and should contribute only marginally to the multimodel SAR (AICc Akaike weights in Fig. 2B).
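
As a worked illustration of how weights follow from IC values, the Python sketch below converts a list of IC values into Akaike weights. It also includes the usual least-squares expression for AICc given by Burnham and Anderson (2002), where the parameter count includes the residual variance. Whether mmSAR computes AICc in exactly this form is an assumption, and the IC values in the example are invented.

```python
import math


def aicc_from_rss(rss, n, n_params):
    """AICc of a least-squares fit with normal errors (standard form).

    k counts the regression parameters plus the residual variance.
    """
    k = n_params + 1
    aic = n * math.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)


def akaike_weights(ic_values):
    """Weights from IC differences: exp(-delta_i / 2), normalised to sum to 1."""
    ic_min = min(ic_values)
    relative = [math.exp(-(ic - ic_min) / 2) for ic in ic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Invented IC values for three hypothetical models: two tie for best and
# the third lies two IC units behind them.
weights = akaike_weights([100.0, 100.0, 102.0])
```

Only differences between IC values matter here: adding a constant to every IC leaves the weights unchanged.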

### Model averaging and confidence interval building

In the model selection framework, model selection uncertainty arises when the dataset supports several models with similar strength (i.e. for a given dataset, no *w*_{i} is higher than 0.9; Burnham and Anderson 2002), as is the case for the data from the Galapagos Islands (Fig. 2B). In such cases, relying exclusively on the best model is not adequate; multimodel inference yields a more robust final inference (Burnham and Anderson 2002). As advocated for differently parameterized models, mmSAR implements model averaging: multimodel SARs are constructed as the average of all valid model predictions (see Regression validation), weighted by the model weights:

$$\bar{S} = \sum_{i=1}^{M} w_i \hat{S}_i \qquad (2)$$

where $\bar{S}$ is the multimodel averaged species richness, *Ŝ*_{i} is the species richness inferred from model *i*, and *M* is the number of valid models. The multimodel SAR for the Galapagos Islands dataset is presented in Fig. 2C.
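
The weighted average can be sketched in Python as follows. The function name and the example numbers are invented for illustration. Renormalizing the weights over the models that are passed in is an assumption of this sketch, made so that the weights sum to one when some models have been excluded as invalid.

```python
def model_average(predictions, weights):
    """Weighted average of per-model predictions.

    predictions: one list of predicted richness values per valid model,
                 all evaluated at the same areas.
    weights:     one weight per model; renormalised here so they sum to 1
                 (an assumption of this sketch).
    """
    total = sum(weights)
    normalised = [w / total for w in weights]
    n_points = len(predictions[0])
    return [sum(w * pred[k] for w, pred in zip(normalised, predictions))
            for k in range(n_points)]


# Two hypothetical models predicting richness at two areas.
averaged = model_average([[10.0, 20.0], [20.0, 40.0]], [0.5, 0.5])
```

With equal weights the result is the plain mean of the two prediction vectors; unequal weights pull the average towards the better-supported model.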

Finally, in mmSAR, confidence intervals incorporating uncertainty regarding both model selection and parameter estimation can be constructed using the percentile method and a non-parametric bootstrap scheme (Efron 1979, Buckland et al. 1997). For a given species–area dataset, a large number of bootstrap samples are obtained in the following manner:

1) One of the SAR models included in the analysis is selected with a probability equal to its weight as calculated from eq. 1.
2) The selected model is fitted to the observed dataset under study.
3) The vectors of inferred species richness (regression line) and residuals are obtained from the regression, and the residuals are standardized.
4) The residuals are sampled with replacement until the sample size reaches that of the dataset, to form a vector of modified residuals.
5) The vector of modified residuals is added to the vector of inferred species richness, to form the resample (bootstrap set of pseudo-responses).
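
The five steps above can be sketched in Python as a single function that produces one resample. Several details are assumptions of this sketch rather than a description of mmSAR. The standardization of residuals is not specified in this note, so mean-centring is used as a stand-in. Only one model is supplied, the exponential, fitted in closed form because it is linear in *c* and *z*. The data values are invented.

```python
import math
import random


def fit_expo(area, richness):
    """Least-squares fit of S = c + z*log(A); returns the fitted values.

    The model is linear in c and z, so the solution is closed-form.
    """
    x = [math.log(a) for a in area]
    n = len(x)
    x_bar, s_bar = sum(x) / n, sum(richness) / n
    z = (sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, richness))
         / sum((xi - x_bar) ** 2 for xi in x))
    c = s_bar - z * x_bar
    return [c + z * xi for xi in x]


def bootstrap_resample(area, richness, fitters, weights, rng):
    """Build one bootstrap set of pseudo-responses following steps 1 to 5."""
    # Step 1: pick a model with probability equal to its weight.
    fitter = rng.choices(fitters, weights=weights, k=1)[0]
    # Steps 2 and 3: fit it, then collect fitted values and residuals.
    fitted = fitter(area, richness)
    residuals = [s - f for s, f in zip(richness, fitted)]
    mean_res = sum(residuals) / len(residuals)
    centred = [r - mean_res for r in residuals]  # stand-in standardisation
    # Step 4: resample the residuals with replacement.
    resampled = rng.choices(centred, k=len(centred))
    # Step 5: add them back to the regression line.
    return [f + r for f, r in zip(fitted, resampled)]


# Invented toy data, not the Galapagos dataset.
area = [1.0, 10.0, 100.0, 1000.0]
richness = [5.0, 12.0, 20.0, 26.0]
pseudo = bootstrap_resample(area, richness, [fit_expo], [1.0], random.Random(42))
```

Passing an explicit `random.Random` instance keeps each resample reproducible, which helps when checking a long bootstrap run.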

The whole procedure of model selection and averaging is then applied to each resample, yielding a collection of multimodel SARs. The bootstrap estimates of species richness are sorted in ascending order to provide the percentile confidence intervals (Buckland et al. 1997): the limits of an approximate (1−2*α*)100% confidence interval are given by the *r*th and *s*th values in the ordered vector of *b* bootstrap estimates, where *r*=(*b*+1)*α* and *s*=(*b*+1)(1−*α*).

For the Galapagos Islands dataset, the number of resamples was fixed at 9999; with *α*=0.025, the limits of the 95% confidence interval for a point estimate of species richness (Fig. 2C) are therefore given by the 250th and the 9750th values.
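
The rank arithmetic of the percentile method can be checked with a short Python sketch. The function name is chosen for this example. With *b* = 9999 bootstrap estimates and a tail probability of 0.025, the ranks (*b*+1)*α* and (*b*+1)(1−*α*) come out as 250 and 9750, the values quoted for the Galapagos example.

```python
def percentile_interval(estimates, alpha):
    """Percentile limits: the r-th and s-th ordered bootstrap estimates,
    with r = (b + 1) * alpha and s = (b + 1) * (1 - alpha), 1-indexed.

    Assumes b and alpha are chosen so that r and s are whole numbers >= 1.
    """
    b = len(estimates)
    ordered = sorted(estimates)
    r = round((b + 1) * alpha)
    s = round((b + 1) * (1 - alpha))
    return ordered[r - 1], ordered[s - 1]


# Stand-in "estimates" 1..9999, so each value equals its own rank.
lower, upper = percentile_interval(list(range(1, 10000)), alpha=0.025)
```

Choosing *b* so that (*b*+1)*α* is a whole number, as 9999 does here, avoids having to interpolate between ordered estimates.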

The mmSAR R-package has potential uses in both theoretical and conservation analyses. For example, in theoretical applications such as investigations of how SARs differ among systems, model selection patterns (i.e. the relative likelihoods of different SAR shapes) can be compared among those systems, allowing one, for example, to assess whether species richness saturates with increasing area. These kinds of analyses may help to extend discussions beyond the comparison of slopes of log-linear power SARs (Guilhaumon et al. 2008). In conservation applications, multimodel non-parametric confidence intervals indicate the reliability of the multimodel SAR for a given dataset, but they also have more practical applications. For example, these confidence intervals were used by Guilhaumon et al. (2008) to rank the regions of a dataset with respect to their biological richness. By positioning the observed richness of each region in the associated vector of ordered bootstrap species richness estimates (the higher the position of the observed species richness in the vector of bootstrap estimates, the higher the ecoregion in the ranking), these authors were able to devise a hotspot ranking methodology that was robust to the underlying form of SARs.

To cite mmSAR or acknowledge its use, cite this Software note as follows, substituting the version of the application that you used for “Version 0”:

Guilhaumon, F., Mouillot, D. and Gimenez, O. 2010. mmSAR: an R-package for multimodel species–area relationship inference. – Ecography 33: 420–424 (Version 0).