### Abstract

Species range shifts associated with environmental change or biological invasions are increasingly important study areas. However, estimates of range expansion rates may be heavily influenced by methodology and/or sampling bias. We compared expansion rate estimates of Roesel's bush-cricket (*Metrioptera roeselii*, Hagenbach 1822), a nonnative species currently expanding its range in south-central Sweden, from range statistic models based on distance measures (mean, median, 95th gamma quantile, marginal mean, maximum, and conditional maximum) and an area-based method (grid occupancy). We used sampling simulations to determine the sensitivity of the different methods to incomplete sampling across the species' range. For periods when we had comprehensive survey data, range expansion estimates clustered into two groups: (1) those calculated from range margin statistics (gamma, marginal mean, maximum, and conditional maximum: ~3 km/year), and (2) those calculated from the central tendency (mean and median) and the area-based method of grid occupancy (~1.5 km/year). Range statistic measures differed greatly in their sensitivity to sampling effort; the proportion of sampling required to achieve an estimate within 10% of the true value ranged from 0.17 to 0.9. Grid occupancy and median were most sensitive to sampling effort, and the maximum and gamma quantile the least. Including periods with incomplete sampling in the range expansion calculations generally lowered the estimates (by 16–72%), with the exception of the gamma quantile estimate, which was slightly higher (6%). Care should be taken when interpreting range expansion estimates from data sampled from only a fraction of the full distribution. Methods based on the central tendency will give rates approximately half that of methods based on the range margin. The gamma quantile method appears to be the most robust to incomplete sampling bias and should be considered as the method of choice when sampling the entire distribution is not possible.

### Introduction

Although understanding the factors determining distributions of species in equilibrium with environmental conditions is central to ecology (Andrewartha and Birch 1954; Brown et al. 1996), focus has more recently turned to organisms undergoing range shifts associated with climate change (Parmesan and Yohe 2003; Brooker et al. 2007) and the filling of empty ecological niches during biological invasions (Elith et al. 2010; Václavík and Meentemeyer 2012). Accurate descriptions of range shifts are an important component for predicting future trends; thus, accurate assessment of current and potential distributions of species expanding their current range is a critical step in evaluating environmental impacts and management control options (Drury and Rothlisberger 2008; Keller et al. 2008; Hassall and Thompson 2010). There are many ways to calculate species' range expansions or shifts; some of these methods are complex and require detailed ecological life-history information (e.g., Van den Bosch et al. 1990; Lensink 1997; Hill et al. 2001). However, because detailed ecological knowledge for many species is missing, less complex methods based on species presence data are often used to assess distributional changes.

Species occupancy data collected over large areas and for multiple years can be obtained from a number of sources (e.g., national record data bases, species atlases, surveys, and monitoring programs). Data on species distributions collected by the public and stored in national data bases are generally underused in research and management (Goffredo et al. 2010), although they are valuable for estimating changes in species distributions (Snäll et al. 2011). Methods using occupancy data in range expansion assessment can be crudely categorized as those that are area based and those that are distance based. In area-based methods, changes in range size are quantified by measuring the occupied area (counting the number of occupied grid cells; Ward 2005), with the rate of change calculated from the increase of occupied grids over time (Hill et al. 2001). In distance-based methods, range shifts are assessed by measuring the geographical distances between observations from different time periods and the first observation of the species in a specific location; the mean, median, maximum, or marginal mean (mean of the ten most distant observations) of the annual distances is then used to calculate the expansion rate of the species (Hassall and Thompson 2010).
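The distance-based statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `n_margin` parameter, and the coordinates are hypothetical, and distances are assumed to be Euclidean distances in a projected coordinate system measured in km.

```python
import math
import statistics

def distances_from_origin(points, origin):
    """Euclidean distance (km) of each observation from the first record."""
    ox, oy = origin
    return [math.hypot(x - ox, y - oy) for x, y in points]

def range_statistics(distances, n_margin=10):
    """Distance-based range statistics for one time period."""
    ordered = sorted(distances)
    margin = ordered[-n_margin:]  # the n_margin most distant observations
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "maximum": ordered[-1],
        "marginal_mean": statistics.mean(margin),
    }

# Hypothetical observations (km, projected coordinates); not data from the study.
origin = (0.0, 0.0)
points = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0), (5.0, 12.0)]
stats = range_statistics(distances_from_origin(points, origin), n_margin=2)
print(stats)
```

The marginal mean is taken over the ten most distant observations by default, as described in the text; the example passes `n_margin=2` only because the toy data set has four points.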

Despite various methods being used independently in different studies to calculate range expansion, an evaluation of their comparative performance and sensitivity to sampling effort, that is, the number of species records needed for an accurate assessment of range expansion rate, is generally missing (but see Hassall and Thompson 2010). Thus, the main aim of our study was to compare the performance of seven widely used range-expansion models in quantifying the rate of range expansion, and their sensitivity to sampling effort, in a Swedish population of Roesel's bush-cricket (*Metrioptera roeselii*; Fig. 1). This orthopteran is nonnative to Sweden and is currently expanding its range, not only in Sweden but also in other European countries (Pettersson 1996; Simmons and Thomas 2004; Gardiner 2009; Hochkirch and Damerau 2009; Species Gateway 2010). The Swedish population of Roesel's bush-cricket is ideal for evaluating range expansion models, because the species is easy to record in the field, there are long-term records in presence-based data bases, and the population has been the subject of two large-scale censuses in 1989–1990 and 2008–2010. These existing data make it possible to estimate the expansion rate of the species using different commonly used methods and to compare model predictions and performance. For this, we used the initial record and data from the two large-scale surveys on the distribution of *M. roeselii* in central Sweden to: (1) calculate the species' expansion rate using different range-expansion models and compare the estimates obtained from each method, and (2) evaluate how robust the different distance-based methods are to sampling effort (range 1–100%) through simulation. We then used these sampling simulation results to help interpret changes in range expansion estimates when we recalculated expansion rates for each model using summary data for all years where records exist, which included incidental observations recorded in the Species Gateway (i.e., data with potential sampling bias). Thus, our aim was not primarily to document the ‘true’ rate of expansion of this species, but rather to highlight the characteristics and limitations of commonly used range-expansion models under conditions of incomplete sampling effort.

### Discussion

Estimates of range expansion rates using the change in a range statistic measure over time are a function of two key modeling components: calculation of the yearly range statistic from the distribution data, and the fitting of a model to quantify the temporal trend across years. Range statistics can be calculated from the observed area occupied (grid occupancy), from the central tendency of the distribution of observations (mean and median) or from the range margin of the observed distribution (95th gamma quantile, marginal mean, maximum, and conditional maximum). Because range statistics have their own mathematical properties, not only may they influence the calculation of range expansion rates in specific ways but incomplete sampling may also affect them differently (Hassall and Thompson 2010). Thus, when interpreting range expansion estimates, these factors need to be considered in addition to the type of model fit used to explain temporal trends (e.g., linear versus nonlinear; Shigesada and Kawasaki 1997). We discuss these issues and the implications for citizen-collected data below.
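The second modeling component, fitting a temporal trend to the yearly range statistic, can be illustrated with an ordinary least-squares slope. The sketch below assumes a simple linear model; the function name and the numbers are hypothetical and are not taken from the study.

```python
def linear_slope(years, values):
    """Ordinary least-squares slope of a range statistic against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx

# Hypothetical yearly range statistic (km from the first record); not study data.
years = [2000, 2005, 2010]
max_distance_km = [10.0, 20.0, 30.0]
rate = linear_slope(years, max_distance_km)
print(f"Estimated expansion rate: {rate:.2f} km/year")
```

The slope is the expansion rate in km/year for whichever range statistic is supplied, so the same function can be reused for the mean, median, or a margin statistic.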

#### Range statistics and range expansion models

The analysis of distribution data collected between 1981 and 2010 estimated that *M. roeselii* had been expanding its range in central Sweden at a rate between 1.11 and 3.12 km/year depending on the model type used (Table 1). Despite there being relatively large variation in these range expansion estimates, much of it was predictable based on the mathematical properties of the range statistics used. For the distance-based methods, rates calculated from the range margin were roughly double those calculated from the central tendency (~3 vs. ~1.5 km/year, respectively); the central tendency of a group of values will generally increase at half the rate of the maximum (all else being equal; see Fig. 3A). However, this need not always be the case because long-distance dispersers at the range margin are likely to comprise a disproportionately small proportion of the population (and thus have a relatively small influence on the mean and median), despite having potentially large effects on range margin statistics. The establishment of pioneer populations is often the main factor driving rapid increases in the occupied area (Kovacs et al. 2011), and may be one reason why estimates from models using range changes at the distribution margin in other Orthopterans (e.g., *Conocephalus discolor*) can be up to six times larger than those at the core of the range (Simmons 2003). Range expansion estimates using the median might be expected to be lower than those using the mean because dispersal distance data are often positively skewed, with the majority of individuals dispersing short distances and few individuals dispersing far (Preuss 2012). In such cases, central tendency models may be less well suited to describing a dispersal pattern created by two different dispersal behaviors: one slow and continuous dispersal and another infrequent long-distance dispersal.

The grid occupancy model uses average radial distance (i.e., square root of the occupied area divided by the square root of pi) and thus should give results comparable to other distance-based methods at the range margin (in this study ~3 km/year). However, the linear form predicted the lowest rate of range expansion (1.11 km/year) and the nonlinear form was comparable to the central tendency models (~1.5 km/year). It is important to note that the grid occupancy model assumes dispersal according to a simple diffusion model (Van den Bosch et al. 1990; Lensink 1997) with the range expanding in approximately concentric circles that are largely occupied, even if the expansion front is irregular (Shigesada and Kawasaki 1997). However, if we consider the occupied area of *M. roeselii* in Fig. 2, we see that many squares within the dispersal region are unoccupied; if we assume a 92 km radius based on the gamma quantile, then the proportion of occupied squares is only 0.21. This low rate of occupancy may result from incomplete detection, habitat avoidance (particularly the large regions of forest in this area; Preuss et al. 2011) or expansion at the periphery occurring through the formation of satellite colonies from long-distance dispersers (Shigesada and Kawasaki 1997). It is likely that the violation of assumptions of this model is, at least partly, responsible for the nonlinear function having a better fit to the data, when it should have been similar to other range margin models that were fit using a simple linear regression. Although range expansion rates across time are likely to be more complex than a simple linear fit would suggest (Shigesada and Kawasaki 1997), because the population is currently undergoing a rapid expansion phase and we had only a limited number of survey points (effectively only three; 1981, 1989–1990 & 2008–2010), a linear fit to the data is not surprising (Fig. 3).
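The average radial distance and the occupancy proportion discussed above can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the cell count and cell area in the example are made-up values, since the grid resolution is not given in this section.

```python
import math

def radial_distance_km(n_occupied_cells, cell_area_km2):
    """Radius of a circle whose area equals the total occupied grid area."""
    occupied_area = n_occupied_cells * cell_area_km2
    return math.sqrt(occupied_area) / math.sqrt(math.pi)

def occupancy_proportion(n_occupied_cells, cell_area_km2, radius_km):
    """Fraction of a circular range of the given radius covered by occupied cells."""
    occupied_area = n_occupied_cells * cell_area_km2
    return occupied_area / (math.pi * radius_km ** 2)

# Hypothetical example: 100 occupied cells of 25 km^2 each; not study data.
radius = radial_distance_km(100, 25.0)
print(f"Average radial distance: {radius:.1f} km")
print(f"Occupancy within a 60 km radius: {occupancy_proportion(100, 25.0, 60.0):.2f}")
```

Comparing the radius from `radial_distance_km` with a margin-based radius, as the text does with the gamma quantile, is one way to see how far the data depart from the assumption of a largely occupied circular range.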

#### Sensitivity to incomplete sampling

The grid occupancy model (as discussed above) assumes extensive colonization within the ‘circular’ range area; as the number of occupied squares decreases from saturation, the range expansion estimate declines as a function of the square root of the occupied area (e.g., if only one quarter of the area is occupied, the range expansion rate estimate will decline by half). Thus, to obtain reliable range expansion estimates using grid occupancy, extensive sampling across the entire distribution range at regular time intervals is required. For example, in a study on the range dynamics of the hooded warbler (*Setophaga citrina*), estimates of range expansion were highly sensitive to sampling effort and location; increasing sampling time by 100 h and surveying additional squares in the vicinity of occupied squares led to an increase of the estimated expansion rates by 15 and 38%, respectively (Melles et al. 2011). For *M. roeselii* in Sweden, sampling effort was highly variable across all years because grid occupancy data originated from multiple sources (surveys versus incidental observations). When we included data from years in which the species' occupied area was largely undersampled, it led to an underestimation of the rate of range expansion in *M. roeselii* by an order of magnitude. Therefore, this method would be most suitable for species where atlas data are available or monitoring programs with the appropriate funding and staff are in place.
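The square-root relationship described above can be written out explicitly. As a sketch, assume a circular range of true radius $r$ and area $A = \pi r^2$, of which only a proportion $p$ is recorded as occupied; the symbols $r$, $A$, $p$, and $\hat{r}$ are introduced here for illustration only.

```latex
\hat{r} = \sqrt{\frac{pA}{\pi}} = \sqrt{p}\,\sqrt{\frac{\pi r^{2}}{\pi}} = \sqrt{p}\; r,
\qquad \text{so for } p = \tfrac{1}{4}: \quad \hat{r} = \tfrac{1}{2}\, r .
```

If $p$ stays roughly constant over time, the estimated expansion rate is scaled by the same factor $\sqrt{p}$, which is why one-quarter occupancy halves the rate estimate.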

Distance-based models showed large variation in their sensitivity to subsampling (Table 1); data from years with low sampling effort can produce extremely uncertain estimates depending on the method used. Previous studies have used low thresholds without quantifying how sensitive their estimates were to this choice: Hickling et al. (2006) had a threshold of 20 records, and Hassall and Thompson (2010), analyzing historical distribution data of Odonata, calculated that at least 45 records/year are necessary to estimate range expansion with 90% accuracy. Although this suggests previous studies might have underestimated the uncertainties, it does not necessarily mean their estimates are systematically biased. When considering the effect of incomplete sampling, there is the uncertainty associated with calculating the range statistic for each time period in the analysis, with this uncertainty declining as a function of sampling effort (Fig. 4). In addition, however, there is the bias generated by subsampling; this effect becomes evident when we compare the uncertainty estimates generated for the mean, median and gamma models with those for the marginal mean, maximum, and grid occupancy. The marginal mean, maximum, and grid occupancy will always be downwardly biased as sampling effort is reduced (with the degree of this bias a function of sampling effort), while this will not generally be the case for models fitting the mean, median and gamma range statistics because they are just as likely to over- as underestimate the true value. This means that if enough years of data are collected, the model fit will bisect these uncertainties and converge on the true range expansion rate (see Hassall and Thompson 2010 for examples of this). Thus, when choosing a method to best estimate range expansion when sampling is incomplete (or the degree of sampling unknown), consideration should be given to methods that are relatively insensitive to sampling in the calculation of the range statistic, and do not give systematic downward biases.
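The difference between uncertainty and systematic bias under subsampling can be illustrated with a small simulation. This is a hedged sketch rather than the simulation used in the study: the function name, the gamma-distributed toy distances, the 10% sampling fraction, and the number of replicates are all assumptions chosen for illustration.

```python
import random
import statistics

def subsample_statistic(distances, fraction, stat, n_reps=500, seed=1):
    """Mean of `stat` over random subsamples, relative to its full-sample value."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(distances)))
    full = stat(distances)
    reps = [stat(rng.sample(distances, k)) for _ in range(n_reps)]
    return statistics.mean(reps) / full

# Hypothetical positively skewed dispersal distances (km); not study data.
rng = random.Random(42)
distances = [rng.gammavariate(2.0, 10.0) for _ in range(400)]

for name, stat in [("mean", statistics.mean), ("maximum", max)]:
    ratio = subsample_statistic(distances, fraction=0.1, stat=stat)
    print(f"{name}: average subsample estimate / full-sample value = {ratio:.2f}")
```

A ratio near 1 indicates that subsampling adds noise but no systematic bias, whereas a ratio below 1 indicates the downward bias described above; a subsample maximum can never exceed the full-sample maximum, so its ratio cannot rise above 1.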

#### Implications and recommendations

Based on our results, it appears the 95th gamma quantile is the method of choice; unlike other range margin models it does not give any systematic bias when sampling is reduced, and unlike the central tendency models it is relatively insensitive to incomplete sampling. However, there are specific instances when other range-margin models should be considered, especially when restricted sampling can be focused on the range margin (in our study, we assumed incomplete sampling was randomly assigned across the entire distribution). One practical advantage of measuring range expansion at the range front is that fewer observations are needed from a restricted geographic area to estimate expansion rate. Sensitivity analysis showed that for the maximum, a sampling effort of only 16% of the available records was sufficient to obtain reliable expansion estimates. This estimate of sampling effort was based on the entire distributional area and could conceivably be greatly reduced if surveys were specifically targeted to range margins. However, since estimates would then be derived from only a small number of observations, spatial and temporal aspects will become increasingly important to consider in the sampling strategy. Stratified surveys and repeated sampling of specific locations over time have been found to be a useful approach in monitoring the range expansion of widespread nonnative plants in the United Kingdom (Hulme 2003). Previous use of diffusion models has shown severe underestimations of expansion rate (e.g., 20 times slower than the observed rate in the nonnative cereal leaf beetle *Oulema melanopus*; Andow et al. 1990). We believe that for nonnative species it is appropriate to adopt a precautionary approach (Hulme 2003), and focus the expansion models on data from the species distribution boundary.

Because organized large-scale surveys at regular time intervals are costly in both money and time, citizen-collected data have been promoted as a solution for estimating species distributions (Gardiner 2009; Snäll et al. 2011); however, a certain level of citizen participation is required to adequately sample the distribution area. One possibility for improving the usefulness of citizen-reported data could be to encourage its collection in areas where satellite populations are establishing at the distribution margin. Because the amount of information required from the species range margin for obtaining accurate estimates of range expansion is relatively small, even restricted information on species presence from these areas can provide useful data for accurate range expansion estimations. In addition, single observations can provide valuable information for directing future survey efforts and management actions, as small systematic changes and trends may become important in the longer term (Parmesan and Yohe 2003). With increased citizen effort focused on these margin areas, sufficient amounts of data could be effectively gathered in short periods and over a large spatial extent. This early detection of pioneer populations at the outer range margin is also important for the effective management of invasive organisms (Moody and Mack 1988; Hulme 2003). While it is being increasingly recognized that national data bases with citizen-reported records are an important source of information to assess the ongoing spread of nonnative species (Aslan and Rejmánek 2010), it should be stressed that these sources of information cannot always replace structured and targeted surveys. As our study shows, low sampling effort in years that only included opportunistic observations had potentially large negative effects on range expansion estimates, and thus these records cannot always be relied upon for high accuracy in range-shift estimations.