Keywords:

  • common waxbill;
  • dispersal limitation;
  • Estrildidae;
  • invasion;
  • range expansion;
  • spatial autocorrelation;
  • species distribution modelling

Summary

1. Non-native species can be major drivers of biodiversity loss and cause economic damage. Predicting the potential distribution of a non-native species, and understanding the environmental factors that limit this distribution, is useful for informing their potential management. This is often carried out using species distribution models (SDMs) that attempt to classify grid cells as suitable or unsuitable for a species based on a set of environmental covariates.

2. A key assumption of SDMs is that a species is in equilibrium with its environment. Spreading non-native species often violate this assumption due to dispersal limitation.

3. We present a simple method for dealing with this problem: dispersal weighting (DW). This uses the probability that a species can disperse to a grid cell to weight a SDM. We use simulations to compare the ability of DW and unweighted models at parameterising the true species–environment relationship (SER) of a simulated species, and to test their ability at predicting the future distribution of this species. We investigate how varying the degree of spatial autocorrelation in explanatory variables affects the performance of the methods.

4. Dispersal weighting models outperformed unweighted models at parameterising the SER, and at predicting the future distribution of the species when dispersal probabilities were incorporated into the model predictions. Unweighted models had a stronger tendency than DW models to overestimate the magnitude of relationships with spatially autocorrelated explanatory variables, but underestimate the magnitude of relationships with randomly distributed variables.

5. We then applied our method to a real case study, using it to model the distribution of the non-native common waxbill Estrilda astrild in the Iberian Peninsula as a function of climate and land-use variables. The relative performance of DW and unweighted models reflected the results of the simulation.

6. We conclude that DW models perform better than unweighted models at modelling the true SER of non-native species, and recommend using DW whenever enough data exist to create a dispersal model.


Introduction

Human-assisted dispersal has allowed species to cross biogeographical barriers, introducing them to new environments where they interact with novel species assemblages. These non-native species can have negative impacts on native biodiversity (Williamson 1996), and can cause economic damage by becoming pests (Pimentel, Zuniga & Morrison 2005) or disrupting ecosystem services (Cook et al. 2007). To evaluate the potential impacts of these species, and devise management strategies to control them, it is useful to be able to predict their potential distribution and understand the environmental factors that limit this distribution. Species distribution models (SDMs) have often been employed to do this (Real et al. 2008; Strubbe & Matthysen 2009). Where presence–absence data are available, records of non-native species can be mapped onto a grid, and models use environmental covariates to discriminate between grid cells that are occupied and unoccupied. SDMs assume that the species being modelled is at equilibrium with the environment (Guisan & Thuiller 2005), so unoccupied grid cells are unsuitable for the species. This assumption is likely to be violated by spreading non-native species, which have yet to reach all environmentally suitable areas (Václavík & Meentemeyer 2012), and also by range-shifting species responding to environmental change (Elith, Kearney & Phillips 2010), as dispersal limitation may prevent them from keeping pace with the movement of suitable environmental conditions (Menendez et al. 2006; Brooker et al. 2007). Spreading species can therefore be absent from a grid cell due to low environmental suitability or dispersal limitation. 
The spatial structure of explanatory variables may interact with dispersal limitation to affect model inference (Václavík, Kupfer & Meentemeyer 2012); for example, environmental variables that do not causally influence the distribution of a species may be erroneously identified as limiting the distribution if they occur on a gradient aligned with the species’ axis of dispersal.

The need to account for dispersal limitation when modelling the distribution of non-native and range-shifting species has been recognised (Peterson 2003; Guisan & Thuiller 2005; Gallien et al. 2010). Invasion dynamics have been simulated using dispersal models that incorporate environmental suitability (Smolik et al. 2010; Travis et al. 2011), and dispersal models have been used to produce realistic predictions of species’ distributions under climate change scenarios (Engler & Guisan 2009). Despite this, there are few examples of dispersal models being used to influence the fitting of SDMs. Several studies (e.g. Muñoz & Real 2006; Dullinger et al. 2009) have used covariates such as roads that might be related to the transport and introduction of non-native species as proxies for dispersal, while Václavík & Meentemeyer (2009) used propagule pressure calculated from a dispersal model as a covariate. The most direct approach to dealing with the problem of absences due to dispersal limitation was that of Elith, Kearney & Phillips (2010), who estimated the maximum area a non-native species could have spread to and restricted pseudoabsence background points to that area. Despite these techniques, it is still not standard practice to incorporate dispersal limitation into models of the distribution of spreading species (e.g. Heidy Kikillus, Hare & Hartley 2010; Gormley et al. 2011).

We present a simple new method that accounts for dispersal limitation in the fitting of a SDM. We first construct a dispersal model, and then use this to weight a SDM of the species–environment relationship (SER). In this way, the importance of absences due to dispersal limitation is reduced, so the model fitting procedure is closer to the desired situation where the model discriminates between presences and absences due to suitable and unsuitable environmental conditions, respectively.

We compare the ability of this method with models that do not account for dispersal limitation at parameterising the SER and predicting the future distribution of a simulated non-native species. We explore how both modelling techniques perform when the spatial structure of explanatory variables is varied. Both techniques are then applied to model the distribution of a non-native bird, the common waxbill Estrilda astrild, in the Iberian Peninsula.

Methods

General model framework

We used a dispersal model, in this case a cellular automaton (Carey 1996), to calculate the probability that a species could disperse to a given grid cell. These probabilities were used to weight a linear model, so that it was fitted more closely to data points in grid cells that the species was likely to have been able to reach, and where the assumption of equilibrium was therefore likely to hold. We refer to this as dispersal weighting (DW).

Dispersal weighting is most easily understood by considering model fitting by least squares. In ordinary least squares, $\sum d^2$ is minimised in model fitting, where $d$ is the difference between the response variable and the fitted value predicted by the model, while in DW least squares, $\sum p_{\mathrm{disp}} \, d^2$ is minimised, where $p_{\mathrm{disp}}$ is the probability of the grid cell being dispersed to. DW can also be applied to generalised linear models (GLM), where the vector of dispersal probabilities is supplied as prior weights to the iteratively reweighted least squares algorithm used in model fitting. DW can be easily implemented in R (R Development Core Team 2010) by supplying a vector of dispersal probabilities to the weights argument of model fitting functions such as lm.
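
The weighted least-squares objective described above can be illustrated with a short Python sketch. This is not the authors' R code: the function name weighted_line_fit, the single-covariate setup and the toy data are all assumptions made for illustration.

```python
def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimising sum(w_i * (y_i - a - b * x_i) ** 2)."""
    total_w = sum(w)
    # Weighted means of the covariate and the response.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-product and sum of squares around those means.
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data: the last cell has a low response only because
# the species could not reach it (dispersal probability of zero).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 0.0]
p_disp = [1.0, 1.0, 1.0, 1.0, 0.0]

ols_fit = weighted_line_fit(x, y, [1.0] * len(x))  # every cell counts equally
dw_fit = weighted_line_fit(x, y, p_disp)           # unreachable cell is ignored
```

In this toy case the unweighted slope is 0·2 because the unreachable cell pulls the line down, whereas the dispersal-weighted fit recovers the slope of 1 seen in the reachable cells.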

While dispersal information is used in the fitting of DW models, predictions from the fitted model object will only relate to how environmentally suitable grid cells are and will not be influenced by the probability that the species could disperse to each grid cell. We call this unweighted prediction (UP). However, for a grid cell to be occupied, it has to be both suitable and dispersed to. If these events are assumed to be independent (this assumption is only likely to hold if a globally derived dispersal kernel is used, see Discussion), then, using the multiplication rule for independent events, the probability that a grid cell is occupied is the product of the probability that it is dispersed to (calculated from the dispersal model) and the probability that it is environmentally suitable. We refer to this as weighted prediction (WP). The use of dispersal information in model fitting and prediction is summarised in Table 1.

Table 1.   Use of dispersal information in the different models used in this paper

Model       Dispersal information used in model fitting?   Dispersal information used in prediction?
GLM UP      N                                              N
DWGLM UP    Y                                              N
GLM WP      N                                              Y
DWGLM WP    Y                                              Y

UP, unweighted prediction; WP, weighted prediction.
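
The difference between the two prediction methods in Table 1 can be sketched in a few lines of Python. The function name and the example probabilities are hypothetical. The product rule relies on the independence assumption stated in the text.

```python
def weighted_prediction(p_suitable, p_dispersed):
    """P(occupied) = P(suitable) * P(dispersed to), assuming independent events."""
    return [s * d for s, d in zip(p_suitable, p_dispersed)]


# Hypothetical values: two equally suitable cells, one close to the
# introduction site and one that is unlikely to have been reached yet.
p_suitable = [0.8, 0.8]     # unweighted prediction (UP) from the fitted model
p_dispersed = [1.0, 0.25]   # output of the dispersal model
wp = weighted_prediction(p_suitable, p_dispersed)
```

Under UP the two cells receive the same score. Under WP the second cell is scored lower, because it is suitable but unlikely to have been colonised.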

Simulation

We used a simulation to compare the performance of DW and unweighted GLMs (hereafter DWGLM and GLM, respectively) at parameterising the SER of a simulated non-native species. Each cell in a 50 × 50 grid was assigned a probability of being suitable based on known relationships with three environmental variables. The simulated species was ‘introduced’ to a grid cell (coordinates 48, 28) and was allowed to spread to suitable grid cells based on known dispersal rules (Data S1, see Data S2 for examination of the influence of introduction location). The dispersal rules were also used to provide weights for the DWGLM. To investigate whether the relative performance of GLMs and DWGLMs changes as a species spreads and occupies a larger portion of suitable grid cells, we ran simulations for three, five and 10 generations. To investigate whether spatial autocorrelation in an explanatory variable influenced model performance, we ran simulations where one variable, a, was randomly distributed (non-spatial scenario, mean correlation between a and the x-axis <0·001 ± 0·020 standard deviation), and where a was strongly correlated with the x-axis (spatial scenario, mean correlation between a and the x-axis 0·993 ± <0·001 standard deviation). The other two environmental variables were randomly distributed in all simulation scenarios. Each simulation scenario was run 1000 times, and the occurrence of the simulated species was modelled as a function of the three environmental variables using logistic DWGLMs and GLMs. The environmental variables were reset for each iteration and chosen according to the previously described rules.
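
A logistic DWGLM of the kind fitted here can be sketched in Python as iteratively reweighted least squares with prior weights. This is a hand-rolled illustration for a single covariate, not the R routine used in the study: the function name, the toy data and the fixed iteration count are assumptions.

```python
import math


def fit_logistic_irls(x, y, prior_w, n_iter=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by IRLS with prior weights."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi, wi in zip(x, y, prior_w):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            var = mu * (1.0 - mu)
            z = eta + (yi - mu) / var   # working response
            v = wi * var                # prior weight times IRLS weight
            s00 += v
            s01 += v * xi
            s11 += v * xi * xi
            t0 += v * z
            t1 += v * xi * z
        # Solve the 2 x 2 weighted normal equations for the new coefficients.
        det = s00 * s11 - s01 * s01
        b0, b1 = (t0 * s11 - t1 * s01) / det, (s00 * t1 - s01 * t0) / det
    return b0, b1


# Hypothetical data: occurrence increases with x, but the last two cells are
# suitable (x = 2) and unoccupied only because they are hard to reach.
x = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2, 2, 2]
y = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
p_disp = [1.0] * 10 + [0.1, 0.1]

glm_fit = fit_logistic_irls(x, y, [1.0] * len(x))  # unweighted GLM
dwglm_fit = fit_logistic_irls(x, y, p_disp)        # dispersal-weighted GLM
```

With these toy data the unweighted fit gives a shallower slope than the dispersal-weighted fit, because the two dispersal-limited absences are treated as evidence of unsuitability.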

We also assessed the ability of the models to predict the simulated species’ future distribution. Models were fitted using the distribution after five generations, and their performance was assessed against the distribution after 10 and 20 generations by calculating the area under the receiver operating characteristic curve (AUC). AUC has been criticised as values are dependent on the ratio between the extent of occurrence of a species and the extent of the study area (Lobo, Jiménez-Valverde & Real 2008). That criticism is not applicable to our use of AUC as we used it to compare GLMs and DWGLMs using the same distribution data. Simulations were run 50 times in this assessment, and we assessed model performance using both unweighted and WP (see General model framework); in the latter case, the relative dispersal pressure at the end of the simulation was used to weight predictions. By using the same dispersal rules to run the simulation and provide the dispersal weights, we were in effect using a perfect dispersal model. As dispersal models constructed with real data are almost certainly imperfect descriptions of the true situation, we tested the sensitivity of DWGLMs to errors in the dispersal model. Stochastic errors were introduced to the dispersal model predictions by adding or subtracting a random number drawn from a uniform distribution up to a maximum error value for each grid cell. We did this for errors of up to ±0·05, ±0·1, then at increasing 0·1 increments up to ±0·9. These errors were added to the dispersal probability predictions used to fit DWGLMs after five simulation generations, and also to the probability of dispersal used to calculate WPs when the models were tested after 10 generations.
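
The assessment steps above can be sketched in Python as follows. The AUC is computed in its rank-based form, and uniform errors are added to the dispersal probabilities. Clipping the perturbed probabilities to the interval from zero to one is an assumption, as the text does not say how out-of-range values were handled. All names and numbers are hypothetical.

```python
import random


def auc(scores, occupied):
    """Probability that a random occupied cell outscores a random unoccupied one."""
    pos = [s for s, o in zip(scores, occupied) if o]
    neg = [s for s, o in zip(scores, occupied) if not o]
    # Ties between an occupied and an unoccupied cell count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def perturb(p_disp, max_error, rng):
    """Add uniform noise in [-max_error, max_error]; clipping to [0, 1] is an assumption."""
    return [min(1.0, max(0.0, p + rng.uniform(-max_error, max_error))) for p in p_disp]


# Hypothetical cells: suitability from a fitted model and dispersal probabilities.
p_suit = [0.8, 0.6, 0.7, 0.9]
p_disp = [0.9, 0.7, 0.4, 0.1]
occupied = [1, 1, 0, 0]

auc_up = auc(p_suit, occupied)                                      # unweighted prediction
auc_wp = auc([s * d for s, d in zip(p_suit, p_disp)], occupied)     # weighted prediction
noisy_p_disp = perturb(p_disp, 0.1, random.Random(1))               # imperfect dispersal model
auc_wp_noisy = auc([s * d for s, d in zip(p_suit, noisy_p_disp)], occupied)
```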

Modelling the distribution of the common waxbill

To compare the performance of DWGLMs and GLMs when real data were used, we applied the modelling techniques to model the occurrence of the common waxbill in the Iberian Peninsula as a function of climate and land-use variables. The common waxbill is a largely granivorous estrildid finch species (Passeriformes: Estrildidae) native to sub-Saharan Africa, where it is often associated with mesic habitats (Payne 2010). It has been introduced to South America, the Iberian Peninsula and several oceanic islands (Lever 2005). In the Iberian Peninsula, it was first recorded in western Portugal in 1964 (Reino & Silva 1998) and is now the most widespread non-native bird species in the Iberian Peninsula (Silva, Reino & Borralho 2002).

Calculating dispersal probabilities

We obtained data on the expansion of the common waxbill between 1964 and 1999 from Reino, Moya-Larano & Heitor (2009), supplemented with additional records from Spain (Fig. 1). Occurrences were mapped in 1964, 1974, 1984, 1994 and 1999 on a UTM grid of 10 × 10 km cells covering continental Portugal and Spain. We used a coarser timescale than previous studies of the expansion of the common waxbill to try to mitigate the effects of temporary high spatial heterogeneity in recorder effort caused by local bird atlas projects (e.g. Elias & Reino 1994).

Figure 1.  Probability of grid cells in the Iberian Peninsula being dispersed to by the common waxbill. Darker shades indicate higher probabilities. Coloured circles show the colonisation date of grid cells: red = by 1964, orange = by 1974, yellow = by 1984, light blue = by 1994 and dark blue = by 1999.

We used this data set to inspect the shape of, and to parameterise, the dispersal kernel that best described the expansion of the common waxbill over 10-year periods (Data S3). The dispersal kernel was run in cellular automata dispersal models (see Carey 1996 for an example) starting from 1964, 1974, 1984 and 1994, and using real occurrence data for each starting year, to calculate the probability of each grid cell being dispersed to by the following time period (Data S3). The addition rule for non-mutually exclusive events was used to calculate the overall probability of each grid cell being dispersed to by 2004 from these, giving a single probability between zero and one that the cell had been dispersed to (Fig. 1).
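
For two events, the addition rule for non-mutually exclusive events is P(A or B) = P(A) + P(B) − P(A and B). The Python sketch below extends this to several time periods by assuming that dispersal events in different periods are independent, in which case the rule is equivalent to one minus the probability of never being dispersed to. The independence assumption, the function name and the example values are ours. The calculation actually used is described in Data S3.

```python
def combine_dispersal_probabilities(per_period):
    """P(dispersed to in at least one period) = 1 - prod(1 - p_k), assuming independence."""
    p_never = 1.0
    for p in per_period:
        p_never *= 1.0 - p
    return 1.0 - p_never


# Hypothetical probabilities for one grid cell from the model runs
# starting in 1964, 1974, 1984 and 1994.
p_overall = combine_dispersal_probabilities([0.0, 0.1, 0.3, 0.5])
```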

Explanatory variables

We modelled the occurrence of the common waxbill as a function of both climate and land-use variables. Mean precipitation, temperature and daily temperature range were obtained for 10′ grid cells for each month between 1991 and 2000 from the CRU TS1.2 (Mitchell et al. 2004) and interpolated to a 1-km² resolution where appropriate (Data S4). Land-cover variables were obtained from the Corine land-cover classes (Table S2), and the area of each class in each 10-km UTM grid cell was extracted from Corine 2000 vector layers for Portugal (Caetano, Nunes & Nunes 2009) and Spain (Instituto Geográfico Nacional 2011) in ArcGIS 9.3 (ESRI 2008). To allow us to compare how the performance of DWGLMs and GLMs with different covariate sets related to their performance in spatial and non-spatial simulation scenarios, we assessed the degree of spatial autocorrelation in explanatory variables by calculating Moran’s I in the first distance class in SAM (Rangel, Diniz-Filho & Bini 2010).
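
As an illustration of the autocorrelation statistic, the following Python sketch computes Moran’s I for a variable on a regular grid. It treats cells sharing an edge (rook adjacency) as the first distance class with binary weights. This is an assumption made for simplicity and may differ from the distance classes used in SAM. The function name and test grids are hypothetical.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid with binary rook-adjacency weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    cross, n_links = 0.0, 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Visit the four edge-sharing neighbours that lie inside the grid.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    n_links += 1
    ss = sum((v - mean) ** 2 for v in values)
    return (len(values) / n_links) * (cross / ss)


# A smooth west-east gradient is positively autocorrelated;
# a checkerboard is perfectly negatively autocorrelated.
gradient = [[0, 1, 2, 3] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
i_gradient = morans_i(gradient)
i_checkerboard = morans_i(checkerboard)
```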

Distribution data

As our primary interest was comparing the performance of GLM and DWGLM, we only used distribution data from the Iberian Peninsula, a part of the invaded range where sufficient data were available to construct a dispersal model. We obtained data on the occurrence of common waxbills in 10 × 10 km UTM grid cells in the Iberian Peninsula from the most recent Portuguese (Equipa Atlas 2008) and Spanish (Marti & de Moral 2003) atlases of breeding birds. The survey periods for both atlases overlapped considerably (the Portuguese atlas ran from 1999 to 2005, while the Spanish atlas ran from 1998 to 2002). Where grid cells straddled the national border, they were considered occupied if common waxbills were recorded there in either national atlas.

Data analysis

We constructed logistic DWGLMs and GLMs of the occurrence of the common waxbill as a function of climate and land-use variables, using the dispersal probability for each grid cell to weight the DWGLMs. To aid comparison of the different modelling techniques, the same explanatory variables were used in the global models for each method. Following preliminary analysis (Data S4, Table S1), three climate and five land-use variables were selected, as well as appropriate quadratic terms and interactions (Table S3) and a proxy for recorder effort (Data S4). We used multi-model inference (MMI, Burnham & Anderson 2002) to fit all valid simplifications of the global climate and land-use models and identify the 95% confidence set of models with the most support (Data S4, results presented in Table S3). Model performance was assessed by cross-validation, with data split into mutually exclusive training (75%) and testing (25%) sets. The MMI procedure described above was performed on the training set, and the accuracy of the model averaged predictions of the resulting 95% confidence set of models was tested on the testing set. Model performance was assessed by calculating the AUC for WPs and UPs. This procedure was repeated for 500 iterations for each model set. Unless otherwise stated, all analyses were performed in R version 2.12 (R Development Core Team 2010).
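
A 95% confidence set of models can be sketched in Python using Akaike weights, following the general approach of Burnham & Anderson (2002). The exact information criterion and procedure used in this study are described in Data S4, so the use of plain AIC values here is an assumption, as are the function names and the example AIC values.

```python
import math


def akaike_weights(aic_values):
    """Relative support for each model: exp(-delta / 2), normalised to sum to one."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def confidence_set(aic_values, level=0.95):
    """Indices of the best-supported models whose cumulative weight first reaches level."""
    weights = akaike_weights(aic_values)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= level:
            break
    return chosen


# Hypothetical AIC values for three candidate models.
aics = [100.0, 102.0, 110.0]
model_weights = akaike_weights(aics)
best_models = confidence_set(aics)
```

Model-averaged predictions would then be a weighted mean of the predictions of the models in the set, with the weights rescaled to sum to one.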

Results

Simulation

The simulated species colonised more of the available grid cells when explanatory variables were spatially structured than when they were randomly distributed. In the spatial scenario, the median area colonised after five generations was 2·4% of the grid, with 45·8% colonised after 20 generations. In the non-spatial scenario, 1·2% of the grid was colonised after five generations, with 28·3% colonised after 20 generations.

DWGLMs performed better than GLMs at parameterising the SER of the simulated species (Fig. 2). Compared to DWGLMs, GLMs tended to underestimate the magnitude of relationships with randomly distributed variables, but overestimate the magnitude of relationships with strongly spatially correlated variables, indicating that while the spatial structure of explanatory variables had a strong effect on how GLMs parameterised the SER for dispersal limited species, the effect was less pronounced for DWGLMs. GLM parameter estimates for the strongly spatially correlated variable a improved when the simulation was run for more generations (Fig. 2). Despite this, the proportion of simulation runs in which DWGLMs produced closer estimates of the true parameter value increased with the number of generations (e.g. in the non-spatial scenario, this happened in 65·7% of runs after three generations, and in 84·9% of runs after 10 generations) and was also higher in the spatial than non-spatial scenario (after 10 generations, DWGLMs produced closer estimates of the true parameter value in 99·2% of runs in the spatial scenario compared to 84·9% in the non-spatial scenario). GLMs produced better parameter estimates for one randomly distributed variable, b, in all but one simulation scenario; however, there was considerable overlap between parameter estimates derived by both methods, and the median value from DWGLMs was closest to the true value of 0·2 (range of median parameter estimates from all simulation scenarios: GLMs = 0·04–0·076, DWGLMs = 0·124–0·154). These results indicate that the superior performance of DWGLMs compared to GLMs was most pronounced when variables had larger true parameter values and were spatially autocorrelated, and when models were fitted after a number of generations.

Figure 2.  Performance of DWGLMs and generalised linear models (GLMs) at parameterising the species–environment relationship (SER) of a simulated invasive species. The number of generations the simulation was run for is shown by the number after N (e.g. N3 = 3 generations). Spatial simulation scenarios are denoted by SP. (a–c) Estimates of the slope of the relationship between the species’ occurrence and environmental parameters a (a), b (b) and c (c) produced by DWGLMs (blue) and GLMs (red). Points show the median, and error bars the interquartile range, of estimates from 1000 runs. The dashed line shows the true parameter value. The difference between DWGLM and GLM parameter estimates was tested with Mann–Whitney U tests; Bonferroni-adjusted P values are displayed, ***P < 0·001. (d–f) Proportion of runs in which DWGLMs (blue bars) and GLMs (red bars) produced the closest parameter estimate to the true value for parameters a (d), b (e) and c (f). Binomial tests were used to test whether the proportions were significantly different from the null expectation of 0·5; ***P < 0·001, **P < 0·01.

When WP (multiplying the predicted suitability of a grid cell by the predicted probability that the grid cell was dispersed to; see General model framework in Methods) was used, DWGLMs performed better at predicting the future distribution of the simulated species in all simulation scenarios (Fig. 3, range of median AUC values from all simulation scenarios: GLMs = 0·645–0·897, DWGLMs = 0·979–0·993). This indicates that DWGLMs were better at classifying the suitability of grid cells in areas where the species was able to disperse to. When UP was used, GLMs and DWGLMs showed similar performance when environmental variables were randomly distributed (median AUC values after 10 and 20 generations: GLMs = 0·61 and 0·586, DWGLMs = 0·631 and 0·612). When one environmental variable was strongly spatially correlated, GLMs performed better than DWGLMs after 10 generations (median AUC values: GLMs = 0·886, DWGLMs = 0·78) when the simulated species occupied 11·1% of the grid cells, but after 20 generations, when 45·8% of grid cells were occupied, both methods showed similar performance (median AUC values: GLMs = 0·782, DWGLMs = 0·823), indicating better classification of grid-cell occupancy by GLMs only for early stages of invasion.

Figure 3.  Performance of generalised linear models (GLMs) and DWGLMs at classifying the suitability of grid cells for a simulated species. Models were constructed after five generations and tested on the distribution after 10 (a and b) and 20 (c and d) generations. See General model framework for description of weighted and unweighted validation methods. Models were constructed for (a and c) randomly distributed explanatory variables and (b and d) where one variable was strongly spatially autocorrelated. Median and interquartile range AUC values from 50 simulation runs are shown.

The performance of WPs from both models declined when errors were introduced into the dispersal model (Fig. 4). This was especially pronounced when explanatory variables were randomly distributed. The decline in performance was steeper for DWGLMs, but they still outperformed GLMs when the maximum introduced error in dispersal probability was <0·6, indicating that DWGLMs are fairly robust to errors in the dispersal model.

Figure 4.  Effect of errors in the dispersal model on the performance of DWGLMs and generalised linear models (GLMs) in (a) spatial and (b) non-spatial simulation scenarios. Errors were drawn from a uniform distribution up to a maximum value and introduced to the dispersal probabilities for each grid cell. AUC values were calculated by testing the ability of models constructed after five generations of a simulated non-native species to classify grid cells as suitable for that species after 10 generations. Median AUC values for DWGLM weighted predictions (WPs) are shown by the bold line, with the other line showing AUC values for GLM WPs. Dashed lines delimit the interquartile range from 50 simulation runs. For comparison, median and interquartile range AUC values for unweighted predictions (UPs) of GLMs are shown by points and error bars at both ends of the x-axis.

Common waxbill model

Dispersal was important in structuring the common waxbill distribution, with the dispersal model explaining 19·2% of variation in the occurrence data (Fig. 1). The majority of absences were due to dispersal limitation: 70·0% of absences had a probability of being dispersed to of <0·1, compared to 0·006% of presences. Despite this, absence data were still available for model fitting, with 770 absences and 594 presences having a probability of being dispersed to of >0·5.

The main differences between GLMs and DWGLMs of common waxbill occurrence were the magnitude of relationships with explanatory variables and the importance given to interactions (Table S3). These differences resulted in DWGLMs having fewer omission errors and predicting a larger potential distribution (Fig. 5).

Figure 5.  Potential distribution of the common waxbill using (a) land-use and (b) climate variables. A threshold that minimised the difference between omissions and commissions (Jiménez-Valverde & Lobo 2007) was used to convert continuous suitability values to a binary classification. This threshold was lower for generalised linear models (GLMs). Abs denotes grid cells where the species is absent; Pres denotes grid cells where it is present. Areas within the thick black lines have a dispersal probability of >0·5.
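
One way to find a threshold of the kind described in the caption is sketched below in Python. It scans the observed suitability values and keeps the one that minimises the absolute difference between the omission rate (presences predicted unsuitable) and the commission rate (absences predicted suitable). Expressing the errors as rates rather than counts, and the function name and example data, are assumptions. See Jiménez-Valverde & Lobo (2007) for the original criterion.

```python
def balanced_threshold(scores, occupied):
    """Threshold minimising |omission rate - commission rate|."""
    n_pres = sum(1 for o in occupied if o)
    n_abs = len(occupied) - n_pres
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        # Cells scoring at or above the threshold are classed as suitable.
        omission = sum(1 for s, o in zip(scores, occupied) if o and s < t) / n_pres
        commission = sum(1 for s, o in zip(scores, occupied) if not o and s >= t) / n_abs
        gap = abs(omission - commission)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


# Hypothetical suitability scores and occupancy records.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
occupied = [1, 1, 1, 0, 0, 0]
threshold = balanced_threshold(scores, occupied)
```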

The relative performance of GLMs and DWGLMs of common waxbill occurrence was similar to the results of the simulations. Climate variables showed stronger spatial autocorrelation than land-use variables (Table 2). This was reflected in the performance of climate and land-use-based models of common waxbill occurrence when assessed by cross-validation (Fig. 6). The relative performance of GLMs and DWGLMs using climate covariates was similar to the situation in the spatial simulation scenarios; GLMs performed better than DWGLMs when UPs were assessed (median AUC values 0·937 and 0·919, respectively), while DWGLMs performed better than GLMs when WPs were assessed (median AUC values 0·962 and 0·951, respectively). The performance of land-use-based models was more similar to the non-spatial simulation scenario. GLMs and DWGLMs performed similarly when using UPs (median AUC values 0·867 and 0·861, respectively), but DWGLMs performed better than GLMs when using WPs (median AUC values 0·966 and 0·948, respectively). This indicates that for both sets of covariates, DWGLMs were better at classifying the suitability of grid cells for the common waxbill when dispersal limitation was corrected for, confirming that the results obtained with real data were similar to those of the simulations.

Table 2.   Spatial autocorrelation of explanatory variables used to model the occurrence of the common waxbill

Explanatory variable     Moran’s I
MTCM                     0·650
MDTR                     0·697
MAP                      0·693
Rice                     0·180
Irrigated agriculture    0·167
Parks and gardens        0·137
Built                    0·336
Woody agriculture        0·409
Recorder effort          0·099

Figure 6.  Performance of generalised linear models (GLMs) and DWGLMs at classifying the suitability of grid cells for the common waxbill. Models were constructed using climate (a) and land-use (b) explanatory variables. See General model framework for description of prediction methods. Median and interquartile range AUC values from 500 cross-validation runs are shown.

Discussion

Comparison of GLMs and DWGLMs

Logistic GLMs have been frequently used to model the distribution of spreading non-native species (Reino 2005; Real et al. 2008). These models have proved useful at distinguishing between areas that are occupied and unoccupied by a species based on sets of environmental covariates, but the performance of such correlative models may be affected by not considering dispersal limitation (Beale, Lennon & Gimona 2008; Gallien et al. 2010). We proposed a new method, DW, that downweights the importance of grid cells where a species is likely to be absent due to dispersal limitation, and tested its performance against GLMs. These analyses demonstrated that DWGLMs performed better than GLMs at parameterising the true SER and at classifying the suitability of grid cells in areas where the modelled species was likely to have dispersed to. However, when explanatory variables were distributed along a spatial gradient, GLMs performed better than DWGLMs at classifying areas as occupied or unoccupied across the whole study area. The differences in model performance can be understood with reference to the pool of presences and absences that models are fitted to classify between. In DWGLMs, absences due to dispersal limitation are downweighted, so the absence pool largely contains absences due to unsuitable environmental conditions. In contrast, in GLMs the absence pool contains absences due to both unsuitable environmental conditions and dispersal limitation.

The degree of spatial autocorrelation in explanatory variables affected the performance of the different methods. When environmental variables were randomly distributed, dispersal limitation of non-native species led to explanatory variables occurring with favourable values in the absence pool. Downweighting the importance of those dispersal limited grid cells reduces their frequency in the absence pool, so the distribution of environmental variables in the presence and absence pool will better reflect the environmental preferences of the non-native species. UPs from DWGLMs and GLMs performed poorly at classifying the potential distribution of the simulated species across the whole study area. This is likely to be because both models were penalised for correctly predicting suitable sites that were not yet occupied due to dispersal limitation. When WPs were used, the performance of both methods improved, but DWGLMs performed considerably better than GLMs as they were better at parameterising the SER.

Generalised linear models were more prone than DWGLMs to overparameterising the relationship with spatially autocorrelated explanatory variables. If explanatory variables are distributed so that they have increasingly favourable values near the site of introduction of a non-native species, then they will occur with favourable values in the presence pool and unfavourable values in the absence pool due to dispersal limitation alone, accentuating the pattern that might be observed due to the SER. Because of this, in an invasion’s early stage, these spatially autocorrelated variables make good predictors of the non-native species’ distribution. By overparameterising relationships with spatially autocorrelated variables, GLMs exploit the spatial information contained in them, and so perform better than DWGLMs. As the non-native species spreads further, the spatial information contained in spatially autocorrelated variables becomes less useful; this was demonstrated by the reduction of the relative performance of GLMs as the simulation was run for more generations. As with the case where explanatory variables were randomly distributed, DWGLMs performed better than GLMs when using WPs. Our results support previous studies that show that associations between the spatial structure of explanatory variables and the dispersal potential of a species can lead to models that do not account for dispersal limitation identifying statistical relationships where little or no causal relationship exists (Bahn & McGill 2007; Beale, Lennon & Gimona 2008).

AUC values of WPs in the simulations were almost certainly higher than would be achieved with real data, as a perfect dispersal model was used (i.e. the dispersal model used to construct the simulations was used to provide dispersal weights), and the simulated species distribution depended on only three environmental variables, all of which were modelled. We are confident that this does not affect the conclusions drawn above for three reasons. Firstly, both DWGLM and GLM WPs benefited from a perfect dispersal model, so this would not affect comparisons between them. Secondly, when stochastic errors were introduced into the dispersal model, DWGLMs still outperformed GLMs when WPs were assessed. The faster decline in performance of DWGLM compared to GLM WPs was probably because dispersal information was used twice (fitting and prediction) in DWGLM, compared to once (prediction) in GLM. Thirdly, and most importantly, the similarity between the results of the simulations and the application to the common waxbill supports generalisation of the simulation results to real scenarios.

Application to real data: a case study of the common waxbill

The relative performance of GLMs and DWGLMs at modelling the distribution of the common waxbill in the Iberian Peninsula was similar to the results of the simulations. This extended to observations of the effect of spatial autocorrelation in explanatory variables; climate variables showed stronger spatial autocorrelation than land-use variables, and the relative performance of the modelling methods mirrored the spatial and non-spatial simulation scenarios, respectively. The correspondence is not perfect, as the difference in spatial autocorrelation was not as extreme as in the simulation scenarios.

Many grid cells had suitable land use but low dispersal probabilities; therefore, the increase in performance when WP was used rather than UP was greater for land-use models. Additionally, many dispersal-limited grid cells were also climatically unsuitable. DWGLMs performed worse under UP as they predicted a larger potential distribution and were penalised for classifying dispersal-limited grid cells as suitable.

Dispersal was an important constraint on the common waxbill distribution, and the majority of absences available to the GLM were strongly downweighted by DWGLM. A small number of presences were also strongly downweighted. As these represented only a small fraction of the total set of presences, it is unlikely that this had a major effect on model performance. One reason for errors in the dispersal model was that the dispersal kernel used did not vary spatially, while in reality the common waxbill expanded faster along the northward expansion axis than in other directions (Silva, Reino & Borralho 2002). Using a global dispersal model has a useful property: the probability that a grid cell has been dispersed to and the probability that it is environmentally suitable are independent, so they can be multiplied to provide the probability that a grid cell will be occupied (as in WP). This independence is unrealistic, as dispersal will interact with environmental suitability (Marushia & Holt 2006); unsuitable areas will slow dispersal (McRae 2006), while corridors or stepping stones of suitable habitat can assist population spread (Sondgerath & Schroder 2002). In this paper, our primary aim was to understand the SER, so we first calculated dispersal probabilities and then parameterised environmental suitability given these dispersal probabilities. This allowed us to account for dispersal limitation when modelling the SER. Smolik et al. (2010) did the opposite, estimating dispersal probabilities given environmental suitability. Their approach does not mitigate the impact of dispersal limitation on fitting environmental suitability models, but could be useful for simulating the future distribution of spreading species. Ideally, the parameters of the dispersal and environmental suitability models would be estimated simultaneously. Bayesian hierarchical models show promise as a means of achieving this; for example, Stanaway, Reeves & Mengersen (2011) used them to simultaneously estimate parameters of two dispersal models acting at different spatial scales.
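The multiplication of dispersal and suitability probabilities that underlies WP can be sketched as follows. The function name `weighted_prediction`, the cell labels and the probability values are hypothetical and chosen only for illustration; the calculation assumes the two probabilities are independent, as discussed above.

```python
def weighted_prediction(p_dispersal, p_suitable):
    """Occupancy probability, assuming dispersal and suitability are independent."""
    if not (0.0 <= p_dispersal <= 1.0 and 0.0 <= p_suitable <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p_dispersal * p_suitable

# Hypothetical grid cells: (dispersal probability, predicted suitability)
cells = {
    "near_suitable": (0.9, 0.8),
    "far_suitable": (0.05, 0.8),
    "near_unsuitable": (0.9, 0.1),
}
wp = {name: weighted_prediction(d, s) for name, (d, s) in cells.items()}
for name, value in wp.items():
    print(name, round(value, 3))
```

A suitable but distant cell receives a low WP even though its unweighted prediction is high, which is why UPs are penalised for such cells when assessed against the current distribution.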

Choice of dispersal model

Constructing a dispersal model is the most challenging part of using this method (see Data S3 for discussion). The model used will depend on the data available. When, as in our case with the common waxbill, distribution data are available for different time periods, the parameter(s) of a dispersal kernel can be estimated using numerical optimisation by running cellular automata with different parameter values (Carey 1996; Smolik et al. 2010) and assessing the fit of the resulting models to observed data. Dispersal kernels could also be estimated from recaptures of marked organisms (Paradis et al. 1998) or movements of radio-tagged individuals (Driezen et al. 2007). Cellular automata can be constructed with limited data, but they do not take variation in dispersal probabilities due to demographic factors into account (Carey 1996), so, where more data are available, it may be preferable to use other spread models (see Hengeveld 1989 for examples). The only requirement for a dispersal model to be used in DW is that it can provide the probability that each grid cell has been dispersed to.
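The idea of running cellular automata with different kernel parameters and comparing them with observed data can be sketched as below. This is an illustration under strong assumptions, not the dispersal model described in Data S3: it uses a one-dimensional row of cells, a negative exponential colonisation kernel exp(-d / alpha), a simple grid search in place of numerical optimisation, and an invented observed distribution. All function and variable names are hypothetical.

```python
import math
import random

def run_automaton(n_cells, start, alpha, n_steps, rng):
    """One stochastic run on a 1-D row of cells; returns the set of occupied cells.

    Each occupied cell colonises each empty cell with probability
    exp(-d / alpha), where d is the distance in cells (assumed kernel).
    """
    occupied = {start}
    for _ in range(n_steps):
        newly = set()
        for target in range(n_cells):
            if target in occupied:
                continue
            p_fail = 1.0  # probability that no occupied cell colonises target
            for source in occupied:
                p_fail *= 1.0 - math.exp(-abs(target - source) / alpha)
            if rng.random() > p_fail:
                newly.add(target)
        occupied |= newly
    return occupied

def dispersal_probabilities(n_cells, start, alpha, n_steps, n_runs, seed):
    """Proportion of runs in which each cell was reached."""
    rng = random.Random(seed)
    counts = [0] * n_cells
    for _ in range(n_runs):
        for cell in run_automaton(n_cells, start, alpha, n_steps, rng):
            counts[cell] += 1
    return [c / n_runs for c in counts]

def log_likelihood(observed, probs, eps=1e-3):
    """Bernoulli log-likelihood of observed occupancy; probabilities are clipped."""
    ll = 0.0
    for obs, p in zip(observed, probs):
        p = min(max(p, eps), 1.0 - eps)
        ll += math.log(p) if obs else math.log(1.0 - p)
    return ll

# Invented observation: cells 7 to 13 occupied three time steps after introduction.
n_cells, start, n_steps = 21, 10, 3
observed = [1 if 7 <= i <= 13 else 0 for i in range(n_cells)]

candidates = [0.3, 0.6, 1.0, 2.0]  # candidate kernel parameters
probs = {a: dispersal_probabilities(n_cells, start, a, n_steps, 100, seed=1)
         for a in candidates}
scores = {a: log_likelihood(observed, probs[a]) for a in candidates}
best_alpha = max(scores, key=scores.get)
print("best-fitting alpha:", best_alpha)
```

The per-cell proportions in `probs[best_alpha]` are the kind of output DW requires: a probability that each grid cell has been dispersed to.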

Use with other modelling techniques

The method of weighting grid cells by dispersal probabilities can be applied to any other presence–absence modelling technique that accepts case weights, such as generalised additive models. The technique could potentially be applied to presence–pseudoabsence techniques. Pseudoabsence points can be restricted to buffer zones around presence points, with the aim of accounting for dispersal limitation (Elith, Kearney & Phillips 2010). This method could be extended so that, rather than simply restricting pseudoabsences to an area of potential dispersal, the probability of drawing a grid cell as a pseudoabsence point is proportional to the dispersal probability of that grid cell.
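The proposed extension to pseudoabsence selection can be sketched as follows. This is one possible implementation, not a tested method: it draws pseudoabsences without replacement using the Efraimidis-Spirakis key u^(1/w), with the dispersal probability of each grid cell as the weight w. The function name, cell identifiers and probabilities are hypothetical.

```python
import random

def sample_pseudoabsences(cells, dispersal_prob, presences, n, seed=0):
    """Draw n pseudoabsence cells without replacement, with the chance of
    drawing a cell proportional to its dispersal probability."""
    rng = random.Random(seed)
    # Presences and cells that cannot have been reached are not eligible.
    candidates = [c for c in cells
                  if c not in presences and dispersal_prob[c] > 0]
    if n > len(candidates):
        raise ValueError("not enough candidate cells")
    # Efraimidis-Spirakis: give each cell the key u ** (1 / weight)
    # and keep the n cells with the largest keys.
    keyed = sorted(candidates,
                   key=lambda c: rng.random() ** (1.0 / dispersal_prob[c]),
                   reverse=True)
    return keyed[:n]

# Hypothetical study area of ten grid cells.
cells = list(range(10))
dispersal_prob = {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.6, 4: 0.4,
                  5: 0.2, 6: 0.1, 7: 0.05, 8: 0.0, 9: 0.0}
presences = {0, 1}
drawn = sample_pseudoabsences(cells, dispersal_prob, presences, n=4)
print(drawn)
```

Unlike a fixed buffer, this scheme has no hard boundary: cells with low dispersal probability can still be drawn, but rarely.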

Conclusion

Dispersal is an important constraint on the distribution of spreading species such as the common waxbill, with species absent from many areas due to dispersal limitation alone. We demonstrated that models that downweighted absences due to dispersal limitation performed better than unweighted models at parameterising the SER and classifying the suitability of grid cells for the modelled species. A number of other issues not addressed here contribute to the challenge of modelling range-shifting species distributions (Elith, Kearney & Phillips 2010); however, by using DW to help tackle the problem of dispersal limitation, we can start to increase confidence in our ability to model the distributions of these species.

Acknowledgements

We thank an anonymous reviewer for their comments on the manuscript. The ICNB provided digital atlas data for Portugal. This research was carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia. MJPS was funded by a NERC PhD studentship, and LR was funded by the Portuguese Science Foundation through grant SFRH/BPD/62865/2009.

References

Supporting Information

Data S1. Simulation construction.

Data S2. Effect of introduction site on simulation results.

Data S3. Constructing the dispersal model.

Data S4. Modelling the occurrence of the common waxbill: explanatory variable extraction and data analysis.

Table S1. Univariate logistic regressions between the common waxbill occurrence in 10 km² UTM grid cells in the Iberian Peninsula and climatic variables.

Table S2. Aggregation of Corine land-cover classes into groups used in this analysis.

Table S3. Land-use and climate models for the occurrence of the common waxbill in 10 km² UTM grid cells in the Iberian Peninsula.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Filename | Format | Size | Description
MEE3_219_sm_Data-TableS1-S3.doc | .doc | 124K | Supporting info item
