Identifying appropriate spatial scales of predictors in species distribution models with the random forest algorithm


Corresponding author. E-mail:


  1. Including predictors in species distribution models at inappropriate spatial scales can decrease the variance explained, introduce residual spatial autocorrelation (RSA) and lead to wrong conclusions. Some studies have measured predictors within buffers of different sizes (scales) around sample locations, regressed the response against each predictor at each scale, and selected the scale with the best model fit as the appropriate scale for that predictor. However, a predictor can influence a species at several scales, or several scales can show a good model fit because of a bias caused by RSA. All scales with a good model fit therefore need to be evaluated. With potentially several scales per predictor and multiple predictors to evaluate, the number of candidate predictors can be large relative to the number of data points, which can impede variable selection with traditional statistical techniques such as logistic regression.
  2. We trialled a variable selection process using the random forest algorithm, which allows several scales of multiple predictors to be evaluated simultaneously. Using simulated responses, we compared the performance of models resulting from this approach with models using the known predictors at arbitrary spatial scales and at the known spatial scales. We also applied the proposed approach to a real data set of curlew (Numenius arquata).
  3. The AIC, AUC and Nagelkerke's pseudo R² of the models resulting from the proposed variable selection were often very similar to those of the models with the known predictors at the known spatial scales. Only two of nine models required the addition of spatial eigenvectors to account for RSA, whereas the arbitrary-scale models always required them. Of the known predictors, 75% (50–100%) were selected at scales similar to the known scale (within 3 km). In the curlew model, predictors at large, medium and small spatial scales were selected, suggesting that multiple scales need to be evaluated to build appropriate landscape-scale models.
  4. Out of 544 possible predictors, the proposed approach selected several of the correct predictors at appropriate spatial scales. It therefore facilitates evaluating multiple spatial scales of multiple predictors against each other in landscape-scale models.
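The single-predictor, buffer-based scale selection described in point 1 can be sketched as follows. This is a minimal, stdlib-only Python illustration under several assumptions: the landscape is a hypothetical grid raster (`raster`), buffers are circular and measured in cell units, each scale is scored with a univariate logistic regression fitted by Newton-Raphson, and every scale within 2 AIC units of the best is kept as a "good fit" scale. The ΔAIC ≤ 2 cut-off, the function names and the simulated data are illustrative choices, not the procedure of any cited study.

```python
import math
import random


def buffer_mean(raster, row, col, radius):
    """Mean of the raster cells whose centres lie within `radius` cells of (row, col)."""
    total, count = 0.0, 0
    for r in range(max(0, row - radius), min(len(raster), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(raster[0]), col + radius + 1)):
            if (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                total += raster[r][c]
                count += 1
    return total / count


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(x, y, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1, log-likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient) components
            g1 += (yi - p) * xi
            h00 += w                # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:  # degenerate design, e.g. a constant predictor
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    eps = 1e-12
    ll = 0.0
    for xi, yi in zip(x, y):
        p = min(max(_sigmoid(b0 + b1 * xi), eps), 1.0 - eps)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return b0, b1, ll


def aic_by_scale(raster, sites, y, radii):
    """AIC (k = 2 parameters) of a univariate logistic model at each buffer radius."""
    aics = {}
    for radius in radii:
        x = [buffer_mean(raster, r, c, radius) for r, c in sites]
        _, _, ll = fit_logistic_1d(x, y)
        aics[radius] = 2 * 2 - 2 * ll
    return aics


def good_scales(aics, delta=2.0):
    """All radii within `delta` AIC units of the best one (delta = 2 is an assumed cut-off)."""
    best = min(aics.values())
    return sorted(r for r, a in aics.items() if a - best <= delta)


# Hypothetical simulated landscape and sites; the response is driven by the 3-cell buffer mean.
random.seed(1)
raster = [[random.random() for _ in range(40)] for _ in range(40)]
sites = [(random.randrange(5, 35), random.randrange(5, 35)) for _ in range(150)]
y = [1 if random.random() < _sigmoid(12 * (buffer_mean(raster, r, c, 3) - 0.5)) else 0
     for r, c in sites]
aics = aic_by_scale(raster, sites, y, radii=[1, 2, 3, 5, 8])
print({r: round(a, 1) for r, a in aics.items()}, good_scales(aics))
```

Returning every scale that fits nearly as well as the best one, rather than only the single minimum, reflects the point that several scales may need to be carried forward for evaluation.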
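The idea in point 2, entering every predictor-by-scale combination into a single random forest and ranking the candidates against each other, can be sketched as below. This is only an illustrative sketch: it uses scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_` as a stand-in, and the simulated design (3 hypothetical predictors, each measured at 5 hypothetical buffer radii, with a response driven by predictor 0 at 3 km) is an assumption. It does not reproduce the paper's actual selection rules, which this abstract does not describe, and it uses 15 toy candidates rather than the 544 of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_sites, n_predictors, radii = 500, 3, [1, 2, 3, 5, 8]  # hypothetical radii in km

# Hypothetical multi-scale design: each predictor has a shared landscape signal plus
# scale-specific variation, so its columns at different radii are correlated.
names, columns = [], []
for p in range(n_predictors):
    shared = rng.normal(size=n_sites)
    for r in radii:
        columns.append(shared + rng.normal(size=n_sites))
        names.append(f"pred{p}_{r}km")
X = np.column_stack(columns)

# Simulated presence/absence driven only by predictor 0 at the 3 km scale.
true_col = names.index("pred0_3km")
prob = 1.0 / (1.0 + np.exp(-3.0 * X[:, true_col]))
y = (rng.random(n_sites) < prob).astype(int)

# All predictor-by-scale candidates are evaluated together in one forest.
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, y)

ranking = sorted(zip(names, forest.feature_importances_), key=lambda t: t[1], reverse=True)
for name, importance in ranking[:5]:
    print(f"{name:12s} {importance:.3f}")
print("OOB accuracy:", round(forest.oob_score_, 3))
```

Because the forest sees all scales of all predictors at once, correlated scales of the same predictor compete directly, and the ranking shows which scale carries most of the signal. Permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative to impurity importance when candidates differ in their number of possible split points.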
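The three model-comparison measures named in point 3 (AIC, AUC and Nagelkerke's pseudo R²) can be computed from fitted presence probabilities with a few stdlib functions. This is a minimal sketch under stated assumptions: AUC is computed in its Mann-Whitney form, Nagelkerke's R² is taken as the Cox-Snell R² divided by its maximum attainable value, the null model is intercept-only, and the six-site predictions and `k=2` are hypothetical toy values, not data from the study.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes `y` under fitted probabilities `p`."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic(ll, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * ll


def nagelkerke_r2(ll_model, ll_null, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value (Nagelkerke)."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def auc(y, scores):
    """Area under the ROC curve as the Mann-Whitney probability that a randomly
    chosen presence scores higher than a randomly chosen absence (ties count 1/2)."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions for six sites (1 = presence, 0 = absence).
y = [0, 0, 1, 1, 1, 0]
p = [0.2, 0.4, 0.7, 0.9, 0.6, 0.3]
prevalence = sum(y) / len(y)
ll_model = log_likelihood(y, p)
ll_null = log_likelihood(y, [prevalence] * len(y))  # intercept-only model
print("AIC:", round(aic(ll_model, k=2), 3))  # k = 2 assumed (intercept + one slope)
print("AUC:", auc(y, p))
print("Nagelkerke R2:", round(nagelkerke_r2(ll_model, ll_null, len(y)), 3))
```

AIC penalises the number of parameters, so it is the measure that reflects the cost of adding spatial eigenvectors; AUC depends only on how well the predictions rank presences above absences, and Nagelkerke's R² rescales the likelihood improvement over the null model so that it can reach 1.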