We focused on the species at the centre of the refugia debate, namely boreal and nemoral tree species. Trees were defined as woody plants reaching ≥ 20 m in height (Svenning & Skov 2005). A key assumption of bioclimatic species distribution modelling is that the ranges of the modelled species are, to a large extent, in equilibrium with climate. For European tree species, there is evidence that this is only true for widespread, northern species (Svenning & Skov 2004). Thus, only species with wide northern distributions were included in the present study. These are also the species which are most likely to have had northern refugia and have been most discussed in the refugia debate (Willis et al. 2000; Willis & van Andel 2004; Cheddadi et al. 2006; Leroy & Arpe 2007). The boreal tree species were Alnus incana (L.) Moench, Betula pendula, B. pubescens, Picea abies (L.) Karsten, Pinus sylvestris L., Populus tremula and Salix caprea. The nemoral tree species were A. glutinosa (L.) Gaertner, Carpinus betulus L., Fagus sylvatica, P. alba L., P. nigra L., Quercus petraea (Mattuschka) Liebl., Q. robur L., S. alba L., S. fragilis L., Taxus baccata L., Tilia cordata Mill., T. platyphyllos Scop., Ulmus glabra Hudson, U. laevis Pallas and U. minor Miller.
Data on current climate (monthly temperature and precipitation) were obtained from the CRU CL 2.0 data set (New et al. 2002, http://www.cru.uea.ac.uk/cru/data/hrg.htm) at a 10′ resolution. To assess modelling uncertainty due to uncertainty in the LGM climate data, two LGM climate simulations were used: (i) the Stage 3 Project (S3P) simulation (c. 60-km resolution), which reconstructed the LGM climate in Europe using a nested high-resolution mesoscale model during phase 4 of the Oxygen Isotope Stage 3 Project (S3P home page: http://www.esc.cam.ac.uk/index.php/component/content/article/274; Pollard & Barron 2003); and (ii) the Laboratoire de Météorologie Dynamique's LMDZHR simulation, which reconstructed the LGM climate using a general circulation model with a stretched grid over Europe (c. 60-km resolution) (Jost et al. 2005). Estimates of the present climate as simulated by the two climate models were also available and were used to compute the anomalies between modelled present and LGM conditions for monthly temperature and precipitation. To improve the representation of small-scale climatic variation caused by topography, high-resolution LGM climate estimates were obtained by interpolating the anomalies to 10′ resolution and subtracting them from the CRU present climate data set (cf. Hijmans & Graham (2006) for a similar approach). During the LGM, sea level was 110 m lower than at present (Ruddiman 2001). The LGM coastline was estimated by lowering the sea level of the Earth Topography-5′ (http://www.ngdc.noaa.gov/mgg/fliers/93mgg01.html) elevation-bathymetry raster by 110 m and interpolating the CRU data to the areas that are now inundated.
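The anomaly-based downscaling and the coastline estimate described above can be illustrated with a minimal Python sketch. It assumes that the anomaly is defined as simulated present minus simulated LGM, that grids are plain nested lists with at least two rows and columns, and that bilinear interpolation is used; the interpolation method is not stated in the text, and the function names `bilinear_resample`, `downscale_lgm` and `lgm_land_mask` are hypothetical.

```python
def bilinear_resample(grid, out_rows, out_cols):
    """Bilinearly interpolate a 2-D grid (list of lists) to a finer grid.

    Assumption: both the input and the output have at least 2 rows and 2 columns.
    """
    in_rows, in_cols = len(grid), len(grid[0])
    result = []
    for i in range(out_rows):
        # Fractional row position of this output cell in the input grid.
        y = i * (in_rows - 1) / (out_rows - 1)
        y0 = min(int(y), in_rows - 2)
        fy = y - y0
        row = []
        for j in range(out_cols):
            x = j * (in_cols - 1) / (out_cols - 1)
            x0 = min(int(x), in_cols - 2)
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
            bottom = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


def downscale_lgm(cru_present, model_present, model_lgm):
    """Estimate fine-resolution LGM climate from coarse model output.

    anomaly = simulated present - simulated LGM (coarse grid);
    LGM estimate = observed present (fine grid) - interpolated anomaly.
    """
    anomaly = [[p - l for p, l in zip(p_row, l_row)]
               for p_row, l_row in zip(model_present, model_lgm)]
    fine = bilinear_resample(anomaly, len(cru_present), len(cru_present[0]))
    return [[obs - a for obs, a in zip(obs_row, a_row)]
            for obs_row, a_row in zip(cru_present, fine)]


def lgm_land_mask(elevation, sea_level_drop=110.0):
    """Mark cells as land when they lie above the lowered sea level (m)."""
    return [[z > -sea_level_drop for z in row] for row in elevation]


# Toy example: a 2 x 2 model grid downscaled to a 3 x 3 observation grid.
cru = [[12.0, 12.0, 12.0] for _ in range(3)]
sim_present = [[10.0, 10.0], [10.0, 10.0]]
sim_lgm = [[0.0, 4.0], [0.0, 4.0]]
lgm_estimate = downscale_lgm(cru, sim_present, sim_lgm)
```

Because the anomaly is applied to the observed fine-resolution data, the topographic detail of the CRU grid is retained while the simulated climate change signal varies smoothly at the model resolution.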
Table 1. The climatic predictor variables used in this study, their acronyms, units and present-day ranges at 10′ resolution in Europe (31.3°W to 68.7°E longitude and 27.6–82.9°N latitude)
|Variable||Acronym||Unit||Present-day range|
|Mean summer temperatureᵃ||mst||°C||−3.3 to 30.6|
|Absolute minimum temperatureᶜ||tmin||°C||−55.8 to 1.1|
|Growing degree daysᵈ||gdd||°C||0–5357|
|Mean summer precipitationᵃ||psum||mm||0–219|
|Water balanceᵍ||wbal||mm||−1060 to 2983|
|Summer water balanceʰ||wb_sum||mm||−149 to 161|
|Minimum monthly precipitationⁱ||pmin||mm||0–162|
|Water balance seasonalityᵇ||wb_sea||mm||4–111|
Species distribution modelling
The southern vs. northern refuge hypotheses were investigated for the 22 tree species using species distribution modelling to estimate their climatic niches and hindcast their potential distributions during the LGM, that is, the geographic distribution of suitable climate conditions for these species during that period. The general modelling approach was to calibrate the distribution models using the data for current species distributions and climate, evaluate their predictive ability in terms of the modern distribution, and then project the models onto the LGM climate data. The refuge hypotheses were evaluated by considering the LGM predictions for the individual study species as well as, more synthetically, the predicted LGM tree species richness, computed as the sum of the predicted presences of the 22 study species per AFE cell as well as for the 7 boreal and 15 nemoral tree species separately.
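The richness measure described above is a per-cell sum of binary predictions. A minimal Python sketch follows; the data layout (a dictionary mapping species names to lists of 0/1 values, one per grid cell) and the function name `predicted_richness` are assumptions made for illustration, and the toy values are not results from this study.

```python
def predicted_richness(presence_by_species, species=None):
    """Sum binary predictions (0/1) over species for every grid cell.

    presence_by_species: dict mapping species name -> list of 0/1 per cell.
    species: optional subset of names (e.g. only boreal or only nemoral).
    """
    names = list(presence_by_species) if species is None else list(species)
    n_cells = len(presence_by_species[names[0]])
    return [sum(presence_by_species[name][cell] for name in names)
            for cell in range(n_cells)]


# Toy predictions for three species across four grid cells.
toy_predictions = {
    "Betula pendula": [1, 1, 0, 1],
    "Picea abies": [1, 0, 0, 1],
    "Fagus sylvatica": [0, 0, 1, 1],
}
boreal = ["Betula pendula", "Picea abies"]
nemoral = ["Fagus sylvatica"]

total_richness = predicted_richness(toy_predictions)
boreal_richness = predicted_richness(toy_predictions, boreal)
nemoral_richness = predicted_richness(toy_predictions, nemoral)
```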
Many algorithms exist for species distribution modelling (e.g. Guisan & Zimmermann 2000). Given that the AFE data are clearly not complete for eastern Europe, a crucial region in the present study given its cold winters, we based our modelling on two algorithms that use only species presences (presence-only data) (Elith et al. 2006; Pearce & Boyce 2006): (i) maximum entropy species distribution (Maxent) modelling (Phillips et al. 2006), and (ii) a standard rectilinear climatic envelope (Bioclim) model (Guisan & Zimmermann 2000). Maxent is theoretically well suited to species distribution modelling (Phillips et al. 2006) and has been shown to perform well compared to other methods (Elith et al. 2006; Hijmans & Graham 2006; Phillips et al. 2006). However, it can sometimes overfit species–climate relationships, limiting transferability (Peterson et al. 2007). Bioclim is much less prone to overfitting due to its simplicity, and in previous comparisons of modelling algorithms it gave results that were among the most divergent from Maxent (Elith et al. 2006). In addition, Hijmans & Graham (2006) concluded that Bioclim was not very prone to overprediction and could be used as a conservative approach. Therefore, Bioclim was used as a supplementary alternative.
The Maxent modelling was performed with all background points available in the study area (n = 4878) and the recommended default values for convergence threshold (10⁻⁵), maximum number of iterations (500) and regularization multiplier (1) (Phillips et al. 2006). The logistic output format ranges from low (minimum: 0) to high (maximum: 1) probability of presence. Two alternative approaches were used to select the threshold for converting the continuous logistic probability scale to a binary prediction of potential presence (suitable climate) or absence (unsuitable climate). First, for each species, the threshold was selected that produced the best match to its range limit in the north-eastern part of the study area (northern Russia, according to AFE) and further east (based on outline maps available from EUFORGEN and Hultén & Fries (1986)), that is, under climatic conditions most similar to those thought to limit tree species distributions northward in Europe during the LGM. This approach is referred to as the north-eastern limit (nelim) criterion. As the nelim criterion involves a certain degree of subjectivity, the threshold was also selected using Maxent's maximum training sensitivity plus specificity threshold (mts + s) criterion, which has recently been shown to produce highly accurate predictions (Jiménez-Valverde & Lobo 2007).
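The logic of a maximum sensitivity plus specificity threshold can be sketched in Python as follows. Maxent computes this threshold internally; the sketch is only an approximation of the idea, assuming that logistic scores are available for training presences and for background cells and that background cells are treated as absences when computing specificity. The function names and the toy scores are hypothetical.

```python
def max_sens_plus_spec_threshold(presence_scores, background_scores):
    """Return the score threshold that maximizes sensitivity + specificity.

    Sensitivity: fraction of presence scores at or above the threshold.
    Specificity: fraction of background scores below the threshold
    (assumption: background cells stand in for absences).
    """
    candidates = sorted(set(presence_scores) | set(background_scores))
    best_threshold, best_sum = None, -1.0
    for t in candidates:
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        specificity = sum(s < t for s in background_scores) / len(background_scores)
        if sensitivity + specificity > best_sum:
            best_threshold, best_sum = t, sensitivity + specificity
    return best_threshold


def to_binary(scores, threshold):
    """Convert continuous logistic scores to potential presence (1) or absence (0)."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy logistic scores (not values from this study).
train_presence = [0.6, 0.7, 0.9]
background = [0.1, 0.2, 0.3]
threshold = max_sens_plus_spec_threshold(train_presence, background)
binary_map = to_binary([0.05, 0.45, 0.6, 0.95], threshold)
```

When several thresholds tie, this sketch keeps the lowest one; that tie-breaking rule is a choice made here and is not taken from the Maxent software.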
The Maxent modelling was implemented with two sets of predictor variables: (i) a simple set consisting of three variables (gdd, tmin and wbal) generally thought to be of key importance for plant distributions (Sykes et al. 1996; Skov & Svenning 2004), and (ii) a set of all 12 climatic variables. It can be argued that the former provides a more parsimonious and, therefore, more robust description of the climatic niche space. Hence, species distribution models based on the 3-variable set should be less prone to overfitting of the species–climate relationships and should probably have greater transferability to other spatiotemporal domains (Peterson et al. 2007). The 12-variable set, conversely, allows for greater complexity in species–climate relationships. In total, three different implementations were used: the 3-variable set with the nelim and mts + s thresholds, and the 12-variable set with the mts + s threshold.
Bioclim was parameterized in two ways: using (i) the 10th and 90th percentiles, and (ii) the outlier-corrected (Skov & Svenning 2004) minimum and maximum observed values for each climatic predictor variable across a species’ present-day range. A species’ potential presence was predicted when all climatic variables in the predictor set fell within the range defined by these values. The 10th and 90th percentile range was used as a highly conservative estimate of a species’ tolerance with respect to each climatic factor. Only the 3-variable climatic predictor set was used for the Bioclim modelling, as the 12-variable set would produce overly conservative climatic niche estimates.
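A rectilinear envelope of this kind can be sketched in a few lines of Python. The sketch below covers only the percentile parameterization; the outlier correction of Skov & Svenning (2004) is not reproduced. The percentile definition (linear interpolation between ranked values), the data layout and the function names are assumptions, and the toy climate values are not data from this study.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranked values (assumed method)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def fit_envelope(occurrence_climate, lower_q=10, upper_q=90):
    """Compute (lower, upper) limits per variable from values at presence cells.

    occurrence_climate: dict mapping variable name -> list of values observed
    across the species' present-day range.
    """
    return {var: (percentile(vals, lower_q), percentile(vals, upper_q))
            for var, vals in occurrence_climate.items()}


def predict_presence(envelope, cell_climate):
    """Predict potential presence only if every variable lies within its limits."""
    return all(low <= cell_climate[var] <= high
               for var, (low, high) in envelope.items())


# Toy climate values at occupied cells for two of the three variables.
occupied = {
    "gdd": list(range(0, 1100, 100)),   # 0, 100, ..., 1000
    "tmin": list(range(-50, 5, 5)),     # -50, -45, ..., 0
}
envelope = fit_envelope(occupied)
inside = predict_presence(envelope, {"gdd": 500, "tmin": -20})
outside = predict_presence(envelope, {"gdd": 950, "tmin": -20})
```

Because a cell must satisfy every variable simultaneously, each added predictor can only shrink the predicted area, which is consistent with restricting the Bioclim modelling to the 3-variable set.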
The models were calibrated using species occurrences across the AFE grid (n = 4878 grid cells), using grid cell means for the climatic variables, but projected onto the modern and LGM climate data at 10′ resolution to allow a more fine-scale indication of the potential LGM distributions. Modelling was done for a study area ranging from 31.3°W to 68.7°E longitude and 27.6 to 82.9°N latitude for the present day. A slightly smaller study area (10.9–50.0°E longitude and 33.9–74.7°N latitude) was used for the LGM projections, reflecting the geographic coverage of the LGM simulations.
The models’ predictive abilities were evaluated using three approaches. The first was visual evaluation of the models’ abilities to predict the species’ present distribution across Europe, emphasizing the north-eastern part of the study area. In the second approach, the Receiver Operating Characteristic Area-Under-the-Curve (AUC) was computed for the Maxent models using Maxent's internal validation procedure, which randomly partitions the data into a 70% calibration data set and a 30% test data set (Phillips et al. 2006). The AUC is the current standard statistical measure of the accuracy of predictive distribution models (Fielding & Bell 1997), but a recent paper highlights several problems with AUC, notably its dependence on study area extent and prevalence (Lobo et al. 2008). Therefore, AUC was primarily used to assess the relative accuracy of different models for the same species. In the third approach, as a synthetic measure of the models’ predictive abilities, the observed tree species richness (sum of the observed presence of the 22 study species per AFE cell) was regressed against the predicted tree species richness (sum of the predicted presence of the 22 study species per AFE cell) for each modelling implementation. For this analysis, the easternmost part of the study area was excluded (leaving 2255 AFE cells) because this region is incompletely inventoried in AFE.
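The two quantitative evaluations can be illustrated with a minimal Python sketch. The AUC is computed here in its rank-based form, as the probability that a randomly chosen test presence scores higher than a randomly chosen background cell; this is an assumption about the comparison set and is not Maxent's own code. The regression is an ordinary least-squares fit of observed on predicted richness. The function names and toy values are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def regress(predicted, observed):
    """Ordinary least squares of observed on predicted; returns slope, intercept, R^2."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(predicted, observed))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Toy values (not results from this study).
test_auc = auc([0.8, 0.9], [0.1, 0.85])
slope, intercept, r_squared = regress([1, 2, 3, 4], [2, 4, 6, 8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so comparing values across implementations for the same species, as done here, avoids some of the extent and prevalence effects noted above.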
All species distribution and climate variable maps were computed and used in the Lambert Azimuthal Equal Area projection. All GIS modelling was performed using ArcGIS 9.2 (ESRI, Redlands, CA). Maxent modelling was done using Maxent version 3.0.6 (http://www.cs.princeton.edu/~schapire/maxent/), and Bioclim modelling was implemented in ArcGIS 9.2 using Python 2.4 (http://www.python.org/).