1. Climate-induced range shifts have been detected in a large number of plant and animal taxa and a significant portion of these shifts have been found using records collected over a long period of time. However, the absence of standardized collecting procedures in some historical data sets introduces bias and skew into the data which can result in misleading conclusions. A range of different methods has been employed to account for this heterogeneity, but these methods have yet to be compared using a single data set.
2. We tested the accuracy of published methods for accounting for this heterogeneity. An extensive, heterogeneous data base of sightings of Odonata from the United Kingdom was analysed using four published methods to control for uneven recorder effort. For each method, five different range statistics were calculated. The results were compared and tested against changes in temperature over time to select the most accurate method.
3. Significant variation existed between results derived using different methods to account for uneven recorder effort. Range statistics also differed in their bias in response to varying recorder effort, and the bias was most pronounced in the statistics most commonly used in published studies.
4. A combination of existing methods is recommended to control for temporal variation in recorder effort. This focuses on random resampling of the more heavily recorded time period. A novel range statistic based on a gamma frequency distribution, which avoids the inherent bias of existing statistics, is suggested as a descriptor for range margins.
5. When the most robust methods to control for uneven recorder effort were combined with the most robust range statistics describing the range shift, British Odonata as a group were shown to be tracking isotherms between 1960 and 2005.
6. Accurate description of past range shifts is essential for correct predictions of future trends and for making decisions concerning conservation priorities. We strongly recommend the use of the best performing methods outlined here to ensure consistency and accuracy in future studies.
Introduction
Although broad geographical trends such as the decline in biodiversity towards the poles are clearly visible, it is vital that the apparent stasis of these trends does not detract from the dynamic, species-level processes that underlie them. Indeed, the dependence of the global biota upon certain ranges of climatic variables means that, as the climate changes, distributions may also change in a predictable manner. The ability to predict the spatial responses of organisms to climate change plays an important role in planning conservation measures for threatened species (Williams et al. 2005).
Large-scale ecological investigation and prediction is an area which, due to the vast complexity inherent within the field, is particularly susceptible to over-simplifications and misinterpretations. Two approaches that have been used in this field have the potential to provide inaccurate, if alluring, results. The first is the use of ‘bioclimate envelope models’. These models take as their basis the assumption that a species’ range is determined by climatic variables. However, this ignores the contribution of both ecological factors (Davis et al. 1998) and landscape structure (Hill et al. 2001). The debate over this approach has been covered elsewhere (Pearson & Dawson 2003). The other approach employed to investigate large-scale distributional changes is that of the ecological census. This can be carried out by monitoring the same study sites as used in previous studies to assess differences between the two time periods (e.g. Sagarin et al. 1999). However, this offers only a regional view on distributional changes. To gain an insight into more general trends, researchers have previously turned to historical records (Telfer, Preston & Rothery 2002).
Distributional changes have been documented in a great number of animal and plant species (Parmesan et al. 1999; Thomas & Lennon 1999; Warren et al. 2001; Mieszkowska et al. 2006; Hitch & Leberg 2007). When presented with a plethora of biological records with a broad temporal and spatial scope, there is great temptation to draw grand conclusions without fully considering the limitations that such data sets carry (Shoo, Stephen & Hero 2006). To date, no comparison has been made of the variety of methods that have been used to address these limitations. For example, records in a British biological data base are usually biased towards later time periods and lower latitudes (Fig. 1). The spatial bias is likely due to the higher human population density at lower latitudes (combined with more amenable environmental conditions for invertebrates). Here we compare the prominent methods that have been used to control for uneven recorder effort.
Warren et al. (2001) attempted to equalize the number of butterfly records in each of two time periods (1970–1982 and 1995–1999) to compare shifts in the northern range margin. This range margin was calculated by averaging the northern extents of the 10 most northerly records. Difference in recorder effort was controlled for by randomly subsampling the later, more heavily recorded period so that the same number of records was used in each period. This was carried out separately for each 100 km grid square so as to maintain the spatial variation in records.
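The resampling step described above can be sketched in a few lines. This is a minimal illustration and not the code used by Warren et al. (2001): the record layout (a coarse grid-square label paired with a northing), the function names and the rule of subsampling whichever period has more records in a given square are all assumptions made for the example.

```python
import random
from collections import defaultdict


def range_margin(northings, n=10):
    """Mean of the n most northerly values (the 'margin' statistic)."""
    top = sorted(northings, reverse=True)[:n]
    return sum(top) / len(top)


def resample_equal_effort(early, late, rng):
    """Equalize record counts between two periods within each coarse square.

    early, late: lists of (square_id, northing) tuples, where square_id is a
    hypothetical label for the 100 km grid square containing the record.
    Assumption: in each square, the period with more records is randomly
    subsampled down to the count of the period with fewer records.
    """
    def group(records):
        squares = defaultdict(list)
        for square_id, northing in records:
            squares[square_id].append((square_id, northing))
        return squares

    g_early, g_late = group(early), group(late)
    out_early, out_late = [], []
    for square_id in sorted(set(g_early) | set(g_late)):
        e = g_early.get(square_id, [])
        l = g_late.get(square_id, [])
        k = min(len(e), len(l))
        out_early.extend(rng.sample(e, k))
        out_late.extend(rng.sample(l, k))
    return out_early, out_late
```

Because the subsample is random, the margin calculated from it varies between runs; repeating the resampling and averaging the resulting margins is one way to stabilize the estimate.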
An alternative to resampling was used by Hickling et al. (2005), who simulated a census in two 11-year time periods (1960–1970 and 1985–1995) to track changes in the range margins of British Odonata. They included in their analysis only those records that were taken from 10 km grid squares (the spatial resolution most commonly used in regional analyses) that had been monitored in both time periods. The range margin was then calculated for each species using the same marginal-averaging method as Warren et al. (2001).
In a later analysis, Hickling et al. (2006) augmented this approach (hereafter the common squares method, CSM) by further excluding grid squares that did not contain a certain proportion (10% or 25%) of species in the taxon of interest. The intention was to control for difference in recorder effort between sampled sites as well as controlling for difference in recorder effort in the wider landscape context.
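The two filters just described, common squares and a species-richness threshold, might be expressed as follows. This is a sketch under stated assumptions: records are taken to be (square, species) pairs, the names are hypothetical, and the threshold is applied to each period's records separately, which the description above does not specify.

```python
def common_squares(early, late):
    """Keep only records from grid squares recorded in both periods.

    early, late: lists of (square_id, species) tuples (hypothetical layout).
    """
    shared = {sq for sq, _ in early} & {sq for sq, _ in late}
    return ([r for r in early if r[0] in shared],
            [r for r in late if r[0] in shared])


def threshold_squares(records, n_species_in_taxon, proportion):
    """Squares whose recorded richness reaches a proportion of the taxon.

    proportion would be 0.10 or 0.25 for the two thresholds described above.
    """
    species_by_square = {}
    for sq, species in records:
        species_by_square.setdefault(sq, set()).add(species)
    return {sq for sq, spp in species_by_square.items()
            if len(spp) >= proportion * n_species_in_taxon}
```

A threshold analysis would then retain only the records from squares returned by `threshold_squares` for both periods, for example by intersecting the two sets before filtering.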
Finally, Thomas & Lennon (1999) used a method similar to that of Hickling et al. (2005) to derive range margin changes between 1968–1972 and 1988–1991 for birds. Only records from 100 km² squares that were monitored in both time periods were included in the analysis and these were used to find range shifts for each species. Range shifts for individual species were plotted (on the y-axis) against the log difference in the number of 100 km² squares in which that species was recorded between time periods (on the x-axis). By taking the y-intercept of the regression line of this plot (i.e. x = 0, or equal numbers of records in each period), they were able to infer the range shift of a species that had no difference in recorder effort between time periods.
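The intercept idea can be illustrated with an ordinary least-squares fit. The sketch below assumes one point per species, with x the log difference in occupied squares and y the range shift; the function name and the example values are hypothetical.

```python
def ols_intercept(x, y):
    """Least-squares line through (x, y); returns (intercept, slope).

    The intercept is the fitted range shift at x = 0, i.e. for a species
    with no difference in the number of recorded squares between periods.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical values: four species whose shifts rise with recording effort.
log_diff = [-1.0, 0.0, 1.0, 2.0]
shift_km = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_intercept(log_diff, shift_km)
```

The intercept is an interpolation only when the x values span zero; otherwise it extrapolates beyond the data.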
Despite a range of methods being used to detect range shifts from historical data, there has been little study of the variation among these methods (Shoo, Stephen & Hero 2006). We present a comparative analysis of the methods used to account for differences in recorder effort, applied to a data base of sightings of British Odonata. In addition, we calculate the range shift using not only the predominant range statistic from previous studies – the mean latitude of the 10 most northern, occupied grid squares – but also the mean, median and most northerly grid squares and a novel range statistic based on a gamma frequency distribution. We also provide calculations for the threshold at which a species’ rarity affects the accurate calculation of range margins.
Materials and methods
The British Dragonfly Society data base
To illustrate the methodological considerations highlighted above, we use the extensive data base of sightings of Odonata maintained by the British Dragonfly Society (BDS). This is a data base with extreme spatial and temporal skew in the distribution of records (Fig. 1). Records were selected from between 1960 and 2004 (n = 349 624, as of November 2005), as this represents a period of substantial warming (Jones & Mann 2004) and, therefore, the period most heavily analysed in search of changes in distributions. The data base contains records of 50 species, of which 38 are residents and 12 are vagrants or migrants.
Exclusion of species
Species with highly restricted distributions were excluded. To establish a threshold for exclusion, the relationship between sampling intensity and range margin accuracy was investigated. A hypothetical one-dimensional range of 100 units was sampled at varying intervals: every cell (n = 100), every second cell (n = 50), every third cell (n = 33), etc. The range margin was defined in each case as the mean of the 10 highest cell values with the ‘correct’ range margin being the absolute highest cell value (100). Range margins of samples were expressed as percentages of this correct range margin. The process was repeated for ranges of 200, 300, 400 and 500 units.
The results showed that, regardless of the size of the range, there was a very clear hyperbolic relationship between the number of squares that had been sampled and the accuracy of the range margin result (Fig. 2; R² = 98·0%). This relationship was determined using SigmaPlot (v.10, SYSTAT, Chicago, IL, USA). The exclusion threshold was set at the number of squares that yielded a range margin with 90% accuracy while retaining the majority of the data. Using the equation of the curve, it was found that 45 sampling points were necessary to generate this level of accuracy, so species lacking records from 45 different 10 km grid squares were excluded from the analysis. Fifteen other species were excluded because they exhibited irregular ranges (ranges comprising a number of discrete zones separated by uninhabited areas) or geographically constrained distributions (i.e. those species that could not expand further).
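The sampling exercise behind this threshold is simple enough to reproduce in outline. The sketch below is an illustration, not the original calculation: it assumes that sampling every k-th cell means cells k, 2k, 3k and so on, which matches the stated sample sizes (for example n = 33 for every third cell of 100), and the function name is hypothetical.

```python
def margin_accuracy(range_size, step, n_top=10):
    """Sample every step-th cell of a one-dimensional range.

    Returns (number of sampled cells, margin as a percentage of the true
    margin), where the margin is the mean of the n_top highest sampled
    cells and the true margin is the highest cell, range_size.
    """
    sampled = list(range(step, range_size + 1, step))
    top = sorted(sampled, reverse=True)[:n_top]
    margin = sum(top) / len(top)
    return len(sampled), 100.0 * margin / range_size


# Accuracy falls as sampling becomes sparser, for each range size.
for size in (100, 200, 300, 400, 500):
    for step in (1, 2, 3, 5, 10):
        n, accuracy = margin_accuracy(size, step)
        print(size, step, n, round(accuracy, 1))
```

Under this assumption, a range of 100 units sampled at every second cell gives 50 sampling points and a margin at 91% of the true value, which is in line with the threshold of roughly 45 points for 90% accuracy.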
Comparison of methods
We compare the validity and variation in results between four approaches used to control for differences in recorder effort (‘methods’) and five range statistics for describing the location of a species’ range (‘range statistics’). Eighteen pairs of 10-year time periods were selected to provide coverage of the entire period as well as a variety of degrees of difference in recorder effort (Table 1). All periods within pairs were separated by at least a 5-year interval. Efforts were made to reduce overlap between periods to preserve independence of data, but some periods had to be selected to fill gaps in the distribution of variation in recorder effort where recorder effort was more even. For each of these comparisons, four separate analyses were carried out with different methods: (i) random subsampling of more heavily recorded period (Warren et al. 2001) (RM for resampling method), (ii) only records from grid squares sampled in both time periods (Hickling et al. 2005) (CSM for common squares method), (iii) only grid squares sampled in both time periods that contained 10% of the species of interest (Hickling et al. 2006) (TD10 for 10% threshold diversity), and (iv) only grid squares sampled in both time periods that contained 25% of the species of interest (Hickling et al. 2006) (TD25 for 25% threshold diversity). A control was also carried out which did not account for difference in recorder effort in any way (RD for raw data).
Table 1. Division of the British Dragonfly Society data base into comparisons of time periods with statistics for those comparisons
Columns: Time period 1; Time period 2; Log difference in records; Annual rate of temperature change (°C per year)
For each of these time periods, the range margin was calculated as the mean of the 10 most northerly grid squares in which each species was recorded. For comparison, the mean, median and maximum latitude of the records were also calculated. In addition to these range statistics, a gamma distribution was fitted to each set of latitudinal data using the fitdistr function in the MASS library (Venables & Ripley 2002) in R (R Development Core Team 2006). From this, the 95th quantile was found as an alternative method of estimating the range margin. The advantages of this range statistic over existing statistics are twofold: first, the shape of the gamma distribution includes an element of skew, thereby modelling the poleward tail of the distribution of records (see, e.g. Fig. 1a) more accurately. Secondly, an approach based on probability distributions reduces the errors associated with the stochastic nature of discovery of populations at range margins that are inherent within the maximum and margin range statistics (see below). The range shift was calculated as the difference between the values (mean, median, maximum, margin or 95th gamma quantile) of each time period. These were averaged across species to produce a mean range shift within each analysis.
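The gamma-based range statistic can be sketched without external libraries. Note the substitution: the analysis above fits the distribution by maximum likelihood with fitdistr in R, whereas this example uses a method-of-moments fit as a simpler stand-in, so the fitted parameters will differ somewhat. The quantile is found by bisection on a series expansion of the gamma distribution function. All function names are hypothetical, and the input is assumed to be positive northings with non-zero variance.

```python
import math
import statistics


def fit_gamma_moments(values):
    """Moment-based gamma fit; returns (shape, scale).

    A stand-in for a maximum-likelihood fit, not the same estimator.
    """
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    return mean * mean / var, var / mean


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series for the regularized incomplete gamma."""
    z = x / scale
    if z <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-13 * abs(total) and n < 100000:
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def gamma_quantile(p, shape, scale):
    """Invert the CDF by bisection after bracketing the quantile."""
    lo, hi = 0.0, shape * scale
    while gamma_cdf(hi, shape, scale) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_range_margin(northings, p=0.95):
    """Range margin as the p-th quantile of a gamma fitted to the records."""
    shape, scale = fit_gamma_moments(northings)
    return gamma_quantile(p, shape, scale)
```

The range shift for a species is then the difference between `gamma_range_margin` for the later period and for the earlier period.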
This analysis was carried out twice: once with the raw range shifts as described above and a second time implementing Thomas & Lennon’s (1999) ‘intercept method’ (RIM for regression intercept method). Only time periods where the regression line crossed the y-axis (i.e. the range of values for log difference in record number included zero) were included in the analysis to guard against extrapolation beyond the data. Differences between methods and range statistics were tested using a glm with method, range statistic and whether the RIM was used as factors in the model.
Comparison of range statistics
In addition to considering which methods would best account for differences in recorder effort between sampling periods, it is important to ensure that the statistics being used to describe the range are robust to the effects of varying recorder effort. Intuitively, using the most northerly record as the range margin and increasing the recorder effort will result in an apparent shift in range due to the increased chance of finding rare edge-of-range populations with more searching. A similar error will be introduced when using the margin statistic if the 10 averaged squares constitute a substantial portion of the total range. However, a range of such statistics is available to researchers. To determine which of the potential statistics of range location is most robust to the impacts of differences in recorder effort, 7269 records for Coenagrion puella (Linnaeus, 1758; the most recorded odonate species in the BDS data base) from between 1960 and 2004 were extracted from the data base to act as an exemplar. Five statistics of range location (mean, median, margin, maximum and 95th gamma quantile) were calculated for random subsamples of the records representing differences in recorder effort ranging from 1% to 100% of the whole data set for C. puella. A ‘range shift’ (which in this case would be wholly an artefact of sampling) was estimated as the difference in each range statistic between each level of recorder effort and the value of that range statistic at 100% recorder effort.
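The subsampling test described above can be mimicked with synthetic data. In this sketch the latitudes are drawn from an arbitrary gamma distribution purely for illustration and are not the C. puella records; the function names are hypothetical, and the gamma quantile statistic is left out to keep the example short.

```python
import random
import statistics


def range_statistics(northings, n_margin=10):
    """Mean, median, margin (mean of the n_margin highest) and maximum."""
    ordered = sorted(northings)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "margin": statistics.fmean(ordered[-n_margin:]),
        "maximum": ordered[-1],
    }


def artefact_shift(northings, fraction, rng):
    """Apparent shift of each statistic when only a fraction is sampled.

    Values are subsample minus full data set, so any non-zero result is
    an artefact of reduced recorder effort.
    """
    full = range_statistics(northings)
    k = max(1, round(fraction * len(northings)))
    sub = range_statistics(rng.sample(northings, k))
    return {name: sub[name] - full[name] for name in full}


rng = random.Random(42)
records = [rng.gammavariate(4.0, 75.0) for _ in range(1000)]  # synthetic
for fraction in (0.01, 0.1, 0.5, 1.0):
    print(fraction, artefact_shift(records, fraction, rng))
```

With at least 10 records in the subsample, the margin and maximum can only move towards the centre of the range as records are removed, which is the directional bias at issue; the mean and median scatter in both directions.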
Method-range statistic validation
To test the validity of each method-range statistic combination, three groups of regression models were constructed. The first two were simple linear regressions with annual range shift (calculated as the mean range shift between two periods divided by the interval between those two periods) across all species as the response and either (i) the log difference in records between periods, or (ii) average annual change in temperature (the slope of the linear regression fit of mean Central England Temperature on year) as the predictor. The third group comprised multiple regressions with average annual range shifts across all species as the response variable and both average annual change in temperature between those periods and log difference in records between periods as predictors. Mean annual temperature change increased with time while differences in recorder effort showed the opposite pattern (Fig. 3a and b). A method-range statistic combination that accurately reflects the changing environmental conditions would, therefore, have resulted in a positive, significant relationship with annual temperature increase and a non-significant relationship with differences in recorder effort. There was a significant correlation between the log difference in records and the annual rate of temperature change (r = −0·712, P < 0·001, Fig. 3c). However, the collinearity in the regression model resulted in a variance inflation factor of 2·029, well below the rules of thumb of 4 or 10 that are used as indicators of substantial collinearity.
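For a model with only two predictors, the variance inflation factor follows directly from their correlation, which gives a quick check on the value quoted above. This is a worked illustration using the standard two-predictor formula; the function name is hypothetical.

```python
def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors.

    With two predictors, the R-squared of one regressed on the other is r**2.
    """
    return 1.0 / (1.0 - r * r)


print(round(vif_two_predictors(-0.712), 3))  # 2.028
```

The result agrees with the reported 2·029 to two decimal places; the small remaining difference presumably reflects rounding of the correlation coefficient.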
Results
Comparison between methods
RIM, method and range statistic were significant factors in determining mean annual range shift (RIM, F1,595 = 28·85, P < 0·001; method, F4,595 = 43·13, P < 0·001; range statistic, F4,595 = 39·85, P < 0·001; R² = 0·374). However, the similarity of the raw and RIM data in Fig. 4 and the strong correlation between raw and RIM values (r = 0·714, P < 0·001) suggest that the RIM acts mainly to reduce the magnitude of detected shifts. Variation between the mean range shifts calculated using the margin range statistic for CSM, TD10 and TD25 (Fig. 4) follows a pattern similar to that in the corresponding plots in the paper on which those methods are based (Hickling et al. 2006), with decreasing magnitudes of range shifts calculated under progressively more conservative methods.
Comparison between range statistics
When the balance of recorder effort between the two periods was varied, estimation of the range margin using the margin range statistic exhibited a significant, directional skew (Fig. 5a). This results in an apparent range shift which increases exponentially as the percentage of included records decreases, leading to the linear relationship seen in Fig. 5a. Estimation of the range margin using the maximum range statistic shows a similar pattern at highly uneven recorder efforts (1–30% of records included; Fig. 5b). However, these two range statistics are extremely accurate at more even recorder effort (i.e. once the correct extreme value has been identified within the subsample). Whether the correct extreme value is included in a given subsample is, however, largely a matter of chance.
Mean, median and 95th gamma quantile each responded similarly, showing no directional artefacts due to differences in recorder effort. The ncv.test function in the car library (Fox 2002) in R was used as a post hoc test of equality of variance in these three range statistics. Variance was constant across differences in recorder effort for mean (χ² = 0·615, P = 0·433; Fig. 5c) and median (χ² = 0·068, P = 0·794; Fig. 5d) but varied with differences in recorder effort in the 95th gamma quantile (χ² = 17·561, P < 0·001; Fig. 5e). This non-constant variance results from a more accurate approximation of the ‘true’ range shift at more even recorder effort (although it is worth noting that the change in the error is relatively small, even accounting for the log scale in Fig. 5e).
Method-range statistic validation
The repeated use of some time periods and overlapping years (see Table 1) will inflate the significance of the model results shown in Table 2. As a result, model results have been ranked according to the significance of the results of Model 3 with temperature. Of the 18 models describing range shift in terms of temperature (Model 2 in Table 2), six resulted in strong, positive correlations with temperature. These included three method-range statistic combinations that have been used previously in the literature. However, the multiple regression describing range shifts in terms of both temperature and variation in recorder effort (Model 3 in Table 2) shows that the relationship with temperature is substantially weakened for most models when recorder effort is included. The strongest of the temperature relationships when recorder effort is included occurs when the TD25 method is used with the maximum range statistic. However, this retains a strong positive correlation with recorder effort, which is indicative of the kinds of statistical artefacts that we are trying to account for. Also, this method-range statistic combination does not result in a positive correlation with temperature in Model 2. The RM method used with the gamma range statistic, on the other hand, produces the strongest positive relationship with temperature in Model 2 and the second strongest relationship with temperature in Model 3, but without the positive relationship with recorder effort. The RM method provided the three strongest relationships with temperature in Model 2 and two of the three strongest relationships with temperature in Model 3.
Table 2. Results of multiple regressions of the log difference in the number of records in each analysis (Records) and the mean annual change in temperature between the periods (Temp.) on the mean annual range shift
Method-statistic combinations are sorted according to the strength of the relationship with temperature in Model 3 (i.e. taking recorder effort into account). RM, resampling method; CSM, common squares method; TD10, 10% threshold diversity; TD25, 25% threshold diversity; RD, raw data.
Selected model results
RM gamma gives a mean annual range shift across the entire period of 3·41 km year⁻¹ ± 0·815 (SE). When the RIM is applied, this is reduced to 3·10 km year⁻¹ ± 0·840. That the RIM reduces the mean annual range shift by only 9%, which is less than half the standard error, further supports the position that the method based on this measure is already robust. The range shifts that are calculated using this method-range statistic combination closely resemble the shifting of isotherms in response to increasing temperatures of 150 km °C⁻¹ (Intergovernmental Panel on Climate Change 1996; Fig. 6).
Discussion
Historical data bases of sightings of animals and plants are a valuable resource in detecting spatial and temporal change. However, the inherent biases that are found in such data sets require special methods to produce meaningful results. We present the first comparison of methods used to detect range margin changes in historical data and demonstrate that their reliability varies markedly. In a comparison of methods, we found that RM (random resampling of squares in the more heavily recorded period to equalize recorder effort; Warren et al. 2001) was least vulnerable to bias from uneven recorder effort. While relationships persist between range shift and recorder effort, these are in the direction expected as a result of the slight collinearity between these two variables (Fig. 3c) – i.e. a positive relationship with temperature is likely to produce a negative relationship with recorder effort. If uneven recorder effort was creating apparent range shifts that were artefacts of the uneven records, then a positive relationship would be expected (Thomas & Lennon 1999). The RIM (Thomas & Lennon 1999) could not be evaluated in the same way as other methods. However, given that the method is easy to apply, it can be used as an additional check to ensure that mean range shifts averaged across the taxon of interest are not confounded by a positive correlation between range shift and recorder effort. Consistent results both before and after the application of the RIM support the robustness of the results.
We also gave consideration to which range statistic should be used when comparing distributions in two different time periods. The obvious range statistic would simply be the most extreme record. However, it is clear that for the majority of cases where data are not systematically collected this will be subject to stochasticity and extremely sensitive to differences in recorder effort (Fig. 5b). Calculating the margin by averaging the 10 most extreme locations is widely used in the literature (Thomas & Lennon 1999; Warren et al. 2001; Hickling et al. 2005, 2006). However, we showed that this also results in artefacts resembling range shifts in data sets where none is present (Fig. 5a). We found support for an alternative method for the detection of trends in range margin shifts using the 95th quantile of a fitted gamma distribution. Due to the capacity of the gamma distribution to incorporate elements of skew into its shape, fitting a gamma distribution to each data set individually accounts more accurately for the frequency distribution of the records. This includes a portion of the latitudinal variation in recorder effort shown in Fig. 1a. Although the error associated with this range statistic increases at higher levels of uneven recorder effort, the error avoids the directional bias associated with the maximum and margin range statistics. For this reason, we recommend the use of frequency distribution approaches in the measurement of range shifts in preference to those currently in use. It is worth noting, however, that the margin and maximum range statistics are also appropriate where recorder effort is even.
The factors that cause issues with detecting range shifts in historical data also pose problems in the analysis that we present in this study. Changes in the rate of temperature change over time, changes in recorder effort over time, collinearity between temperature and recorder effort through time and insufficient records to conduct analyses at temporal resolutions that avoid non-independence in data reduce our ability to draw statistical conclusions concerning particular method-range statistic combinations from the models that we present. However, consistent patterns appear in the ranking of methods, with the RM method performing well with margin, maximum and gamma range statistics.
An additional factor in planning analyses of range shifts is the inclusion of appropriate species. For example, the margin range statistic is defined by Hickling et al. (2005) as ‘…the mean location of the 10 km grid squares… at the range boundary’. However, some species analysed have too few records to provide accurate locations of range margins. Coenagrion hastulatum and Aeshna isosceles have only ever been recorded in 10 and 18 different 10 km grid squares, respectively, in the United Kingdom (in the BDS data base as of November 2005), so any attempt to use the margin range statistic would give nothing more than the mid-point of the range. Despite this, each of these species can be found in Table 1 of Hickling et al. (2005). Range margin shifts also cannot be calculated accurately for species in which the range margin is geographically constrained (Shoo, Stephen & Hero 2006). This means that the inclusion of ‘ubiquitous’ species (those exhibiting no range margins in the study area) in Hickling et al. (2005) is not going to yield meaningful results. In a later article, these errors are corrected by only including species which exhibit a northern range margin in Britain and which occupied 20 or more grid squares in both time periods (Hickling et al. 2006), a threshold which still may introduce errors in the measurement of the range margin judging from Fig. 2.
Since climate has varied in the past (particularly during the early Quaternary period; Adams, Maslin & Thomas 1999), it might be expected that extant organisms would possess some form of adaptation which permitted their survival in the face of such change (Balmford 1996). Indeed, dispersal rates during the Quaternary climate fluctuations were surprisingly high and compensated for changing climate (Ashworth 1997; Clark et al. 1998). The consistency of range shift results across multiple taxa (Parmesan & Yohe 2003) suggests that the observed trend in poleward movements is not an artefact of detection methods. Long-distance dispersal plays a substantial role in maintaining this process and is vital for the persistence of metapopulations (Trakhtenbrot et al. 2005). However, it is likely that we have underestimated the maximum dispersal ability of species as this parameter is difficult to measure in standard mark–release–recapture studies (Slatkin 1985; Thompson & Purse 1999; Schneider 2003). Further work is needed to quantify dispersal and attempt to account for biases involved with study area size.
However, the ability of organisms to disperse or adapt in the face of contemporary climate change may be compromised by the sheer rate of warming coupled to reduced permeability of the landscape (Davis & Shaw 2001; Warren et al. 2001; Travis 2003; Opdam & Wascher 2004). This possibility is also suggested by the results of this study showing past rates of poleward movement of individual species of British Odonata varying relative to the velocity of poleward movement of isotherms. As a result, conservation measures are being specifically targeted towards facilitating dispersal (Araújo et al. 2004; Williams et al. 2005). This also places an emphasis on landscape-scale conservation strategies designed to maintain the connectivity between habitat patches. Despite their relatively high dispersal ability, British Odonata vary in the specificity of their habitat requirements with some requiring particular plant species, e.g. Erythromma najas and its association with Nuphar lutea (Hofmann & Mason 2005), or particular types of water bodies, e.g. Ceriagrion tenellum and its requirement for permanent, slow-flowing water with small particle size (Strange et al. 2007). Indeed, both of these species were among those which consistently failed to track isotherms in this analysis.
Although each of the methods previously used to control for the effects of differences in recorder effort in historical data sets appears reasonable, these methods vary in their ability to accomplish that task. Based on an analysis of an extensive and heterogeneous historical data set, we recommend a combination of subsampling of more heavily recorded periods and a frequency distribution approach to range margin description. With a large amount of historical data potentially still unanalysed, this study should act as a cautionary tale to those who might wish to embark on such analyses.
Acknowledgements
We would like to thank the British Dragonfly Society for access to its extensive data set of records and the many recorders who contributed to the data. Andrew Fenton, Alan Hildrew, Dan Faith and three anonymous reviewers provided valuable comments which greatly improved the manuscript. C.H. was supported by a Natural Environment Research Council Studentship and a Government of Canada Postdoctoral Fellowship.