### Abstract

- Top of page
- Abstract
- 1. Introduction
- 2. Interpolation Methods
- 3. Data and Methodology
- 4. Results
- 5. Discussion and Conclusions
- Acknowledgments
- References
- Supporting Information

[1] We compare versions of six interpolation methods for the interpolation of daily precipitation, mean, minimum and maximum temperature, and sea level pressure from station data over Europe from 1961 to 1990. The interpolation methods evaluated are global and local kriging, two versions of angular distance weighting, natural neighbor interpolation, regression, 2D and 3D thin plate splines, and conditional interpolation. We first evaluated, using station cross-validation and several skill scores, relative skill of each method at estimating point values, looking at spatial and temporal patterns and the frequency distribution of the variables. We then compared, for precipitation, gridded area averages from the candidate interpolation methods against existing high-resolution gridded data sets for the UK and the Alps, which are derived from a much denser network of stations. In both point and area-average cases, differences in skill between interpolation methods at any one point are smaller than the range in skill for a single method either across the domain, or in different seasons. The main control on spatial patterns of interpolation skill is density of the station network, with topographic complexity a compounding factor. The relative skill of different methods remains relatively constant through time, despite a varying station network. Skill in interpolating extreme events is lower than for average days, but relative skill of different methods remains the same. We select global kriging as the best performing method overall, for use in the development of a daily, high-resolution, long-term, European data set of climate variables as part of the EU funded ENSEMBLES project.

### 1. Introduction

[2] Gridded surface climate data are important for many applications, including climate change detection [e.g., *Barnett et al.*, 2005], the evaluation of climate models [e.g., *Caesar et al.*, 2006], the parameterization of stochastic weather generators [e.g., *Hutchinson*, 1995; *Price et al.*, 2000], and understanding how climate interacts with terrestrial, hydrological, and biogeochemical processes [e.g., *Goovaerts*, 2000; *Scholze et al.*, 2006; *Shen et al.*, 2001]. Current work in climate change modeling requires higher-resolution observational data for the evaluation of regional climate models. In addition to higher spatial resolution, daily temporal resolution is needed to evaluate the ability of models to simulate the variance and extremes in climate that are key for a range of climate impact assessments. While there is a wide range of regional and global data sets of monthly climate derived from station and/or satellite observations, gridded daily data are either specific to a particular country [e.g., *Daly et al.*, 2002; *Hewitson and Crane*, 2005; *Perry and Hollis*, 2005], cover a short time period [e.g., *Piper and Stewart*, 1996; *Rubel et al.*, 2004], or have coarse spatial resolution [e.g., *Caesar et al.*, 2006]. Table 1 provides examples of available daily data sets; although the list is not exhaustive, it shows that there are no long-term, high-resolution, daily gridded data sets for Europe or other large regions.

Table 1. Examples of Daily Gridded Data Sets of Daily Climate Variables

| Data Set Reference | Time Period | Area | Spatial Resolution | Variable | Interpolation Method Used |
|---|---|---|---|---|---|
| *Alexander et al.* [2006] | 1951–2003 | Global | 2.5° × 3.75° | Extreme precipitation and temperature indices | Angular distance weighting (ADW) |
| *Ansell et al.* [2006] | 1850–2003 | European North-Atlantic | 5° × 5° | MSLP | Reduced space optimal interpolation (RSOI) |
| *Caesar et al.* [2006] | 1946–2000 | Global | 2.5° × 3.75° | Temperature | ADW |
| *Feng et al.* [2004] | 1951–2000 | China | 1° × 1° | Temperature, precipitation and many other climate variables | Modified Cressman scheme |
| *Frei and Schär* [1998] | 1971–1990 | Alps | 25 km × 25 km | Precipitation | ADW |
| *Groot and Orlandi* [2003] | 1975–present | Europe | 50 km × 50 km | Temperature and precipitation | Weighting method with 1–4 stations most similar to grid center |
| *Hewitson and Crane* [2005] | 1950–2000 | South Africa | 0.1° × 0.1° | Precipitation | Conditional Interpolation (CI) |
| *Kiktev et al.* [2003] | 1950–1995 | Global | 2.5° × 3.75° | Extreme precipitation and temperature indices | ADW |
| *Kyriakidis et al.* [2001] | November 1981–January 1982 | California | 1 km × 1 km | Precipitation | Alternative forms of kriging |
| *Perry and Hollis* [2005] | 1961–2000 | UK | 5 km × 5 km | 36 elements, including precipitation, temperature and MSLP | Regression followed by Inverse Distance Weighting (IDW) of the residuals |
| *Piper and Stewart* [1996] | 1987 | Global | 1° × 1° | Temperature and precipitation | Nearest neighbor interpolation |
| *Rubel and Hantel* [2001] | 1996–1998 | Baltic sea catchment | 1/6° × 1/6° | Precipitation | Ordinary block kriging |
| *Rubel et al.* [2004] | September 1999–December 2000 | Europe | 0.2° × 0.2° | Precipitation | Ordinary block kriging |
| *Shen et al.* [2001] | 1961–1997 | Alberta | Polygons | Temperature and precipitation | Nearest station assignment |

[3] As part of the EU ENSEMBLES project, a daily, high-resolution gridded climate data set for the European domain is required for evaluation of regional climate models developed within the project and for climate change impacts assessments. Three key sources for gridded data sets are meteorological station records, satellite observations and, for precipitation, estimates from weather radar. For surface air temperature and pressure, satellite observations are either not available (pressure) or have significant spatiotemporal biases (temperature) that currently preclude their use in climatology and climate model evaluation. For precipitation, satellite and radar data have advantages (complete spatial coverage) and disadvantages (short temporal coverage and large biases) [e.g., *Gerstner and Heinemann*, 2008; *New et al.*, 2001; *Reynolds*, 1988]. Merged station-satellite data sets overcome some of these disadvantages, but remain limited in time duration, particularly at high spatial and temporal resolutions [*Huffman et al.*, 1995; *Kottek and Rubel*, 2007; *New et al.*, 2001]. A fourth source for gridded data sets is reanalysis data, such as the ERA40 and NCEP/NCAR data sets. However, *Simmons et al.* [2004] show for temperature that these data are only comparable to observations from stations after 1979, and precipitation from reanalyses exhibits large errors and systematic biases.

[4] Given these issues with remotely sensed data, and the particular need within ENSEMBLES for gridded data extending back to the 1950s, the ENSEMBLES gridded data set has been based on interpolation of station data. As the gridded data are primarily aimed at regional climate model evaluation, the approach used aims to produce area average rather than point estimates. The methodology is described in detail by *Haylock et al.* [2008], but is a two stage process: station data are first interpolated as point estimates to a fine grid, after which the point estimates are averaged to obtain area averages for the 25 km and 50 km grids used by the regional modeling centers.
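The second stage of this process can be illustrated with a short sketch that averages fine-grid point estimates into coarser grid boxes. This is only an illustration and not the procedure of *Haylock et al.* [2008]: the function name `block_average`, the equal weighting of fine-grid points, and the treatment of missing points (ignored, with an empty box returned as missing) are all assumptions made here.

```python
def block_average(fine, factor):
    """Average a 2-D fine grid (list of rows) into boxes of factor x factor points.

    Missing point estimates are given as None and ignored (an assumption);
    a box with no valid points is returned as None.
    """
    n_rows, n_cols = len(fine), len(fine[0])
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            vals = [fine[r][c]
                    for r in range(r0, min(r0 + factor, n_rows))
                    for c in range(c0, min(c0 + factor, n_cols))
                    if fine[r][c] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 field of point estimates, averaged into 2 x 2 boxes.
fine_grid = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, None],
    [2.0, 2.0, None, None],
    [2.0, 2.0, None, None],
]
coarse_grid = block_average(fine_grid, 2)
```

On a latitude/longitude grid an area weighting (for example by the cosine of latitude) might be preferable to the equal weighting used in this sketch.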

[5] The development of the gridded climate data set is dependent on both adequate underlying station observations and the use of an appropriate interpolation method for high-resolution gridded point estimates prior to creation of area-average grid values. Members of the ENSEMBLES consortium have contributed an unprecedented number of daily station data for Europe as the basis for the resulting gridded data set [*Klok and Klein Tank*, 2008]. The objective of this paper is the comparison of several candidate interpolation methods for estimation of point data in order to identify the most appropriate method for use in the construction of the ENSEMBLES gridded data for precipitation, temperature and sea level pressure (SLP).

[6] We adopt two approaches for our comparison. First, we use station cross-validation [e.g., *New*, 1999; *Willmott and Matsuura*, 1995] over the European domain, where each station is excluded in turn and the station values are then estimated through interpolation from the surrounding stations. The measured and estimated values at the excluded station are then compared, enabling us to quantify the relative skill of different interpolation methods at estimating point values.
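The cross-validation loop described above can be sketched as follows. The interpolator used here, a simple inverse distance weighting on planar coordinates, is only a placeholder for whichever candidate method is being evaluated, and the root-mean-square error is just one possible skill score; the function names and the toy station values are hypothetical.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Placeholder interpolator: inverse distance weighting on planar (x, y)."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value
        w = d ** -power
        num += w * value
        den += w
    return num / den


def cross_validate(stations, interpolate=idw_estimate):
    """Withhold each station in turn and estimate it from the remaining ones.

    Returns a list of (observed, estimated) pairs, one per station.
    """
    pairs = []
    for i, (x, y, observed) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        pairs.append((observed, interpolate((x, y), others)))
    return pairs


def rmse(pairs):
    """Root-mean-square difference between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in pairs) / len(pairs))


# Hypothetical stations as (x, y, value) tuples for a single day.
stations = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (0.0, 1.0, 11.0), (1.0, 1.0, 13.0)]
pairs = cross_validate(stations)
score = rmse(pairs)
```

Passing a different `interpolate` function to `cross_validate` allows several methods to be scored against the same withheld observations.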

[7] Second, we wish to confirm whether the best performing method(s), identified through station cross-validation, are also the best when grid-box area averages are calculated from the point estimates. In the absence of true areal averages, we compare our gridded area-average estimates to area averages calculated from existing high-resolution gridded data for the UK and the Alps. These regional and national gridded data are based on an order of magnitude more stations than are available to the ENSEMBLES project. Our assumption is that these grids are a fair approximation to the true area averages, and that those candidate interpolation methods that produce area averages from our sparser station networks that are closest to those calculated from the higher-quality grids will likewise produce better estimates of the true area average. Alternative approaches to evaluating area-average estimates include spatially sub-sampling from continuous fields such as weather radar or high-resolution regional climate model output, then interpolating from the sub-sample to estimate the original field. As the primary purpose of the ENSEMBLES data set is climate model evaluation, we wish to avoid comparing the interpolation methods on climate model fields. Radar-based precipitation estimates were considered, but we were unable to obtain these within the time constraints of the research project.

[8] The paper begins with a review of previous work in the interpolation of daily station data (section 2), with emphasis on work that has evaluated different methods, and justification for the methods we choose to evaluate. This is followed by a description of the methods and data used in the evaluation (section 3). We then present our results (section 4), looking at relative interpolation skill from several different perspectives, and summarize our findings in section 5.

### 2. Interpolation Methods

[9] Many different interpolation methods have been used for the gridding of climate station data [see *Tveito et al.*, 2006 for a recent review]. According to *Vicente-Serrano et al.* [2003], the best performing interpolation method “varies as a function of the area and the spatial scale desired for mapping”. Also important are the temporal duration and the nature of the climate variable to be interpolated; temperature and sea level pressure are, for example, more or less continuous in both time and space, whereas precipitation fields are spatially discontinuous on shorter timescales and more continuous on longer timescales [*New et al.*, 2001]. Moreover, the importance of geographical factors such as elevation [*Price et al.*, 2000; *Willmott and Robeson*, 1995], aspect, distance to coast [*Daly et al.*, 2002], seasonality and/or synoptic state [*Hewitson and Crane*, 2005], station density [*Willmott et al.*, 1994], and representation of the station network [*Willmott et al.*, 1991] may influence the choice of interpolation method and the accuracy of results. Some interpolation methods have the capability to include co-predictors, which may produce superior interpolation results. Another possibility is to reduce the influence of factors known to be important, through the interpolation of anomalies or through optimal interpolation. In the former, the deviation from the mean is interpolated before adding the anomaly field back onto a long-term mean field which is based on a much richer network of station means, often interpolated using co-predictors [*New et al.*, 2001; *Widmann and Bretherton*, 2000; *Willmott and Robeson*, 1995]. Optimal interpolation uses the long-term mean field as a “first guess” onto which shorter duration (monthly or daily) station values are interpolated [*Chen et al.*, 2002].

[10] *Tveito et al.* [2006] note the importance of testing different interpolation methods for specific purposes. While numerous comparisons of interpolation schemes have been undertaken, these have mostly been for long-term mean data, or monthly [e.g., *New et al.*, 2000; *Price et al.*, 2000] or annual [e.g., *Vicente-Serrano et al.*, 2003] fields. In contrast, there have been few evaluations of alternative methods for interpolation of daily station data. *Kurtzman and Kadmon* [1999] compare splines, inverse distance weighting (IDW) and regression models, finding that a regression model predicts mean daily temperature values in Israel more accurately than splines or IDW. *Shen et al.* [2001] qualitatively compare several methods for the interpolation of daily station data, concluding that most interpolation methods do not retain the variability of the data and over-smooth the raw station data, and are thus best for interpolating mean climate data, which are themselves much smoother. The number of days with precipitation is also often not adequately represented. *Shen et al.* [2001] adopted nearest station assignment, a hybrid of Thiessen polygons and IDW, as their best method. *Jarvis and Stuart* [2001] conclude that there is no significant difference between partial thin plate splines (TPS), ordinary kriging and IDW for daily minimum and maximum temperature values in England and Wales. *Daly* [2006] qualitatively compares IDW, TPS and versions of local and regional regression for the interpolation of precipitation, noting advantages and disadvantages of each method, but does not identify a “best” method. Finally, *Stahl et al.* [2006] compare, among others, Gradient plus Inverse-Distance-Squared (GIDS, a method based on multiple linear regression), ordinary kriging and IDW, finding that ordinary kriging performs best, except at high elevations, where GIDS performs best if the station density is high.

[11] From the above it is clear that different interpolation methods can work better for different variables, station densities and climate regimes. We therefore chose six different interpolation methods for evaluation of skill in interpolating the climate variables of interest: precipitation, SLP, and mean, minimum and maximum temperature.

[12] Natural neighbor interpolation (NNI), originally developed by *Sibson* [1981], is a fast and simple baseline method that has been used for many years as a standard part of the library of graphics functions provided by the National Center for Atmospheric Research (NCAR). NNI takes the best of Thiessen polygons and triangulation and objectively chooses the number of neighbors from which to interpolate based on the geometry. The weights for each station are selected based on the proportional area rather than distance. NNI produces an interpolated surface that has a continuous slope at all points, except at the original input points. It is an exact interpolator in that it reproduces the observations at the station locations.

[13] Angular distance weighting (ADW) has been used quite widely for interpolation of monthly climate data [e.g., *New et al.*, 2000; *Shepard*, 1984] and for daily data and extreme indices [*Alexander et al.*, 2006; *Caesar et al.*, 2006; *Frei and Schär*, 1998]. We test two versions of the formulation of ADW of *New et al.* [2000]. The first selects stations contributing to a grid-point estimate using a constant search radius of 250 km for precipitation and 500 km for temperature and SLP, with the distance components of the weights decaying to zero at the search radius. For the second version, which has only been applied to precipitation, the search radius and weighting function are permitted to vary across the grid domain, as explained by N. Hofstra and M. New (Spatial variability in correlation decay distance and influence on angular-distance weighting interpolation of daily precipitation over Europe, submitted to *International Journal of Climatology*, 2008). In both versions, if there are fewer than three stations within the search radius, the value for the grid point is not calculated; if more than ten stations are potentially available, the ten with the highest angular-distance weights are used.
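The station-selection rules of the first ADW version can be sketched as below. Only the search radius, the minimum of three stations and the maximum of ten come from the description above. The distance weight `(1 - d/R)**m`, the exponent `m`, the planar kilometre coordinates and the form of the angular term (a distance weight inflated for stations that are directionally isolated from the others, in the spirit of Shepard-type schemes) are assumptions made for illustration, as is the choice to rank stations by weights computed over all candidates.

```python
import math


def adw_estimate(target, stations, radius_km, min_stations=3, max_stations=10, m=4):
    """Angular-distance-weighted estimate at target from (x_km, y_km, value) stations.

    Returns None when fewer than min_stations lie inside the search radius.
    """
    cand = []
    for x, y, value in stations:
        dx, dy = x - target[0], y - target[1]
        d = math.hypot(dx, dy)
        if d < radius_km:
            # Assumed distance weight: decays to zero at the search radius.
            w = (1.0 - d / radius_km) ** m
            cand.append((w, math.atan2(dy, dx), value))
    if len(cand) < min_stations:
        return None

    weights = []
    for k, (wk, ak, _) in enumerate(cand):
        others = [(wl, al) for l, (wl, al, _) in enumerate(cand) if l != k]
        total = sum(wl for wl, _ in others)
        # Angular term: large when station k lies in a different direction
        # from the other (distance-weighted) stations.
        isolation = sum(wl * (1.0 - math.cos(ak - al)) for wl, al in others) / total
        weights.append(wk * (1.0 + isolation))

    # Keep at most max_stations stations with the largest combined weights.
    ranked = sorted(zip(weights, (v for _, _, v in cand)), reverse=True)[:max_stations]
    return sum(w * v for w, v in ranked) / sum(w for w, _ in ranked)


# Two clustered stations to the east (value 0) and one isolated to the west (value 10).
clustered = [(100.0, 0.0, 0.0), (100.0, 1.0, 0.0), (-100.0, 0.0, 10.0)]
estimate = adw_estimate((0.0, 0.0), clustered, radius_km=250.0)
```

In this toy case the isolated western station receives more weight than it would from distance alone, so the estimate lies above the pure distance-weighted value of about 3.3.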

[14] Conditional interpolation (CI) has recently been developed by *Hewitson and Crane* [2005]. Self-organizing maps (SOMs) are used to define characteristic synoptic rainfall states in a region surrounding the target grid point [*Hewitson and Crane*, 2002]. The search radius used for the SOM and the interpolation was set to 2.5 degrees after a short sensitivity study. The synoptic state determines the likelihood of a wet/dry day. Wet day amounts are then interpolated using a weighted average of surrounding stations, where the station weights vary as a function of angular distance and are “conditional” on synoptic state. As CI was developed specifically for gridding precipitation, we do not evaluate CI for temperature and SLP.

[15] Regression has also been used fairly widely for interpolation, and can have various forms [e.g., *Stahl et al.*, 2006]. Here we test multiple linear regression using latitude, longitude, elevation and distance to coast as predictors. The regression relationship is established separately for each target point using only the neighboring stations within a radius of 500 km. Target locations with fewer than four neighbors are set to missing. We use singular value decomposition to calculate the regression coefficients because of its greater numerical stability than Gaussian elimination or LU decomposition [*Press et al.*, 1986]. Residuals are not interpolated separately.
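A local regression of this kind can be sketched as follows, assuming NumPy is available. `numpy.linalg.lstsq` solves the least-squares problem with an SVD-based LAPACK routine, which matches the numerical approach described above in spirit only; it is not the code used in this study. The dictionary keys, the haversine distance helper and the synthetic station values are hypothetical.

```python
import math
import numpy as np

EARTH_RADIUS_KM = 6371.0
PREDICTORS = ("lat", "lon", "elev_km", "coast_km")


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def local_regression_estimate(target, stations, radius_km=500.0, min_neighbors=4):
    """Fit a linear model to stations within radius_km and evaluate it at target.

    Returns None (missing) when fewer than min_neighbors stations are in range.
    """
    near = [s for s in stations
            if great_circle_km(target["lat"], target["lon"], s["lat"], s["lon"]) <= radius_km]
    if len(near) < min_neighbors:
        return None
    # Design matrix with an intercept column followed by the four predictors.
    X = np.array([[1.0] + [s[p] for p in PREDICTORS] for s in near])
    y = np.array([s["value"] for s in near])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based least squares
    x0 = np.array([1.0] + [target[p] for p in PREDICTORS])
    return float(x0 @ coef)


def make_station(lat, lon, elev_km, coast_km):
    """Synthetic station whose value is an exact linear function of the predictors."""
    value = 2.0 + 0.5 * lat - 0.1 * lon - 6.5 * elev_km + 0.01 * coast_km
    return {"lat": lat, "lon": lon, "elev_km": elev_km, "coast_km": coast_km, "value": value}


stations = [make_station(50.0, 5.0, 0.1, 10.0), make_station(51.0, 6.0, 0.5, 50.0),
            make_station(49.0, 7.0, 1.0, 120.0), make_station(50.5, 4.0, 0.2, 30.0),
            make_station(48.5, 5.5, 1.5, 200.0), make_station(51.5, 7.5, 0.3, 80.0)]
target = {"lat": 50.0, "lon": 6.0, "elev_km": 0.8, "coast_km": 100.0}
estimate = local_regression_estimate(target, stations)
```

With only four neighbors the system has fewer observations than the five coefficients in this sketch (intercept plus four predictors); the SVD-based solver then returns the minimum-norm solution rather than failing.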

[16] Kriging is used extensively in the geosciences [*Journel and Huijbregts*, 1978; *Kolmogorov*, 1939; *Krige*, 1951, 1966; *Matheron*, 1963]. It is a stochastic method that, like regression, applies best linear unbiased estimation (BLUE) methodology: the “estimated” (interpolated) value is a linear combination of the predictors (nearby stations), such that the sum of the predictor weights is 1 (unbiased) and the mean squared error of the residuals from the interpolating surface is minimized (best estimate). The interpolated surface is therefore a local function of the neighboring data, but conditional on the data obeying a particular model of the spatial variability (the variogram). Variogram modeling is done by fitting each of three non-linear functions (Gaussian, Spherical and Exponential) to the experimental variogram using the method of *Marquardt* [1963] and selecting the one with the lowest Chi-squared statistic. We tested for anisotropy but found the use of an isotropic variogram more appropriate. We also tried reducing the skewness of the precipitation distribution using a cube-root transform, but this did not improve the skill of the interpolation in cross-validation. The importance of elevation was tested by implementing this as an external drift, which improved the skill in temperature interpolation but not precipitation or SLP.
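The model-selection step for the variogram can be illustrated with the sketch below. It uses common textbook forms of the spherical, exponential and Gaussian models with a nugget, partial sill and range parameter; the exact parameterizations used in this study are not given above, so these are assumptions. A coarse grid search stands in for the Marquardt fit, and an unweighted sum of squared differences stands in for the Chi-squared statistic.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical model: reaches nugget + sill exactly at the range."""
    if h >= rng:
        return nugget + sill
    r = h / rng
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    """Exponential model: approaches nugget + sill asymptotically."""
    return nugget + sill * (1.0 - math.exp(-h / rng))


def gaussian(h, nugget, sill, rng):
    """Gaussian model: parabolic near the origin."""
    return nugget + sill * (1.0 - math.exp(-(h / rng) ** 2))


MODELS = {"spherical": spherical, "exponential": exponential, "gaussian": gaussian}


def misfit(model, params, lags, gammas):
    """Unweighted sum of squared differences (stand-in for Chi-squared)."""
    return sum((g - model(h, *params)) ** 2 for h, g in zip(lags, gammas))


def select_variogram(lags, gammas, nuggets, sills, ranges):
    """Return (misfit, model name, (nugget, sill, range)) with the smallest misfit."""
    best = None
    for name, model in MODELS.items():
        for n in nuggets:
            for s in sills:
                for a in ranges:
                    score = misfit(model, (n, s, a), lags, gammas)
                    if best is None or score < best[0]:
                        best = (score, name, (n, s, a))
    return best


# Synthetic experimental variogram generated from an exponential model.
lags_km = [25.0, 50.0, 100.0, 200.0, 400.0, 800.0]
gammas = [exponential(h, 0.1, 1.0, 200.0) for h in lags_km]
best = select_variogram(lags_km, gammas,
                        nuggets=[0.0, 0.1, 0.2],
                        sills=[0.5, 1.0, 1.5],
                        ranges=[100.0, 200.0, 400.0])
```

Because the synthetic variogram was generated from the exponential model with parameters on the search grid, that model and those parameters are recovered with zero misfit.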

[17] Kriging is not an exact interpolator in that it will produce an interpolating surface that does not honor the observations at the station locations. We use two versions of universal kriging, which differ in the manner in which the variogram is modeled: global kriging (GK), where a single variogram is used across the entire region for all days and stations, and local kriging (LK), where a different variogram is defined for each interpolation point. We use the same variogram for each day of the year, and found that defining separate variograms for each month reduced skill because fewer data were available for variogram modeling. Search radii for stations were set, after comparing cross-validation skill scores for varying search radii, at 500 km for SLP, mean and minimum temperature, 450 km for precipitation and 300 km for maximum temperature. These differences in search radii do not influence the interpolated values significantly. For precipitation, we add an additional step, indicator kriging [*Deutsch and Journel*, 1998], where the occurrence of rainfall is interpolated (as binary values of 0 for <0.5 mm and 1 for >0.5 mm). The resulting interpolated values fall between 0 and 1 and can be interpreted as the probability of observing a wet day. Comparing the cross-validation skill of several probability thresholds revealed a threshold of 0.4 to be optimal for assigning a wet day to an interpolated point. Wet day amounts were then interpolated by universal kriging [*Webster and Oliver*, 2001], as for temperature and pressure.
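The two-step treatment of precipitation can be sketched as follows. The weights passed in are hypothetical stand-ins for kriging weights (non-negative and summing to 1 in this toy case). Treating a probability exactly equal to 0.4 as wet, and interpolating amounts from wet stations only, are assumptions that the description above does not settle.

```python
WET_THRESHOLD_MM = 0.5   # station is "wet" above this amount
PROB_THRESHOLD = 0.4     # interpolated point is "wet" at or above this probability


def wet_day_probability(station_precip_mm, weights):
    """Weighted average of binary wet/dry indicators at the surrounding stations."""
    indicators = [1.0 if p > WET_THRESHOLD_MM else 0.0 for p in station_precip_mm]
    return sum(w * i for w, i in zip(weights, indicators))


def estimate_precip(station_precip_mm, weights):
    """Return 0 for a dry day, otherwise a weighted amount from the wet stations."""
    prob = wet_day_probability(station_precip_mm, weights)
    if prob < PROB_THRESHOLD:
        return 0.0
    wet = [(w, p) for w, p in zip(weights, station_precip_mm) if p > WET_THRESHOLD_MM]
    return sum(w * p for w, p in wet) / sum(w for w, _ in wet)


# Same observations, two hypothetical sets of weights.
obs_mm = [0.0, 2.0, 6.0]
wet_case = estimate_precip(obs_mm, [0.5, 0.3, 0.2])   # probability 0.5, wet
dry_case = estimate_precip(obs_mm, [0.7, 0.2, 0.1])   # probability 0.3, dry
```

Separating occurrence from amount in this way prevents the small positive amounts that a single smooth interpolation would otherwise spread over dry areas.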

[18] Thin plate splines (TPS) share features that are similar to kriging and there have been several comparisons of the two methods [*Hutchinson*, 1993; *Hutchinson and Gessler*, 1994; *Laslett*, 1994]. Splines use a different covariance function and one that is rarely used in kriging [*Hutchinson and Gessler*, 1994], which is defined by minimizing the generalized cross-validation error; thus, the amount of data smoothing can easily be optimized and TPSs are appropriate for use across large heterogeneous areas [*Hutchinson*, 1995]. Although there have been some attempts to unify the two approaches, such as the method of Matern Splines [*Handcock et al.*, 1994], the two methods are usually treated as independent. We use the ANUSPLIN package of *Hutchinson* [1993], comparing both 2D (TPS2D) and 3D (TPS3D) models. In the 3D implementation, elevation is converted to km and latitude and longitude are in degrees.

### 5. Discussion and Conclusions

[46] We have described the comparison of nine different versions of six interpolation methods, namely regression, NNI, ADW (versions 1 and 2), kriging (global and local), TPS (2D and 3D), and CI, for five climate variables (mean, minimum and maximum temperature, precipitation, and MSLP) at daily time steps over Europe. We first evaluated relative skill of each method through station cross-validation, looking at (1) average skill over the entire domain, (2) spatial patterns of skill at individual stations, (3) variations in skill over time as the station network varies, and (4) skill for different deciles in the frequency distribution at each station. We also compared, for precipitation, the candidate interpolation methods against existing high-resolution gridded data sets for the UK and the Alps, which are derived from a much denser network of stations. For all comparisons, we use a range of skill scores that evaluate different aspects of estimation skill, particularly for precipitation, where we need to evaluate skill in estimating state (wet/dry days) as well as amount.

[47] Apart from regression and CI, the differences between the interpolation methods for all skill scores and all variables are fairly small. In fact, the differences in skill between summer and winter for a single method are larger than the differences between the methods. In addition to seasonality, skill is also influenced by station density, with all interpolation methods performing better in areas with higher-density station networks. However, not all interpolation methods respond in the same way to changing station networks. For example, while GK and ADW2 have very similar skill when the station network is dense, ADW2 performs better when the nearest stations are more distant. Skill in areas that are topographically complex tends to be poorer, as shown by the station cross-validation in the Alps and by the comparison to high-resolution gridded data over the Alps and UK; for the latter, skill is lowest in mountainous areas of Scotland and Wales. In these areas, different interpolation methods perform slightly better, depending on the variable of interest and the station density. However, the potential improvements in interpolation accuracy that might arise from using different interpolation methods in different areas or for different variables do not seem large enough to warrant such a complex approach to generating a pan-European gridded data set.

[48] Overall, GK is the best ranked method for all climate variables except maximum temperature, where TPS3D has a marginally better overall skill (Table 3). NNI performs well at many stations for the temperature variables and SLP, but average skill for NNI is negatively affected by larger errors at stations where NNI performs relatively poorly. Many studies have found that universal kriging is one of the best interpolation methods for both mean precipitation and temperature [e.g., *Attorre et al.*, 2007], as well as daily climate variables [*Jarvis and Stuart*, 2001; *Stahl et al.*, 2006]. *Stahl et al.* [2006] compare 12 variations of regression-based, kriging and weighted average approaches for the interpolation of daily minimum and maximum temperature over British Columbia, Canada. They find that the GIDS method (a method based on multiple linear regression) performs best when a high station density network is available. However, they conclude that methods that compute local lapse rates from the control points, like GIDS, should not be applied in the absence of sufficient higher-elevation station data because these methods performed more poorly for the years for which there were a smaller number of higher-elevation stations. For the years with lower station density, their ordinary kriging method performed best.

[49] The skill (absolute or relative to other methods) of the interpolation is relatively constant in time. For MSLP, skill decreases in the 1980s, but this is due to data quality issues at stations in Belgium and the Alps, which are being investigated further prior to construction of a final data set. There are also only a few areas where a specific interpolation method appears to have higher skill: in the Netherlands LK appears to be the best method for precipitation and in the Alps TPS3D appears better for temperature, but in these areas differences between these methods and GK remain small.

[50] Thus, while there is no interpolation method that stands out as superior to others by a large margin and several methods perform best when considering a specific criterion, climate variable or sub-domain, GK is by a small margin the best overall method. GK also has the advantage, along with LK and TPS, of yielding interpolation error estimates (analogous to the confidence intervals for regression). Therefore, global kriging was selected as the method to be used in the construction of the ENSEMBLES 0.2° longitude/latitude gridded daily climate data set, described in detail by *Haylock et al.* [2008].