Keywords:

  • empirical probability distribution function;
  • statistical modelling;
  • generalized extreme value distribution;
  • climate change

ABSTRACT

  1. Top of page
  2. ABSTRACT
  3. 1. Introduction
  4. 2. Data
  5. 3. Method
  6. 4. Results
  7. 5. Discussion of alternative methodologies
  8. 6. Conclusions
  9. Acknowledgements
  10. References

This paper proposes a method for describing the distribution of observed temperatures on any day of the year such that the distribution and summary statistics of interest derived from the distribution vary smoothly through the year. The method removes the noise inherent in calculating summary statistics directly from the data thus easing comparisons of distributions and summary statistics between different periods. The method is demonstrated using daily effective temperatures (DET) derived from observations of temperature and wind speed at De Bilt, Holland. Distributions and summary statistics are obtained from 1985 to 2009 and compared to the period 1904–1984. A two-stage process first obtains parameters of a theoretical probability distribution, in this case the generalized extreme value (GEV) distribution, which describes the distribution of DET on any day of the year. Second, linear models describe seasonal variation in the parameters. Model predictions provide parameters of the GEV distribution, and therefore summary statistics, that vary smoothly through the year. There is evidence of an increasing mean temperature, a decrease in the variability in temperatures mainly in the winter and more positive skew, more warm days, in the summer. In the winter, the 2% point, the value below which 2% of observations are expected to fall, has risen by 1.2 °C, in the summer the 98% point has risen by 0.8 °C. Medians have risen by 1.1 and 0.9 °C in winter and summer, respectively. The method can be used to describe distributions of future climate projections and other climate variables. Further extensions to the methodology are suggested.


1. Introduction

In the last IPCC report (Solomon et al., 2007) the evidence for changes in the variability and extremes of temperature, and changes in mean temperatures, was examined. Typically this evidence was based on the comparison of summary statistics derived from the empirical probability distribution function (EPDF) of observed or projected temperatures.

The EPDF of, for example, historical daily mean temperatures on a specific day over a particular period of years is obtained from the set of daily mean temperatures that are recorded on that day of the year for each of the years in that period. With daily long-term records of temperature, or with climate projections on a daily time step, it is possible to obtain an EPDF for each day of the year. It is also possible to obtain an EPDF for each month of the year (Barrow and Hulme, 1996), for a set of months (Beniston and Stephenson, 2004), or for the whole year (Ballester et al., 2010) by selecting the set of days to be included in its construction.
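As a rough illustration of how a daily EPDF is assembled, the following Python sketch pools dated observations by calendar day and reads a percentage point off the pooled sample. The function names, the use of calendar years to delimit the period and the linear-interpolation rule for the percentage point are assumptions made for illustration; they are not taken from the studies cited above.

```python
from collections import defaultdict
from datetime import date


def group_by_calendar_day(records, first_year, last_year):
    """Pool observations by (month, day) over the chosen span of years.

    records: iterable of (datetime.date, value) pairs.
    Returns a dict mapping (month, day) to the list of pooled values,
    i.e. the sample behind the EPDF for that day of the year.
    """
    groups = defaultdict(list)
    for day, value in records:
        if first_year <= day.year <= last_year:
            groups[(day.month, day.day)].append(value)
    return groups


def empirical_percentile(sample, p):
    """Percentage point of a sample by linear interpolation, 0 <= p <= 1."""
    xs = sorted(sample)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac


# Hypothetical data: one value per year on 15 January, 2001 to 2005.
records = [(date(year, 1, 15), float(year - 2000)) for year in range(2001, 2006)]
groups = group_by_calendar_day(records, 2001, 2005)
median_jan15 = empirical_percentile(groups[(1, 15)], 0.5)
```

Pooling by month, by season or over the whole year only requires changing the grouping key.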

Summary statistics can be derived for each day of the year from a daily EPDF and seasonal variation is observed in these statistics. Two types of summary statistics are typically derived from the EPDF. The first type characterizes the shape of the whole distribution, for example Ballester et al. (2010) summarize the mean, variance and skewness of the annual EPDF. The second type describes characteristics such as the percentiles of the distribution: for example, the temperatures below which 5%, say, of observations fell. For example, Yan et al. (2002b) calculated a number of percentiles for each day of the year for the period 1748–1998 in Uppsala and compared these with the same percentage points in Beijing for the period 1915–1997. These types of summary statistics are useful not only for researchers but can also be useful for users of climate data, such as farmers, hydrologists, or energy companies. For climate users in particular, these summary statistics should provide useful information about the climate and be relatively simple to calculate.

There are two difficulties for a climate user, especially when the interest is in daily statistics derived from the daily EPDF. First, summary statistics derived for two consecutive days may differ substantially, so that a sequence of daily summary statistics over the year, such as the 5% point, may demonstrate a large amount of day-to-day variability as well as seasonal variation. Although the observed day-to-day variability is real, it may mask the seasonal pattern. If comparing two locations or periods it may mask the underlying similarity or difference in the climate that these two distributions represent: in effect the signal may be masked by noise. This is especially true when the period of the EPDF is short; that is, not many observations contribute to each EPDF. A second difficulty is that for each new summary statistic that is required, the raw data must be re-examined.

A solution to the first problem is to fit a smooth function to the summary statistics through the year. In essence, the smoothed function ‘borrows’ information from adjacent days to return a set of summary statistics that varies smoothly over the year. For example, Yan et al. (2002b) fitted an 11 point binomial filter to the 5% values for each day derived from the EPDF. However, each separate summary statistic of interest must be smoothed separately, and even if the summary statistic is only required for 1 day, the statistic must be calculated on other days to obtain the smoothed result.
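The idea of an 11 point binomial filter can be sketched as follows. The weights are the binomial coefficients C(10, k) divided by 2^10, so they sum to one and give most weight to the central day. Wrapping the series around the end of the year is an assumption made here so that every day has a full window; it is not a detail taken from Yan et al. (2002b).

```python
from math import comb


def binomial_weights(n_points=11):
    """Normalised binomial weights C(n, k) / 2**n with n = n_points - 1."""
    n = n_points - 1
    total = 2 ** n
    return [comb(n, k) / total for k in range(n_points)]


def smooth_circular(values, n_points=11):
    """Apply the binomial filter to a day-of-year series, wrapping at the ends."""
    weights = binomial_weights(n_points)
    half = n_points // 2
    n = len(values)
    return [
        sum(weights[j] * values[(i + j - half) % n] for j in range(n_points))
        for i in range(n)
    ]


# Hypothetical series that alternates between 0 and 1 from day to day:
# pure day-to-day noise around a constant level of 0.5.
noisy = [float(i % 2) for i in range(366)]
smoothed = smooth_circular(noisy)
```

In this artificial case the filter removes the alternation entirely, because the even-indexed and odd-indexed binomial weights each sum to one half.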

A solution to the second problem is to describe the EPDF by fitting a suitable theoretical probability distribution to the data. Given a choice of theoretical distribution that is a good fit to the EPDF, all relevant summary statistics, such as percentiles, can then be calculated from the theoretical distribution using the estimated parameters without needing to revisit the data. Climate users then only need to store the estimated parameters (typically two or three) of the distribution, rather than all the observed values, for each day of the year.

Ideally, the theoretical distribution and, therefore, summary statistics derived from the distribution, should capture the seasonal variability in the data and mask the day-to-day variation. This could be achieved if the estimated parameters of the theoretical distribution varied smoothly through the year. This is not achieved directly because the distribution is fitted separately to the EPDF for each day of the year, and the EPDF is noisy. An alternative to smoothing the summary statistics directly is to smooth the estimated parameters of the theoretical distribution. Summary statistics, derived from the theoretical distribution using smoothed parameters, will then also vary smoothly through the year capturing the seasonal variability in the data.

A common approach to describing the distribution of daily weather data so that the distribution varies smoothly through the year is to use Generalized Linear Models (GLMs) (McCullagh and Nelder, 1989). For example, Yan et al. (2002a) describe the distribution of maximum wind speed on any day of the year using a gamma distribution, and the amount of rainfall on a wet day is typically modelled using the gamma or exponential distribution (Stern and Coe, 1984). The distributions of daily temperatures (in general, the maximum, minimum and mean) were modelled by Richardson (1981) and more recently by Furrer and Katz (2007), assuming that the temperature distribution on any day is a normal distribution.

In northern Europe, daily temperatures in winter can be below 0 °C and the EPDF is left skewed because of extreme cold events. In the summer, temperatures are typically above 0 °C and extreme warm events result in an EPDF that is right skewed. Although the GLM framework is quite general, it is constrained to modelling distributions from the exponential family. There are no distributions in the exponential family that are skew and take both positive and negative values and so this approach is not available for modelling these temperature data. Furthermore, in a GLM it is a function of the mean of the distribution that can vary through the year. Other characteristics of the distribution, such as the variance, have less flexibility.

Because of these limitations there appears to be little work on modelling temperature distributions in northern Europe. Jones et al. (1999) focused on relative rather than absolute distributions and changes in the distribution of the Central England temperature data by fitting gamma distributions to temperature anomalies relative to a smoothed mean on each day of the year for two different periods. Barrow and Hulme (1996) tested the suitability of different theoretical probability distributions to describe monthly EPDFs of daily maximum or minimum surface air temperatures, recorded in Fahrenheit, at nine locations in the UK. Different distributions were chosen depending on the month and location, and no overall preferences were found.

The present paper demonstrates a pragmatic method for describing how the daily distribution of temperatures: (1) varies through the year so that parameters of the distribution and therefore any summary statistics vary smoothly throughout the year, and, (2) has changed between two periods. The method fits the generalized extreme value (GEV) distribution, a distribution with three parameters, to the EPDF for each day of the year in two different periods. A separate linear model is then used to describe how each of the three estimated parameters varies smoothly through the year to obtain smoothed parameter estimates for each day of the year in each period. Using the smoothed parameter estimates to obtain new GEV distributions for each day of the year, relevant summary statistics can be derived and compared between the two periods.

The method is illustrated using daily effective temperature data, temperature adjusted for wind speed (see Section '2. Data' for definition) from De Bilt, Holland, for the two periods 1904–1984 and 1985–2009. One set of users of these data wanted to know the probability that the temperature exceeds certain thresholds to assist in planning the maintenance of utility services, especially during the winter. The users wanted to be able to calculate the probability for any threshold in the future, rather than being given results for pre-specified thresholds, and without returning to the raw data each time. A long time series of data was available, but because of concerns over changes in temperature, only the last 25 years were thought relevant. However, there is information in the previous 80 years of data, and an additional feature of this paper is that the smoothing of parameters in the more recent period uses, where possible, information from the earlier period, thus making maximal use of the available data. Finally, the paper describes how this methodology provides a baseline for the development of more sophisticated methods that directly model the correlation in the data and the parameters and provide measures of uncertainty in relevant summary statistics.

2. Data

The data are 38 351 records of daily effective temperature (DET) from De Bilt, Holland from 1 July 1904 to 30 June 2009, with missing values for all of April 1945 and the last 8 days in August 1915. Daily effective temperature on day t, Tt is calculated using the definition of Wever (2008) as:

  • equation image(1)

where Tmath image is the observed daily mean temperature on day t measured in °C, and Wt is the average wind speed on day t measured in m s−1. The daily mean temperatures, Tmath image, and average wind speeds, Wt, were obtained from the Royal Netherlands Meteorological Institute website (http://www.knmi.nl). To account for variability in the altitude at which the wind speed was measured over the 105 years, the wind speeds were corrected using Wever's (2008) adjustment.

The data are split into two periods. The first, from 1 July 1904 to 30 June 1984, is the reference period and the second, from 1 July 1984 to 30 June 2009, is the current period. It is the distribution of daily effective temperatures in this current period of 25 years which is of most interest for users of summary statistics, as well as the change between the two periods. As shown in Figure 1, the EPDF of DET in the reference period is relatively well defined because it is based on 80 observations for most days of the year, whereas in the current period there are only 25 observations per day.

Figure 1. Empirical probability distribution function (EPDF) of the daily effective temperature (DET) for the 15 of each month for the reference period (1904–1985), unshaded histograms, and the current period (1986–2009) shaded histograms. Bottom histogram is the EPDF of all observations over the whole time period. This figure is available in colour online at wileyonlinelibrary.com/journal/met

There is evidence of seasonal variation in the EPDF (Figure 1). It is negatively skewed in the winter, positively skewed in the summer and almost symmetric in spring and autumn. Overall, there is more evidence of negative skew than positive skew in the distribution of DET. This is because the distribution of observed temperatures in the winter is negatively skewed and high winds further decrease temperatures. In the summer, high winds decrease the effect of extreme high temperatures reducing the positive skew of the observed temperatures. Combining the DET values over all years and all days results in a bimodal distribution reflecting the positive and negative skew in the data.

Key summary statistics of the daily EPDF (mean, standard deviation and skew) are shown in Figure 2 for the reference period. These summaries are based on the negative temperatures, that is DET multiplied by −1, because the focus is on winter temperatures. Further justification is given in Section '3.1. Stage 1: describing a separate distribution for each day in each period'. There is strong evidence of seasonal variation and little evidence of day-to-day variability for the negative mean temperature: it is high in the winter and low in the summer (Figure 2(a)). The standard deviation and skew show more day-to-day variability and some seasonal variability. The temperatures vary least in the autumn, when the standard deviation is smallest, and most in the winter (Figure 2(b)). The negative temperatures have a positive measure of skew in the winter, indicating a right-skewed distribution (equivalent to cold extremes), and negative measures of skew in the summer, indicating a left-skewed distribution (hot extremes). The winter distributions are more skewed than the summer distributions: 12 days had a right skew of more than 90 in the winter, compared to 4 days with a left skew of less than −90 in the summer. The two distributions in Figure 2(d) represent 2 days with very different characteristics. The EPDF of the negative temperatures on 15 January has a high mean, high variance and positive skew, and that on 15 July has a low mean, lower variance and negative skew.

Figure 2. (a) Mean, (b) standard deviation and (c) skew of the EPDF calculated for every day of the year during the reference period. Calculations are based on negated daily effective temperatures (DET). (d) Shows the EPDF for 15 January (a darkly shaded histogram) with high mean, standard deviation and skew, (values marked as vertical dash-dot line in plots (a–c)) and for 15 July (a lightly shaded histogram) with low mean, standard deviation and skew (values marked as vertical dashed line in plots (a–c)). This figure is available in colour online at wileyonlinelibrary.com/journal/met

3. Method

A two-stage modelling process, as illustrated in Figure 3, was used to obtain parameter estimates of the theoretical probability distribution that varied smoothly through the year. In the first stage a separate theoretical probability distribution was fitted to the daily EPDF of DET on each day of the year in each time period. Each separate fit gave estimates of the three parameters of the distribution. In the second stage, a linear model was fitted to each of these parameters to describe the seasonal variation in the parameters. The model was fitted to parameters from both time periods, with flexibility for parameters to take different values in the two periods where necessary. From the linear models, smoothed estimates of the parameters of the distributions could be obtained, from which summary statistics for any day of the year in either period could be derived.

Figure 3. Schematic showing the stages required to obtain the smoothed summary statistics. In stage 1, a GEV distribution is fitted to the EPDF giving fitted parameter estimates for each day of the year. In stage 2 a linear model is fitted to describe how each of the fitted parameter estimates varies through the year and between the two time periods. From this model smoothed parameter estimates are obtained. From these smoothed GEV distributions the summary statistics are derived

3.1. Stage 1: describing a separate distribution for each day in each period

The distribution of DET on each day of the year was described by the Generalized Extreme Value (GEV) distribution. Although the GEV distribution was originally constructed for describing the distribution of extreme values it has all the properties required for modelling the distribution of DET as it is extremely flexible, can be positively or negatively skewed and take positive or negative values. The distribution is described by three parameters; the location, µ, scale, σ, and shape, η. It is η, the shape parameter, which determines the tail of the distribution. The GEV distribution is most flexible in the shape of the upper tail so the highest values are well described. Because in this case low temperatures were of more interest, and because winter EPDFs were more skew than summer EPDFs, the temperatures were transformed by multiplying by − 1 so that the lowest temperatures were best described.

Thus, if Tty is a random variable for the daily effective temperature observed on day t for t = 1, …, 366 for the reference period (y = 0) and the current period (y = 1) the distribution of the EPDF of the transformed temperatures, Xty = − Tty was modelled as:

  • $X_{ty} \sim \mathrm{GEV}(\mu_{ty}, \sigma_{ty}, \eta_{ty})$   (2)

Fitted parameter estimates, $\hat{\mu}_{ty}$, $\hat{\sigma}_{ty}$ and $\hat{\eta}_{ty}$, and their respective standard errors, were obtained for each day of the year for both the reference period and current period using maximum likelihood in the statistical software package R (R Development Core Team, 2011) with the GEV function in the EVIR package (S original (EVIS) by Alexander McNeil and R port by Alec Stephenson, 2008). Code for this, for the rest of the analyses and for the graphs is available in the Wiley Online Library. Parameter estimation, especially of the standard errors, was difficult for some of the short time series in the current period because of convergence issues, so the Broyden, Fletcher, Goldfarb and Shanno (BFGS) method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970) was used for the optimization.
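To make the stage 1 objective concrete, the sketch below writes out the negative log-likelihood that a maximum likelihood fit of the GEV distribution minimizes, using the common parameterization in which a negative shape gives a bounded upper tail. It is written in Python with the standard library only and is not the R code used for the analysis; the function names, the simulated sample and the parameter values are hypothetical, and the optimization step itself (BFGS in the paper) is omitted.

```python
import math
import random


def gev_neg_log_likelihood(params, xs):
    """Negative log-likelihood of GEV(mu, sigma, eta) for the sample xs.

    Returns math.inf when sigma <= 0 or an observation lies outside the
    support implied by the parameters.
    """
    mu, sigma, eta = params
    if sigma <= 0.0:
        return math.inf
    total = 0.0
    for x in xs:
        s = (x - mu) / sigma
        if abs(eta) < 1e-9:
            # Gumbel limit of the GEV as the shape tends to zero.
            total += math.log(sigma) + s + math.exp(-s)
        else:
            z = 1.0 + eta * s
            if z <= 0.0:
                return math.inf
            total += math.log(sigma) + (1.0 + 1.0 / eta) * math.log(z) + z ** (-1.0 / eta)
    return total


def simulate_gev(n, mu, sigma, eta, rng):
    """Draw n values by inverting the GEV distribution function (eta != 0)."""
    values = []
    for _ in range(n):
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        values.append(mu + sigma * ((-math.log(p)) ** (-eta) - 1.0) / eta)
    return values


# Hypothetical negated temperatures for one winter day.
rng = random.Random(1)
sample = simulate_gev(200, -1.0, 3.0, -0.2, rng)
nll_true = gev_neg_log_likelihood((-1.0, 3.0, -0.2), sample)
nll_shifted = gev_neg_log_likelihood((4.0, 3.0, -0.2), sample)
```

A numerical optimizer would search over (mu, sigma, eta) for the smallest value of this function; here the generating parameters give a lower value than a location shifted by 5 °C.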

3.2. Stage 2: smooth estimates of the parameters through the year

A weighted linear regression model was fitted to each of the three fitted parameters, $\hat{\mu}_{ty}$, $\hat{\sigma}_{ty}$ and $\hat{\eta}_{ty}$, to describe seasonal variation in the parameters and the differences between the two periods. The log of $\hat{\sigma}_{ty}$ was modelled to ensure that the smoothed estimate of the scale parameter remained positive.

Fourier series functions were the main covariates considered in the model. These were used to model the seasonality and to ensure continuity between the last day of a year and the first day of the following year. To determine whether the variability through the year differed depending on the period, an indicator variable Yty was another candidate variable. This variable took the value 1 if y = 1, that is if the response is from the current period, and the value zero otherwise. In addition, interactions between Yty and the sine and cosine terms were also considered. The maximal model considered for $\hat{\mu}_{ty}$, and similarly for $\log \hat{\sigma}_{ty}$ and $\hat{\eta}_{ty}$, was of the form:

  • $\hat{\mu}_{ty} = \beta_0 + \gamma_0 Y_{ty} + \sum_{i \ge 1} \left[ \beta_i^{s} \sin(it') + \beta_i^{c} \cos(it') + \gamma_i^{s} Y_{ty} \sin(it') + \gamma_i^{c} Y_{ty} \cos(it') \right] + \varepsilon_{ty}$   (3)

for t = 1, …, 366 and y = 0, 1, where t′ is the day of the year t rescaled to an angle so that the Fourier terms are periodic over the year, and εty is the error term.

The variance of the error term εty was weighted in proportion to the precision of the fitted parameter estimates.

Model fitting followed a forward selection process starting with Yty, and then considering each pair of Fourier series terms (sin(it′), cos(it′)), i = 1, … and if significant their interaction with Yty, before considering the next pair of Fourier series. Fourier series harmonics were included as pairs in the model and if the addition of either term led to a significant improvement both were included in the model. Only significant interactions between Yty and Fourier series harmonics were included: if Yty.cos(it′) was not significant it was not included in the model, even if Yty was significant. Successive harmonics were added to the model until two consecutive pairs of harmonics were not significant. Candidate variables were retained in the model if significant at 1%, almost equivalent to a change in AIC (Akaike Information Criterion) (Akaike, 1974) of 10 or more.
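The stage 2 fit can be illustrated with a minimal weighted least squares sketch in Python (standard library only). It includes an intercept, the period indicator and pairs of Fourier harmonics, and solves the weighted normal equations directly. Several points are assumptions made for illustration rather than details confirmed by the paper: the angle is taken as 2πt/366, the weights are taken as one over the squared standard error of each stage 1 estimate, and the interaction terms and the forward selection are left out for brevity. The data are synthetic.

```python
import math


def fourier_row(t, period_flag, n_harmonics, days=366):
    """Design row: intercept, period indicator, then sin/cos harmonic pairs."""
    angle = 2.0 * math.pi * t / days  # assumed scaling of the day of year
    row = [1.0, float(period_flag)]
    for i in range(1, n_harmonics + 1):
        row += [math.sin(i * angle), math.cos(i * angle)]
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_least_squares(rows, ys, weights):
    """Coefficients minimizing sum_i w_i * (y_i - row_i . beta) ** 2."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, ys, weights)) for i in range(k)]
    return solve(xtwx, xtwy)


# Synthetic, noise-free "stage 1 estimates" for two periods (hypothetical values).
rows, ys, weights = [], [], []
for flag in (0, 1):
    for t in range(1, 367):
        angle = 2.0 * math.pi * t / 366
        rows.append(fourier_row(t, flag, 1))
        ys.append(-8.0 - 1.0 * flag - 7.0 * math.cos(angle))
        standard_error = 0.5 if flag == 0 else 1.0  # less precise in the shorter period
        weights.append(1.0 / standard_error ** 2)
beta = weighted_least_squares(rows, ys, weights)
```

Because the synthetic responses contain no noise, the fitted coefficients reproduce the generating values; with real stage 1 estimates the weights mean that the less precise current-period estimates pull the fit less strongly.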

Using the fitted model for each parameter, smoothed estimates of the location, log scale and shape parameters and their standard errors were obtained. For example, if the maximal model for $\hat{\mu}_{ty}$ was retained, the smoothed estimate $\tilde{\mu}_{ty}$ would be of the form:

  • $\tilde{\mu}_{ty} = \hat{\beta}_0 + \hat{\gamma}_0 Y_{ty} + \sum_{i \ge 1} \left[ \hat{\beta}_i^{s} \sin(it') + \hat{\beta}_i^{c} \cos(it') + \hat{\gamma}_i^{s} Y_{ty} \sin(it') + \hat{\gamma}_i^{c} Y_{ty} \cos(it') \right]$   (4)

where $\hat{\beta}_0$ etc. are the maximum likelihood estimates of $\beta_0$ etc. obtained for the fitted model. Standard errors, and 95% confidence intervals, for each of the smoothed estimates can be obtained from the variances and covariances of the estimated coefficients, as in a linear regression model.

Using the smoothed parameter estimates $\tilde{\mu}_{ty}$, $\tilde{\sigma}_{ty}$ and $\tilde{\eta}_{ty}$, a distribution of the daily effective temperatures for day t in the current period (y = 1) was obtained. From this distribution, summary statistics such as the 5% point, or the probability that the temperature is below, for example, −5 °C were calculated.
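Because the GEV distribution is fitted to the negated temperatures X = −T, summary statistics for DET require a change of sign: P(T < c) = 1 − F(−c), and the p percentage point of T is minus the (1 − p) quantile of X, where F is the GEV distribution function of X. The Python sketch below (standard library only) illustrates the conversion. The function names and parameter values are hypothetical and are not estimates from this paper.

```python
import math


def gev_cdf(x, mu, sigma, eta):
    """P(X <= x) for X ~ GEV(mu, sigma, eta)."""
    if eta == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    z = 1.0 + eta * (x - mu) / sigma
    if z <= 0.0:
        # Outside the support: below the lower bound (eta > 0)
        # or above the upper bound (eta < 0).
        return 0.0 if eta > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / eta))


def gev_quantile(p, mu, sigma, eta):
    """Value x with P(X <= x) = p, for 0 < p < 1."""
    if eta == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma * ((-math.log(p)) ** (-eta) - 1.0) / eta


def det_prob_below(threshold, mu, sigma, eta):
    """P(T < threshold) when X = -T follows GEV(mu, sigma, eta)."""
    return 1.0 - gev_cdf(-threshold, mu, sigma, eta)


def det_percent_point(p, mu, sigma, eta):
    """Temperature below which a fraction p of days is expected to fall."""
    return -gev_quantile(1.0 - p, mu, sigma, eta)


# Hypothetical smoothed parameters for one winter day (negated scale).
mu, sigma, eta = -1.0, 3.0, -0.2
two_percent_point = det_percent_point(0.02, mu, sigma, eta)
median = det_percent_point(0.50, mu, sigma, eta)
prob_below_zero = det_prob_below(0.0, mu, sigma, eta)
```

With a negative shape parameter the negated temperatures have a finite upper bound, so the probability of a DET below the corresponding lower bound is exactly zero. Any threshold or percentage point can be evaluated from the three stored parameters alone.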

One concern was that significant differences in the parameters between the two periods were due only to the different sample sizes used (80 and 25 observations respectively). To check this, a simple simulation was carried out. Using fitted parameter estimates from the reference period, data were simulated for 105 seasons for the 6th and 15th of each month. The methods described above were used to obtain fitted parameter estimates and smooth parameter estimates for the two nominal dates. There was no evidence in these cases that there were significant differences between the two parameters. This suggests that the estimated differences in parameters between the two periods were not due to chance.

4. Results

4.1. Parameter estimates

From stage 1 of the analysis the parameter estimates $\hat{\mu}_{ty}$, $\hat{\sigma}_{ty}$ and $\hat{\eta}_{ty}$ were obtained for t = 1, …, 366 and y = 0, 1 for all but 7 days in the current period. Specifically, for these 7 days the standard errors were either not estimated or extremely poorly estimated, giving a value of less than 2E−6; all other standard errors were greater than 0.04. The 29 February was difficult to fit because there were only six observations in the current period. In the reference period there were 20 observations, so it was possible to fit a distribution to these values, although the standard error of the parameter estimates for this day was, on average, more than twice the standard error of the estimates for other days of the year, for which there were 80 observations. It was unclear why estimates for the other days were poor. Model checking suggested that for all other days the GEV distribution provided a good description of the data.

The stage 1 parameter estimates, $\hat{\mu}_{ty}$, $\hat{\sigma}_{ty}$ and $\hat{\eta}_{ty}$, for t = 1, …, 366 for the two periods are shown as points in Figure 4. There is a large amount of variability in these estimated parameters from day to day, particularly for the current period. This is especially the case for the estimates of the scale and shape parameters. The three parameters, µty, σty and ηty, roughly represent the three characteristics of the observed distributions shown in Figure 2. The location parameter, µty, describes the mean value and is close to zero over the winter months when the actual temperatures are lowest, and furthest from zero over the summer when the actual temperatures are highest. The scale parameter, σty, is a measure of the variability in the data. The variability is lowest in the autumn and highest in the winter and summer. The shape parameter, ηty, represents the skewness in the distribution. The closer ηty is to zero, the greater the right skew in the distribution; the more negative ηty becomes, the greater the left skew. As negative temperatures were used, ηty is negative but close to zero in the winter when there are extreme low temperatures, and becomes more negative in the summer when there are extreme high temperatures, leading to a more positive skew in DET.

Figure 4. Estimated parameters (a) mean, (b) standard deviation, (c) skewness of the GEV distribution plotted against time of year for the reference (light grey points and lines) and current period (dark grey/red points and lines). Points indicate the stage 1 fitted parameter estimates and lines the stage 2 smoothed parameter estimates. This figure is available in colour online at wileyonlinelibrary.com/journal/met

Table 1 gives the estimated effects and their standard errors for the three linear models fitted in stage 2 of the analysis. Using these estimated effects and their standard errors, the smoothed values of each of the parameters of the GEV distribution, $\tilde{\mu}_{ty}$, $\tilde{\sigma}_{ty}$ and $\tilde{\eta}_{ty}$, were obtained for each day of the year (t = 1, …, 366) for the two periods y = 0, 1. These are shown in Figure 4 with their 95% confidence intervals. Although it was not possible to fit GEV distributions to 7 days of the year in the current period, the models can be used to obtain smoothed parameter estimates of the GEV distribution for these days. There is evidence that all three parameters vary through the year: in Table 1 all three models required first and second order sine and cosine terms. There was also evidence of a significant effect of period, Yty, indicating that the parameters and, therefore, the distributions are not the same for the two periods. Furthermore, the variability through the year in both the location parameter, µty, and the shape parameter, ηty, differs between the two periods: both required an interaction with sine and cosine terms. The 95% confidence intervals for the smoothed parameters are wider for the current period than for the reference period because fewer data are available to estimate the parameters.

Table 1. Estimates (standard errors) of significant terms for the three parameters

Term          | Location (µ)   | Log scale (log σ) | Shape (η)
Intercept     | −8.023 (0.018) | 1.109 (0.005)     | −0.289 (0.004)
sin(t′)       | −2.885 (0.021) | −0.100 (0.006)    | −0.023 (0.006)
cos(t′)       | −7.009 (0.026) | 0.042 (0.006)     | −0.212 (0.006)
sin(2t′)      | −0.375 (0.022) | −0.038 (0.006)    | 0.025 (0.006)
cos(2t′)      | −0.093 (0.022) | 0.064 (0.006)     | 0.031 (0.006)
sin(3t′)      | 0.285 (0.025)  |                   | −0.024 (0.006)
cos(3t′)      | 0.151 (0.022)  |                   | 0.010 (0.006)
sin(5t′)      | −0.014 (0.022) |                   | 0.003 (0.005)
cos(5t′)      | 0.194 (0.022)  |                   | −0.023 (0.006)
sin(6t′)      | −0.036 (0.022) |                   | −0.015 (0.005)
cos(6t′)      | 0.069 (0.022)  |                   | 0.009 (0.006)
sin(7t′)      | −0.151 (0.025) |                   |
cos(7t′)      | −0.066 (0.022) |                   |
Yty           | −0.946 (0.035) | −0.063 (0.010)    | −0.050 (0.010)
sin(t′).Yty   |                |                   | −0.039 (0.014)
cos(t′).Yty   | 0.242 (0.052)  |                   |
sin(2t′).Yty  |                |                   | 0.039 (0.014)
sin(3t′).Yty  | −0.253 (0.050) |                   | −0.037 (0.041)
cos(3t′).Yty  |                |                   | −0.048 (0.014)
sin(7t′).Yty  | 0.268 (0.050)  |                   |

Throughout the year the smoothed location parameter $\tilde{\mu}_{ty}$ is lower in the current period than in the reference period, with the biggest difference mostly in the winter. Because these are negative temperatures, this indicates that the current period is warmer than the reference period. The smoothed scale parameter $\tilde{\sigma}_{ty}$, which measures the variability in the data, shows that the current period is less variable than the reference period. The smoothed measure of skew, $\tilde{\eta}_{ty}$, is more negative in the current period than in the reference period from June through August and from mid-September to the end of November. DET is therefore more positively skewed, that is it has more extreme warm days, in these parts of the year in the current period.

4.2. Distributions and summary statistics derived from the parameter estimates

Overall, there is a general increase in DET from the reference to the current period. Figure 5 uses the smoothed estimates of the GEV distribution to obtain distributions of DET on the 15th of each month. The warming, and therefore the shift in distributions between the two periods, is clear. For the summer the left tail of the distribution cannot be calculated at low temperatures. Figure 6(a) and (b) show, for both periods, the 2, 10, 50, 90 and 98% points, and the probability that the DET is less than 10, 5, 0, −5 and −10 °C through the year, respectively. The percentage points for the current period are always higher than for the reference period, with the most marked differences in the winter. The mean difference in median DET (the 50% point) in the winter is 1.1 °C and in the 2% point is 1.2 °C. In the summer the mean difference in median DET is 0.9 °C and in the 2% point is 1.6 °C. There are also shifts in the upper percentage points, particularly in the winter, where the difference in the 98% point is 0.7 °C.

Figure 5. GEV distributions of the daily effective temperature (DET) for the 15 of each month using the smoothed parameter estimates for the reference (light grey line) and current (dark grey/red line) periods. This figure is available in colour online at wileyonlinelibrary.com/journal/met

Figure 6. Summary statistics calculated for each day of the year in the reference (light grey line) and current (dark grey/red line) periods using the smoothed GEV distributions. (a) The daily effective temperature (DET) at which there is a 2% (long dash), 10% (dash-dot), 50% (solid), 90% (short dash) and 98% (long dash-short dash) chance of the observed daily effective temperature being less than. (b) The probability that the daily effective temperature is less than − 10 °C (long dash), − 5 °C (dash-dot), 0 °C (solid), 5 °C (short dash), 10 °C (long dash-short dash). This figure is available in colour online at wileyonlinelibrary.com/journal/met

Because of the warming between the two periods, the probability that the DET is less than a particular temperature is always lower in the current period than in the reference period. From December to the end of February the probability that the DET is less than 10 °C is 1 in both the reference and current periods: it is not expected that the DET will be more than 10 °C in these months. The probability is less than 1 for all lower temperatures. From June to the end of August the probability that the DET is less than 5 °C is zero for both the current and reference periods. The clearest differences between the two periods are in the winter. On 15 January, the probability that the DET is less than 0 °C is 0.52 in the reference period and 0.37 in the current period.
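The percentage points and probabilities discussed here follow directly from the three GEV parameters for a given day. The following Python code is a minimal sketch of that calculation, not the code used for the paper: the function names, the argument names (`mu`, `sigma`, `xi` for location, scale and shape) and the `negated` flag are assumptions introduced for illustration. The flag covers the case where the GEV distribution was fitted to negated DET, in which case the p% point of DET is minus the (100 − p)% point of the fitted distribution.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """P(X <= x) for a GEV variable with location mu, scale sigma, shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter tends to zero.
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # x is outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Value below which a proportion p (0 < p < 1) of the distribution lies."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


def det_prob_below(x, mu, sigma, xi, negated=False):
    """P(DET <= x); set negated=True if the GEV was fitted to -DET."""
    if negated:
        return 1.0 - gev_cdf(-x, mu, sigma, xi)
    return gev_cdf(x, mu, sigma, xi)


def det_percent_point(p, mu, sigma, xi, negated=False):
    """DET value with probability p of not being exceeded."""
    if negated:
        return -gev_quantile(1.0 - p, mu, sigma, xi)
    return gev_quantile(p, mu, sigma, xi)
```

With illustrative (not fitted) parameter values, `det_percent_point(0.02, 1.0, 3.0, -0.2)` gives a 2% point and `det_prob_below(0.0, 1.0, 3.0, -0.2)` gives the probability that DET is below 0 °C; the two functions are inverses of each other, which is a convenient check on any implementation.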

For comparison, the same statistics as shown in Figure 6 were calculated using the observed DET data. These are shown in Figure 7 for the current period. Here the difficulty of using the observed data to calculate these statistics, namely the variability from one day to the next, can be seen. The smoothed values calculated using the modelling approach in this paper, shown as the continuous smooth line, are a good representation of the observed data, although the probability that the temperature is less than −5 °C is rather poorly estimated in January.

Figure 7. Summary statistics calculated from observations and smoothed GEV distribution for the current period for (a) daily effective temperature (DET) below which there is a 2% (long dash), 10% (dash-dot), 50% (solid), 90% (short dash) and 98% (dot) chance of being observed and (b) probability that the DET is less than −10 °C (long dash), −5 °C (dash-dot), 0 °C (solid), 5 °C (short dash), 10 °C (dot) on each day of the year

5. Discussion of alternative methodologies

The two-stage methodology used in this paper fits a separate GEV distribution to the DET for each day of the year in each of the two periods to give estimates of the three parameters of the GEV distribution for each day in each period. Each of the three parameters is then smoothed to describe how it varies through the year and between the two periods. Smoothing is carried out by fitting a weighted general linear model with Fourier series functions as covariates. Using this method of smoothing: (1) enables a smooth transition between the end of one year (30 June) and the beginning of the next year (1 July); (2) allows information to be shared between the two time periods so that common features can be fitted together, which is especially important for the current period where estimates are based on many fewer observations; (3) enables those parameters that are more precisely estimated to be given more weight in defining the pattern through the year; and (4) provides a straightforward way to predict the distribution for any day of the year in either period, because all that needs to be stored are the three linear models and their estimated parameters. It is difficult to see how other smoothing methods, such as binomial filters, which are commonly used to model time series data (for example Yan et al., 2002b), would meet all four of these requirements.
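The smoothing step can be illustrated with a short Python sketch of weighted least squares on Fourier covariates. This is an illustration only, under several assumptions that are not taken from the paper: the function names are hypothetical, the number of harmonics and the 365-day period are arbitrary choices, the weights are taken to be inverse variances of the daily estimates, and a single period is fitted rather than the two periods jointly.

```python
import numpy as np


def fourier_design(day, n_harmonics, period=365.0):
    """Design matrix: intercept plus sine/cosine pairs for each harmonic."""
    day = np.asarray(day, dtype=float)
    cols = [np.ones_like(day)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * day / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


def fit_weighted_fourier(day, estimate, weight, n_harmonics=2):
    """Weighted least-squares fit of daily parameter estimates.

    weight is assumed to be proportional to 1 / variance of each estimate,
    so more precisely estimated days pull the curve harder.
    """
    X = fourier_design(day, n_harmonics)
    root_w = np.sqrt(np.asarray(weight, dtype=float))
    y = np.asarray(estimate, dtype=float)
    # Scaling rows by sqrt(weight) turns weighted into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return coef


def predict_fourier(day, coef):
    """Smoothed parameter value on the given day(s) from stored coefficients."""
    n_harmonics = (len(coef) - 1) // 2
    return fourier_design(day, n_harmonics) @ coef
```

Because every covariate is periodic, the fitted curve takes the same value at day 0 and day 365, which is what gives the smooth transition across the year boundary. Only the coefficient vector for each of the three parameters needs to be stored to reproduce the smoothed value on any day.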

Further development of the methodology used in this paper could enable confidence intervals for the summary statistics to be obtained, which would be useful for comparing between periods or through the year. For this, the covariance of pairs of parameter estimates, for example [equation image], as well as the variance of each of the smoothed parameter estimates, are required (Coles, 2001). It may be possible to obtain these by fitting a multivariate normal distribution to all three fitted parameters together so that covariances as well as variances are estimated. However, this is a non-trivial estimation exercise, which moves away from the pragmatic approach described here, and it is unclear how weighted multivariate regressions, particularly the weighted covariance structure, can be implemented. If it were implemented, it would only provide a first estimate of the variability of the summary statistics because it only captures the uncertainty about the smoothed parameters; the variability of the fitted parameters, for example [equation image], would not be captured by this method and so confidence intervals for summary statistics would appear more precise than they should. Further work is required to extend these methods, and potential avenues with their major challenges are described below.
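To see why the covariances are needed, it may help to write out the standard delta-method approximation for the variance of a percentage point (Coles, 2001). The notation below is an assumption made for illustration, since the original equations are not recoverable here: \(\mu\), \(\sigma\) and \(\xi\) denote the location, scale and shape parameters, \(z_p\) the 100p% point, and \(V\) the 3 × 3 covariance matrix of the parameter estimates.

```latex
z_p = \mu + \frac{\sigma}{\xi}\left( y_p^{-\xi} - 1 \right),
\qquad y_p = -\log p ,

\operatorname{Var}(\hat{z}_p) \approx \nabla z_p^{\mathsf{T}} \, V \, \nabla z_p ,

\nabla z_p^{\mathsf{T}} =
\left[
  \frac{\partial z_p}{\partial \mu},\;
  \frac{\partial z_p}{\partial \sigma},\;
  \frac{\partial z_p}{\partial \xi}
\right]
=
\left[
  1,\;
  \frac{y_p^{-\xi} - 1}{\xi},\;
  -\frac{\sigma}{\xi^{2}}\left( y_p^{-\xi} - 1 \right)
  - \frac{\sigma}{\xi}\, y_p^{-\xi} \log y_p
\right].
```

Expanding the quadratic form gives terms in each variance and in each pairwise covariance, so the off-diagonal elements of \(V\) cannot be ignored unless the parameter estimates are uncorrelated.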

More elegant and statistically appealing approaches are used in the analysis of extremes. These strategies combine the two stages so that the daily observations of DET are modelled directly to obtain estimates of the parameters of the GEV distribution that vary smoothly through the year. For example, Menéndez et al. (2007) fitted a one-stage model to high sea levels in which the parameters of the GEV distribution vary seasonally through the year, and Maraun et al. (2009) applied a similar model to describe the annual cycle of precipitation using the maximum precipitation in each month. Coles and Tawn (2005) carried out a Bayesian analysis which accounted for seasonality and long-term trends in the parameters of the GEV distribution to describe extreme sea surges on the UK east coast. A Bayesian analysis is appealing because all sources of uncertainty are automatically included in the analysis.

However, the data structures to which these models are applied are simpler than the one described in this paper. First, they only model the distribution of either maximum values or values over a threshold, so there is little or no serial correlation to account for. To apply these methods to the data in this paper would mean that the serial correlation in the DET between consecutive days in the same year would need to be accounted for, and it is challenging to extend the one-stage methods described above to do this. Using the two-stage method described in this paper, the serial correlation does not need to be accounted for because separate distributions are fitted to each day.

Furthermore, the distributions of extremes or peaks over thresholds are always right-skewed: this is much simpler to fit than data with both left and right skew. A possible alternative approach is to switch from modelling the distribution of negated DET to modelling DET at some point in the year. Determining two suitable points at which to do this may be problematic.

A further limitation of the analysis in this paper is that it focuses on the difference between two periods, because separate GEV distributions need to be fitted for each time period. A more sophisticated analysis, such as that of Coles and Tawn (2005), could provide scope for modelling a long-term trend in DET through the years, for example by using generalized additive models (Underwood, 2009), once the issues of autocorrelation and changing the direction of skew were solved.

The analysis here has focused on DET and the issues surrounding the modelling of negatively and positively skewed data; other climate variables could be modelled using a similar strategy. Depending on the shape of the EPDF, distributions other than the GEV might be appropriate. In addition to describing changes in the distribution between two historical periods, comparisons between two locations or between the present day and projections could also be explored.

6. Conclusions

This paper describes a methodology for easily obtaining summary statistics of the distribution of DET that vary smoothly through the year. Furthermore, it demonstrates how long time series of data over many years can be used to help inform summaries for a smaller number of years, e.g. the most recent years. The method describes the EPDF of DET on any day as a GEV distribution, and uses simple equations to describe the seasonal variation in the parameters of this theoretical distribution for each time period. Given these equations, the practitioner can easily calculate the parameters of the GEV distribution, and from these calculate relevant summary statistics, such as the 5% point, for any day of the year in either time period without recourse to the original data.

The methods have been applied to DET at De Bilt, Holland. They show a very definite warming, as measured by DET, from the reference to the current period, particularly in the winter temperatures. The median DET has increased by 1.1 °C in the winter and by 0.9 °C in the summer; the 2% point has risen by 1.2 °C in the winter and the 98% point by 0.8 °C in the summer. These are consistent with the analyses of Brown et al. (2008), which looked at extreme values in the maximum and minimum temperatures from January 1950 to 2004 and showed a 1.1 °C warming in maximum daily maximum temperatures in Europe and a 1.6 °C warming in the minimum daily minimum temperature in Europe. However, Brown et al.'s (2008) analysis uses observations above a pre-specified threshold in each year. The analysis in this paper focuses on describing the distribution of all observations, which provides additional insights into how temperatures are changing through time.

Methods used in the analysis and description of extreme events may be useful when calculating confidence intervals if issues of correlated data and distributions with both left and right skew can be solved. The methodology described in this paper is a simple pragmatic approach to obtaining summaries that can be applied to other daily observations where distributions are both positively and negatively skewed.

Abbreviations: DET: daily effective temperature; EPDF: empirical probability distribution function; GEV: generalized extreme value

Acknowledgements

A. Fournier and M. Hoogwerf provided support in calculating daily effective temperature. Dr R. D. Stern was involved in earlier work that motivated part of this paper. Mr R. W. Burn provided thoughtful comments on the text.

References

  • Akaike H. 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6): 716–723.
  • Ballester J, Giorgi F, Rodó X. 2010. Changes in European temperature extremes can be predicted from changes in PDF central statistics. Clim. Change 98: 277–284.
  • Barrow EM, Hulme M. 1996. Changing probabilities of daily temperature extremes in the UK related to future global warming and changes in climate variability. Clim. Res. 6: 21–31.
  • Beniston M, Stephenson DB. 2004. Extreme climatic events and their evolution under changing climatic conditions. Global Planet. Change 44: 1–9.
  • Brown SJ, Caesar J, Ferro CAT. 2008. Global changes in extreme daily temperature since 1950. J. Geophys. Res. 113: D05115.
  • Broyden CG. 1970. The convergence of a class of double-rank minimization algorithms 2, the new algorithm. J. Inst. Math. Appl. 6: 222–231.
  • Coles S. 2001. An Introduction to Statistical Modelling of Extreme Values. Springer: London.
  • Coles S, Tawn T. 2005. Bayesian modelling of extreme surges on the UK east coast. Philos. Trans. R. Soc., A 363: 1387–1406.
  • Fletcher R. 1970. A new approach to variable-metric algorithms. Comput. J. 13: 317–322.
  • Furrer EM, Katz RW. 2007. Generalized linear modelling approach to stochastic weather generators. Clim. Res. 34: 129–144.
  • Goldfarb D. 1970. A family of variable-metric algorithms derived by variational means. Math. Comput. 24: 23–26.
  • Jones PD, Horton EB, Folland CK, Hulme M, Parker DE, Basnett TA. 1999. The use of indices to identify changes in climatic extremes. Clim. Change 42: 131–149.
  • McCullagh P, Nelder J. 1989. Generalized Linear Models. Chapman and Hall: London.
  • Maraun D, Rust HW, Osborn TJ. 2009. The annual cycle of heavy precipitation across the United Kingdom: a model based on extreme value statistics. Int. J. Climatol. 29(12): 1731–1744.
  • Menéndez M, Méndez FJ, Izaguirre C, Luceño A, Losada IJ. 2007. The influence of seasonality on estimating return values of significant wave height. Coastal Eng. 56: 211–219.
  • R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna; ISBN 3-900051-07-0. http://www.R-project.org (accessed November 2011).
  • Richardson CW. 1981. Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resour. Res. 17: 182–190.
  • S original (EVIS) by Alexander McNeil and R port by Alec Stephenson. 2008. evir: Extreme Values in R. R package version 1.6.
  • Shanno DF. 1970. Conditioning of Quasi-Newton methods for function minimization. Math. Comput. 24: 647–656.
  • Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) 2007. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press: Cambridge, UK and New York, NY.
  • Stern RD, Coe R. 1984. A model fitting analysis of daily rainfall data. J. R. Stat. Soc. Ser. A 147: 1–34.
  • Underwood FM. 2009. Describing long-term trends in precipitation using generalized additive models. J. Hydrol. 364: 285–297.
  • Wever N. 2008. Effectieve Temperatuur en Graaddagen: Klimatologie en Klimaatscenario's. KNMI Publicatie 219. http://www.knmi.nl/bibliotheek/publicatiemetnr101.html (accessed November 2011).
  • Yan Z, Bate S, Chandler RE, Isham V, Wheater H. 2002a. An analysis of daily maximum wind speed in northwestern Europe using generalized linear models. J. Clim. 15: 2073–2088.
  • Yan Z, Jones PD, Davies TD, Moberg A, Bergström H, Camuffo D, Cocheo C, Maugeri M, Demarée GR, Verhoeve T, Thoen E, Barriendos M, Rodríguez R, Martín-Vide J, Yang C. 2002b. Trends of extreme temperatures in Europe and China based on daily observations. Clim. Change 53: 355–392.