Development of statistical models for at-site probabilistic seasonal rainfall forecast


  • Gabriele Villarini,

    Corresponding author
    1. Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ, USA
    2. Willis Research Network, London, UK
    • Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ, USA.
  • Francesco Serinaldi

    1. Willis Research Network, London, UK
    2. School of Civil Engineering and Geosciences, Newcastle University, Newcastle Upon Tyne, UK


A probabilistic seasonal rainfall forecasting system for the Bucharest-Filaret (Romania) station based on Generalized Additive Models in Location, Scale and Shape (GAMLSS) is proposed. First we develop statistical models to describe seasonal rainfall over the period 1926-2000, both considering the seasonal record as a continuous time series and accounting for seasonal changes, and by developing ad hoc models for each individual season. The Southern Oscillation Index (SOI), the North Atlantic Oscillation (NAO) and the seasonal rainfall for the previous year are included as possible covariates. Model selection is performed with respect to two penalty criteria [Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC)], each of them leading to different final model configurations in terms of predictors and their functional relation to the parameters of the probability distribution. Retrospective forecast, in which the parameters of the models are re-estimated every time new information becomes available, is performed on a yearly basis for the period 1986-2000. The quality of the forecasts is assessed in terms of several accuracy measures and by visual examination of the forecasts' probability distributions. The best forecasts are obtained for the winter season. While it is not possible to identify a single ‘best’ model according to all the forecast measures, we recommend using the model that considers the seasonal rainfall as a continuous time series and penalized with respect to AIC. Copyright © 2011 Royal Meteorological Society

1. Introduction

Seasonal forecasting of hydrometeorological variables, such as rainfall, temperature and discharge, has been the object of numerous studies, and different approaches have been proposed and developed over the years. Seasonal forecasts can be deterministic (a point forecast representing our best guess) or probabilistic (the forecast is expressed in terms of probabilities of exceedance, which reflect a range of possible values for the variable of interest), and can be statistical in nature or based on numerical models. For an overview, the interested reader is referred to Palmer and Anderson (1994), Carson (1998), Goddard et al. (2001), and Troccoli (2010), among others.

Seasonal forecasts are founded on the principle that large-scale surface processes evolve slowly, providing predictability to the atmospheric component at seasonal time scales (Palmer and Anderson, 1994; Anderson, 2000). These forecasts can have large societal and economic repercussions, ranging from the energy sector to the insurance/reinsurance industry, from crop production to water resources management (Hansen et al., 2000; Alaton et al., 2002; Hamlet et al., 2002; Jewson and Caballero, 2003; Cantelaube and Terres, 2005; Abawi et al., 2007; Benth et al., 2007; Harrison et al., 2007; Nelson et al., 2007; Troccoli, 2007; Kumar, 2010; de Oliveira Cardoso et al., 2010). Improved seasonal forecasts would contribute to more efficient management of natural resources, improved productivity and more accurate pricing of weather-related quantities.

Seasonal forecasts do not exhibit the same skill for all the meteorological variables and every area of the globe. Recently, Lavers et al. (2009) examined the skill of eight seasonal climate forecast models in forecasting rainfall and temperature, and found that these models exhibit significant skill in long-lead times only for temperature and for the equatorial Pacific Ocean. On the other hand, there is a very limited skill in rainfall forecast beyond one month, in particular at the mid-latitudes (see also Anderson, 2000; Doblas-Reyes et al., 2000; Graham et al., 2000; Vizard et al., 2005; and Alessandri et al., 2011, among others).

In this study, we follow an empirically based approach and develop statistical models to provide at-site probabilistic seasonal rainfall forecasts for the Bucharest-Filaret (Romania) station. While the largest potential rainfall predictability over Europe is for spring (Branković et al., 1994; Doblas-Reyes et al., 2000; Graham et al., 2000; Lloyd-Hughes and Saunders, 2002), we develop models for all the seasons and compare forecast performance across them. Moreover, while in most cases linear regression is used to relate predictand and predictors (see Mason and Baddour, 2007, for a recent overview), we move away from statistical models using distributions from the exponential family (e.g. Gaussian, exponential) and consider a more general set of models. We also include nonlinear dependencies in the relation between covariates and predictand (see also Lo et al. (2007) and Maia and Meinke (2010) for nonlinear probabilistic approaches).

The basic idea is to take covariates that are available prior to the onset of the season to be forecast, and use them to predict the values of the parameters of the selected probability distribution. By relating the parameters and the predictors, we obtain the forecast distribution for a given season (see Serinaldi (2011) for a recent example). The probabilistic forecasts represent the last of a set of steps, which are related to answering the following questions:

  1. What are the appropriate probability distributions, predictors and their relation to the parameters of the probability distribution?
  2. Can these models be used to forecast seasonal rainfall? Do they perform better than naïve forecasts (e.g. the rainfall value for the previous year)?

These are all questions we address in this study. Rather than focusing on a single model, we show that there is not a single ‘best’ model from a statistical standpoint. The final model depends on whether we focus on individual seasons, or consider the rainfall time series as continuous and account for seasonal changes. Model selection (both in terms of predictors and their functional dependence on distribution's parameters) depends on the penalty rule we apply, the likelihood-based evaluation criterion, and whether we select a more or less parsimonious model. Apart from the differences in statistical structure of the forecast system, we also highlight how the forecast accuracy of these models depends on the performance measure used. Nonetheless, even though there is not a single model that is unequivocally better than the others, we provide general recommendations based on performance and parsimony.

This paper is organized as follows. In Section 2, we describe the data, the statistical procedures for covariate and model selections, and provide an overview of the different measures used to quantify the forecast performance. Section 3 describes the results of our analyses, followed by a summary and conclusions, which are presented in Section 4.

2. Data and methodology

Statistical modelling of seasonal rainfall is performed using measurements for the Bucharest-Filaret (Romania) station available through the European Climate Assessment & Dataset Project (Klein Tank et al., 2002; Klok and Klein Tank, 2009). We consider the continuous period from 1926 to 2000, and for every year and season (winter: December, January and February; spring: March, April, and May; summer: June, July, and August; fall: September, October, and November), the daily data are summed to provide seasonally accumulated rainfall values. We use as covariates the Southern Oscillation Index (SOI; Trenberth, 1984; Ropelewski and Jones, 1987) and the North Atlantic Oscillation (NAO; e.g. Hurrell, 1995; Hurrell and Van Loon, 1997; Jones et al., 1997). NAO is defined as the normalized pressure difference between Gibraltar and SW Iceland, while SOI is defined as the normalized pressure difference between Tahiti and Darwin. We have downloaded the time series of these predictors from the Global Climate Observing System (GCOS) Working Group on Surface Pressure (WG-SP) website (

2.1. Generalized additive models in location, scale and shape

We use the Generalized Additive Models in Location, Scale and Shape (GAMLSS; Rigby and Stasinopoulos, 2005) as the modelling framework. The advantage of GAMLSS compared to the more classical Generalized Additive Models (Hastie and Tibshirani, 1990), Generalized Linear Models (McCullagh and Nelder, 1989; Dobson, 2001), Generalized Linear Mixed Models (McCulloch, 1997, 2003) or Generalized Additive Mixed Models (Fahrmeir and Lang, 2001) is that the variable of interest does not have to follow a distribution from the exponential family (which includes, e.g., the Gaussian and gamma distributions). This higher degree of flexibility allows considering distributions that are highly skewed and/or kurtotic, continuous or discrete. GAMLSS have been successfully used to model different hydrometeorological variables, including rainfall and flood peaks (Villarini et al., 2009a, 2009b, 2010, 2011; Villarini and Vecchi, 2011; Karambiri et al., 2011; Serinaldi and Cuomo, 2011).

We provide a brief overview of GAMLSS. For a more comprehensive discussion, the interested reader is pointed to Rigby and Stasinopoulos (2005) and Stasinopoulos and Rigby (2007). Let $Y$ be the predictand, with cumulative distribution function $F_Y(y_i; \theta_i)$, where $y_i$, for $i = 1, \ldots, n$ (in our case, $y_i$ represents the accumulated rainfall for the season of interest for the $i$th year) are assumed to be independent observations and $\theta_i^T = (\theta_{i1}, \ldots, \theta_{ip})$ is a vector of $p$ distribution parameters (the number of parameters $p$ is usually smaller than or equal to four). There are several different possible models relating the parameters of the selected distribution to the predictors. In this study, we focus on the semi-parametric additive model formulation. Given an $n$ length vector of the predictand $y^T = (y_1, \ldots, y_n)$, we represent with $g_k(\cdot)$ for $k = 1, \ldots, p$ the monotonic link functions which relate the distribution parameters to predictors through:

$$g_k(\theta_k) = \eta_k = X_k \beta_k + \sum_{j=1}^{J_k} h_{jk}(x_{jk}) \qquad (1)$$

where $\theta_k$ and $\eta_k$ are vectors of length $n$, $\beta_k^T = (\beta_{1k}, \ldots, \beta_{J_k k})$ is a parameter vector of length $J_k$, $X_k$ is a known design matrix of order $n \times J_k$, and $h_{jk}$ are smoothing terms (e.g. splines, local linear smoothers) of the covariates $x_{jk}$, allowing a more flexible description of the dependence of the parameters of the selected distribution on the covariates. For a more detailed description of the theory behind GAMLSS, consult Rigby and Stasinopoulos (2005).
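As a minimal illustration of how a link function maps a linear predictor onto a distribution parameter, the sketch below evaluates only the parametric part of Equation (1) (no smoothing terms) with a logarithmic link on both µ and σ. The covariate values, coefficients and function names are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
import math

def linear_predictor(design_row, beta):
    """eta = x^T beta for one observation (parametric part of Eq. 1 only)."""
    return sum(x * b for x, b in zip(design_row, beta))

def inverse_log_link(eta):
    """Inverse of g(theta) = log(theta); guarantees theta > 0."""
    return math.exp(eta)

# Hypothetical design rows: intercept, NAO index, SOI index.
X = [
    [1.0, -0.8, 0.3],
    [1.0, 0.0, 0.0],
    [1.0, 1.2, -0.5],
]
beta_mu = [4.6, -0.10, 0.05]     # hypothetical coefficients for log(mu)
beta_sigma = [-1.0, 0.20, 0.0]   # hypothetical coefficients for log(sigma)

mu = [inverse_log_link(linear_predictor(row, beta_mu)) for row in X]
sigma = [inverse_log_link(linear_predictor(row, beta_sigma)) for row in X]

for m, s in zip(mu, sigma):
    print(f"mu = {m:7.2f}, sigma = {s:5.3f}")
```

The log link keeps both parameters positive for any covariate value, which is why it is a natural choice for distributions defined on positive rainfall amounts.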

The selection of the significant predictors as well as their functional dependence on the parameters of the distribution is performed by penalizing more complex models with respect to both the Akaike Information Criterion (AIC; Akaike, 1974) and the Schwarz Bayesian Criterion (SBC; Schwarz, 1978). Penalizing with respect to SBC generally results in a more parsimonious model than penalizing with respect to AIC. A stepwise procedure is employed to select the final models, using AIC or SBC as selection criteria. The stepwise procedure relies on the chi-square likelihood ratio test for the assessment of the significance of the improvement provided by nested models (Dobson, 2001). Because selection with respect to AIC or SBC does not provide information about the quality of the fit (Hipel, 1981), we use several different diagnostic tools to assess whether the model residuals are independent and Gaussian distributed. This property would indicate that the model is able to describe the systematic information, while the remaining information is random noise. We check the properties of the residuals by computing their first four moments, their Filliben correlation coefficient (Filliben, 1975), and by visual examination of residual plots, in particular qq plots and worm plots (van Buuren and Fredriks, 2001). Worm plots are detrended forms of qq plots, in which a flat worm supports the choice of the selected distribution. For more information about model selection and fitting, the interested reader is pointed to Stasinopoulos and Rigby (2007). All the calculations are performed in R (R Development Core Team, 2008), using the freely available gamlss package (Stasinopoulos and Rigby, 2007).
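The different behaviour of the two penalty criteria can be sketched with a few lines of Python. Both criteria have the form −2 log L + penalty × df, with a penalty of 2 for AIC and log(n) for SBC. The log-likelihoods and degrees of freedom below are hypothetical numbers, not results from this study, and the function names are illustrative.

```python
import math

def gaic(log_likelihood, df, penalty):
    """Generalized information criterion: -2 log L + penalty * df."""
    return -2.0 * log_likelihood + penalty * df

def aic(log_likelihood, df):
    """Akaike Information Criterion (penalty of 2 per degree of freedom)."""
    return gaic(log_likelihood, df, 2.0)

def sbc(log_likelihood, df, n):
    """Schwarz Bayesian Criterion (penalty of log(n) per degree of freedom)."""
    return gaic(log_likelihood, df, math.log(n))

# Hypothetical candidates: (maximized log-likelihood, degrees of freedom).
n_obs = 75  # e.g. one seasonal value per year over 1926-2000
candidates = {
    "simple": (-400.0, 3),
    "complex": (-395.0, 6),
}

best_aic = min(candidates, key=lambda k: aic(*candidates[k]))
best_sbc = min(candidates, key=lambda k: sbc(*candidates[k], n_obs))
print("AIC selects:", best_aic)
print("SBC selects:", best_sbc)
```

With these numbers the extra three degrees of freedom are worth their cost under AIC but not under SBC, because log(75) is larger than 2.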

2.2. Statistical distributions and covariates

We consider two formulations: one in which each seasonal time series is modelled independently of the others (we refer to these as ‘seasonal models’), and one in which we model the seasonal records as a single continuous time series, in which a factor is introduced to discriminate among seasons (we refer to these as ‘global models’). Factor variables are categorical variables that, in the global model, denote the four seasons. The factorization allows focusing on the seasonality without introducing additional hypotheses on the patterns of the seasonal means and variances. In this way, each observation is labelled according to the season, the time series is implicitly partitioned in four seasons, but the fitting is performed on the whole sample. The empirical results will indicate whether it is better to describe the seasonal variability using a single statistical model, or whether the use of four different models would improve the description of the seasonal variability of rainfall for this station.
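One common way to introduce a categorical season factor into a design matrix is dummy (treatment) coding, sketched below. This is an assumption about how such a factor can be encoded, not a description of the internals of the software used in this study; the records, the choice of winter as the reference level and the function names are hypothetical.

```python
SEASONS = ["winter", "spring", "summer", "fall"]

def season_dummies(season, reference="winter"):
    """Treatment coding: one 0/1 column per non-reference season."""
    if season not in SEASONS:
        raise ValueError(f"unknown season: {season}")
    return [1.0 if season == s else 0.0 for s in SEASONS if s != reference]

def design_row(season, covariates):
    """Intercept, season dummies, then the continuous covariates."""
    return [1.0] + season_dummies(season) + list(covariates)

# Hypothetical records: (year, season, NAO index two months before the season).
records = [
    (1990, "winter", 0.4),
    (1990, "spring", -1.1),
    (1990, "summer", 0.2),
    (1990, "fall", 0.9),
]

X = [design_row(season, [nao]) for _, season, nao in records]
for row in X:
    print(row)
```

Each observation keeps its season label, so a single fit over the whole record can still estimate a separate seasonal offset for each parameter.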

For the seasonal models, we focus on five two-parameter distributions: gamma, Gumbel, inverse Gaussian, lognormal, and Weibull (Krishnamoorthy, 2006; Table I). These distributions allow modelling changes in magnitude and variability exhibited by the time series. Following Serinaldi (2011), we use the four-parameter Johnson distribution (Johnson, 1949) for the global model. The Johnson distribution has the following probability density function:

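$$f_Y(y \mid \mu, \sigma, \nu, \tau) = \frac{\tau}{c\sigma}\,\frac{1}{\sqrt{z^2 + 1}}\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{r^2}{2}\right) \qquad (2)$$

with y∈(−∞, ∞), the location parameter µ∈ (−∞, ∞), the scale parameter σ> 0, and the shape parameters ν∈(−∞, ∞) (it controls the skewness of the distribution, which is positive if ν> 0 and negative if ν< 0) and τ> 0 (it determines the kurtosis of the distribution, approaching a Gaussian distribution as τ→∞). Moreover:

$$r = -\nu + \tau \sinh^{-1}(z) \qquad (3)$$
$$z = \frac{y - \left(\mu + c\,\sigma\sqrt{w}\,\sinh\Omega\right)}{c\,\sigma} \qquad (4)$$
$$c = \left[\tfrac{1}{2}(w - 1)\left(w\cosh 2\Omega + 1\right)\right]^{-1/2} \qquad (5)$$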

with w = exp(1/τ²) and Ω = − ν/τ, and Z is a standard normal random variable. The first two moments of this distribution are E[Y] = µ and Var[Y] = σ², respectively. For the sake of parsimony and simplicity, only the µ and σ parameters are assumed to depend on the covariates, while the shape parameters ν and τ are held constant (Serinaldi, 2011).
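The statement that the first two moments are µ and σ² can be checked numerically. The sketch below assumes the reparameterized Johnson SU density in the form documented for the JSU family of the gamlss package, with w = exp(1/τ²) and Ω = −ν/τ as defined above; the auxiliary symbols c, z and r, the parameter values and the integration settings are assumptions made for this illustration.

```python
import math

def jsu_pdf(y, mu, sigma, nu, tau):
    """Johnson SU density, reparameterized so that E[Y] = mu and Var[Y] = sigma^2."""
    w = math.exp(1.0 / tau**2)
    omega = -nu / tau
    c = (0.5 * (w - 1.0) * (w * math.cosh(2.0 * omega) + 1.0)) ** -0.5
    z = (y - (mu + c * sigma * math.sqrt(w) * math.sinh(omega))) / (c * sigma)
    r = -nu + tau * math.asinh(z)
    jacobian = tau / (c * sigma * math.sqrt(z * z + 1.0))
    return jacobian * math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)

def numerical_moments(pdf, lo, hi, step):
    """Trapezoidal estimates of total probability, mean and variance."""
    n = int(round((hi - lo) / step))
    m0 = m1 = m2 = 0.0
    for i in range(n + 1):
        y = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        f = weight * pdf(y) * step
        m0 += f
        m1 += f * y
        m2 += f * y * y
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean**2

# Hypothetical parameter values (not fitted values from this study).
mu, sigma, nu, tau = 100.0, 30.0, 0.5, 2.0
mass, mean, var = numerical_moments(
    lambda y: jsu_pdf(y, mu, sigma, nu, tau), -2900.0, 3100.0, 0.05
)
print(f"mass = {mass:.6f}, mean = {mean:.4f}, variance = {var:.3f}")
```

The density is defined on the whole real line, so the integration range extends to negative values even though rainfall itself is non-negative.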

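Table I. Summary of the five two-parameter distributions considered in this study for the seasonal model

| Distribution | Probability density function | Support and parameters | Distribution moments |
|---|---|---|---|
| Gumbel | $f_Y(y \mid \mu, \sigma) = \frac{1}{\sigma}\exp\left\{-\frac{y-\mu}{\sigma} - \exp\left[-\frac{y-\mu}{\sigma}\right]\right\}$ | −∞ < y < ∞, −∞ < µ < ∞, σ > 0 | E[Y] = µ + γσ ≈ µ + 0.57722σ; Var[Y] = π²σ²/6 ≈ 1.64493σ² |
| Weibull | $f_Y(y \mid \mu, \sigma) = \frac{\sigma y^{\sigma-1}}{\mu^{\sigma}}\exp\left[-\left(\frac{y}{\mu}\right)^{\sigma}\right]$ | y > 0, µ > 0, σ > 0 | E[Y] = µΓ(1/σ + 1); Var[Y] = µ²{Γ(2/σ + 1) − [Γ(1/σ + 1)]²} |
| Gamma | $f_Y(y \mid \mu, \sigma) = \frac{1}{(\sigma^2\mu)^{1/\sigma^2}}\,\frac{y^{1/\sigma^2 - 1}\exp\left[-y/(\sigma^2\mu)\right]}{\Gamma(1/\sigma^2)}$ | y > 0, µ > 0, σ > 0 | E[Y] = µ; Var[Y] = σ²µ² |
| Inverse Gaussian | $f_Y(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2 y^3}}\exp\left[-\frac{(y-\mu)^2}{2\mu^2\sigma^2 y}\right]$ | y > 0, µ > 0, σ > 0 | E[Y] = µ; Var[Y] = σ²µ³ |
| Lognormal | $f_Y(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{1}{y}\exp\left\{-\frac{[\log(y)-\mu]^2}{2\sigma^2}\right\}$ | y > 0, µ > 0, σ > 0 | E[Y] = ω^{1/2}e^{µ}; Var[Y] = ω(ω − 1)e^{2µ}, where ω = exp(σ²) |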
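For readers who want to convert a (µ, σ) pair into a mean and variance, the moment expressions for the five distributions can be collected in a small helper. This is a sketch under assumptions: the Gumbel, gamma, inverse Gaussian and lognormal expressions follow the moments listed in Table I, while the Weibull expressions assume the parameterization in which µ is the scale and σ the shape. The function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def moments(dist, mu, sigma):
    """Return (mean, variance) for the assumed two-parameter forms."""
    if dist == "gumbel":
        return mu + EULER_GAMMA * sigma, math.pi**2 * sigma**2 / 6.0
    if dist == "weibull":  # assumed: mu is the scale, sigma is the shape
        g1 = math.gamma(1.0 / sigma + 1.0)
        g2 = math.gamma(2.0 / sigma + 1.0)
        return mu * g1, mu**2 * (g2 - g1**2)
    if dist == "gamma":
        return mu, sigma**2 * mu**2
    if dist == "inverse_gaussian":
        return mu, sigma**2 * mu**3
    if dist == "lognormal":
        omega = math.exp(sigma**2)
        mean = math.sqrt(omega) * math.exp(mu)
        return mean, omega * (omega - 1.0) * math.exp(2.0 * mu)
    raise ValueError(f"unknown distribution: {dist}")

# Hypothetical parameter values, for illustration only.
for name, (m, s) in {
    "gumbel": (100.0, 30.0),
    "weibull": (120.0, 2.5),
    "gamma": (100.0, 0.3),
    "inverse_gaussian": (100.0, 0.03),
    "lognormal": (4.5, 0.3),
}.items():
    mean, var = moments(name, m, s)
    print(f"{name:17s} mean = {mean:8.2f}  variance = {var:9.2f}")
```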

We use as covariates SOI (Trenberth, 1984; Ropelewski and Jones, 1987) and NAO (Hurrell, 1995; Hurrell and Van Loon, 1997; Jones et al., 1997) because numerous studies found that various circulation patterns (some included the two indices) are important descriptors of rainfall variability in this area (Fraedrich and Müller, 1992; Fraedrich, 1994; Busuioc and von Storch, 1996; Van Oldenborgh et al., 2000; Mares et al., 2002; Lloyd-Hughes and Saunders, 2002; Haylock and Goodess, 2004; Stefan et al., 2004; Tomozeiu et al., 2005; Brönnimann et al., 2007; Scaife et al., 2008; Niedźwiedź et al., 2009). For a given season, we also include the seasonal rainfall measured the previous year as additional covariate. As mentioned before, for the global model we use factors to distinguish among seasons. It is worth noting that the Bucharest-Filaret station is located in an urban area. Besides the large-scale influence, rainfall amounts are also influenced by urbanization (Busuioc and von Storch, 2003). The latter influence is not directly included in the statistical model at the present time.

Because we are interested in developing a seasonal forecast model, we use only information that would be available 1–3 months prior to the beginning of the season we want to forecast. We indicate these predictors with a subscript ‘− 1’, ‘− 2’ or ‘− 3’ (e.g. for winter forecast, NAO−1, NAO−2 and NAO−3 represent the NAO values in November, October and September, respectively). We also indicate with R−1 the seasonal rainfall value measured the previous year (e.g. if we forecast the winter season for 1995, R−1 represents the winter seasonal rainfall value for 1994). We perform model selection for both the seasonal and global models, and use both AIC and SBC as penalty criteria.
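The lag convention can be made explicit with a short helper that returns the calendar months whose index values feed a given seasonal forecast. The function names and the toy monthly values below are hypothetical; only the convention itself (lags counted back from the first month of the season) comes from the text.

```python
def lagged_months(start_year, start_month, max_lag=3):
    """(year, month) pairs for the 1..max_lag months before a season starts."""
    pairs = []
    for lag in range(1, max_lag + 1):
        index = start_year * 12 + (start_month - 1) - lag
        pairs.append((index // 12, index % 12 + 1))
    return pairs

def lagged_covariates(monthly_index, start_year, start_month, name):
    """Map e.g. 'NAO-1', 'NAO-2', 'NAO-3' to the corresponding monthly values."""
    months = lagged_months(start_year, start_month)
    return {f"{name}-{k}": monthly_index[ym] for k, ym in enumerate(months, start=1)}

# Toy monthly NAO values keyed by (year, month); not observed data.
nao = {(1994, 9): -0.3, (1994, 10): 0.7, (1994, 11): 1.1}

# Winter season starting in December 1994.
print(lagged_months(1994, 12))
print(lagged_covariates(nao, 1994, 12, "NAO"))
```

For a winter season starting in December, the three lags map to November, October and September, as in the example given in the text; for spring the third lag falls in December of the previous calendar year.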

2.3. Seasonal forecasts

The model selection procedure described in Section 2.2 provides the structure of the final models, both in terms of covariates as well as their functional dependence on the parameters of the distribution. The values of the coefficients computed from the entire record, however, do not allow a direct assessment of the forecast performance of our models. For this reason, we start by using the data from 1926 to 1985 to compute the models' coefficients, and use these models to forecast the seasonal values for 1986. We then add the information for 1986, and estimate the models' coefficients from the data for the period 1926–1986. We then use these models to forecast the values for 1987. We repeat these steps for every year up until 1999, in which we fit the models over the period 1926–1999, and use these models to forecast the value for 2000 (the last year in the record). This is similar to the ‘retroactive validation’ described in Mason and Baddour (2007). This approach is different from the classic ‘calibration-validation,’ and the forecasted values over the validation period are not homogeneous because they are not obtained from the model fitted over one fixed interval. This approach, however, uses all the new additional data available from year to year, and was also used in other disciplines, such as econometrics (Weron, 2006).
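The refit-and-forecast loop described above can be sketched as follows. The climatological median used here is only a stand-in for the GAMLSS fit, and the rainfall series is synthetic; the function names are hypothetical.

```python
import statistics

def expanding_window_forecasts(years, values, first_target, fit, predict):
    """Refit on all data before each target year, then forecast that year."""
    forecasts = {}
    for target in years:
        if target < first_target:
            continue
        train = [(y, v) for y, v in zip(years, values) if y < target]
        forecasts[target] = predict(fit(train), target)
    return forecasts

def fit_median(train):
    """Stand-in for a model fit: store the median of the training values."""
    return statistics.median(v for _, v in train)

def predict_median(model, target_year):
    """Stand-in forecast: return the stored median for any target year."""
    return model

years = list(range(1926, 2001))
values = [100.0 + 10.0 * (y % 7) for y in years]  # synthetic series, not observations

forecasts = expanding_window_forecasts(years, values, 1986, fit_median, predict_median)
print(len(forecasts), min(forecasts), max(forecasts))
```

The training window grows by one year at each step, so the forecast for 1986 uses 1926–1985 and the forecast for 2000 uses 1926–1999.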

We adopt different approaches to assess the forecast quality, based on several measures of accuracy as well as on visual examination of the probabilistic forecasts. The accuracy of the forecast is quantified using nine indices and using the median of the forecast as ‘best estimate’ (Table II). Each of these measures has strengths and weaknesses (Jachner et al., 2007; Dawson et al., 2007; Reusser et al., 2009), and the use of more than a single index is recommended to highlight the model's performance for different applications (Dawson et al., 2007; Villarini et al., 2008; Napolitano et al., 2011). Following Hyndman and Koehler (2006), we can group the nine indices into two main categories:

  1. Absolute indices: mean error (ME), mean absolute error (MAE), median absolute error (MdAE) and root mean squared error (RMSE) quantify the differences between observed and forecasted values in the original units. Even though these four indices belong to the same broad group, they highlight different types of discrepancies between the observed and forecasted values. ME can help identify possible biases, with small and large differences receiving the same weight. Unlike ME, MAE, MdAE and RMSE are non-negative indices, with RMSE penalizing large discrepancies more heavily than the other two. Compared to MAE, MdAE is less sensitive to skewed error distributions.
  2. Relative errors: mean percentage error (MPE), mean absolute percentage error (MAPE), median absolute percentage error (MdAPE) and root mean squared percentage error (RMSPE) provide a measure of the error relative to a reference value, rather than an absolute value in the original units. They are dimensionless and highlight how a difference of 1, for instance, has a much larger impact if the observed value is 1 rather than 100 (Dawson et al., 2007). These are the ‘relative’ counterparts of the ‘absolute’ indices described above.

We also include the geometric reliability index (GRI; Leggett and Williams, 1981). This index provides information in a geometric sense, by measuring the accuracy of the forecasted value within a multiplicative factor. For a given GRI value, a possible interpretation of this index is that the observed value falls between 1/GRI and GRI times the corresponding forecasted value (Jachner et al., 2007). These nine indices complement the visual examination of the time series of probabilistic forecasts over the period 1986–2000 (Laio and Tamea, 2007).
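The nine indices can be computed from paired observations and point forecasts as sketched below. The sign convention (observed minus forecast), the scaling of the relative errors by 100 and the GRI written as (1 + a)/(1 − a), with a the root mean square of (ŷ − y)/(ŷ + y), are assumptions based on the usual definitions of these measures; the input values are hypothetical.

```python
import math
import statistics

def accuracy_indices(obs, fc):
    """Point-forecast accuracy measures; obs must be non-zero and obs + fc positive."""
    n = len(obs)
    err = [o - f for o, f in zip(obs, fc)]            # assumed sign: observed - forecast
    pct = [100.0 * e / o for e, o in zip(err, obs)]   # error as a percentage of observed
    a = math.sqrt(sum(((f - o) / (f + o)) ** 2 for o, f in zip(obs, fc)) / n)
    return {
        "ME": sum(err) / n,
        "MAE": sum(abs(e) for e in err) / n,
        "MdAE": statistics.median(abs(e) for e in err),
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
        "MdAPE": statistics.median(abs(p) for p in pct),
        "RMSPE": math.sqrt(sum(p * p for p in pct) / n),
        "GRI": (1.0 + a) / (1.0 - a),
    }

observed = [100.0, 200.0]   # hypothetical seasonal totals (mm)
forecast = [110.0, 180.0]   # hypothetical forecast medians (mm)

scores = accuracy_indices(observed, forecast)
for name, value in scores.items():
    print(f"{name:6s} {value:8.3f}")
```

In this toy case the two percentage errors cancel, so MPE is zero while MAPE is not, which illustrates why signed and absolute indices are reported together.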

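Table II. Summary of the measures of performance used in the study

| Index | Formula | LB | UB | No error |
|---|---|---|---|---|
| ME | $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$ | −Inf | Inf | 0 |
| MAE | $\frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$ | 0 | Inf | 0 |
| RMSE | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ | 0 | Inf | 0 |
| MPE | $\frac{100}{n}\sum_{i=1}^{n}\frac{y_i - \hat{y}_i}{y_i}$ | −Inf | Inf | 0 |
| MAPE | $\frac{100}{n}\sum_{i=1}^{n}\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert$ | 0 | Inf | 0 |
| MdAPE | median $\left(100\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert\right)$ | 0 | Inf | 0 |
| RMSPE | $100\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^2}$ | 0 | Inf | 0 |
| GRI | $\frac{1 + a}{1 - a}$ where $a = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{y}_i - y_i}{\hat{y}_i + y_i}\right)^2}$ | 1 | Inf | 1 |

Note: Their lower and upper bounds (LB and UB), and the values corresponding to perfect agreement (No error) are also included. y and ŷ denote the observed series and the model forecast, respectively.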

3. Results

3.1. Model selection

In this section, we identify the most appropriate covariates and the form of the dependence of the model parameters on them (linear or by means of cubic splines). We use two different penalty criteria (AIC and SBC) and a stepwise method for variable selection.

Let us start with the global model. Depending on the penalty criterion, we obtain two different models (Table III). When penalizing with respect to AIC, the location parameter µ depends on the season (used as a factor to distinguish among the four seasons), on NAO−1 and NAO−3 via cubic splines, and linearly on NAO−2 and SOI−2. The scale parameter σ depends linearly (via a logarithmic link function) on NAO−1, NAO−3, SOI−1 and R−1 (the shape parameters are kept constant). As shown in Figure 1 (top panel), this model describes well the intra- and inter-annual variability exhibited by the data. Moreover, fit diagnostics (Figure 2, top panel) are quite satisfactory, supporting our model selection.

Figure 1.

Global modelling of seasonal rainfall, using AIC (top panel) and SBC (bottom panel) as penalty criterion for model selection. The white line represents the median (50th percentile), the dark grey region the area between the 25th and 75th percentiles, while the light grey region the area between the 5th and 95th percentiles. The black circles represent the observed seasonal values

Figure 2.

Worm plots for the two models in Figure 1. For a good fit, the points should be on the black line and between the two grey lines
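The two residual diagnostics used here, the worm plot and the Filliben correlation coefficient, can be sketched in a few lines. The plotting positions below are Filliben's order-statistic medians, which is an assumption for this illustration (plotting software may use different positions); the residual samples are synthetic.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at Filliben's order-statistic medians (n >= 2)."""
    probs = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    probs[-1] = 0.5 ** (1.0 / n)
    probs[0] = 1.0 - probs[-1]
    return [NormalDist().inv_cdf(p) for p in probs]

def worm_points(residuals):
    """Detrended qq plot: (theoretical quantile, ordered residual - quantile)."""
    ordered = sorted(residuals)
    return [(q, r - q) for q, r in zip(normal_scores(len(ordered)), ordered)]

def filliben_r(residuals):
    """Correlation between ordered residuals and the normal scores."""
    ordered = sorted(residuals)
    scores = normal_scores(len(ordered))
    mean_r = sum(ordered) / len(ordered)
    mean_s = sum(scores) / len(scores)
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(ordered, scores))
    var_r = sum((r - mean_r) ** 2 for r in ordered)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / math.sqrt(var_r * var_s)

ideal = normal_scores(75)                   # residuals that match the Gaussian exactly
skewed = [math.exp(q) for q in ideal]       # strongly skewed synthetic residuals

print(f"ideal:  r = {filliben_r(ideal):.4f}")
print(f"skewed: r = {filliben_r(skewed):.4f}")
print("largest worm deviation (ideal):", max(abs(d) for _, d in worm_points(ideal)))
```

A flat worm (deviations near zero) and a coefficient close to one both support the Gaussian assumption for the residuals, while skewed residuals bend the worm and lower the coefficient.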

Table III. Summary of the distributions and dependencies of their location and scale parameters on the predictors

| Model | Distribution | Parameter µ | Parameter σ |
|---|---|---|---|
| Winter (AIC) | Weibull | cs(NAO−1)** , cs(SOI−3)** , R−1* | NAO−1 |
| Winter (SBC) | Weibull | NAO−1** , SOI−3** , R−1* | |
| Spring (AIC) | Weibull | cs(NAO−1)** | SOI−1** |
| Spring (SBC) | Weibull | | SOI−1** |
| Summer (AIC) | Gamma | SOI−1** | NAO−2, R−1 |
| Summer (SBC) | Gamma | SOI−1** | |
| Fall (AIC) | Gamma | NAO−1** | SOI−2** |
| Fall (SBC) | Gamma | NAO−1** | SOI−2** |
| Global (AIC) | Johnson | Season** , cs(NAO−1)** , NAO−2** , cs(NAO−3)** , SOI−2** | NAO−1** , NAO−3** , SOI−1** , R−1** |
| Global (SBC) | Johnson | Season** , NAO−2** | R−1** |

Note: A linear dependence is assumed, while ‘cs(·)’ indicates a cubic spline. The symbol ‘**’ (‘*’) indicates that the value of the coefficient is statistically different from zero at the 5% (10%) significance level.

When using SBC as penalty criterion, µ depends only on the factor for season and NAO−2, while log(σ) depends linearly on R−1 (Table III). In this second case, the model is much more parsimonious both in terms of number of predictors, as well as their dependence on the model's parameters, because SBC applies a larger penalty for additional degrees of freedom used for the fit. This model is still able to capture the seasonal changes in rainfall, even though there is a less marked inter-annual variability (Figure 1, bottom panel). The diagnostics highlight the very good fit of the model (Figure 2, bottom panel).

In addition to the global models, we have also developed statistical models for each individual season. For a given season and penalty criterion, we have used a stepwise method for covariate selection, resulting in five different final models (one per distribution in Table I). Out of these five models, we have then selected the one with the lowest value of the penalty criterion. The Weibull distribution is selected for winter and spring rainfall, and the gamma distribution for summer and fall, regardless of the penalty criterion (Table III). While the distributions are the same for both AIC and SBC, the covariates are not (with the exception of fall). For winter and using AIC, log(µ) depends linearly on R−1 and by means of a cubic spline on NAO−1 and SOI−3, while log(σ) depends linearly on NAO−1. This model describes well the variability in the data, and the alternation of periods with higher and lower rainfall (Figure 3, top-left panel). Compared to the results from the global model (Figure 3, top-right panel), it describes the observed pattern better, with a narrower conditional distribution. When penalizing with respect to SBC, log(µ) depends linearly on R−1, NAO−1 and SOI−3, with constant σ. In this case, we still capture the inter-annual variability, but the conditional distribution is wider (Figure 4, top-left panel), as for the global winter model (Figure 4, top-right panel).

Figure 3.

Seasonal modelling of rainfall using AIC as penalty criterion for model selection (left panels). The white line represents the median (50th percentile), the dark grey region the area between the 25th and 75th percentiles, while the light grey region the area between the 5th and 95th percentiles. The modelling results for each individual season from the global model are shown for comparison in the right panels. The black circles represent the observed seasonal values

Figure 4.

Same as Figure 3, but using SBC as penalty criterion

The seasonal spring model (penalized with respect to AIC) has µ depending on NAO−1 by means of a cubic spline, and log(σ) linearly on SOI−1. This model reproduces the variability in an average sense, but not the large excursions from the median (Figure 3). On the other hand, the spring global model better describes both the changes in magnitude and variability, possibly because of the information provided by the seasonal factors. When penalizing with respect to SBC, there is only a linear dependence of log(σ) on SOI−1, making this model even less able to describe the temporal changes (Figure 4). The corresponding global model exhibits more inter-annual variability, even though the 90% limits of the conditional distributions range between 50 and 250 mm independently of the year. The µ parameter of the summer seasonal model (penalized with respect to AIC) depends linearly on SOI−1, and σ on NAO−2 and R−1 (in both cases, we use a logarithmic link function). The conditional distribution is wide, and this is a feature shared by both the seasonal and global models (Figure 3, third row). The seasonal model penalized with respect to SBC has only log(µ) linearly dependent on SOI−1, and the results are similar to those observed for the AIC-penalized model (Figure 4, third row). The fall seasonal model is the same for both penalty criteria. The parameter µ (σ) depends linearly on NAO−1 (SOI−2) via a logarithmic link function. This model describes the variability in the data reasonably well (Figures 3 and 4, bottom panel), with a performance similar to the one obtained from the global model. The analyses of the residuals for all these models support our selections (Figure 5).

Figure 5.

Worm plots for the seasonal models based on AIC (left panels; Figure 3) and SBC (right panels; Figure 4)

Among the different seasons, the best visual agreement between models and observations is for winter. This behaviour is likely related to the different seasonal rainfall generating mechanisms and their relation to NAO and SOI. The winter season is characterized by large-scale weather patterns that are better captured by the climate indices, in particular NAO. On the other hand, summertime precipitation is associated with convective activity (Paraschivescu et al., 2011), which is more spatially localized and less directly related to NAO or SOI. Moreover, different covariates are retained depending on the season, with NAO mostly related to fall–winter rainfall and SOI to spring–summer rainfall (see also Busuioc and von Storch, 1996; Van Oldenborgh et al., 2000; Mares et al., 2002; Lloyd-Hughes and Saunders, 2002; Tomozeiu et al., 2005).

The comparison between global and seasonal models does not suggest that the latter performs better than the former, despite the fact that we have developed an ad hoc model for each season. On the contrary, the global model describes better the observations for spring, summer and fall. These results suggest that it would be possible to use a single model for the whole year, rather than running four different seasonal models. Finally, a model penalized with respect to AIC performs better than one based on SBC, and better captures the inter-annual variability exhibited by the data.

3.2. Seasonal forecast

We use the model structure derived from the entire period (Table III), but then refit the model's parameters in a retrospective forecast mode for every year when new information becomes available. For each season, we have four different statistical models (global and seasonal models, and for each of them the AIC and SBC are used as penalty criteria). The forecast accuracy (using the median as reference) is assessed using the nine indices in Table II. Moreover, we use as naïve forecast the rainfall value for the previous year (one-year persistence). Because of the nature of this system, we not only have the point forecast (the median) but the entire distribution, which provides information about our confidence in the forecast.
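To illustrate how a forecast distribution yields both a point forecast and uncertainty bands, the sketch below computes the percentiles displayed in the figures (5th, 25th, 50th, 75th and 95th) for a Weibull distribution, assuming the parameterization in which µ is the scale and σ the shape, so that F(y) = 1 − exp[−(y/µ)^σ]. The parameter values and the previous-year rainfall are hypothetical.

```python
import math

def weibull_quantile(p, mu, sigma):
    """Quantile of a Weibull with scale mu and shape sigma (assumed parameterization)."""
    return mu * (-math.log(1.0 - p)) ** (1.0 / sigma)

def weibull_cdf(y, mu, sigma):
    """Cumulative distribution function F(y) = 1 - exp(-(y / mu)^sigma)."""
    return 1.0 - math.exp(-((y / mu) ** sigma))

def forecast_summary(mu, sigma, probs=(0.05, 0.25, 0.50, 0.75, 0.95)):
    """Percentiles of the forecast distribution, keyed by probability."""
    return {p: weibull_quantile(p, mu, sigma) for p in probs}

# Hypothetical forecast parameters (mm) and previous-year rainfall (mm).
summary = forecast_summary(mu=120.0, sigma=2.5)
persistence_forecast = 95.0   # naive forecast: last year's value for the same season

print(f"median forecast: {summary[0.50]:.1f} mm")
print(f"50% interval: {summary[0.25]:.1f} to {summary[0.75]:.1f} mm")
print(f"90% interval: {summary[0.05]:.1f} to {summary[0.95]:.1f} mm")
print(f"naive (persistence) forecast: {persistence_forecast:.1f} mm")
```

The persistence forecast is a single number, whereas the statistical model provides an interval whose width expresses the confidence in the forecast.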

Let us start by examining the time series of the forecast distributions for the models based on AIC (Figure 6). The winter forecasts are the ones that most closely follow the observations. This statement is valid for both the central tendency and the spread of the distribution. The seasonal and global models perform comparably well in forecast mode, with the former presenting slightly tighter forecast distributions. The forecast performance for the other seasons is not as good as for winter. More specifically, while the median tracks the variability in the data, the confidence intervals of the forecast distributions do not. This is particularly true for spring and summer. Compared to the seasonal models, the global model provides an overall better visual description of the observations. This is consistent with the modelling results in Figure 3.

Figure 6.

Seasonal forecast of rainfall based on the seasonal models (left panels) and the global model (right panels), and using AIC as penalty criterion. The white line represents the median (50th percentile), the dark grey region the area between the 25th and 75th percentiles, while the light grey region the area between the 5th and 95th percentiles. The hatched areas indicate the forecasted values. The black circles represent the observed seasonal values

Figure 7 summarizes the results from the models for which SBC was the penalty criterion. Compared to Figure 6, there is a visual worsening of the forecast performance. This statement is valid independently of the season and model. For winter, the largest differences with respect to the AIC-penalized models are not so much in terms of the median as in terms of the spread of the distribution, in particular for the global model. This is related to the lack of predictors for σ for the seasonal model and to the much smaller number of covariates included in the global model (Table III). The results for the other seasons are even more representative of the differences between AIC and SBC models, as exemplified by the spring forecast by the seasonal model. The global model better describes the data variability for these three seasons.

Figure 7.

Same as Figure 6, but using SBC as penalty criterion

Apart from visual examination of the forecast distributions, we use the nine indices in Table II to quantitatively assess the forecast accuracy. The results for the winter season (Figure 8) indicate that the AIC-global model performs better than any other model for almost all the indices. This is generally followed by the AIC-seasonal model. As expected, the SBC models generally perform worse than the corresponding AIC ones because of their more parsimonious nature. These models improve over the naïve forecasts, with the sole exception of ME and MPE. This is likely because naïve forecasts are simply the lag-1 backward shifted observations, so that the long-term averages of these two series are expected to be the same and the mean error to be zero. However, since the validation sample is small, in some cases the error can be large, especially for the spring and summer seasons. Moreover, since ME points out the possible bias of the point estimates, it is also expected that the models' median forecasts, which are unavoidably biased when the conditional distribution is not symmetric, can perform worse than naïve forecasts according to this index.
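The behaviour of ME for the naïve forecast can be checked with a small sketch. It assumes that the error is defined as observation minus forecast, that ME is the mean of these errors, and that MPE is the mean of the errors expressed as a percentage of the observation; the exact definitions in Table II may differ. The rainfall values are hypothetical. Under these assumptions the errors of a lag-1 forecast telescope, so that ME reduces to the difference between the last and the first observation divided by the number of forecasts.

```python
def naive_forecast(series):
    """Lag-1 naive forecast: the forecast for step t is the observation at t-1."""
    return series[:-1]

def mean_error(observed, forecast):
    """Mean of (observation - forecast); sign convention is an assumption."""
    errors = [o - f for o, f in zip(observed, forecast)]
    return sum(errors) / len(errors)

def mean_percentage_error(observed, forecast):
    """Mean of the errors as a percentage of the observation (assumed definition)."""
    ratios = [(o - f) / o for o, f in zip(observed, forecast)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical winter rainfall totals in mm for consecutive years (assumption).
winter = [110.0, 85.0, 140.0, 95.0, 120.0, 105.0]

obs = winter[1:]              # years that receive a forecast
fc = naive_forecast(winter)   # previous year's value used as the forecast

me = mean_error(obs, fc)
mpe = mean_percentage_error(obs, fc)

# Telescoping sum: only the first and last observations survive.
telescoped = (winter[-1] - winter[0]) / len(obs)
```

Because only the end points matter, ME for the naïve forecast is small whenever the first and last values of the validation period are similar, regardless of how poorly individual years are forecast. With a short validation sample the end points can still differ considerably, which is consistent with the larger errors noted for spring and summer.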

Figure 8.

Summary of the nine indices in Table II to evaluate the forecast accuracy of seasonal and global models (based on both AIC and SBC) for the winter

We obtain similar overall results for the other three seasons (Figures 9–11). In these cases, the statistical models perform much better than the corresponding naïve forecasts. The global models perform better than the corresponding seasonal ones, even though this statement is not valid for all the indices.

Figure 9.

Same as Figure 8, but for spring

Figure 10.

Same as Figure 8, but for summer

Figure 11.

Same as Figure 8, but for fall

On the basis of these results, it is not possible to identify a single ‘best’ model according to all the indices. We can provide, however, recommendations on a suite of models that performs better overall. We think that models based on AIC (rather than SBC) should be preferred. Even though the number of degrees of freedom used for the fit is larger, they perform better in terms of both accuracy and width of the forecast distribution. The AIC-global model is able to better describe the magnitude and variability exhibited by the data, while requiring a smaller number of parameters. For winter, the AIC-seasonal model is the one that combines narrower forecast distributions and good accuracy, while still describing the data well. To summarize, while the AIC-seasonal model is recommended for winter, the AIC-global model is the suggested choice across all the seasons.
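The difference in parsimony between the two criteria can be made concrete with a short sketch. Both are special cases of a penalized deviance of the form −2 log L + k · df, with k = 2 for AIC and k = log(n) for SBC, where n is the sample size. The log-likelihoods and degrees of freedom below are hypothetical, and n = 75 is an assumption corresponding to one value per year over 1926-2000 for a single season; the sample is smaller when models are refitted in retrospective forecast mode.

```python
import math

def gaic(log_likelihood, df, k):
    """Generalized AIC: -2 * log-likelihood plus a penalty of k per degree of freedom."""
    return -2.0 * log_likelihood + k * df

def aic(log_likelihood, df):
    """AIC uses a fixed penalty of 2 per degree of freedom."""
    return gaic(log_likelihood, df, 2.0)

def sbc(log_likelihood, df, n):
    """SBC uses a penalty of log(n) per degree of freedom."""
    return gaic(log_likelihood, df, math.log(n))

n_years = 75  # assumption: one seasonal value per year, 1926-2000

# Hypothetical candidate models: (log-likelihood, degrees of freedom).
candidates = {"simple": (-400.0, 3), "richer": (-395.0, 6)}

# Model with the smallest criterion value under each penalty.
best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_sbc = min(candidates, key=lambda m: sbc(*candidates[m], n_years))
```

With these illustrative numbers, AIC selects the richer model while SBC selects the simpler one, because log(75) is roughly 4.3 and therefore penalizes each additional degree of freedom more than twice as heavily as AIC.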

4. Conclusions

We have developed a set of statistical models for probabilistic seasonal forecast of rainfall for the Bucharest-Filaret (Romania) station. The main findings of this study can be summarized as follows:

  • 1. We have modelled seasonal rainfall using GAMLSS. We have proposed and developed different configurations, depending on whether we modelled the seasonal rainfall as a continuous time series or each season separately. Moreover, two penalty criteria (AIC and SBC) were used for the selection of the covariates and their functional dependence on the parameters of the distribution. Two climate indices (NAO and SOI) were used as possible predictors, together with the seasonal rainfall value for the preceding year. We used the Johnson distribution for the global model. As far as the seasonal model is concerned, the Weibull distribution was selected to model winter and fall rainfall, and the gamma distribution to model spring and summer rainfall. Strengths and weaknesses of each model are described. Different final model configurations are achieved depending on the penalty criterion. NAO is generally selected during the fall and winter seasons, and SOI during spring and summer. The season that is best modelled (both in terms of magnitude and variability) is winter, likely due to the strong link between NAO and the storm systems during this season. In general, we recommend using the global model (penalized with respect to AIC) for all the seasons, and the seasonal model (penalized with respect to AIC) for winter.
  • 2. We have quantified the forecast accuracy of these models with nine performance indices. We have used visual examination of the time series of the forecast distributions as a graphical way to examine the probabilistic forecasts. The winter forecasts are overall better than for the other seasons. While it is not possible to identify a single ‘best’ model in terms of all these measures, we recommend using the global model (penalized with respect to AIC) for seasonal forecast of rainfall. On the other hand, the seasonal model for winter (penalized with respect to AIC) provides the best overall description of the data.

In this study, we have focused on a single station to describe the utility of GAMLSS as a forecasting tool and also to present a large set of different possible model configurations. In future studies, we will expand the methodology presented in this work in different directions. One avenue is to include seasonal forecasts of other hydrometeorological variables related, for instance, to temperature and discharge. It would also be possible to consider not only seasonally aggregated or averaged quantities, but also extremes. Another topic that we will explore in the future is moving away from at-site modelling and generating seasonal forecasts at multiple locations by accounting for their multivariate dependence. The larger sample size involved in regional analyses will allow a more refined assessment of the models' performance by quantitative techniques and formal statistical tests (Laio and Tamea, 2007; Napolitano et al., 2011; Serinaldi et al., 2011).


This research was funded by the Willis Research Network. The authors acknowledge Dr Klein Tank and coauthors for making the data available as part of the European Climate Assessment (ECA) project. The authors would also like to thank Drs Stasinopoulos, Rigby and Akantziliotou for making the gamlss package (Stasinopoulos et al., 2007) freely available in R (R Development Core Team, 2008).