Understanding the effect of climatology on monthly rainfall amounts in Australia using Tweedie GLMs

Authors

  • Md. Masud Hasan (corresponding author)

    1. Faculty of Science, Health and Education, University of the Sunshine Coast, Maroochydore DC, Qld 4558, Australia
  • Peter K Dunn

    1. Faculty of Science, Health and Education, University of the Sunshine Coast, Maroochydore DC, Qld 4558, Australia

Abstract

Rainfall models are used to understand the effect of various climatological variables on rainfall amounts. The models also have potential uses in predicting and simulating rainfall. We use Tweedie generalized linear models to model monthly rainfall amounts and occurrence simultaneously with a set of predictors (sine term, cosine term, NINO 3.4, SOI and SOI phase). Models are fitted to the monthly rainfall data of 220 Australian stations with 4 stations as case studies. First, models with only sine and cosine terms (the base model) are fitted to model the cyclic pattern of rainfall data, and then one of the climatological variables at a time is added to the base model. On the basis of the BIC, the model with NINO 3.4 is preferred for most of the studied stations. Stations for which the model using the SOI is preferred appear in small clusters. Adding the climatological variables to the base model improves the fit of the model and makes substantial changes in the predicted mean monthly rainfall amount and probability of getting a dry month. The climatological variables have significant impacts on the amount of rainfall at most stations located in the eastern and northeastern regions of Australia. The models use lag-one values of the climatological covariates (i.e. the covariate value from the previous month is paired with the rainfall amount of a given month) and so are useful for one-month-lead rainfall prediction. Copyright © 2011 Royal Meteorological Society

1. Introduction

Rainfall models are useful for understanding the impact of different climatological variables on the occurrence and amount of rainfall, and have potential uses in predicting and simulating rainfall. Consequently, rainfall models are useful for modelling the growth and development of crops, developing crop simulation models, and for agricultural planning and management (Lennox et al., 2004; Shui and Haque, 2004; Hansen et al., 2009). Prediction of monthly and seasonal rainfall amounts is useful in determining supplemental irrigation, water requirements, storage of water, and for reservoir management.

The use of the simplest statistical model, the linear regression model, is not appropriate for modelling monthly rainfall amounts for Australian stations, which are generally highly right-skewed. For modelling skewed data, transformations of the rainfall totals are commonly used (Mooley, 1973; Meng et al., 2007). Sometimes, rainfall totals are converted to rainfall anomalies (Lorenzo et al., 2010) or a standardized precipitation index (Loukas and Vasiliades, 2004; Almeira and Scian, 2006; Gonzalez and Cariaga, 2009) to fit linear regression models to the converted indices. However, because of the transformation, some information regarding the rainfall data is lost.

For modelling right-skewed rainfall amounts, the assumption of a gamma distribution has been used extensively in the literature on rainfall modelling (Allan and Hann, 1975; Feuerverger, 1979; Chapman, 1998; Wilks, 1999; Chandler and Wheater, 2002), often with variations or extensions (Das, 1955; Stern and Coe, 1984; Wilks, 1990).

Another difficulty when modelling monthly rainfall is that rainfall may have both discrete (exact zero for dry months) and continuous (amount of rainfall for wet months) components. For the rainfall models used in crop simulation, it is common practice to use Markov chains to model the occurrence and gamma distributions to model the amount of rainfall (Richardson and Wright, 1984; Hamlin and Rees, 1987; Rosenberg et al., 2004; Hansen and Ines, 2005; Abtew et al., 2009). Buishand et al. (2004) used logistic regression to model occurrence and gamma distribution to model the amount of monthly rainfall. To model continuous data with exact zeros, some authors proposed mixture models between Bernoulli and gamma or lognormal distributions (Piantadosi et al., 2008; Fernandes et al., 2009; Little et al., 2009). An alternative approach was adopted by Glasbey and co-workers (Glasbey and Nevison, 1997; Durban and Glasbey, 2001), who applied a monotonic transformation of rainfall data to define a latent Gaussian variable with zero rainfall corresponding to censored values below some threshold (1.05 mm).

The rainfall models based on the Tweedie distributions (Smyth, 1996; Jørgensen, 1997; Dunn, 2004; Dunn and Smyth, 2005) model the exact zeros and the amount of rainfall simultaneously. The models can be used in the generalized linear modelling framework. Previously, the Tweedie distributions have been shown to model monthly rainfall data well when separate models are fitted for each month without predictors (Hasan and Dunn, 2010a). Cyclical patterns are likely to be evident in rainfall; for example, most locations have drier and wetter months consistently from year to year. Using Tweedie generalized linear models (GLMs), Hasan and Dunn (2010b) fitted models with sine and cosine terms as predictors to model the monthly rainfall amounts in Australia.

Apart from the cyclical patterns, climatological factors, such as the Southern Oscillation Index (SOI) and the sea surface temperature anomaly, influence monthly rainfall. The relationship between the SOI and rainfall in Australia has been well known for many years and studied by numerous authors (e.g. Troup, 1965; Quinn and Burt, 1972; Power et al., 1997; Simmonds and Hope, 1997; Chowdhury and Beecham, 2010). Correlation coefficients and linear regression (Power et al., 1997; Simmonds and Hope, 1997; Almeira and Scian, 2006) have been used to understand the relationship between SOI and amount of rainfall. Using the SOI, Stone and Auliciems (1992) constructed five SOI phases which can then be used to study the effect of the SOI on rainfall amounts. The SOI phases have proven useful for studying their effects on rainfall and on different types of cropping in Australia (e.g. Stone and McKeon, 1993; Hammer et al., 1996; Meinke et al., 1996; Stone et al., 1996b; Meinke and Ryley, 1997; Willcocks and Stone, 2000). The NINO 3.4 index is one of the El Niño Southern Oscillation indicators based on sea surface temperature and is related to the monthly rainfall amounts of Australia. Several authors used linear regression analysis to study the effect of NINO 3.4 on the rainfall amounts (Wu and Kirtman, 2007; Everingham and Reason, 2009; Lee et al., 2009). However, linear regression models are not appropriate for understanding the relationship between the climatological variables and monthly rainfall totals of Australian stations, because the rainfall distribution is not normal and the relationship is not linear, even approximately.

In this paper, Tweedie GLMs will be fitted to understand the effect of the abovementioned climatological variables on monthly rainfall amounts of Australian rainfall stations. First, the simplest model with sine and cosine terms as predictors will be fitted (the base model) as in Hasan and Dunn (2010b). Then one of the three climatological variables (NINO 3.4, SOI and SOI phase) will be added each time to the base model. Thus, in the study, we have four different models (one base model and three models with climatological covariates). The impacts of the climatological variables on monthly rainfall totals after adjusting for the sine and cosine terms will be studied. We then examine the preferred model for each of the 220 Australian stations and identify geographical regions where each climatological variable is superior for modelling monthly rainfall. Statistical methods will be used to assess the extent to which each model with a climatological variable improves the base model for modelling monthly rainfall on different regions across Australia. Values of the climatological variables from the previous month (i.e. a lag of one) will be used in the models for one month lead prediction of rainfall. We will also use the features of Tweedie models to simultaneously examine the improvement in predicted mean rainfall and predicted probability of no rainfall using each climatological variable.

We first discuss the data (Section 2), and then introduce the Tweedie distributions and their properties (Section 3). In Section 4, different models, different model comparison criteria, and test statistics are defined, and a detailed interpretation of the results is presented. The concluding comments are in Section 5.

2. Data

Monthly rainfall data from 220 Australian stations are studied (data obtained from the Australian Bureau of Meteorology), with 4 stations as case studies: Bidyadanga and Trayning, from Western Australia, Theebine from Queensland, and Clarence from Victoria (Figure 1). These stations represent a variety of types of rainfall distribution in Australia. Most of the studied stations (92%) have data available from 1910 or earlier to the end of 2007. The remaining stations are generally located in remote areas, and have rainfall records from 1950, or earlier, to the end of 2007.

Figure 1.

Location of the stations studied. The four case studies mentioned in the paper are named and indicated using squares; grey dots represent the other studied stations

NINO 3.4 is the average sea surface temperature anomaly in the region bounded by 5°N–5°S, 120°W–170°W. The region has large variability on El Niño time scales, and so the index is used by many authors to understand its impact on rainfall amounts for different parts of Australia (Wu and Kirtman, 2007; Everingham and Reason, 2009; Lee et al., 2009). The monthly NINO 3.4 data used in the study were obtained from http://www.cgd.ucar.edu/cas/catalog/climind/TNI_N34/index.html#Sec5. In the early part of the record there are considerable differences between the reconstructed and reanalysed estimates of the magnitude of El Niño events, even though the two time series are strongly correlated (Giese et al., 2010).

The SOI measures standardized fluctuations in the air pressure difference between Tahiti and Darwin (Troup, 1965). Sustained negative values of the SOI often indicate El Niño episodes. These negative values are usually accompanied by sustained warming of the central and eastern tropical Pacific Ocean, a decrease in the strength of the Pacific trade winds, and a reduction in rainfall over eastern and northern Australia. Positive values of the SOI are associated with stronger Pacific trade winds and warmer sea temperatures to the north of Australia, known as a La Niña episode. Waters in the central and eastern tropical Pacific Ocean become cooler during this time. The phases of the SOI (Stone and Auliciems, 1992; Stone et al., 1996a) are defined using a cluster analysis to group all sequential two-month pairs of the SOI into five clusters. The phases are consistently negative, consistently positive, falling, rising, and consistently near-zero. The SOI and SOI phase data used in the study were obtained from http://www.longpaddock.qld.gov.au/seasonalclimateoutlook/.

Lagged values of the climatological variables will be used; that is, the models use the value of the climatological variables of the previous month to model the rainfall amount of a given month.

3. Models

3.1. Exponential dispersion models

Generalized linear models (GLMs), as used in this paper, are discussed in Section 3.3. Here, we first introduce the probability models upon which GLMs are based, the exponential dispersion models; the Tweedie exponential dispersion models in particular are introduced in Section 3.2.

A probability function of the form

$$f(y; \theta, \phi) = a(y, \phi) \exp\left\{ \frac{y\theta - \kappa(\theta)}{\phi} \right\} \qquad (1)$$

for y ∈ S, for some suitable function a(y, ϕ), canonical parameter θ and known function κ(·), is called an exponential dispersion model (EDM). The mean is µ = κ′(θ) and ϕ > 0 is the dispersion parameter. The function a(y, ϕ) cannot always be written in closed form; it is the normalizing function that ensures the probability function integrates (or sums) to one over the domain S. Examples of EDMs include the normal, binomial, gamma and Poisson distributions.

The notation Y ∼ ED(µ, ϕ) indicates that the random variable Y comes from an EDM with location parameter E[Y] = µ = κ′(θ) and variance var[Y] = ϕκ″(θ), as in Equation (1). The functional relationship between µ and θ defined by µ = κ′(θ) is invertible, so the variance can be written as var[Y] = ϕV(µ), where V(µ) is called the variance function.
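As a brief worked example (an illustration added here, assuming the standard EDM form $f(y;\theta,\phi) = a(y,\phi)\exp\{[y\theta-\kappa(\theta)]/\phi\}$), the Poisson distribution with mean µ can be written in this form and its variance function read off directly:

```latex
\begin{align*}
f(y;\mu) &= \frac{\mu^{y} e^{-\mu}}{y!}
          = \frac{1}{y!}\exp\{\, y\log\mu - \mu \,\}, \qquad y = 0, 1, 2, \dots \\
\theta &= \log\mu, \qquad \kappa(\theta) = e^{\theta}, \qquad \phi = 1, \qquad a(y,\phi) = \frac{1}{y!} \\
\mathrm{E}[Y] &= \kappa'(\theta) = e^{\theta} = \mu, \qquad
\mathrm{var}[Y] = \phi\,\kappa''(\theta) = e^{\theta} = \mu
\end{align*}
```

Hence V(µ) = µ for the Poisson distribution, which is the p = 1 member of the power variance family discussed in Section 3.2.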

3.2. The Tweedie family

A special case of EDMs is the Tweedie family of distributions studied by Jørgensen (1987, 1997) and named in honour of Tweedie (1984). The Tweedie family of distributions are those EDMs with variance function $V(\mu) = \mu^{p}$ for $p \notin (0, 1)$ (Jørgensen, 1987). Further information is available in Smyth (1996), Jørgensen (1997), and Dunn and Smyth (2005). The Tweedie distribution with mean µ, dispersion parameter ϕ and index parameter p is denoted $\mathrm{Tw}_p(\mu, \phi)$.

To determine the appropriate distribution for modelling the monthly rainfall data, we examine the mean–variance relationship of the monthly rainfall totals. By plotting the log of mean against the log of variance of monthly rainfall totals, Hasan and Dunn (2010b) show that the relationship can be suitably expressed as

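$$\log(\text{variance}) = c + p \log(\text{mean}) \quad \text{for some constant } c.$$

Rearranging,

$$\text{variance} = e^{c} \, (\text{mean})^{p}.$$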

That is, the variance is approximately proportional to some power of the mean. Hence, Tweedie distributions are appropriate for modelling monthly rainfall totals of Australian rainfall stations.
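The log-log relationship above suggests a simple way to get a rough first estimate of the index p: regress the log of the group variances on the log of the group means and read off the slope. The following is a minimal sketch of that idea, not the authors' procedure; the function name `fit_power_law` and the example numbers are hypothetical, and the data are constructed so that the answer is known.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = c + p * log(mean); returns (c, p)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # estimate of the index p
    intercept = y_bar - slope * x_bar    # estimate of c
    return intercept, slope

# Hypothetical monthly means (mm); variances are generated exactly from
# variance = 3 * mean**1.6, so the fitted slope should be 1.6.
example_means = [5.0, 12.0, 30.0, 75.0, 140.0]
example_variances = [3.0 * m ** 1.6 for m in example_means]
c_hat, p_hat = fit_power_law(example_means, example_variances)
```

With real rainfall data the points scatter around the line, so the slope is only a rough guide; the paper estimates p by maximum likelihood (Section 3.4).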

There are four notable special cases of Tweedie distributions: the normal distribution (p = 0), the Poisson distribution (p = 1, ϕ = 1), the gamma distribution (p = 2) and the inverse Gaussian distribution (p = 3). Apart from these special cases, the probability functions for the Tweedie distributions have no closed form. For p ≥ 2, the distributions are suitable for modelling positive, right-skewed data. Of special interest are the distributions for which 1 < p < 2, also called the Poisson-gamma distributions (Dunn and Smyth, 2005). In this context, the probability distributions for which 1 < p < 2 can be developed as follows.

Assume any rainfall event i produces an amount of rainfall $R_i$, and that each $R_i$ comes from a gamma distribution Gam(α, γ). In this parameterisation, the mean is αγ and the variance is αγ². Assume the number of rainfall events in any one month is N, where N has a Poisson distribution with mean λ; that is, N ∼ Pois(λ). Months with no rainfall correspond to N = 0. The total monthly rainfall, Y, is the sum $Y = R_1 + R_2 + \cdots + R_N$; when N = 0, then Y = 0.
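The Poisson-gamma construction can be checked by simulation. The sketch below draws a Poisson number of events per month and sums gamma-distributed event amounts, then compares the simulated mean with λαγ and the simulated fraction of dry months with exp(−λ). The parameter values are illustrative only, the helper names are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, chosen here only to keep the example within the Python standard library.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_monthly_total(lam, alpha, gamma, rng):
    """Sum of N gamma(shape=alpha, scale=gamma) amounts, with N ~ Pois(lam)."""
    n_events = sample_poisson(lam, rng)
    return sum(rng.gammavariate(alpha, gamma) for _ in range(n_events))

rng = random.Random(2011)                      # fixed seed for reproducibility
lam, alpha, gamma = 1.58, 2.0 / 3.0, 116.59    # illustrative values only
totals = [simulate_monthly_total(lam, alpha, gamma, rng) for _ in range(20000)]

sim_mean = sum(totals) / len(totals)           # should be near lam * alpha * gamma
dry_fraction = sum(1 for y in totals if y == 0) / len(totals)  # near exp(-lam)
```

A month is exactly dry only when N = 0, because every gamma draw is strictly positive; this is how a single distribution produces both the exact zeros and the continuous positive amounts.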

One of the important properties of Tweedie GLMs is that they provide a mechanism for understanding the fine-scale structure in coarse-scale data, and consequently are useful for the disaggregation of monthly rainfall to a daily timescale for incorporation into cropping system and other models (Dunn, 2004). This model is used in this paper to model monthly rainfall, and complements the work of Hamlin and Rees (1987), Chandler and Wheater (2002), Buishand et al. (2004) and others, who used two separate models for the occurrence and the quantity of rainfall. Smyth (1996), Dunn (2004), Lennox et al. (2004), Dunn and White (2005) and Hasan and Dunn (2010a, 2010b) used these distributions in related contexts.

3.3. Generalized linear models

GLMs consist of two components (McCullagh and Nelder, 1989; Dobson and Barnett, 2008): (1) The response variable $Y_i$ follows an EDM family distribution, with mean $\mu_i$ and dispersion parameter ϕ, such that $Y_i \sim \mathrm{ED}(\mu_i, \phi/w_i)$ for i = 1, 2, …, n, where $w_i > 0$ are known prior weights. This is the random component; (2) The mean $\mu_i$ is related to the predictors through a monotonic, differentiable link function g(·) so that $g(\mu) = X\beta$, where $X = [x_1, x_2, \dots, x_p]$ is a matrix; $x_j$ are (n × 1) vectors of predictors and $\beta$ is a (p × 1) vector of unknown regression coefficients. This is the systematic component.

Often, the linear predictor $X_i\beta$ is denoted by $\eta_i$, so that $g(\mu_i) = \eta_i = X_i\beta$, where $X_i$ is row i of matrix X.

3.4. Model fitting

Fitting the Tweedie family requires estimates of $\beta$, ϕ and p. Estimating $\beta$ for given p is performed using a usually robust iterative procedure called iteratively reweighted least-squares (McCullagh and Nelder, 1989). Many software packages fit GLMs; here, we use R (R Development Core Team, 2009), the techniques expounded in Dunn and Smyth (2005, 2008), and the corresponding R packages statmod (Smyth, 2009) and tweedie (Dunn, 2010).
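To make the iteratively reweighted least-squares step concrete, here is a minimal pure-Python sketch for a log-link GLM with variance function V(µ) = µ^p and p held fixed. It is an illustration under stated assumptions, not the R implementation used in the paper: the function names (`solve_linear`, `fit_log_link_power_glm`) are hypothetical, prior weights are taken as 1, and the check uses noise-free synthetic data so that the true coefficients are known. For the log link, the working weights are µ^(2−p) and the working response is η + (y − µ)/µ.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_log_link_power_glm(X, y, p=1.6, max_iter=50, tol=1e-10):
    """IRLS for log(mu) = X beta with V(mu) = mu**p; first column of X is 1s."""
    ncol = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (ncol - 1)
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        w = [m ** (2.0 - p) for m in mu]                       # working weights
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        A = [[sum(wi * row[j] * row[k] for wi, row in zip(w, X))
              for k in range(ncol)] for j in range(ncol)]
        b = [sum(wi * row[j] * zi for wi, row, zi in zip(w, X, z))
             for j in range(ncol)]
        new_beta = solve_linear(A, b)
        converged = max(abs(n - o) for n, o in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Noise-free synthetic check: 48 months generated from known coefficients.
true_beta = [3.0, 0.5, -0.3]
X = []
for t in range(1, 49):
    m = (t - 1) % 12 + 1
    X.append([1.0, math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12)])
y = [math.exp(sum(b * x for b, x in zip(true_beta, row))) for row in X]
beta_hat = fit_log_link_power_glm(X, y, p=1.6)
```

Note that the dispersion ϕ does not appear in the update: it cancels from the weighted least-squares equations, which is why the regression coefficients can be estimated before ϕ.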

To get the maximum likelihood estimate (MLE) of p, a profile (log-) likelihood plot is used; this requires the computation of the density. To estimate p for any postulated model, proceed as follows: For a given value of p assumed fixed, the MLEs of $\beta$ and ϕ are found as above, and the log-likelihood is computed. This is repeated for a range of p values and, because of the associated computational burden, a cubic spline interpolation through these computed points is fitted. The value of p for which the log-likelihood is maximized is chosen as the MLE, $\hat{p}$. R functions (Smyth, 2009; Dunn, 2010) are used to automate the process. The probability of a particular month having no rainfall, π0 = P(Y = 0), is a function of ϕ. Finding the MLE of π0 requires using the MLE of ϕ. The MLE of ϕ is difficult to compute but the algorithms used to estimate p also compute the MLE of ϕ (Dunn and Smyth, 2005).

4. Model results

4.1. The models

Models of the form $\log \mu = X\beta$ are fitted, based on Tweedie EDMs. To fit models, first the appropriate estimates of the index parameter p are found to determine the particular Tweedie distribution for the model. Except for 4 rainfall stations (which have $\hat{p}$ close to but larger than 2), all of the 220 studied stations have $\hat{p}$ between 1 and 2, and for most of the studied stations $\hat{p}$ is close to 1.6.

For simplicity, we propose models with a single value of p = 1.6 for all the studied stations rather than using different values of p for different stations. Estimation of p for each station is computationally intensive, but once p = 1.6 is decided upon the models are very quick and easy to fit. The value of p makes no difference to the estimates of regression coefficients, but has a small impact on the monthly rainfall variation and on π0. Provided $\hat{p}$ is not too far from p = 1.6, the impacts are usually not too great. The index p can easily be estimated for each station, if necessary. Using p = 1.6 for each station actually performs very well in practice (details are given in Hasan and Dunn, 2010b).

The four models considered here have the following systematic components:

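  • Model 1: $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m/12) + \beta_2 \cos(2\pi m/12)$;
  • Model 2: $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m/12) + \beta_2 \cos(2\pi m/12) + \beta_3 (\text{NINO 3.4})_{t-1}$;
  • Model 3: $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m/12) + \beta_2 \cos(2\pi m/12) + \beta_4 (\text{SOI})_{t-1}$;
  • Model 4: $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m/12) + \beta_2 \cos(2\pi m/12) + \beta_5 (\text{SOI phase})_{t-1}$;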

where $Y_t \sim \mathrm{Tw}_p(\mu_t, \phi)$, m = month of the year (1 = January, 2 = February, and so on), and t = 12(j − 1) + m, where j = 1 (first year of studied data), 2 (second year of studied data), and so on. Model 1 (the base model) is the simplest, with only sine and cosine terms as predictors, and is nested within the three other models. Models 2, 3, and 4 include the lagged values of one of the climatological variables (NINO 3.4, SOI, or SOI phase, respectively) in addition to the cyclic sine and cosine terms.
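The lag-one pairing in Models 2 and 3 can be sketched as follows: the rainfall in month t is matched with the covariate value from month t − 1, alongside the sine and cosine terms for the calendar month m. The function name `build_design` and the toy series are hypothetical; a categorical covariate such as the SOI phase (Model 4) would additionally need indicator coding, which is not shown.

```python
import math

def build_design(rain, covariate, first_month=1):
    """Pair rainfall in month t with the covariate from month t-1.

    rain and covariate are equal-length monthly series whose first entry
    falls in calendar month `first_month` (1 = January). Returns (X, y),
    where each row of X is [1, sin(2*pi*m/12), cos(2*pi*m/12), covariate_{t-1}].
    """
    X, y = [], []
    for t in range(1, len(rain)):              # the first month has no lagged value
        m = (first_month - 1 + t) % 12 + 1     # calendar month of observation t
        angle = 2.0 * math.pi * m / 12.0
        X.append([1.0, math.sin(angle), math.cos(angle), covariate[t - 1]])
        y.append(rain[t])
    return X, y

# Toy three-month series starting in January (values are made up).
toy_rain = [10.0, 20.0, 30.0]
toy_soi = [0.1, 0.2, 0.3]
X_toy, y_toy = build_design(toy_rain, toy_soi)
```

Because only the previous month's covariate is needed, the fitted model can be used for prediction one month ahead, as the paper notes.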

The Likelihood Ratio Test (LRT) is used to compare two nested models, that is, models in which the simpler model (Model A) can be obtained from the more complex model (Model B) by imposing a set of linear constraints on the parameters. The LRT statistic follows an F-distribution under the null hypothesis that there is no difference in fit between the two models. For P-values below the significance level, the null hypothesis is rejected and consequently Model B is taken to fit significantly better than Model A. The LRT will be used to compare the fit of Models 2, 3, and 4 with that of Model 1 (the base model).

Alternative methods for model comparison (nested or non-nested) are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The AIC and the BIC balance model simplicity against goodness of fit. Smaller values of the AIC and the BIC indicate superior models, so the model with the minimum AIC or BIC is the preferred model. The BIC penalizes complex models more heavily than the AIC, and hence gives preference to simpler models in selection (Anderson, 2008; Lewis et al., 2010). The BIC will be used to identify the best-fitting model among the fitted models.
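For reference, the usual textbook definitions of the two criteria are given below. The notation is an assumption introduced here, not taken from the paper: $\ell$ is the maximized log-likelihood, $k$ the number of estimated parameters and $n$ the number of observations.

```latex
\begin{align*}
\mathrm{AIC} &= -2\,\ell + 2k, \\
\mathrm{BIC} &= -2\,\ell + k \log n .
\end{align*}
```

The per-parameter penalty is 2 for the AIC and log n for the BIC, so the BIC penalty is the larger one whenever n ≥ 8, which is easily satisfied by monthly records spanning decades.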

4.2. The fitted models and interpretation

Results of the models fitted to data up to 2007 for the 4 case study stations are presented in Tables I, II, III, and IV. To study the impact of SOI phases, phase 1 is taken as the reference category, and the effects of the other phases are compared with this reference category. Apart from Clarence (where the cosine term has no significant impact), the cyclical sine and cosine terms have significant impacts on monthly rainfall. For all four case studies, NINO 3.4 and SOI have significant impacts on monthly rainfall amounts after adjusting for the sine and cosine terms. For Bidyadanga and Trayning, the SOI phases do not have a significant impact on the monthly rainfall amount. Among the four studied models, Model 1 is nested in the other three models. The LRT is used to compare the fit of Models 2, 3, and 4 with Model 1. When considering all 220 studied stations, the model with NINO 3.4 fits significantly better (based on the LRT statistic) than the base model for most of the stations (83.2%). The model with SOI phase fits significantly better than the base model for just above half the studied stations (57.7%) (Table V).

Table I. The estimated values of the coefficients for the predictors in the fitted rainfall models for Bidyadanga. Cells for predictors give the estimate with its t-value in parentheses.

| | Model 1 (Base model) | Model 2 (Base model + NINO 3.4) | Model 3 (Base model + SOI) | Model 4 (Base model + SOI phase) |
|---|---|---|---|---|
| LRT statistic (to compare with Model 1) | | 4.22^c | 8.41^b | 2.18 |
| BIC | 8081 | 8081 | 8075 | 8095 |
| Constant | 2.90 (43.76^a) | 2.91 (43.92^a) | 2.89 (44.68^a) | 2.82 (19.70^a) |
| Sine | 1.90 (21.28^a) | 1.91 (21.47^a) | 1.93 (22.08^a) | 1.94 (21.96^a) |
| Cosine | 0.72 (8.57^a) | 0.72 (8.55^a) | 0.71 (8.70^a) | 0.71 (8.57^a) |
| NINO 3.4 | | −0.11 (−1.98^c) | | |
| SOI | | | 0.02 (2.83^b) | |
| SOI phase: Phase 2 | | | | 0.29 (1.64) |
| SOI phase: Phase 3 | | | | −0.23 (−1.12) |
| SOI phase: Phase 4 | | | | 0.12 (0.66) |
| SOI phase: Phase 5 | | | | −0.02 (−0.10) |

  • a: p < 0.001.
  • b: 0.001 ≤ p < 0.01.
  • c: 0.01 ≤ p < 0.05.

Table II. The estimated values of the coefficients for the predictors in the fitted rainfall models for Trayning. Cells for predictors give the estimate with its t-value in parentheses.

| | Model 1 (Base model) | Model 2 (Base model + NINO 3.4) | Model 3 (Base model + SOI) | Model 4 (Base model + SOI phase) |
|---|---|---|---|---|
| LRT statistic (to compare with Model 1) | | 11.86^b | 5.19^c | 2.30 |
| BIC | 9702 | 9694 | 9702 | 9718 |
| Constant | 3.18 (103.32^a) | 3.19 (105.46^a) | 3.18 (104.24^a) | 3.13 (41.74^a) |
| Sine | −0.12 (−2.84^b) | −0.13 (−3.02^b) | −0.12 (−2.87^b) | −0.12 (−2.95^b) |
| Cosine | −0.69 (−16.00^a) | −0.70 (−16.59^a) | −0.69 (−16.20^a) | −0.70 (−16.33^a) |
| NINO 3.4 | | −0.10 (−3.38^b) | | |
| SOI | | | 0.01 (2.23^c) | |
| SOI phase: Phase 2 | | | | 0.210 (2.17) |
| SOI phase: Phase 3 | | | | 0.005 (0.04) |
| SOI phase: Phase 4 | | | | −0.003 (−0.03) |
| SOI phase: Phase 5 | | | | −0.007 (−0.07) |

  • a: p < 0.001.
  • b: 0.001 ≤ p < 0.01.
  • c: 0.01 ≤ p < 0.05.

Table III. The estimated values of the coefficients for the predictors in the fitted rainfall models for Theebine. Cells for predictors give the estimate with its t-value in parentheses.

| | Model 1 (Base model) | Model 2 (Base model + NINO 3.4) | Model 3 (Base model + SOI) | Model 4 (Base model + SOI phase) |
|---|---|---|---|---|
| LRT statistic (to compare with Model 1) | | 19.61^a | 13.51^a | 3.64^b |
| BIC | 12117 | 12102 | 12109 | 12129 |
| Constant | 4.27 (160.40^a) | 4.28 (161.30^a) | 4.27 (161.11^a) | 4.08 (61.11^a) |
| Sine | 0.47 (12.52^a) | 0.48 (12.91^a) | 0.47 (12.77^a) | 0.47 (12.49^a) |
| Cosine | 0.47 (12.65^a) | 0.47 (12.60^a) | 0.47 (12.60^a) | 0.48 (12.72^a) |
| NINO 3.4 | | −0.11 (−4.31^a) | | |
| SOI | | | 0.01 (3.66^a) | |
| SOI phase: Phase 2 | | | | 0.31 (3.61^a) |
| SOI phase: Phase 3 | | | | 0.21 (2.14^c) |
| SOI phase: Phase 4 | | | | 0.25 (2.78^b) |
| SOI phase: Phase 5 | | | | 0.15 (1.84) |

  • a: p < 0.001.
  • b: 0.001 ≤ p < 0.01.
  • c: 0.01 ≤ p < 0.05.

Table IV. The estimated values of the coefficients for the predictors in the fitted rainfall models for Clarence. Cells for predictors give the estimate with its t-value in parentheses.

| | Model 1 (Base model) | Model 2 (Base model + NINO 3.4) | Model 3 (Base model + SOI) | Model 4 (Base model + SOI phase) |
|---|---|---|---|---|
| LRT statistic (to compare with Model 1) | | 14.36^a | 5.69^b | 3.34 |
| BIC | 12616 | 12605 | 12616 | 12627 |
| Constant | 4.47 (177.55^a) | 4.48 (177.49^a) | 4.47 (177.63^a) | 4.41 (70.81^a) |
| Sine | 0.33 (9.14^a) | 0.33 (9.30^a) | 0.33 (9.22^a) | 0.33 (9.40^a) |
| Cosine | 0.06 (1.60) | 0.05 (1.43) | 0.05 (1.48) | 0.04 (1.31) |
| NINO 3.4 | | −0.09 (−3.75^a) | | |
| SOI | | | 0.01 (2.34^b) | |
| SOI phase: Phase 2 | | | | 0.20 (2.42^b) |
| SOI phase: Phase 3 | | | | −0.09 (−0.97) |
| SOI phase: Phase 4 | | | | 0.01 (0.16) |
| SOI phase: Phase 5 | | | | 0.07 (0.95) |

  • a: p < 0.001.
  • b: 0.01 ≤ p < 0.05.

Table V. Number (percentage) of stations where Models 1, 2, 3, and 4 are preferred using the BIC, and where Models 2, 3, and 4 fit significantly better than the base model (Model 1) based on the LRT (among the 220 studied stations)

| | BIC | LRT (compared to Model 1) |
|---|---|---|
| Model 1 (Base model) | 45 (20.5) | |
| Model 2 (Base model + NINO 3.4) | 145 (65.9) | 183 (83.2) |
| Model 3 (Base model + SOI) | 29 (13.2) | 153 (69.5) |
| Model 4 (Base model + SOI phase) | 1 (0.5) | 127 (57.7) |

The performance of the four models is also compared using the BIC. On the basis of the BIC, the model with the SOI is preferred for modelling the monthly rainfall in Bidyadanga. The model using the NINO 3.4 is preferred for modelling the monthly rainfall totals for the other three case study stations. Among the four studied models, the base model and the model with SOI phase are not preferred for modelling the monthly rainfall totals of any of the four case study stations. When considering all 220 studied stations (Table V), the model with NINO 3.4 is preferred for modelling the monthly rainfall totals for most of the stations (65.9%). The model with SOI phase is preferred for only one of the studied stations. The base model is preferred for modelling the monthly rainfall totals of 20.5% of stations (Table V), suggesting there are a reasonable number of stations for which none of the climatological variables have a substantial impact on rainfall.
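The BIC-based choice for the four case studies can be reproduced directly from the BIC rows of Tables I to IV by picking, for each station, the model with the smallest value. This is a small illustrative sketch; the dictionary layout and the function name `preferred_model` are hypothetical.

```python
# BIC values for Models 1-4, transcribed from Tables I-IV.
bic = {
    "Bidyadanga": [8081, 8081, 8075, 8095],
    "Trayning":   [9702, 9694, 9702, 9718],
    "Theebine":   [12117, 12102, 12109, 12129],
    "Clarence":   [12616, 12605, 12616, 12627],
}

def preferred_model(values):
    """Return the 1-based model number with the smallest BIC."""
    return 1 + min(range(len(values)), key=values.__getitem__)

preferred = {station: preferred_model(v) for station, v in bic.items()}
```

The result agrees with the text: Model 3 (SOI) for Bidyadanga and Model 2 (NINO 3.4) for the other three stations.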

Maps of Australia in Figure 2 represent the stations where the four studied models are preferred (based on the BIC). Clearly, as indicated in Table V, the model with NINO 3.4 is generally the preferred model. However, rainfall in southeast Australia is better modelled using the SOI. Stations where the base model is preferred are clustered in western inland Australia and in southern Australia.

Figure 2.

Stations where Model 1, 2, 3, or 4 is preferred (on the basis of BIC)

The regression coefficients from all 220 studied stations for NINO 3.4 (from Model 2) and SOI (from Model 3) are represented by contour maps in Figure 3. Model 4 includes one categorical variable, the SOI phase, which has 5 levels and so produces 4 extra regression coefficients along with the other coefficients. The regression coefficients from Model 4 are therefore not shown as contour maps, but the fit of each of the 3 models is compared with the base model using the LRT, and the P-values are shown as contour maps in Figure 4. When constructing the contour maps, kriging (Diggle and Ribeiro, 2007) is used to interpolate the values at unobserved locations from nearby stations.

Figure 3.

Contour maps of Australia showing the regression coefficients of NINO 3.4 (from Model 2) and SOI (from Model 3) on rainfall amounts. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

Figure 4.

Contour maps of Australia showing the p-values from LRT for Model 2 (NINO 3.4), Model 3 (SOI) and Model 4 (SOI phase) comparing with Model 1 (base model). This figure is available in colour online at wileyonlinelibrary.com/journal/joc

The values of the regression coefficients for the two models are not comparable, as the models use variables on different scales. After adjusting for the cyclical features of rainfall data, the NINO 3.4 has a negative impact on rainfall almost everywhere in Australia, indicating that when the value of the index increases the amount of rainfall decreases. Larger negative impacts of the NINO 3.4 are observed on monthly rainfall amounts in eastern and northeastern Australia, and along coastlines in western parts of Australia. The SOI has a positive impact on rainfall in Australia, indicating that when the value of the SOI increases, the amount of rainfall also increases. Negative regression coefficients for the SOI are observed in some places in southwest Australia; however, only a few stations from this region are included in the study. As with NINO 3.4, a larger impact of the SOI is observed on monthly rainfall amounts in eastern and northeastern Australia.

From Figure 4, observe that models with the climatological variables fit significantly better than the base model for the rainfall data from most places in northeastern and eastern Australia and in western parts of Tasmania. The model with NINO 3.4 also fits significantly better for some places on the coastline of Western Australia. For stations in the southern, central, inland western and northwestern parts of Australia, the models with the climatological variables show no significant improvement over the base model.

To interpret the coefficients, recall that the model uses the logarithmic link function. So, for example, for Model 2 at a given rainfall station, the predicted mean amount of rainfall for month t is $\hat{\mu}_t = \exp\{\hat{\beta}_0 + \hat{\beta}_1 \sin(2\pi m/12) + \hat{\beta}_2 \cos(2\pi m/12) + \hat{\beta}_3 (\text{NINO 3.4})_{t-1}\}$. As an example, we use the values of the climatological variables from February 2008 to predict the March 2008 rainfall at Bidyadanga. The values of the NINO 3.4, SOI, and SOI phase for February 2008 are −1.87, 20.99, and 2, respectively. For March 2008, the mean predicted rainfall amounts for Bidyadanga from Models 1, 2, 3, and 4 are 122.85, 152.48, 173.79, and 158.06 mm, respectively. Model 1 includes only the sine and cosine terms as predictors, so the predicted rainfall amounts $\hat{\mu}_t$ follow the same recurrent pattern in every year of the study period. The predicted mean rainfall amounts from Models 2, 3, and 4 have some year-to-year variation depending on the NINO 3.4, SOI, and SOI phase.
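The March 2008 predictions for Bidyadanga can be approximately reproduced from the coefficients in Table I by exponentiating the linear predictor. The sketch below makes two assumptions that should be read with care: the coefficients are the two-decimal rounded values from Table I, so the results differ somewhat from the reported figures (most visibly for Model 3, whose SOI coefficient is rounded to 0.02), and the February 2008 NINO 3.4 value is taken as −1.87, the sign that is consistent with the reported Model 2 prediction exceeding the base-model prediction. The function name `predicted_mean` is hypothetical.

```python
import math

def predicted_mean(month, intercept, b_sin, b_cos, extra=0.0):
    """Mean rainfall under the log link: exp(linear predictor)."""
    angle = 2.0 * math.pi * month / 12.0
    eta = intercept + b_sin * math.sin(angle) + b_cos * math.cos(angle) + extra
    return math.exp(eta)

march = 3
nino34_feb = -1.87    # assumed sign; see the note above
soi_feb = 20.99

# Coefficients are the rounded estimates from Table I (Bidyadanga).
pred = {
    1: predicted_mean(march, 2.90, 1.90, 0.72),
    2: predicted_mean(march, 2.91, 1.91, 0.72, extra=-0.11 * nino34_feb),
    3: predicted_mean(march, 2.89, 1.93, 0.71, extra=0.02 * soi_feb),
    4: predicted_mean(march, 2.82, 1.94, 0.71, extra=0.29),  # SOI phase 2
}

reported = {1: 122.85, 2: 152.48, 3: 173.79, 4: 158.06}
```

With the rounded coefficients the sketch gives roughly 121.5, 152.3, 188.6 and 156.0 mm, against the reported 122.85, 152.48, 173.79 and 158.06 mm.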

To compare the results from the models having climatological variables with the base model, percentage changes in predicted mean monthly rainfall amounts are measured for the four case study stations. The changes for the three models for the months of the years 2008 and 2009 are presented in Figure 5. The horizontal line through zero indicates the predictions made by the reference base model, and the other lines show the percentage change in the predicted mean monthly rainfall due to adding the climatological variable to the base model. Any value far from the zero line indicates a greater percentage change in the predicted mean monthly rainfall amount for the model with the climatological variable relative to the base model. From the figure, substantial changes in the predicted monthly rainfall amounts are observed due to adding the climatological variables to the base model. The lines for percentage changes in monthly rainfall amounts due to adding the climatological variables have almost the same trends for all the case study stations. However, the percentage changes are higher for the drier rainfall station (Bidyadanga) and lower for the wetter rainfall station (Clarence).
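As a small numerical illustration of the percentage-change measure, the reported March 2008 predictions for Bidyadanga (122.85 mm for the base model; 152.48, 173.79 and 158.06 mm for the models with NINO 3.4, SOI and SOI phase) can be converted as follows. The function name `percent_change` is hypothetical, and the formula, change relative to the base-model prediction, is the natural reading of the text rather than a quoted definition.

```python
def percent_change(model_value, base_value):
    """Percentage change of a model prediction relative to the base model."""
    return 100.0 * (model_value - base_value) / base_value

base_prediction = 122.85   # Model 1, Bidyadanga, March 2008 (mm)
model_predictions = {"NINO 3.4": 152.48, "SOI": 173.79, "SOI phase": 158.06}

changes = {name: percent_change(value, base_prediction)
           for name, value in model_predictions.items()}
```

For this month the three climatological models raise the predicted mean by about 24%, 41% and 29%, respectively.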

Figure 5.

Percentage change in predicted mean monthly rainfall amount when one extra variable (NINO 3.4, SOI or SOI phase) is added to the base model with sine and cosine terms

The Tweedie distribution parameters (µ, p, ϕ) can be reparameterized to the Poisson and gamma parameters (λ, γ, α) when 1 < p < 2 (Dunn, 2004), providing approximate downscaling information about monthly rainfall. The transformation between the parameterizations (when 1 < p < 2) is:

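$$\lambda = \frac{\mu^{2-p}}{\phi\,(2-p)}, \qquad \gamma = \phi\,(p-1)\,\mu^{p-1}, \qquad \alpha = \frac{2-p}{p-1} \tag{2}$$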
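A minimal Python sketch of this transformation follows, assuming the standard mapping from the Tweedie parameters to the Poisson-gamma parameters for 1 < p < 2. The function name `tweedie_to_poisson_gamma` is illustrative. The values p = 1.6 and ϕ = 10.84 are assumptions chosen because they approximately reproduce the Bidyadanga figures reported in this section; they are not estimates quoted from the fitted model.

```python
def tweedie_to_poisson_gamma(mu, p, phi):
    """Map Tweedie (mu, p, phi) to Poisson-gamma (lam, gamma, alpha), 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))  # mean number of rainfall events
    gamma = phi * (p - 1.0) * mu ** (p - 1.0)  # gamma-distribution parameter
    alpha = (2.0 - p) / (p - 1.0)              # alpha * gamma = mean rain per event
    return lam, gamma, alpha

# Illustrative values only: p and phi are assumptions, not quoted estimates.
mu, p, phi = 122.85, 1.6, 10.84
lam, gamma, alpha = tweedie_to_poisson_gamma(mu, p, phi)
print(f"lambda = {lam:.2f}, gamma = {gamma:.2f}, alpha*gamma = {alpha * gamma:.2f}")

# Consistency check: monthly mean = number of events x mean amount per event.
assert abs(lam * alpha * gamma - mu) < 1e-9
```

The final check reflects the identity µ = λαγ, which holds for any valid (µ, p, ϕ) and is a convenient way to confirm that a reparameterization has been coded consistently.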

The mean number of rainfall events (λ), the shape parameter of the rainfall gamma distribution (γ), and the mean amount of rain per rainfall event (αγ) can then be computed. The maximum likelihood estimate of the dispersion parameter for the model fitted to the rainfall data of Bidyadanga is equation image. For Bidyadanga, the predicted mean numbers of rainfall events in March 2008 for Models 1, 2, 3, and 4 are λ = 1.58, 1.72, 1.82, and 1.75, respectively. For the same models, the shape parameters of the rainfall gamma distribution are γ = 116.59, 132.73, 143.56, and 135.62, and the mean amounts of rainfall per event are αγ = 77.73, 88.49, 95.71, and 90.41 mm, respectively. The same parameters can be estimated for other months at Bidyadanga and for each month at other stations. This use of the Tweedie distribution to understand the finer timescale structure of monthly rainfall has been applied successfully to simulate sorghum growth (Lennox et al., 2004) using APSIM (McCown et al., 1996).
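To illustrate how these event-level parameters could drive a simulation, the sketch below draws monthly totals as a Poisson number of events, each with a gamma-distributed amount, using the Model 1 values for Bidyadanga quoted above. It is not the procedure used in the cited sorghum study. The helper names are hypothetical, and passing α as the shape argument and γ as the scale argument of `random.gammavariate` is an assumption made so that the mean amount per event equals αγ.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_month(lam, alpha, gamma, rng):
    """One monthly total: a Poisson number of events, each with a gamma amount."""
    events = poisson_draw(lam, rng)
    # Assumption: per-event amounts use shape alpha and scale gamma,
    # so the mean amount per event is alpha * gamma.
    return sum(rng.gammavariate(alpha, gamma) for _ in range(events))

rng = random.Random(2008)                      # fixed seed for reproducibility
lam, gamma, alpha_gamma = 1.58, 116.59, 77.73  # Model 1 values from the text
alpha = alpha_gamma / gamma                    # implied by the reported values

totals = [simulate_month(lam, alpha, gamma, rng) for _ in range(20000)]
mean_total = sum(totals) / len(totals)
dry_share = sum(t == 0 for t in totals) / len(totals)
print(f"simulated mean = {mean_total:.1f} mm, dry months = {dry_share:.3f}")
```

With enough simulated months, the average total should approach λαγ and the share of months with no events should approach exp(−λ), which gives a simple check on the simulation.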

Also of interest is how well the model predicts the probability of no rainfall. The probability of recording no rain is π0 = exp(−λ) (Dunn and Smyth, 2005). Note that for some rainfall stations there are some months with no exact zero in the observed rainfall records; however, the models permit the possibility of months with no rainfall in the future.
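The formula π0 = exp(−λ) can be evaluated directly from the predicted mean numbers of rainfall events. The sketch below uses the λ values reported for Bidyadanga in March 2008; the function name `prob_dry_month` is illustrative.

```python
import math

# Predicted mean number of rainfall events, Bidyadanga, March 2008 (from the text).
lam_by_model = {"Model 1": 1.58, "Model 2": 1.72, "Model 3": 1.82, "Model 4": 1.75}

def prob_dry_month(lam):
    """P(no rain in the month) = P(zero Poisson events) = exp(-lam)."""
    return math.exp(-lam)

pi0 = {name: prob_dry_month(lam) for name, lam in lam_by_model.items()}
for name, prob in pi0.items():
    print(f"{name}: {prob:.2f}")
```

Rounded to two decimals, these values are 0.21, 0.18, 0.16, and 0.17. Because π0 decreases as λ increases, a model that predicts more rainfall events for a month also predicts a lower probability of a dry month.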

Using the values of the climatological variables from February 2008 for Bidyadanga, the probabilities of recording no rainfall in March 2008 under Models 1, 2, 3, and 4 are 0.21, 0.18, 0.16, and 0.17, respectively. Percentage changes in the predicted probability of no rainfall for Models 2, 3, and 4 relative to the base model are computed for each month of 2008 and 2009 and presented in Figure 6. The horizontal line through zero represents the predictions of the base model, and the other lines show the percentage change in the probability of no rainfall when a climatological variable is added to the base model. Figure 6 shows substantial changes in the predicted probability of no rainfall for Models 2, 3, and 4 relative to the base model. The percentage changes are larger for the wetter stations than for the drier stations. For the drier stations, the models produce more stable results (predicted probabilities of no rainfall) than for the wetter stations, because the wetter stations have few months with no rainfall.

Figure 6.

Percentage change in predicted probability of no rainfall when one extra variable (NINO 3.4, SOI or SOI phase) is added to the base model with sine and cosine terms

5. Conclusion

Tweedie GLMs were fitted to the monthly rainfall amounts of 220 Australian rainfall stations using cyclic sine and cosine terms and the lagged values of the climatological variables NINO 3.4, SOI, and SOI phase. Four models were compared to determine whether any model with a climatological variable improves on the base model, which ones do so, and at which stations.

Based on the BIC, the model with NINO 3.4 is preferred for modelling the monthly rainfall totals at most of the studied stations. The model with SOI is preferred for some stations in southeastern Australia, and the model with SOI phase is preferred for only one of the studied stations. The stations where the base model is preferred are concentrated in two clusters, in southern Australia and inland Western Australia.

Based on the LRT, models with the climatological variables fit significantly better than the base model for the monthly rainfall totals at most stations in northeastern and eastern Australia. Models with NINO 3.4 also fit well at some places on the coastline of Western Australia. None of the climatological variables shows a significant impact on the monthly rainfall totals for the stations located in central and northwestern Australia; however, few data are available in these regions. Adding the climatological variables to the base model produces substantial changes in the predicted rainfall amount and in the predicted probability of no rainfall.

Tweedie generalized linear models were used to predict various features of rainfall: the probability of no rainfall, the monthly rainfall amounts, the number of rainfall events in each month, and the amount of rainfall per event. These features can be used in many areas of planning in agriculture and hydrology.

Acknowledgements

The comments of the reviewers and the editor are most gratefully acknowledged. They improved the flow, interpretation, and understanding of this paper.
