## 1. Introduction

Rainfall models are useful for understanding the impact of different climatological variables on the occurrence and amount of rainfall, and have potential uses in predicting and simulating rainfall. Consequently, rainfall models are useful for modelling the growth and development of crops, for developing crop simulation models, and for agricultural planning and management (Lennox *et al.*, 2004; Shui and Haque, 2004; Hansen *et al.*, 2009). Predictions of monthly and seasonal rainfall amounts are useful for determining supplemental irrigation and water requirements, for planning water storage, and for reservoir management.

The simplest statistical model, linear regression, is not appropriate for modelling monthly rainfall amounts at Australian stations, because these amounts are generally highly right-skewed. For modelling skewed data, transformations of the rainfall totals are commonly used (Mooley, 1973; Meng *et al.*, 2007). Sometimes rainfall totals are converted to rainfall anomalies (Lorenzo *et al.*, 2010) or to a standardized precipitation index (Loukas and Vasiliades, 2004; Almeira and Scian, 2006; Gonzalez and Cariaga, 2009), and linear regression models are then fitted to the converted indices. However, such transformations lose some of the information contained in the rainfall data.

For modelling right-skewed rainfall amounts, the assumption of a gamma distribution has been used extensively in the literature on rainfall modelling (Allan and Hann, 1975; Feuerverger, 1979; Chapman, 1998; Wilks, 1999; Chandler and Wheater, 2002), often with variations or extensions (Das, 1955; Stern and Coe, 1984; Wilks, 1990).

Another difficulty when modelling monthly rainfall is that rainfall may have both a discrete component (exact zeros for dry months) and a continuous component (the amount of rainfall in wet months). For the rainfall models used in crop simulation, it is common practice to use Markov chains to model the occurrence of rainfall and gamma distributions to model its amount (Richardson and Wright, 1984; Hamlin and Rees, 1987; Rosenberg *et al.*, 2004; Hansen and Ines, 2005; Abtew *et al.*, 2009). Buishand *et al.* (2004) used logistic regression to model occurrence and a gamma distribution to model the amount of monthly rainfall. To model continuous data with exact zeros, some authors have proposed mixtures of a Bernoulli distribution with a gamma or lognormal distribution (Piantadosi *et al.*, 2008; Fernandes *et al.*, 2009; Little *et al.*, 2009). An alternative approach was adopted by Glasbey and co-workers (Glasbey and Nevison, 1997; Durban and Glasbey, 2001), who applied a monotonic transformation to the rainfall data to define a latent Gaussian variable, with zero rainfall corresponding to values censored below a threshold (1.05 mm).
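To make the two-part idea concrete, the sketch below simulates monthly totals from a Bernoulli-gamma mixture, in which occurrence and amount are handled by separate components. It is an illustrative sketch only: the function names and the parameter values (`p_wet`, `shape`, `scale`) are hypothetical and are not taken from any of the cited studies.

```python
import random


def simulate_two_part(p_wet, shape, scale, n, seed=1):
    """Draw n monthly totals from a hypothetical Bernoulli-gamma two-part model.

    With probability 1 - p_wet the month is dry (exactly 0 mm); otherwise the
    amount is drawn from a gamma distribution with the given shape and scale.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < p_wet:
            draws.append(rng.gammavariate(shape, scale))  # wet month: positive amount
        else:
            draws.append(0.0)  # dry month: exact zero
    return draws


def two_part_mean(p_wet, shape, scale):
    # E[Y] = P(wet) * E[amount | wet] = p_wet * shape * scale
    return p_wet * shape * scale


if __name__ == "__main__":
    # Hypothetical values: 80% wet months, gamma(shape=2, scale=25 mm) amounts
    sample = simulate_two_part(p_wet=0.8, shape=2.0, scale=25.0, n=10_000)
    print("fraction of dry months:", sum(y == 0.0 for y in sample) / len(sample))
    print("sample mean:", sum(sample) / len(sample),
          "theoretical mean:", two_part_mean(0.8, 2.0, 25.0))
```

Because the two components are separate, covariates must be attached to each of them individually; this is the modelling burden that a single distribution with a point mass at zero avoids.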

Rainfall models based on the Tweedie distributions (Smyth, 1996; Jørgensen, 1997; Dunn, 2004; Dunn and Smyth, 2005) model the exact zeros and the amount of rainfall simultaneously, and they fit within the generalized linear modelling framework. Tweedie distributions have previously been shown to model monthly rainfall data well when separate models, with no predictors, are fitted for each month (Hasan and Dunn, 2010a). Cyclical patterns are likely to be evident in rainfall; for example, most locations have drier and wetter months consistently from year to year. Using Tweedie generalized linear models (GLMs), Hasan and Dunn (2010b) fitted models with sine and cosine terms as predictors to model monthly rainfall amounts in Australia.
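For power index 1 < p < 2, a Tweedie variable with mean μ, dispersion φ and variance φμ^p can be written as a compound Poisson-gamma sum: a Poisson number of rain "events", each contributing a gamma-distributed amount. A month with no events gives exactly zero rainfall, so P(Y = 0) = exp(-λ). The sketch below assumes this standard parameterization; the function names and the example values (μ = 50 mm, φ = 5, p = 1.6) are hypothetical and chosen only for illustration.

```python
import math
import random


def tweedie_cpg_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-gamma parameters.

    Returns (lam, shape, scale): the Poisson rate of rain 'events' and the
    shape and scale of the gamma-distributed amount per event.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma form needs 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, shape, scale


def tweedie_prob_zero(mu, phi, p):
    # P(Y = 0) = P(N = 0) = exp(-lam): the probability of a dry month
    lam, _, _ = tweedie_cpg_params(mu, phi, p)
    return math.exp(-lam)


def _poisson(rng, lam):
    # Knuth's multiplication method; adequate for the small rates used here
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_tweedie(mu, phi, p, n, seed=1):
    """Simulate n monthly totals as sums of a Poisson number of gamma amounts."""
    lam, shape, scale = tweedie_cpg_params(mu, phi, p)
    rng = random.Random(seed)
    return [sum((rng.gammavariate(shape, scale) for _ in range(_poisson(rng, lam))), 0.0)
            for _ in range(n)]


if __name__ == "__main__":
    sample = simulate_tweedie(mu=50.0, phi=5.0, p=1.6, n=20_000)
    print("P(dry) theory:", tweedie_prob_zero(50.0, 5.0, 1.6),
          "simulated:", sum(y == 0.0 for y in sample) / len(sample))
    print("mean simulated:", sum(sample) / len(sample))
```

Since λ depends on μ, anything that changes the mean (for example a covariate in a GLM with a log link) also changes the probability of a dry month, which is why a single Tweedie GLM can describe both aspects of rainfall at once.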

Apart from cyclical patterns, climatological factors such as the Southern Oscillation Index (SOI) and the sea surface temperature anomaly influence monthly rainfall. The relationship between the SOI and rainfall in Australia has been known for many years and has been studied by numerous authors (e.g. Troup, 1965; Quinn and Burt, 1972; Power *et al.*, 1997; Simmonds and Hope, 1997; Chowdhury and Beecham, 2010). Correlation coefficients and linear regression (Power *et al.*, 1997; Simmonds and Hope, 1997; Almeira and Scian, 2006) have been used to understand the relationship between the SOI and rainfall amounts. Using the SOI, Stone and Auliciems (1992) constructed five SOI phases that can be used to study the effect of the SOI on rainfall amounts. The SOI phases have proven useful for studying the effects of the SOI on rainfall and on different types of cropping in Australia (e.g. Stone and McKeon, 1993; Hammer *et al.*, 1996; Meinke *et al.*, 1996; Stone *et al.*, 1996b; Meinke and Ryley, 1997; Willcocks and Stone, 2000). The NINO 3.4 index, an El Niño Southern Oscillation indicator based on sea surface temperature, is also related to monthly rainfall amounts in Australia. Several authors have used linear regression analysis to study the effect of NINO 3.4 on rainfall amounts (Wu and Kirtman, 2007; Everingham and Reason, 2009; Lee *et al.*, 2009). However, linear regression models are not appropriate for understanding the relationship between these climatological variables and the monthly rainfall totals of Australian stations, because the rainfall totals are not normally distributed and the relationship is not approximately linear.

In this paper, Tweedie GLMs are fitted to understand the effect of the above-mentioned climatological variables on monthly rainfall amounts at Australian rainfall stations. First, the simplest model, with sine and cosine terms as predictors (the base model), is fitted as in Hasan and Dunn (2010b). Each of the three climatological variables (NINO 3.4, SOI and SOI phase) is then added, one at a time, to the base model. This gives four models: the base model and three models with climatological covariates. We study the impact of each climatological variable on monthly rainfall totals after adjusting for the sine and cosine terms. We then examine the preferred model for each of the 220 Australian stations and identify the geographical regions where each climatological variable is superior for modelling monthly rainfall. Statistical methods are used to assess the extent to which each model with a climatological variable improves on the base model for modelling monthly rainfall in different regions across Australia. Values of the climatological variables from the previous month (i.e. a lag of one month) are used in the models, so that rainfall can be predicted one month ahead. Because Tweedie models describe the probability of zero rainfall and the rainfall amount together, we also use them to examine, simultaneously, the improvement that each climatological variable brings to the predicted mean rainfall and to the predicted probability of no rainfall.
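The covariate structure described above can be sketched as design-matrix rows: an intercept plus a sine/cosine pair for the annual cycle (the base model), optionally extended with a climatological value from the previous month. This is a minimal sketch under stated assumptions: it uses a single harmonic with a 12-month period (the base model may use a different set of harmonic terms), the function names are hypothetical, and fitting the Tweedie GLM itself is not shown.

```python
import math


def harmonic_terms(month):
    """One sine/cosine pair for the annual cycle; month is 1..12.

    Assumption: a single harmonic with a period of 12 months.
    """
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


def build_design(months, covariate=None):
    """Rows [1, sin, cos] (base model) or [1, sin, cos, x_{t-1}] (lagged covariate).

    The first month is dropped so that every row has a lag-one covariate value.
    """
    rows = []
    for t in range(1, len(months)):
        s, c = harmonic_terms(months[t])
        row = [1.0, s, c]
        if covariate is not None:
            row.append(covariate[t - 1])  # value from the previous month (lag one)
        rows.append(row)
    return rows


def predicted_mean(row, beta):
    # Log link: mu = exp(x . beta), so each term acts multiplicatively on mean rainfall
    return math.exp(sum(x * b for x, b in zip(row, beta)))


if __name__ == "__main__":
    months = [m for _ in range(2) for m in range(1, 13)]  # two years, Jan..Dec
    soi = [float(v) for v in range(-12, 12)]  # hypothetical SOI values, one per month
    X = build_design(months, soi)
    beta = [math.log(50.0), 0.3, -0.2, 0.01]  # hypothetical coefficients
    print(len(X), X[0], predicted_mean(X[0], beta))
```

A numeric index such as the SOI or NINO 3.4 enters as one column, whereas the SOI phase, being categorical with five levels, would instead enter as a set of indicator columns.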

We first discuss the data (Section 2) and then introduce the Tweedie distributions and their properties (Section 3). In Section 4, the models, model comparison criteria and test statistics are defined, and a detailed interpretation of the results is presented. Concluding comments are given in Section 5.