## 1. Introduction

Rainfall models are important for forecasting and simulation, with applications extending to modelling runoff and soil water content and to forecasting drought and flood (Toth *et al.*, 2000; Aubert *et al.*, 2003). Appropriate rainfall models assist in developing better climate-related risk management and decision-making capabilities. For modelling purposes, two aspects of rainfall are commonly distinguished on any given timescale: the occurrence of rainfall and the amount of rainfall (Dunn, 2004). Rainfall models exist for different timescales such as hourly, daily, weekly, monthly, seasonal or annual (Boer *et al.*, 1993; Sharda and Das, 2005; Aksoy, 2006; Tilahun, 2006).

To model the occurrence of daily rainfall, first-order (Gabriel and Neumann, 1962; El-seed, 1987), higher order (Katz, 1977; Deni *et al.*, 2009), hybrid (Wilks, 1999) and hidden (Robertson *et al.*, 2003) Markov chain models have been used. First-order Markov chain models assume that the occurrence of rainfall on a day depends only on the occurrence of rainfall on the previous day. Higher order Markov chains assume that the occurrence of rainfall on a day depends on the occurrence of rainfall on two or more preceding days. Higher order Markov models are more complex, but perform only marginally better (Deni *et al.*, 2009). Hybrid Markov chains use different orders for wet and dry days, while hidden Markov chains assume that rainfall occurrence is driven by unobserved (hidden) states. Chandler (2005) used logistic regression to model dry or wet days as a function of site altitude, North Atlantic Oscillation, seasonality and autocorrelation (indicators for rain on each of the previous 5 days, plus persistence indicators for rain on both of the previous 2 days and on all of the previous 7 days).
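
The first-order assumption can be illustrated with a minimal sketch that estimates the probability of a wet day given the state of the previous day. The function name `estimate_transition_probs` and the ten-day record are hypothetical and chosen only for illustration; they are not taken from any of the studies cited above.

```python
from collections import Counter


def estimate_transition_probs(wet):
    """Estimate P(wet today | state yesterday) from a 0/1 sequence.

    Returns a dict keyed by yesterday's state (0 = dry, 1 = wet).
    """
    # Count consecutive (yesterday, today) pairs in the record.
    counts = Counter(zip(wet[:-1], wet[1:]))
    probs = {}
    for prev in (0, 1):
        total = counts[(prev, 0)] + counts[(prev, 1)]
        # Undefined if the previous-day state never occurs in the record.
        probs[prev] = counts[(prev, 1)] / total if total else float("nan")
    return probs


# Hypothetical record: 1 = wet day, 0 = dry day.
record = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
p = estimate_transition_probs(record)
print(p)
```

A higher order chain would condition on the states of two or more preceding days instead of one, which multiplies the number of transition probabilities to be estimated.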

Sometimes, modelling the amount of rainfall is more important than modelling the occurrence of rainfall. Using rainfall data from New South Wales, Boer *et al.* (1993) used linear regression to model the amount of seasonal and annual rainfall as a function of the longitude, latitude and altitude of the stations. Chowdhury and Sharma (2007) used a linear regression model to quantify the effect of the El Niño Southern Oscillation on the amount of monthly rainfall. To allow for nonlinear effects of some covariates on monthly rainfall amount, Zaw and Naing (2008) used polynomial regression to model the amount of monthly rainfall in Myanmar. One of the basic assumptions of the above-mentioned models is that the amount of rainfall is normally distributed with constant variance. For some stations, the amounts of rainfall on some timescales (e.g. annual) approximately follow a normal distribution, in which case the use of normal distributions is appropriate. However, the amount of monthly, weekly or daily rainfall usually does not follow a normal distribution and is right skewed, and so alternative distributions are needed to model the amount of rainfall on shorter timescales.

To model right-skewed daily rainfall amounts, distributions that have been employed include the gamma (Aksoy, 2006), truncated gamma (Das, 1955), kappa (Meilke, 1973), generalized log-normal (Swift and Schreuder, 1981), mixed exponential (Chapman, 1997; Wilks, 1998, 1999) and mixed gamma (Jamaludin and Jemain, 2008). Jamaludin and Jemain (2008) used exponential, gamma, mixed exponential and mixed gamma distributions to describe the daily rainfall amount in Malaysia and, based on the Akaike Information Criterion (AIC), showed that the mixture distributions are better than single distributions for describing the amount of daily rainfall.

Comparing log-normal, gamma, Weibull and log-logistic distributions on the non-zero weekly rainfall data from Dehradun, India, Sharda and Das (2005) showed that the Weibull distribution fits best (on the basis of the Anderson-Darling test). Using data from 29 stations, Sen and Eljadid (1999) showed that the gamma distribution fits monthly rainfall amounts well. Compared with other Pearsonian distributions (Pearsonian I and Pearsonian IX), the gamma distribution fits best for modelling the amount of monthly rainfall in the Asian summer monsoon (Mooley, 1973). Tilahun (2006) compared five different distributions (normal, log-normal, gamma, Weibull and Gumbel) for modelling the amount of rainfall in wet months at eight rainfall stations in Ethiopia and found that no single distribution was optimal for every station.

Another alternative for modelling right-skewed rainfall amounts is to use distributions from a special family called the exponential dispersion model (EDM) family of distributions (Jorgensen, 1997). The distributions in the EDM family are the response distributions for generalized linear models (GLMs) (McCullagh and Nelder, 1989) and include common distributions such as the binomial, Poisson, gamma and normal distributions. These models are widely used because the GLM framework is already in place for fitting models based on the EDM family of distributions and for diagnostic testing. In addition, covariates are easily incorporated into the modelling procedure (Jorgensen, 1987). GLMs have been used for fitting models to climatological data such as rainfall by numerous researchers (Coe and Stern, 1982; Wilks, 1999; Chandler, 2005).

The common models used for monthly, weekly or daily rainfall amounts have difficulty with the mixture of discrete data (exact zeros when no rainfall is recorded) and continuous data (positive amounts when rainfall is recorded). To overcome this difficulty, some authors have used logistic regression (Chandler and Wheater, 2002) or Markov chains (Richardson and Wright, 1984; Stern and Coe, 1984; Laux *et al.*, 2009) to model the occurrence of wet or dry days, and then gamma distributions to model the amount of rainfall on wet days. For example, Das *et al.* (2006) used Markov chains for rainfall occurrence and a gamma distribution to model the amount of weekly rainfall in Bihar, India.

An alternative approach was adopted by Glasbey and Nevison (1997), who applied a monotonic transformation of rainfall data to define a latent Gaussian variable, with zero rainfall corresponding to censored values below some threshold (1.05 mm). Husak *et al.* (2007) used a conditional distribution by accumulating probabilities conditional on the presence of rainfall. This is combined with a mixture coefficient, which accounts for the probability of no rain, to create the probability distribution. Yoo *et al.* (2005) used a mixed gamma distribution for modelling the amount of daily rainfall for both wet and dry periods. The distribution has two parts: the first is the probability of a dry day, and the second is the probability of a wet day multiplied by a gamma distribution describing the amount of rainfall on a wet day. Dunn (2004) used Poisson-gamma distributions to model the occurrence and amount of rainfall simultaneously. The distributions in the Poisson-gamma family belong to the EDM family of distributions (Jorgensen, 1997), upon which GLMs are based.
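
To see how a single Poisson-gamma distribution can produce both exact zeros and positive continuous amounts, the following sketch simulates the usual compound construction: a Poisson number of rainfall events, each contributing a gamma-distributed amount, with the total being exactly zero when no events occur. The parameter values (`lam`, `shape`, `scale`) and helper names are illustrative assumptions, not estimates or notation from Dunn (2004).

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_gamma_draw(rng, lam, shape, scale):
    """Total amount = sum of N gamma amounts, where N ~ Poisson(lam)."""
    n_events = poisson_draw(rng, lam)
    # With zero events the sum is empty, so the total is exactly 0.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_events))


rng = random.Random(1)
lam, shape, scale = 1.2, 0.8, 10.0  # illustrative values only
sample = [poisson_gamma_draw(rng, lam, shape, scale) for _ in range(20000)]

zero_fraction = sum(1 for y in sample if y == 0) / len(sample)
sample_mean = sum(sample) / len(sample)
print(zero_fraction, math.exp(-lam))    # proportion of exact zeros vs exp(-lam)
print(sample_mean, lam * shape * scale) # sample mean vs theoretical mean
```

Under this construction the probability of an exact zero is exp(-lam), so occurrence and amount are governed by one set of parameters rather than by two separately fitted models.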

Clearly, numerous probability models exist for modelling rainfall over various timescales. Numerous studies have fitted particular distributions to monthly rainfall, using the same distribution for each month but varying the parameters, such as the mean and the variance, from month to month (Mooley, 1973; Husak *et al.*, 2007; Piantadosi *et al.*, 2009). However, the amount of rainfall in different months may follow different distributions rather than the same distribution with varying parameters. We explore the possibility that different distributions are needed for each month by considering a broad family of distributions. To do so, we restrict ourselves to the EDM family of distributions, as these distributions are the response distributions for GLMs. Further, we consider EDMs where the variance is proportional to some power of the mean (often called the Tweedie family of distributions), as these distributions have properties useful for rainfall modelling (discussed in Section 3). The Tweedie family includes distributions suitable for modelling positive continuous data (such as the gamma) and also for modelling positive continuous data with exact zeros (such as the Poisson-gamma).
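
The mean-variance relationship that defines the Tweedie family can be written compactly as follows. The symbols used here (mean $\mu$, dispersion $\phi$, index $p$) are a common convention and are assumed for illustration; Section 3 gives the notation used in the rest of the paper.

```latex
\operatorname{var}[Y] = \phi\,\mu^{p}, \qquad \phi > 0,
\qquad
\begin{cases}
p = 0 & \text{normal} \\
p = 1 & \text{Poisson (with } \phi = 1\text{)} \\
1 < p < 2 & \text{Poisson-gamma (positive continuous with exact zeros)} \\
p = 2 & \text{gamma}
\end{cases}
```

Allowing the index $p$ to differ between months is one way in which a single family can represent different distributions for different months.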

We first discuss the data (Section 2), and then introduce the Tweedie distributions and their properties (Section 3). The results and discussion (Section 4) are followed by some concluding comments (Section 5).