## 1. Introduction

[2] Daily rainfall is a major input to many models of hydrologic, agricultural, ecological, and other environmental systems [*Mehrotra et al*., 2012; *Kleiber et al*., 2011]. A great deal of attention has therefore been devoted to daily rainfall modeling. Because daily rainfall is nonnegative with a point mass at zero, it is naturally described by a discrete-continuous mixed distribution with a probability density function (PDF) of the following form:

[3] This form is usually used to represent the at-site distribution of daily rainfall *X*, where *p*_{1} is the probability of rainfall occurrence; *δ*(*x*) is the one-dimensional Dirac delta function, which becomes ∞ if and only if *x* is 0, and becomes 0 otherwise; and *f*(*x*) is a skewed density for rainfall amounts. Discrete-continuous mixed distributions of this form have been used in the literature for daily rainfall downscaling [*Cannon*, 2008; *Carreau and Vrac*, 2011].
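A form of equation (1) consistent with the definitions above can be written as follows. The label *g*(*x*) for the mixed density is an assumption introduced here for illustration; only *p*_{1}, *δ*(*x*), and *f*(*x*) are taken from the text.

```latex
g(x) = (1 - p_{1})\,\delta(x) + p_{1}\,f(x), \qquad x \ge 0
```

Under this form the dry-day mass 1 − *p*_{1} sits at *x* = 0 and the wet-day density *f*(*x*) carries total mass *p*_{1}, so the two parts integrate to 1.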

[4] To simultaneously model multiple rainfall series (e.g., rainfall at multiple sites or on several successive days), it is logical to extend the univariate distribution in equation (1) to its multivariate analog. In theory, nothing prevents building a joint discrete-continuous mixed distribution for fully multivariate analysis. In practice, however, this is hardly achievable, because the model complexity grows exponentially with the dimension (2^{*d*} for a *d*-dimensional model). One simple extension is to the bivariate level, which models the pairwise dependence of daily rainfall *X* and *Y*. The usefulness of a bivariate discrete-continuous mixed distribution can be recognized in several aspects. For example,

[5] 1. If *X* and *Y* denote daily rainfall on two consecutive days, then from the bivariate distribution one may derive the conditional distribution of rainfall on the current day given that on the previous day, which serves as the “engine” for single-site daily rainfall simulation.

[6] 2. If *X* and *Y* are spatially averaged rainfall over two neighboring watersheds or rainfall at two rain gauges, then one may use the bivariate distribution for simultaneous simulation of rainfall series while preserving their dependence structure.

[7] 3. If *X* represents satellite or radar rainfall estimates and *Y* denotes ground observations, then a best guess (regression) or a conditional distribution (ensemble regression) of actual rainfall given the satellite or radar estimate may be derived from the bivariate distribution.

[8] Thus, a well-formulated bivariate discrete-continuous mixed distribution would have much practical appeal.

[9] There are some models, in the context of rainfall simulation, that do allow multisite modeling of daily rainfall. In addition to the multivariate autoregressive model of *Bárdossy and Plate* [1992], the nonparametric hidden Markov chain model developed by *Hughes and Guttorp* [1994], the nearest neighbor bootstrap technique of *Rajagopalan and Lall* [1999], and the regionalized daily rainfall generation approach of *Mehrotra et al*. [2012], another notable multivariate modeling framework is the one proposed by *Wilks* [1998, 2009]. In this framework, each site follows its own model, while the dependence among sites is maintained by driving individual models with spatially correlated random variates. Owing to its advantage of being simple in extending from single-site to multisite simulation, this framework has been frequently used and improved. For instance, *Mehrotra and Sharma* developed semiparametric and nonparametric multisite models for daily rainfall simulation [2007a, 2007b] and downscaling [2005, 2010]; *Thompson et al*. [2007], *Brissette et al*. [2007], and *Tarpanelli et al*. [2012] improved it such that the correlated random variates can be efficiently generated. It must, however, be realized that the aforementioned multisite models are designed specifically for rainfall simulation rather than formulating a joint distribution for multiple rainfall series. They might be unsuitable for the application in scenario 3 as listed above, unless additional efforts are made to reformulate the models. A multivariate or bivariate mixed distribution might be used not only for simulation but also for statistical inference (regression and ensemble regression), which is applicable for situations similar to scenario 3.
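The idea behind the Wilks-type framework described above, in which each site keeps its own model and the dependence among sites enters only through correlated random variates, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published algorithm: the function name, the parameter values, the use of a bivariate normal to produce the correlated variates, and the exponential amount distribution are all hypothetical choices made here for brevity.

```python
import numpy as np
from scipy.stats import norm


def simulate_two_sites(n_days, p01=(0.25, 0.30), p11=(0.60, 0.65),
                       mean_amount=(8.0, 10.0), rho_occ=0.8, rho_amt=0.6,
                       seed=0):
    """Two at-site generators driven by spatially correlated variates.

    p01[k], p11[k]: assumed P(wet | dry yesterday), P(wet | wet yesterday).
    mean_amount[k]: assumed mean of an exponential wet-day amount.
    rho_occ, rho_amt: assumed correlations of the normal driving variates.
    """
    rng = np.random.default_rng(seed)

    def correlated_uniforms(rho):
        # Correlated standard normals mapped to dependent uniforms on (0, 1).
        cov = [[1.0, rho], [rho, 1.0]]
        z = rng.multivariate_normal([0.0, 0.0], cov, size=n_days)
        return np.clip(norm.cdf(z), 1e-12, 1.0 - 1e-12)

    u_occ = correlated_uniforms(rho_occ)  # drives wet/dry states
    u_amt = correlated_uniforms(rho_amt)  # drives wet-day amounts

    rain = np.zeros((n_days, 2))
    for k in range(2):
        wet_prev = False
        for t in range(n_days):
            # Site-specific two-state Markov chain for occurrence.
            p_wet = p11[k] if wet_prev else p01[k]
            wet = u_occ[t, k] < p_wet
            if wet:
                # Inverse CDF of the exponential distribution.
                rain[t, k] = -mean_amount[k] * np.log1p(-u_amt[t, k])
            wet_prev = wet
    return rain
```

Calling `simulate_two_sites(3000)` returns a 3000 × 2 array of daily amounts. Each column follows its own occurrence and amount model, while wet days tend to coincide across the two columns because the driving variates are correlated. The sketch yields synthetic series only; it does not provide a joint distribution of the two series.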

[10] We return to the problem of formulating a bivariate discrete-continuous mixed distribution. This type of distribution was first introduced by *Shimizu* [1993]. Given that both *X* and *Y* are zero-inflated random variables, there are four possible mutually exclusive classes, as illustrated in Figure 1,

[11] By the rule of total probability, a bivariate PDF analogous to equation (1) is structured as

and the corresponding cumulative distribution function (CDF) is

where

represent the occurrence probabilities of the four classes, respectively; *δ*(*x*, *y*) is the two-dimensional Dirac delta function which yields ∞ if and only if both *x* and *y* are 0, and yields 0 otherwise; *δ*(*x*) and *δ*(*y*) hold the same meaning as in equation (1); *h*_{*X*}(*x*), *h*_{*Y*}(*y*), and *h*(*x*, *y*) are the PDFs of rainfall amounts within relevant classes, respectively; *H*_{*X*}(*x*), *H*_{*Y*}(*y*), and *H*(*x*, *y*) are the corresponding CDFs.
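One way to write the four classes, the bivariate PDF, and the CDF so that they agree with the symbol definitions above is sketched below. The labels *g*, *G*, and the double-index probabilities *p*_{00}, *p*_{10}, *p*_{01}, *p*_{11} are notation assumed here for illustration; the structure follows from the rule of total probability applied to the four wet/dry combinations.

```latex
% Four mutually exclusive classes and their (assumed) probability labels
\begin{aligned}
p_{00} &= P(X = 0,\ Y = 0), & p_{10} &= P(X > 0,\ Y = 0),\\
p_{01} &= P(X = 0,\ Y > 0), & p_{11} &= P(X > 0,\ Y > 0),
\end{aligned}
\qquad p_{00} + p_{10} + p_{01} + p_{11} = 1.

% Bivariate mixed PDF, analogous to equation (1)
g(x, y) = p_{00}\,\delta(x, y) + p_{10}\,h_{X}(x)\,\delta(y)
        + p_{01}\,\delta(x)\,h_{Y}(y) + p_{11}\,h(x, y), \qquad x, y \ge 0.

% Corresponding CDF
G(x, y) = p_{00} + p_{10}\,H_{X}(x) + p_{01}\,H_{Y}(y) + p_{11}\,H(x, y),
\qquad x, y \ge 0.
```

In this reading, *h*_{*X*}(*x*) describes the amount of *X* when only *X* is wet, *h*_{*Y*}(*y*) the amount of *Y* when only *Y* is wet, and *h*(*x*, *y*) the joint amounts when both are wet.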

[12] After the pioneering work of *Shimizu* [1993], the bivariate mixed distribution has been applied to investigate the properties of Pearson's correlation coefficient between rainfall gauges [*Habib et al*., 2001; *Ha and Yoo*, 2007; *Yoo and Ha*, 2007] and has been improved such that the joint behavior of contemporaneous rainfall amounts can be properly modeled [*Herr and Krzysztofowicz*, 2005]. The most recent treatment of this distribution was given by *Serinaldi* [2008, 2009a, 2009b], in which copula theory was used to construct the joint density *h*(*x*, *y*). Copula theory circumvents restrictions on the marginal distributions and can model different dependence structures with different copulae. This model is, however, not without limitations. First, only a limited number of copula families (sometimes even one comprehensive family) were used, which may not suffice to describe the various autocorrelation structures of rainfall amounts and may thus be of limited use in simulating rainfall in different climate areas. Moreover, the significance of the marginal distributions was overlooked. The Weibull, gamma, and Pearson type III distributions were used for the marginal distributions of rainfall amounts, i.e., for *h*_{*X*}(*x*), *h*_{*Y*}(*y*), and the margins of *h*(*x*, *y*) [*Serinaldi*, 2009b]. These distributions perform well in describing the usual behavior of rainfall. However, they might not necessarily perform well in capturing unusual behavior or rare events [*Vrac and Naveau*, 2007; *Furrer and Katz*, 2008; *Hundecha et al*., 2009; *Carreau et al*., 2009; *Carreau and Vrac*, 2011; *Hundecha and Merz*, 2012], as can be seen from Figure 2, which shows observed against modeled rainfall quantiles for the Weibull, gamma, and Pearson type III distributions, respectively, at a sample station in Texas. To date, we are not aware of research on this bivariate mixed distribution that specifically accommodates the heavy-tailed property of rainfall amounts, which is quite common for rainfall at finer time scales.
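The copula construction of the joint amount density *h*(*x*, *y*) mentioned above separates the dependence structure from the margins. The sketch below illustrates that separation for the wet-wet class only. It assumes a Gaussian copula and gamma margins purely for illustration; the copula family, the parameter values, and the function name are hypothetical and are not taken from the cited studies.

```python
import numpy as np
from scipy.stats import gamma, norm


def sample_copula_amounts(n, rho=0.5, shape_x=0.8, scale_x=10.0,
                          shape_y=0.9, scale_y=12.0, seed=0):
    """Draw n pairs (x, y) of positive amounts from a Gaussian copula
    with gamma margins (all parameter values are assumed)."""
    rng = np.random.default_rng(seed)

    # Step 1: dependence. Correlated normals mapped to uniforms give a
    # sample from the Gaussian copula with correlation parameter rho.
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    u = norm.cdf(z)

    # Step 2: margins. Each uniform is pushed through the inverse CDF of
    # its own marginal distribution; swapping the margin leaves the
    # dependence structure from step 1 untouched.
    x = gamma.ppf(u[:, 0], a=shape_x, scale=scale_x)
    y = gamma.ppf(u[:, 1], a=shape_y, scale=scale_y)
    return x, y
```

Because the margins enter only in step 2, replacing `gamma.ppf` with the quantile function of a heavier-tailed distribution would change the behavior of large amounts without altering the rank dependence between *x* and *y*.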

[13] In view of the above-mentioned limitations, the goal of this research is to further improve the bivariate mixed distribution based on the work of *Shimizu* [1993], *Herr and Krzysztofowicz* [2005], and *Serinaldi* [2009b]. The improvements are twofold. One is the selection of an optimal copula family from a wider choice of admissible candidates such that the joint behavior of rainfall amounts can be realistically modeled. The other is the introduction of a hybrid distribution for marginal rainfall, which improves the characterization of extremes while retaining a decent fit for low to moderate values. Although the hybrid distribution was reported in our previous work [*Li et al*., 2012], only the distribution of single-site rainfall amount was of interest there, and no mechanism was designed for generating synthetic rainfall series. Here, we extend the hybrid distribution so that it is capable of bivariate inference and simulation of daily rainfall, with both occurrence and amount simultaneously taken into account. In addition, by modeling daily rainfall as a Markov process with autocorrelation described by the improved distribution, a stochastic rainfall generator is developed and analyzed in this research. Although the model presented here is a single-site model, it may be used as a building block for multisite simulation following the approach of *Wilks* [1998]. Owing to the hybrid marginal distributions, characteristics of historical extreme rainfall events can be preserved in the synthetic series, and rare rainfall events beyond the upper range of available observed data may be reasonably extrapolated. An implementation merit of the generator is that it unifies the rainfall occurrence and amount processes into a single one. As a consequence, the lag-1 autocorrelation of daily rainfall may be captured automatically in a relatively natural and simple way, with little if any extra work.

[14] Besides the aforementioned research on multisite rainfall simulation, it is worth mentioning some other representative single-site models so that one can get an overall picture of the differences between the suggested generator and other alternatives. A typical approach for single-site daily rainfall breaks down the simulation into two stages. The first stage simulates the rainfall occurrence process. Among others, the two-state Markov chain model introduced by *Gabriel and Neumann* [1962] has been extensively used. Once the occurrence series is simulated, the second stage simulates rainfall amounts on wet days. To that end, independent random numbers are generated from a fitted parametric distribution, such as the exponential distribution [*Todorovic and Woolhiser*, 1975], the gamma distribution [*Richardson*, 1981], or the mixed exponential distribution [*Wilks*, 1998]. Rather than focusing on rainfall simulation, *Katz* [1974, 1977] derived some important inferential statistics of this model, for instance, probability distributions for the number of wet days, maximum daily rainfall, and rainfall totals over a given period. It is apparent that the suggested generator bears similarity to this model. The differences, which are also the merits, of the suggested generator are the following: on the one hand, instead of separating the occurrence and amount processes, it unifies them into a single one; on the other hand, instead of assuming independence of rainfall amounts on two consecutive wet days, it properly accounts for the dependence. Reproduction of the structure of daily autocorrelation is recognized as a crucial test for a stochastic rainfall generator [*Gregory et al*., 1993]. There exist alternative models that do not assume independence of rainfall amounts. One is the multistate Markov chain model, also known as the transition probability matrix (TPM) model [*Haan et al*., 1976; *Srikanthan and McMahon*, 1985; *Srikanthan et al*., 2005]. A second alternative is the nonparametric model developed by *Harrold et al*. [2003a, 2003b]. This model was then adjusted and incorporated into other multisite rainfall simulation and downscaling models [*Mehrotra and Sharma*, 2005, 2007a, 2007b, 2010; *Mehrotra et al*., 2012]. For elaborate reviews of earlier and more recent stochastic rainfall simulation studies, one can refer to the work by *Srikanthan and McMahon* [2001] and by *Sharma and Mehrotra* [2010], respectively. To better understand the merits and demerits of the suggested generator, we shall compare it with three alternative models: the conventional Markov chain generator [*Richardson*, 1981], the TPM model [*Srikanthan and McMahon*, 1985], and the modified nonparametric model of *Harrold et al*. [2003a, 2003b] with a parametric Markov chain for rainfall occurrences and nonparametric kernel density estimation (KDE) for rainfall amounts.
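The conventional two-stage approach described above can be sketched as follows: a two-state Markov chain produces the wet/dry sequence, and independent gamma variates fill in the amounts on wet days. This is an illustrative sketch only; the function name and all parameter values are assumptions rather than fitted quantities.

```python
import numpy as np


def two_stage_generator(n_days, p01=0.25, p11=0.60, shape=0.8, scale=10.0,
                        seed=0):
    """Two-stage single-site daily rainfall generator (illustrative).

    p01: assumed P(wet today | dry yesterday)
    p11: assumed P(wet today | wet yesterday)
    shape, scale: assumed gamma parameters for wet-day amounts
    """
    rng = np.random.default_rng(seed)

    # Stage 1: occurrence from a first-order, two-state Markov chain.
    wet = np.zeros(n_days, dtype=bool)
    state = False  # start from a dry day
    for t in range(n_days):
        p_wet = p11 if state else p01
        state = rng.random() < p_wet
        wet[t] = state

    # Stage 2: amounts on wet days, drawn independently of each other.
    rain = np.zeros(n_days)
    rain[wet] = rng.gamma(shape, scale, size=int(wet.sum()))
    return rain
```

Because stage 2 draws each amount independently, the amounts on consecutive wet days are uncorrelated in this sketch. Any day-to-day persistence in the series comes from the occurrence chain alone, which is the independence assumption the suggested generator relaxes.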

[15] The rest of this paper is organized as follows. Section 2 introduces the improved bivariate mixed distribution. Section 3 describes algorithms for the simulation of random numbers and for the estimation of distribution parameters. Based on the improved distribution, section 4 presents a single-site daily rainfall generator and tests it on a sample station in Texas. Section 5 continues with extensive simulation experiments to compare it with other advanced alternatives. Finally, conclusions are presented in section 6.