A stochastic space-time model for the generation of daily rainfall in the Gaza Strip

Authors

  • Muamaraldin Mhanna,

    Corresponding author
    1. Department of Hydrology and Hydraulic Engineering, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
  • Willy Bauwens

    1. Department of Hydrology and Hydraulic Engineering, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

Abstract

An important limitation of commonly used single-site rainfall models is that they are not able to reflect spatial characteristics, while many impact studies demand the accommodation of spatial rainfall correlations. As a result, a number of stochastic models have been developed to produce precipitation simultaneously at multiple sites. One of these models is the Wilks approach [1998. Journal of Hydrology 210: 178–191], which was the first method to adequately reproduce the main statistics of rainfall at a number of sites. So far, however, the literature does not provide many details about the ‘step by step’ procedure proposed by Wilks. In this paper, the Wilks approach is demonstrated for multi-site, daily rainfall occurrences and amounts in the Gaza Strip. The paper presents a complete methodology based on the original Wilks approach and suggests solutions for enhancing the model. The first improvement concerns an analytical calculation of the desired correlations in the random numbers that reproduce the observed correlations between the rainfall occurrence series, through a gamma coefficient. Secondly, the correlations between the rainfall amount series are specified using a rank correlation and then linearly transformed into product-moment correlations to obtain the required correlations in the random numbers. Statistical analyses on the historical and generated rainfall series confirm that the model generally performs well, as it preserves all major characteristics of daily rainfall, as well as spatial characteristics. Major advantages of this model include its simplicity, its increased efficiency, a significant improvement in computational speed and a considerable reduction in implementation effort. Copyright © 2011 Royal Meteorological Society

1. Introduction

Most agricultural, hydrological, and ecological models require daily rainfall data. However, at many sites such data are often incomplete, too short, or simply unavailable, which constitutes a serious limitation for these applications. Accordingly, mathematical models, known as stochastic weather generators, have been developed to produce long synthetic weather sequences that are statistically similar to historical records (e.g. Wilks and Wilby, 1999). Numerous approaches for the generation of daily rainfall data at a single point are available in the hydrological and climatological literature (e.g. Richardson, 1981; Srikanthan and McMahon, 1985; Woolhiser, 1992; Sharma and Lall, 1999; Hayhoe, 2000; Wan et al., 2005; Srikanthan et al., 2005; Zheng and Katz, 2008a; Liu et al., 2009). These models are widely used because they are based on a relatively simple stochastic process and are easy to formulate and fast to implement (Wilks, 1999; Mehrotra et al., 2005). Nevertheless, a major limitation of widely used single-site rainfall generators is that they are not able to account for spatial characteristics: the resulting rainfall time series for different locations are independent of each other, while a strong spatial correlation can often exist in real rainfall data (Wilks, 1999; Qian et al., 2002; Srikanthan and McMahon, 2001). Many authors have stressed the importance of accounting for the spatial correlation of rainfall in flow generation applications (e.g. Mehrotra et al., 2006; Qian et al., 2002). In view of scenario analysis, it is therefore very important to capture the spatial dependence in simultaneous simulations of rainfall sequences at multiple locations.

In order to address the problem of these spatial correlations of rainfall, a number of studies have been carried out on the simulation of daily rainfall at a number of sites (e.g. Bras and Rodriguez-Iturbe, 1976; Cox and Isham, 1988; Bardossy and Plate, 1992; Hughes and Guttorp, 1994; Allerup, 1996; Hughes et al., 1999). However, as discussed in Wilks (1999) and Qian et al. (2002), these approaches are comparatively complex in both calibration and implementation, and thus their operational application has been limited.

To overcome these complexities, Wilks (1998) extended simple single-site rainfall generator models to multiple-site models by driving a set of individual models with serially independent and spatially correlated random numbers, which results in the synthetic series having realistic spatial correlations. As pointed out by Brissette et al. (2007), the approach represents the first method that adequately reproduces the main statistics of precipitation data series at multiple sites. Therefore, this method is of great interest for further improvement and it has been used by many researchers (e.g. Qian et al., 2002; Khalili et al., 2004; Srikanthan, 2005; Mehrotra et al., 2006; Brissette et al., 2007; Thompson et al., 2007; Srikanthan and Pegram, 2009). Nevertheless, the literature does not provide many details about the step-by-step procedure proposed by Wilks. The research carried out by Brissette et al. (2007) can be considered the most detailed work done to date.

This study aims to develop a multi-site daily precipitation generator, based on the approach proposed by Wilks (1998), for the simulation of rainfall occurrences and amounts in the Gaza Strip. The paper attempts to present a complete methodology based on the original Wilks approach, discusses the associated difficulties and suggests enhancements of the model to circumvent some of these difficulties.

A safe, clean, and adequate water supply is vital to the sustenance of all human beings. Albert Szent-Gyorgyi (Hungarian Biochemist, 1937 Nobel Prize for Medicine) said: “Water is life's mater and matrix, mother and medium. There is no life without water”. Never has this been truer than it is in the Gaza Strip where around two million Gazans are deprived of access to this essential component of life, and water scarcity represents a real nightmare for people in this area. Rainwater harvesting, now more than ever, has become a strategic option to ensure that water demands are met. The rainfall event characteristics for a given region are the most fundamental elements for designing a water harvesting system (Prinz and Singh, 2000). However, the Gaza Strip is generally characterized by very high variability of rainfall in both time and space. Therefore, at the catchment scale, the pattern of dry and wet periods, the amounts and the spatial characteristics of rainfall are potentially important, since they directly affect the occurrence and quantity of runoff. Despite this fact, we are not aware of any multi-site application being developed for the study area. Consequently, in order to support the implementation of rainwater harvesting systems, the stochastic rainfall model is developed mainly to drive a simulation model used to evaluate the performance of large-scale collection systems in the Gaza Strip.

2. Data and study area

The daily precipitation measurements for this study come from eight meteorological stations in the Gaza Strip and cover a period of 33 years (September 1973–September 2006). The data were collected from the Palestinian Meteorology Office in the Ministry of Transport. The locations of the stations are shown in Figure 1. Table I also provides the corresponding annual rainfall and the coefficients of variation.

Figure 1.

Location map of the rainfall stations. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

Table I. General information about the selected stations
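| Station name | Code | Latitude | Longitude | Annual rainfall (mm) | Coefficient of variation (%) |
| --- | --- | --- | --- | --- | --- |
| Bait Hanoun | Han | 31°33′ | 34°32′ | 410 | 38.96 |
| Bait Lahia | Lah | 31°34′ | 34°28′ | 420 | 41.05 |
| Shati Camp | Sha | 31°32′ | 34°28′ | 385 | 40.12 |
| Gaza | Gaz | 31°31′ | 34°27′ | 370 | 38.95 |
| Nusseirat | Nus | 31°26′ | 34°23′ | 350 | 41.86 |
| Deir El Balah | Dei | 31°25′ | 34°21′ | 320 | 44.04 |
| Khan Younis | Kha | 31°21′ | 34°19′ | 285 | 47.45 |
| Rafah | Raf | 31°16′ | 34°15′ | 230 | 45.12 |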

Generally, the Gaza Strip has a typical semi-arid climate as it is located in the transitional zone between a temperate Mediterranean climate in the west and north, and the arid desert climate of the Sinai Peninsula in the east and south. Therefore, despite the small area of the Gaza Strip (365 km²), the amount of rainfall varies significantly from one location to the next, with an average annual rainfall ranging from about 420 mm in the north (north governorate) to 230 mm in the south (Rafah governorate). Due to the influence of rising altitudes, the yearly rainfall amount increases inland. Most rain falls in the period between mid-October and the end of March. The period from May to September is dry with no rainfall. Precipitation patterns include thunderstorms and rain showers, but only a few days of the wet months are rainy days.

3. The Wilks approach

Wilks (1998) proposed an extension of the well known first-order Markov chain for rainfall occurrences and a mixed exponential distribution for rainfall amounts, to simulate rainfall simultaneously at multiple sites. The approach comprises the use of serially independent and spatially correlated random numbers, which are then employed individually to generate precipitation occurrence and amount time series at each site. While the original approach of Wilks is applied here, the amounts of rainfall are not modeled by a mixed exponential distribution but by a two-parameter gamma distribution, as earlier studies have shown that the latter distribution is more suited for the rainfall in the Gaza Strip (Mhanna and Bauwens, 2009). Following Wilks, the rainfall occurrences are simulated by using a first-order Markov chain. The subsequent paragraphs describe the procedure followed to generate the spatially correlated random numbers and their use in the generation of rainfall occurrences and amounts at multiple locations.

3.1. The occurrence of precipitation

For each of the (N) sites and for each of the (M) months in the rainy winter half of the year (October–March), an individual local rainfall occurrence model is fitted, resulting in a total of N × M models. With the aid of these models, the transition probabilities of the rainfall occurrence can be specified at each site separately. In order to extend the model to multi-site generation, the simulation of the rainfall occurrence is forced by serially independent but spatially correlated random numbers. In this way, the spatial correlation is preserved in the generated rainfall series across the network of stations. The general procedure for estimating the correlations of the random numbers and for extending the model is explained in the ensuing paragraphs.

3.1.1. Step 1: Determination of the conditional and unconditional probabilities of rainfall occurrence

The occurrence of precipitation is determined by using the widely used ‘chain-dependent process’ consisting of a first-order two-state Markov process. This model was chosen, in preference to a zero-order, or a second-order Markov model, for its adequacy based on the Bayesian Information Criterion (BIC) (e.g. Schwarz 1978; Katz 1981).

The first-order Markov model involves the assumption that the probability of rain on a certain day is conditioned by the wet or dry status of the previous day. Let Xt represent the binary event of ‘precipitation’ or ‘no precipitation’ occurring on day t. A wet day is defined as occurring whenever the amount of precipitation exceeds a certain threshold, while dry days are days which are not wet. In this study, a day with total rainfall of 0.1 mm or more is considered a wet day. For each site, k, the process is determined by using the two conditional probabilities for the wet-day occurrence pattern: P01(k), the conditional probability of a wet day (Xt = 1) given that the previous day was dry (Xt−1 = 0); P11(k), the conditional probability of a wet day given that the previous day was wet. For each month, these two probabilities need to be determined and used to provide a transition from one month to another. As discussed by Wilks (2006), the parameter estimation procedure consists simply of computing the conditional relative frequencies, which yield the maximum likelihood estimators (MLEs). Mathematically, these estimators can be expressed as (e.g. Zheng and Katz, 2008b):

$$P_{01}(k) = \frac{n_{01}(k)}{n_{00}(k) + n_{01}(k)} \tag{1}$$
$$P_{11}(k) = \frac{n_{11}(k)}{n_{10}(k) + n_{11}(k)} \tag{2}$$

where, for the site k, n01 is the historical count of wet days following dry days, n00 is the historical count of dry days followed by dry days, and so on.

The unconditional probability of a wet day, π1, for the site k, can be derived as (e.g. Katz and Parlange, 1998)

$$\pi_1(k) = \frac{P_{01}(k)}{1 + P_{01}(k) - P_{11}(k)} \tag{3}$$

and the unconditional probability of a dry day being simply

$$\pi_0(k) = 1 - \pi_1(k) \tag{4}$$
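The estimation in Step 1 can be illustrated with a minimal sketch. It assumes the occurrence record for one site and one month is available as a list of 0/1 values (1 = wet day); the function name `fit_occurrence` and the sample series are hypothetical and not taken from the paper's data.

```python
def fit_occurrence(x):
    """Estimate first-order Markov chain parameters from a 0/1 wet-day series."""
    # n[i][j] counts transitions from state i (previous day) to state j (current day)
    n = [[0, 0], [0, 0]]
    for prev, curr in zip(x[:-1], x[1:]):
        n[prev][curr] += 1
    p01 = n[0][1] / (n[0][0] + n[0][1])  # wet day following a dry day
    p11 = n[1][1] / (n[1][0] + n[1][1])  # wet day following a wet day
    pi1 = p01 / (1.0 + p01 - p11)        # unconditional (stationary) wet-day probability
    pi0 = 1.0 - pi1                      # unconditional dry-day probability
    return p01, p11, pi1, pi0


# Illustrative series only (not observed data)
series = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
p01, p11, pi1, pi0 = fit_occurrence(series)
```

The sketch does not guard against a record without any dry (or any wet) days, in which case one of the denominators is zero.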

3.1.2. Step 2: Determination of the correlations between the rainfall occurrence series

For each pair of sites, k and l, the correlation between the rainfall occurrence series Xt(k) and Xt(l) is calculated:

$$\xi(k, l) = \mathrm{Corr}[X_t(k), X_t(l)] \tag{5}$$

Given a network of N locations, there are N(N − 1)/2 pairwise correlations that should be specified and maintained in the generated rainfall occurrences. Following Thompson et al. (2007) and Srikanthan and Pegram (2009), these correlations are calculated in this study as:

$$\xi(k, l) = \frac{\pi_{00}(k, l) - \pi_0(k)\,\pi_0(l)}{\sigma(k)\,\sigma(l)} \tag{6}$$

Here σ denotes the standard deviation of the binary series

$$\sigma(k) = \sqrt{\pi_0(k)\,\pi_1(k)} \tag{7}$$

π0(k) and π1(k) are calculated from Equations (3) and (4). The joint probability that station pairs are both dry, π00(k, l), is estimated as:

$$\pi_{00}(k, l) = \frac{d_{\mathrm{joint}}(k, l)}{n} \tag{8}$$

where djoint denotes the historical count of station pairs that are both dry on the same day and n is the number of data values.
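As a hedged illustration of Step 2, the sketch below computes the occurrence correlation of two binary series from the joint dry-dry probability and the marginal dry-day probabilities. As a simplifying assumption, the marginals are taken here as empirical relative frequencies rather than from the fitted Markov chain; the function name and the sample series are hypothetical.

```python
import math


def occurrence_correlation(xk, xl):
    """Correlation of two 0/1 series from joint and marginal dry-day probabilities."""
    n = len(xk)
    pi0_k = xk.count(0) / n  # marginal dry probability, site k (empirical)
    pi0_l = xl.count(0) / n  # marginal dry probability, site l (empirical)
    d_joint = sum(1 for a, b in zip(xk, xl) if a == 0 and b == 0)
    pi00 = d_joint / n       # joint probability that both sites are dry
    sigma_k = math.sqrt(pi0_k * (1.0 - pi0_k))
    sigma_l = math.sqrt(pi0_l * (1.0 - pi0_l))
    return (pi00 - pi0_k * pi0_l) / (sigma_k * sigma_l)


# Illustrative series only (not observed data)
xk = [0, 0, 1, 1, 0, 1, 0, 0]
xl = [0, 1, 1, 1, 0, 0, 0, 0]
xi_kl = occurrence_correlation(xk, xl)
```

With empirical marginals this expression coincides with the Pearson correlation of the two binary series, so a series correlated with itself returns 1.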

3.1.3. Step 3: Definition of the stochastic rainfall occurrence model

The stochastic simulation of the Xt series will, as in the case for a single site model, be forced by a random number generator that produces uniform random numbers ut(k). Here, however, a problem arises due to the fact that the number of multivariate uniform distributions with a particular correlation matrix is infinite (Fackler, 1999). In order to circumvent the problem and to account for the necessary correlations, correlated standard normal variates wt(k)∼N[0, 1] will be used to force the occurrence process. These standard Gaussian variates will subsequently be transformed to uniform variates ut(k) through the transformation:

$$u_t(k) = \Phi[w_t(k)] \tag{9}$$

where Φ [.] indicates the standard normal cumulative distribution function (CDF).

For the forcing of the occurrence process, the normally distributed random numbers (wt) must preserve the spatial correlation in the rainfall occurrence series. Let ω(k, l) indicate the correlation between the standard normal variates, wt(k) and wt(l), which are generated from a bivariate normal distribution. The aim then becomes to find the value for ω(k, l) leading to rainfall occurrences that exhibit a correlation of ξ0(k, l), the observed value of ξ(k, l). The problem that hereby arises is that a direct computation of ω(k, l) from ξ0(k, l) is not possible. Therefore, these correlations are obtained by constructing empirically-derived curves relating ω(k, l) and ξ(k, l) for all N(N − 1)/2 station pairs.

As discussed by Wilks (1998), one finds empirically that there is a monotonic relationship between ω(k, l) and ξ(k, l) for a given station pair k and l. Different methods can be used to define this relationship:

  • by using a nonlinear root finding algorithm through inverting the relationship between ω(k, l) and ξ(k, l) (Srikanthan and Pegram, 2006);

  • by using a maximum likelihood method (Thompson et al., 2007);

  • by using a hidden covariance model (Srikanthan and Pegram, 2009);

  • by using a trial-and-error procedure, assigning different values of ω(k, l) and repeating the simulation until a value of ξ(k, l) reasonably close to ξ0(k, l) is achieved (Mehrotra et al., 2006);

  • by constructing empirically derived curves for each pair of stations (Wilks, 1998).

Figure 2 illustrates the latter method, which is also used in this study. The curve in Figure 2 is constructed by assuming different values for the correlation between the random variates, ω(k, l), and by assessing the resulting correlation between the generated binary series of rainfall occurrences, ξ(k, l). Since the observed correlation is known (see step 2), the required correlation between the random variates can be read from the curve.

Figure 2.

The relationship between the random variates correlation [ω(k, l)] and generated occurrences correlation [ξ(k, l)]. The ξ0(k, l) and ξmax are the observed and maximum values of ξ(k, l), and 0.918 is the correlation that has to be considered for the random numbers. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

In the case of the example, given an observed correlation of 0.647, the correlation that has to be considered for the random variates is 0.918. It is also seen on the curve that forcing the rainfall occurrence models for these locations with identical standard random numbers yields the correlation ξmax(k, l) = 0.857.
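The construction and inversion of such a curve can be sketched for a single station pair as follows. This is an illustration under stated assumptions, not the authors' implementation: the transition probabilities, the grid of trial correlations, the series length and the use of a common random seed for every trial value are all choices made here for the example.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def simulated_xi(omega, params_k, params_l, n_days=5000, seed=1):
    """Occurrence correlation produced by normal variates with correlation omega."""
    rng = random.Random(seed)  # same seed for every omega gives a smoother curve
    (p01k, p11k), (p01l, p11l) = params_k, params_l
    xk_prev = xl_prev = 0
    sk = sl = skl = 0
    for _ in range(n_days):
        r1, r2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        wk = r1
        wl = omega * r1 + math.sqrt(1.0 - omega ** 2) * r2  # Corr(wk, wl) = omega
        xk = 1 if PHI(wk) <= (p11k if xk_prev else p01k) else 0
        xl = 1 if PHI(wl) <= (p11l if xl_prev else p01l) else 0
        sk, sl, skl = sk + xk, sl + xl, skl + xk * xl
        xk_prev, xl_prev = xk, xl
    mk, ml = sk / n_days, sl / n_days
    cov = skl / n_days - mk * ml
    return cov / math.sqrt(mk * (1.0 - mk) * ml * (1.0 - ml))


def invert_curve(xi_obs, params_k, params_l, grid=21):
    """Linear interpolation of the simulated curve to find omega for xi_obs."""
    omegas = [i / (grid - 1) for i in range(grid)]
    xis = [simulated_xi(w, params_k, params_l) for w in omegas]
    for i in range(grid - 1):
        x0, x1 = xis[i], xis[i + 1]
        if x0 <= xi_obs <= x1 and x1 > x0:
            frac = (xi_obs - x0) / (x1 - x0)
            return omegas[i] + frac * (omegas[i + 1] - omegas[i])
    return None  # xi_obs lies outside the attainable range


# Hypothetical (P01, P11) pairs for two sites
site_k, site_l = (0.15, 0.45), (0.20, 0.50)
omega_needed = invert_curve(0.4, site_k, site_l)
```

Returning `None` mirrors the existence of a maximum attainable correlation: an observed value above the correlation obtained with identical random numbers cannot be reproduced.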

The multivariate normal variates can now be generated from (e.g. Fackler, 1999):

$$\mathbf{w}_t = \mathbf{U}\,\mathbf{R}_t \tag{10}$$

where Rt represents an independent normal vector and U is a coefficient matrix such that

$$\mathbf{U}\,\mathbf{U}^{\mathrm{T}} = \boldsymbol{\Omega} \tag{11}$$

Here Ω denotes the covariance matrix, whose elements are the correlations ω(k, l). Note that because of the unit variances the covariance matrix is also the correlation matrix (Dias et al., 2008).

In order to derive U from Ω, the covariance matrix has to be positive-definite. As indicated by Wilks (1998, 1999), however, the pairwise correlations ω(k, l) are not necessarily mutually consistent, which can result in a non-positive-definite matrix. In that case, the matrix should be smoothed. This can be done by modelling the different correlations ω(k, l) as a function of site separation, which accounts for the differences in location between the stations, e.g.:

equation image(12)

Here, Δxkl is the horizontal distance between the stations k and l, Δλkl is the east–west component of the horizontal separation, and Δhkl is the vertical separation.

Once the covariance matrix is positive-definite, the elements of U can be obtained by Cholesky's decomposition (e.g. Fackler, 1999).
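A minimal sketch of this step is given below: a lower-triangular factor U with U Uᵀ = Ω is obtained by Cholesky decomposition and applied to a vector of independent standard normal numbers. The 3 × 3 correlation matrix is hypothetical and serves only as an illustration; in practice a linear algebra library would normally be used instead of the hand-written routine.

```python
import math
import random


def cholesky(omega):
    """Lower-triangular U such that U U^T = omega (omega must be positive-definite)."""
    n = len(omega)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(u[i][m] * u[j][m] for m in range(j))
            if i == j:
                d = omega[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                u[i][j] = math.sqrt(d)
            else:
                u[i][j] = (omega[i][j] - s) / u[j][j]
    return u


def correlated_normals(u, rng):
    """One vector w_t = U R_t of spatially correlated standard normal variates."""
    r = [rng.gauss(0.0, 1.0) for _ in u]  # independent N(0, 1) numbers
    return [sum(u[i][j] * r[j] for j in range(i + 1)) for i in range(len(u))]


# Hypothetical correlation matrix for three sites
omega = [[1.0, 0.9, 0.7],
         [0.9, 1.0, 0.8],
         [0.7, 0.8, 1.0]]
U = cholesky(omega)
w_t = correlated_normals(U, random.Random(42))
```

The `ValueError` branch corresponds to the situation described above, in which mutually inconsistent pairwise correlations make the decomposition impossible until the matrix has been repaired.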

After accounting for the necessary spatial correlations, each set of the resulting serially independent random numbers, wt(k), is used to generate the precipitation occurrence time series at a particular site k. After transformation to uniform variates ut(k) through Equation (9), these numbers are compared to the appropriate conditional probability of a wet day [P01(k) or P11(k)], selected according to the wet or dry status of the previous day. This threshold is defined as the critical probability, Pc (Wilks, 1998):

$$P_c(k) = \begin{cases} P_{01}(k), & \text{if } X_{t-1}(k) = 0 \\ P_{11}(k), & \text{if } X_{t-1}(k) = 1 \end{cases} \tag{13}$$

A wet day is then generated if the random number is sufficiently small:

$$X_t(k) = \begin{cases} 1, & \text{if } u_t(k) \le P_c(k) \\ 0, & \text{otherwise} \end{cases} \tag{14}$$
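One time step of the occurrence process can be sketched as follows, assuming that a vector of correlated standard normal variates is already available for the day. The function name, the variates and the transition probabilities are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def occurrence_step(w, x_prev, p01, p11):
    """Generate today's wet (1) / dry (0) states at all sites from normal variates w."""
    phi = NormalDist().cdf
    x_new = []
    for k, wk in enumerate(w):
        # Critical probability depends on yesterday's state at the same site
        p_c = p11[k] if x_prev[k] == 1 else p01[k]
        # Transform the normal variate to a uniform one and compare with p_c
        x_new.append(1 if phi(wk) <= p_c else 0)
    return x_new


# Hypothetical inputs for three sites
w = [-2.0, 0.0, 2.0]
x_prev = [0, 1, 1]
p01 = [0.10, 0.10, 0.10]
p11 = [0.40, 0.40, 0.40]
x_today = occurrence_step(w, x_prev, p01, p11)
```

Because Φ is strictly increasing, comparing Φ[wt(k)] with Pc(k) is equivalent to comparing wt(k) with the standard normal quantile of Pc(k).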

3.2. The amount of precipitation

As already shown for the rainfall occurrence model, the rainfall amount is forced by serially independent and spatially correlated random numbers that must preserve the spatial correlation in the rainfall amount series. The general procedure of generating rainfall amount series at a number of sites is explained in the following paragraphs.

3.2.1. Step 1: Determination of the model parameters

The rainfall amounts on wet days are generated by using a 2-parameter Gamma distribution whose probability density function for site k is defined as (e.g. Katz, 1977):

$$f[r_t(k)] = \frac{[r_t(k)/\beta]^{\alpha - 1} \exp[-r_t(k)/\beta]}{\beta\,\Gamma(\alpha)}, \qquad r_t(k) > 0 \tag{15}$$

where rt(k) is the non-zero precipitation amount, α and β denote the shape and scale parameters, respectively, and Γ(α) is the Gamma function evaluated at α.

The parameters of the model are calculated, for each site k and for each month in the rainy winter half of the year, by using the method of maximum likelihood through the Thom approach (Thom, 1958). The maximum likelihood estimator for the shape parameter is given by

$$\hat{\alpha} = \frac{1 + \sqrt{1 + 4A/3}}{4A} \tag{16}$$

and the scale parameter is calculated as

$$\hat{\beta} = \frac{\bar{Y}}{\hat{\alpha}} \tag{17}$$

Here Ȳ denotes the mean daily rainfall (mm) for the month, considering only wet days, and A is the difference between the logs of the arithmetic and geometric means.
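The Thom estimators can be sketched in a few lines, assuming the wet-day amounts for one site and one month are available as a list of positive values. The function name and the sample amounts are hypothetical.

```python
import math


def thom_gamma_fit(amounts):
    """Thom (1958) approximation to the gamma maximum likelihood estimators."""
    n = len(amounts)
    mean = sum(amounts) / n
    # A = ln(arithmetic mean) - mean of logs = ln(arithmetic mean) - ln(geometric mean)
    a = math.log(mean) - sum(math.log(y) for y in amounts) / n
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)  # shape
    beta = mean / alpha                                         # scale
    return alpha, beta


# Illustrative wet-day amounts in mm (not observed data)
wet_day_amounts = [1.0, 2.0, 4.0, 8.0, 16.0]
alpha_hat, beta_hat = thom_gamma_fit(wet_day_amounts)
```

By construction the fitted distribution reproduces the sample mean, since the product of the two estimates equals Ȳ. The sketch assumes that the amounts are not all identical; otherwise A is zero and the shape estimator is undefined.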

3.2.2. Step 2: Determination of the correlation between the rainfall amount series

For two sites, k and l, the correlations between the rainfall amount series Yt(k) and Yt(l) are calculated:

$$\eta(k, l) = \mathrm{Corr}[Y_t(k), Y_t(l)] \tag{18}$$

In this study, these correlations are calculated using the Pearson product-moment correlation. As the Gamma model generates rainfall amounts on wet days only, the correlation between the rainfall amount series for two sites (k and l) is calculated from those days on which both sites are wet.

3.2.3. Step 3: Definition of the stochastic rainfall amount model

The spatial correlation in the daily rainfall amounts is preserved by using a vector of correlated uniform variates vt. As detailed previously for the rainfall occurrence model, it is convenient to obtain the elements of this vector from a corresponding realization of correlated standard normal variates zt(k) as vt(k) = Φ[zt(k)].

The vector zt can be drawn from a multivariate normal distribution with mean 0 and covariance matrix [Ψ], whose elements are:

$$\zeta(k, l) = \mathrm{Corr}[z_t(k), z_t(l)] \tag{19}$$

Together with the corresponding correlation ω(k, l) and the Markov chain and Gamma parameters for the stations k and l, a particular ζ(k, l) yields a unique correlation η(k, l) between the synthetic rainfall amounts for the two sites.

Similar to the case for finding the binary Ω, a direct computation of Ψ is not feasible since the zt are not observed. The correlations in Equation (19) are thus estimated by applying an analogous procedure to the one used in the rainfall occurrence model.

Here too, the matrix [Ψ] has to be positive-definite. In case of a non-positive-definite matrix, the matrix should be smoothed using Equation (12) until it is positive-definite.

The correlated multivariate normal variates are obtained from independent normal variates through a similar transformation, employing Equations (10) and (11).
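The conversion of correlated normal variates into rainfall amounts can be sketched as an inverse-CDF transformation: the uniform variate vt(k) = Φ[zt(k)] is mapped to the corresponding quantile of the fitted gamma distribution. The sketch below relies only on the Python standard library, so the gamma CDF is evaluated with the power series of the incomplete gamma function and inverted by bisection; these numerical choices, the function names and the parameter values are assumptions made for this example.

```python
import math
from statistics import NormalDist


def gamma_cdf(x, alpha, beta):
    """Gamma CDF via the series of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    t = x / beta
    term = 1.0 / alpha
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= t / (alpha + n)
        total += term
    return total * math.exp(-t + alpha * math.log(t) - math.lgamma(alpha))


def gamma_quantile(v, alpha, beta):
    """Invert the gamma CDF by bisection for a probability 0 < v < 1."""
    lo, hi = 0.0, beta
    while gamma_cdf(hi, alpha, beta) < v:  # expand until the quantile is bracketed
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, alpha, beta) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rainfall_amounts(z, wet, alpha, beta):
    """Amounts at all sites: gamma quantile of Phi(z) on wet days, zero on dry days."""
    phi = NormalDist().cdf
    return [gamma_quantile(phi(zk), alpha[k], beta[k]) if wet[k] else 0.0
            for k, zk in enumerate(z)]


# Hypothetical variates, wet/dry states and gamma parameters for two sites
amounts = rainfall_amounts([0.0, 1.0], [1, 0], [1.0, 1.0], [10.0, 10.0])
```

For a shape parameter of 1 the gamma distribution reduces to the exponential distribution, which gives a simple check: the median for a scale of 10 mm is 10 ln 2, or roughly 6.93 mm.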

Overall, the steps required for the simulation of precipitation occurrences and amounts at a number of sites are summarized in Figure 3.

Figure 3.

General steps of the original Wilks approach applied in this study for the generation of multi-site precipitation data

4. The modified Wilks approach

The original approach of Wilks involves constructing N × (N − 1)/2 curves of the type shown in Figure 2 for each month (and for all station pairs), in order to generate series with the same correlation as the observed ones. However, such construction may result in the non-positive-definiteness of the covariance matrices. Wilks overcomes this problem by modelling the pair-wise correlations of the random numbers as a function of site separation (Equation (12)). However, the approach is not efficient and requires a significant amount of time.

In what follows, we suggest improvements of the Wilks approach through two analytical solutions for the calculation of the desired correlations in the random numbers for the simulation of the rainfall occurrences and amounts. Moreover, the proposed improvements also imply the use of a pragmatic and simpler method to ensure positive-definiteness of the resulting covariance matrices.

4.1. The occurrence of precipitation

The dependence measure in Equation (6), which is used to calculate the correlations between the rainfall occurrence series, depends on the joint probability that station pairs are both dry, π00(k, l) and on the marginal probabilities π0(k) and π0(l). However, during the construction of the empirically derived curves for each pair of stations, we noticed that the correlation between the random numbers, ω(k, l), does not have any significant impact on the marginal probabilities. In other words, by assuming different values for the correlations between random variates, the π0(k) and π0(l) remain constant during the process, whereas π00(k, l) increases gradually as ω(k, l) increases (Figure 4). It is also seen on the curve that the correlation that has to be considered for the random variates, in order to replicate the observed correlation, is practically equal to the needed correlation to reproduce the π00(k, l). So, by using an appropriate measure of dependence that relies only on the joint probability, the needed correlations of random numbers can be calculated analytically.

Figure 4.

The relationship between the random variates correlation [ω(k, l)] and the marginal (π0) and joint (π00) probabilities. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

The gamma coefficient proposed by Goodman and Kruskal (1954) is one of the most useful measures of association that can be used for this purpose. In their investigation of dependence measures for binary data, Tajar et al. (2001) tested seven measures: six of them based on concordance and the other one being the odds ratio. They concluded that the gamma coefficient, in addition to the odds ratio, depends only on the joint probability and not on the marginal probabilities. The basic idea in this approach is to assume that the needed correlations of random numbers are equal to the gamma correlations between the rainfall occurrence series [γ(k, l) = ω(k, l)].

The gamma coefficient is calculated as (Rousson, 2007):

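$$\gamma(k, l) = \frac{\varphi(k, l) - 1}{\varphi(k, l) + 1} \tag{20}$$

where φ is the odds ratio, calculated as

$$\varphi(k, l) = \frac{\pi_{00}(k, l)\,\pi_{11}(k, l)}{\pi_{01}(k, l)\,\pi_{10}(k, l)} \tag{21}$$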

Here π00(k, l) denotes the joint probability that station pairs are both dry, π11(k, l) is the joint probability that station pairs are both wet, and so on.
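The analytical calculation can be sketched as follows: the four joint probabilities are estimated as relative frequencies from two binary occurrence series, and the gamma coefficient is obtained from the odds ratio. The function name and the sample series are hypothetical.

```python
def gamma_coefficient(xk, xl):
    """Goodman-Kruskal gamma of two 0/1 series, computed through the odds ratio."""
    n = len(xk)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(xk, xl):
        counts[(a, b)] += 1
    pi00, pi01 = counts[(0, 0)] / n, counts[(0, 1)] / n
    pi10, pi11 = counts[(1, 0)] / n, counts[(1, 1)] / n
    odds_ratio = (pi00 * pi11) / (pi01 * pi10)
    # Algebraically equal to (pi00*pi11 - pi01*pi10) / (pi00*pi11 + pi01*pi10)
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)


# Illustrative series only (not observed data)
xk = [0, 0, 1, 1, 0, 1, 0, 0]
xl = [0, 1, 1, 1, 0, 0, 0, 0]
gamma_kl = gamma_coefficient(xk, xl)
```

The odds ratio is undefined when no day has exactly one of the two sites wet; the equivalent form given in the code comment avoids that division and returns 1 in this limiting case.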

4.2. The amount of precipitation

As mentioned earlier, the stochastic simulation is forced by a random number generator that produces correlated normal variates and then transforms each of them individually to obtain uniform marginal distributions. Such a non-linear transformation, however, will typically influence the dependence between the random normal variables. This is due to the fact that the Pearson product–moment (linear) correlation is not invariant to transformations of the underlying marginal distribution (Fackler, 1991). Therefore, the linear correlation of the original normal variates will not be preserved after the transformation. The most convenient solution to this problem is to use other measures of dependence that are preserved under any monotonic transformation (Fackler, 1991). One of those measures is the rank correlation (also called fractile correlation), which is invariant with respect to strictly increasing transformations of the variables involved (e.g. Ghosh and Henderson, 2003; Phoon et al., 2004). The key to this method is to specify the correlations between rainfall series using a fractile correlation and then linearly convert them into the product–moment correlation to obtain the needed correlations of random numbers. As a result, the linear correlation between the uniforms obtained from transforming the normal variates will result in a synthetic rainfall series that exhibits the rank correlation between rainfall amounts.

Two common measures of rank correlation are the Spearman rank correlation ρ(k, l) and Kendall's statistic τ(k, l). The corresponding binormal correlation ζ(k, l), i.e. the correlation between the standard normal variates zt(k) and zt(l), can be obtained via the following relationships between these measures and the product–moment correlation (e.g. Kruskal, 1958; O'Brien and Griffiths, 1965):

$$\zeta(k, l) = 2 \sin\left[\frac{\pi}{6}\,\rho(k, l)\right] \tag{22}$$
$$\zeta(k, l) = \sin\left[\frac{\pi}{2}\,\tau(k, l)\right] \tag{23}$$
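For a bivariate normal distribution, the well-known relationships between the rank correlations and the product–moment correlation are 2 sin(πρ/6) for the Spearman correlation and sin(πτ/2) for Kendall's statistic. The short sketch below applies both conversions; the function names and the input value of 0.5 are chosen only for illustration.

```python
import math


def spearman_to_pearson(rho_s):
    """Product-moment correlation of bivariate normals with Spearman correlation rho_s."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def kendall_to_pearson(tau):
    """Product-moment correlation of bivariate normals with Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


zeta_from_spearman = spearman_to_pearson(0.5)
zeta_from_kendall = kendall_to_pearson(0.5)
```

For the same rank value the two conversions differ noticeably: a rank correlation of 0.5 maps to about 0.518 under the Spearman relationship and to about 0.707 under the Kendall relationship. The Spearman mapping therefore stays much closer to the identity.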

Srikanthan and Pegram (2009) used the latter equation to calculate the binormal correlations between the random numbers to generate rainfall amounts for a network of 30 stations in Australia. However, in order to find the appropriate measure for the rainfall data used in this study, the observed correlations are plotted against the corresponding correlations of random numbers obtained by the original Wilks approach and by Equations (22) and (23) (Figure 5). It is evident from the figure that the Spearman correlation gives the best fit. In this sense, the linear correlation ζ(k, l) will lead to synthetic rainfall amounts that exhibit the Spearman correlation between rainfall amounts, η0(k, l), the observed value of η(k, l).

Figure 5.

The observed amounts correlation versus the needed random numbers correlation obtained by (a) Wilks approach, (b) Kendall's, and (c) Spearman. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

The transformation may result in a non-positive-definite matrix, but this is unlikely to occur often in practice since the function relating the Spearman correlation to the product–moment correlation is so close to an identity mapping on [−1, 1] (e.g. Fackler, 1999). However, as discussed in Phoon et al. (2004), a plausible method to overcome this problem is to set the negative eigenvalues to zero. Brissette et al. (2007) have applied a similar procedure, as proposed by Rebonato and Jäckel (2000), to ensure the construction of a valid correlation matrix. Their procedure comprises the diagonalisation of the matrix and the replacement of all negative eigenvalues with a small positive value, after which the correlation matrix is normalized using a standard equation. For a detailed discussion of the problem of non-positive-definite matrices, readers are referred to Rebonato and Jäckel (2000).

The required steps for the modified Wilks approach are summarized in Figure 6. In general, this methodology reduces the amount of work presented in Figure 3, resulting in an improvement in the computational speed.

Figure 6.

General steps of the modified Wilks approach for the generation of multi-site precipitation data

5. Model performance evaluation

The goal of rainfall generators is to produce synthetic rainfall data which are statistically similar to the observed ones. Accordingly, both observed and generated rainfall series are subjected to a standard exploratory data analysis to describe the performance of the daily generation model using a set of statistical parameters. The original Wilks approach and the modified approach fit the same stochastic process to produce the precipitation occurrences and amounts at a number of sites. Consequently, the main comparison between the model using the original approach to parameter fitting and the model using the modified approach will be based on the ability of the model to preserve the spatial characteristics of the historical rainfall series.

The following statistics are used to evaluate the reproduction of the various characteristics of the processes of rainfall occurrences and amounts:

  • the mean number of wet days per month

  • the mean daily rainfall amount in a month

  • the mean daily rainfall amount on wet days

  • the total rainfall amount in a month

  • the standard deviation of the daily rainfall amount in a month

  • the skew coefficient of the daily rainfall amount in a month

  • the cumulative relative frequencies of the daily rainfall depths

The following statistics are used to evaluate the reproduction of the spatial characteristics and as a comparison between the two approaches:

  • the joint probabilities that station pairs are both wet (π11)

  • the joint probabilities that station pairs are both dry (π00)

  • the cross-correlations of binary wet/dry occurrences between stations

  • the cross-correlations of daily amounts (each pair wet) between stations

6. Results and discussion

6.1. Precipitation occurrence and amount

The transition probabilities used by the model to generate daily precipitation occurrences, and the parameters employed to generate daily precipitation amounts with the use of the Gamma distribution are given in Table II. The various daily statistics derived from generated and historical rainfalls are compared in Figures 7 and 8.

Figure 7.

A comparison of historical and generated rainfall statistics, for all stations and all months. (a) Mean number of wet days; (b) mean rainfall; (c) mean rainfall on wet days; (d) total rainfall; (e) standard deviation; (f) skew coefficient. The 95% UCL and LCL are the upper and lower confidence levels = ± 1.96× standard error (0.812). This figure is available in colour online at wileyonlinelibrary.com/journal/joc

Figure 8.

Cumulative relative frequencies of the historical and generated daily rainfall depths during the winter half of the year. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

Table II. Monthly values of the parameters needed for precipitation occurrence and amount
Station  Parameter     Oct    Nov    Dec    Jan    Feb    Mar
Han      P11 (%)        33     37     45     49     41     41
         P01 (%)         4      9     13     19     18      9
         α            0.89   0.80   1.14   1.01   1.06   1.04
         β           13.53  18.73  12.77  14.16  11.23   8.35
Lah      P11 (%)        35     38     47     50     44     41
         P01 (%)         4     11     15     18     19     10
         α            0.91   0.78   1.04   1.05   1.07   1.16
         β           11.47  17.98  13.73  13.56  10.14   7.03
Sha      P11 (%)        35     38     45     49     45     40
         P01 (%)         5     10     14     19     18      9
         α            0.79   0.76   0.94   0.94   0.97   0.84
         β           12.42  16.73  14.18  13.57  11.00   9.17
Gaz      P11 (%)        39     40     61     58     55     53
         P01 (%)         8     10     17     22     21     10
         α            0.92   0.86   0.96   0.97   1.12   0.91
         β            9.94  14.09  14.04  13.17   8.79   8.22
Nus      P11 (%)        30     34     44     46     44     37
         P01 (%)         3      9     14     17     17      8
         α            1.26   0.96   1.00   1.17   1.16   0.95
         β            6.87  14.48  13.18  10.73   8.99  10.04
Dei      P11 (%)        34     35     41     45     39     27
         P01 (%)         4      9     13     17     16      9
         α            0.90   0.82   0.98   1.08   1.02   1.06
         β            8.54  16.35  13.41  10.84   9.85   8.57
Kha      P11 (%)        25     36     42     41     38     31
         P01 (%)         3      9     13     17     16      9
         α            0.98   0.94   0.81   1.07   1.07   1.01
         β            6.83  11.91  15.06   9.73   8.90   8.57
Raf      P11 (%)        19     34     36     36     36     31
         P01 (%)         3      8     13     15     14      8
         α            0.87   0.96   1.00   1.27   1.22   1.26
         β            9.90  10.95  10.17   7.71   6.94   6.70
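To show how the parameters of Table II enter the generator, the following minimal single-site sketch draws occurrences from a first-order two-state Markov chain and wet-day amounts from a Gamma distribution. It uses the January values as read from the table for station Han (P11 = 49%, P01 = 19%, α = 1.01, β = 14.16). Several points are assumptions rather than statements from the paper: β is treated as the scale parameter, the chain starts dry, consecutive Januaries are simulated as one continuous chain, and the function name `simulate_site` is hypothetical. The spatial correlation between sites is deliberately left out.

```python
import random

def simulate_site(n_days, p11, p01, alpha, beta, seed=0):
    """Generate daily rainfall depths at one site.

    p11: probability of a wet day following a wet day
    p01: probability of a wet day following a dry day
    alpha, beta: Gamma shape and (assumed) scale for wet-day amounts
    """
    rng = random.Random(seed)
    wet = False  # assumption: the day before the simulation is dry
    series = []
    for _ in range(n_days):
        p_wet = p11 if wet else p01
        wet = rng.random() < p_wet
        # random.gammavariate takes shape and scale, so the mean is alpha * beta
        series.append(rng.gammavariate(alpha, beta) if wet else 0.0)
    return series

# 1000 Januaries (31 days each) with the Han values read from Table II.
rain = simulate_site(31 * 1000, p11=0.49, p01=0.19, alpha=1.01, beta=14.16, seed=42)
wet_fraction = sum(1 for r in rain if r > 0) / len(rain)
```

For a long simulation the wet-day fraction should approach the stationary probability of the chain, P01 / (1 + P01 − P11) = 0.19 / 0.70 ≈ 0.27, which gives a quick consistency check on the occurrence parameters.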

For each month of the winter half of the year, the generated and observed mean number of wet days, mean daily rainfall, mean daily rainfall on wet days, total rainfall per month, daily standard deviation, and daily skew coefficient are plotted in Figure 7. The results show that the occurrence model reproduces the mean number of wet days at the eight stations: there is no significant difference between the observed and generated numbers of wet days. The model also reproduces the mean daily precipitation satisfactorily at all stations, and it adequately preserves the mean daily precipitation on wet days and the total precipitation per month. The daily standard deviation is well reproduced, and the daily skew coefficient is reasonably preserved. All but two points lie within the 95% confidence limits of ± 1.96 × standard error (0.812), which confirms that the synthetic values do not differ much from the observed ones.

Figure 8 shows the cumulative relative frequencies of daily rainfall depths during the winter half of the year, as simulated by the weather generator, compared with the observed ones. The results show that the Gamma distribution model closely reproduces the distribution of daily precipitation amounts.

Overall, the performance of the model is considered satisfactory for all stations, as the synthetic daily precipitation series produced by the weather generator retain the important characteristics of the observed series.

6.2. Spatial characteristics

Figure 9 compares the joint probabilities of the observed and generated daily series for all station pairs and all months, for the case where both sites are wet (left side) and for the case where both sites are dry (right side). In Figure 9, each dot corresponds to a given station pair and a given month of the year. The statistics for the simulated values are based on a simulation of 3000 years. Based on the figure, the joint distribution of simultaneous precipitation occurrence across the network of eight stations is well represented by the model under both approaches to parameter fitting.

Figure 9.

Joint probabilities that station pairs are both wet (left side) and both dry (right side) on a given day, considering all 168 combinations of the 28 station pairs and 6 months. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

The cross-correlations between the rainfall occurrences at different sites are presented in Figure 10. There is little difference between the performance of the model using the original Wilks approach (left side) and the modified approach (right side) with respect to preserving this statistic. The model performs quite well for both approaches: most points lie within the 95% confidence limits, given by + 1.96 × standard error (0.045) for the upper level (UCL) and − 1.96 × 0.045 for the lower level (LCL).

Figure 10.

Cross-correlations between the rainfall occurrences, for all station pairs and months. The dashed lines indicate the 95% upper confidence level (UCL) and the 95% lower confidence level (LCL). This figure is available in colour online at wileyonlinelibrary.com/journal/joc
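One way to read the confidence band of Figure 10 is as a tolerance of ± 1.96 × 0.045 ≈ ± 0.088 around the line of perfect agreement between observed and generated correlations. The sketch below flags the points that fall inside such a band. This reading of the band, the function name `within_band` and the sample correlations are illustrative assumptions, not values taken from the paper.

```python
def within_band(observed, generated, std_error, z=1.96):
    """Flag each (observed, generated) pair lying within +/- z * std_error.

    Assumption: the band is centred on the 1:1 line, so the test is applied
    to the difference between the generated and the observed value.
    """
    half_width = z * std_error
    return [abs(g - o) <= half_width for o, g in zip(observed, generated)]

# Hypothetical occurrence cross-correlations for three station pairs.
observed = [0.80, 0.75, 0.90]
generated = [0.82, 0.85, 0.86]
flags = within_band(observed, generated, std_error=0.045)
share_inside = sum(flags) / len(flags)
```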

As shown in Figure 11, the cross-correlations between the rainfall amounts at different sites are well reproduced by the model for both approaches. The majority of points are within the 95% confidence band, which means that the synthetic and observed correlations of the rainfall amounts do not differ much.

Figure 11.

Cross-correlations between the rainfall amounts, for all station pairs and months. The dashed lines indicate the 95% upper confidence level (UCL) and the 95% lower confidence level (LCL). This figure is available in colour online at wileyonlinelibrary.com/journal/joc

Furthermore, as an application, the generated rainfall series of the eight stations are integrated into one series, representing the average rainfall over the Gaza Strip. A comparison between the historical data and the results of the simulations with the model, using the modified Wilks approach to parameter fitting, shows that the statistics are well captured by the model (Figure 12).

Figure 12.

Mean (a), standard deviation (b), and skew coefficient (c) of the average daily rainfall over the Gaza Strip. This figure is available in colour online at wileyonlinelibrary.com/journal/joc
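The integration of the station series into one areal series can be sketched as below. The text does not state how the eight series are combined, so the arithmetic mean used by default is an assumption; the optional `weights` argument (for example, area weights) and the function name `areal_average` are likewise hypothetical.

```python
def areal_average(station_series, weights=None):
    """Combine equal-length daily series from several stations into one.

    Assumption: with no weights given, every station counts equally.
    """
    k = len(station_series)
    n_days = len(station_series[0])
    if any(len(s) != n_days for s in station_series):
        raise ValueError("all station series must have the same length")
    if weights is None:
        weights = [1.0] * k
    if len(weights) != k:
        raise ValueError("one weight per station is required")
    total_w = sum(weights)
    return [
        sum(w * s[d] for w, s in zip(weights, station_series)) / total_w
        for d in range(n_days)
    ]

# Two hypothetical stations over three days (mm).
series = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
equal = areal_average(series)
weighted = areal_average(series, weights=[3.0, 1.0])
```

The mean, standard deviation and skew coefficient of the resulting series can then be compared between the historical and the generated data.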

Overall, the spatial characteristics of the historical rainfall series seem to be reasonably captured by the model for both approaches.

7. Conclusions

A multi-site daily precipitation generator based on the Wilks (1998) approach is developed for the simulation of rainfall occurrences and amounts in the Gaza Strip. A first-order two-state Markov chain is used to determine the occurrence of rainfall. The rainfall amounts on wet days are generated from a two-parameter Gamma distribution. The parameters of the rainfall model are estimated from observed precipitation data by the method of maximum likelihood, for every month in the rainy winter half of the year (October–March).

The paper presents a complete methodology taking into account the original Wilks approach and suggests solutions for enhancing the practical development of the model. The first improvement concerns an analytical calculation of the desired correlations in the random numbers that reproduce the observed correlations between the rainfall occurrence series, through a gamma coefficient. Secondly, the correlations between the rainfall amount series are specified using a rank correlation and then linearly transformed into product-moment correlations to obtain the required correlations in the random numbers. In the original Wilks approach, these correlations were obtained by constructing empirically derived curves for each pair of stations and for every monthly period, which may, however, yield a correlation matrix that is not positive definite. In this study, that problem is solved by setting the negative eigenvalues to zero or to a small positive value. In Wilks (1998), the problem of a non-positive definite correlation matrix was overcome by modeling the pair-wise correlations of the random numbers as a function of site separation, which is not efficient and requires a significant amount of time.
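The eigenvalue correction mentioned above can be sketched with NumPy as follows. The function name `repair_correlation`, the floor value of 1e-6 and the example matrix are hypothetical. Rescaling the result back to a unit diagonal is an added assumption, included so that the output is again a correlation matrix; it is not described in the text.

```python
import numpy as np

def repair_correlation(corr, floor=1e-6):
    """Replace eigenvalues below `floor` and restore the unit diagonal."""
    corr = np.asarray(corr, dtype=float)
    sym = (corr + corr.T) / 2.0              # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigendecomposition of a symmetric matrix
    clipped = np.maximum(eigvals, floor)     # negative eigenvalues -> small positive value
    fixed = (eigvecs * clipped) @ eigvecs.T  # rebuild V diag(clipped) V^T
    d = np.sqrt(np.diag(fixed))
    fixed = fixed / np.outer(d, d)           # assumption: rescale to unit diagonal
    return (fixed + fixed.T) / 2.0           # remove rounding asymmetry

# Hypothetical pairwise correlations that are mutually inconsistent:
# this matrix has an eigenvalue of -0.8.
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
good = repair_correlation(bad)
lower = np.linalg.cholesky(good)  # succeeds only for a positive definite matrix
```

A positive definite matrix is needed because the correlated random numbers are typically obtained from a factorization such as the Cholesky decomposition, which fails otherwise.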

Statistical analyses of the historical and synthetic rainfall series show that the model generally performs well, as it preserves the important characteristics of daily rainfall occurrences and amounts, as well as the spatial characteristics. This performance is consistent with the order of the Markov model and the gamma distribution used in this study. It would be interesting to investigate whether the proposed fitting algorithms work as well with higher (or ‘hybrid-’) order Markov chains and with distributions for nonzero amounts other than the gamma distribution. Overall, major advantages of the model include its simplicity, its increased efficiency, a significant improvement in computational speed and a considerable reduction in implementation effort.
