Stochastic space-time regional rainfall modeling adapted to historical rain gauge data



[1] Stochastic rainfall models are important tools both for practical issues and in studies of weather- and climate-sensitive systems. We propose an event-based model, continuous in space (two-dimensional) and time, that describes regional-scale, ground-observed storms by a Boolean random field of rain patches. The model creates complex space-time structures with a mathematically tractable framework. The estimation method relates temporal observations at fixed sites to the movement of the model storm rain field, thereby making historical rain gauge data suitable for model fitting. The model is estimated using hourly historical data at eight rain gauges in Alabama and tested for its capabilities in capturing statistical characteristics of the historical data, including rainfall intensity, rainfall intensity extremes, temporal correlation, effects of temporal aggregation, spatial coverage, and spatial correlation.

1. Introduction

[2] Weather sequences generated by stochastic models are often used in process simulations because historical weather data may be inadequate in terms of length, spatial coverage, and completeness. Moreover, weather sequences generated from stochastic models provide a mechanism for investigating the implications of weather uncertainty in process models.

[3] Because of its important role in a broad range of land surface processes, rainfall has been one of the most actively investigated elements in weather generator models. Over the past four decades, stochastic rainfall models have evolved through several generations; see review articles by, for example, Wilks and Wilby [1999], Onof et al. [2000], and Wheater et al. [2005]. Among the more popular types are alternating renewal models [Green, 1964; Roldan and Woolhiser, 1982], Markov chain models [Chin, 1977; Katz, 1977; Richardson, 1981; Chandler and Wheater, 2002], clustered point process models [Kavvas and Delleur, 1981; Smith and Karr, 1983; Waymire et al., 1984; Rodriguez-Iturbe et al., 1987; Cox and Isham, 1988; Cowpertwait, 1995; Northrop, 1998], and downscaling models [Wilby et al., 1998; Ferraris et al., 2003]. Many of the early models, such as those in the Markovian process framework, are “observation-based” in the sense that they make statistical assumptions on certain properties of rainfall, then construct the model and estimate parameters based directly on statistical analysis of observed data. Some more recent approaches, exemplified by clustered point process models, may be called “event-based” as they describe and simulate rainfall starting from a simplified prescription of how the storm event actually occurs and develops.

[4] These two types of modeling approaches differ in two basic respects: discreteness and spatial coverage. Observation-based models naturally arose from analysis of daily or hourly rainfall records at a single station. When extended to spatial models, they concern rainfall at multiple discrete locations as opposed to operating in a continuous spatial domain. In order to express the intercorrelation of rainfall between locations, a complex covariance structure and a large number of parameters are often needed [Smith, 1994; Wilks, 1998]. In contrast, event-based models can be intuitively played out in a continuous spatial domain, evolving seamlessly in time without artificial aggregation with regard to clock time intervals. This major advantage is possible because at the heart of these models is a quasi-physical picture of the rainfall process. Prescription of this simplified rainfall mechanism is where much effort is devoted in developing these models.

[5] In this paper we propose an event-based regional model along the line of point process models. At the center of the idea is a Boolean model [Matheron, 1975; Serra, 1982; Stoyan et al., 1995; Molchanov, 1997], which consists of a spatial Poisson point process and additional properties attached to the points. The points are the centers of rain patches, within which rainfall intensity varies according to a prescribed profile. This model has a clear spatiotemporal structure, works in continuous spatial and temporal scales, and can be estimated with widely available long-term historical data. To estimate the model parameters, stereological relations of the Boolean field are used. The proposed estimation strategy is well suited to the way rainfall is observed and recorded at rain gauges.

[6] We first describe the formulation of the model, then give a full account of the model fitting procedure. To validate the model against historical data, we use simulations and analytic derivations to examine several statistical properties of rainfall that such stochastic models are expected to capture. The presentation is illustrated throughout using a historical hourly rainfall data set, which we introduce now, before turning to the model itself.

2. Illustrative Data

[7] Ground-based observational climate data are maintained by the National Climatic Data Center (NCDC) in Asheville, North Carolina. Precipitation data come in several time resolutions such as daily, hourly, and quarter hourly, with different time coverage at different stations throughout the nation. We chose data from the hourly data set TD3240 at eight stations in Alabama (Figure 1) in the years from 1949 to 1961 for illustrations. These data were chosen because the region has a relatively simple topography and multiple stations distributed in a balanced configuration. The record period, from 1949 to 1961, corresponds to the period when all stations provided high-resolution data.

Figure 1.

State of Alabama and locations of the eight rain gauges for the illustrative data set. The smallest interstation distance is 44.6 km between New Brockton and Troy. The largest is 168 km between Auburn and River Falls.

[8] Records in this data set are hourly precipitation amounts, accurate to 0.254 mm (0.01 inch). These hourly aggregated data derive from the standard tipping bucket rain gauge, which consists of a bucket that collects rainfall and tips every time it fills. The number of tips during each hour enters the historical record and reflects the rain amount discretized as multiples of the bucket capacity, 0.254 mm.

[9] Common dry periods across all eight stations exceeding 12 hours are taken to be intervals between storms. By this empirical criterion, 918 storms are identified in the illustrative data. The particular choice of 12-hour dry period as a storm separator precludes individual storms with extended dry periods. The choice to model storms without extended dry periods contrasts with rainfall models having clustering mechanisms.
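The storm separation rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `split_storms` and the input layout (one Boolean per hour, true when any station is wet) are assumptions made here.

```python
def split_storms(wet_any, min_dry_hours=12):
    """Return (start, end) hour indices of storms, with end exclusive.

    wet_any[h] is True when at least one station recorded rain in hour h.
    A new storm starts when the common dry spell exceeds min_dry_hours.
    """
    storms = []
    start = None      # first wet hour of the current storm
    last_wet = None   # most recent wet hour seen so far
    for h, wet in enumerate(wet_any):
        if not wet:
            continue
        if start is None:
            start = h
        elif h - last_wet - 1 > min_dry_hours:
            # The dry spell before hour h is long enough to close the storm.
            storms.append((start, last_wet + 1))
            start = h
        last_wet = h
    if start is not None:
        storms.append((start, last_wet + 1))
    return storms
```

Given one hourly record per station, the common wet indicator could be formed as `[any(a > 0 for a in hour) for hour in zip(*station_records)]` before calling the function.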

[10] For each year, data for the period between 1 April and 31 October were extracted for model estimation and validation. Model parameters during this summer season were treated as temporally homogeneous.

3. Formulation of the Regional Boolean Rainfall Model

[11] We model the regional time-varying two-dimensional surface rainfall intensity field for a single storm by “rain patches” moving across the region of interest (Figure 2). These model rain patches should be interpreted as elementary, possibly overlapping, rain areas as observed on the ground rather than as cloud structures. The geographic scale of the model rain field, consisting of rain patches, is taken to be large relative to the monitoring region of interest. Storm size is the length along the direction of storm movement, i.e., storm size is the product of storm speed and duration. A modeled storm has a fixed velocity (including speed and direction).

Figure 2.

Schematic of the Boolean model rain field of a storm. (top) Rain patches moving horizontally, as a model of the ground observation of a storm. The gray scale color of each patch indicates the average intensity of rainfall being deposited by the patch. On the transects AB and CD along the storm's movement, X1Y1, X2Y2, X3Y3, X4Y4, X5Y5, and X6Y6 are rain patch chords; X1Y1, X2Y2, X3Y4, X5Y5, and X6Y6 are clumps; Y1X2, Y2X3, and Y5X6 are gaps. (middle and bottom) Rainfall time series observed at locations A and C, assuming constant rainfall intensity within a rain patch. Wet and dry durations observed are proportional to the lengths of clumps and gaps on the linear transects AB and CD.

[12] The internal structure of the modeled rain field is described by an isotropic Boolean model, which is commonly used in stereology and stochastic geometry. The Boolean model is relatively simple, yet its general and flexible structure can accommodate specific variants [see Matheron, 1975; Serra, 1982; Hall, 1988; Stoyan et al., 1995; Molchanov, 1997]. Realizations of the field may appear complex and irregular, but the model is mathematically tractable in many respects and Boolean fields are straightforward to simulate (see section 5).

[13] In our context, circular rain patches are randomly located with centers forming a homogeneous spatial Poisson point process on the plane. Each rain patch has a random radius and a random mean rainfall intensity, both assumed independent of the location of the patch. Different rain patches are independent of each other. In particular, the patches are free to overlap, thus allowing for complex shapes of connected rain areas on the ground. The rainfall intensity at a location covered by overlapping patches is the sum of the intensities at the corresponding location within each participating patch.

[14] The rainfall intensity within a rain patch is assigned a time-invariant profile which peaks at the patch center and decreases linearly toward zero on the edge of the patch. This is more realistic [Konrad, 1978; Goldhirsh, 1983] than assuming a constant intensity over the entire patch [Goldhirsh, 1986; Cox and Isham, 1988]. We make no assumption of independence between the size and rainfall intensity of a patch. The model uses an empirically derived joint distribution of patch size and intensity.
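As a sketch of how the intensity at a ground location follows from these assumptions, the snippet below evaluates a conic profile and sums overlapping patches. The function names and the tuple layout are hypothetical. The factor of 3 is derived here from the geometry (a cone of height h over a disc has spatial mean h/3) and assumes that a patch's intensity parameter is its spatial average.

```python
import math

def patch_intensity(dist, radius, mean_intensity):
    """Conic profile: peak at the patch center, zero at the patch edge.

    The peak is 3 * mean_intensity because a cone of height h has
    spatial mean h / 3 over its circular base.
    """
    if dist >= radius:
        return 0.0
    return 3.0 * mean_intensity * (1.0 - dist / radius)

def field_intensity(x, y, patches):
    """Sum the contributions of all patches covering the point (x, y).

    patches is a list of (cx, cy, radius, mean_intensity) tuples.
    """
    return sum(
        patch_intensity(math.hypot(x - cx, y - cy), r, q)
        for cx, cy, r, q in patches
    )
```

For two overlapping patches `[(0, 0, 10, 2.0), (5, 0, 10, 1.0)]`, the point (0, 0) receives the peak of the first patch plus the half-height value of the second.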

[15] This simple model of a storm, that is, a spatial Boolean field of rain patches moving at a constant velocity, introduces complex space-time structures. A fixed location on the ground, such as a rain gauge, experiences an alternation of dry and wet periods corresponding to a linear transect of the planar field. We define the following geometric attributes on the transect: a chord is an intersection of the transect with a rain patch, a clump is a contiguous wet interval on the transect, and a gap is a dry interval on the transect.

[16] Figure 2 contains examples of these attributes along two transects AB and CD through the rain field. A clump consists of a single rain patch chord or overlapping chords. Gaps are dry segments between consecutive clumps. The relations between patches, chords, gaps, and clumps form the basis of inferring properties of rain patches from dry and wet intervals observed at rain gauges.
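The geometric definitions above can be made concrete with a short sketch that intersects a transect with circular patches and merges the chords into clumps. The function names are hypothetical, and the transect is taken to be a horizontal line for simplicity.

```python
import math

def chords_on_transect(patches, y0):
    """Intersect the horizontal line y = y0 with circular patches.

    patches is a list of (cx, cy, radius); returns sorted (start, end) chords.
    """
    chords = []
    for cx, cy, r in patches:
        off = abs(cy - y0)
        if off < r:
            half = math.sqrt(r * r - off * off)
            chords.append((cx - half, cx + half))
    return sorted(chords)

def clumps_and_gaps(chords):
    """Merge overlapping chords into clumps; gaps lie between clumps."""
    clumps = []
    for start, end in sorted(chords):
        if clumps and start <= clumps[-1][1]:
            # Chord overlaps the current clump, so extend the clump.
            clumps[-1][1] = max(clumps[-1][1], end)
        else:
            clumps.append([start, end])
    gaps = [(a[1], b[0]) for a, b in zip(clumps, clumps[1:])]
    return [tuple(c) for c in clumps], gaps
```

Dividing clump and gap lengths by the storm speed gives the wet and dry durations that a gauge on the transect would record.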

4. Estimation of the Model

[17] The regional event-based model is characterized by (1) the joint distribution of storm velocities and storm sizes, (2) the Poisson density of patch centers in the model rain field for a storm, and (3) the joint distribution of patch radii and average patch rainfall intensities. Each storm is a random realization of a moving Boolean field. The available historical data used for model estimation consist of the start and stop times of continuous rain periods (clumps) at each of the monitoring stations, together with the rainfall total in each rain period.

[18] The size and velocity are estimated for each storm in the historical data, which yields empirical distributions of storm size and storm velocity. Estimating storm velocity entails comparing time series observations at multiple rain gauges that are in a fixed geographic configuration. The estimated velocity is used to convert dry and wet durations in the historical time series data to lengths of gaps and clumps in the modeled Boolean field. Storm size is derived from storm velocity and duration.

[19] Since the model is regionally homogeneous, we pool the gaps and clumps from all the monitoring stations in the region, and over all storms, to form combined samples. Statistics of gaps and clumps of the Boolean field are related stereologically to the density and size of the patches, and these relations are used to estimate the Poisson patch density and patch size distribution. Estimation of the patch rainfall intensity distribution is based on simulations using the estimated patch density and patch size distribution, together with the observed rainfall amounts in wet periods.

[20] Below we discuss issues in the order they occur in the estimation process. Notation used in the model is listed in Table 1.

Table 1. Notation Used in the Model^a

Symbol  Status      Description
D       observed    (dry) gap size, km
μD      estimated   mean D, km
W       observed    (wet) clump size, km
FW      empirical   distribution function of W
u       estimated   storm speed, km/h
θ       estimated   storm direction
R       estimated   patch radius, km
λR      estimated   Poisson density of patch centers, km⁻²
FR      estimated   distribution function of R
μR      derived     mean R, km
r0      specified   minimum R, km
q       (multiple)  rainfall intensity
C       estimated   chord length, km
λC      estimated   Poisson density of chord centers on linear transects, km⁻¹
FC      derived     distribution function of C
fC      derived     probability density function of C

^a See especially section 4. The symbols are grouped by the geometric object they are used for.

4.1. Estimating Storm Velocity

[21] The storm velocity here refers to the velocity of the modeled spatial rain field. The storm velocity is estimated from the rainfall data by analyzing relations between observations of the same storm at multiple stations. Several commonly used methods dealing with this problem are reviewed by Niemczynowicz [1987, p. 137]. The reviewed methods all rely on comparing detailed rainfall intensity processes at the rain gauges. This imposes some requirements that the data used in this study do not satisfy, including “the time resolution must be of the order of one or two minutes,” and “distance between gauges must be of the order of one km.” Another requirement is on the model. According to Niemczynowicz [1987, p. 138], “most of the known methods…often fail when more than one rain cell is present over the gauge network at the same time.” Since our model does not require that different locations experience the same set of rain patches, these methods for estimation of storm velocity, based on detailed comparisons of rainfall intensities, are not applicable even with data of higher spatial or temporal resolution.

[22] The key to the method we propose below is that observations of the storm at a pair of stations have a time difference that, under the model, is determined by the relative positions of the two stations and the velocity of the storm. This method uses the geometry of the rain gauge network together with starting and stopping times of the storm at each rain gauge.

[23] Suppose a storm is recorded by k stations Pi: (xi, yi), i = 1, ⋯, k. For each station there is an observed reference time ti, taken as the average of the beginning and end of the storm at that station. For each of the k(k − 1)/2 station pairs, the difference of the two observed reference times is related to the storm speed u and direction θ via

\[
t_i - t_j = \frac{d_{j,i}\cos(\alpha_{j,i} - \theta)}{u} + \delta_{j,i}, \qquad (1)
\]

where 1 ≤ i < j ≤ k, αj,i denotes the direction of the line from Pj to Pi, and dj,i is the distance between the two stations (see Figure 3). The error term δj,i exists because the reference times observed at different stations are, although conceptually equivalent, subject to random disagreement.

Figure 3.

Relations between the locations of two sites, P1 and P2, the movement of the storm, and the travel distance of the storm between the two sites. See equation (1).

[24] From system (1) we derive the least squares estimates of (cos θ)/u and (sin θ)/u:

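\[
\begin{pmatrix} \widehat{(\cos\theta)/u} \\ \widehat{(\sin\theta)/u} \end{pmatrix}
=
\begin{pmatrix}
\sum d_{j,i}^2 \cos^2\alpha_{j,i} & \sum d_{j,i}^2 \sin\alpha_{j,i}\cos\alpha_{j,i} \\
\sum d_{j,i}^2 \sin\alpha_{j,i}\cos\alpha_{j,i} & \sum d_{j,i}^2 \sin^2\alpha_{j,i}
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum d_{j,i}\cos\alpha_{j,i}\,(t_i - t_j) \\
\sum d_{j,i}\sin\alpha_{j,i}\,(t_i - t_j)
\end{pmatrix}, \qquad (2)
\]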

where all the sums are for all the station pairs. Estimates of the storm speed u and direction θ follow. The storm size is then estimated by the product of the estimated storm speed and the longest storm duration across all stations.
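A minimal sketch of this least squares step is given below, assuming station coordinates in kilometers and reference times in hours. It fits the pairwise time differences to the coordinate differences, with a = (cos θ)/u and b = (sin θ)/u as the unknowns. The function name is hypothetical, and the correction factor discussed in the next paragraph is not applied.

```python
import math

def estimate_velocity(stations, times):
    """Least squares storm speed (km/h) and direction (radians).

    stations: list of (x, y) in km; times: reference times in hours.
    Fits t_i - t_j = a * (x_i - x_j) + b * (y_i - y_j) over station pairs.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    k = len(stations)
    for i in range(k):
        for j in range(i + 1, k):
            dx = stations[i][0] - stations[j][0]
            dy = stations[i][1] - stations[j][1]
            dt = times[i] - times[j]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
            sxt += dx * dt
            syt += dy * dt
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("need at least three non-collinear stations")
    # Solve the 2 x 2 normal equations by Cramer's rule.
    a = (syy * sxt - sxy * syt) / det
    b = (sxx * syt - sxy * sxt) / det
    norm = math.hypot(a, b)
    speed = math.inf if norm == 0.0 else 1.0 / norm
    return speed, math.atan2(b, a)
```

With error-free reference times the fit recovers the assigned speed and direction exactly. With noisy times the individual estimates scatter, which is the situation the simulations in section 5.1 examine.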

[25] The nonlinear relation u = {[(cos θ)/u]² + [(sin θ)/u]²}^(−1/2) causes an underestimation of u. We have obtained a correction factor of 1.3 through simulations and applied it to the speed estimates.

[26] System (1) requires observations at three or more stations in order for the velocity of a storm to be estimated. Of the 918 storms identified in the illustrative Alabama data, the velocity and size of 535 were estimated, shown in Figures 4 and 5.

Figure 4.

Storm velocities estimated from the Alabama data. Each cross represents a storm moving in the direction of the cross away from the origin, at a speed indicated by its distance to the origin.

Figure 5.

Frequency distribution of storm sizes estimated from the Alabama data. Because the distribution is right-skewed with a very long tail, we use a log scale on the abscissa to help show the full range of the distribution as well as details around where the mass of the distribution concentrates. Similar comments apply to Figures 6, 13, 14, 15, and 20.

4.2. Estimating the Mean Gap Size

[27] Using the estimated storm speed, we convert within-storm dry periods in the time series data to gap lengths in the Boolean field. Each station with no record for a particular storm contributes a gap equivalent to the size of the storm. We similarly account for gaps from the beginning of a storm to the first rainfall clump and from the last rainfall clump to the end of the storm. Gaps in the Boolean field have an exponential distribution [Stoyan et al., 1995, p. 82]. The distribution of gaps derived from the data is compared with the exponential distribution in Figure 6.

Figure 6.

Frequency distribution of gap sizes derived from the Alabama data, compared to the histogram polygon of the exponential distribution whose mean equals the estimated mean gap size, μD. (Abscissa is on log scale.)

[28] Because large gaps are underrepresented in the empirical gap size data, we used a trimmed mean of these data to estimate the model mean gap size, μD, trimming the data at the 5th and 95th percentiles, which are denoted by d5 and d95, respectively. The expected value of the trimmed mean is related to μD by

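\[
E(\bar{D}_{\mathrm{trim}}) = \mu_D + \frac{d_5\, e^{-d_5/\mu_D} - d_{95}\, e^{-d_{95}/\mu_D}}{e^{-d_5/\mu_D} - e^{-d_{95}/\mu_D}}. \qquad (3)
\]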

Solving the nonlinear equation above provides an estimate of μD. This estimate from the Alabama data is 153 km.
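One way such a nonlinear equation might be solved is by bisection. The sketch below assumes that the trimmed mean is approximated by the mean of an exponential distribution truncated to [d5, d95], with the two trimming points treated as fixed numbers. That interpretation, the function names, and the search bounds are assumptions of this illustration.

```python
import math

def truncated_exp_mean(mu, lo, hi):
    """Mean of an exponential variable (mean mu) conditioned on [lo, hi].

    Written with expm1 so that it stays accurate for very large mu.
    """
    span = hi - lo
    return mu + hi - span / (-math.expm1(-span / mu))

def solve_mean_gap(trimmed_mean, lo, hi, mu_min=1e-3, mu_max=1e6, tol=1e-9):
    """Bisection for mu with truncated_exp_mean(mu, lo, hi) == trimmed_mean."""
    if not lo < trimmed_mean < 0.5 * (lo + hi):
        raise ValueError("trimmed mean must lie between lo and the midpoint")
    a, b = mu_min, mu_max
    while b - a > tol * b:
        mid = 0.5 * (a + b)
        # The truncated mean increases with mu, so move the matching bound.
        if truncated_exp_mean(mid, lo, hi) < trimmed_mean:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)
```

The truncated mean rises from the lower trimming point toward the midpoint of the interval as the mean grows, which is why a solution exists only for trimmed means in that range.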

4.3. Estimating the Density and Size Distribution of Rain Patches

[29] The spatial Boolean model undergoes two reductions to yield data at the monitoring network. First, the 2-D Boolean model of patches reduces to a 1-D model of patch chords on a linear transect corresponding to a monitoring location. Then contiguous wet segments on the transect are observed as clumps, whereas dry segments are gaps (see section 3). Each clump is either a single isolated wet chord or the union of overlapping wet chords.

[30] We infer the 1-D transect model of patch chords from the observed clumps and gaps in the historical data, then infer the 2-D Boolean model of patches from the 1-D transect model of patch chords. The relation between the 1-D model and clumps and gaps is somewhat involved, but practical approximations are available.

4.3.1. Estimating the 1-D Patch Chord Model From Clumps and Gaps

[31] On a linear transect through the 2-D Boolean field, wet patch chords (locations represented by their midpoints) form a 1-D Boolean field [Matheron, 1975, p. 140; Stoyan et al., 1995, p. 81] with density λC (number of chords per unit length of transect) and chord length distribution FC. These are the two parameters for the 1-D patch chord model that we need to estimate.

[32] The chord density in the 1-D model is determined by the 2-D model as [Mecke and Stoyan, 1980; Stoyan et al., 1995, pp. 81, 354]

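\[
\lambda_C = 2\mu_R\lambda_R \qquad (4)
\]

(see Table 1 for notation). This is connected to statistics of gaps, which have an exponential distribution with expected value [Lu and Torquato, 1993; Stoyan et al., 1995, p. 82]

\[
\mu_D = \frac{1}{\lambda_C} = \frac{1}{2\mu_R\lambda_R}, \qquad (5)
\]


\[
\lambda_C = \frac{1}{\mu_D}, \qquad (6)
\]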

where mean gap size μD has been estimated in section 4.2.

[33] The distribution of clumps, FW, is a known function of the chord properties λC and FC [Hall, 1988, p. 91; Quintanilla and Torquato, 1996]. Handley [1999] describes a convenient discrete recursive approximation to this relation. Let

equation image

where δ is a properly chosen small value, and

equation image

where Q(x) = P(x) ∏_{j=1}^{i−1} G((j − 1)δ), and G(0) = 1 − δλC. The distribution of chord lengths is calculated from G as

equation image

We use the empirical distribution of data-derived clumps as an estimate of FW.

4.3.2. Estimating the 2-D Patch Model From the 1-D Patch Chord Model

[34] The distributions of patch radii and chord lengths are related stereologically [Mecke and Stoyan, 1980; Stoyan et al., 1995, p. 354] by

equation image

where fC is the probability density function of chord lengths.

[35] The mean patch radius μR can be derived from the expression above. If r0 is a specified minimum patch radius, then FR(r0) = 0 by definition. Substituting FR(r0) and r0 into relation (10) yields the mean patch radius expressed in terms of chord lengths:

equation image

[36] With μR obtained by (11), the patch radius distribution FR(x) can be obtained via (10). (This distribution of patch radii estimated from the Alabama data is shown as part of Figure 13, which will be introduced and fully discussed in section 5.1.) According to the estimated distribution, the median patch radius is 3.3 km. The patch density λR is estimated via relation (5). Its estimate from the Alabama data is 0.00043 rain patches per square kilometer.
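As a rough consistency check on these numbers, the two reported estimates can be combined under the standard relations for a Boolean model of discs, namely that the chord density on a transect is twice the product of mean patch radius and patch density, and that the mean gap is the reciprocal of the chord density. The implied mean radius below is computed here for illustration and is not a value reported in the text.

```python
mean_gap_km = 153.0        # estimated mean gap size, from the text
patch_density = 0.00043    # estimated patches per square km, from the text

# Chords per km of transect, as the reciprocal of the mean gap size.
chord_density = 1.0 / mean_gap_km

# Mean patch radius implied by chord_density = 2 * mean_radius * patch_density.
implied_mean_radius = chord_density / (2.0 * patch_density)
print(round(implied_mean_radius, 1))  # about 7.6 km
```

A mean radius well above the 3.3 km median is what one would expect from a strongly right-skewed radius distribution.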

[37] Setting r0 to a positive value prevents instability in the estimates of FR and λR. The value of 0.25 km was used in this study.

4.4. Estimating the Distribution of Patch Rainfall Intensities

[38] Historical data contain wet clumps, each with an observed average rainfall intensity and an estimated length. Each clump is implicitly composed of one or more chords, each corresponding to a rain patch. We use a simulation method to estimate the spatially averaged patch rainfall intensity distribution from these rain clump statistics: the estimated Boolean model is used to generate rain patch fields and their derived clumps; simulated clumps are assigned average intensities, based on their length, from the observed joint distribution of clump lengths and intensities.

[39] Suppose that a simulated clump is assigned an average intensity q according to its length w using the empirical joint distribution. The simulated clump consists of n chords, say, of length ci and average rainfall intensity qi, i = 1, …, n. Then

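\[
q\,w = \sum_{i=1}^{n} c_i q_i. \qquad (12)
\]

[40] We assign a common average intensity x to all the n patches corresponding to the chords contributing to this clump. Let ri, i = 1, …, n, be the radii of the patches. Because a modeled rain patch has a conic intensity profile, the average intensity on the ith chord is

\[
q_i = \frac{3x}{2}\left[1 - \frac{4r_i^2 - c_i^2}{4 c_i r_i}\,\ln\frac{2r_i + c_i}{2r_i - c_i}\right]. \qquad (13)
\]

[41] Combining relations (13) and (12), we get the common average rainfall intensity for the n patches:

\[
x = \frac{2\,q\,w}{3\displaystyle\sum_{i=1}^{n}\left[c_i - \frac{4r_i^2 - c_i^2}{4 r_i}\,\ln\frac{2r_i + c_i}{2r_i - c_i}\right]}. \qquad (14)
\]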

[42] This procedure generates a sample of n (identical) patch rainfall intensities with their corresponding radii. By repeating this procedure for multiple clumps in a simulated rain patch field, we generate a joint distribution for rain patch radius and intensity.
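The step from a clump's average intensity to a common patch intensity can be sketched as follows. The chord factor is derived here by integrating a conic profile with peak 3x along a chord of length c in a disc of radius r. It should be read as an assumption about the form of the chord-average relation rather than as a quotation from the source, and the function names are hypothetical.

```python
import math

def chord_factor(c, r):
    """Ratio of chord-average to patch-average intensity for a conic profile."""
    if not 0.0 < c <= 2.0 * r:
        raise ValueError("chord length must lie in (0, 2r]")
    if c == 2.0 * r:
        return 1.5  # chord through the center: mean of a linear ramp
    k = (4.0 * r * r - c * c) / (4.0 * c * r)
    return 1.5 * (1.0 - k * math.log((2.0 * r + c) / (2.0 * r - c)))

def common_patch_intensity(q, w, chords):
    """Solve q * w = sum(c_i * q_i) with q_i = x * chord_factor(c_i, r_i).

    q, w: average intensity and length of the clump.
    chords: list of (c_i, r_i) pairs for the chords forming the clump.
    """
    return q * w / sum(c * chord_factor(c, r) for c, r in chords)
```

For a clump made of a single chord through a patch center, the factor is 1.5, so the patch-average intensity is two thirds of the clump-average intensity.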

[43] This method does not assume an a priori relation between the size and rainfall intensity of a patch. In the simulated sample of patch (radius, intensity) pairs, the two properties of the patch may be correlated or uncorrelated. The derived patch (radius, intensity) distribution for the Alabama data is shown in Figure 7, which exhibits the quartiles of patch rainfall intensities as the size of the patch varies. This joint distribution corresponds to 10 000 simulated patches based on parameter estimates derived from the Alabama data.

Figure 7.

Quartiles of patch rainfall intensity as patch size increases, estimated from the Alabama data. Each patch size interval contains 500 simulated patches, as described in section 4.4, from which the quartiles of their rainfall intensities are obtained.

4.5. Summary of Model Estimation

[44] In summary, we estimate the regional rainfall model in three steps.

[45] First, storm velocity and size are estimated using rainfall time series at multiple stations. The estimated storm speed converts observed wet and dry durations to clump and gap lengths in the Boolean rain field. Mean gap size is estimated with the derived gap lengths.

[46] Second, the Poisson density of patch centers and the distribution of patch radii are estimated using stereological relations between statistics of rainfall clumps and gaps and those of the underlying rain patches.

[47] Third, the distribution of patch average rainfall intensity across patches is estimated using clump rainfall intensities in the data together with Boolean model simulations using parameters estimated in the previous step.

[48] This procedure nonparametrically estimates the distributions of patch sizes and patch rainfall intensities. Relations in the first two steps are summarized in Figure 8.

Figure 8.

Relation map for the model estimation procedure, prior to the estimation of patch rainfall intensities. Primary model parameters are framed.

[49] The components of this abstracted model interact to produce complex spatiotemporal rainfall patterns on the ground. For example, the Poisson density and the size distribution of rain patches together determine the degree of patch overlapping. By allowing elementary rain patches to overlap, the model presents surface rain areas of complex shapes. The movement of this rain field then translates into dry and wet durations in the time series observations at the monitoring network.

[50] In time-aggregated data such as the hourly records used in the Alabama illustration, the time-averaged rainfall intensities in wet periods will underrepresent extreme rainfall intensities. Therefore patch rainfall intensity parameters estimated directly from wet periods (see section 4.4) may not appropriately capture peak intensities. As a solution, we reconstructed continuous time rainfall series from the aggregated historical data using the procedure described by Zhang [2004], and used the reconstructed data in estimating the mean gap size, Poisson patch density, patch size distribution, and patch rainfall intensity distribution. With data of higher time resolution, this reconstruction will be less consequential. We used the original data to estimate storm speeds, because this continuous reconstruction does not affect the storm speed estimation.

5. Assessment of the Model

[51] Simulations provide a platform for investigation of rainfall characteristics and for model validation. We conducted 100 simulations, each consisting of 535 storms with velocity and size taken from each of the 535 storms in the Alabama data that had their velocities and sizes estimated. The procedure for simulating a storm with the estimated model is outlined in Figure 9. The patch centers of each simulated storm are distributed in a rectangular field of desired length (i.e., storm size) and sufficient width. Since storm frequency is not a subject of this study, we inserted long dry periods between simulated storms. Figure 10 shows the spatial rain field of one storm thus simulated.
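The placement of patch centers in the rectangular field can be sketched with the standard library alone. This illustration generates a homogeneous Poisson process by drawing exponential spacings along the storm axis and uniform positions across it. The function name and arguments are hypothetical, and drawing radii and intensities from the estimated joint distribution is left out.

```python
import random

def simulate_patch_centers(density, length, width, rng):
    """Homogeneous Poisson points on the rectangle [0, length] x [0, width].

    The x-coordinates form a 1-D Poisson process of rate density * width,
    built from exponential spacings; y is uniform across the width.
    """
    rate = density * width
    centers = []
    x = rng.expovariate(rate)
    while x <= length:
        centers.append((x, rng.uniform(0.0, width)))
        x += rng.expovariate(rate)
    return centers
```

With the estimated density of 0.00043 patches per square kilometer, a field of 500 km by 400 km would hold 86 patch centers on average. The width should exceed the extent of the monitoring network by at least the largest patch radius on each side, so that patches centered outside the network can still reach the gauges.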

Figure 9.

Procedure for simulating a storm with the Boolean storm model.

Figure 10.

Example of a simulated Boolean storm rain field. The storm is moving as indicated by the arrow in the bottom right corner. The gray scale color of each patch indicates the average rainfall intensity in the patch. The crossed circles indicate the locations of the eight monitoring sites for the Alabama data set, from which the model was estimated.

[52] Using the Boolean model with parameters estimated from the Alabama data, the simulated storms result in continuous rainfall time series at the eight Alabama rain stations. By imitating the aggregating mechanism of the tipping bucket rain gauge, we converted the simulated continuous data to aggregated hourly data comparable to real historical data. Simulated time-aggregated hourly series at the eight monitoring sites are referred to as the “historical-like” data.
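A minimal sketch of such an aggregation step is shown below. It assumes that the continuous series has first been discretized into rainfall depths on a fine, regular time step, and that water left in the bucket carries over from one hour to the next. Both points, and the function name, are assumptions of this illustration.

```python
import math

BUCKET_MM = 0.254  # bucket capacity given in the data description

def tipping_bucket_hourly(depth_increments_mm, steps_per_hour):
    """Aggregate fine-step rainfall depths into hourly tipping-bucket amounts.

    Each hour records the number of tips that occurred in it, times the
    bucket capacity. A trailing partial hour is dropped.
    """
    hourly = []
    cumulative = 0.0
    tips_so_far = 0
    for i, inc in enumerate(depth_increments_mm, start=1):
        cumulative += inc
        if i % steps_per_hour == 0:
            # Small offset guards against rounding at exact multiples.
            total_tips = math.floor(cumulative / BUCKET_MM + 1e-9)
            hourly.append((total_tips - tips_so_far) * BUCKET_MM)
            tips_so_far = total_tips
    return hourly
```

Light rain that does not fill the bucket within an hour is recorded in a later hour, which is one way the aggregation can distort the timing of wet periods.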

[53] Historical-like data from repeated simulations are used to assess the statistical variability of the model estimation procedure described above. We also used historical-like data to compare model output rainfall with the historical data for selected statistics that were not directly used in model fitting.

5.1. Statistical Properties of Parameter Estimators

[54] The 100 runs of simulated historical-like rainfall data were used to repeatedly estimate storm speed and size as discussed in section 4. The simulated continuous data records at the eight monitoring sites were used to estimate the remaining components of the model, in lieu of the continuous reconstruction device used for the original parameter estimation. Thus 100 sets of models were reestimated from simulated data. Examinations of these reestimated models reveal how much the estimates would fluctuate based on data of 535 storms at these 8 monitoring stations, assuming the model to be correct.

[55] Storms were identified in the simulated data, using the 12-hour storm separator and ignoring knowledge of the storm simulations. Most of the simulated storms were identified as individual storms by this criterion. Each of these storms has an assigned velocity for the simulation and a reestimated velocity from the simulated data. The assigned and reestimated speeds and directions of each storm are compared in Figures 11 and 12. One can see that the bias of the estimated storm direction can be as large as 180 degrees. This may happen when, for example, the storm is recorded by few stations and the temporal sequence of the records for this storm at the stations happens to contradict the actual movement of the storm. This scenario is more likely if the storm's movement is perpendicular to the configuration of its covered stations. Furthermore, the tipping bucket aggregation also plays a role in distorting the data. It should be noted that once the direction estimate is seriously biased, the velocity estimate cannot be trusted. Estimates of storm-specific velocities exhibit substantial variability at the individual storm level because of the limited data that are available for estimation from single storms.

Figure 11.

Actual (for simulations) moving directions of 5000 simulated storms and the biases of their estimates from the simulated data (estimated minus actual). Each dot represents one storm.

Figure 12.

Comparison of the actual (for simulations) and the estimated (from simulated data) speeds of 5000 simulated storms. Each dot represents one storm.

[56] Compared with their original values in the model, the patch density λR is overestimated by about 59% whereas the mean gap size is underestimated by about 14%. These two quantities are connected through relation (5). Their estimation biases are caused mainly by the fact that low-coverage storms that deposit rain at fewer than 3 of the 8 monitoring sites are not used in model estimation. We are investigating ways to correct these estimation biases.

[57] Frequency distributions of patch radius estimates, based on the repeated simulations, are shown in Figure 13 and compared to the patch size distribution calculated from the historical data. The box plot for each patch size interval indicates the range of statistical variation across the 100 simulations. Similarly, frequency distributions of patch rainfall intensities, based on simulations, are shown in Figure 14 and compared to their counterparts based on the Alabama data. Both patch size and patch rainfall intensity in the reestimated models have some negative bias compared to the original model. The statistical variation of the reestimates is small.

Figure 13.

Frequency distribution of patch radii in the model estimated from the Alabama data, overlaid with box plots of the 100 simulation estimates of the same quantities. Each box plot indicates the median, quartiles, and extremes of the distribution (same in subsequent box plots). (Abscissa is on log scale.)

Figure 14.

Frequency distribution of patch rainfall intensities in the model estimated from the Alabama data, overlaid with box plots of the 100 simulation estimates of the same quantities. (Abscissa is on log scale.)

5.2. Distribution and Extremes of Rainfall Intensities

[58] The fitted Boolean patch model simulates spatial-temporal storm fields, from which rainfall time series at the monitoring sites, and hence rainfall intensity data, are obtained indirectly. The degree of statistical agreement between measured rainfall intensities in the historical data and those indirectly obtained from Boolean patch model simulations is an important indicator of model performance.

[59] For model-simulated historical-like data, we obtained the distribution of rainfall intensities in individual wet hours and compared it with its counterpart in the historical data, as shown in the quantile-quantile plot of Figure 15. This distribution in its entire range is reasonably reproduced. In particular, the reproduction of the long tail of extreme hourly intensities is important for examining rarity of extreme rainfall events using simulations. Similar agreement was seen in the comparison of distributions of wet spell rainfall intensities, i.e., intensities in consecutive wet hours.

Figure 15.

Quantile-quantile plot of individual wet hour rainfall intensities in the simulated historical-like data versus historical data. (Both axes are on log scale.)

[60] In the historical and simulated aggregated data, we found the maximum single-hour rainfall intensity among all eight stations in n storms, where n is 10, 50, 100, or 200. The n storms were sampled at random; this sampling was repeated 100 times. Figure 16 is a summary of these extreme values. Each box in this plot summarizes the 100 rainfall intensity maxima corresponding to the 100 random samplings of the group of n storms.

Figure 16.

Comparisons of the maximum individual wet hour rainfall intensities across all the eight monitoring sites in n (n = 10, 50, 100, 200) storms in simulated historical-like data versus historical data. Each box plot summarizes 100 such intensity maxima corresponding to 100 random resamplings of the n storms. For each n the box on the left is for historical data, and the box on the right is for simulated data.
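The resampling of maxima behind Figure 16 can be sketched as follows. This is an illustration under stated assumptions: each storm is reduced beforehand to its largest single-hour intensity over all gauges, storms are drawn without replacement, and the function name and toy data are hypothetical.

```python
import random


def resampled_maxima(storm_maxima, n, repeats=100, seed=0):
    """Maximum intensity over n randomly chosen storms, repeated `repeats` times.

    storm_maxima holds one value per storm: the largest single-hour
    intensity recorded at any gauge during that storm (assumed input).
    Sampling without replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [max(rng.sample(storm_maxima, n)) for _ in range(repeats)]


# Hypothetical per-storm maxima, for illustration only.
toy_rng = random.Random(42)
storm_maxima = [toy_rng.expovariate(0.2) for _ in range(500)]

for n in (10, 50, 100, 200):
    maxima = sorted(resampled_maxima(storm_maxima, n))
    # Report the median of the 100 maxima; a box plot would show the full spread.
    print(n, round(maxima[len(maxima) // 2], 2))
```

Applying the same function to historical and to simulated per-storm maxima gives the two sets of 100 values that each pair of boxes summarizes.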

5.3. Temporal Persistence of Rain at a Fixed Location

[61] The rainfall process observed at any location exhibits strong continuity in time. We consider two measures of temporal persistence to check the model's performance in this regard: a lagged conditional probability of rain and a lagged covariance between rainfall intensities.

[62] The time-lagged conditional probability of rain, given rain at the same site at the earlier time separated by time lag x, is defined as

$$\Pr\{\, q(t+x) > 0 \mid q(t) > 0 \,\} \qquad (15)$$

where q(t) is rainfall intensity at time t. Comparison of this persistence function between historical and historical-like model-simulated data is shown in Figure 17. The comparison demonstrates reasonable agreement.

Figure 17.

Time-lagged persistence of rain status at a monitoring site for the historical Alabama data (solid curve) and simulated historical-like data (curve with circles). See definition in equation (15).
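An empirical version of this persistence function can be computed directly from an hourly intensity series. The sketch below is illustrative only: the function name is hypothetical, and it treats the whole series as one record, whereas the analysis in the paper considers times within storms.

```python
def rain_persistence(q, lag):
    """Empirical P(q[t + lag] > 0 | q[t] > 0) for a list of hourly intensities.

    Hypothetical helper; returns NaN when no wet hour has a lagged partner.
    """
    wet_now = [t for t in range(len(q) - lag) if q[t] > 0]
    if not wet_now:
        return float("nan")
    wet_later = sum(1 for t in wet_now if q[t + lag] > 0)
    return wet_later / len(wet_now)


# Toy hourly series (illustrative values, not gauge data).
q = [0, 1, 2, 0, 3, 0]
for lag in (0, 1, 2):
    print(lag, rain_persistence(q, lag))
```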

[63] We could have worked on the correlation of rain status at two moments separated by time lag x instead of the “persistence of rain” defined above. However, to determine the correlation we need the marginal probability of rain. Because we are only interested in times within a raining period (the “rain/dry” status in interstorm periods would have a strong autocorrelation), this raises the question of how to determine the time boundaries of a rain event.

[64] The type of lagged covariance used here is the expected product of individual wet hour rainfall intensities at a monitoring site, lagged by x hours, conditional on rain at the earlier time:

$$E[\, q(t)\, q(t+x) \mid q(t) > 0 \,] \qquad (16)$$

The plot of this lagged covariance for the historical Alabama data and model-simulated historical-like data is shown in Figure 18. It is seen that for short time lags the temporal persistence of rainfall intensity in the model-simulated data is stronger than in the historical data. A main reason for this disagreement is likely to be the model assumption that rain is deposited by a time-invariant structure of rain patches moving over the monitoring sites. Possible extensions of the patch model with limited rain patch lifetime might bring the model temporal persistence more into line with what was observed. (See Cox and Isham [1988] for discussions about the relative influences of the speed of storm movement and the speed of rain patch death on spatial and temporal covariances of rainfall intensity.) In both Figures 17 and 18, the comparison at large time lags should be taken with caution, because the between-storm dry durations were not generated based on any model.

Figure 18.

Time-lagged covariance of individual wet hour rainfall intensities at a monitoring site for the historical Alabama data (solid curve) and simulated historical-like data (curve with circles). See definition in equation (16).
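The lagged quantity described above, the expected product of intensities x hours apart given rain at the earlier hour, has a simple empirical counterpart. The following sketch uses a hypothetical function name and toy values; it ignores storm boundaries, which the analysis in the paper does not.

```python
def lagged_product_moment(q, lag):
    """Empirical E[q[t] * q[t + lag] | q[t] > 0] for a list of hourly intensities.

    Hypothetical helper; returns NaN when there is no wet hour to condition on.
    """
    products = [q[t] * q[t + lag] for t in range(len(q) - lag) if q[t] > 0]
    if not products:
        return float("nan")
    return sum(products) / len(products)


# Toy hourly series (illustrative values, not gauge data).
q_toy = [0, 1, 2, 0, 3, 0]
for lag in (0, 1, 2):
    print(lag, lagged_product_moment(q_toy, lag))
```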

5.4. Spatial Coverage and Correlation of Rain at a Fixed Time

[65] In this section, we discuss two observable spatial properties of the rain field that can be calculated analytically from the Boolean model. The model-calculated spatial properties are compared with their empirical values estimated from the historical Alabama data.

[66] The first property concerns the spatial coverage of rain patches. Let p denote the expected fraction of the ground covered by (possibly overlapping) rain patches at a moment in time during a storm. Then [Stoyan et al., 1995, p. 67]

$$p = 1 - \exp\left(-\lambda_R\, \pi E[R^2]\right)$$

where $\pi E[R^2]$ is the expected area of a circular rain patch. This spatial coverage calculated according to the Boolean model estimated from the Alabama data is 22%.
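The coverage relation for a Boolean model of circular patches, p = 1 - exp(-λ π E[R²]) with λ the Poisson patch density, can be checked by simulation. The sketch below scatters patch centers in a square window around a fixed point and counts how often the point is covered. The density, the uniform radius distribution, and all names are hypothetical choices for illustration; they are not the fitted Alabama values.

```python
import math
import random


def poisson_draw(rng, mean):
    """Poisson variate: count unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def draw_radius(rng):
    """Hypothetical patch radius distribution: uniform on [0.5, 1.5]."""
    return rng.uniform(0.5, 1.5)


def origin_covered(rng, density, half_width):
    """True if any patch of one simulated Boolean field covers the origin.

    half_width must be at least the largest possible patch radius.
    """
    n_patches = poisson_draw(rng, density * (2.0 * half_width) ** 2)
    for _ in range(n_patches):
        x = rng.uniform(-half_width, half_width)
        y = rng.uniform(-half_width, half_width)
        if math.hypot(x, y) <= draw_radius(rng):
            return True
    return False


rng = random.Random(1)
density = 0.1   # patches per unit area (assumed)
trials = 20000
simulated = sum(origin_covered(rng, density, 1.5) for _ in range(trials)) / trials

# E[R^2] for a uniform(a, b) radius is (a^2 + a*b + b^2) / 3.
mean_r2 = (0.5 ** 2 + 0.5 * 1.5 + 1.5 ** 2) / 3.0
theoretical = 1.0 - math.exp(-density * math.pi * mean_r2)
print(round(simulated, 3), round(theoretical, 3))
```

The two printed values should agree to within Monte Carlo error, which is a few thousandths for this number of trials.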

[67] Since this spatial coverage is equal to the linear coverage on random transects [Baddeley and Vedel Jensen, 2005, p. 11], we can estimate p from historical time series data at fixed monitoring sites, which are the equivalent of linear transects of the spatial rain fields. In estimating p from the historical data, we used the durations of wet and dry periods in each storm at each station as substitutes for clump and gap sizes, thereby partly avoiding reliance on the estimated storm speeds. The stormwise and stationwise empirical values show considerable variation, with a median value of 24%. Part of the observed variability derives from the small number of recorded dry and wet periods during each storm.
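The transect-based estimate can be sketched as the wet share of the total recorded duration for each storm and station, followed by a median across records. The record layout, names, and durations below are hypothetical, and the paper's exact estimator may differ in detail.

```python
from statistics import median


def linear_coverage(wet_durations, dry_durations):
    """Fraction of a transect (here, of time at one gauge in one storm) that is wet."""
    wet = sum(wet_durations)
    dry = sum(dry_durations)
    return wet / (wet + dry)


# Hypothetical (storm, station) records: wet and dry period durations in hours.
records = {
    ("storm1", "A"): ([3, 2], [5, 6, 4]),
    ("storm1", "B"): ([2], [8]),
    ("storm2", "A"): ([3, 3], [7, 7]),
}

coverages = [linear_coverage(wet, dry) for wet, dry in records.values()]
print(coverages, median(coverages))
```

Because durations at a fixed gauge are used in place of distances, the ratio does not require converting time to length with an estimated storm speed.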

[68] The second spatial property concerns contemporaneous rain status correlations between two locations in the rain field. Let $C_s(r)$ be the probability that two locations separated by distance r are simultaneously raining during a storm. This probability can be calculated from the Boolean patch model as a function of the spatial separation of the two locations [Stoyan et al., 1995, p. 83]:

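$$C_s(r) = 2p - 1 + (1-p)^2 \exp\left\{ \lambda_R \int_{r/2}^{\infty} \left[ 2x^2 \arccos\left(\frac{r}{2x}\right) - \frac{r}{2}\sqrt{4x^2 - r^2} \right] dF_R(x) \right\}$$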

where $F_R(x)$ is the distribution function of patch radii, $\lambda_R$ is the Poisson patch density, and p is the spatial coverage of rain patches. At the two extremes, $C_s(0) = p$ and $C_s(\infty) = p^2$. Using the fact that the mean and variance of the point rain status (0 or 1) are p and p(1−p), respectively, we easily derive the rain status correlation from $C_s(r)$:

$$C(r) = \frac{C_s(r) - p^2}{p(1-p)}$$
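As a numerical illustration, the rain status correlation can be evaluated for a discrete patch radius distribution. The sketch assumes the standard Boolean-model result for circular grains, in which the joint coverage probability is 2p - 1 + (1 - p)² exp(λ E[A_R(r)]), where A_R(r) is the overlap area of two discs of radius R whose centers are r apart. The density, radii, weights, and function names are hypothetical, not the fitted Alabama values.

```python
import math


def disc_overlap_area(radius, r):
    """Area of intersection of two discs of equal radius with centers r apart."""
    if r >= 2.0 * radius:
        return 0.0
    return (2.0 * radius ** 2 * math.acos(r / (2.0 * radius))
            - 0.5 * r * math.sqrt(4.0 * radius ** 2 - r ** 2))


def rain_status_correlation(r, density, radii, weights):
    """Correlation of rain status at two points r apart (discrete radius law)."""
    mean_area = sum(w * math.pi * x ** 2 for x, w in zip(radii, weights))
    p = 1.0 - math.exp(-density * mean_area)          # spatial coverage
    mean_overlap = sum(w * disc_overlap_area(x, r) for x, w in zip(radii, weights))
    joint = 2.0 * p - 1.0 + (1.0 - p) ** 2 * math.exp(density * mean_overlap)
    return (joint - p ** 2) / (p * (1.0 - p))


# Hypothetical parameters: radii in km, density in patches per square km.
radii, weights, density = [5.0, 10.0], [0.5, 0.5], 0.001
for r in (0.0, 2.0, 8.0, 15.0, 25.0):
    print(r, round(rain_status_correlation(r, density, radii, weights), 4))
```

The correlation equals 1 at zero separation and vanishes once the separation exceeds twice the largest patch radius, since no single patch can then cover both points.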

[69] We obtained empirical values of C(r) for 28 spatial lags between the 8 rain stations in the Alabama data. The fraction of common rainy time for each pair of stations during each storm was calculated. Part of the results of storm velocity estimation was used in this calculation to determine the period in which both stations of the pair were within the storm field. The empirical, stormwise value of p was used in estimating empirical values of C(r). The empirical values, averaged across storms for each spatial lag, and model theoretic values of C(r) are compared in Figure 19. The comparison is limited by the small number of empirical values; a few pairs of stations that are closer to each other would provide more revealing comparisons in the steeper section of the curve. Figure 19 shows generally good agreement between the model and the data. However, the empirical values are affected by the storm velocity estimation, and section 5.1 has only established that storm velocity estimation works if the storms satisfy the model assumptions.

Figure 19.

Correlation of contemporaneous rain status as a function of spatial distance. The solid curve is the theoretical value, by equation (18), for the model estimated from the Alabama data. The circles are empirical values estimated from the Alabama data.

5.5. Effect of Temporal Data Aggregation

[70] It is useful for the storm rainfall model to be able to capture statistical properties of rainfall at different time aggregation levels such as multiminute or multihour levels. Since storm frequency is not part of this study, we do not examine aggregations for longer periods such as days or months. As an exercise, we aggregated the hourly historical data and simulated historical-like data to the 4-hour aggregation level in order to compare selected properties of model simulations against the data.

[71] Figure 20 is a quantile-quantile plot comparing the distributions of wet spell lengths (i.e., runs of consecutive wet aggregation units) in historical and simulated historical-like data. Because the spell lengths take only a limited number of values, all integer multiples of four hours, multiple data symbols on this plot could coincide and appear as only one. A genuine single symbol and an apparent single symbol would then carry the same weight to the viewer and convey a misleading message. To mitigate this problem, we added small random noise to the spell lengths before computing their quantiles. One can see that the simulated wet spells are slightly more concentrated on intermediate values. Figure 21 shows the comparison of the time-lagged persistence of rain status defined in (15). There is good agreement between the results from the model-simulated historical-like data and the historical Alabama data up to time lags comparable to storm durations.

Figure 20.

Quantile-quantile plot of the durations of wet periods in historical and simulated historical-like data, both aggregated to the 4-hour aggregation level. (Both axes are on log scale.)

Figure 21.

Time-lagged persistence of rain status at a monitoring site for the historical Alabama data (solid curve) and simulated historical-like data (curve with circles). Both data sets have been converted to 4-hour aggregated level. See definition in equation (15).
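The aggregation exercise can be sketched as follows: hourly intensities are summed in non-overlapping 4-hour blocks, wet spell lengths are extracted, and a small random perturbation is added before quantiles are taken so that tied values do not overplot. The function names, the uniform form of the noise, and its scale are assumptions made for illustration.

```python
import random


def aggregate(q, width=4):
    """Sum hourly values in consecutive non-overlapping blocks of `width` hours."""
    return [sum(q[i:i + width]) for i in range(0, len(q) - width + 1, width)]


def wet_spell_lengths(q, unit_hours=1):
    """Lengths, in hours, of runs of consecutive wet aggregation units."""
    lengths, run = [], 0
    for value in q:
        if value > 0:
            run += 1
        elif run:
            lengths.append(run * unit_hours)
            run = 0
    if run:  # close a spell that lasts until the end of the record
        lengths.append(run * unit_hours)
    return lengths


def jitter(values, scale, seed=0):
    """Add small uniform noise so tied values separate on a quantile plot."""
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]


# Toy hourly record (illustrative values, not gauge data).
hourly = [0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 0, 0, 2]
four_hourly = aggregate(hourly, 4)
spells = wet_spell_lengths(four_hourly, unit_hours=4)
print(four_hourly, spells, sorted(jitter(spells, 0.5)))
```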

6. Concluding Remarks

[72] We have introduced a regional stochastic spatiotemporal model for surface rain patterns during a storm. The storm model is built from a moving Boolean field of rain-generating patches, each of a random size, location, and rainfall intensity. A storm moves at a constant speed in a fixed direction. Storms differ because of differences in the realized randomness of their size, motion, component patches, and patch rainfall intensities.

[73] We emphasize simplicity of the model structure and intuitive connections between the modeled spatial process and temporal observations at individual rain gauges. Compared with previously published models, the proposed model does not explicitly incorporate clustering mechanisms or time-dependent rain patch characteristics. A two-layer clustered point process has been used elsewhere to reproduce rainfall characteristics at a range of timescales and aggregation levels [Rodriguez-Iturbe et al., 1987]. Our limited tests of the model described here support scalability of the model (section 5.5), at least in the context in which it was applied.

[74] Model assessment has indicated that a potential generalization of the model would allow for time-dependent evolution of rain patches. The first step in this direction could allow a rain patch to have time-invariant properties during a limited, random patch life. This is similar to the approach of Cox and Isham [1988]. Another extension involves a richer parameterization of the intensity profile for rain patches. As a generalization of the conic shape adopted here, we may take the intensity to be linearly decreasing from the patch center, reaching at the patch border a fixed fraction of the peak value. Such a parameterization will include conic patches and cylindrical patches as special cases, and will require only small changes to relations (13) and (14). Since the intensity profile enters the model estimation only through the numerical construction of patch rainfall intensities (section 4.4), it is possible to incorporate more complex profiles [Capsoni et al., 1987; Kawamura et al., 1997; Luyckx et al., 1998; Willems, 2001; von Hardenberg et al., 2003].
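The generalized linear profile described above can be written down in a few lines. In this sketch the parameter names are hypothetical; an edge fraction of 0 gives the conic patch, and an edge fraction of 1 gives a cylindrical patch.

```python
def patch_intensity(distance, radius, peak, edge_fraction):
    """Rainfall intensity at a given distance from the patch center.

    Intensity falls linearly from `peak` at the center to
    `edge_fraction * peak` at the patch border, and is zero outside.
    """
    if distance > radius:
        return 0.0
    return peak * (1.0 - (1.0 - edge_fraction) * distance / radius)


# Illustrative values: a patch of radius 10 with peak intensity 8.
for fraction in (0.0, 0.5, 1.0):
    profile = [patch_intensity(d, 10.0, 8.0, fraction) for d in (0.0, 5.0, 10.0, 12.0)]
    print(fraction, profile)
```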

[75] A main part of this work is model estimation, which has been integral to the model design. Our basic requirement is that conventional historical rain gauge data suffice to estimate the model parameters. The estimation procedure we proposed consists of three components: storm velocity and size, geometric aspects of the storm rain field (patch size distribution and patch density), and rainfall intensities (distribution of patch rainfall intensity). The latter two are encompassed in the Boolean model framework. The procedure for estimating geometric aspects of the model is tightly integrated mathematically. Variations or extensions of the basic Boolean framework will affect strategies for estimation of model parameters.

[76] We use calculated velocities for individual storms based on the available rain gauge network data. The method involves certain approximations that may be reasonable only where the spatial extent of the gauge network does not exceed the typical storm size. The bias in the storm speed estimation, caused mainly by a nonlinear expression used in the least squares estimator, needs further investigation. However, estimation of patch characteristics in the Boolean model is separated from the estimation of storm size and velocity.

[77] We made minimal assumptions regarding distributional shapes for random components of the model that are not directly observable, such as the joint distribution of patch size and patch intensity, allowing the historical data to determine these distributions nonparametrically. On the other hand, the basic Boolean model does imply some distributional properties that are calculable, such as those considered by Cox and Isham [1988]; one example is the exponential distribution of gaps.

[78] The model estimation strategy contains some necessary or convenient heuristics such as the minimum patch radius, r0, which is introduced for numerical stability. However, we did see encouraging results in the model validation probes described in section 5. Simulations further indicate that the model fitting strategy appears reliable in the context of the underlying Boolean patch model for regional historical data sets of the size and scope of the illustrative Alabama regional monitoring network. There is reasonable agreement between simulated and historical data on rainfall intensity distributions and extremes. The degree of temporal persistence in single-site rainfall observations is reproduced and, to a lesser extent, so is the spatial persistence at a fixed point in time.

[79] Our storm event model needs to be combined with a storm frequency model in order to simulate long-term rainfall series. In such simulations seasonal variations of rain storm characteristics need to be considered. We may incorporate seasonality by fitting the model separately for different seasons and smoothly varying the model parameters in an annual cycle [Stern and Coe, 1984]. In a related treatment, the model parameters can be made specific to storm types.

[80] This model is relevant for applications because of its mathematically tractable structure and the ease with which continuous spatiotemporal rainfall scenarios can be simulated. Simulated scenarios may be used to drive rainfall-sensitive process simulations in studies of environmental problems, ecological dynamics and hydrology. In other applications, simulations allow quantitative inquiries into space-time statistical properties of rainfall that are not readily obtainable from relatively short records, such as temporal and spatial structures, and frequency of extreme events. Of fundamental relevance to both types of applications is the ability of the stochastic model of the kind considered here to generate multiple realistic rainfall scenarios that imitate the statistics of historical network monitoring data. In its current, quite basic form our regional rain storm modeling approach is likely to be most useful in situations comparable to the illustration that we used, i.e., where (1) the region of interest is small relative to the spatial extent of a typical storm, and statistical rainfall characteristics are approximately homogeneous across the region, (2) storms are relatively stable structures seen on the regional scale of the monitoring network, moving across the region without large change in direction or speed, and (3) the time resolution of the historical data used for model estimation is not much coarser than the typical duration of continuous rain periods.


[81] The authors thank David Freyberg, Mark Jacobsin, and Keith Loague for helpful comments. They also thank the Associate Editor and the referees for their many comments that helped improve the manuscript. This paper was finished when Z. Zhang was on the staff of the Center for Integrating Statistical and Environmental Science at the University of Chicago. Although the research described in this article has been funded wholly or in part by the U.S. Environmental Protection Agency through STAR cooperative agreement R-82940201-0 to the University of Chicago, it has not been subjected to the agency's required peer and policy review and therefore does not necessarily reflect the views of the agency, and no official endorsement should be inferred.