Scale invariance of regional wet and dry durations of rain fields: A diagnostic study


Abstract

[1] Most of the recent work on rainfall data analyses and modeling has focused on either spatial or temporal variability. In this paper the structure of rainfall intermittence in space and time is investigated. Using a series of TOGA-COARE radar scans converted to maps of pixel rain rate over a tropical oceanic region of size 240 × 240 km², regionally scale-invariant behavior of the probability distributions of wet and dry epoch durations is explored. Durations of wet and dry epochs are estimated by lengths of wet and dry spells, respectively, in time series of spatially averaged rain rate over sampled square subregions of spatial scales ranging from 120 km to 2 km. The investigation is based on sample quantiles and sample moments of the underlying marginal probability distributions, focusing on the behavior of their tails and their variation with respect to spatial scale. We find that sample tail quantiles and sample moments of wet durations exhibit power law multiscaling, while sample tail quantiles and sample moments of dry durations exhibit exponential multiscaling across the above range of scales. These findings provide new statistical diagnostic tools for validation of spatiotemporal models for rain fields.

1. Introduction

[2] This article is concerned with rainfall intermittence in space and time. Rainfall intermittence is defined as the alternation from zero to positive rain rates and vice versa. Several studies have investigated the modeling of rainfall intermittence. They include studies in space at fixed times [Gupta and Waymire, 1993; Over and Gupta, 1994; Onof et al., 1998], and in time at fixed spatial locations [Schmitt et al., 1998]. Recently, Pavlopoulos and Gritsis [1999] have investigated space-time intermittence of rainfall. Our objective here is to develop a new model-free methodology for testing statistical scale invariance of rainfall intermittence in space and time. We illustrate this methodology on data from TOGA-COARE over the tropical Pacific [Short et al., 1997].

[3] Space-time rainfall covers multiple space and timescales. In order to investigate empirical features across multiple scales, the concepts of scale invariance and scale dependence have emerged as fundamental features of multiscale hydrologic processes [Sposito, 1998]. Scale invariance was introduced into studies of rain fields nearly two decades ago [Lovejoy and Mandelbrot, 1985; Lovejoy and Schertzer, 1985; Schertzer and Lovejoy, 1987; Waymire, 1985]. It can be used to exploit information obtained at any one scale for making inferences at another scale within the range of scales where the invariance is valid. For example, scale-invariant stochastic models can be applied at multiple scales by simply rescaling the model parameters appropriately. Some applications of scale-invariant stochastic models include subgrid-scale downscaling of spatial rainfall in climate models [Foufoula-Georgiou, 1998], and downscaling of spatial rainfall on river networks to study scale invariance of peak river flows [Gupta et al., 1996; Gupta and Waymire, 1998a, 1998b; Menabde and Sivapalan, 2001; Troutman and Over, 2001].

[4] In this study, we define regional processes of wet and dry epoch durations at several spatial scales, derived from time series records of regional rainfall at these scales. Given a subregion $A$ of a region of study $S$ (i.e., $A \subseteq S$), let an intermittent temporal random process of spatially averaged rain rate over $A$ be denoted by $\{R_A(t);\ t \ge 0\}$. This process can be partitioned into its wet epochs, or spells of positive values, and dry epochs, or spells of zeros, whose lengths are assumed to be sample values from random variables $W_A$ and $D_A$, respectively. Thus, by varying the sampled subregion $A \subseteq S$, one naturally obtains a regional random process of wet epoch duration, $\{W_A;\ A \subseteq S\}$, and another regional random process of dry epoch duration, $\{D_A;\ A \subseteq S\}$. Wet and dry epoch durations can also be recovered from the temporal evolution of the overall fractional wet area of $A$.

[5] We test for spatial homogeneity at each scale to investigate the nature of probability distributions of regional processes of duration. Then, formulae for tail quantiles are derived in terms of scale and probability level, based entirely on empirical statistics. These formulae are used to examine the form of tail probabilities. They provide a mathematical basis for investigating what type of scale invariance, if any, holds in wet and dry epoch durations. Moreover, tail probabilities allow us to examine the issue of finiteness of moments at each spatial scale, and to estimate bounds of orders above which moments do not exist. This is a key issue, because finiteness of moments is very sensitive to extreme observations, which are represented by the tails of probability distributions.

[6] A formal definition of stochastic scale invariance, or stochastic scaling, is given in section 2. Its implications for the scaling behavior of quantiles and for the existence of moments are discussed there. The part of the TOGA-COARE data used in this study is described in section 3. Nonparametric tests to diagnose homogeneity assumptions in space and time are implemented in section 4. The core of the statistical analyses and results of our study is presented in section 5. A summary of the key results is given in section 6, and potential implications of this work for modeling, and for estimation of rainfall from multiple sensors, are briefly discussed there.

2. Probabilistic Framework for Regional-Scale Invariance

[7] Let $\{X_A;\ A \subseteq S\}$ be a stochastic process parameterized by subregions $A \subseteq S$ of a fixed region $S$, hence referred to as a regional process. The duration processes $\{W_A;\ A \subseteq S\}$ and $\{D_A;\ A \subseteq S\}$ defined earlier, and the process $\{R_A(t);\ A \subseteq S\}$ of instantaneous spatial averages of rain rate at any fixed time t ≥ 0, are examples of regional processes. We define scale invariance in terms of marginal probability distributions of the regional process.

2.1. Definition of Regional Scale Invariance

[8] Let λ denote a dimensionless index of spatial scale, taking values in a subset Λ of the interval (0, 1], such that 1 ∈ Λ. The process $\{X_A;\ A \subseteq S\}$ is said to be stochastically scale-invariant, or stochastically scaling, if and only if there is a scalar process $\{C_\lambda\}_{\lambda \in \Lambda}$ such that $P(C_1 = 1) = 1$, $P(C_\lambda > 0) = 1$ for every λ ∈ Λ, and $X_{\lambda \cdot A} \overset{d}{=} C_\lambda \cdot X_A$ for every $A \subseteq S$ and every λ ∈ Λ. Here $\overset{d}{=}$ denotes equality of probability distribution functions, so that $P(X_{\lambda \cdot A} \le u) = P(C_\lambda \cdot X_A \le u)$ for every $u \in \mathbb{R}$. That is, the probability distribution of $X_{\lambda \cdot A}$ on the scaled subregion $\lambda \cdot A = \{\lambda \cdot x : x \in A\} \subseteq A \subseteq S$ coincides with the probability distribution of the rescaled random variable $C_\lambda \cdot X_A$ on the initial subregion $A$.

[9] Mathematically speaking, the set Λ may be discrete (finite or countable), or a subinterval, or a more complex subset of the interval (0, 1], quantifying a range of scales where scale invariance holds. In any case, the scaled subregions $\{\lambda \cdot A\}_{\lambda \in \Lambda}$ clearly form a system of sequentially nested subregions inside a given archetype subregion $A$. A value of λ ∈ (0, 1] is interpreted as the ratio of diameters (with respect to a given metric in $S$) of $\lambda \cdot A$ versus $A$.

[10] Due to resolution limitations of probing instruments, regional data can be obtained only in a finite range of scales. This forces a positive lower bound on the values of λ, excluding the possibility of λ → 0+ or the possibility of probing a continuum of physical scales. The value λ = 1 corresponds to the largest scale from which regional data are used, hereafter called scale of reference. The shapes of regions at the available scales are also very limited, either due to instrument limitations or due to the sampling design implemented in a given probe. Usually, at the smallest available scale corresponding to single pixels, this shape is square. Limitations of this kind, and some additional ones presented in section 3, lead us to pool duration data from several square subregions at each of the seven scales considered in this study. Since nesting is violated by subregions from which data are pooled, we introduce homogeneity assumptions and modify notation accordingly.

2.2. Assumptions and Notation

[11] Given any subregion $A \subseteq S$, it is assumed that probability distributions of wet and dry epoch durations on $A$ remain stationary in the course of time. That is, wet durations on $A$ are identically distributed (but not necessarily independent), and dry durations on $A$ are also identically distributed, but the common distribution of dry durations can differ from that of wet durations. These assumptions of stationarity are hereafter referred to as temporal homogeneity of the regional processes of wet and dry durations, and justify the time-independent notation $W_A$ and $D_A$ given in the Introduction.

[12] Moreover, given any subregion $A \subseteq S$, it is assumed that the probability distribution of wet epoch duration remains invariant on any translated (stationarity) or rotated (isotropy) copy of $A$ within $S$. The same assumption is made for dry epoch duration. This assumption is hereafter referred to as spatial homogeneity of the regional processes of wet and dry durations.

[13] These homogeneity assumptions are nonparametrically tested in section 4. A consequence of these assumptions is that the probability distributions of $W_A$ and $D_A$ depend only on the shape and size of the subregion $A$, and are independent of the exact location or orientation of $A$ inside the region of study $S$. Therefore, $\lambda \cdot A$ in the definition of regional scale invariance need not be strictly the set $\{\lambda \cdot x : x \in A\}$ nested inside $A$. Instead, $\lambda \cdot A$ may denote any translated or rotated copy of the set $\{\lambda \cdot x : x \in A\}$ inside $S$. This generalized interpretation of the notation $\lambda \cdot A$, combined with the postulated homogeneity assumptions, facilitates the statistical investigation of regional scale invariance of wet and dry durations in section 5.

[14] The probability distribution functions of $W_{\lambda \cdot A}$ and $D_{\lambda \cdot A}$ are denoted by $F^{(w)}_{\lambda \cdot A}(u) = P(W_{\lambda \cdot A} \le u)$ and $F^{(d)}_{\lambda \cdot A}(u) = P(D_{\lambda \cdot A} \le u)$, respectively. For brevity, when a statement or argument refers without distinction to both wet and dry durations, the superscripts (w) and (d) are suppressed from the notation. From a physical standpoint, it is realistic to assume that distribution functions are continuous at every $u \in \mathbb{R}$, and strictly increasing over their support (0, +∞). From a mathematical standpoint, this assumption guarantees that the corresponding quantile function $Q_{\lambda \cdot A}(p) := \inf\{u \in \mathbb{R} : F_{\lambda \cdot A}(u) \ge p\}$, defined for p ∈ (0, 1), is also continuous and strictly increasing, and is thus the inverse function of the corresponding distribution function $F_{\lambda \cdot A}$ [Karr, 1993, p. 63].

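[15] Moments of nth order are defined by $M_{\lambda \cdot A}(n) = \int_0^{\infty} u^n \, dF_{\lambda \cdot A}(u)$, and exist only when the integral converges. In that case all moments $M_{\lambda \cdot A}(k)$ of order $0 < k \le n$ also exist. Moments can be calculated from the quantile function $Q_{\lambda \cdot A}(p)$ via the integral

$$M_{\lambda \cdot A}(n) = \int_0^1 \left[Q_{\lambda \cdot A}(p)\right]^n dp, \qquad (1)$$

obtained by the change of variable $u = Q_{\lambda \cdot A}(p)$, for p ∈ (0, 1), whence $F_{\lambda \cdot A}(u) = F_{\lambda \cdot A}(Q_{\lambda \cdot A}(p)) = p$.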
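As a quick numerical illustration of the quantile form of the moment integral, the following minimal sketch integrates $[Q(p)]^n$ over (0, 1) with a midpoint rule. The unit-mean exponential distribution, the function names, and the number of integration steps are assumptions chosen only for illustration; they are not part of the study.

```python
import math

def moment_from_quantile(Q, n, steps=100_000):
    """Approximate M(n) = integral over (0, 1) of Q(p)**n dp with the midpoint rule."""
    h = 1.0 / steps
    return h * sum(Q((i + 0.5) * h) ** n for i in range(steps))

# Hypothetical test distribution: exponential with unit mean, Q(p) = -ln(1 - p).
def exp_quantile(p):
    return -math.log(1.0 - p)

m1 = moment_from_quantile(exp_quantile, 1)  # exact first moment is 1
m2 = moment_from_quantile(exp_quantile, 2)  # exact second moment is 2
```

The midpoint rule never evaluates the integrand at p = 1, where the quantile function diverges, so the sketch remains usable for heavy-tailed quantile functions as long as the moment in question is finite.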

2.3. Examples of Scale Invariance

[16] Scale invariance of a regional process $\{X_A;\ A \subseteq S\}$ does not characterize its marginal probability distributions. It provides merely a link between the probability distributions of $X_A$ and $X_{\lambda \cdot A}$. The key to this link is the stochastic dependence between the scalar variable $C_\lambda$ and $X_A$, which in turn forces dependence among the variables of the regional process when the region $A$ is down-scaled. This dependence increases the complexity of the process, in addition to dependencies associated with translations or rotations of the region $A$. The spatial homogeneity assumption postulated earlier refers only to stationarity with respect to translations and rotations at any given scale, and not to dependencies induced by translations, rotations, or down-scaling of subregions. Under additional assumptions, scale invariance yields scaling relationships among quantiles, or among (existing) moments. We illustrate examples of such scaling relationships in two specific cases of scale invariance.

2.3.1. Simple Scaling

[17] Let $\{X_A;\ A \subseteq S\}$ be a scale-invariant process such that $P(C_\lambda = \lambda^{\theta}) = 1$ for some fixed $\theta \in \mathbb{R}$. That is, $X_{\lambda \cdot A} \overset{d}{=} \lambda^{\theta} \cdot X_A$, which is the simplest case of scale invariance since the scalar process is degenerate (i.e., nonrandom). This is known as simple scaling or stochastic self-similarity, with scaling exponent θ [Lamperti, 1962]. Then $F_{\lambda \cdot A}(u) = F_A(\lambda^{-\theta} \cdot u)$, and for $u = Q_{\lambda \cdot A}(p)$ one obtains $p = F_{\lambda \cdot A}(Q_{\lambda \cdot A}(p)) = F_A(\lambda^{-\theta} \cdot Q_{\lambda \cdot A}(p))$, whence

$$Q_{\lambda \cdot A}(p) = \lambda^{\theta} \cdot Q_A(p), \qquad p \in (0, 1). \qquad (2)$$

If $F_A$ admits finite moments up to order n ≥ 1, then $F_{\lambda \cdot A}$ does too, and they are obtained by scaling the moments of $F_A$ according to

$$M_{\lambda \cdot A}(k) = \lambda^{\theta k} \cdot M_A(k), \qquad 0 < k \le n. \qquad (3)$$

Even in this simplest case of scale invariance, it is not possible to obtain explicit formulae for quantiles $Q_{\lambda \cdot A}(p)$ in terms of λ and p, or for moments $M_{\lambda \cdot A}(k)$ in terms of λ and k, unless $F_A(u)$ or $Q_A(p)$ is explicitly known.
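To make the last remark concrete, here is a worked example under a purely hypothetical choice of $F_A$: an exponential distribution with mean μ > 0, which satisfies the continuity and monotonicity assumptions of section 2.2. Once $F_A$ is fixed in this way, simple scaling with exponent θ gives fully explicit quantiles and moments at every scale.

```latex
% Hypothetical reference distribution (an assumption, not taken from the data):
F_A(u) = 1 - e^{-u/\mu}, \quad u > 0
\;\Longrightarrow\;
Q_A(p) = -\mu \ln(1 - p), \qquad M_A(k) = \mu^{k}\,\Gamma(k + 1).

% Simple scaling of quantiles and moments then yields, for every \lambda \in \Lambda:
Q_{\lambda \cdot A}(p) = -\lambda^{\theta}\,\mu \ln(1 - p), \qquad
M_{\lambda \cdot A}(k) = \lambda^{\theta k}\,\mu^{k}\,\Gamma(k + 1), \quad k > 0.
```

In this example the scaled variable is again exponential, with mean $\lambda^{\theta}\mu$, so all moments exist at every scale.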

[18] An example of a physical process where simple scaling has been statistically diagnosed in terms of (2) is regional annual peak flow due to snowmelt [Gupta and Dawdy, 1995]. A brief review of other examples of physical and mathematical processes with simple scaling is given by Gupta and Waymire [1990]. However, Kedem and Chiu [1987] showed that rain rate processes cannot be simple scaling, because of intermittence.

2.3.2. Power Law Multiscaling

[19] Let $\{X_A;\ A \subseteq S\}$ be a scale-invariant process of regional aggregates over subregions $A \subseteq S$, with respect to a random measure distributed over the region $S$. Assuming that the corresponding scalar (positive) process $\{C_\lambda;\ \lambda \in (0, 1]\}$ is stochastically independent of the process $\{X_A;\ A \subseteq S\}$, then for arbitrary $\theta \in \mathbb{R}$ the C-process can be represented by

$$C_\lambda \overset{d}{=} \lambda^{\theta} \cdot \exp\{Z_{\ln(1/\lambda)}\}, \qquad \lambda \in (0, 1], \qquad (4)$$

where $\{Z_\rho;\ \rho \ge 0\}$ is a process such that $P(Z_0 = 0) = 1$ (i.e., it starts at zero), and has stationary (but not necessarily independent) increments in the sense that $Z_{\rho_1 + \rho_2} - Z_{\rho_1} \overset{d}{=} Z_{\rho_2}$ for every $\rho_1 \ge 0$ and $\rho_2 \ge 0$. That is, the scalar process is an arbitrary power law of the scale λ (i.e., with arbitrary exponent $\theta \in \mathbb{R}$), randomized by the factor $\exp\{Z_{\ln(1/\lambda)}\}$. Simple scaling can be viewed as a special case in this setting, taking θ equal to the fixed value of the scaling exponent, and $P(Z_\rho = 0) = 1$ for every ρ ≥ 0. Hereafter we refer to the above notion of (marginal) scale invariance as power law multiscaling of the regional process $\{X_A;\ A \subseteq S\}$. The term is new, but the underlying notion of scale invariance has been explained by Gupta and Waymire [1990, section 4], along with the derivation of the representation (4) and several mathematical examples of processes exhibiting this type of scale invariance.

[20] Due to (4), scale invariance yields $X_{\lambda \cdot A} \overset{d}{=} \lambda^{\theta} \cdot \exp\{Z_{\ln(1/\lambda)}\} \cdot X_A$. Since $C_\lambda$ and $X_A$ are independent, if moments of order k > 0 exist, then $M_{\lambda \cdot A}(k) = \lambda^{S(k)} \cdot M_A(k)$, or equivalently

$$\ln M_{\lambda \cdot A}(k) = S(k) \cdot \ln \lambda + \ln M_A(k), \qquad (5)$$

where $S(k) = \theta \cdot k + \ln \Xi(k) / \ln \lambda$ is a concave function of k, since $\Xi(k) = E(\exp\{k \cdot Z_{\ln(1/\lambda)}\})$ is convex as the moment generating function of $Z_{\ln(1/\lambda)}$. Equation (5) illustrates a scaling relationship among moments of marginal distributions of a power law multiscaling process. Gupta and Waymire [1990] defined wide sense multiscaling as log-log linearity between moments and scale parameter according to (5), and showed statistical evidence identifying this as a common property shared by spatial averages of rain rate and river flows.
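The step from the representation of the scalar process to the moment relationship can be spelled out as follows. This is a sketch of the intermediate algebra, assuming λ ∈ (0, 1) so that ln λ ≠ 0, and assuming that all expectations involved are finite.

```latex
\begin{aligned}
M_{\lambda \cdot A}(k)
  &= E\!\left[(C_\lambda \cdot X_A)^k\right]
   = E\!\left[C_\lambda^{k}\right] \cdot E\!\left[X_A^{k}\right]
   && \text{(independence of } C_\lambda \text{ and } X_A\text{)} \\
  &= \lambda^{\theta k} \cdot E\!\left[\exp\{k \cdot Z_{\ln(1/\lambda)}\}\right] \cdot M_A(k)
   = \lambda^{\theta k} \cdot \Xi(k) \cdot M_A(k) \\
  &= \lambda^{\theta k + \ln \Xi(k) / \ln \lambda} \cdot M_A(k)
   = \lambda^{S(k)} \cdot M_A(k),
   && \text{since } \Xi(k) = \lambda^{\ln \Xi(k) / \ln \lambda}.
\end{aligned}
```

Taking logarithms of both sides gives the log-log linear form between moments and scale.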

[21] Determining a log-log linear relationship between quantiles and scale parameter, analogous to (5), is a far more difficult issue even in the case of a power law multiscaling process, and we shall not address it here. However, Gupta et al. [1994] showed that log-log linearity between quantiles of regional peak flows (due to rainfall) and spatial scale of drainage basin holds only approximately, with slopes decreasing as the probability of exceedance decreases or as the return period increases.

[22] A well-studied class of spatial processes exhibiting properties of power law multiscaling (and consequently wide sense multiscaling too), is known as random cascades. A large and still growing body of literature has emerged on the theory and applications of cascades [Mandelbrot, 1974; Kahane and Payriere, 1976; Holley and Waymire, 1992; Gupta and Waymire, 1993; Tessier et al., 1993; Over and Gupta, 1994; Marshak et al., 1994; Lovejoy and Schertzer, 1995; Waymire and Williams, 1995, 1996; Over and Gupta, 1996; Menabde et al., 1997a, 1997b; Troutman and Vecchia, 1999; Ossiander and Waymire, 2000]. However, familiarity with cascade processes is not required for understanding the remainder of this article, thus we omit their introduction.

3. Formulation of Pooled Working Data

[23] The raw data used in this study is a time series of maps of radar reflectivity measurements obtained during the Tropical Ocean Global Atmosphere (TOGA) Coupled Ocean-Atmosphere Response Experiment (COARE) by ship-borne Doppler precipitation radar (MIT). Each scan covers a fixed oceanic region $S$ of area 240 × 240 km² in the tropical Pacific, with temporal resolution of approximately 20 min between successive scans. Reflectivity measurements Z from each scan, binned over square pixels of area 2 × 2 km², were converted to instantaneous rain rate R by the Z-R relationship $R = (Z/230)^{1/1.25}$. These scans correspond to cruise 1 (10 November through 9 December 1992), consisting of 1992 scans, and to the early part of cruise 2 (21–29 December 1992), consisting of 617 scans. Figure 1 depicts the instantaneous rain rate field retrieved from a typical scan. Detailed information on TOGA-COARE is given by Short et al. [1997].

Figure 1.

Instantaneous rain rate field retrieved from a typical radar scan over the region of study.

[24] Preliminary exploration of time series of rain rate regional averages at the range of scales between 240 km and 120 km showed very little intermittence. Therefore, scales chosen for our analysis range discretely from 120 km (scale of reference with λ = 1) down to 2 km (pixel scale with λ = 1/60), following the rule of half (approximately). This amounts to a total of seven scales (120 km, 60 km, 30 km, 16 km, 8 km, 4 km, 2 km), so that λ = 1, 1/2, 1/4, 2/15, 1/15, 1/30, 1/60, respectively.
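The scale indices follow directly from dividing each scale by the 120 km scale of reference. The short sketch below reproduces them with exact fractions and also lists the ratio between successive scales, which shows where the rule of half holds only approximately; the variable names are illustrative.

```python
from fractions import Fraction

REFERENCE_KM = 120  # scale of reference, lambda = 1
scales_km = [120, 60, 30, 16, 8, 4, 2]

# Dimensionless scale index: ratio of each scale to the scale of reference.
lambdas = [Fraction(s, REFERENCE_KM) for s in scales_km]

# Ratio between successive scales; exactly 1/2 except for the 30 km -> 16 km step.
ratios = [Fraction(small, large) for large, small in zip(scales_km, scales_km[1:])]

for s, lam in zip(scales_km, lambdas):
    print(f"{s:>4} km  lambda = {lam}")
```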

[25] A fixed symmetric spatial design is used for sampling square subregions in each scale. Figure 2 depicts a fixed sample of five regions, each of area 120 × 120 km². Subregions from each of the remaining six scales are sampled from each subregion of Figure 2, according to five different designs. These designs are labeled as ‘x-y’, where ‘x’ takes quadrant values CR, NW, NE, SW, SE, as depicted in Figure 2, and ‘y’ takes zoom-in values CR, NW, NE, SW, SE, as depicted in Figure 3. This design samples 25 systems of nested square subregions, covering densely the entire region $S$ at all seven scales. Due to symmetry, this design minimizes overlaps between subregions of the same scale.

Figure 2.

Spatial sampling of five square subregions (NW, NE, SW, SE, CR) at the scale of reference (120 km or λ = 1) from the 240 × 240 km² square region of study.

Figure 3.

Five designs for spatial sampling of systems of six nested square subregions, corresponding to scales λ ∈ {1/2, 1/4, 2/15, 1/15, 1/30, 1/60}, from each one of the five subregions at reference scale (λ = 1) depicted in Figure 2. Dots imply the nondepicted smaller scales.

[26] Time series of spatially averaged rain rate on each of the 155 sampled subregions (5 × 1 + 25 × 6) were obtained from cruise 1 and cruise 2. Spells of zeros and spells of positive values were identified as dry and wet epochs respectively. The integer-valued lengths of these spells, multiplied by 1/3, provide “quantized” working data of dry and wet durations in units of hours (hr). These estimates of duration presume no temporal intermittence during any 20 min sampling interval. This may be unrealistic over regions of small size, and a source of positive bias. Wet or dry spells bordering with a few (2.6%) missing values in a time series, were discarded, since they are another source of ambiguity for the true length of dry and wet spells. Figure 4 depicts time series from cruise 1.
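The extraction of wet and dry durations described above can be sketched as a run-length encoding of the rain/no-rain indicator. In this minimal sketch the function name, the use of `None` for a missing scan, and the sample series are all assumptions made for illustration; only the 20 min (1/3 hr) step and the rule of discarding spells that border missing values come from the text.

```python
from itertools import groupby

def spell_durations(rain, dt_hours=1 / 3):
    """Return (wet, dry) spell durations in hours from a rain rate series.

    rain: spatially averaged rain rates, one per scan; None marks a missing scan.
    Spells adjacent to a missing scan are discarded because their true length
    is ambiguous.
    """
    # Label every scan: None = missing, True = wet (positive rate), False = dry.
    labels = [None if r is None else (r > 0) for r in rain]
    runs = [(key, len(list(group))) for key, group in groupby(labels)]

    wet, dry = [], []
    for i, (key, length) in enumerate(runs):
        if key is None:
            continue
        borders_missing = (
            (i > 0 and runs[i - 1][0] is None)
            or (i + 1 < len(runs) and runs[i + 1][0] is None)
        )
        if borders_missing:
            continue
        (wet if key else dry).append(length * dt_hours)
    return wet, dry

# Hypothetical series of 12 scans with one missing value.
series = [0, 0, 1.2, 0.5, 0, None, 0, 0, 3.0, 3.0, 3.0, 0]
wet, dry = spell_durations(series)
```

In this example the single dry scan before the gap and the two dry scans after it are dropped, leaving two wet spells and two dry spells.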

Figure 4.

Time series of spatially averaged rain rate over the seven nested subregions sampled by the NE-CR design. Sample sizes of wet and dry spells are given for each time series (e.g., the 60 km NE-CR subregion yields 88 dry and 75 wet spells), after discounting spells of ambiguous length due to missing values. Discounted spells are depicted as unit gaps (i.e., 20 min.) in each time series, modifying its length from 1992 to 2040.

[27] Quantization of working data of durations, combined with their extremely high skewness, makes scaling analysis of sample quantiles impossible for probability levels p < 0.8 [Pavlopoulos and Gupta, 2001]. On the other end, scaling analysis of sample tail quantiles for p ≥ 0.8 relies heavily on extreme values of working data in the top 20% range. Since only very few extreme values are available in that range, especially on large scales, sample tail quantile estimates can be even more biased, in addition to bias due to the quantized nature of the data.

[28] To suppress some of this bias, we decided to pool working data from both cruises on each sampled subregion, retaining chronological sequence. This step is referred to as “temporal pooling”. Then, from all subregions of the same scale we pooled the temporally pooled data, and we refer to this step as “spatial pooling”. The final product of this procedure is a set of 14 samples (7 wet and 7 dry) of spatiotemporally pooled working data.

[29] Figure 5 depicts box plot summaries of these 14 samples. Quantile estimates $q_\lambda(p)$ obtained from these samples are tabulated in Table 1, from which only those corresponding to p ≥ 0.8 are used for the scaling analysis carried out in section 5. Note that spatial pooling of data from square subregions of the same scale allows suppression of $A$ from the index notation $\lambda \cdot A$.

Figure 5.

Box plot summary statistics of dry (first row) and wet (last row) durations at all scales after spatiotemporal pooling of data from all designs. Sample sizes are denoted by N under each box plot. Upper and lower sides of each box indicate upper and lower quartiles, the inner white line indicates median (middle quartile), and lines outside each box indicate extreme values in the upper quartile range of the underlying probability distribution.

Table 1. Sample Quantiles of Spatiotemporally Pooled Working Data of Dry and Wet Durations at All Spatial Scales, in Units of 20 Minute Intervals

Dry Duration Quantiles
p       120 km   60 km    30 km    16 km    8 km     4 km     2 km
0.05    1        1        1        1        1        1        1
0.15    1        1        1        1        1        1        1
0.25    1        1        1        1        1        1        1
0.35    1        1        1        2        2        2        2
0.45    1        1        2        2        2        2        2
0.55    2        2        2        3        3        3        3
0.65    2        3        3        4        4        5        4
0.75    3        4        4        6        6        7        6
0.8     3        4        6        7        8        8        8
0.825   3        5        6        8        9        10       10
0.85    3.4      6        7        9        10       11       11
0.875   4        6        8        11       12       12       13
0.9     4        7.2      10       12       14       14       15
0.925   4        9        12       16       17       17       18
0.95    4        11       15       19.7     21       22       23
0.975   4        14       21       26       29       32       32
0.985   4.68     17       27       34       36       38.74    38
0.995   5.56     24.61    33.56    49.28    52.06    59.91    62.06

Wet Duration Quantiles
p       120 km   60 km    30 km    16 km    8 km     4 km     2 km
0.05    1        1        1        1        1        1        1
0.15    1        1        1        1        1        1        1
0.25    1        1        1        1        1        1        1
0.35    1        2        1        1        1        1        1
0.45    1        2        2        2        2        2        2
0.55    2        3        3        3        2        2        2
0.65    2.6      5        4        4        3        3        3
0.75    4        10       6        5        4        4        4
0.8     9.6      13       8        6        5        5        4
0.825   14.4     15       9        7        6        6        5
0.85    19.2     18       11       8        7        6        6
0.875   24       21       13       10       8        7        7
0.9     27       23       16       11       10       9        8
0.925   32.8     29       18       13       12       11       10
0.95    44.2     38       23       16       15       14       12
0.975   57.2     53       31       24       23.45    23       21
0.985   62.72    60.4     38       31       32       30       26
0.995   68.24    83.4     63.44    52       48.18    40.52    34

4. Testing Homogeneity Assumptions

[30] The assumptions of temporal and spatial homogeneity postulated in section 2, are instrumental for the use of spatiotemporally pooled data in investigating scaling relationships of quantiles or moments of regional processes of duration. If valid, then spatiotemporally pooled samples can replace individual samples from subregions of the same scale, with the advantages pointed out in section 3, and without distortion of the parent distribution. What may change in the passage from individual to spatiotemporally pooled samples, is the dependence within a sample. This section is a brief account on statistical evidence regarding these assumptions, based on nonparametric tests.

4.1. Temporal Randomness

[31] The nonparametric test of runs above and below the median was applied to temporally pooled data on every individual subregion sampled at each scale. Details on this procedure are given by Pavlopoulos and Gritsis [1999]. It tests not only homogeneity (i.e., temporal stationarity) of the parent probability distribution, but the stronger null hypothesis that the data constitute a random sample of observations from i.i.d. random variables. The distributions of P values obtained at each scale yield that, in at least 90% of the sampled subregions the hypothesis of randomness is acceptable, with confidence up to 99%, for both wet and dry durations [Pavlopoulos and Gupta, 2001].
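For readers unfamiliar with the test, the following is a minimal sketch of a runs test above and below the median, using the usual normal approximation for the number of runs. It is not the exact procedure of Pavlopoulos and Gritsis [1999]; in particular, dropping observations equal to the median is one common convention and is an assumption here, as are the function name and the sample data.

```python
import math
from statistics import median

def runs_test_median(x):
    """Two-sided runs test above/below the median (normal approximation).

    Returns (number of runs, z statistic, P value). Observations equal to the
    median are dropped before counting runs.
    """
    med = median(x)
    above = [v > med for v in x if v != med]
    n1 = sum(above)            # observations above the median
    n2 = len(above) - n1       # observations below the median
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("too few observations on either side of the median")

    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return runs, z, p_value

# Hypothetical durations (in 20 min units) that alternate too regularly
# to be a random sample: every observation starts a new run.
runs, z, p = runs_test_median([1, 3] * 10)
```

A small P value, as in this artificial example, would reject the hypothesis of randomness; the text reports that this happens in fewer than about 10% of the sampled subregions. With quantized durations many observations can tie with the median, so the handling of ties matters more here than it would for continuous data.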

4.2. Spatial Homogeneity

[32] The objective is to test homogeneity among temporally pooled samples on subregions of the same scale. Since such samples are spatially dependent, our approach is to test pairwise homogeneity among samples which are pairwise independent. The key idea in our approach is that, evidence of pairwise homogeneity between pairwise independent samples does imply homogeneity, although these samples need not be mutually independent. For example, some combination of three or more of these pairwise independent samples, may be dependent.

[33] We use the χ² test, Kendall's τ test, and Spearman's rank-correlation test for pairwise independence. Consequently, if independence between samples from a given pair of subregions of the same scale is not rejected, we proceed to test homogeneity. We use the χ², Kolmogorov-Smirnov, Wilcoxon-Mann-Whitney, and Kruskal-Wallis tests for pairwise homogeneity. Descriptions of all these nonparametric tests are given in standard statistical books [Bickel and Doksum, 1977; Lehmann, 1986; Rao, 1973; Rohatgi, 1976].

[34] We applied this procedure only to the 10 possible pairs among the 5 sampling designs ‘x-CR’. Table 2 tabulates P values obtained from the NW-CR and NE-CR pair, while P values from the other nine pairs are quite similar. P values from each of the 49 tests, in all 10 pairs of designs, are uniformly higher than the Bonferroni adjusted level of significance 0.05/10 = 0.005, rendering pairwise independence and homogeneity plausible at the 0.05 level of significance.

Table 2. P Values of Tests of (Pairwise) Independence and Homogeneity Between the Designs NW-CR and NE-CR for Wet and Dry Durations, at Each Spatial Scale

                 Independence Tests              Homogeneity Tests
Spatial Scale    χ²       Kendall  Spearman      χ²       Kolm-Sm  Wilcoxon  Kruskal

P Values of Pairwise Tests on Samples of Wet Duration
120 km           0.9056   0.9990   0.9811        0.0287   0.1152   0.1544    0.1438
60 km            0.1270   0.5511   0.4555        0.4647   0.8477   0.3731    0.3721
30 km            0.4173   0.1758   0.1650        0.4757   0.7391   0.6962    0.6956
16 km            0.1746   0.7169   0.7162        0.2516   0.5336   0.2216    0.2214
8 km             0.8042   0.4031   0.3609        0.6017   0.9948   0.6358    0.6354
4 km             0.5421   0.3429   0.3022        0.6771   0.9958   0.9207    0.9200
2 km             0.1803   0.0532   0.0327        0.2806   0.5427   0.5735    0.5727

P Values of Pairwise Tests on Samples of Dry Duration
120 km           0.8643   0.1447   0.1090        0.4155   0.8524   0.3771    0.3689
60 km            0.6076   0.6434   0.5888        0.9152   0.6742   0.4546    0.4538
30 km            0.1895   0.4149   0.3661        0.3214   0.8149   0.2928    0.2925
16 km            0.1372   0.4037   0.3798        0.9180   0.9991   0.9383    0.9378
8 km             0.0735   0.1576   0.1536        0.4113   0.6377   0.3780    0.3776
4 km             0.2370   0.2086   0.1827        0.7085   0.8135   0.5690    0.5682
2 km             0.5530   0.5692   0.5353        0.6452   0.6066   0.3367    0.3360

5. Scaling Analysis and Results

[35] In this section we investigate scaling relationships for tail quantiles and moments of wet and dry durations. Let $q_\lambda(p)$ be sample estimates of tail quantiles $Q_\lambda(p)$ at probability level p ≥ 0.8 and scale λ (Table 1). Linear regressions of $\ln\{q_\lambda(p)/q_1(p)\}$ on ln λ for wet, and on λ for dry, followed by regressions of $\ln\{q_\lambda(p)\}$ on ln(1 − p), suggest closed form expressions of $Q_\lambda(p)$ in terms of p and λ. These formulae enable us to infer hyperbolic behavior of tail probabilities, and subsequently to estimate bounds of orders above which moments do not exist. Sample moment estimates $m_\lambda(k)$ of finite moments $M_\lambda(k)$ are used for investigation of scaling of moments by regressions of $\ln\{m_\lambda(k)/m_1(k)\}$ on ln λ for wet, and on λ for dry durations. All linear regressions are performed under the statistical setting of ordinary least squares (OLS), assuming uncorrelated homoscedastic normal errors of mean zero. The hyperbolic behavior of tail probabilities of dry durations enables us to obtain a relationship between fractal dimensions of wet residence sets [Schmitt et al., 1998] at different scales.

[36] Regarding linear regressions, a more appropriate setting would be generalized weighted least squares (GWLS), accounting for possible correlations and heteroscedasticity of errors. This is an extremely difficult issue, remaining an open problem due to lack of information mentioned above. The only case known to us, where rigorous statistics have been obtained for the estimates of exponents in power law multiscaling relationships of moments, via GLS log linear regression, is that of discrete multiplicative cascades [Troutman and Vecchia, 1999]. A GLS setting was feasible in that case, because the spatial dependence due to nesting is completely specified by the generator of the cascade.

5.1. Tail Quantiles and Moments of Wet Durations

[37] Statistical information regarding linear regressions of $\ln\{q^{(w)}_\lambda(p)/q^{(w)}_1(p)\}$ on ln λ for p ≥ 0.8 is given in Table 3. Correlation coefficients are high, slopes are quite significant, and intercept terms are insignificant, as inferred from P values of corresponding t tests. A conclusion from this set of regressions is that wet tail quantiles fit a scaling formula

$$Q^{(w)}_\lambda(p) = \lambda^{\Theta(p)} \cdot Q^{(w)}_1(p), \qquad p \ge 0.8, \qquad (6)$$

hereafter referred to as power law multiscaling of tail quantiles.

Table 3. Summary Statistics From Linear Regression of $\ln\{q^{(w)}_\lambda(p)/q^{(w)}_1(p)\}$ Versus ln λ for Probability Levels p ≥ 0.8 (a)

Probability Level p   Correlation Coefficient   Regression Slope Θ(p)   Regression Intercept   Residual's Standard Error
0.8                   0.9212                    0.2625 (0.0032)         0.1749 (0.2096)        0.1781
0.825                 0.9496                    0.2839 (0.0010)         0.0143 (0.8947)        0.1507
0.85                  0.9551                    0.3220 (0.0008)         −0.0325 (0.7780)       0.1605
0.875                 0.9627                    0.3347 (0.0005)         −0.0568 (0.6052)       0.1512
0.9                   0.9685                    0.3145 (0.0003)         −0.0560 (0.5554)       0.1301
0.925                 0.9523                    0.3099 (0.0009)         −0.0718 (0.5386)       0.1597
0.95                  0.9520                    0.3323 (0.0009)         −0.0860 (0.4963)       0.1718
0.975                 0.9133                    0.2598 (0.0040)         −0.0923 (0.5001)       0.1863
0.985                 0.9261                    0.2211 (0.0027)         −0.0533 (0.6127)       0.1447
0.995                 0.9442                    0.2000 (0.0013)         0.1642 (0.0848)        0.1122

(a) P values from t tests of significance (difference from zero) for the slope and intercept of each regression are given in parentheses in Tables 3, 4, 5, 6, 7, and 8.
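As an illustration of how one row of Table 3 arises, the sketch below fits the OLS line of ln{q_λ(p)/q_1(p)} on ln λ at p = 0.9, using the wet duration quantiles as we read them from Table 1 (27, 23, 16, 11, 10, 9, 8, from 120 km down to 2 km). The helper function and variable names are illustrative, and only the point estimates are computed, not the t tests.

```python
import math

def ols(x, y):
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (slope, intercept, correlation coefficient).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar, sxy / math.sqrt(sxx * syy)

# Scale indices for 120, 60, 30, 16, 8, 4, 2 km and wet quantiles at p = 0.9.
lambdas = [1, 1 / 2, 1 / 4, 2 / 15, 1 / 15, 1 / 30, 1 / 60]
q_wet_090 = [27, 23, 16, 11, 10, 9, 8]

x = [math.log(lam) for lam in lambdas]
y = [math.log(q / q_wet_090[0]) for q in q_wet_090]
slope, intercept, r = ols(x, y)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
```

The fit lands close to the slope 0.3145, intercept −0.0560, and correlation 0.9685 listed for p = 0.9. Because the response is a ratio of quantiles, the result does not depend on whether durations are expressed in scans or in hours.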

[38] Statistical information given in Table 4 from regressions of $\ln q^{(w)}_\lambda(p)$ on ln(1 − p) for each λ shows correlation coefficients close to −1, with statistically significant slopes and intercepts, rendering the linear relationship

$$\ln Q^{(w)}_\lambda(p) = A(\lambda) \cdot \ln(1 - p) + \ln B(\lambda). \qquad (7)$$

The intercept $\ln B(\lambda) = \alpha \cdot \ln\lambda + \beta$ and the slope $A(\lambda) = \gamma \cdot \ln\lambda + \delta$, as functions of λ, and the exponent $\Theta(p) = \gamma \cdot \ln(1 - p) + \alpha$, as a function of p, satisfy both equations (6) and (7), whence by elementary algebraic manipulation we obtain

$$Q^{(w)}_\lambda(p) = e^{\alpha \cdot \ln\lambda + \beta} \cdot (1 - p)^{\gamma \cdot \ln\lambda + \delta}. \qquad (8)$$

Regression of intercepts from Table 4 against ln λ gives estimates of α = 0.3652 and β = 0.8746, with correlation coefficient 0.951. Similarly, regression of slopes from Table 4 against ln λ yields estimates of γ = 0.0285 and δ = −0.5006, with correlation coefficient 0.8884. Finally, as a check, regression of slopes from Table 3 against ln(1 − p) yields exactly the same estimates of α and γ, with correlation coefficient 0.7318. All the above estimates of parameters are statistically significant.
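The closed form for wet tail quantiles can be evaluated directly from the four estimates. The sketch below reads the formula as Q_λ(p) = exp(α·ln λ + β)·(1 − p)^(γ·ln λ + δ), which follows from the intercept and slope expressions given above, and checks numerically that the ratio Q_λ(p)/Q_1(p) equals λ raised to Θ(p) = γ·ln(1 − p) + α. Function names are illustrative; the units of the result are those of the quantiles used in the Table 4 regressions.

```python
import math

# Parameter estimates quoted in the text.
ALPHA, BETA = 0.3652, 0.8746
GAMMA, DELTA = 0.0285, -0.5006

def wet_tail_quantile(lam, p):
    """Fitted wet duration tail quantile at scale index lam and level p (p >= 0.8)."""
    log_b = ALPHA * math.log(lam) + BETA   # intercept ln B(lambda)
    a = GAMMA * math.log(lam) + DELTA      # slope A(lambda), negative here
    return math.exp(log_b) * (1.0 - p) ** a

def theta(p):
    """Scaling exponent of tail quantiles at probability level p."""
    return GAMMA * math.log(1.0 - p) + ALPHA

lam, p = 1 / 15, 0.95
ratio = wet_tail_quantile(lam, p) / wet_tail_quantile(1.0, p)
power_law = lam ** theta(p)
```

Both routes give the same number because λ^(γ·ln(1 − p)) and (1 − p)^(γ·ln λ) are the same quantity written two ways, which is the algebraic step linking the two regressions.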

Table 4. Summary Statistics From Linear Regression of $\ln q^{(w)}_\lambda(p)$ Versus ln(1 − p) at Each Scale λ

Spatial Scale λ   Correlation Coefficient   Regression Slope A(λ)   Regression Intercept ln B(λ)   Residual's Standard Error
1/60 (2 km)       −0.9782                   −0.5865 (9 × 10⁻⁷)      −0.4216 (0.1410)               0.1598
1/30 (4 km)       −0.9852                   −0.5990 (2 × 10⁻⁷)      −0.3389 (0.0169)               0.1336
1/15 (8 km)       −0.9894                   −0.6153 (5 × 10⁻⁸)      −0.3047 (0.0143)               0.1158
2/15 (16 km)      −0.9933                   −0.5708 (8 × 10⁻⁹)      −0.0781 (0.3084)               0.0851
1/4 (30 km)       −0.9858                   −0.5424 (1 × 10⁻⁷)      0.2865 (0.0210)                0.1185
1/2 (60 km)       −0.9764                   −0.5069 (1 × 10⁻⁶)      0.8456 (0.0001)                0.1439
1 (120 km)        −0.9028                   −0.4906 (3 × 10⁻⁴)      0.9122 (0.0069)                0.2999

[39] Equation (8) yields $P(W_\lambda > Q_\lambda^{(w)}(p)) = 1 - p = [e^{-(\alpha \ln\lambda + \beta)} \cdot Q_\lambda^{(w)}(p)]^{1/(\gamma \ln\lambda + \delta)}$, as p ↗ 1, whence by setting $u = Q_\lambda^{(w)}(p)$ we obtain

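$$P(W_\lambda > u) = \left[e^{-(\alpha \ln\lambda + \beta)}\,u\right]^{1/(\gamma \ln\lambda + \delta)}, \qquad u \nearrow \infty, \tag{9}$$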

showing that tail probabilities of wet duration are of hyperbolic type, since γ · ln λ + δ < 0 for the estimated values γ = 0.0285 and δ = −0.5006 at every probed scale λ ≤ 1.

[40] Furthermore, given $p_0$ close enough to 1 so that (8) holds for $p \in (p_0, 1)$, (1) implies $M_\lambda^{(w)}(n) > \int_{p_0}^{1} [Q_\lambda^{(w)}(p)]^n\,dp = e^{n(\alpha \ln\lambda + \beta)} \cdot \int_{p_0}^{1} (1-p)^{n(\gamma \ln\lambda + \delta)}\,dp$, and the integral is finite if and only if $n < -(\gamma \ln\lambda + \delta)^{-1}$. Therefore, wet duration $W_\lambda$ does not possess finite moments of order $-(\gamma \ln\lambda + \delta)^{-1}$ or higher. However, we note that $-(\gamma \ln\lambda + \delta)^{-1}$ need not be a tight upper bound, because moments of some order smaller than this bound may also fail to exist, owing to possible nonintegrability of the lower tail of the distribution near zero.
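The bound on the order of finite moments can be evaluated directly at each probed scale. The sketch below does this in Python with the estimates γ = 0.0285 and δ = −0.5006 quoted in the text; the function name `wet_moment_bound` and the conversion λ = (scale in km)/120 are illustrative assumptions made for this example.

```python
import math

GAMMA = 0.0285    # estimate of gamma quoted in the text
DELTA = -0.5006   # estimate of delta quoted in the text
REGION_KM = 120.0 # largest scale, corresponding to lambda = 1

def wet_moment_bound(lam):
    """Upper bound -1 / (gamma * ln(lambda) + delta) on the order of finite moments."""
    return -1.0 / (GAMMA * math.log(lam) + DELTA)

scales_km = [2, 4, 8, 16, 30, 60, 120]
bounds = {km: wet_moment_bound(km / REGION_KM) for km in scales_km}

for km in scales_km:
    print(f"{km:>4} km  bound = {bounds[km]:.3f}")

# A bound below 2 at every scale means the second moment is not finite anywhere
# in the probed range, under the fitted tail model.
all_below_two = all(b < 2.0 for b in bounds.values())
```

Because the exponent γ · ln λ + δ is negative for every λ ≤ 1, the bound is positive and grows slowly with scale.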

[41] Using the obtained estimates of γ and δ, we calculate estimates 1.62, 1.67, 1.73, 1.79, 1.85, 1.92, 1.99 of this bound, from smallest to largest scale λ, respectively. These bounds imply that, in the range of probed scales, second order moments do not exist, and therefore the duration of wet epochs has infinite variance. Guided by these bounds, we investigate scaling of wet duration moments of order k ∈ {0.25, 0.5, 0.75, 1, 1.25, 1.50}, which exist at all scales. Statistics from linear regressions of $\ln\{m_\lambda^{(w)}(k)/m_1^{(w)}(k)\}$ on ln λ are given in Table 5. Correlation coefficients are high, slopes S(k) are significant, and intercepts are insignificant, pointing to a scaling formula

$$\frac{M_\lambda^{(w)}(k)}{M_1^{(w)}(k)} = \lambda^{S(k)}, \tag{10}$$

referred to as power law multiscaling of moments, provided that the exponent function S(k) is nonlinear in k. As shown in Figure 6, S(k) may be linear in k, a possibility that would point to simple scaling of moments according to (3) if S(k) had no intercept term. However, linear regression of S(k) from Table 5 against k yields a statistically significant intercept (t test P value 0.0045). Thus, scaling of wet duration moments is significantly different from simple scaling in the range of probed scales (120–2 km). In any case, as seen in Figure 6, S(k) is a convex function (either linear with intercept, or nonlinear).
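The intercept test mentioned above can be sketched from the Table 5 slopes alone. The Python snippet below fits S(k) against k by least squares and computes the t statistic of the intercept; variable names are illustrative, and the P value itself is not recomputed because the standard library has no Student t distribution.

```python
import math

# Moment orders and regression slopes S(k) from Table 5.
k = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
S = [0.0314, 0.0839, 0.1554, 0.2374, 0.3201, 0.3965]

n = len(k)
mean_k = sum(k) / n
mean_S = sum(S) / n
skk = sum((ki - mean_k) ** 2 for ki in k)

# Least squares line S(k) = slope * k + intercept.
slope = sum((ki - mean_k) * (si - mean_S) for ki, si in zip(k, S)) / skk
intercept = mean_S - slope * mean_k

# Residual variance with n - 2 degrees of freedom.
residuals = [si - (intercept + slope * ki) for ki, si in zip(k, S)]
s2 = sum(r * r for r in residuals) / (n - 2)

# Standard error and t statistic of the intercept.
se_intercept = math.sqrt(s2 * (1.0 / n + mean_k ** 2 / skk))
t_intercept = intercept / se_intercept

print(f"slope={slope:.4f} intercept={intercept:.4f} t={t_intercept:.2f}")
```

With 4 degrees of freedom the two-sided 5% critical value of Student's t is about 2.776, so a t statistic well beyond that magnitude is in line with the small P value reported in the text.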

Figure 6. Slopes S(k) from Table 5 plotted and linearly regressed versus moment order k.

Table 5. Summary Statistics From Linear Regression of $\ln\{m_\lambda^{(w)}(k)/m_1^{(w)}(k)\}$ Versus ln λ for Each Moment Order k
Moment Order k | Correlation Coefficient | Regression Slope S(k) | Regression Intercept | Residual's Standard Error
0.25 | 0.9027 | 0.0314 (0.0053) | 0.0328 (0.1025) | 0.0241
0.5 | 0.9513 | 0.0839 (0.0009) | 0.0414 (0.2233) | 0.0437
0.75 | 0.9660 | 0.1554 (0.0004) | 0.0224 (0.6435) | 0.0668
1 | 0.9676 | 0.2374 (0.0003) | −0.0147 (0.8371) | 0.0996
1.25 | 0.9661 | 0.3201 (0.0003) | −0.0568 (0.5710) | 0.1375
1.5 | 0.9649 | 0.3965 (0.0004) | −0.0947 (0.4601) | 0.1736

[42] We compared predictions under simple scaling and power law multiscaling of tail quantiles at all scales, based on the above analyses. The results show slight preference for multiscaling versus simple scaling. The details of this comparison are given by Pavlopoulos and Gupta [2001], supporting the finding of Pavlopoulos and Gritsis [1999] that simple scaling of wet durations appears to hold over the range of smaller scales from 10 km to 2 km.

5.2. Tail Quantiles and Moments of Dry Durations

[43] Sample tail quantiles of dry duration data do not support power law multiscaling described by (6). Instead, regressions of $\ln\{q_\lambda^{(d)}(p)/q_1^{(d)}(p)\}$ on λ, for p ≥ 0.8, strongly support linearity, with correlation coefficients below −0.97 and significant intercept terms algebraically opposite to the values of the corresponding slopes [Pavlopoulos and Gupta, 2001]. This structure is confirmed here by regressions of $\ln\{q_\lambda^{(d)}(p)/q_1^{(d)}(p)\}$ on λ − 1, summarized in Table 6, which render intercepts insignificant since the significant part is accounted for by the new regressor λ − 1. Thus, we are led to a new scaling formula

$$\frac{Q_\lambda^{(d)}(p)}{Q_1^{(d)}(p)} = e^{\Psi(p)\,(\lambda - 1)}, \tag{11}$$

hereafter referred to as exponential multiscaling of tail quantiles, different from both simple scaling and power law multiscaling of quantiles.

Table 6. Summary Statistics From Linear Regression of $\ln\{q_\lambda^{(d)}(p)/q_1^{(d)}(p)\}$ Versus λ − 1 for Probability Levels p ≥ 0.8
Probability Level p | Correlation Coefficient | Regression Slope Ψ(p) | Regression Intercept | Residual's Standard Error
0.8 | −0.9780 | −1.0703 (1 × 10^−4) | −0.0830 (0.3487) | 0.0892
0.825 | −0.9815 | −1.2203 (8 × 10^−5) | −0.0586 (0.5150) | 0.0930
0.85 | −0.9884 | −1.1781 (2 × 10^−5) | −0.0286 (0.6716) | 0.0706
0.875 | −0.9794 | −1.2123 (1 × 10^−4) | −0.0822 (0.3931) | 0.0977
0.9 | −0.9964 | −1.3343 (1 × 10^−6) | −0.0345 (0.4287) | 0.0446
0.925 | −0.9974 | −1.5215 (6 × 10^−7) | 0.0122 (0.7630) | 0.0428
0.95 | −0.9969 | −1.7490 (9 × 10^−7) | 0.0420 (0.4230) | 0.0536
0.975 | −0.9958 | −2.0848 (2 × 10^−6) | 0.0712 (0.3395) | 0.0749
0.985 | −0.9938 | −2.1344 (5 × 10^−6) | 0.0859 (0.3532) | 0.0933
0.995 | −0.9927 | −2.3761 (8 × 10^−6) | 0.0876 (0.4284) | 0.1130

[44] Regressions of $\ln q_\lambda^{(d)}(p)$ on ln (1 − p) at each λ give the information in Table 7, with negative correlations close to −1, and significant slopes and intercepts. This information leads to the equation

$$\ln Q_\lambda^{(d)}(p) = A^*(\lambda)\,\ln(1-p) + \ln B^*(\lambda), \qquad p \nearrow 1, \tag{12}$$

quite similar to (7) for wet tail quantiles. The functions ln B*(λ) = α* · λ + β*, A*(λ) = γ* · λ + δ*, Ψ (p) = γ* · ln(1 − p) + α* satisfy both (11) and (12), whence we obtain

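$$Q_\lambda^{(d)}(p) = e^{\alpha^* \lambda + \beta^*}\,(1-p)^{\gamma^* \lambda + \delta^*}, \qquad p \nearrow 1. \tag{13}$$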

Regression of intercepts from Table 7 against λ yields α* = −0.5117 and β* = 0.2327, with correlation coefficient −0.7958, while regression of slopes from Table 7 against λ yields γ* = 0.379 and δ* = −0.57, with correlation coefficient 0.9415. As a check, regression of slopes from Table 6 against ln (1 − p) yields exactly the same estimates of α* and γ*, both significant again, with correlation coefficient 0.9792.
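The cross-check described in the last sentence can be sketched numerically: regressing the Table 6 slopes Ψ(p) on ln(1 − p) should return γ* as the slope and α* as the intercept, since Ψ(p) = γ* · ln(1 − p) + α*. The helper `ols` and the variable names below are illustrative, and significance tests are not reproduced.

```python
import math

def ols(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Probability levels and regression slopes Psi(p) from Table 6.
p_levels = [0.8, 0.825, 0.85, 0.875, 0.9, 0.925, 0.95, 0.975, 0.985, 0.995]
psi = [-1.0703, -1.2203, -1.1781, -1.2123, -1.3343,
       -1.5215, -1.7490, -2.0848, -2.1344, -2.3761]

log_tail = [math.log(1.0 - p) for p in p_levels]

# Psi(p) = gamma_star * ln(1 - p) + alpha_star
gamma_star, alpha_star = ols(log_tail, psi)

print(f"gamma*={gamma_star:.4f} alpha*={alpha_star:.4f}")
```

The fitted pair should land close to the quoted γ* = 0.379 and α* = −0.5117, up to rounding of the tabulated slopes.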

Table 7. Summary Statistics From Linear Regression of $\ln q_\lambda^{(d)}(p)$ Versus ln(1 − p) at Each Scale λ
Spatial Scale λ | Correlation Coefficient | Regression Slope A*(λ) | Regression Intercept ln B*(λ) | Residual's Standard Error
1/60 (2 km) | −0.9882 | −0.5342 (8 × 10^−8) | 0.3149 (0.0078) | 0.1060
1/30 (4 km) | −0.9908 | −0.5376 (3 × 10^−8) | 0.2785 (0.0081) | 0.0943
1/15 (8 km) | −0.9865 | −0.5125 (1 × 10^−7) | 0.2938 (0.0127) | 0.1090
2/15 (16 km) | −0.9851 | −0.5309 (2 × 10^−7) | 0.1472 (0.1807) | 0.1189
1/4 (30 km) | −0.9784 | −0.5062 (9 × 10^−7) | −0.0406 (0.7344) | 0.1371
1/2 (60 km) | −0.9807 | −0.4704 (5 × 10^−7) | −0.2524 (0.0377) | 0.1203
1 (120 km) | −0.9009 | −0.1401 (3 × 10^−4) | −0.1355 (0.1009) | 0.0866

[45] Following the same line of thinking presented previously for tail probabilities and moments of wet durations, we again obtain hyperbolic tail probabilities

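$$P(D_\lambda > u) = \left[e^{-(\alpha^* \lambda + \beta^*)}\,u\right]^{1/(\gamma^* \lambda + \delta^*)}, \qquad u \nearrow \infty, \tag{14}$$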

since γ* · λ + δ* < 0 for the estimated values γ* = 0.379 and δ* = −0.57 at every probed scale λ ≤ 1. Therefore, dry epoch duration $D_\lambda$ does not possess finite moments of order $-(\gamma^* \lambda + \delta^*)^{-1}$ or higher. Values of this upper bound equal 1.77, 1.79, 1.83, 1.92, 2.10, 2.62, 5.23, from smallest to largest scale λ, respectively, implying that dry durations have infinite variance for scales of 16 km or less.
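Under the fitted linear form of the exponent, one can also ask at which scale the bound crosses 2, that is, where the variance stops being finite. The sketch below solves −1/(γ* · λ + δ*) = 2 for λ; the names and the conversion of λ to kilometres (λ times 120 km) are illustrative assumptions, and the crossing scale is only as reliable as the fitted line between the probed scales.

```python
GAMMA_STAR = 0.379   # estimate of gamma* quoted in the text
DELTA_STAR = -0.57   # estimate of delta* quoted in the text
REGION_KM = 120.0    # largest scale, corresponding to lambda = 1

def dry_moment_bound(lam):
    """Upper bound -1 / (gamma* * lambda + delta*) on the order of finite moments."""
    return -1.0 / (GAMMA_STAR * lam + DELTA_STAR)

# Bound equals 2 when gamma* * lambda + delta* = -1/2.
lam_crit = (-0.5 - DELTA_STAR) / GAMMA_STAR
crit_km = lam_crit * REGION_KM

for km in [2, 4, 8, 16, 30, 60, 120]:
    bound = dry_moment_bound(km / REGION_KM)
    status = "infinite variance" if bound < 2.0 else "bound does not exclude finite variance"
    print(f"{km:>4} km  bound = {bound:.2f}  {status}")

print(f"bound crosses 2 near {crit_km:.1f} km")
```

The crossing falls between the probed scales of 16 km and 30 km, which is consistent with the statement that infinite variance is implied for scales of 16 km or less.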

[46] Guided by these bounds, we investigate scaling of dry duration moments of order k ∈ {0.25, 0.5, 0.75, 1, 1.25, 1.50}, which exist at all scales. Regressions of $\ln\{m_\lambda^{(d)}(k)/m_1^{(d)}(k)\}$ on ln λ showed that power law scaling scenarios do not hold [Pavlopoulos and Gupta, 2001].

[47] Thus, guided by the results on tail quantiles, we regressed $\ln\{m_\lambda^{(d)}(k)/m_1^{(d)}(k)\}$ on λ − 1 for each k ∈ {0.25, 0.5, 0.75, 1, 1.25, 1.50}. The statistical information from these regressions is summarized in Table 8, supporting strongly the exponential multiscaling relationship of moments

$$\frac{M_\lambda^{(d)}(k)}{M_1^{(d)}(k)} = e^{H(k)\,(\lambda - 1)}. \tag{15}$$

Figure 7 depicts variation of H(k) versus moment order k, indicating that H(k) may be linear in k. Indeed, linear regression of H(k) against k from Table 8 yields strong evidence of linearity, but with a significant intercept term (t test P value 0.0273). Nevertheless, as seen in Figure 7, H(k) is a concave function (either linear with intercept, or nonlinear).
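Since the moment orders in Table 8 are equally spaced, concavity of H(k) on this grid can be checked through the sign of its second differences. The following Python sketch does only that; it is a discrete check on six tabulated points with illustrative variable names, not a test of concavity of the underlying function.

```python
# Moment orders (equally spaced) and regression slopes H(k) from Table 8.
k = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
H = [-0.1907, -0.4425, -0.7593, -1.1398, -1.5777, -2.0642]

# First differences H(k_{i+1}) - H(k_i).
first_diff = [b - a for a, b in zip(H, H[1:])]
# Second differences; all negative means the points lie on a concave curve.
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]

is_concave_on_grid = all(d < 0 for d in second_diff)

print("first differences: ", [round(d, 4) for d in first_diff])
print("second differences:", [round(d, 4) for d in second_diff])
print("concave on grid:", is_concave_on_grid)
```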

Figure 7. Slopes H(k) from Table 8 plotted and linearly regressed versus moment order k.

Table 8. Summary Statistics From Linear Regression of $\ln\{m_\lambda^{(d)}(k)/m_1^{(d)}(k)\}$ Versus λ − 1 for Each Moment Order k
Moment Order k | Correlation Coefficient | Regression Slope H(k) | Regression Intercept | Residual's Standard Error
0.25 | −0.9852 | −0.1907 (5 × 10^−5) | −0.0112 (0.3810) | 0.0129
0.5 | −0.9884 | −0.4425 (2 × 10^−5) | −0.0225 (0.3906) | 0.0266
0.75 | −0.9913 | −0.7593 (1 × 10^−5) | −0.0325 (0.4013) | 0.0394
1 | −0.9938 | −1.1398 (5 × 10^−6) | −0.0397 (0.4154) | 0.0497
1.25 | −0.9959 | −1.5777 (2 × 10^−6) | −0.0425 (0.4374) | 0.0560
1.5 | −0.9974 | −2.0642 (6 × 10^−7) | −0.0401 (0.4768) | 0.0579

[48] We compared predictions under simple scaling and exponential multiscaling of tail quantiles at all scales, based on the above analyses. The results show strong preference for multiscaling versus simple scaling [Pavlopoulos and Gupta, 2001]. These comparisons do not support the finding of Pavlopoulos and Gritsis [1999] that simple scaling of dry durations appears to hold over the range of smaller scales from 10 km to 2 km.

[49] We conclude this section with a remark pertaining to the asymmetry between wet scaling (power law type) and dry scaling (exponential type). Based on the empirical evidence acquired through this study, and some intuitive thinking, we observe that wetness is the dominant state of regional rainfall in large regions, while in small regions dryness is substantially more persistent and thus dominant over wetness (see Figure 4). That is, when wetness dominates in large regions, dry epochs are very short. At the other end, when dryness dominates in small regions, wet durations can still be much longer than the durations of dry epochs in large regions. Thus, the scaling behavior of wet and dry durations across the range of intermediate scales (between large and small regions) ought to be dictated by different schemes, at least in principle. This is manifested quantitatively by power law versus exponential scaling, as shown in this study regarding tail quantiles and moments. This intuitively anticipated asymmetry can be seen rather clearly in the values of wet and dry duration tail quantiles given in Table 1 (especially in the last three rows). For example, for p = 0.995, wet quantiles scale (increasingly with respect to spatial scale) between 34/3 hours (at 2 km) and 68.24/3 hours (at 120 km), while dry quantiles scale (decreasingly with respect to spatial scale) between 62.06/3 hours (at 2 km) and 5.56/3 hours (at 120 km). However, the key to the asymmetry in the type of scaling is that the rate of scaling of dry quantiles must be much steeper (i.e., exponential) than the apparently more moderate rate of scaling of wet quantiles (i.e., power law), in order to cover a wider span of quantile values over the same range of scales at the same probability level. A similar explanation pertains to the asymmetry between types of scaling of moments of wet and dry durations.
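The p = 0.995 example can be made concrete by comparing the observed quantile ratios between 2 km and 120 km with the ratios implied by the fitted regression lines. The sketch below assumes that the p = 0.995 row of Table 3 reports a regression of ln{q_λ^(w)(p)/q_1^(w)(p)} on ln λ (by analogy with Table 6, which uses the regressor λ − 1 for dry durations); the variable names are illustrative.

```python
import math

LAM_SMALL = 1.0 / 60.0  # 2 km expressed as a fraction of 120 km

# Quantiles at p = 0.995 quoted in the text (hours), at 2 km and 120 km.
wet_2km, wet_120km = 34.0 / 3.0, 68.24 / 3.0
dry_2km, dry_120km = 62.06 / 3.0, 5.56 / 3.0

observed_wet_ratio = wet_2km / wet_120km
observed_dry_ratio = dry_2km / dry_120km

# Fitted lines at p = 0.995: slope and intercept from Table 3 (wet) and Table 6 (dry).
theta, wet_intercept = 0.2000, 0.1642
psi, dry_intercept = -2.3761, 0.0876

# Wet: ln(ratio) = theta * ln(lambda) + intercept (power law in lambda).
fitted_wet_ratio = math.exp(theta * math.log(LAM_SMALL) + wet_intercept)
# Dry: ln(ratio) = psi * (lambda - 1) + intercept (exponential in lambda).
fitted_dry_ratio = math.exp(psi * (LAM_SMALL - 1.0) + dry_intercept)

print(f"wet: observed {observed_wet_ratio:.3f}, fitted {fitted_wet_ratio:.3f}")
print(f"dry: observed {observed_dry_ratio:.2f}, fitted {fitted_dry_ratio:.2f}")
```

The wet ratio changes by roughly a factor of two across the range of scales, whereas the dry ratio changes by roughly an order of magnitude, which is the wider span referred to above.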

5.3. Fractal Dimensions of Sets of Wet Residence

[50] Hyperbolic tail probabilities of wet and dry durations, given by (9) and (14) respectively, point to the multifractal nature of the set consisting of the wet epochs in time series of regional rainfall. A general account on the connection between hyperbolic tails and fractals is given by Mandelbrot [1983]. Given a timescale τ for rainfall time series (τ = 20 min for our data), the set of all wet epochs in the series is referred to as the set of wet residence.

[51] Schmitt et al. [1998] showed that the pdf of dry duration is proportional to $\tau^{-(\Delta+1)}$, where Δ is the capacity fractal dimension [Cutler, 1993, sections 2.2, 5.4, 6.1] of the set of wet residence for a given temporal scale τ at a fixed spatial location. The same result has also been obtained by Lowen and Teich [1993], using a different approach. The argument given by Schmitt et al. [1998] extends to temporal records of regional rainfall. Here we explore the dependence of capacity fractal dimension $\Delta_\lambda$ of wet residence set on the spatial scale λ.
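For readers unfamiliar with capacity dimension, a generic box counting estimate for a binary wet/dry time series can be sketched as follows. This is a textbook-style illustration under stated assumptions, not the specific procedure of Pavlopoulos and Gupta [2001]: the function names, the choice of box sizes, and the synthetic Cantor-like test series are all hypothetical.

```python
import math

def box_counting_dimension(wet, box_sizes):
    """Slope of ln N(s) versus ln(1/s), where N(s) counts boxes of s
    consecutive time steps that contain at least one wet step."""
    n = len(wet)
    xs, ys = [], []
    for s in box_sizes:
        count = sum(1 for start in range(0, n, s) if any(wet[start:start + s]))
        if count > 0:
            xs.append(math.log(1.0 / s))
            ys.append(math.log(count))
    if len(xs) < 2:
        raise ValueError("need at least two box sizes with a nonempty count")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def cantor_series(level):
    """Synthetic 0/1 series built like a middle-third Cantor set (length 3**level)."""
    series = [1]
    for _ in range(level):
        series = series + [0] * len(series) + series
    return series

# Synthetic checks: a Cantor-like series and an always-wet series.
dim_cantor = box_counting_dimension(cantor_series(4), [1, 3, 9, 27])
dim_full = box_counting_dimension([1] * 64, [1, 2, 4, 8, 16])

print(f"Cantor-like series: {dim_cantor:.4f} (theory ln 2 / ln 3 = {math.log(2) / math.log(3):.4f})")
print(f"always-wet series:  {dim_full:.4f}")
```

For real rainfall records the estimate depends on the range of box sizes used in the fit, which is one reason estimates from many subregions can be skewed.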

[52] In this more general setting, the pdf of dry duration ought to be proportional to $\tau^{-(\Delta_\lambda+1)}$ at spatial scale λ. Consequently, integration with respect to τ yields tail probabilities of dry duration proportional to $\tau^{-\Delta_\lambda}$, that is, $P(D_\lambda > \tau) \propto \tau^{-\Delta_\lambda}$. Substituting $\tau = Q_\lambda^{(d)}(p)$, we obtain $1 - p \propto [Q_\lambda^{(d)}(p)]^{-\Delta_\lambda}$, whence $Q_\lambda^{(d)}(p) = \sigma_\lambda \cdot (1-p)^{-1/\Delta_\lambda}$, where $\sigma_\lambda > 0$ denotes the constant of proportionality. For p ↗ 1, the last equation combined with (13) yields $\sigma_\lambda \cdot (1-p)^{-1/\Delta_\lambda} = e^{\alpha^* \lambda + \beta^*} \cdot (1-p)^{\gamma^* \lambda + \delta^*}$. Solving for the exponent of 1 − p gives

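$$\frac{1}{\Delta_\lambda} - \frac{1}{\Delta_1} = \gamma^*\,(1-\lambda) + \frac{\ln(\sigma_\lambda/\sigma_1) + \alpha^*\,(1-\lambda)}{\ln(1-p)}. \tag{16}$$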

The second term on the RHS of (16) is difficult to resolve due to the unknown ratio $\sigma_\lambda/\sigma_1$. However, if $\sigma_\lambda$ is continuous in λ, then that very term vanishes as λ ↗ 1. This allows an approximation of $1/\Delta_\lambda - 1/\Delta_1$ by γ* · (1 − λ), for large enough scales (λ ↗ 1).

[53] To test this approximation, we obtained estimates of fractal dimensions. Pavlopoulos and Gupta [2001] give details about those estimates, obtained using a box counting procedure. Due to skewness of the estimates, their sample medians (see Table 9) are used as working estimates of $\Delta_\lambda$ at each scale λ. Overall, $\Delta_\lambda$ increases with λ (apart from a slight dip in the median between 4 km and 8 km), an intuitively anticipated behavior. Using the earlier obtained estimate γ* = 0.379, we predict differences $1/\Delta_\lambda - 1/\Delta_1$ by the predictor 0.379 · (1 − λ), according to the proposed approximation. Sample median estimates of $1/\Delta_\lambda - 1/\Delta_1$ are plotted against the predictions 0.379 · (1 − λ) in Figure 8. It is seen that in the range of the larger scales, from 120 km to 16 km, estimates and predictions are in good agreement, depicted by the four points closest to the diagonal line in Figure 8. However, in the range of the smaller scales, from 8 km to 2 km, the predictor 0.379 · (1 − λ) underestimates $1/\Delta_\lambda - 1/\Delta_1$. This effect may be attributed to the significance of the second term on the RHS of (16), as λ ↘ 0.
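The comparison between estimates and predictions can be tabulated from the sample medians in Table 9. The Python sketch below computes 1/Δλ − 1/Δ1 from the median dimension at each scale (as in the caption of Figure 8) and sets it against the predictor 0.379 · (1 − λ); the variable names and the conversion λ = (scale in km)/120 are illustrative.

```python
GAMMA_STAR = 0.379   # estimate of gamma* quoted in the text
REGION_KM = 120.0    # largest scale, corresponding to lambda = 1

# Scales and sample median capacity dimensions from Table 9.
scales_km = [2, 4, 8, 16, 30, 60, 120]
median_dim = [0.5320, 0.5965, 0.5900, 0.6520, 0.6850, 0.7320, 0.8250]

inv_dim_at_one = 1.0 / median_dim[-1]  # 1 / Delta_1 at the 120 km scale

rows = []
for km, dim in zip(scales_km, median_dim):
    lam = km / REGION_KM
    estimate = 1.0 / dim - inv_dim_at_one   # 1/Delta_lambda - 1/Delta_1
    predicted = GAMMA_STAR * (1.0 - lam)    # approximation for large scales
    rows.append((km, estimate, predicted))

for km, estimate, predicted in rows:
    print(f"{km:>4} km  estimate = {estimate:.3f}  predicted = {predicted:.3f}")
```

In this tabulation the two columns stay within a few hundredths of each other from 16 km upward, while the predictor falls clearly short of the estimates at 8 km and below.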

Figure 8. Estimates of the differences $1/\Delta_\lambda - 1/\Delta_1$, based on median fractal dimension at each scale, plotted versus predicted values 0.379 · (1 − λ). The drawn line is the diagonal through the origin.

Table 9. Summary Descriptive Statistics of Samples of Box Counting Estimates of Capacity Fractal Dimension of Wet Residence Sets at Each Scale, Obtained From Cruise 1 Time Series of Spatially Averaged Rain Rate Over the Subregions Sampled by the Designs Described in Section 3
Spatial Scale λ | Sample Median Capacity Dimension | Sample Mean Capacity Dimension | Sample Standard Deviation
1/60 (2 km) | 0.5320 | 0.5277 | 0.0857
1/30 (4 km) | 0.5965 | 0.5717 | 0.1321
1/15 (8 km) | 0.5900 | 0.5776 | 0.0711
2/15 (16 km) | 0.6520 | 0.6037 | 0.1460
1/4 (30 km) | 0.6850 | 0.6757 | 0.0591
1/2 (60 km) | 0.7320 | 0.7191 | 0.0575
1 (120 km) | 0.8250 | 0.8088 | 0.0712

6. Concluding Remarks

[54] This study presented the spatial scaling properties of wet and dry epoch durations based on time series records of regional rainfall. Several limitations were pointed out regarding the information available from TOGA-COARE data toward this goal. Some of these limitations can be partially alleviated by spatiotemporal pooling of relevant information via implementation of a nonrandom and symmetric design for sampling subregions within the overall region of study. This pooling strategy requires certain conditions of spatiotemporal homogeneity to hold, which are necessary for the scaling analysis to be meaningful. We illustrated how these conditions can be tested. This effort provided sufficient statistical evidence in support of the validity of spatial and temporal homogeneity. The main conclusions of this study can be summarized as follows.

[55] 1. Tail quantiles and moments of wet duration in regional rainfall records conform to power law multiscaling according to (6) and (10) respectively.

[56] 2. Tail quantiles and moments of dry duration in regional rainfall records conform to exponential multiscaling according to (11) and (15) respectively.

[57] 3. Wet and dry durations have tail probabilities of hyperbolic type according to (9) and (14) respectively. Empirical expressions for tail probabilities can be used to estimate upper bounds for the finiteness of moments.

[58] 4. The reciprocal of capacity dimension of the set of wet residence in time series of regional rainfall may be approximated by a linear function of the spatial scale.

[59] No assessment of accuracy of sample estimates of quantiles and moments was made in this study. Although they are theoretically unbiased estimates, and spatiotemporal pooling reduces their bias due to sampling errors, their sensitivity to extreme observations from highly skewed distributions with heavy tails (and infinite variance in some cases), and the unknown structure of spatial dependence, obscure a rigorous assessment of their variances (e.g., by central limit theorems) and of their consistency (e.g., by laws of large numbers). An alternative is to rely on resampling techniques and obtain bootstrap sample estimates of tail quantiles and moments, along with their bootstrap standard errors. Based on such information, construction of bootstrap confidence intervals is feasible [Efron and Tibshirani, 1993]. However, this task is beyond the scope of this study, and could be the subject of a separate study.
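A minimal sketch of the resampling idea is given below in Python. It is only illustrative: the durations are synthetic draws from a Pareto distribution with a hypothetical tail index of 1.8 (not TOGA-COARE data), the helper names are invented for this example, and the plain bootstrap used here treats observations as independent, so it ignores the spatial dependence that the paragraph identifies as a difficulty.

```python
import math
import random
import statistics

def sample_quantile(data, p):
    """Empirical p-quantile: smallest order statistic whose rank fraction is >= p."""
    ordered = sorted(data)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]

def bootstrap_standard_error(data, p, n_boot, rng):
    """Standard deviation of the p-quantile over n_boot resamples drawn with replacement."""
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))
        replicates.append(sample_quantile(resample, p))
    return statistics.stdev(replicates)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic heavy-tailed "durations" (hypothetical stand-in for real spell lengths).
durations = [rng.paretovariate(1.8) for _ in range(500)]

q95 = sample_quantile(durations, 0.95)
se95 = bootstrap_standard_error(durations, 0.95, n_boot=200, rng=rng)

print(f"0.95 quantile = {q95:.3f}, bootstrap standard error = {se95:.3f}")
```

For dependent or pooled spatial samples, a block or subregion-level resampling scheme would be a more defensible design than resampling individual observations.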

[60] The space-time approach proposed here has the potential to be generalized so as to establish links between duration and intensity of regional rainfall, and between duration and fractional wet areas defined with respect to different intensity thresholds. Such investigations are likely to lead to generalizations of the well-known “optimal threshold” linear relationship between regional rain rate and fractional wet area where rain rate exceeds a given threshold [Kedem and Pavlopoulos, 1991]. Development of the optimal threshold method was motivated by the problem of rainfall estimation from satellite remote sensing.

[61] Finally, an empirical understanding of space-time rainfall intermittence provides a framework to test newly developing statistical models [Over and Gupta, 1996; Marsan et al., 1996; Freidlin and Pavlopoulos, 1997], and newly developing dynamical models [Grabowski et al., 1996]. Diagnosing the same set of statistics using both dynamical as well as statistical models suggests a new approach to linking these two types of models (K. Nordstrom and V. K. Gupta, preprint, 2003). The problem of linking rainfall dynamics with its statistics is a grand challenge where much work remains to be done.

Acknowledgments

[62] This research was supported by a joint NSF/NASA grant, and by the European Union grant ERB-FMRX-CT96-0095. Sincere thanks are expressed to Dr. Brian Mapes for providing us with a section of the TOGA-COARE data set.
