Ubiquity of, and geostatistics for, nonstationary increment random fields

Authors

Daniel O'Malley

Department of Earth, Atmospheric and Planetary Sciences, Purdue University, West Lafayette, Indiana, USA

Corresponding author: D. O'Malley, Department of Earth, Atmospheric and Planetary Sciences, Purdue University, West Lafayette, IN 47907, USA. (omalled@gmail.com)

[1] Nonstationary random fields such as fractional Brownian motion and fractional Lévy motion have been studied extensively in the hydrology literature. On the other hand, random fields that have nonstationary increments have seen little study. A mathematical argument is presented that demonstrates processes with stationary increments are the exception and processes with nonstationary increments are far more abundant. The abundance of nonstationary increment processes has important implications, e.g., in kriging where a translation-invariant variogram implicitly assumes stationarity of the increments. An approach to kriging for processes with nonstationary increments is presented and accompanied by some numerical results.

[2] Let A(x) be a random field. When nonstationarity is mentioned in the context of random fields, it usually refers to nonstationarity of the process values, i.e., the distributions of A(x) and A(y) may differ. This could arise, for example, if A(x) is a log-fractional Brownian conductivity field [Mandelbrot and Van Ness, 1968]. Another type of nonstationarity that can arise is the nonstationarity of the increments, i.e., the distribution of A(x+y)−A(x) may depend on x. Such nonstationarity can arise, for example, if A(x) is a fractional Gaussian noise with nonlinear space [O'Malley et al., 2012].

[3] Stochastic processes with nonstationary field values have been studied extensively in hydrology using models such as fractional Brownian motion (fBm, which has a power-law variogram) [Molz et al., 1997] and fractional Lévy motion [Lu et al., 2003; Painter, 1996]. Stochastic processes with nonstationary increments have been scrutinized much less intensely. The primary purpose of this note is to demonstrate the ubiquity of processes with nonstationary increments via a mathematical argument showing that processes with stationary increments are the exception, rather than the rule. The implications of this are then considered in the context of kriging, and a method for kriging in the presence of nonstationary increments is presented along with some numerical results.

2. Nonstationary Increment Processes

[4] If a constant mean process has nonstationary increments, then the variance of the increments is generally given by a function of two variables,

g(x,y) = E[(A(x)−A(y))^{2}]    (1)

[5] When the increments are stationary, g(x,y) becomes a function of a single spatial variable, g(x,y)≡g(x−y). Just as there are many more functions of two variables than functions of one variable, there are many more processes with nonstationary increments than processes with stationary increments. This speaks to the sparsity of models with stationary increments relative to the abundance of models with nonstationary increments. The idea can be made precise by observing that the set of stochastic processes with stationary increments is nowhere dense, whereas the set of processes with nonstationary increments is dense everywhere. To clarify what this means, some definitions may be helpful. A limit point, x, of a set, C, is a point such that x = \lim_{n→∞} x_{n} where each x_{n}∈C. The closure of C (denoted \bar{C}) is the set containing C plus all the limit points of C. The interior of a set is the complement of the closure of the complement of the set. A set is called nowhere dense if its closure has empty interior. A set, C⊂X, is called dense everywhere if the closure of C is X.

[6] Let S denote the set of stochastic processes defined on the domain (0,1)^{d} that have suitably smooth probability density functions for every finite dimensional distribution. If A(x), A_{1}(x), A_{2}(x), … ∈ S, let it be said that A_{n}(x)→A(x) if

\lim_{n→∞} P[(A_{n}(x_{1}),…,A_{n}(x_{m}))∈C] = P[(A(x_{1}),…,A(x_{m}))∈C]

for any m, for all (x_{1},x_{2},…,x_{m})∈(0,1)^{dm} where d is the dimension of the x_{i}, and for any measurable set C⊂ℝ^{m}. In other words, A_{n}(x)→A(x) if the distribution functions for the finite dimensional distributions of A_{n}(x) converge pointwise to the distribution function for the same finite dimensional distribution of A(x).

[7] Let S_{s} denote the set of elements of S with stationary increments, and S_{ns} denote the set of elements of S with nonstationary increments. The objective is to show that S_{s} is a nowhere dense set in S using the definition of limit points given above. The first step in the proof is to show that the closure of S_{s} is S_{s}, i.e., \bar{S}_{s}=S_{s}. The second step is to show that for any A(x)∈\bar{S}_{s}=S_{s}, there is a sequence of elements A_{n}(x)→A(x) such that A_{n}(x)∈S_{ns} for n=1,2,…. This shows that A(x) is not in the interior of \bar{S}_{s}=S_{s}=S∖S_{ns}, where ∖ denotes the set difference operator. Since A(x) was an arbitrary element in \bar{S}_{s}, the interior of \bar{S}_{s} must be the empty set. Therefore, S_{s} is nowhere dense in S.

[8] Other mathematical definitions for convergence can be given, and other sets S can be chosen. However, the basic outline given above would provide a reasonable approach to proving that the set of stochastic processes with stationary increments is nowhere dense under a different set of assumptions. Depending on these assumptions, different choices of A_{n}(x) may become necessary. The one that will be employed here is to let A_{n}(x)=A(x)+ξ_{n}(x) where the ξ_{n}(x) have nonstationary increments and become smaller as n becomes larger. This amounts to adding a small perturbation to the original process.

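[9] To begin the proof, it must be shown that \bar{S}_{s}=S_{s}. Let A(x)∈\bar{S}_{s}. This implies that there is a sequence A_{n}(x)∈S_{s} such that A_{n}(x)→A(x), and

P[A(x+h)−A(x)∈C] = \lim_{n→∞} P[A_{n}(x+h)−A_{n}(x)∈C] = \lim_{n→∞} P[A_{n}(y+h)−A_{n}(y)∈C] = P[A(y+h)−A(y)∈C]

for any x, y, and h such that x, x+h, y, and y+h lie in (0,1)^{d} and for any measurable set C⊂ℝ, so that A(x) has stationary increments as well. Therefore, A(x)∈S_{s}, and \bar{S}_{s}=S_{s}.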

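[10] Let A_{n}(x)=A(x)+x_{1}^{2}/n, so that ξ_{n}(x)=x_{1}^{2}/n is a trivial stochastic process with no uncertainty, but with nonstationary increments where x=(x_{1},x_{2},…,x_{d}). From the fact that, for a lag h=(h_{1},h_{2},…,h_{d}),

E[A_{n}(x+h)−A_{n}(x)] = E[A(x+h)−A(x)] + (2x_{1}h_{1}+h_{1}^{2})/n

it can be seen that the increments of A_{n}(x) have a spatially dependent mean and are thus nonstationary. Hence, A_{n}(x)∈S∖\bar{S}_{s}. Let C be an open set in ℝ^{m}, and observe that

\lim_{n→∞} P[(A_{n}(x_{1}),…,A_{n}(x_{m}))∈C] = \lim_{n→∞} P[(A(x_{1}),…,A(x_{m}))∈C−\frac{1}{n}\sum_{i=1}^{m} x_{i,1}^{2} e_{i}] = \lim_{n→∞} \int_{C−\frac{1}{n}\sum_{i=1}^{m} x_{i,1}^{2} e_{i}} f_{A(x_{1}),…,A(x_{m})}(y) dy = \int_{C} f_{A(x_{1}),…,A(x_{m})}(y) dy

where e_{i} is the ith standard basis element in ℝ^{m}, x_{i,1} is the first coordinate of x_{i}, and the final integral equals P[(A(x_{1}),…,A(x_{m}))∈C]. The last equality follows from the assumption of smoothness on the joint probability density function for A(x_{1}),…,A(x_{m}), f_{A(x_{1}),…,A(x_{m})}(y). This string of equalities implies that A_{n}(x)→A(x). Since A_{n}(x)∉\bar{S}_{s}, A(x) is not in the interior of \bar{S}_{s}, and because A(x) was an arbitrary element of \bar{S}_{s}, the interior of \bar{S}_{s} must be empty. Hence, the interior of the closure of S_{s} is empty, so S_{s} is nowhere dense in S. This argument also demonstrates that the closure of S_{ns} is S, so S_{ns} is everywhere dense in S.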
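The role of the perturbation x_{1}^{2}/n in the proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names `xi` and `increment_shift` are hypothetical, and the lag and locations are arbitrary choices inside (0,1). It shows that the deterministic contribution to the mean increment depends on location for every n, while the perturbation itself shrinks like 1/n.

```python
def xi(x1, n):
    """Deterministic perturbation xi_n(x) = x_1^2 / n (only the first coordinate matters)."""
    return x1 ** 2 / n


def increment_shift(x1, h1, n):
    """Contribution of xi_n to the mean of the increment A_n(x + h) - A_n(x)."""
    return xi(x1 + h1, n) - xi(x1, n)


# Same lag, two different locations: the mean shift differs, so the
# increments of A_n are nonstationary, yet both shifts vanish as n grows.
lag = 0.1
for n in (1, 10, 100):
    print(n, increment_shift(0.1, lag, n), increment_shift(0.8, lag, n))
```

For any fixed n the two printed shifts differ, which is the spatially dependent mean used in the argument, and on the unit domain the perturbation is bounded by 1/n.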

[11] The proof presented here concerns an abstraction of spatial statistical processes, and it is natural to wonder whether this abstraction bears any resemblance to stochastic processes found in nature. Recent studies of diverse phenomena such as soil properties [Haskard and Lark, 2009], air pollution [Fuentes, 2002], and ocean temperatures [Higdon, 1998] indicate that nonstationarity does frequently occur in nature. The argument given here can be seen to provide some theoretical motivation for why this is the case.

2.1. Nonstationary Increments Via Nonlinear Spatial Transformations

[12] The mathematical simplicity of processes with stationary increments has made them a rich area of study that has produced models based on Brownian motion, fBm, Lévy motion, and fractional Lévy motion. Assuming zero mean, the variogram for a stationary increment model takes the form

γ(x−y) = \frac{1}{2} E[(A(x)−A(y))^{2}] = \frac{1}{2} E[(A(x−y)−A(0))^{2}].    (7)

[13] Many varieties of processes with nonstationary increments can be constructed, and the argument in the previous section demonstrates the need for more effort in this direction.

[14] Models with stationary increments can be used as building blocks for models with nonstationary increments. Suppose that A(x) is a random field with stationary increments (such as those just mentioned), and F(x) is a deterministic function that transforms the spatial coordinate, x. That is, F(x) is a function that maps ℝ^{d} into ℝ^{d}. These two components can be combined to define a new random field,

B(x) = A(F(x))    (8)

that has nonstationary increments. A more detailed discussion of these processes can be found in O'Malley et al. [2012]. This approach has been employed in the context of diffusion and dispersion to introduce nonstationarity [O'Malley and Cushman, 2010].

[15] If A(x) is normally distributed, B(x) will be normally distributed as well. If the process A(x) has mean zero and covariance function C_{A}(x,y), then B(x) will also have mean zero, but the covariance function is given by C_{B}(x,y)=C_{A}(F(x),F(y)). This follows from the fact that

C_{B}(x,y) = E[B(x)B(y)] = E[A(F(x))A(F(y))] = C_{A}(F(x),F(y)).

[16] It can be shown similarly that the variogram for B(x), γ_{B}(x,y), can be written in terms of the variogram for A(x),

γ_{B}(x,y) = γ_{A}(F(x),F(y))    (11)

[17] An important point to observe is that if F(x) is nonlinear then γ_{B}(x,y) cannot be expected to take the form in equation (7) even if (or, especially if) γ_{A}(x,y) does take that form.
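The loss of translation invariance under a nonlinear F can be seen in a small one-dimensional sketch. It assumes, for illustration only, an fBm base process with Var[A(x)−A(y)] = σ^{2}|x−y|^{2H} and the hypothetical map F(x)=x^{α}; the function names and parameter values are not taken from the original work.

```python
def fbm_variogram(x, y, hurst=0.4, sigma2=1.0):
    """Variogram of 1-D fBm, gamma_A(x, y) = (sigma2 / 2) * |x - y|^(2H) (assumed convention)."""
    return 0.5 * sigma2 * abs(x - y) ** (2.0 * hurst)


def power_law(x, alpha=1.5):
    """Hypothetical nonlinear coordinate map F(x) = x^alpha for x >= 0."""
    return x ** alpha


def transformed_variogram(x, y, hurst=0.4, sigma2=1.0, alpha=1.5):
    """gamma_B(x, y) = gamma_A(F(x), F(y)), following equation (11)."""
    return fbm_variogram(power_law(x, alpha), power_law(y, alpha), hurst, sigma2)


# Fixed lag, varying location: the fBm variogram stays constant,
# the transformed variogram does not.
lag = 1.0
for x in (1.0, 5.0, 10.0):
    print(x, fbm_variogram(x + lag, x), transformed_variogram(x + lag, x))
```

The first printed column of variogram values is constant in x, whereas the second grows with x, so no function of x−y alone can represent γ_{B} in this example.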

[18] Random fields such as B(x) with A(x) being a process with stationary increments and F(x) being nonlinear provide a powerfully descriptive approach to studying processes with nonstationary increments for two reasons. First, the choice of the base process, A(x), allows for many different types of statistics (e.g., heavy tailed or not and skewed or not). Second, the choice of the nonlinear function allows for many different types of nonstationarity (e.g., power laws, exponentials, and polar coordinates). For these reasons, these types of processes will be used as geostatistical models for the kriging process in the absence of stationary increments.

3. Kriging With Nonstationary Increments

[19] If nonstationarity arises in either the field values or the increments, the variogram in equation (7) does not provide sufficient information to enable proper kriging. In the case of nonstationary field values, information about the point variance is needed. In the case of nonstationary increments, it does not even make sense to write equation (7). Therefore, in the presence of either of these types of nonstationarity, something more informative than the variogram is needed.

[20] Guttorp and Sampson [1994] review a number of methods for estimating heterogeneous spatial covariance functions in the context of spatiotemporal stochastic processes. Their approach [Sampson and Guttorp, 1992] assumes temporal stationarity and relies on temporal averages. The approach described here is more appropriate when temporal variability is not present in the data. Stein [2005] provides a review of some analytical approaches to constructing covariance functions for processes with nonstationary variations that builds on the work of Pintore and Holmes [2004], Paciorek [2003], and Higdon [1998] and includes some practical advice for applying these covariance functions. Haskard and Lark [2009] build upon the work of Pintore and Holmes [2004] in dealing with nonstationary covariance by tempering an empirical spectrum. A review of numerous methods for estimating covariance functions for nonstationary spatial processes is provided by Sampson [2010].

[21] Suppose that there is a field, A(x), of random variables that has a constant, known mean and that the values of this field are known at locations x_{1},x_{2},…,x_{N}. The increments and the variation about the mean of the field are not assumed to be stationary. The goal is to determine an estimate, \hat{A}_{0}, for A(x_{0}) at some point x_{0}. Subtracting the mean from each value in the field results in a field with zero mean. By kriging on this modified field and adding the mean back afterward, it may be assumed without loss of generality that the mean is zero. A set of candidate covariance functions, C_{i}(x,y;p_{i}), is devised where p_{i} is a vector of length n_{i} containing the parameters for the ith covariance function and i=1,2,…,M. The choice of M and the covariance functions is specified at run time by the user.

[22] For each of the candidate models, a maximum likelihood set of parameters, p_{i}^{*}, is determined. In order to determine these parameters a likelihood function is needed, and for this purpose the field is assumed to obey a multivariate normal distribution with zero mean and covariance C_{i}(x,y;p_{i}). Although the assumption of normality is made here to compute the likelihood functions, the same procedure can be used for other distributions provided that the joint probability density function can be computed. A constrained optimization procedure is used to determine the maximum likelihood parameters. This procedure requires an initial set of parameters and a range of parameters to be specified at run time.
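The likelihood evaluation described above can be sketched with the standard library alone. The sketch rests on assumptions that are not stated in the text: the candidate covariance is taken to be that of an isotropic fBm field pinned to zero at the origin, C(p,q) = (σ^{2}/2)(|p|^{2H}+|q|^{2H}−|p−q|^{2H}), and a coarse grid search over bounded parameter ranges stands in for the constrained optimizer. All function names are hypothetical.

```python
import math


def fbm_cov(p, q, hurst, sigma2):
    """Assumed isotropic fBm covariance for 2-D points p and q."""
    e = 2.0 * hurst
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return 0.5 * sigma2 * (math.hypot(*p) ** e + math.hypot(*q) ** e - d ** e)


def cholesky(c):
    """Lower-triangular L with L L^T = c; raises ValueError if c is not positive definite."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def neg_log_likelihood(points, values, hurst, sigma2):
    """Negative log-likelihood of zero-mean multivariate normal data."""
    c = [[fbm_cov(p, q, hurst, sigma2) for q in points] for p in points]
    low = cholesky(c)
    # Forward substitution solves L z = values, so values^T C^{-1} values = z^T z.
    z = []
    for i, row in enumerate(low):
        z.append((values[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))
    quad = sum(v * v for v in z)
    return 0.5 * (quad + log_det + len(values) * math.log(2.0 * math.pi))


def grid_search(points, values, hursts, sigma2s):
    """Return (nll, H, sigma2) with the smallest negative log-likelihood on the grid."""
    return min((neg_log_likelihood(points, values, h, s), h, s)
               for h in hursts for s in sigma2s)
```

A call such as `grid_search(points, values, [0.1, 0.3, 0.5, 0.7, 0.9], [0.1, 1.0, 10.0])` respects the kind of parameter bounds mentioned in the text; a gradient-based bounded optimizer would normally replace the grid.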

[23] Upon determining the maximum likelihood parameters for each candidate covariance function, the Akaike information criterion (AIC) and the “corrected” AIC (AICc) [Burnham and Anderson, 2002] are used to determine weightings for models associated with each of the covariance functions,

w_{i} = \exp(\frac{A^{*}−A_{i}}{2})    (12)

where A_{i} is either the AIC or AICc for the ith model, and

A^{*} = \min(A_{1},A_{2},…,A_{M})    (13)
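Equations (12) and (13) translate directly into a few lines of code. The sketch below uses the standard definitions AIC = 2k − 2 ln L and AICc = AIC + 2k(k+1)/(n−k−1), where k is the number of parameters and n the number of data points. The function names are hypothetical, and normalizing the weights before averaging is an assumption, since the text gives only the unnormalized form.

```python
import math


def aic(neg_log_lik, k):
    """Akaike information criterion for k parameters, given -ln(L) at the optimum."""
    return 2.0 * k + 2.0 * neg_log_lik


def aicc(neg_log_lik, k, n):
    """Small-sample corrected AIC for n data points (requires n > k + 1)."""
    return aic(neg_log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def weights(criteria):
    """Unnormalized weights w_i = exp((A* - A_i) / 2), equations (12) and (13)."""
    best = min(criteria)
    return [math.exp((best - a) / 2.0) for a in criteria]


def model_average(estimates, ws):
    """Weighted average of per-model estimates (normalization is an assumption)."""
    total = sum(ws)
    return sum(w * e for w, e in zip(ws, estimates)) / total
```

With 25 data points, the AICc correction term is about 0.55 for a two-parameter model and about 6.6 for a seven-parameter model, so the correction penalizes the larger model considerably more than the AIC alone.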

[24] The AIC and AICc were chosen because they are simpler than some other information criteria such as the Bayesian and Kashyap information criteria, and they have proven to be effective for our purposes. These weightings are based on the probability that model i minimizes the information loss. See Ye et al. [2008] for a review of model selection criteria in a hydrological context. Two different approaches can be used to determine \hat{A}_{0} as well as metaparameters. One option is to krige using the covariance function C_{i}(x,y;p_{i}^{*}) that corresponds to the minimum A_{i}. The other is to use the weights to average the kriging estimates based on each of the covariance functions. Simple kriging is used because it employs the covariance function rather than the variogram [Kitanidis, 1997]. A presentation of kriging techniques can be found in Chilès and Delfiner [2012] or Kitanidis [1997].
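For a zero-mean field, simple kriging reduces to solving one linear system with the covariance matrix of the data. The following is a minimal sketch of that textbook form, not necessarily the implementation used for the results reported here. The names `solve` and `simple_krige` are hypothetical, and `cov` is any user-supplied covariance function of two locations.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def simple_krige(cov, points, values, target):
    """Return (estimate, kriging variance) at target for a zero-mean field."""
    c = [[cov(p, q) for q in points] for p in points]
    c0 = [cov(p, target) for p in points]
    lam = solve(c, c0)  # kriging weights
    estimate = sum(w * v for w, v in zip(lam, values))
    variance = cov(target, target) - sum(w * ci for w, ci in zip(lam, c0))
    return estimate, variance
```

As a check, with the Brownian motion covariance min(s,t) and data values 2 and 4 at locations 1 and 3, the estimate at location 2 is the midpoint value 3 with variance 0.5, matching the Brownian bridge.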

3.1. Results

[25] Two types of random fields were chosen to measure the effectiveness of the methodology. One has stationary increments, the other does not, and both are functions of two spatial variables. These choices were made to answer two basic questions: Can this approach effectively distinguish between models that are stationary and models that are nonstationary based on a relatively small sample from a single realization of the random field? Does weighting the kriging results from two models (one with stationary increments, the other with nonstationary increments) via the model selection criteria result in better estimates than using only one of the models? As will be seen, the answer to the first question is unequivocally in the affirmative. The answer to the second question is clearly in the affirmative for estimating the mean, but somewhat ambiguous for estimating the kriging error.

[26] The field with stationary increments is an fBm. As an example, the Hurst exponent, H, was chosen to be 0.4, and the scaling coefficient, σ^{2}, was chosen to be 1. When searching for the maximum likelihood set of parameters, the initial point was chosen so that the Hurst exponent was 0.5 and the scaling coefficient was 1. The lower and upper bounds for the Hurst exponent were chosen to be 0.1 and 0.9, respectively. The lower and upper bounds for the scaling coefficient were chosen to be 0.1 and 10, respectively.

[27] The field with nonstationary increments is a fractional Brownian field with a nonlinear spatial transformation that takes the form of equation (8) with A(x) being an fBm and

F(x,y) = R[(x_{0}+x)^{α_{1}}, (y_{0}+y)^{α_{2}}]^{T}    (14)

where R is a rotation matrix that rotates the plane through the angle θ. This makes the coordinate system obey power laws in two perpendicular directions that need not be aligned with the coordinate axes. For this reason, we refer to such a field as fBm with power-law space (fBm-pls). As an example, the parameters were chosen to be 0.6 for the Hurst exponent (H), 1 for the scaling coefficient (σ^{2}), 1.5 for α_{1}, 0.5 for α_{2}, 10 for x_{0}, 25 for y_{0}, and π/4 for θ. The parameters H, σ^{2}, α_{1}, α_{2}, x_{0}, y_{0}, and θ were assumed to be contained in the intervals [0.1, 0.9], [0.1, 10], [0.25, 2], [0.25, 2], [0, 100], [0, 100], and [0, 2π), respectively.
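The transformation in equation (14) can be written out as a short function. This sketch takes the equation literally, applying the power laws first and the rotation second, and assumes the counterclockwise convention for R; the default arguments are the example values quoted above, and the function name is hypothetical.

```python
import math


def power_law_space(x, y, alpha1=1.5, alpha2=0.5, x0=10.0, y0=25.0,
                    theta=math.pi / 4):
    """F(x, y) = R [(x0 + x)^alpha1, (y0 + y)^alpha2]^T with R a rotation by theta."""
    u = (x0 + x) ** alpha1
    v = (y0 + y) ** alpha2
    c, s = math.cos(theta), math.sin(theta)
    # Counterclockwise rotation (an assumed convention).
    return (c * u - s * v, s * u + c * v)


def transformed_distance(p, q):
    """Distance between F(p) and F(q), which drives the fBm-pls increment variance."""
    fp, fq = power_law_space(*p), power_law_space(*q)
    return math.hypot(fp[0] - fq[0], fp[1] - fq[1])


# A unit step in x covers more transformed distance as x grows (alpha1 > 1),
# while a unit step in y covers less as y grows (alpha2 < 1).
print(transformed_distance((1.0, 1.0), (2.0, 1.0)),
      transformed_distance((4.0, 1.0), (5.0, 1.0)))
print(transformed_distance((1.0, 1.0), (1.0, 2.0)),
      transformed_distance((1.0, 4.0), (1.0, 5.0)))
```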

[28] For both the fields with stationary and nonstationary increments, 10,000 realizations were generated using a combination of the Cholesky decomposition and random midpoint displacement methods [O'Malley et al., 2012]. In each realization, the field values were sampled at the 25 points given by (1,1),(1,2),…,(2,1),(2,2),…,(5,5). Kriging was used to estimate the mean and variance at the point (2.5,2.5). See Figure 1 for a pictorial representation. The model selection criteria were used to determine the most appropriate model based on each individual realization. The known field values were limited to 25 points for two reasons. One is that it is important that this approach is able to determine the correct model based only on a modest data set from a single field. The other is more practical. Computing the likelihood function requires O(N^{2}) operations where N is the number of points at which the field value is known. Many likelihood function evaluations are required to determine the maximum likelihood set of parameters, and therefore, this procedure can have significant computational requirements if the field values are known at many points. In practice, this is not a significant limitation, because the maximum likelihood set of parameters for a given field need only be determined once. However, for the present purposes, the maximum likelihood parameters must be determined for a large number of random fields in order to compute accurate statistics. For each random field, two covariance models were considered as candidate models: the covariance for fBm and the covariance for fBm-pls. In practice, it would be prudent to use a greater number of points to improve the accuracy of the parameter estimation and to deal with a greater number of covariance functions.
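A single realization at the 25 sample points can be drawn with a Cholesky factor of the covariance matrix. The sketch below covers only that Cholesky step (the random midpoint displacement component is not reproduced) and assumes the isotropic fBm covariance C(p,q) = (σ^{2}/2)(|p|^{2H}+|q|^{2H}−|p−q|^{2H}) with the example values H = 0.4 and σ^{2} = 1. The seed and all function names are arbitrary illustrative choices.

```python
import math
import random


def fbm_cov(p, q, hurst=0.4, sigma2=1.0):
    """Assumed isotropic fBm covariance for 2-D points p and q."""
    e = 2.0 * hurst
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return 0.5 * sigma2 * (math.hypot(*p) ** e + math.hypot(*q) ** e - d ** e)


def cholesky(c):
    """Lower-triangular L with L L^T = c; raises ValueError if c is not positive definite."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def sample_from_factor(low, rng):
    """Return L z with z standard normal, a zero-mean Gaussian vector with covariance L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


# The 25 sample locations (1,1), (1,2), ..., (5,5).
points = [(float(i), float(j)) for i in range(1, 6) for j in range(1, 6)]
cov_matrix = [[fbm_cov(p, q) for q in points] for p in points]
low = cholesky(cov_matrix)
values = sample_from_factor(low, random.Random(0))
```

Because the factorization depends only on the sample locations and covariance parameters, it can be computed once and reused for every realization.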

[29] When the field was an fBm, both model selection criteria performed well in selecting this as the model preferred over fBm-pls. The AIC selected fBm over fBm-pls 99.44% of the time, and the AICc selected fBm 100% of the time over 10,000 realizations. This reliability is very good considering that only 25 points were used in each case. In estimating the mean and variance via kriging, the fBm covariance outperformed the fBm-pls covariance model, and the weightings determined by both model selection criteria tracked the performance of the fBm model closely. Table 1 summarizes the results.

Table 1. The first two columns give the number of times out of 10,000 that each covariance model was chosen when the underlying field is a 2-D fBm based on AIC and AICc. The latter two columns give the mean-square error for the mean and absolute value of the estimated kriging error both relative to the kriging error based on the true covariance model.

| Covariance model | Times chosen (AIC) | Times chosen (AICc) | Mean-square error for the mean | Absolute estimated kriging error |
| --- | --- | --- | --- | --- |
| fBm | 9944 | 10,000 | 0.0227 | 0.3168 |
| fBm-pls | 56 | 0 | 0.0443 | 0.3359 |
| AIC-weighted | N/A | N/A | 0.0229 | 0.3170 |
| AICc-weighted | N/A | N/A | 0.0227 | 0.3166 |

[30] When the field was an fBm-pls, both model selection criteria performed well in selecting this as the model preferred over fBm. The AIC selected fBm-pls over fBm 100% of the time, and the AICc selected fBm-pls 100% of the time over 10,000 realizations. Once again, this reliability is very good considering that only 25 points were used in each case. In estimating the mean, the fBm-pls covariance outperformed the fBm covariance model but was outperformed by the fBm model in estimating the variance. The weightings determined by both model selection criteria tracked the performance of the fBm-pls model closely. Table 2 summarizes the results.

Table 2. The first two columns give the number of times out of 10,000 that each covariance model was chosen when the underlying field is a 2-D fBm with a power-law spatial transformation based on AIC and AICc. The latter two columns give the mean-square error for the mean and absolute value of the estimated kriging error both relative to the kriging error based on the true covariance model.

| Covariance model | Times chosen (AIC) | Times chosen (AICc) | Mean-square error for the mean | Absolute estimated kriging error |
| --- | --- | --- | --- | --- |
| fBm | 0 | 0 | 0.0436 | 0.5681 |
| fBm-pls | 10,000 | 10,000 | 0.0146 | 0.6249 |
| AIC-weighted | N/A | N/A | 0.0146 | 0.6249 |
| AICc-weighted | N/A | N/A | 0.0146 | 0.6249 |

4. Conclusion

[31] The fact that processes with stationary increments are nowhere dense in the set of stochastic processes provides a rigorous mathematical backing to the notion that the occurrence of such a process in nature is an exceptional event. On the other hand, the fact that processes with nonstationary increments are dense everywhere implies that these processes should be expected to occur with great frequency.

[32] There are innumerable ways to devise a process with nonstationary increments, but one particularly simple method is to apply a nonlinear spatial transformation to a process that does have stationary increments. This approach allows for a wide variety of field statistics and nonstationary behaviors.

[33] Kriging on processes with nonstationary increments requires a more careful approach than kriging on processes with stationary increments, because a translation-invariant variogram does not exist in the former case. The approach to kriging these processes presented here relies on maximum likelihood methods and model selection criteria to determine the covariance structure and simple kriging to estimate field values. This approach proved effective at differentiating between stationary and nonstationary increments in the fields tested and estimating the mean field values.

Acknowledgments

[34] The authors wish to thank the National Science Foundation for supporting this work under contracts CMG-0934806 and EAR-0838224.