Modeling multisite streamflow dependence with maximum entropy copula

Authors

Z. Hao,

Corresponding author

Department of Biological and Agricultural Engineering, Texas A&M University, College Station, Texas, USA

Corresponding author: Z. Hao, Department of Biological and Agricultural Engineering, Texas A&M University, 2117 TAMU, College Station, TX 77843-2117, USA. (hzc07@tamu.edu)

[1] Synthetic streamflows at different sites in a river basin are needed for planning, operation, and management of water resources projects. Modeling the temporal and spatial dependence structure of monthly streamflow at different sites is generally required. In this study, the maximum entropy copula method is proposed for multisite monthly streamflow simulation, in which the temporal and spatial dependence structure is imposed as constraints to derive the maximum entropy copula. The monthly streamflows at different sites are then generated by sampling from the conditional distribution. A case study for the generation of monthly streamflow at three sites in the Colorado River basin illustrates the application of the proposed method. Simulated streamflow from the maximum entropy copula is in satisfactory agreement with observed streamflow.

[2] For multisite streamflow simulation in a river basin, the statistical properties of streamflow at individual sites and the dependence structure among different sites should be preserved. The autoregressive moving average (ARMA) framework has been commonly used for multisite streamflow simulation [Salas and Delleur, 1980]. The parametric disaggregation method is another parametric approach to multisite streamflow simulation and generally consists of two steps: it first generates aggregated streamflow (e.g., annual flow) and then disaggregates it into lower-level variables (e.g., monthly flow) [Valencia and Schaake, 1973; Stedinger and Vogel, 1984; Koutsoyiannis and Manetas, 1996].

[3] The parametric ARMA and disaggregation methods have several disadvantages, including a limited ability to represent nonlinear dependences and nonstandard probability distribution forms [Sharma and O'Neill, 2002]. Nonparametric models, such as the kernel density, moving block bootstrap, or K-nearest neighbor resampling methods, make no assumptions about the form of the probability distribution or the dependence and provide an alternative for stochastic simulation [Vogel and Shallcross, 1996; Sharma et al., 1997; Prairie et al., 2007; Nowak et al., 2010].

[4] Recently, the copula method has been commonly used for modeling the dependence structure of multivariate random variables and also for multisite stochastic simulation [Bárdossy and Pegram, 2009]. However, the ability of the commonly used parametric copulas to model dependences in higher dimensions is rather restricted [Kao and Govindaraju, 2008; Chui and Wu, 2009]. In this study, we propose the maximum entropy copula for multisite monthly streamflow simulation, in which the rank correlation in higher dimensions among monthly streamflows at different sites can be modeled. The proposed method is applied to monthly streamflow simulation at three sites in the Colorado River basin.

2. Methodology

2.1. Entropy Concepts

[5] Let the joint probability density function (PDF) of two random variables X and Y on the interval [a_{1}, b_{1}] × [a_{2}, b_{2}] be f(x, y). The entropy H of the joint PDF f(x, y) can be defined as [Shannon, 1948; Shannon and Weaver, 1949]

H = -\int_{a_{2}}^{b_{2}} \int_{a_{1}}^{b_{1}} f(x,y) \ln f(x,y) \, dx \, dy \qquad (1)

[6] The principle of maximum entropy developed by Jaynes [1957] can be employed to derive the joint probability density function f(x, y) in that the joint PDF with the maximum entropy should be selected subject to the given constraints (or known information).

2.2. Maximum Entropy Copula

[7] The maximum entropy copula has been developed based on the entropy theory [Chui and Wu, 2009; Chu, 2011]. Let U and V be the marginal probabilities of the random variables X and Y with u and v denoting realizations of U and V. For a copula density function c(u, v), the entropy can be expressed as

W = -\int_{0}^{1} \int_{0}^{1} c(u,v) \log c(u,v) \, du \, dv \qquad (2)
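As a numerical illustration of equation (2), the sketch below approximates the copula entropy on a midpoint grid. The function name `copula_entropy`, the grid size, and the choice of the Farlie-Gumbel-Morgenstern (FGM) copula as a comparison case are assumptions made here for illustration and are not part of the study. The independence copula (c = 1) gives W = 0, and any other copula density gives a negative value.

```python
import numpy as np

def copula_entropy(density):
    """Approximate W = -∫∫ c log c du dv with a midpoint rule.

    `density` holds strictly positive values of c(u, v) at the midpoints
    of an n-by-n grid on the unit square.
    """
    cell_area = 1.0 / density.size
    return -(density * np.log(density)).sum() * cell_area

n = 200
x = (np.arange(n) + 0.5) / n                  # cell midpoints in (0, 1)
U, V = np.meshgrid(x, x, indexing="ij")

independent = np.ones((n, n))                 # independence copula, c = 1
# FGM copula density (illustrative choice): 1 + theta (1 - 2u)(1 - 2v), |theta| <= 1
fgm = 1.0 + 0.8 * (1.0 - 2.0 * U) * (1.0 - 2.0 * V)

print(copula_entropy(independent))            # zero for the independence copula
print(copula_entropy(fgm))                    # negative: dependence lowers entropy
```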

[8] The constraints can be expressed as

\int_{0}^{1} \int_{0}^{1} c(u,v) \, g_{i}(u,v) \, du \, dv = \bar{g}_{i}, \quad i = 1, 2, \ldots, n \qquad (3)

where \bar{g}_{i} is the expectation of the function g_{i}(u, v), i.e., E(g_{i}(u, v)). To ensure that the copula density function integrates to one over the unit square, g_{1}(u, v) can be specified as 1. To ensure that the marginals of c(u, v) are uniform on [0, 1], the moments of u and v can be specified as constraints (i.e., u, u^{2}, u^{3}, …, v, v^{2}, v^{3}, …) to approximate the marginal properties numerically [Chu, 2011]. To model the dependence structure, the function g(u, v) can be specified in a form related to an association measure, such that the expectation E(g(u, v)) becomes a linear function of a rank correlation. For example, when the pairwise product constraint g(u, v) = uv is used, the commonly used Spearman rank correlation (ρ) can be linked to the constraint [Chu, 2011]:

\int_{0}^{1} \int_{0}^{1} u v \, c(u,v) \, du \, dv = \frac{\rho + 3}{12} \qquad (4)
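Equation (4) is equivalent to ρ = 12 E(UV) − 3. The sketch below checks this relation empirically, assuming synthetic correlated data and rank-based pseudo-observations (r − 0.5)/n; the helper name `pseudo_obs` and the synthetic data are illustrative assumptions, not the streamflow records used in the study.

```python
import numpy as np

def pseudo_obs(x):
    """Map a sample to (0, 1) through its ranks: (rank - 0.5) / n."""
    ranks = np.argsort(np.argsort(x)) + 1     # ranks 1..n (no ties assumed)
    return (ranks - 0.5) / len(x)

rng = np.random.default_rng(1)
z = rng.standard_normal((5000, 2))
x = z[:, 0]
y = 0.8 * z[:, 0] + 0.6 * z[:, 1]             # synthetic, positively dependent pair

u, v = pseudo_obs(x), pseudo_obs(y)

rho_from_constraint = 12.0 * np.mean(u * v) - 3.0   # rearranged equation (4)
rho_from_ranks = np.corrcoef(u, v)[0, 1]            # Pearson correlation of ranks

print(rho_from_constraint, rho_from_ranks)
```

The two values agree up to a factor of 1 − 1/n², which comes from the finite-sample variance of the ranks.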

[9] With the moment constraints up to order m and pairwise product constraint in equation (4), the maximum entropy copula density function can be obtained as [Chui and Wu, 2009; Chu, 2011]

c(u,v) = \exp\left[ -\lambda_{0} - \sum_{r=1}^{m} \left( \lambda_{r} u^{r} + \lambda_{r+m} v^{r} \right) - \lambda_{2m+1} u v \right] \qquad (5)

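where m is the maximum order of moments (m = 3 in this study) and λ_{0}, …, λ_{2m+1} are the Lagrange parameters. Because the density in equation (5) must integrate to one, parameter λ_{0} can be expressed as a function of the other parameters as

\lambda_{0} = \ln \int_{0}^{1} \int_{0}^{1} \exp\left[ -\sum_{r=1}^{m} \left( \lambda_{r} u^{r} + \lambda_{r+m} v^{r} \right) - \lambda_{2m+1} u v \right] du \, dv \qquad (6)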

[10] The dependence structure in terms of the Spearman rank correlation can be modeled through the joint probability density function in equation (5). Note that other measures of the dependence structure, such as Blest's measure and Gini's gamma, can also be modeled through the maximum entropy copula [Chu, 2011].

[11] The joint distribution in higher dimensions is of particular interest when the multivariate dependence structure has to be modeled. In this case, a multivariate form of the entropy in equation (2) can be defined, and the copula density function with the maximum entropy can then be derived straightforwardly. The derivation of the maximum entropy copula is separate from that of the marginal probability distributions. Suitable marginal distributions, such as the kernel density, can be selected to model the properties of streamflow of each month, such as skewness and bimodality, which have been well documented [Sharma et al., 1997; Prairie et al., 2007; Salas and Lee, 2010; Hao and Singh, 2012]. Thus, we omit the discussion of the marginal distributions and focus on modeling the dependence structure of multisite monthly streamflow through the maximum entropy copula.

2.3. Parameter Estimation

[12] For the maximum entropy copula, the Lagrange multipliers λ_{i} (i = 1, …, 2m+1) in equation (5) have to be estimated. It has been shown that these Lagrange multipliers can be obtained by finding the minimum of a convex function Γ expressed as [Kapur, 1989]

\Gamma = \lambda_{0} + \sum_{i=1}^{2m+1} \lambda_{i} \bar{g}_{i} \qquad (7)

[13] These parameters can be estimated using the Newton-Raphson iteration method [Wu, 2003; Hao and Singh, 2011]. However, the parameter estimation for multisite simulation involves a high-dimensional integration to obtain the value of λ_{0} in equation (6), which makes it more complicated than single-site streamflow simulation. In this study, an adaptive algorithm for numerical integration over a hyperrectangular region was employed for the high-dimensional integration (programmed as a MATLAB function ADAPT available from www.math.wsu.edu/faculty/genz/homepage) [Genz and Malik, 1980; Berntsen et al., 1991].
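The estimation step can be sketched for the bivariate case as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the integrals are approximated on a fixed midpoint grid rather than with the adaptive ADAPT routine, the marginal moment targets are the uniform moments 1/(r+1), a damped Newton iteration is used, and the function name `fit_maxent_copula` and its arguments are hypothetical. The gradient of Γ is the difference between the target expectations and the model expectations, and its Hessian is the covariance matrix of the constraint functions under the current density.

```python
import numpy as np

def fit_maxent_copula(rho, m=3, n=200, tol=1e-9, max_iter=100):
    """Minimize Gamma = lambda_0 + sum_i lambda_i * g_bar_i (bivariate case)."""
    x = (np.arange(n) + 0.5) / n                       # midpoint grid on (0, 1)
    U, V = np.meshgrid(x, x, indexing="ij")
    feats = [U**r for r in range(1, m + 1)]            # u, u^2, ..., u^m
    feats += [V**r for r in range(1, m + 1)]           # v, v^2, ..., v^m
    feats += [U * V]                                   # pairwise product
    G = np.stack([f.ravel() for f in feats])           # shape (2m+1, n*n)
    # Targets: uniform moments 1/(r+1) and E(UV) = (rho + 3)/12 from equation (4)
    g_bar = np.array([1.0 / (r + 1) for r in range(1, m + 1)] * 2
                     + [(rho + 3.0) / 12.0])
    w = 1.0 / (n * n)                                  # area of one grid cell

    def evaluate(lam):
        e = -lam @ G
        e_max = e.max()                                # stabilize the exponential
        p = np.exp(e - e_max)
        z = p.sum() * w
        lam0 = e_max + np.log(z)                       # equation (6) on the grid
        return lam0 + lam @ g_bar, lam0, p / z         # Gamma, lambda_0, density

    lam = np.zeros(2 * m + 1)                          # start at independence
    gam, lam0, c = evaluate(lam)
    for _ in range(max_iter):
        mean = (G * c).sum(axis=1) * w                 # model expectations E_c[g_i]
        grad = g_bar - mean                            # gradient of Gamma
        if np.linalg.norm(grad) < tol:
            break
        centered = G - mean[:, None]
        hess = (centered * c) @ centered.T * w         # covariance of g under c
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while True:                                    # halve the step until Gamma decreases
            trial = lam - t * step
            gam_t, lam0_t, c_t = evaluate(trial)
            if gam_t <= gam or t < 1e-8:
                break
            t *= 0.5
        lam, gam, lam0, c = trial, gam_t, lam0_t, c_t
    return lam0, lam, c.reshape(n, n), x

lam0, lam, c, x = fit_maxent_copula(rho=0.5)
U, V = np.meshgrid(x, x, indexing="ij")
print(12.0 * (c * U * V).sum() / c.size - 3.0)         # recovered Spearman correlation
```

A fixed grid is adequate in two dimensions but scales poorly; for the three- and four-variable densities needed in multisite simulation, an adaptive cubature such as the one cited above is the more practical choice.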

2.4. Simulation Methodology

[14] Suppose three sites from upstream to downstream are denoted as sites 1, 2, and 3. Denote the marginal probabilities of the monthly streamflow at each site as (U_{1}, U_{2}, …), (V_{1}, V_{2}, …), and (W_{1}, W_{2}, …), and their realizations as (u_{1}, u_{2}, …), (v_{1}, v_{2}, …), and (w_{1}, w_{2}, …). For site 1, the joint distribution C(u_{s}, u_{s−1}) of monthly streamflow for two adjacent months s and s−1 must be estimated, and the conditional distribution C(u_{s}|u_{s−1}) can be used to generate the monthly streamflow (marginal) U_{s} given the previous monthly streamflow (marginal) U_{s−1}. For site 2, the joint distribution C(u_{s}, v_{s−1}, v_{s}) has to be estimated, and the conditional distribution C(v_{s}|v_{s−1}, u_{s}) can be used to generate the monthly streamflow V_{s} given the streamflow V_{s−1} of site 2 and the monthly streamflow U_{s} of site 1. Similarly, for site 3, the conditional distribution C(w_{s}|w_{s−1}, u_{s}, v_{s}) can be used to generate the monthly streamflow W_{s} given the streamflow W_{s−1} of site 3, U_{s} of site 1, and V_{s} of site 2.

[15] The simulation methodology to generate the monthly streamflow (marginal) at each site can be summarized as follows:

[16] (1) Initialize monthly streamflow at sites 1, 2, and 3 of the first month, i.e., u_{1}, v_{1}, and w_{1}, by assigning random values from historical records.

[17] (2) With the initialized u_{1}, generate monthly streamflow at site 1 for the second month u_{2} from the conditional distribution C(u_{s}|u_{s−1}). With the generated u_{2} and the initialized value v_{1}, the monthly streamflow at site 2 for the second month v_{2} can be generated from the distribution C(v_{s}|v_{s−1}, u_{s}). With the generated u_{2}, v_{2} and the initialized w_{1}, the monthly streamflow at site 3 for the second month w_{2} can be generated from the distribution C(w_{s}|w_{s−1}, u_{s}, v_{s}).

[18] (3) With the generated u_{2}, v_{2}, and w_{2}, repeat step (2) to generate the monthly streamflow for the next month u_{3}, v_{3} and w_{3} for sites 1, 2, and 3, respectively.

[19] (4) Repeat step (3) to generate a sequence of monthly streamflows u_{4},…, u_{t}, v_{4},…,v_{t} and w_{4},…,w_{t} up to time t.
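Steps (1) to (4) can be sketched as a loop of conditional draws on a grid. The sketch rests on several assumptions that are not from the study: the densities are unnormalized stand-ins of the form exp(θ Σ pairwise products) with a hypothetical θ rather than fitted maximum entropy copulas, one density is reused for every month, the first month is initialized with uniform random numbers instead of values taken from historical records, and all function names are illustrative.

```python
import numpy as np

def pairwise_density(n_vars, theta, n=30):
    """Stand-in density exp(theta * sum of pairwise products) on a midpoint grid."""
    x = (np.arange(n) + 0.5) / n
    grids = np.meshgrid(*([x] * n_vars), indexing="ij")
    expo = np.zeros_like(grids[0])
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            expo += theta * grids[i] * grids[j]
    return x, np.exp(expo)

def draw_conditional(density, x, given, rng):
    """Draw the last variable given the others by inverting the conditional CDF."""
    idx = tuple(min(int(g * len(x)), len(x) - 1) for g in given)
    row = density[idx]                        # slice along the last axis
    cdf = np.cumsum(row)
    cdf /= cdf[-1]
    k = min(int(np.searchsorted(cdf, rng.random())), len(x) - 1)
    return x[k]

def simulate(t, rng, theta=6.0):
    x, c1 = pairwise_density(2, theta)        # axes: (u_{s-1}, u_s)
    _, c2 = pairwise_density(3, theta)        # axes: (u_s, v_{s-1}, v_s)
    _, c3 = pairwise_density(4, theta)        # axes: (w_{s-1}, u_s, v_s, w_s)
    u, v, w = np.empty(t), np.empty(t), np.empty(t)
    u[0], v[0], w[0] = rng.random(3)          # step (1), with uniform stand-ins
    for s in range(1, t):                     # steps (2) to (4)
        u[s] = draw_conditional(c1, x, (u[s - 1],), rng)
        v[s] = draw_conditional(c2, x, (u[s], v[s - 1]), rng)
        w[s] = draw_conditional(c3, x, (w[s - 1], u[s], v[s]), rng)
    return u, v, w

u, v, w = simulate(50, np.random.default_rng(0))
```

The generated values are marginal probabilities; converting them to streamflow would require the inverse of each month's fitted marginal distribution, which is outside this sketch.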

3. Application

[20] Monthly streamflow from 1906 to 2003 at three sites in the Colorado River basin, namely Paria River at Lees Ferry, Arizona (AZ) (denoted as site 1), Little Colorado River near Cameron, AZ (denoted as site 2), and Virgin River at Littlefield, AZ (denoted as site 3), was used to illustrate the proposed method. The monthly streamflow at each site can be downloaded from the website (http://www.usbr.gov/lc/region/g4000/NaturalFlow/previous.html).

[21] We illustrate the derivation of the joint probability density function for monthly streamflow at sites 1 and 2 as an example. Denote the marginal probabilities of monthly streamflow for the month s at sites 1 and 2 as U_{s} and V_{s}. The maximum entropy copula density function c(u_{s}, v_{s−1}, v_{s}) with the moment constraints up to order 3 and pairwise product constraints can be expressed in the same exponential form as equation (5), extended to three variables.
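A form consistent with equation (5) is sketched below. It is a reconstruction under assumptions rather than the printed equation: all three pairwise products are assumed to enter as constraints, and the numbering of the Lagrange parameters (λ_{1} to λ_{12}) is chosen here for illustration.

```latex
c(u_{s}, v_{s-1}, v_{s}) = \exp\Big[ -\lambda_{0}
  - \sum_{r=1}^{3} \big( \lambda_{r} u_{s}^{r} + \lambda_{r+3} v_{s-1}^{r} + \lambda_{r+6} v_{s}^{r} \big)
  - \lambda_{10} u_{s} v_{s-1} - \lambda_{11} u_{s} v_{s} - \lambda_{12} v_{s-1} v_{s} \Big]
```

Under this form, λ_{0} would again normalize the density, now over the unit cube, in the same way as equation (6).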

[22] The joint distribution C(u_{s}, v_{s−1}, v_{s}) and conditional distribution C(v_{s}| u_{s},v_{s−1}) can be obtained from the density function accordingly.

[23] One hundred sequences of monthly streamflow (marginal) with the same length as the historical record (98 years) were generated for each site with the simulation methodology. The scatterplots of the ranks of observed streamflow pairs and one sequence of simulated streamflow pairs (marginal) from the copula at different sites for March and April are shown in Figure 1 (top). The spread pattern of simulated streamflow pairs generally matched that of observed streamflow pairs of the 2 months well. As an example, the monthly streamflow of March and April for site 3 shows a strong dependence (Spearman correlation: 0.83), and most of the streamflow pairs spread along the diagonal. The simulated streamflow pairs also spread near the diagonal, with a Spearman correlation of 0.77. The scatterplots of the ranks of observed monthly streamflow and one sequence of simulated streamflow pairs from the copula at different sites for the same month of March are also shown in Figure 1 (bottom). The simulated Spearman correlations are 0.59, 0.59, and 0.58, which are relatively close to the observed Spearman correlations (0.65, 0.68, and 0.67).

[24] Boxplots were used to display the observed and simulated statistics, and the performance was judged to be good when a statistic fell within the boxplot [Nowak et al., 2010; Salas and Lee, 2010]. Boxplots of the Spearman correlation of the observed and simulated monthly streamflows for sites 1, 2, and 3 are shown in Figure 2 (left column), which displays the temporal dependence between adjacent months at a specific site. From Figure 2, it can be seen that for most months, the median of simulated statistics is within the boxplot. Boxplots of the spatial dependence of the observed and simulated monthly streamflow of the same month between different sites are shown in Figure 2 (right column). All these simulations show good results since the observed Spearman correlation falls within the boxplots for most months. These results show that the dependence structure of the monthly streamflow at each site and between different sites can be preserved relatively well.
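The boxplot criterion can be sketched as a simple check on the simulated statistics. The interpretation used here is an assumption: "within the boxplot" is read either as the interquartile box or as the 1.5 × IQR fences, and both are reported. The function name `within_boxplot` and the numbers in the example are made up for illustration and are not results from the study.

```python
import numpy as np

def within_boxplot(observed, simulated, whisker=1.5):
    """Check an observed statistic against the box and fences of simulated values."""
    q1, q3 = np.percentile(simulated, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return {
        "in_box": bool(q1 <= observed <= q3),
        "in_fences": bool(lower <= observed <= upper),
    }

# Made-up simulated Spearman correlations from 101 hypothetical sequences
simulated = np.linspace(0.5, 0.7, 101)
print(within_boxplot(0.60, simulated))
print(within_boxplot(0.70, simulated))
print(within_boxplot(0.95, simulated))
```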

4. Conclusions

[25] The maximum entropy copula method is proposed for multisite monthly streamflow simulation and shown to be capable of modeling the rank correlation of monthly streamflows at different sites. The joint distribution (copula) is derived as the maximum entropy distribution subject to constraints specified as functions of the marginal probabilities, and its extension to higher dimensions for dependence modeling is straightforward. The proposed methodology can also be applied to similar topics, such as rainfall simulation and geostatistical interpolation. Potential drawbacks are that the marginal properties of the copula are approximated numerically and that the current framework cannot ensure that the tributary flows sum to the downstream flow.