Entropy-copula method for single-site monthly streamflow simulation

Authors

  • Z. Hao

    Corresponding author
    1. Department of Biological and Agricultural Engineering, Texas A&M University, College Station, Texas, USA
      Corresponding author: Z. Hao, Department of Biological and Agricultural Engineering, Texas A&M University, College Station, TX 77843-2117, USA. (hzc07@tamu.edu)
  • V. P. Singh

    1. Department of Biological and Agricultural Engineering, Texas A&M University, College Station, Texas, USA
    2. Department of Civil and Environmental Engineering, Texas A&M University, College Station, Texas, USA


Abstract

[1] An entropy-copula method is proposed for single-site monthly streamflow simulation. In this method, the joint distribution of adjacent monthly streamflows is constructed using the copula method, whereas the marginal distribution of streamflow for each month is derived using the entropy method. The conditional distribution, from which monthly streamflow is generated, is then derived from the joint distribution. Moment statistics of monthly streamflow (such as mean, standard deviation, and skewness) can be modeled by the entropy-based marginal distribution, while the dependence structure between streamflows of adjacent months can be modeled by the copula-based joint distribution. The proposed entropy-copula method can be extended by incorporating an aggregate variable for modeling the interannual dependence. Application to the simulation of monthly streamflow from the Colorado River illustrates the effectiveness of the proposed method.

1. Introduction

[2] Synthetic streamflow data are needed in water resources studies for the evaluation of alternative designs and policies against the range of sequences that are likely to occur in the future [Loucks et al., 1981]. It is desired that synthetic streamflow is similar to historical streamflow and preserves moment statistics (such as mean, standard deviation, and skewness) and dependence structure (such as lag-one correlation). The interannual dependence is important for the simulation of long wet and dry periods [Sivakumar and Berndtsson, 2010], and also needs to be preserved. Sivakumar and Berndtsson [2010] provided a review of streamflow simulation models.

[3] The copula method has been extensively applied in hydrologic modeling, mainly because of its flexibility in constructing a joint distribution that describes the dependence structure between random variables. One of the most common applications is frequency analysis of hydrological variables [Favre et al., 2004; Salvadori and De Michele, 2004; Genest et al., 2007; Kao and Govindaraju, 2008; Chebana and Ouarda, 2009; Salvadori and De Michele, 2010; Renard, 2011; Vandenberghe et al., 2011]. Recent years have witnessed an upsurge in other applications of the copula method. Bardossy and Li [2008] introduced a copula-based model to describe spatial variability for the interpolation of groundwater quality parameters. Serinaldi [2009] employed the bivariate copula-based mixed distribution to deduce the multisite Markov model for modeling and generating daily rainfall series. Chowdhary and Singh [2010] developed a copula-based approach for reducing uncertainty in the parameter estimation of frequency distributions. Gyasi-Agyei [2011] used a copula to model the dependence structure of daily rainfall properties for daily rainfall disaggregation.

[4] One of the important applications of entropy theory is to derive the maximum entropy-based distribution of random variables [Kapur, 1989; Kesavan and Kapur, 1992]. Hao and Singh [2011] proposed the entropy method for single-site monthly streamflow simulation with the entropy-based joint distribution, and Lee and Salas [2011] proposed the copula method for annual streamflow simulation with the copula-based joint distribution. This study proposes an entropy-copula method for monthly streamflow simulation in which the joint distribution is constructed using the copula method with the marginal distribution derived using the entropy method. The entropy-copula method is simpler than the previous work by Hao and Singh [2011], since fewer parameters are determined simultaneously, and it is able to model different (nonlinear) dependence structures of streamflow due to the copula component. To model the interannual dependence of monthly streamflow, an aggregate variable is used to guide the simulation. The proposed method is applied to the monthly streamflow of the Colorado River at Lees Ferry, Arizona, and its performance is evaluated by comparing generated and observed statistics.

2. Methodology

2.1. Entropy Theory and Marginal Distribution

[5] For a continuous random variable X with probability density function (PDF) f(x) defined on the interval [a, b], the Shannon entropy I can be expressed as [Shannon, 1948]:

$$I = -\int_a^b f(x)\,\ln f(x)\,dx \tag{1}$$

[6] The principle of maximum entropy proposed by Jaynes [1957] is applied with the normalization condition and the first four moments as constraints, specified as:

$$\int_a^b f(x)\,dx = 1, \qquad \int_a^b g_i(x)\,f(x)\,dx = E(g_i), \quad i = 1, \ldots, 4 \tag{2}$$

[7] The maximum entropy-based PDF can be obtained as [Kesavan and Kapur, 1992]:

$$f(x) = \exp\left(-\lambda_0 - \lambda_1 x - \lambda_2 x^2 - \lambda_3 x^3 - \lambda_4 x^4\right) \tag{3}$$

where $\lambda_i$, $i = 0, 1, \ldots, 4$, are the Lagrange multipliers, and $E(g_i)$ is the expectation of the function $g_i(x)$, with $g_i(x) = x^i$. The Lagrange multipliers in equation (3) can be determined in terms of the specified constraints using the Newton-Raphson method [Hao and Singh, 2011].
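The multiplier estimation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a trapezoid-rule grid on [a, b], minimizes the convex potential ln Z(λ) + Σ λ_i E(g_i) with Newton-Raphson steps, and adds a backtracking line search for robustness. The function names (`grid_moments`, `solve`, `maxent_fit`) are hypothetical.

```python
import math

def grid_moments(lam, xs, ws, n_mom):
    """Return lam0 and raw moments E[x^p], p = 1..n_mom, of
    f(x) = exp(-lam0 - sum_i lam[i-1] * x^i) on a quadrature grid."""
    expo = [-sum(l * x ** (i + 1) for i, l in enumerate(lam)) for x in xs]
    shift = max(expo)                        # guard against overflow in exp
    dens = [w * math.exp(e - shift) for w, e in zip(ws, expo)]
    z = sum(dens)
    lam0 = shift + math.log(z)               # makes f integrate to one
    moms = [sum(d * x ** p for d, x in zip(dens, xs)) / z
            for p in range(1, n_mom + 1)]
    return lam0, moms

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def maxent_fit(mu, a, b, n_grid=2001, tol=1e-8, max_iter=200):
    """Return [lam0, lam1, ..., lamk] matching the raw moments mu on [a, b]."""
    k = len(mu)
    h = (b - a) / (n_grid - 1)
    xs = [a + j * h for j in range(n_grid)]
    ws = [h] * n_grid
    ws[0] = ws[-1] = h / 2                   # trapezoid weights (an assumption)
    lam = [0.0] * k                          # start from the uniform density
    for _ in range(max_iter):
        lam0, m = grid_moments(lam, xs, ws, 2 * k)
        grad = [mu[i] - m[i] for i in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian of the potential: covariance of (x^i, x^j) under f
        hess = [[m[i + j + 1] - m[i] * m[j] for j in range(k)] for i in range(k)]
        step = solve(hess, [-g for g in grad])
        gamma = lam0 + sum(l * u for l, u in zip(lam, mu))
        slope = sum(g * s for g, s in zip(grad, step))
        t = 1.0
        for _halving in range(40):           # backtracking line search
            trial = [l + t * s for l, s in zip(lam, step)]
            trial0, _unused = grid_moments(trial, xs, ws, 0)
            trial_gamma = trial0 + sum(l * u for l, u in zip(trial, mu))
            if trial_gamma <= gamma + 1e-4 * t * slope + 1e-12:
                break
            t /= 2
        lam = trial
    lam0, _unused = grid_moments(lam, xs, ws, 0)
    return [lam0] + lam
```

In practice the data would usually be rescaled to a modest interval before fitting, since high powers of large flow values make the Hessian poorly conditioned; that preprocessing is left out of the sketch.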

[8] The constraints in the form of moments have been used to derive the maximum entropy distribution of random variables in different areas [Mead and Papanicolaou, 1984; Smith, 1993; Gotovac et al., 2010]. In addition, the distribution in equation (3) can also model bimodality in the data [Matz, 1978]. For streamflow simulation, the distribution derived from the entropy method in equation (3) can be used as the marginal distribution of streamflow for each month and samples drawn from the distribution can be expected to preserve the mean, standard deviation, skewness (and kurtosis).

[9] The cumulative distribution function (CDF) of the maximum entropy-based PDF in equation (3) can be expressed as

$$F(x) = \int_a^x \exp\left(-\lambda_0 - \sum_{i=1}^{4} \lambda_i t^i\right) dt \tag{4}$$

2.2. Copula Theory and Joint Distribution

[10] For the continuous random vector (X, Y) with marginal CDFs $F_X(x)$ and $F_Y(y)$, the joint distribution function of (X, Y) can be expressed in terms of its marginal CDFs and a copula C as [Nelsen, 2006]:

$$P(X \le x, Y \le y) = C\left(F_X(x), F_Y(y); \theta\right) = C(u, v; \theta) \tag{5}$$

where θ is the copula parameter that measures the dependence between the marginals, and u and v are realizations of the random variables $U = F_X(x)$ and $V = F_Y(y)$. The two-dimensional copula C maps the two marginal distributions into the joint distribution, $C: [0,1]^2 \to [0,1]$. For the estimation of parameter θ, the method of moments (MOM), the exact maximum likelihood (EML) method, and the inference functions for margins (IFM) method can be used [Joe, 1997; Genest and Favre, 2007].

[11] There are several copula families and a variety of dependence structures can be modeled [Joe, 1997; Nelsen, 2006]. For example, random variables from both the Gaussian copula and Frank copula exhibit symmetric dependence, while those from the Clayton copula exhibit asymmetric dependence [Trivedi and Zimmer, 2005]. For streamflow simulation, the joint distribution of streamflows for two adjacent months constructed from the copula method in equation (5) can be used to model the (nonlinear) dependence structure of monthly streamflow.

[12] The conditional distribution of random variable Y given X (denoted as $C_{2|1}(v \mid u)$) can be derived from the joint distribution in equation (5) as:

$$C_{2|1}(v \mid u) = \frac{\partial C(u, v; \theta)}{\partial u} \tag{6}$$
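For the Gaussian copula, the conditional distribution has a closed form, which the following sketch evaluates and inverts using only the Python standard library. The function names are hypothetical, and `rho` stands for the Gaussian copula parameter.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def gaussian_copula_conditional(v, u, rho):
    """C_{2|1}(v | u): partial derivative of the Gaussian copula w.r.t. u."""
    zu, zv = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((zv - rho * zu) / math.sqrt(1.0 - rho ** 2))

def gaussian_copula_conditional_inverse(eta, u, rho):
    """Solve C_{2|1}(v | u) = eta for v (used when sampling v given u)."""
    zu = _N.inv_cdf(u)
    return _N.cdf(rho * zu + math.sqrt(1.0 - rho ** 2) * _N.inv_cdf(eta))
```

Drawing `eta` uniformly on (0, 1) and passing it through the inverse yields a value of v that is dependent on u through the copula. Other copula families need their own conditional formulas.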

2.3. Entropy-Copula Method

[13] The proposed entropy-copula method combines the entropy method to derive the marginal distribution and the copula method to construct the joint distribution. The joint distribution from the entropy-copula method can be constructed from equations (4) and (5) and expressed as:

$$P(X \le x, Y \le y) = C\left(F_X(x; \alpha), F_Y(y; \beta); \gamma\right) \tag{7}$$

where α and β are the parameters of the entropy-based marginal distributions, and γ is the copula parameter. The IFM method can be used for parameter estimation, in which the parameters of the marginal distributions are estimated separately from those of the copula. Parameters α and β can be estimated by the Newton-Raphson method, while copula parameter γ can be estimated using the maximum likelihood method. Compared with the entropy-based method proposed by Hao and Singh [2011], fewer parameters need to be estimated simultaneously. In addition, the copula component in the joint distribution enables the characterization of different (nonlinear) dependence structures [Joe, 1997; Nelsen, 2006].

[14] Samples from the joint distribution in equation (7) can be expected to preserve the mean, standard deviation, and skewness through the entropy-based marginal distribution, and the dependence structure through the copula-based joint distribution. The conditional distribution from the joint distribution in equation (7) can also be expressed using equation (6). Monthly streamflow can be generated sequentially from the conditional distribution, and a total of 12 conditional distributions (one for each pair of adjacent months) have to be used.
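The sequential generation can be sketched as follows, assuming a Gaussian copula for every pair of adjacent months. This is an illustration rather than the authors' code: `rho[k]` is a hypothetical list holding the copula parameter for the pair (month k, month k+1), and `quantile[k]` is a hypothetical callable returning the inverse marginal CDF of month k (in the method above, this would be the inverse of the entropy-based CDF).

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()
_EPS = 1e-12  # keeps probabilities strictly inside (0, 1)

def simulate_monthly(n_years, rho, quantile, u_start=0.5, seed=None):
    """Generate 12 * n_years monthly flows, one conditional draw per month."""
    rng = random.Random(seed)
    u = u_start                       # probability of the month before the first
    flows = []
    for t in range(12 * n_years):
        month = t % 12
        r = rho[(month - 1) % 12]     # pair (previous month, this month)
        eta = min(max(rng.random(), _EPS), 1.0 - _EPS)
        # invert the Gaussian-copula conditional distribution for v given u
        z = r * _N.inv_cdf(u) + math.sqrt(1.0 - r * r) * _N.inv_cdf(eta)
        v = min(max(_N.cdf(z), _EPS), 1.0 - _EPS)
        flows.append(quantile[month](v))
        u = v                         # becomes the conditioning value next month
    return flows
```

The 12 entries of `rho` correspond to the 12 conditional distributions, including the December-to-January pair that links consecutive years.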

2.4. Extended Entropy-Copula Method

[15] The proposed entropy-copula method can be extended to preserve the interannual dependence by introducing an aggregate variable in the conditional distribution, similar to the framework developed by Sharma and O'Neill [2002]. For monthly streamflow denoted as $X_1, X_2, \ldots, X_{12}, X_{13}, X_{14}, \ldots, X_n$, where $X_1, X_2, \ldots, X_{12}$ are the monthly streamflows of the first year and so on, an aggregate variable can be defined as the sum of the previous m monthly streamflows (m = 12 in this study):

$$Z_{t-1} = \sum_{i=1}^{m} X_{t-i} \tag{8}$$
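A small sketch of the aggregate variable as a rolling sum is given below. The alignment is an assumption: each aggregate value is taken to be the sum of the m flows ending at (and including) the flow it is paired with. The function name is hypothetical.

```python
def aggregate_series(flows, m=12):
    """Return z where z[j] = flows[j] + ... + flows[j + m - 1].

    z[j] is paired with flows[j + m - 1], the last flow in its window.
    """
    if len(flows) < m:
        return []
    window = sum(flows[:m])
    z = [window]
    for t in range(m, len(flows)):
        window += flows[t] - flows[t - m]   # slide the window by one month
        z.append(window)
    return z
```

For example, `aggregate_series([1, 2, 3, 4, 5], m=3)` gives the three window sums 6, 9, and 12.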

[16] Denoting the streamflows of two adjacent months as $X_{t-1}$ and $X_t$ and the corresponding aggregate variable as $Z_{t-1}$, with cumulative distribution functions $F(X_{t-1})$, $F(X_t)$, and $G(Z_{t-1})$, respectively, the joint distribution of the random vector $(Z_{t-1}, X_{t-1}, X_t)$ can be expressed by copula C as:

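$$P(Z_{t-1} \le z_{t-1}, X_{t-1} \le x_{t-1}, X_t \le x_t) = C\left(G(z_{t-1}), F(x_{t-1}), F(x_t); \phi\right) = C(v_1, v_2, v_3; \phi) \tag{9}$$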

where ϕ is the copula parameter, which can be a scalar or a vector depending on the copula family, and $v_1$, $v_2$, and $v_3$ are realizations of the random variables $V_1 = G(Z_{t-1})$, $V_2 = F(X_{t-1})$, and $V_3 = F(X_t)$. The conditional distribution of $X_t$ given $X_{t-1}$ and $Z_{t-1}$ (denoted as $C_{3|12}(v_3 \mid v_1, v_2)$) can then be derived from the joint distribution by copula C in equation (9) as:

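$$C_{3|12}(v_3 \mid v_1, v_2) = \frac{\partial^2 C(v_1, v_2, v_3; \phi) / \partial v_1 \partial v_2}{\partial^2 C(v_1, v_2, 1; \phi) / \partial v_1 \partial v_2} \tag{10}$$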

[17] The procedure for generating monthly streamflow by the extended entropy-copula method is summarized as follows:

[18] 1. Pick any $z_{t-1}$ and the corresponding $x_{t-1}$ value from the historical record. Compute the corresponding cumulative probabilities $G_{t-1}(z_{t-1})$ (denoted as $v_1$) and $F_{t-1}(x_{t-1})$ (denoted as $v_2$).

[19] 2. Generate a uniform random number η on [0, 1], which is considered to be the conditional cumulative probability corresponding to a specific value $x_t$, given the initial values $z_{t-1}$ and $x_{t-1}$ (or $v_1$ and $v_2$). From equation (10), one obtains $\eta = C_{3|12}(v_3 \mid v_1, v_2)$. The cumulative probability $F_t(x_t)$ (denoted as $v_3$) can be obtained as $v_3 = C_{3|12}^{-1}(\eta \mid v_1, v_2)$, and then $x_t$ can be obtained as $x_t = F_t^{-1}(v_3)$.

[20] 3. Increase the time step t and update the values $x_{t-1}$ and $z_{t-1}$.

[21] 4. Repeat steps (1)–(3) until the required length of monthly streamflow is generated. (The first few streamflow values can be discarded to avoid initialization bias.)
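The steps above can be sketched for a trivariate Gaussian copula, for which the conditional distribution of the third variable given the first two is again normal. Everything named here is an assumption for illustration: `F[k]`, `F_inv[k]`, and `G[k]` are hypothetical callables for the marginal CDF, its inverse, and the aggregate CDF associated with calendar month k, and `corr[k]` is a hypothetical triple (r12, r13, r23) of copula correlations used when generating month k.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()
_EPS = 1e-12

def _clip(p):
    """Keep a probability strictly inside (0, 1)."""
    return min(max(p, _EPS), 1.0 - _EPS)

def conditional_v3(eta, v1, v2, r12, r13, r23):
    """Solve C_{3|12}(v3 | v1, v2) = eta for a trivariate Gaussian copula."""
    z1, z2 = _N.inv_cdf(_clip(v1)), _N.inv_cdf(_clip(v2))
    det = 1.0 - r12 * r12
    b1 = (r13 - r12 * r23) / det      # regression weights of z3 on (z1, z2)
    b2 = (r23 - r12 * r13) / det
    var = max(1.0 - (b1 * r13 + b2 * r23), 0.0)
    return _N.cdf(b1 * z1 + b2 * z2 + math.sqrt(var) * _N.inv_cdf(_clip(eta)))

def simulate_eec(n_months, history, first_month, F, F_inv, G, corr, seed=None):
    """history: the m most recent flows, ending with x_{t-1} (step 1)."""
    rng = random.Random(seed)
    window = list(history)
    z_prev = sum(window)              # aggregate variable z_{t-1}
    out = []
    for step in range(n_months):
        k = (first_month + step) % 12
        k_prev = (k - 1) % 12
        v1 = G[k_prev](z_prev)
        v2 = F[k_prev](window[-1])
        v3 = conditional_v3(rng.random(), v1, v2, *corr[k])   # step 2
        x_t = F_inv[k](_clip(v3))
        out.append(x_t)
        z_prev += x_t - window.pop(0)  # step 3: slide the aggregate window
        window.append(x_t)
    return out
```

Discarding the first generated values, as suggested in step 4, would be done by the caller, for example by slicing the returned list.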

3. Application

[22] Monthly streamflow from the Colorado River at Lees Ferry, Arizona, from 1906 to 2003 was used for application of the proposed method. More details about the data set are given by Hao and Singh [2011]. To assess the performance of the proposed method, 100 flow sequences, each with 100 years of streamflow, were generated. The basic statistics (mean, standard deviation, skewness, lag-one (Pearson) correlation, and maximum and minimum values) and the interannual dependence of the generated streamflow were compared with those of the observed streamflow using box plots. The performance was considered satisfactory when the observed statistic fell within the box of the box plot.

3.1. Marginal and Joint PDF

[23] The maximum entropy-based marginal PDF and CDF for each month were compared with the empirical histogram and the empirical CDF estimated from the Gringorten plotting position formula. Results for the May and September streamflows are given in Figure 1. In general, the theoretical PDF fitted the empirical histogram relatively well, and the theoretical CDF also fitted the empirical CDF well. Note that the skewness of the September streamflow is relatively high (1.96). These results showed that the entropy-based marginal distribution modeled the underlying streamflow well, even when high skewness was involved. The bimodality in the PDF of the May streamflow, which was found in a previous study [Prairie et al., 2006], was not resolved by the PDF in equation (3). This can be overcome by using more moments to derive the entropy-based marginal distribution. However, to avoid the uncertainty in estimating higher moments from the historical record (98 years), the entropy-based PDF in equation (3) was used as the marginal distribution for monthly streamflow throughout the year. Box plots of the entropy-based marginal PDFs for the generated May and September streamflows are also shown in Figure 1; they show that the uncertainty of the moments estimated from a generated series of a particular length (100 years) appreciably affects the entropy-based PDF in equation (3). Compared with the PDF estimated from the observed streamflow, the generated streamflow preserved the marginal distribution relatively well, and the use of the first four moments as constraints to derive the marginal distribution is acceptable for this study.

Figure 1.

Comparison of empirical and theoretical PDFs and CDFs for the May and September streamflow (streamflow in cubic meters per second, CMS).
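The empirical CDF used in this comparison can be computed with the Gringorten plotting position, p_i = (i − 0.44)/(n + 0.12), where i is the rank of the i-th smallest value among n observations. A minimal sketch, with a hypothetical function name:

```python
def gringorten_cdf(sample):
    """Return the sorted sample and its Gringorten plotting positions."""
    xs = sorted(sample)
    n = len(xs)
    probs = [(i - 0.44) / (n + 0.12) for i in range(1, n + 1)]
    return xs, probs
```

The returned pairs can be plotted against the theoretical CDF evaluated at the same sorted values.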

[24] The Clayton, Frank, Gumbel, and Gaussian copula functions were selected as candidates for constructing the joint distribution. Two goodness-of-fit test statistics, the Cramér–von Mises statistic ($S_n$) and the Kolmogorov-Smirnov statistic ($T_n$) given by Genest et al. [2006], were employed to choose a suitable copula function. The statistics ($S_n$ and $T_n$) and the associated p values, based on a run of 5000 samples, were obtained using the parametric bootstrap procedure [Genest et al., 2006; Genest and Favre, 2007]. Results for statistic $S_n$ are given in Table 1. A p value below 0.05 signified that the null hypothesis that the copula function was a valid model should be rejected. The numbers of adjacent-month streamflow pairs for which the Clayton, Frank, Gumbel, and Gaussian copulas were rejected were 6, 4, 7, and 2, respectively. Similar results were obtained from statistic $T_n$ (the numbers of rejections were 6, 2, 4, and 2, respectively). No single copula function performed best for all streamflow pairs. A practical approach for the simulation of monthly streamflow may therefore be to choose different copula functions for different streamflow pairs. Based on the number of rejections, the Gaussian copula function seemed to be preferable, and it was selected hereinafter to illustrate the proposed method for monthly streamflow simulation. The entropy-copula (EC) method and the extended entropy-copula (EEC) method with the Gaussian copula function are denoted as the ECG method and the EECG method, respectively.

Table 1. Statistics Sn and Associated p Values of Different Copulas for Different Streamflow Pairs (pairs of adjacent months, e.g., 1–2 is January–February)

Copula    Statistic  1–2   2–3   3–4   4–5   5–6   6–7   7–8   8–9   9–10  10–11 11–12 12–1
Clayton   Sn         0.25  0.19  0.09  0.06  0.13  0.11  0.15  0.44  0.17  0.22  0.29  0.15
Clayton   p value    0.01  0.05  0.32  0.65  0.13  0.09  0.03  0.00  0.04  0.01  0.00  0.10
Frank     Sn         0.05  0.08  0.33  0.25  0.06  0.07  0.08  0.12  0.11  0.07  0.13  0.15
Frank     p value    0.72  0.31  0.00  0.00  0.45  0.16  0.13  0.05  0.07  0.30  0.03  0.01
Gumbel    Sn         0.06  0.22  0.51  0.54  0.21  0.14  0.19  0.11  0.16  0.08  0.06  0.10
Gumbel    p value    0.64  0.02  0.00  0.00  0.01  0.01  0.00  0.15  0.04  0.30  0.52  0.13
Gaussian  Sn         0.06  0.12  0.22  0.30  0.04  0.04  0.07  0.14  0.08  0.03  0.08  0.08
Gaussian  p value    0.69  0.15  0.01  0.00  0.83  0.62  0.24  0.06  0.35  0.94  0.24  0.24
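As a quick check, the rejection counts quoted in the text can be tallied from the p values in Table 1, treating p < 0.05 as a rejection. The dictionary below simply transcribes the table; the function name is hypothetical.

```python
# p values from Table 1, in the order 1-2, 2-3, ..., 12-1
p_values = {
    "Clayton":  [0.01, 0.05, 0.32, 0.65, 0.13, 0.09, 0.03, 0.00, 0.04, 0.01, 0.00, 0.10],
    "Frank":    [0.72, 0.31, 0.00, 0.00, 0.45, 0.16, 0.13, 0.05, 0.07, 0.30, 0.03, 0.01],
    "Gumbel":   [0.64, 0.02, 0.00, 0.00, 0.01, 0.01, 0.00, 0.15, 0.04, 0.30, 0.52, 0.13],
    "Gaussian": [0.69, 0.15, 0.01, 0.00, 0.83, 0.62, 0.24, 0.06, 0.35, 0.94, 0.24, 0.24],
}

def count_rejections(table, alpha=0.05):
    """Number of month pairs per copula with p value strictly below alpha."""
    return {name: sum(p < alpha for p in ps) for name, ps in table.items()}

print(count_rejections(p_values))
# {'Clayton': 6, 'Frank': 4, 'Gumbel': 7, 'Gaussian': 2}
```

The counts agree with those reported for statistic Sn when values printed as 0.05 are not counted as rejections.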

3.2. Basic Statistics

[25] A plot of observed monthly streamflow and a sequence of generated monthly streamflow for 98 years by the ECG method is shown in Figure 2. Generally the variability of generated streamflow was similar to that of observed streamflow (e.g., maximum and minimum values). Similar results were obtained from the streamflow generated by the EECG method (not shown).

Figure 2.

Comparison of observed monthly streamflow (98 years) and a sequence of generated monthly streamflow by the ECG method.

[26] Basic statistics of observed and generated streamflow by the ECG method are shown with box plots in Figure 3. The relative error (RE) for each statistic, defined as $RE = (S_m - X_o)/X_o$, where $S_m$ is the median of the generated statistic and $X_o$ is the observed statistic, is shown in Table 2. The ECG method performed well in preserving the mean, standard deviation, and skewness, since all the observed statistics fell within the boxes. The magnitude of the RE for the mean and standard deviation was under 5%, and that for skewness was under 10%, for all months. In addition, there seemed to be some underestimation of standard deviation and skewness in this simulation, since a negative relative error was obtained for 9 and 11 months, respectively.

Figure 3.

Box plots of basic statistics of observed and generated monthly streamflow by the ECG method. Continuous lines with star marks for each month represent statistics of historical record. Mn, Sd, Sk, Lag-1, Max and Min represent the mean, standard deviation, skewness, lag-1 correlation, maximum and minimum values.

Table 2. Relative Error (%) of Generated Statistics for Each Month (a)

Statistic  1      2      3      4      5      6      7      8      9      10     11     12
Mn         −0.2   −0.6   0.5    0.1    −0.3   0.7    0.3    0.2    0.6    −0.8   −0.1   −0.2
Sd         −1.7   −1.6   −0.4   −1.8   −0.7   −1.5   −2.5   0.0    1.3    −0.8   0.0    −0.9
Sk         −8.5   −3.5   −3.1   −0.7   5.5    −3.3   −2.7   −2.9   −5.1   −5.2   −4.2   −2.1
Lag1       −6.1   −2.5   12.9   16.7   5.8    1.4    2.3    3.0    −5.0   15.6   −0.1   −8.4
Max        −1.9   −1.8   −0.6   −1.8   −2.5   −2.9   1.7    −5.3   −6.9   −9.3   −3.0   −1.3
Min        2.6    −4.1   −3.9   −12.7  15.0   0.8    −16.8  −0.8   −26.8  −16.8  37.6   −1.1

(a) Mn, mean; Sd, standard deviation; Sk, skewness; Lag1, lag-one correlation; Max, maximum values; and Min, minimum values.
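The relative error reported in Table 2 can be computed as in the sketch below, which also re-derives the sign counts and error bounds quoted in the text from the Sd and Sk rows of the table. The function name is hypothetical, and the two lists simply transcribe Table 2.

```python
from statistics import median

def relative_error_percent(generated_stats, observed_stat):
    """RE (%) = 100 * (median of generated statistic - observed) / observed."""
    return 100.0 * (median(generated_stats) - observed_stat) / observed_stat

# Sd and Sk rows of Table 2 (RE in percent, months 1..12)
re_sd = [-1.7, -1.6, -0.4, -1.8, -0.7, -1.5, -2.5, 0.0, 1.3, -0.8, 0.0, -0.9]
re_sk = [-8.5, -3.5, -3.1, -0.7, 5.5, -3.3, -2.7, -2.9, -5.1, -5.2, -4.2, -2.1]

n_neg_sd = sum(v < 0 for v in re_sd)   # months with underestimated Sd
n_neg_sk = sum(v < 0 for v in re_sk)   # months with underestimated Sk
print(n_neg_sd, n_neg_sk)              # 9 11
print(max(abs(v) for v in re_sd), max(abs(v) for v in re_sk))  # 2.5 8.5
```

Here `generated_stats` would hold one value of the statistic per generated sequence (100 values in this application).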

[27] Generally the lag-one correlation was preserved well, though for certain months (e.g., October, with an RE of 15.6%) the observed statistic did not fall within the box. Compared with the entropy method of Hao and Singh [2011] applied to the same data set, there seemed to be no significant difference in the preservation of the mean, standard deviation, and skewness between the two methods, while the entropy method performed relatively better than the entropy-copula method in preserving the lag-one correlation. However, when the Spearman and Kendall (rank) correlations were also used to measure the dependence structure, the entropy-copula method, as shown in Figure 4, preserved the two correlations well, while the entropy method did not preserve them as well (not shown). These results showed that the entropy-copula method outperformed the entropy method in preserving nonlinear dependence. The maximum and minimum values were generally preserved well (with RE under 5% for maximum values and under 20% for minimum values for most months), though overestimation or underestimation occurred for certain months. Results of the EECG method in preserving these statistics for each month were similar to those of the ECG method and thus are not presented.

Figure 4.

Box plots of Spearman and Kendall correlations of observed and generated monthly streamflow by the ECG method.
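The difference between the Pearson correlation and the rank correlations can be illustrated with a short sketch. It assumes samples without ties (so Kendall's tau is the simple tau-a form), and the function names are hypothetical.

```python
import math

def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))
```

For a monotone but nonlinear relation such as y = x³, both rank correlations equal one while the Pearson correlation is below one, which is why rank correlations are the more natural check for a copula-based dependence model.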

[28] Statistics of observed and generated streamflow at the annual time scale by the ECG and EECG methods are shown in Figure 5. The mean and standard deviation were preserved well by both methods. Neither the ECG nor the EECG method preserved the skewness well. The reason may be that simulation errors in both the mean and the standard deviation affect the simulated skewness. The two methods differed significantly, however, in preserving the lag-one correlation. The EECG method preserved the lag-one correlation well, while the ECG method did not perform as well. The reason is that the lag-one correlation of annual streamflow was not incorporated in the ECG model, whereas in the EECG model this property was incorporated through the aggregate variable. For the preservation of maximum and minimum values, the ECG and EECG methods performed relatively well, with slight overestimation.

Figure 5.

Box plots of basic statistics of observed and generated annual streamflow by the (a) ECG method and (b) EECG method. Units for Mn, Sd, Max and Min are in 10^3 CMS.

3.3. Interannual Dependence

[29] The interannual dependence between streamflows of seasonal and annual time scales was also assessed for the EECG method. Box plots of lag-one and lag-four interannual dependence between streamflow of a specific month (seasonal time scale) and streamflows of the previous 12 months (annual time scale) of the generated streamflow are shown in Figure 6. It is seen that the lag-one correlation was preserved well for all months except for February, as expected from the structure of the EECG method. The lag-four correlation, although not directly included in the model, was also preserved well for most months.

Figure 6.

Box plots of lag-one and lag-four interannual dependence of observed and generated monthly streamflow by the EECG method.

4. Summary and Conclusion

[30] An entropy-copula method is proposed for single-site monthly streamflow simulation and is shown to preserve the statistics of monthly streamflow well. The entropy-based marginal distribution with the first four moments as constraints is capable of modeling complex properties (such as high skewness) of the underlying streamflow data. The mean and standard deviation of the generated streamflow at the annual time scale are also preserved well. The extended entropy-copula method is shown to preserve interannual dependence well, and it also improves the preservation of the lag-one correlation at the annual scale. However, the skewness at the annual time scale is generally not preserved well by either method.

[31] Compared with the entropy method, the proposed entropy-copula method is easier to carry out in terms of parameter estimation and enables the characterization of nonlinear dependence structures due to the copula component. The use of higher moments (third and fourth moments in this study) as constraints to derive the marginal distribution relies on the accurate estimation of those moments. Thus the proposed method is preferable for streamflow simulation with a relatively long observation record.

[32] The entropy method provides a way to derive the marginal distribution of hydrologic variables, including distributions with unusual properties. For instance, the entropy-based distribution with moments as constraints provides an alternative way to model multimodality. The copula method offers the flexibility to model different (nonlinear) dependence structures of hydrologic variables through a variety of copula families. A combination of these two theories, entropy and copula, enables the modeling of complex properties of the marginal distribution and different dependence structures of the data under investigation. One possible limitation of the entropy-copula method is that many Lagrange multipliers may be needed to derive the marginal distribution for modeling certain properties of the data (e.g., multimodality). For cases in which a large number of Lagrange multipliers are involved in the entropy-based marginal distribution, the inference functions for margins (IFM) method for parameter estimation would reduce the computational burden. In addition, a potential limitation of the entropy-based marginal distribution is that the desired property needs to be expressed in the form of constraints in order to derive a suitable marginal distribution. The connection between the properties of the data under investigation (such as extreme values) and the form of the constraints needs further study.

[33] The entropy-copula framework can be applied and extended to higher dimensions for hydrologic modeling, with the entropy-based marginal distribution modeling the marginal properties and the copula-based joint distribution modeling the dependence structure of the data. For certain hydrologic applications (e.g., rainfall simulation), the copula method has already made its way into practical use. With the entropy-based marginal distribution, the entropy-copula framework would also be applicable to such cases, as a complement to the current study.