Entropy theory-based criterion for hydrometric network evaluation and design: Maximum information minimum redundancy

Abstract

[1] Hydrometric information constitutes the fundamental input for planning, design, operation, and management of water resources systems. How to site monitoring gauges optimally so that they are effective and efficient in gathering hydrometric information or data has received considerable attention. This paper presents a generic approach for the design (or evaluation) of hydrometric networks. First, an entropy theory-based criterion, named maximum information minimum redundancy (MIMR), is proposed. The MIMR criterion maximizes the joint entropy of stations within the optimal set and the transinformation between stations within and outside of the optimal set, while ensuring that the optimal set contains minimum duplicated information. An easy-to-implement greedy ranking algorithm is developed to accomplish the MIMR selection. Two case studies are presented to illustrate the applicability of MIMR in hydrometric network evaluation and design. We also compare the MIMR selection with another entropy-based approach. Results illustrate that MIMR is adept at finding stations with high information content and at locating independent stations. The proposed approach is suitable for the design (or evaluation) of any type of hydrometric network.

1. Introduction

[2] Hydrometric information, which is mainly collected by monitoring networks, constitutes the fundamental input for planning, design, operation, and management of water resources systems. Optimal siting of monitoring gauges, such that they are effective and efficient in gathering data (information), has received considerable attention. Although there are myriad concerns in hydrometric network design, this study focuses on the fundamental theme, i.e., selecting an optimum number of stations and their optimum locations. Many approaches have been developed for that purpose; a comprehensive review can be found in the work of Mishra and Coulibaly [2009]. Among others, one type of approach is based on entropy theory. The merit of entropy theory is that it directly defines information and quantifies uncertainty [Harmancioglu and Singh, 1998; Mogheir et al., 2006]. A significant body of literature employing entropy theory for the design of monitoring networks has been reported, as briefly reviewed in what follows.

[3] Caselton and Husain [1980] and Husain [1987] used an information maximization principle for identifying optimum locations of rainfall gauges to be retained in a dense network. When an existing network is sparse, Husain [1989] proposed a methodology for expanding it by means of information interpolation. Krstanovic and Singh [1992a] developed an entropy-based approach for hydrologic network evaluation. This approach was then used to spatiotemporally evaluate the rainfall network in Louisiana [Krstanovic and Singh, 1992b]. Yang and Burn [1994] presented a method for data collection design in which a concept of directional information flow was employed. Also in terms of entropy theory, Mogheir et al. [2006] evaluated the optimality of the groundwater quality monitoring network in the Gaza Strip, Palestine. Mishra and Coulibaly [2010] assessed streamflow networks in different Canadian watersheds using entropy, joint entropy, and transinformation.

[4] Common to the aforementioned studies is that entropy terms were computed using univariate or bivariate formulations. However, the joint information retained by multiple stations and their dependence are always required for a more objective evaluation. A few studies did use multivariate distributions but assumed that the data were normally or lognormally distributed, as, for example, did Husain [1987, 1989] and Krstanovic and Singh [1992a, 1992b]. This assumption is debatable, since many natural phenomena, such as streamflow and precipitation, are heavy-tailed [Bernadara et al., 2008; Carreau et al., 2009; Li et al., 2012].

[5] Similar to Krstanovic and Singh [1992a, 1992b], Alfonso et al. [2010a] introduced several adaptations to make the entropy-based method applicable to the design of water level monitors for a highly controlled polder system in the Netherlands. They first used multivariate total correlation to assess the performance of three pairwise dependence criteria. The selection of optimal monitors, however, was restricted to low-dimensional analysis (fewer than two variables at a time).

[6] Later, Alfonso et al. [2010b] proposed another criterion by maximizing multivariate joint entropy and minimizing total correlation for optimally siting water level monitors. Yet, this approach failed to account for the information transition ability (transinformation) of a network. It is acknowledged that transferring hydrologic information from points where it is available to those where it is required is one of the purposes of collecting hydrometric information [Harmancioglu and Yevjevich, 1987].

[7] Moreover, optimally siting hydrometric monitors is a multiobjective problem, which is tricky to solve in practice. Alfonso et al. [2010b] exploited a genetic algorithm to approach the multiobjective optimization. The advantage of multiobjective optimization is that it provides different feasible solutions under different scenarios. Nevertheless, selection of the final network is not straightforward. To assist the decision-making process, it is desirable to find an easy-to-implement way to solve the multiobjective problem and to provide the end user a unique solution with decent performance.

[8] Our objective therefore is to develop an easy-to-implement approach for the design (or evaluation) of hydrometric networks. To that end, we first propose an entropy theory-based criterion, named maximum information minimum redundancy (MIMR), satisfying three norms: (1) maximum overall information (joint entropy), (2) maximum information transition ability (transinformation), and (3) minimum redundant information (total correlation). These entropy terms are calculated at the multivariate level without any distributional assumption. Thereby, interactions among stations can be properly accounted for. Then we present a straightforward greedy selection algorithm to rank the candidate gauges based on MIMR. During selection, the three commensurable norms are additively unified, which circumvents the complexity of multiobjective optimization while preserving its advantage of achieving different feasible solutions through information-redundancy tradeoff weights.

[9] The paper is organized as follows. Having formulated the objectives of the study in this section, we briefly present basic entropy theory in section 2 for ease of understanding the MIMR criterion, which, together with a selection algorithm, is discussed in section 3. Through two case studies, section 4 illustrates the applicability of MIMR to the evaluation and design of hydrometric networks. Merits and demerits of MIMR are discussed in section 5. Conclusions are drawn in section 6.

2. Entropy Theory

2.1. Basic Entropy Terms

[10] Marginal entropy, joint entropy, transinformation, and total correlation constitute the basic information measures commonly used in hydrometric network evaluation and design. Let $[X_1, X_2, \ldots, X_n]$ denote a discrete random vector with joint probability mass function $p(x_1, x_2, \ldots, x_n)$ and one-dimensional margins, respectively, $p(x_1), p(x_2), \ldots, p(x_n)$ for $X_1$, $X_2$, …, and $X_n$. In hydrometric network evaluation and design, one usually wants to know: (1) How much information is retained by a random variable (station)? (2) What is the information conveyed by several variables (stations) together? (3) How much information of a random variable (station) can be inferred from the knowledge of another one? (4) What is the duplicated information among several variables (stations)?

[11] The first question can be answered by marginal entropy. The marginal entropy of a discrete random variable, measuring the information it retains, is defined as

$$H(X) = -\sum_{x} p(x)\log_2 p(x) \qquad (1)$$

where, for simplicity, the subscript of $X_i$ is suppressed. The sum notation means summation over all possible outcomes of $X$. All sum notations in what follows hold the same meaning if no lower or upper limit is specified.
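As an illustration, equation (1) can be evaluated directly from the empirical frequencies of a discretized record. The following Python sketch (the function name is ours, not part of the paper's software) estimates the marginal entropy in bits:

```python
import numpy as np

def marginal_entropy(x):
    """Marginal entropy in bits, equation (1), estimated from the
    empirical frequencies of a discrete sample."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```

For example, a balanced two-outcome series such as `[0, 1, 0, 1]` carries 1 bit, while a constant series carries 0 bits.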

[12] To answer the second question, joint entropy is defined as a measure of the overall information retained by random variables. The bivariate joint entropy of $X_1$ and $X_2$ is defined as

$$H(X_1, X_2) = -\sum_{x_1}\sum_{x_2} p(x_1, x_2)\log_2 p(x_1, x_2) \qquad (2)$$

There is a natural extension from the bivariate to the multivariate joint entropy if one wants to know the information retained by more than two random variables. The definition of multivariate joint entropy is

$$H(X_1, X_2, \ldots, X_n) = -\sum_{x_1}\sum_{x_2}\cdots\sum_{x_n} p(x_1, x_2, \ldots, x_n)\log_2 p(x_1, x_2, \ldots, x_n) \qquad (3)$$

The joint entropy is symmetric with respect to its arguments. Quantitatively, the joint entropy is less than or equal to the sum of the one-dimensional marginal entropies; the equality holds if and only if the random variables are stochastically independent.

[13] Concerning the third question, transinformation (also referred to as mutual information) provides the answer. The transinformation of $X_1$ and $X_2$, measuring the information of $X_1$ (or $X_2$) that can be inferred from that of $X_2$ (or $X_1$), is computed as

$$T(X_1, X_2) = \sum_{x_1}\sum_{x_2} p(x_1, x_2)\log_2\frac{p(x_1, x_2)}{p(x_1)\,p(x_2)} \qquad (4)$$

Transinformation provides a general measure of dependence between random variables. It is superior to the Pearson correlation coefficient, since it captures both linear and nonlinear dependence, whereas the Pearson correlation coefficient is only suitable for linear relationships or, more generally, for spherical and elliptical dependence structures.

[14] To assess the redundancy of a hydrometric network, one is usually interested in the amount of duplicated information among a set of stations, which can be measured by total correlation [Watanabe, 1960]. Mathematically, the total correlation is defined as

$$C(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, X_2, \ldots, X_n) \qquad (5)$$

Total correlation is also symmetric, which facilitates its computation through the grouping property [Kraskov et al., 2005; Alfonso et al., 2010a, 2010b]. Total correlation is nonnegative: it equals 0 if and only if all random variables are independent, and is greater than 0 otherwise. Total correlation answers the fourth question.
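The remaining measures follow from equations (2)-(5) through the identities $T(X_1, X_2) = H(X_1) + H(X_2) - H(X_1, X_2)$ and $C = \sum_i H(X_i) - H(X_1, \ldots, X_n)$. A Python sketch (helper names ours):

```python
import numpy as np

def joint_entropy(*cols):
    """Joint entropy in bits of one or more equal-length discrete
    samples, equations (1)-(3), from empirical joint frequencies."""
    joint = np.array(list(zip(*cols)))
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def transinformation(x, y):
    # Equation (4), via T(X1, X2) = H(X1) + H(X2) - H(X1, X2)
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)

def total_correlation(*cols):
    # Equation (5): sum of marginal entropies minus the joint entropy
    return sum(joint_entropy(c) for c in cols) - joint_entropy(*cols)
```

Two identical series share all of their information (transinformation equals the marginal entropy), while two independent series share none.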

2.2. Continuous Time Series Discretization

[15] The entropy terms can be computed by discretizing hydrometric time series data collected at stations. Several methods are available for continuous data discretization. Among others, the histogram method and mathematical floor function are frequently used [Caselton and Husain, 1980; Husain, 1987; Mishra and Coulibaly, 2010; Alfonso et al., 2010a, 2010b].

[16] When applying histogram discretization, it is common to assume a somewhat arbitrary number of bins, such as 10, 15, or 20. Yet, this strategy is questionable, since discrete entropy terms are sensitive to the bin size. Optimal bin size estimators can be used, like those proposed by Scott [1979], Freedman and Diaconis [1981], Birgé and Rozenholc [2006], and Shimazaki and Shinomoto [2007]. These estimators are derived from different criteria; nevertheless, none of them is completely convincing, in that no single one has been shown to be better than the others. Given a data sample, different estimators may lead to different bin size estimates, which in turn lead to different entropy values.

[17] The subjective determination of the bin size is addressed to some extent by the use of a mathematical floor function [Ruddell and Kumar, 2009; Alfonso et al., 2010a, 2010b]. Through the floor function, a continuous value $x$ is converted to its nearest lowest integer multiple of a constant $a$, i.e.,

$$x_q = a\left\lfloor \frac{x}{a} \right\rfloor \qquad (6)$$

where $\lfloor \cdot \rfloor$ represents the conventional mathematical floor function. The advantages of this approach include: (1) it avoids the choice of a parametric distribution to fit the continuous data; and (2) it can incorporate physical considerations, in that the resolution $a$ should be no less than the uncertainties involved in the continuous data, as will be illustrated in section 4.2. However, how to determine an appropriate $a$ on physical grounds is not always explicit. The parameter $a$ might then be selected empirically through trial and error, as will be illustrated in section 4.1. Generally, it should be neither too large nor too small. Rules of thumb can guide the selection of $a$: (1) it should guarantee that all candidate stations have significant and distinguishable information contents; (2) the spatial and temporal variability of the station time series should be preserved as much as possible after discretization; and (3) the selected stations should remain as stable as possible when $a$ fluctuates within some interval centered near its optimal value.

[18] One point worth noting is that after applying the mathematical floor function the marginal entropy is no longer a measure of the information of the continuous random variable $X$, but of $X$ rounded to its nearest lowest integer multiple of the constant $a$ [Papoulis and Pillai, 2001]. Analogous meanings hold for the other entropy terms. In the context of hydrometric network design, we do not need to quantify precisely the information retained by stations; a reasonable approximation is sufficient as long as the relative relationship among stations in terms of information content is preserved. Indeed, treating a continuous time series as a discrete pulse signal is reasonable, considering that in practice any record is subject to noise, round-off errors, and errors caused by measurement instruments. It is necessary to filter out such high-frequency, low-amplitude fluctuations.
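Equation (6) is straightforward to apply. A Python sketch of the quantization (the function name is ours):

```python
import numpy as np

def quantize(x, a):
    """Equation (6): map each continuous value to the nearest lowest
    integer multiple of the resolution a."""
    return a * np.floor(np.asarray(x, dtype=float) / a)

# With a = 5.0, the values 3.2, 7.9, and 12.5 map to 0.0, 5.0, and 10.0.
quantize([3.2, 7.9, 12.5], 5.0)
```

The choice of `a` controls the filtering: a larger `a` collapses more of the low-amplitude variability, as discussed above.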

2.3. Discrete Variable Merging

[19] Multivariate joint entropy and total correlation can be computed at any dimension with the aid of discrete variable merging. The basic idea of variable merging lies in creating a new variable $X_{12\ldots n}$ whose information content is equal to that of the original variables $X_1, X_2, \ldots, X_n$. Consider merging two discrete variables as an example. If $X_1 = [1, 2, 1, 2, 1, 3, 3]^T$ and $X_2 = [1, 2, 2, 2, 1, 3, 2]^T$, then the new variable $X_{12}$ can be obtained by pairwise welding the corresponding digits together [Alfonso et al., 2010b], i.e., $X_{12} = [11, 22, 12, 22, 11, 33, 32]^T$. It can be verified that the amount of information remains invariant before and after merging.

[20] The direct welding approach, however, has a deficiency: it may cause the so-called "out of memory" problem as the number of variables to be merged increases, especially when the sample size is large. An adapted alternative that avoids this problem is generalized in what follows:

[21] Algorithm 1

[22] 1. Create a new sample $X_{12}$ from $X_1$ and $X_2$ by the direct welding approach.

[23] 2. Pick out the unique values in $X_{12}$ and rank them in ascending order, resulting in a ranked sample $X_r$ of length $l$.

[24] 3. Find the location index of each element of $X_{12}$ in the ranked sample $X_r$.

[25] 4. Relabel each element of $X_{12}$ with its location index obtained in step 3.

[26] After this, each element in the new merged sample is relabeled by an integer ranging from 1 to $l$. Remember that the merging approach is only suitable for discrete variables. The variable merging approach satisfies the laws of association and commutation in terms of information content. Taking the merging of three variables as an example, the following equalities are satisfied according to these laws:

$$(X_1 \oplus X_2) \oplus X_3 = X_1 \oplus (X_2 \oplus X_3) = (X_1 \oplus X_3) \oplus X_2 \qquad (7)$$

where $\oplus$ denotes the merging operator. Assuming $X_3 = [1, 1, 2, 2, 1, 3, 3]^T$ and applying algorithm 1 yields

$$(X_1 \oplus X_2) \oplus X_3 = [1, 3, 2, 4, 1, 6, 5]^T$$
$$X_1 \oplus (X_2 \oplus X_3) = [1, 3, 2, 4, 1, 6, 5]^T$$
$$(X_1 \oplus X_3) \oplus X_2 = [1, 3, 2, 4, 1, 6, 5]^T$$
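Algorithm 1 condenses to a few lines, since ranking the unique joint outcomes and locating each element in that ranking is exactly what `np.unique(..., return_inverse=True)` provides. A Python sketch (the function name is ours):

```python
import numpy as np

def merge(*cols):
    """Algorithm 1: merge equal-length discrete samples into a single
    sequence whose labels are the 1..l ranks of each joint outcome."""
    joint = np.array(list(zip(*cols)))                       # step 1: weld
    _, idx = np.unique(joint, axis=0, return_inverse=True)   # steps 2-3: rank and locate
    return idx + 1                                           # step 4: relabel from 1

x1 = [1, 2, 1, 2, 1, 3, 3]
x2 = [1, 2, 2, 2, 1, 3, 2]
print(merge(x1, x2))  # [1 3 2 3 1 5 4]
```

Because the labels are in one-to-one correspondence with the joint outcomes, the merged sequence retains exactly the information of its components, and merging in any order gives information-equivalent results.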

[27] In terms of variable merging, the multivariate joint entropy $H(X_1, X_2, \ldots, X_n)$ can be computed by sequentially applying algorithm 1, i.e.,

$$H(X_1, X_2, \ldots, X_n) = H(X_1 \oplus X_2 \oplus \cdots \oplus X_n) \qquad (8)$$

[28] Concerning the computation of total correlation, one can proceed in two different ways. One is to use the shortcut formula in equation (5). The other is to apply the grouping property [Kraskov et al., 2005; Alfonso et al., 2010a, 2010b] together with algorithm 1, as shown in the following:

$$C(X_1, X_2, \ldots, X_n) = \sum_{i=2}^{n} T(X_1 \oplus X_2 \oplus \cdots \oplus X_{i-1},\; X_i) \qquad (9)$$

Since the total correlation at the bivariate level reduces to transinformation, equation (9) indicates that the $n$-dimensional total correlation can be factorized into a sum of conventional transinformation terms.
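The factorization in equation (9) can be checked numerically on the example series above. The sketch below (helper names ours) computes the trivariate total correlation both by the shortcut formula (5) and by the grouping property (9):

```python
import numpy as np

def H(*cols):
    """Joint entropy in bits, equations (1)-(3)."""
    _, c = np.unique(np.array(list(zip(*cols))), axis=0, return_counts=True)
    p = c / c.sum()
    return float(-np.sum(p * np.log2(p)))

def T(x, y):
    """Transinformation, equation (4), as H(X) + H(Y) - H(X, Y)."""
    return H(x) + H(y) - H(x, y)

def merge(*cols):
    """Algorithm 1 relabeling of joint outcomes."""
    _, idx = np.unique(np.array(list(zip(*cols))), axis=0, return_inverse=True)
    return idx

x1 = [1, 2, 1, 2, 1, 3, 3]
x2 = [1, 2, 2, 2, 1, 3, 2]
x3 = [1, 1, 2, 2, 1, 3, 3]

C_shortcut = H(x1) + H(x2) + H(x3) - H(x1, x2, x3)   # equation (5)
C_grouped = T(x1, x2) + T(merge(x1, x2), x3)         # equation (9)
print(abs(C_shortcut - C_grouped) < 1e-12)           # True
```

The agreement is exact up to floating-point rounding, since the two expressions telescope to the same sum of entropies.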

[29] With the aid of variable merging, we can define the multivariate transinformation between a single variable and a grouped variable, and between two grouped variables. The first type, $T(X_1 \oplus \cdots \oplus X_m,\, Y)$, measures the amount of information of a single variable $Y$ that can be inferred from the variables in the group, and vice versa. The second type, $T(X_1 \oplus \cdots \oplus X_m,\, Y_1 \oplus \cdots \oplus Y_k)$, measures the common information shared by the two grouped variables. Both types can be easily computed by first merging the variables in each group and then applying the definition of bivariate transinformation in equation (4).

3. MIMR Criterion and Its Implementation

3.1. MIMR Criterion

[30] The idea of the MIMR criterion lies in selecting (or ranking) stations from a candidate set such that the selected stations (1) maximize the overall information, (2) maximize the information transition ability, and (3) minimize the redundant information.

[31] Assume there are $N$ candidate stations. For each station, there are several years of records of the variable of interest, denoted by $X$, such as streamflow. Let $S$ be the set of stations already selected into the optimal network, with elements $X_{s_1}, X_{s_2}, \ldots, X_{s_m}$. Similarly, let $F$ be the set of candidate stations to be selected, with elements $X_{f_1}, X_{f_2}, \ldots, X_{f_{N-m}}$. Apparently, the sizes of $S$ and $F$ sum to $N$. The amount of overall information retained by $S$ can be quantified by the multivariate joint entropy

$$H(S) = H(X_{s_1}, X_{s_2}, \ldots, X_{s_m}) \qquad (10)$$

The information transition ability of $S$ can be measured by the sum of the transinformation between the grouped variables in $S$ and each station in $F$, denoted as $T_1$ for simplicity,

$$T_1 = \sum_{j=1}^{N-m} T(X_{s_1} \oplus X_{s_2} \oplus \cdots \oplus X_{s_m},\; X_{f_j}) \qquad (11)$$

Also, it can be measured by the transinformation between the grouped variables in $S$ and the grouped variables in $F$, denoted as $T_2$,

$$T_2 = T(X_{s_1} \oplus \cdots \oplus X_{s_m},\; X_{f_1} \oplus \cdots \oplus X_{f_{N-m}}) \qquad (12)$$

The redundant information of the optimal set $S$ can be quantified by the total correlation

$$C(S) = \sum_{i=1}^{m} H(X_{s_i}) - H(X_{s_1}, X_{s_2}, \ldots, X_{s_m}) \qquad (13)$$

Then, the MIMR criterion-based objective functions are formulated as

$$\max H(S), \quad \max T_1, \quad \min C(S) \qquad (14a)$$

or

$$\max H(S), \quad \max T_2, \quad \min C(S) \qquad (14b)$$

The objectives in equation (14a) [or equation (14b)] indicate that an optimal network should convey as much effective information as possible, while retaining as little redundant information, if any, as possible.

[32] To circumvent the complexity of multiobjective optimization and to facilitate the end user's decision making, the three objectives can be additively unified, considering that they are commensurate. Hence, the multiobjective optimization reduces to a single-objective optimization. The integrated objective function corresponding to equation (14a) is

$$\max \; \lambda_1 \left[H(S) + T_1\right] - \lambda_2 C(S) \qquad (15a)$$

where $\lambda_1$ and $\lambda_2$, whose sum is 1, are the information-redundancy tradeoff weights. The purpose of the tradeoff weights is to give users the possibility to include additional knowledge or preferences. Moreover, varying the tradeoff weights can achieve different feasible solutions under different scenarios; therefore, the advantage of multiobjective optimization is retained to some extent. Similarly, the integrated objective function corresponding to equation (14b) is

$$\max \; \lambda_1 \left[H(S) + T_2\right] - \lambda_2 C(S) \qquad (15b)$$

3.2. Selection Algorithm

[33] Using the MIMR criterion, a selection algorithm for the design of hydrometric networks can be generalized in what follows:

[34] Algorithm 2

[35] 1. Collect data of the hydrometric variable of interest at each candidate station;

[36] 2. Discretize the continuous time series data by equation (6);

[37] 3. Initialize the optimal set $S$ as an empty set and the candidate set $F$ as the one containing all candidate stations;

[38] 4. Identify the central station as the one with maximum marginal entropy among all candidates;

[39] 5. Update sets $S$ and $F$;

[40] 6. Select the next optimal station from $F$ by the MIMR criterion. In this step all stations in $F$ are scanned sequentially to find the one satisfying equation (15a) [or equation (15b)];

[41] 7. Repeat steps 5 and 6 iteratively until the expected number of stations has been selected.

[42] A pseudo code for algorithm 2 is given in Table 1. Convergence of the selection can be determined by the ratio of the joint entropy of the selected stations to that of all the candidates: if the ratio exceeds a threshold, such as 0.90, the selection stops. If no threshold is provided, all candidate stations are ranked in descending order of priority. To reduce the implementation effort, a MATLAB package, HydroMIMR, was developed.
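The authors provide a MATLAB package; as an independent illustration, steps 1-7 of algorithm 2 with objective (15a) can be sketched in Python (function names ours, not the HydroMIMR API). The sketch uses the fact that the joint entropy of a merged variable equals that of its components, so equation (11) reduces to entropy differences:

```python
import numpy as np

def H(*cols):
    """Joint entropy in bits of equal-length discrete samples."""
    _, c = np.unique(np.array(list(zip(*cols))), axis=0, return_counts=True)
    p = c / c.sum()
    return float(-np.sum(p * np.log2(p)))

def mimr_rank(data, lam1=0.8, threshold=None):
    """Greedy MIMR ranking (algorithm 2, objective (15a)).
    data maps station ids to discretized series; stations are ranked
    until the joint-entropy fraction reaches `threshold` (or fully,
    if no threshold is given)."""
    lam2 = 1.0 - lam1
    total = H(*data.values())
    F = dict(data)
    # Step 4: the central station has maximum marginal entropy.
    first = max(F, key=lambda k: H(F[k]))
    S = [first]
    del F[first]
    while F:
        selected = [data[k] for k in S]
        if threshold is not None and H(*selected) / total >= threshold:
            break
        best, best_val = None, -np.inf
        for k in F:
            cand = selected + [F[k]]
            HS = H(*cand)                                   # equation (10)
            T1 = sum(HS + H(f) - H(*cand, f)                # equation (11)
                     for kk, f in F.items() if kk != k)
            CS = sum(H(c) for c in cand) - HS               # equation (13)
            val = lam1 * (HS + T1) - lam2 * CS              # equation (15a)
            if val > best_val:
                best, best_val = k, val
        S.append(best)
        del F[best]
    return S
```

For a toy network in which station `b` duplicates station `a` while `c` is independent of both, the ranking places the duplicate last, as the criterion intends.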

Table 1. Pseudo Code for the MIMR-Based Greedy Selection Algorithm

Commands Outline                                          Comments
1: F ← {all candidate stations}; S ← ∅                    Initialize candidate set F and empty set S
2: Discretize the continuous time series.
3: tInfo ← H(X_f1 ⊕ X_f2 ⊕ … ⊕ X_fN)                      Compute the total information of all the potential stations, equation (8)
4: for i = 1 to N do                                      Compute the marginal entropy of each potential station
       H(X_fi) ← equation (1)
   end
5: X_s1 ← arg max_i H(X_fi)                               Select the first (central) station
6: S ← S ∪ {X_s1}; F ← F \ {X_s1}                         Update S and F for the first time
7: for i = 2 to N do                                      Sequentially select stations from the updated candidate set according to the MIMR criterion
       m ← |F|
       for j = 1 to m do
           val_j ← objective (15a) with X_fj added to S
       end
       j* ← arg max_j val_j
       S ← S ∪ {X_fj*}; F ← F \ {X_fj*}                   Update the candidate set and already selected set successively
   end
8: for i = 1 to N do                                      Determine the final optimal set S_opt according to the fraction of the information of the selected set, H(X_s1, …, X_si), to the total information
       if H(X_s1, …, X_si) / tInfo ≥ threshold then
           S_opt ← {X_s1, …, X_si}
           return
       end
   end

[43] Other than forward selection, the optimal stations can also be determined in an opposite direction. For backward selection, the criterion should be changed to minimum information reduction and maximum redundancy reduction, which is also based on the principle of MIMR.

4. Application

[44] Two case studies are presented in this section to illustrate the applicability of MIMR in evaluating and designing a hydrometric network. The first case shows how to evaluate an existing streamflow gauge network. The second case demonstrates the way to optimally select stations from a dense polder water level monitoring system.

4.1. Case 1: Streamflow Network Evaluation

4.1.1. Study Area and Data

[45] This case study aims to rank the importance of streamflow gauges located on the mainstream of the Brazos River in Texas. The Brazos River, with a drainage area of about 118,000 km2, extends from eastern New Mexico and flows for more than 1000 km southeasterly across Texas to the Gulf of Mexico (Figure 1) [Wurbs et al., 2005]. Monthly streamflow observations from 1990 to 2009 at 12 USGS stream gauges were selected for this study, as depicted in Figure 1. Notice that only gauges on the mainstream were chosen. Evaluating gauges separately for each tributary and the mainstream is reasonable, since they belong to different geophysical hydrological units and a pooled evaluation would lead to irrational results. Station identification numbers (ID) were reallocated sequentially from upstream to downstream; these and the corresponding summary statistics of streamflow are given in Table 2. One major finding was the increased variability of streamflow from upstream to downstream, as signified by the increasing variance. From the perspective of information theory, generally, the larger the variance of a station, the more information (or uncertainty) it contains. In this sense, the importance of each station also increases from upstream to downstream. Selecting stations simply based on the variance (or standard deviation), however, cannot ensure that the transinformation between selected and unselected stations is high and the redundant information among the selected stations is low.

Figure 1.

Location map of the Brazos River basin and streamflow gauges on the mainstream.

Table 2. USGS Code, Reallocated ID, and Summary Statistics of Streamflow for Each Gauge in This Studya

USGS Code   ID     Mean    Variance      SD     Maximum   Median   Minimum
08082500     1     7.13      181.65    13.47      99.25     2.47      0.00
08088000     2    18.96     1415.07    37.62     254.48     5.19      0.00
08088610     3    18.92     1588.43    39.86     245.20     6.51      0.77
08089000     4    21.99     1864.03    43.17     256.66     7.78      0.96
08090800     5    28.09     3293.39    57.39     377.18     8.17      0.75
08091000     6    31.68     4502.97    67.10     423.62     9.00      0.35
08093100     7    44.57     7277.09    85.31     652.14    16.69      0.65
08096500     8    71.79    15493.18   124.47     807.02    27.82      0.94
08098290     9    89.79    20146.78   141.94     904.16    35.25      3.04
08111500    10   235.47    84091.05   289.98    1646.06   104.30     10.07
08114000    11   248.88    90336.52   300.56    1727.61   117.87     11.88
08116650    12   258.96   101733.94   318.96    1998.03   135.71      6.58

a Streamflow is measured in m3 s−1.

4.1.2. Results

[46] Prior to the MIMR ranking, the continuous streamflow time series of each station was discretized using equation (6). A too small value of $a$ would make the information contents retained by the stations indistinguishable, leading to inconvenience for further analysis. As illustrated in the top plot of Figure 2, the information content measured by marginal entropy was similar among stations due to a small value of $a$ (1.0 m3 s−1). On the other hand, an overly large $a$ would render stations with relatively small values insignificant. As in the bottom plot of Figure 2, the large $a$ (1000 m3 s−1) caused stations with small discharges to be noninformative (e.g., gauges with ID 1, 2, and 3). That is because equation (6) with a large $a$ can be considered a "high-pass filter," by which values below a high threshold are converted to the same integer. In both of the above cases, the spatial variability of streamflow was distorted after discretization. From the middle plot of Figure 2, corresponding to an appropriate value of 150 m3 s−1 for $a$ determined by trial and error, one can see that all stations were informative and distinguishable, and the spatial variability of streamflow among stations was preserved. To confirm the rationale of 150 m3 s−1 for $a$, a sensitivity analysis was carried out by comparing station ranks corresponding to different integer values of $a$ varying from 130 m3 s−1 to 170 m3 s−1. Results showed that the station ranks obtained by MIMR were stable: the top 9 selected stations remained invariant no matter how $a$ changed. These observations together empirically justified the value of 150 m3 s−1 for $a$.

Figure 2.

Marginal entropy after discretization and standard deviation at the original scale of streamflow of each station. The discretization parameters are 1.0 m3 s−1, 150.0 m3 s−1 and 1000.0 m3 s−1 for each plot from top to bottom, respectively.

[47] The marginal entropy map, multivariate joint entropy, and total correlation provide an overall picture of the information content of all candidate stations. Figure 2 offers an implicit entropy map. The station with minimum marginal entropy was located at the most upstream point on the river, whereas the one with maximum entropy was found at the most downstream location. We also detected an increasing trend of information content from upstream to downstream, which was consistent with the empirical conclusion based on the streamflow variance. Following this, we suspected that the information amount increased as the controlled drainage area accumulated. The joint entropy of all stations was 4.10 bits, representing the maximum information amount that could be extracted from them. The total correlation was 8.78 bits, signifying the amount of duplicated information among them. The sum of the marginal entropies of all stations was 12.88 bits, approximately 47% larger than the total correlation, implying the gauging system was of an information-scattered type.

[48] With the information weight λ1 set to 0.8, the MIMR selection was implemented to rank the 12 gauges. Justification of the value 0.8 for λ1 is detailed in section 4.1.4. Results of the MIMR selection are presented in Table 3. We noted the following: (1) as expected, the "central" station, with ID 12, was the one with maximum marginal entropy, located in the most downstream area; (2) unlike the rank derived from marginal entropy or variance, the MIMR-based priority rank did not exhibit an increasing pattern from upstream to downstream; (3) the joint entropy of stations in the optimal set did not always increase as more stations were added; (4) the top 6 stations explained roughly 80% of the total information retained by the 12 stations; (5) generally, the multivariate transinformation $T_1$ and $T_2$ first increased and then decreased as the selection proceeded; and (6) the total correlation increased as each new station was added to the optimal set.

Table 3. Joint Entropy, Transinformation, and Total Correlation of Ranked Streamflow Gauges Obtained by MIMR With λ1 = 0.8 and by WMPs

Iteration Step
       1     2     3     4     5     6     7     8     9     10    11    12
MIMR
ID     12    6     1     8     2     3     4     7     5     9     10    11
H      2.47  2.84  2.87  3.21  3.23  3.23  3.23  3.32  3.33  3.52  3.93  4.10
T1     6.57  7.12  7.76  7.42  7.13  6.78  6.37  5.81  5.24  4.12  2.28
T2     2.17  2.54  2.59  2.77  2.78  2.77  2.77  2.76  2.58  2.32  2.28
C      0.00  0.27  0.34  1.09  1.40  1.75  2.16  2.82  3.38  4.53  6.50  8.78

WMP1
ID     12    9     7     6     5     4     3     2     1
H      2.47  3.07  3.21  3.36  3.37  3.37  3.38  3.38  3.38
T1     6.57  7.22  6.89  6.81  6.25  5.84  5.51  5.17  5.07
T2     2.17  2.59  2.66  2.80  2.72  2.69  2.70  2.62  2.58
C      0.00  0.73  1.34  1.82  2.38  2.79  3.13  3.47  3.57

WMP2
ID     12    9     7     6     5     4     3     2     1
H      2.47  3.07  3.21  3.36  3.37  3.37  3.38  3.38  3.38
T1     6.57  7.22  6.89  6.81  6.25  5.84  5.51  5.17  5.07
T2     2.17  2.59  2.66  2.80  2.72  2.69  2.70  2.62  2.58
C      0.00  0.73  1.34  1.82  2.38  2.79  3.13  3.47  3.57

WMP3
ID     12    9     6     11    10    8     5     2     1
H      2.47  3.07  3.26  3.57  3.86  4.01  4.02  4.06  4.08
T1     6.57  7.22  7.26  5.29  3.20  2.31  1.79  1.56  1.46
T2     2.17  2.59  2.76  2.65  1.41  1.11  0.98  0.98  0.95
C      0.00  0.73  1.20  3.32  5.42  6.36  6.92  7.22  7.32

[49] Observation 2 indicates that siting gauges simply based on marginal entropy cannot produce an optimal network, but leads to one with either low transinformation or high duplicated information. Concerning observation 3, theoretically, the joint entropy is nondecreasing as more stations are added; it remains invariant if and only if the information of newly added stations is thoroughly duplicated among those already selected. The invariant joint entropy over steps 5, 6, and 7 is not surprising if one looks at the close locations of the stations selected during these steps (ID 2, 3, and 4; Figure 1). The decreasing transinformation in observation 5 is explained by the fact that fewer and fewer stations were left in the candidate set as the selection proceeded. Figure 3 (MIMR) shows the spatial locations of the top 6 selected stations. They are nearly uniformly distributed along the stream, which is desirable for obtaining the areal mean value. Even though the spatially averaged streamflow is of little practical interest, the MIMR criterion is generic for hydrometric network evaluation and design, and is of course suitable for siting gauges across a rainfall field, for which the areal mean value is important.

Figure 3.

Spatial distribution of the top 6 streamflow gauges ranked by MIMR with λ1 = 0.8 and by WMPs.

4.1.3. Comparison With WMPs

[50] We compared the performance of the MIMR selection with another entropy-based method developed by Alfonso et al. [2010a], i.e., water level monitoring design in polders (WMP), whose objective function is generalized as:

$$\max_{X_j \in F} H(X_j) \quad \text{subject to} \quad T(X_i, X_j) < t \;\; \forall\, X_i \in S \qquad (16)$$

where X_i and X_j represent stations in sets S and F, respectively. Other than transinformation, two other criteria were used by WMP to measure the dependence of stations. One was the directional information transfer index

$$\mathrm{DIT}_{ij} = \frac{T(X_i, X_j)}{H(X_i)} \qquad (17)$$

The other was also the directional information transfer index, but from X_j to X_i, i.e.,

$$\mathrm{DIT}_{ji} = \frac{T(X_i, X_j)}{H(X_j)} \qquad (18)$$

For simplicity, WMPs with transinformation, DIT_ij, and DIT_ji are denoted by WMP1, WMP2, and WMP3, respectively. The objective functions for WMP2 and WMP3 can be obtained by replacing the transinformation in equation (16) with DIT_ij and DIT_ji, respectively. A brief description of the step-by-step selection of WMPs can be found in the Appendix. Be aware that the objective function of WMPs is generalized here so that their selection concerns can be explicitly embodied; implementation of WMPs should strictly follow the procedure in the Appendix.
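For discrete series, the transinformation and a DIT-style normalization can be computed directly from frequency counts; the function names below are ours, and the toy series are illustrative:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete series."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def transinformation(x, y):
    """T(X, Y) = H(X) + H(Y) - H(X, Y): information shared by two stations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def dit(x, y):
    """Directional information transfer index: the fraction of H(X) that is
    recoverable from Y (0 = independent, 1 = fully dependent)."""
    return transinformation(x, y) / entropy(x)

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
print(transinformation(x, x))  # a station shares H(X) = 1 bit with itself
print(dit(x, y))               # independent series -> 0.0
```

Normalizing by the marginal entropy makes the index comparable across stations with different information contents, which is the motivation for using DIT instead of raw transinformation in WMP2 and WMP3.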

[51] A summary of the entropy terms associated with WMPs is tabulated in Table 3. First, we compared the performance of MIMR and WMPs in finding stations with high information content. The joint entropy of the MIMR-ranked stations was slightly smaller than that of the stations obtained by WMPs: the top 6 stations selected by MIMR explained approximately 80% of the total information, whereas the same number of top stations derived from WMP1 and WMP2 explained 82%. From the point of view of joint entropy, the performance of MIMR was thus comparable to that of WMP1 and WMP2. For WMP3, the top 6 stations explained 98% of the total information. This comparison might seem unfavorable to MIMR; however, a network is considered optimal not merely because of its high joint entropy, but because it balances its performance in the three respects mentioned in section 3.

[52] Second, we compared the performance of MIMR and WMPs in locating stations with high information transmission ability. Looking at T1 in Table 3, one can see that MIMR outperformed WMPs, indicating the decent information transmission ability of the network obtained from MIMR. From the point of view of T2, the performance of MIMR was not always superior to that of WMPs. This is unsurprising, since in this study MIMR used function (14a) as the objective, in which T2 is not accounted for. Even so, MIMR outperformed WMPs once more than 4 stations were selected. The above analysis also suggests that a hydrometric network with high joint entropy does not necessarily have decent information transmission ability.

[53] Third, we evaluated the performance of MIMR in searching for independent stations. The total correlation of the top 6 stations selected by MIMR accounted for less than 20% of the duplicated information among the 12 stations, whereas the percentages for the same number of top stations derived from WMPs ranged approximately from 32% to 72%, as inferred from Table 3. Apparently, MIMR is more effective in finding independent stations.

[54] Finally, an interesting observation emerged from the spatial distribution of the top-ranked stations obtained by the different approaches (Figure 3). Compared with the stations selected by MIMR, the stations obtained from WMPs clustered along the stream, which explains why they contained more duplicated information. Screening out such clustering is desirable in a data collection network; in this sense, MIMR merits more confidence.

4.1.4. Sensitivity Analysis of λ1

[55] Since the primary purpose of data collection is to obtain information, the sensitivity analysis was carried out only for information weights between 0.5 and 1.0. Results are summarized in Table 4 and mainly signify the stability of MIMR with respect to the information weight. In this case, the stability of MIMR was mainly due to the relatively small number of candidate stations. When an existing network is dense, special attention should be paid to the tradeoff weights.

Table 4. IDs of Selected Stations Step By Step Using the MIMR Criterion With Different Information Weights
λ1     Iteration Step
       1    2    3    4    5    6    7    8    9    10   11   12
0.5    12   6    1    2    3    4    7    5    8    9    10   11
0.6    12   6    1    2    3    7    4    5    8    9    10   11
0.7    12   6    1    2    7    3    4    5    8    9    10   11
0.8    12   6    1    8    2    3    4    7    5    9    10   11
0.9    12   6    1    8    2    3    4    7    5    9    10   11
1.0    12   6    1    8    2    3    4    7    5    9    10   11

[56] As previously stated, the tradeoff weights should reflect the user's knowledge of, and preference about, the hydrometric network under consideration. In practice, however, such prior information or experience may not always be available, and even when it is, selecting optimal weights is still challenging. One question arises: how can a suitable information weight be determined without prior knowledge of the system? Investigating the performance of MIMR with different tradeoff weights can provide useful hints. As inferred from Figure 4, a more informative but less independent network can be obtained by increasing the information weight. Maximum information and minimum redundancy are two conflicting objectives, and it is better to balance them to reach a satisfactory result. The information-redundancy tradeoff weights provide a flexible "handle" to control this balance. As seen from Figure 4, a value of 0.8 for the information weight led to a decent balance between information and redundancy, which justified selecting 0.8 for λ1 in this case study.

Figure 4.

Behavior of different entropy terms of stations selected by MIMR with different information-redundancy tradeoff weights and by WMPs.

[57] The behavior of the different optimality measures associated with WMPs is also presented in Figure 4. As the information weight increased, MIMR moved toward WMP1 and WMP2. MIMR performed better in maximizing the information transmission ability and in avoiding redundant stations; concerning the joint entropy, MIMR performed slightly worse than WMPs. An interesting result worth noting is that the total correlation (redundancy) of the MIMR selection remained smaller than that of WMPs even when the information weight was 1.0, i.e., when the objective of minimizing redundancy was excluded from the selection. Figure 4 also inspires a viable approach to guide the selection of tradeoff weights: using WMPs as benchmark methods, the information weight is adjusted until a satisfactory result is obtained.

4.2. Case 2: Polder Water Level Monitoring Network Design

4.2.1. Study Area and Data

[58] Water level time series of a highly controlled polder system reported by Alfonso et al. [2010a] were used in this case study. The polder system is located in a low-lying region of Pijnacker, Delfland, Netherlands. There were in total 1520 potential monitoring points along the canals, separated by a distance of 15 m on average, as depicted in Figure 5. Water level time series for these points were generated from a hydrodynamic model built by the Delfland Water Board. Driven by a given storm event, the model was run for a simulation period of 10 days at a time step of 15 min, resulting in 1520 synthetic water level time series, each with a length of 960. At each computational point, only the first 792 records were used in the present analysis due to data availability.

Figure 5.

Marginal entropy map of the 1520 calculation points used in this study. Points with null entropy are highlighted in black.

4.2.2. Results

[59] The generated continuous water level time series were first discretized by equation (6) with a value of 5 cm for the parameter a. This value was determined from the physical consideration that water level variations smaller than 5 cm are caused by wind, ship movement, or dynamic waves generated by the operation of pumping stations [Alfonso et al., 2010a]. These kinds of variations should be treated as noise, and the role of the mathematical floor function is to filter them out and produce a noise-free pulse signal. This example illustrates the advantage of the mathematical floor function in incorporating physical considerations.
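A floor-based quantization consistent with this description can be sketched as follows; the function name and the sample water levels are illustrative assumptions, and only the 5 cm bin width comes from the text:

```python
import math

def quantize(series, a=0.05):
    """Map each reading down to a multiple of the bin width `a` (in meters),
    so fluctuations smaller than `a` collapse into the same discrete level."""
    return [a * math.floor(x / a) for x in series]

# The first three readings differ by less than 5 cm and collapse into the
# same noise-free level; the fourth crosses a bin boundary.
levels = [1.012, 1.038, 1.049, 1.061]
print(quantize(levels))
```

Because the bin width, rather than a fitted distribution, controls the discretization, a physical noise threshold translates directly into the subsequent entropy calculations.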

[60] The marginal entropy for each simulation point was superimposed on the location map in Figure 5. The 16 null-entropy points, highlighted in black, correspond to pumping stations discharging to large storage bodies, where fixed boundary conditions for water level must be satisfied. Generally, whether points with static observations should be retained in the final monitoring set can be decided in advance based on management considerations; they were therefore left out of the following analysis.

[61] Figure 5 provides an overall picture of the information content of the polder system. Besides the null-information points, a low-information area was identified in the southern part of the system, and a short canal segment with extremely high information content was found in the northwestern part, implying a suitable area for siting the "central" monitor. Excluding the null-information points, the joint entropy of the 1504 monitors was 9.09 bits, which represents the maximum amount of information obtainable from the system. The total correlation was 3411.49 bits, which quantifies the duplicated information of the system. The sum of the marginal entropies was 3420.58 bits, almost equal to the amount of duplicated information, suggesting that the synthetic network was of an information-clustered type.
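The three quantities reported above are linked by the identity C = Σ H(X_i) − H(X_1, …, X_n), which can be checked on toy data; the helper names below are ours, not from the paper:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete series (tuples allowed)."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def total_correlation(series_list):
    """C = sum of marginal entropies minus joint entropy: the amount of
    information duplicated across the stations."""
    joint = list(zip(*series_list))  # sample-wise merge of all stations
    return sum(entropy(s) for s in series_list) - entropy(joint)

x = [0, 0, 1, 1, 2, 2]
duplicate = x[:]                  # a fully redundant second station
independent = [0, 1, 0, 1, 0, 1]  # a station independent of x
print(total_correlation([x, duplicate]))    # equals H(x): all info duplicated
print(total_correlation([x, independent]))  # ~ 0: nothing duplicated
```

For the polder network this identity ties together the 3420.58 bits of summed marginal entropy, the 9.09 bits of joint entropy, and the 3411.49 bits of total correlation.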

[62] A summary of the entropy terms for the top 10 monitors obtained by MIMR with a value of 0.8 for the information weight is presented in Table 5. The joint entropy reached a plateau after 9 monitors were selected, which was used as the convergence indicator. The top 9 monitors explained about 84.9% of the total information, and roughly 85% was explained after adding one more station. Since the increment was no more than 0.1%, it was reasonable to choose the top 9 monitors as the final set. Transinformation T1 was much larger than T2. This was expected, since the duplicated information was filtered out in the variable merging process.

Table 5. Joint Entropy, Transinformation, and Total Correlation of Monitors Obtained by MIMR With λ1 = 0.8 and by WMPs
      Iteration Step
      1        2        3        4        5        6        7        8        9        10
MIMR
H     4.25     5.53     6.37     6.92     7.12     7.25     7.57     7.70     7.72     7.73
T1    2581.58  2942.34  3126.60  3201.55  3215.89  3229.46  3281.50  3306.45  3306.40  3306.19
T2    4.24     5.52     6.37     6.91     7.11     7.24     7.56     7.70     7.71     7.71
C     0.00     1.76     3.88     6.10     6.89     7.31     9.44     10.83    11.37    11.87

WMP1
H     4.25     5.42     5.74     6.16     6.46     6.65     6.69     6.75     6.77     6.97
T1    2581.58  2915.38  3028.62  3076.65  3099.92  3143.04  3151.82  3157.58  3161.70  3159.53
T2    4.24     5.42     5.74     6.16     6.45     6.63     6.68     6.73     6.75     6.94
C     0.00     1.63     3.70     5.72     7.60     9.83     11.95    14.05    16.17    18.18

WMP2
H     4.25     5.42     5.74     6.16     6.45     6.65     6.69     6.73     6.75     6.80
T1    2581.58  2915.38  3031.15  3077.77  3098.63  3142.14  3149.10  3152.33  3156.54  3159.98
T2    4.24     5.42     5.74     6.16     6.44     6.62     6.67     6.70     6.73     6.78
C     0.00     1.63     3.65     5.67     7.55     9.78     11.90    14.03    16.14    18.34

WMP3
H     4.25     5.43     5.72     6.38     6.94     6.96     6.97     6.99     6.99     7.05
T1    2581.58  2918.61  2933.07  3081.56  3195.22  3195.47  3193.73  3192.24  3188.90  3190.04
T2    4.24     5.43     5.72     6.37     6.93     6.95     6.97     6.99     6.99     7.05
C     0.00     2.16     5.72     8.29     10.94    15.03    18.37    22.43    25.64    29.62

4.2.3. Comparison With WMPs

[63] The different entropy terms corresponding to MIMR and WMPs are also presented in Table 5. More or less the same results were obtained as in case 1, but the better performance of MIMR was more apparent. The joint entropy of the stations selected by MIMR was larger than that of the stations selected by WMPs, confirming the decent performance of MIMR in locating stations with high information content. Both T1 and T2 obtained by MIMR were higher than those obtained by WMPs. Even though equation (14a) was used as the objective in the MIMR selection, implying that T2 was not accounted for, T2 obtained by MIMR was still larger than that obtained by WMPs. This is another interesting finding about MIMR. The network selected by MIMR also contained notably less redundant information than the networks obtained by WMPs, as indicated by the total correlation.

[64] Spatial locations of the selected stations are presented in Figure 6. The top 10 monitors derived by MIMR were widely spaced without clusters across the canal system, confirming the good performance of MIMR in determining a network with minimum redundant information. The spatial distributions of the top 10 stations ranked by WMP1 and WMP2 were almost the same except for the station selected at the 10th step, which explains why the entropy terms associated with WMP1 and WMP2 in Table 5 are similar. Compared with the widely spaced stations determined by MIMR, station clusters were found in the networks determined by WMPs, especially by WMP3. This signifies the good performance of MIMR in determining an unbiased monitoring network, as long as the information weight is properly selected. A monitoring network with clustered stations is considered biased in that the underlying spatial variability of the variable of interest cannot be realistically discerned from the data it collects.

Figure 6.

Spatial distribution of the top 10 stations ranked by MIMR with λ1 = 0.8 and by WMPs.

4.2.4. Sensitivity Analysis of λ1

[65] The right panel of Figure 4 displays the results of the sensitivity analysis, which are similar to those obtained in case 1. As can be seen, the performance of MIMR approached and then surpassed that of WMPs with respect to multivariate joint entropy and transinformation as the information weight increased from 0.5 to 1.0. From the viewpoint of total correlation, MIMR moved toward and then surpassed WMP1 and WMP2 but never reached WMP3. Again, even when the redundancy weight was null, the MIMR criterion still performed better than WMP3 in finding independent stations.

[66] Spatial locations of the stations selected by MIMR with different information weights are presented in Figure 7. When the information weight was small, a cluster of stations with small information content was always identified. A small information weight corresponds to a large redundancy weight, with which the total correlation has more influence on the objective function; therefore, stations introducing less redundant information were selected. The points with small information content in F shared a small amount of information with the stations in S, which is why clustered stations were selected from the low-information area. It was also noted that as the information weight increased, the selected stations first became widely spaced and then became clustered again (e.g., for λ1 = 1.0), this time in the high-information area. In view of this, it is suggested again to use WMPs as benchmark methods and to modulate the information weight until a satisfactory result is obtained. This provides a practical way of determining suitable information-redundancy tradeoff weights.

Figure 7.

Spatial distribution of selected stations by MIMR with different information-redundancy tradeoff weights.

5. Merits and Demerits of MIMR

[67] It is instructive to compare the criteria of different approaches that have appeared in the literature, to highlight the merits and demerits of the MIMR criterion.

[68] The idea of the information maximization principle [Caselton and Husain, 1980; Husain, 1987, 1989; Al-Zahrani and Husain, 1998; Yoo et al., 2008] is to select a set of p stations from a dense network of N stations such that

$$\max_{\{X_s\}} T\left(X_s;\, X_{N-p}\right) \qquad (19)$$

where X_s represents the stations in the possible p-combination of the N stations and X_{N-p} denotes the stations outside of the p-combination. The combination of p stations that satisfies equation (19) constitutes the optimally retained set. Redundancy of the network is not accounted for.

[69] The idea of Krstanovic and Singh [1992a, 1992b] is to select the "central" station first and then to sequentially select others that give the lowest reduction in uncertainty. The objective is generalized as

$$\min_{X_i} T\left(X_1, X_2, \ldots, X_{i-1};\, X_i\right) \qquad (20)$$

where X_i represents the station selected in the ith step. Equation (20) focuses on avoiding redundant information in the network and explicitly neglects the network's ability to maximize the joint entropy and the transinformation between selected and unselected stations.

[70] A two-phase approach was described by Yang and Burn [1994]. The first phase uses a hierarchical clustering technique to form groups of hydrometric gauging stations; the second involves selecting a single station from each group for retention in the final network. The single station is selected such that the maximum fraction of its information can be transferred to the other stations in the group. Generally, this approach focuses on the information transmission ability of the network and on avoiding redundancy; it omits the ability to provide maximum joint entropy.

[71] Quantitative comparisons between WMPs and MIMR have been detailed above. One more point worth highlighting is that WMPs are implemented through matrix operations and are therefore more computationally efficient than the MIMR greedy selection. At each step of the MIMR selection, all stations remaining in the candidate set are scanned to find the optimal one, which considerably increases the computational burden.

[72] Another criterion was described by Alfonso et al. [2010b], which is generalized as

$$\begin{cases} \max \; H\left(X_1, X_2, \ldots, X_k\right) - \beta k \\ \min \; C\left(X_1, X_2, \ldots, X_k\right) \end{cases} \qquad (21)$$

where X_1, …, X_k represent the selected stations in the optimal set and β is a constant with cost units of bits per new station. Equation (21) fails to account for transinformation; its advancement is the consideration of the economic effect. It would be easy to incorporate a cost function into MIMR as well; the important point is to develop cost and benefit functions in information units. Compared with the approach used by Alfonso et al. [2010b] to solve the multiobjective problem, the greedy ranking algorithm for the MIMR selection is straightforward and much easier to implement. However, we caution that this simplicity comes at the cost of a reduced diversity of feasible solutions.

[73] Another limitation of the MIMR selection is that it is only suitable for choosing stations from a dense network; expanding a sparse network is beyond its capacity. Fortunately, this problem can be addressed by simulation. Physically based hydrologic or hydrodynamic models can be used to generate synthetic dense networks, like the polder water level monitors in case 2. Weather generators can produce synthetic rainfall fields [Wilks, 2009], and high spatial and temporal resolution radar rainfall products can also be exploited for this purpose [Volkmann et al., 2010]. It is cautioned that the synthetic dense network should preserve the properties of the observed network as much as possible; otherwise, the results should be interpreted with care.

6. Concluding Remarks

[74] A generic criterion for the evaluation and design of hydrometric networks based on entropy theory is presented. The proposed criterion, named maximum information minimum redundancy (MIMR), covers the following three optimality measures of a hydrometric network: (1) joint entropy, (2) transinformation, and (3) total correlation. In terms of MIMR, the selected network provides the highest information content and avoids dependent stations as much as possible, while guaranteeing that the stations within and outside of the optimal set share high common information. Based on MIMR, a greedy ranking algorithm is proposed to optimally select hydrometric stations from a dense network. The optimum number of stations is determined by the percentage of the total information that can be explained by the stations in the optimal set.

[75] With the aid of discrete variable merging, all of the entropy terms involved in MIMR can be easily evaluated at high dimension without any distributional assumption. The basic idea of variable merging is to create a new variable whose information content is equal to that retained by the original variables. This technique is only suitable for discrete data; in the context of hydrometric network design (or evaluation), continuous time series data should first be discretized, which is done here with a mathematical floor function. The floor function, on the one hand, avoids the choice of a parametric distribution to fit the continuous data and, on the other hand, can incorporate physical considerations. Three rules of thumb suggest how to select a proper discretization parameter for a general hydrometric system, even when no clear physical consideration is available.
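The variable-merging idea summarized above can be sketched as follows: assign each distinct pair of discrete values a fresh label, so the merged series retains exactly the joint information of the originals. The function names are ours, and the pairing scheme is one simple way to realize the idea:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete series."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def merge(x, y):
    """Merge two discrete series into one whose entropy equals the joint
    entropy H(X, Y): each distinct (x, y) pair gets a unique integer label."""
    labels = {}
    return [labels.setdefault(pair, len(labels)) for pair in zip(x, y)]

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
merged = merge(x, y)   # [0, 1, 2, 3]
print(entropy(merged))  # 2.0 bits = H(X, Y)
```

Merging pairwise and repeatedly lets high-dimensional joint entropies be evaluated without estimating a multivariate distribution.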

[76] The MIMR selection is quantitatively compared with another entropy-based approach, i.e., WMPs. Results show that MIMR is better at finding stations with high information content and at locating independent stations. The information-redundancy tradeoff weights provide the user a flexible handle to balance the two conflicting objectives: maximum information and minimum redundancy. The tradeoff weights reflect the user's knowledge about the hydrometric network; when such knowledge is unavailable, a recommended approach is to use the computationally efficient WMPs as benchmarks and adjust the information weight until a satisfactory result is obtained. In this study, we only quantitatively compared MIMR with WMPs. Further comparisons with other approaches are merited, especially with Alfonso et al. [2010b], to see the difference between monitoring networks determined by greedy selection and by multiobjective optimization.

[77] Even though there are many ways to define the objective for the design of a hydrometric network, MIMR focuses on the fundamental theme, i.e., selecting an optimum number of stations and their optimum locations. Other considerations, like the cost of placing new stations, are not included. Nevertheless, they can easily be incorporated into the objective by introducing an extra penalty function; the crucial point is to find an appropriate way to measure the economic cost in information units. This problem is left for future work.

Appendix A: The Step-by-Step Implementation Procedure of WMPs

[78] Following Alfonso et al. [2010a], the step by step selection procedure of WMPs is generalized as:

[79] 1. Collect data of the hydrometric variable of interest, X_i, at each candidate station (i = 1, 2, …, n, where n is the number of candidates) and discretize them using equation (6).

[80] 2. Calculate the marginal entropy H(X_i) for each X_i with the use of equation (1).

[81] 3. For each X_i, calculate the transinformation in equation (4) with respect to each of the remaining points and build the symmetric matrix

$$\mathbf{T} = \begin{bmatrix} T_{1,1} & T_{1,2} & \cdots & T_{1,n} \\ T_{2,1} & T_{2,2} & \cdots & T_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ T_{n,1} & T_{n,2} & \cdots & T_{n,n} \end{bmatrix}, \qquad T_{i,j} = T(X_i, X_j)$$

[82] For WMP2 and WMP3, the transinformation is replaced by the directional information transfer indices in equations (17) and (18), respectively.

[83] 4. Select the first monitor, located at the point with the highest information content as measured by marginal entropy, and add it to the selected set S.

[84] 5. Recover the transinformation vector of the first monitor, i.e., its row of the matrix T.

[85] 6. Divide the candidates into two sets with respect to their dependence on the first monitor: those that are dependent and those that are independent. The independent set is obtained by looking at the elements of the transinformation vector that are less than a given threshold.

[86] 7. From the independent set, select the second monitor, which has the highest marginal entropy.

[87] 8. Recover the transinformation vector of the second monitor, i.e., its row of the matrix T.

[88] 9. Select the next monitor in a similar way, but now using an updated independent set given by the common set of independent points in the overlapping transinformation vectors of the previously selected monitors.

[89] 10. Repeat the procedure from step 5 until the next monitor does not provide significant information content or until the independent set is empty.
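Steps 4-10 above can be condensed into a short greedy sketch; the function names, toy station records, and threshold value are illustrative assumptions, and the stopping test on "significant information content" is simplified to exhausting the independent set:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a discrete series."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def transinfo(x, y):
    """T(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def wmp_select(stations, threshold=0.1):
    """Greedy WMP-style ranking: repeatedly pick the highest-entropy
    candidate whose transinformation with every already-selected monitor
    stays below `threshold`."""
    selected = []
    candidates = set(stations)
    while candidates:
        # steps 6/9: keep only candidates nearly independent of all monitors
        indep = [c for c in sorted(candidates)
                 if all(transinfo(stations[c], stations[s]) < threshold
                        for s in selected)]
        if not indep:          # step 10: independent set is empty
            break
        best = max(indep, key=lambda c: entropy(stations[c]))  # steps 4/7
        selected.append(best)
        candidates.remove(best)
    return selected

stations = {
    "A": [0, 0, 1, 1, 2, 2],
    "B": [0, 0, 1, 1, 2, 2],   # duplicates A, so it is screened out
    "C": [0, 1, 0, 1, 0, 1],   # independent of A
}
print(wmp_select(stations))    # -> ['A', 'C']
```

The fully redundant station B is excluded because its transinformation with the already-selected A exceeds the threshold, which mirrors the dependent/independent split in step 6.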

Acknowledgments

[90] The authors would like to express their sincere appreciation to Leonardo Alfonso, at the Hydroinformatics and Knowledge Management, UNESCO-IHE, Delft, Netherlands, who provided part of the data used in this paper. Three anonymous referees raised constructive comments. Their help in improving this paper is gratefully acknowledged. The work was financially supported in part by the United States Geological Survey (USGS, Project ID: 2009TX334G) and TWRI through the project “Hydrological Drought Characterization for TX under Climate Change, with Implications for Water Resources Planning and Management.”