Keywords:

  • epistemic uncertainty;
  • aleatory uncertainty;
  • random set theory;
  • groundwater flow;
  • transport modeling

Abstract

[1] The characterization of aleatory hydrogeological parameter uncertainty has traditionally been accomplished using probability theory. However, when consideration is given to epistemic as well as aleatory uncertainty, probability theory is not necessarily appropriate. This is especially the case where expert opinion is regarded as a suitable source of information. When experts opine upon the uncertainty of a parameter value, both aleatoric and epistemic uncertainties are introduced and must be modeled appropriately. A novel approach to characterizing expert-provided parameter uncertainty can be defined that bridges a historical gap between probability theory and fuzzy set theory. Herein, a random set, a generalization of a random variable, is employed to formalize expert knowledge, and fuzzy sets are used to propagate this uncertainty to model estimates of contaminant transport. The resultant random set-based concentration estimates are shown to be more general than the corresponding random variable estimates. In some cases, the random set-based results take the form of upper and lower probabilities that bound the corresponding random variable's cumulative distribution function.

1. Introduction

[2] Uncertainty in groundwater flow and transport modeling comes in two forms: aleatory and epistemic. Such distinctions are most often drawn in risk assessment and reliability engineering [Helton et al., 2000a, 2000b, 2004; Hofer et al., 2002; Helton and Oberkampf, 2004; Oberkampf et al., 2004], and only recently have they been identified in hydrogeological applications [Srinivasan et al., 2007]. Aleatory uncertainty, also called stochastic uncertainty or variability, refers to uncertainty that cannot be reduced by more exhaustive measurements or a better model. Epistemic uncertainty, or subjective uncertainty, on the other hand, stems from a lack of knowledge and can therefore be reduced, for example by additional data or improved models.

[3] Despite these apparent distinctions in uncertainty, probability theory alone has traditionally been used to characterize both forms of uncertainty in engineering applications [Apostolakis, 1990; Helton et al., 2004]. While it is commonly accepted that probability theory is ideal for the characterization of aleatory uncertainty [Ganoulis, 1996], the facility with which probability theory effectively captures epistemic uncertainty has been called into question [O'Hagan and Oakley, 2004], especially given the introduction of a number of alternative methods of epistemic uncertainty characterization [Choquet, 1954; Zadeh, 1965, 1978; Shafer, 1976].

[4] One such method, fuzzy set theory, has failed to gain acceptance in engineering, much less hydrogeological, applications. A possible reason for this is the necessary paradigm shift one must make in order to apply fuzzy set theory to uncertainty characterization.

[5] Random set theory [Zadeh, 1965] provides an intuitive means for both epistemic and aleatory uncertainty characterization. Whereas probability theory's basic tool for uncertainty characterization is the probability density function (PDF), discrete random set theory is predicated upon the assignment of probabilities to intervals rather than to points, as is done with discrete probability distributions. As such, a random set is a generalization of a random variable, since an interval is less precise than a point value. The use of random set theory is more appropriate for the representation of subjective knowledge because it does not rely upon means, variances, and probabilistic models, which are inconsistent with the nature of human thought and discourse.

[6] Random set theory [Helton and Oberkampf, 2004; Joslyn and Kreinovich, 2005], however, is a general approach to subjective knowledge characterization, and in addition, random sets can be transformed into fuzzy sets [Joslyn and Booker, 2004; Joslyn and Ferson, 2004] with little difficulty. As will be shown, the transformation from random sets to fuzzy sets facilitates the efficient solution of groundwater flow and transport model equations characterized by uncertainty.

[7] Though a few applications of fuzzy set theory to expert knowledge characterization in hydrogeologic applications have been published [Bardossy et al., 1989, 1990a, 1990b, 1990c; Bagtzoglou et al., 1996; Dou et al., 1995, 1997a, 1997b, 1999; Fang and Chen, 1997; Demicco and Klir, 2004; Guan and Aral, 2004; Ozbek and Pinder, 2006; Ross et al., 2006, 2007, 2008], fuzzy sets are not a mainstream method of uncertainty characterization in hydrogeology. Consequently, an explanation of the distinctions between probability and fuzzy set theories, and between aleatoric and epistemic uncertainties, in the characterization of hydrogeological uncertainty is needed.

[8] The purpose of this paper is to frame uncertain hydraulic conductivity information in terms of aleatory and epistemic uncertainty, to show how it is related to random set theory, and to demonstrate how it can be used in groundwater flow and transport modeling. To this end, we present the random set characterization of hydraulic conductivity using both uncertainty types and the corresponding simultaneous use of random set, probability and fuzzy set theories. We also describe how to propagate both types of uncertainty in model estimates of concentration. In doing so, we provide a reasonable method of uncertainty characterization that is compatible with both probability and fuzzy set theories.

2. Problem Statement

2.1. Site Information

[9] In groundwater modeling problems where hydraulic conductivity measurements are few, a hydraulic conductivity field is often assumed to be composed of a few large subdomains of uniform hydraulic conductivity, like the simplified representation of the Woburn, Massachusetts, site presented in Figure 1, which we use as our illustrative example problem. A small number of hydraulic conductivity measurements are available in each of these subdomains. Total correlation is assumed within each unit, and zero correlation is assumed between units. Constant head conditions are specified on the left (22.9 m) and right (38.1 m) boundaries, and no-flow boundary conditions are specified along the top and bottom of the domain. Contaminant sources (gray ovals in Figure 1) are located in formations four and five at concentrations of 2000 ppb and 1500 ppb, respectively. Finally, two pumping wells are placed in formation two: wells A and B pump at 8.19 × 10⁻⁴ m³/s and 4.91 × 10⁻⁴ m³/s, respectively.

Figure 1. Inversely estimated hydraulic conductivity field; different numbers identify fields with distinct hydraulic conductivity values. Gray ovals represent contaminant source locations. Dots denoted as A and B are pumping wells.

[10] The Princeton Transport Code (PTC), a three-dimensional finite element simulator, was employed to model groundwater flow and transport, with a mesh spacing of 15.2 m everywhere except at the well locations, where it is refined to 3.0 m. Dispersivity, storativity, and porosity are set to 0.3 m, 0.0001, and 0.2, respectively, throughout the domain. These are the default PTC values for these parameters and were deemed adequate because the purpose of the study was the novel characterization of hydraulic conductivity uncertainty and of the resulting model estimates of concentration. Although the low dispersivity in conjunction with this mesh spacing implies a high Peclet number and, as a result, the possibility of significant numerical errors in the finite element transport model, automatic upstream weighting of the convection term compensates for small dispersivities.
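As a rough check on the Peclet-number remark above, the grid Peclet number can be estimated as Pe ≈ Δx/α_L when mechanical dispersion (D ≈ α_L·v) dominates molecular diffusion. The sketch below assumes that the quoted mesh values are nodal spacings Δx and uses the widely cited rule of thumb Pe ≤ 2 for oscillation-free Galerkin finite element transport; neither assumption comes from the PTC documentation.

```python
# Grid Peclet number estimate for the mesh described in the text.
# Assumptions: the mesh values are nodal spacings (m), and dispersion
# dominates diffusion, so D ~ alpha_L * v and Pe = v * dx / D ~ dx / alpha_L.


def grid_peclet(dx_m: float, alpha_l_m: float) -> float:
    """Approximate grid Peclet number for longitudinal dispersivity alpha_l_m."""
    return dx_m / alpha_l_m


ALPHA_L = 0.3  # dispersivity used throughout the domain (m)
PE_RULE_OF_THUMB = 2.0  # common threshold for oscillation-free Galerkin FE

for label, dx in [("regional mesh", 15.2), ("near wells", 3.0)]:
    pe = grid_peclet(dx, ALPHA_L)
    print(f"{label}: dx = {dx} m, Pe ~ {pe:.1f}, above threshold: {pe > PE_RULE_OF_THUMB}")
```

Both spacings give Pe well above 2 (about 50.7 and 10.0), which is consistent with the text's reliance on upstream weighting of the convection term.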

2.2. Traditional Approach (Confidence Intervals)

[11] For a given subdomain, a representative random variable hydraulic conductivity value can be constructed from the mean and variance of that subdomain's measurement data set. Because these measurements are themselves inherently uncertain, owing to measurement and inverse model uncertainty, neither the mean nor the variance of the measurement data set is necessarily representative of its true value. An appropriate expert familiar with the hydrogeology of the area may be asked to provide some measure of the uncertainty in the form of a 95% confidence interval. Given a mean value, the assumption that hydraulic conductivity is lognormally distributed, and this 95% confidence interval, a PDF defining the hydraulic conductivity random variable can be constructed. Such an approach to both epistemic and aleatoric uncertainty characterization is predicated strictly upon probability theory. The two sources of uncertainty, natural randomness and expert knowledge, are not distinguishable when both are built into a single probability distribution.

[12] The mean hydraulic conductivity values associated with the domain in Figure 1 are provided in Table 1, along with the expert-provided confidence intervals for the five formations and the resulting calculated variances. In this case, the expert possessed an awareness of the devices used to measure hydraulic conductivity at the various locations as well as an implicit understanding of the subdomain-wise homogeneity throughout the site. On the basis of this background knowledge, the intervals for three of the formations were specified to approximate 2 orders of magnitude variation in hydraulic conductivity. For simplicity, the remaining two formations were assigned zero variation.

Table 1. Hydraulic Conductivity Random Variable Properties for the Domain in Figure 1
Unit Number | Mean ln K (K in m/s) | 95% Confidence Interval | Variance
1 | −25.6 | [−27.4, −23.8] | 0.83
2 | −18.8 | [−18.8, −18.8] | 0
3 | −19.2 | [−20.9, −17.4] | 0.79
4 | −15.0 | [−17.2, −12.8] | 1.26
5 | −11.9 | [−11.9, −11.9] | 0
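The variances in Table 1 follow from the confidence intervals if each interval is read as a symmetric 95% interval on ln K, so that σ = (half-width)/z₀.₉₇₅ with z₀.₉₇₅ ≈ 1.96. The sketch below makes that assumption explicit using only Python's standard library; it reproduces the tabulated variances to within 0.02, the small differences for units 1 and 3 presumably reflecting rounding of the published interval endpoints.

```python
from statistics import NormalDist

# Two-sided 95% standard normal quantile, about 1.96.
Z_975 = NormalDist().inv_cdf(0.975)


def lnk_variance_from_ci(lower: float, upper: float) -> float:
    """Variance of ln K implied by a symmetric 95% confidence interval on ln K."""
    sigma = (upper - lower) / 2.0 / Z_975
    return sigma ** 2


# Expert-provided 95% intervals on ln K (K in m/s) from Table 1.
TABLE_1_INTERVALS = {
    1: (-27.4, -23.8),
    2: (-18.8, -18.8),
    3: (-20.9, -17.4),
    4: (-17.2, -12.8),
    5: (-11.9, -11.9),
}

for unit, (lo, hi) in TABLE_1_INTERVALS.items():
    print(f"unit {unit}: variance of ln K ~ {lnk_variance_from_ci(lo, hi):.2f}")
```

With the mean and this σ, the lognormal PDF of K is simply the normal PDF of ln K, for example `NormalDist(mean, sigma)`.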

[13] Consider some of the limitations of the purely probabilistic approach. Aside from blurring the two uncertainty sources (aleatory and epistemic) into a single probability distribution, the probabilistic form of the model estimates of concentration depends strongly upon the certainty with which the expert can define a 95% confidence interval, a rather abstract notion, and upon the appropriateness of the lognormality assumption. In fact, the longstanding assumption of lognormality for hydraulic conductivity may not be correct in all cases [Ricciardi et al., 1998; Mathon et al., 2009]. The true random variable may actually be best defined using an alternative probability distribution.

[14] Using rank-ordered Latin hypercube sampling [Zhang and Pinder, 2003], model estimates of concentration were determined with the flow and transport simulator. The uncertainty in these concentration estimates is sensitive to relatively small variations in the estimated hydraulic conductivity intervals. Note that a change of 2 orders of magnitude in a confidence interval is considered small relative to the range of hydraulic conductivity values one can encounter in the field, which, according to Domenico and Schwartz [1990], can span 11 orders of magnitude from clay to gravel.

[15] The results of the three cases of varying hydraulic conductivity uncertainty are summarized in Table 2 and plotted in Figure 2 as estimates of discrete concentration random variables. The cumulative distribution functions plotted are constructed from the realizations of concentration estimates that result from Latin hypercube sampling. The steeper distributions (case 2, black squares) result from the narrower hydraulic conductivity confidence intervals in Table 3. The longer, wider distributions (case 3, hollow circles) are the random variables resulting from a less certain expert, who provided wider confidence intervals. Moderate uncertainty (case 1, black circles) produces cumulative distributions situated between the two extreme cases. Thus, the opining expert's certainty regarding model input parameters can substantially change the model output random variables.
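The sampling-and-CDF workflow described above can be sketched as follows. This is a plain Latin hypercube sampler (one draw per equiprobable stratum, strata shuffled independently for each unit); it does not implement the rank-ordering step of Zhang and Pinder [2003]. The function `toy_concentration` is a hypothetical placeholder for a single PTC run at one node, and the means and variances are the case 1 values from Table 1.

```python
import math
import random
from statistics import NormalDist


def latin_hypercube_normal(means, sds, n, seed=0):
    """Plain Latin hypercube sample of independent normal variables (here ln K).

    For each variable, one probability is drawn inside each of n equiprobable
    strata and the strata are shuffled independently; no rank-correlation
    control is applied. Zero-variance units are held at their mean.
    """
    rng = random.Random(seed)
    columns = []
    for mu, sd in zip(means, sds):
        if sd == 0:
            columns.append([mu] * n)
            continue
        probs = [max((i + rng.random()) / n, 1e-12) for i in range(n)]
        rng.shuffle(probs)
        dist = NormalDist(mu, sd)
        columns.append([dist.inv_cdf(p) for p in probs])
    return list(zip(*columns))  # one tuple of ln K values per realization


def empirical_cdf(values):
    """(value, cumulative probability) pairs of a discrete empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def toy_concentration(lnk):
    # HYPOTHETICAL stand-in for one PTC run at one node (ppb); not the paper's model.
    return 2000.0 / (1.0 + math.exp(-(lnk[3] + 15.0)))


# Case 1 means and variances of ln K from Table 1.
MEANS = [-25.6, -18.8, -19.2, -15.0, -11.9]
SDS = [math.sqrt(v) for v in (0.83, 0.0, 0.79, 1.26, 0.0)]

samples = latin_hypercube_normal(MEANS, SDS, n=50)
cdf = empirical_cdf(toy_concentration(s) for s in samples)
print(cdf[0], cdf[len(cdf) // 2], cdf[-1])
```

Plotting the (value, probability) pairs in `cdf` gives a step-function CDF of the kind shown in Figure 2; widening or narrowing the intervals (cases 3 and 2) changes `SDS` and hence the spread of that curve.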

Figure 2. Random variables representing the concentrations at nodes 400 (top left), 234 (top right), 162 (bottom left), and 190 (bottom right). The variance changes throughout space and as the expert-provided hydraulic conductivity confidence intervals become narrower (case 2) or wider (case 3) than those of the base case (case 1). The data for these random variables are given in Table 2. The confidence intervals for the three cases are provided in Table 3.

Table 2. Locations and Concentration Statistics for the Nodes Whose Locations Are Plotted in Figure 1. Locations and concentration are in feet.
Node | Easting | Northing | Mean | Variance of Case 1 | Variance of Case 2 | Variance of Case 3
4001107635238149911042248
234159010306458292343609
162145068299811131768713
1901420634722414497913689
Table 3. Expert-Provided Hydraulic Conductivity Confidence Intervals for the Base Case, a High-Certainty Case, and a Low-Certainty Case
Unit Number | Confidence Interval of Case 1 (Base) | Confidence Interval of Case 2 (High Certainty) | Confidence Interval of Case 3 (Low Certainty)
1 | [−27.4, −23.8] | [−26.4, −24.8] | [−28.5, −22.6]
2 | [−18.8, −18.8] | [−18.8, −18.8] | [−18.8, −18.8]
3 | [−20.9, −17.4] | [−19.9, −18.4] | [−21.9, −14.4]
4 | [−17.2, −12.8] | [−16.6, −13.4] | [−18.5, −11.4]
5 | [−11.9, −11.9] | [−11.9, −11.9] | [−11.9, −11.9]
Note: Values are ln K, with conductivity in m/s.

3. Random Set Uncertainty

[16] An approximation of a hydraulic conductivity subdomain's uncertain hydraulic conductivity is provided by the discrete CDF calculated from the available measurements in that subdomain. However, as stated above, these individual measurements are themselves uncertain. Intuitively, then, this measurement error warrants characterization before the subdomain's hydraulic conductivity random variable can be defined.

[17] Rather than restrict an expert to a single interval in an attempt to capture both the measurement error as well as the stochasticity of hydraulic conductivity throughout a particular hydraulic conductivity zone, as is traditionally accomplished by confidence intervals, it is more intuitive to permit the expert to opine upon the uncertainty in the individual measurement values that helped determine the random variable in the purely probabilistic approach, above.

[18] It has been demonstrated that an appropriate expert can simply opine upon the uncertainty of hydrogeological measurements by assigning an interval in which the true value is expected to lie [Joslyn and Kreinovich, 2005], using “what is known about the underlying quantity” [Ferson et al., 2002]. Ferson et al. [2002] aptly note that although this is the simplest approach, it is also the most difficult to defend to others. They also provide alternative methods for defining random set structures. For example, knowing the measuring device (e.g., pump test or slug test), an expert can define such an interval simply by stating that the true value lies within x orders of magnitude of the measured value [Ferson et al., 2002; Mathon et al., 2009]. Thus, in any one hydraulic conductivity subdomain with t equiprobable measurements, a collection of t equiprobable intervals is defined, representing t uncertain measurements. Where the measurements are used to construct a discrete CDF, these expert-provided intervals essentially bracket the unknown true hydraulic conductivity random variable. A collection of these intervals [Helton and Oberkampf, 2004; Joslyn and Kreinovich, 2005] forms a random set, which is isomorphic to a Dempster-Shafer body of evidence [Joslyn and Booker, 2004].

[19] In this framework, a random set and its associated probability function are defined by the pair {F, m}, where F is the set of focal elements (intervals) and m is the basic mass assignment function that assigns nonzero probabilities to the focal elements. A random set can be transformed into lower and upper probability bounds, thereby bracketing the unknown true random variable. These bounds are called belief, bel(IG), and plausibility, pl(IG), respectively, for an arbitrary focal element IG ∈ G, where G is a set of focal elements. Because the random set definition requires less precision from the opining expert, the resulting upper and lower bounding curves (Figure 3) do not impose false precision on the parameter uncertainty characterization and avoid the inaccuracies that result from forcing an expert to provide information that would define a single random variable. Moreover, no probability model need be selected or assumed, which is desirable in light of the possible inaccuracies in the lognormality assumption for hydraulic conductivity noted above.
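A minimal sketch of the construction in the two preceding paragraphs: each of t equiprobable measurements becomes a focal interval spanning ±x orders of magnitude, and the cumulative belief and plausibility curves (the lower and upper bounds of Figure 3) are evaluated on the event {K ≤ k}. The measurement values below are hypothetical; they are not the Woburn data.

```python
def interval_random_set(measurements, orders):
    """Focal intervals [k / 10**orders, k * 10**orders], each with mass 1/t."""
    factor = 10.0 ** orders
    mass = 1.0 / len(measurements)
    return [((k / factor, k * factor), mass) for k in measurements]


def belief_cdf(random_set, k):
    """Bel(K <= k): total mass of focal intervals lying entirely in (-inf, k]."""
    return sum(m for (lo, hi), m in random_set if hi <= k)


def plausibility_cdf(random_set, k):
    """Pl(K <= k): total mass of focal intervals that intersect (-inf, k]."""
    return sum(m for (lo, hi), m in random_set if lo <= k)


# Hypothetical slug-test results for one zone (m/s), with the +/-2 orders of
# magnitude measurement error used in the text.
measurements = [2.0e-9, 3.5e-9, 8.0e-9]
rs = interval_random_set(measurements, orders=2)

for k in [1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6]:
    print(f"K <= {k:.0e}: Bel = {belief_cdf(rs, k):.2f}, Pl = {plausibility_cdf(rs, k):.2f}")
```

Because every focal interval lying inside (−∞, k] also intersects it, Bel ≤ Pl at every k, and the gap between the two curves grows with the number of orders of magnitude the expert assigns.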

Figure 3. Cumulative random set hydraulic conductivity values for zone 1 (top left), zone 2 (top right), and zone 5 (bottom), determined by the ±2 orders of magnitude uncertainty on available measurements.

[20] Formally, a hydraulic conductivity random set (F, mF) is defined on the Cartesian product K = K1 × … × Kn1, where Kj denotes the domain of the hydraulic conductivity value in zone j and n1 is the number of uncertain hydraulic conductivity zones (in the trial case, n1 = 3). In this formal definition, mF is a function mapping each element IF ∈ F to the interval [0, 1], with the masses summing to one:

$$m_{\mathcal{F}} : \mathcal{F} \to [0, 1], \qquad \sum_{I_F \in \mathcal{F}} m_{\mathcal{F}}(I_F) = 1$$

Since, in our trial example, the conductivity zones are assumed uncorrelated, the n1-dimensional random sets are marginalized to random sets (Fj, mFj), j = 1,…,n1, each defined solely upon the individual domains Kj. In other words, one can specify (F, mF) by means of n1 stochastically independent random sets.
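When the marginal random sets are stochastically independent, as assumed for the uncorrelated zones here, one common construction (often called random set independence) recovers the joint random set from them: joint focal elements are Cartesian products of the marginal focal intervals, and their masses are products of the marginal masses. A short sketch with hypothetical two-zone marginals:

```python
import math
from itertools import product


def joint_random_set(marginals):
    """Combine independent marginal random sets into a joint random set.

    Each marginal is a list of (interval, mass) pairs. Each joint focal element
    is a box (tuple of intervals, one per zone) carrying the product mass.
    """
    joint = []
    for combo in product(*marginals):
        box = tuple(interval for interval, _ in combo)
        mass = math.prod(m for _, m in combo)
        joint.append((box, mass))
    return joint


# Hypothetical marginals for two zones (intervals on K in m/s).
zone_a = [((1e-11, 1e-7), 0.5), ((3e-11, 3e-7), 0.5)]
zone_b = [((1e-10, 1e-6), 0.25), ((2e-10, 2e-6), 0.25), ((5e-10, 5e-6), 0.5)]

joint = joint_random_set([zone_a, zone_b])
print(len(joint), sum(m for _, m in joint))  # 6 focal boxes, total mass 1.0
```

The number of joint focal elements is the product of the marginal counts, which is one reason the propagation step below benefits from a more compact representation.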

[21] Each of these random sets is composed of a finite number of intervals IFj ∈ Fj, or focal elements, each associated with a probability mass assignment mFj(IFj). Propagating the random set-based hydraulic conductivity values through to concentration values requires a tool that extends the flow and transport model so that it can operate upon these focal elements. Such an extension permits the calculation of concentration focal elements that can be aggregated into a concentration random set at each location throughout the spatial domain.

[22] Consider the extension of the transport model y = f(x), where x = (x1,…,xn1) is the vector of uncertain conductivities in n1 zones and y = (y1,…,yn2) is the vector of uncertain concentration values at n2 nodes. Assume, without loss of generality, that n1 = 1 and K1 = Ω = {k1,…,kL}, the domain of L possible hydraulic conductivity values, so that focal elements are subsets of Ω. The random set extension principle [Dubois and Prade, 1991] defines the random set concentration (G, m) at node i as

$$\mathcal{G} = \{\, f(I_F) : I_F \in \mathcal{F} \,\}$$

and

$$m(I_G) = \sum_{I_F \in \mathcal{F} :\; f(I_F) = I_G} m_{\mathcal{F}}(I_F)$$

where IG represents a focal set of concentrations that is an image of a hydraulic conductivity focal element IF through the transport model f. This concentration focal element is defined by

$$I_G = f(I_F) = \{\, f(k) : k \in I_F \,\}$$

A significant drawback to the random set extension principle is its computational intensity, since a random set uses the power set of its domain rather than the domain itself. Therefore, an approximation of the random set hydraulic conductivities, such that the extended transport model can be carried out over Ω rather than its power set, is needed.
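On a finite domain the random set extension principle amounts to mapping each focal set through the model and summing the masses of focal sets that share the same image. The sketch below uses a small hypothetical domain and a toy function `f` in place of the transport model; in general a focal set may be any of the 2^L − 1 nonempty subsets of Ω, which is the source of the computational burden just noted.

```python
from collections import defaultdict


def extend_random_set(random_set, f):
    """Random set extension principle on a finite domain.

    random_set: list of (focal, mass), where focal is a frozenset of inputs.
    Returns {image_focal: mass}, where image_focal = {f(k) for k in focal} and
    the masses of input focal sets with identical images are summed.
    """
    image_masses = defaultdict(float)
    for focal, mass in random_set:
        image = frozenset(f(k) for k in focal)
        image_masses[image] += mass
    return dict(image_masses)


def f(k):
    # Toy model (hypothetical): integer division, so distinct inputs can share an output.
    return k // 2


conductivity_rs = [
    (frozenset({2, 3}), 0.5),
    (frozenset({3}), 0.3),
    (frozenset({1, 2, 3, 4}), 0.2),
]
print(extend_random_set(conductivity_rs, f))
```

Here the first two focal sets both map to the image {1}, so their masses combine (about 0.8), while {1, 2, 3, 4} maps to {0, 1, 2} with mass 0.2.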

[23] In general, simplifying or approximating a random set means approximating it by another random set, in which the number of the focals containing relevant information is reduced [Bauer, 1996]. An approach to approximating a random set, presented by Dubois and Prade [1990], is adopted in this paper. This approximation uses the following steps: (1) formation of sets of focals of the original random set that are the focals of the approximating random set and (2) allocation of basic probability masses to the sets from the first step using a process that is optimal in the sense that the resulting hydraulic conductivity focal elements of the approximating random set are the smallest in size, effectively minimizing the amount of imprecision introduced by the approximation.

[24] Approximating a random set by the above method results in nested focal elements, which comprise a fuzzy set A [Klir and Yuan, 1995], a special type of random set defined by a membership function (discussed further below). This property permits the solution of the transport model over the domain Ω rather than its power set. For the exact representation of the approximating conductivity fuzzy set on Ω, the membership function is calculated by the one-point coverage function [Goodman and Nguyen, 1985] for random sets:

$$\mu_A(k) = \sum_{C_i :\; k \in C_i} m(C_i), \qquad k \in \Omega$$

where Ci are the focal elements of the approximating hydraulic conductivity random set and μA denotes the membership function of the hydraulic conductivity fuzzy set A. This fuzzy set representation of the uncertain hydraulic conductivity values allows the use of the special case of the extension principle, described above, for fuzzy sets [Dubois and Prade, 1991], which states that the possibility of a given concentration value is the largest membership value among all hydraulic conductivities that produce that concentration:

$$\mu_{f(A)}(y) = \sup_{k \in \Omega :\; f(k) = y} \mu_A(k)$$

where μf(A) represents the membership function of the fuzzy concentration at a given node.
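The one-point coverage function and the fuzzy extension principle can be sketched together as follows, assuming nested interval focal elements with hypothetical masses and a toy model `f` on a small integer domain (neither is taken from the paper):

```python
def coverage_membership(nested_focals, k):
    """One-point coverage: mu_A(k) = total mass of focal intervals containing k."""
    return sum(m for (lo, hi), m in nested_focals if lo <= k <= hi)


def fuzzy_image(mu, domain, f):
    """Extension principle on a finite domain:
    mu_f(A)(y) = max of mu_A(k) over all k in the domain with f(k) = y."""
    image = {}
    for k in domain:
        y = f(k)
        image[y] = max(image.get(y, 0.0), mu(k))
    return image


# Hypothetical nested (consonant) focal intervals and their masses.
nested = [((3, 4), 0.5), ((2, 5), 0.25), ((1, 6), 0.25)]


def mu_a(k):
    return coverage_membership(nested, k)


def f(k):
    return k // 2  # toy stand-in for the transport model (hypothetical)


print({k: mu_a(k) for k in range(8)})
print(fuzzy_image(mu_a, range(8), f))
```

Because the focal intervals are nested, the innermost interval [3, 4] receives membership 1 and membership decreases outward, giving the familiar peaked shape of Figure 4.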

[25] Where fuzzy sets are used to approximate random sets, the application of the extension principle [Klir and Yuan, 1995] to the model equations is relatively straightforward and has precedent in hydrogeological applications [Dou et al., 1995, 1997a, 1997b; Prasad and Mathur, 2007]. Nevertheless, in our example case the vertex method [Ross, 2004], an approximation to the extension principle, is applied to reduce computational effort. For each α-cut (the interval obtained by cutting a fuzzy set horizontally at a given α, or membership value), the vertex method yields eight concentration values at each location, one for each of the 2³ = 8 combinations of the lower and upper α-cut bounds of the three uncertain hydraulic conductivity values. The minimum of these eight values is taken as the lower bound of the concentration α-cut, and the maximum is assigned as the upper bound.
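The vertex method at a single α-level can be sketched as below: every combination of lower and upper α-cut bounds (2^n corners for n uncertain conductivities, so eight for n = 3) is passed through the model, and the extreme outputs become the concentration α-cut bounds. The model here is a hypothetical linear stand-in for a PTC run; a full fuzzy output would repeat this for each α-level.

```python
from itertools import product


def vertex_method(f, alpha_cut_bounds):
    """Vertex method at one alpha level.

    alpha_cut_bounds: one (lower, upper) pair per uncertain input.
    f is evaluated at every corner of the box (2**n evaluations for n inputs);
    the min and max of those values are returned as the output alpha-cut.
    """
    values = [f(corner) for corner in product(*alpha_cut_bounds)]
    return min(values), max(values)


def toy_concentration(k):
    # HYPOTHETICAL monotone stand-in for one nodal concentration (not PTC).
    k1, k2, k3 = k
    return 100.0 * k1 + 50.0 * k2 - 20.0 * k3


# Hypothetical alpha-cuts of three fuzzy conductivities at a single alpha level.
cuts = [(1.0, 2.0), (0.5, 1.5), (0.0, 1.0)]
print(vertex_method(toy_concentration, cuts))  # (105.0, 275.0)
```

For a model that is monotone in each input, as here, the corners are guaranteed to contain the true extremes, so the vertex method is exact.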

[26] If the function being extended (i.e., the transport model) is nonlinear and nonmonotonic with respect to its variables (in our case, three hydraulic conductivity values), then the function may take smaller or larger values for a combination of hydraulic conductivity values that is not a combination of the respective α-cut bounds but lies anywhere within these bounds. If one evaluates the eight combinations of the bounds of the α-cuts of the three fuzzy hydraulic conductivities and calculates eight concentration values, one assumes that the concentration at that node can get no smaller than the minimum of the eight values and no larger than the maximum. In other words, one assumes that there is no need to search the entire hydraulic conductivity α-cuts in order to determine the α-cut bounds of the concentration at any node.

[27] If the transport model attains one or more extreme values in the interior of the α-cuts of the fuzzy hydraulic conductivities, then the vertex method results can only be taken as approximations to the true global extreme values that determine the bounds of the concentration at the given α level. This will result in fuzzy nodal concentration values with narrower support, implying more specificity in the information content than actually exists [Klir and Yuan, 1995]. However, it is the authors' opinion that the relationship between nodal concentrations and hydraulic conductivity, though nonlinear, is mostly monotonic, so that the extreme concentrations will occur at the vertices (α-cut bounds) rather than between them. Thus, for the contaminant transport problem considered here, it is more efficient to use the vertex method than to invoke a global optimization tool that implements the extension principle and, as such, searches the entire α-cut. Where monotonicity cannot be justified, the authors recommend applying the extension principle.
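The caveat above is easy to see in one dimension. For a hypothetical response with an interior minimum, the vertex method collapses the output α-cut to a single point, whereas a dense search over the α-cut (a crude substitute for the global optimization mentioned above) recovers the full range:

```python
from itertools import product


def vertex_method(f, bounds):
    """Min and max of f over the corners of the alpha-cut box."""
    values = [f(x) for x in product(*bounds)]
    return min(values), max(values)


def grid_search(f, bounds, points=101):
    """Brute-force range estimate over a regular grid spanning each alpha-cut."""
    grids = [[lo + (hi - lo) * i / (points - 1) for i in range(points)] for lo, hi in bounds]
    values = [f(x) for x in product(*grids)]
    return min(values), max(values)


def nonmonotone(x):
    # Hypothetical response with an interior minimum at x = 0.5.
    return (x[0] - 0.5) ** 2


cut = [(0.0, 1.0)]
print("vertex:", vertex_method(nonmonotone, cut))  # (0.25, 0.25)
print("grid:  ", grid_search(nonmonotone, cut))    # (0.0, 0.25)
```

The grid cost grows as points^n, so for three conductivities this brute-force check quickly becomes far more expensive than the eight vertex evaluations, which is the trade-off the text describes.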

[28] Figure 4 provides examples of fuzzy sets defining uncertain hydraulic conductivity values. The interpretation of these fuzzy sets most pertinent to this topic is that of a possibility distribution [Zadeh, 1978]. Where model inputs are defined as fuzzy sets, model estimates of concentration are interpreted as possibility distributions. A possibility distribution defines, for each value along the horizontal axis, the degree to which that value is possible, given the available evidence. In Figure 4 (top left), for instance, the hydraulic conductivity value 3.5 × 10⁻⁹ m/s is the most possible. This is analogous to probability theory, where inspection of the peak of a probability density function reveals the most probable value.

Figure 4. Possibilistic approximations for the random set provided by the expert for the hydraulic conductivity values associated with zone 1 (top left), zone 2 (top right), and zone 5 (bottom). The hydraulic conductivity values for zones 3 and 4 are considered certain and precise.

[29] Although random sets, through their transformation to fuzzy sets, offer a relatively facile strategy for computation with uncertainty, the most significant advantage of the random set approach is its potential to characterize both aleatory and epistemic uncertainty. The foundation for random sets lies in probability theory, which, as mentioned above, is ideally suited to aleatory uncertainty characterization. On the other hand, random sets are less specific, or less precise, than random variables [Joslyn and Booker, 2004], because the focal elements upon which random sets are based are a source of imprecision in the uncertainty quantification process (the focal elements of conventional probability distributions, i.e., random variables, are points and are therefore more precise than those of general random sets). This imprecision is a form of epistemic uncertainty. An expert who provides a body of evidence to characterize the uncertainty regarding some hydraulic conductivity measurement thereby admits to the existence of both forms of uncertainty. The natural randomness (aleatory uncertainty) of hydraulic conductivity is captured by the stochastic nature of random sets (the basic mass assignments). Owing to an expert's inability to define this natural randomness precisely, a random set merely provides bounds on the exact random variable (because the basic mass function is defined over intervals of the conductivity domain rather than over the domain itself), thereby defining the random variable imprecisely (epistemic uncertainty).

[30] The starting point for the application of random set-based uncertainty characterization is similar to that of the probabilistic approach presented above. Given the measurements in each zone in Figure 1, the expert, armed with knowledge of the measurement technique and aquifer characteristics, provides the aforementioned intervals on each measured value by specifying that the true hydraulic conductivity value lies within ±2 orders of magnitude of the measurement. As mentioned above, where the measurements in a particular zone are used to construct an approximate CDF, these expert-provided intervals yield upper and lower bounds (plausibility and belief, respectively) on the zone's true hydraulic conductivity random variable. Figure 3 shows these upper and lower bounds for zones one, two and five. As in the Monte Carlo approach, the hydraulic conductivity values for zones three and four are considered certain and precise. The fuzzy set approximations of these random sets, whose construction is outlined above, are shown in Figure 4.

[31] Although the set of intervals and associated probabilities provided by the expert is a natural extension of the confidence interval in the Monte Carlo approach above, it conveys more information. As such, the bodies of evidence provided by the expert capture both the uncertainty surrounding the mean hydraulic conductivity values and the imprecision with which the expert can truly characterize this uncertainty.

[32] The vertex method [Ross, 2004], an approximation to the extension principle, was applied to the finite element approximation equations of the groundwater flow and transport model in order to propagate the possibilistic uncertainty through to the concentration values. As a result, uncertain concentration estimates are described by possibility distributions. These possibilistic concentration values can be transformed into upper (plausibility) and lower (belief) bounds on the unknown random variable. The resulting bounds for the nodes of interest are plotted in Figure 5, together with the corresponding random variables from all three cases of uncertainty in cumulative distribution function form (dashed lines) from Figure 2. If the intervals used to construct the random sets are certain to contain the true value of the measured variable, then the true (and unknown) probability distribution, which the confidence intervals aim to characterize, lies between the plausibility and belief curves, especially where these bounds are widely separated [Ferson et al., 2002]. Thus, the burden is upon the expert, who specifies the magnitude of measurement error used to create the random sets, to ensure that the measurement intervals are wide enough to bound the true hydraulic conductivity yet narrow enough to be meaningful. Since the errors of various measurement techniques are commonly acknowledged and, at times, quantified [Mathon et al., 2009], we perceive this task to be reasonable.
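For a possibility distribution μ on concentration, the cumulative plausibility and belief bounds can be read off as Pl(C ≤ c) = sup over x ≤ c of μ(x) and Bel(C ≤ c) = 1 − sup over x > c of μ(x), i.e., the possibility and necessity of the event {C ≤ c}. The sketch below evaluates these on a sampled membership function; the triangular shape and its parameters are hypothetical, not a result from Figure 5.

```python
def cdf_bounds(xs, mus, c):
    """Return (Bel(C <= c), Pl(C <= c)) for a normalized possibility distribution
    sampled at points xs with membership values mus."""
    pl = max((m for x, m in zip(xs, mus) if x <= c), default=0.0)
    bel = 1.0 - max((m for x, m in zip(xs, mus) if x > c), default=0.0)
    return bel, pl


# Hypothetical triangular fuzzy concentration (ppb): support [20, 60], peak at 40.
xs = list(range(0, 101))
mus = [max(0.0, 1.0 - abs(x - 40) / 20.0) for x in xs]

for c in [10, 30, 40, 50, 70]:
    bel, pl = cdf_bounds(xs, mus, c)
    print(f"C <= {c}: Bel = {bel:.2f}, Pl = {pl:.2f}")
```

The plausibility curve rises to 1 at the peak of μ while the belief curve only reaches 1 at the upper end of the support, so the gap between them reflects the width of the fuzzy concentration.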

Figure 5. The cumulative belief and plausibility curves are relatively narrow and do not entirely bound the corresponding cumulative distribution functions for the three cases at nodes 400 (top left) and 234 (top right), but are rather wide and do bound the cumulative distribution functions for nodes 162 (bottom left) and 190 (bottom right).

4. Discussion of Results

[33] The outcome of the random set approach to uncertainty characterization is significantly distinct from that of the traditional purely probabilistic approach. Intuitively, the strictly probabilistic approach produces random variable concentration values, whereas the random set-based approach results in upper and lower probability bounds. As noted above, assuming that uncertainty in both approaches was characterized by the same expert or different experts with the same understanding of pertinent data, the upper and lower probabilities (plausibility and belief) should bracket the corresponding probability distribution (produced by the stochastic approach) at a given location. Thus, the range of concentration values that result from the random set approach is greater than that which would result from the strict Monte Carlo method, owing to the imprecision inherent in the random set approach. However, the degree of precision presented in the random variable approach is, as argued herein, difficult to justify.

[34] Consider the same nodal locations whose concentration random variables are plotted in Figure 2. The upper and lower probabilities for these nodal locations are plotted in Figure 5, along with the concentration random variables for all three cases of uncertainty presented in Table 1. Note that at some nodes (162 and 190) the upper and lower probabilities entirely bound the corresponding cumulative distribution functions, whereas at others (400 and 234) they do not. Such a discrepancy is likely due to the fact that the expert employed to define the confidence intervals was not the same as the expert who provided the information used to construct the random set intervals.

[35] Uncertainty (variance) associated with concentration values changes throughout the spatial domain, as is evident from the different slopes of the cumulative distribution functions in Figure 2. Nevertheless, a high variance does not necessarily indicate a high degree of uncertainty; it may instead reflect a mean of greater magnitude. Random set-based probability bounds, however, are independent of the magnitude of the concentration values and provide a true means of uncertainty identification. Wider separation between the bounds signifies more uncertainty in the concentration estimate and, as a result, identifies locations where more data are warranted. For instance, the estimate at node 190 (Figure 5, bottom right) is more uncertain than that at node 234 (Figure 5, top right) and, as such, is in need of additional data. In this particular instance, a combination of node 234's distance from the contaminant source and the expert's greater precision in providing error bounds on hydraulic conductivity measurements from zone 4 relative to those from zone 1 contributed to the narrower separation between the probability bounds for the concentration random set at node 234 (Figure 5, top right) relative to that for node 190 (Figure 5, bottom right). Likewise, the expert believed the measurements from zone 2 to be slightly more reliable than those from zone 4; thus, the uncertainty surrounding the concentration at node 400 is lower than it is for nodes 190 and 162.

5. Conclusion


[36] Because thorough hydrogeological investigations cost significant amounts of money and time, an efficient means of data acquisition and interpretation is valuable. One such means is expert knowledge extraction. However, the consideration of expert knowledge introduces epistemic uncertainty in addition to the existing stochasticity in hydrogeological parameters, such as hydraulic conductivity. Thus, appropriate characterizations of uncertainty should delineate aleatory uncertainty from epistemic uncertainty, as has been done in risk assessment and reliability engineering [Helton et al., 2000a, 2000b, 2004; Hofer et al., 2002; Helton and Oberkampf, 2004; Oberkampf et al., 2004].

[37] In this paper, we have sided with Ganoulis [1996], who also argued that probability theory alone cannot accomplish this. The hazard associated with applying traditional probability theory is that the opining expert may provide inaccurate and artificially precise characterizations of the random variable that best captures the inherently stochastic nature of hydraulic conductivity. Fuzzy set theory, on the other hand, has failed to find mainstream acceptance, perhaps as a result of its departure from probability theory.

[38] Random sets were introduced in this paper as an alternative and possibly more appropriate means for the characterization of both aleatory (the random variable) and epistemic (expert-characterized measurement error) uncertainty. This approach to uncertainty characterization provides a methodology for bounding an unknown random variable and properly capturing the imprecise nature of expert knowledge. In the provided example, expert knowledge was used to characterize the reducible uncertainty of individual hydraulic conductivity measurements. Random sets were constructed from these individual measurement intervals and propagated through a groundwater flow and transport model using fuzzy set methodologies.
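One way to sketch the construction is shown below. Each measurement, widened by an expert-assigned error, becomes a focal interval with equal mass 1/n, and the random set is approximated by a fuzzy set through its one-point coverage function (the plausibility of each single value), rescaled so that its peak is 1. This is an assumption made for illustration: the equal masses, the relative-error form of the expert bounds, the rescaling step, and all numbers are hypothetical, and the paper's own random set to fuzzy set approximation may differ.

```python
def random_set_from_measurements(measurements, rel_errors):
    """Equal-mass random set: one focal interval per measurement."""
    n = len(measurements)
    intervals = [(k * (1.0 - e), k * (1.0 + e))
                 for k, e in zip(measurements, rel_errors)]
    masses = [1.0 / n] * n
    return intervals, masses


def one_point_coverage(intervals, masses, x):
    """Plausibility of {x}: total mass of the focal intervals containing x."""
    return sum(m for (lo, hi), m in zip(intervals, masses) if lo <= x <= hi)


def possibility_from_random_set(intervals, masses):
    """Fuzzy-set approximation: coverage function rescaled to a peak of 1."""
    # Coverage of closed intervals only increases at lower endpoints,
    # so its maximum is attained at one of them.
    peak = max(one_point_coverage(intervals, masses, lo) for lo, _ in intervals)
    return lambda x: one_point_coverage(intervals, masses, x) / peak


# Hypothetical conductivity measurements (m/day) and expert relative errors.
K = [12.0, 15.0, 9.0, 20.0]
errors = [0.20, 0.20, 0.30, 0.25]
intervals, masses = random_set_from_measurements(K, errors)
pi = possibility_from_random_set(intervals, masses)
print([round(pi(x), 2) for x in (7.0, 12.5, 22.0, 30.0)])  # prints [0.5, 1.0, 0.5, 0.0]
```

The rescaling is needed here because the hypothetical intervals share no common point, so the raw coverage never reaches 1; when the expert intervals overlap heavily the coverage function is already close to a normal possibility distribution.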

[39] Aside from avoiding the imposition of false precision, which is an unfortunate side effect of defining confidence intervals, it is important to note that uncertainty characterization via random sets eliminates the need for any probability model definition or assumption. Moreover, the approximation of the random sets by fuzzy sets and model execution with the fuzzy extension principle accomplishes what fuzzy set-based hydrogeological research has, as yet, failed to embrace: the combination of probability theory and fuzzy sets for the characterization of parameter uncertainty. If fuzzy set theory is to find a stronger foothold in engineering applications, researchers must endeavor to embrace hybrid frameworks that unite fuzzy sets with more traditional mathematical tools such as probability, as is illustrated by this paper.

[40] While the representation of model concentration estimates as upper and lower probabilities (plausibility and belief) provides a transparent comparison between the random variable and random set approaches defined above, the utility of data in such a form may not be immediately obvious. What does one do with an imprecise notion of a stochastic concentration estimate (Figure 5)? In fact, the representation of concentration estimates as possibility distributions (like the possibilistic hydraulic conductivity values in Figure 4), which contain the same information as, and can be transformed into, probability bounds, is quite intuitive and readily interpretable. Inspection of a possibility distribution reveals not only the most possible concentration value, but also a range of concentrations that are possible to lesser and varying degrees. Moreover, algorithms have been developed to complement and refine possibilistic model estimates with new information [Fruhwirth-Schnatter, 1993; Pan and Klir, 1996; Yang, 1997; Ross et al., 2007, 2008].
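The transformation from a possibility distribution π to probability bounds follows from the standard possibility-theory definitions: the plausibility of {X ≤ x} is the highest possibility attained at or below x, and the belief of {X ≤ x} is one minus the highest possibility attained strictly above x. The sketch below assumes a triangular possibility distribution with hypothetical parameters (support and most possible value); it is not the paper's concentration result.

```python
def triangular_cdf_bounds(tri, x):
    """(belief, plausibility) of {X <= x} for a triangular possibility
    distribution with support [a, c] and most possible value b."""
    a, b, c = tri
    # Plausibility: highest possibility attained anywhere at or below x.
    if x < a:
        pl = 0.0
    elif x < b:
        pl = (x - a) / (b - a)
    else:
        pl = 1.0
    # Belief: one minus the highest possibility attained strictly above x.
    if x < b:
        bel = 0.0
    elif x < c:
        bel = (x - b) / (c - b)
    else:
        bel = 1.0
    return bel, pl


# Hypothetical possibilistic concentration: support [0.20, 0.60],
# most possible value 0.35 (illustrative units).
conc = (0.20, 0.35, 0.60)
for x in (0.25, 0.35, 0.50):
    print(x, triangular_cdf_bounds(conc, x))
```

Reading the two curves together recovers what the possibility distribution shows directly: below the most possible value only the plausibility curve rises, and above it only the belief curve is still climbing toward 1.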

[41] A benefit of separately characterizing aleatory and epistemic uncertainties is the possibility of identifying where, and what type of, additional information is most beneficial. The value of additional information is related to the reduction in reducible (epistemic) uncertainty achieved by considering the new information, and this reduction is easy to identify using belief and plausibility curves. The appropriate measure is the magnitude of epistemic uncertainty as indicated by the distance between the belief and plausibility curves. As noted above, the concentration estimate at node 190 is less precise than that at node 234 (Figure 5). Thus, of these two nodes, additional data are more valuable at node 190, where the reducible uncertainty, and likewise the distance between the belief and plausibility curves, is greater. Klir [2006] provides a set of measures to quantify the amount of information as well as uncertainty in random sets.
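One simple scalar summary of the distance between the curves is the area between the cumulative plausibility and belief functions. For a random set of intervals, Pl(X ≤ x) minus Bel(X ≤ x) is the mass of intervals with lo ≤ x < hi, so the area reduces to the mass-weighted mean interval width. The sketch below ranks hypothetical nodes by this area as a rough guide to where additional data would help most; the node names and random sets are invented for illustration, and the measures in Klir [2006] are alternatives not implemented here.

```python
def bound_separation(intervals, masses):
    """Area between the cumulative plausibility and belief curves.

    Pl(X <= x) - Bel(X <= x) is the mass of focal intervals with
    lo <= x < hi, so integrating over x gives sum_i m_i * (hi_i - lo_i).
    """
    return sum(m * (hi - lo) for (lo, hi), m in zip(intervals, masses))


def rank_nodes_for_sampling(node_random_sets):
    """Order nodes from widest to narrowest separation between the bounds."""
    scores = {node: bound_separation(iv, ms)
              for node, (iv, ms) in node_random_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical nodal concentration random sets: (focal intervals, masses).
nodes = {
    "node_a": ([(0.10, 0.14), (0.12, 0.16)], [0.5, 0.5]),
    "node_b": ([(0.02, 0.20), (0.05, 0.30)], [0.5, 0.5]),
    "node_c": ([(0.30, 0.38), (0.33, 0.45)], [0.6, 0.4]),
}
print(rank_nodes_for_sampling(nodes))  # prints ['node_b', 'node_c', 'node_a']
```

The area is expressed in concentration units, so it is most meaningful when comparing nodes for the same contaminant; a maximum vertical gap between the curves is a unitless alternative if that is preferred.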

[42] Though the application presented above focuses upon the characterization of uncertainty in hydraulic conductivity measurements, other sources of uncertainty, such as boundary conditions, can also be considered. In the case of boundary conditions, which originate predominantly from expert insight, fuzzy sets can be used directly as a characterization methodology, bypassing the need for random sets. The propagation of these forms of uncertainty through a groundwater flow and transport model is executed as presented above.

Acknowledgments


[43] This material is based upon work supported by the Strategic Environmental Research and Development Program (SERDP). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of SERDP.

References


Supporting Information


Filename                      Format               Size  Description
wrcr11702-sup-0001-t01.txt    plain text document  0K    Tab-delimited Table 1.
wrcr11702-sup-0002-t02.txt    plain text document  0K    Tab-delimited Table 2.
wrcr11702-sup-0003-t03.txt    plain text document  1K    Tab-delimited Table 3.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.