### Abstract

- Abstract
- 1. Introduction
- 2. Problem Statement
- 3. Random Set Uncertainty
- 4. Discussion of Results
- 5. Conclusion
- Acknowledgments
- References
- Supporting Information

[1] The characterization of aleatory hydrogeological parameter uncertainty has traditionally been accomplished using probability theory. However, when consideration is given to epistemic as well as aleatory uncertainty, probability theory is not necessarily appropriate. This is especially the case where expert opinion is regarded as a suitable source of information. When experts opine upon the uncertainty of a parameter value, both aleatory and epistemic uncertainties are introduced and must be modeled appropriately. A novel approach to expert-provided parameter uncertainty characterization can be defined that bridges a historical gap between probability theory and fuzzy set theory. Herein, a random set, a generalization of a random variable, is employed to formalize expert knowledge, and fuzzy sets are used to propagate this uncertainty to model estimates of contaminant transport. The resultant random set-based concentration estimates are shown to be more general than the corresponding random variable estimates. In some cases, the random set-based results are shown as upper and lower probabilities that bound the corresponding random variable's cumulative distribution function.

### 1. Introduction

[2] Uncertainty in groundwater flow and transport modeling comes in two forms: aleatory and epistemic. Such distinctions in uncertainty are most often identified in risk assessment and reliability engineering [*Helton et al.*, 2000a, 2000b, 2004; *Hofer et al.*, 2002; *Helton and Oberkampf*, 2004; *Oberkampf et al.*, 2004], and only recently have these distinctions been identified in hydrogeological applications [*Srinivasan et al.*, 2007]. Aleatory uncertainty, also called stochastic or variable uncertainty, refers to uncertainty that cannot be reduced by more exhaustive measurements or a better model. Epistemic uncertainty, or subjective uncertainty, on the other hand, refers to uncertainty that can be reduced by such means.

[3] Despite these apparent distinctions in uncertainty, probability theory alone has traditionally been used to characterize both forms of uncertainty in engineering applications [*Apostolakis*, 1990; *Helton et al.*, 2004]. While it is commonly accepted that probability theory is ideal for the characterization of aleatory uncertainty [*Ganoulis*, 1996], the facility with which probability theory effectively captures epistemic uncertainty has been called into question [*O'Hagan and Oakley*, 2004], especially given the introduction of a number of alternative methods of epistemic uncertainty characterization [*Choquet*, 1954; *Zadeh*, 1965, 1978; *Shafer*, 1976].

[4] One such method, fuzzy set theory, has failed to gain acceptance in engineering, much less hydrogeological, applications. A possible reason for this is the necessary paradigm shift one must make in order to apply fuzzy set theory to uncertainty characterization.

[5] Random set theory [*Zadeh*, 1965] provides an intuitive means for both epistemic and aleatory uncertainty characterization. Whereas probability theory's basic tool for uncertainty characterization is the probability density function (PDF), discrete random set theory is predicated upon the assignment of probabilities to intervals rather than to points, as is done with discrete PDFs. As such, a random set is a generalization of a random variable, since intervals are less precise than point values. The use of random set theory is more appropriate for representation of subjective knowledge because it does not rely upon means, variances and probabilistic models, which are inconsistent with the nature of human thought and discourse.

[7] Though a few applications of fuzzy set theory to expert knowledge characterization in hydrogeologic applications have been published [*Bardossy et al.*, 1989, 1990a, 1990b, 1990c; *Bagtzoglou et al.*, 1996; *Dou et al.*, 1995, 1997a, 1997b, 1999; *Fang and Chen*, 1997; *Demicco and Klir*, 2004; *Guan and Aral*, 2004; *Ozbek and Pinder*, 2006; *Ross et al.*, 2006, 2007, 2008], fuzzy sets are not a mainstream method of uncertainty characterization in hydrogeology. Consequently, an explanation of the distinctions between probability and fuzzy set theories, and between aleatory and epistemic uncertainties, in the characterization of hydrogeological uncertainty is needed.

[8] The purpose of this paper is to frame uncertain hydraulic conductivity information in terms of aleatory and epistemic uncertainty, to show how it is related to random set theory, and to demonstrate how it can be used in groundwater flow and transport modeling. To this end, we present the random set characterization of hydraulic conductivity using both uncertainty types and the corresponding simultaneous use of random set, probability and fuzzy set theories. We also describe how to propagate both types of uncertainty in model estimates of concentration. In doing so, we provide a reasonable method of uncertainty characterization that is compatible with both probability and fuzzy set theories.

### 3. Random Set Uncertainty

[16] An approximation of a hydraulic conductivity subdomain's uncertain hydraulic conductivity is provided by the discrete CDF calculated from the available measurements in that subdomain. However, as stated above, these individual measurements are themselves uncertain. Intuitively, then, this measurement error warrants characterization before the subdomain's hydraulic conductivity random variable can be defined.

[17] Traditionally, an expert is restricted to a single interval, such as a confidence interval, in an attempt to capture both the measurement error and the stochasticity of hydraulic conductivity throughout a particular hydraulic conductivity zone. It is more intuitive to permit the expert to opine upon the uncertainty in the individual measurement values that helped determine the random variable in the purely probabilistic approach above.

[18] It has been demonstrated that an appropriate expert can simply opine upon the uncertainty of hydrogeological measurements by assigning an interval in which the true value is expected to lie [*Joslyn and Kreinovich*, 2005] using “what is known about the underlying quantity” [*Ferson et al.*, 2002]. *Ferson et al.* [2002] aptly note that though this is the simplest approach, it is also the most difficult to defend to others. They also provide alternative methods for defining random set structures. For example, knowing the measuring device (e.g., pump test, slug test), an expert can simply define such an interval by stating that the true value lies within *x* orders of magnitude of the measured value [*Ferson et al.*, 2002; *Mathon et al.*, 2009]. Thus, in any one hydraulic conductivity subdomain with *t* equiprobable measurements, a collection of *t* equiprobable intervals is defined, representing *t* uncertain measurements. Where the measurements are used to construct a discrete CDF, these expert-provided intervals essentially bracket the unknown true random variable hydraulic conductivity. A collection of these intervals [*Helton and Oberkampf*, 2004; *Joslyn and Kreinovich*, 2005] forms a random set, which is isomorphic to a Dempster-Shafer body of evidence [*Joslyn and Booker*, 2004].
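The construction of equiprobable intervals from measurements can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figures: the function name `intervals_from_measurements`, the parameter `orders`, and the measurement values are all hypothetical, and each focal element is stored as a `(low, high, mass)` tuple.

```python
def intervals_from_measurements(measurements, orders=1.0):
    """Turn t measurements into t equiprobable interval focal elements.

    Each measurement k becomes [k / 10**orders, k * 10**orders], i.e. the
    expert states that the true value lies within `orders` orders of
    magnitude of the measured value. Every interval receives mass 1/t.
    """
    t = len(measurements)
    factor = 10.0 ** orders
    return [(k / factor, k * factor, 1.0 / t) for k in measurements]


# Hypothetical conductivity measurements in m/s (illustrative values only).
measurements = [2.0e-9, 5.0e-9, 1.0e-8]
random_set = intervals_from_measurements(measurements, orders=1.0)

for low, high, mass in random_set:
    print(f"[{low:.1e}, {high:.1e}] m/s with mass {mass:.3f}")
```

The list of tuples is one convenient representation of a discrete random set with interval focal elements; the masses sum to one by construction.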

[19] In this framework, a random set and associated probability function are composed of a set of focal elements {*F*, *m*}, where *F* is the set of focals (intervals) and *m* is the basic mass assignment function that assigns nonzero probabilities to the focals. A random set can be transformed into lower and upper probability bounds, thereby bracketing the unknown true random variable. These bounds are called belief, bel(*I*_{G}), and plausibility, pl(*I*_{G}), respectively, for some arbitrary focal element *I*_{G} ∈ *G*, a set of focals. Because the random set definition requires less precision from the opining expert, the resulting upper and lower bounding curves (Figure 3) do not impose false precision in the parameter uncertainty characterization and avoid any inaccuracies that result from forcing an expert to provide information that would define a single random variable. Moreover, no probability model need be selected or assumed, which is desirable in light of the above-mentioned possible inaccuracies in the lognormality assumption for hydraulic conductivity.
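As an illustration of how belief and plausibility bracket a CDF, the sketch below evaluates both on events of the form (−∞, *x*]. It assumes interval focal elements stored as `(low, high, mass)` tuples; the function names and the numerical values are hypothetical. Belief sums the masses of focal elements entirely contained in the event, and plausibility sums the masses of those that intersect it.

```python
def belief_cdf(random_set, x):
    """Lower probability of (-inf, x]: mass of intervals lying entirely at or below x."""
    return sum(mass for low, high, mass in random_set if high <= x)


def plausibility_cdf(random_set, x):
    """Upper probability of (-inf, x]: mass of intervals that reach down to x or below."""
    return sum(mass for low, high, mass in random_set if low <= x)


# Hypothetical random set (arbitrary units, illustrative values only).
random_set = [(1.0, 3.0, 0.25), (2.0, 5.0, 0.5), (4.0, 6.0, 0.25)]

for x in (0.5, 3.5, 6.0):
    print(x, belief_cdf(random_set, x), plausibility_cdf(random_set, x))
```

Any CDF consistent with the stated intervals and masses lies between the two step functions, which is the sense in which the random set brackets the unknown random variable.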

[20] Formally, a hydraulic conductivity random set (*F*, *m*_{F}) is defined on the Cartesian product *K* = *K*_{1} × *…* × *K*_{n1}, where *K*_{j} denotes the domain of the hydraulic conductivity value in zone *j* and *n*1 is the number of uncertain hydraulic conductivity zones (in the trial case, *n*1 = 3). In this formal definition, *m*_{F} is a function mapping elements, *I*_{F} ∈ *F*, of *F* to the interval [0, 1],

$$
m_F : F \to [0, 1], \qquad \sum_{I_F \in F} m_F(I_F) = 1 .
$$

Since, in our trial example, the conductivity zones are assumed uncorrelated, the *n*1-dimensional random sets are marginalized to random sets (*F*_{j}, *m*_{F}^{j}), *j* = 1,…,*n*1, each defined solely upon the individual domains *K*_{j}. In other words, one can specify (*F*, *m*_{F}) by means of *n*1 stochastically independent random sets.

[22] Consider the extension of the transport model *y* = *f*(*x*), where *x* = (*x*_{1},…,*x*_{n1}) is the vector of uncertain conductivities in *n*1 zones and *y* = (*y*_{1},…,*y*_{n2}) is the vector of uncertain concentration values at *n*2 nodes. Assume, without loss of generality, that *n*1 = 1 and *K*_{1} = Ω = {*k*_{1},…,*k*_{L}}, the domain of *L* possible hydraulic conductivity values, so that focal elements are subsets of Ω. The random set extension principle [*Dubois and Prade*, 1991] defines the random set concentration (*G*, *m*) at node *i* as

$$
G = \{\, I_G = f(I_F) : I_F \in F \,\}
$$

and

$$
m(I_G) = \sum_{I_F \,:\, f(I_F) = I_G} m_F(I_F),
$$

where *I*_{G} represents a focal set of concentrations that is an image of a hydraulic conductivity focal element *I*_{F} through the transport model *f*. This concentration focal element is defined by

$$
f(I_F) = \{\, f(k) : k \in I_F \,\} .
$$

A significant drawback to the random set extension principle is its computational intensity, since a random set uses the power set of its domain rather than the domain itself. Therefore, an approximation of the random set hydraulic conductivities, such that the extended transport model can be carried out over Ω rather than its power set, is needed.
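A minimal sketch of the random set extension principle on a small discrete domain follows. It is illustrative only: `toy_model` is a hypothetical stand-in for the transport model, and the focal elements and masses are invented. Each focal element is mapped to its image, and the masses of focal elements that share an image are added.

```python
from collections import defaultdict


def extend_random_set(focals, f):
    """Map each focal element through f and pool the masses of equal images.

    focals: dict mapping a frozenset of conductivity values to its mass.
    Returns a dict mapping each image set f(I_F) to its pooled mass.
    """
    image = defaultdict(float)
    for focal, mass in focals.items():
        image[frozenset(f(k) for k in focal)] += mass
    return dict(image)


def toy_model(k):
    # Hypothetical stand-in for the transport model (not a real simulator).
    return (k - 2) ** 2


# Hypothetical focal elements on the domain {1, 2, 3, 4}.
focals = {
    frozenset({1, 2}): 0.5,
    frozenset({2, 3}): 0.25,
    frozenset({3, 4}): 0.25,
}

concentration_set = extend_random_set(focals, toy_model)
print(concentration_set)
```

Here the first two focal elements share the image {0, 1}, so their masses are combined. The cost noted above arises because, in general, the focal elements may be any of the subsets of Ω, and each must be pushed through the model.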

[23] In general, simplifying or approximating a random set means approximating it by another random set, in which the number of the focals containing relevant information is reduced [*Bauer*, 1996]. An approach to approximating a random set, presented by *Dubois and Prade* [1990], is adopted in this paper. This approximation uses the following steps: (1) formation of sets of focals of the original random set that are the focals of the approximating random set and (2) allocation of basic probability masses to the sets from the first step using a process that is optimal in the sense that the resulting hydraulic conductivity focal elements of the approximating random set are the smallest in size, effectively minimizing the amount of imprecision introduced by the approximation.

[24] Approximating a random set by the above method results in nested focal elements, which comprise a fuzzy set *A* [*Klir and Yuan*, 1995], a special type of random set that is defined by a membership function (discussed further below). This fact permits the solution of the transport model over the domain Ω, rather than its power set. For the exact representation of the approximating conductivity fuzzy set on Ω, the membership function is calculated by the one-point coverage function [*Goodman and Nguyen*, 1985] for random sets:

$$
\mu_A(k) = \sum_{i \,:\, k \in C_i} m(C_i), \qquad k \in \Omega,
$$

where *C*_{i} are the focal elements of the approximating hydraulic conductivity random set and *μ*_{A} denotes the membership function of the hydraulic conductivity fuzzy set *A*. This fuzzy set representation of the uncertain hydraulic conductivity values allows for the use of the special case of the extension principle, described above, for fuzzy sets [*Dubois and Prade*, 1991], which states that in order to calculate the possibility value of an uncertain concentration value one must consider membership values of hydraulic conductivities used to calculate that concentration:

$$
\mu_{f(A)}(y) = \sup_{k \in \Omega \,:\, y = f(k)} \mu_A(k),
$$

where *μ*_{f(A)} represents the membership function of the fuzzy concentration at a given node.
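The one-point coverage function and the fuzzy extension principle described above can be sketched together on a discrete domain. The nested focal intervals, the domain, and `toy_model` (a deliberately nonmonotonic stand-in for the transport model) are hypothetical, and on a finite domain the supremum reduces to a maximum.

```python
def one_point_coverage(focals, k):
    """Membership of k: total mass of the nested focal intervals containing k."""
    return sum(mass for low, high, mass in focals if low <= k <= high)


def extend_fuzzy(domain, membership, f):
    """Discrete extension principle: mu_fA(y) = max of mu_A(k) over all k with f(k) == y."""
    image = {}
    for k in domain:
        y = f(k)
        image[y] = max(image.get(y, 0.0), membership(k))
    return image


def toy_model(k):
    # Hypothetical, nonmonotonic stand-in for the transport model.
    return (k - 3.0) ** 2


# Hypothetical nested focal elements (low, high, mass) and discrete domain.
nested_focals = [(3.0, 4.0, 0.5), (2.0, 5.0, 0.25), (1.0, 6.0, 0.25)]
domain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

mu_A = {k: one_point_coverage(nested_focals, k) for k in domain}
mu_fA = extend_fuzzy(domain, lambda k: one_point_coverage(nested_focals, k), toy_model)
print(mu_A)
print(mu_fA)
```

Because the focal elements are nested, the membership function rises to one on the innermost interval, and the extension requires only one model evaluation per element of the domain.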

[25] Where fuzzy sets are used to approximate random sets, the application of the extension principle [*Klir and Yuan*, 1995] to the model equations is relatively straightforward and has precedent in hydrogeological applications [*Dou et al.*, 1995, 1997a, 1997b; *Prasad and Mathur*, 2007]. Nevertheless, in our example case the vertex method [*Ross*, 2004], an approximation to the extension principle, is applied to reduce computational effort. For each *α*-cut (an interval created by the horizontal cut of a fuzzy set at a given *α*, or membership value), the vertex method yields eight concentration values at each location, one for each of the 2^{3} = 8 possible combinations of the lower and upper bounds of the *α*-cuts of the three uncertain hydraulic conductivity values. The minimum of these eight values is taken as the lower bound of the concentration *α*-cut, and the maximum is assigned as the upper bound.
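The vertex method at a single *α* level can be sketched as below. The function `toy_concentration` is a hypothetical monotonic stand-in for the finite element transport model, and the bounds are invented; only the enumeration of the eight vertices reflects the procedure described above.

```python
from itertools import product


def vertex_method(f, alpha_cut_bounds):
    """Evaluate f at every combination of lower/upper alpha-cut bounds.

    alpha_cut_bounds: list of (low, high) pairs, one per uncertain input.
    Returns the (min, max) of f over the 2**n vertices.
    """
    values = [f(*vertex) for vertex in product(*alpha_cut_bounds)]
    return min(values), max(values)


def toy_concentration(k1, k2, k3):
    # Hypothetical monotonic stand-in for the transport model.
    return k1 + 2.0 * k2 - k3


# Hypothetical alpha-cut bounds for three uncertain conductivities.
bounds = [(1.0, 2.0), (0.5, 1.5), (0.0, 1.0)]

n_vertices = len(list(product(*bounds)))
low, high = vertex_method(toy_concentration, bounds)
print(n_vertices, low, high)
```

Repeating the call for a sequence of *α* levels would assemble the fuzzy concentration from its *α*-cuts, at a cost of eight model runs per level in the three-zone case.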

[26] If the function being extended (i.e., the transport model) is nonlinear and nonmonotonic with respect to its variables (in our case, three hydraulic conductivity values), then there is a possibility that the function will take smaller or larger values for a combination of three hydraulic conductivity values that is not a combination of the respective *α*-cut bounds, but rather a combination of values sampled anywhere within these bounds. If one computes with the eight combinations of the bounds of the *α*-cuts of the three hydraulic conductivity fuzzy sets and calculates eight concentration values, one assumes that the concentration at that node cannot get any smaller than the minimum of the eight values and cannot get any larger than the maximum of the eight values. In other words, one assumes that there is no need to investigate the entire hydraulic conductivity *α*-cuts in order to determine the *α*-cut bounds of the concentration at any node.

[27] If the transport model has one or more extreme points in the interior of the *α*-cuts of the fuzzy hydraulic conductivities, then the vertex method results can be taken only as approximations to the true global extreme values that determine the bounds of the concentration at the given *α* level. This will result in fuzzy nodal concentration values with narrower support, implying higher specificity in the information content than actually exists [*Klir and Yuan*, 1995]. However, it is the authors' opinion that the nonlinear relationship between nodal concentrations and hydraulic conductivity is mostly monotonic and that the extreme concentrations will be at the vertices (*α*-cut bounds), rather than anywhere between the vertices. Thus, for the contaminant transport problem considered here, it is more efficient to use the vertex method than to invoke a global optimization tool that implements the extension principle and, as such, searches the entire *α*-cut. Where monotonicity cannot be justified, the authors recommend applying the extension principle.
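The loss of support that occurs when the model has an interior extreme point can be illustrated with a one-variable example. Everything here is hypothetical: `nonmonotonic_model` is an invented function with a minimum inside the *α*-cut, and the coarse grid search merely stands in for a global optimization tool.

```python
from itertools import product


def vertex_bounds(f, bounds):
    """Output range estimated from the alpha-cut vertices only."""
    values = [f(*v) for v in product(*bounds)]
    return min(values), max(values)


def grid_bounds(f, bounds, n=5):
    """Output range estimated from an n-point grid over each alpha-cut.

    A crude stand-in for a global search of the entire alpha-cut.
    """
    axes = [[low + (high - low) * i / (n - 1) for i in range(n)]
            for low, high in bounds]
    values = [f(*p) for p in product(*axes)]
    return min(values), max(values)


def nonmonotonic_model(k):
    # Hypothetical model with an interior minimum at k = 1.5.
    return (k - 1.5) ** 2


bounds = [(1.0, 2.0)]
vertex_result = vertex_bounds(nonmonotonic_model, bounds)
grid_result = grid_bounds(nonmonotonic_model, bounds)
print(vertex_result, grid_result)
```

The vertex estimate collapses to a single value because both endpoints give the same output, whereas the search over the interior recovers the wider interval. For a monotonic model the two estimates coincide.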

[28] Figure 4 provides examples of fuzzy sets defining uncertain hydraulic conductivity values. The interpretation of the fuzzy sets in Figure 4 that is most pertinent to this topic is that of a possibility distribution [*Zadeh*, 1978]. Where model inputs are defined as fuzzy sets, model estimates of concentration are interpreted as possibility distributions. A possibility distribution defines for each value along the horizontal axis the degree to which that value is possible, given available evidence. In Figure 4 (top left), for instance, the hydraulic conductivity value 3.5 × 10^{−9} m/s is most possible. This is similar to probability theory, where inspection of the peak of a probability density function would reveal the most probable value.

[29] Whereas, through the transformation to fuzzy sets, random sets offer a relatively facile strategy for computation with uncertainty, the most significant advantage to the random set approach is its potential to characterize both aleatory and epistemic uncertainty. The foundation for random sets lies in probability theory, which, as mentioned above, is ideally suited for aleatory uncertainty characterization. On the other hand, random sets are less specific, or less precise, than random variables [*Joslyn and Booker*, 2004], because focal elements, upon which random sets are based, are a source of imprecision in the uncertainty quantification process (the focal elements associated with conventional probabilities, i.e., random variables, are points and therefore more precise than those of general random sets). This imprecision is a form of epistemic uncertainty. An expert who provides a body of evidence to characterize the uncertainty regarding some hydraulic conductivity measurement actually admits to the existence of these two forms of uncertainty. The natural randomness (aleatory uncertainty) of hydraulic conductivity is captured by the stochastic nature of random sets (the basic mass assignments). Owing to an expert's inability to precisely define this natural randomness, a random set merely provides bounds on the exact random variable (because the basic mass function is defined over intervals of the conductivity domain rather than the domain itself), thereby imprecisely defining the random variable (epistemic uncertainty).

[30] The starting point for the application of random set-based uncertainty characterization is similar to the probabilistic approach presented above. Given the measurements in each zone in Figure 1, the expert, armed with knowledge of the measurement technique and aquifer characteristics, provides the aforementioned intervals on each measured value by specifying that the true hydraulic conductivity value lies within ±2 orders of magnitude of the measurement. As mentioned above, where the measurements in a particular zone are used to construct an approximate CDF, these expert-provided intervals become upper and lower bounds on the zone's true random variable hydraulic conductivity (plausibility and belief, respectively). Figure 3 shows these upper and lower bounds for zones one, two and five. As in the Monte Carlo approach, the hydraulic conductivity values for zones three and four are considered certain and precise. The fuzzy set approximations of these random sets, whose construction is outlined above, are shown in Figure 4.

[31] Though the set of intervals and associated probabilities provided by the expert is a natural extension of the confidence interval in the Monte Carlo approach above, it comprises a greater amount of information. As such, the bodies of evidence provided by the expert capture both the uncertainty surrounding the mean hydraulic conductivity values and the imprecision with which the expert can truly characterize this uncertainty.

[32] The vertex method [*Ross*, 2004], an approximation to the extension principle, was applied to the finite element approximation equations of the groundwater flow and transport model in order to propagate the possibilistic uncertainty through to the concentration values. As a result, uncertain concentration estimates are described by possibility distributions. The possibilistic concentration values can be transformed into upper (plausibility) and lower (belief) bounds on the unknown random variable. The resulting bounds for the nodes of interest are plotted in Figure 5 with the corresponding random variables from all three cases of uncertainty in cumulative distribution function form (dashed lines) from Figure 2. If the intervals used to construct the random sets are certain to contain the value of the measured variable, then the true (and unknown) probability distribution defining the random variable, which the confidence intervals aim to characterize, lies between the plausibility and belief curves, especially where these bounds are widely separated [*Ferson et al.*, 2002]. Thus, the burden is upon the expert, who specified the magnitude of measurement error that creates the random sets, to ensure that the measurement intervals are wide enough to bound the true hydraulic conductivity value yet narrow enough to be meaningful. Since the errors of various measurement techniques are commonly acknowledged and, at times, quantified [*Mathon et al.*, 2009], we perceive this task to be reasonable.
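The transformation of a possibility distribution into plausibility and belief bounds on a CDF can be sketched as follows, assuming the standard possibility and necessity measures evaluated on events of the form (−∞, *x*]. The discretized concentration values and memberships are hypothetical, as is the function name `cdf_bounds`.

```python
def cdf_bounds(values, memberships, x):
    """Return (belief, plausibility) of the event (-inf, x].

    Plausibility is the largest membership among values at or below x;
    belief is one minus the largest membership among values above x.
    """
    pairs = list(zip(values, memberships))
    plausibility = max((mu for v, mu in pairs if v <= x), default=0.0)
    belief = 1.0 - max((mu for v, mu in pairs if v > x), default=0.0)
    return belief, plausibility


# Hypothetical possibilistic concentration estimate (arbitrary units).
values = [1.0, 2.0, 3.0, 4.0, 5.0]
memberships = [0.25, 0.5, 1.0, 0.5, 0.25]

bounds_at = {x: cdf_bounds(values, memberships, x) for x in (2.0, 3.0, 5.0)}
print(bounds_at)
```

The plausibility curve follows the rising limb of the possibility distribution and the belief curve follows one minus its falling limb, so a wider possibility distribution produces more widely separated bounds.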

### 5. Conclusion

[36] Because thorough hydrogeological investigations cost significant amounts of money and time, an efficient means of data acquisition and interpretation is valuable. One such means is expert knowledge extraction. However, the consideration of expert knowledge introduces epistemic uncertainty in addition to the existing stochasticity in hydrogeological parameters, such as hydraulic conductivity. Thus, appropriate characterizations of uncertainty should delineate aleatory uncertainty from epistemic uncertainty, as has been done in risk assessment and reliability engineering [*Helton et al.*, 2000a, 2000b, 2004; *Hofer et al.*, 2002; *Helton and Oberkampf*, 2004; *Oberkampf et al.*, 2004].

[37] In this paper, we have sided with *Ganoulis* [1996], who also argued that probability theory alone cannot accomplish this. The hazard associated with applying traditional probability theory is that the opining expert may provide inaccurate and artificially precise characterizations of the random variable that best captures the naturally stochastic nature of hydraulic conductivity. Fuzzy set theory, on the other hand, has failed to find mainstream acceptance, perhaps as a result of its departure from probability theory.

[38] Random sets were introduced in this paper as an alternative and possibly more appropriate means for the characterization of both aleatory (the random variable) and epistemic (expert-characterized measurement error) uncertainty. This approach to uncertainty characterization provides a methodology for bounding an unknown random variable and properly capturing the imprecise nature of expert knowledge. In the provided example, expert knowledge was used to characterize the reducible uncertainty of individual hydraulic conductivity measurements. Random sets were constructed from these individual measurement intervals and propagated through a groundwater flow and transport model using fuzzy set methodologies.

[39] Aside from avoiding the imposition of false precision, which is an unfortunate side effect of defining confidence intervals, it is important to note that uncertainty characterization via random sets eliminates the need for any probability model definition or assumption. Moreover, the approximation of the random sets by fuzzy sets and model execution with the fuzzy extension principle accomplishes what fuzzy set-based hydrogeological research has, as yet, failed to embrace – the combination of probability theory and fuzzy sets for the characterization of parameter uncertainty. If fuzzy set theory is to find a stronger foothold in engineering applications, researchers must endeavor to embrace hybrid frameworks that unite fuzzy sets with more traditional mathematical tools such as probability, as is illustrated by this paper.

[40] While the representation of model concentration estimates as upper and lower probabilities (plausibility and belief) provides a transparent comparison between the random variable and random set approaches defined above, the utility of data in such a form may not be immediately obvious. What does one do with an imprecise notion of a stochastic concentration estimate (Figure 5)? In fact, the representation of concentration estimates as possibility distributions (like the possibilistic hydraulic conductivity values in Figure 4), which contain the same information as, and can be transformed into, probability bounds, is quite intuitive and readily interpretable. Inspection of a possibility distribution reveals not only the most possible concentration value, but also a range of concentrations that are possible to lesser and varying degrees. Moreover, algorithms have been developed to complement and refine possibilistic model estimates with new information [*Fruhwirth-Schnatter*, 1993; *Pan and Klir*, 1996; *Yang*, 1997; *Ross et al.*, 2007, 2008].

[41] A benefit of separately characterizing aleatory and epistemic uncertainties is the possibility of identifying where and what type of additional information is most beneficial. The value of additional information is correlated with the reduction in reducible (epistemic) uncertainty realized by the consideration of the new information; this is easy to identify using belief and plausibility curves. The appropriate measure is the magnitude of epistemic uncertainty as indicated by the distance between the belief and plausibility curves. As noted above, the concentration estimate at node 190 is less precise than that at node 234 (Figure 5). Thus additional data are most valuable at node 190, where the reducible uncertainty, and likewise the distance between belief and plausibility curves, is greatest. *Klir* [2006] provides a set of measures to quantify the amount of information as well as uncertainty in random sets.
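One simple way to quantify the distance between the belief and plausibility curves is the area enclosed between them, which for interval focal elements equals the mass-weighted mean interval width. The sketch below uses this area as an assumed stand-in and is not one of the measures of *Klir* [2006]; the node labels and focal elements are hypothetical.

```python
def area_between_bounds(random_set):
    """Area between the plausibility and belief CDF curves.

    For focal elements (low, high, mass) this integral reduces to the
    sum of mass * (high - low).
    """
    return sum(mass * (high - low) for low, high, mass in random_set)


# Hypothetical concentration random sets at two nodes (arbitrary units).
node_sets = {
    "node_a": [(1.0, 2.0, 0.5), (1.5, 3.5, 0.5)],
    "node_b": [(1.0, 1.5, 0.5), (2.0, 2.5, 0.5)],
}

areas = {name: area_between_bounds(rs) for name, rs in node_sets.items()}
most_uncertain = max(areas, key=areas.get)
print(areas, most_uncertain)
```

Under this assumed measure, the node with the largest area is the one at which additional data would be expected to reduce epistemic uncertainty the most.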

[42] Though the application presented above focuses upon the characterization of uncertainty in hydraulic conductivity measurements, other forms of uncertainty such as boundary conditions also can be considered. In the case of boundary conditions, which originate predominantly from expert insight, fuzzy sets can be used directly as a characterization methodology, bypassing the need for random sets. The propagation of these forms of uncertainty through a groundwater flow and transport model is executed as presented above.