The methods of maximum likelihood (ML) and maximum a posteriori (MAP) are positioned on the probabilistic side of the spectrum, and they focus primarily on estimation. In MAP, for example, one obtains a MAP estimate, whereas drawing samples from the MAP distribution, if available, would amount to simulation. The pilot point method is positioned on the fitting side of the spectrum, because it is defined by an objective function based on one or more fitting criteria, and its focus is on simulation, because it produces some sort of conditional simulations. MAD is positioned on the probabilistic side of the spectrum, and it includes elements of both estimation and simulation, as will be explained in our subsequent discussion.
8.1. MAD and Maximum Likelihood (ML)
 MAD and ML are both probabilistic methods. The difference between MAD and ML is that ML is focused on finding an estimate of the unknown parameter, and is thus an estimation theory method, whereas MAD focuses on obtaining the distribution of the unknown parameter, and is thus a Bayesian method. It can be shown that MAD is an extension of the ML logic. For example, the ML approach of Kitanidis and Vomvoris can be related to MAD through equation (3). ML focuses on the likelihood term in equation (3), namely, p(zb | θ, za), and aims at estimating the vector θ. Usually, a modeling assumption is made with regard to this distribution, and the parameters of the assumed distribution comprise the vector θ. The ML parameter estimates are those that maximize the model approximation of p(zb | θ, za), or in other words, the probability of observing the data. The parameter vector θ models the global trends of the target variable (e.g., through its moments), and it is not intended to capture local features, which is the role of the anchors ϑ in MAD. One could possibly add anchors into the likelihood function, rewriting it in the form p(zb | θ, ϑ, za) and obtaining the ML estimates of both θ and ϑ. This would lead to a formulation of ML along the lines proposed by Carrera and Neuman [1986a, 1986b] and Riva et al. But including anchors in the likelihood function would not amount to transforming ML into MAD, because ML derives single-valued parameter estimates whereas MAD derives parameter distributions.
 One of the challenges facing ML is providing estimation variances. Under some assumptions [Schweppe, 1973], ML can provide lower bounds for the estimation variances. These variances can be translated into statistical distributions by assuming some form of distribution; a Gaussian model is justified asymptotically. The assumption of Gaussianity is reasonable and in many cases justified [cf. Woodbury and Ulrych, 1993], but it cannot be guaranteed a priori. This is shown in Figure 3: the distributions do not appear to be Gaussian, yet a Gaussian approximation could work very well in this case. We will show later that it does not always work.
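To make the ML logic concrete, the following is a minimal Python sketch (synthetic one-dimensional data and an exponential covariance chosen purely for illustration; this is not the formulation of Kitanidis and Vomvoris). It estimates the structural parameters of the field by maximizing a Gaussian likelihood of Type A data, and then approximates the estimation uncertainty, in the asymptotic Gaussian spirit discussed above, from the inverse of the Hessian of the negative log likelihood.

```python
# Minimal ML sketch: estimate mean, variance, and integral scale of a 1-D
# Gaussian field with exponential covariance from synthetic Type A data,
# then approximate estimation uncertainty from the inverse Hessian.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 60))            # measurement locations (hypothetical)
true_mean, true_var, true_scale = 1.0, 4.0, 1.0    # "true" structural parameters

def cov(x, var, scale):
    h = np.abs(x[:, None] - x[None, :])
    return var * np.exp(-h / scale)

# Synthetic Type A data: one realization of the field at the locations x
za = rng.multivariate_normal(np.full(x.size, true_mean),
                             cov(x, true_var, true_scale) + 1e-8 * np.eye(x.size))

def neg_log_like(theta):
    mean, log_var, log_scale = theta
    C = cov(x, np.exp(log_var), np.exp(log_scale)) + 1e-8 * np.eye(x.size)
    r = za - mean
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + r @ np.linalg.solve(C, r))

res = minimize(neg_log_like, x0=np.zeros(3), method="Nelder-Mead")
print("ML estimates (mean, variance, scale):",
      res.x[0], np.exp(res.x[1]), np.exp(res.x[2]))

# Asymptotic Gaussian approximation of the estimation uncertainty:
# inverse of a finite-difference Hessian of the negative log likelihood,
# expressed in the transformed parameters (mean, log variance, log scale).
eps, n = 1e-4, res.x.size
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
        H[i, j] = (neg_log_like(res.x + ei + ej) - neg_log_like(res.x + ei)
                   - neg_log_like(res.x + ej) + neg_log_like(res.x)) / eps**2
print("approximate covariance of the estimates:\n", np.linalg.inv(H))
```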
8.2. MAD and Maximum A Posteriori (MAP)
 MAP, similar to ML, is a probabilistic method that aims at obtaining parameter estimates [McLaughlin and Townley, 1996]; it derives parameter estimates but not their distributions. Consider the posterior distribution shown in equation (3), and let us do two things: first, ignore the anchors, and second, replace the prior distribution of the parameters with a prior distribution for za. This leaves us with the MAP distribution in the form:
MAP proceeds by assuming models for the distributions appearing on the right-hand side of equation (18). The MAP parameter estimates are those that maximize the model approximation of equation (18); in other words, they correspond to the mode of the posterior parameter distribution.
 The prior p(za) in MAP acts to regularize the solution by stabilizing it around the prior, but unlike in the pilot point method (PPM), its weight is not manipulated to control the results: in MAP, the prior is a starting point, not a constraint! We shall see below that the transformation of the prior term into a regularization term, as done by PPM, has significant consequences.
 The likelihood function p(zb | za) is commonly taken as p(εb), where εb = zb − M(za) and M denotes the forward model. The error terms in p(εb) are usually assumed to be zero-mean, uncorrelated and Gaussian [McLaughlin and Townley, 1996]. Similar assumptions could be made in ML. A modeling assumption is a required component of both ML and MAP because both seek parameter values that are defined by a characteristic of the assumed distributions (e.g., the maximizer of the likelihood in ML or of the posterior in MAP). In other words, both ML and MAP use parametric models. MAD, on the other hand, estimates the likelihood function itself rather than its parameters, and hence can employ nonparametric likelihood functions. The advantage of employing nonparametric models is the flexibility they offer in terms of model selection, but this of course comes with a heavy computational price tag. Additional discussion on the differences between ML and MAP is provided in the work of Kitanidis [1997a, 1997b].
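To make this structure concrete, the following sketch (with a hypothetical linear forward model M and synthetic data; not the formulation of McLaughlin and Townley) maximizes the product of a Gaussian likelihood of the residuals εb = zb − M(za) and a Gaussian prior p(za); dropping the prior term recovers the ML estimate.

```python
# Minimal MAP sketch with a toy linear forward model and hypothetical data.
# The likelihood uses zero-mean, uncorrelated Gaussian errors eps_b = zb - M(za),
# and the prior p(za) is Gaussian; MAP maximizes likelihood * prior,
# whereas ML maximizes the likelihood alone.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_param, n_obs = 5, 12
A = rng.normal(size=(n_obs, n_param))        # toy linear forward model: M(za) = A @ za
za_true = rng.normal(loc=1.0, scale=2.0, size=n_param)
sigma_eps = 0.5                              # measurement-error standard deviation
zb = A @ za_true + rng.normal(scale=sigma_eps, size=n_obs)

prior_mean, prior_sd = np.ones(n_param), 2.0  # Gaussian prior p(za)

def neg_log_posterior(za):
    eps_b = zb - A @ za                                      # residuals eps_b = zb - M(za)
    nll = 0.5 * np.sum((eps_b / sigma_eps) ** 2)             # -log likelihood (up to a constant)
    nlp = 0.5 * np.sum(((za - prior_mean) / prior_sd) ** 2)  # -log prior
    return nll + nlp

map_est = minimize(neg_log_posterior, x0=prior_mean, method="BFGS").x
ml_est = minimize(lambda za: 0.5 * np.sum(((zb - A @ za) / sigma_eps) ** 2),
                  x0=prior_mean, method="BFGS").x
print("MAP estimate:", map_est)   # pulled toward the prior (regularized)
print("ML  estimate:", ml_est)    # fits the data only
```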
8.3. MAD and the Pilot Point Method (PPM)
 In this section, we highlight the differences between PPM and MAD. PPM has been reported in several studies [e.g., Doherty et al., 2003; Kowalsky et al., 2004; Hernandez et al., 2006; Alcolea et al., 2006]. PPM is fundamentally different from ML, MAP, and MAD in that it is a model-fitting method and not a probabilistic method. We will show that PPM's goals are vastly different from those of the other methods, and how meeting these goals affects the results. We will also show that pilot points are not anchors, neither in name nor in concept.
 Let us start by summarizing how PPM works. Schematically, it proceeds as follows (specific details may vary between authors); a minimal code sketch of this workflow is given after the list:
 1. Define a vector of structural parameters using the data za.
 2. Generate an unconditional realization of Y. The generated field, denoted Y0, is then made conditional on za.
 3. Determine the number and locations of the pilot points, and assign to them initial values y0. The initial set of pilot point values y0 is taken from Y0.
 4. Set an objective function. The objective function is intended to control the values assigned to the pilot points. Additional discussion of the objective function is provided below.
 5. Optimize the values assigned to the pilot points so that the objective function is minimized.
 6. The final product of this process is the field Y0 conditioned on the set of pilot point values y obtained from the optimization process.
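To make these steps concrete, the following is a minimal one-dimensional sketch (hypothetical grid, data locations, forward model, and plausibility weight; it illustrates the generic steps above, not any published PPM implementation). It generates a field with an exponential covariance, conditions it on Type A data, places pilot points, and adjusts their values by minimizing a head-misfit objective with a weighted plausibility term.

```python
# Schematic pilot point workflow on a 1-D toy problem (hypothetical setup).
# Y = ln(conductivity); heads come from a 1-D steady-flow solve with fixed heads.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
nx = 60
x = np.linspace(0.0, 10.0, nx)

# Step 1: structural parameters (taken as known here): exponential covariance of Y
var_Y, scale_Y, mean_Y = 4.0, 1.0, 1.0
C = var_Y * np.exp(-np.abs(x[:, None] - x[None, :]) / scale_Y)
L = np.linalg.cholesky(C + 1e-8 * np.eye(nx))

def solve_heads(Y, h_left=10.0, h_right=0.0):
    """1-D steady flow with fixed heads at both ends; K = exp(Y)."""
    K = np.exp(Y)
    Ki = 2.0 * K[:-1] * K[1:] / (K[:-1] + K[1:])      # interface conductivities
    A = np.zeros((nx, nx)); b = np.zeros(nx)
    A[0, 0] = A[-1, -1] = 1.0; b[0], b[-1] = h_left, h_right
    for i in range(1, nx - 1):
        A[i, i - 1], A[i, i + 1] = Ki[i - 1], Ki[i]
        A[i, i] = -(Ki[i - 1] + Ki[i])
    return np.linalg.solve(A, b)

def condition(field, idx, values):
    """Condition an unconditional realization on prescribed values at indices idx
    by kriging the residuals (conditional simulation by kriging)."""
    w = np.linalg.solve(C[np.ix_(idx, idx)], values - field[idx])
    return field + C[:, idx] @ w

# Synthetic reference field and data: Type A (Y) and Type B (head) measurements
Y_true = mean_Y + L @ rng.standard_normal(nx)
idx_a = np.array([5, 30, 50]); za = Y_true[idx_a]
idx_b = np.array([15, 25, 40]); zb = solve_heads(Y_true)[idx_b]

# Step 2: unconditional realization of Y, conditioned on the Type A data
Y0 = mean_Y + L @ rng.standard_normal(nx)
Yc = condition(Y0, idx_a, za)

# Step 3: pilot point locations; initial values taken from the conditional field
idx_p = np.array([10, 20, 35, 45, 55])
y_pilot0 = Yc[idx_p]

def build_field(y_pilot):
    """Field conditioned jointly on the Type A data and the pilot point values."""
    return condition(Y0, np.concatenate([idx_a, idx_p]),
                     np.concatenate([za, y_pilot]))

# Step 4: objective = head misfit + weighted plausibility (regularization) term
w_plaus = 0.1                                          # weight chosen arbitrarily here
def objective(y_pilot):
    misfit = np.sum((solve_heads(build_field(y_pilot))[idx_b] - zb) ** 2)
    plausibility = np.sum((y_pilot - y_pilot0) ** 2)
    return misfit + w_plaus * plausibility

# Step 5: optimize the pilot point values
y_opt = minimize(objective, y_pilot0, method="Nelder-Mead").x

# Step 6: final product = the field conditioned on za and the optimized pilot values
Y_final = build_field(y_opt)
print("objective at optimum:", round(objective(y_opt), 4))
```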
 Following this summary, we shall look at the following aspects of PPM: (1) The significance of PPM's stated goals, (2) the use of pilot points as fitting parameters, and (3) the implications of the optimization procedure. In doing so, we shall also highlight the differences between PPM and the other methods.
 By following the procedure outlined above, PPM attempts to achieve several goals [Cooley, 2000; Cooley and Hill, 2000]. The first goal is to keep the frequency distribution of the target variable in the generated field Y similar to the distribution observed in measurements of the target variable. The second goal is to generate realizations of Y that are equally likely, and the third goal is to closely reproduce the observations of the dependent variables zb.
 Let us consider the first goal. This goal poses several challenges because, first, the observed distributions of the target variable are either poorly defined or nonexistent to begin with. For the sake of discussion, however, let us assume that some data are available to construct a prior distribution for Y from Type A data, p(y | za). If Type B data are available, they should be used as an additional source of information, leading to p(y | zb, za). These distributions could be different, and it is reasonable to expect that they will be, because the Type B data bring additional information into consideration. And so we should ask ourselves whether it is reasonable or helpful to treat p(y | za) as a constraint. MAP and MAD recognize the significance of the prior, but they do not use it as a constraint: they use it as a starting point, because they recognize that it could change if we have informative Type B data. Similarly, ML does not use p(y | za) as a constraint.
 The second PPM goal [Cooley and Hill, 2000] is the generation of equally likely realizations of the target variable field Y. This goal is challenging on several counts. First, generating a realization is a Bayesian concept. Within classical statistics we have the concept of generating conditional realizations of estimates, which can be obtained by somehow perturbing the data. PPM is neither a Bayesian method nor an estimation method, so it is unclear what the PPM realizations represent. Second, there is a question of semantics here: in order to qualify multiple realizations as equally likely, one needs a statistical model to quantify that likelihood in the first place, which PPM does not have. So perhaps a more accurate adjective to use instead of “equally likely” would be “equally drawn.” Third, even if one assumes that PPM can generate equally likely realizations, the advantage of working with such realizations is questionable, because for prediction one would want to consider more likely and less likely realizations, or in short, random sampling. Random sampling is the key to sampling the complete probability space without bias. It is a fundamental tenet of statistics that samples must be drawn at random [Mugunthan and Shoemaker, 2006] in order to prevent bias. Equally likely realizations do not amount to random sampling, as shown in the next paragraph. MAD, in contrast, produces plausible realizations, with various degrees of plausibility, as measured by probability.
 Let us be more specific with regard to the third point of the last paragraph. PPM considers only realizations that cross a preset threshold value defined for the objective function, which means that realizations that do not cross the threshold are not admitted into the pool of realizations. But the rejected realizations may have nonzero probability, and hence should not be eliminated from consideration. Surprisingly, it is not only poor-performing realizations (“poor” in the sense of the PPM objective function) that are excluded, but also superior ones, because the PPM search algorithm stops once the preset threshold is crossed and no effort is made to improve the realizations further. We can conclude, then, that PPM under-samples the probability space and is potentially biased. The bias effect due to optimization was noted by Mugunthan and Shoemaker [2006, p. W10428], who comment on “…bias introduced during optimization because of over-sampling of high goodness-of-fit regions of the parameter…” These effects may be small or large; we cannot say. For credibility, PPM applications must show that this effect is small.
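The following toy sketch (a synthetic scalar parameter and data; not a PPM implementation) illustrates the under-sampling aspect of this argument: admitting only realizations whose misfit falls below a preset threshold yields a narrower spread than random sampling in which each realization carries its probability.

```python
# Toy illustration: threshold censoring of realizations versus
# likelihood-weighted random sampling of the same parameter.
import numpy as np

rng = np.random.default_rng(3)
data = 1.0 + rng.normal(scale=1.0, size=5)                  # five noisy observations of theta = 1

theta = rng.normal(loc=0.0, scale=2.0, size=100_000)        # candidate realizations from a wide prior
misfit = ((data[None, :] - theta[:, None]) ** 2).sum(axis=1)

# Threshold-censored pool: keep only the realizations that cross the misfit threshold
threshold = np.percentile(misfit, 20)
theta_pool = theta[misfit <= threshold]

# Reference: likelihood-weighted resampling of the full ensemble (random sampling
# with probabilities attached, rather than a yes/no threshold)
w = np.exp(-0.5 * misfit)
w /= w.sum()
theta_resampled = rng.choice(theta, size=50_000, p=w)

print("95% range, likelihood-weighted sample:",
      np.round(np.quantile(theta_resampled, [0.025, 0.975]), 2))
print("95% range, threshold-censored pool  :",
      np.round(np.quantile(theta_pool, [0.025, 0.975]), 2))
```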
 An important implication of the biased sampling is that PPM cannot assign probabilities to realizations. Consequently, users cannot assign probabilities to events that are modeled based on the PPM realizations. In MAD, on the other hand, no optimization is used and no threshold criteria are set, thus avoiding all these issues altogether. The probability space is sampled exhaustively and without bias, and realizations can be associated with probabilities using the posterior distributions or by looking at histograms of events.
 Let us now take a look at the role of the pilot points in achieving PPM's third goal, which is the reproduction of the observations. PPM uses pilot points as fitting parameters, often many of them, and in fact it encourages the user to add as many pilot points as needed [Doherty, 2003]. This aspect of PPM underlies its need for the plausibility term (synonymous with the more commonly used regularization term) discussed in Step 4 of the algorithm. The plausibility term is used to control the problems caused by using a number of fitting parameters (the pilot points) that can far exceed the number of observations. Tikhonov and Arsenin showed theoretically that it is possible for a model to fit observations exactly when the number of parameters is equal to the number of data, and that additional parameters render the problem singular unless regularization is applied. This situation applies to PPM, and was confirmed in a study of PPM by Alcolea et al. [2006, p. 1679], who commented that over-parameterization (in the form of a large number of pilot points) “…leads to instability of the optimization problem” and that “…instability implies …large values of some model parameters due to unbounded fluctuations … large jumps in the value of hydraulic properties…” etc. This instability is brought under control in PPM by the regularization term [referred to as the plausibility term by Alcolea et al., 2006]. The weight assigned to the plausibility term can be made arbitrarily large (or small) depending on the magnitude of the instability. That effect, although beneficial from the instability perspective, is the root cause of PPM's biased sampling, because it controls the extent of censoring (on both the “bad” and the “good” realization sides, as discussed earlier). It should also be noted that “…the degree of data reproduction is a poor indicator of the accuracy of estimates” [Kitanidis, 2007].
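The instability described above can be illustrated with a small sketch (a hypothetical smoothing operator, parameter grid, noise level, and weight values): with far more fitting parameters than observations the normal equations are rank deficient, and the size of the fitted parameters depends strongly on the weight given to a Tikhonov-type (plausibility) term.

```python
# Sketch of the instability controlled by the plausibility (regularization) term:
# an ill-posed fit with more parameters than observations, stabilized by Tikhonov.
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_par = 10, 40
x_obs = np.linspace(0.0, 1.0, n_obs)         # observation locations
s_par = np.linspace(0.0, 1.0, n_par)         # parameter ("pilot point"-like) locations
G = np.exp(-50.0 * (x_obs[:, None] - s_par[None, :]) ** 2)   # smoothing forward operator

m_true = np.sin(2 * np.pi * s_par)
d = G @ m_true + rng.normal(scale=0.01, size=n_obs)          # noisy observations

print("rank of G^T G:", np.linalg.matrix_rank(G.T @ G), "out of", n_par)  # rank deficient

for alpha in (1e-10, 1e-2, 1.0):
    # minimize ||G m - d||^2 + alpha ||m||^2  (alpha plays the role of the plausibility weight)
    m = np.linalg.solve(G.T @ G + alpha * np.eye(n_par), G.T @ d)
    print(f"alpha={alpha:g}: max |parameter| = {np.abs(m).max():10.2f}, "
          f"data misfit = {np.linalg.norm(G @ m - d):.4f}")
```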
 In addition to creating instability, pilot points can lead to artifacts in the generated fields. Because the introduction of additional pilot points is the only PPM mechanism for improving model performance and addressing neglected elements (such as three-dimensional flow, unsteady flow, recharge and leakage, geological discontinuity, and the like), it can lead to the appearance of artificial features in the realizations of the target variable field [see Cooley, 2000, p. 1162]. Alcolea et al. [2006, p. 1679] confirmed the existence of this effect and indicated that it could be controlled by regularization, but it is unclear how and to what extent. Kitanidis also confirmed the existence of this effect when he noted that “By over-weighting the data reproduction penalty, the data are reproduced more closely and more details appear in the image that is estimated from the optimization but the image is also more affected by spurious features.” Studies such as Hernandez et al. suggest that the artifact issue can be brought under control, but it is unclear what constitutes an artifact (except after it shows up) and how this aspect of the simulation can be managed. To summarize, the plausibility term, in addition to controlling instability, is also used to reduce artifacts. But in the absence of any indication to the contrary, one can only speculate on how effective it is in doing so.
 The fundamental difference between PPM and MAD in this regard is that MAD is a Bayesian method whereas PPM is a model-fitting method. Specifically, anchors are not fitting parameters: they are devices for reducing data into a suitably convenient form. PPM attempts to fit measurements by adding pilot points and tweaking their assigned values, whereas MAD does not fit anything: it is built around estimating the statistical distribution of the differences between observations and model predictions. MAD can use parameters to model distributions, but it does not adjust point estimates.
 The likelihood function in MAD is the only subject of estimation. Estimating the likelihood function in MAD is unlike the fitting exercise of PPM because of the number of data points involved. In PPM the number of data points is limited to the number of measurements, whereas in MAD the number of data points corresponds to the number of differences between measurements and predictions, which can be set arbitrarily high (it depends on the number of Monte Carlo realizations generated for the purpose of estimating the likelihood function). For example, consider the case of N measurements. PPM attempts to fit the N measurements with a number of pilot points that can far exceed N, whereas in MAD the number of data points is on the order of N × 10⁶ or more. Theoretically, there is no limit on the number of realizations that could be generated, and hence stability is not an issue in MAD.
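As an illustration of this idea, the sketch below (a hypothetical scalar forward model and a single synthetic datum; not the MAD implementation) estimates the likelihood of an observed Type B value nonparametrically, with a kernel density built from Monte Carlo simulations, for a range of candidate parameter values.

```python
# Minimal sketch of nonparametric likelihood estimation via Monte Carlo:
# the likelihood of the observed zb is estimated from simulated ensembles
# with a kernel density estimate, not from an assumed parametric family.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
zb_obs = 3.2                                    # the observed Type B datum (synthetic)

def simulate_zb(theta, n_real=5_000):
    """Forward-simulate zb for one candidate theta: toy model where zb equals
    theta plus field variability plus measurement error."""
    return theta + rng.normal(scale=1.0, size=n_real) + rng.normal(scale=0.3, size=n_real)

candidates = np.linspace(0.0, 6.0, 25)          # candidate parameter values
likelihood = np.empty_like(candidates)
for i, theta in enumerate(candidates):
    ensemble = simulate_zb(theta)               # Monte Carlo realizations of zb given theta
    likelihood[i] = gaussian_kde(ensemble)(zb_obs)[0]   # nonparametric density at zb_obs

# With a flat prior over the candidates, the discretized posterior follows directly
posterior = likelihood / (likelihood.sum() * (candidates[1] - candidates[0]))
print("posterior mode near:", candidates[np.argmax(posterior)])
```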
 The issue of stability is demonstrated in Figure 6, which shows the marginal distribution of the pressure at a validation point (a point not used for inversion, but for testing the quality of the predictions). The pressure distribution is shown for different numbers of anchors. The distribution does not show any sign of instability as the number of anchors increases. Similar stability was observed for dozens of points spread all over the simulated domain. Figure 6 demonstrates that MAD does not have a stability problem, despite the fact that it does not use any regularization term. It also highlights the issue of anchor placement (see section 5): the convergence of the statistical distributions of the target variables indicates that the number of anchors used has reached a satisfactory level in terms of the ability to extract information from the data. Such a measure of sufficiency is not available with PPM.
 This section includes a brief comparison of MAD with the PPM case study presented by Hernandez et al., subsequently referred to as H06. The case study focuses on a rectangular flow domain with spatially variable Y = ln(conductivity). The spatial variability is modeled using a stationary mean and an exponential spatial covariance with variance σY² equal to 4 and an integral scale equal to 1. The hydraulic head gradient was defined by a head difference of 10 length units. For both the head and Y, measurement errors with unit variance were added to the data. We implemented the same models in our case study. We did not have access to the baseline Y field of H06, so we generated our own baseline field using the same spatial variability model.
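For readers who wish to reproduce a comparable setup, the following is a minimal sketch (with an assumed grid, sampling locations, and random seed; the H06 grid and data are not reproduced) of generating a baseline Y field with the stated model, namely a stationary mean and an exponential covariance with variance 4 and integral scale 1, and of adding unit-variance measurement errors to sampled Type A values.

```python
# Minimal sketch: baseline Y = ln(conductivity) field on a rectangular grid with
# exponential covariance (variance 4, integral scale 1) and unit-variance errors.
import numpy as np

rng = np.random.default_rng(6)
nx, ny = 40, 20                                    # assumed discretization
X, Ygrid = np.meshgrid(np.linspace(0, 20, nx), np.linspace(0, 10, ny))
pts = np.column_stack([X.ravel(), Ygrid.ravel()])

var_Y, integral_scale, mean_Y = 4.0, 1.0, 0.0
h = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)   # separation distances
C = var_Y * np.exp(-h / integral_scale)                          # exponential covariance

L = np.linalg.cholesky(C + 1e-6 * np.eye(pts.shape[0]))          # jitter for numerical stability
Y_field = mean_Y + L @ rng.standard_normal(pts.shape[0])         # one baseline realization

# Type A data with unit-variance measurement error at a few sampled locations
# (head data would be generated analogously from a flow solve; omitted for brevity)
sample_idx = rng.choice(pts.shape[0], size=20, replace=False)
za = Y_field[sample_idx] + rng.normal(scale=1.0, size=20)
print("baseline field variance:", Y_field.var().round(2))
```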
 As discussed earlier, there are fundamental differences between MAD and PPM and we shall not repeat them here. In this section we shall focus on specific details related to the implementation of MAD and PPM to this case study and on some results. The first difference between MAD and H06 is with regard to estimating the parameters of the spatial variability model. H06 provides estimates for the parameters based on alternative criteria of optimality, whereas MAD considers the statistical parameters as random variables and derives their distributions.
 The second difference concerns the statistical model employed for modeling the joint distribution of the heads and Y. H06 assumed the heads to be spatially uncorrelated. Their model amounts to assuming that the heads are deterministic variables subject to uncertainty due to spatially uncorrelated errors (the structure often assumed for measurement error). This assumption is not in line with the approach employed by H06 for modeling Y, which was assumed to be a spatially correlated space random function. H06 also assumed the heads to be uncorrelated with Y, and Y to be normally distributed and spatially correlated. These assumptions are not in line with multiple studies and observations showing that the heads are spatially correlated and cross-correlated with Y and, furthermore, that they are normally distributed only for σY² smaller than 1 [Rubin, 2003]. This statistical model was employed in other studies as well [cf. Kowalsky et al., 2004]. Regardless of whether this model is justified or not, the point we want to make is that PPM requires assumptions that may be simplistic or restrictive. MAD, in contrast, does not require any assumption in this regard, and it derives a nonparametric posterior distribution.
 We chose to demonstrate the differences with two sets of results. The first set focuses on the hydraulic heads, and the second on the geostatistical parameters. Figure 7 shows the actual and expected values of the hydraulic head along the centerline of the flow domain. We also show the 95% confidence intervals obtained with MAD as well as those identified by H06 and given in their Figure 5. Our 95% confidence intervals are taken directly from the head distribution (see Figure 8), whereas H06 obtained theirs by computing the variance of the head through the ensemble of realizations and assuming a Gaussian distribution. Comparison shows our confidence intervals to be somewhat larger than those predicted by H06. As discussed earlier, PPM censors suboptimal as well as above-threshold optimal realizations, which leads to underestimation of the confidence intervals and explains this difference. The H06 upper confidence interval near the left boundary allows the head to vary above the head at the boundary, and similarly, the head is allowed to vary below the lower bound set by the right boundary. This is an outcome of the assumption of Gaussianity. In MAD, however, the distributions next to the boundaries are bounded correctly, because the distribution is derived, not postulated.
Figure 7. The hydraulic head along the centerline of the flow domain. Reference denotes the actual head. The predicted mean is obtained using MAD based on the posterior distributions of the parameters and anchors. The 95% confidence intervals shown are those obtained by MAD and by Hernandez et al., denoted here by H06.
 Figure 8 shows the head distributions at different locations along the transect shown in Figure 7. The distributions are shown on q-q plots so that we can evaluate their departure from Gaussianity. We note that the head distributions are non-Gaussian throughout the flow domain. They are strongly skewed next to the boundaries because of the constraints imposed by the nearby boundaries. The departure from Gaussianity is much less pronounced around the center of the flow domain (Figure 8c), but it is still pronounced at the tails. H06, on the other hand, assumed the heads to be Gaussian throughout the transect. One consequence of that assumption is that heads are allowed to be larger than 10 and smaller than zero next to the respective boundaries. As noted earlier, MAD does not assume posterior distributions but rather infers them, and a consequence is the flexibility to obtain a variety of distributions and better compliance with the underlying physics.
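The distinction between the two ways of constructing the intervals can be illustrated with a short sketch (a synthetic, deliberately skewed head ensemble near a fixed-head boundary; for illustration only): percentiles taken directly from the ensemble respect the bound imposed by the boundary head, whereas an interval built from the ensemble variance under a Gaussian assumption can exceed it.

```python
# Empirical percentile interval versus Gaussian-assumption interval for a head
# ensemble bounded above by a fixed-head boundary (synthetic ensemble).
import numpy as np

rng = np.random.default_rng(7)
h_boundary = 10.0
# Synthetic ensemble of heads at a point close to the upstream fixed-head boundary:
# skewed and bounded above by the boundary head.
heads = h_boundary - rng.gamma(shape=2.0, scale=0.15, size=20_000)

lo, hi = np.percentile(heads, [2.5, 97.5])          # interval from the distribution itself
mu, sd = heads.mean(), heads.std()
lo_g, hi_g = mu - 1.96 * sd, mu + 1.96 * sd         # Gaussian-assumption interval

print(f"empirical 95% interval: [{lo:.2f}, {hi:.2f}]  (upper end below {h_boundary})")
print(f"Gaussian  95% interval: [{lo_g:.2f}, {hi_g:.2f}]  (upper end can exceed {h_boundary})")
```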
 Our next set of results deals with estimates of the geostatistical parameters, which MAD provides in the form of statistical distributions. Figure 9a shows the probability density functions obtained by MAD for three different sets of data and error levels. In all cases the distributions are well aligned with the actual value, which is around 1 unit length. H06 deals with this parameter in their Figure 14. They do not provide distributions; instead, they attempt to identify an optimal value by analyzing various performance criteria over a wide range of parameter values. Theoretically, these criteria should peak in the vicinity of the actual values. Of the five evaluation criteria tested in Figure 14 of H06, none peaked at around 1; instead they peaked at around zero.
Figure 9. The posterior distributions of the scale and variance of the log conductivity. The vertical bars show the actual values of the baseline field. A and B refer to data types used.
 Figure 9b provides our results for the variance. All cases we analyzed show well-defined peaks somewhere between 4 and 5 (with the higher values corresponding to the case with large measurement error on the heads). Results in H06 for this parameter are provided in their Figures 13 and 16. The evaluation criteria they used did not peak at the actual values. A few criteria show a preference toward high values, but without displaying a well-defined peak. The various evaluation criteria in H06 are not consistent in the trends they display. For example, in Figure 13 of H06 one of the evaluation criteria identified the variance at around 0.5 whereas the others seem to prefer 6. It is unclear which criterion should be selected a priori as the most reliable. We speculate that this insensitivity or underperformance of the evaluation criteria in H06 could be related (1) to the elimination of the high-probability and low-probability events due to the use of optimality criteria, as discussed in section 8.3, and (2) to the use of multiple fitting parameters, in the form of pilot points, which masks the actual spatial structure by introducing artifacts. This possibility was alluded to in the works of Cooley and of Cooley and Hill.