This study contributes the Adaptive Strategies for Sampling in Space and Time (ASSIST) framework for improving long-term groundwater monitoring decisions across space and time while accounting for the influences of systematic model errors (or predictive bias). The new framework combines contaminant flow-and-transport modeling, bias-aware ensemble Kalman filtering (EnKF), many-objective evolutionary optimization, and visual analytics-based decision support. The ASSIST framework allows decision makers to forecast the value of investments in new observations for many objectives simultaneously. Information tradeoffs are evaluated using an EnKF to forecast plume transport in space and time in the presence of uncertain and biased model predictions that are conditioned on uncertain measurement data. This study demonstrates the ASSIST framework using a laboratory-based physical aquifer tracer experiment. In this initial demonstration, the position and frequency of tracer sampling were optimized to (1) minimize monitoring costs, (2) maximize the information provided to the EnKF, (3) minimize failures to detect the tracer, (4) maximize the detection of tracer fluxes, (5) minimize error in quantifying tracer mass, and (6) minimize error in quantifying the centroid of the tracer plume. Our results demonstrate that the forecasting, search, and visualization components of the ASSIST framework represent a significant advance for observation network design, with strong potential to improve the characterization, prediction, and management of groundwater systems.
 Formal observation (or monitoring) network design has a long history dating back to the pioneering work of Drozdov and Sepelevskij, who developed a formalized framework for evaluating the spatial coverage of meteorological gauge networks. From the 1950s to the present, observation network design research has been strongly dominated by the meteorological community (for a review, see Arnold and Dey), which uses the concept of observation system simulation experiments (OSSEs). In OSSEs, candidate observations are evaluated by forecasting how much they would improve system predictions. The meteorological OSSE formalizes the use of predictive models and statistical data assimilation to discover the need for and benefits of new observations; it can be viewed as a physics-informed experimental design. The classical OSSE combines predictive simulation and Bayesian data assimilation to forecast the value of observations. The literature reviewed by Arnold and Dey provides an important exemplar of the scientific value of coevolving observation and simulation systems, each of which benefits from the rigorous evaluation of predictive skill using forecasts of actual system events conditioned on proposed observation strategies.
 Although observation network design was also a significant focus of the early water resources research literature, as evidenced by the work included in the inaugural volume of this journal [see Fiering, 1965], hydrological observation network design frameworks have lagged behind the formality and innovations of the meteorological OSSEs. Langbein's summary of one of the most comprehensive efforts in the water resources literature to focus on the science of observation network design offers cogent criticisms of, and challenges to, the present state of hydrological science. There are few examples of OSSE-type hydrological experiments in which forecasts of system dynamics are used to inform subsequent laboratory- or field-based experimental design; instead, the dominant approach is ad hoc observation and postevent analysis. Moss [1979a] highlights that understanding the space and time tradeoffs implicit in hydrological observation network design requires considering a third fundamental dimension of the problem: model error. Systematic errors in our models of hydrological systems pose a barrier to using OSSE frameworks to advance our observation networks. Exacerbating this barrier, Lettenmaier highlights that, as a problem class, observation network design suffers from a curse of dimensionality: there are large numbers of objectives and uses for data, and the number of alternative space and time decisions that can be considered grows exponentially.
 Using a groundwater application context, this study contributes the Adaptive Strategies for Sampling in Space and Time (ASSIST) framework to advance our ability to manage the technical barriers posed by observation network design as a general class of problems. As demonstrated in this paper, the ASSIST framework is a highly adaptable methodology for improving long-term groundwater monitoring (LTGM) decisions across space and time while accounting for the influences of systematic model errors (or predictive bias). This paper demonstrates how bias-aware ensemble Kalman filtering (EnKF) [Kollat et al., 2008a], many-objective (i.e., more than three objectives) search using hierarchical Bayesian optimization [Kollat et al., 2008b], and interactive high-dimensional visual analytics [Kollat and Reed, 2007a] can be combined to facilitate discovery and negotiation in the LTGM design process. Our use of the terms discovery and negotiation is motivated by the potential of many-objective solution sets to identify alternatives that capture a broad suite of system behaviors relevant to both modeled and unmodeled objectives [see Brill et al., 1990]. This ultimately enables decision makers to discover system dependencies and/or tradeoffs and exploit this information in the adaptive long-term management of observation systems.
2. Prior Work
 The growth of the environmental movement and the promulgation of the U.S. Clean Water Act motivated an increased focus on characterizing and managing contaminated groundwater resources in the 1980s. In support of this change, geostatistical frameworks became a dominant technical focus within the LTGM design literature [Hughes and Lettenmaier, 1981; Carrera et al., 1984; Olea, 1984; Bogárdi et al., 1985; Rouhani, 1985]. These studies largely focused on the ability of kriging frameworks to provide spatial measures of the value of new groundwater contaminant observations in reducing the interpolation scheme's estimation variance.
Eigbe et al.  provided an excellent review of groundwater applications of the linear Kalman filter as well as the extended Kalman filter for nonlinear systems. Their review highlighted that very few studies exist for three-dimensional groundwater flow and transport applications due to these problems' high-dimensional, nonlinear state spaces (i.e., heads and concentrations) and their consequent computational barriers. Notable contributions for the application of extended Kalman filtering have focused on analytical [Graham and McLaughlin, 1989a, 1989b, 1991] and numerical techniques [Eppstein and Dougherty, 1996, 1998] for reducing the computational constraints posed by recursively updating the mean and covariance of contamination fields. Although these studies advanced the applicability of the extended Kalman filter for both parameter and multistate estimation, the methodology is still limited in its ability to model highly nonlinear systems, especially in the presence of systematic modeling errors and/or observation errors. These challenges motivated our development and use of the bias-aware EnKF [Evensen, 2003; Drécourt et al., 2006; Kollat et al., 2008a] in the ASSIST framework.
3.1. Physical Aquifer Experiment
 A unique contribution of this work is the rigorous evaluation of the ASSIST framework using the experimental aquifer illustrated in Figure 1 to perform advection-dominated observation system simulation experiments [Arnold and Dey, 1986]. This work expands the OSSE concept to include formal exploration of forecasted observations using many-objective network optimization. As an OSSE platform, the University of Vermont (UVM) tank aquifer illustrated in Figure 1 was used for a 19-day ammonium chloride tracer experiment in which a head difference of 2.9 cm was established over the length of the tank.
 In the laboratory tracer experiment, ammonium chloride was injected through port B4 (see Figures 1 and 2) at an average concentration of 1 g L−1 and at a rate of 1.5 L h−1 over a period of 15 days. Concentration data were then collected from the time domain reflectometry (TDR) probes located in layers 3–5 (63 total locations) over a period of 19 days at intervals of 17.5 min. Note that the highly variable and complex loading dynamics of the tracer injection at B4 would not be known in typical groundwater transport modeling efforts, which would instead represent the source with a simplified term similar to the one we have used (see port B4 in Figure 2). As highlighted by Kollat et al. [2008a], this simplification of the initial condition creates a very severe bias in transport forecasts, which is compounded by uncertainties in the hydraulic conductivities (despite a broad range of tracer, slug, and pump tests). The uncertain hydraulic conductivity of the fine sand lens in the center of the tank (Figure 1) exacerbated the fully three-dimensional systematic bias effects that arise from our model's simplification of the complex ammonium chloride loading dynamics.
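As a rough check on the scale of the source term and the sampling record, the nominal injected mass and the number of TDR sampling times per probe follow directly from the stated rates. This assumes that the average concentration and injection rate held constant over the full 15-day injection, which the text notes was not strictly the case; the symbols are introduced here for illustration only.

```latex
\begin{aligned}
M_{\text{inj}} &\approx \bar{c}\, Q\, \Delta t
  = (1~\text{g L}^{-1})(1.5~\text{L h}^{-1})(15 \times 24~\text{h})
  = 540~\text{g},\\
n_{\text{obs}} &\approx \frac{19 \times 1440~\text{min}}{17.5~\text{min}}
  \approx 1563~\text{sampling times per probe}.
\end{aligned}
```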
 While the ASSIST framework has been developed specifically for field-scale applications, the sparsity of field-scale data would make it extremely difficult to evaluate its performance rigorously (i.e., without conjecture about the quality of sampling decisions and bias-aware forecasts in space and time). Overall, the experimental conditions of the UVM tank pose a severe test of the ASSIST framework's ability to forecast approximately optimal tradeoffs for sampling decisions from the tank's preexisting ports in both space and time, while the data-rich experimental environment allows those forecasts to be checked. Additional details on the UVM tank are available from Kollat et al. [2008a].
3.2. Numerical Flow and Transport Modeling
 For this study, subsurface flow was modeled using the parallel, three-dimensional, variably saturated groundwater flow model ParFlow [Ashby and Falgout, 1996; Jones and Woodward, 2001; Kollet and Maxwell, 2006; Maxwell and Kollet, 2008a]. ParFlow is under collaborative development by the Colorado School of Mines, the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and the University of Bonn. It is also highly scalable [Kollet et al., 2010], which allows it to use large parallel supercomputing environments effectively and to produce detailed simulations that distinguish it from other groundwater models in accuracy and performance [Maxwell and Kollet, 2008b; Kollet and Maxwell, 2008; Frei et al., 2009; Maxwell et al., 2008]. ParFlow uses a Newton–Krylov method [Saad, 2003] to solve Richards's equation [Richards, 1931] for variably saturated groundwater flow and has also been fully coupled with overland flow to simulate integrated surface and subsurface flow. In addition, ParFlow has been developed to handle subsurface heterogeneity and uncertainty efficiently by providing parallel tools for conditionally simulating permeability field realizations (e.g., Turning Bands Simulation [Tompson et al., 1989] and a parallel version of Sequential Gaussian Simulation [Deutsch and Journel, 1998]). In this study, Turning Bands Simulation was used to generate an ensemble of permeability realizations as the primary source of uncertainty in our ensemble of flow-and-transport forecasts.
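To make the idea of a permeability ensemble concrete, the sketch below draws correlated lognormal permeability realizations. It does not reproduce ParFlow's Turning Bands implementation; it swaps in a different technique (Cholesky factorization of an exponential covariance on a 1-D grid), which is only practical for small grids. The function name and all parameter values are hypothetical.

```python
import numpy as np

def lognormal_permeability_ensemble(x, mean_lnk, sigma_lnk, corr_len, n_real, seed=0):
    """Hypothetical stand-in for Turning Bands: draw n_real correlated ln(k)
    fields on 1-D grid points x using an exponential covariance model."""
    rng = np.random.default_rng(seed)
    # Exponential covariance between every pair of grid points
    dist = np.abs(x[:, None] - x[None, :])
    cov = sigma_lnk**2 * np.exp(-dist / corr_len)
    # A tiny diagonal jitter keeps the Cholesky factorization numerically stable
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_real))
    ln_k = mean_lnk + L @ z          # shape (n_points, n_real): one column per realization
    return np.exp(ln_k)               # lognormal permeability realizations
```

Each column can then drive one flow simulation, so the spread of the ensemble of forecasts reflects the assumed uncertainty in permeability.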
 Contaminant transport was modeled using the numerical particle transport model SLIM-FAST [Maxwell and Kastenberg, 1999; Maxwell et al., 2007; Tompson et al., 1987, 2005; R. M. Maxwell and A. Tompson, SLIM-FAST: A user's manual, version 2.0, 2006]. SLIM-FAST uses the Lagrangian Random Walk Particle Method (RWPM) to move particles that represent the concentration and location of contaminant mass through saturated porous media, using the velocity field from the numerical flow forecasts obtained with ParFlow. Diffusion, chemical reactions, and radioactive decay are also simulated by SLIM-FAST using a particle-probability approach [e.g., see Maxwell et al., 2007 supplemental material]. Particle-based transport models such as SLIM-FAST have been shown to be effective simulation tools because they are less prone to concentration negativity, numerical dispersion, and mass balance inconsistencies [Tompson and Gelhar, 1990; Tompson and Dougherty, 1992]. SLIM-FAST has been demonstrated in tandem with ParFlow in a variety of studies [Maxwell et al., 2007, 2008; Kollet and Maxwell, 2008]. In this study, SLIM-FAST is used to simulate realizations of contaminant plumes on the basis of their corresponding flow field realizations generated using ParFlow.
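The core of the random walk particle method can be sketched in one dimension: each particle takes an advective step plus a Gaussian dispersive jump, and concentration is recovered by binning particle mass. This is a minimal illustration assuming a uniform velocity and constant dispersion coefficient; SLIM-FAST itself works in 3-D with spatially variable ParFlow velocities, and the function names here are hypothetical.

```python
import numpy as np

def random_walk_particles(n_particles, velocity, dispersion, dt, n_steps, x0=0.0, seed=0):
    """Minimal 1-D RWPM: each step adds an advective displacement v*dt and a
    Gaussian dispersive jump with variance 2*D*dt."""
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, x0, dtype=float)
    for _ in range(n_steps):
        x += velocity * dt + np.sqrt(2.0 * dispersion * dt) * rng.standard_normal(n_particles)
    return x

def concentration_from_particles(x, mass_per_particle, bin_edges):
    """Convert particle positions into mass per unit length in each bin."""
    counts, _ = np.histogram(x, bins=bin_edges)
    return counts * mass_per_particle / np.diff(bin_edges)
```

Because mass is carried by the particles, total mass is conserved by construction and binned concentrations can never be negative, which is the advantage over grid-based schemes noted above.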
3.3. Bias-Aware Ensemble Kalman Filtering
 The EnKF [Evensen, 1994, 2003] was developed, and has grown popular, for its ability to provide statistical forecasts of nonlinear system states while simultaneously accounting for simulation uncertainty and measurement uncertainty within an ensemble Monte Carlo framework. The traditional Kalman filter proceeds iteratively in two basic steps. First, the state and error covariance of the system are projected forward in time (i.e., the forecast step). Then, when measurement data become available, the state estimate and its associated error covariance are corrected by assimilating those data and their associated uncertainty (i.e., the assimilation step) [Welch and Bishop, 2004]. A bias-aware version of the EnKF [Drécourt et al., 2006; Kollat et al., 2008a] is used in this work to correct system states by accounting for model uncertainty and by using a Bayesian feedback that accounts for systematic model bias to improve transport forecasts. The ASSIST framework has been developed to support operational monitoring network design for groundwater sites where contamination is of concern. In this operational environment, systematic errors in the initial conditions, boundary conditions, and aquifer properties are a persistent and severe problem (e.g., see Task Committee on Long-Term Groundwater Monitoring Design).
 Using similar notation to Evensen , the bias-aware EnKF state matrix A is defined in equation (1) where n is the number of model states and N is the ensemble size:
 The state matrix is composed of N model state vectors and N bias state vectors. During a forecast (f) or assimilation (a) step, the bias portion of A is updated using equation (2), and the model state portion of A is subsequently updated using equation (3) at each time step j. In both equations, Q is an independent, zero-mean, spatially correlated noise field representing model structure uncertainty, which is multiplied by a scaling factor. Time correlation is added to the bias states through a time-correlation factor. Note that the forecasted bias states (i.e., deterministic nonzero model errors) feed back directly into the model state forecast through equation (3). Drécourt et al. have shown that this error feedback can eliminate complex drifting bias (i.e., a time-varying bias that can be modeled with a first- or higher-order differential equation).
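The structure described above (an augmented 2n × N ensemble whose bias rows are forecast first and then fed back into the model-state rows) can be sketched as follows. Because equations (2) and (3) are not reproduced here, the update uses an assumed first-order autoregressive bias model with additive feedback; the names `alpha` (time correlation), `gamma` (noise scaling), the sign of the feedback, and the use of independent noise draws in the two updates are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def correlated_noise(n, n_ens, corr_len, rng):
    """Zero-mean, unit-variance noise with exponential spatial correlation on a 1-D grid."""
    idx = np.arange(n)
    cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / corr_len)
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    return L @ rng.standard_normal((n, n_ens))

def bias_aware_forecast(A, model_step, alpha, gamma, corr_len, rng):
    """One forecast step for an augmented ensemble A of shape (2n, N).

    Top n rows hold model states, bottom n rows hold bias states.
    Assumed form (not necessarily the paper's equations (2)-(3)):
        B_j = alpha * B_{j-1} + gamma * Q
        X_j = model_step(X_{j-1}) + B_j + gamma * Q'
    """
    n, N = A.shape[0] // 2, A.shape[1]
    X, B = A[:n], A[n:]
    # Bias states: time-correlated persistence plus scaled spatially correlated noise
    B_new = alpha * B + gamma * correlated_noise(n, N, corr_len, rng)
    # The forecast bias feeds back directly into the model-state forecast
    X_new = model_step(X) + B_new + gamma * correlated_noise(n, N, corr_len, rng)
    return np.vstack([X_new, B_new])
```

In ASSIST, `model_step` would correspond to a ParFlow/SLIM-FAST transport forecast for each ensemble member; here it is any callable that maps an (n, N) array to an (n, N) array.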
 When measurement data are available, the Kalman gain matrix K acts as a blending factor that approximately minimizes the a posteriori error covariance of the EnKF, as computed in equation (4). In equation (4), P is the forecast covariance of the state ensemble A, H is a measurement operator that maps the n model states onto the m measurement locations, and R is the measurement error covariance, represented by the measurement ensemble.
K contains corrections for both the model states and the bias states and is thus 2n × m in size. Once K is calculated, the state ensemble is updated using equation (5), which incorporates the measurement ensemble D (containing the measurement uncertainty).
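The gain and update steps described above can be sketched with the standard stochastic EnKF forms of Evensen [2003], K = P Hᵀ(H P Hᵀ + R)⁻¹ and Aᵃ = A + K(D − H A), applied to the augmented model-plus-bias ensemble. This is a minimal sketch; the function name is hypothetical, and the covariance is formed explicitly, which is only practical for small state vectors.

```python
import numpy as np

def enkf_analysis(A, H, D, R):
    """Stochastic EnKF analysis step for an augmented ensemble A of shape (2n, N).

    H: (m, 2n) measurement operator; D: (m, N) perturbed measurements;
    R: (m, m) measurement error covariance.
    Returns the analyzed ensemble and the Kalman gain K of shape (2n, m).
    """
    N = A.shape[1]
    A_prime = A - A.mean(axis=1, keepdims=True)   # ensemble anomalies
    P = A_prime @ A_prime.T / (N - 1)              # forecast covariance (2n x 2n)
    S = H @ P @ H.T + R                            # innovation covariance (m x m)
    # K = P H^T S^-1, computed via a linear solve (P and S are symmetric)
    K = np.linalg.solve(S, H @ P).T
    A_analysis = A + K @ (D - H @ A)               # update every ensemble member
    return A_analysis, K
```

Note that K has one row per augmented state (2n) and one column per measurement (m), so a measurement can correct bias states even when H only observes model states.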
 The analyzed covariance is then obtained using equation (6), which shows that the larger the Kalman gain, the greater the impact of the conditioning data on reducing the analyzed covariance matrix. This reduces both the diagonal of the covariance (i.e., the analyzed error variance) and its off-diagonal components (i.e., the correlations that make measurements redundant).
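For reference, the standard analyzed-covariance form from the EnKF literature [Evensen, 2003] is shown below, with $\mathbf{P}^{a}$ (analyzed covariance) and $\mathbf{I}$ (identity) as notation assumed here. The scalar case, with a single observed state, makes explicit why a larger gain yields a larger reduction in variance.

```latex
\mathbf{P}^{a} = (\mathbf{I} - \mathbf{K}\mathbf{H})\,\mathbf{P},
\qquad
\text{scalar case } (H = 1):\quad
K = \frac{p}{p + r},\qquad
p^{a} = (1 - K)\,p = \frac{p\,r}{p + r} \le \min(p,\, r).
```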
 For LTGM design, knowledge limitations that create systematic biases in initial conditions, boundary conditions, and spatial aquifer properties are severe hurdles to modeling these systems effectively, especially in field-scale applications. The bias-aware EnKF is an important contribution to LTGM design because it can recognize these biases in the presence of observed data and incorporate this information into its forecasts. In addition, the uncertainty associated with observations of system states is an integral part of the EnKF. When observation uncertainty is low relative to model uncertainty, the EnKF places more emphasis on the observed states than on the modeled states during assimilation. Conversely, when observation uncertainty is high relative to model uncertainty, more emphasis is placed on the modeled system states during assimilation. The EnKF allows the user to quantify and incorporate uncertainty in both the model states and the observations through noise fields (see Q and R in equations (2)–(4)) that can be controlled both spatially and temporally to reflect user knowledge of system uncertainty. More details on the formulation of the EnKF can be found in the work of Evensen, and additional information on the bias-aware formulation can be found in the work of Drécourt et al. and Kollat et al. [2008a].
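The shift in emphasis between observations and model states can be seen in the scalar Kalman update, where the gain is the ratio of forecast variance to total variance. This is a minimal one-state sketch with a hypothetical function name, not the ensemble implementation.

```python
def scalar_analysis(x_forecast, var_forecast, y_obs, var_obs):
    """Scalar Kalman update: returns (posterior mean, posterior variance, gain)."""
    gain = var_forecast / (var_forecast + var_obs)   # weight given to the observation
    mean = x_forecast + gain * (y_obs - x_forecast)  # forecast corrected by weighted innovation
    var = (1.0 - gain) * var_forecast                # posterior variance shrinks as gain grows
    return mean, var, gain
```

With a forecast of 1.0 (variance 0.25) and an observation of 2.0, a precise observation (variance 0.01) gives a gain of about 0.96 and pulls the estimate close to 2.0, whereas a noisy observation (variance 4.0) gives a gain of about 0.06 and leaves the estimate near the model forecast.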
3.4. Optimization and Visualization
Kollat et al. [2008b] used a spatial LTGM network application to introduce the Epsilon Dominance Hierarchical Bayesian Optimization Algorithm (ɛ-hBOA). Broadly, the ɛ-hBOA belongs to a class of solution tools termed multiobjective evolutionary algorithms (MOEAs; for a detailed introduction, see Coello Coello et al.). MOEAs abstract the processes of natural selection to facilitate search for complex discrete, nonlinear, and nonconvex planning problems. Nicklow et al. provide a detailed review of the broad range of water resources-related applications of MOEAs and their overall benefits. The defining strength of these methods is their ability to use population-based search to provide high-quality approximations of the full set of Pareto optimal solutions for problems with conflicting objectives. Informally, a Pareto optimal solution represents a decision in which performance in one planning objective cannot be improved without degrading performance in one or more of the remaining objectives. The set of these tradeoff solutions, mapped into objective space, is the Pareto frontier. In this paper, we are advancing many-objective optimization, which is an emerging focus in MOEA theory and in the literature on applications with four or more objectives [Reed and Minsker, 2004; Fleming et al., 2005; Kollat and Reed, 2006; Coello Coello et al., 2007; di Pierro et al., 2007; Kollat et al., 2008b; Kasprzyk et al., 2009].
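The Pareto dominance test that underlies these definitions can be written in a few lines. The sketch assumes all objectives are minimized (maximized objectives, such as the K or Flux objectives, would be negated first) and uses a simple quadratic filter; the function names are hypothetical and this is not the ɛ-hBOA's ranking code.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def nondominated(points):
    """Indices of the nondominated rows of an (n_solutions, n_objectives) array."""
    pts = np.asarray(points, dtype=float)
    return [i for i, p in enumerate(pts)
            if not any(dominates(q, p) for j, q in enumerate(pts) if j != i)]
```

For example, among the two-objective vectors (1, 5), (2, 2), (3, 1), (4, 4), and (2, 3), the last two are dominated by (2, 2), so the first three form the approximate Pareto frontier.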
 Two defining challenges for many-objective LTGM design were considered in our development of the ɛ-hBOA for the ASSIST framework: (1) as the number of objectives increases, MOEAs become prone to search failures that arise from a process termed deterioration [Hanne, 1999], and (2) LTGM decisions have conditional dependencies on one another that, if not accounted for, can lead to MOEA search failures (termed epistasis failures [Goldberg, 2002]). Deterioration is a very common obstacle in most modern MOEAs. It represents the common problem in which solutions are lost over the course of the search, so that solutions found late in the search can be dominated (i.e., perform no better in any objective and worse in at least one) by results that were found but lost early in the search. Deterioration is a dominant failure mode for MOEAs in many-objective applications. The second motivating challenge, epistasis failure, occurs when traditional mating and mutation operators are inefficient at, or incapable of, generating child solutions that preserve important dependencies among decisions. For example, human decision makers in LTGM design commonly place samples to cover a contaminant plume's source area, its longitudinal axis, its transverse axis, and its boundaries. Traditional MOEA operators are unable to discover the system geometry and physics that underlie such a sampling rule, as has been evaluated rigorously in our prior demonstrations of the ɛ-hBOA [Kollat et al., 2008b; Shah, 2010]. Like deterioration, epistasis failures in traditional MOEAs increase as the number of LTGM decisions and objectives increases.
 Deterioration and epistasis are addressed in the ɛ-hBOA by merging innovations in ɛ-dominance archiving [Laumanns et al., 2002; Kollat and Reed, 2007b] with the emerging class of evolutionary algorithms termed probabilistic model building genetic algorithms (PMBGAs) (see the review by Pelikan and Goldberg). The ɛ-hBOA eliminates the crossover and mutation operators and replaces them with Bayesian network model building. It combines the strengths of the Epsilon Dominance Nondominated Sorted Genetic Algorithm II (ɛ-NSGAII), used widely in the water resources area [Kollat and Reed, 2006; Nicklow et al., 2010], and the original single-objective hBOA [Pelikan and Goldberg, 2003]. To accomplish this, the Bayesian network model building of hBOA was used to replace the crossover and polynomial mutation operators of the ɛ-NSGAII. The result is a multiobjective algorithm that uses ɛ-dominance archiving to eliminate deterioration and Bayesian network modeling to manage LTGM epistasis more effectively. The ɛ-hBOA uses a population-based search in which the fitness values of LTGM solutions are evaluated using the concept of Pareto dominance. A Bayesian network is then iteratively generated from the parent population of designs, using the Bayesian information criterion to evaluate how well candidate Bayesian network models predict binary LTGM decision variable combinations that will be nondominated (i.e., no other solution performs at least as well in every objective and strictly better in at least one). Child solutions are then generated by sampling the resulting Bayesian network's probabilistic rules. The algorithm then proceeds similarly to the ɛ-NSGAII, where Pareto ranking and crowded binary tournament selection are used to fill a new population of superior designs. 
The children of the new population are eligible for inclusion in the ɛ-dominance archive and become the parents of a subsequent generation, from which the process is repeated until the maximum number of design evaluations is reached. The ɛ-hBOA has been designed to run on massively parallel computing platforms and is able to scale with near-perfect efficiency on thousands of compute cores. In the ASSIST framework, once the ɛ-hBOA identifies Pareto-approximate solutions for many-objective LTGM problems, a high-dimensional visual analytics framework termed Visually Interactive Design using Evolutionary Optimization (VIDEO) [Kollat and Reed, 2007a] enables decision makers to interactively explore and evaluate the tradeoffs among those solutions. The visualization and exploration capabilities of VIDEO are demonstrated throughout section 5.
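The ɛ-dominance archiving described above can be sketched in simplified form following the ɛ-box scheme of Laumanns et al. [2002]: each objective vector is mapped to a grid box of width ɛ, at most one solution is kept per box, and a solution whose box is dominated is rejected. This is an illustrative sketch with hypothetical function names, not the ɛ-hBOA source; all objectives are assumed to be minimized.

```python
import numpy as np

def box_index(f, eps):
    """Epsilon-box coordinates of an objective vector (all objectives minimized)."""
    return np.floor(np.asarray(f, dtype=float) / eps).astype(int)

def _dominates(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def update_archive(archive, f, eps):
    """Offer objective vector f to an epsilon-dominance archive (list of vectors)
    and return the updated archive."""
    f = np.asarray(f, dtype=float)
    bf = box_index(f, eps)
    new_archive = []
    for g in archive:
        g = np.asarray(g, dtype=float)
        bg = box_index(g, eps)
        if _dominates(bg, bf):
            return archive                     # f's box is dominated: reject f
        if np.array_equal(bg, bf):
            # Same box: keep the dominating vector, else the one nearer the box corner
            if _dominates(g, f):
                return archive
            if not _dominates(f, g):
                corner = bf * eps
                if np.linalg.norm(g - corner) <= np.linalg.norm(f - corner):
                    return archive
            continue                           # f replaces g
        if _dominates(bf, bg):
            continue                           # f's box dominates g's box: drop g
        new_archive.append(g)
    new_archive.append(f)
    return new_archive
```

Because an archived solution can only be displaced by a solution whose box dominates (or shares) its box, good solutions found early cannot be lost later, which is how archiving counters deterioration; the box width ɛ also bounds the archive size.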
3.5. Design Objectives
 The problem formulation given in equations (7)–(9) provides a six-objective demonstration of how the ASSIST framework can forecast the value of LTGM observations for competing objectives, subject to alternative levels of sampling investment. Recent studies have shown [Reed and Minsker, 2004; Kollat and Reed, 2007a] that the spatial and temporal organization of environmental systems can yield unexpected conflicts in observation network design decisions that would be difficult to predict a priori. For example, the sampling locations that are important for discovering high contaminant fluxes may differ from those necessary for tracking the delimiting boundaries of a plume. As a consequence, the true dimension of observation network problems is often unknown and discoverable only through the approximate solution of many-objective formulations.
 Here we define a stochastic, multiperiod, many-objective LTGM design problem in equations (7)–(9):
 The vector-valued objective function for management period k is represented by Fk, which is composed of a cost function, fCost, for discovering investment tradeoffs and five state prediction objectives, fK, fDF, fFlux, fMass, and fCentroid. These design objectives were chosen to capture a wide range of decision maker preferences and prior literature recommendations. The problem in management period k represents the forecasted tradeoffs across the six component objectives of Fk. The objectives' performances are forecast on the basis of the ensemble of system states within period k and the data that would be available under the proposed monitoring decision, as shown in equation (7). Forecasting conditions for the kth period is a challenging prediction problem that is strongly influenced by our choice of model and its associated biases and uncertainties, which justifies our use of the bias-aware EnKF. While forecast quality should be a primary driver in determining management period duration and sampling frequency, regulatory records of decision currently set the standard of annual management periods with quarterly sampling intervals [Task Committee on Long-Term Groundwater Monitoring Design, 2003]. Since more frequent design decisions incur additional costs (e.g., increased computing time and decision-making effort), a careful balance must be struck among these costs, EnKF forecast uncertainty, and management period length.
 For this study, we assume sampling decisions are made for a 1 year management period in which each potential sampling location can be sampled up to four times during the year (i.e., quarterly sampling). Thus, within each management period k there are t = 1…T sampling periods, where T = 4 for quarterly sampling. Consistent with the notation for the EnKF, M is defined as the total number of available sampling locations (63 for this study) at a given time, and m as the number of samples actually taken, based on the decision for a given time t. Hence, m ranges from 0 to M (i.e., from no sampling to all available samples taken). We define the decision space for a management period as the set of all 2^(M·T) possible combinations of sampling decisions, and a sampling plan as one instance from this space. We also define us,t to represent the coordinate of a sample at location index s and time index t. The decision vector represents a potential sampling plan for an entire management period and can be broken into subvectors for each sampling period 1…T within the larger management period, as shown in equation (8). Each xt subvector is composed of the binary decisions x for sampling period t, as shown in equation (9). The decision vector is thus composed of M · T binary decisions (1s and 0s) indicating where and when samples are taken within the kth management period. While this test case specifically optimizes sampling strategies on the basis of preexisting locations, the ASSIST framework could easily be extended to consider binary decision variables that would initiate the installation of new sampling wells and account for fixed installation costs (see Nicklow et al. for a more detailed review of evolutionary optimization applications with discrete costs).
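The structure of the formulation described in this section can be summarized as below. This is one plausible rendering of equations (7)–(9), not a reproduction of them: the symbols $\Omega_k$ (decision space) and $\mathbf{x}_k$ (sampling plan) are assumed here, the dependence of each objective on the forecast ensemble of system states is omitted, and "optimize" means minimizing fCost, fDF, fMass, and fCentroid while maximizing fK and fFlux. The last line evaluates the decision-space size for M = 63 and T = 4.

```latex
\begin{aligned}
&\underset{\mathbf{x}_k \in \Omega_k}{\text{optimize}}\;
\mathbf{F}_k(\mathbf{x}_k) =
\bigl(f_{\text{Cost}},\, f_{K},\, f_{\text{DF}},\, f_{\text{Flux}},\, f_{\text{Mass}},\, f_{\text{Centroid}}\bigr),\\
&\mathbf{x}_k = \bigl(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T\bigr),\qquad
\mathbf{x}_t = \bigl(x(u_{1,t}),\, x(u_{2,t}),\, \ldots,\, x(u_{M,t})\bigr),\quad x(u_{s,t}) \in \{0, 1\},\\
&|\Omega_k| = 2^{MT} = 2^{63 \times 4} = 2^{252} \approx 7.2 \times 10^{75}.
\end{aligned}
```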
 Note that in the remainder of this paper we refer to field-scale time analogs of the ammonium chloride tracer experiment (1 day of experimental tank transport is treated as equivalent to 1 year at field scale). This convention allows the LTGM problem to be discussed in more conventional time units and is not intended to imply a strict assumption of similarity between the laboratory and field scales. For each 1 year management period, the optimization framework seeks to forecast where and when samples should be taken from the 63 available sampling ports to maximize knowledge of the plume's movement. While we demonstrate our framework using annual management periods and quarterly sampling frequencies on the basis of common practice, the approach is highly flexible, and a broad range of alternative implementations is possible depending on the availability of data. For example, simple extensions to the ASSIST visual analytics framework could allow the decision maker to interactively add new sampling locations or additional sampling at existing locations on the basis of incoming data. In addition, sampling requirements imposed by regulatory standards could easily be applied as design constraints.
 The Cost objective seeks to minimize the cost of a sampling plan. We define a cost vector Ct whose elements are the costs of sampling each location s at time t. In this study, since the planning horizon is 1 year, a normalized cost vector is used: all locations in space and time are equally weighted, with each sample costing 1, so the cost of a plan equals the number of samples it takes. Equation (10) describes the cost minimization objective, where Ct and xt are both vectors of size 1 × M.
 The cost of a sampling plan for a management period is obtained by summing up the dot products between the cost vector Ct and the decision vector xt for each t in a management period k. The Cost objective aims to minimize the cost in the present planning period.
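The decision encoding and the Cost objective can be sketched directly: a flat vector of M · T bits is split into T subvectors xt, and the cost is the sum over t of the dot products Ct · xt. The period-major ordering (the first M bits are period 1) is an assumption consistent with the subvector split described above; the function names are hypothetical.

```python
import numpy as np

M, T = 63, 4  # sampling locations and quarterly sampling periods (from the text)

def split_plan(x, M, T):
    """Reshape a flat binary decision vector of length M*T into T subvectors x_t.
    Assumes period-major ordering: row t of the result is x_t."""
    x = np.asarray(x, dtype=int)
    if x.size != M * T or not np.isin(x, (0, 1)).all():
        raise ValueError("expected M*T binary decisions")
    return x.reshape(T, M)

def plan_cost(x, C):
    """f_Cost = sum over t of C_t . x_t, where C has shape (T, M)."""
    T_, M_ = C.shape
    X = split_plan(x, M_, T_)
    return float(sum(C[t] @ X[t] for t in range(T_)))
```

With the normalized cost vector used in this study (`C = np.ones((T, M))`), the cost is simply the number of 1s in the plan.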
 The Kalman gain matrix, K, is a blending factor that weights the sample innovation (the difference between concentration measurements and their mean EnKF forecasted values) according to the relative uncertainties of the measurements and the model states. Equation (6) shows that maximizing the Kalman gain minimizes the analyzed covariance matrix in the EnKF (i.e., forecast uncertainty and data redundancy). Although no measurements are actually assimilated during a forecast period, it is still possible to calculate the filter covariance and the corresponding Kalman gain that would be achieved if a measurement were assimilated, ultimately providing a forecast of the potential corrective capability of a sampling plan. In our bias-aware EnKF, the Kalman gain matrix K is a 2n (n model states + n bias states) by m (samples) matrix, in which elements close to zero represent little to no correction of either the concentration state prediction or the bias estimate from sampling a candidate measurement location. Elements of K that deviate from zero, either positively or negatively, indicate larger corrections. Maximizing K has two effects: (1) it minimizes uncertainty in forecasts (i.e., the analyzed covariance; see equation (6) in section 3.3), and (2) it maximally corrects the model state forecasts and the systematic bias estimates. Equation (11) shows the mathematical formulation of the K maximization objective.
 A Euclidean norm is calculated along each column of K, providing the corrective distance over the entire model and bias state spaces that could be achieved by a sample s at some time t. Because we seek to maximize the overall corrective capability of a decision vector xt, this objective is formulated to maximize the minimum, over the quarterly sampling periods, of the average correction distance across the columns of K. In our notation, kg,s is an individual element of K at row g and column s. The max–min formulation in equation (11) maximizes the improvement in the worst-performing forecast period and consequently ensures that all other periods are as good or better. This max–min formulation improves the robustness of our evaluations of the K objective [Deb and Gupta, 2006].
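The max–min K objective can be sketched as follows for a single candidate plan: for each quarterly period, compute the gain for the selected locations, take the Euclidean norm of each column, average the norms, and return the smallest period average. This is a hedged sketch: the measurement operator is assumed to observe model-state rows only (bias states unobserved), observation errors are assumed independent with a common variance, a period with no samples is assumed to provide zero correction, and all names are hypothetical rather than the exact form of equation (11).

```python
import numpy as np

def gain_for_locations(P, loc_idx, obs_var):
    """Kalman gain (2n x m) for measuring the model states in loc_idx.
    Assumes H selects model-state rows only and R = obs_var * I."""
    m = len(loc_idx)
    H = np.zeros((m, P.shape[0]))
    H[np.arange(m), loc_idx] = 1.0
    S = H @ P @ H.T + obs_var * np.eye(m)
    return np.linalg.solve(S, H @ P).T

def f_K(X, P_by_period, sample_loc, obs_var):
    """Max-min K objective value for one plan (to be maximized).

    X: (T, M) binary plan; P_by_period: list of T forecast covariances (2n x 2n);
    sample_loc: model-state index of each of the M sampling locations.
    """
    per_period = []
    for x_t, P_t in zip(X, P_by_period):
        chosen = np.asarray(sample_loc)[np.asarray(x_t).astype(bool)]
        if chosen.size == 0:
            per_period.append(0.0)      # assumption: no samples, no correction
            continue
        K_t = gain_for_locations(P_t, chosen, obs_var)
        # Column 2-norms = corrective distance of each sample; average over samples
        per_period.append(float(np.linalg.norm(K_t, axis=0).mean()))
    return min(per_period)              # worst-performing period
```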
The ability of a sampling plan to detect the movement of contaminant is evaluated using the detection failure (DF) objective. DF is defined here as the probability that a given sampling plan fails to sample at locations where contaminant exists at a detectable level according to the EnKF forecast. To calculate this objective, a detection indicator vector DI_i is developed for each ensemble member i of the EnKF. The detection indicator vector contains 1s and 0s indicating whether or not the contaminant concentration c_i of ensemble member i is at a detectable level at each sampling location for the time step under consideration. In this study, a detection threshold of 0.075 g L−1 is used. The DF objective is shown in equation (12).
This objective seeks to minimize the maximum expected DF across the quarterly sampling periods. In other words, it minimizes the DF of the worst-performing subsampling period and consequently ensures that all other periods are as good or better. The number of contaminant detections is quantified by taking the dot product of the decision vector x_t and the detection indicator vector DI_i for each ensemble member (i.e., the number of locations sampled by plan x_t where the ensemble member forecasts contaminant at a level ≥ 0.075 g L−1). The total possible number of detections is obtained by taking the one-norm (element-wise summation) of DI_i (i.e., the total number of the M available measurement locations where contaminant is forecast at a level ≥ 0.075 g L−1).
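The DF calculation described above can be sketched as follows, assuming the EnKF forecast concentrations at the M candidate locations are available as an (ensemble × location) array for each quarterly period. The expected DF for a period is taken as one minus the ensemble mean of (x_t · DI_i)/‖DI_i‖₁, and the objective is the worst period's value. How equation (12) handles an ensemble member that forecasts no detectable contaminant anywhere is not stated here; treating such a member as contributing no failure is an assumption.

```python
import numpy as np

DETECTION_THRESHOLD = 0.075  # g/L, the detection threshold used in this study


def detection_failure(x_by_period, conc_by_period, threshold=DETECTION_THRESHOLD):
    """Worst-period expected detection failure (DF) of a sampling plan (sketch).

    x_by_period : list of 0/1 arrays of length M, one per quarterly period.
    conc_by_period : list of (N_ens, M) arrays of forecast concentrations c_i
        at the M candidate locations, one per quarterly period.
    """
    worst = 0.0
    for x, conc in zip(x_by_period, conc_by_period):
        di = (conc >= threshold).astype(float)  # row i is DI_i
        detected = di @ x                        # x_t . DI_i for every member
        possible = di.sum(axis=1)                # ||DI_i||_1 for every member
        # Assumption: a member with no detectable plume contributes no failure.
        frac = np.divide(detected, possible,
                         out=np.ones_like(detected), where=possible > 0)
        df = 1.0 - frac.mean()                   # expected DF over the ensemble
        worst = max(worst, float(df))
    return worst
```

For instance, with plan (1, 0, 1, 0) and two members whose indicator vectors are (1, 1, 0, 0) and (1, 1, 1, 0), the detected fractions are 1/2 and 2/3, giving an expected DF of 5/12 for that period.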
The Flux objective seeks to maximize the detection of high mass flux gradients. Contaminant flux at a sampling location s is quantified by multiplying the contaminant concentration c_i(u_s, t) of ensemble member i by the Darcy flux q_i(u_s, t) at that location and dividing by the media porosity. Equation (13) defines the Flux objective by incorporating the binary sampling decision x(u_s, t) to detect mass fluxes in space and time.
 The ensemble-averaged concentration Flux detected by a sampling plan is calculated within the parentheses of equation (13). This objective seeks to maximize the minimum total contaminant Flux detected by a sampling plan xt, across each of the quarterly sampling periods for a management period. In other words, it is maximizing the worst performing subsampling period's total Flux detection and consequently ensuring all other periods are as good or better than the worst.
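A sketch of the Flux objective under the same data layout: the per-member flux is c_i q_i divided by porosity, the ensemble average is taken at each location, the averages at sampled locations are summed, and the smallest period total is returned for maximization. Array shapes and argument names are illustrative assumptions, and the result carries whatever units the inputs carry.

```python
import numpy as np


def flux_objective(x_by_period, conc_by_period, darcy_by_period, porosity):
    """Max-min ensemble-averaged Flux detected by a sampling plan (sketch).

    x_by_period : list of 0/1 arrays of length M (one per quarterly period).
    conc_by_period, darcy_by_period : lists of (N_ens, M) arrays holding each
        member's concentration c_i(u_s, t) and Darcy flux q_i(u_s, t).
    porosity : scalar or length-M array of media porosity at the locations.
    """
    worst = np.inf
    for x, c, q in zip(x_by_period, conc_by_period, darcy_by_period):
        flux = c * q / porosity          # per-member flux at each location
        ens_mean = flux.mean(axis=0)     # ensemble average per location
        total = float(x @ ens_mean)      # sum over the sampled locations only
        worst = min(worst, total)
    return worst
```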
3.5.5. Mass and Centroid
The zeroth and first spatial moments of the contaminant plume represent the Mass of contamination and the Centroid of the plume, respectively. This formulation builds on Montas et al. to define spatial moment-based objectives that quantify error in tracking contaminant plumes through space and time. Equations (14) and (15) describe the zeroth and first spatial moments of the contaminant plume. Note that equation (15) gives only the x component of the first moment; the y and z components of the Centroid are calculated analogously.
Each moment is calculated from the ensemble mean concentration at each filter grid location g and time t. The summation is performed over all n model states of the filter grid. Each term is also weighted by the porosity of the media at grid location g and by V_g, the volume of a grid cell.
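The moment calculations can be sketched as porosity- and volume-weighted sums over the filter cells. The sketch assumes that the Centroid is the first moment normalized by the zeroth moment (the usual definition of a plume centroid); whether equation (15) writes that normalization explicitly is not visible here. For the 18 cm filtering subgrid described in section 4.2, V_g would be 18³ = 5832 cm³.

```python
import numpy as np


def plume_moments(c_mean, porosity, cell_volume, coords):
    """Zeroth moment (Mass) and normalized first moment (Centroid).

    c_mean : (n,) ensemble-mean concentration at each filter cell g.
    porosity : scalar or (n,) porosity at each cell.
    cell_volume : scalar or (n,) cell volume V_g.
    coords : (n, 3) cell-center x, y, z coordinates.
    """
    w = porosity * cell_volume * c_mean        # dissolved mass in each cell
    mass = float(w.sum())                      # zeroth moment
    if mass == 0.0:
        raise ValueError("plume has zero mass; centroid undefined")
    centroid = (w[:, None] * coords).sum(axis=0) / mass  # first moment / Mass
    return mass, centroid
```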
The Mass and Centroid objectives formulated in equations (16) and (17) are defined relative to the best available forecast of the plume (the reference forecast), which results from the full sampling decision implemented in the prior management period k − 1. It is the best plume forecast that can be made because it assimilates all of the data available from the prior period into the EnKF forecasts. In equations (16) and (17), we exploit the actual observations taken in period k − 1 to estimate how the sampling plan proposed for period k may cause deviations from the reference plume's moments. Only those points that are proposed for sampling in period k and that have actual observations in period k − 1 are used to develop the alternative (subset) Mass and Centroid forecasts. The Mass objective in equation (16) is expressed as a percentage error between the Mass of the plume forecast using all available prior information and the Mass of the plume forecast using only the subset of that information retained by a proposed sampling plan. The objective minimizes the maximum of this relative error across the quarterly sampling periods t of the kth management period. In other words, it minimizes the Mass error of the worst-performing subsampling period and, consequently, ensures that all other periods are as good or better.
Equation (17) seeks to minimize the maximum distance error (across the quarterly sampling periods) between the first plume moments (in x, y, and z) of the reference forecast and those of the subset forecast. This objective is expressed as a distance (in cm) between the Centroid of the plume forecast using all prior information from period k − 1 and the Centroid of the plume forecast using only the subset of that prior information retained by the sampling plan proposed for period k. In other words, it minimizes the Centroid error of the worst-performing subsampling period and consequently ensures that all other periods are as good or better.
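Given reference and subset moments for each quarterly period, the Mass and Centroid objectives reduce to worst-period errors. The sketch assumes that the Mass error is an absolute percentage error relative to the reference forecast and that the Centroid error is the Euclidean distance between the x, y, z Centroids; the function names are hypothetical.

```python
import numpy as np


def mass_error_objective(ref_mass, subset_mass):
    """Worst-period percentage error in plume Mass (to be minimized).

    ref_mass, subset_mass : length-T sequences, one value per quarterly period.
    """
    ref = np.asarray(ref_mass, dtype=float)
    sub = np.asarray(subset_mass, dtype=float)
    rel_err = np.abs(sub - ref) / ref * 100.0   # percent error per period
    return float(rel_err.max())


def centroid_error_objective(ref_centroids, subset_centroids):
    """Worst-period Euclidean distance between Centroids (to be minimized).

    Both inputs have shape (T, 3): the x, y, z Centroid per quarterly period.
    """
    ref = np.asarray(ref_centroids, dtype=float)
    sub = np.asarray(subset_centroids, dtype=float)
    return float(np.linalg.norm(sub - ref, axis=1).max())
```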
The Mass and Centroid objectives can be thought of as redundancy-based objectives: they favor removing the points that have the least effect on the EnKF's ability to forecast the Mass and Centroid of the plume.
4. Computational Experiment
4.1. Flow and Transport Using ParFlow and SLIM
The ammonium chloride tracer experiment was modeled using the parallel numerical groundwater flow model ParFlow and the numerical particle transport model SLIM-FAST. The modeling domain was established in ParFlow using grid cells of approximately 7 cm on each side. This resulted in a tank domain containing 39 cells in x, 54 cells in y, and 29 cells in z, for a total of 61,074 grid cells. No-flow boundaries were set at the bottom, left, and right sides of the tank. Dirichlet constant head boundary conditions were set to 203.2 cm at the front of the tank and to 200.6 cm at the rear of the tank to simulate the constant head inlet and outlet reservoirs, respectively. Since the tracer test flow was saturated, a no-flow boundary was set at the top of the tank at elevation 200.7 cm to force the numerical flow domain to remain saturated. In addition, a vertical injection well was placed in the flow domain to correspond with the location of the screen at well B4. Water was then injected through this location at a rate of 1.5 L h−1 for the first 15 days to simulate the injection of tracer solution. No injection occurred at this location during days 15–19 of the simulation. Total simulation time was set to 19 days to correspond with the full duration of the tracer experiment. Hence, steady state flow conditions existed during days 0–15 of the simulation (the tracer injection stage), and a second set of steady state flow conditions existed for days 15–19.
Uncertainty in characterizing subsurface permeability was accounted for by simulating an ensemble of 100 flow fields, each resulting from a single realization of the subsurface permeability field. In this study, a Turning Bands (TB) simulation [Tompson et al., 1989] was used to generate realizations of the permeability field. To accomplish this, separate subdomains were established within the full model domain to represent each of the five media layers and the fine sand lens. Hydraulic conductivity data collected at each of the screened sampling ports were then used to develop mean hydraulic conductivity values (and their corresponding standard deviations) for conditionally simulating each media layer. A summary of the hydraulic conductivities and porosities used in each media layer is shown in Table 1. Additional TB parameters common to all media layers, based on the recommendations of Tompson et al., included x, y, and z correlation lengths of 30 cm, 100 simulation lines, a line process resolution of 5.0, a maximum normalized frequency of 100.0, and a normalized frequency increment of 0.2. All media were assumed to be isotropic. After an ensemble of flow field realizations was obtained from ParFlow, SLIM-FAST was used to simulate the movement of ammonium chloride tracer through the UVM tank. The computational grid used by SLIM-FAST was set up identically to that used by ParFlow. Two stress periods were set up to correspond to the release of ammonium chloride tracer: one for days 0–15 during the injection of tracer and one for days 15–20 when no tracer was injected. Velocity fields obtained from ParFlow for each of these two stress periods and for each of the 100 realizations were then provided to SLIM-FAST for particle transport.
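The Turning Bands method itself is not reproduced here. As a stand-in that illustrates what an ensemble of spatially correlated permeability realizations is, the sketch below draws unconditional Gaussian random fields by Cholesky factorization of an assumed exponential covariance. This is a different technique from TB, is practical only for small grids (a dense covariance for all 61,074 model cells would need about 30 GB in double precision), and does not condition on the port measurements; the mean, standard deviation, and covariance model are placeholders for per-layer values such as those in Table 1.

```python
import numpy as np


def gaussian_field_ensemble(coords, mean, std, corr_len, n_real, seed=0):
    """Unconditional stationary Gaussian random fields by Cholesky factorization.

    coords : (N, d) cell-center coordinates (cm).
    mean, std : field mean and standard deviation (e.g., of ln K in one layer).
    corr_len : correlation length (cm) of an assumed exponential covariance.
    Returns an (n_real, N) array, one realization per row.
    """
    rng = np.random.default_rng(seed)
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.linalg.norm(diff, axis=-1)                # pairwise separation distances
    cov = std**2 * np.exp(-h / corr_len)             # exponential covariance model
    jitter = 1e-10 * std**2 * np.eye(len(coords))    # keeps Cholesky numerically stable
    L = np.linalg.cholesky(cov + jitter)
    z = rng.standard_normal((len(coords), n_real))   # independent N(0, 1) draws
    return mean + (L @ z).T                          # each column of L @ z ~ N(0, cov)
```

If the fields represent ln K, exponentiating a realization (np.exp) gives a strictly positive hydraulic conductivity field.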
To simulate the injection of tracer during stress period 1, a total of 540 g of ammonium chloride was released from the two grid cells (270 g cell−1) that corresponded with the screened location at well B4 over the 15 day release period. This 540 g of ammonium chloride was represented by the release of 150,000 particles from these two grid cells, evenly over the 15 day period.
Table 1. Summary of Media Properties Used to Simulate Permeability Field Realizationsᵃ
Columns: Mean Hydraulic Conductivity | Standard Deviation of Hydraulic Conductivity
ᵃHydraulic conductivity values (and their corresponding standard deviations) are expressed in units of cm h−1.
Because a finite number of particles is used to represent solute concentration, numerical effects can become an issue at low solute concentrations. SLIM-FAST allows a concentration threshold to be set, below which particles are split in half to better resolve regions of low solute concentration. In this study, the concentration threshold was set to 10⁻⁶ g m−3, and the maximum number of particles allowed (as a result of particle splitting) was set to 500,000. Since 150,000 particles were injected during the 15 day stress period, each particle initially had a mass of 0.0036 g (540 g/150,000 particles); because particles may be split during the simulation, individual particle masses can change. No particles were injected during days 15–20 (stress period 2). Global time step increments were set to 1 h; within each global time step, however, SLIM-FAST calculates locally optimized time steps for each individual particle on the basis of the velocity field, dispersion, diffusion, etc. Other relevant parameters include longitudinal and transverse dispersivities of 1.44 cm and 0.25 cm, respectively, and a molecular diffusivity of 0.0024 cm². Upon completion of the SLIM-FAST simulation runs, 100 realizations of the ammonium chloride tracer plume were available for subsequent use in the optimization framework (as was shown to be effective by Kollat et al. [2008a]).
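The particle bookkeeping above can be checked with a few lines of arithmetic. The sketch also includes a toy splitting rule showing why splitting conserves total mass while changing individual particle masses; it is an illustration under stated assumptions, not SLIM-FAST's algorithm, and deciding which particles lie in low-concentration regions is left to the caller. The implied injection concentration assumes that all 540 g entered with the water injected at 1.5 L h−1 over 15 days (section 4.1).

```python
TOTAL_MASS_G = 540.0          # ammonium chloride released (from the text)
N_PARTICLES = 150_000         # particles released over stress period 1
INJECTION_RATE_L_PER_H = 1.5  # injection rate at well B4
INJECTION_DAYS = 15

particle_mass_g = TOTAL_MASS_G / N_PARTICLES                       # 0.0036 g
injected_volume_l = INJECTION_RATE_L_PER_H * 24 * INJECTION_DAYS   # 540 L
implied_conc_g_per_l = TOTAL_MASS_G / injected_volume_l            # 1.0 g/L (assumed uniform)


def split_low_concentration(masses, low_flags, max_particles=500_000):
    """Toy mass-conserving splitting rule (illustrative, not SLIM-FAST code).

    Each particle flagged as lying in a low-concentration region is replaced
    by two particles of half its mass, until the particle cap is reached.
    """
    out = []
    n_total = len(masses)
    for m, low in zip(masses, low_flags):
        if low and n_total < max_particles:
            out.extend([m / 2.0, m / 2.0])
            n_total += 1          # one particle became two
        else:
            out.append(m)
    return out
```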
4.2. Parameterization of EnKF
The EnKF is used to evaluate the performance of candidate discrete sampling strategies evolved by the ɛ-hBOA by providing forecasts of the system states used to calculate the six design objectives. Note that this problem formulation seeks to identify optimal 0–1 binary sampling decisions and does not consider real-valued parameter estimation, which has been the focus of prior studies (e.g., Vrugt et al.). The bias-aware EnKF used in the ASSIST framework aids in correcting structural biases that arise from model parameterization errors, structural errors, and errors in boundary and initial conditions (which are common in field-scale applications) [Kollat et al., 2008a]. An ensemble size of 100 members was used, corresponding to the 100 realizations of the flow and transport modeling generated prior to optimization. EnKF forecasts were provided across a filtering subgrid that matched the flow and transport model domain in extent but used cells that were 18 cm on each side (as opposed to the 7 cm cells used for the modeling). The subgrid was composed of 11 cells in the x direction, 19 cells in the y direction, and 7 cells in the z direction, for a total of 1463 cells (i.e., n = 1463 filtered model states). Multiple filtering subgrid resolutions (12, 14, 16, 18, and 20 cm cells) and assimilation frequencies (2, 6, 12, and 24 h) were tested to determine their effects on computational efficiency, forecast quality, and memory requirements. Prior work, in which we analyzed filter accuracy and stability resulting from varying levels of information (in both space and time), revealed that an 18 cm subgrid resolution with data assimilation every 6 h provided reasonably balanced EnKF performance, both computationally and functionally [Kollat et al., 2008a]. Filter forecasts were provided monthly, while data (potential well sampling plans) were assimilated quarterly.
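The subgrid choice can be put in perspective with a short memory calculation. The sketch counts states on both grids, doubles them for the bias-aware augmented state (n model + n bias states, as in the 2n × m gain), and compares the storage of a 100-member float64 ensemble with that of a dense 2n × 2n covariance. Whether the EnKF ever forms that covariance explicitly is not stated, so the dense figure is only an upper-bound illustration.

```python
BYTES_PER_FLOAT64 = 8
N_ENS = 100                      # ensemble size

model_cells = 39 * 54 * 29       # ParFlow/SLIM-FAST grid (~7 cm cells)
filter_cells = 11 * 19 * 7       # EnKF filtering subgrid (18 cm cells)


def augmented_size(n):
    """Model states plus one bias state per model state."""
    return 2 * n


def ensemble_bytes(n, n_ens=N_ENS):
    """Memory for a (2n x n_ens) float64 ensemble matrix."""
    return augmented_size(n) * n_ens * BYTES_PER_FLOAT64


def dense_cov_bytes(n):
    """Memory if the (2n x 2n) covariance were formed explicitly."""
    return augmented_size(n) ** 2 * BYTES_PER_FLOAT64
```

On the subgrid, the ensemble occupies about 2.3 MB and a dense covariance about 68.5 MB; on the full model grid, a dense covariance would need about 119 GB, far beyond the 2 GB per processing core noted in section 4.3.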
Measurement uncertainty was modeled by adding uncorrelated, zero-mean Gaussian noise with a 5% standard deviation to the measurements. The spatially correlated Gaussian noise fields Q applied to the model and bias states were generated using TB simulation and were assumed to have zero mean, a 7.5% standard deviation, and an isotropic correlation length of 60 cm. The time correlation of the model bias was set to 99%. The EnKF formulation and parameterization for the UVM test case are drawn from Kollat et al. [2008a].
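A sketch of the two noise ingredients, under assumptions: the 5% measurement standard deviation is treated as relative to each measured value, and bias persistence is modeled as a first-order autoregressive update b_next = α b + w with α = 0.99, a common form in bias-aware EnKF formulations but not confirmed by this section. The spatially correlated TB noise fields Q are not generated here; a correlated field would be passed in as the noise argument.

```python
import numpy as np


def perturb_observations(d, n_ens, rel_std=0.05, rng=None):
    """Ensemble of perturbed observations with uncorrelated Gaussian noise.

    Assumes the 5% standard deviation is relative to each measured value.
    Returns an (n_ens, m) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    noise = rng.standard_normal((n_ens, d.size)) * rel_std * np.abs(d)
    return d + noise


def propagate_bias(b, alpha=0.99, noise=None):
    """Assumed AR(1) bias persistence: b_next = alpha * b + noise.

    In the text the noise is a spatially correlated field (7.5% std,
    60 cm correlation length); here it is passed in or omitted.
    """
    b_next = alpha * np.asarray(b, dtype=float)
    return b_next if noise is None else b_next + noise


# Number of updates for an uncorrected bias to decay by half when alpha = 0.99
half_life_steps = np.log(0.5) / np.log(0.99)   # about 69 updates
```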
4.3. Parameterization of Many-Objective Search
Optimization runs were conducted on the Texas Advanced Computing Center's (TACC) Ranger Sun Constellation Linux Cluster (available at http://www.tacc.utexas.edu/). The TACC Ranger system is composed of 3936 nodes, where each node contains four AMD Opteron Quad-Core processors, yielding a total of 62,976 processing cores. The ɛ-hBOA was parameterized to be run on 1024 processing cores simultaneously using a Master-Slave paradigm [Tang et al., 2007; Reed et al., 2008]. Memory availability per processing core (2 GB of DDR2) was a primary driver in the selection of the ɛ-hBOA population size of 4096 individuals (as well as in the selection of the EnKF grid resolution). This population size also ensured that approximately four solution evaluations would be conducted on each processing core within one generation of evolution. The ASSIST framework has the potential for scaled use from small-scale desktop computing to emerging petascale computing systems. There is growing interest in massively parallel applications in surface and subsurface modeling [e.g., see Kollet and Maxwell, 2006; Vrugt et al., 2008]. This study represents a massively parallel application of many-objective optimization coupled with the bias-aware EnKF that scales efficiently to the 1024 processing cores used here.
Five random seed trials were conducted using 1 million solution evaluations per seed. The quantity of 1 million evaluations was chosen to balance (1) parallel scaling efficiency (i.e., running on 1024 processors required a sufficiently large population size) and (2) generations of evolution (i.e., with a population size of 4096, 1 million evaluations yield 244 generations of evolution). In addition, the size of the decision space (7.24 × 10⁷⁵ possible designs) warranted as many function evaluations as could feasibly be achieved given allocation and queue time limitations on TACC Ranger. Random-seed analysis was chosen rather than single-seed analysis (e.g., 1 seed of 5 million evaluations) because this was our first attempt at optimizing sampling plans for the UVM tank. In other words, we did not know a priori whether performance would differ substantially among random-seed trials and hence chose this approach to reduce the possibility of relying on a poorly performing seed.
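The search-budget figures quoted above follow from simple arithmetic, sketched below. The 252 binary decisions correspond to the 252 possible samples over the four quarterly periods (section 5.1), so the design space has 2²⁵² members.

```python
evals_per_seed = 1_000_000
n_seeds = 5
population = 4096
cores = 1024
nodes, sockets_per_node, cores_per_socket = 3936, 4, 4   # TACC Ranger (from the text)
n_binary_decisions = 252          # possible samples over four quarterly periods

generations = evals_per_seed // population                   # 244 generations per seed
evals_per_core_per_gen = population // cores                 # 4 evaluations per core
ranger_cores = nodes * sockets_per_node * cores_per_socket   # 62,976 cores
design_space = 2 ** n_binary_decisions                       # exact integer, about 7.24e75
fraction_searched = n_seeds * evals_per_seed / design_space  # share of designs evaluated
```

Even the combined 5 million evaluations cover a fraction of the design space on the order of 10⁻⁶⁹, which is why search efficiency matters more than brute force here.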
Additional relevant ɛ-hBOA parameters include the ɛ-dominance precision values, which were set to Cost = 1.0, K = 0.5, DF = 0.1, Flux = 0.5, Mass = 0.5, and Centroid = 0.1 on the basis of our precision goals for each objective (see Kollat and Reed [2007b] for a detailed discussion of epsilon dominance and its interplay with computational scaling). Additionally, the Bayesian information criterion (BIC) [Schwarz, 1978] was used within the ɛ-hBOA as the metric for terminating Bayesian network model building (this is done automatically, without user intervention). Following the completion of the optimization runs, the ɛ-nondominated solutions found during all five random seed trials were combined using offline ɛ-nondomination sorting to produce the Pareto-approximate forecast solution set analyzed throughout the remainder of this study.
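The offline merging step can be illustrated with a generic ε-box nondominated archive in the spirit of Laumanns et al. [2002]: each objective vector is mapped to a box of side ε, dominated boxes are discarded, and one representative is kept per box (the dominating point, or otherwise the one nearer the box's lower corner). This is a sketch, not the ε-hBOA source code; all objectives are assumed to be minimized, so maximized objectives such as K and Flux must be negated first.

```python
import math


def eps_box(f, eps):
    """Index of the epsilon box containing objective vector f (all minimized)."""
    return tuple(math.floor(fi / ei) for fi, ei in zip(f, eps))


def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def corner_distance(f, box, eps):
    """Normalized squared distance from f to its box's lower corner."""
    return sum(((fi - bi * ei) / ei) ** 2 for fi, bi, ei in zip(f, box, eps))


def eps_nondominated(points, eps):
    """Merge objective vectors into an epsilon-nondominated set (one per box)."""
    archive = {}  # box index -> representative objective vector
    for p in points:
        box = eps_box(p, eps)
        if any(dominates(b, box) for b in archive):
            continue                                   # box is dominated: reject
        if box in archive:
            q = archive[box]
            if dominates(p, q) or (not dominates(q, p) and
                                   corner_distance(p, box, eps) < corner_distance(q, box, eps)):
                archive[box] = p                       # better representative of the box
            continue
        for b in [b for b in archive if dominates(box, b)]:
            del archive[b]                             # drop boxes the new box dominates
        archive[box] = p
    return list(archive.values())
```

With the precisions listed above, eps would be (1.0, 0.5, 0.1, 0.5, 0.5, 0.1) for (Cost, K, DF, Flux, Mass, Centroid), applied to the union of all five seeds' solutions.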
5. Results and Discussion
5.1. Forecasted Observation Design Tradeoffs
A global view of the forecasted six-objective tradeoffs for the current management period is shown by the Pareto-approximate set in Figure 3. This set contains 8871 solutions and was created using ɛ-nondomination sorting [Laumanns et al., 2002] to combine the results of all five random seed trial optimization runs from the ɛ-hBOA. Recall that Cost, DF, Mass, and Centroid are minimization objectives and Flux and K are maximization objectives. The Cost, DF, and K objectives are plotted on the x, y, and z axes of Figure 3, respectively. The Flux objective is shown by the color of the cones, ranging from blue (low Flux detection) to red (high Flux detection). The Mass objective is shown by the orientation of the cones over 180° of rotation: cones pointing up represent the highest Mass error, cones pointing down represent the lowest Mass error, and intermediate angles represent intermediate values. The Centroid objective is shown by the size of the cones: large cones represent high Centroid error, small cones represent low Centroid error, and intermediate sizes represent intermediate values. The green arrows on each axis of Figure 3 point toward optimality in each objective. An ideal solution would thus be located toward the rear lower corner of the plot in Figure 3 (low Cost, low DF, and high K) and would be represented by a small (low Centroid error), red (high Flux detection) cone pointing down (low Mass error).
 The ɛ-hBOA was able to identify nondominated solutions throughout a wide range of Cost where the maximum value in the set still represents a 33.33% cost savings (i.e., 168 samples taken out of the 252 total possible over the four sampling periods in years 6–7). This also indicates that higher cost solutions (Cost > 168) are suboptimal in terms of the six-objective formulation used in this work. Forecasted performance in the DF objective ranged from 7.2 to 98%. The Flux objective ranged from 0.0 to 7.6 m d−1 and the Mass error objective ranged from 0.05 to 39.4%. The Centroid error objective ranged from 0.20 to 10.4 cm. In this study, the search algorithm was allowed to fully explore the potential tradeoffs that exist between the objectives, because it is difficult for a decision maker to know the marginal returns for increased sampling costs a priori.
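The Cost figures can be checked directly: with 252 possible samples, a plan of 168 samples saves one third of the maximum Cost. The helper functions below are illustrative; applied to the 37.7% savings reported for solution 5 in section 5.2, the second one implies a plan of about 157 samples.

```python
MAX_SAMPLES = 252  # every available port sampled in each of the four quarters


def cost_savings_pct(n_samples, max_samples=MAX_SAMPLES):
    """Percentage savings relative to sampling every port every quarter."""
    return 100.0 * (1.0 - n_samples / max_samples)


def samples_for_savings(savings_pct, max_samples=MAX_SAMPLES):
    """Approximate number of samples implied by a reported savings percentage."""
    return round(max_samples * (1.0 - savings_pct / 100.0))
```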
In Figure 3, we can see that as more money is spent on sampling, it becomes increasingly difficult to attain nondominated solutions with high Kalman gains. High-Cost solutions are more consistent in their Kalman gains, as evidenced by the decreasing variability in K with increasing Cost, whereas lower-Cost solutions span a wider range of Kalman gains. This pattern highlights the benefits and the importance of the many-objective formulation. Prior Kalman filtering studies [Loaiciga, 1989; Andricevic, 1990; Zhang et al., 2005; Herrera and Pinder, 2005] have focused solely on minimizing the filter's estimation variance (similar to our K objective). These studies show that very few samples are often necessary to attain near-optimal Kalman filter variances. Focusing solely on this statistical objective, however, would degrade physical objectives such as DF and Flux.
5.2. Exploration to Inform Decision Making
As has been highlighted by di Pierro et al., the benefit of solving the full six-objective problem given in equations (7)–(17) is that the many-objective Pareto-approximate set contains all of the tradeoffs for the 6 single-objective problems, 15 two-objective problems, 20 three-objective problems, 15 four-objective problems, and 6 five-objective problems that define the subspaces of the full formulation. In other words, by solving a single six-objective problem, we automatically obtained the results for all 62 smaller subproblems. The decision maker can thus proceed through a process of discovery and negotiation by analyzing subproblems of reduced complexity and use this information to move into the more complex tradeoffs revealed by the full six-objective problem. Our visual-analytical solution exploration builds on the preference ordering work of Khu and Madsen, whereby preferred designs are chosen on the basis of their performance in lower-dimensional subproblems. Here we present a visual approach for exploring subproblem tradeoffs and expanding them into larger subproblems that consider successively more objectives, helping the decision maker better understand objective interactions and aspects of the problem that may not be fully captured in the design formulation. To illustrate objective interactions and the potential consequences of lower-dimensional formulations, Figures 4a–4d show projections of the six-objective solution set onto two-objective subproblems, where, again, the arrows indicate which objectives are minimized or maximized.
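The subproblem counts quoted above are binomial coefficients, C(6, k) for k = 1 to 5, which sum to 2⁶ − 2 = 62. A short sketch enumerates them, using the six objective names from this study.

```python
from itertools import combinations
from math import comb

OBJECTIVES = ("Cost", "K", "DF", "Flux", "Mass", "Centroid")

# Number of k-objective subproblems contained in the six-objective problem
counts = {k: comb(len(OBJECTIVES), k) for k in range(1, len(OBJECTIVES))}
total_subproblems = sum(counts.values())

# Explicit list of the two-objective projections (e.g., Cost versus Mass)
two_objective = list(combinations(OBJECTIVES, 2))
```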
Figures 4a–4e present a series of analysis steps that a decision maker could take to identify interesting compromise solutions that perform well in multiple design objectives simultaneously. Figure 4a shows the Cost versus Mass tradeoff, highlighted in blue, that would have been found had only these two objectives been used in the formulation (similar to Reed et al.). Many prior LTGM design studies have focused on one or more plume moment-based design objectives [Montas et al., 2000; Reed et al., 2000; Reed and Minsker, 2004; Wu et al., 2005; Kollat et al., 2008b]. This prior work has shown that fairly accurate characterization of plume moments can be achieved with relatively few monitoring wells. Figure 4a shows that the EnKF is capable of providing reasonable forecasts of the plume's zeroth and first moments by sampling from key locations in space and time (i.e., there are many low Cost solutions that achieve low Mass error). However, as will be shown, many of these seemingly high-performing solutions would fail critically in other objectives, underlining the importance of quantifying and exploring many objectives simultaneously. Solution 1 identified in Figure 4a lies at the point of diminishing return on the Cost versus Mass tradeoff and might have been chosen by a decision maker had only these two objectives been used.
Moving to Figure 4b (where the Cost-Flux tradeoff is highlighted in green) demonstrates the inability of solution 1 to provide meaningful physical quantification of the tracer plume. The Cost-Mass tradeoff is also shown in Figure 4b for reference (highlighted in blue). Solution 2 identified in Figure 4b represents where the decision maker might now go to better capture the Flux of the system, because this solution lies at a point of diminishing return on the Cost-Flux tradeoff. Figure 4c shows the Cost-DF tradeoff highlighted in red, together with the Cost-Flux and Cost-Mass tradeoffs. The Cost-DF and Cost-Flux tradeoffs appear to be closely correlated, and hence so are their points of diminishing return. To demonstrate performance differences across a range of costs, a decision maker might select solution 3 to compare and contrast with solution 2 in terms of the six design objectives (solution 3 is selected from the Cost-DF tradeoff, but at a much higher Cost and lower DF than solution 2). The objective values of solutions 1, 2, and 3 are shown in Table 2. Comparing these three solutions reveals that increasing sampling cost (solutions 2 and 3) improves performance in the DF and Flux objectives but may degrade performance in the Mass, Centroid, and K objectives.
Table 2. Five Interesting Compromise Solutions That Might Be Selected by a Decision Maker During Exploration of the Six-Objective Pareto-Approximate Forecast Setᵃ
ᵃCost is expressed as a percentage of the maximum Cost.
Figure 4d shows the projection of the six-objective solution set onto the Cost-K subproblem, with all previously identified tradeoffs highlighted and the three prior solution selections marked. In addition, the Flux objective is now shown as color. While low Cost solutions exhibit a wide range of EnKF covariance minimizing capability, the variability in the K objective decreases substantially as Cost increases. This indicates that sampling from more locations has the potential to dramatically improve the stability of EnKF forecasts. In addition, visualizing the structure of this relationship reveals a dramatic improvement in EnKF stability where Cost is about 40% of maximum, indicating a physical space-time sampling threshold above which the performance of the EnKF benefits dramatically. In Figure 4d we see that the Cost-Mass tradeoff solutions lie entirely in regions of low Cost and low EnKF stability (i.e., a region where K varies widely). In addition, the tradeoffs associated with the Cost-DF and Cost-Flux subproblems border the lower bound of K performance throughout the full range of Cost. Solution 4 has a Cost very similar to that of solution 3 but a dramatically higher K; that is, it provides much more information to the EnKF forecasts. The objective values associated with solution 4 are also shown in Table 2. While solution 4 does tend to greatly improve performance in the Centroid objective, its performance in terms of DF and Flux fails critically. This failure is made obvious in Figure 4e, where the positioning and color of solution 4 reveal its modes of failure. Figures 4d and 4e demonstrate that a higher Cost solution (solution 5) still provides significant Cost savings while maintaining an overall balanced compromise in the other objectives (see Table 2). This higher Cost solution still represents a 37.7% Cost savings relative to the maximum Cost plan, in which every available port is sampled quarterly throughout the management period.
All of the observations noted throughout the discussions of the subproblems shown in Figures 4a–4e provide valuable insights regarding the structure and interactions of the six design objectives chosen and represent important knowledge discoveries for informed decision making.
5.3. Costs of Compromise
Figure 5 shows the sampling plans of the low-Cost (solution 2) and high-Cost (solution 5) compromise solutions, respectively. The breakthrough curves associated with the layer 4 sampling ports are shown in Figure 5, similar to Figure 2, except that the EnKF assimilation and forecast curve is now shown as a dash-dot red curve. The prior management period k − 1, associated with years 5–6, assimilated all available data at all sampling locations. For the current management period k, representing years 6–7, the EnKF is used to forecast the objectives of alternative sampling strategies during this period (i.e., no observations are assimilated). Data assimilation creates corrections that move the EnKF forecast toward the observations taken in k − 1, as can be seen in Figure 5 during the prior management period.
Although the observation breakthrough curves are shown for both the prior and forecast management periods, they are strictly plotted for reference purposes during the forecast period, and they were not used in the forecast period k to evaluate predicted tradeoffs (as would be the case in a real-world application). The quarterly sampling times of the forecast period are denoted by dashed lines in the breakthrough plots, along with an open circle indicating that no sample is taken at that quarter, and a filled circle indicating the decision to sample at a given quarter. Recall that the tracer was released from port B4 for a period of 15 years so the k − 1 and k management periods shown reflect a snapshot during midrelease (years 5–7) of the tracer. Most interesting in Figure 5 is the observation that solution 2, shown in Figure 5a, represents a minimally sampled plan that perfectly tracks high mass flux tracer locations in space and time. This result is striking given the enormous size of the decision space (2²⁵² possible sampling plans) relative to the search time provided (5 million evaluations).
One notable issue with the sampling plan provided by solution 2 is its failure to sample from location P4, especially late in the management period, when the flow and transport model predicts that the leading edge of the plume may be in this area. Solution 2's emphasis on Cost minimization represents a very low level of risk aversion with respect to the DF and Flux objectives. This failure to sample at the leading edge of the plume is a major concern with solution 2, because future management period decisions would have only the modeled tracer at this location to draw from (i.e., no observations would be available at this location in management period k). Forecasting such consequences before a plan is implemented is a major innovation for monitoring practice, where the space-time consequences of adding or removing observations are largely unknown. Focusing now on solution 5, shown in Figure 5b, reveals that a compromise to invest in a higher Cost solution yields significant improvements in the DF and Flux objectives. In solution 5, the later quarter of the management period at location P4 is now sampled. This solution also tends to sample locations beyond the boundary of the tracer plume, which reduces the errors in forecasting the Mass and Centroid of the tracer plume as well.
Figure 6 shows snapshots of the tracer plume at year 7 resulting from (Figure 6a) a hypothetical maximum information case where all 252 available samples are assimilated into the EnKF, (Figure 6b) using the strongly biased flow and transport model only, (Figure 6c) using only the data from the prior management period k − 1 to generate an EnKF forecast of the plume in the present period k, and (Figure 6d) assimilation of the sampling plan data provided by solution 5 into the EnKF in the present management period (i.e., the actual implementation of our decision). Figure 6a represents the optimal plume snapshot that can be obtained at the end of the current management period given the use of all available data at a maximum cost. Comparing Figures 6a and 6b demonstrates the failure of the original biased model to capture key characteristics of the tracer plume. The biased simulation fails to predict the downward migration of tracer through the fine sand lens and consistently underestimates high tracer concentrations.
Figure 6c shows the EnKF forecasted plume at year 7 obtained using only the available data from the prior k − 1 management period (years 5–6). It is clear that even prior to the availability of data for the current management period, the EnKF provides a more accurate estimate of tracer concentration and movement than that provided by the model. This demonstrates the significant benefits of bias modeling within the EnKF, allowing it to more accurately forecast tracer movement, even in the presence of systematic and severe model errors. The EnKF forecast error shown in Figure 6f further supports this observation, as it is much improved over the error resulting from using the model alone, shown in Figure 6e.
Figure 6d shows the plume that would result from assimilating all of the data at the sampling ports chosen by solution 5, identified in section 5.2 (see Table 2). This is the outcome that would occur if the decision maker selected solution 5 from the forecasted tradeoffs in Figure 4e and implemented it. The plume obtained using the solution 5 sampling strategy closely matches the maximally sampled plume shown in Figure 6a, but at a Cost savings of about 37.7%. In addition, Figure 6g shows that the error associated with implementing solution 5 is minimal.
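The Cost savings quoted for solution 5 is measured against the maximum-information case that takes every available sample. The sketch below shows that calculation, assuming Cost is the per-location cost vector (1 × M) weighted by the binary sampling decisions at each location and sample time; the function names, costs, and decisions are hypothetical toy values, not data from this study.

```python
from typing import Sequence


def sampling_cost(cost_per_location: Sequence[float],
                  decisions: Sequence[Sequence[int]]) -> float:
    """Total cost: sum over locations i and sample times t of c_i * x_{i,t}."""
    return sum(c * sum(row) for c, row in zip(cost_per_location, decisions))


def cost_savings(cost_per_location: Sequence[float],
                 decisions: Sequence[Sequence[int]]) -> float:
    """Fractional savings relative to taking every sample (all x_{i,t} = 1)."""
    full = [[1] * len(row) for row in decisions]
    return 1.0 - (sampling_cost(cost_per_location, decisions)
                  / sampling_cost(cost_per_location, full))


# Hypothetical plan: 3 locations x 4 sample times, unit cost per sample.
costs = [1.0, 1.0, 1.0]
plan = [[1, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 1]]
print(sampling_cost(costs, plan))            # 7.0
print(round(cost_savings(costs, plan), 3))   # 0.417
```

Only if every sample cost the same would a 37.7% savings against 252 samples correspond to omitting about 95 of them (95/252 ≈ 0.377); with unequal location costs the number of omitted samples would differ.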
 This study introduced the highly adaptable Adaptive Strategies for Sampling in Space and Time (ASSIST) framework for improving long-term groundwater monitoring (LTGM) design. The ASSIST framework combines Monte Carlo flow and transport simulation, bias-aware ensemble Kalman filtering, many-objective search using hierarchical Bayesian optimization, and decision support tools based on interactive high-dimensional visual analytics. The new framework was demonstrated using a physical aquifer tracer experiment in which the position and frequency of tracer sampling were optimized for six design objectives that combine a variety of methodologies historically used by the LTGM research community.
 Throughout this work, we demonstrated that the ASSIST framework is a highly adaptable methodology for improving LTGM decisions across space and time while accounting for the influences of systematic model errors (or predictive bias). This study also illustrates the framework's ability to facilitate discovery and negotiation throughout the LTGM design process. Specifically, the many-objective solution set identified using ASSIST elucidated relationships among the tracer plume's fluxes, moments, and boundaries while advancing decision makers' understanding of the potential consequences of their monitoring decisions. The many-objective observation system design approach thus provides a mechanism to discover system dependencies and tradeoffs that would likely have been missed using traditional one- and two-objective problem formulations.
 Our results provide an illustrative example of the process of discovery and negotiation that ASSIST equips decision makers to pursue. Using this example, we readily identified five interesting sampling plans from a suite of 8871 Pareto-approximate design alternatives. From these five, we identified one sampling plan that achieved a cost savings of 37.7% relative to sampling from all available locations while also providing excellent performance in the remaining five design objectives. We also showed that the information provided by this sampling plan differed only minimally from the information provided by sampling from all available locations. This high-performance design would have been extremely difficult to identify using prior approaches. An important contribution of this work is the ASSIST framework's ability to overcome predictive bias, improving our understanding of the space-time benefits and impacts of our environmental monitoring strategies. This work is an explicit example of how many-objective Pareto efficiency provides an integrated value-of-information measure that directly links observations and decisions, a long-recognized need and challenge in the water resources research literature [see Moss, 1979b]. Consequently, the ASSIST framework has strong potential to innovate our characterization, prediction, and management of water resources systems.
Denotes data assimilation step.
Ensemble of model and bias states (2n × N).
Ensemble mean of A (2n × N).
Perturbations on A (2n × N).
Contaminant concentration [ML−3].
Cost vector for each sample (1 × M).
Ensemble of perturbed samples (m × N).
Detection indicator vector (1 × M).
Ensemble member index.
Denotes forecast step.
Plume centroid error minimization objective [L].
Sampling cost minimization objective.
Detection failure minimization objective [%].
Flux detection maximization objective [LT−1].
EnKF information maximization objective.
Plume mass error minimization objective [%].
Design objective vector.
Grid location index.
Mapping of samples to model states (m × 2n).
EnKF time step.
Management period index.
Kalman gain matrix (2n × m).
Number of locations used in a sampling plan.
Total number of sampling locations.
Number of model states.
Number of ensemble members.
Covariance of A (2n × 2n).
Spatially correlated noise matrix (n × N).
Covariance of the perturbations on the sample matrix D (m × m).
Real number space.
Sampling location index.
Sample time within management period.
Total sampling times in management period.
A sampling location.
Binary yes/no decision for a sample.
Sampling decision vector (1 × MT).
Euclidean x direction/distance.
Euclidean y direction/distance.
Euclidean z direction/distance.
Scaling factor for ensemble noise matrix Q.
A sampling decision.
Zeroth plume moment (mass of plume) [M].
Reference plume zeroth moment [M].
First plume moment (centroid of plume).
Reference plume first moment.
Vector of bias states (1 × n).
Vector of model states (1 × n).
Time correlation factor for model bias.
The entire sampling decision space.
Perturbations on sample matrix D (m × N).
One-norm calculation.
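Several notation entries above (the zeroth and first plume moments, their reference values, the percent Mass error, the Centroid error in length units, and the one-norm) combine into the Mass and Centroid objectives. The sketch below computes discrete spatial moments on a gridded plume and the two error measures. It assumes unit porosity and cell volume by default, a percent relative mass error, and a one-norm distance between centroids; the exact definitions used in ASSIST are not restated in this section, so the function names and the toy plumes are illustrative only.

```python
from typing import Dict, Tuple

Cell = Tuple[float, float, float]  # (x, y, z) cell-center coordinates


def plume_moments(conc: Dict[Cell, float], cell_volume: float = 1.0,
                  porosity: float = 1.0) -> Tuple[float, Tuple[float, ...]]:
    """Zeroth moment (plume mass) and first moment (plume centroid) on a grid."""
    weights = {pos: c * porosity * cell_volume for pos, c in conc.items()}
    mass = sum(weights.values())
    centroid = tuple(sum(pos[k] * w for pos, w in weights.items()) / mass
                     for k in range(3))
    return mass, centroid


def mass_error_pct(mass: float, mass_ref: float) -> float:
    """Relative error in the zeroth moment, in percent (assumed form)."""
    return 100.0 * abs(mass - mass_ref) / mass_ref


def centroid_error(centroid: Tuple[float, ...], centroid_ref: Tuple[float, ...]) -> float:
    """One-norm distance between estimated and reference centroids [L] (assumed form)."""
    return sum(abs(a - b) for a, b in zip(centroid, centroid_ref))


# Hypothetical 1-D slices of a reference plume and an estimated plume.
reference = {(0.0, 0.0, 0.0): 2.0, (1.0, 0.0, 0.0): 2.0}
estimate = {(0.0, 0.0, 0.0): 1.0, (1.0, 0.0, 0.0): 2.0, (2.0, 0.0, 0.0): 2.0}
m_ref, c_ref = plume_moments(reference)
m_est, c_est = plume_moments(estimate)
print(mass_error_pct(m_est, m_ref))              # 25.0
print(round(centroid_error(c_est, c_ref), 3))    # 0.7
```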
 The authors of this work were partially supported by the U.S. National Science Foundation under CAREER grant CBET-0640443. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the U.S. National Science Foundation. The first author has also been funded in part by a Science to Achieve Results (STAR) Graduate Research Fellowship (agreement FP-916,820), awarded by the U.S. Environmental Protection Agency (EPA). This study has not been formally reviewed by the EPA. The views expressed in this document are solely those of the authors and the EPA does not endorse any products or commercial services mentioned in this publication.