Corresponding author: W. W.-G. Yeh, UCLA, Civil and Environmental Engineering, 5732B Boelter Hall, Los Angeles, CA 90095-1593. (firstname.lastname@example.org)
 An optimal experimental design algorithm is developed to select locations for a network of observation wells that provide maximum information about unknown groundwater pumping in a confined, anisotropic aquifer. The design uses a maximal information criterion that chooses, among competing designs, the design that maximizes the sum of squared sensitivities while conforming to specified design constraints. The formulated optimization problem is non-convex and contains integer variables necessitating a combinatorial search. Given a realistic large-scale model, the size of the combinatorial search required can make the problem difficult, if not impossible, to solve using traditional mathematical programming techniques. Genetic algorithms (GAs) can be used to perform the global search; however, because a GA requires a large number of calls to a groundwater model, the formulated optimization problem still may be infeasible to solve. As a result, proper orthogonal decomposition (POD) is applied to the groundwater model to reduce its dimensionality. Then, the information matrix in the full model space can be searched without solving the full model. Results from a small-scale test case show identical optimal solutions among the GA, integer programming, and exhaustive search methods. This demonstrates the GA's ability to determine the optimal solution. In addition, the results show that a GA with POD model reduction is several orders of magnitude faster in finding the optimal solution than a GA using the full model. The proposed experimental design algorithm is applied to a realistic, two-dimensional, large-scale groundwater problem. The GA converged to a solution for this large-scale problem.
 Unknown forcing in an aquifer system can have drastic effects on the reliability of the results from a groundwater model. Therefore, construction of an accurate and useful groundwater model requires the accurate estimation of these forcing parameters. Unknown forcing parameters may include recharge, leakage, or evaporation loss, but in general, the most common and significant forcing comes from unknown pumping rates. Unknown pumping may result from private wells that do not report pumping or from pumping wells where it is suspected that the reported pumping rates are incorrect. An inverse model designed to estimate unknown pumping requires observed data, the most important of which are observations of head (groundwater level). However, observations are expensive, time consuming, and often difficult to obtain, particularly if aquifer parameters vary spatially. Consequently, inverse modeling always faces an observation scarcity problem. The practical application of the experimental design problem, then, is to design an observation network conforming to a specified set of constraints (such as allowable budget or spatial or temporal constraints) that will provide the maximum amount of information about the unknown forcing of interest.
2. Experimental Design
 Experimental design covers the set of problems that involves performing experiments (in our case, taking measurements) in a way that gains the maximum amount of useful information subject to a set of constraints. In practice, we generally set a limit on the number of feasible experiments and then apply some scheme to distribute those experiments to maximize the amount of useful information gained. We then consider the relationship between the number of experiments and amount of useful information received by varying the limit on the number of experiments allowed. This description immediately raises some questions: What is useful information? How do we quantify the amount of information gained? Answering these questions has been the topic of much research spanning diverse fields of study such as engineering, statistics, biology, and medicine to name a few.
 Since the 1920s, statistical methods have been established as the primary means for answering these questions. For example, we find useful information in the Jacobian matrix J, defined as

J = [∂y_i/∂p_j],  i = 1, …, N_D,  j = 1, …, N,   (1)

where N_D is the total number of observations taken, which may vary spatially and temporally; N is the total number of parameters of interest; and ∂y_i/∂p_j is the sensitivity of the ith observation (y_i) to changes in the jth parameter (p_j). There are three methods that can be used to calculate the sensitivities of a system: (1) the parameter perturbation method, (2) the sensitivity equation method, and (3) the adjoint state method [Yeh, 1986]. For convenience, we choose the parameter perturbation method as follows:
∂y_i/∂p_j ≈ [y_i(p̂ + Δp_j e_j) − y_i(p̂)] / Δp_j,   (2)

where p̂ is a vector of the estimated parameter values, y_i(p̂) is the ith model simulated value using the parameter values in p̂, Δp_j is a small increment of the jth parameter (called the perturbation of p_j), and e_j is a vector in which all elements are zero except for the jth element, which is equal to one, so that Δp_j e_j perturbs only the jth parameter [Poeter et al., 2005]. Using the parameter perturbation method to estimate sensitivities for general models requires (N+1) model calls (a baseline model solution and N model runs to estimate the sensitivities) [Yeh, 1986]. In general, there are two requirements for making accurate estimates of J through the use of equation (2): first, the perturbation Δp_j must be sufficiently small, and second, p̂ must be close to the true parameter values. Note that given an infinite budget, observations can be taken at infinitesimally fine spatial and temporal resolutions. However, in the real world, this is never the case; consequently, another way to formulate the experimental design is by addressing the problem of finding the elements of this theoretical infinitely large Jacobian matrix that should be included in the analysis. The second question mentioned above then arises: How do we quantify the amount of information contained in a particular Jacobian matrix?
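The parameter perturbation method of equation (2) can be sketched in a few lines of code. Here `model` is a hypothetical stand-in for a full groundwater model run that maps a parameter vector to the vector of simulated observations; the function name and interface are illustrative, not part of any published code.

```python
import numpy as np

def perturbation_jacobian(model, p_hat, dp):
    """Estimate J[i, j] = dy_i/dp_j by one-sided finite differences
    (the parameter perturbation method).

    model : callable mapping a parameter vector to simulated observations
    p_hat : estimated parameter values, shape (N,)
    dp    : perturbation sizes, shape (N,)
    """
    y0 = model(p_hat)                       # baseline run
    J = np.empty((y0.size, p_hat.size))
    for j in range(p_hat.size):             # one extra run per parameter
        p = p_hat.copy()
        p[j] += dp[j]                       # p_hat + dp_j * e_j
        J[:, j] = (model(p) - y0) / dp[j]
    return J
```

The loop makes exactly N perturbed runs plus one baseline run, matching the (N+1) model calls noted above.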
 To answer this question, we adopt the concept of the information matrix (I) from statistics, defined as

I = JᵀWJ,   (3)

where W is a user-specified weighting matrix. Under the assumptions that a least-squares error criterion is used for parameter estimation, that the observation errors are uncorrelated with equal variance, and that W is the identity matrix, the information matrix defined in equation (3) is equivalent to the inverse of the covariance matrix of the estimated parameters [Cleveland and Yeh, 1990; Kutner et al., 2004]. Given the definition in equation (3), the amount of information contained in any given information matrix may be quantified by a number of methods. Commonly used methods are A-optimality, which seeks to minimize the trace of the covariance matrix; D-optimality, which seeks to minimize the determinant of the covariance matrix [Steinberg and Hunter, 1984]; and E-optimality, which seeks to minimize the maximum eigenvalue of the covariance matrix [Steinberg and Hunter, 1984]. Under the assumptions outlined above (i.e., that the information matrix is equivalent to the inverse of the covariance matrix of the estimated parameters), the A- and D-optimality criteria seek to maximize the trace and the determinant of the information matrix, respectively. In general terms, A-optimality seeks to obtain the largest amount of information possible, whereas D-optimality seeks to balance the amount of information gained while minimizing the amount of covariance between observations. In the past, optimal design problems in the context of groundwater modeling have used both of these optimality criteria. A-optimality has been used for designing optimal observation networks for confined aquifer parameter estimation [Hsu and Yeh, 1989], transport parameter estimation [Cleveland and Yeh, 1990], and unconfined aquifer parameter estimation [Altmann-Dieses et al., 2002].
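These criteria can be computed directly from an information matrix; the helper names below are ours, and the default identity weighting follows the assumptions above.

```python
import numpy as np

def information_matrix(J, W=None):
    # I = J^T W J (equation (3)); with W = identity this is the inverse
    # covariance of the least-squares parameter estimates (up to the
    # observation error variance).
    if W is None:
        W = np.eye(J.shape[0])
    return J.T @ W @ J

def a_optimality(I):
    return np.trace(I)                   # maximize total information

def d_optimality(I):
    return np.linalg.det(I)              # maximize, penalizes redundancy

def e_optimality(I):
    return np.linalg.eigvalsh(I).min()   # maximize the weakest direction
```

Because tr(JᵀWJ) with an identity W is just the sum of squared sensitivities, the A-optimality score can also be evaluated without explicitly forming I.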
D-optimality has been used for designing optimal pumping tests for parameter estimation [Nishikawa and Yeh, 1989], an optimal multi-objective observation network for parameter estimation and model discrimination [Knopman and Voss, 1989], and an optimal observation network for dispersion parameters [Catania and Paladino, 2009]. Studies also have been performed to not only find the optimal design but to evaluate whether the amount of information gathered is useful for achieving some objective [McCarthy and Yeh, 1990] or if the information received is sufficient [Chang et al., 2005].
 The concept of experimental design has been applied extensively to groundwater modeling; however, many studies have faced difficulty when solving for the optimal observation network because of the combinatorial search required. In a realistic, highly discretized, large-scale groundwater model, referred to in this paper as the full model, there may be tens or hundreds of thousands of nodes. Because of this, the dimensionality of the search quickly becomes so large that it becomes impossible to solve through mathematical programming techniques such as integer programming or the simplex method with relaxation. Consequently, other methods are required to solve this optimization problem. Throughout the years, many different methods have been developed to solve large-scale optimization problems that cannot be solved or are difficult to solve through traditional mathematical techniques because of issues relating to the problems' dimensionality, non-convexity, or non-differentiability. Genetic algorithms (GAs) are one such set of techniques. GAs use methods based on the concepts of evolution and survival of the fittest to search the feasible space for the optimal solution to a general optimization problem [Mitchell, 1998]. In this study, the GA is built around a base GA code (GAlib) developed at MIT that has many built-in features suitable for use in the particular problem under consideration, such as support for real number genomes, easy adaptation, and the option to implement various “flavors” of GA [Wall, 1995].
 GAs have been used in the past with various optimality criteria to develop optimal observation networks [Reed et al., 2000; McPhee and Yeh, 2006; Babbar-Sebens and Minsker, 2010]. However, many of these studies were challenged by the fact that a GA does not address the computational burden of the original model. For the combinatorial search examined here, we found two available options. The first option requires storing and accessing an inordinately large amount of data resulting from the groundwater model's spatial and temporal dimensions. Alternatively, to avoid this burden, the second option requires calling the groundwater model numerous times as the GA evolves. The disadvantage of the second option is that, when coupled with any groundwater model with realistic spatial and temporal dimensions, this approach will be prohibitively slow because a GA may need to call the groundwater model hundreds or even thousands of times before the termination criterion (e.g., convergence) is met.
 To address the issue of computation time, we apply Proper Orthogonal Decomposition (POD) to the groundwater model to reduce the model space (spatial dimension) and thus the computational burden of calling the model. POD has been shown to have the ability to reduce the dimension of a groundwater model by several orders of magnitude while maintaining more than 99% accuracy [McPhee and Yeh, 2008]. Siade et al. demonstrated that by applying POD, a groundwater model that originally contained more than 200,000 spatial nodes could be reduced to a model containing only 10 spatial nodes, resulting in an approximately 1000-fold increase in the speed of solving the model. Note that the temporal dimension remains untouched; however, with such a large decrease in the spatial dimension, the cost of the time dimension becomes trivial. As a result of applying POD, calling the reduced model even thousands of times becomes inconsequential.
3. Confined Aquifer Groundwater-Flow Model
 Three-dimensional groundwater flow in a confined, anisotropic aquifer with pumping is described by the following PDE [Bear, 1979]:

∂/∂x (K_x ∂h/∂x) + ∂/∂y (K_y ∂h/∂y) + ∂/∂z (K_z ∂h/∂z) = S_s ∂h/∂t + q,   (4)

with initial and boundary conditions

h(x, y, z, 0) = h_0(x, y, z),
h = h_D on the fixed head (Dirichlet) boundary,
(K∇h) · n = q_n on the flux (Neumann) boundary,   (5)

where h is the hydraulic head [L]; K_x, K_y, and K_z are the hydraulic conductivities in the x, y, and z directions [L/T]; S_s is the specific storage [1/L]; q is the specific volumetric pumping rate [1/T]; q_n is the specific discharge normal to the flux boundary [L/T]; h_D is the fixed head on the Dirichlet boundary; h_0, h_D, and q_n are known functions; L denotes the length unit (meters, feet, etc.); and T denotes the time unit (days, hours, etc.). Note that the above equation holds for any length and time unit as long as the chosen units are consistent.
 Without loss of generality, we can change the state variable from head (h) to drawdown (s). Drawdown is defined as the difference between the initial head before pumping (H) and the head after pumping (h), i.e., s = H − h. After this linear transformation, the initial and Dirichlet boundary conditions are equal to zero. Following this change of state variable, the elements of the Jacobian matrix become ∂s_i/∂q_j, and the PDE of the governing equation can be discretized through finite difference approximations into a system of linear ODEs:
As(t) + B ds(t)/dt = q(t),   (6)

where s(t) is a vector of dimension N_n of drawdown values at time t, N_n is the total number of nodes in the discretized model, A is the stiffness matrix, B is the mass matrix, and q(t) is a vector of sinks (pumping, recharge, evaporation, etc.) at time t. Equation (6) is referred to as the full model. In most cases of interest, the matrices A and B are large, sparse, and positive definite [Siade et al., 2010]. (Note that, throughout the text, a boldface letter indicates a matrix or vector, whereas a non-boldface letter indicates a scalar; for example, s(t) indicates the vector of all drawdown values at time t, whereas s_i(t) denotes the drawdown in the ith node at time t.) Although equation (6) is solved in terms of drawdown, head values (h) can easily be calculated by reversing the linear transformation s = H − h. POD is then applied to equation (6), obtaining a reduced system of equations:
Âr(t) + B̂ dr(t)/dt = q̂(t),   (7)

such that the reduced state r(t) approximates the full solution [Siade et al., 2010]; the reduced matrices Â and B̂ and forcing q̂(t) are defined through the Galerkin projection described below. However, for reduced models to be useful, the reduced dimension n_p must be much smaller than N_n.
4. Proper Orthogonal Decomposition

 POD has been a topic of much research [Cazemier et al., 1998; Willcox and Peraire, 2002; Kowalski and Jin, 2003]. In addition, good summaries of applying POD to a groundwater-flow model exist in the literature [Vermeulen et al., 2004; McPhee and Yeh, 2008; Siade et al., 2010; Siade et al., 2012].
 POD is based on the idea that a system of linear equations of dimension N_n may be projected into a subspace of dimension n_p such that n_p ≪ N_n and ŝ(t) = Pr(t), where ŝ(t) is the approximation of the original state vector s(t), P is the projection operator given by a matrix whose columns form an orthonormal basis spanning the subspace, and r(t) is the state vector of the reduced space [Siade et al., 2010]. This approximation can be expressed as:
ŝ(t) = Σ_{k=1}^{n_p} r_k(t) p_k + s̄,

where r_k(t) denotes the kth element of r(t); p_k is the kth column of P; and s̄ is the steady-state condition of the drawdown without forcing [Vermeulen et al., 2004]. Note that, in general, ŝ(t) only approximates s(t). We construct the matrix P by running the full model and taking observations, sometimes called snapshots, of drawdown at all nodes in the full model. We then use these observations to create an orthonormal basis of a reduced model space that approximates the full model space [Siade et al., 2010]. The reduced model's performance is related to the quality of the snapshots (how well the snapshots represent the dynamics of the system). Siade et al. developed a method for finding an approximation of the optimal snapshot times, which is used to find the optimal snapshot set in this application. This set is then stored in the snapshot matrix X of dimension N_n × n_s, where N_n is the number of nodes (or ODEs) in the full model and n_s is the total number of snapshots taken. A number of methods can be used to construct the orthonormal basis of the reduced space [Siade et al., 2010; Shlizerman et al., 2012]. We choose Singular Value Decomposition (SVD) for its computational simplicity compared to eigenvalue decomposition. To construct P, we perform SVD on X such that:
X = PΣVᵀ,

where P and V contain the left and right singular vectors of X, respectively, and Σ is a diagonal matrix containing the singular values of X. It can be shown that the left singular vectors span the same subspace as the principal vectors of X; therefore, to be consistent with terminology used in previous works, we refer to P as a matrix containing the principal vectors of X. We then perform Principal Component Analysis (PCA) on the matrix P to eliminate insignificant principal vectors [Vermeulen et al., 2004; McPhee and Yeh, 2008] so that the final projection matrix explains more than 99.99% of the variance of the full model. The procedure for constructing P can be summarized as follows:
 1. For each pumping well (or a set of pumping wells whose ratio of pumping rates relative to each other is held fixed), perform a full model run with the well(s) set at a constant pumping rate, collect snapshots, and store them in the matrix X.
 2. Perform SVD on X such that X = PΣVᵀ.
 3. Eliminate insignificant principal vectors from P such that P has dimension N_n × n_p, where n_p is the number of principal components kept after PCA.
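The three steps above can be sketched as follows. The function name and the truncation interface are illustrative; the 99.99% variance threshold matches the criterion stated in the text.

```python
import numpy as np

def pod_basis(X, energy=0.9999):
    """Build a POD projection matrix from a snapshot matrix X
    (N_n nodes x n_s snapshots), keeping enough left singular
    vectors to explain `energy` of the snapshot variance."""
    P, sigma, _ = np.linalg.svd(X, full_matrices=False)   # X = P Sigma V^T
    frac = np.cumsum(sigma**2) / np.sum(sigma**2)         # explained variance
    n_p = int(np.searchsorted(frac, energy)) + 1          # smallest adequate n_p
    return P[:, :n_p]                                     # N_n x n_p
```

For a snapshot matrix of effective rank r, the truncation keeps roughly r columns, so the projection reproduces the snapshots nearly exactly.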
 After constructing P, we use the Galerkin projection to reduce the full model through the following equation [Siade et al., 2010]:

PᵀAPr(t) + PᵀBP dr(t)/dt = Pᵀq(t).   (8)

 If we let Â = PᵀAP, B̂ = PᵀBP, and q̂(t) = Pᵀq(t), equation (8) takes on the form of equation (7) and becomes:

Âr(t) + B̂ dr(t)/dt = q̂(t),

with initial condition r(0) = Pᵀs(0) [Siade et al., 2010]. We refer to this formulation (equation (7)) as the reduced model. The solution to this set of ODEs can then be approximated through a stable time-stepping technique such as implicit Euler or Crank-Nicolson or solved analytically through the matrix exponential [Siade et al., 2010].
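A minimal sketch of the projection and one implicit Euler step, assuming the equation (6) form As + B ds/dt = q; the matrices below are tiny stand-ins, not an actual aquifer discretization, and the function names are ours.

```python
import numpy as np

def reduce_model(A, B, P):
    # Galerkin projection (equation (8)): A_r = P^T A P, B_r = P^T B P
    return P.T @ A @ P, P.T @ B @ P

def implicit_euler_step(A_r, B_r, r, q_r, dt):
    # Backward-difference discretization of A_r r + B_r dr/dt = q_r:
    # (B_r/dt + A_r) r_{k+1} = (B_r/dt) r_k + q_{k+1}
    return np.linalg.solve(B_r / dt + A_r, B_r @ r / dt + q_r)
```

Because the reduced matrices are only n_p × n_p, each step costs a small dense solve regardless of how many nodes the full model has.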
5. Genetic Algorithm
 The GA searches the entire feasible space through the use of real number genomes, overlapping populations (making it a steady-state GA), and elitism. The GA indexes all possible observation well locations in the full model space and creates a population of individuals, each representing a feasible combination of observation well locations. The GA evaluates the fitness (i.e., how well a particular individual optimizes the objective function) of these individuals by calling the reduced model to estimate the Jacobian matrix J and the information matrix I (equations (1) and (3), respectively). Maximizing the trace of I, referred to as A-optimality in this paper, is chosen as the criterion for selecting the optimal individual. Thus, the reduced model evaluates the following fitness function for each individual:

F = tr(I) = tr(JᵀWJ).   (9)
 It is important to show how the reduced model within the GA estimates the elements of the Jacobian matrix to evaluate the fitness criterion of a particular design. Achieving a good approximation of the Jacobian generally requires some a priori knowledge about the parameter values, which can be difficult to acquire. Because we want to gather information on unknown pumping rates, however, a priori knowledge is not an issue: the relationship between changes in pumping and changes in drawdown (s) is linear (equation (6)), so no a priori knowledge of pumping is required to make accurate approximations of the sensitivities in the Jacobian matrix. In fact, we could use any baseline values and perturbations in equation (2) and make accurate approximations. For simplicity, the reduced model assumes a baseline of q = 0; this results in the drawdown also being zero. Thus,

∂s_i/∂q_j ≈ s_i(Δq_j)/Δq_j,

where Δq_j is the perturbation of the jth pumping rate and s_i(Δq_j) is the ith simulated drawdown using only that perturbed pumping rate.
 As a result, for each individual in a population, the GA passes to the reduced model the rows of P corresponding to the candidate observation locations. The reduced model then calculates the reduced solution r(t) at the observation times and makes a projection onto the full space such that:

ŝ_obs(t) = P_obs r(t),   (10)

where P_obs contains only the rows of P corresponding to the candidate observation locations.
 By doing this, the GA evaluates the fitness of each individual (equation (9)) entirely in the reduced space, leading to a vast reduction in computational burden.
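For a single observation time, this fitness evaluation can be sketched as below: only the selected rows of P are ever multiplied through, so the full model is never solved. The names and shapes are illustrative.

```python
import numpy as np

def design_fitness(P, R, obs_nodes):
    """A-optimality score (equation (9) style) for one candidate design,
    computed entirely from the reduced solution.

    P         : N_n x n_p POD basis
    R         : n_p x N_w reduced-space sensitivities at one observation
                time, one column per pumping well
    obs_nodes : node indices of the candidate observation wells
    """
    J = P[obs_nodes, :] @ R      # project only the rows we need
    I = J.T @ J                  # W = identity
    return np.trace(I)           # sum of squared sensitivities
```

With multiple observation times, the same score is accumulated over the reduced sensitivities at each time.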
 The following optimization problem shows the formulation for the GA:
maximize tr(JᵀWJ)
subject to: Σ_{i ∈ zone j} x_i ≤ 1 for each observation zone j,
Σ_{i=1}^{N_n} x_i = N_obs,
x_i ∈ {0, 1},

 where N_obs is the total number of allowable observation wells; zone j indicates all the nodes in the jth zone; N_n is the total number of nodes in the model; and x contains 0-1 binary variables (x_i equal to 1 if an observation is taken at node i and 0 otherwise). In this study, we chose W to be the identity matrix, and the Jacobian matrix J is calculated through the reduced model. The GA achieves convergence when there is no deviation from the best solution and the best population solutions for a specified number of iterations (i.e., convergence of solution). A flowchart of the GA with the POD reduced model is shown in Figure 1.
6. One-Dimensional Test Case
 We developed a synthetic experimental setup similar to the ones used in previous papers related to POD reduction in groundwater modeling [McPhee and Yeh, 2006; Siade et al., 2010] to validate the GA's potential for finding the globally optimal experimental design (the setup is shown in Figure 2). The test aquifer is 100 m long with a depth of 1 m and a width of 1 m and is divided into cubic cells that are 1 m on each side. The specific storage is uniform, and the hydraulic conductivity (K) takes one value in nodes 1 through 50 (Zone 1) and a different value in nodes 51 through 100 (Zone 2). The time step (Δt) is set to 0.1 days, and the aquifer is modeled for 100 days. Observations are taken arbitrarily at 0.5, 1, 3, 5, 10, 15, 25, 40, 55, and 90 days. At node 51, one pumping well pumps water from the aquifer continually throughout the simulation period at an unknown rate. Using SAT2D [Paniconi and Putti, 1994], the fully constructed model consists of a system of 303 equations because SAT2D requires three rows of 101 nodes to create a finite element mesh to simulate the aquifer. Only the 101 nodes from the middle row are treated as “real” locations; the rest are “virtual” locations used to simulate the aquifer. After applying POD, the resulting reduced model contains only seven nodes and thus seven equations. When run on an Intel Core 2.40 GHz i5 CPU, the full model takes 3 s to complete one run, whereas the reduced model completes one run in 0.012 s, an increase in speed of 3 orders of magnitude with virtually identical results, as seen in Table 1. Table 1 compares the trace of the full information matrix obtained from the full model with that obtained from the reduced model. This full information matrix contains information from all the nodes that may be used as observation well locations, at all observation times. Note that the full information matrix from the reduced model is obtained by first computing the reduced solution and then projecting the resulting Jacobian matrix onto the full space by using equation (10).
Finally, equation (3) is used to calculate the information matrix. Although the absolute error seen in Table 1 may seem large, we note that the trace of the information matrix is the sum of the squared sensitivities of the potential observations to changes in pumping. Accordingly, for this full information matrix, there are 101 potential locations, 10 observation times, and one pumping well, yielding a total of 1010 potential observations. Calculating the error per observation leads to the conclusion that for any one observation, the average error in the amount of information is only 0.020. In addition, we can see that the maximum relative error, which is the maximum error for one well at one time, is less than half of 1%. Considering that the error in the information matrix includes the summation of squared errors from the full model space (101 nodes), we can conclude that the error in the information matrix as a result of model reduction is negligible.
Table 1. Comparison of the Trace of the Information Matrices for the Full and Reduced Models for the 1-D Test Case (columns: Full Model Trace, Reduced Model Trace, Maximum Relative Error)
 For the experimental design problem, the 101 nodes under consideration for potential observation well locations are divided into two zones, one containing nodes 1 through 50 and the other containing nodes 51 through 101. Two observation wells are allowed to be placed in the aquifer with the constraint that both cannot be located in the same zone. We then construct the full Jacobian matrix (equation (1)) for this aquifer, containing sensitivities for all the potential observation well nodes at all the simulated times. We then apply three methods to search for the globally optimal observation well locations. First, we perform an exhaustive search in which the fitness function (equation (9)) for all possible observation well location combinations (at the specified observation times) is evaluated. We then find the combination that results in the maximum feasible objective function value. Second, we formulate the experimental design problem as an integer programming problem solving for the optimal observation well locations. This optimization problem is shown as:
maximize Σ_{i=1}^{N_n} x_i Σ_{k=1}^{N_t} Σ_{j=1}^{N_w} [∂s_i(t_k)/∂q_j]²

subject to: Σ_{i ∈ zone j} x_i ≤ 1 for each zone j, Σ_{i=1}^{N_n} x_i = N_obs, and x_i ∈ {0, 1},

 where N_t is the number of observation times; N_w is the number of pumping wells in the aquifer; N_n is the total number of nodes in the aquifer; N_obs is the total number of allowable observation wells; t_k denotes the kth observation time; ∂s_i(t_k)/∂q_j is the corresponding element of the full Jacobian matrix; and zone j indicates all the nodes in the jth zone. Finally, we applied the GA method to the formulated problem. The GA coupled with the reduced model takes 47 s and 6336 model calls to reach its termination criterion (i.e., convergence of solution). Experiments are then run comparing the results of coupling the full model to the GA in place of the reduced model. It is found that the GA coupled with the full model converges to the same solution as the GA coupled with the reduced model and with a similar number of required model calls. Comparing the results from the three distinct methods (GA, exhaustive search, and integer programming) shows that each reaches the same conclusion: given the 1-D test aquifer and pumping well setup, the optimal locations for the observation wells are at nodes 51 and 50. Although this test case may seem trivial, it is a good test of the GA's abilities. For instance, in this test case, an intuitive solution would be to place the observation wells as close as possible to the pumping well. One can then compare this intuitive solution to the mathematically optimal design and see that they are the same. Finally, it is evident that without any prior knowledge or built-in intelligence related to the problem, the GA was able to identify this optimal solution. This lends confidence to the hypothesis that in a large-scale model, the GA is a valid search method for seeking the globally optimal solution.
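The exhaustive search over zone-constrained designs can be sketched as follows. The interface is illustrative; in the paper's 1-D case there are two zones and two allowable wells, with at most one well per zone.

```python
import itertools
import numpy as np

def exhaustive_search(J_full, zones, n_obs=2):
    """Enumerate all feasible observation-node combinations (at most one
    node per zone) and return the one maximizing tr(J^T J).

    J_full : full Jacobian, one row per candidate node
    zones  : list of node-index lists, one per zone
    """
    best, best_score = None, -np.inf
    for zone_pick in itertools.combinations(range(len(zones)), n_obs):
        # one node from each selected zone
        for nodes in itertools.product(*(zones[z] for z in zone_pick)):
            J = J_full[list(nodes), :]
            score = np.trace(J.T @ J)     # A-optimality fitness
            if score > best_score:
                best, best_score = nodes, score
    return best, best_score
```

This brute-force enumeration is tractable only for small cases like the 1-D test aquifer, which is exactly why the GA is needed at realistic scales.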
By observing the number of model calls necessary for the GA to achieve convergence, and the fact that the number of required model calls appears independent of which model (full or reduced) is used, we easily can see the advantage of using a reduced model rather than the full model.
7. Two-Dimensional Test Case
 After completing the 1-D test case, we constructed a 2-D test case to assess the GA on a large-scale, real-world-sized model. The 2-D horizontal model is based on a model developed to simulate the groundwater flow in a confined aquifer in the Oristano plain in west-central Sardinia, Italy [Cau et al., 2002; Siade et al., 2012]. We assume that the aquifer is surrounded on all sides by Dirichlet boundaries and that the aquifer is divided into seven hydrologic zones, as seen in Figure 3, with zonal properties shown in Table 2. In SAT2D [Paniconi and Putti, 1994], the fully constructed model contains 29,197 nodes; through POD, the full model is reduced to a model containing 109 nodes while capturing more than 99% of the variance of the snapshot set. In numerical experiments on an Intel Core 2.40 GHz i5 CPU, the full model takes 67 s to complete a single model run, whereas the reduced model takes 25.4 s to complete 20 model runs, 25 s of which are required for a one-time read-in of the reduced model parameters. In other words, ignoring the one-time cost of reading in the reduced model parameters, the reduced model takes only approximately 0.02 s to complete a model run. As we would expect, some loss of information results from the model reduction; accordingly, the solution from the reduced model does not perfectly match the solution from the full model. Table 3 shows a comparison of the trace of the full information matrices (containing information from all the nodes at all the observation times: 0.5, 1, 3, and 5 days, which, as in the 1-D case, were chosen arbitrarily) obtained from the full model and from a projection of the solution from the reduced model onto the full space. It should be noted that, unlike the 1-D test case, there are no “virtual” locations; as a result, all the nodes from the model are included in this full information matrix.
As with the 1-D test case, although it might seem that the absolute error is large, when considered in relation to the total number of observations (in this case, 29,197 locations at 4 times for 20 wells, for a total of 2,335,760 observations), the average error in information for any one observation is 0.0011. The maximum relative error, which shows the maximum error in received information for any single well from the full model at a single time step compared to the reduced model, is also fairly small. Considering that the absolute error in the information matrix includes a summation of squared errors from the full model space (29,197 nodes), we see that the amount of information lost in the model reduction is negligible. Taking all this into account, we conclude that the trade-off between accuracy and computational cost is acceptable.
Table 2. Hydraulic Zone Properties
Table 3. Comparison of the Trace of the Information Matrix for the Full and Reduced Model for the 2-D Test Case (columns: Full Model Trace, Reduced Model Trace, Maximum Relative Error)
 For the experimental design problem, we divide the aquifer into 22 arbitrary zones (see Figure 4) with some zones designated to allow only for pumping wells and others designated to allow only for observation wells. In a real-world scenario, the zones that allow for pumping might be private properties such as farms, whereas the zones that allow for observation wells could be public land in which the municipality or government agency could place observation wells. Zones 1, 2, 4, 8, 12, and 16 through 22 are designated for observation wells, whereas zones 3, 5, 6, 7, 9, 10, 11, 13, 14, and 15 are designated for pumping wells. After designating the well zones, 20 pumping wells are placed throughout the allowable zones as shown in Figure 4. As with the 1-D test case, each observation zone is allowed to contain at most one observation well. With this zonation, there are 10,490 feasible potential observation locations in the model. Accordingly, this is the dimension of the optimization problem. A number of experimental designs are then investigated. First, a network with a single observation well is optimized and then networks with incrementally more (up to 12) observation wells are optimized. We then compare the results from different experimental designs. As noted above, for all scenarios observations are taken at 0.5, 1, 3, and 5 days.
 The results of the optimization with the GA are shown in Figure 5 and Table 4. From Table 4, we can begin to draw some conclusions about this experimental design problem. From observing the number of model calls required to achieve convergence, we immediately see the advantage of coupling the GA with a POD reduced model rather than the full model. As seen in Table 4 and Figure 6, when an additional observation well is added, the amount of information (and the corresponding objective function score) increases, as expected. However, the marginal amount of information gained from adding an additional observation well decreases as the total number of observation wells increases. Figure 6 shows this trend continuing until almost no additional information stands to be gained from adding the 12th observation well compared to having only 11 observation wells. It should be noted that there may be multiple designs with the same number of observation wells that are all A-optimal.
Table 4. Optimization Results for the 2-D Test Case (optimal networks of one through twelve observation wells)
 We observe that the GA tends to group the observation wells close together, in areas marked by high concentrations of pumping wells (Figure 5). This is not unexpected and makes physical sense: the A-optimality criterion seeks the design that produces the most information, regardless of whether that information has already been obtained. The results support this, as the A-optimal design selects observation wells in the areas most sensitive to drawdown (i.e., the areas with high concentrations of pumping wells) and ignores the areas least sensitive to drawdown. As a consequence, these A-optimal networks may produce observations with large covariance between observations and insufficient information about the pumping wells located away from the areas of concentrated pumping. For this reason, it has been argued that other optimality criteria should be used, for example, E-optimality. An E-optimal design maximizes the minimum eigenvalue of the information matrix [Yeh, 1992] and thus gathers as much information as possible from the pumping well about which it has the least information. This would likely result in more spread among the observation well locations and a different network pattern for each design. The E-optimality criterion and other optimality criteria could easily be implemented with a slight modification of the objective function in the GA. The equation below shows the objective function for E-optimality:
 maximize λ_min(I) = min_i λ_i(I),

where λ_i is the ith eigenvalue of I. Figure 7 compares the A- and E-optimal observation networks for six and eight locations. In the E-optimal networks, the observation wells are more spread out, and the location patterns differ significantly from those of the A-optimal designs. These results make physical sense given the definition of E-optimality. Although these results are interesting, the particulars of the optimal solution (either A- or E-optimal) to which the GA converges are not of great interest. The result of true interest is that the GA coupled with the reduced model is able to achieve convergence with a realistic, large-scale model, something that would not be feasible if the GA were coupled with the full model, because of the computational burden of the large number of model calls required.
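The two criteria can be sketched as follows, assuming an observation-network Jacobian J_obs whose rows are the sensitivities of the selected observations to the pumping rates (the function names and the identity default for the weighting matrix W are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def information_matrix(J_obs, W=None):
    # I = J^T W J; W defaults to identity weighting.
    if W is None:
        W = np.eye(J_obs.shape[0])
    return J_obs.T @ W @ J_obs

def a_score(J_obs):
    # A-type criterion used here: trace(I), the sum of squared sensitivities.
    return np.trace(information_matrix(J_obs))

def e_score(J_obs):
    # E-optimality: smallest eigenvalue of the symmetric information matrix.
    return np.linalg.eigvalsh(information_matrix(J_obs))[0]
```

Only the scoring function changes between criteria, which is why swapping A- for E-optimality requires just a slight modification of the GA's objective function.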
 This paper presented a methodology that applies POD model reduction to experimental design. Although the full model may contain tens of thousands of nodes, the POD-reduced model contains only tens or hundreds of nodes, yielding computational savings of several orders of magnitude for each model call. We developed reduced models that run 3 orders of magnitude faster than the full models, and each reduced model produced a sufficiently accurate estimate of the full model's Jacobian matrix. We also presented a methodology for solving the combinatorial search for the optimal observation well network, using a GA to systematically search over the feasible set of observation wells. The GA calls the reduced model to obtain information about the aquifer and uses that information to converge toward a globally optimal solution. The algorithm was constructed so that the model calls required by the GA are performed entirely in the reduced space. This is a major contribution of the proposed methodology: it searches the information matrix in the full model space without incurring the cost of solving the full model. Thus, the algorithm eliminates the need to store and access prohibitively large amounts of data or to make an infeasible number of time-consuming full model runs.
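The POD step can be sketched as follows, assuming a snapshot matrix X of full-model solutions; the energy threshold and function names are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def pod_basis(X, energy=0.9999):
    # X: n_nodes x n_snapshots matrix of collected full-model snapshots.
    # Returns the projection matrix P whose columns are the leading left
    # singular vectors capturing the requested fraction of the energy.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(frac, energy)) + 1
    return U[:, :r]

def reduce_system(A, B, P):
    # Galerkin projection of the stiffness (A) and mass (B) matrices:
    # A_r = P^T A P, B_r = P^T B P, each r x r instead of n x n.
    return P.T @ A @ P, P.T @ B @ P
```

Because the reduced stiffness and mass matrices are only r x r, each time step of the reduced model costs a tiny fraction of a full-model step, which is the source of the several-orders-of-magnitude speedup.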
 We first verified the methodology on a small 1-D test case for which it was feasible both to perform an exhaustive search and to solve an integer programming problem for the optimal network design. A GA was then applied to verify that it would converge to the globally optimal network, showing that the GA is a valid search method for large-scale models where exhaustive search and integer programming are infeasible. The GA did converge to the global optimum, after more than 6000 model calls, verifying its ability to search the feasible space; this large number of model calls demonstrated the need to replace the full model with a reduced model. After verifying the results of the GA on the 1-D test case, we applied the algorithm to a large-scale, real-world-sized model. The full model contained 29,197 nodes and was reduced through POD to a model containing only 109 nodes. The solution from the reduced model was not a perfect match to the solution from the full model; however, this is expected given a full model of this size and a dimensionality reduction of several orders of magnitude. Despite the error, the reduced model provided an acceptably accurate representation of the behavior of the full model. Table 4 shows the number of model calls required to achieve convergence, demonstrating the importance of per-call computational cost in the GA. The cost of constructing the reduced model is trivial compared to the cost of the model calls needed by the GA.
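The GA search over candidate networks can be sketched as a simple subset-selection loop. This is a hedged sketch: the population size, generation count, and crossover/mutation operators are illustrative choices rather than the paper's exact implementation, and in the actual algorithm the fitness function would evaluate the chosen information criterion through the POD-reduced model:

```python
import numpy as np

def ga_select(n_loc, k, fitness, pop_size=40, n_gen=60, p_mut=0.2, seed=0):
    # Search over k-subsets of the n_loc feasible observation locations.
    # fitness(idx) scores a candidate network (e.g., trace of the
    # information matrix computed with the reduced model).
    rng = np.random.default_rng(seed)
    pop = [rng.choice(n_loc, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = [pop[i] for i in order[: pop_size // 2]]  # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            # Crossover: sample k distinct locations from the parents' union.
            union = np.union1d(elite[a], elite[b])
            child = rng.choice(union, size=k, replace=False)
            # Mutation: swap one location for a random feasible one.
            if rng.random() < p_mut:
                child[rng.integers(k)] = rng.integers(n_loc)
            uniq = np.unique(child)
            children.append(uniq if len(uniq) == k
                            else rng.choice(n_loc, size=k, replace=False))
        pop = elite + children
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmax(scores))]
```

Because every fitness evaluation here would call only the reduced model, the thousands of evaluations the GA needs remain affordable even for the large-scale case.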
 As shown in Figure 5, the observation well locations obtained from the A-optimal design tend to group together in the area where the pumping wells are most concentrated. Such a design can lead to large covariance among the observations or to insufficient data about the aquifer as a whole. Figure 7 shows that the E-optimal design results in a different network, in which the observation wells are spread out across the aquifer system. Regardless of whether A-optimality is the best choice of optimality criterion for this or other aquifer experimental designs, the results show that the GA with POD presented here is a valid search method for any problem-specific objective. This combined approach is an improvement over coupling the GA with a full model or attempting to formulate a mathematical optimization problem for the full model, which may be infeasible to solve.
 In future work, we could investigate worst-case-scenario pumping (i.e., given an "optimal" observation network, what pumping would yield the least information) and apply experimental design to determine an observation network that best handles the worst case. In addition, we could apply the GA-with-POD methodology to a groundwater model reduced in a way that is not only forcing-independent but also parameter-independent. The experimental design problem could then be extended to gather information not only about the unknown forcing but also about the unknown aquifer parameters.
stiffness matrix for the full model;
stiffness matrix for the reduced model;
mass matrix for the full model;
mass matrix for the reduced model;
simulation time step;
perturbation of the jth pumping rate;
vector used to perturb the jth element of ;
perturbation of the jth parameter;
vector of zeros except the jth element, which is equal to ;
vector used to calculate the trace of I in the integer programming problem;
known functions describing initial and boundary conditions of an aquifer;
fixed head boundary;
vector of initial head values;
vector of head values;
full information matrix calculated by ;
information matrix calculated by ;
full Jacobian matrix (sensitivities of all nodes, at all times, to all wells);
Jacobian matrix containing only the sensitivities of the observations of interest to all wells;
hydraulic conductivity in the ith direction;
ith eigenvalue of I;
total number of parameters of interest;
total number of feasible observation well locations;
number of nodes in the full model;
total number of observations taken;
maximum number of allowable observation wells;
number of principal components used in the reduced model;
number of pumping wells in the aquifer;
number of snapshots taken for each pumping well to build the reduced model;
full projection matrix;
matrix of rows of P corresponding to some ;
vector containing the index of all feasible observation well locations;
vector containing some feasible set of observation well locations ;
ith model simulated value using the parameter values in ;
specific volumetric pumping rate;
specific discharge normal to the flux boundary ;
vector of sinks at time t for the full model;
vector of sinks at time t for the reduced model;
matrix containing the reduced solution at the observation times;
vector of the reduced solution at time t;
vector of drawdown values;
diagonal matrix containing the singular values of X;
vector of natural system dynamics at time t;
vector of drawdown values at time t;
vector of the approximation of ;
ith simulated drawdown using ;
vector of nominal parameter values;
matrix containing the left singular vectors of X;
subspace spanned by the columns of P;
matrix containing the right singular vectors of X;
some user specified weighting matrix used in calculating I;
matrix containing all collected snapshots;
vector of binary variables indicating if a node has an observation well;
all the nodes in the jth zone.
 This material is based on work supported by NSF under awards EAR-0910507 and EAR-1314422, ARO under award W911NF-10-1-0124, and an AECOM endowment. We thank three anonymous reviewers for their in-depth and constructive reviews.