Effect of correlated observation error on parameters, predictions, and uncertainty


Abstract

[1] Correlations among observation errors are typically omitted when calculating observation weights for model calibration by inverse methods. We explore the effects of omitting these correlations on estimates of parameters, predictions, and uncertainties. First, we develop a new analytical expression for the difference in parameter variance estimated with and without error correlations for a simple one-parameter two-observation inverse model. Results indicate that omitting error correlations from both the weight matrix and the variance calculation can either increase or decrease the parameter variance, depending on the values of error correlation (ρ) and the ratio of dimensionless scaled sensitivities (rdss). For small ρ, the difference in variance is always small, but for large ρ, the difference varies widely depending on the sign and magnitude of rdss. Next, we consider a groundwater reactive transport model of denitrification with four parameters and correlated geochemical observation errors that are computed by an error-propagation approach that is new for hydrogeologic studies. We compare parameter estimates, predictions, and uncertainties obtained with and without the error correlations. Omitting the correlations modestly to substantially changes parameter estimates, and causes both increases and decreases of parameter variances, consistent with the analytical expression. Differences in predictions for the models calibrated with and without error correlations can be greater than parameter differences when both are considered relative to their respective confidence intervals. These results indicate that including observation error correlations in weighting for nonlinear regression can have important effects on parameter estimates, predictions, and their respective uncertainties.

1. Introduction

[2] A diagonal weight matrix commonly is used to represent system-state observation errors in inverse models of groundwater systems, with the weights calculated as the inverse of observation error variances [e.g., Hill and Tiedeman, 2007; Singh et al., 2008; James et al., 2009; Liu and Kitanidis, 2011; Majdalani and Ackerer, 2011; Kowalski et al., 2012; Yoon and McKenna, 2012]. This representation and other common simplifications assume there are no observation error correlations. The necessary methods for including error correlations in inverse modeling have been available for decades [e.g., Neuman and Yakowitz, 1979; Cooley, 1982; Carrera and Neuman, 1986; McLaughlin and Townley, 1996; Hill and Tiedeman, 2007] and can be implemented, for example, using the inverse modeling software PEST [Doherty, 2008, 2010] and UCODE_2005 [Poeter et al., 2005]. The methods include the correlations by allowing for a full observation variance-covariance matrix to represent observation errors; this matrix is inverted to obtain the full weight matrix. However, despite availability of these methods, difficulties with quantifying the terms that characterize observation error correlations often lead to their omission and to using a diagonal observation weight matrix for convenience.

[3] Various types of error correlations are widespread in hydrologic models. Error correlations can result from phenomena such as barometric pumping of wells [Weeks, 1979] and entrapped air in the unsaturated zone [Healy and Cook, 2002] that create spatially and temporally correlated anomalies in groundwater levels. Correlations in errors arise also from use of multiple observations that derive from a single direct measurement, a common situation in hydrologic studies. For example, the water table elevation at a monitoring well is usually calculated from measurements of well elevation and depth of water, so error in the well elevation propagates to all head observations over time at that well. Streamflow observations are usually estimated as nonlinear functions of water depth and empirical rating curve constants. For both stream and groundwater depth, estimates often depend on the nonlinear equations and empirical constants used to estimate pressure from the voltage and temperature at a pressure transducer [Freeman et al., 2004]. Additional examples include multiple observations of temporal changes in hydraulic heads that depend on an instantaneous head measurement [e.g., Hill et al., 2000] and multiple observations of flow-change between stream gauging stations that depend on a single flow estimate. Although error correlations in hydrologic models have not been extensively characterized, these and other examples indicate that error correlations are potentially widespread.

[4] In this work, we explore the effect of system-state observation error correlations on parameter estimates, predictions, and uncertainty measures. We use both an analytical expression and a groundwater reactive transport model of denitrification to compare results obtained with and without error correlations. For the transport model, geochemical observation errors are correlated because selected direct measurements are used to calculate more than one calibration observation. The correlations are calculated by propagation of measurement error, a method that has precedent for geochemical data [e.g., Ballentine and Hall, 1999; Aeschbach-Hertig et al., 1999, 2000; Peeters et al., 2002] but, to our knowledge, has not been used previously to calculate error correlations in observation weight matrices for hydrogeologic investigations. Hill [1992] and Christensen et al. [1998] use a less general approach applied to streamflow gains and losses. In the context of inverse modeling, full weight matrices can be used to represent observation, model, and parameter error.

[5] Few groundwater studies have considered correlated observation errors. Christensen et al. [1998] used a full weight matrix to represent correlated error in base flows, by expanding the method proposed by Hill [1992] to derive the error covariance terms for a system of branching streams. Christensen et al. [1998] did not compare results using a full and a diagonal weight matrix, but Foglia et al. [2009] reported that an unpublished follow-up comparison found that base flow error correlations had a small effect on the parameter estimates but a larger effect on parameter uncertainty.

[6] Cooley [2004] accounted for both observation and model errors in the weight matrix, by summing the variance-covariance matrices of the two error types. Model error was formulated as the difference between stochastic representations of the true and the spatially averaged parameter distributions. Cooley and Christensen [2006] and Christensen and Doherty [2008] used synthetic models to examine the consequences of using a diagonal instead of a correct full weight matrix calculated by the method of Cooley [2004]. Their weight matrices were dominated by model error. Cooley and Christensen [2006] reported that the variance of head prediction residuals was larger in a model calibrated with a full weight matrix compared to the same model calibrated with a diagonal weight matrix. They also found that confidence intervals on predictions calculated with a diagonal weight matrix were much too small whereas those calculated with a full weight matrix were nearly correct. Their results underscore the importance of using the correct full weight matrix when observation and model error correlations exist. Christensen and Doherty [2008] found that inversion with a full weight matrix produced less accurate predictions, in contrast to expected results based on Cooley [2004]. They noted that their result was most likely related to difficulties with generating (by a Monte Carlo method) and inverting the variance-covariance matrix of total error. Lu et al. [2013] considered correlations between total errors in the context of weighting for model averaging. They found that using the variance-covariance matrix of total errors to calculate the weights resulted in better predictive performance for models of both synthetic and experimental uranium transport, compared to using the variance-covariance matrix of observation errors.

[7] A full weight matrix commonly is used for parameter errors. For example, when pilot points and regularization are used to estimate the spatial variability of a parameter field, a full variance-covariance matrix for prior information error often is used to represent the spatial correlation of these errors [e.g., Bentley, 1997; Alcolea et al., 2006; Singh et al., 2008; Hendricks-Franssen et al., 2009]. To our knowledge, there are no studies that evaluate the effect of including versus excluding these correlated errors.

[8] The implications of observation error correlations extend also to Bayesian methods in hydrologic modeling. In rainfall-runoff modeling, streamflows are the primary system-state observations used for calibration. Observation errors largely stem from using stage-discharge rating curves to determine the flows [e.g., McMillan et al., 2010], and these errors can be correlated, as suggested by Foglia et al. [2009]. Recent rainfall-runoff modeling research has comprehensively examined methods for characterizing both model and observation error and their effect on prediction uncertainty [e.g., Thyer et al., 2009; Schoups and Vrugt, 2010; Renard et al., 2010, 2011]. In these papers, error models are developed for the different error sources, and the parameters of these models are estimated together with rainfall-runoff model parameters. The results show that error correlations can be pronounced and that accurate estimates of observation uncertainty are critical for predictive capabilities and for the decomposition of input and structural errors [Thyer et al., 2009; Renard et al., 2010].

[9] In this paper, we first present methods for model calibration, error propagation, and calculation of uncertainty. We next consider parameter uncertainty in a simple one-parameter, two-observation inverse model, and derive a new analytical expression for the errors in uncertainty estimates when observation error correlations are omitted. For this model, we also compare uncertainty estimates that are typically obtained in practice with those derived from theoretical calculation of the variances. A reactive transport model of denitrification is then introduced, calibrated using both full and diagonal weight matrices, and used to predict future nitrate concentrations and uncertainty. Model error is accounted for by considering multiple realizations of the geology. The derived analytical expression helps to explain the differences in reactive transport model parameter uncertainty in the calibrations with and without error correlations and provides some general guidance about the importance of using a full weight matrix in inverse models for which observation errors are correlated. Results are expected to have broad relevance, as most of the methods and analyses presented here for exploring the effects of correlated errors apply regardless of the source of these errors.

2. Methods

2.1. Nonlinear Regression and Weighting

[10] In this work, parameters are estimated by weighted least squares nonlinear regression using the Gauss-Marquardt-Levenberg method implemented in PEST [Doherty, 2008]. This method minimizes the following objective function, which is a measure of model fit to the observations [Hill and Tiedeman, 2007, p. 28]:

$$S(b) = [y - y'(b)]^{T}\,\omega\,[y - y'(b)] = e^{T}\omega\,e \qquad (1)$$

where e = y − y′(b) is the residual vector of length nd, y is a vector of nd observed values, y′(b) is a vector of nd simulated equivalents, b is a vector of np parameter values, and ω is the nd × nd observation weight matrix that can be diagonal or full. Individual residuals are $e_i = y_i - y'_i(b)$.

[11] If the observation weight matrix ω is defined as being proportional to the inverse of the observation error variance-covariance matrix, then the parameters estimated by weighted linear regression will have the smallest possible variance [Hill and Tiedeman, 2007, p. 34]. Cooley [2004, pp. 133–134] showed that this statement also is valid for nonlinear regression. When observation errors are correlated, a full weight matrix ωfull is needed:

$$\omega_{full} = [V_{full}(\varepsilon)]^{-1} \qquad (2)$$

where $V_{full}(\varepsilon)$ is the full variance-covariance matrix of the observation errors $\varepsilon$, containing off-diagonal elements that quantify the correlations. In this paper, the term “full” denotes a matrix containing at least one nonzero off-diagonal term. When observation errors are uncorrelated or when correlations are omitted, a diagonal weight matrix ωdiag is used:

$$\omega_{diag} = [V_{diag}(\varepsilon)]^{-1} \qquad (3)$$

where $V_{diag}(\varepsilon)$ is the diagonal observation variance-covariance matrix, so that the diagonal elements of ωdiag are equal to:

$$\omega_{ii} = 1/\sigma_{y_i}^{2} \qquad (4)$$

where ωii is the weight for observation yi and $\sigma_{y_i}^{2}$ is the observation error variance. Hill and Tiedeman [2007, Appendix A.3] discuss the assumptions required for diagonal weighting to be correct.
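The relation between the two weight matrices can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `weight_matrices` and the two-observation values (standard deviations of 0.5 and 2.0, correlation of 0.8) are hypothetical and are not taken from this study.

```python
import numpy as np

def weight_matrices(v_full):
    """Return (omega_full, omega_diag) for an observation error covariance matrix."""
    v_full = np.asarray(v_full, dtype=float)
    # Full weighting: invert the complete variance-covariance matrix (equation (2)).
    omega_full = np.linalg.inv(v_full)
    # Diagonal weighting: keep only the variances, so each weight is 1/variance
    # (equations (3) and (4)); the covariances are discarded.
    omega_diag = np.diag(1.0 / np.diag(v_full))
    return omega_full, omega_diag

# Hypothetical two-observation example (values chosen only for illustration).
sd = np.array([0.5, 2.0])   # observation error standard deviations
rho = 0.8                   # observation error correlation
cov = rho * sd[0] * sd[1]
v_full = np.array([[sd[0] ** 2, cov],
                   [cov, sd[1] ** 2]])

omega_full, omega_diag = weight_matrices(v_full)
print(omega_full)
print(omega_diag)
```

In this sketch the diagonal of ωfull differs from that of ωdiag whenever the correlation is nonzero, which is one way to see that omitting correlations changes more than the off-diagonal weights.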

2.2. Calculating Observation Error Variance-Covariance Matrix by First-Order Error Propagation

[12] When calibration observations are derived from multiple direct measurements, the observation error variances and covariances can be estimated by propagating the measurement error (Figure 1) [e.g., Sherman, 1989; Tellinghuisen, 2001; Feldman et al., 2008]. As discussed in section 1, derived observations commonly are used to calibrate hydrologic models. Issues of measurement error propagation are, therefore, widely applicable to hydrologic models and are likely to become increasingly important as geochemical data and estimates of groundwater ages are used more frequently to calibrate models.

Figure 1.

Relationships between input measurements and output observations for this study, and in general terms.

[13] Let observations yi and yk be expressed as linear or nonlinear functions of nm independent measurements, $u_1, \ldots, u_{nm}$:

$$y_i = f_i(u_1, \ldots, u_{nm}), \qquad y_k = f_k(u_1, \ldots, u_{nm}) \qquad (5)$$

The first-order, second-moment error-propagation equations for calculating the error variance of observation yi and the error covariance of observations yi and yk, derived using a first-order Taylor series expansion, are:

$$\sigma_{y_i}^{2} = \sum_{j=1}^{nm} \left(\frac{\partial y_i}{\partial u_j}\right)^{2} \sigma_{u_j}^{2} \qquad (6)$$
$$\sigma_{y_i y_k} = \sum_{j=1}^{nm} \frac{\partial y_i}{\partial u_j}\,\frac{\partial y_k}{\partial u_j}\,\sigma_{u_j}^{2} \qquad (7)$$

[Meyer, 1992, pp. 40 and 45] where $\sigma_{u_j}^{2}$ is the error variance of measurement $u_j$, $\sigma_{y_i}^{2}$ is the error variance of observation yi, and $\sigma_{y_i y_k}$ is the error covariance for observations yi and yk. Equation (6) is used to calculate the diagonal elements of both $V_{diag}(\varepsilon)$ and $V_{full}(\varepsilon)$. Equation (7) is used to calculate the off-diagonal elements of $V_{full}(\varepsilon)$ and is nonzero when the same measurement $u_j$ is used to calculate more than one observation. The equations are evaluated at the expected values of the measurements used to estimate a specific observation. Multiple measurements of the same type can be averaged to provide a single expected value, depending on the type of observation (e.g., time averaged or temporally varying). Additional terms containing measurement covariances are present in both equations if the errors of the measured quantities $u_1, \ldots, u_{nm}$ are correlated [Meyer, 1992, pp. 40 and 45]. Compared to these additional terms, the measurement error variance terms shown in equations (6) and (7) tend to dominate the uncertainty calculated by the error-propagation equations [Bevington and Robinson, 1992, p. 43]. Investigation of measurement error correlations is a topic for possible future research.

[14] Observation error correlations are often more intuitively understandable than the covariances in equation (7), and are calculated using elements of $V_{full}(\varepsilon)$:

$$\rho_{ik} = \frac{\sigma_{y_i y_k}}{\sigma_{y_i}\,\sigma_{y_k}} \qquad (8)$$

Correlations can range from −1.0 to 1.0, with larger absolute values indicating a greater degree of correlation.
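As an illustration of first-order error propagation with independent measurement errors, the sketch below uses the water-level example from section 1: two heads at one well, each computed as well elevation minus depth to water. It assumes NumPy; the function names and the measurement standard deviations (0.03 for well elevation, 0.01 for each depth) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def propagate_errors(jacobian, sd_u):
    """First-order observation covariance for independent measurement errors.

    jacobian[i, j] is the derivative of observation i with respect to
    measurement j; sd_u[j] is the standard deviation of measurement j.
    Diagonal entries of the result follow equation (6) and off-diagonal
    entries follow equation (7).
    """
    jac = np.asarray(jacobian, dtype=float)
    var_u = np.asarray(sd_u, dtype=float) ** 2
    return jac @ np.diag(var_u) @ jac.T

def correlation_matrix(v):
    """Convert a covariance matrix to correlations (equation (8))."""
    sd = np.sqrt(np.diag(v))
    return v / np.outer(sd, sd)

# Hypothetical example: h1 = z - d1 and h2 = z - d2, where z is the well
# elevation and d1, d2 are depths to water measured at two times.
# Measurements are ordered (z, d1, d2).
jacobian = np.array([[1.0, -1.0, 0.0],
                     [1.0, 0.0, -1.0]])
sd_u = [0.03, 0.01, 0.01]

v_full = propagate_errors(jacobian, sd_u)
rho = correlation_matrix(v_full)
print(rho[0, 1])  # approximately 0.9
```

Because both heads share the well-elevation measurement, and that measurement carries most of the error in this hypothetical case, the two derived observations are strongly correlated even though the three measurements are independent.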

2.3. Parameter and Prediction Uncertainty

[15] In this study, we compare two scenarios to obtain insight into the effects on parameter and prediction uncertainty of omitting observation error correlations. In the first scenario, the error correlations are known, have been correctly quantified, and are included in the observation weighting (i.e., a full matrix is used) and in the calculation of parameter uncertainty. In the second scenario, the error correlations are omitted in both the weighting (i.e., a diagonal matrix is used) and the calculation of uncertainty, as is common in practice. For both these scenarios, we estimate parameter uncertainty using the parameter variance-covariance matrix V(b) calculated as:

$$V(b) = s^{2}\,(X^{T}\omega X)^{-1} \qquad (9a)$$

where s2 = S/(nd − np) is the calculated error variance of the regression and X is an nd × np matrix of observation sensitivities $\partial y'_i/\partial b_j$. For nonlinear models, X generally differs for different sets of parameter values. Estimated parameter variances $\sigma_{b_j}^{2}$ are the diagonal terms of V(b).

[16] In the first scenario of this study, with the full weight matrix calculated as the inverse of the full observation error variance-covariance matrix, equation (9a) is the theoretically correct expression for V(b). It is produced by solving for the variance-covariance matrix of the parameter vector and defining the observation weight matrix as the inverse of the error variance-covariance matrix [Hill and Tiedeman, 2007, pp. 396–398]. In the second scenario (and in common practice), we use equation (9a) under the assumption that the error correlations are not known, or are assumed to be negligible, and thus are absent from both ω and $V(\varepsilon)$. That is, a diagonal weight matrix is assumed to arise from a diagonal $V(\varepsilon)$, as in equation (3). Our approach presumes that if the error correlations have been quantified, resulting in a known full $V(\varepsilon)$, then a full weight matrix will be used that accounts for them, rather than a diagonal weight matrix that does not.

[17] When error correlations exist but are omitted from the weight matrix for the reasons given above, equation (9a) is not the theoretically correct expression for the variance-covariance matrix. The correct expression is obtained by using a diagonal weight matrix but a full $V(\varepsilon)$ [Cooley, 2004; Cooley and Christensen, 2006]. Under that condition, the derivation of V(b) for a diagonal weight matrix does not simplify to equation (9a), instead yielding a more complex expression that includes $V(\varepsilon)$:

$$V(b) = (X^{T}\omega X)^{-1}\,X^{T}\omega\,V(\varepsilon)\,\omega X\,(X^{T}\omega X)^{-1} \qquad (9b)$$

[Hill and Tiedeman, 2007, equation (C.20)]. This equation is generally not usable in practice when the error covariances are not known, and therefore it is not a primary focus of this study. However, it serves as a useful point of comparison with equation (9a) to evaluate differences between theoretical and practical estimates of parameter variances when the weight matrix is diagonal.
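The contrast between the practical and the theoretical variance expressions can be sketched as follows. This is an illustration only, assuming NumPy and assuming equation (9b) has the usual form of a weighted least squares estimator covariance, in which $V(\varepsilon)$ is sandwiched between the weighted sensitivity terms. The function names and the two-observation values are hypothetical.

```python
import numpy as np

def param_cov_9a(X, omega, s2):
    """Practical form: regression error variance times the inverse of X^T omega X."""
    return s2 * np.linalg.inv(X.T @ omega @ X)

def param_cov_9b(X, omega, v_eps):
    """Sandwich form: weights omega are used, but the true error covariance is v_eps."""
    a = np.linalg.inv(X.T @ omega @ X)
    return a @ X.T @ omega @ v_eps @ omega @ X @ a

# Hypothetical one-parameter, two-observation example.
X = np.array([[1.0],
              [0.2]])
v_eps = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
omega_full = np.linalg.inv(v_eps)
omega_diag = np.diag(1.0 / np.diag(v_eps))

# With omega equal to the inverse of v_eps, the sandwich form collapses to
# the practical form with s2 = 1.
print(param_cov_9b(X, omega_full, v_eps), param_cov_9a(X, omega_full, 1.0))

# With diagonal weights and correlated errors, the two forms generally differ.
print(param_cov_9b(X, omega_diag, v_eps), param_cov_9a(X, omega_diag, 1.0))
```

The first comparison shows why equation (9a) is theoretically correct for the full weight matrix; the second shows the discrepancy that arises when correlations exist but the weighting is diagonal.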

[18] Other measures of parameter uncertainty used here are confidence intervals and coefficients of variation. The individual, linear, 95% confidence interval for parameter bj is calculated as [Hill and Tiedeman, 2007, p. 138]:

$$b_j \pm t(nd - np,\ 0.025)\,\sigma_{b_j} \qquad (10)$$

where t(nd − np, 0.025) is the Student t-statistic for nd − np degrees of freedom and a two-sided significance level of 0.05 (that is, a 95% confidence level), and $\sigma_{b_j}$ is the parameter standard deviation calculated with equation (9a). Linear confidence intervals are used in this study as an efficient basis for comparing the different estimates of parameters and their uncertainties, and assume that observation errors are normally distributed. The coefficient of variation for bj is:

$$cv_j = \sigma_{b_j} / |b_j| \qquad (11)$$

where $\sigma_{b_j}$ is calculated with equation (9a).

[19] Prediction uncertainty is calculated by a first-order second-moment method [Hill and Tiedeman, 2007, p. 159]:

$$\sigma_{z'}^{2} = x_z^{T}\,V(b)\,x_z \qquad (12)$$

where $\sigma_{z'}^{2}$ is the variance of prediction $z'$ and $x_z$ is the vector of sensitivities $\partial z'/\partial b_j$. Individual, linear 95% confidence intervals are used to express prediction uncertainty, and are calculated in an analogous manner to equation (10). In this study, all calculations with equation (12) use equation (9a) to calculate V(b).
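The uncertainty measures of this section can be sketched in a few lines. The example below assumes NumPy, assumes the coefficient of variation is the standard deviation divided by the absolute parameter value, and takes the critical t value as an input rather than computing it. All names and numbers are hypothetical; the value 2.228 is the tabulated two-sided 95% Student t value for 10 degrees of freedom.

```python
import numpy as np

def linear_confidence_interval(b_j, var_b_j, t_crit):
    """Individual linear interval: estimate plus or minus t times the standard deviation."""
    half_width = t_crit * np.sqrt(var_b_j)
    return b_j - half_width, b_j + half_width

def coefficient_of_variation(b_j, var_b_j):
    """Parameter standard deviation relative to the magnitude of the estimate."""
    return np.sqrt(var_b_j) / abs(b_j)

def prediction_variance(x_z, v_b):
    """First-order second-moment prediction variance: x_z^T V(b) x_z."""
    x_z = np.asarray(x_z, dtype=float)
    return float(x_z @ np.asarray(v_b, dtype=float) @ x_z)

# Hypothetical two-parameter example.
b = np.array([2.0, 0.5])
v_b = np.array([[0.04, 0.01],
                [0.01, 0.09]])   # parameter variance-covariance matrix
x_z = np.array([1.0, 2.0])       # prediction sensitivities
t_crit = 2.228                   # tabulated value, 10 degrees of freedom

lower, upper = linear_confidence_interval(b[0], v_b[0, 0], t_crit)
cv = coefficient_of_variation(b[0], v_b[0, 0])
var_z = prediction_variance(x_z, v_b)
print(lower, upper, cv, var_z)
```

Note that the prediction variance uses the off-diagonal terms of V(b) as well as the parameter variances, so changes in parameter covariance caused by the choice of weight matrix carry through to predictions.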

2.4. Effect of Observation Error Correlation on Uncertainty for a Simple Inverse Model

2.4.1. Effect on Parameter Variance

[20] To derive a general expression for the effects of observation error correlations on parameter uncertainty, equation (9a) is used with the assumption that the model is linear with respect to the parameters. The parameter variance-covariance matrix in equation (9a) is a function of model fit s2, sensitivities X, and the weighting ω. When computed for models calibrated with ωfull instead of ωdiag, both ω and s2 differ in the two calculations. For a linear model, the sensitivities are independent of parameter values and thus are the same in the two calibrations. Under these conditions, the variance-covariance matrices calculated with the full and diagonal weight matrices are:

$$V_{full}(b) = s_{full}^{2}\,(X^{T}\omega_{full}X)^{-1} \qquad (13)$$
$$V_{diag}(b) = s_{diag}^{2}\,(X^{T}\omega_{diag}X)^{-1} \qquad (14)$$

where $s_{full}^{2}$ and $s_{diag}^{2}$ are the regression error variances for the models calibrated with ωfull and ωdiag, respectively. Use of these equations is consistent with our approach for calculating parameter uncertainty discussed in section 2.3. The diagonals of $V_{full}(b)$ are the variances calculated with observation error correlations, $\sigma_{b_j,full}^{2}$, and the diagonals of $V_{diag}(b)$ are estimates calculated without error correlations, $\sigma_{b_j,diag}^{2}$. The ratio $\sigma_{b_j,diag}^{2}/\sigma_{b_j,full}^{2}$ is a metric of the change in parameter variance when a diagonal instead of a full weight matrix is used; that is, when error correlations are excluded instead of included. Values of $\sigma_{b_j,diag}^{2}/\sigma_{b_j,full}^{2}$ > 1.0 indicate the parameter variance is larger for the calibration with the diagonal weight matrix; values <1.0 indicate the variance is smaller for the diagonal weight matrix.

[21] We derive an analytical expression for $\sigma_{b_j,diag}^{2}/\sigma_{b_j,full}^{2}$ that provides insight into how interaction between the sensitivities and the observation error correlations affects parameter uncertainty estimates. First the s2 term is expanded using equation (1):

$$s^{2} = \frac{[y - y'(b)]^{T}\,\omega\,[y - y'(b)]}{nd - np} \qquad (15a)$$

For a linear model, $y'(b) = Xb$. Furthermore, using the solution to the weighted least squares regression normal equations, by which $b = (X^{T}\omega X)^{-1}X^{T}\omega y$ [Draper and Smith, 1998, p. 222], the simulated values can be expressed as $y' = X(X^{T}\omega X)^{-1}X^{T}\omega y$. Substituting this into equation (15a) and performing matrix algebra yields:

$$s^{2} = \frac{y^{T}\,[\omega - \omega X(X^{T}\omega X)^{-1}X^{T}\omega]\,y}{nd - np} \qquad (15b)$$
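The matrix algebra behind this step can be checked numerically. The sketch below assumes NumPy and a linear model y′ = Xb; it compares the weighted sum of squared residuals computed from the fitted residuals with the quadratic form in y that results from substituting the normal-equation solution. The sensitivities, observations, and error covariance are hypothetical.

```python
import numpy as np

# Hypothetical linear problem: three observations, one parameter.
X = np.array([[1.0],
              [2.0],
              [3.0]])
y = np.array([1.1, 1.9, 3.2])
v_eps = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.5],
                  [0.0, 0.5, 1.0]])
omega = np.linalg.inv(v_eps)
nd, n_par = X.shape

# Weighted least squares solution of the normal equations.
a = np.linalg.inv(X.T @ omega @ X)
b_hat = a @ X.T @ omega @ y

# Objective function evaluated directly from the residuals.
e = y - X @ b_hat
s_direct = e @ omega @ e

# The same quantity written as a quadratic form in the observations.
p = omega - omega @ X @ a @ X.T @ omega
s_quadratic = y @ p @ y

s2 = s_direct / (nd - n_par)
print(s_direct, s_quadratic, s2)
```

The two sums of squares agree to rounding error, which is the identity used to move from the residual form of s2 to the form written in terms of y.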

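[22] Formulating equation (15b) for both a full and a diagonal weight matrix, substituting these expressions into equations (13) and (14), and forming the ratio of the diagonals yields:

$$\frac{\sigma_{b_j,diag}^{2}}{\sigma_{b_j,full}^{2}} = \frac{\{y^{T}[\omega_{diag} - \omega_{diag}X(X^{T}\omega_{diag}X)^{-1}X^{T}\omega_{diag}]\,y\}\;[(X^{T}\omega_{diag}X)^{-1}]_{jj}}{\{y^{T}[\omega_{full} - \omega_{full}X(X^{T}\omega_{full}X)^{-1}X^{T}\omega_{full}]\,y\}\;[(X^{T}\omega_{full}X)^{-1}]_{jj}} \qquad (16)$$

where jj denotes the jth diagonal of a variance-covariance matrix.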

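[23] To obtain insight into the effect of observation error correlations on $\sigma_{b_j,diag}^{2}/\sigma_{b_j,full}^{2}$, we expand equation (16) for a simple linear inverse model with one parameter b1 and two observations y1 and y2. This yields:

$$\frac{\sigma_{b_1,diag}^{2}}{\sigma_{b_1,full}^{2}} = \frac{\sigma_{y_1}^{2}\sigma_{y_2}^{2}\,(X_{11}^{2}\sigma_{y_2}^{2} + X_{21}^{2}\sigma_{y_1}^{2} - 2X_{11}X_{21}\sigma_{y_1 y_2})^{2}}{(\sigma_{y_1}^{2}\sigma_{y_2}^{2} - \sigma_{y_1 y_2}^{2})\,(X_{11}^{2}\sigma_{y_2}^{2} + X_{21}^{2}\sigma_{y_1}^{2})^{2}} \qquad (17a)$$

where $\sigma_{y_1}^{2}$ and $\sigma_{y_2}^{2}$ are the error variances for observations y1 and y2, respectively, $\sigma_{y_1 y_2}$ is the error covariance between the two observations, X11 = $\partial y'_1/\partial b_1$, and X21 = $\partial y'_2/\partial b_1$. Details of the derivation are provided in supporting information.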

[24] Equation (17a) can be expressed in terms of just two variables: ρ (equation (8)), the correlation between errors in observations y1 and y2; and rdss, a ratio composed of sensitivities and variances:

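$$\frac{\sigma_{b_1,diag}^{2}}{\sigma_{b_1,full}^{2}} = \frac{(1 - 2\rho\,rdss + rdss^{2})^{2}}{(1 - \rho^{2})\,(1 + rdss^{2})^{2}} \qquad (17b)$$

For the special case in which one observation is insensitive to the parameter, equation (17b) becomes:

$$\frac{\sigma_{b_1,diag}^{2}}{\sigma_{b_1,full}^{2}} = \frac{1}{1 - \rho^{2}} \qquad (17c)$$

In equation (17b), rdss is defined as:

$$rdss = \frac{dss_{21}}{dss_{11}} \qquad (18)$$

where the dimensionless scaled sensitivity (dss) is a measure of the information an observation provides about a parameter [Hill and Tiedeman, 2007, p. 48]:

$$dss_{ij} = \frac{\partial y'_i}{\partial b_j}\,|b_j|\,\omega_{ii}^{1/2} \qquad (19)$$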

Thus, rdss is the ratio of the information that observations y1 and y2 provide about parameter b1, as measured by dss calculated using diagonal weighting. This form of dss is used because it allows the effects of sensitivity and error correlations to be distinguished in equation (17b).

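[25] To explain the analytical results in equation (17b), it is useful to return to equations (13) and (14) and express the variance ratio as the product of a model fit ratio and a scaled variance ratio, $\sigma_{b_1,diag}^{2}/\sigma_{b_1,full}^{2} = (s_{diag}^{2}/s_{full}^{2}) \times ([(X^{T}\omega_{diag}X)^{-1}]_{11}/[(X^{T}\omega_{full}X)^{-1}]_{11})$, where

$$\frac{s_{diag}^{2}}{s_{full}^{2}} = \frac{1 - 2\rho\,rdss + rdss^{2}}{1 + rdss^{2}} \qquad (20a)$$
$$\frac{s_{diag}^{2}}{s_{full}^{2}} = 1 \quad \text{(one observation insensitive)} \qquad (20b)$$
$$\frac{[(X^{T}\omega_{diag}X)^{-1}]_{11}}{[(X^{T}\omega_{full}X)^{-1}]_{11}} = \frac{1 - 2\rho\,rdss + rdss^{2}}{(1 - \rho^{2})\,(1 + rdss^{2})} \qquad (21a)$$
$$\frac{[(X^{T}\omega_{diag}X)^{-1}]_{11}}{[(X^{T}\omega_{full}X)^{-1}]_{11}} = \frac{1}{1 - \rho^{2}} \quad \text{(one observation insensitive)} \qquad (21b)$$

Derivations of equations (20a) and (21a) are provided in supporting information. Equation (20a) also can be derived from equation 50 in Cooley and Christensen [2006]. In equation (21), the quantities $[(X^{T}\omega_{full}X)^{-1}]_{11}$ and $[(X^{T}\omega_{diag}X)^{-1}]_{11}$ are the parameter variances of equations (13) and (14) divided by the corresponding s2, and we denote them “scaled parameter variances.”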

[26] The $s_{diag}^{2}/s_{full}^{2}$ term of $\sigma_{b_1,diag}^{2}/\sigma_{b_1,full}^{2}$ measures how the observation error correlations and sensitivities affect the difference in fit of models calibrated with and without the correlations. If the model is correct and the weights reflect the accuracy of the observations, s2 is expected to be 1 [Hill and Tiedeman, 2007, p. 96]. For the simple one-parameter two-observation example, the underlying model is assumed to be correct, and ωfull correctly reflects the accuracy of the observations, so s2 is expected to be 1 for the calibration with ωfull. Therefore, $s_{diag}^{2}/s_{full}^{2} > 1$ can be interpreted as the fit being worse than expected for the model calibrated with ωdiag, and $s_{diag}^{2}/s_{full}^{2} < 1$ can be interpreted as the fit being better than expected for this model. The $(X^{T}\omega X)^{-1}$ term of a parameter variance-covariance matrix measures how the interaction of weighting and sensitivities affects the uncertainty of individual parameters. For the one-parameter two-observation example, the $[(X^{T}\omega_{diag}X)^{-1}]_{11}/[(X^{T}\omega_{full}X)^{-1}]_{11}$ term of $\sigma_{b_1,diag}^{2}/\sigma_{b_1,full}^{2}$ reflects how this interaction affects the difference in uncertainty of b1 when calculated with and without the observation error correlations.

[27] Graphs of the ratios in equations (17b), (20a), and (21a) are shown in Figure 2. Different families of curves are produced depending on whether the sign of the product ρrdss is negative or positive. Cooley [2004, pp. 50 and 54] also found that differences between uncertainties calculated without and with observation and model error correlations were dependent on the signs and magnitudes of sensitivities and correlations. Figure 2 shows that the values of all three ratios depend on both ρ and rdss when |rdss| is between about 0.1 and 10. The ratios for a given ρ approach constant values when |rdss| < 0.1 and |rdss| > 10; that is, when the two dss differ by more than a factor of 10. The constant values equal the right-hand sides of equations (17c), (20b), and (21b), which represent the special case in which one observation is insensitive to the parameter. In this case, excluding the observation error correlations always overestimates the parameter variance, because the right side of (17c) is always >1. This effect on the variance is caused entirely by the scaled variance term (equation (21b)), because the model fit term is the same in the models with full and diagonal weighting (equation (20b)). To illustrate this result, consider that observation y2 is insensitive for the simple example. Despite this insensitivity, y2 provides information that reduces the uncertainty of parameter b1 through the correlation of its error with that of observation y1, reducing $\sigma_{b_1,full}^{2}$ in comparison to $\sigma_{b_1,diag}^{2}$. As the correlation becomes larger, $\sigma_{b_1,diag}^{2}/\sigma_{b_1,full}^{2}$ of equation (17c) becomes larger, and the effect of omitting the error correlations in the model with diagonal weighting is more pronounced. While an inverse model with one sensitive and one insensitive observation will rarely occur in practice, the results for this case can be generalized to help explain the effect of error correlations on parameter uncertainty for inverse models that have some insensitive observation types, as illustrated in section 3.5.1.
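The dependence of the three ratios on ρ and rdss can be explored numerically. The sketch below is an illustration under stated assumptions rather than the derivation in supporting information: it assumes NumPy, a correct linear model, and that s2 is represented by its expected value, and it writes the ratios in the closed forms that follow from equations (13) and (14) under those assumptions. The function names and input values are hypothetical.

```python
import numpy as np

def ratios_closed_form(rho, rdss):
    """Return (fit ratio, scaled variance ratio, parameter variance ratio)."""
    core = 1.0 - 2.0 * rho * rdss + rdss ** 2
    fit = core / (1.0 + rdss ** 2)
    scaled = core / ((1.0 - rho ** 2) * (1.0 + rdss ** 2))
    return fit, scaled, fit * scaled

def ratios_from_matrices(x11, x21, sd1, sd2, rho):
    """Same three ratios computed from the matrix expressions."""
    X = np.array([[x11], [x21]], dtype=float)
    cov = rho * sd1 * sd2
    v = np.array([[sd1 ** 2, cov], [cov, sd2 ** 2]])
    w_full = np.linalg.inv(v)
    w_diag = np.diag(1.0 / np.diag(v))

    def expected_s2(w):
        # Expected weighted sum of squared residuals, trace(P V), divided by
        # nd - np = 1; valid when the model is correct and errors have covariance v.
        a = np.linalg.inv(X.T @ w @ X)
        p = w - w @ X @ a @ X.T @ w
        return np.trace(p @ v) / (2 - 1)

    def scaled_variance(w):
        return np.linalg.inv(X.T @ w @ X)[0, 0]

    fit = expected_s2(w_diag) / expected_s2(w_full)
    scaled = scaled_variance(w_diag) / scaled_variance(w_full)
    return fit, scaled, fit * scaled

# Hypothetical inputs. The parameter value cancels in rdss, so
# rdss = (x21 / sd2) / (x11 / sd1) = (-3 / 1.5) / (2 / 0.5) = -0.5.
print(ratios_from_matrices(2.0, -3.0, 0.5, 1.5, 0.6))
print(ratios_closed_form(0.6, -0.5))

# Equal information from both observations and positive rho * rdss:
# the diagonal calculation understates the variance.
print(ratios_closed_form(0.6, 1.0))

# Nearly insensitive second observation: ratio approaches 1 / (1 - rho**2).
print(ratios_closed_form(0.6, 0.001)[2], 1.0 / (1.0 - 0.6 ** 2))
```

Evaluating `ratios_closed_form` over a grid of rdss values for several ρ gives one way to reproduce the general shape of curves such as those described for Figure 2.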

Figure 2.

Ratios showing the difference in parameter variance (equation (17b)), model fit (equation (20a)), and scaled parameter variance (equation (21a)) when calculated with a diagonal instead of a full observation weight matrix, for (a, b, c) ρrdss < 0 and (d, e, f) ρrdss > 0, where ρ is observation error correlation, rdss = dss21/dss11, and dss are dimensionless scaled sensitivities. These ratios assume that when the diagonal weight matrix is used, the observation error correlations are unknown and not included in the calculation of parameter variance.

[28] For a given ρ, the difference in parameter uncertainty computed with and without observation error correlations is greatest when |rdss|=1 (Figures 2a and 2d). That is, the largest differences occur when each observation contributes the same amount of information about parameter b1.

[29] For ∼0.1 < |rdss| < ∼10 and ρrdss < 0, excluding the error correlations always increases the parameter variance $\sigma_{b_1}^{2}$ ($\sigma_{b_1,diag}^{2} > \sigma_{b_1,full}^{2}$) as well as s2 and the scaled parameter variance $[(X^{T}\omega X)^{-1}]_{11}$ (Figures 2a–2c). Negative ρrdss occurs when (1) ρ is negative and the dss each have the same sign or (2) ρ is positive and the dss have opposite signs. For case (1), negative ρ indicates that one observation is expected to be larger than its mean and the other is expected to be smaller than its mean, and same-signed dss indicates that the simulated values both increase, or both decrease, in response to a change in b1. That is, a change in b1 affects the simulated values in the opposite manner from how the error correlations affect the observed values; a similar lack of agreement between the sensitivities and correlations occurs for case (2). The consequence is that the diagonal model tends to achieve a fit that is worse than expected ($s_{diag}^{2} > s_{full}^{2}$, Figure 2b), given its weighting that does not include the observation error correlations. This lack of agreement also contributes to the scaled variance term being larger for the model with diagonal weighting ($[(X^{T}\omega_{diag}X)^{-1}]_{11} > [(X^{T}\omega_{full}X)^{-1}]_{11}$, Figure 2c). The effect of excluding the correlations has a greater impact on this uncertainty term than on the model fit term, as shown by the ratios in Figure 2c being larger than those in Figure 2b.

[30] For ρrdss > 0 and |rdss| close to 1, excluding the error correlations decreases the estimated parameter variance $\sigma_{b_1}^{2}$ ($\sigma_{b_1,diag}^{2} < \sigma_{b_1,full}^{2}$), s2, and the scaled parameter variance $[(X^{T}\omega X)^{-1}]_{11}$ (Figures 2d–2f). Positive ρrdss occurs when (1) ρ is positive and the dss have the same sign or (2) ρ is negative and the dss have opposite signs. In both these cases, a change in b1 changes the simulated values in a manner consistent with how the observations jointly vary about their means. This consistency between the sensitivities and correlations means that the model with diagonal weighting fits the observations better than expected (Figure 2e). Similarly, it causes b1 in the model with diagonal weighting to have a smaller uncertainty as measured by the scaled variance term. When |rdss| is close to 1, there is a pronounced effect on $\sigma_{b_1,diag}^{2}/\sigma_{b_1,full}^{2}$ of the sensitivities and correlations being consistent (Figure 2d). In contrast, when |rdss| is further from 1, and one observation has a smaller magnitude of sensitivity than the other, this effect is less pronounced or absent. Figure 2d also shows that even if observation error correlations are large, there are values of |rdss| for which the parameter variance calculated with the full weight matrix is the same as that calculated using the diagonal weight matrix.

[31] To evaluate the effect of using the theoretically correct parameter variance for a diagonal weight matrix (which assumes the error correlations are known), we compare the results in equation (17b) with an alternative parameter variance ratio:

$$\frac{\tilde{\sigma}_{b_1,diag}^{2}}{\sigma_{b_1,full}^{2}} = \frac{(1 + rdss^{2})^{2} - 4\rho^{2}\,rdss^{2}}{(1 - \rho^{2})\,(1 + rdss^{2})^{2}} \qquad (22)$$

where $\tilde{\sigma}_{b_1,diag}^{2}$ is the variance of parameter b1 for the diagonal weight matrix, calculated with equation (9b). The derivation is provided in supporting information. The ratio $\tilde{\sigma}_{b_1,diag}^{2}/\sigma_{b_1,full}^{2}$ measures the difference in parameter variance calculated with the full and diagonal weight matrices assuming that a known full $V(\varepsilon)$ is available for both calculations.

[32] Equation (22) and Figure 3 illustrate that inline image is independent of the signs of ρ and rdss, and is always greater than or equal to 1, in contrast to the ratios of inline image shown in Figure 2. The results for |rdss| = 0.01 or 100 are nearly identical to those in Figures 2a and 2d, because these |rdss| approach the case where one observation is insensitive, and under that condition the right side of equation (22) reduces to that of equation (17c). When each observation provides the same amount of information about the parameter (|rdss| = 1), the results in Figure 3 differ substantially from those in Figures 2a and 2d. In Figure 3, the variance ratio equals 1, indicating the parameter variances calculated with the full and diagonal weight matrices are the same. In Figure 2, the differences in the two variances are maximized when |rdss| = 1. Thus, for this one-parameter, two-observation model, the parameter variance typically calculated in practice for a diagonal weight matrix ( inline image) often is less accurate than the theoretically correct (yet typically unobtainable in practice) parameter variance for a diagonal weight matrix ( inline image), where accuracy is measured relative to the variance calculated for the full weight matrix.
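
The behavior described for Figure 3 can be checked numerically with a minimal sketch. Equation (22) is not reproduced in this text, so the expression coded here is an assumption: it is the standard sandwich form of the parameter variance for diagonal weighting with a known error correlation ρ, divided by the variance for full weighting, for a linear one-parameter, two-observation model in error-scaled units. The function name is hypothetical.

```python
import numpy as np

def theoretical_variance_ratio(rho, rdss):
    """Ratio of the theoretically correct parameter variance for diagonal
    weighting to the variance for full weighting (assumed form; one
    parameter, two observations, linear model).

    Works in error-scaled units: sensitivities x = (rdss, 1) and error
    correlation matrix R = [[1, rho], [rho, 1]].
    """
    x = np.array([rdss, 1.0])
    R = np.array([[1.0, rho], [rho, 1.0]])
    # Diagonal weights (identity in scaled units) with true error covariance R:
    # var = (x'x)^-1 x'Rx (x'x)^-1
    var_diag_true = (x @ R @ x) / (x @ x) ** 2
    # Full weights: var = (x' R^-1 x)^-1
    var_full = 1.0 / (x @ np.linalg.solve(R, x))
    return var_diag_true / var_full

for rdss in (0.01, 1.0, -1.0, 100.0):
    print(rdss, theoretical_variance_ratio(0.9, rdss))
```

Under these assumptions the ratio equals 1 at |rdss| = 1, does not change when the sign of ρ or rdss is reversed, and approaches 1/(1 − ρ²) as one observation becomes insensitive.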

Figure 3.

Ratios showing the difference in parameter variance, computed using equation (22), when calculated with a diagonal instead of a full observation weight matrix. These ratios assume that when the diagonal weight matrix is used, the observation error correlations are known and included in the calculation of parameter variance.

2.4.2. Effect on Prediction Variance

[33] Often in groundwater modeling, predictions and their uncertainty are of greater interest than parameter uncertainty. Prediction uncertainty is indirectly affected by the observation error correlations through their effect on parameter uncertainty, as shown in equation (12). Applying this equation to the one-parameter two-observation inverse model yields inline image, the variance of prediction inline image calculated with observation error correlations excluded, and inline image, the variance calculated with the correlations included. Here inline image is a scalar because there is only one parameter, and it is the same in both equations because the model is assumed linear. The ratio of prediction variances is:

display math(23)

[34] Thus, for a one-parameter, two-observation model, the effects of observation error correlations and sensitivities on the prediction variance ratio are the same as the effects on the parameter variance ratio.
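
A minimal sketch of why the two ratios coincide, assuming the usual first-order form in which the variance of a prediction that depends on a single parameter is the squared prediction sensitivity times the parameter variance. All numerical values and names are hypothetical.

```python
def prediction_variance(dz_db, var_b):
    """First-order variance of a prediction that depends on one parameter."""
    return dz_db ** 2 * var_b

dz_db = 2.5                          # hypothetical prediction sensitivity
var_b_diag, var_b_full = 0.8, 0.5    # hypothetical parameter variances

# The squared sensitivity appears in numerator and denominator and cancels.
ratio_pred = prediction_variance(dz_db, var_b_diag) / prediction_variance(dz_db, var_b_full)
ratio_par = var_b_diag / var_b_full
print(ratio_pred, ratio_par)
```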

3. Application: Reactive Transport Model

[35] A reactive transport model of denitrification is used to illustrate differences in parameter estimates and uncertainty from calibrations with and without observation error correlations included in the weighting. Predicted future nitrate concentrations and their uncertainty also are compared for different calibrations.

3.1. Model, Parameters, and Calibration Observations

[36] The numerical reactive transport model was developed by Green et al. [2010] as part of an investigation of mixing effects on estimates of reaction parameters, using field data from an agricultural setting in the San Joaquin Valley, California (Figure 4). It simulates the reactions of O2 and NO3− in water that recharges a shallow alluvial aquifer and migrates toward a river. Steady-state flow is simulated using MODFLOW-2000 [Harbaugh et al., 2000]. Advection and hydrodynamic dispersion are simulated using a random-walk particle tracking code, RWHet [LaBolle et al., 2000], with backward-tracking. Solute concentrations in each well sample are estimated with a program module that calculates, for every particle in the sample, the concentration as a result of the input history of solute and the reactions occurring in the aquifer (see supporting information). This study explores the effect of observation error correlations on reaction parameter estimates and uncertainty, with the hydraulic and transport parameters kept constant. The comparisons in this study of results with and without error correlations are, therefore, not sensitive to dispersion or noise in the random walk solution.

Figure 4.

Example geologic realization and site map showing observation well locations (modified from Green et al. [2010], Figure 2).

[37] The model has four reaction parameters, including the nitrogen isotope fractionation parameter (εN), first-order denitrification rate (kN), first-order O2 decay rate (kO), and concentration of O2 above which denitrification does not occur ([O2]cut). Green et al. [2010] calibrated the model for several realizations of the heterogeneous sedimentary deposits; five such realizations are considered here (realizations 1, 45, 124, 131, and 136 in Green et al., 2010). The previous calibrations were rerun with minor modifications for the purposes of this study including removing upper limits of the [O2]cut parameter (previously <0.05 mmol L−1). Five realizations are used for consistency with the approach of Green et al. [2010] and to allow evaluation of the effects of the error correlations relative to the effects of geological uncertainty as represented by the multiple realizations of heterogeneity.

[38] The modeling techniques described above were used to obtain simulated equivalents for six types of observations. Three of the types are directly measured observations (Table 1). The concentration of NO3− and the stable isotope ratio δ15N of NO3− (δ15N[NO3−]) were determined by laboratory analysis of field samples, and the concentration of O2 was measured in the field. The other three observation types are each derived from five or more directly measured values, including O2, NO3−, and δ15N[NO3−] (Table 2). These types are the apparent O2 decay rate (kO,app), fraction NO3− remaining (fN), and apparent isotope fractionation factor (εN,app). Details of these calculations are provided in supporting information. Because of the shared dependencies of some observation types on one or more direct measurements (Figure 1 and Table 2, supporting information), errors in the observations are correlated. For all observation types except εN,app, observed values are available from 14 piezometers at various depths in five well clusters (Figure 4). Calculated values of εN,app are available from six piezometers.

Table 1. Direct Measurement Types, Values, and Estimated Errors
Measurement Type | Median (Range) of Measured Value | Median (Range) of Measurement Error Standard Deviation(a) | Also an Observation Type for Reactive Transport Model Calibration?
[O2] (mmol L−1) | 0.0063 (0.0031–0.19) | 0.0095 (0.0046–0.044) | yes
[NO3−] (mmol L−1) | 1.0 (0.12–2.6) | 0.29 (0.016–2.0) | yes
δ15N[NO3−] (‰) | 13.5 (5.4–27.) | 0.54 (0.54–2.2) | yes
δ15N[N2] (‰) | 0.21 (−1.5 to 2.3) | 0.20 (0.20–0.81) | no
T (°C) | 19.2 (19.2–19.2) | 0.95 (0.95–0.95) | no
P (mm Hg) | 757.5 (756.9–757.8) | 4.1 (4.1–4.1) | no
Ar (mmol L−1) | 0.016 (0.014–0.025) | 0.00032 (0.00025–0.0024) | no
[SF6] (pptv) | 1.2 (0.34–3.5) | 0.69 (0.10–4.3) | no
[N2] (mmol L−1) | 0.86 (0.62–1.6) | 0.032 (0.015–0.25) | no
(a) Standard deviations are squared to obtain measurement error variances inline image.

Table 2. Derived Observation Types, Values, Estimated Errors, and Relevant Equations
Observation Type | Measurements Used to Derive | Median (Range) of Observed Value | Median (Range) of Observation Error Standard Deviation(a) | Equations(b)
kO,app (yr−1) | [O2], T, P, [Ar], [SF6] | 0.16 (0.048–0.31) | 0.09 (0.02–0.2) | S22, S35–S43
fN | [NO3−], T, P, [Ar], [N2] | 0.77 (0.32–1.0) | 0.09 (0.03–0.5) | S23–S24, S34, S36–S43
εN,app (‰) | [NO3−], δ15N[NO3−], δ15N[N2], T, P, [Ar], [N2] | −15 (−20 to −3) | 3.6 (1.2–7.7) | S23–S24, S26, S31–S34, S36–S43
(a) Standard deviations are squared to obtain observation error variances inline image.
(b) See supporting information.

[39] Note that there is an oxygen decay rate model parameter (kO) and an apparent oxygen decay rate observation type (kO,app). Similarly, there is an isotope fractionation parameter (εN) and an apparent isotope fractionation observation type (εN,app). The parameters kO and εN are considered the intrinsic field values of the oxygen decay rate and isotope fractionation, respectively, and are estimated using inverse modeling with reactions applied to individual particles. The observed values kO,app and εN,app are the oxygen decay rate and isotope fractionation, respectively, calculated as described above and in supporting information and using bulk sample concentrations in place of individual particle concentrations. They are considered apparent values that are commonly estimated for field studies [e.g., Böhlke, 2002; Green et al., 2008; Tesoriero and Puckett, 2011; Liao et al., 2012] and differ from the intrinsic values because of the effects of mixing during dispersive transport and field collection of a groundwater sample.

3.2. Observation Error Correlations and Weight Matrices

[40] The equations in supporting information and estimates of measurement error variances inline image (Table 1) were used in equations (6) and (7) to propagate measurement error to observation error. For inline image, T, and P, samples were available for multiple dates at each well during the year of the study. For each of these measurements, the value of inline image in equations (6) and (7) was set equal to the annual-average of those samples to provide a single estimated observation consistent with the annual-average value estimated by the transport model at each well. The values of inline image for O2, inline image, and δ15N[ inline image] (Table 1) and the calculated values of inline image for kO,app, fN, and εN,app (Table 2) were then used to populate Vdiag( inline image). These values of inline image and the values of inline image calculated with equation (7) were used to populate Vfull( inline image). Finally, Vdiag( inline image) and Vfull( inline image) were inverted to obtain ωdiag and ωfull, respectively.
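
The workflow in this paragraph can be sketched as follows. Equations (6) and (7) are not reproduced in this section, so the sketch assumes they are the usual first-order propagation formulas with mutually independent measurement errors. The derivation function, the measured values, and the error standard deviations are hypothetical stand-ins for the equations in supporting information and the entries of Table 1.

```python
import numpy as np

def derive_observations(u):
    """Hypothetical stand-in for the derivation equations: maps three
    direct measurements to one direct and two derived observations."""
    return np.array([u[0], u[0] / u[1], np.log(u[1]) * u[2]])

def jacobian(f, u, rel_step=1e-6):
    """Central finite-difference Jacobian of f at u."""
    u = np.asarray(u, dtype=float)
    J = np.empty((f(u).size, u.size))
    for m in range(u.size):
        h = rel_step * max(abs(u[m]), 1.0)
        du = np.zeros_like(u)
        du[m] = h
        J[:, m] = (f(u + du) - f(u - du)) / (2.0 * h)
    return J

u = np.array([1.0, 2.0, 0.5])            # hypothetical measured values
sigma_u = np.array([0.05, 0.10, 0.02])   # hypothetical error std devs

J = jacobian(derive_observations, u)
V_full = J @ np.diag(sigma_u ** 2) @ J.T     # variances and covariances
V_diag = np.diag(np.diag(V_full))            # same variances, correlations dropped

w_full = np.linalg.inv(V_full)               # full weight matrix
w_diag = np.linalg.inv(V_diag)               # diagonal weight matrix

std = np.sqrt(np.diag(V_full))
corr = V_full / np.outer(std, std)           # observation error correlations
print(np.round(corr, 2))
```

In this toy case the first two observations share a measurement and therefore have correlated errors, whereas the first and third share none and are uncorrelated.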

[41] To screen for possible effects of nonlinearity and non-Gaussianity, estimates of observation variances from equation (6) were compared to a Monte Carlo simulation with 10,000 realizations of measurement errors for a single set of input measurements with standard deviations set equal to the medians of estimated errors in Table 1. The Monte Carlo simulated observation error distributions (Figure 5) are approximately normal, and the first-order estimates of standard deviations are within 20% of the actual values, a small difference in comparison to the orders-of-magnitude variability among the estimated standard deviations for a given observation type (Table 2). These effects of nonlinearity and non-Gaussianity are, therefore, unlikely to strongly affect the results of this study.
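
A screening test of this kind might look like the sketch below, which compares a first-order standard deviation with a Monte Carlo estimate for a single derived observation. The derived quantity (a ratio of two measurements), the measured values, and the error standard deviations are hypothetical, and the measurement errors are assumed to be independent and normally distributed.

```python
import numpy as np

rng = np.random.default_rng(0)

def derived(u1, u2):
    """Hypothetical nonlinear derived observation (ratio of two measurements)."""
    return u1 / u2

mean = np.array([1.0, 2.0])     # hypothetical measured values
sd = np.array([0.05, 0.10])     # hypothetical measurement error std devs

# First-order (linearized) estimate of the observation error std dev
grad = np.array([1.0 / mean[1], -mean[0] / mean[1] ** 2])
sd_first_order = np.sqrt(np.sum((grad * sd) ** 2))

# Monte Carlo estimate from 10,000 realizations of measurement error
u = rng.normal(mean, sd, size=(10_000, 2))
y = derived(u[:, 0], u[:, 1])
sd_mc = y.std(ddof=1)

# Normalized observations, as used to plot the simulated distributions
y_norm = (y - y.mean()) / sd_mc

rel_diff = abs(sd_first_order - sd_mc) / sd_mc
print(f"first-order: {sd_first_order:.4f}, Monte Carlo: {sd_mc:.4f}, "
      f"relative difference: {rel_diff:.1%}")
```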

Figure 5.

Probability density functions of derived observations calculated by Monte Carlo analysis using 10,000 realizations of errors in direct measurements, and equations in supporting information for deriving observations from measurements. Normalized observations for each of the derived observation types are calculated as (yMC − ym)/σm where ym and σm are, respectively, the mean and the standard deviation of all the simulated observations for a particular observation type, and yMC is a single Monte Carlo generated observation.

[42] There are seven pairs of observation types with correlated errors (Table 3). For each of these pairs, the correlation for an individual observation pair is nonzero only if both observations are associated with the same monitoring well. For example, observations of [O2] and kO,app from the same well always have correlated errors, but [O2] errors at one well are not correlated with kO,app errors at any other well. Thus, because not all pairs of observation types have correlated errors, and because for those that are correlated only a few of the individual observation pairs have correlated errors, Vfull( inline image) is sparse. Of the 5776 entries in this 76 × 76 matrix, 76 entries are the variances on the diagonals, and only 126 of the 5700 off-diagonal entries have nonzero covariances. This matrix is symmetric, and so only 63 observation pairs have correlated errors. In general, observation error variance-covariance matrices are likely to be sparse when error correlations stem from a set of measurements at a given location being used to calculate more than one calibration observation for that location alone.

Table 3. Observation Error Correlations and Average Fraction of Covariance Produced by Individual Direct Measurements
Pair of Observation Types | | Error Correlation | Average Fraction of Covariance(a) Produced by Direct Measurements | | | | | |
yi | yk | Median (min to max) | [O2] | [NO3−] | δ15N[NO3−] | T | P | [Ar] | [N2]
[O2] | kO,app | −0.87 (−0.97 to −0.21) | 1 | | | | | |
fN | [NO3−] | −0.59 (−0.97 to −0.27) | | 1 | | | | |
εN,app | [NO3−] | −0.33 (−0.88 to −0.08) | | 1 | | | | |
δ15N[NO3−] | εN,app | −0.27 (−0.41 to −0.2) | | | 1 | | | |
fN | kO,app | −0.01 (−0.08 to 0.0) | | | | 0.005 | 0.0003 | 0.995 |
kO,app | εN,app | −0.01 (−0.01 to 0.01) | | | | 0.003 | 0.0002 | 0.997 |
fN | εN,app | 0.7 (−0.69 to 0.93)(b) | | 0.722 | | 0.003 | 0.0002 | 0.154 | 0.121
(a) Calculated with inline image, where u is an input measurement (nm ≤ 7) used to calculate model observations yi and yk, j is an individual sample (ns ≤ 14) for which input measurements were available, and fi and fk are the systems of equations used to derive model observations yi and yk from input measurement values u.
(b) All pairs of fN and εN,app observations have positive error correlations except for one pair with a correlation of −0.69.

[43] For each of the seven pairs of observation types with correlated errors, Table 3 presents the average fraction of the covariance that is produced by each input measurement type. The entries for the first four pairs listed in the table show that because there is only one shared input measurement in the calculations of the observation types in the pair, 100% of the covariance between the two types is produced by that single shared measurement. Entries for observation pairs fN and kO,app, and kO,app and εN,app, show that the covariances stem almost entirely from the strong dependency of these observation types on [Ar] measurements. The covariance between fN and εN,app is not as strongly dependent on one particular input measurement.

3.3. Effect of Error Correlations on Parameter Estimates and Uncertainty

[44] The reactive transport model was calibrated twice for each of the five geologic realizations, once without observation error correlations included (using ωdiag) and once with the correlations included (using ωfull). The vector of parameters estimated using ωdiag is denoted bdiag and that estimated using ωfull is denoted bfull. Individual parameters are bj,diag and bj,full. The dss indicate that for each parameter, there are at least two observation types with moderate to large sensitivities (Figure 6), so all four parameters were estimated in the regression. Parameter estimates and their linear confidence intervals for the 10 calibrations are shown in Table 4 and Figure 7. Most of the estimated values fall within previously observed ranges, for example, −10 to −30‰ for εN [Green et al., 2010], 0.05–0.5 yr−1 for kN [Green et al., 2008; Tesoriero and Puckett, 2011], 0.02–0.3 yr−1 for kO [Tesoriero and Puckett, 2011], and 0–0.06 mmol L−1 for [O2]cut [Green et al., 2010]. When ωdiag is used and the observation error correlations are omitted, estimates of parameters εN, kN, and kO are smaller for all realizations and estimates of [O2]cut are larger for four of the five realizations, compared to when ωfull is used. The magnitudes of the median differences range from 19% for kN to 81% for [O2]cut (Table 4).

Figure 6.

Dimensionless scaled sensitivities (dss) calculated for the model calibrated with a diagonal weight matrix. Box shows interquartile range and whiskers show minimum and maximum values. For each parameter and observation type, these summary statistics are calculated using all individual dss over all five realizations. For observation types with no box and whiskers, all dss = 0.

Table 4. Parameter Estimates From Model Calibrations With the Full and the Diagonal Observation Weight Matrices
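Realization | εN (‰) estimate | | | kN (yr−1) estimate | | | kO (yr−1) estimate | | | [O2]cut (mmol L−1) estimate | |
 | Full | Diag | Pct diff(a) | Full | Diag | Pct diff | Full | Diag | Pct diff | Full | Diag | Pct diff
1 | −22 | −31 | −42 | 0.059 | 0.047 | −19 | 0.25 | 0.17 | −32 | 0.012 | 0.021 | 81
45 | −11 | −17 | −44 | 0.044 | 0.031 | −30 | 0.17 | 0.14 | −17 | 0.095 | 0.085 | −11
124 | −13 | −16 | −26 | 0.041 | 0.033 | −18 | 0.18 | 0.13 | −24 | 0.042 | 0.050 | 19
131 | −37 | −39 | −7 | 0.043 | 0.032 | −25 | 0.27 | 0.19 | −29 | 0.003 | 0.008 | 211
136 | −19 | −25 | −30 | 0.120 | 0.118 | −2 | 0.21 | 0.15 | −28 | 0.008 | 0.018 | 130
Median | −19 | −25 | −30 | 0.044 | 0.033 | −19 | 0.21 | 0.15 | −28 | 0.012 | 0.021 | 81
(a) Percent difference, defined as 100 × (bj,diag − bj,full)/|bj,full|.
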
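
The percent-difference definition in the Table 4 footnote can be written as a short function. This is only an illustrative sketch using the realization 45 entries; values recomputed from the rounded table entries can differ by several percentage points from the tabulated percent differences, presumably because the latter were computed from unrounded estimates. The function and dictionary names are hypothetical.

```python
def percent_difference(b_diag, b_full):
    """Percent difference as defined in the Table 4 footnote."""
    return 100.0 * (b_diag - b_full) / abs(b_full)

# Realization 45 estimates from Table 4, stored as (full, diag)
estimates = {
    "eps_N": (-11.0, -17.0),
    "k_N": (0.044, 0.031),
    "k_O": (0.17, 0.14),
    "O2_cut": (0.095, 0.085),
}
for name, (b_full, b_diag) in estimates.items():
    print(name, round(percent_difference(b_diag, b_full)))
```

Dividing by the absolute value of the full-weight estimate keeps the sign meaningful for the negative parameter εN: a negative percent difference always indicates that the estimate obtained with the diagonal weight matrix is the smaller of the two.
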
Figure 7.

Estimates and individual linear 95% confidence intervals for the four parameters of the reactive transport model.

[45] For εN, kN, and [O2]cut, the differences in parameter estimates for the calibrations using ωdiag and ωfull are generally small compared to the size of the confidence intervals (Figure 7). For these parameters, the intervals for bj,diag and for bj,full fully overlap for most of the realizations. In addition, the differences in bj,diag or bj,full across the five realizations tend to be larger than the differences between bj,diag and bj,full for a given realization. The variability among realizations largely stems from differences of sediment distributions that affect paths and travel times between the water table and well screen, and therefore affect the travel time distributions of samples and the apparent reaction rates. Thus, for the error correlations considered here, the structural errors related to geological uncertainty and the uncorrelated observation errors affect the estimates of εN, kN, and [O2]cut more strongly than does the omission of error correlations.

[46] In contrast, differences between bj,diag and bj,full for parameter kO are much larger compared to their confidence intervals, and the intervals for bj,diag and bj,full overlap less, particularly for realizations 1, 131, and 136. The estimates and intervals for εN in realizations 45 and 124 and for kN in realization 45 also have these characteristics. These larger differences between bj,diag and bj,full relative to the confidence intervals occur because parameter uncertainty is smaller (Figure 8). The coefficients of variation (cv, equation (11)) for kO in all realizations, and for εN and kN in realizations 45 and 124, are substantially smaller than those of most other parameters. This suggests that the importance of including observation error correlations in the weighting increases as parameter uncertainty decreases. With smaller uncertainty, there is greater confidence in the differences between the parameter values estimated with and without these correlations. Also, for parameter kO, the differences between bj,diag and bj,full for some individual realizations are large relative to the differences in bj,diag and bj,full across the realizations.

Figure 8.

Parameter coefficients of variation (cv) for reactive transport model calibrated with the full weight matrix.

[47] For most realizations, the magnitudes of the confidence intervals differ for the parameters estimated using ωdiag and those estimated using ωfull (Figure 7). The ratios of parameter variances ( inline image, equation (16)) more clearly display the parameter uncertainty differences (Figure 9a), where inline image and inline image are calculated using sensitivities evaluated at the optimal parameter estimates from the models calibrated with ωdiag and ωfull, respectively. The ratios range from 0.4 to 3.7 across all parameters and realizations (Figure 9a). For three parameters, there are systematic differences in the parameter variances over the five realizations. For εN, all variances are larger when ωdiag is used, for kN all variances are smaller, and for kO most variances are smaller. These results show that when a diagonal weight matrix is incorrectly used, parameter uncertainty estimates can be smaller or larger than the uncertainty calculated using the correct full weight matrix, which is consistent with the derived results for the one-parameter two-observation model.

Figure 9.

Ratios of (a) parameter variance, (b) scaled parameter variance, and (c) model fit calculated in reactive transport models calibrated with the diagonal and full weight matrices. For ratios >1, the quantity calculated using the diagonal weight matrix is larger. Vertical lines connect minimum and maximum values among the five realizations.

3.4. Effect of Error Correlations on Predicted Nitrate Concentrations and Uncertainty

[48] We next evaluate differences in predictions and uncertainties (equation (12)) for the reactive transport model. Predictions of interest include [NO3−] concentrations tens of years in the future at four piezometers in a cluster near the river (Figure 4) based on a scenario of constant NO3− flux to the water table after 2005. Concentrations predicted by the model calibrated with ωdiag are higher than those predicted by the model calibrated with ωfull for all piezometers and all realizations (Figure 10). Predicted [NO3−] is most sensitive to kO and kN, and is slightly sensitive to [O2]cut (Figure 11). These sensitivities are all negative, indicating that as a parameter value increases, predicted [NO3−] concentrations will decrease. Furthermore, in the model calibrations with ωdiag, estimated kO and kN are smaller, and estimated [O2]cut is mostly larger, compared to the calibrations with ωfull (Table 4 and Figure 7). The smaller estimates of kO and kN combined with the prediction sensitivity information explain why predicted [NO3−] is larger in the models calibrated with ωdiag.

Figure 10.

Predicted nitrate concentrations and 95% linear confidence intervals at a cluster of four piezometers (P, Q, R, and S, in order from shallowest to deepest) at the downgradient end of the transect shown in Figure 4.

Figure 11.

Prediction scaled sensitivities (pss) for [NO3−], defined as inline image [Hill and Tiedeman, 2007, p. 161, equation (8.2c)], where inline image is predicted [NO3−] concentration at a piezometer location and bj is a model parameter. Box shows interquartile range and whiskers show minimum and maximum values. For each parameter, the summary statistics are calculated using pss for all four prediction locations and all five realizations. For parameter εN, all pss = 0.

[49] For a given realization and piezometer, the differences in predicted [NO3−] in the models with and without error correlations (Figure 10) are typically greater than the differences in parameter estimates (Figure 7), relative to their respective confidence intervals. These prediction differences are largest for realization 45, likely because of the large kN differences relative to the kN confidence intervals (Figure 7). For each piezometer, there are moderate differences between the two predictions for a realization, compared to the differences across all realizations excluding 136. Realization 136 has an unusually large estimate of kN compared to the other realizations (Figure 7). Therefore, among the first four realizations, accounting for correlated observation error is moderately important even when considered relative to the variability in predictions caused by uncertainty in the model structure.

[50] Ratios of the prediction variances ( inline image) show that the prediction variances for the models calibrated with ωdiag range from about 50% smaller to 20% larger than those for the models calibrated with ωfull (Figure 12). Thus, omitting the observation error correlations can result in either underestimation or overestimation of prediction uncertainty. Although for a given realization the magnitude of the uncertainty among the four predictions can be highly variable (e.g., Figure 10, realization 1), the variance ratios fall within a fairly narrow range (e.g., Figure 12, ratios for realization 1 range from 0.6 to 0.8).

Figure 12.

Ratios of prediction variance, calculated in reactive transport models calibrated with the diagonal and full weight matrices. For ratios >1, prediction uncertainty calculated using the diagonal weight matrix is larger. Vertical lines connect minimum and maximum values among the five realizations for each piezometer.

3.5. Discussion of Reactive Transport Model Results

3.5.1. Insight into Parameter Variance Ratios From Derivations for Simple Model

[51] The derivations for the one-parameter two-observation model in section 2 help to explain the differences in reactive transport model parameter variances calculated with and without observation error correlations. We first evaluate the scaled parameter variances for parameters εN and [O2]cut and then analyze the model fit term.

[52] Compared to other parameters, the conditions for εN are closest to those for the one-parameter two-observation model, because only the δ15N[NO3−] and εN,app observation types are sensitive to εN (Figure 6). The dss for all δ15N[NO3−] observations with respect to εN are positive, the dss for all εN,app observations with respect to εN are negative, and the error correlations (ρ) are negative for all pairs of δ15N[NO3−] and εN,app observations (Table 3). Because the signs are the same within each of these observation groups, we loosely generalize the analytical derivation in equation (21) and apply it to this parameter and the relevant pairs of observation types.

[53] Equation (21) for the ratio of scaled parameter variances correctly predicts that the information contributed by two observation types that are insensitive to εN, but have errors correlated with those of an observation type sensitive to εN, dominates the uncertainty calculated with the correlations included. As shown in Figure 9b, inline image calculated using the reactive transport model ranges from 1.7 to 2.9, indicating that excluding the observation error correlations increases this term of the parameter variance. Calculations with equation (21a) using median values of rdss and ρ show that if δ15N[NO3−] and εN,app were the only observation types involved, inline image would equal 0.8 (Table 5). That is, the scaled variance for parameter εN calculated with ωdiag would be smaller than that calculated with ωfull. This is not the case, however, because εN,app observations also are moderately to strongly correlated with [NO3−] and fN observations (Table 3), which are insensitive to parameter εN (Figure 6). Calculations with equation (21b) show that the correlations between [NO3−] and εN,app and between fN and εN,app strongly contribute to inline image being larger than inline image (Table 5), even though [NO3−] and fN observations are insensitive to εN. The results in Figure 9b are consistent with these calculations.

Table 5. Scaled Parameter Variance Ratios ( inline image) Calculated for Pairs of Observation Types
Pair of Observation Types(a) | ρ(b) | Parameter εN (‰) | | Parameter [O2]cut |
 | | rdss(b) | Ratio | rdss(b) | Ratio
δ15N and εN,app | −0.27 | −1.5 | 0.8(c) | −3.0 | 0.9(c)
[NO3−] and εN,app | −0.33 | | 1.1(d) | 0.1 | 1.2(c)
fN and εN,app | 0.70 | | 2.0(d) | −0.7 | 3.2(c)
[NO3−] and fN | −0.59 | | | −4.3 | 1.1(c)
(a) Pairs listed have at least one type sensitive to parameter εN or [O2]cut, and have median |ρ| > 0.01.
(b) Median values.
(c) Calculated using equation (21a).
(d) Calculated using equation (21b).

[54] Equation (21) also provides insight into the changes in scaled parameter variance for model parameter [O2]cut when observation error correlations are excluded. In contrast to the situation for parameter εN, several observation types are sensitive to [O2]cut (Figure 6) and four pairs of these types have moderate to large error correlations (Table 5). For each pair of observations, the ratio of scaled parameter variances was calculated by equation (21a) using the median rdss and ρ values over all individual observation pairs. For a majority of the four pairs, excluding the observation error correlations was predicted to increase the estimated parameter uncertainty, i.e., inline image (Table 5). This result using the derivations from the simple model is consistent with the results for [O2]cut using the four-parameter six-observation-type reactive transport model, in which for most realizations inline image (Figure 9b).
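
The pairwise ratios discussed above can be approximated with a minimal sketch. Equation (21) is not reproduced in this section, so the expressions coded here are assumptions: they follow from standard weighted least squares for a linear one-parameter, two-observation model in error-scaled units, with the second branch covering the case in which one observation is insensitive to the parameter. The function name is hypothetical. The inputs are the median ρ and rdss values listed in Table 5, except that ρ for the fN and εN,app pair is taken as the positive median reported in Table 3.

```python
def scaled_variance_ratio(rho, rdss=None):
    """Ratio (diagonal / full weighting) of the scaled parameter variance for
    a one-parameter, two-observation linear model (assumed form).

    rdss=None treats one observation as insensitive to the parameter.
    """
    if rdss is None:
        return 1.0 / (1.0 - rho ** 2)
    num = 1.0 - 2.0 * rho * rdss + rdss ** 2
    return num / ((1.0 - rho ** 2) * (1.0 + rdss ** 2))

# (pair, parameter, median rho, median rdss or None if one type is insensitive)
cases = [
    ("d15N and eps_N,app", "eps_N", -0.27, -1.5),
    ("NO3 and eps_N,app", "eps_N", -0.33, None),
    ("fN and eps_N,app", "eps_N", 0.70, None),
    ("d15N and eps_N,app", "O2_cut", -0.27, -3.0),
    ("NO3 and eps_N,app", "O2_cut", -0.33, 0.1),
    ("fN and eps_N,app", "O2_cut", 0.70, -0.7),
    ("NO3 and fN", "O2_cut", -0.59, -4.3),
]
for pair, parameter, rho, rdss in cases:
    print(f"{parameter:7s} {pair:20s} {scaled_variance_ratio(rho, rdss):.2f}")
```

With these rounded inputs, the computed ratios agree with the tabulated values to within roughly 0.05, which seems consistent with rounding of the median ρ and rdss.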

[55] The ratios of parameter variances for εN and [O2]cut (Figure 9a) equal the product of the ratios of the scaled parameter variances and the term inline image. For all realizations of the reactive transport model, inline image (Figure 9c). This causes the ratios of parameter variances inline image and inline image to be smaller than the respective ratios of scaled parameter variances (Figures 9a and 9b). The discussion in section 2 for the one-parameter two-observation model, which was assumed to have no structural error, showed that inline image can be interpreted as the model with ωdiag fitting the observation data better than expected. Unlike that simple model, the reactive transport model has structural error, yet a similar approach to interpreting differences in s2 applies. For this model, we consider ωfull to accurately represent the observation error. The model fit term inline image ranges from 11 to 21, strongly suggesting model error in each individual realization. The result that inline image can then be interpreted as the model calibrated with ωdiag underestimating the effect of model error. Similarly, inline image would indicate that this model overestimates the effect of model error. This interpretation of inline image illustrates that when error correlations exist but are omitted from the weighting, incorrect conclusions about model error can be drawn.

[56] The findings from the simple model about interactions between error correlations and sensitivities help explain why inline image for all realizations. For the reactive transport model, there are 107 unique combinations of rdss and ρ, where rdss is the ratio of the dss for two different observations with respect to the same parameter, and ρ is the error correlation of the two observations. For example, three such combinations are (1) inline image and inline image, (2) inline image and inline image, and (3) inline image and inline image, where for dss the first subscript is an observation and the second is a parameter. For every realization, a large majority (73–78%) of these combinations have positive ρrdss. This result is consistent with the derivations for the one-parameter two-observation model, which showed that when ρrdss > 0, inline image (Figure 2e), because the signs of sensitivities together with the effect of the error correlations (which are embedded in the observed values) cause the model calibrated with ωdiag to fit the observations better than expected. The results for the reactive transport model suggest that the same interactions between the sensitivities and correlations affect model fit for this more complex model.

3.5.2. Prediction Variance Ratios

[57] For all four predictions, the variance ratios ( inline image) for some realizations are >1 and for other realizations are <1 (Figure 12). Whereas the derivations for the one-parameter two-observation model provided insight into the parameter variance ratios for the reactive transport model, there are greater limitations to using results of the simple model to explain the prediction variance ratios for the more complex model. For the simple model, the ratio of prediction variances calculated with ωdiag and ωfull equals the ratio of the respective parameter variances (equation (23)), because for that model the parameter variance-covariance matrix reduces to a scalar. For a multiple parameter model, however, this matrix contains variances for all parameters and covariances for all parameter pairs. The variance of an individual prediction is a function of all these matrix terms and of the prediction sensitivities (equation (12)). Parameter covariances can have a substantial effect on prediction uncertainty [Tiedeman et al., 2003, 2004]. The derivations for the one-parameter model do not capture all the complexities of parameter uncertainty that affect prediction uncertainty in multiple parameter models, and thus are limited in their ability to explain the results in Figure 12. Comprehensive evaluation of how observation error correlations affect prediction uncertainty for a multiple parameter model might benefit from a derivation of prediction variance ratios for a two-parameter, two-observation model. This is a topic for potential future research.
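
The role of parameter covariances can be illustrated with a minimal sketch, assuming that equation (12) has the usual first-order form in which the prediction variance is the quadratic form of the prediction sensitivities with the parameter variance-covariance matrix. The two-parameter values below are hypothetical and are not taken from the reactive transport model.

```python
import numpy as np

def prediction_variance(z, V_b):
    """First-order prediction variance z' V(b) z for prediction sensitivities z."""
    z = np.asarray(z, dtype=float)
    return float(z @ V_b @ z)

def covariance(sd, corr):
    """2 x 2 parameter variance-covariance matrix for a given correlation."""
    return np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                     [corr * sd[0] * sd[1], sd[1] ** 2]])

z = np.array([-1.0, -0.5])    # hypothetical prediction sensitivities
sd = np.array([0.2, 0.4])     # hypothetical parameter std devs

# Same parameter variances, different parameter correlation
for corr in (-0.9, 0.0, 0.9):
    print(corr, prediction_variance(z, covariance(sd, corr)))
```

With identical parameter variances, the prediction variance in this toy case changes by more than an order of magnitude as the parameter correlation varies, which is why a ratio of individual parameter variances alone cannot explain the prediction variance ratios of a multiple parameter model.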

4. Summary and Conclusions

[58] This paper shows that parameter estimates, predicted quantities, and measures of uncertainty can differ in inverse models calibrated with and without observation error correlations included. For the application presented, the differences are modest for some parameters and predictions, and more substantial for others. The insights and conclusions of this work are relevant to a broad range of studies with error correlations arising from derived observations or other factors.

[59] Analytical derivation of the difference in the estimated parameter variance for a simple one-parameter two-observation model provides insight into how interactions of weights and sensitivities affect parameter uncertainty in a model calibrated with and without observation error correlations. The derivation is based on the equations for parameter variances that are typically used in practice. It shows that when one observation is insensitive but is correlated with the other sensitive observation, excluding the correlations (i.e., using ωdiag) will always overestimate the variance. For the general case in which both observations are sensitive, the difference in parameter variance is a function of the ratio (rdss) of the dimensionless scaled sensitivities for the two observations and the error correlation (ρ) for the observation pair. When ρrdss is negative, the variance calculated with ωdiag will always be larger than that calculated with ωfull, and the difference increases as |ρ| increases. When ρrdss is positive, the variance calculated with ωdiag will be smaller for |rdss| relatively close to 1, and larger otherwise. Furthermore, for some values of rdss, there is little or no difference in the parameter uncertainty calculated with ωdiag and ωfull, even when |ρ| is very large.
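
The qualitative behavior summarized above can be explored numerically with a short sketch. This is not the expression derived in this paper. It is a generic linear weighted least squares calculation under stated assumptions: one parameter, two observations, errors scaled to unit variance with correlation ρ, parameter variance computed as s2 divided by the weighted sum of squared sensitivities, and s2 replaced by its expected value under the correlated errors. The function name `variance_ratio` is hypothetical.

```python
def variance_ratio(rho, rdss):
    """Expected var_diag / var_full for a one-parameter, two-observation linear model.

    Assumptions (not taken from the paper's equations):
      - scaled sensitivities are z = (rdss, 1); only their ratio matters
      - scaled errors have covariance C = [[1, rho], [rho, 1]]
      - s2 is replaced by its expected value (n - p = 1 degree of freedom)
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z1, z2 = rdss, 1.0
    zz = z1 * z1 + z2 * z2

    # Full weight matrix: var = s2 / (z' C^-1 z), with E[s2] = 1.
    z_cinv_z = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    var_full = 1.0 / z_cinv_z

    # Diagonal weight matrix: var = s2_diag / (z' z), where
    # E[s2_diag] = trace(C) - (z' C z) / (z' z) because the errors are correlated.
    s2_diag = 2.0 - (z1 * z1 + 2.0 * rho * z1 * z2 + z2 * z2) / zz
    var_diag = s2_diag / zz

    return var_diag / var_full


for rho, rdss in [(0.9, 0.0), (0.9, -1.0), (0.9, 1.0), (0.1, 1.0)]:
    print(rho, rdss, variance_ratio(rho, rdss))
```

Under these assumptions the ratio exceeds 1 when one observation is insensitive (rdss = 0) or when ρrdss is negative, and falls below 1 when ρrdss is positive with |rdss| near 1, in line with the behavior described above. The exact values of rdss at which the ratio crosses 1 depend on the assumptions listed and should not be read as reproducing Figure 2.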

[60] A reactive transport model of denitrification with four parameters and six observation types was calibrated with ωfull in which observation error correlations stem from common dependencies on directly measured quantities, and with ωdiag that ignores these correlations. Parameter estimates for the two calibrations differ by tens of percent. The differences tend to be large relative to the parameter 95% confidence intervals when the intervals are small. Differences in parameter uncertainty cover a wide range; ratios of parameter variances for the models calibrated with ωdiag and ωfull range from 0.4 to 3.7. Differences in predicted nitrate concentrations for the two calibrated models, relative to their confidence intervals, tend to be greater than the corresponding differences in parameter estimates.

[61] Based on the results presented, we offer the following conclusions about the importance of including observation error correlations, and thus using a full weight matrix, in inverse models for which parameter variances are calculated with equations typically used in practice:

[62] 1. If error correlations among all observation pairs are expected to be small (e.g., Figure 2, curves for |ρ| = 0.1 and 0.3), then excluding them will not greatly affect parameter estimates or uncertainty, or the regression error variance s².

[63] 2. If some error correlations are expected to be large (e.g., |ρ| > 0.3), excluding them might substantially change parameter uncertainty from that calculated with the correlations, but this is not guaranteed. Even with large error correlations, parameter uncertainty differences can be small, as shown in the analytical derivations of the parameter variance (e.g., Figure 2d with |ρ| = 0.90 and |rdss| = ∼0.3 or ∼4.0) as well as in the results from the reactive transport model (e.g., Figure 9a showing ratios of ∼1 for some realizations).

[64] 3. If observations or observation types with zero or small sensitivity to some parameters have correlated errors with other sensitive observations or types, excluding these correlations can substantially overestimate parameter uncertainty. This was clearly demonstrated by the analytical derivation (equation (17c)) as well as by the results for the reactive transport model (parameter εN). Thus, it is important to assess whether errors of insensitive observations are correlated with errors of sensitive observations, and to include these correlations in the weighting. The magnitudes of possible error correlations might be assessed qualitatively at first, using knowledge about the observations, how they are calculated, and the sources of their error.

[65] 4. Interaction between sensitivities and the observation error correlations can cause a model that excludes the correlations to fit the observations differently than is expected on the basis of the observation accuracy reflected in the weighting. The fit can be either better or worse, and the difference depends on the sign of the product ρrdss. The consequence of the unexpected fit is that a model that omits error correlations in the weighting might yield incorrect estimates of the magnitude of model error.

[66] 5. Differences in predictions for models calibrated with ωdiag and ωfull can be greater than differences in parameter estimates when both are considered relative to their respective confidence intervals. Thus, the observation error correlations can have different degrees of impact on parameters and predictions.

[67] In summary, our results show that including observation error correlations, even when the observation error variance-covariance matrix is relatively sparse, can have a modest to substantial effect on parameter estimates, predictions, and measures of uncertainty, comparable in some cases to the magnitude of effects from structural uncertainty stemming from geological variability. Because this is the first comprehensive study comparing results from calibrations with and without these correlations, work with additional models to further evaluate the effects of observation error correlations will be helpful for testing and refining the conclusions drawn here.

Acknowledgments

[68] We thank Steen Christensen, Mary Hill, Associate Editor Ming Ye, and two anonymous reviewers for comments that substantially improved this paper. We also thank Eurybiades Busenberg for help with dissolved-gas equations. This work was funded by the National Research Program and the National Water Quality Assessment Program of the U.S. Geological Survey.
