Best unbiased ensemble linearization and the quasi-linear Kalman ensemble generator



[1] Linearized representations of the stochastic groundwater flow and transport equations have been heavily used in hydrogeology, e.g., for geostatistical inversion or generating conditional realizations. The respective linearizations are commonly defined via Jacobians (numerical sensitivity matrices). This study will show that Jacobian-based linearizations are biased with nonminimal error variance in the ensemble sense. An alternative linearization approach will be derived from the principles of unbiasedness and minimum error variance. The resulting paradigm prefers empirical cross covariances from Monte Carlo analyses over those from linearized error propagation and points toward methods like ensemble Kalman filters (EnKFs). Unlike conditional simulation in geostatistical applications, EnKFs condition transient state variables rather than geostatistical parameter fields. Recently, modifications toward geostatistical applications have been tested and used. This study completes the transformation of EnKFs to geostatistical conditioning tools on the basis of best unbiased ensemble linearization. To distinguish it from the original EnKF, the new method is called the Kalman ensemble generator (KEG). The new context of best unbiased ensemble linearization provides an additional theoretical foundation to EnKF-like methods (such as the KEG). Like EnKFs and their derivatives, the KEG is optimal for Gaussian variables. Toward increased robustness and accuracy in non-Gaussian and nonlinear cases, sequential updating, acceptance/rejection sampling, successive linearization, and a Levenberg-Marquardt formalism are added. State variables are updated through simulation with updated parameters, always guaranteeing the physicalness of all state variables.
The KEG combines the computational efficiency of linearized methods with the robustness of EnKFs and the accuracy of expensive realization-based methods while drawing on the advantages of conditional simulation over conditional estimation (such as adequate representation of solute dispersion). As proof of concept, a large-scale numerical test case with 200 synthetic sets of flow and tracer data is conducted and analyzed.
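To make the central idea concrete, the following minimal sketch shows a generic ensemble parameter update that uses empirical cross covariances from a Monte Carlo ensemble in place of a Jacobian. It is not the KEG as specified by the authors: it omits sequential updating, acceptance/rejection sampling, successive linearization, and the Levenberg-Marquardt formalism. The function name `ensemble_update`, the linear toy forward model `H`, and all numerical values are assumptions chosen for illustration only.

```python
import numpy as np


def ensemble_update(S, Y, d_obs, R, rng):
    """Condition a parameter ensemble on data via empirical covariances.

    S     : (n_par, n_ens) ensemble of parameter realizations
    Y     : (n_obs, n_ens) simulated data, one column per realization
    d_obs : (n_obs,) observed data
    R     : (n_obs, n_obs) measurement error covariance
    rng   : numpy random Generator used for the perturbed observations
    """
    n_ens = S.shape[1]
    # Ensemble anomalies (deviations from the ensemble mean).
    S_anom = S - S.mean(axis=1, keepdims=True)
    Y_anom = Y - Y.mean(axis=1, keepdims=True)
    # Empirical cross covariance and data covariance; no Jacobian is needed.
    Q_sy = S_anom @ Y_anom.T / (n_ens - 1)
    Q_yy = Y_anom @ Y_anom.T / (n_ens - 1)
    # One random measurement error realization per ensemble member.
    eps = rng.multivariate_normal(np.zeros(d_obs.size), R, size=n_ens).T
    innovation = d_obs[:, None] + eps - Y
    # Kalman-type update applied to every realization.
    return S + Q_sy @ np.linalg.solve(Q_yy + R, innovation)


# Toy example (assumed values): two parameters, one linear observation.
rng = np.random.default_rng(0)
H = np.array([[1.0, 0.5]])            # hypothetical linear forward model
S_prior = rng.standard_normal((2, 2000))
Y_prior = H @ S_prior                 # "simulation" with prior parameters
d_obs = np.array([1.0])
R = np.array([[0.01]])

S_post = ensemble_update(S_prior, Y_prior, d_obs, R, rng)
# States are recomputed from the updated parameters rather than updated directly.
Y_post = H @ S_post
```

In this toy setting the forward model is linear, so recomputing `Y_post` from `S_post` is trivial. In a groundwater application the last step would correspond to rerunning the flow and transport simulation with the updated parameter fields, which is what keeps the simulated states physically consistent.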