## 1. Introduction

[2] In March 2002 the European research satellite ENVISAT was launched into a Sun-synchronous orbit, carrying the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), the Global Ozone Monitoring by Occultation of Stars (GOMOS) instrument, and the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY). These sensors are delivering an unprecedented wealth of observations of stratospheric trace gases with global coverage.

[3] It is the objective of data assimilation to provide an estimate of the state of the atmosphere from heterogeneous, irregularly distributed observational data of differing accuracies, fused with a numerical model of the atmosphere. This is achieved mostly by the use of estimation methods adapted to large-scale problems. For the sake of mathematical rigor, an objective optimality criterion must be invoked. In most cases a Best Linear Unbiased Estimation (BLUE) is applied, which implies a least squares optimum [see *Kalnay*, 2003].
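
For orientation, the BLUE can be written in its standard textbook form. The notation is an assumption of this sketch and is not taken from the present text: x_b is the background state, x_a the analysis, y the observation vector, H a linear (or linearized) observation operator, and B and R the background and observation error covariance matrices.

```latex
\begin{aligned}
\mathbf{x}_a &= \mathbf{x}_b + \mathbf{K}\,(\mathbf{y} - \mathbf{H}\mathbf{x}_b), \\
\mathbf{K}   &= \mathbf{B}\mathbf{H}^{\mathrm{T}}
                \left(\mathbf{H}\mathbf{B}\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right)^{-1}, \\
J(\mathbf{x}) &= \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
               + \tfrac{1}{2}(\mathbf{y}-\mathbf{H}\mathbf{x})^{\mathrm{T}}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{H}\mathbf{x}).
\end{aligned}
```

The analysis x_a given by the first two lines is also the minimizer of the quadratic cost function J, which is the sense in which the BLUE implies a least squares optimum.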

[4] Sun-synchronous satellite observations are restricted to measurements at a single local time, which is clearly a limitation for atmospheric chemistry. Spatiotemporal data assimilation algorithms (such as 4D-var) can effectively propagate the observation information to other times, and thus provide a complete temporal estimate of the chemical state of the atmosphere. A thorough overview of chemical data assimilation systems is provided by *Lahoz et al.* [2007] and *Geer et al.* [2006].

[5] In the realm of advanced spatiotemporal data assimilation algorithms resting on Gaussian error characteristics and providing for a BLUE, there are only two families of techniques, namely the Kalman filter [e.g., *Kalman*, 1960; *Cohn*, 1997] and the 4D-var method [e.g., *Talagrand and Courtier*, 1987]. The former is a sequential method; that is, the model state is corrected at times when observations are encountered. The Kalman filter possesses the theoretical advantage that the background error covariance matrix (BECM) is evolved by a prognostic equation, and the analysis error covariances are provided by a diagnostic equation. However, with two model integrations per dimension of model-phase space, the implementation of the full Kalman filter algorithm is not feasible for atmospheric applications, and complexity-reduced Kalman filter algorithms must be applied [*Hanea et al.*, 2004]. To the knowledge of the authors, Kalman filter implementations that provide analysis error covariance matrices were first studied for regional-scale chemical data assimilation. In the Netherlands two chemistry transport models (CTMs) were furnished with sophisticated implementations of complexity-reduced Kalman filters. These include the reduced-rank square-root Kalman filter of the Long Term Ozone Simulation (LOTOS) model [*van Loon et al.*, 2000] and the EUROS model [*Hanea et al.*, 2004]. The reduced-rank square-root approach was selected to factorize covariance matrices by a few principal components [*Verlaan and Heemink*, 1995].
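
A minimal sketch of one cycle of the linear Kalman filter may help to make the prognostic and diagnostic covariance equations concrete. It uses a toy three-variable state; the function name `kalman_cycle`, the matrices `M`, `Q`, `H`, `R`, and all numerical values are assumptions chosen for illustration and do not describe any of the cited systems.

```python
import numpy as np

def kalman_cycle(x_a, P_a, M, Q, H, R, y):
    """One forecast/analysis cycle of a linear Kalman filter (toy sketch)."""
    # Forecast step: propagate the state and, prognostically, its error covariance.
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    # Analysis step: gain, state correction, diagnostic analysis error covariance.
    S = H @ P_f @ H.T + R                  # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_f + K @ (y - H @ x_f)
    P_new = (np.eye(len(x_a)) - K @ H) @ P_f
    return x_new, P_new

n = 3
M = np.roll(np.eye(n), 1, axis=0)   # toy "transport": cyclic shift by one grid point
Q = 0.01 * np.eye(n)                # assumed model error covariance
H = np.array([[1.0, 0.0, 0.0]])     # observe the first grid point only
R = np.array([[0.25]])              # assumed observation error variance
x_a, P_a = np.zeros(n), np.eye(n)

x_new, P_new = kalman_cycle(x_a, P_a, M, Q, H, R, y=np.array([1.0]))
print(x_new)
print(np.diag(P_new))
```

In this toy case `M` is an explicit matrix. For a CTM the model is available only as an integration, so the product of `M`, the covariance, and the transpose of `M` has to be built up column by column through model integrations, which is what makes the full filter infeasible for large state dimensions.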

[6] Unlike the Kalman filter, the 4D-var algorithm acts as a smoother, as it adjusts the initial values of the assimilating model, such that differences between observations and model state within a predefined time interval are minimized in a root-mean-square sense. The 4D-var method is sufficiently efficient to be implemented without serious simplifications. However, it lacks means to update the BECM, which must be prescribed in some way instead. While in most cases this is implemented in a static way, any dynamical evolution of the BECM must be constructed from additional information. Given typical state vectors with dimensions of the order of 10^{6}–10^{7}, the BECM, whose number of entries is the square of the state dimension, cannot be stored explicitly for comprehensive three-dimensional chemistry models. As the background field is given by a short-range forecast in present assimilation systems, it is the error statistics of this forecast that are to be approximated. Further, in 4D-var there is no direct strategy to derive an analysis error estimate. Addressing the latter problem, approaches like those proposed by *Fisher and Courtier* [1995] can be applied.
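
In its usual textbook form, the 4D-var analysis minimizes a cost function of the initial state. The notation below is assumed for illustration only and is not taken from the present text: x_0 is the initial state to be adjusted, x_b the background, B the BECM, M_{0→i} the model integration from the initial time to observation time t_i, and H_i, y_i, and R_i the observation operator, observations, and observation error covariance matrix at that time.

```latex
J(\mathbf{x}_0) =
  \tfrac{1}{2}\,(\mathbf{x}_0-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}_0-\mathbf{x}_b)
  + \tfrac{1}{2}\sum_{i=0}^{N}
    \left(H_i(\mathbf{x}_i)-\mathbf{y}_i\right)^{\mathrm{T}}\mathbf{R}_i^{-1}
    \left(H_i(\mathbf{x}_i)-\mathbf{y}_i\right),
\qquad \mathbf{x}_i = M_{0\to i}(\mathbf{x}_0).
```

B enters only as a prescribed quantity in the first term, which is why it has to be specified from outside the algorithm. For a state dimension of 10^{6}, an explicit B would already hold 10^{12} entries.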

[7] A first application of the 4D-var technique with a heavily reduced stratospheric reaction mechanism in connection with a trajectory model has been presented by *Fisher and Lary* [1995]. This was the first study considering the assimilation of chemically active stratospheric constituents. A state-of-the-art tropospheric chemistry mechanism was introduced to variational assimilation by *Elbern et al.* [1997]. First extensions to the full 4D-var are given by *Elbern and Schmidt* [1999, 2001]. The 4D-var method proved flexible enough for generalization to emission rate inversion with reactive chemistry, as shown by *Elbern et al.* [2007].

[8] *Errera and Fonteyn* [2001] presented the first application of the 4D-var method in the stratosphere with a comprehensive stratospheric CTM, now the Belgian Assimilation System of Chemical Observations from ENVISAT (BASCOE; see also *Errera et al.* [2008] for further developments).

[9] Since an optimal analysis requires a realistic representation of error statistics, the treatment of the BECM constitutes a core task in designing an assimilation system. A properly specified BECM not only balances the forecast or background error with respect to observation errors, but also guides the spreading of measurement information given a statistically well estimated influence or decorrelation length. By using known correlations, the BECM can therefore constitute a key instrument to exploit the information content of a retrieval or observation as thoroughly as possible. In practice, the treatment of the BECM involves two independent general problems. First, an algorithmic formulation must be found to model an extremely high dimensional matrix that is generally too big to be represented explicitly. Second, statistically useful entries must be inferred in some suitable way.

[10] In practice, either only variances are considered (diagonal BECM [*Errera et al.*, 2008]), or the specification of covariances in chemical data assimilation mainly rests on assumptions of homogeneity (constant horizontal correlation lengths all over the globe) and isotropy (constant horizontal correlation lengths in each direction). Despite the necessary simplicity, a skillful parameterization should be capable of representing the relevant structures of the background error covariances. This includes the possibility of modeling inhomogeneous and anisotropic correlation length scales.

[11] In the realm of meteorological data assimilation, the formulation of the BECM has attracted much research (see *Bannister* [2008a, 2008b] for a comprehensive survey). An early attempt to move to anisotropic and inhomogeneous background errors is presented by *Thiebaux* [1976], using a suitably defined autoregressive scheme. *Thépaut et al.* [1996] and *Otte et al.* [2001] further demonstrated the need to relax the constraint of isotropy. A latitudinally dependent correlation function was defined by *Wu et al.* [2002] through recursive filtering. *Purser et al.* [2003] generalized the approach to variable anisotropy and inhomogeneity adaptive to geographic location. *Weaver and Courtier* [2001] applied a diffusion method, providing the same statistical properties as the recursive filtering. While higher numerical efficiency is claimed for recursive filtering, the diffusion method shows promise to better account for abrupt correlation changes, such as those at fronts and other air mass boundaries, owing to the local control by diffusion coefficients.

[12] A straightforward approach to obtain flow dependency through anisotropy and inhomogeneity has been proposed by *Riishøjgaard* [1998], in which the BECM is controlled by a function of the variability of concentration levels; the author also mentioned the possibility of using potential vorticity (PV) fields. While this is a direct and easy-to-implement method without a need to form an ensemble of model fields, the validity of the method rests on the assumption that similar field values imply similar origin, separated only by distorting flows.
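
One way to picture such a field-controlled correlation is as the product of a distance-dependent factor and a factor depending on the difference of the background field values at the two points. The sketch below is an illustration under stated assumptions: the Gaussian shapes, the scales `L` and `F`, the step-like tracer field, and the function name are hypothetical and are not the formulation of *Riishøjgaard* [1998] or of this paper.

```python
import numpy as np

def flow_dependent_correlation(x, field, L, F):
    """Correlation = Gaussian in distance times Gaussian in field-value difference."""
    dx = x[:, None] - x[None, :]            # pairwise distances
    df = field[:, None] - field[None, :]    # pairwise differences of the field
    return np.exp(-0.5 * (dx / L) ** 2) * np.exp(-0.5 * (df / F) ** 2)

x = np.arange(11, dtype=float)              # 1-D grid, unit spacing
# Hypothetical tracer field with a sharp gradient between x = 5 and x = 6.
field = np.where(x <= 5.0, 1.0, 3.0)
C = flow_dependent_correlation(x, field, L=2.0, F=0.5)

# Neighbours on the same side of the gradient stay strongly correlated,
# neighbours across the gradient are almost decorrelated.
print(C[4, 5], C[5, 6])
```

Because both factors are positive semidefinite kernels, their elementwise product is again a valid correlation matrix, so no ensemble of model fields is needed to obtain the anisotropy.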

[13] In the challenging domain of tropospheric chemical data assimilation, *Hölzemann et al.* [2001] introduced inhomogeneous BECMs to account for urban-to-rural chemical regime changes of the boundary layer. On the global scale, *Segers et al.* [2005] applied a complexity-reduced global stratospheric Kalman filter system with anisotropic covariance formulation by a parameterization of correlations. Using MIMOSA (Modèle Isentrope de transport Méso échelle de Ozone Stratosphérique par Advection), a horizontally high-resolution transport model with 16 isentropic levels, it was possible to preserve fine-scale structures in the analyzed ozone field [*Fierli et al.*, 2002]. Background error correlations were flow dependent and anisotropic, specified in terms of distance and of the PV field.

[14] The general objective of the first part of this study is to introduce a data assimilation system that combines the full sophistication of a dynamically controlled BECM formulation with a complex state-of-the-art reactive chemistry model using the 4D-var technique.

[15] The specific objective of this paper is to validate an efficient flow-dependent formulation of the spatial background error covariances, while maintaining the ability to (1) make best use of all available (satellite) data through the algorithmic capability to extend observation results spatially while preserving the BLUE property, (2) ensure chemical and dynamical consistency by applying a state-of-the-art chemistry mechanism, and (3) provide numerical efficiency (grid design, parallelization) to allow for near-real-time operation. The system SACADA (Synoptic Analysis of Chemical Constituents by Advanced Data Assimilation), presented in this study, has been designed to be efficient enough for daily routine operation.

[16] In order to comply with the first item, the study is designed to implement and test the diffusion approach proposed by *Weaver and Courtier* [2001]. The resulting background error covariance operator will be shown to be well suited for application with large models and to allow for anisotropic and inhomogeneous background error correlations, a feature that was utilized to devise a flow-dependent formulation of the BECM.
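
The principle behind a diffusion-based correlation operator can be sketched in one dimension: integrating a diffusion equation over a pseudo-time T spreads a unit impulse into a Gaussian-like response whose variance is close to 2κT, so the diffusion coefficient κ acts as a local control on the correlation length. The code below is a simplified illustration, not the operator of *Weaver and Courtier* [2001] or of this study: the periodic grid, the explicit time stepping, and all names and values are assumptions, and the normalization to unit diagonal and the square-root factorization used in practice are omitted.

```python
import numpy as np

def diffuse(eta, kappa_half, dx, dt, n_steps):
    """Integrate d(eta)/dt = d/dx(kappa d(eta)/dx) on a periodic 1-D grid.

    kappa_half[i] is the diffusion coefficient on the face between grid
    points i and i+1. Explicit Euler steps; requires dt*kappa/dx**2 <= 0.5.
    """
    eta = eta.copy()
    for _ in range(n_steps):
        flux = kappa_half * (np.roll(eta, -1) - eta) / dx   # flux at face i+1/2
        eta = eta + dt * (flux - np.roll(flux, 1)) / dx     # flux divergence
    return eta

def second_moment(resp, center):
    """Spatial variance of a response around its source point (in grid units)."""
    idx = np.arange(resp.size)
    return float(np.sum(resp * (idx - center) ** 2) / np.sum(resp))

n, dx, dt, n_steps = 200, 1.0, 0.1, 50      # pseudo-time T = dt * n_steps = 5

def impulse(i):
    e = np.zeros(n)
    e[i] = 1.0
    return e

# Homogeneous case: constant kappa gives the same response shape everywhere.
kappa = np.full(n, 2.5)
resp = diffuse(impulse(100), kappa, dx, dt, n_steps)
print(second_moment(resp, 100))             # close to 2 * kappa * T = 25

# Inhomogeneous case: a smaller kappa on the left half shortens correlations there.
kappa_var = np.where(np.arange(n) < 100, 0.5, 2.5)
resp_left = diffuse(impulse(50), kappa_var, dx, dt, n_steps)
resp_right = diffuse(impulse(150), kappa_var, dx, dt, n_steps)
print(second_moment(resp_left, 50), second_moment(resp_right, 150))
```

The operator is applied to a vector without ever forming a matrix, which is the property that makes this kind of formulation attractive for large models. Writing the diffusion in flux form keeps the discrete operator symmetric on a uniform grid even when κ varies in space.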

[17] This work lays the foundation for a follow-up study addressing a more sophisticated a posteriori validation of the assimilation results in observation space [*Desroziers et al.*, 2005]. This paper is organized as follows. Section 2 describes the theoretical background of the assimilation approach used. In section 3 the meteorological driver model and its geodetic grid, together with the chemistry transport model, are presented. The data assimilation setup is given in section 4, with emphasis placed on background error covariance modeling. The satellite data involved in this study are introduced in section 5. Results and statistical evaluations of the assimilation runs are presented in section 6, followed by conclusions in section 7.