A novel stratospheric chemical data assimilation system has been developed and applied to Environmental Satellite Michelson Interferometer for Passive Atmospheric Sounding (ENVISAT/MIPAS) data. It combines the sophistication of the four-dimensional variational (4D-var) technique with flow-dependent covariance modeling, while also aiming at improved numerical performance. The system is tailored for operational monitoring of the stratospheric chemical state. The atmospheric model of the assimilation system comprises a state-of-the-art stratospheric chemistry transport module with its adjoint, and the global meteorological forecast model of the German weather service, which provides the meteorological parameters. Both models share the same grid and the same advection time step, which ensures dynamic consistency without spatial or temporal interpolation errors. A notable gain in numerical efficiency is obtained by using an icosahedral grid. As a novel feature in stratospheric variational data assimilation, special emphasis was placed on optimal spatial exploitation of satellite data through a dynamic formulation of the forecast error covariance matrix, which provides anisotropic and inhomogeneous influence radii controlled by potential vorticity. In this first part of the study, the design and numerical features of the data assimilation system are presented, along with analyses of two case studies and an a posteriori validation. Assimilated data include retrievals of O3, CH4, N2O, NO2, HNO3, and water vapor. The analyses are compared with independent observations provided by Stratospheric Aerosol and Gas Experiment II (SAGE II) and Halogen Occultation Experiment (HALOE) retrievals. Both the analyses and the assimilation-based forecasts show marked improvements compared with control model runs without any data ingestion.
 In March 2002 the European research satellite ENVISAT was launched into a Sun-synchronous orbit. Its payload includes the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), the Global Ozone Monitoring by Occultation of Stars (GOMOS), and the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), which deliver an unprecedented wealth of observations of stratospheric trace gases with global coverage.
 The objective of data assimilation is to estimate the state of the atmosphere by fusing heterogeneous, irregularly distributed observations of differing accuracies with a numerical model of the atmosphere. This is mostly achieved with estimation methods adapted to large-scale problems. For mathematical rigor, an objective optimality criterion must be invoked. In most cases a Best Linear Unbiased Estimate (BLUE) is sought, which implies a least squares optimum [see Kalnay, 2003].
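For a single scalar with one background value and one observation, the BLUE reduces to a variance-weighted average, which is also the minimizer of the corresponding least squares cost. The following minimal Python sketch illustrates this; the function name and the numbers are purely illustrative.

```python
def blue_scalar(xb: float, var_b: float, y: float, var_o: float) -> tuple[float, float]:
    """BLUE of a scalar from a background xb and an observation y.

    Errors are assumed unbiased and mutually uncorrelated with variances
    var_b and var_o. The result also minimizes the least squares cost
    (x - xb)**2 / (2 var_b) + (y - x)**2 / (2 var_o).
    """
    gain = var_b / (var_b + var_o)   # optimal weight given to the innovation y - xb
    xa = xb + gain * (y - xb)        # analysis
    var_a = (1.0 - gain) * var_b     # analysis error variance, smaller than both inputs
    return xa, var_a


# Equal error variances: the analysis is the mean and the variance is halved.
print(blue_scalar(xb=4.0, var_b=1.0, y=6.0, var_o=1.0))  # (5.0, 0.5)
```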
 Sun-synchronous satellites observe each location at fixed local times, which is clearly a limitation for atmospheric chemistry, where many species follow pronounced diurnal cycles. Spatiotemporal data assimilation (DA) algorithms such as 4D-var can propagate the observational information to other times and thus provide a temporally complete estimate of the chemical state of the atmosphere. Thorough overviews of chemical data assimilation systems are provided by Lahoz et al.  and Geer et al. .
 Among advanced spatiotemporal data assimilation algorithms that rest on Gaussian error characteristics and provide a BLUE, there are only two families of techniques, namely the Kalman filter [e.g., Kalman, 1960; Cohn, 1997] and the 4D-var method [e.g., Talagrand and Courtier, 1987]. The former is a sequential method; that is, the model state is corrected whenever observations are encountered. The Kalman filter has the theoretical advantage that the background error covariance matrix (BECM) is evolved by a prognostic equation, and the analysis error covariances are provided by a diagnostic equation. However, since it requires two model integrations per dimension of the model phase space, the full Kalman filter is not feasible for atmospheric applications, and reduced-complexity Kalman filter algorithms must be applied [Hanea et al., 2004]. To the authors' knowledge, Kalman filter implementations that provide analysis error covariance matrices were first studied for regional-scale chemical data assimilation. In the Netherlands, two chemistry transport models (CTMs) were furnished with sophisticated reduced-complexity Kalman filters: the Long Term Ozone Simulation (LOTOS) model, with a reduced rank square root Kalman filter [van Loon et al., 2000], and the EUROS model [Hanea et al., 2004]. The reduced rank square root approach factorizes covariance matrices by a few principal components [Verlaan and Heemink, 1995].
 Unlike the Kalman filter, the 4D-var algorithm acts as a smoother: it adjusts the initial values of the assimilating model such that the differences between observations and model state within a predefined time interval are minimized in a least squares sense. The 4D-var method is efficient enough to be implemented without serious simplifications. However, it lacks a means to update the BECM, which must instead be prescribed. While this is mostly done statically, any dynamical evolution of the BECM must be constructed from additional information. Given typical state vector dimensions of order 10⁶–10⁷, the BECM, whose number of entries is the square of this dimension, cannot be represented explicitly for comprehensive three-dimensional chemistry models. As the background field is given by a short-range forecast in present assimilation systems, it is the error statistics of this forecast that must be approximated. Further, 4D-var offers no direct strategy to derive an analysis error estimate. Approaches such as that proposed by Fisher and Courtier  can be applied to address this problem.
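The storage argument can be made concrete with a short calculation. The sketch below assumes dense storage in 8-byte double precision and ignores the factor of about two that exploiting symmetry would save.

```python
def dense_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Memory needed for a dense n x n matrix, by default in float64."""
    return n * n * bytes_per_entry


for n in (10**6, 10**7):
    print(f"n = {n:.0e}: {dense_matrix_bytes(n) / 1e12:,.0f} TB")
# n = 1e+06: 8 TB
# n = 1e+07: 800 TB
```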
 A first application of the 4D-var technique with a heavily reduced stratospheric reaction mechanism in connection with a trajectory model was presented by Fisher and Lary . This was the first study to consider the assimilation of chemically active stratospheric constituents. A state-of-the-art tropospheric chemistry mechanism was introduced to variational assimilation by Elbern et al. . First extensions to full 4D-var are given by Elbern and Schmidt [1999, 2001]. The 4D-var method proved flexible enough to be generalized to emission rate inversion with reactive chemistry, as shown by Elbern et al. .
Errera and Fonteyn  presented the first application of the 4D-var method in the stratosphere with a comprehensive stratospheric CTM, now the Belgian Assimilation System of Chemical Observations from ENVISAT (BASCOE; see also Errera et al.  for further developments).
 Since an optimal analysis requires a realistic representation of error statistics, the treatment of the BECM is a core task in designing an assimilation system. A properly specified BECM not only balances the forecast (background) error against the observation errors, but also guides the spreading of measurement information according to a statistically well estimated influence or decorrelation length. By exploiting known correlations, the BECM can therefore become a key instrument for extracting the information content of a retrieval or observation as thoroughly as possible. In practice, the treatment of the BECM involves two independent problems. First, an algorithmic formulation must be found to model an extremely high dimensional matrix that is generally too large to be represented explicitly. Second, statistically meaningful entries must be inferred in a suitable way.
 In practice, either only variances are considered (diagonal BECM [Errera et al., 2008]), or the specification of covariances in chemical data assimilation rests mainly on assumptions of homogeneity (horizontal correlation lengths that are constant over the globe) and isotropy (horizontal correlation lengths that are identical in all directions). Despite the necessary simplicity, a skillful parameterization should be able to represent the relevant structures of the background error covariances, including inhomogeneous and anisotropic correlation length scales.
 In the realm of meteorological data assimilation, the formulation of the BECM has attracted much research (see Bannister [2008a, 2008b] for a comprehensive survey). An early attempt to move to anisotropic and inhomogeneous background errors was presented by Thiebaux  through a suitably defined autoregressive scheme. Thépaut et al.  and Otte et al.  further demonstrated the need to relax the constraint of isotropy. A latitudinally dependent correlation function was defined by Wu et al.  through recursive filtering. Purser et al.  generalized the approach to variable anisotropy and inhomogeneity adapted to geographic location. Weaver and Courtier  applied a diffusion method, which provides the same statistical properties as recursive filtering. While the proponents of recursive filtering claim higher numerical efficiency, the diffusion approach promises to better account for abrupt correlation changes, such as at fronts and other air mass boundaries, owing to the local control by diffusion coefficients.
 A straightforward approach to obtain flow dependency through anisotropy and inhomogeneity was proposed by Riishøjgaard , in which the BECM is controlled by a function of the variability of concentration levels; the possibility of using potential vorticity (PV) fields was also mentioned. While this is a direct and easily implemented method that does not require an ensemble of model fields, its validity rests on the assumption that similar field values imply a similar origin, separated only by distorting flows.
 In the challenging domain of tropospheric chemical data assimilation, Hölzemann et al.  introduced inhomogeneous BECMs to account for changes between urban and rural chemical regimes in the boundary layer. On the global scale, Segers et al.  applied a reduced-complexity global stratospheric Kalman filter with an anisotropic covariance formulation based on a parameterization of correlations. Using MIMOSA (Modèle Isentrope de transport Méso échelle de Ozone Stratosphérique par Advection), a horizontally high-resolution transport model with 16 isentropic levels, it was possible to preserve fine-scale structures in the analyzed ozone field [Fierli et al., 2002]. There, the background error correlations were flow dependent and anisotropic, specified in terms of distance and of the PV field.
 The general objective of the first part of this study is to introduce a data assimilation system that combines a dynamically controlled BECM formulation with a complex, state-of-the-art reactive chemistry model using the 4D-var technique.
 The specific objective of this paper is to validate an efficient flow-dependent formulation of the spatial background error covariances, while maintaining the ability (1) to make the best use of all available (satellite) data by algorithmically extending observational information spatially while preserving the BLUE property, (2) to ensure chemical and dynamical consistency by applying a state-of-the-art chemistry mechanism, and (3) to provide the numerical efficiency (grid design, parallelization) required for near-real-time operation. The system SACADA (Synoptic Analysis of Chemical Constituents by Advanced Data Assimilation), presented in this study, is efficient enough for daily routine operations.
 To address the first item, the study implements and tests the diffusion approach proposed by Weaver and Courtier . The resulting background error covariance operator will be shown to be well suited for large models and to allow for anisotropic and inhomogeneous background error correlations, a feature that was used to devise a flow-dependent formulation of the BECM.
 This work lays the foundation for a follow-up study addressing a more sophisticated a posteriori validation of the assimilation results in observation space [Desroziers et al., 2005]. The paper is organized as follows. Section 2 describes the theoretical background of the assimilation approach. Section 3 presents the meteorological driver model and its geodesic grid, together with the chemistry transport model. The data assimilation setup is described in section 4, with emphasis on background error covariance modeling. The satellite data used in this study are introduced in section 5. Results and statistical evaluations of the assimilation runs are presented in section 6, followed by the conclusions in section 7.
 Let M_i denote the nonlinear model operator that propagates the atmospheric state x0 from time t0 to time ti. It is further assumed that the model and its controlling parameters, except for the initial values, are of sufficient quality for a forecast over the time span of interest, and that the statistics of the forecast errors are known in terms of their error covariance matrix. Given a set of observations or satellite retrievals over a time interval, or assimilation window, [t0, tN], we seek a BLUE of the model's n initial values of the state variable x0 = x(t0). Background and observation errors are assumed to be normally distributed and mutually uncorrelated.
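 The four-dimensional variational method minimizes the cost function

$$J(x_0) = J_b + J_o = \frac{1}{2}\left(x_0 - x_b\right)^{T} B^{-1}\left(x_0 - x_b\right) + \frac{1}{2}\sum_{i=0}^{N}\left(y_i - H\left(M_i(x_0)\right)\right)^{T} R^{-1}\left(y_i - H\left(M_i(x_0)\right)\right) \qquad (1)$$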
 Adopting the notation of Ide et al.  as far as possible, yi is the vector of observations available within time step i, and xb is the first guess or background state at initial time, typically obtained from an earlier model forecast. The initial chemical state x0 of the model at time step i = 0, at the beginning of the assimilation window, is the optimization parameter for which the most probable value is sought. The background error and observation error covariance matrices are denoted by B and R, respectively. The observation operator H maps the model state onto the observation space. Superscript T denotes transposition. As is obvious from (1), Jb is the partial cost resulting from the deviation of the initial state x0 from the background state xb, while Jo is the partial cost arising from the deviation of the observations yi from their model equivalents H(M_i(x0)). Equation (1) implements the model as a strong constraint.
 Minimization algorithms suitable for high-dimensional problems, such as quasi-Newton or conjugate gradient methods, require the gradient of the cost function with respect to the control variables x0. Computing the gradient of Jo is the computationally most demanding task of 4D-var data assimilation, given that the number of control variables in atmospheric models is of order 10⁶–10⁷. The feasible strategy for computing this gradient makes use of the adjoint model operator. The gradient of J with respect to the initial model values x0 is given by [Talagrand, 1997]

$$\nabla_{x_0} J = B^{-1}\left(x_0 - x_b\right) - \sum_{i=0}^{N} M_i^{*} H_i^{T} R^{-1}\left(y_i - H\left(M_i(x_0)\right)\right) \qquad (2)$$

 Here, M_i* is the adjoint of the tangent linear model operator, which propagates sensitivities backward in time from time step i to the initial time t0, and H_i is the linearization of the observation operator H about the model state M_i(x0).
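To make (1) and (2) concrete, the following Python/NumPy sketch evaluates the cost and its gradient for a toy linear model, in which M_i is the i-th power of a fixed matrix A and the adjoint M_i* is therefore simply the transpose. The model, the dimensions, and all numbers are illustrative assumptions; the gradient is checked against central finite differences.

```python
import numpy as np


def cost_and_gradient(x0, xb, B_inv, A, H, R_inv, obs):
    """Cost (1) and gradient (2) for a linear toy model with M_i = A**i.

    obs[i] is the observation vector y_i at time step i = 0..N; H is a
    linear observation operator, so its linearization H_i equals H.
    """
    dx = x0 - xb
    J = 0.5 * dx @ B_inv @ dx
    grad = B_inv @ dx
    Mi = np.eye(len(x0))                       # M_0 is the identity
    for y in obs:
        d = y - H @ (Mi @ x0)                  # innovation y_i - H(M_i(x0))
        J += 0.5 * d @ R_inv @ d
        grad -= Mi.T @ (H.T @ (R_inv @ d))     # adjoint of a linear model: M_i^* = M_i^T
        Mi = A @ Mi                            # advance to M_{i+1}
    return J, grad


rng = np.random.default_rng(0)
n, p, N = 4, 2, 5
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
H = rng.standard_normal((p, n))
B_inv = np.eye(n) / 0.5
R_inv = np.eye(p) / 0.1
xb = rng.standard_normal(n)
obs = [rng.standard_normal(p) for _ in range(N + 1)]
x0 = xb + 0.3 * rng.standard_normal(n)

J, grad = cost_and_gradient(x0, xb, B_inv, A, H, R_inv, obs)
eps = 1e-6
fd = np.array([
    (cost_and_gradient(x0 + eps * e, xb, B_inv, A, H, R_inv, obs)[0]
     - cost_and_gradient(x0 - eps * e, xb, B_inv, A, H, R_inv, obs)[0]) / (2 * eps)
    for e in np.eye(n)
])
print(np.max(np.abs(fd - grad)))  # small, limited by round-off
```

In a real system the matrices M_i are never formed; the sum in (2) is accumulated by a single backward integration of the adjoint model.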
3. System Description
3.1. Meteorological Driver Model
 The SACADA chemistry 4D-var system was assembled from scratch to allow for several novel features. In 1999 the operational icosahedral meteorological forecast model of the German weather service, named GME [Majewski et al., 2001], replaced both the earlier global model (GM) and the regional model (EM). Its version 1.22 is adopted for SACADA and used to compute the meteorological fields at the same temporal and spatial locations where they are needed by the chemistry module. Storing the meteorological state at each time step and retrieving it during the forward and adjoint integrations of the chemistry model avoids temporal and spatial interpolation errors in the meteorological parameters. To realize this approach, GME is embedded in the chemical 4D-var model code as a subroutine.
 GME was selected because of its icosahedral grid design, which provides a nearly homogeneous distribution of grid points over the globe, avoiding singularities at the poles and the crowding of grid points caused by meridional convergence at high latitudes. Since most of the computational burden is associated with solving the stiff ordinary differential equations of chemistry at each grid point, the quasi-isotropic grid design, which needs fewer grid points for a given resolution, promises high numerical efficiency. Because the new chemistry parts of the SACADA model developed in this study adopt some of these numerical concepts, the relevant grid features of GME and the related discretized operators are given below.
 In three-dimensional space the icosahedron is the Platonic solid with the largest number of faces, bounded by 20 equilateral triangles. To approximate the globe, it is inscribed in a sphere, onto which the 12 icosahedral vertices and the connecting edges are projected. As shown in Figure 1, each of the resulting great circle arcs is subdivided into ni equal intervals to form an almost isotropic grid. Each grid point has six direct neighbors, except for the twelve points located at the vertices of the icosahedron (termed “special points” hereafter), which have only five. The area of representativeness of a grid cell is thus a hexagon, or a pentagon at the twelve special points. This approach results in a mesh with approximately uniform mesh size over the globe; the minimum and maximum separations between neighboring grid points, Δmin and Δmax, differ by only about 20%. For this first version of the SACADA system, ni = 32 intervals were deemed to give sufficient resolution, resulting in 10,242 grid points per level. The minimum and maximum distances are Δmin = 220 km and Δmax = 263 km, respectively, and the average grid cell area, which defines the horizontal representativeness in data assimilation, is about 50,000 km². A traditional latitude-longitude grid of similar resolution, that is, a grid whose cells near the equator have a comparable area, requires a grid spacing of, say, 2.0° × 2.4°, resulting in 13,500 grid points per level, about 30% more than the icosahedral grid.
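The grid figures quoted above can be reproduced with a few lines of arithmetic. The sketch assumes the standard count of 10 ni² + 2 points for an icosahedron whose edges are split into ni intervals and a mean Earth radius of 6371 km; the function names are illustrative.

```python
import math

EARTH_AREA_KM2 = 4.0 * math.pi * 6371.0**2   # assumes a mean Earth radius of 6371 km


def icosahedral_points(ni: int) -> int:
    """Grid points per level when each icosahedron edge is split into ni intervals."""
    return 10 * ni**2 + 2


def latlon_points(dlat_deg: float, dlon_deg: float) -> int:
    """Cell count of a regular latitude-longitude grid."""
    return round(180.0 / dlat_deg) * round(360.0 / dlon_deg)


n_ico = icosahedral_points(32)
n_ll = latlon_points(2.0, 2.4)
print(n_ico)                                 # 10242
print(round(EARTH_AREA_KM2 / n_ico))         # 49801, i.e. about 50,000 km^2 per cell
print(n_ll, f"{n_ll / n_ico - 1:.0%}")       # 13500 32%
```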
 An orthonormal coordinate system (x0, eλ, eϕ) is attached to each grid point of the triangular grid, where x0 is the position vector of the grid point on the unit sphere, pointing in the local vertical direction, and eλ and eϕ are unit vectors tangent to the sphere at x0, pointing east and north, respectively. Local spherical coordinates (η, χ) describe the position of an arbitrary unit vector x relative to (x0, eλ, eϕ) (see Majewski et al.  for details). This approach avoids polar singularities. The zonal wind component u and the meridional wind component v are given by

$$u = R_E \cos\chi \,\frac{d\eta}{dt}, \qquad v = R_E \,\frac{d\chi}{dt} \qquad (3)$$

respectively, where R_E is the radius of Earth. The GME grid is horizontally unstaggered (Arakawa A grid). The meteorological equations are formulated and solved in the local (η, χ) coordinate system; see Majewski et al.  for details.
 GME uses two different algorithms for horizontal advection. Cloud water and cloud ice are advected by a semi-Lagrangian scheme [Staniforth and Côté, 1991] with the necessary adaptations to the icosahedral grid, while the other prognostic variables are advected by an Eulerian scheme [Majewski et al., 2001]. The departure points of the trajectories are found by a two-step iterative procedure. After temporal discretization, (3) gives
as cos χ = 1 at the first step, starting at the grid point with χ = 0. The trajectory midpoint is computed at a second step
where I is an operator that interpolates the local wind field to the position (Δη1, Δχ1). The departure point of the trajectory is then approximated by the relative position (2Δη2, 2Δχ2). Note that the wind field at time level n is used to compute the trajectory connecting the grid point at time level n + 1 with the departure point at time level n − 1. Two interpolation operators are available: a linear operator Il, which uses the three grid points nearest to the departure point, and an operator Iq, which performs quadratic interpolation using the values at twelve neighboring grid points (see Majewski et al.  for details). The wind fields in (5) are interpolated linearly with Il.
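The two-step departure point search can be sketched as follows. The sketch assumes a discretization of (3) that may differ in detail from (4) and (5): the first estimate uses the wind at the arrival point with cos χ = 1, the second uses the wind interpolated at that estimate, and both point upstream. A callable `wind`, a hypothetical stand-in for the interpolation operator I, returns (u, v) at local coordinates (η, χ).

```python
import math
from typing import Callable

# Hypothetical stand-in for the interpolation operator I: (eta, chi) -> (u, v) in m/s
Wind = Callable[[float, float], tuple[float, float]]


def departure_point(wind: Wind, dt: float, radius: float = 6.371e6) -> tuple[float, float]:
    """Two-step estimate of a departure point in local (eta, chi) coordinates (radians).

    The arrival grid point is at (0, 0); the trajectory spans 2 * dt, from time
    level n - 1 to n + 1, using the wind of time level n.
    """
    # Step 1: wind at the arrival point, where cos(chi) = 1
    u0, v0 = wind(0.0, 0.0)
    d_eta1 = -dt * u0 / radius
    d_chi1 = -dt * v0 / radius
    # Step 2: wind interpolated at the first estimate locates the trajectory midpoint
    u1, v1 = wind(d_eta1, d_chi1)
    d_eta2 = -dt * u1 / (radius * math.cos(d_chi1))
    d_chi2 = -dt * v1 / radius
    # The departure point lies twice as far upstream as the midpoint
    return 2.0 * d_eta2, 2.0 * d_chi2


def uniform_westerly(eta: float, chi: float) -> tuple[float, float]:
    return 20.0, 0.0   # 20 m/s eastward everywhere


print(departure_point(uniform_westerly, dt=600.0))
```

For a uniform zonal wind both steps agree, so the departure point lies 2Δt·u upstream, as expected.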
 In the vertical, GME uses a hybrid σ-pressure coordinate system and an advection scheme proposed by Simmons and Burridge . A staggered (Charney-Phillips) grid is used, with the geopotential and the vertical wind specified at the layer boundaries. As the focus here is on the stratosphere, the number of model layers was increased from 32 in the operational GME (version 1.22) to 42 in the SACADA assimilation system, and the top level pressure was reduced from 10.0 hPa to 0.1 hPa, corresponding to a height of about 65 km. The resulting pressure levels and the corresponding heights are shown in Figure 2.
3.2. Chemistry Transport Module
 The SACADA chemistry transport module solves the tendency equation in the local coordinate system of the icosahedral grid for a vector c of volume mixing ratios,
with chemical production P and loss L. Making use of an operator split approach [McRae et al., 1982; Yanenko, 1971], the partial differential equation (6) is separated into subproblems which describe the rate of change in volume mixing ratio due to horizontal advection, vertical advection and chemical production or loss, denoted by the superscripts h, v and c, respectively,
Let Mh(tn), Mv(tn) and Mc(tn) be the generally nonlinear discretized integration operators, which solve (7) using meteorological parameters at time step (tn). Then, an approximate solution of (6) for time step n + 1 may be obtained by
 Data assimilation algorithms typically do not enforce mass conservation, since global trace gas masses are not known with sufficient accuracy. Therefore, for efficiency, the semi-Lagrangian scheme of GME was selected as the horizontal transport algorithm for short-range forecasts. Departure points are computed according to (5) and saved together with the meteorological data during the GME run. Consequently, the horizontal advection operator reduces to a simple interpolation, Mh = Iq, which performs quadratic interpolation using the twelve grid points nearest to the departure point. To suppress negative volume mixing ratios, the SACADA-CTM employs this operator in its positive definite form, which here means that interpolated volume mixing ratios smaller than zero are set to zero.
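The positive definite interpolation can be illustrated in one dimension. The sketch below is a hypothetical three-point analogue of the twelve-point operator Iq on the icosahedral grid: it shows how quadratic interpolation undershoots near a sharp gradient and how resetting negative values to zero removes the undershoot.

```python
def quadratic_interp_1d(f0: float, f1: float, f2: float, s: float, positive: bool = True) -> float:
    """Quadratic Lagrange interpolation through nodes at s = 0, 1, 2.

    With positive=True, negative results are reset to zero, mimicking the
    positive definite form of the interpolation.
    """
    w0 = (s - 1.0) * (s - 2.0) / 2.0   # Lagrange weights for the three nodes
    w1 = -s * (s - 2.0)
    w2 = s * (s - 1.0) / 2.0
    value = w0 * f0 + w1 * f1 + w2 * f2
    return max(value, 0.0) if positive else value


# A sharp gradient makes the quadratic undershoot between the first two nodes
print(quadratic_interp_1d(0.0, 0.0, 1.0, 0.5, positive=False))  # -0.125
print(quadratic_interp_1d(0.0, 0.0, 1.0, 0.5))                  # 0.0
```

Clipping removes negative values at the cost of slightly adding mass, which is consistent with the choice not to enforce mass conservation.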
 In the vertical, the transport equation is solved by a linear, efficient implicit upwind algorithm, Mv = [I + 2ΔtA]⁻¹, where the tridiagonal matrix A depends on the vertical wind in the hybrid vertical coordinate system and on the pressure differences between adjacent layers. The scheme is unconditionally stable, and since it is linear in the mixing ratios, its adjoint requires no stored chemical states.
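A minimal NumPy sketch of such an implicit upwind column step follows. Here A is a first-order upwind discretization of w ∂c/∂z on a uniform column with zero inflow at the boundaries, a simplifying assumption compared with the hybrid-coordinate formulation. Because the step is linear in c, its adjoint is a solve with the transposed matrix, which the dot-product test ⟨Mv x, y⟩ = ⟨x, Mv* y⟩ confirms.

```python
import numpy as np


def upwind_matrix(w: np.ndarray, dz: float) -> np.ndarray:
    """First-order upwind matrix A for dc/dt + w dc/dz = 0, written as dc/dt = -A c.

    w[j] is the vertical velocity in layer j (positive upward); inflow through
    the bottom and top boundaries is assumed to carry zero mixing ratio.
    """
    k = len(w)
    A = np.zeros((k, k))
    for j in range(k):
        a = abs(w[j]) / dz
        A[j, j] += a
        upstream = j - 1 if w[j] > 0 else j + 1
        if 0 <= upstream < k:
            A[j, upstream] -= a
    return A


def vertical_step(c, A, dt):
    """Implicit upwind step M_v c = (I + 2 dt A)^-1 c (dense solve for brevity)."""
    return np.linalg.solve(np.eye(len(c)) + 2.0 * dt * A, c)


def vertical_step_adjoint(c_adj, A, dt):
    """Adjoint M_v^* = (I + 2 dt A)^-T; needs only the wind, no stored chemical state."""
    return np.linalg.solve((np.eye(len(c_adj)) + 2.0 * dt * A).T, c_adj)


rng = np.random.default_rng(4)
A = upwind_matrix(w=rng.uniform(-0.05, 0.05, 10), dz=500.0)
dt = 36000.0                                   # Courant numbers above 1: still stable
x, y = rng.standard_normal(10), rng.standard_normal(10)
lhs = vertical_step(x, A, dt) @ y
rhs = x @ vertical_step_adjoint(y, A, dt)
print(abs(lhs - rhs))                          # agreement to round-off
```

In practice a tridiagonal (Thomas) solver would replace the dense solve.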
 Forty-eight atmospheric trace constituents are treated in the SACADA chemistry transport module, all of which are advected. The reaction scheme comprises 167 gas-phase reactions (see Tables 1 and 2) and 10 heterogeneous reactions on the surfaces of polar stratospheric cloud (PSC) particles and in sulfate aerosol droplets (see Table 3). The mechanism partly follows Hendricks et al. , with extensions for the upper stratosphere and lower mesosphere. Reaction rate coefficients follow the recommendations of Sander et al. . A second-order Rosenbrock method (operator Mc) is applied to solve the gas-phase chemistry. Details on this solver and its properties are given by Verwer et al. .
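The Rosenbrock solver can be sketched with the two-stage, second-order scheme ROS2 in the form commonly attributed to Verwer et al., with γ = 1 + 1/√2; whether SACADA uses exactly these coefficients is an assumption. Both stages share the matrix I − γhJ, so each step needs only linear solves with one matrix, which is what makes Rosenbrock methods attractive for stiff chemistry. The usage example applies the sketch to a hypothetical scalar linear decay, once mildly and once extremely stiff.

```python
import math

import numpy as np

GAMMA = 1.0 + 1.0 / math.sqrt(2.0)


def ros2_step(f, jac, y, h):
    """One ROS2 step: two stages sharing the matrix I - GAMMA * h * J(y)."""
    m = np.eye(len(y)) - GAMMA * h * jac(y)
    k1 = np.linalg.solve(m, f(y))
    k2 = np.linalg.solve(m, f(y + h * k1) - 2.0 * k1)
    return y + 1.5 * h * k1 + 0.5 * h * k2


def integrate(f, jac, y0, h, nsteps):
    y = np.asarray(y0, dtype=float)
    for _ in range(nsteps):
        y = ros2_step(f, jac, y, h)
    return y


def linear_decay(rate):
    """Right-hand side and Jacobian of dy/dt = -rate * y (hypothetical test problem)."""
    return (lambda y: -rate * y), (lambda y: np.array([[-rate]]))


mild = integrate(*linear_decay(1.0), [1.0], h=0.1, nsteps=10)
stiff = integrate(*linear_decay(1e6), [1.0], h=0.1, nsteps=10)
print(mild[0], math.exp(-1.0))   # second-order accurate: about 0.372 vs 0.368
print(stiff[0])                  # L-stable: the stiff mode is damped to almost zero
```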
Table 1. Photolysis Reactions Included in the SACADA Gas-Phase Mechanism
 Stratospheric particles in the SACADA chemistry transport module include sulfate aerosol droplets, solid nitric acid trihydrate particles (NAT, type Ia PSC), supercooled ternary HNO3/H2O/H2SO4 solutions (STS, type Ib PSC), and water ice (type II PSC).
The first-order rate coefficient of a heterogeneous reaction is calculated as

$$k = \frac{1}{4}\,\bar{v}\,S\,\gamma \qquad (9)$$

where v̄ is the mean molecular velocity of the reacting species in the gas phase, S is the surface area density of particles in units of m²/m³, and γ is the reactive uptake coefficient. The hydrolysis of ClONO2 and the reactions of ClONO2 and HOCl with HCl in sulfate aerosol/STS particles are treated following Shi et al. , while all other γ are taken from the recommendations of Sander et al. . A comprehensive discussion of the treatment of heterogeneous reactions is given by Hanson et al. , Carslaw et al. , Hendricks et al.  and references therein. As a consequence of (9), the surface area density S and a representative radius r of stratospheric particles must be known in order to calculate the rates of heterogeneous reactions.
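A short worked example of the rate coefficient follows, assuming that (9) has the standard form k = v̄Sγ/4 with the mean molecular speed v̄ = (8RT/(πM))^½. The species, temperature, surface area density, and uptake coefficient are hypothetical inputs, not values from the mechanism.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def mean_molecular_speed(temperature_k: float, molar_mass_kg: float) -> float:
    """Mean thermal speed sqrt(8 R T / (pi M)) in m/s."""
    return math.sqrt(8.0 * R_GAS * temperature_k / (math.pi * molar_mass_kg))


def heterogeneous_rate(temperature_k: float, molar_mass_kg: float,
                       surface_m2_per_m3: float, gamma: float) -> float:
    """First-order rate coefficient k = v_bar * S * gamma / 4 in s^-1."""
    v_bar = mean_molecular_speed(temperature_k, molar_mass_kg)
    return 0.25 * v_bar * surface_m2_per_m3 * gamma


# Hypothetical inputs: ClONO2 (M = 97.46 g/mol) at 200 K,
# S = 1 um^2/cm^3 = 1e-6 m^2/m^3, gamma = 0.1
k = heterogeneous_rate(200.0, 0.09746, 1e-6, 0.1)
print(f"{k:.2e} s^-1")   # about 5.2e-06 s^-1, i.e. a lifetime of roughly two days
```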
 Given that the stratosphere has hardly been affected by volcanic eruptions over the last decade [Thomason et al., 1997; Deshler et al., 2003], it was decided that, for the present version of the assimilation system, a single reference profile of aerosol surface area and median radius would suffice for the uptake coefficient calculations. This reference profile was derived from 20 balloon-borne measurements made between 1997 and 2002 at Laramie, Wyoming [Deshler et al., 2003]. A monodisperse size distribution and a fixed number density are assumed for all particle classes. Nsulf = 5 cm⁻³ was taken as a representative number density for stratospheric aerosols [Deshler et al., 2003]. Following Hendricks et al. , PSC particle number densities are prescribed as NNAT = 1 cm⁻³ and Nice = 0.01 cm⁻³.
 Aerosol composition as well as the uptake of HNO3 into sulfate aerosol particles at low temperatures (and its removal from the gas phase) is parameterized according to Carslaw et al. . Further, the solubility and the corresponding gas-phase removal of HCl and HBr in sulfate aerosol is accounted for as devised by Luo et al. . Radius and surface area are then calculated from known composition using Nsulf.
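With a monodisperse size distribution and a prescribed number density N, radius and surface area density follow from the condensed particle volume per unit volume of air. The sketch below assumes that this volume has already been derived from the particle composition (for example, condensed mass divided by solution density), a step that is not shown; the same relation applies to NAT and ice with NNAT and Nice.

```python
import math


def monodisperse_radius_and_surface(volume_density: float, number_density: float) -> tuple[float, float]:
    """Radius and surface area density of identical spheres.

    volume_density is the particle volume per unit air volume and
    number_density the particle number per unit air volume; for example,
    um^3/cm^3 and cm^-3 give a radius in um and S in um^2/cm^3.
    """
    radius = (3.0 * volume_density / (4.0 * math.pi * number_density)) ** (1.0 / 3.0)
    surface = 4.0 * math.pi * radius**2 * number_density
    return radius, surface


n_sulf = 5.0                                        # cm^-3, as prescribed for sulfate aerosol
volume = n_sulf * 4.0 / 3.0 * math.pi * 0.1**3      # hypothetical: 5 droplets of radius 0.1 um per cm^3
r, s = monodisperse_radius_and_surface(volume, n_sulf)
print(round(r, 6), round(s, 4))                     # 0.1 0.6283
```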
 Surface area densities and radii of NAT and ice particles are calculated using thermodynamic equilibrium constraints for H2O and HNO3 over ice and NAT surfaces, as specified by Marti and Mauersberger  and Hanson and Mauersberger . If the number of H2O or HNO3 molecules exceeds or falls short of the amount given by the saturation vapor pressure, the corresponding number of molecules is condensed or evaporated. Surface area and radius are then inferred using Nice and NNAT. If ambient conditions favor ice formation, all other particle types are assumed to be incorporated into the ice, and the amount of HNO3 within the ice is determined by its equilibrium pressure over ice surfaces [Hanson and Mauersberger, 1988]. As observations indicate that a supercooling of 2–4 K usually occurs before ice particles emerge [World Meteorological Organization, 1999, and references therein], ice formation is initiated if the temperature falls 3 K below the frost point, or if water vapor reaches a supersaturation of 50% with respect to ice. Further, it is well known [see, e.g., World Meteorological Organization, 1999] that homogeneous nucleation of NAT particles from HNO3 and H2O vapor is energetically unfavorable. Therefore, NAT formation is initiated only at rather high supersaturations of HNO3 with respect to NAT. In the presence of preexisting NAT or ice particles, no phase transition barrier is assumed. The composition of STS particles is calculated after the gas-phase removal of HNO3 by NAT formation.
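The ice onset criterion can be sketched in Python. The saturation pressure over ice uses the Marti and Mauersberger fit log10 p_ice[Pa] = 12.537 − 2663.5/T; the thresholds (3 K below the frost point, 50% supersaturation, no barrier with preexisting particles) are those stated above, while the function names and the example conditions (5 ppmv H2O at 50 hPa) are illustrative assumptions.

```python
import math


def p_ice_pa(temperature_k: float) -> float:
    """Saturation vapor pressure of H2O over ice in Pa (Marti and Mauersberger fit)."""
    return 10.0 ** (12.537 - 2663.5 / temperature_k)


def frost_point_k(p_h2o_pa: float) -> float:
    """Temperature at which the H2O partial pressure equals p_ice (inverse of p_ice_pa)."""
    return 2663.5 / (12.537 - math.log10(p_h2o_pa))


def ice_forms(temperature_k: float, p_h2o_pa: float, preexisting: bool = False) -> bool:
    """Ice onset: 3 K below the frost point or 50 % supersaturation over ice.

    With preexisting particles no nucleation barrier is assumed, so plain
    supersaturation with respect to ice is sufficient.
    """
    saturation = p_h2o_pa / p_ice_pa(temperature_k)
    if preexisting:
        return saturation > 1.0
    return temperature_k <= frost_point_k(p_h2o_pa) - 3.0 or saturation >= 1.5


p_h2o = 5e-6 * 50e2                  # hypothetical: 5 ppmv H2O at 50 hPa, in Pa
print(round(frost_point_k(p_h2o), 2))                     # 188.38
print(ice_forms(187.0, p_h2o), ice_forms(185.0, p_h2o))   # False True
```

The two thresholds nearly coincide here: 3 K of supercooling corresponds to roughly 50% supersaturation at these temperatures.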
4. Variational Assimilation System Components
4.1. Adjoint Chemistry Transport Module
 The adjoint model operator M*, needed to compute the gradient ∇J of the cost function (1) with respect to the initial volume mixing ratios c0, is constructed module by module, by forming the tangent linear of each individual line of code and its transpose. The adjoint operators Mc* and Mh* are generated with KPP [Sandu et al., 1997] and TAMC [Giering, 1999], respectively. In contrast to Mv*, the operators Mc* and Mh* require the recomputation of intermediate variables, starting from the volume mixing ratios before the respective forward operator was applied [Giering, 1999]. For each iteration, these values are saved to disk at each forward time step, prior to the horizontal advection step Mh and prior to the call of the chemical solver Mc. In total, two chemical model states and one meteorological state are stored per advection time step; the meteorological states are stored only once, before the chemical data assimilation procedure commences. During the adjoint (backward) integration, the chemical states are therefore read from disk in reverse order. The overall computational cost of the adjoint integration is about 2.4 times that of the forward integration, including the time for writing and reading the model states during the forward and backward integrations. The nonlinear forward model integration is updated after each adjoint integration. Typically 12 to 16 iterations are performed, the former being sufficient for case studies spanning a sequence of consecutive days, which benefit from good background fields provided by earlier assimilation.
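The line-by-line construction of tangent linear and adjoint code, and the way it reuses the forward state stored before an operator, can be illustrated with a hypothetical one-step chemistry kernel A + B → products integrated by explicit Euler. This is a hand-written analogue, not KPP or TAMC output; the dot-product test ⟨Jδ, w⟩ = ⟨δ, Jᵀw⟩ is the standard check that an adjoint is the exact transpose of its tangent linear.

```python
def step(a, b, h, k):
    """Forward kernel: one explicit Euler step of A + B -> products with rate k*a*b."""
    r = k * a * b
    return a - h * r, b - h * r


def step_tl(a, b, da, db, h, k):
    """Tangent linear of step, linearized at the stored input state (a, b)."""
    dr = k * (da * b + a * db)
    return da - h * dr, db - h * dr


def step_ad(a, b, a_new_ad, b_new_ad, h, k):
    """Adjoint of step_tl: its statements transposed and applied in reverse order."""
    r_ad = -h * (a_new_ad + b_new_ad)   # both outputs depend on dr with weight -h
    a_ad = a_new_ad + k * b * r_ad      # transpose of dr = k*b*da + k*a*db
    b_ad = b_new_ad + k * a * r_ad
    return a_ad, b_ad


a, b, h, k = 2.0, 3.0, 0.1, 0.5          # stored forward state and parameters
da, db = 0.3, -0.7                       # tangent linear perturbation
wa, wb = 1.1, 0.4                        # adjoint input
ta, tb = step_tl(a, b, da, db, h, k)
sa, sb = step_ad(a, b, wa, wb, h, k)
lhs = ta * wa + tb * wb
rhs = da * sa + db * sb
print(lhs, rhs)                          # equal up to round-off
```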
 As a further practical advantage of the icosahedral grid, the adjoint of the semi-Lagrangian transport is particularly simple to treat, since there are no pole singularities. The polar problems reported by Tanguay and Polavarapu , and the measures taken to mitigate them, therefore do not apply.
 During the course of adjoint model integration, the gradient due to observations yn+1, which are available within the time interval [tn, tn+2], is added to the adjoint variable c*(tn+1), which then is propagated backward in time by means of the adjoint model,
Finally, at n − 1 = 0, the sought-after gradient ∇Jo = c*(t0) is obtained.
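The backward recursion can be sketched for a hypothetical toy model that evolves each cell independently by a nonlinear loss, with observations of the full state at every time step (H = identity, R = r·I). The forward sweep stores the states about which each adjoint step is linearized, mirroring the states written to disk in SACADA; the backward sweep adds the observation forcing and propagates the adjoint variable, so that its final value equals ∇Jo. The result is checked against finite differences.

```python
import numpy as np

H_STEP, RATE, R_OBS = 0.1, 0.8, 0.05       # hypothetical step, rate constant, obs variance


def model_step(x):
    """Toy nonlinear step: explicit Euler for dx/dt = -RATE * x**2, cell by cell."""
    return x - H_STEP * RATE * x**2


def model_step_adjoint(x, x_adj):
    """Adjoint of model_step, linearized about the stored input state x."""
    return (1.0 - 2.0 * H_STEP * RATE * x) * x_adj


def cost_obs(x0, obs):
    """J_o for observations obs[n] of the full state at time steps n = 0..N."""
    x, J = x0.copy(), 0.0
    for n, y in enumerate(obs):
        if n > 0:
            x = model_step(x)
        J += 0.5 * np.sum((x - y) ** 2) / R_OBS
    return J


def gradient_obs(x0, obs):
    """Gradient of J_o by one forward sweep and one backward (adjoint) sweep."""
    states = [x0.copy()]                   # forward sweep: keep every state
    for _ in obs[1:]:
        states.append(model_step(states[-1]))
    x_adj = np.zeros_like(x0)
    for n in range(len(obs) - 1, -1, -1):  # backward sweep, last time step first
        x_adj = x_adj + (states[n] - obs[n]) / R_OBS   # observation forcing at step n
        if n > 0:
            x_adj = model_step_adjoint(states[n - 1], x_adj)
    return x_adj                           # adjoint variable at t0 equals grad J_o


rng = np.random.default_rng(1)
x0 = 1.0 + 0.1 * rng.standard_normal(3)
obs = [rng.standard_normal(3) for _ in range(6)]
grad = gradient_obs(x0, obs)
eps = 1e-6
fd = np.array([(cost_obs(x0 + eps * e, obs) - cost_obs(x0 - eps * e, obs)) / (2 * eps)
               for e in np.eye(3)])
print(np.max(np.abs(fd - grad)))           # small: adjoint and finite differences agree
```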
 As a simplification, the adjoint formulation neglects the dependence of reaction rates on concentration levels. This concerns the photolysis rates, which depend on the overhead ozone column, as well as the heterogeneous reaction rates, which are calculated from particle size and composition. Extensive tests showed that this slightly inexact adjoint does not hamper the minimization procedure.
 The cost function is minimized with the quasi-Newton limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm devised by Nocedal  and Liu and Nocedal . The entire model setup is parallelized. On a PC cluster with 16 processors (AMD Opteron with InfiniBand interconnect), roughly representing general-purpose technology at the time of writing, the 4D-var assimilation of observational data collected during 24 h can be accomplished in less than 4 h, demonstrating the required computational efficiency of the setup.
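The core of L-BFGS is the two-loop recursion, which applies an inverse Hessian approximation built from the last m pairs of position and gradient differences without ever forming a matrix. The sketch below is a compact textbook version with a simple backtracking line search, not the Liu and Nocedal implementation used in SACADA; the test problem is a hypothetical, well-conditioned quadratic.

```python
import numpy as np


def lbfgs(fun_grad, x0, m=5, max_iter=200, tol=1e-8):
    """Minimize a smooth function with L-BFGS; fun_grad(x) returns (f, grad)."""
    x = np.asarray(x0, dtype=float)
    f, g = fun_grad(x)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Two-loop recursion: q becomes the approximate inverse Hessian times g
        q = g.copy()
        coeffs = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):   # newest pair first
            rho = 1.0 / (y @ s)
            alpha = rho * (s @ q)
            q -= alpha * y
            coeffs.append((rho, alpha))
        if s_hist:
            q *= (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])  # initial scaling
        for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(coeffs)):  # oldest first
            beta = rho * (y @ q)
            q += (alpha - beta) * s
        p = -q
        # Backtracking line search enforcing the Armijo sufficient-decrease condition
        step = 1.0
        while True:
            x_new = x + step * p
            f_new, g_new = fun_grad(x_new)
            if f_new <= f + 1e-4 * step * (g @ p) or step < 1e-10:
                break
            step *= 0.5
        s_new, y_new = x_new - x, g_new - g
        if s_new @ y_new > 1e-12:       # keep only pairs with positive curvature
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
    return x, f


rng = np.random.default_rng(2)
Q = rng.standard_normal((10, 10))
A = Q @ Q.T + 10.0 * np.eye(10)          # symmetric positive definite Hessian
b = rng.standard_normal(10)


def quadratic(x):
    return 0.5 * x @ A @ x - b @ x, A @ x - b


x_opt, f_opt = lbfgs(quadratic, np.zeros(10))
print(np.max(np.abs(x_opt - np.linalg.solve(A, b))))   # close to zero
```

Only the cost and its gradient are required, which is why the adjoint model suffices to drive the minimization.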
4.2. Formulation of the Cost Function
 The (de)correlation length or radius of influence L is the control parameter for the spatial spreading of information induced by the BECM. As the correlation lengths increase, the condition number of B increases rapidly [see, e.g., Elbern and Schmidt, 1999], and preconditioning of the minimization problem becomes mandatory. With the diffusion approach [Weaver and Courtier, 2001], the BECM is easily used for preconditioning through the transformation to the variable v0,

$$v_0 = B^{-1/2}\,\delta x_0 \qquad (11)$$
where δx0 ≔ x0 − xb and B = B^{1/2}B^{T/2}. Here, B^{T/2} denotes the transpose of the square root B^{1/2}. The cost function is invariant under this transformation, i.e., J(x0) = J(v0), while the gradient now reads

$$\nabla_{v_0} J = v_0 + B^{T/2}\,\nabla_{x_0} J_o \qquad (12)$$
After each call of the minimization routine, the new output v0 is saved for the next iteration step, and the initial values are obtained as x0 = B^{1/2}v0 + xb. Selecting x0 = xb as the first guess atmospheric state results in v0 = 0 for the first iteration. Consequently, the transformation (11), and hence B^{-1/2}, is never explicitly needed.
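The effect of the transformation (11) can be verified on a small dense problem. In the sketch below a Cholesky factor stands in for B^{1/2}; this is an assumption for illustration only, since SACADA obtains the square root from the diffusion operator. The example checks that the cost is invariant, that Jb reduces to v·v/2, and that only B^{1/2} and its transpose are ever applied.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
G = rng.standard_normal((n, n))
B = G @ G.T + n * np.eye(n)          # hypothetical SPD background error covariance
B_half = np.linalg.cholesky(B)       # lower triangular, B = B_half @ B_half.T
xb = rng.standard_normal(n)
y = rng.standard_normal(n)           # observations of the full state (H = identity)
R_inv = np.eye(n) / 0.2


def jo_and_grad(x0):
    """Observation cost and its gradient with respect to x0 (single time, H = I)."""
    d = y - x0
    return 0.5 * d @ R_inv @ d, -R_inv @ d


def cost_v(v):
    """Preconditioned cost and gradient in v; B^-1 and B^-1/2 are never needed."""
    x0 = B_half @ v + xb
    jo, grad_x = jo_and_grad(x0)
    return 0.5 * v @ v + jo, v + B_half.T @ grad_x


def cost_x(x0):
    """Original cost for comparison, using an explicit solve with B."""
    dx = x0 - xb
    return 0.5 * dx @ np.linalg.solve(B, dx) + jo_and_grad(x0)[0]


v = rng.standard_normal(n)
J_v, grad_v = cost_v(v)
print(abs(J_v - cost_x(B_half @ v + xb)))   # invariance J(x0) = J(v0), up to round-off
```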
4.3. Correlation Modeling Using a Diffusion Approach
 In practice, BECM modeling by the diffusion paradigm [Weaver and Courtier, 2001] satisfies three fundamental needs: (1) it replaces an intractably large matrix with an operator of only linear complexity while ensuring positive definiteness and symmetry, (2) it allows a flexible design of anisotropic and inhomogeneous correlation lengths, and (3) it provides a square-root decomposition for preconditioning. Given the prominence of this technique in this study, the method is outlined below.
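Weaver and Courtier  make use of the fundamental solution ψ(z, t), at location z and time t, of the diffusion equation, written for clarity in one-dimensional form with diffusion coefficient ν,
$$\frac{\partial \psi}{\partial t} - \nu\,\frac{\partial^{2} \psi}{\partial z^{2}} = 0, \tag{13}$$
which is given by the convolution of ψ(z, 0) with a Gaussian,
$$\psi(z, t) = \frac{1}{\sqrt{4\pi\nu t}} \int_{-\infty}^{\infty} \exp\!\left(-\frac{(z - z')^{2}}{4\nu t}\right) \psi(z', 0)\, dz'.$$
After normalization, a Gaussian function defines a valid correlation function,
$$\rho(z) = \exp\!\left(-\frac{z^{2}}{2L^{2}}\right), \qquad L^{2} = 2\nu t,$$
with L² acting as the square of a correlation length scale.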
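The link between diffusion and Gaussian correlations can be checked numerically. The sketch below, assuming a uniform 1-D grid, a diffusion coefficient ν and forward Euler time steps (all chosen for illustration), diffuses a unit impulse for a pseudo-time t with 2νt = L², normalizes the result to unity at the origin, and compares it with exp(−z²/(2L²)).

```python
import numpy as np

n, dz = 201, 1.0
L = 10.0                                  # target correlation length in grid units
nu_dt = 0.25 * dz ** 2                    # nu * dt, below the explicit stability limit 0.5 * dz^2
M = int(round(L ** 2 / (2.0 * nu_dt)))    # number of steps so that 2 * nu * (M * dt) = L^2

psi = np.zeros(n)
psi[n // 2] = 1.0                         # unit impulse at the centre of the domain
for _ in range(M):
    lap = np.zeros(n)
    lap[1:-1] = (psi[2:] - 2.0 * psi[1:-1] + psi[:-2]) / dz ** 2
    psi = psi + nu_dt * lap               # forward Euler step of d(psi)/dt = nu d2(psi)/dz2

z = (np.arange(n) - n // 2) * dz
rho = psi / psi[n // 2]                   # normalize so that rho(0) = 1
max_err = np.max(np.abs(rho - np.exp(-z ** 2 / (2.0 * L ** 2))))
print(M, max_err)                         # max_err is small (well below 1%)
```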
 The same considerations apply to the solution of the two-dimensional diffusion equation on the sphere. Weaver and Courtier  further demonstrated that a quasi-Gaussian covariance operator can be approximated by a diffusion operator 𝓛 that solves (13) numerically. The BECM can be factorized into the standard deviation matrix Σ and the correlation matrix C,
$$B = \Sigma\, C\, \Sigma.$$
4.3.1. Discretized Formulation
 In a formalized notation, the solution ψ of the discretized diffusion equation after M time steps of length Δt can be written as
$$\psi_M = (I + \nu\,\Delta t\,D)^{M}\,\psi_0 = \mathcal{L}\,\psi_0,$$
with D denoting a discrete representation of the Laplace operator.
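 In discretized form, the Laplace operator is self-adjoint with respect to the W metric and therefore satisfies ⟨WDψ1, ψ2⟩ = ⟨ψ1, WDψ2⟩ = ⟨DᵀWψ1, ψ2⟩, where W ≔ diag(Δs1, ..., Δsn) is a diagonal matrix containing the volumes of the corresponding grid cells and ⟨·, ·⟩ denotes the canonical scalar product. The square-root formulation of the discrete diffusion operator 𝓛 needed for preconditioning follows from the fact that, for even M,
$$\mathcal{L}\,W^{-1} = \mathcal{L}^{1/2}\,W^{-1}\,\mathcal{L}^{T/2} = \left(\mathcal{L}^{1/2}W^{-1/2}\right)\left(\mathcal{L}^{1/2}W^{-1/2}\right)^{T}, \qquad \mathcal{L}^{1/2} = (I + \nu\,\Delta t\,D)^{M/2}.$$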
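However, integration of the diffusion equation by 𝓛 does not satisfy the autocorrelation condition ρi,i = 1. To ensure this, a diagonal normalization matrix Λ is introduced, defining the correlation matrix
$$C = \Lambda\,\mathcal{L}\,W^{-1}\Lambda = \left(\Lambda\mathcal{L}^{1/2}W^{-1/2}\right)\left(\Lambda\mathcal{L}^{1/2}W^{-1/2}\right)^{T},$$
so that B^{1/2} = ΣΛ𝓛^{1/2}W^{−1/2}.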
In the idealized case of a homogeneous and isotropic radius of influence, Λ reduces to a constant multiple of the identity that restores the unit peak of the normalized Gaussian (in one dimension, Λ ≈ (2πL²)^{1/4} I). When the homogeneity and isotropy simplifications are relaxed, as is of interest here (see section 4.3.2), other techniques must be invoked (see section 4.3.3).
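The discrete construction can be sketched on a hypothetical non-uniform 1-D grid. The example below assumes a flux-form finite-volume Laplacian D with zero-flux boundaries (one way to obtain W-self-adjointness, not necessarily the SACADA discretization) and forward Euler steps, 𝓛 = (I + νΔt D)^M with M even. It checks that WD is symmetric, that 𝓛W^{−1} factorizes as (𝓛^{1/2}W^{−1/2})(𝓛^{1/2}W^{−1/2})ᵀ, and that the normalization matrix Λ = diag(𝓛W^{−1})^{−1/2} yields a symmetric correlation matrix with unit diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
ds = 1.0 + 0.5 * rng.random(n)          # hypothetical non-uniform cell widths (diagonal of W)
zc = np.cumsum(ds) - 0.5 * ds           # cell centres
h = np.diff(zc)                         # centre-to-centre distances
W = np.diag(ds)

# Flux-form discrete Laplacian with zero-flux boundaries; W @ D is symmetric by construction
D = np.zeros((n, n))
for i in range(n - 1):
    c = 1.0 / h[i]                      # flux coefficient between cells i and i+1
    D[i, i] -= c / ds[i]
    D[i, i + 1] += c / ds[i]
    D[i + 1, i + 1] -= c / ds[i + 1]
    D[i + 1, i] += c / ds[i + 1]

nu_dt, M = 0.2, 40                      # illustrative values; M must be even
L_half = np.linalg.matrix_power(np.eye(n) + nu_dt * D, M // 2)   # (I + nu dt D)^{M/2}
L_op = L_half @ L_half                                           # (I + nu dt D)^M

W_inv = np.diag(1.0 / ds)
F = L_half @ np.diag(1.0 / np.sqrt(ds)) # square-root factor L^{1/2} W^{-1/2}
LW = L_op @ W_inv

Lam = np.diag(1.0 / np.sqrt(np.diag(LW)))   # exact normalization matrix
C = Lam @ LW @ Lam                          # correlation matrix with unit diagonal
```

Because 𝓛W^{−1} is built as a product of a factor and its transpose, symmetry and positive semidefiniteness follow from the construction rather than from the numerics.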
 To use the diffusion equation as a correlation model, a three-dimensional diffusion operator 𝓛 and its square-root decomposition have to be constructed; this is done by applying the horizontal and vertical diffusion operators 𝓛h and 𝓛v alternatingly.
4.3.2. Flow-Dependent Inhomogeneous and Anisotropic Correlations
 Radii of influence are introduced into background error covariance modeling not only to achieve smooth fields, but also to spread the information of observations over a reasonable and statistically supported distance. This is based on the assumption that fairly homogeneous air masses with similar concentration levels of chemical compounds show high correlation values. As air masses are subject to atmospheric dynamics, a flow-dependent background error covariance parameterization is desirable for better exploitation of measurements. Given the linkage between model dynamics and background error covariances, it is reasonable to assume that background errors are more strongly correlated between air parcels belonging to the same air mass [Riishøjgaard, 1998]. Ertel's potential vorticity (PV) provides a means to distinguish between different air masses; it is defined by
$$\mathrm{PV} = \frac{1}{\rho}\,\boldsymbol{\zeta}\cdot\nabla\theta \approx -g\,(\mathbf{k}\cdot\boldsymbol{\zeta})\,\frac{\partial\theta}{\partial p}, \tag{20}$$
where θ is the potential temperature, k the vertical unit vector, ζ the vector of absolute vorticity, ρ the air density, and g the gravitational acceleration. For stratospheric conditions, ∇θ is nearly aligned with the vertical direction. In the lower stratosphere larger deviations may occur, but Elbern et al.  found the approximation in (20) to be accurate within 15% even in this region. As potential vorticity is conserved for adiabatic and frictionless motion, the chemical composition of individual air parcels can be expected to be correlated along lines of constant potential vorticity, as long as diabatic processes are weak enough. Sankey and Shepherd  investigated the correlation between different long-lived species and found it to depend on height and latitude. Since in our context the correlation between PV and species is only invoked locally and the assimilation interval does not exceed 24 h, our assumptions appear justifiable. It would clearly be desirable to assign each species its own correlation lengths. However, owing to a lack of such knowledge, our present approach rests on the assumption that the chemical scenario and conditions are sufficiently uniform within a contiguous air mass of equal potential vorticity level.
 A disadvantage of using PV to discriminate between different air masses is that the term ∂θ/∂p varies exponentially with height. Therefore, a modified potential vorticity is used, defined according to Lait  as
$$P = \mathrm{PV}\left(\frac{\theta}{\theta_0}\right)^{-(1 + 1/\kappa)}, \tag{21}$$
with κ = R/cp, where R is the gas constant for air and cp is the specific heat at constant pressure. The dimensionless scaling factor (θ/θ0)^{−(1+1/κ)} (with θ0 arbitrarily chosen as 420 K) removes most of the altitude dependence of P while preserving its conservation properties (see Lait  for details). An example of zonal mean modified PV following (21) for 18 November 2002 is presented in Figure 3, where the vertical dependence is considerably attenuated.
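The scaling of modified PV is simple to evaluate. This sketch assumes the Lait-type form P = PV (θ/θ0)^{−(1+1/κ)} with κ = R/cp (an exponent close to −9/2) and standard dry-air constants; the PV values are hypothetical, chosen so that PV grows rapidly with θ while P stays of similar magnitude.

```python
R_AIR = 287.05      # gas constant for dry air [J kg-1 K-1] (assumed value)
CP_AIR = 1004.6     # specific heat at constant pressure [J kg-1 K-1] (assumed value)
THETA0 = 420.0      # reference potential temperature [K], as quoted in the text

def modified_pv(pv, theta, theta0=THETA0):
    """Lait-type modified PV, P = PV * (theta/theta0)**(-(1 + 1/kappa)) with kappa = R/cp."""
    kappa = R_AIR / CP_AIR
    return pv * (theta / theta0) ** (-(1.0 + 1.0 / kappa))

# Hypothetical PV values [PVU] at three isentropic levels [K]
for theta, pv in [(420.0, 5.0), (600.0, 25.0), (850.0, 120.0)]:
    print(theta, pv, round(modified_pv(pv, theta), 2))
```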
 To account for anisotropic and inhomogeneous background error correlations a symmetric coordinate stretching tensor S is introduced in the horizontal two-dimensional diffusion equation.
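Tensor S is composed of a diagonal tensor containing the stretching factors s1 and s2 and a rotation tensor T, which rotates the local coordinate system such that the stretching can be applied along the two coordinate axes of the rotated system:
$$S = T \begin{pmatrix} s_{1} & 0 \\ 0 & s_{2} \end{pmatrix} T^{T}, \qquad T = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \tag{22}$$
where α is the rotation angle.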
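The tensor in (22) can be built and checked directly. In this sketch the stretching factors are written s1 and s2, the values s1 = 4, s2 = 0.25 and Lh = 600 km are taken from the case study settings, and the angle of 30° is arbitrary. Because the diffusivity, and hence the squared correlation length, is multiplied by the eigenvalues of S, the effective length scales along the rotated axes are Lh√s1 and Lh√s2.

```python
import numpy as np

def stretching_tensor(alpha, s1, s2):
    """S = T diag(s1, s2) T^T, with T the 2-D rotation by alpha (radians), cf. (22)."""
    c, s = np.cos(alpha), np.sin(alpha)
    T = np.array([[c, -s], [s, c]])
    return T @ np.diag([s1, s2]) @ T.T

alpha = np.deg2rad(30.0)
S = stretching_tensor(alpha, 4.0, 0.25)
eigval, eigvec = np.linalg.eigh(S)            # ascending eigenvalues: 0.25, then 4.0

# Diffusion with S multiplies the local diffusivity, hence L^2, by s1 or s2 along the rotated axes
L_h = 600.0                                   # km
print(L_h * np.sqrt(eigval))                  # effective length scales along the two axes
```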
 The discrete three-dimensional diffusion operator takes the same form as the isotropic scheme, with the exception that the horizontal Laplacian operator Dh is replaced by a discrete representation of div (S gradh).
4.3.3. Normalization Matrix
 An algorithm that calculates an approximation of the normalization factors has been proposed by Weaver and Courtier : an ensemble of Q random vectors vq with zero mean and unit variance is generated, and the transformed ensemble ηq ≔ 𝓛^{1/2}W^{−1/2}vq is calculated. Since E[v] = 0 and E[vvᵀ] = I, where E denotes the expectation value, E[ηηᵀ] = 𝓛W^{−1}, so the diagonal elements di of 𝓛W^{−1} may be estimated from
$$d_i \approx \frac{1}{Q}\sum_{q=1}^{Q}\left(\eta_q\right)_i^{2}. \tag{23}$$
If v is a Gaussian random vector, the relative standard deviation of the normalization factors 1/√di derived on the basis of (23) can be shown to be approximately 1/√(2Q) [see Weaver and Courtier, 2001, and references therein]. Hence, an efficient way to approximate the normalization matrix is to set Λ = diag(1/√d1, ..., 1/√dn).
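The randomization estimate (23) can be sketched for a simple 1-D operator with unit cell volumes (W = I), where the exact normalization factors are cheap to compute for comparison. The grid size, νΔt = 0.25, M = 32, the helper name and the random seed are illustrative assumptions; with Q = 5000 the relative error of the estimated factors should have a standard deviation close to 1/√(2Q) = 1%.

```python
import numpy as np

rng = np.random.default_rng(3)
n, nu_dt, M = 80, 0.25, 32                  # illustrative 1-D grid with unit cells (W = I)
D = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n)
D[0, 0] = D[-1, -1] = -1.0                  # zero-flux boundaries keep D symmetric
L_half = np.linalg.matrix_power(np.eye(n) + nu_dt * D, M // 2)

# Exact normalization factors 1 / sqrt(diag(L W^{-1})), here diag(L_half @ L_half.T)
exact = 1.0 / np.sqrt(np.einsum("ij,ij->i", L_half, L_half))

def randomized_normalization(L_half, Q, rng):
    """Hypothetical helper: estimate 1/sqrt(diag(L W^{-1})) from Q Gaussian samples."""
    v = rng.standard_normal((L_half.shape[1], Q))   # zero mean, unit variance
    eta = L_half @ v                                # transformed ensemble L^{1/2} W^{-1/2} v
    d = np.mean(eta ** 2, axis=1)                   # estimate of diag(L W^{-1}), cf. (23)
    return 1.0 / np.sqrt(d)

Q = 5000
rel_err = randomized_normalization(L_half, Q, rng) / exact - 1.0
print(rel_err.std(), 1.0 / np.sqrt(2 * Q))          # both close to 0.01
```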
4.3.4. BECM Implementation
 Two background error covariance parameterizations, an isotropic scheme and a generalized scheme based on (22), have been developed for the SACADA system. Normalization factors for the anisotropic scheme are computed with the randomization method (23). The distribution of relative differences between the randomized normalization factors and the exact solution is shown in Figure 4 for different numbers of ensemble members Q. The standard deviations of the three tested realizations agree well with the theoretical value 1/√(2Q). For Q = 5000, the ensemble size selected for subsequent tests, the resulting standard deviation is 1%; hence, the probability that a single normalization factor differs by more than 3% from the exact value is less than 1%. The computational effort required to generate an ensemble of this size depends on the diffusion length scale Lh and the stretching factors s1 and s2, but is generally comparable to the time consumed by one iteration of the assimilation algorithm.
 The rotation angle α is calculated at each grid point as the angle between the gradient of the potential vorticity and the direction to the north of the local coordinate system. Hence, according to (22), Tᵀ∇hψ is the gradient of an arbitrary scalar field ψ transformed into a rotated (x̃, ỹ) coordinate system whose ỹ axis is aligned with the direction of the PV gradient. Consequently, the stretching factors s1 and s2 specify the stretching or shrinking of the coordinates x̃ and ỹ, that is, perpendicular and parallel to the PV gradient, respectively. In the current version of the SACADA system, the stretching (the departure of s1 and s2 from 1) decreases linearly as the PV gradient magnitude decreases from |(∇P)max| to |(∇P)max|/5, where |(∇P)max| is the absolute value of the maximum gradient of modified PV over the model domain. At locations where the PV gradient is smaller than 20% of this maximum gradient, no coordinate stretching is applied.
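The rule for the rotation angle and the stretching factors can be sketched as two small functions. Several details are assumptions not stated in the text: α is counted counterclockwise from local north so that the second column of T in (22) points along ∇P, the gradient components are given in local east and north directions, and s1 and s2 are interpolated linearly between their full values (4 and 0.25) at |(∇P)max| and 1 at |(∇P)max|/5.

```python
import math

def pv_rotation_angle(dP_east, dP_north):
    """Rotation angle alpha (radians), counted counterclockwise from local north to grad P.

    With this convention the second column of T in (22), (-sin a, cos a), points along grad P.
    """
    return math.atan2(-dP_east, dP_north)

def stretching_factors(grad_abs, grad_max, s1_max=4.0, s2_min=0.25):
    """Stretching factors (s1, s2), relaxed linearly toward isotropy (1, 1).

    Full stretching at grad_max, none at or below grad_max / 5 (assumed linear ramp in s).
    """
    lo = grad_max / 5.0
    if grad_abs <= lo:
        return 1.0, 1.0                              # weak PV gradient: isotropic diffusion
    w = min((grad_abs - lo) / (grad_max - lo), 1.0)  # 0 at grad_max/5, 1 at grad_max
    return 1.0 + w * (s1_max - 1.0), 1.0 + w * (s2_min - 1.0)

# Hypothetical PV gradient (east, north components) at one grid point
gx, gy = 3.0, 4.0
a = pv_rotation_angle(gx, gy)
print((-math.sin(a), math.cos(a)))                  # approximately (0.6, 0.8), the unit gradient
print(stretching_factors(math.hypot(gx, gy), grad_max=10.0))
```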
 As an example, the horizontal correlations generated by the two schemes are shown in Figure 5. Isotropic correlations with a horizontal length scale of 600 km are displayed at the top, and the outcome of the anisotropic scheme using the settings Lh = 600 km, s1 = 4, and s2 = 0.25 is shown at the bottom. The meteorological situation of 18 September 2002 was used to calculate the anisotropic correlations; the corresponding PV field is shown by isopleths in Figure 5. The larger correlation between grid cells along the edge of the polar vortex is clearly visible.
5. Data Description
 The SACADA assimilation system has been tested and evaluated by means of two case studies. Data from three different limb viewing instruments, namely MIPAS, HALOE, and SAGE II, have been used for this purpose. MIPAS has been the data source for assimilation case studies presented in this work. Data from SAGE II and HALOE have been withheld for use as independent (not assimilated) control data sets.
5.1. MIPAS
 MIPAS is a Fourier-transform spectrometer measuring high-resolution emission spectra in the mid-infrared from 4.1 to 14.7 μm wavelength in limb-viewing mode. A detailed description of the instrument design is given by Fischer and Oelhaf . In 2002, MIPAS was placed into a Sun-synchronous polar orbit with an inclination of 98.55°. The orbit is almost circular at about 800 km altitude, resulting in about 14.3 orbits per day. The descending node of the orbit (crossing of the equatorial plane from north to south) is at 10:00 local time. The instrument's field of view at the tangent point is about 30 km in the horizontal and 3 km in the vertical direction. A single limb scan covers an altitude range from approximately 6 to 68 km in 17 steps. The trace gas profiles operationally delivered by ESA (versions 4.61/4.62), retrieved as described by Ridolfi et al.  and Raspollini et al. , are assimilated in this system validation study. These data products contain profiles of O3, NO2, CH4, HNO3, H2O, and N2O, all of which are assimilated.
5.2. SAGE II and HALOE
 The SAGE II instrument was launched in October 1984 on board the Earth Radiation Budget Satellite (ERBS) spacecraft. The instrument operated in the UV, visible, and near-infrared (385–1020 nm) wavelength region using the solar occultation technique; after technical problems in July 2000, it provided a maximum of 15 profiles per day. Owing to the orbit geometry, tangent point locations vary slowly from 70°N to 70°S within approximately one month. Retrieved profiles of aerosol extinction, O3, NO2, and H2O are provided.
 HALOE was in operation from October 1991 to November 2005 on board the Upper Atmosphere Research Satellite (UARS). The sensor performed solar occultation measurements at sunrise and sunset in the infrared wavelength region and profiles of O3, HCl, CH4, H2O, NO, NO2, HF, temperature and aerosol extinction have been derived routinely. During nominal operation, about 30 occultation events per day have been recorded. As in the case of SAGE II, the latitude of tangent point location changes slowly from day to day, covering a range of 80°N to 80°S within approximately one month. The latest data product release (version 19) from the NASA Langley Research Center has been used for validation.
6. Case Studies
 The performance of the 4D-var system, both in terms of analysis skill and numerical efficiency, is demonstrated for two selected episodes. The two time intervals cover 1 September to 15 October 2002, hereafter referred to as CS1, with two data gaps (9 to 11 September and 29 September to 11 October), and 21 October to 30 November 2003, referred to as CS2. During CS1, at the end of September 2002, a major stratospheric warming occurred in the Southern Hemisphere, which caused the Antarctic polar vortex to split into two fragments around 25 September [e.g., Allen et al., 2003; Bencherif et al., 2007].
6.1. Experimental Setup and Cost Function Reduction
 The 4D-var analysis window is 24 h long, and the meteorological simulation is initialized at the start of each window with fields from European Centre for Medium-Range Weather Forecasts (ECMWF) analyses. The analyzed chemical fields at the end of one assimilation window are used as the background for the next. At the beginning of each case study, chemical initial values provided by the two-dimensional SOCRATES model [Brasseur et al., 1995] are adopted. After interpolation of these data onto the icosahedral grid, a 48 h model spin-up run was performed for chemical relaxation, followed by a second spin-up phase of 6 days with data assimilation. During the spin-up phase, the PSC scheme was turned off until day five. Both episodes have been analyzed twice: in the first experiment, traditional isotropic covariances with a radius of influence of Lh = 600 km were applied, and in the second experiment, PV-structure-controlled anisotropic covariances were used. In the latter case, the local coordinate stretching factors were set to s1 = 4 and s2 = 0.25. Since the correlation length scales with the square root of the stretching factor, this corresponds to halving the correlation length in the direction of the local PV gradient (300 km) and doubling it along PV isopleths (1200 km). The relative background error standard deviation was set to εb = 40%, and the vertical correlation length to Lv = 1.5 km, for both case study periods. Control model runs without data assimilation, starting from the final analysis of the spin-up assimilation, have been performed for both case studies.
 The observation error covariance matrix R has been simplified to diagonal form, with the variances taken from the respective data products. As observations with unrealistically small errors receive too much weight in the data assimilation procedure, a minimum relative error is assumed for the MIPAS data products. Different values for this minimum have been tested; the results presented here have been obtained with MIPAS retrieval errors increased to a minimum of 10% relative error.
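The error floor for R amounts to a single vectorized operation. In this sketch the function name and the example values are hypothetical; only the 10% floor is taken from the text.

```python
import numpy as np

def observation_error_variances(values, retrieval_sigma, min_rel_err=0.10):
    """Diagonal of R: retrieval errors raised to at least min_rel_err * |value|, then squared."""
    values = np.asarray(values, dtype=float)
    sigma = np.maximum(np.asarray(retrieval_sigma, dtype=float), min_rel_err * np.abs(values))
    return sigma ** 2

# Hypothetical ozone retrievals (volume mixing ratio) and their reported 1-sigma errors
vmr = np.array([5.0e-6, 2.0e-6, 8.0e-8])
sigma_ret = np.array([1.0e-7, 4.0e-7, 2.0e-8])
r_diag = observation_error_variances(vmr, sigma_ret)
```

Only the first profile point is affected here: its reported error of 2% is raised to the 10% floor, while the other two already exceed it.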
 In 4D-var, the background errors of the chemical species, that is, their standard deviations, must be selected a priori, but they can be checked a posteriori by χ2 validation following Talagrand . To perform this test, the cost function values are normalized by the number of available observations p, which makes the results comparable among different days. Following Talagrand , at the minimum of the cost function (1), that is, for the most probable chemical state or analysis xa, the value Jpa ≔ J(xa)/p = 1/2 should be expected if properly defined covariance matrices R and B had been used. The statistical evaluation is also addressed, by a novel and more focused approach, in part II of this study.
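The χ2 criterion can be illustrated with a Monte Carlo experiment on a hypothetical linear problem in which B and R are correctly specified: the analysis is computed with the optimal gain, which minimizes the quadratic cost function, and the normalized minimum J(xa)/p is averaged over many random realizations of truth and observations. Dimensions, covariances, and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, n_trials = 30, 20, 2000

# Hypothetical linear problem in which B and R are correctly specified
A = rng.standard_normal((n, n))
B = A @ A.T / n + 0.1 * np.eye(n)
R = np.diag(rng.uniform(0.5, 2.0, p))
H = rng.standard_normal((p, n)) / np.sqrt(n)
x_b = np.zeros(n)

B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)      # gain: x_a = x_b + K (y - H x_b) minimizes J
B_chol, R_chol = np.linalg.cholesky(B), np.linalg.cholesky(R)

j_norm = np.empty(n_trials)
for k in range(n_trials):
    x_true = x_b + B_chol @ rng.standard_normal(n)     # truth drawn from N(x_b, B)
    y = H @ x_true + R_chol @ rng.standard_normal(p)   # observations with errors from N(0, R)
    x_a = x_b + K @ (y - H @ x_b)
    dxa, da = x_a - x_b, y - H @ x_a
    j_norm[k] = (0.5 * dxa @ B_inv @ dxa + 0.5 * da @ R_inv @ da) / p

print(j_norm.mean())    # close to 1/2 for consistent B and R
```

If R were underestimated, for example by using R/4 in place of R in both the gain and the cost function, the mean of J(xa)/p would exceed 1/2; this drift is what the a posteriori check detects.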
Figure 6 presents the evolution of the costs of both the background and assimilation-based runs over the days of episode CS1, also displaying the effect of the temporal data gaps. While the costs of the first days of the assimilation spin-up period deviate strongly from the target value Jpa = 1/2, they rapidly approach the desired value.
 During the 6 day spin-up assimilation, the normalized background costs, for which the background term of the cost function vanishes (Jb = 0),
$$J_p^{f} := J(x_b)/p,$$
decrease rapidly within the first few days. Note that Jpf gives the mean square difference between the background-based forward run and the observations, weighted by the observation error covariances. This quantity can be regarded as a quadratic measure of 24 h forecast skill, which depends largely on the quality of the previous analysis. During the rest of the period, the minimization reduces the cost function by a factor of approximately 2–3, with little day-to-day variation.
 The impact of the data assimilation procedure compared with free control forecasts can be estimated from the data gaps of 3 and 14 days. To bridge these gaps, the model was operated in forecast mode starting from the last available analysis. Consequently, Jpf on 12 September 2002 and 12 October 2002 reflects the skill of a 4 day and a 2 week forecast, respectively. Jpf for the 4 day forecast is less than twice as large as on other days, and even after bridging the 14 day data gap this factor does not exceed 3. This demonstrates the beneficial impact of data assimilation.
6.2. Flow-Dependent BECM Parameterization
 The benefit of the anisotropic background error covariance formulation is best assessed at the elongated dynamical structures along the polar vortex edges, where “collars” of elevated ozone mixing ratios occur. The impact of the flow-dependent, anisotropic, and inhomogeneous formulation versus the isotropic formulation can be seen in Figure 7, which compares the ozone analyses for 18 September 2002 at 44 hPa obtained with the isotropic and with the anisotropic BECM parameterization, together with the potential vorticity and the 24 h forecast. The model forecast clearly exhibits prevailing zonally elongated structures. While these structures are reproduced with the anisotropic BECM formulation, the isotropic version distorts filamentary structures at the vortex edge. Given the polar orbit of ENVISAT and many other typical environmental satellites, the zonal extension of the influence radii has a desirable complementary effect by propagating information zonally, between the satellite tracks. A second example, taken from case study CS2 at 55 hPa on 23 October 2003, is given in Figure 8.
 For independent validation, two SAGE II profile retrievals close to the southern polar vortex edge were available for this day; their positions are indicated by a square and a triangle in Figure 8. Figure 9 compares assimilation results obtained with homogeneous and isotropic versus inhomogeneous and anisotropic correlation modeling. In both cases, the differences from the SAGE II profiles are substantially smaller with the inhomogeneous and anisotropic correlation model than with the homogeneous and isotropic approach.
 It should be noted that the parameters of the diffusion scheme are still tentatively estimated: the basic correlation length scales Lh and Lv as well as the relative background error εb, while easily variable, are held constant over the model domain, and the coordinate stretching factors s1 and s2 in (22) are calculated using a simple dependence on the gradient of potential vorticity.
 Nevertheless, it can be clearly observed that the elongated gradient structures of the ozone field along the edge of the polar vortex are maintained if the anisotropic BECM formulation is applied, and the collar of high ozone volume mixing ratios around the vortex does show a typical streamer structure. Hence, the benefit of the anisotropic BECM formulation is visible, at least in the region around the polar vortex.
6.3. Statistical Evaluation
 The validation comprises assimilation results after model and data assimilation spin-up runs. Emphasis will be placed on ozone, complemented by CH4, N2O, NO2, H2O, and HNO3. The following validation is separated into consistency tests with respect to assimilated MIPAS retrievals and, in section 6.3.2, validation with independent data from SAGE II and HALOE.
6.3.1. Self-Consistency Validation of Analyses by Assimilated MIPAS Retrievals
Figure 10 presents four examples of averaged volume mixing ratio profiles for CS1: O3, CH4, H2O, and NO2, the latter for nighttime conditions. Profiles of the analyses, the control run, and the MIPAS retrievals are averaged over the 8 September to 15 October 2002 time span and over the latitude belts indicated. Shaded error margins are taken from the retrieval files, but raised to the 10% minimum defined for R. The selected altitude averaging intervals of 3–5 km are compatible with the vertical resolution of the MIPAS instrument. Overall, the analyses closely follow the centerline of the retrievals, with some exceptions. In the case of ozone, the control run reveals a model tendency to underestimate the retrieved ozone concentrations at middle stratospheric and lower mesospheric levels, which is corrected by the assimilation procedure. This is also seen in BASCOE analyses [Errera et al., 2008]. In the case of methane, the differences between analysis and control run are small, but indicate a small secondary maximum at 70 hPa. For water vapor, significant discrepancies in the lower stratosphere between the control simulations on the one hand and retrievals and analyses on the other hand indicate spurious upward transport of water vapor through the tropopause, presumably due to numerical diffusion. This effect is markedly reduced, albeit not eliminated, by data assimilation. As an example of a species with a pronounced diurnal cycle, NO2 profiles are presented, exhibiting significant discrepancies between control run and retrieval in the lower mesosphere. Again, the analyses agree closely with the retrievals.
Figure 11 displays the relative difference profiles of assimilation results minus assimilated MIPAS retrievals, normalized by retrieved concentration levels: (A − O)/O. The associated standard deviation profiles are depicted by symmetric graphs as well. As additional information, differences between observations and control run as well as first guess/background run profiles are included.
 Ozone analyses and background values generally agree well with the retrievals, as can be seen in Figure 11a. Major relative deviations from the retrievals occur at lower stratospheric and lower mesospheric levels; given the low concentrations at those levels, however, the differences are marginal in absolute terms. Nevertheless, the tendency of modeled lower mesospheric ozone concentrations to be lower than those retrieved by MIPAS is visible in all latitude belts, especially at tropical latitudes, indicating a systematic discrepancy between MIPAS and the model. An example of the beneficial effect of data assimilation is the striking improvement of ozone analyses in the southern polar vortex at about 50 hPa. Here, and in other latitude belts, the control run exhibits a marked peak, reflecting a difference between the control run on the one hand and MIPAS retrievals, analyses, and short-term forecasts on the other hand.
 The assimilation of MIPAS methane retrievals yields substantial improvements in both polar regions. Figure 11b shows significant deviations between the analyses or assimilation-based forecasts on the one hand and the control run on the other hand at all height levels in the southern polar region, and above the stratopause in the northern polar region. All other latitude belts show far better agreement, with insignificant deviations of the analyses and background runs from MIPAS.
 Water vapor data assimilation results are given in Figure 11c. While the MIPAS retrievals agree well to excellently with both the assimilation results and the control run at most height levels, visible deviations occur in the lower mesosphere, and pronounced deviations in the tropopause and lowest stratosphere region.
 Mean profiles of NO2 relative to MIPAS retrievals have been separated into nighttime and daytime profiles, as NO2 shows a strong diurnal cycle with higher values at night. Note that sunset at the top of the model domain (65 km) occurs at zenith angles of about 100°. To exclude twilight observations, nighttime profiles were defined by zenith angles greater than 110°, and daytime profiles by zenith angles smaller than 90°. The assimilation of photochemically reactive constituents is generally more challenging than that of more inert gases, as the tangent-linear approximation implied by the use of the adjoint model may become poor, especially if an observation took place during daylight while the analysis at the beginning of the assimilation window is valid for local nighttime conditions, or vice versa. Nevertheless, the analyses of both nighttime and daytime NO2 agree well with the observations, as displayed for nocturnal NO2 in Figure 11. The latitude belt 30°S–60°N is an exception at the highest model levels. Overall, these results agree with those found by Errera et al. .
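The day/night selection, together with the geometric sunset angle quoted for 65 km, can be sketched as follows. The sketch assumes a spherical Earth of radius 6371 km and ignores refraction; the thresholds of 110° and 90° are those given in the text, while the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed spherical Earth, no refraction)

def geometric_sunset_sza(altitude_km):
    """Solar zenith angle (deg) at which the Sun sets geometrically for a point at altitude_km."""
    return 90.0 + math.degrees(math.acos(EARTH_RADIUS_KM / (EARTH_RADIUS_KM + altitude_km)))

def classify_profile(sza_deg, night_min=110.0, day_max=90.0):
    """Return 'night', 'day', or None for twilight profiles, which are excluded."""
    if sza_deg > night_min:
        return "night"
    if sza_deg < day_max:
        return "day"
    return None

print(geometric_sunset_sza(65.0))
print([classify_profile(a) for a in (45.0, 95.0, 120.0)])
```

The geometric sunset angle at 65 km comes out slightly above 98°, consistent with the value of about 100° quoted above and safely below the 110° nighttime threshold.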
 Adopting the assumption of Gaussian probability density functions (PDF) of errors and the resulting quadratic form of the cost function (1) implies that the PDF of analysis errors is also Gaussian. These underlying statistical assumptions are necessary conditions for the correctness of the inferred analyses. Their practical justification can be checked by inspecting the differences between observations and background (O − B) as well as between observations and analysis (O − A).
Figure 12 shows an estimate of the PDF of (O − B) and (O − A) differences for ozone, methane, and nitric acid in CS2, for three height intervals spanning 146–2 hPa. Superimposing an exact Gaussian with its maximum at the mean (O − B) value and the corresponding standard deviation allows a direct assessment of how well the assumption holds in practice. Note that absolute biases can only be identified by independent measurements of significantly higher quality than the assimilated data. Relative biases can be inferred directly from displacements of the histogram peaks from the zero line. Histograms that are narrower than the exact Gaussian but higher at the maximum indicate a noticeable number of outliers. A marked difference between the positions of the histogram maximum and the Gaussian maximum indicates a one-sided prevalence of outliers.
 In general, most distributions are peaked around zero and approximately symmetric, whereas a notable bias occurs for HNO3 in the 38–8 hPa range. That the distributions are generally more peaked than a Gaussian with the same variance is a well-known phenomenon, caused primarily by observational data with gross errors [see, e.g., Kalnay, 2003]. While these outliers are included with full quadratic weight in this statistical exposition, they may influence the data assimilation result only marginally, because outlier retrievals are typically attributed much larger errors, which markedly reduces their impact on the analysis.
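The effect of gross errors on the shape of an O − B histogram can be reproduced with synthetic data. In this sketch, 3% of otherwise unit-variance Gaussian departures receive an additional large error; both the outlier fraction and its size are illustrative assumptions. The mixture has a positive excess kurtosis, and its density near the mean exceeds that of a Gaussian with the same variance, which is the narrow, high-peaked shape described above.

```python
import numpy as np

rng = np.random.default_rng(5)
n, outlier_frac = 20000, 0.03

# Hypothetical O - B sample: unit-variance Gaussian errors plus a few gross errors
o_minus_b = rng.standard_normal(n)
bad = rng.random(n) < outlier_frac
o_minus_b[bad] += rng.normal(0.0, 6.0, bad.sum())

mean, std = o_minus_b.mean(), o_minus_b.std()
excess_kurtosis = np.mean(((o_minus_b - mean) / std) ** 4) - 3.0

# Empirical density in a narrow bin around the mean vs. the peak of a same-variance Gaussian
width = 0.2 * std
density_central = np.mean(np.abs(o_minus_b - mean) < width / 2) / width
gauss_peak_density = 1.0 / (np.sqrt(2.0 * np.pi) * std)
print(excess_kurtosis, density_central, gauss_peak_density)
```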
 We conclude that the underlying Gaussian assumptions are not violated to an extent that would invalidate the application of 4D-var. Improvements can be expected from a more advanced quality control scheme for identifying retrieval faults.
6.3.2. Independent Evaluation by HALOE and SAGE II
 In order to validate the MIPAS-based assimilation results with independent data, HALOE and SAGE II retrievals were used, but withheld from assimilation. Note that HALOE and SAGE II are solar occultation instruments, which provide profiles along the terminator. Therefore, HALOE and SAGE II retrievals are located far away from MIPAS footprints, except in polar regions.
Figure 13 compares analyses, background-based runs, and control runs with HALOE retrievals for O3, CH4, H2O, and NO2, in the same format as Figure 11. For ozone above 1 hPa, the model tendency toward underestimation is more pronounced in Figure 13a than for MIPAS retrievals in Figure 11a. In contrast, in the lower stratosphere, control run and assimilation results are mostly larger than HALOE retrievals, while remaining in better agreement with MIPAS retrievals. Improvements by data assimilation can be claimed, as the analysis and background profiles fit HALOE more closely than the control run does. Again, the forecasts based on the background initial values benefit from the assimilation results of previous days. A study by Cortesi et al.  reveals that the systematic discrepancies in the lower stratosphere appear to result mainly from corresponding systematic differences between HALOE and MIPAS retrievals, at least within the 100–1 hPa height range, where MIPAS retrievals are mostly higher than HALOE retrievals.
 Comparing assimilation results for CH4 with HALOE retrievals (Figure 13b), general agreement is found throughout the stratosphere. Notable differences occur in the lower mesosphere, with analysis values being smaller; the same holds for the lower stratospheric southern polar region. Given the excellent consistency between MIPAS and analyses, these discrepancies with HALOE are inherited from the assimilated MIPAS data.
 HALOE water vapor observations are systematically smaller than the assimilation results in the middle and, even more so, in the upper stratosphere (Figure 13c). This contrasts with the general agreement found between MIPAS retrievals and both analyses and control runs, and reflects the differences between MIPAS and HALOE retrievals at those levels. Similar or even larger discrepancies between HALOE and MIPAS retrievals occur below 100 hPa. At these low altitudes, however, the estimated retrieval errors of both sensors are high, so those retrievals have little impact on the analyses.
 For a comparison of MIPAS-based assimilation results with SAGE II data, only retrieved ozone and water vapor profiles are available. Figure 14a displays the results for ozone, which show the same pattern of deviations from MIPAS-based analyses as for HALOE: above 10 hPa, the (A − O)/O differences become increasingly negative with height, while they are positive in the lower stratosphere. For the southern polar region [90°S–60°S], ozone below 10 hPa shows a large discrepancy between MIPAS and SAGE II. It should be noted again that the estimated retrieval errors in the lower stratosphere are large. Nevertheless, the differences between analyses and SAGE II retrievals are mostly smaller than for the control run and the first-guess run. This indicates a sustained information gain from the assimilation procedure, despite systematic differences between MIPAS on the one hand and the occultation sensors SAGE II and HALOE on the other hand.
 The SAGE II water vapor (A − O)/O differences are similar to those for HALOE, where values above 20 hPa are higher than MIPAS retrievals and assimilation results. The use of SAGE II water vapor retrievals is explicitly not recommended at the lowest stratospheric levels; they are therefore cut off in Figure 14b.
 The novel SACADA 4D-var system for operational assimilation of stratospheric observations has been developed ab initio, aiming to provide a middle atmospheric chemistry analysis tool that is efficient both in exploiting the available retrieval information and in its use of computational resources. Because the meteorological driver module GME and the chemistry data assimilation section share a common grid structure and common time steps, spatial and temporal interpolation of the meteorological fields, with its associated information loss and error generation, is avoided. In particular, a consistent representation of vertical wind fields is available for the solution of the advection-reaction equation.
Moreover, the numerical efficiency benefits considerably from the icosahedral grid, which reduces computational costs by about 30% compared to CTMs employing a traditional latitude-longitude grid. In addition, the semi-Lagrangian horizontal transport scheme, applicable for short simulation intervals, makes the new system highly efficient, which is important because the number of transported constituents is large. Adjoints of semi-Lagrangian schemes on traditional grids require considerable effort to maintain accuracy at the poles, but the icosahedral grid does not suffer from this problem. Owing to this efficient design, the computationally costly 4D-var technique runs some eight times faster than real time, which opens options for further grid refinement and extensions of the chemical mechanism.
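The efficiency argument for semi-Lagrangian transport with many constituents can be illustrated in one dimension. The departure points and interpolation weights depend only on the wind, so they can be computed once per time step and reused for every tracer. The sketch below assumes a uniform periodic 1-D grid, a constant wind, and linear interpolation. It is not the GME/SACADA scheme on the icosahedral grid, and the function names are hypothetical.

```python
import numpy as np

def departure_weights(n, u, dx, dt):
    """Upstream departure points for dq/dt + u dq/dx = 0 on a periodic 1-D grid.

    Returns the two neighbouring grid indices of each departure point and the
    linear interpolation weight; these depend only on the (here constant) wind.
    """
    x = np.arange(n) * dx
    x_dep = (x - u * dt) % (n * dx)        # trace back along the wind, wrap periodically
    s = x_dep / dx
    i0 = np.floor(s).astype(int) % n       # grid point just upstream of the departure point
    i1 = (i0 + 1) % n
    w = s - np.floor(s)                    # fractional distance from i0
    return i0, i1, w

def advect(tracers, i0, i1, w):
    """Advance all tracers (shape (n_species, n)) by one step, reusing one trajectory."""
    return (1.0 - w) * tracers[:, i0] + w * tracers[:, i1]

# Hypothetical usage: two tracers, Courant number u*dt/dx = 2.5 (> 1 is allowed).
n = 50
q = np.sin(2.0 * np.pi * np.arange(n) / n) + 2.0
tracers = np.vstack([q, 3.0 * q])
i0, i1, w = departure_weights(n, u=2.5, dx=1.0, dt=1.0)
stepped = advect(tracers, i0, i1, w)
```

The trajectory computation costs the same however many species are carried, so the cost per additional constituent is only one interpolation per grid point. The Courant number here exceeds one. An explicit Eulerian scheme would be unstable at that Courant number, but the interpolation step remains stable.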
Particular effort was devoted to implementing spatial correlations in the BECM. This problem was solved by introducing the diffusion approach following Weaver and Courtier. A modified potential vorticity was adopted to identify air mass structures, so that anisotropic and inhomogeneous correlation lengths could be estimated accordingly. The implementation presented in this work assumes larger background error correlations along isopleths of potential vorticity in regions where large potential vorticity gradients prevail. At the polar vortex edge, analyses of chemical constituents appear to be more consistent with the dynamics than those made with a homogeneous and isotropic formulation, which ignores dynamic patterns.
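The core idea of the diffusion approach can be shown in one dimension. Integrating a diffusion equation with coefficient κ over a pseudo-time T turns a unit impulse into an approximately Gaussian correlation function with length scale L, where L² = 2κT. The sketch below assumes a periodic 1-D grid and an explicit flux-form time step. It replaces the proper normalization factors by simple peak normalization. It only hints at how a spatially varying κ yields inhomogeneous length scales. The PV-controlled, anisotropic 3-D formulation of SACADA is not reproduced here, and the names are hypothetical.

```python
import numpy as np

def diffusion_operator(eta, kappa_face, dx, dt, n_steps):
    """Apply n_steps of explicit diffusion d(eta)/dt = d/dx(kappa d(eta)/dx)
    on a periodic 1-D grid.

    kappa_face[i] is the coefficient on the face between points i and i+1;
    a spatially varying kappa gives inhomogeneous correlation lengths
    (in SACADA these are steered by a modified PV, which is not modelled here).
    """
    c = kappa_face * dt / dx**2
    if c.max() > 0.5:
        raise ValueError("explicit diffusion step is unstable")
    for _ in range(n_steps):
        flux = c * (np.roll(eta, -1) - eta)   # flux across face i+1/2
        eta = eta + flux - np.roll(flux, 1)   # flux divergence at point i
    return eta

# Impulse response = one column of the (unnormalized) correlation operator.
n, center = 201, 100
kappa, dt, n_steps = 1.0, 0.25, 200
impulse = np.zeros(n)
impulse[center] = 1.0
response = diffusion_operator(impulse, np.full(n, kappa), dx=1.0, dt=dt, n_steps=n_steps)

L2 = 2.0 * kappa * n_steps * dt               # L^2 = 2 kappa T (here L = 10 grid lengths)
r = np.arange(n) - center
correlation = response / response.max()       # peak normalization stands in for proper normalization factors
gaussian = np.exp(-r**2 / (2.0 * L2))         # target Gaussian correlation function
```

Only the action of the operator on a field is needed, never the matrix itself. This avoids storing a full BECM.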
The diffusion approach proved to have several advantages, as it (1) avoids storing a full BECM, which is replaced by an operator with the same statistical properties, (2) does not require the assumption of separability of horizontal and vertical correlations, as the corresponding diffusion operators are applied alternately during the three-dimensional integration, (3) allows easy preconditioning of the minimization problem through straightforward calculation of the square root of the BECM, (4) is numerically efficient, as the computational cost scales linearly with the dimension of the model grid, (5) is amenable to a flow-dependent design of spatial covariances, allowing pronounced variations of the correlation lengths, and (6) can easily be adapted to nonstandard grid designs such as the icosahedral grid.
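Point (3) follows because one explicit diffusion step is a symmetric operator D. Applying half of the diffusion steps therefore gives a square root S of B, since S Sᵀ = σ² D^M when S = σ D^(M/2). Minimizing in the control variable v, with x = x_b + S v, turns the background term into ½ vᵀv. The Hessian then becomes I plus a positive semidefinite term, so all its eigenvalues are at least 1. The sketch below rests on several assumptions: a small periodic 1-D grid, dense matrices for transparency (an operational code would only apply the operator), a hypothetical observing system, and no normalization factors.

```python
import numpy as np

def diffusion_matrix(n, c):
    """One explicit diffusion step on a periodic 1-D grid as a symmetric matrix,
    with c = kappa * dt / dx**2 (stable for c <= 0.5)."""
    eye = np.eye(n)
    return (1.0 - 2.0 * c) * eye + c * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))

n, c, half_steps, sigma = 40, 0.2, 10, 0.5
D = diffusion_matrix(n, c)
S = sigma * np.linalg.matrix_power(D, half_steps)  # square root: half of the diffusion steps
B = S @ S.T                                        # = sigma^2 D^(2*half_steps), since D is symmetric

# Hypothetical observing system: every 4th grid point, error std 0.1.
H = np.eye(n)[::4]
R = 0.1**2 * np.eye(H.shape[0])
rng = np.random.default_rng(0)
x_b = np.zeros(n)
d = rng.normal(size=H.shape[0])                   # innovation y - H x_b

# Preconditioned problem: J(v) = 1/2 v^T v + 1/2 (d - HS v)^T R^-1 (d - HS v).
# Its gradient vanishes where (I + (HS)^T R^-1 HS) v = (HS)^T R^-1 d.
HS = H @ S
R_inv = np.linalg.inv(R)
hess_v = np.eye(n) + HS.T @ R_inv @ HS            # eigenvalues bounded below by 1
v_a = np.linalg.solve(hess_v, HS.T @ R_inv @ d)
x_a = x_b + S @ v_a                               # analysis in model space
```

The resulting analysis equals the familiar gain form x_b + B Hᵀ(H B Hᵀ + R)⁻¹ d. This gain form requires B itself, whereas the preconditioned minimization never needs B⁻¹.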
Two case studies, covering 1 September to 15 October 2002 and 21 October to 30 November 2003, served to validate the assimilation system. ENVISAT MIPAS data products were assimilated, and the assimilation results were validated with independent (not assimilated) data from SAGE II and HALOE. A sequence of tests confirmed the quality of the assimilation results. The a posteriori validation of normalized costs demonstrated fast convergence toward the optimal value Jpa ≈ 1/2 after 4 days of spin-up. The (O − A) and (O − B) probability density distributions reveal nearly bias-free analyses with significantly reduced variance. The PV-controlled dynamical BECM formulation exhibits advantages in areas with pronounced PV gradients. Comparisons of MIPAS-based assimilation results with SAGE II and HALOE retrievals, which were not assimilated, revealed significantly improved analyses, albeit within the limits of agreement between the infrared sounder and the occultation sensors. The assimilation system also improved short-range forecasts, as demonstrated by the background-based model runs.
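The reference value of 1/2 for the normalized cost comes from a standard result for linear Gaussian variational problems. If B and R are correct, then 2 J_min = dᵀ(H B Hᵀ + R)⁻¹ d follows a χ² distribution with p degrees of freedom, where p is the number of observations. The expected J_min/p is therefore 1/2. The Monte Carlo sketch below checks this under the assumptions of a linear observation operator (or linearized 4D-var), Gaussian errors, and arbitrary, hypothetical covariances. The covariances do not represent SACADA statistics. The sketch also shows that an overestimated R pushes the normalized cost below 1/2.

```python
import numpy as np

def mean_normalized_cost(HBHt, R_true, R_assumed, n_trials, rng):
    """Monte Carlo mean of J_min / p for a linear Gaussian variational problem.

    Innovations d = y - H x_b are drawn with the true covariance
    H B H^T + R_true, while the cost uses the assumed observation errors:
    J_min = 1/2 d^T (H B H^T + R_assumed)^{-1} d.
    """
    p = HBHt.shape[0]
    chol = np.linalg.cholesky(HBHt + R_true)
    total = 0.0
    for _ in range(n_trials):
        d = chol @ rng.normal(size=p)
        total += 0.5 * d @ np.linalg.solve(HBHt + R_assumed, d)
    return total / (n_trials * p)

rng = np.random.default_rng(42)
p = 50                                 # number of observations (hypothetical)
A = rng.normal(size=(p, p))
HBHt = A @ A.T / p                     # arbitrary SPD background covariance in observation space
R = 0.3 * np.eye(p)                    # hypothetical observation error covariance

consistent = mean_normalized_cost(HBHt, R, R, 2000, rng)         # close to 0.5
overestimated = mean_normalized_cost(HBHt, R, 2.0 * R, 2000, rng)  # below 0.5
```

Values persistently above or below 1/2 therefore point to under- or overestimated error covariances. This is why the normalized cost is a useful a posteriori diagnostic.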
In a companion paper (J. Schwinger and H. Elbern, Chemical state estimation for the middle atmosphere by 4-dimensional variational data assimilation: 2. A posteriori validation of error statistics in observation space, submitted to Journal of Geophysical Research, 2010), an in-depth a posteriori validation is provided, in which the mutual consistency of the background and observation error covariance matrices is established and an analysis error assessment in observation space is presented.
The SACADA system is run as an operational service, providing daily analyses in near real time.
The authors are highly indebted to the German Weather Service and D. Majewski for giving access to the GME code and providing advice. W. Joppich, S. Pott, and H.-G. Reschke, SCAI, Fraunhofer Society, provided extensive support in adapting the vertical grid structure of GME to the needs of stratospheric modeling. J. Hendricks, DLR, provided advice on the use of the chemical mechanism, including heterogeneous chemistry, in the SACADA system. We are very grateful to D. Poppe, ICG-2, Research Centre Jülich, and E.-P. Röth, University Essen and ICG-2, FZ Jülich, for a critical final review of the extended version of the chemistry mechanism, and to Anne Smith, NCAR, for provision of photolysis rates. MIPAS data have been processed and provided by ESA. We are grateful to G. Brasseur, NCAR, and A. Sandu, Virginia Tech, for giving access to the SOCRATES and KPP software, respectively. SAGE II data were obtained from NASA Langley Research Center, and HALOE data were obtained from Hampton University, Virginia, and NASA Langley Research Center. We are further indebted to the SACADA team, most notably M. Riese and L. Hoffmann, ICG-1, FZ Jülich, T. von Clarmann, IMK, KIT, and H. Bovensmann, IFE, University of Bremen, for manifold discussions on satellite retrieval error characteristics. The meteorological data for driving GME were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF). Computational resources were provided by the University of Cologne's computer centre RRZK and the Jülich Supercomputing Centre. This work was funded by the German Federal Ministry of Education and Research within the funding program AFO 2000 under grant FZK 07ATF48. The authors thank three anonymous reviewers, whose comments helped to improve the manuscript.