[10] We rewrite the CTM dynamical differential equation in discrete form from time t_{k−1} to t_{k},

x^{t}(t_{k}) = M_{k−1}(x^{t}(t_{k−1})) + ε^{f}(t_{k−1}),
where x^{t} denotes the true state vector of dimension n, M_{k−1} is the (nonlinear) dynamical operator from t_{k−1} to t_{k}, and ε^{f} is the model error vector, assumed to have a normal distribution with zero mean and covariance matrix Q. In this study, the state is chosen to be the vector of concentrations of the species of concern. For simplicity we abbreviate the time t_{k} to the subscript k, e.g., x_{k}^{t} for x^{t}(t_{k}) and Q_{k−1} for Q(t_{k−1}). At each time t_{k}, one observes

y_{k} = H_{k}(x_{k}^{t}) + ε_{k}^{o},
where ε_{k}^{o} is the observation error vector, assumed to have a normal distribution with zero mean and covariance matrix R, and H_{k} is the (possibly nonlinear) operator that maps the state to the observation space at time t_{k}. The vector y_{k} is of size p, and usually p ≪ n. The error vectors ε_{k−1}^{f} and ε_{k}^{o} are assumed to be independent.
[11] Let x^{b} be the a priori state estimate (background) with error ε^{b} = x^{b} − x^{t} of zero mean and covariance matrix B, and let x^{a} be the a posteriori state estimate (analysis) with error ε^{a} = x^{a} − x^{t} of covariance matrix A. The data assimilation problem is to determine the optimal analysis x^{a} and its statistics A given the background x^{b}, the observations y, and the statistical information in the error covariance matrices R and B.
2.1. Optimal Interpolation
[12] Optimal interpolation [Daley, 1991] searches for an optimal linear combination of background and innovation by minimizing the state-estimation variance. The innovation d is the difference between the observation vector and the background mapped to observation space, i.e., d = y − H(x^{b}). Under the assumption that the observation operator is linear close to the background, i.e., H(x) − H(x^{b}) = H(x − x^{b}) where H is the linearized operator, the estimation formulae are given according to best linear unbiased estimator theory as follows:

x^{a} = x^{b} + Kd,  K = BH^{T}(HBH^{T} + R)^{−1},

A = (I − KH)B,

where K is the gain matrix.
[13] In practice, setting the background error covariance remains problematic. In this study B is either diagonal or in Balgovind form. In the latter case, the error covariance between two points is given by
where L is a characteristic length, d is the distance between the two points, and v is the a priori variance [Balgovind et al., 1983].
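As an illustration, a Balgovind-form covariance matrix can be assembled in a few lines. The sketch below assumes a one-dimensional grid of points; the function name is ours.

```python
import numpy as np

def balgovind_covariance(coords, L, v):
    """Balgovind-form covariance: B_ij = v * (1 + d/L) * exp(-d/L),
    where d is the distance between points i and j."""
    d = np.abs(coords[:, None] - coords[None, :])  # pairwise distances (1-D grid)
    return v * (1.0 + d / L) * np.exp(-d / L)

# Example: 5 points spaced 10 km apart, characteristic length L = 25 km,
# a priori variance v = 1.0
B = balgovind_covariance(np.arange(5) * 10.0, L=25.0, v=1.0)
```

The resulting matrix is symmetric, carries the a priori variance v on its diagonal, and decays smoothly with distance, which is precisely why this parameterization is a convenient stand-in for an unknown background error covariance.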
2.2. Ensemble Kalman Filter
[14] The ensemble Kalman filter [Evensen, 1994, 2003] differs from optimal interpolation in that the error covariance matrix is time-dependent. The assimilation process cycles through two steps, forecast and analysis. At the forecast step, the model is applied to the r-member ensemble {x_{k−1}^{a,(i)}, i = 1, …, r} and produces the forecast {x_{k}^{f,(i)}, i = 1, …, r}. The forecast error covariance matrix P^{f} can be approximated by the ensemble statistics. Whenever observations are available, the cycling enters the analysis step, and each ensemble member x_{k}^{f,(i)} is updated to x_{k}^{a,(i)} according to the OI formula (5)–(6), with the background error covariance matrix B replaced by the forecast error covariance matrix P^{f}. Although not necessary in the algorithm, the analysis error covariance matrix P^{a} can then be approximated with the analyzed-ensemble statistics.
[15] We summarize the algorithm as follows.
[16] 1. First we perform initialization. Given the probability density function (PDF) of the initial concentrations, an ensemble of initial conditions is generated. In our experiments, except for the cycling context in section 4.4 where initial ensemble members are ensemble forecasts from the previous cycle, we skip this step: all members in the ensemble start with the same initial condition. The first integration steps therefore form a spin-up period during which the ensemble spread increases, essentially as a result of the perturbations of uncertain parameters.
[17] 2. At the forecast step, each ensemble member is propagated by the model, x_{k}^{f,(i)} = M_{k−1}(x_{k−1}^{a,(i)}) + ε_{k−1}^{f,(i)}, which produces the forecast ensemble {x_{k}^{f,(i)}, i = 1, …, r}.
[18] 3. The analysis formula is applied,

x_{k}^{a,(i)} = x_{k}^{f,(i)} + K_{k}(y_{k}^{(i)} − H_{k}(x_{k}^{f,(i)})),  x̄_{k}^{a} = (1/r) Σ_{i=1}^{r} x_{k}^{a,(i)},

where x̄_{k}^{a} is the mean of the analysis ensemble {x_{k}^{a,(i)}, i = 1, …, r}, y_{k}^{(i)} is the observation vector, and the Kalman gain is approximated by

K_{k} = P_{k}^{f}H_{k}^{T}(H_{k}P_{k}^{f}H_{k}^{T} + R_{k})^{−1}.
[19] The ensemble initialization and the determination of the model error ε_{k−1}^{f,(i)} are entangled problems. In our implementation, we take identical initial samples, and the model error is approximated by perturbing model input data and model parameters,

x_{k}^{f,(i)} = M_{k−1}(x_{k−1}^{a,(i)}, w^{(i)}d),
where d is the vector of parameters to be perturbed, and, for the ith sample, w^{(i)} is the diagonal matrix whose diagonal elements are the perturbation coefficients (see section 2.5). Let e_{k}^{f,(i)} = x_{k}^{f,(i)} − x̄_{k}^{f} be one (approximate) direction of the forecast error, and let E_{k}^{f} be the matrix [e_{k}^{f,(1)} e_{k}^{f,(2)} … e_{k}^{f,(r)}]. By formula (9), we have

P_{k}^{f} ≈ E_{k}^{f}E_{k}^{f,T}/(r − 1).
In this way, the error covariance matrix is approximated by ensemble statistics.
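For a linear observation operator, the analysis step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the paper; the observation perturbation of Burgers et al. [1998], discussed below, is included.

```python
import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """One EnKF analysis step on the forecast ensemble Xf (n x r,
    columns are members), with linear observation operator H (p x n),
    observation vector y (p,) and observation error covariance R (p x p).
    Observations are perturbed as in Burgers et al. [1998]."""
    r = Xf.shape[1]
    Ef = Xf - Xf.mean(axis=1, keepdims=True)        # forecast error directions
    Pf = Ef @ Ef.T / (r - 1)                        # ensemble estimate of P^f
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # approximate Kalman gain
    Yp = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=r).T
    return Xf + K @ (Yp - H @ Xf)                   # analysis ensemble

# Toy example: 3 state components, 500 members, first component observed
rng = np.random.default_rng(0)
Xf = rng.normal(size=(3, 500))
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[0.01]])
Xa = enkf_analysis(Xf, np.array([0.5]), H, R, rng)
```

After the update, the ensemble spread at the observed component collapses toward the observation error level, which is how the analysis error covariance is implicitly reduced.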
[20] In the original EnKF algorithm, the observation vector y_{k}^{(i)} is perturbed for consistent analysis statistics [Burgers et al., 1998]. In this paper, we present only the assimilation results without observation perturbations, since the variances of observation errors are in general much smaller than those of model errors. However, in our implementation, the observation perturbation is an option, and preliminary tests showed that, at least for the reference setting in section 3, there are improvements in forecast performance with this option on.
2.3. Reduced-Rank Square Root Kalman Filter
[21] The reduced-rank square root Kalman filter [Heemink et al., 2001] uses a low-rank representation LL^{T} of the error covariance matrix P. L = [l^{1}, …, l^{q}] is the mode matrix whose columns (modes) are the dominant directions of the forecast error. The evolution of a mode can be approximated by the difference between the forecasts based on the mean (analyzed) state and on its perturbation by this mode, that is,

M_{k−1}l_{k−1}^{i} ≈ M_{k−1}(x_{k−1}^{a} + l_{k−1}^{i}) − M_{k−1}(x_{k−1}^{a}),
where x_{k−1}^{a} is given by
[22] The forecast x_{k}^{f} is calculated from the previous analyzed state by

x_{k}^{f} = M_{k−1}(x_{k−1}^{a}).
[23] Assuming that at time t_{k−1} the error covariance P_{k−1}^{a} has the square root form L_{k−1}^{a}L_{k−1}^{a,T}, the propagation of P_{k−1}^{a} is tractable. The forecast error covariance matrix at time t_{k} is calculated by

P_{k}^{f} = M_{k−1}L_{k−1}^{a}L_{k−1}^{a,T}M_{k−1}^{T} + Q_{k−1}Q_{k−1}^{T},
where M_{k−1} is the tangent linear model, that is, the Jacobian matrix of M_{k−1}, and Q_{k−1} is the square root of the model error covariance matrix. Considering the square root form L_{k}^{f}L_{k}^{f,T} for P_{k}^{f}, we have the forecast formula for the mode matrix L^{f},

L̃_{k}^{f} = [M_{k−1}L_{k−1}^{a}  Q_{k−1}],  L_{k}^{f} = Π_{k}^{f}L̃_{k}^{f},
where Π_{k}^{f} projects L̃_{k}^{f} onto the q leading eigenvectors of L̃_{k}^{f}L̃_{k}^{f,T} using the singular value decomposition. Recall that the analysis error covariance matrix P^{a} can be calculated by (I − KH)P^{f}(I − KH)^{T} + KRK^{T} for an arbitrary gain K; rewriting it in square root form, we obtain the analysis formula for the mode matrix L^{a},

L̃_{k}^{a} = [(I − K_{k}H_{k})L_{k}^{f}  K_{k}R_{k}^{1/2}],  L_{k}^{a} = Π_{k}^{a}L̃_{k}^{a},
where Π_{k}^{a} projects L̃_{k}^{a} onto the q leading eigenvectors of L̃_{k}^{a}L̃_{k}^{a,T}.
[24] We do not use the tangent linear model but employ (15) to approximate M_{k−1}L_{k−1}^{a} at the forecast step. The columns of Q_{k−1} are obtained in the same manner as in the model error formula (13) of EnKF. These treatments make the RRSQRT implementation similar to our variant of EnKF. The difference is that RRSQRT employs square root formulae. In addition, in RRSQRT the error covariance is approximated by its dominant eigenvectors, whereas EnKF involves no such truncation.
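The two ingredients just described, the finite-difference mode propagation of (15) and the truncation to dominant eigendirections, can be sketched as follows. This is a NumPy toy, with our own function names and a linear model standing in for the CTM.

```python
import numpy as np

def propagate_modes(model, xa, La):
    """Approximate M_{k-1} L^a by finite differences of the nonlinear
    model, as in (15): each mode l becomes model(xa + l) - model(xa)."""
    base = model(xa)
    return np.column_stack([model(xa + l) - base for l in La.T])

def truncate_modes(Ltilde, q):
    """Keep the q dominant eigendirections of Ltilde Ltilde^T by
    projecting Ltilde onto its q leading left singular vectors."""
    U, s, _ = np.linalg.svd(Ltilde, full_matrices=False)
    return U[:, :q] * s[:q]

# Toy check with a linear model A: the finite difference is then exact,
# and a full-rank truncation reproduces Ltilde Ltilde^T
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
La = rng.normal(size=(4, 2))
xa = rng.normal(size=4)
Lt = propagate_modes(lambda x: A @ x, xa, La)
Lq = truncate_modes(Lt, q=2)
```

With a nonlinear model the finite difference is only an approximation of M_{k−1}l, and a truncation rank q smaller than the number of columns discards the least energetic error directions.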
2.4. Four-Dimensional Variational Algorithm
[25] The four-dimensional variational algorithm [Le Dimet and Talagrand, 1986] finds the optimal initial condition x* by minimizing a cost function

J(x) = (1/2)(x − x^{b})^{T}B^{−1}(x − x^{b}) + (1/2)Σ_{k=0}^{N}(y_{k} − H_{k}(x_{k}))^{T}R_{k}^{−1}(y_{k} − H_{k}(x_{k})) = J_{b}(x) + J_{o}(x),
under the constraint x_{k} = M_{0k}(x) = M_{k−1}(M_{k−2}(…M_{1}(M_{0}(x))…)). The assimilation period is from t_{0} to t_{N}. The gradient of J_{o} is calculated by the backward integration of the adjoint model (F. Bouttier and P. Courtier, Data assimilation concepts and methods, 1999, http://www.ecmwf.int/newsevents/training/rcourse_notes/DATA_ASSIMILATION/ASSIM_CONCEPTS/Assim_concepts.html): (1) set λ_{N} = 0; (2) for k = N, …, 1, calculate λ_{k−1} = M_{k−1}^{T}(λ_{k} − H_{k}^{T}Δ_{k}), where Δ_{k} = R_{k}^{−1}(y_{k} − H_{k}(x_{k})); and (3) λ_{0} := λ_{0} − H_{0}^{T}Δ_{0} gives the gradient of J_{o} with respect to x.
[26] Assimilations are performed by model integrations starting from the optimal initial condition x*. Further integrations from time step N based on the analyzed model state provide the predictions. The inverse of B is calculated online or, for high-dimensional model configurations, B^{−1} can also be approximated using SVD truncations and saved on disk for later computations. The adjoint operator M_{k−1}^{T} is obtained using the automatic differentiation software O∂yssée [Faure and Papegay, 1998]. The forward model simulations are saved for the backward integrations of the adjoint model. No checkpointing technique is employed in our implementation.
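For a linear toy model, the adjoint recursion of this section can be sketched and verified against finite differences of J_{o}. This NumPy illustration is only a sketch; in the paper the adjoint of the full nonlinear CTM is generated by O∂yssée.

```python
import numpy as np

def grad_Jo(x0, M, H, Rinv, ys):
    """Gradient of the observation term J_o via the backward adjoint
    recursion, for linear model matrices M_0..M_{N-1}, observation
    matrices H_0..H_N and observations y_0..y_N."""
    N = len(M)
    xs = [x0]                                  # forward run, trajectory stored
    for k in range(N):
        xs.append(M[k] @ xs[k])
    lam = np.zeros_like(x0)                    # (1) lambda_N = 0
    for k in range(N, 0, -1):                  # (2) backward integration
        delta = Rinv @ (ys[k] - H[k] @ xs[k])  # Delta_k = R^{-1}(y_k - H_k x_k)
        lam = M[k - 1].T @ (lam - H[k].T @ delta)
    delta0 = Rinv @ (ys[0] - H[0] @ xs[0])
    return lam - H[0].T @ delta0               # (3) gradient of J_o w.r.t. x

# Tiny check against central finite differences of J_o
rng = np.random.default_rng(1)
n, p, N = 3, 2, 2
M = [rng.normal(size=(n, n)) for _ in range(N)]
H = [rng.normal(size=(p, n)) for _ in range(N + 1)]
Rinv = np.eye(p)
ys = [rng.normal(size=p) for _ in range(N + 1)]
x0 = rng.normal(size=n)

def Jo(x):
    xs = [x]
    for k in range(N):
        xs.append(M[k] @ xs[k])
    return 0.5 * sum((ys[k] - H[k] @ xs[k]) @ Rinv @ (ys[k] - H[k] @ xs[k])
                     for k in range(N + 1))

g = grad_Jo(x0, M, H, Rinv, ys)
eps = 1e-6
g_fd = np.array([(Jo(x0 + eps * e) - Jo(x0 - eps * e)) / (2 * eps)
                 for e in np.eye(n)])
```

The finite-difference check is a standard way to validate an adjoint code, whether hand-written or produced by automatic differentiation.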
2.5. Uncertainties and Model Error
[27] The corrections of the analysis scheme lie in the subspace spanned by the covariance matrix of forecast or background errors, i.e., the space spanned by the columns of the square root of the matrix B [Kalnay, 2003]. An unrealistic error structure produces spurious corrections and may result in an unbalanced physical model state. The design of the error structure is therefore of great importance.
[28] There are three main approaches to error modeling: (1) modeling uncertain sources and then perturbing them in the model [Segers, 2002; Constantinescu et al., 2007a]; (2) using the statistics of model states, for example, the NMC method [Chai et al., 2007] and ensemble methods; and (3) using parameterizations, for example, Balgovind correlations for the background error covariance [Hoelzemann et al., 2001; Elbern et al., 2007]. Each of the three should be flow-dependent, that is, adapt to the “error of the day.” The spatial and temporal heterogeneities of the chemistry-transport problem make the last two approaches difficult.
[29] The numerical models are usually assumed unbiased. In our case, we assume that the model uncertainties result only from the misspecification of model parameters. In our EnKF and RRSQRT implementations, the ensemble is generated by model integrations with perturbed parameters. Uncertainties and distributions are introduced for model parameters, which are mainly two-dimensional or three-dimensional fields over space coordinates. These parameters are modeled as random vectors. In practice, for a field p̃ (a random vector) whose median value is p̄, a perturbation is applied to the whole field so that every component p̃_{k}^{i} has the prescribed distribution. For instance, for a lognormal distribution, one writes

p̃_{k}^{i} = p̄_{k}^{i}α^{γ/2},
where γ is sampled according to a standard normal distribution and α is the uncertainty factor (see Table 1). The quantity γ is independent of the time index k and of the space index i, so that the perturbations increase the ensemble spread. The same sample of γ is used to perturb all values of the field p̃. The quantity α^{γ/2} is the perturbation coefficient for the corresponding parameter in the matrix w^{(i)} of formula (13). For normal distributions, perturbations larger than a given threshold (by default twice the standard deviation) are discarded so that no unrealistic parameters, for instance negative emissions, are produced.
[30] A delicate point is the modeling of temporal and spatial correlations between the different values of the field. With the perturbation applied in (22), the correlation between two field values ln p̃_{k}^{i} and ln p̃_{l}^{j} is equal to 1. But the uncertainty sources at these two points are not the same; hence the perturbations should differ. A fine modeling of the uncertainty would make γ depend on time and position (yielding γ_{k}^{i}). Such a fine description of uncertainties is mostly beyond available knowledge.
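The shared-γ lognormal perturbation described above can be sketched as follows. This is a NumPy toy; the function name, and the exponent α^{γ/2} as the reading of equation (22), are our assumptions.

```python
import numpy as np

def perturb_field(field, alpha, rng, max_draw=2.0):
    """Perturb a whole parameter field with one lognormal coefficient.

    A single gamma ~ N(0, 1), shared by every component of the field
    (hence correlation 1 between the log-perturbed values), scales the
    field by alpha**(gamma / 2); about 95% of the draws then fall within
    [field/alpha, alpha*field].  Draws with |gamma| > max_draw are
    discarded, mirroring the truncation used for normal distributions."""
    gamma = rng.standard_normal()
    while abs(gamma) > max_draw:        # discard unrealistic perturbations
        gamma = rng.standard_normal()
    return alpha ** (gamma / 2.0) * field

rng = np.random.default_rng(0)
emissions = np.full(4, 10.0)            # a toy emission field, median 10
perturbed = perturb_field(emissions, alpha=2.0, rng=rng)
```

Because a single γ multiplies the entire field, all components move together; modeling γ_{k}^{i} per time and position would require the finer uncertainty description discussed above.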
[31] Examples for continental air quality simulations, extracted from Hanna et al. [2001], are shown in Table 1. For many fields (associated with α = 2), a confidence interval that includes 95% of the probability density integral is [m/2, 2m] if m is the mean of the field. Uncertainty levels must be adjusted to the simulation scale (domain size and temporal discretization). In particular, uncertainties decrease as data is averaged over a larger domain or over a longer period of time.
Table 1. Uncertainties Associated With Several Input Fields of a Chemistry-Transport Model at Continental Scale^{a}

Field  Distribution  Uncertainty
Top ozone boundary conditions  lognormal  α = 1.5 
Top NO_{x} boundary conditions  lognormal  α = 3 
Lateral ozone boundary conditions  lognormal  α = 1.5 
Lateral NO_{x} boundary conditions  lognormal  α = 3 
Major NO_{x} point emissions  lognormal  α = 1.5 
Wind velocity  lognormal  α = 1.5 
Wind direction  normal  ±40° 
Temperature  normal  ±3 K 
Vertical diffusion (night)  lognormal  α = 3 
Precipitations  lognormal  α = 2 
Cloud liquid water content  lognormal  α = 2 
Biogenic emissions  lognormal  α = 2 
Photolysis constants  lognormal  α = 2 