#### 2.1. The Arpège model

The French operational model Arpège (Courtier *et al.*, 1991), developed in collaboration between Météo-France and the European Centre for Medium-Range Weather Forecasts (ECMWF), is a global spectral forecast model that uses a stretched horizontal resolution emphasizing the European area, with a finer resolution over France (Courtier and Geleyn, 1988). The atmosphere is described on 60 vertical levels from the surface to 0.1 hPa. Model variables are vorticity *ζ*, divergence *η*, temperature *T*, logarithm of surface pressure *P*_{S} and specific humidity *q*. The assimilation scheme is based on a multi-incremental 4D-Var (Courtier *et al.*, 1994; Veersé and Thépaut, 1998), with two successive minimizations at spectral resolutions T107 and T224 respectively. There are four daily analyses at the main synoptic hours (00, 06, 12 and 18 UTC).

#### 2.2. Formulation of the background-error covariances

The multivariate description of the background-error covariance matrix in the Arpège model is based on the formulation by Derber and Bouttier (1999). In this formulation, control variable transforms (CVT) are used to represent univariate and multivariate components of the **B** matrix. More precisely, model variables are partitioned into balanced and unbalanced components (denoted hereafter by the subscripts *b* and *u* respectively) by using regressions, and the problem is reformulated in terms of new variables (the control variables): vorticity, unbalanced divergence *η*_{u}, unbalanced temperature and logarithm of surface pressure (*T,P*_{S})_{u}, and specific humidity. Background errors for these control variables are quasi-independent owing to the use of regressions, and they are related to the model variables through balance relationships (which represent the mass/wind coupling, for example).

Univariate covariance matrices are thus defined for each of the control variables, while multivariate coupling is obtained by applying a balance operator. The background-error covariance matrix **B**_{u} of the control variables is then assumed to be block-diagonal, with no correlation between the parameters:

$$\mathbf{B}_u = \begin{pmatrix} C(\zeta) & 0 & 0 & 0 \\ 0 & C(\eta_u) & 0 & 0 \\ 0 & 0 & C[(T,P_S)_u] & 0 \\ 0 & 0 & 0 & C(q) \end{pmatrix}$$

where *C*(·) represents the background-error autocovariance matrix for each variable.

The method of CVT in Arpège is represented in Figure 1. With this change of variable, the problem is now easier to handle since the calculation of the multivariate covariance matrix is reduced to the calculation of the univariate blocks only.

Each autocovariance matrix in **B**_{u} is designed with a hybrid grid point/spectral approach. It combines the grid point space for the representation of local variances, and the spectral space for the representation of spatial correlations. This hybrid formulation is given by **B**_{u} = **ΣΓΣ**^{T}, where **Σ** is the diagonal matrix of grid point standard deviations and **Γ** is the correlation matrix.

Standard deviations in the matrix **Σ** are specified for each control variable. For vorticity, variances are space and time varying, and provided by a real-time ensemble variational assimilation, as will be detailed in section 2.3. For the unbalanced parts of divergence, temperature and surface pressure, variances are static in time and horizontally homogeneous. They are provided by an off-line ensemble of perturbed variational assimilations (Fisher, 2003; Belo Pereira and Berre, 2006). For specific humidity, variances are space and time varying, and obtained through an empirical flow-dependent formula, as will be detailed in section 2.5. Moreover, a multiplicative inflation factor equal to 1.8 is applied to standard deviations in **Σ**, on the basis of *a posteriori* diagnostics (Desroziers and Ivanov, 2001), in order to represent model error contributions.

The correlation matrix **Γ** is modelled with a spectral diagonal hypothesis, **Γ** = **S**^{−1}**D**(**S**^{−1})^{T}, where **D** is a block-diagonal matrix in spectral space (with a block for each spectral coefficient and each variable), and **S** represents the spectral-to-grid transformation. The **D** matrix is defined such that spectral coefficients are not correlated, but a full vertical autocorrelation matrix is specified for each spectral coefficient. The resulting correlations are non-separable, homogeneous and isotropic in grid point space. The shape of horizontal correlations is determined by the correlation spectra. A climatological estimate of the correlations is obtained from an ensemble of perturbed variational assimilations (Fisher, 2003; Belo Pereira and Berre, 2006), and is specified in **B**_{u}.
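The hybrid construction can be illustrated on a toy problem. The following is a minimal sketch, assuming a 1-D periodic grid in place of the sphere and a single vertical level, so that the spectral block reduces to one variance per wavenumber; the function name `hybrid_covariance` and the Gaussian-shaped spectrum are illustrative choices, not part of the Arpège code.

```python
import numpy as np

def hybrid_covariance(sigma, spectrum):
    """Build B = Sigma Gamma Sigma^T on a 1-D periodic grid (toy analogue).

    sigma    : grid-point standard deviations, shape (n,)
    spectrum : non-negative variance per wavenumber, shape (n // 2 + 1,)
    """
    n = sigma.size
    # A diagonal covariance in spectral space gives a homogeneous covariance
    # in grid-point space: a circulant matrix whose first row is the inverse
    # transform of the spectrum.
    row = np.fft.irfft(spectrum, n=n)
    row = row / row[0]                        # unit diagonal: a correlation
    idx = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n
    gamma = row[idx]                          # depends only on separation
    # Sigma is diagonal, so Sigma Gamma Sigma^T is an element-wise scaling.
    return sigma[:, None] * gamma * sigma[None, :], gamma

n = 64
x = np.arange(n) / n
sigma = 1.0 + 0.5 * np.sin(2 * np.pi * x)             # hypothetical local SDs
spectrum = np.exp(-(np.arange(n // 2 + 1) / 8.0) ** 2)  # hypothetical spectrum
B, gamma = hybrid_covariance(sigma, spectrum)
```

In this sketch the diagonal of `B` equals `sigma ** 2` exactly, so the local variances are set entirely in grid-point space, while the shape of the correlations is set entirely by the spectrum.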

Finally, the **B** matrix in terms of model variables is not explicitly defined, but obtained as

$$\mathbf{B} = \mathbf{K}_b \, \mathbf{B}_u \, \mathbf{K}_b^{T}$$

where **K**_{b} is the balance operator (detailed in section 2.4), which transforms control variables to model variables. The additional elements included in **B** over those in **B**_{u} are the balanced parts of the covariances.

This method for modelling background-error covariances includes a first degree of flow dependence. This arises in particular from the use (within **K**_{b}) of linearized versions of nonlinear and omega balance equations (Fisher, 2003) that depend on the background state.

#### 2.3. Calculation of flow-dependent variances

To introduce further flow dependence in the modelled background-error statistics, this study uses the ensemble variational assimilation that has been implemented at Météo-France and run operationally since July 2008. The first operational version of this ensemble (as described in detail in Berre *et al.*, 2007) is based on a lower and unstretched horizontal resolution version (T359C1.0L60) of the Arpège operational version (T538C2.4L60), where C is the stretching factor, and uses 3D-Fgat (first guess at appropriate time) to approximate 4D-Var. In this configuration, it consists of six independent 3D-Fgat assimilation experiments performed in real time in a perfect model framework, with explicit perturbations of observations and implicit perturbations of backgrounds.

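Denoting the perturbed backgrounds by $\mathbf{x}_b^k$ ($k = 1, \dots, N$) and the number of members in the ensemble by *N*, the ensemble estimate of the background-error covariance is then classically calculated as

$$\mathbf{B} \approx \frac{1}{N-1} \sum_{k=1}^{N} \left(\mathbf{x}_b^k - \overline{\mathbf{x}_b}\right)\left(\mathbf{x}_b^k - \overline{\mathbf{x}_b}\right)^{T}, \qquad \overline{\mathbf{x}_b} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{x}_b^k$$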

Numerous authors have reported and worked on the problems raised by this ensemble formulation, such as the sampling noise that affects finite-size ensemble estimates. Regarding correlations, the key issue of spurious non-zero values at long distances has been widely studied (Houtekamer and Mitchell, 1998; Buehner and Charron, 2007; Pannekoucke *et al.*, 2007; Bishop and Hodyss, 2007; Kepert, 2009). By contrast, the literature says much less about the sampling noise that affects the estimated variances. So far, Raynaud *et al.* (2008, 2009) and Berre and Desroziers (2010) have examined this issue, from both experimental and theoretical points of view. The first study gives empirical insights on the spatial properties of the sampling noise and of its appropriate filtering, while the second study, based on these empirical results, proposes an objective and automatic filtering procedure to remove this sampling noise.
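The size of the sampling noise on variances is easy to appreciate with synthetic data. The following is a minimal sketch, assuming Gaussian perturbations with a true variance of 1 at every grid point and the usual unbiased 1/(*N* − 1) normalization; the function `ensemble_variance` and the ensemble sizes compared are illustrative choices only.

```python
import numpy as np

def ensemble_variance(members):
    """Unbiased sample variance across members; members has shape (N, n)."""
    perturbations = members - members.mean(axis=0)
    return (perturbations ** 2).sum(axis=0) / (members.shape[0] - 1)

rng = np.random.default_rng(0)
n_points = 5000
true_var = 1.0

# Synthetic "perturbed backgrounds": independent Gaussian draws per grid point.
small = ensemble_variance(rng.normal(0.0, 1.0, size=(6, n_points)))
large = ensemble_variance(rng.normal(0.0, 1.0, size=(60, n_points)))

# Spread of the estimates around the truth, relative to the true variance.
rel_noise_small = small.std() / true_var
rel_noise_large = large.std() / true_var

# For Gaussian samples the expected relative noise is sqrt(2 / (N - 1)).
theory_small = np.sqrt(2.0 / (6 - 1))
theory_large = np.sqrt(2.0 / (60 - 1))
```

For a six-member ensemble the theoretical relative noise is about 63% of the true variance, which is why the raw variance maps need to be filtered before use.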

In the current study, flow-dependent ensemble-based variances are calculated for all control variables. Estimated variances are then filtered before being used in the assimilation process (i.e. in the matrix **Σ**). This is done with the filtering approach proposed by Raynaud *et al.* (2009). Basically, the filtering of the raw estimates is performed with a spectral low-pass filter, defined by

- (1)

with an objective choice of the truncation *N*_{trunc} for each control variable and each vertical level. This objective truncation is determined from spectral noise-to-signal ratios of the estimated raw variance fields. Figure 2 presents vertical profiles of the objective truncation for vorticity, unbalanced divergence, unbalanced temperature and specific humidity. The profiles behave similarly for vorticity, divergence and temperature: the truncation decreases with altitude, apart from a small increase near the tropopause. The filtering thus tends to be less scale selective at low levels than at higher levels. The filtering truncations for these variables also have comparable values above 700 hPa, while they differ markedly at lower levels. The behaviour is more complex for specific humidity, with successive decreases and increases of the truncation.
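The effect of such a low-pass filter on a noisy variance field can be sketched on a 1-D periodic grid. The cosine-squared taper used below, which falls to zero at *N*_{trunc}, is an assumed shape chosen for illustration, and the function `lowpass_filter`, the synthetic variance field and the value of `n_trunc` are all hypothetical; the objective choice of truncation from noise-to-signal ratios is not reproduced.

```python
import numpy as np

def lowpass_filter(raw_variance, n_trunc):
    """Damp wavenumbers progressively and remove those at or beyond n_trunc."""
    n = raw_variance.size
    coeffs = np.fft.rfft(raw_variance)
    k = np.arange(coeffs.size)
    # Assumed taper: 1 at wavenumber 0, decreasing smoothly to 0 at n_trunc.
    taper = np.cos(0.5 * np.pi * np.minimum(k, n_trunc) / n_trunc) ** 2
    return np.fft.irfft(coeffs * taper, n=n)

rng = np.random.default_rng(2)
n, n_members = 512, 6
x = np.arange(n) / n
true_var = 1.0 + 0.5 * np.cos(2 * np.pi * x)       # large-scale signal

# Raw six-member variance estimate: the truth times chi-square sampling noise.
members = rng.normal(0.0, np.sqrt(true_var), size=(n_members, n))
raw_var = members.var(axis=0, ddof=1)

filtered_var = lowpass_filter(raw_var, n_trunc=10)

rmse_raw = np.sqrt(np.mean((raw_var - true_var) ** 2))
rmse_filtered = np.sqrt(np.mean((filtered_var - true_var) ** 2))
```

Because the taper equals 1 at wavenumber 0, the domain mean of the variance is preserved, while the small-scale sampling noise is strongly damped. A lower truncation removes more noise but also more of the genuine small-scale signal, which is the trade-off that the objective choice of *N*_{trunc} addresses.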

#### 2.4. Balance relationships

As mentioned in section 2.2, background-error autocovariances in the Arpège model are first calculated for the control variables, and transformed back to the space of model variables with balance relationships. The balance operator **K**_{b} is defined so that the covariances for divergence, temperature and surface pressure are related to their unbalanced components and to vorticity through the following equations:

$$C(\eta) = M \, C(\zeta) \, M^{T} + C(\eta_u)$$

$$C[(T,P_S)] = N \, C(\zeta) \, N^{T} + P \, C(\eta_u) \, P^{T} + C[(T,P_S)_u]$$

The *M*, *N* and *P* operators define the balance relationships. They are calculated from a combination of multiple linear regressions plus nonlinear and omega balance equations (Derber and Bouttier, 1999; Fisher, 2003).
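The way the balance operator spreads vorticity covariances onto the other variables can be checked numerically. The following is a minimal sketch, assuming the block lower-triangular form of the balance operator used by Derber and Bouttier (1999), with random matrices standing in for *M*, *N* and *P* and for the univariate covariances; humidity is omitted because it is not coupled to the other variables here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # toy dimension of each variable block

def random_spd(size):
    """Random symmetric positive-definite matrix (stand-in covariance)."""
    a = rng.normal(size=(size, size))
    return a @ a.T + size * np.eye(size)

# Hypothetical univariate covariances of the control variables.
C_zeta, C_eta_u, C_tp_u = random_spd(n), random_spd(n), random_spd(n)
# Hypothetical balance operators.
M, N, P = (rng.normal(size=(n, n)) for _ in range(3))
I, Z = np.eye(n), np.zeros((n, n))

# Block-diagonal covariance of the control variables.
B_u = np.block([[C_zeta, Z, Z], [Z, C_eta_u, Z], [Z, Z, C_tp_u]])

# Balance operator: eta = M zeta + eta_u, (T,Ps) = N zeta + P eta_u + (T,Ps)_u.
K_b = np.block([[I, Z, Z], [M, I, Z], [N, P, I]])

# Implied covariance of the model variables.
B = K_b @ B_u @ K_b.T

C_eta = B[n:2 * n, n:2 * n]        # divergence block
C_tp = B[2 * n:, 2 * n:]           # temperature / surface-pressure block
cross_eta_zeta = B[n:2 * n, :n]    # divergence-vorticity cross-covariance
```

In this sketch `C_eta` equals `M @ C_zeta @ M.T + C_eta_u`, and the cross-covariance block equals `M @ C_zeta`. Any change in the vorticity covariances therefore changes the balanced part of the divergence covariances and the multivariate coupling at the same time.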

In the reference system (Table I), flow-dependent vorticity variances in *C*(*ζ*) are calculated from the ensemble, while the variances for the unbalanced parts, *C*(*η*_{u}) and *C*[(*T,P*_{S})_{u}], are obtained from climatology. However, the above equations show that vorticity plays an important role since the covariances of balanced parts of divergence, temperature and surface pressure are linear functions of the covariances of vorticity. The introduction of flow dependence in vorticity variances thus enables the variances of associated balanced parts to be flow dependent too. Moreover, the flow dependence of balanced variances is strengthened by the fact that some of the balance operators are also flow dependent (see end of section 2.2).

On the other hand, a full flow-dependent description of background-error variances can be obtained with a flow-dependent specification of the variances for both the balanced and the unbalanced parts. The addition of flow-dependent unbalanced variances is part of the experiments presented in the next sections.

#### 2.5. Calculation of specific humidity variances

In the reference assimilation scheme, variances of specific humidity are not derived from a climatology, unlike those of the unbalanced variables. Instead, they are calculated with a simple empirical formula as a function of the background temperature and relative humidity (Rabier *et al.*, 1998). Moreover, to avoid a systematic drift of stratospheric humidity, increments are forced to be negligibly small above the tropopause. This is achieved by setting a very low value of 10^{−8} for humidity background-error standard deviations wherever the pressure is lower than 70 hPa. The stratospheric humidity in this system is then mainly driven by the model, and only weakly controlled by observations. The replacement of these empirical variances by ensemble variances is also examined in this study.
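The effect of the stratospheric cap can be seen in a one-variable analysis, where the weight given to an observation is σ_b² / (σ_b² + σ_o²). The following is a minimal sketch, with hypothetical pressure levels, standard deviations and function names; only the 70 hPa limit and the 10^{−8} value come from the text.

```python
import numpy as np

def cap_stratospheric_sd(sigma_q, pressure_hpa, p_limit=70.0, floor=1e-8):
    """Replace humidity standard deviations by `floor` where pressure < p_limit."""
    return np.where(pressure_hpa < p_limit, floor, sigma_q)

def scalar_gain(sigma_b, sigma_o):
    """Weight given to an observation in a one-variable analysis."""
    return sigma_b ** 2 / (sigma_b ** 2 + sigma_o ** 2)

pressure = np.array([1000.0, 500.0, 100.0, 70.0, 50.0, 10.0])  # hPa, hypothetical
sigma_q = np.full(pressure.size, 1e-4)   # hypothetical empirical values
capped = cap_stratospheric_sd(sigma_q, pressure)

# With an equally hypothetical observation error of 1e-4, the gain is 0.5
# where the cap is inactive and of order 1e-8 where it is active.
gains = scalar_gain(capped, sigma_o=1e-4)
```

With a gain of this order the analysis increment is negligible, so the stratospheric humidity stays very close to the model background.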

#### 2.6. Design of experiments

The main aim of this study is to extend the use of ensemble-based background-error variances to all variables in the minimization (instead of vorticity only), and to test their impact on the Arpège assimilation/prediction system. For that purpose, variances are calculated at each analysis time from the six-member variational ensemble for all control variables and all vertical levels, and they are specified in the operational 4D-Var as a replacement for the previous operational variances. This means that variances in *C*(*ζ*), *C*(*η*_{u}), *C*[(*T,P*_{S})_{u}] and *C*(*q*) are all provided by the ensemble calculation.

It may be mentioned that this study deals with flow-dependent variances only. Flow-dependent correlations could also be calculated from the ensemble, but this is beyond the scope of this paper and correlation spectra in this study are kept climatological.

Experiments have been performed over a 1-month period in February–March 2008 with a T538 spectral resolution and a C2.4 stretching factor. This implies a maximum spectral resolution of T1290 over Europe (which corresponds to about 15 km in physical space). A wide range of observation data types are used: surface observations, aircraft data, satellite-derived winds, sea surface observations (e.g. drifting buoys, ship reports), *in situ* sounding data, wind profiler radar data, geostationary satellite winds (atmospheric motion vectors), Global Positioning System (GPS) ground-based data and radiances from polar-orbiting satellites (e.g. AMSU-A/B, AIRS, SSMI, IASI). Various checks are performed to select a ‘clean’ set of observations. This selection involves quality checks, removal of duplicated observations and thinning of their resolution, in particular. The quality control is based on observation–background departures, with a rejection of the observation if the difference is larger than a given threshold. In this study, ensemble-based background errors ‘of the day’ are specified in the minimization step only, while background-error variances for observation quality control are obtained from quasi-static covariances using a randomization technique (Fisher and Courtier, 1995; Andersson and Fisher, 1999).
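The local resolution quoted above can be checked with a short calculation. This sketch assumes that the stretching factor multiplies the truncation at the high-resolution pole, and that the equivalent grid length is taken as half the shortest resolved wavelength, π*a*/*N* with *a* ≈ 6371 km the Earth's radius; other conventions give somewhat different lengths.

```latex
\begin{align*}
N_{\max} &\approx 538 \times 2.4 \approx 1291 \quad (\text{quoted as T1290}),\\
\Delta x &\approx \frac{\pi a}{N_{\max}}
          \approx \frac{\pi \times 6371\ \text{km}}{1290}
          \approx 15.5\ \text{km}.
\end{align*}
```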

A summary of the experiments performed is given in Table I.

Table I. Description of the two systems considered. Each system uses a specific set of background-error standard deviations (SD). The reference system corresponds to the previous operational configuration (until April 2010), while the experimental system, which uses a full set of flow-dependent standard deviations, corresponds to the current operational configuration.

| System | Flow-dependent ensemble SD | Climatological or empirical SD |
|---|---|---|
| Reference | *ζ* | *η*_{u}, (*T,P*_{S})_{u}, *q* |
| Experimental | *ζ*, *η*_{u}, (*T,P*_{S})_{u}, *q* | |