## 1. Introduction

The success of a data assimilation system relies heavily on the characterization of the background error statistics, i.e. the statistics of the errors in the short-range forecasts that data assimilation seeks to correct. In the variational (VAR) data assimilation systems now used at all major operational forecasting centres, background error covariances are typically represented by highly parametrized models, with the parameters estimated from climatological error statistics. One weakness of this approach is the difficulty of representing ‘Errors of the Day’: the variations in error due to the locations of recent instabilities and observations. As a result, the impact of observational information around, for example, frontal structures is often highly suboptimal. By incorporating more sophisticated balance relationships into the covariance model (e.g. the nonlinear or omega balance equations; Fisher, 2003), it is possible to improve the modelling of error structures that can be diagnosed directly from the model state. In four-dimensional variational (4D-Var) systems, the inclusion of a linear forecast model also allows some additional implicit flow dependence to develop within the assimilation window, but this information is not carried forward to subsequent cycles. Arguably, however, the most promising source of Errors of the Day information is a suitably designed ensemble prediction system (EPS) that properly accounts for the spatial and temporal characteristics of the observation network and propagates error structures using a full nonlinear forecast model.

At the Met Office, ways to incorporate ensemble error structures were sought early in the development of its 3D-Var system. Using the extended control variable method of Lorenc (2003), experiments were run to test the impact of blending in a single error mode generated by a two-member error breeding system (Barker, 1999). With just a single mode, however, the impact of this ‘hybrid’ covariance on forecast performance was found to be negligible. The work was therefore put aside until a more sophisticated ensemble system became available.

In the meantime, ensemble data assimilation techniques such as the ensemble Kalman filter (EnKF; Evensen, 1994) matured and increasingly demonstrated their value in providing realistic estimates of short-range forecast error, with Errors of the Day included naturally. In a hybrid 3D-Var system coupled to an EnKF, based on a quasi-geostrophic model in a perfect-model framework, Hamill and Snyder (2000) obtained the best results when the standard quasi-static background error covariance was replaced almost entirely by the ensemble covariance. For small ensemble sizes, however, optimal performance was obtained with the weighting shifted towards the climatological covariance. Etherton and Bishop (2004) found similar results with a barotropic vorticity model, but found that when model error was introduced it was better to give more weight to the static covariance, presumably because it better represents model error. Buehner (2005) brought the hybrid scheme into a quasi-operational 3D-Var setting, but found the impacts to be rather small, suggesting that in the real world the effects of model and sampling error largely outweigh the benefits of capturing flow-dependent covariance structures.
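The sensitivity to ensemble size has a simple statistical origin: covariances estimated from a small ensemble are noisy. As a purely illustrative Monte Carlo sketch (not taken from the studies cited above; the helper `mean_abs_spurious_correlation` is hypothetical), the following estimates the average magnitude of the correlation that an N-member ensemble reports between two variables whose true correlation is zero.

```python
import numpy as np


def mean_abs_spurious_correlation(n_members, n_pairs=2000, seed=0):
    """Average |sample correlation| between pairs of truly uncorrelated
    Gaussian variables, each pair sampled by an n_members-member ensemble."""
    rng = np.random.default_rng(seed)
    # Shape (n_pairs, 2, n_members): two independent variables per pair.
    x = rng.standard_normal((n_pairs, 2, n_members))
    # Subtract the ensemble mean, as when forming ensemble perturbations.
    x -= x.mean(axis=2, keepdims=True)
    cov = np.sum(x[:, 0, :] * x[:, 1, :], axis=1)
    var_a = np.sum(x[:, 0, :] ** 2, axis=1)
    var_b = np.sum(x[:, 1, :] ** 2, axis=1)
    corr = cov / np.sqrt(var_a * var_b)  # the 1/(N - 1) factors cancel
    return float(np.mean(np.abs(corr)))


for n in (5, 20, 100, 1000):
    print(f"{n:5d} members: mean |spurious correlation| = "
          f"{mean_abs_spurious_correlation(n):.3f}")
```

For independent Gaussian variables this spurious correlation decays roughly as 1/√(N − 1), so an ensemble of a few tens of members still reports correlations of order 0.1 to 0.2 between unrelated points. Noise of this kind is plausibly one reason why small ensembles favour a weighting towards the climatological covariance.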

Wang *et al.* (2008a, 2008b) studied the impact of hybrid covariances in a limited-area configuration of the Weather Research and Forecasting (WRF) 3D-Var system (Barker *et al.*, 2004) coupled to an ensemble transform Kalman filter (ETKF; Bishop *et al.*, 2001). In a perfect-model setting, a blend of static and ensemble covariances was again found to give optimal results, particularly in data-sparse regions. With real observations, hybrid covariances also gave the best results, but with a smaller impact and an optimal weighting shifted towards the static covariance. More recently, Zhang and Zhang (2012) used a similar configuration of WRF to test a 4D-Var/EnKF hybrid, finding a significant improvement over the standard 4D-Var scheme.

All of the hybrid schemes mentioned above implement a covariance that is a simple linear combination of the climatological and modified ensemble covariances. Despite the modifications, which are designed to compensate for the small ensemble size, the ensemble covariances are still recognizably based on those used in an ensemble Kalman filter. An alternative approach is to use contemporary ensemble information to generate parameters for the standard error covariance model, rather than relying solely on climatological training data. This approach allows more rigorous filtering techniques to be applied to the estimation of the selected parameters. For instance, building on earlier work by Raynaud *et al.* (2009), Bonavita *et al.* (2012) used an independent ensemble of 4D-Vars to provide flow-dependent background error *variances* (not covariances) for use within the covariance model of the European Centre for Medium-Range Weather Forecasts (ECMWF) deterministic 4D-Var system, giving substantial overall improvements in forecast quality. The estimated parameters (in this case variances) are determined from the ensemble more accurately than in our method. The approach is limited, however, by the flexibility of the particular covariance model on which it is based. For instance, the Fisher (2003) model has no parameters that could be determined from the ensemble to specify three-dimensional anisotropy or variations in the inter-variable correlations, both of which occur naturally in the ensemble-based covariances used in our approach. Anisotropic correlations appear to be important in Figure 9 below, and Montmerle and Berre (2010) showed flow-dependent inter-variable correlations to be important. Our approach starts from the ensemble covariance and modifies it by ‘localization’ to remove aspects likely to be spurious.
Possible improvements to the localization method are discussed in section 5, but in the first instance it is designed to remove spurious features rather than to optimally determine particular aspects of the covariance. The alternative approach starts from a model of the climatological error covariance and modifies it by estimating a few of its parameters from the ensemble; a range of alternative covariance models is possible (e.g. that of Purser *et al.*, 2003, which does allow anisotropy). While the chosen characteristics can be optimally estimated from the ensemble plus prior climatology, other aspects are determined by the covariance model rather than by the ensemble.
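The idea of a hybrid covariance, a weighted sum of a static covariance and a localized ensemble covariance, can be illustrated on a toy 1-D grid. The sketch below is an assumption-laden illustration rather than the scheme of this paper: the weight names `beta_c2` and `beta_e2`, the Gaussian localization function, the grid and all function names are hypothetical, and operational systems typically use compactly supported localization functions.

```python
import numpy as np


def gaussian_correlation(n_points, length_scale):
    """Homogeneous Gaussian correlation matrix on a 1-D grid with unit spacing."""
    idx = np.arange(n_points)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.exp(-0.5 * (dist / length_scale) ** 2)


def ensemble_covariance(members):
    """Sample covariance of an (n_members, n_points) array of ensemble forecasts."""
    perts = members - members.mean(axis=0)  # ensemble perturbations
    return perts.T @ perts / (members.shape[0] - 1)


def hybrid_covariance(b_static, members, loc_length, beta_c2, beta_e2):
    """Weighted sum of a static covariance and a localized ensemble covariance.

    Localization is an element-wise (Schur) product with a correlation matrix,
    which damps long-range ensemble covariances that are likely to be spurious.
    """
    p_ens = ensemble_covariance(members)
    loc = gaussian_correlation(b_static.shape[0], loc_length)
    return beta_c2 * b_static + beta_e2 * (loc * p_ens)


# Toy example: 50 grid points, 10 members drawn from a smooth "true" covariance.
n_points, n_members = 50, 10
rng = np.random.default_rng(1)
b_static = gaussian_correlation(n_points, length_scale=3.0)
truth = gaussian_correlation(n_points, length_scale=6.0)
w, v = np.linalg.eigh(truth)
sqrt_truth = v * np.sqrt(np.clip(w, 0.0, None))  # truth ~= sqrt_truth @ sqrt_truth.T
members = rng.standard_normal((n_members, n_points)) @ sqrt_truth.T
b_hyb = hybrid_covariance(b_static, members, loc_length=5.0, beta_c2=0.5, beta_e2=0.5)
```

Because the Schur product of two positive semi-definite matrices is itself positive semi-definite, localizing in this way keeps the hybrid matrix a valid covariance. In a full-size system these matrices are far too large to form explicitly; the extended control variable method of Lorenc (2003) achieves the same localized ensemble covariance implicitly.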

At the Met Office, work on global hybrid ensemble/variational data assimilation resumed in late 2008, following the operational implementation of 4D-Var in 2004 and, in particular, the development and implementation of ‘MOGREPS’ (Bowler *et al.*, 2008), a global and regional ensemble prediction system based on the ETKF. The development and implementation of this hybrid ensemble/4D-Var system are the subject of this paper, which is organized as follows. Section 2 describes the formulation of the hybrid 4D-Var system, including details of the ensemble system that provides the Errors of the Day and its coupling with 4D-Var. Section 3 describes the initial development path of the system, from a basic configuration with horizontal localization alone to the configuration that formed the basis for pre-operational trials. The results of these trials are described in section 4, and we conclude with a discussion in section 5.