## 1. Introduction

[2] Data assimilation [e.g., *Daley*, 1993; *Bouttier and Courtier*, 1999; *Cohn*, 1997; *Todling*, 1999] is a well-known approach that combines information from different sources to improve an estimate of a system's state. The information sources are usually a numerical model, various observations, and error statistics. Under given assumptions, several widely used assimilation methods compute the so-called best linear unbiased estimator (BLUE), which minimizes the total variance of the state estimation error. Among the popular methods, one may cite the Kalman filters (e.g., the ensemble Kalman filter) [*Evensen*, 1994] and the four-dimensional variational assimilation (4D-Var) method [*Le Dimet and Talagrand*, 1986]. Data assimilation has been successful in many fields, especially in numerical weather forecasting, where improved initial conditions can dramatically improve a forecast cycle [e.g., *Buehner et al.*, 2010].
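As a one-dimensional illustration of the BLUE, the sketch below blends a background (model) estimate with an observation according to their error variances; the numbers and function name are invented for the example.

```python
# Scalar BLUE: combine a background estimate x_b (error variance b_var)
# with an observation y (error variance r_var).  The gain weights the
# observation increment so that the analysis error variance is minimized.
def blue(x_b, y, b_var, r_var):
    gain = b_var / (b_var + r_var)   # optimal (Kalman) gain
    x_a = x_b + gain * (y - x_b)     # analysis: weighted compromise
    a_var = (1.0 - gain) * b_var     # analysis error variance
    return x_a, a_var

x_a, a_var = blue(x_b=10.0, y=12.0, b_var=4.0, r_var=1.0)
print(x_a, a_var)  # 11.6 0.8
```

Note that the analysis error variance (0.8) is smaller than both the background (4.0) and observation (1.0) error variances, which is the point of combining the two sources.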

[3] Another approach for improving forecasts using observations is called sequential aggregation. It is employed to produce an improved forecast out of an ensemble of forecasts. Each forecast of the ensemble is given a weight that depends on past observations and past forecasts. An aggregated forecast is then formed by the weighted linear combination of the forecasts of the ensemble. The aggregation is sequential since it is repeated before each forecast step, with updated weights. The weights can be computed with machine learning algorithms so that appealing theoretical results may hold in practical applications [*Mallet et al.*, 2009]. Other forms of aggregation have been applied in geophysical forecasts, for example, with the dynamic linear regression in air quality [*Pagowski et al.*, 2006] or with least-squares methods in climatology [*Krishnamurti et al.*, 2000] and in air quality [*Mallet and Sportisse*, 2006].
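The weight-update mechanism described above can be made concrete with one possible learning rule. The snippet below is a minimal sketch, not the specific algorithm of *Mallet et al.* [2009]: it assumes an exponentiated-gradient update under a quadratic loss, and the ensemble members, noise levels, and toy "truth" signal are invented for the example.

```python
import numpy as np

def exponentiated_gradient_step(weights, forecasts, observation, learning_rate=0.1):
    """One sequential-aggregation update: once the observation is available,
    reweight ensemble members using the gradient of the quadratic loss."""
    aggregated = np.dot(weights, forecasts)
    # Gradient of (aggregated - observation)**2 with respect to each weight.
    gradient = 2.0 * (aggregated - observation) * forecasts
    new_weights = weights * np.exp(-learning_rate * gradient)
    return new_weights / new_weights.sum()  # renormalize onto the simplex

# Toy example: three ensemble members with different (hypothetical) error levels.
rng = np.random.default_rng(0)
weights = np.ones(3) / 3.0                    # start from the ensemble mean
for t in range(100):
    truth = np.sin(0.1 * t)
    forecasts = truth + rng.normal(0.0, [0.1, 0.5, 1.0])  # member 0 is the best
    prediction = np.dot(weights, forecasts)   # aggregated forecast for step t
    weights = exponentiated_gradient_step(weights, forecasts, truth)

print(weights)  # the weights typically concentrate on the more accurate members
```

The aggregation is sequential in exactly the sense described above: the weights used for the forecast at step `t` depend only on forecasts and observations up to step `t - 1`.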

[4] In this paper, the focus is on machine learning algorithms because of their key theoretical properties. If the forecast performance is measured by a mean quadratic discrepancy with the observations, the learning algorithms guarantee that, in the long run, the aggregated forecast performs at least as well as the best linear combination of models that is constant in time. In other words, over a long-enough time period, the mean quadratic error of any constant combination (that is, any linear combination of models with weights that do not depend on time) tends to be greater than or equal to the mean quadratic error of the aggregated forecast. In particular, the aggregated forecast will perform better than any individual model (whose forecast is the linear combination with unitary weight on that model and null weights otherwise) and the ensemble mean (associated with uniform weights). Both data assimilation and sequential aggregation have advantages and drawbacks, in terms of theoretical framework, practical application, performance, and computational efficiency. An introduction to these techniques can be found in Appendix A.

[5] In this paper, a method called “ensemble forecast of analyses” (EFA) is designed to combine both approaches. A key motivation is to address two important limitations of sequential aggregation. The first limitation is that sequential aggregation may not take observational errors into account. The linear combination of the ensemble forecasts is determined so as to minimize its discrepancy with the observations. Since the observations are not perfect, this approach is not entirely satisfactory. The second limitation is that the weights are computed only at the locations and for the variables that are observed. Computing weights for other locations and other variables is beyond the scope of the methods. It is possible to compute a single set of weights for all locations; in this case, it can be argued, and sometimes observed in applications, that the weights are reasonably robust in space, but there is no theoretical framework to support this assumption. Further details on the motivation for the development of EFA are given in section 2.1.

[6] In EFA, the leading idea is to carry out sequential aggregation to forecast an analysis, instead of observations. The analysis is the result of a data assimilation step. In some sense, it is the best estimate of the true state that can be produced from the available information. Analyses can be produced whenever observations become available. Therefore, the sequence is similar to that of sequential aggregation. First, EFA produces a forecast of the forthcoming analysis. Second, when the date of the forecast is reached, the observations become available and the analysis can be computed. Third, this analysis is compared with the EFA output. After that step, the cycle goes on with EFA producing a forecast for the next analysis. In short, the EFA method tries to predict, using sequential aggregation, the analysis that *will* be computed with future observations.
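The three-step cycle above can be sketched in code. The snippet is a schematic, not the implementation used in this paper: it assumes a scalar state, a one-dimensional BLUE analysis with prescribed background and observation error variances, an exponentiated-gradient weight update, and the simplifying choice of the aggregated forecast itself as the background of the analysis; all function names and parameter values are hypothetical.

```python
import numpy as np

def efa_cycle(weights, ensemble_forecasts, observation,
              b_var=1.0, r_var=0.5, learning_rate=0.1):
    """One EFA cycle: (1) aggregate the ensemble to forecast the forthcoming
    analysis; (2) once the observation arrives, compute the analysis with a
    scalar BLUE step; (3) update the weights against the analysis rather
    than against the raw observation."""
    # Step 1: forecast of the forthcoming analysis.
    forecast_of_analysis = np.dot(weights, ensemble_forecasts)
    # Step 2: the observation becomes available; compute the analysis
    # (here, the aggregated forecast serves as the background).
    gain = b_var / (b_var + r_var)
    analysis = forecast_of_analysis + gain * (observation - forecast_of_analysis)
    # Step 3: compare the EFA output with the analysis and update the weights.
    gradient = 2.0 * (forecast_of_analysis - analysis) * ensemble_forecasts
    new_weights = weights * np.exp(-learning_rate * gradient)
    new_weights /= new_weights.sum()
    return forecast_of_analysis, analysis, new_weights

# One cycle with two members and a single scalar observation.
forecast, analysis, weights = efa_cycle(np.ones(2) / 2.0,
                                        np.array([1.0, 3.0]), observation=4.0)
```

The cycle then repeats: the updated weights are used to forecast the next analysis, which will itself be computed only when the next observations arrive.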

[7] The observation errors are taken into account in the analysis, and EFA naturally computes a multivariate and multidimensional field in the same space as the model state. The method is described in section 2. This fairly general method is illustrated through application to ozone forecasting in section 3.