## 1 Introduction

[2] Many applications in science and engineering require that the predictions of uncertain models be updated by information from a stream of noisy data [see, e.g., *Doucet et al.*, 2001; *van Leeuwen*, 2009; *Bocquet et al.*, 2010]. The model and data jointly define a conditional probability density function (pdf) *p*(*x*^{0:*n*}|*z*^{1:*n*}), where the discrete variable *n* = 0, 1, 2, … can be thought of as discrete time, *x*^{*n*} is a real *m*-dimensional vector to be estimated, called the “state”, *x*^{0:*n*} is shorthand for the set of vectors {*x*^{0}, *x*^{1}, …, *x*^{*n*}}, and the data *z*^{*n*} are *k*-dimensional vectors (*k* ≤ *m*). All information about the state at time *n* is contained in this conditional pdf, and a variety of methods are available for its study, e.g., the Kalman filter [*Kalman*, 1960], the extended and ensemble Kalman filters [*Evensen*, 2006], particle filters [*Doucet et al.*, 2001], or variational methods [*Talagrand and Courtier*, 1987; *Bennett et al.*, 1993]. Given a model and data, each of these algorithms will produce a result. We are interested in the conditions under which this result is reasonable, i.e., consistent with the real-life situation one is modeling.

[3] We say that data assimilation is feasible in principle if it is possible to calculate the mean of the conditional pdf defined by the model and data with a small-to-moderate uncertainty; we discuss what we mean by “moderate” below, after we develop the appropriate tools. If data assimilation is feasible in this sense, one can find an estimate of the state of a system whose distance from an outcome of the physical experiment described by the dynamics is, with high probability, small to moderate, i.e., reliable conclusions can be reached based on the results of the assimilation. We consider a data assimilation algorithm, e.g., a particle filter or a variational method, successful if it produces an accurate estimate of the state of the system. An algorithm can be successful only if data assimilation is feasible in principle. Our definition of success is in line with what is required in the physical sciences, where one wants to make reliable predictions given a model and data. We do not consider data assimilation successful if the posterior variance is reduced (e.g., compared to the variance of the data) but remains large.

[4] Throughout, we restrict the analysis to linear state space models driven by Gaussian noise and supplemented by a synchronous stream of data perturbed by Gaussian noise, i.e., the noisy data are available at every time step of the model and only then. We further assume that all model parameters (including the covariance matrices of the noise) are known, i.e., we consider state estimation rather than combined state and parameter estimation. We study this class of problems because it can be examined in some generality and its important aspects can be explained qualitatively; however, we also discuss its limitations.
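As a concrete illustration, the linear-Gaussian synchronous setting just described can be sketched as follows: the state evolves as *x*^{*n*+1} = *Ax*^{*n*} + *w*^{*n*} with *w*^{*n*} ~ N(0, *Q*), and at every step a datum *z*^{*n*} = *Hx*^{*n*} + *v*^{*n*} with *v*^{*n*} ~ N(0, *R*) is recorded. The matrices *A*, *H*, *Q*, *R* below are illustrative choices for this sketch, not taken from the paper.

```python
import numpy as np

# Sketch of the linear state space model with a synchronous stream of
# Gaussian-perturbed data, i.e., one k-dimensional datum per model step.
# All matrices here are illustrative, not from the paper.

rng = np.random.default_rng(0)

m, k = 4, 2                       # state and data dimensions (k <= m)
A = 0.9 * np.eye(m)               # stable linear dynamics
H = np.eye(k, m)                  # observe the first k state components
Q = 0.1 * np.eye(m)               # model-noise covariance
R = 0.2 * np.eye(k)               # observation-noise covariance

x = np.zeros(m)                   # initial state
states, data = [], []
for n in range(50):
    # state evolution: x^{n+1} = A x^n + w^n, w^n ~ N(0, Q)
    x = A @ x + rng.multivariate_normal(np.zeros(m), Q)
    # synchronous datum: z^n = H x^n + v^n, v^n ~ N(0, R)
    z = H @ x + rng.multivariate_normal(np.zeros(k), R)
    states.append(x)
    data.append(z)
```

Since all parameters (*A*, *H*, *Q*, *R*) are known, the task defined by such a simulation is pure state estimation: recover the trajectory in `states` from the stream in `data`.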

[5] In section 2 we derive conditions under which data assimilation is feasible in principle, without regard to a specific algorithm. We define the effective dimension of a Gaussian data assimilation problem as the Frobenius norm of the steady state posterior covariance and show that data assimilation is feasible in the sense described above only if this effective dimension is moderate. We argue that realistic problems have a moderate effective dimension.
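A minimal sketch of how this effective dimension can be evaluated in the linear-Gaussian setting: iterate the Kalman filter covariance recursion to its steady state and take the Frobenius norm of the resulting posterior covariance. The function name and the matrices *A*, *H*, *Q*, *R* below are illustrative assumptions for this sketch, not from the paper.

```python
import numpy as np

def effective_dimension(A, H, Q, R, tol=1e-10, max_iter=10_000):
    """Frobenius norm of the steady-state posterior covariance of the
    linear-Gaussian filtering problem x^{n+1} = A x^n + w, z^n = H x^n + v,
    obtained by iterating the Kalman filter Riccati recursion."""
    m = A.shape[0]
    P = Q.copy()                              # initial posterior covariance
    for _ in range(max_iter):
        P_f = A @ P @ A.T + Q                 # forecast covariance
        S = H @ P_f @ H.T + R                 # innovation covariance
        K = P_f @ H.T @ np.linalg.inv(S)      # Kalman gain
        P_new = (np.eye(m) - K @ H) @ P_f     # posterior (analysis) covariance
        if np.linalg.norm(P_new - P, 'fro') < tol:
            return np.linalg.norm(P_new, 'fro')
        P = P_new
    return np.linalg.norm(P, 'fro')

# Illustrative problem: 4-dimensional state, first 2 components observed.
m, k = 4, 2
A = 0.9 * np.eye(m)
H = np.eye(k, m)
Q = 0.1 * np.eye(m)
R = 0.2 * np.eye(k)
print(effective_dimension(A, H, Q, R))
```

In this toy example the effective dimension stays well below the state dimension *m*, illustrating the point made above: feasibility depends on this norm being moderate, not on *m* itself being small.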

[6] In the remainder of the paper, we discuss the conditions under which particular data assimilation algorithms can succeed in solving problems (with success defined as above) that are solvable in principle. In section 3 we briefly review particle filters. In section 4, we use the results of *Snyder* [2011] to show that the optimal particle filter (which in the linear synchronous case coincides with the implicit particle filter [*Atkins et al.*, 2013; *Chorin et al.*, 2010; *Morzfeld et al.*, 2012]) performs well if the problem is solvable in principle, provided that a certain balance condition is satisfied. We conclude that optimal particle filters can solve many data assimilation problems even if the number of variables to be estimated is large. Building on the results of *Snyder et al.* [2008], *Bengtsson et al.* [2008], and *Bickel et al.* [2008], we show that another particle filter fails under conditions that are frequently met. Thus, how a particle filter is implemented matters: a poor choice of algorithm can lead to poor performance. In section 5 we consider particle smoothing and variational data assimilation and show that these methods, too, can be successful only under conditions comparable to those we found in particle filtering. We discuss limitations of our analysis in section 6 and present conclusions in section 7.

[7] The effective dimension defined in the present paper is different from the effective dimensions introduced in *Snyder et al.* [2008], *Bengtsson et al.* [2008], *Bickel et al.* [2008], and *Snyder* [2011]. Those effective dimensions are defined for particular particle filters, whereas the effective dimension defined in the present paper is a characteristic of the model and data stream, i.e., independent of the data assimilation algorithm used. We show in particular that the effective dimension in our sense remains moderate for realistic models, even when the state dimension is large (asymptotically infinite), and that numerical data assimilation can be successful in these cases; in particular, a moderate effective dimension in our sense can imply moderate effective dimensions in the sense of these earlier papers for a suitable algorithm.