Performance of Ensemble Kalman filters in large dimensions

Contemporary data assimilation often involves more than a million prediction variables. Ensemble Kalman filters (EnKF), developed by geoscientists, have become indispensable tools in science and engineering, because they provide computationally cheap, low-ensemble-size state approximations for extremely high-dimensional turbulent dynamical systems. Practical finite-ensemble filters like EnKF necessarily involve modifications such as covariance inflation and localization, and it is a genuine mystery why they perform so well with small ensemble sizes in large dimensions. This paper provides the first rigorous stochastic analysis of the accuracy and covariance fidelity of EnKF in the practical regime where the ensemble size is much smaller than the large ambient dimension, for EnKFs with random coefficients. A challenging issue overcome here is that EnKF in huge dimensions introduces unavoidable bias and model errors, which need to be controlled and estimated.


Introduction
With the growing importance of accurate predictions and the expanding availability of data in geoscience and engineering, data assimilation for high-dimensional dynamical systems has never been more crucial. The ensemble Kalman filters (EnKF) [1,4,15,38] are ensemble-based algorithms well designed for this purpose. They quantify the uncertainty of an underlying system using the sample information of an ensemble $\{X_n^{(k)}\}_{k=1}^K$. In most applications, the ensemble size $K \approx 10^2$ is much smaller than the ambient space dimension $d \approx 10^6$, thereby significantly reducing the computational cost. The simplicity of EnKF and its close-to-accurate performance have fueled wide applications in various fields of geophysical science [20,32].
However, why EnKF works well with a small ensemble has remained a complete mystery. The existing theoretical EnKF performance literature focuses mostly on the impractical scenario where the ensemble size goes to infinity [25,34]. When the ensemble size is finite, the only known recent results concern well-posedness, nonlinear stability, and geometric ergodicity [22,23,39,40]. For continuous-time EnKF analogues, similar results and filter error bounds are obtained when the forecast model admits uniform contraction [12-14]. These theories are inadequate to explain EnKF performance in practice. On the other hand, practitioners often attribute the EnKF success to the existence of a low effective filtering dimension. This means most of the filtering uncertainty is contained in $p \approx 10$ directions, so the ensemble size is still comparatively large. But the definition of the effective filtering dimension has remained elusive, as its associated subspace often evolves with the dynamics. How the filter ensemble is attracted to this subspace is even more puzzling.
Another layer of complexity that shrouds the study of EnKF is its practical variants. The sampling formulation of EnKF leaves it with many structural flaws. Meteorologists and engineers have invented various methods to remedy these problems, including square root formulations, covariance inflation, and localization [2,3,27,30]. The derivations of these methods rely purely on physical or statistical intuition, where the filter ensemble is often assumed to be Gaussian distributed. How these methods control the bias and model error introduced by the EnKF formulation in practical scenarios lacks rigorous explanation.
As a consequence of these two theoretical gaps, there are no existing theoretical guidelines on how to choose the ensemble size, or which augmentations should be implemented and how. This paper intends to fill these gaps by developing a rigorous error analysis framework for a properly augmented EnKF. We study these issues here in the challenging context of Kalman filters with random coefficients [5] in large dimensions [33]. Although this setting is less difficult than the fully nonlinear case [39,40], it is much richer than the deterministic case, with applications to stochastic turbulence models [32,33].
The remainder of this introduction outlines the objective, strategy, and main results, which are proved in detail in the rest of the paper. We note that [33] provides a simple rigorous treatment of the same issues without the effect of a finite ensemble.

Intrinsic Performance Criteria
Consider the following signal-observation system with random coefficients:

(1.1) $X_{n+1} = A_n X_n + B_n + \xi_{n+1}$, $\xi_{n+1} \sim \mathcal{N}(0, \Sigma_n)$; $\quad Y_{n+1} = H_n X_{n+1} + \zeta_{n+1}$, $\zeta_{n+1} \sim \mathcal{N}(0, I_q)$.

We assume the signal variable $X_n$ is of dimension $d$, and the observation variable $Y_n$ is of dimension $q \le d$. The realizations of the dynamical coefficients $(A_n, B_n, \Sigma_n)$, the observation matrix $H_n$, and $Y_n$ are assumed to be available, and the objective is to estimate $X_n$. The random-coefficient setting allows us to model internal dynamical intermittencies, where the effective low-dimensional subspace may evolve nontrivially.
The optimal filter for system (1.1) is the Kalman filter [5,28,29], assuming $(X_0, Y_0)$ is Gaussian distributed. It estimates $X_n$ with a Gaussian distribution $\mathcal{N}(m_n, R_n)$, where the mean and covariance follow a well-known recursion:

(1.2) Forecast: $\hat m_{n+1} = A_n m_n + B_n$, $\quad \hat R_{n+1} = A_n R_n A_n^T + \Sigma_n$;
Assimilation: $m_{n+1} = \hat m_{n+1} + G_n(\hat R_{n+1})(Y_{n+1} - H_n \hat m_{n+1})$, $\quad R_{n+1} = \mathcal{K}_n(\hat R_{n+1})$,
with $G_n(C) = C H_n^T (I_q + H_n C H_n^T)^{-1}$, $\quad \mathcal{K}_n(C) = C - G_n(C) H_n C$.

Unfortunately, the Kalman filter (1.2) is not applicable to high-dimensional problems, as it has a computational complexity of $O(d^2 q)$. Nevertheless, its optimality indicates that $R_n$ directly describes how well system (1.1) can be filtered. Moreover, the classical Kalman filter analysis [5] indicates that $R_n$ converges to a stationary solution $\tilde R_n$ of (1.2), which is independent of filter initialization. In other words, conditions on the stationary solution $\tilde R_n$ can be viewed as intrinsic filtering criteria for system (1.1). Moreover, as $\tilde R_n$ can be estimated or computed in many scenarios, these criteria can be used as filtering guidelines. There is abundant literature on the various notions of stability of Kalman filters and Riccati equations [6-9,19].
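For concreteness, here is a minimal Python sketch of one cycle of recursion (1.2); the function name and calling convention are illustrative, not from the original text.

```python
import numpy as np

def kalman_step(m, R, A, B, Sigma, H, y):
    """One forecast/assimilation cycle of recursion (1.2)."""
    q = H.shape[0]
    # Forecast: m_hat = A m + B, R_hat = A R A^T + Sigma
    m_hat = A @ m + B
    R_hat = A @ R @ A.T + Sigma
    # Gain: G(R_hat) = R_hat H^T (I_q + H R_hat H^T)^{-1}
    G = R_hat @ H.T @ np.linalg.inv(np.eye(q) + H @ R_hat @ H.T)
    # Assimilation: m <- m_hat + G (y - H m_hat), R <- K(R_hat) = R_hat - G H R_hat
    return m_hat + G @ (y - H @ m_hat), R_hat - G @ H @ R_hat
```

Forming $\hat R_{n+1}$ and the gain is what drives the $O(d^2 q)$ cost that rules this recursion out in high dimensions.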
In particular, for our interest, an augmented Kalman filter stationary solution $\tilde R_n$ can be used to define the effective dimension. As mentioned earlier, it is conjectured that EnKF performs well with a small ensemble size because there are only $p < K$ dimensions of significant uncertainty during the filtering. Throughout this paper, a threshold $\rho > 0$ is assumed to separate the significant dimensions from the others. So one way to formulate the low-effective-dimension requirement would be: $\tilde R_n$ has at most $p$ eigenvalues above $\rho$. The details of the Kalman filter augmentation, along with the additional low-dimension requirements on the unstable directions of $A_n$ and $\Sigma_n$, will be given in the formal Assumption 3.1.

EnKF with Small Ensemble Sizes
In essence, EnKF is a Monte Carlo simulation of the optimal filter (1.2). It utilizes an ensemble $\{X_n^{(k)}\}$ to describe the filtering uncertainty and uses the ensemble mean $\bar X_n$ as the estimator for $X_n$. In the forecast step, a forecast ensemble $\{\hat X_{n+1}^{(k)}\}$ is generated from the posterior ensemble $\{X_n^{(k)}\}$ by simulating the forecast model in (1.1). In the assimilation step, the Kalman update rule of (1.2) applies, with the mean and covariance replaced by their ensemble versions. This effectively brings down the computational cost. But the small-sample formulation brings up four structural issues. Fortunately, there are existing practical variants of EnKF that can alleviate these problems. By incorporating these augmentations, our theoretical results in return provide rigorous justification for their improvement in practice.
The first problem is covariance rank deficiency. The sample covariance of an ensemble of size $K$ can be of rank at most $K - 1$, so uncertainties in the $d - K + 1$ complementary directions are ignored. The observation operator may accumulate all these underestimated uncertainties and create a huge bias in the assimilation step. In practice, this problem is usually remedied by adding a constant inflation to the forecast covariance. In our effective-dimension formulation, the directions of significant uncertainty should in principle be captured by the ensemble, while the other directions are underrepresented. Then with an ensemble covariance matrix $C$, it is intuitive to add $\rho I_d$ to it in order to capture the uncertainty in the ambient space, making the effective covariance estimator $\check C := C + \rho I_d$.
The second problem comes from the instability of the dynamics. In many realistic models, intermittent instability comes from unstable directions of the forecast operator $A_n$ or genuine large surges in $\Sigma_n$. These may not be captured by the ensemble at the previous step, and it is necessary to include them in the forecast step. Evidently, in order for the effective filtering dimension to be $p$, the dimension of such instability must be below $p$ as well.
The third problem is covariance decay from the sampling effect. The standard EnKF forecast step intends to recover the covariance forecast of (1.2), so $\mathbb{E}\hat C_{n+1} = \hat R_{n+1}$, assuming $C_n = R_n$. However, the Kalman covariance update operator $\mathcal K_n$ is concave; see Lemma A.1. Jensen's inequality then indicates that the average posterior covariance $\mathbb{E}\mathcal K_n(\hat C_{n+1})$ is below the target covariance $\mathcal K_n(\hat R_{n+1})$; see Lemma 5.3 for details. The practical remedy is a multiplicative inflation, so the average forecast covariance is $\mathbb{E}\hat C_{n+1} = r\hat R_{n+1}$ with $r > 1$. The effect of such inflation has been rigorously studied in [16], but only for $d = 1$.
Lastly, small samples may create spurious correlations in high dimensions. As an easy example, let $X^{(k)}$ be $K$ i.i.d. $\mathcal N(0, I_d)$ random vectors. Then by the Bai-Yin law [41], the spectral norm of the sample covariance is roughly $(1 + \sqrt{d/K})^2$ instead of the true value 1. In our small-sample setting $K \ll d$, the sampling error is simply horrible. In order to remove such correlations, practical remedies often involve localization procedures. Here we simplify this operation by projecting the sample covariance matrix onto its $p$ leading eigenvectors. Mathematically speaking, this ensures the forecast sample covariance can concentrate on the right value. Ideally, if the effective dimension is lower than $p$, such a spectral projection will not change much of the covariance matrix. Section 2 will give the explicit formulation of the prescribed EnKF, where Algorithm 1 summarizes the procedures.
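The spurious-correlation effect and the spectral-projection surrogate for localization are both easy to observe numerically. The following minimal sketch uses illustrative sizes and is not part of Algorithm 1 itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 2000, 100                     # ambient dimension much larger than ensemble size

X = rng.standard_normal((K, d))      # K i.i.d. N(0, I_d) samples; true covariance is I_d
s = np.linalg.svd(X / np.sqrt(K), compute_uv=False)
print(s[0] ** 2)                     # top eigenvalue of the sample covariance:
                                     # about (1 + sqrt(d/K))^2 ~ 30, far from the true value 1

p = 10                               # spectral projection: keep p leading eigendirections
U, sv, _ = np.linalg.svd(X.T / np.sqrt(K), full_matrices=False)
C_proj = (U[:, :p] * sv[:p] ** 2) @ U[:, :p].T   # rank-p projected sample covariance
```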

Covariance Fidelity and Mahalanobis Error
Practical filter performance analysis requires concrete estimates of the filter error $e_n = \bar X_n - X_n$. This appears to be very difficult for EnKF, because the covariance relations in (1.2) no longer hold deterministically. To make matters worse, the random sampling error propagates through a nonlinear update rule as in (1.2). So it is nearly impossible to trace the exact distribution of $e_n$, while it is clearly not Gaussian.
The existing analysis of EnKF focuses mostly on its difference from the optimal filter (1.2) [12-14,16,25,26,34]. These results often either require the ensemble to be Gaussian distributed at each step or only consider the case where the ensemble size $K$ goes to infinity. Unfortunately, such assumptions are unlikely to hold in practice, since there is a computational advantage for EnKF only when $K < d$.
A more pragmatic strategy is to look for intrinsic statistical relations of the error. In particular, it is important to check whether the reduced covariance estimators dominate the real error covariance, as underestimating the error covariance often causes severe filter divergence. The Mahalanobis norm is a natural choice for this purpose. Given a $d \times d$ positive definite (PD) matrix $C$, it generates a Mahalanobis norm on $\mathbb R^d$: $\|x\|_C^2 = x^T C^{-1} x$. This norm is central in many Bayesian inverse problems. For example, given the prior distribution of $X$ as $\mathcal N(b, C)$ and a linear observation $Y = HX + \zeta$ with Gaussian noise $\zeta \sim \mathcal N(0, \Sigma)$, the optimal estimate is the minimizer of $\|x - b\|_C^2 + \|Y - Hx\|_\Sigma^2$. In our context, it is natural to look at the nondimensionalized Mahalanobis error $\frac1d\|e_n\|^2_{\check C_n}$. According to the EnKF formulation described above, the true state is estimated by $\mathcal N(\bar X_n, \check C_n)$. If the hypothesis holds, $\frac1d\|e_n\|^2_{\check C_n}$ should roughly be of constant value. By showing that the Mahalanobis error is bounded, we also show that the error covariance estimate $\check C_n$ more or less captures the real error covariance; this is known as covariance fidelity.
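A direct way to monitor this statistic in an implementation is sketched below (hypothetical helper; $\check C_n = C_n + \rho I_d$ as above):

```python
import numpy as np

def mahalanobis_error(e, C_check):
    """Nondimensionalized Mahalanobis error (1/d) e^T C_check^{-1} e.

    e       : filter error, ensemble mean minus truth (length-d vector)
    C_check : effective covariance estimator C_n + rho * I_d (d x d, PD)
    """
    d = e.shape[0]
    z = np.linalg.solve(C_check, e)   # avoids forming the explicit inverse
    return float(e @ z) / d
```

If covariance fidelity holds, this quantity stays of order one along the filter trajectory.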
The Mahalanobis error also has surprisingly good dynamical properties in large dimensions [33]. In short, $\|e_n\|^2_{\check C_n}$ is a dissipative (also called exponentially stable) sequence. This is actually carried by an intrinsic inequality induced by the Kalman update mechanism. In the optimal filter, it is formulated as

(1.4) $A_n^T \hat R_{n+1}^{-1} A_n \preceq R_n^{-1}$.

This was exploited by previous works in the literature [5,35] to show robustness of Kalman filters and extended Kalman filters in a fixed finite dimension. See, for example, the first displayed inequality in the proof of [5, theorem 2.6]. One of the major results of this paper can be informally formulated as follows:

THEOREM 1.1. When applying the EnKF described in Section 1.2 to system (1.1) with an effective filtering dimension $p$ and proper inflation parameters $r > 1$, $\rho > 0$, there is a constant $D$ such that if $K > Dp$, the nondimensionalized Mahalanobis filter error $\mathbb E\,\tfrac{1}{\sqrt d}\|e_n\|_{\check C_n}$ converges to a constant independent of $d$ and the filter initialization.
The detailed conditions and statements are given in Section 3 and Theorem 3.4. This result rigorously explains the effectiveness of EnKF in practice. It is also important to note that the condition requires the ensemble size to grow only linearly with the effective dimension, instead of exponentially as in the case of particle filters [10,11]. This makes EnKF a promising tool for a system of medium size $d$, even if there is no low-effective-dimension assumption.
There are two useful corollaries of Theorem 1.1. First, its proof indicates that the EnKF is exponentially stable under an initial ensemble shift; see Corollary 3.5 below. In other words, if two ensembles start with the same ensemble spread but different means, the difference in their means will decay exponentially fast. Second, when the system noises are of scale $\epsilon$, the EnKF filter error is also of scale $\epsilon$. This is called the filter accuracy of EnKF, and it is evidently very useful when system (1.1) is observed frequently. See Corollary 3.6 below.

Framework of the Proof: Random Matrix Concentration and Lyapunov Functions
In order to adapt the Mahalanobis error dissipation (1.4) to EnKF, it is essential to recover $A_n^T \hat R_{n+1}^{-1} A_n \preceq R_n^{-1}$ with $\hat R_{n+1}$ replaced by the random ensemble forecast matrix $\check{\hat C}_{n+1}$. Such a problem has rarely been discussed by the random matrix theory (RMT) community, as it involves matrix inversion and noncentral ensembles. Fortunately, standard RMT arguments like Gaussian concentration and $\epsilon$-net covering can be applied, and roughly speaking we can show that $\check{\hat C}_{n+1}^{-1}$ is dominated by $\hat R_{n+1}^{-1}$ up to a factor close to one. Both covariance inflation techniques and the effective low dimensionality are the keys to making this possible. The additive inflation keeps the matrix inversion nonsingular, and the multiplicative inflation creates enough room for a spectral concentration to occur in the effective dimensions.
An RMT result, Theorem 6.1, guarantees that the Mahalanobis error dissipates like (1.4) with high probability. But rare sampling occurrences may decrease the sample covariance and create underestimation. Such intermittency genuinely exists in various stochastic problems, and the general strategy is to find a Lyapunov function. In this paper, our Lyapunov function exploits the concavity of the Kalman covariance update operator $\mathcal K$ and the boundedness of the filter covariance implied by an observability condition.

Preliminaries
The remainder of this paper is organized as follows. Section 2 formulates an EnKF with specified augmentations, which is summarized by Algorithm 1. Section 3 formulates the low-effective-dimension Assumption 3.1 and uniform observability Assumption 3.2. Theorem 3.4 shows that these assumptions guarantee the Mahalanobis error of EnKF decays to a constant geometrically fast. The important Corollary 3.5 and Corollary 3.6 about exponential stability and accuracy follow immediately. Before diving into the proofs, a simple application of our framework to a stochastic turbulence model is given in Section 4. The main proof components are illustrated in Section 5. The noncentral RMT result is located in Section 6.
Before we start the discussion, here are a few standard notations we will use in the following. $\|C\|$ denotes the $l_2$ operator norm of a matrix $C$, and $|x|$ is the $l_2$ norm of a vector $x$. We use $x \otimes x$ to denote the rank-1 matrix $xx^T$ generated by a column vector $x$. We write $C \in \mathrm{PD}$ or $\mathrm{PSD}$, or simply say $C$ is PD or PSD, to indicate that a symmetric matrix $C$ is positive definite or positive semidefinite, respectively. $[C]_{j,k}$ denotes the $(j,k)$-th coordinate of a matrix $C$, and $[C]_{I^2}$ is the submatrix with both indices in a set $I$. $A \preceq B$ indicates that $B - A \in \mathrm{PSD}$. $\lceil a \rceil$ is the smallest integer above a real number $a$.
Given an ensemble of vectors $x^{(1)}, \dots, x^{(K)}$, we use $\bar x = \frac1K\sum_{k=1}^K x^{(k)}$ to denote its ensemble average, and $\Delta x^{(k)} = x^{(k)} - \bar x$ to denote the deviation of each ensemble member. Sometimes it is easier to describe an ensemble in terms of its mean $\bar x$ and spread matrix $S = [\Delta x^{(1)}, \dots, \Delta x^{(K)}]$.
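In code, the mean/spread description amounts to the following (illustrative helper):

```python
import numpy as np

def ensemble_stats(X):
    """Mean and spread of an ensemble stored as columns x^(1), ..., x^(K)."""
    x_bar = X.mean(axis=1, keepdims=True)   # ensemble average
    S = X - x_bar                           # spread matrix: k-th column is x^(k) - x_bar
    return x_bar.ravel(), S
```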
We assume the distribution of filter initializations is known. Generally speaking, there are no specific requirements for their values. But some results implicitly rely on the invertibility of the covariance matrices.

Forecast Step with Instability Representation
The forecast step propagates the underlying uncertainty from time $n$ to time $n+1$. Since the effective posterior distribution is $\mathcal N(\bar X_n, \check C_n)$, the target forecast distribution should be $\mathcal N(A_n \bar X_n + B_n,\ A_n C_n A_n^T + \rho A_n A_n^T + \Sigma_n)$. In order to remedy the covariance decay due to sampling, we also multiply the target covariance by a ratio $r > 1$.
In the ensemble formulation, the posterior ensemble is propagated to the forecast ensemble, $\hat X_{n+1}^{(k)} = A_n X_n^{(k)} + B_n + \zeta_{n+1}^{(k)}$, where $\zeta_{n+1}^{(k)}$ is Gaussian distributed with mean zero; it is intended to capture the instantaneous instability of the system. Its covariance $\Sigma_n^+$ will be specified soon. Denote the inflated forecast ensemble spread $\sqrt r A_n S_n + \sqrt r[\zeta_{n+1}^{(1)}, \dots, \zeta_{n+1}^{(K)}]$ as $\hat S_{n+1}$, where $S_n$ is the ensemble spread matrix at step $n$. The corresponding forecast covariance is given by $\hat C_{n+1} = \frac1K\hat S_{n+1}\hat S_{n+1}^T$, so that $\mathbb E_n\hat C_{n+1} = rA_nC_nA_n^T + r\Sigma_n^+$. Here $\mathbb E_n$ denotes expectation conditioned on the system and ensemble realizations up to time $n$. Since $\hat C_{n+1}$ is an ensemble covariance, it underrepresents the ambient space uncertainty. This can be remedied by adding a constant covariance $\rho I_d$, yielding the effective forecast covariance $\check{\hat C}_{n+1} = \hat C_{n+1} + \rho I_d$.
Here $\lambda > 0$ is a parameter that increases the flexibility of our framework. Its value depends on the model setting; see an example in Section 4. Our framework is valid independent of $\lambda$, and on a first reading $\lambda$ can be taken as 1. In order to preserve covariance fidelity, intuitively we want

(2.2) $\mathbb E_n\check{\hat C}_{n+1} = rA_nC_nA_n^T + r\Sigma_n^+ + \rho I_d \succeq rA_n\check C_nA_n^T + r\rho\,\Sigma_n/\lambda.$

For this purpose, let $\hat P_n$ be the projection onto the eigensubspace of $A_nA_n^T + \Sigma_n/\lambda - (\rho/r)I_d$ with positive eigenvalues. If we let $\Sigma_n^+$ be $\rho$ times the positive part of the prescribed matrix, it is straightforward to check that (2.2) holds. $\Sigma_n^+$ captures the instantaneous instability of system (1.1). It essentially includes the unstable directions of $A_n$ with singular value above a spectral gap $\sqrt{\rho/r}$ and the directions of $\Sigma_n$ with eigenvalue above $\lambda\rho/r$. Its low dimensionality will be another crucial component of the low-effective-dimension Assumption 3.1.
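A minimal sketch of this construction follows; the exact $\rho$ scaling of $\Sigma_n^+$ reflects our reading of (2.2) and should be treated as an assumption, and all names are illustrative.

```python
import numpy as np

def instability_covariance(A, Sigma, r, rho, lam=1.0):
    """Sigma_n^+ as (rho times) the positive part of A A^T + Sigma/lam - (rho/r) I."""
    d = A.shape[0]
    M = A @ A.T + Sigma / lam - (rho / r) * np.eye(d)
    w, V = np.linalg.eigh(M)            # symmetric eigendecomposition, ascending
    w_plus = np.maximum(w, 0.0)         # keep only the unstable (positive) part
    return rho * (V * w_plus) @ V.T     # rho * V diag(w_plus) V^T
```

When the system has few unstable directions, most entries of `w_plus` vanish and the returned matrix has low rank, which is exactly the requirement of Assumption 3.1.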

Assimilation Step with Spectral Projection
Once the forecast covariance is computed, the Kalman gain matrix can be obtained through $G_{n+1} = G_n(\check{\hat C}_{n+1}) = \check{\hat C}_{n+1}H_n^T(I_q + H_n\check{\hat C}_{n+1}H_n^T)^{-1}$. We update the mean from the forecast mean $\hat{\bar X}_{n+1} = A_n\bar X_n + B_n$ via $\bar X_{n+1} = \hat{\bar X}_{n+1} + G_{n+1}(Y_{n+1} - H_n\hat{\bar X}_{n+1})$. Based on the classical Kalman formulation, the target posterior covariance should be $\mathcal K_n(\check{\hat C}_{n+1})$, with the Kalman covariance update operator $\mathcal K_n$ as in (1.2). Unfortunately, $\mathcal K_n(\check{\hat C}_{n+1})$ cannot be directly obtained as a posterior ensemble covariance matrix, since it may have rank $d$ instead of $K - 1$. Moreover, the forecast sample covariance matrix may have spurious correlations due to a lower-than-dimension sampling size. For this reason, we project $\mathcal K_n(\check{\hat C}_{n+1})$ onto its eigenspace associated with the largest $p$ eigenvalues. The posterior ensemble should have an effective covariance $\check C_{n+1}$ matching it. If we denote the projection as $P_{n+1}$, let $QDQ^T$ be the eigenvalue decomposition of $P_{n+1}(\mathcal K_n(\check{\hat C}_{n+1}) - \rho I)P_{n+1}$, and let $\Psi\Lambda\Phi^T$ be the SVD decomposition of $\hat S_{n+1}$. We can then update the posterior spread $S_{n+1}$ through an ensemble adjustment Kalman filter (EAKF) type of update, so that $\frac1K S_{n+1}S_{n+1}^T = P_{n+1}(\mathcal K_n(\check{\hat C}_{n+1}) - \rho I)P_{n+1}$. Notice that the directions with eigenvalues of $\mathcal K_n(\check{\hat C}_{n+1})$ below the threshold $\rho$ are filtered out on the right-hand side. Moreover, it is straightforward to verify that $\check C_{n+1} = \frac1K S_{n+1}S_{n+1}^T + \rho I_d = P_{n+1}(\mathcal K_n(\check{\hat C}_{n+1}) - \rho I)P_{n+1} + \rho I_d$. In case one does not have a low effective dimension, so $p = d$, the spectral projection is trivial: $P_{n+1} = I_d$.
Before we move on and analyze EnKF performance, let us briefly discuss the computational complexity involved in the assimilation step, assuming $d > K^2 > p^2$ and standard numerical procedures [17]. It is important to notice that $\hat C_{n+1} = \frac1K\hat S_{n+1}\hat S_{n+1}^T$ is of rank at most $K$. Moreover, the following Woodbury matrix identity holds:

$(I_q + H_n\check{\hat C}_{n+1}H_n^T)^{-1} = Q_n - \tfrac1K Q_nH_n\hat S_{n+1}\bigl(I_K + \tfrac1K\hat S_{n+1}^TH_n^TQ_nH_n\hat S_{n+1}\bigr)^{-1}\hat S_{n+1}^TH_n^TQ_n,$

where $I_K + \frac1K\hat S_{n+1}^TH_n^TQ_nH_n\hat S_{n+1}$ is $K \times K$, and $Q_n = (I_q + \rho H_nH_n^T)^{-1}$ is usually easy to compute because $H_n$ is often a simple projection. This makes vector products with the matrix $\mathcal K_n(\check{\hat C}_{n+1})$ cost $O(Kd)$ operations. Consequently, finding the $p$ largest eigenvalues and associated eigenvectors is of complexity $O(pKd)$ through, for example, the power method. Since the SVD step is identical to standard EAKF, it takes complexity $O(K^2d)$ [38]. In conclusion, this version of EAKF does not require an additional order of complexity. We comment that simpler operations may exist with other EAKF formulations. For example, [37] suggested letting $S_{n+1} = (I - \frac12 G_{n+1}H_n)\hat S_{n+1}$.
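The following sketch shows how these complexity counts can be realized: matrix-vector products with $\mathcal K_n(\check{\hat C}_{n+1})$ through the Woodbury identity, combined with block power iteration for the $p$ leading eigenpairs. It is an illustrative implementation under the stated assumptions, not the exact procedure of [17] or [38].

```python
import numpy as np

def posterior_matvec_factory(S_hat, H, rho):
    """O(Kd) products with K_n(C_check), C_check = S_hat S_hat^T / K + rho I."""
    d, K = S_hat.shape
    q = H.shape[0]
    HS = H @ S_hat                                       # q x K
    Q = np.linalg.inv(np.eye(q) + rho * (H @ H.T))       # cheap when H is a simple projection
    inner_inv = np.linalg.inv(np.eye(K) + (HS.T @ Q @ HS) / K)  # K x K Woodbury core

    def C_check_mv(v):                     # C_check v without forming a d x d matrix
        return S_hat @ (S_hat.T @ v) / K + rho * v

    def obs_precision_mv(u):               # (I_q + H C_check H^T)^{-1} u via Woodbury
        Qu = Q @ u
        return Qu - Q @ (HS @ (inner_inv @ (HS.T @ Qu))) / K

    def posterior_mv(v):                   # K_n(C_check) v = C v - C H^T (...)^{-1} H C v
        Cv = C_check_mv(v)
        return Cv - C_check_mv(H.T @ obs_precision_mv(H @ Cv))

    return posterior_mv

def top_eigenpairs(matvec, d, p, iters=100, seed=0):
    """Block power (subspace) iteration: p leading eigenpairs in O(pKd) total work."""
    rng = np.random.default_rng(seed)
    V = np.linalg.qr(rng.standard_normal((d, p)))[0]
    for _ in range(iters):
        V = np.linalg.qr(np.column_stack([matvec(v) for v in V.T]))[0]
    lam = np.array([v @ matvec(v) for v in V.T])   # Rayleigh quotients
    return lam, V
```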

Summary of the Algorithm
The mathematical formulation of EnKF can be summarized by Algorithm 1:

Algorithm 1 (EnKF with inflation and spectral projection)
1: for n = 0, 1, 2, ... do
2: Sample the instability noise $\zeta_{n+1}^{(k)} \sim \mathcal N(0, \Sigma_n^+)$ and form the inflated forecast spread $\hat S_{n+1} = \sqrt r A_n S_n + \sqrt r[\zeta_{n+1}^{(1)}, \dots, \zeta_{n+1}^{(K)}]$.
3: Form $\hat C_{n+1} = \frac1K\hat S_{n+1}\hat S_{n+1}^T$ and the effective forecast covariance $\check{\hat C}_{n+1} = \hat C_{n+1} + \rho I_d$.
4: Projection to the largest $p$ eigenvectors of $\mathcal K_n(\check{\hat C}_{n+1})$: compute $P_{n+1}$.
5: Compute the Kalman gain $G_{n+1} = G_n(\check{\hat C}_{n+1})$ and update the mean, $\bar X_{n+1} = A_n\bar X_n + B_n + G_{n+1}(Y_{n+1} - H_n(A_n\bar X_n + B_n))$.
6: Update the spread $S_{n+1}$ from $\hat S_{n+1}$ by an EAKF type of update, so that $\frac1K S_{n+1}S_{n+1}^T = P_{n+1}(\mathcal K_n(\check{\hat C}_{n+1}) - \rho I_d)P_{n+1}$.
7: end for

Main Results: EnKF Performance

Effective Low Dimensionality
As explained in Section 1.1, Kalman filters will be employed to formulate the effective dimensionality. Since our EnKF implements covariance inflation techniques, the associated signal-observation system (3.1) is also augmented with a stronger inflation $r^2$ [33]. In principle, $r^2$ can be replaced with any ratio above $r$, and our analysis below still holds; we use $r^2$ just for notational simplicity. The optimal filter for (3.1) is a Kalman filter, and its covariance follows the Riccati recursion (3.2). Given the value of $R'_j$, the value of $R'_k$ at a later time $k \ge j$ can be computed by this recursion, so there is a mapping $\mathcal R_{j,k}$ such that $R'_k = \mathcal R_{j,k}(R'_j)$. Our low-dimensionality reference can be formulated through one solution $\tilde R_k = \mathcal R_{j,k}(\tilde R_j)$:

ASSUMPTION 3.1 (low effective dimension). The system instability matrix $\Sigma_n^+$ has rank not exceeding $p$; $A_nA_n^T + \Sigma_n/\lambda$ has at most $p$ eigenvalues above $\rho/r$; and the (stationary) solution $\tilde R_n$ has at most $p$ eigenvalues above $\rho$.
Although our framework works for an arbitrary solution of (3.2), in most cases any solution will converge to a unique PD stationary sequence, assuming stationarity, ergodicity, weak observability, and controllability of the system coefficients; see [5]. It is also straightforward to verify that, if the original system (1.1) satisfies the weak observability and controllability condition of [5], then so does the augmented system (3.1). A proof of a similar claim is in the appendix of [33]. Then it makes more sense to impose Assumption 3.1 on this stationary solution, and by doing so, the assumption is independent of the initial conditions. This is why in Assumption 3.1 we put "stationary" in parentheses; the reader should choose the proper variant in application.
In order to focus on the more interesting EnKF sampling effect, this paper considers a relatively simple system setting, where (1.1) is uniformly observable through a fixed time interval $m$ (Assumption 3.2). Assumption 3.2 simplifies the control of a solution of (3.2) a lot; for example, Lemma 5.5 shows that $\mathcal R_{k,k+m}(C)$ is always bounded, using [5].
Remark 3.3. It is worth noticing that the direct requirement on system (1.1), Assumption 3.2, is quite weak, given that $(A_n, B_n, H_n, \Sigma_n)$ can be any random sequence. However, Assumption 3.1 is another implicit condition imposed on system (1.1), since $\tilde R_n$ is generated through (3.2). The dependence of $\tilde R_n$ on $(A_n, B_n, H_n, \Sigma_n)$ is in general very involved, even in the deterministic, time-homogeneous case. Section 4 will further discuss how to verify Assumption 3.1 and demonstrate with an example. More examples for Kalman filters with random coefficients can be found in [33] and its references. On the other hand, Assumption 3.1 is imposed in theory simply because it can be verified offline, before executing the algorithm. In practice, it suffices to check on the fly whether the posterior covariance $\mathcal K_n(\check{\hat C}_{n+1})$ indeed has fewer than $p$ eigenvalues above $\rho$. If yes, the projection error is 0; the proof in Section 5.2 holds as well with $\mu_{n+1} = 1$, and the same results would hold.

Covariance Fidelity and Filter Performance
As discussed in Section 1.3, the Mahalanobis error is a natural statistic for covariance fidelity quantification. By proving a stronger but more technical result, Theorem 5.10, we can show that the Mahalanobis error decays to a constant geometrically fast. With stronger conditions, a similar proof should lead to a similar bound for $\mathbb E\|e_n\|^s_{\check C_n}$ with $s > 1$; we only demonstrate the case $s = 1$ for simplicity of exposition.

THEOREM 3.4. Suppose system (1.1) satisfies the uniform observability Assumption 3.2 and has an intrinsic filtering dimension $p$ as described in Assumption 3.1. For any $c > 0$, there exist a function $F: \mathrm{PSD} \to \mathbb R$, a constant $D$, and a sequence $M_n$ such that when $K > Dp$, the expected Mahalanobis error $\mathbb E\|e_n\|_{\check C_n}$ decays geometrically toward the sequence $M_n$. The function $F$ and the sequence $M_n$ satisfy explicit bounds with a constant $D_F$. Moreover, the ensemble covariance $C_n$ is bounded by $\tilde R_n$ most of the time, in the sense of (3.3).

Exponential Stability
Another useful property implied by the previous analysis is that EnKF is exponentially stable. Let $\{X_0^{(k)} + c\}$ be a shift of the original initial ensemble. If both EnKFs generate the same realizations of the noises $\zeta_n^{(k)}$ and assimilate the same observation sequence $Y_n$, then it is straightforward to check that their ensemble spread matrices remain the same, and the difference in their mean estimates is given by $U_{n,0}\,c$ with $U_{n,0} = \prod_{k=0}^{n-1}(I - G_{k+1}H_k)A_k$. So if $\|U_{n,0}\|$ converges to 0 exponentially fast, then so does the mean difference.
In [5], this is called exponential stability.
COROLLARY 3.5. Under the conditions of Theorem 3.4, suppose $\mathbb E F(C_0) < \infty$ and $K > Dp$. Then the EnKF is exponentially stable: $\mathbb E\|U_{n,0}\|$ decays geometrically in $n$. The proof is located after the proof of Theorem 5.10.

Filter Accuracy
When system (1.1) is observed frequently, the system noises are of close-to-0 scale. Intuitively, the filter error should be close to 0 as well. Such a property has recently been investigated by [24] for other observers, where it is called filter accuracy. While filter accuracy may come easily for the Kalman filter (1.2), it is highly nontrivial for EnKF. Our framework reveals that the filter accuracy of EnKF can be obtained from the filter accuracy of the reference Kalman filter.

COROLLARY 3.6. Under the same conditions as Theorem 3.4, there exists a constant $C$ such that when $K > Dp$, for any $\epsilon > 0$ there is an EnKF filter with ensemble size $K$ for the rescaled system (3.4), whose noises are of scale $\epsilon$, such that the bounds of Theorem 3.4 hold with the same function and sequence, rescaled by $\epsilon^2$.

PROOF. This is a straightforward corollary of Theorem 3.4, since if $\tilde R_n$ is a reference Kalman filter covariance for system (1.1), then $\epsilon^2\tilde R_n$ is a reference Kalman filter covariance for system (3.4), which has spectral norm bounded by $\epsilon^2 D_R$. So if one applies Algorithm 1 with rescaled parameters $(K, p, r, \epsilon^2\rho, \lambda)$, one can check that both Assumptions 3.1 and 3.2 hold. The $\epsilon^2$ appears before $C_0$ because in fact $F(C_0) = f(\|C_0\tilde R_0^{-1}\|)$ for some function $f$, which will be clear in Theorem 5.10.

A Simple Application and Example
When applying our results to a concrete problem in the form of (1.1), we need to compute the stationary filter covariance $\tilde R_n$, as it plays a vital role in the low-effective-dimension Assumption 3.1, and it controls the ensemble covariance $C_n$. In practice, this requires nontrivial numerical computation. On the other hand, there is a huge literature on Kalman filters, so there are various ways to simplify the computations. The authors of [33] have discussed a few such strategies, involving the unfiltered covariance, the benchmark principle, and the comparison principle. We will not reiterate these strategies in this paper, but instead apply some of them to a concrete simple turbulence problem [32,33].

Linearized Stochastic Turbulence in Fourier Domain
Consider the following stochastic partial differential equation [18,32]:

(4.1) $\partial_t u(x,t) = \omega(\partial_x)u(x,t) - \gamma(\partial_x)u(x,t) + F(x,t) + \dot W(x,t)$.

To simplify the discussion, the underlying space is assumed to be the one-dimensional torus $\mathbb T = [0, 2\pi]$; generalization to higher dimensions is quite straightforward. The terms in (4.1) have the following physical interpretations:
(1) $\omega(\partial_x)$ is an odd polynomial of $\partial_x$. This term usually arises from the Coriolis effect of the earth's rotation, or advection by another turbulent flow. Suppose $\omega(\partial_x)e^{ikx} = i\omega_k e^{ikx}$.
(2) $\gamma(\partial_x)$ is a positive and even polynomial of $\partial_x$. This term models the general diffusion and damping of turbulence. Suppose $\gamma(\partial_x)e^{ikx} = \gamma_k e^{ikx}$.
(3) $F(x,t)$ is a deterministic forcing and $W(x,t)$ is a stochastic forcing.
One way to discretize (4.1) in both time and space is to consider the Galerkin truncation of $u$ at times $nh$ with a fixed interval $h > 0$. Let $X_n$ be a $(2J+1)$-dimensional vector whose $k$-th component is the $k$-th Fourier coefficient of $u(\cdot, nh)$; in other words, the modes $k$ and $-k$ contribute $2[X_n]_k\cos(kx) + 2[X_n]_{-k}\sin(kx)$ to $u(x, nh)$. Suppose both the deterministic and the stochastic forcings admit Fourier decompositions, where $W_{\pm k}(t)$ are standard Wiener processes. A time discretization of (4.1) in the Fourier domain yields the system coefficients for the system vector $X_n$. The details are in [33], and the results are as follows: $A_n = A$ is diagonal with $2 \times 2$ sub-blocks indexed by $\{k, -k\}$, and $\Sigma_n = \Sigma$ is diagonal; $E_k$ stands for the stochastic energy of the $k$-th Fourier mode, which is also the sum of the stochastic energies of $[X_n]_k$ and $[X_n]_{-k}$.
In practice, the damping often grows and the energy decays like polynomials of the wavenumber $|k|$: $\gamma_k \sim |k|^\alpha$ and $E_k \sim |k|^{-\beta}$ for some exponents $\alpha, \beta > 0$. To show that our framework is directly computable, we will also consider a specific set of physical parameters (4.4) with a Kolmogorov energy spectrum $E_k \propto |k|^{-5/3}$, as used in [31].

Reference Spectral Projection
In order to verify Assumption 3.1, we need to estimate the stationary Kalman covariance $\tilde R_n$. The unfiltered equilibrium covariance $V^0_n$ of $X^0_n$ is a crude upper bound for $\tilde R_n$, as the Kalman filter has the minimum error covariance, while $V^0_n$ is the error covariance made by estimating $X^0_n$ using its equilibrium mean. Although this is a crude estimate, it works for any choice of observation, and it is easy to compute. The authors of [33] also applied this idea for reduced filtering error analysis.
In particular, for (4.2), $A_nA_n^T + \Sigma_n$ is a diagonal matrix; whenever a mode's entry exceeds the spectral gap $\rho/r$, this mode should be included in the instability covariance $\Sigma_n^+$. For the other Fourier modes, one can verify that $V^0$ is a constant diagonal matrix with entries determined by the equilibrium energies $E_k$. Since $\tilde R_n \preceq V^0$, by Lidskii's theorem 6.10 of [21], it suffices to show $V^0$ has at most $p$ eigenvalues above $\rho$. In summary, the following must hold:

(4.5) $V^0$ has at most $p$ diagonal entries above $\rho$.

The quantity is easily computable; for example, with the physical parameters (4.4), $p$ can be set as 15, or 30 if the negative wavenumbers are also counted, and $\rho = 0.04$. Such a small $p$ makes $K \approx 100$ comparatively large, and it is independent of the Galerkin truncation range $J$.
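A back-of-the-envelope version of this check, with placeholder values standing in for the exact parameters of (4.4), looks as follows.

```python
import numpy as np

# Illustrative check of (4.5) for a Kolmogorov-type spectrum E_k = E0 * |k|^(-5/3).
# E0, J, and rho below are placeholders, not the actual values behind (4.4).
E0, J, rho = 1.0, 50, 0.04
k = np.arange(1, J + 1)
V0_diag = E0 * k ** (-5.0 / 3.0) / 2     # unfiltered equilibrium variance per mode
p = int((V0_diag > rho).sum())           # modes with significant uncertainty, k > 0
print(p)                                 # doubles if negative wavenumbers are counted
```

The count is independent of the truncation range $J$ once $J$ passes the last significant mode, which is why $p$ stays small while $d = 2J + 1$ grows.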

Regularly Spaced Observations
Observations can significantly decrease the Kalman filter covariance $\tilde R_n$, so they help keep the intrinsic filtering dimension low. Here we consider only a simple but useful scenario where the observations of $u(x,t)$ are made at a regularly spaced network, $x_k = \frac{2\pi k}{2J+1}$, $k = 0, \dots, 2J$, with a Gaussian observation error $\mathcal N(0, \sigma_o)$ at each location. [32,33] have shown that this is equivalent to a direct observation, with $H_n = \sqrt{(2J+1)/\sigma_o}\,I_{2J+1}$. Then the reference covariance matrix $\tilde R_n$ is a constant diagonal matrix with entries $[\tilde R]_{k,k} = r_k$. When $k$ is not a mode of instability, $r_k$ solves a scalar Riccati equation. In summary, the following must hold:

(4.6) $\tilde R$ has at most $p$ diagonal entries $r_k$ above $\rho$.

The quantity is easily computable; for example, with the physical parameters (4.4), $J = 50$, and $\sigma_o = 10$, $p$ can be set as 6 and $\rho = 0.04$. Apparently, the existence of observations makes the effective dimension much smaller than the one estimated by the unfiltered covariance.
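For a single stable Fourier mode, the stationary value $r_k$ can be computed by simply iterating the scalar Riccati recursion to its fixed point; a hypothetical helper:

```python
import numpy as np

def stationary_mode_variance(a, s, h_obs, iters=200):
    """Fixed point of r <- K(a^2 r + s), the scalar Riccati recursion for one mode.

    a     : mode propagation factor (|a| < 1 for a stable mode)
    s     : per-step noise variance of the mode
    h_obs : scalar observation coefficient (observation noise normalized to 1)
    """
    r = s
    for _ in range(iters):
        c = a * a * r + s                     # forecast variance
        r = c / (1.0 + h_obs * h_obs * c)     # scalar Kalman update K(c)
    return r
```

Checking $r_k \le \rho$ mode by mode then gives the effective dimension under observations, matching the comparison made in (4.6).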

Intermittent Dynamical Regime
One challenge that practical filters often face is that the dynamical coefficient $A_n$ is not always stable with spectral norm less than 1. This is usually caused by large-scale chaotic dynamical regime transitions. One simple way of modeling this phenomenon in (4.2) is letting $A_n$ be driven by a Markov jump process [32], while maintaining the sub-block structure $[A_n]_{\{k,-k\}^2} = [\theta_n]_k[A]_{\{k,-k\}^2}$. Here $\theta_n$ is a Markov chain taking values in $\mathbb R^{J+1}$. Then the system's random instability can be modeled as the random fluctuation of $[\theta_n]_k$, so that occasionally $\|[A_n]_{\{k,-k\}^2}\| > 1$ for some $k$.
Our framework is applicable to such scenarios in general, as we allow random system coefficients. The main difficulty would be the computation of $\tilde R_n$ and the verification of Assumption 3.1, which in general require numerical methods. Our framework can also be generalized so that, instead of a constant ambient uncertainty level $\rho$, one can use a stochastic sequence $\rho_n$. Section 5 of [33] discusses this generalization. In this paper, we do not intend to generalize our framework to that level, as the analysis involved is already rather complicated.
On the other hand, in many situations the dynamical instability only occurs on a small subset $I$ of Fourier modes. This is because when the wavenumbers are high, the dissipation force is much stronger than the random environmental forcing. So for $k \in I^c$, $[A_n]_{\{k,-k\}^2}$ could remain of constant value as in (4.2). See chapter 4 of [32] for such an example, where the instability occurs only at a few modes with wavenumber less than 5. Then it suffices to include the subspace spanned by the modes in $I$ in the instability subspace. This can be done with a slightly larger $p$. Since all the verifications discussed above, (4.5) and (4.6), concern only modes outside the instability subspace, both they and our framework remain valid.

Rigorous Analysis for EnKF
This section provides the main ingredients for the proof of Theorem 3.4. This is accomplished by an RMT result for the forecast covariance matrix, a Mahalanobis dissipation mechanism, a Kalman covariance comparison principle, and a Lyapunov function that connects the previous three. The detailed proof of the RMT result is deferred to Section 6, as it is rather technical, long, and only indirectly related to our main problem.
There will be two filtrations in our discussion. The first one contains all the information of the system coefficients up to time $n$, and the initial covariances for the filters: $\mathcal F^c_n = \sigma\{A_k, B_k, \Sigma_k, H_k : k \le n\} \vee \sigma\{R_0, C_0, \tilde R_0\}$. Noticeably, the Kalman filters have their covariance matrices inside this filtration. We will use $\mathcal F^c = \bigvee_{n \ge 0}\mathcal F^c_n$ to denote all the information regarding the system coefficients through the entire time line.
The second filtration contains all the information of system (1.1) up to time $n$: $\mathcal F_n = \mathcal F^c_n \vee \sigma\{\xi_l, \zeta_l, \xi_l^{(k)}, \zeta_l^{(k)} : l \le n, k \le K\}$. We use $\mathbb E_n Z$ and $\mathbb E_{\mathcal F} Z$ to denote the conditional expectation of a random variable $Z$ with respect to $\mathcal F_n$ and another fixed $\sigma$-field $\mathcal F$, respectively. $\mathbb P_n$ and $\mathbb P_{\mathcal F}$ denote the associated conditional probabilities.

Concentration of Samples
The first mechanism we will rely on is that, with a low effective dimension, the ensemble forecast covariance $\check{\hat C}_{n+1}$ concentrates around its average. To describe this phenomenon, define $\theta_{n+1}$ and $\vartheta_{n+1}$ as the smallest constants such that the random matrix is sandwiched by its conditional expectation multiplied by these ratios:

$\theta_{n+1}^{-1}\bigl(rA_nC_nA_n^T + r\Sigma_n^+ + \rho I\bigr) \preceq \check{\hat C}_{n+1} \preceq \vartheta_{n+1}\bigl(rA_nC_nA_n^T + r\Sigma_n^+ + r\rho I\bigr).$

Theorem 6.1 below indicates that $\theta_n$ and $\vartheta_n$ are mostly of value close to 1, as long as the sample size $K$ surpasses a constant multiple of the effective dimension $p$. Specifically, we have the following:

COROLLARY 5.1. Under the same conditions as Theorem 3.4, denote by $\mathcal U_{n+1}$ the rare event that $\theta_{n+1}$ or $\vartheta_{n+1}$ exceeds its concentration range. There are constants $c_r, D_r > 0$ such that $\mathbb P_n(\mathcal U_{n+1}) \le (1 + \log\|C_n\|)\exp(D_r p - c_r K)$. The tail of $\vartheta_{n+1}$ can be bounded by an exponential one:

$\mathbb P_n(\vartheta_{n+1} > 8 + t) \le \exp(-c_r K t).$

Inside the rare event, for any fixed $M$ there is a constant $D_M$ such that the following bound holds for $l \le M$:

$\mathbb E_n 1_{\mathcal U_{n+1}}\vartheta_{n+1}^l \le D_M(\|C_n\| + 1)\exp(D_r p - c_r K).$

Recall that $\|\cdot\|$ denotes the $l_2$ operator norm of matrices.
PROOF. We will apply Theorem 6.1 with the forecast ensemble playing the role of the noncentral samples. We will consider $\delta = \frac15(\sqrt r - 1)$ and two different additive constants: $\rho(r-1)$ and $\rho$. Theorem 6.1 is applicable here because, after the projection $\hat P_n$, $\{a_1, \dots, a_K\}$ spans a subspace of dimension at most $p$. The union of the rare events considered in this corollary is included in the one of Theorem 6.1. Notice that the condition number of $C + sI_d$ is at most $\|rs^{-1}A_nC_nA_n^T\| + 1$. Also notice that $\log(x+1) \le \log x + 1$ for all $x \ge 1$, so with $\|A_n\| \le M_A$ there is a constant $D_{\rho,A,r} \ge 1$ such that

$\log\|s^{-1}rA_nC_nA_n^T + I\| \le \log(s^{-1}rM_A^2\|C_n\| + 1) \le D_{\rho,A,r}(\log\|C_n\| + 1)$

for both $s = \rho(r-1)$ and $s = \rho$. Therefore, according to Theorem 6.1, there are constants $D_\delta, c_\delta, D_M$ such that $\mathbb P_n(\mathcal U_{n+1}) \le \exp(\log 4D_{\rho,A,r} + D_\delta p - c_\delta K)(1 + \log\|C_n\|)$, $\mathbb P_n(\vartheta_{n+1} > 8 + t) < \exp(-c_\delta K t)$, and

$\mathbb E_n 1_{\mathcal U_{n+1}}\vartheta_n^l \le \|\Sigma_n^+/\rho\|^l D_M \exp(\log 2D_{\rho,A,r} + D_\delta p - c_\delta K)(1 + \|C_n\|), \quad l \le M.$

Since we can always pick $D_r$ such that $\log 4D_{\rho,A,r} + D_\delta p \le D_r p$ for all $p \ge 1$, we have our claims.

Covariance Fidelity via Mahalanobis Norm
The spectral projection is necessary for the posterior ensemble to be of rank $p$. The price we pay is that this procedure may decrease the ensemble covariance. Let $\lambda_{n+1}$ be the $(p+1)$-th eigenvalue of $\mathcal K_n(\check{\hat C}_{n+1})$, and let $\mu_{n+1} = \max\{1, \lambda_{n+1}/\rho\}$ measure the impact of the spectral projection step $P_{n+1}$ on the posterior ensemble. Because the eigenvectors of $P_{n+1}(\mathcal K_n(\check{\hat C}_{n+1}) - \rho I)P_{n+1}$ are the same as those of $\mathcal K_n(\check{\hat C}_{n+1})$, one can verify, by treating the eigenvectors in the range of $P_{n+1}$ and those in the null space separately, that

(5.3) $\mu_{n+1}\,\check C_{n+1} = \mu_{n+1}\bigl(P_{n+1}(\mathcal K_n(\check{\hat C}_{n+1}) - \rho I)P_{n+1} + \rho I\bigr) \succeq \mathcal K_n(\check{\hat C}_{n+1}).$

The Mahalanobis error dissipation can be formulated as below.
LEMMA 5.2. With our EnKF Algorithm 1, the filter error $e_n = \bar X_n - X_n$ satisfies a recursive dissipation bound: up to martingale noise terms, $\mathbb E_n\|e_{n+1}\|^2_{\check C_{n+1}}$ is bounded by $\frac{\mu_{n+1}\theta_{n+1}\vartheta_{n+1}}{r}\|e_n\|^2_{\check C_n}$ plus a dimension-free constant; see (5.4) and (5.5). In our formulation, $\theta_{n+1}$ and $\vartheta_{n+1}$ describe the fluctuation from random sampling, and $\mu_{n+1}$ describes the possible deflation made by the projection, as seen in (5.3). In the classical Kalman filter setting [35], these sequences are simply ones.
PROOF. The forecast estimator is S X nC1 , and its error is y e nC1 D S X nC1 X nC1 . The difference of the following two equalities, yields y e nC1 D A n e n nC1 . Moreover, D .I G nC1 H n /A n e n .I G nC1 H n / n C G nC1 nC1 : Because nC1 and nC1 are distributed as N .0; † n / and N .0; I / conditioned on F n , For the first part, (5.4), we claim that Because of (5.3), nC1 C nC1 K n . C nC1 / .I G nC1 H n / C nC1 .I G nC1 H n / T : Moreover, .I G nC1 H n / D .I C C nC1 H T n H n / 1 is clearly invertible. The inversion of the inequality above reads (5.7) .I G nC1 H n / T C nC1 1 .I G nC1 H n / nC1 C nC1 1 : Next, recalling (2.2) we have (5.8) rA n C n A T n C r † C n C I rA n C n A T n : By the definition of nC1 , OE C nC1 1 nC1 OErA n C n A T n C r † C n C I 1 , so which by Lemma A.2 leads to (5.6). To deal with (5.5), we use the identity that a T Aa D tr.Aaa T / and the conditional distributions of the system noises, Note that by (5.7) and (5.8), and C nC1 r † C n C I , By Lemma A.2, and r † C n C I r † n , tr .I G nC1 H n / C nC1 1 .I G nC1 H n / T † n Ä 1 r nC1 nC1 d: Also notice that Then by nC1 C nC1 K n . C nC1 /, tr.G nC1 G T nC1 OEC nC1 1 / Ä d nC1 : As a sum, we have shown our claim.

Covariance Control via Comparison
The second mechanism we will exploit is the comparison principle of the Riccati equation. The general idea is to compare the ensemble covariance $C_n$ with a solution of the Riccati equation (3.2). This can be done in two ways. The first one is through expectation.

LEMMA 5.3. For all $n \ge 0$, $\mathbb E_{\mathcal F^c} C_n \preceq \mathcal R_{0,n}(C_0)$ a.s.
PROOF. Let $R'_n = \mathcal R_{0,n}(C_0)$. We will prove our claim by induction. Suppose that $\mathbb E_{\mathcal F^c} C_n \preceq R'_n$ a.s. By Lemma A.1, $\mathcal K_n$ is concave and monotone, so by Jensen's inequality $\mathbb E_{\mathcal F^c} C_{n+1} \preceq \mathcal K_n(\mathbb E_{\mathcal F^c}\check{\hat C}_{n+1}) \preceq R'_{n+1}$ a.s.

This result explains why covariance inflation is necessary: the random sampling underestimates the covariance on average. On the other hand, the a priori estimate it provides on $C_n$ is rather limited. For example, $\mathbb E_{\mathcal F^c}\|C_n\|$ cannot be obtained solely from Lemma 5.3. In order to do a more delicate analysis, we consider the quotient ratio between $C_n$ and $\tilde R_n$ as in Assumption 3.1. Define the sequence

(5.9) $\Lambda_n = \inf\{\ell \ge 1 : C_n \preceq \ell\,\tilde R_n\}.$

By Lidskii's theorem 6.10 of [21], $C_n \preceq \Lambda_n\tilde R_n$ indicates that the ordered eigenvalues of $C_n$ are dominated by the ordered eigenvalues of $\Lambda_n\tilde R_n$. Then by Assumption 3.1, the $(p+1)$-th eigenvalue of $C_n$ is at most $\Lambda_n\rho$. Therefore the sequence $\Lambda_n$ dominates the sequence $\mu_n$. For this reason, we will replace $\mu_n$ with its upper bound $\Lambda_n$ in the following discussion.
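The concavity-plus-Jensen mechanism behind Lemma 5.3 is easy to see numerically. The following minimal experiment (illustrative sizes) averages the Kalman update over random sample covariances with mean $I_d$ and checks that the result sits below the update of the mean, in the PSD order.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, K, trials = 5, 3, 10, 20000
H = rng.standard_normal((q, d))

def kalman_update(C):
    """K(C) = C - C H^T (I_q + H C H^T)^{-1} H C."""
    G = C @ H.T @ np.linalg.inv(np.eye(q) + H @ C @ H.T)
    return C - G @ H @ C

avg = np.zeros((d, d))
for _ in range(trials):
    X = rng.standard_normal((K, d))          # sample covariance below has mean I_d
    avg += kalman_update(X.T @ X / K)
avg /= trials

gap = kalman_update(np.eye(d)) - avg         # Jensen gap: should be PSD
print(np.linalg.eigvalsh(gap).min())         # nonnegative up to Monte Carlo error
```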
As a matter of fact, we can compare our EnKF with any solution of (3.2) and find the following recursion formula.

LEMMA 5.4. Fix a time $k$ and a covariance sequence $R'_n = \mathcal R_{k,n}(R'_k)$. Let $\Lambda'_n = \inf\{\ell \ge 1 : C_n \preceq \ell R'_n\}$. Then $\Lambda'_{n+1} \le \max\{1, \Lambda'_n\vartheta_{n+1}/r\}$. Moreover, $\mathcal K_n(\check{\hat C}_{n+1}) \preceq \Lambda'_{n+1}\hat R'_{n+1}$.

PROOF. We prove the claim by induction. Suppose that $C_n \preceq \Lambda'_n R'_n$; by the definition of $\vartheta_{n+1}$, $\check{\hat C}_{n+1} \preceq \vartheta_{n+1}(rA_nC_nA_n^T + r\Sigma_n^+ + r\rho I)$. Then by the concavity and monotonicity of $\mathcal K_n$, Lemma A.1, and (2.5), the claim follows.

The uniform observability condition guarantees an upper bound for our covariance.
LEMMA 5.5. Under the uniform observability Assumption 3.2, given any matrix $C_k$, there is a $D_R$ such that $\|\mathcal R_{k,k+m}(C_k)\| \le D_R$.

PROOF. This is proved by proposition 19 of [33] (proposition 6.1 in the arXiv version). By taking no information for the forecast at time $k$ (that is, formally taking $\hat R_k^{-1} = 0$), we obtain the claim.

A Lyapunov Function
In view of Lemma 5.2, the Mahalanobis error is not dissipating by itself due to the fluctuation of the sampling effect and truncation errors. Sections 5.1 and 5.3 imply these effects are controllable. In this section, we show that they are combined under a Lyapunov function, namely the following: (5.10) . / D exp.D log 3 /; . / D . / m .1 C a r /: D is a large constant and a r is close to 0; their values will be fixed during our discussion. Before we prove Theorem 3.4, we need two components. We will assume Assumptions 3.1 and 3. PROOF. Since n dominates n , we can iterate Lemma 5.2 n times and find that (5.11) k k =r: We will deal with the linear coefficient E 0 Q n kD1 k k =r first. Consider the following rare events: (5.12) U k D f k p r or k p rg; k D 1; : : : ; n; and U D S n kD1 U k . Outside of U, k Ä p r and k Ä p r, so by Lemma 5.4, This indicates that Note that since k 1, Lemma 5.4 indicates k Ä 0 Q k j D1 j a.s. So the rareevent part in the right-hand side above can be bounded as follows: Because n Ä m, by Corollary 5.1 there are constants c r and D r so To continue, by Jensen's inequality, the concavity of log, and Lemma 5.3, E 0 log.kC k 1 k C 1/ Ä E 0 log.tr.C k 1 / C 2/ Ä log.tr E 0 C k 1 C 1/ Ä log.tr.R 0;k 1 .C 0 // C 1/ Ä log p C log.kR 0;k 1 .C 0 /k C 1=p/: Note that R 0;k .C 0 / is the Kalman filter covariance of system (3.1) with X 0 N .0; C 0 /. Therefore it is dominated by the unfilter covariance By the bounds of kA k;j k and k † j k for k Ä m, and since C 0 Therefore, (5.13) Ä n 0 exp. 2 3 D r p 2 3 c r K/D m .log 0 C 1/. In summary, if The constant terms in (5.11) can be bounded in a similar fashion. Note that outside the rare event U, . j j =r C j / Q n kDj C1 k k =r Ä 2 m 0 , and inside the rare event we can bound it exactly as in (5.13), but with fewer terms, so Using each term in (5.11) with the corresponding upper bound above yields our claim.
The second component shows that n is a very stable sequence; it indicates that C n is dominated by z R n most of the time. But first we have a purely computational verification: So if we let c K D 12D , then So clearly we can find our v 0 .
The second component is the following.  By Corollary 5.1, we can find a sequence of independent, exponential with parameter c r .K 1/, random variables Z k such that k Ä 8 C Z k a.s. Then because the function above is increasing with k , By Lemma 5.7, there is a … such that if 0 > …, then for any fixed D , for K larger than a constant K , which completes (5.15). As for (5.16), simply note that in the previous step we have built an upper bound for . m / mC1 m , while .1 C 2a r / . 0 / Ä . 0 / k 0 C 1 for a sufficiently large … and 0 ….
Next we do a more delicate bound with 0 Ä … for (5.15). This time, we simply apply Lemma 5.4 to R 0 k D z R k , so (5.17) nC1 Ä maxf1; n nC1 =rg: Recall the rare events U D S m kD1 U k with U k given by (5.12). First, let us assume Note that for sufficiently small a r , 1 C a r 0 = p r Ä maxfr m=2 ; 0 = p rg for all It remains to bound the rare event part. Since k Ä m, we want to show Applying the Cauchy-Schwarz inequality, we find Since k 1, m Ä 0 Q m kD1 k , and .1 C a r m / Ä 2 m , Recall that we can bound k by a sequence of exponential random variables x k , so the quantity above is bounded by log.8 C x k / : Following a similar computational result like that for Lemma 5.7, there is a constant D 0 that bounds the right-hand side for all 0 Ä …. Therefore, by upper bounds of P 0 .U/ as in (5.14), there is a D 00 such that When c r K D r p is sufficiently large, this can be further bounded by a r . LEMMA 5.9. For any 1 Ä k Ä m, the following holds if K D b p is sufficiently large: PROOF. The proof is the same as that for Lemma 5.6, starting from (5.17) to (5.18), which holds for general 0 and m replaced with k. The only difference is that (5.18) is no longer bounded by a constant, as we no longer have 0 Ä …. So instead, we bound (5.18) by the following, using Hölder's inequality .a C b/ 3 Ä 4a 3 C 4b 3 : The quantity above can be further bounded by 8 . 0 /D 00 for a constant D 00 using computations as in Lemma 5.7. Hence

Conclusions from Previous Results
Once we combine all the previous arguments, we reach a stronger version of Theorem 3.4.

THEOREM 5.10. Suppose system (1.1) is uniformly observable as in Assumption 3.2 and has intrinsic dimension $p$ as in Assumption 3.1. Then for any fixed $a_r > 0$ sufficiently small and any fixed $D$ sufficiently large, there exists a $D_*$ such that if $K \ge D_* p$ is sufficiently large, and $\Phi(\ell) = \exp(D\log^3\ell)$, $\Psi(\ell) = \Phi(\ell)^m(1 + a_r)$, then for any $k \in [0, m)$ the conditional Mahalanobis error and the Lyapunov functional $\Psi(\Lambda_{nm+k})$ satisfy geometric decay bounds, with a sequence $M_n$ that converges to a constant as $n \to \infty$.

Before we show our proof, note that it is not difficult to go from Theorem 5.10 to Theorem 3.4, as $F(C_0)$ can be chosen accordingly, where $\Lambda_0$ is defined as in (5.9) and $M_n$ in both theorems are the same. The bound for $\mathbb E\|e_n\|_{\check C_n}$ then comes immediately. Because $\Psi(\ell) \ge \ell$, $\tilde R_n \succeq \Lambda_n^{-1}C_n$, and $\|\tilde R_n\| \le D_R + \rho$, we also have the bound of $\mathbb E|e_n|$. The dominance (3.3) of $\tilde R_n$ over $C_n$ can be derived from the bound of $\mathbb E\Psi(\Lambda_{nm+k})$, as $D$ can be chosen arbitrarily large and $a_r$ arbitrarily close to 0.
PROOF OF THEOREM 5.10. We will pick parameters so that Lemmas 5.6-5.9 all hold. First, we consider the case with k D 0. By Cauchy-Schwarz, For sufficiently small a r and b r , because log 0 Ä 0 1, Using the Markov property, we can iterate this inequality n times using Gronwall's inequality: Note that by Gronwall's inequality, the second claim of Lemma 5.8 indicates that The bound for E . nm / comes from this, as a sum of s D m and s D m C 1.
Combining both estimates of Lemma 5.8 using Cauchy-Schwarz, we find that In summary, we have our claims for k D 0. For nonzero k < m, by the Markov property, the previous results indicate that (5.19) Then by Cauchy-Schwarz, Lemma 5.9, and the fact that m 1, Likewise, by Lemma 5.9 we have E 0 . k / Ä 1 C 4 . 0 /, and Plugging these bounds into (5.19) and (5.20), we have our claim.
PROOF OF COROLLARY 3.5. Let U n;n 0 D Q n 1 kDn 0 .I b K kC1 H k /A k . By iterating (5.6) n times, we find that kC n k 1 U T n;0 U n;0 U T n;0 OEC n 1 U n;0 Taking the spectral norm on both sides yields kU n;0 k Ä D R kOEC 0 1 k n n Y kD1 k k =r: Recall that which is less than r m=6 p . nm /. Therefore by iterative conditioning, By applying Lemma 5.9, E p . j / can be bounded. Then notice that 1 yields our claim.

Concentration of Noncentral Random Matrix
Random matrix theory (RMT) is one of the fastest-growing branches of probability theory in recent years. Yet most RMT results concern the spectrum of a matrix $X$ whose columns are i.i.d. with mean zero. For our application, each column of the forecast ensemble spread matrix has its own noncentral distribution, and we are more concerned with the ratio between each eigenvalue and its expectation. For this reason, we have to develop the following result for EnKF. Fortunately, classic RMT arguments like $\epsilon$-nets and Gaussian concentration are still valid for our purposes.
The EnKF augmentations are crucial for such a result to hold. The additive inflation ensures that the matrix inversion is nonsingular. The multiplicative inflation creates an important nonzero gap so the concentration can fit in with high probability. The spectral projection ensures that the ranks of the matrices are at most $p$.
THEOREM 6.1. Let $a_1, \dots, a_K$ be fixed vectors in $\mathbb R^d$, let $\xi_1, \dots, \xi_K$ be i.i.d. $\mathcal N(0, \Sigma)$ vectors, and let

$Z = \frac1K\sum_{k=1}^K (a_k + \xi_k)\otimes(a_k + \xi_k), \qquad D = \frac1K\sum_{k=1}^K a_k\otimes a_k + \Sigma.$

Clearly $\mathbb E Z = D$. Fixing any $\rho > 0$, denote the condition number of $D + \rho I_d$ as $\mathcal C$, and

$\theta = \inf\{r \ge 0 : [Z + \rho I_d]^{-1} \preceq r[D + \rho I_d]^{-1}\}, \qquad \vartheta = \inf\{r \ge 0 : Z \preceq r[D + \rho I_d]\}.$

Suppose that $\mathrm{rank}([a_1, \dots, a_K]) \le p$ and $\mathrm{rank}(\Sigma) \le p$. For any fixed $\delta > 0$, there are constants $D_\delta$ and $c_\delta$ such that when $K > (D_\delta/c_\delta)p$ we have the following. With high probability, both $\theta$ and $\vartheta$ are close to 1: for the event $\mathcal U := \{\theta > 1 + 5\delta \text{ or } \vartheta > 1 + 5\delta\}$, $\mathbb P(\mathcal U)$ is exponentially small in $K$. On the rare event, $\vartheta$ is controllable through its moments: suppose that $\rho \le \|\Sigma\|$; then for any fixed number $n$ there is a constant $C_n$ with

$\mathbb E 1_{\mathcal U}\vartheta^n \le C_n\rho^{-n}\|\Sigma\|^n(\log\mathcal C + 1)\exp(D_\delta p - c_\delta K).$

Finally, $\vartheta$ is controllable through its tail: when $K \ge D_\delta p$, for any $t > 0$,

$\mathbb P(\vartheta > 8 + t) \le \exp(-c_\delta K t).$

PROOF.
Step 1. Spectral projection. In our discussion below, without loss of generality, we assume $\delta$ is a small positive number, so the simplified estimates below hold. Next, let $\mathbb P$ be the linear sum of the column space of $\Sigma$ and the linear space spanned by $\{a_1, \dots, a_K\}$. By our condition, $\mathbb P$ has dimension at most $2p$. Denote the projection of $v$ onto $\mathbb P$ as $v'$, and the residual as $v^\perp$. Note that $\langle a_k, v^\perp\rangle = 0$ and $\langle \xi_k, v^\perp\rangle = 0$ a.s. One elementary fact is that if $a, b, c$ are nonnegative real numbers, then $\frac{a+b}{a+c} \le \max\{1, b/c\}$. As a consequence, $F_v \le \max\{1, F_{v'}\}$. Since we are not concerned with the part of $\theta$ that is below 1, we can focus on $v \in \mathbb P$. Moreover, because $F_v$ is invariant under renormalization, we can focus on $|v| = 1$.
Step 2. Two -nets. In order to parametrize v 2 P, let ‰ƒ‰ T be the orthogonal decomposition of †, where ‰ is a d p matrix, and let ƒ be a p p matrix with strictly positive diagonal entries. Let ‚ D ƒ 1=2 ‰ T . The set f‚ T u; u 2 S p 1 g D fv 2 P; jvj D 1g, where S p 1 D fu 2 R p ; juj D 1g is the .p 1/-dimensional sphere. With a transformation through ‚, we denote the spread matrices consisting of the dynamical forecast and the system noise as follows: S D OE‚a 1 ; : : : ; ‚a K T ; S 0 D .S T S C .K 1/ ‚‚ T / 1=2 ; and also the random matrices T D OE‚ 1 ; : : : ; ‚ K T ; T 0 D OE‚ 1 ; : : : vector with all entries being 1. Note that kQ K k Ä 1. One important fact will be exploited later is that T is a K p random matrix with i.i.d. N .0; 1/ entries. Also, note with these notations, where OEx k denotes the k th coordinate of a vector x.
Like many other random matrix problems, $f_{u,u}$ and $g_{u,u}$ are easy to estimate for a fixed $u$, but difficult to estimate over all $u \in S^{p-1}$. The general solution to such a problem is finding proper $\epsilon$-nets, where $\epsilon$ is a very small positive number to be fixed later. Here we need two.
Pick a group of points $\{w_j \in \mathbb R^p : j \in J, |w_j| = 1\}$ as an $\epsilon$-net for the $(p-1)$-dimensional sphere $S^{p-1}$. In other words, $S^{p-1} \subseteq \bigcup_j B_\epsilon(w_j)$, where $B_r(x)$ is a ball of radius $r$ around the point $x$. By lemma 5.2 in [41], the cardinality of this net is bounded by $|J| \le (1 + 2/\epsilon)^p$.
As we are dealing with another matrix $S_0$, we need a second net $\{u_i \in \mathbb R^p : i \in I\}$ generated by the norm $|S_0\cdot|$, so that $S^{p-1} \subseteq \bigcup_i B^{S_0}_\epsilon(u_i)$. Here $u \in B^{S_0}_\epsilon(u_i)$ if $|S_0(u - u_i)| \le \epsilon|S_0 u_i|$. By Lemma 6.2, this set has cardinality $|I| \le 4(1 + 4\epsilon^{-1})^p\,p(\log\mathcal C_0 + 1)$, with $\mathcal C_0$ being the condition number of $S_0$. To continue, we will find simpler bounds for $F_{\Theta^T u}$ and $G_{\Theta^T u}$ when $u \in B^{S_0}_\epsilon(u_i)$, $u \in B_\epsilon(w_j)$. This discussion will be done separately for two different scenarios.
Step 3. jS 0 u i j Ä 1 2 ı p K 1. In this case, the numerator of f u;w is bounded by Ã and the denominator of g u;u is bounded from below by K 1. As for the denominator of f u;u and numerator of g u;u , jT 0 w j j 2 makes a good approximation, specifically because jT 0 .w j w/j Ä kT 0 k and By Cauchy-Schwarz , Likewise, the numerator of g u;w is bounded by Recall that T w j N .0; I k /, so jT 0 w j j D jQ K T w j j p K 1 by concentrations of Gaussian variables; then the quantity above can be bounded. In particular, let us consider the events of large deviations: Then in the canonical event .D [ A i;j / c , In this canonical set, we have two good upper bounds, On the other hand, because kT 0 k Ä kT k, so by spectral norm estimate of the Gaussian random matrix T , corollary 5.35 in [41], moreover, because EjQ K T w j j 2 D K 1, by Hansen-Wright's inequality for Gaussian variables [36], for some constants D ı ; c ı > 0, A different pair of c ı ; D ı will make P .D/ C P .A i;j / Ä D ı exp. c ı K/: While in the noncanonical set D [ A i;j , we can use a trivial upper bound for F ‚ T u , The part for G ‚ T u can be achieved through a Cauchy inequality in the end.
Step 4. jS 0 u i j 1 2 ı p K 1. In this case, the numerator of f u;w is bounded: and the denominator of g u;w is bounded from below by Next we try to bound the denominator of f u;w and the numerator of g u;w . Denote Then due to the facts that jS 0 .u u i /j Ä jS 0 u i j and that jw w j j Ä , the denominator of f u;w is bounded from below by Likewise, the numerator of g u;w is bounded by In order to continue, we rewrite F i;j ; G i;j by rotating proper terms. There is a K K rotation matrixˆi , with its first row being .OES u i 1 =jS u i j; OES u i 2 =jS u i j; : : : ; OES u i K =jS u i j/: Then if we consider the K 1 random vector D . 1 ; : : : we have OET 0 w j 2 k : Moreover, because T w j N .0; I K /, N .0;ˆi Q 2 KˆT i /. In particular, j k j 2 (6.6) and We consider these sets of large deviations: and D i D fkT 0 k .4 / 1 ı p .K 1/ C jS 0 u i j 2 g. Then in the canonical set and for small enough ı, Combining this with (6.2), we get Likewise, in the canonical set and by (6.3) and (6.5), we get Next we bound the probability of large deviations. By a Gaussian tail estimate, there are constants D ı ; c ı > 0 such that and because kT 0 k Ä kT k and jS 0 n u i j 1 2 p ı.K 1/, by concentration of the spectral norm of random matrices, corollary 5.35 of [41], there is a pair of constants .D ı ; c ı / such that Lastly, we can compute the mean of where e 1 D OE1; 0; : : : ; 0 T . Since tr.
k has the same distribution as jU j 2 , where N .0; I K / and Then kU T U k D Q K ‰ T i .I E 1;1 /‰ i Q K Ä 1 D kI K k: Also, the Hilbert-Schmidt norm is bounded by where we used that tr.AB/ D tr.BA/ and tr..I E 1;1 /A/ Ä tr.A/ for all PSD A. So by the Hansen-Wright inequality [36], there are constants D ı and c ı such that P .B i;j / Ä D ı exp. c ı ..K 1/ C jS 0 u i j 2 //: By the union bound, we can choose a different pair of D ı and c ı , Moreover, when in the rare event D i [ A i;j [ B i;j , we can again use the trivial upper bound Step 5. Summing up. Finally, we can put all our estimates together. First, denote the union of all large-deviation sets as Based on the previous discussion, outside U, F ‚ T u ; G ‚ T u Ä 1 C 5ı for all u, so ; Ä 1 C 5ı, while the probability of this event U is bounded by the union bound as Finally, notice that the condition number C 0 of matrix S 0 is dominated by that of C by Lemma A.5 since ‚ has rank p: For the control of in the rare event, recall the trivial upper bounds. When jS 0 u i j Ä 1 2 p ı.K 1/, For jS 0 u i j 1 2 p ı.K 1/, By maximizing the right-hand side of (6.8) over all possible values of jS 0 u i j, we can find a new pair of c ı ; D ı such that (6.7) Ä n k †k n D ı exp. c ı K/; (6.8) Ä n k †k n D ı exp. c ı K/: As a consequence, Ä .log C C 1/ n k †k n exp.log p C log D ı C D ı p c ı K/ for a different pair of c ı ; D ı > 0. Moreover, we can remove the log p C log D ı term by having a larger D ı in front of p.
The remaining tail bound can be obtained from the following trivial estimate. For the $m$ part, notice that (6.1) applies, so the claim follows from Corollary 3.53 of [41].

LEMMA 6.2. Let $S$ be any $p \times p$ nonsingular matrix, and let $C(S)$ be its condition number. For any fixed $\epsilon \in (0,1)$, we say $U \subset \mathbb{S}^{p-1}$ is an $S$-relative $\epsilon$-net of $\mathbb{S}^{p-1}$ if for any $x \in \mathbb{S}^{p-1}$ there is a $u \in U$ such that
\[
|S(x - u)| \le \epsilon |Su|.
\]
Then there exists an $S$-relative $\epsilon$-net of $\mathbb{S}^{p-1}$ with
\[
|U| \le e\,(1 + 2e^{1/p}\epsilon^{-1})^p\, p\,(\log C(S) + 1),
\]
which can be further bounded by $4(1 + 4\epsilon^{-1})^p (p \log C(S) + 1)$ for simplicity. Moreover, the linear dependence of $|U|$ on $\log C(S)$ is sharp. A numerical illustration of this construction is given after the proof.
PROOF. Denote by $m$ the minimum singular value of $S$, and let $c > 1$ be a number to be determined. For $n = 1, \ldots, N_c := \lceil \log(C(S))/\log c \rceil$, let
\[
D_n := \{u : |u| = 1,\ mc^{n-1} \le |Su| \le mc^n\}.
\]
Then $\mathbb{S}^{p-1} = \bigcup_{n=1}^{N_c} D_n$, and we will construct an $S$-relative $\epsilon$-net for each $D_n$. We say $U \subset D_n$ is an $S$-relative $\epsilon$-separated set of $D_n$ if for any two different $x, y \in U$, $|S(x - y)| \ge \epsilon\max\{|Sx|, |Sy|\}$. The cardinality of any $S$-relative $\epsilon$-separated set of $D_n$ is bounded. To see this, note that if $x, y \in U$, the balls $B_{\frac{\epsilon}{2}|Sx|}(Sx)$ and $B_{\frac{\epsilon}{2}|Sy|}(Sy)$ have no intersection; otherwise $|Sx - Sy| < \frac{\epsilon}{2}(|Sx| + |Sy|) \le \epsilon\max\{|Sx|, |Sy|\}$, which contradicts the definition of $\epsilon$-separation. On the other hand, because $|Sx| \le mc^n$, we have $B_{\frac{\epsilon}{2}|Sx|}(Sx) \subset B_{(1+\frac{\epsilon}{2})mc^n}(0)$, so
\[
|U|\,\mathrm{Vol}\bigl(B_{\frac{\epsilon}{2}mc^{n-1}}\bigr) \le \mathrm{Vol}\bigl(B_{(1+\frac{\epsilon}{2})mc^n}(0)\bigr) \;\Rightarrow\; |U| \le (1 + 2\epsilon^{-1})^p c^p.
\]
Since $S$-relative $\epsilon$-separated sets of $D_n$ all have finite cardinality, there is a $U_n'$ among them with maximal cardinality. We claim this $U_n'$ is an $S$-relative $c\epsilon$-net for $D_n$. To see this, suppose there is a $z$ with $|S(z - x)| \ge c\epsilon|Sx|$ for all $x \in U_n'$; then $|S(z - x)| \ge \epsilon|Sz|$, because both $z$ and $x$ are in $D_n$ and so $c|Sx| \ge |Sz|$ by the definition of $D_n$. So $U_n' \cup \{z\}$ is another $S$-relative $\epsilon$-separated set of $D_n$, which contradicts the maximality of $U_n'$. To summarize, we can use $\bigcup_{n=1}^{N_c} U_n'$ as an $S$-relative $c\epsilon$-net for $\mathbb{S}^{p-1}$, with total cardinality bounded by $(1 + 2\epsilon^{-1})^p c^p N_c$. To obtain the claimed bound, replace $\epsilon$ with $c^{-1}\epsilon$ and take $c = e^{1/p}$.
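For clarity, the volume comparison above yields the stated cardinality bound as follows (we spell out the radius ratio, using that each excluded ball has radius at least $\frac{\epsilon}{2}mc^{n-1}$):
\[
|U| \le \frac{\mathrm{Vol}\bigl(B_{(1+\frac{\epsilon}{2})mc^{n}}(0)\bigr)}{\mathrm{Vol}\bigl(B_{\frac{\epsilon}{2}mc^{n-1}}\bigr)}
= \left(\frac{(1+\frac{\epsilon}{2})\,mc^{n}}{\frac{\epsilon}{2}\,mc^{n-1}}\right)^{p}
= \bigl((1 + 2\epsilon^{-1})\,c\bigr)^{p} = (1 + 2\epsilon^{-1})^p c^p.
\]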
It is also easy to see that the linear dependence of $|U|$ on $\log C(S)$ is sharp. Let $F(x) = \log|Sx|$; then $F(\mathbb{S}^{p-1}) = [\log m, \log m + \log C(S)]$, where $m$ is the minimum singular value of $S$. For each $u \in U$, if $|S(x - u)| \le \epsilon|Su|$, then $F(x) \in [\log(1 - \epsilon) + F(u),\ \log(1 + \epsilon) + F(u)]$, which is an interval of fixed length $\log\frac{1+\epsilon}{1-\epsilon}$. Since these intervals must cover $F(\mathbb{S}^{p-1})$, an interval of length $\log C(S)$, we need $|U| \ge \log C(S)/\log\frac{1+\epsilon}{1-\epsilon}$.
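To see the lemma in action, the sketch below (an illustrative greedy construction we add, not the maximal-separated-set argument of the proof) builds an approximate $S$-relative $\epsilon$-net of the unit circle ($p = 2$) over a fine grid and shows the cardinality growing roughly linearly in $\log C(S)$:

    import numpy as np

    def greedy_s_net(S, eps, n_grid=4000):
        """Greedily keep grid points on the unit circle that are not yet covered:
        x is covered if some kept u satisfies |S(x - u)| <= eps * |S u|."""
        thetas = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
        pts = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
        net = []
        for x in pts:
            if not any(np.linalg.norm(S @ (x - u)) <= eps * np.linalg.norm(S @ u)
                       for u in net):
                net.append(x)
        return net

    eps = 0.3
    for logC in [1, 2, 4, 8]:
        S = np.diag([np.exp(logC), 1.0])        # condition number C(S) = exp(logC)
        print(logC, len(greedy_s_net(S, eps)))  # grows roughly linearly in logC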

Conclusion and Discussion
Ensemble Kalman filters (EnKF) are indispensable data assimilation methods for high-dimensional systems. Their surprisingly successful performance with an ensemble size $K$ much smaller than the ambient space dimension $d$ has remained a mystery for mathematicians. Practitioners often attribute this success to the existence of a low effective dimension $p$, for which a formal definition has been sorely lacking. This paper closes this gap by considering a Kalman filter with random coefficients and using its covariance $\widetilde{R}_n$ as an intrinsic filtering performance criterion. The effective dimension can then naturally be defined as the number of eigenvalues of $\widetilde{R}_n$, or of a system instability matrix, that lie above a significance threshold. An EnKF with proper covariance inflation and localization techniques is constructed in Algorithm 1, which exploits this low-effective-dimension structure. Assuming that the system is uniformly observable and that the ensemble size exceeds a constant multiple of $p$, Theorem 3.4 asserts that the Mahalanobis error of EnKF decays geometrically to a constant. Useful properties such as covariance fidelity, exponential stability, and filter accuracy of EnKF follow as immediate corollaries. This framework can be applied directly to a simple stochastic turbulence model, where the effective filtering dimension and the EnKF tuning parameters are explicitly computable. The proof exploits the intrinsic dissipation of the Mahalanobis error, shows that this mechanism operates with high probability using a new noncentral random matrix concentration result, and controls the behavior of outliers by designing a Lyapunov function.
As a first step in studying EnKF performance, this paper considers a relatively simple setup. There are multiple directions in which this framework can be improved; we list five of them to inspire future research. The random-linear-coefficient setting (1.1) includes a wide range of applications [5,33], but for many geophysical applications of EnKF the forecast models are fully nonlinear [39,40]. Two major challenges arise in generalizing our setup to nonlinear systems. First, it will be difficult for the forecast ensemble to capture the forecast dynamics, so the error dynamics will no longer be linear. Second, the reference Kalman filter plays a pivotal role in this framework, and it is not clear what would replace it in a nonlinear setting.
One important feature this paper has not discussed is the universality of the noise. The authors believe the same proof remains valid if the noise distribution is assumed only to be sub-Gaussian [41], because the only feature of Gaussian distributions we have used is their concentration, which also holds for sub-Gaussian and even sub-exponential distributions. Non-Gaussian noise may be an essential ingredient if one wants to investigate nonlinear settings. This paper has intentionally not discussed this universality, as it is less familiar to the filtering community and may cause confusion.
The Mahalanobis error appears to be a natural statistic for covariance fidelity and stability analysis. It is equivalent to the average $l^2$ error $\mathbb{E}|e_n|$, but this equivalence degrades in a high-dimensional setting. For example, in Theorem 3.4, $\mathbb{E}|e_n|$ has a square-root dependence on $d$, while in practice one would expect the dependence to be $\sqrt{p}$; a toy illustration follows this paragraph. To reach this result, one would need to investigate the uncertainty levels in different directions. The uniform observability Assumption 3.1 is proposed for a simpler Lyapunov function construction. It is stronger than the general assumptions developed in the classic Kalman filter literature such as [5]. It would be interesting to generalize our framework in this respect.
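To illustrate the dimensional gap, here is a toy computation (with a hypothetical covariance spectrum of our choosing: $p$ significant directions and $d - p$ negligible ones). The average $l^2$ error then scales like $\sqrt{p}$, while a crude bound of the form $\sqrt{\|R\|}\cdot\sqrt{d}$ is off by a factor of $\sqrt{d/p}$:

    import numpy as np

    rng = np.random.default_rng(2)
    d, p = 10_000, 10
    lam = np.concatenate([np.ones(p), 1e-6 * np.ones(d - p)])  # p significant directions
    e = rng.standard_normal((100, d)) * np.sqrt(lam)           # e ~ N(0, diag(lam))

    print(np.linalg.norm(e, axis=1).mean())  # ~ sqrt(p) ~ 3.2
    print(np.sqrt(lam.max() * d))            # pessimistic sqrt(||R||) * sqrt(d) = 100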
In recent years, many other EnKF augmentations have been invented based on various intuitions. Whether our framework can justify their formulations requires further investigation.

Appendix: Matrix Inequalities
The following lemma was mentioned in [16] in dimension 1.
LEMMA A.1. The prior-posterior Kalman covariance update mapping $\mathcal{K}_n$ in (1.2) can also be written as
\[
\mathcal{K}_n(C) = (I - GH_n)C(I - GH_n)^T + GG^T, \quad \text{where } G := CH_n^T(I + H_nCH_n^T)^{-1}
\]
is the corresponding Kalman gain. $\mathcal{K}_n$ is a concave monotone operator from the PD matrices to themselves.
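Before turning to the proof, both the identity and the concavity claim can be sanity-checked numerically. A minimal sketch we add (assuming, per the statement, unit observation-noise covariance; numpy only):

    import numpy as np

    rng = np.random.default_rng(3)
    d, q = 6, 3
    M = rng.standard_normal((d, d)); C = M @ M.T + 0.1 * np.eye(d)  # PD prior covariance
    H = rng.standard_normal((q, d))                                  # observation matrix

    def K_update(C):
        """Standard Kalman covariance update with unit observation noise."""
        return C - C @ H.T @ np.linalg.inv(np.eye(q) + H @ C @ H.T) @ H @ C

    # Joseph form with the gain G from Lemma A.1 agrees with the standard update.
    G = C @ H.T @ np.linalg.inv(np.eye(q) + H @ C @ H.T)
    joseph = (np.eye(d) - G @ H) @ C @ (np.eye(d) - G @ H).T + G @ G.T
    print(np.allclose(joseph, K_update(C)))                          # True

    # Concavity in the PSD order: K(X) + K(X + A) <= 2 K(X + A/2).
    N = rng.standard_normal((d, d)); X2 = N @ N.T + 0.1 * np.eye(d)
    A = X2 - C                                                       # X and X + A both PD
    gap = 2 * K_update(C + 0.5 * A) - K_update(C) - K_update(C + A)
    print(np.linalg.eigvalsh(gap).min() >= -1e-9)                    # PSD up to roundoff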
PROOF. The first matrix identity is straightforward to verify and can be found in many references on Kalman filters [32]. To simplify notation, we ignore the time indices below and let $J(X) = (I + HXH^T)^{-1}$. Picking any symmetric matrix $A$, the perturbation of $\mathcal{K}$ in the direction $A$ can be computed explicitly. Therefore, as long as $X, X + A \succeq 0$, the concavity holds:
\[
\mathcal{K}(X) + \mathcal{K}(X + A) \preceq 2\mathcal{K}\bigl(X + \tfrac{1}{2}A\bigr).
\]
When we require $A$ to be PSD, $D_A\mathcal{K} \succeq 0$ implies the monotonicity of $\mathcal{K}$.

PROOF. If the null space of $B$ is $D$ and $P$ is the projection onto the complementary subspace $D^\perp$, then it suffices to show that $(PAP)(PBP)(PAP) \succeq PBP$. Therefore, without loss of generality, we can assume $B$ is invertible, so it suffices to show
\[
(B^{-1/2}AB^{1/2})(B^{-1/2}AB^{1/2})^T \succeq I.
\]
But this is equivalent to checking that the singular values of $B^{-1/2}AB^{1/2}$ are greater than 1, which are the same as the eigenvalues of $A$.
If $A$ and $C$ are invertible, the second claim follows by taking the direct inverse of the first claim. Otherwise, it suffices to show the claim on the subspace where $A$ and $C$ are invertible.

LEMMA A.4. Let $A$ and $B$ be two PSD matrices, and let $A$ be invertible; then
\[
\|AB\| = \|A^{1/2}BA^{1/2}\| = \inf\{\lambda : B \preceq \lambda A^{-1}\}.
\]
PROOF. $\|AB\| = \|A^{1/2}BA^{1/2}\|$ holds because conjugation preserves eigenvalues, and $\|A^{1/2}BA^{1/2}\| = \inf\{\lambda : B \preceq \lambda A^{-1}\}$ is obvious.
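Lemma A.4 can likewise be checked numerically, reading $\|AB\|$ as the largest eigenvalue of $AB$ (which is real and nonnegative, since $AB$ is conjugate to the PSD matrix $A^{1/2}BA^{1/2}$). A small sketch of our own:

    import numpy as np

    rng = np.random.default_rng(5)
    d = 5
    M = rng.standard_normal((d, d)); A = M @ M.T + 0.1 * np.eye(d)  # PD
    N = rng.standard_normal((d, d)); B = N @ N.T                     # PSD

    w, V = np.linalg.eigh(A)
    A_half = V @ np.diag(np.sqrt(w)) @ V.T                           # PSD square root

    lam_AB = np.max(np.linalg.eigvals(A @ B).real)                   # top eigenvalue of AB
    lam_conj = np.linalg.norm(A_half @ B @ A_half, 2)                # ||A^{1/2} B A^{1/2}||
    print(np.isclose(lam_AB, lam_conj))                              # True

    # inf{lambda : B <= lambda A^{-1}}: feasible at lam_conj, infeasible just below.
    A_inv = np.linalg.inv(A)
    print(np.linalg.eigvalsh(lam_conj * A_inv - B).min() >= -1e-8)   # True
    print(np.linalg.eigvalsh(0.99 * lam_conj * A_inv - B).min() < 0) # True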
LEMMA A.5. Let $A$ be a $d \times d$ PSD matrix, and let $\Theta$ be a $p \times d$ matrix with rank $p$. Then the condition number of $\Theta A \Theta^T$ is dominated by that of $A$.
PROOF. Suppose $v_M$ and $v_m$ are the $p$-dimensional eigenvectors of $\Theta A \Theta^T$ associated with the maximum and minimum eigenvalues $\lambda_M$ and $\lambda_m$. Comparing $(\Theta^T v_M)^T A (\Theta^T v_M)$ with $(\Theta^T v_m)^T A (\Theta^T v_m)$, we find that the maximum eigenvalue of $A$ is above $\lambda_M$ and the minimum eigenvalue of $A$ is below $\lambda_m$.
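Lemma A.5 can also be checked numerically. The sketch below assumes, for the illustration, that $\Theta$ has orthonormal rows (so that $|\Theta^T v| = |v|$ and the Rayleigh-quotient comparison in the proof applies directly; we do not claim this is the only case covered by the paper):

    import numpy as np

    rng = np.random.default_rng(4)
    d, p = 20, 5
    M = rng.standard_normal((d, d)); A = M @ M.T + 0.1 * np.eye(d)  # PSD, d x d
    Q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    Theta = Q.T                                                      # p x d, orthonormal rows

    def cond_psd(X):
        w = np.linalg.eigvalsh(X)
        return w.max() / w.min()

    print(cond_psd(Theta @ A @ Theta.T) <= cond_psd(A) + 1e-9)       # True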