Effects of mis‐specified time‐correlated model error in the (ensemble) Kalman Smoother

Abstract Data assimilation is often performed under the perfect-model assumption. Although there is a growing body of research accounting for model errors in data assimilation, the impact of an incorrect specification of the model errors on the data assimilation results has not been thoroughly assessed. We investigate the effect that an inaccurate time correlation in the model error description can have on data assimilation results, deriving analytical results using a Kalman Smoother for a one-dimensional system. The analytical results are evaluated numerically to generate useful illustrations. For a higher-dimensional system, we use an ensemble Kalman Smoother. A strong dependence on observation density is found. For a single observation at the end of the window, the posterior variance is a concave function of the guessed decorrelation time-scale used in the data assimilation process. This results from the prior variance increasing with that time-scale, while the growing influence of the observation acts to decrease the posterior variance. With an increasing number of observations, the posterior variance decreases with increasing guessed decorrelation time-scale because the prior-variance effect becomes less important. On the other hand, the posterior mean-square error is a convex function of the guessed time-scale, with a minimum where the guessed time-scale equals the real decorrelation time-scale. With more observations, the impact of the difference between the two decorrelation time-scales on the posterior mean-square error reduces. Furthermore, we show that the correct model error decorrelation time-scale can be estimated over several time windows using state augmentation in the ensemble Kalman Smoother. Since model errors are substantial and significantly time-correlated in real geophysical systems such as the atmosphere, this contribution opens up a next step in improving prediction of these systems.

The Bayesian formulation of the data assimilation problem is given by Bayes' theorem:

p(x|y) = p(y|x) p(x) / p(y),

where x represents the state of the system and y denotes the observations. The prior probability density function (pdf), p(x), contains the background information on the state variables. The denominator p(y) is the marginal pdf of the observations; it is independent of the state variables and plays no active role in state estimation. The conditional pdf p(y|x) contains the information from the observations, being the probability density of the observations given the current state of the system. Lastly, the conditional pdf p(x|y) is the posterior, which represents the probability of the state variables given the observations; obtaining it is the ultimate goal of data assimilation. In the atmospheric and oceanic sciences, various approximate data assimilation methods have been developed in the past few decades, typically originating in either variational approaches (Courtier and Talagrand, 1987) or (ensemble) Kalman Filter-based approaches (Evensen, 1994). Recently there has been a surge in hybrid methods trying to combine the advantages of the variational and Kalman Filter-based methods, for instance using variational methods to solve the ensemble problem (Zupanski, 2005).
In the past few decades, data assimilation methods such as the four-dimensional variational method (4D-Var) have often been performed under the assumption that the numerical models are perfect, known as the strong-constraint setting (see the discussion in e.g., Amezcua and Van Leeuwen, 2018). Typically, it is assumed that the model errors can be neglected when compared with other error sources in the system, such as the errors in the initial condition and the observations (Trémolet, 2006). Since many dynamical systems of interest are chaotic, and hence highly sensitive to the initial condition (Lorenz, 1963), much research has focused on the errors in the initial condition in order to improve the accuracy of weather forecasts.
There are cases, however, in which errors not originating from the initial conditions become important for the accuracy of the forecasts and hence for the data assimilation process. In fact, there is ample evidence that this is the case for most, if not all, geoscience disciplines (e.g., Bony et al., 2015; Kuma et al., 2018; Muelmenstadt and Feingold, 2018; Fox-Kemper et al., 2019; Fennel et al., 2019; Fisher and Koven, 2020). These model errors are often hard to estimate, which has hampered their inclusion in the data assimilation process. However, there are many reasons why a proper estimate of model errors needs to be included, apart from the fact that they are present in our prediction models. Jazwinski (1970) points out that, in order to obtain an optimal estimate of the system, we need a better understanding of the error covariance matrices of all error sources. Furthermore, including random model errors in smoothers for chaotic systems such as the atmosphere and the ocean makes these systems less dependent on initial conditions, allowing for more efficient optimisation and longer smoother windows. Indeed, with better understanding of initial and observational errors, and a strong reduction in the former, an increasing number of works has taken model errors into account in the data assimilation process (e.g., Carrassi and Vannitsem, 2010; Howes et al., 2017), resulting in so-called weak-constraint data assimilation.
Model error is essentially the mismatch between the true evolution of the system and the forecast produced by the numerical model over one model time step. There are various sources of model error in numerical models, such as the numerical discretization of the underlying differential equations describing the system, incorrect parametrizations, missing physical processes, etc. Some works implement model error as a random additive variable at each time step (e.g., Evensen and Van Leeuwen, 1996), or insert a random multiplicative factor in the tendencies of the model governing equations (Palmer et al., 2009). For simplicity, model errors are often considered Gaussian random variables with zero mean and no correlation over time. Alternatively, the model error can be considered fixed over the simulation period, resulting in a model bias. In operational systems, however, real model errors will be complex in both their spatial and temporal behaviour, as can be inferred directly from the sources of these errors.
In this paper we study the case in which the spatial structure of the model error is known, but its temporal structure is uncertain. In reality, both the space and time structure are unknown, but we focus on the latter. We consider that the nature run evolves with a true model error, that is, a random model forcing with a certain decorrelation time-scale τ_r. We refer to this time-scale as the memory. The imperfect forecast model uses a guessed memory τ_g, which differs from the real one.
This work has two main purposes. The first is to investigate the effect of an incorrectly time-correlated model error on data assimilation results under different observational frequencies and different numbers of observations in an assimilation window. More specifically, we aim to quantify the change in performance of the Kalman Smoother when the time statistics of the model error are mis-specified, and the sensitivity of this change to different assimilation parameters. These results are extended to the ensemble case. The second objective is to use the data assimilation process to diagnose the memory of the model error. This is of great importance since it allows us to discriminate between a bias and a completely time-independent model error, and to identify cases in between.

Figure 1. Plots of the trajectories of the true state of the system (black), three posterior ensemble members (pink, randomly chosen from 200 members), and the posterior ensemble mean (red). (a, c) show results for a true white-noise model error and an assumed bias model error for two observation densities; note that the posterior estimates are poor in both cases. (b, d) depict a bias true model error and an assumed white-noise model error; the result with one observation is poor, while if many observations are present, the assimilation result is consistent within the ensemble spread.
Before continuing, a simple illustration can illuminate the issue. In Figure 1 we show results of a smoothing process for a simple one-dimensional system over a time window of 20 nature time steps. We use an ensemble Kalman Smoother with two different observation densities in time (the details are discussed in a later section). The memories in the nature model and the forecast model do not coincide. We can see that with τ_r = 0.0, when the actual model error is a white-in-time random variable, the evolution of the true state of the system behaves rather randomly with the present model settings. If we do not know the memory and assume in the data assimilation process that the model error is a bias (τ_g → ∞), the estimate made by the data assimilation method is not even close to the truth, even with very dense observations in the simulation period, as shown in Figure 1a, c. On the other hand, if the model error in the true model evolution behaves like a bias, and we assume in the data assimilation process that the model error is white in time, the results differ with observation frequency. As shown in Figure 1b, d, with very frequent observations the data assimilation process performs fairly well, but with a single observation the estimate is still not accurate.
The general structure of this paper is as follows. In Section 2, we investigate the performance of the Kalman Smoother on a linear model with time-correlated model error analytically. When we are unable to find closed expressions, we numerically evaluate the (open) analytical expressions where necessary. We determine the behaviour of the posterior variance and mean-square error for different values of the true and guessed memory. Next, a higher-dimensional system is explored in Section 3 via numerical experiments using the Ensemble Kalman Smoother. In Section 4 we use state augmentation to infer the memory time-scale from the assimilation process, with satisfactory results. Section 5 contains a summary and a discussion of the results.
In this paper we follow the notation introduced by Amezcua and Van Leeuwen (2018). Identifying different attributes in a variable can be difficult in some expressions. In general, superindices are used as time indices. If there is a comma in the superindex, it is because we have also added a label corresponding to the role in the data assimilation process. For instance, the variable x^{t,b} would be the background mean of x at time t. In some cases the meaning is clear from context; for instance, a superscript applied to a vector cannot mean exponentiation. Where superscripts do denote exponents, this is clearly specified in the text. An example of a more complicated use is K^{x,t}_{τ_g}, which refers to the Kalman gain evaluated in x-space at time t, computed with the covariance matrix that uses the guessed memory τ_g. More complicated uses of the sub- and superscripts are clearly identified in the text, and we also recommend that the reader consult Amezcua and Van Leeuwen (2018) for clarity.

TIME-CORRELATED MODEL ERROR IN THE KALMAN SMOOTHER
Let us consider a simple linear model with the governing equation over one model time step:

x^{t+1} = M^{t→(t+1)} x^t + η^{t+1},

where M^{t→(t+1)} ∈ ℝ^{N_x×N_x} represents the linear model operator, whose size depends on the number of variables in the system N_x, x^t ∈ ℝ^{N_x} is the state variable at given time step t, and η^{t+1} ∈ ℝ^{N_x} is an additive model error which is correlated in time and space. The initial condition of the random variable, x^0 ∈ ℝ^{N_x}, is drawn from a multivariate Gaussian distribution (MGD), x^0 ∼ N(x̄, B), where x̄ ∈ ℝ^{N_x} is the mean of the random variable and B ∈ ℝ^{N_x×N_x} is its covariance matrix.
The time-correlated model error at time t also comes from an MGD, η^t ∼ N(0, Q), with zero mean and covariance matrix Q ∈ ℝ^{N_x×N_x}. We also consider spatial correlations for the model errors, hence Q is not diagonal. We follow Amezcua and Van Leeuwen (2018) and assume that the model errors are correlated in time as:

E[ η^i (η^j)^T ] = γ(|i − j|, τ) Q,

where γ(|i − j|, τ) represents the memory of the model errors, |i − j| is the absolute difference between time steps i and j, and τ represents the characteristic memory time-scale of the model error. The function γ decreases monotonically to 0 as |i − j| increases, and attains its maximum value of 1.0 as the absolute difference between time steps i and j tends to 0. For simplicity, we choose an exponentially decaying memory for the model error:

γ(|i − j|, τ) = exp( −|i − j| / τ ).

When the correlation time-scale τ tends to 0.0, which indicates no temporal correlation in the model errors, the function γ becomes a Kronecker delta function and the linear model becomes a first-order Markov model. In the other limit, when τ tends to infinity, the memory of the model errors becomes 1.0 and the model error is fixed in time.
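Because the exponential memory above coincides with the autocorrelation of a first-order autoregressive (AR(1)) process, correlated model-error sequences can be generated recursively. The sketch below is our own minimal illustration (the function name and arguments are not from the paper); it draws a scalar error sequence with corr(η^i, η^j) = exp(−|i − j|/τ):

```python
import numpy as np

def sample_correlated_error(n_steps, tau, q=1.0, seed=None):
    """Draw a scalar model-error sequence eta^1..eta^n with variance q**2
    and corr(eta^i, eta^j) = exp(-|i - j| / tau).
    An AR(1) recursion with phi = exp(-1/tau) has exactly this correlation."""
    rng = np.random.default_rng(seed)
    phi = np.exp(-1.0 / tau) if tau > 0 else 0.0   # tau -> 0 gives white noise
    eta = np.empty(n_steps)
    eta[0] = q * rng.standard_normal()             # start in the stationary state
    for t in range(1, n_steps):
        eta[t] = phi * eta[t - 1] + q * np.sqrt(1.0 - phi * phi) * rng.standard_normal()
    return eta
```

As τ → 0 the coefficient φ vanishes and the samples are white in time; as τ → ∞, φ → 1 and the sequence approaches a fixed bias.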

Formulation of the Kalman Smoother
We start with the formulation of the Kalman Smoother as described in Amezcua and Van Leeuwen (2018). It uses an extended control variable, z^{0:T} ∈ ℝ^{(T+1)N_x}, over the T + 1 model time steps of the assimilation window. This construction simplifies the representation of the covariance matrix and the exposition of the method. The extended variable can be written as the initial state of the system x^0 ∈ ℝ^{N_x}, plus the collection of the model errors over time, η^{1:T} ∈ ℝ^{T N_x}:

z^{0:T} = [ (x^0)^T, (η^{1:T})^T ]^T.

The extended variable can be transformed back to state space via:

x^{0:t} = M^{0:t} z^{0:t},

where M^{0:t} ∈ ℝ^{(t+1)N_x×(t+1)N_x} is the extended model operator, a lower-triangular block-matrix whose block in row j and column i (i ≤ j) is the propagator M^{i→j} from time step i to time step j, with identity blocks on the diagonal and zero blocks above it. The extended form also follows an MGD, z^{0:T} ∼ N(z̄^{0:T}, D^{0:T}). In this case, the prior covariance matrix D^{0:T} ∈ ℝ^{(T+1)N_x×(T+1)N_x} has a simple form, which can be written as a block-matrix:

D^{0:T} = [ B      0
            0      Q^{1:T} ].

The covariance matrix of the extended control variable has two separate and independent parts: the part that comes from the initial condition, B ∈ ℝ^{N_x×N_x}, and the part that originates purely from the correlated model errors, Q^{1:T} ∈ ℝ^{T N_x×T N_x}. The covariance matrix Q^{1:T} is a block-matrix and can be written as a Kronecker product of a Toeplitz matrix and the spatial covariance matrix of the model error, Q:

Q^{1:T} = Γ^{1:T} ⊗ Q,

where the Toeplitz matrix Γ^{1:T} ∈ ℝ^{T×T} contains all the memory coefficients, Γ^{1:T}_{ij} = γ(|i − j|, τ). This Toeplitz matrix, Γ^{1:T}, has different forms in different scenarios:
• When τ → 0, the Toeplitz matrix becomes an identity matrix, I ∈ ℝ^{T×T}, and the Kronecker product Q^{1:T} becomes a block-diagonal matrix.
• When τ → ∞, the Toeplitz matrix Γ^{1:T} is a matrix of ones, and Q^{1:T} becomes a block-matrix in which every block element is the spatial covariance matrix Q.
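The Kronecker-product construction of the extended prior covariance, together with its two limits, can be sketched directly. This is our own helper with hypothetical names, not the paper's code:

```python
import numpy as np

def extended_prior_cov(B, Q, T, tau):
    """Prior covariance of the extended variable z = [x^0, eta^1, ..., eta^T]:
    block-diagonal in the initial-condition part B and the model-error part
    Gamma (x) Q, where Gamma_ij = exp(-|i - j| / tau) is Toeplitz."""
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Gamma = np.exp(-lags / tau) if tau > 0 else np.eye(T)   # tau -> 0: identity
    n_x = B.shape[0]
    D = np.zeros(((T + 1) * n_x, (T + 1) * n_x))
    D[:n_x, :n_x] = B                                       # initial-condition block
    D[n_x:, n_x:] = np.kron(Gamma, Q)                       # correlated model errors
    return D
```

For τ → 0 the model-error part is block-diagonal, while for very large τ every block approaches the spatial covariance Q, as described in the two bullet points above.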
To demonstrate the structure of the Kalman Smoother solution, we consider only a single observation, at time step t. Details of the formulation with multiple observations can be found in Amezcua and Van Leeuwen (2018). The Kalman gain acting upon the whole simulation period in extended-variable space, K_z^{0:T}, can be computed as:

K_z^{0:T} = D^{0:T} (H M^t)^T [ H M^t D^{0:T} (H M^t)^T + R ]^{-1},    (12)

where M^t denotes the block-row of the extended model operator that maps z^{0:T} to the state x^t at the observational time. With the Kalman gain, we can update the extended control variable using the Kalman equation, assuming that the initial state x^0, the observation error and the model error are statistically independent of each other. Hence, the analysis mean is:

z̄^{0:T,a} = z̄^{0:T,b} + K_z^{0:T} d,    (13)

where d is the innovation between the observations and the model output at observational time t, which can be calculated as:

d = y − H M^t z̄^{0:T,b}.    (14)

The vector y represents the observations obtained from the true evolution of the system by the observation operator, H ∈ ℝ^{N_y×N_x}, including the observational error:

y = H x^{t,r} + ε^t,    (15)

where ε^t ∈ ℝ^{N_y} is the observational error, which follows a zero-mean MGD, ε^t ∼ N(0, R), and whose size depends on the number of variables observed from the system, N_y; x^{t,r} represents the real state of the system at time step t, and R ∈ ℝ^{N_y×N_y} represents the covariance matrix of the observation errors. Note that the observational time can be anywhere inside the assimilation window, 0 ≤ t ≤ T. Finally, the covariance matrix is updated via:

D^{0:T,a} = ( I − K_z^{0:T} H M^t ) D^{0:T}.    (16)

Considering more than one observation per assimilation window does not yield simple expressions. Instead, it can be handled in two ways. First, we can consider modified expressions as in the appendix of Amezcua and Van Leeuwen (2018). Second, the observations can be assimilated serially, one after the other. Since the observation error covariance matrix is assumed diagonal, this is equivalent to assimilating all observations at once. Amezcua and Van Leeuwen (2018) established a framework to handle time-correlated model errors in the Kalman Smoother and its ensemble implementation.
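The single-observation update above can be sketched numerically as follows. This is a minimal illustration under our own naming, assuming M_ext denotes the block-row of the extended model operator that maps the extended variable to the state at the observation time:

```python
import numpy as np

def ks_update_single_obs(zb, D, M_ext, H, R, y):
    """One Kalman-smoother update in extended-variable space for a single
    observation: K = D G^T (G D G^T + R)^{-1}, with G = H M_ext the map from
    the extended variable z to the observed quantity."""
    G = H @ M_ext                         # extended variable -> observation space
    S = G @ D @ G.T + R                   # innovation covariance
    K = D @ G.T @ np.linalg.inv(S)        # Kalman gain in z-space
    d = y - G @ zb                        # innovation
    za = zb + K @ d                       # posterior mean of [x^0, eta^1..eta^T]
    Da = (np.eye(D.shape[0]) - K @ G) @ D # posterior covariance
    return za, Da
```

In the trivial scalar case with prior variance, observation-error variance and operators all equal to one, the gain is 0.5 and the posterior variance is halved, as a quick sanity check.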
Nonetheless, they did not evaluate the performance of the methods they discussed, nor did they study the consequences for this performance of using a wrong memory of the model error in the forecast. This is one of the two new contributions of this work, and it is detailed in this section.

Evaluating the performance of the Kalman Smoother with time-correlated model error
A data assimilation system should be able to produce accurate estimates of the posterior density function of the state variables. In practice, assuming a unimodal posterior, it should at least be able to produce a mean trajectory which remains "close" to the (unknown) truth, and provide an uncertainty measure corresponding to the true uncertainty of the mean with respect to the truth.
One common approach is to compare the root-mean-square error (RMSE), which is the true error of the posterior mean, with the posterior standard deviation, or spread, which is the error estimated by the data assimilation method (Fortin et al., 2014). When the data assimilation results give us the "best" estimate of the system, the ratio of the RMSE and the spread should be approximately equal to 1.0. To simplify matters, instead of comparing the RMSE with the spread, we use the ratio of the mean-square error (MSE) and the variance of the state variable.

Before proceeding to actual experiments, we find the analytical expressions for the MSE of both the background and the analysis. We also analyse in detail the variance expressions shown in Amezcua and Van Leeuwen (2018). To simplify the calculations, we assume that the state is one-dimensional and the model operator is a damping coefficient, α. The model is pure noise if α tends to 0.0, and a random-walk model when α = 1.0. We choose a damping coefficient between 0.0 and 1.0 to ensure that the linear model is stationary. This leads to the model equation:

x^{t+1} = α x^t + η^{t+1}.

For the next subsections we work in state-variable space, that is, our control variable is x^{0:T}, for two reasons: the meaning of the expressions is more tractable, and the implementation in the ensemble case is more straightforward. The general expressions are obtained as double sums which are not easy to visualise. In some cases these double sums can be evaluated, leading to the expressions provided in the tables in the Appendix (Table A.1 lists the prior variance, prior covariance and prior correlation in different scenarios). In other cases we evaluate the expressions numerically and provide graphical illustrations.

Posterior variance in the Kalman Smoother
The prior variance at any time, and the covariance between two different time steps, in our scalar system have the following expressions:

cov( x^{t_1,b}, x^{t_2,b} ) = α^{t_1 + t_2} b² + q² Σ_{i=1}^{t_1} Σ_{j=1}^{t_2} α^{t_1 − i} α^{t_2 − j} γ(|i − j|, τ),    (18)

where b² is the variance of the initial condition x^0, the superscript b denotes the prior, and q² is the variance of the model error. In (18), in the expressions involving the scalars α, b and q, the exponent actually means the constant raised to a power, as opposed to being a time index.
According to (18), the prior covariance and variance have two sources: the initial condition, which is the first term on the right-hand side (RHS), and the auto-correlated model errors, the double-sum term on the RHS. Of course, (18) is only suitable for t > 0, t_1 > 0, t_2 > 0. As a special case, since the initial condition x^{0,b} is independent of the model errors at any given time, its variance and its covariance with the state at any later time are given by:

var( x^{0,b} ) = b²,   cov( x^{0,b}, x^{t,b} ) = α^t b².    (19)

Once more, the expression α^t means the constant α raised to the power t. To obtain a feeling for (18), Table A.1 contains results for limiting cases of α and τ in which the sums can be evaluated analytically. Figure 2 shows that the prior variance is a monotonically increasing function of both α and τ_g and, not surprisingly, of time. The prior variance is almost constant when α < 0.5; for larger values of α, the prior variance increases much faster with τ_g. The prior variance as a function of τ_g shows the opposite behaviour: when τ_g is between 0.0 and 10.0, the prior variance increases significantly with increasing τ_g, but for larger τ_g values the increase of the variance slows down. Since the posterior variance is the error estimated by the data assimilation scheme, and in linear data assimilation the posterior variance is independent of the actual values of the observations, the posterior variance has no knowledge of the real decorrelation time-scale of the model errors, τ_r. The posterior variance at a given time step t, assuming that we have a single observation at time step T, can be simplified to:

var( x^{t,a} ) = var( x^{t,b} ) − K^{x,t}_{τ_g} cov( x^{t,b}, x^{T,b} ),    (20)

where K^{x,t}_{τ_g} is the Kalman gain formulated in x-space acting on the current time step t, which in this scalar case can be computed as:

K^{x,t}_{τ_g} = cov( x^{t,b}, x^{T,b} ) / ( var( x^{T,b} ) + r² ),    (21)

where r² is the variance of the observation error, and here the exponent clearly means the second power. We can see that the Kalman gain depends on the covariance between the state at the present time and at the observational time, the state variance at the observational time, and the observation error.
These expressions correspond to the state-space formulation in Amezcua and Van Leeuwen (2018). We also compute some limiting cases of the posterior variance with a single observation for α and τ, shown in Table A.2. When more than one observation is present within an assimilation window, it is difficult to find simple analytical expressions and we resort to numerical evaluation. We start our numerical experiments with a fixed damping coefficient α = 0.8, but with different memories τ. The results are shown in Figure 3.
The first thing that strikes the eye is the low posterior variance at observation times, which is as expected. Another clear trend is the decrease of the posterior variance with increasing τ_g. This is directly related to the spread of observation information through the system: a larger τ_g gives more memory to the system, and hence observations have a larger influence over time. In some plots the posterior variance decreases towards the initial time, while in others it increases. However, this is mainly due to the different colour scales in the plots; the posterior variance at the initial time is mainly set by the prior variance, although observations do have an influence for larger decorrelation time-scales. Finally, one can notice a decrease of the posterior variance for τ_g close to zero. This behaviour has its roots in the behaviour of the prior, which has minimal variance for small τ_g.
To see this latter point better, we return to the analytical treatment of the case of a single observation at the end of the simulation period, at t = T. We focus on the posterior variance at the initial time and at the observational time, as those times show the most interesting behaviour. As we have seen above, the initial variance, and the covariance between the initial state and the state at any time, are independent of the decorrelation time-scale, so:

var( x^{0,b} ) = b²,   cov( x^{0,b}, x^{T,b} ) = α^T b².    (22)

Using this, we find for the Kalman gain from (21) and Figure 2:

K^{x,0}_{τ_g} = α^T b² / ( var( x^{T,b} ) + r² ),    (23)

and, since var(x^{T,b}) grows with τ_g, the Kalman gain for the initial time is a decreasing function of the decorrelation time-scale. Using this, we find for the posterior variance at the initial time:

var( x^{0,a} ) = b² − ( α^T b² )² / ( var( x^{T,b} ) + r² ),    (24)

which is an increasing function of the decorrelation time-scale. At the observational time we can make a similar derivation:

K^{x,T}_{τ_g} = var( x^{T,b} ) / ( var( x^{T,b} ) + r² ),    (25)

leading to:

var( x^{T,a} ) = var( x^{T,b} ) r² / ( var( x^{T,b} ) + r² ).    (26)

We thus find that both at initial and at observation times, the posterior variance increases with τ_g. In fact, this derivation shows that this is true for all values of τ_g, at initial and final times, not only for the small τ_g values that Figure 3 might suggest.
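The monotonic increase of the posterior variance with τ_g is easy to verify numerically by brute-force evaluation of the double sum in (18). The following sketch uses our own helper names and restates the prior-covariance evaluation so that it is self-contained:

```python
import numpy as np

def prior_cov(t1, t2, alpha, b2, q2, tau):
    """Equation (18): initial-condition term plus the double sum over the
    time-correlated model errors, for the scalar model x^{t+1} = alpha x^t + eta."""
    gamma = (lambda d: np.exp(-d / tau)) if tau > 0 else (lambda d: float(d == 0))
    cov = alpha ** (t1 + t2) * b2
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            cov += q2 * alpha ** (t1 - i + t2 - j) * gamma(abs(i - j))
    return cov

def posterior_var(t, T, alpha, b2, q2, tau_g, r2):
    """Posterior variance at time t with a single observation at time T
    (Equations 20-21), computed with the guessed time-scale tau_g."""
    c_tT = prior_cov(t, T, alpha, b2, q2, tau_g)
    v_T = prior_cov(T, T, alpha, b2, q2, tau_g)
    return prior_cov(t, t, alpha, b2, q2, tau_g) - c_tT ** 2 / (v_T + r2)
```

Evaluating posterior_var at t = 0 and t = T for increasing tau_g reproduces the monotonic growth derived above.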

Mean-square error (MSE) of the posterior in the Kalman Smoother
Different from the posterior variance, for the MSE between the analysis mean and the true state of the system the differences between the real decorrelation time-scale and the one assumed in the data assimilation are important. We calculate the MSE of the prior as the difference between the prior mean x̄^{t,b} and the truth. The truth is a realization of the true prior pdf at the initial time. The MSE at any time t is defined as:

MSE^{t,b} = E[ ( x̄^{t,b} − x^{t,r} )² ].    (27)

When the statistics of the model error used in the data assimilation are different from those of the truth, the prior pdf used in the data assimilation will deviate from the pdf that the truth is drawn from. Writing the pdf from which the truth is drawn as x^{t,r} ∼ N( μ^t, B^t ), where μ^t is its mean at time step t and B^t represents its variance, the MSE at time t becomes:

MSE^{t,b} = B^t + ( x̄^{t,b} − μ^t )²,    (28)

in which the last term represents the bias in the prior. Using this in a Kalman Smoother with a single observation at time T, we can compute the posterior MSE as:

MSE^{t,a} = B^t − 2 K^{x,t}_{τ_g} B^{t,T} + ( K^{x,t}_{τ_g} )² ( B^T + r² ) + [ ( x̄^{t,b} − μ^t ) + K^{x,t}_{τ_g} ( μ^T − x̄^{T,b} ) ]²,    (29)

where B^{t,T} denotes the true covariance between times t and T. In the ideal case when τ_g = τ_r, the statistics of the prior and the truth coincide and the MSE of the posterior can be simplified to:

MSE^{t,a} = var( x^{t,b} ) − cov( x^{t,b}, x^{T,b} )² / ( var( x^{T,b} ) + r² ).    (30)

As expected, the posterior MSE in the ideal case is the same as the posterior variance shown in (20), because the statistics of the prior and the truth are the same in this ideal case. When more than one observation is present in the time window, we can write the Kalman Smoother MSE as:

MSE^{0:T,a} = E[ ‖ x̄^{0:T,a} − x^{0:T,r} ‖² ]
            = tr[ ( I − K^{x,0:T}_{τ_g} H^{1:L} ) B^{0:T} ( I − K^{x,0:T}_{τ_g} H^{1:L} )^T + K^{x,0:T}_{τ_g} R ( K^{x,0:T}_{τ_g} )^T ] + ‖ ( I − K^{x,0:T}_{τ_g} H^{1:L} ) ( x̄^{0:T,b} − μ^{0:T} ) ‖²,    (31)

where x̄^{0:T,a} is the time-series of the posterior mean from the Kalman Smoother, B^{0:T} represents the covariance matrix derived from the true pdf, K^{x,0:T}_{τ_g} is the Kalman gain matrix calculated with τ_g, and x̄^{0:T,b} is the prior mean time-series. The observation operator H^{1:L} maps the L observations from state space into observation space. Written in this form, it is relatively easy to understand what the influence of a mis-specified model error is. However, this is slightly deceiving in that the result is written in terms of the true covariance and mean, which are unknown in the real world, and the Kalman gain using the incorrect model error description.
In the ideal scenario the MSE can be simplified to:

MSE^{0:T,a} = tr[ ( I − K^{x,0:T} H^{1:L} ) B^{0:T} ],    (32)

where K^{x,0:T} is the optimal gain for the Kalman Smoother. Equation (32) is the trace of the exact posterior covariance matrix of (16), expressed in the variable space. The behaviour of the posterior MSE when the memory in the prior differs from that of the true system, that is, τ_g ≠ τ_r, is found from the numerical evaluation of the analytical expressions shown in (29). The results are shown in Figure 4, which shows that, in general, the magnitude of the posterior MSE decreases as the observation frequency increases. This matches the results shown in Figure 3 for the posterior variance. As we expected, the posterior MSE reaches its minimum at the observational time. From Figure 4a,b,c we can see that, with a single observation at the end of the simulation window, the MSE is minimized when τ_g = τ_r for the time steps that are away from the observational time and the initial state. However, when the number of observations in the window increases, the difference between τ_r and τ_g becomes less important: the solid lines no longer mark large changes in MSE.
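The convexity of the posterior MSE in τ_g, with its minimum at τ_g = τ_r, can likewise be checked numerically. The sketch below uses our own naming, restates the prior-covariance double sum of (18) for self-containment, and evaluates the single-observation, zero-prior-bias case: the gain uses τ_g while the true moments use τ_r:

```python
import numpy as np

def prior_cov(t1, t2, alpha, b2, q2, tau):
    # Equation (18), restated so this sketch runs on its own.
    gamma = (lambda d: np.exp(-d / tau)) if tau > 0 else (lambda d: float(d == 0))
    cov = alpha ** (t1 + t2) * b2
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            cov += q2 * alpha ** (t1 - i + t2 - j) * gamma(abs(i - j))
    return cov

def posterior_mse(t, T, alpha, b2, q2, tau_r, tau_g, r2):
    """Posterior MSE at time t, single observation at T, zero prior bias:
    the Kalman gain is computed with tau_g, the true moments with tau_r."""
    K = prior_cov(t, T, alpha, b2, q2, tau_g) / (prior_cov(T, T, alpha, b2, q2, tau_g) + r2)
    B_t = prior_cov(t, t, alpha, b2, q2, tau_r)     # true variance at t
    B_tT = prior_cov(t, T, alpha, b2, q2, tau_r)    # true covariance (t, T)
    B_T = prior_cov(T, T, alpha, b2, q2, tau_r)     # true variance at T
    return B_t - 2.0 * K * B_tT + K * K * (B_T + r2)
```

Because the MSE is quadratic in the gain, and the gain computed with τ_g = τ_r is the optimal one, the minimum over τ_g sits at the true time-scale, where the MSE also equals the posterior variance of (20).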
The Appendix contains derivations and analytical results for the Ensemble Kalman Smoother, where we specifically study the influence of sampling errors.

Evaluation of the Kalman Smoother for a one-dimensional system
To evaluate the performance of the Kalman Smoother we compute the ratio of the MSE over the variance of the posterior, averaged over the simulation window, for different observational frequencies. Figure 5 shows the numerical evaluation of the analytical expressions for this ratio. As we can see, the Kalman Smoother works well when τ_g = τ_r in all cases, with the ratio of MSE over variance equal to 1.0. With relatively high observational frequency (five observations or more in the simulation window), the MSE is larger than the estimated posterior variance when τ_g > τ_r, and vice versa. From the numerical results shown in Figure 4, the mismatch between the two time-scales τ_r and τ_g barely has any impact on the MSE in these cases. The ratio is then dominated by the posterior variance.
To understand the behaviour of the ratio in Figure 5 for small observation numbers, we refer to Figure 6, which shows the time-averaged posterior variance as a function of τ_g for the case of one observation in the time window, as the black line. The concave shape is due to a combination of two effects. Firstly, the prior variance grows with τ_g, as a larger decorrelation time-scale means that errors persist longer in the time window. This effect leads to a growth of the posterior variance with τ_g. Secondly, a larger τ_g reduces the posterior variance because the larger decorrelation time-scale allows the observation information to spread more over the time window. These two competing effects lead to a maximum in the posterior variance. Figure 6 also shows the MSE for three different values of τ_r. The MSE curves are all convex, with a minimum when τ_g = τ_r, as expected, since the minimum value of the MSE occurs when the guessed decorrelation time-scale is equal to the real time-scale. The ratio of the MSE to the posterior variance is equal to one where their curves cross, and we see immediately that the curves cross twice when the position of the minimum of the MSE is different from that of the maximum in the posterior variance. This is exactly what Figure 5 shows for one observation, and the structure of that solution is fully determined by the position of the peak in the posterior variance. For two observations, we see qualitatively similar behaviour, with the peak in the posterior variance shifting closer to zero. For 5 and 20 observations, the peak in the posterior variance shifts all the way to zero because the influence of the observations becomes more important than the prior, so the posterior variance becomes a decreasing function of τ_g, as can also be observed in Figure 3. This means that the MSE and posterior variance curves only cross once, where τ_g = τ_r, as Figure 5 indeed shows.
Ideally we would be able to show this behaviour by exploring the analytical expressions of (18), but these become rather complicated, as we would have to analytically evaluate ratios of integrals over double sums, which we were unable to perform. Finally, it should be noted that Figure 5 first calculates the ratio of MSE over posterior variance and then averages over time, while Figure 6 and the argument above first average over time and then calculate the ratio. The results are qualitatively the same because Figures 3 and 4 show a similar behaviour over time.

Figure 5. Ratio of MSE over the posterior variance for the one-dimensional system, calculated using numerical evaluation of the exact analytical expressions, using (a) one, (b) two, (c) five, and (d) 20 observations.

TIME-CORRELATED MODEL ERROR IN A HIGHER-DIMENSIONAL SYSTEM
In this section we explore how the analytical results from the one-dimensional system carry over to systems of higher dimension. To this end we implement an Ensemble Kalman Smoother (EnKS; Evensen and Van Leeuwen, 2000) using perturbed model forecasts (Van Leeuwen, 2020) with 200 ensemble members on a ten-dimensional system, in which the deterministic model consists of a diagonal matrix with the damping coefficient on the diagonal, and spatially and temporally correlated model errors. This means that, although the elements of the state evolve independently over time, they become more and more correlated due to the correlated model error. The large ensemble size with respect to the size of the state vector ensures that sampling effects are small. Four cases with four different observation frequencies are discussed, similar to the experiments for the one-dimensional system. We generate the true trajectory of the system, x^{0:T,r}, with τ_r, and all the prior ensemble members are generated using τ_g. The assimilation is run over 50 time windows, in which the results from one window provide the prior initial conditions for the next window (i.e., cycling). There are 20 time steps (T = 20) in each time window.

Figure 6. The time-averaged posterior variance (black) and the posterior mean-square error for different τ_r (red, blue, brown), as a function of τ_g, using a fixed damping coefficient α = 0.8.
We experiment with different combinations of τ_r and τ_g over the same range from 0.0 to 20.0, and the four observation settings explored above. To evaluate the performance of the EnKS, we calculate the ratio of the MSE and the ensemble variance. The MSE at a given time step t is computed as

MSE_t = (x̄_{t,a} − x_{t,r})²,

where x̄_{t,a} is the mean of the posterior ensemble and x_{t,r} is the true state of the system. After obtaining the MSE and the variance for each time step, we calculate their ratio and average this ratio over the whole simulation period; the results are shown in Figure 7. We find that the ratio of MSE over the posterior variance matches the results for the one-dimensional system shown in Figure 5. The EnKS performs well with a correctly guessed decorrelation time-scale (τ_g = τ_r). With different observation frequencies, the ratio shows a similar behaviour to the ratio of MSE over the posterior variance in the one-dimensional system, but the differences are about 10% larger in the higher-dimensional case. For the higher-dimensional system, the posterior ensemble spread is still the main factor in the ratio: it behaves non-monotonically with τ_g when there are few observations and becomes monotonically decreasing in τ_g as the number of observations increases.
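The diagnostic described above (per-step ratio of MSE to ensemble variance, then a time average) can be written compactly. The array shapes are our assumptions; this is a sketch of the procedure, not the authors' implementation.

```python
import numpy as np

def mse_variance_ratio(ens_post, truth):
    """Time-averaged ratio of MSE to posterior ensemble variance.

    ens_post : (n_steps, n_members, n_dim) posterior ensemble
    truth    : (n_steps, n_dim) true trajectory

    The ratio is formed per time step first and averaged afterwards,
    matching the order of operations used for Figure 7.
    """
    mean = ens_post.mean(axis=1)                      # posterior ensemble mean
    mse = ((mean - truth) ** 2).mean(axis=-1)         # per-step MSE
    var = ens_post.var(axis=1, ddof=1).mean(axis=-1)  # per-step ensemble variance
    return (mse / var).mean()                         # average of per-step ratios
```

A ratio near 1 indicates that the ensemble spread is a reliable estimate of the actual error; values above (below) 1 indicate an under- (over-) dispersive ensemble.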

ESTIMATION OF THE MEMORY IN THE MODEL ERROR
In the previous sections we showed that using an incorrect memory time-scale in the model error can have a significant impact on the data assimilation results. Unfortunately, in many practical situations we do not know this memory time-scale. However, it is possible to treat that correlation time-scale as an unknown quantity and perform parameter estimation alongside the state estimation.
Parameter estimation via state augmentation has been used before, for example, with an extended Kalman Filter (Carrassi and Vannitsem, 2011), and even to determine parameters in Lagrangian Data Assimilation (Kuznetsov et al., 2003). Evensen (2009) used the EnKF to update the state while performing the parameter estimation.
Even for the simple linear autoregressive model used in the previous section, parameter estimation becomes a nonlinear problem, because the correlation time-scale is deeply embedded in the governing equation of the system. As an example of such a correlation time-scale estimation problem, we use state augmentation in an EnKS, in which the time-scale is simply added to the state vector.
Instead of the memory time-scale τ_g itself, we use its logarithm to avoid negative memory estimates. The initial log time-scale values are drawn from a normal distribution, ln τ_g^i ~ N(ln τ_g, 1.0) for ensemble member i; hence we assume that the prior distribution of the memory time-scale is log-normal. The results are shown in Figure 8. Figure 8a,b show experiments with only one observation at the end of the window, in which our first estimate of the time-scale is either larger or smaller than the real time-scale.

FIGURE 7 Ratio of MSE over the variance of the posterior for a ten-dimensional system with (a) one, (b) two, (c) five, and (d) 20 observations in each simulation window. These plots come from numerical experiments with a 200-member EnKS. Note that the results are qualitatively and also quantitatively very similar to those in Figure 5

With an increasing number of windows we obtain better estimates, but the variance of the estimate does not change, and the convergence is slow. We experimented with different values for the first-guess and true time-scales, and in some cases the solution did not converge to the correct value. This is not surprising given the highly nonlinear character of the parameter estimation problem, especially with only one observation per window.
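The state-augmentation step described above can be sketched as follows: each ensemble member carries its own log time-scale, which the linear EnKS update then adjusts like any other state component. Function and variable names are ours, not the paper's.

```python
import numpy as np

def augment_state(ens_x, ln_tau_mean, spread=1.0, rng=None):
    """Append a log decorrelation time-scale to each ensemble member.

    Working in log space keeps the time-scale positive after the linear
    EnKS update, consistent with a log-normal prior on the time-scale.
    ens_x : (n_members, n_dim) state ensemble.
    """
    rng = np.random.default_rng(rng)
    n_members = ens_x.shape[0]
    # Draw ln(tau) for each member: ln tau_i ~ N(ln_tau_mean, spread^2)
    ln_tau = rng.normal(ln_tau_mean, spread, size=(n_members, 1))
    return np.hstack([ens_x, ln_tau])  # augmented ensemble

def get_tau(ens_aug):
    """Recover the (strictly positive) time-scale estimates."""
    return np.exp(ens_aug[:, -1])
```

After each assimilation window, the posterior distribution of exp(last component) is the updated time-scale estimate that seeds the next window.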
When we observe every time step, the convergence is much faster and the variance of the estimate decreases, as shown in Figure 8c,d. In this case we always found fast convergence for different combinations of first-guess and true time-scales, demonstrating that more observations keep the estimates closer to the truth and hence make the parameter estimation problem more linear.

CONCLUSION
In this paper we have investigated the influence of a mis-specified model error decorrelation time-scale in linear models, using an (Ensemble) Kalman Smoother, and investigated estimation of that time-scale in an EnKS. Using a Kalman Smoother, analytical results were derived for the posterior variance and mean-square error (MSE) for a zero-dimensional model.

FIGURE 8 PDFs of the prior (blue) and posterior (reddish colours) estimated τ_g, using an increasing number of assimilation windows. The different panels show results for different observation densities and a prior mean (a, c) larger or (b, d) smaller than τ_r. The vertical black line denotes the true value τ_r

We find that the posterior variance, which depends only on the guessed correlation time-scale τ_g, behaves differently for different observation frequencies. With a single observation, the posterior variance has a maximum at a certain τ_g value, and both that maximum and the corresponding τ_g value decrease over time. When we increase the number of observations, the posterior variance becomes a monotonically decreasing function of τ_g. Since the posterior variance represents the error of the posterior estimated by the data assimilation process, with more information from the observations the estimated error becomes significantly smaller. The MSE, which is the actual error of the posterior mean, decreases as well when more observations are included. However, unlike the posterior variance, the MSE of the posterior mean depends not only on τ_g, but also on the real memory time-scale τ_r. The results for the posterior MSE with a single observation show that it increases with both τ_r and the mismatch between τ_g and τ_r. This means that, if we do not have a fair estimate of the correlation time-scale, the actual posterior error will be larger.
For a higher-dimensional model we used an EnKS. The results agree with the analytical and numerical evaluations of the Kalman Smoother. For many observations we found that the MSE is larger than the estimated error for τ_g > τ_r, and vice versa. For a low number of observations, a new regime appears in which, for very small τ_r, the MSE is smaller than the estimated error, and vice versa for very small τ_g. This behaviour is mainly dictated by the behaviour of the estimated error.
Since the influence of an incorrect decorrelation time-scale in the model error can be significant, we investigated the estimation of this time-scale within an EnKS. We found that, when the observation density is high, state augmentation is sufficient to obtain convergent results. However, with only one observation in a time window, the problem becomes too nonlinear and the estimation process is slow, or does not converge at all. These results are consistent with parameter estimation via state augmentation in the literature. The new element is that online estimation is possible beyond a relatively simple bias estimate of the model error.

APPENDICES
The prior variance, prior covariance and prior correlation

The posterior variance with a single observation at the end of the window

The Mean-Square Error in the finite ensemble and scalar case

Let us start with the simplest case for a finite ensemble size with only one observation at the end of the window, with the ensemble members having the same distribution as the truth and hence the same model-error memory. The ensemble size is N_e, and the ensemble members {x_t,b^1, x_t,b^2, ..., x_t,b^{N_e}} and the truth are drawn from N(μ_t,b, B_t²). The sample mean, x̄_t,b^{N_e}, has the distribution N(μ_t,b, B_t²/N_e), and the MSE of the prior sample mean is given by

MSE_t^b = E[(x̄_t,b^{N_e} − x_t)²] = B_t² + B_t²/N_e.

As we can see, the prior MSE under the perfect-model assumption is just the variance of the truth as N_e → ∞. In this case the posterior MSE of the ensemble mean can also be computed. Even with the ideal assumptions, the posterior MSE for the finite-ensemble case is not as simple as the prior MSE: the first two terms on the right-hand side represent the real MSE of the posterior, and the rest is the sampling error. Note that the Kalman gain is optimal since τ_g = τ_r.

Now let us take the different memory time-scales in the model error into account. These lead to different variances for the prior ensemble mean and the truth: x̄_t,b^{N_e} ~ N(μ_t,b, B̃_t²/N_e), and x_t,b ~ N(μ_t,b, B_t²). Thus we have a slightly different expression for the prior MSE:

MSE_t^b = B_t² + B̃_t²/N_e.

Comparing with the expression above, if we increase the ensemble size to infinity the prior MSE is the same; it is just the variance of the truth. But the sampling-error part is different. The same holds for the posterior MSE in this scenario, in which the Kalman gain K_x,t^g is not optimal.

Lastly, let us consider the most general case, when we have no knowledge of either the model error memory or its mean. In this case we also have a bias in the mean of the prior ensemble members, besides the incorrect variance: x̄_t,b^{N_e} ~ N(μ̃_t,b, B̃_t²/N_e).
Then the prior MSE of the ensemble mean has a similar expression to that in (28), but with an additional term:

MSE_t^b = B_t² + B̃_t²/N_e + (μ̃_t,b − μ_t,b)².

In this scenario the extra error comes from the bias in the mean. The posterior MSE in this scenario contains the errors introduced by the sampling error, the incorrect autocorrelation time-scale, and the bias term.
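The prior-MSE decomposition in the biased case follows from the independence of the sample mean and the truth, and can be checked numerically. This is an illustrative Monte-Carlo verification in our own notation; the parameter values are arbitrary.

```python
import numpy as np

def prior_mse_check(mu=0.0, mu_b=0.3, B=1.0, B_tilde=1.5,
                    n_e=50, n_trials=200_000, rng=0):
    """Monte-Carlo check of
        E[(xbar - x)^2] = B^2 + B_tilde^2/n_e + (mu_b - mu)^2,
    where xbar ~ N(mu_b, B_tilde^2/n_e) is the prior ensemble mean and
    x ~ N(mu, B^2) is the truth, drawn independently."""
    g = np.random.default_rng(rng)
    xbar = g.normal(mu_b, B_tilde / np.sqrt(n_e), n_trials)
    x = g.normal(mu, B, n_trials)
    empirical = np.mean((xbar - x) ** 2)
    analytic = B**2 + B_tilde**2 / n_e + (mu_b - mu)**2
    return empirical, analytic
```

Setting mu_b = mu recovers the unbiased expression, and additionally setting B_tilde = B recovers the perfect-model prior MSE, B²(1 + 1/N_e).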