A dynamical Gaussian, lognormal, and reverse lognormal Kalman filter

We derive a generalization of the Kalman filter that allows for non-Gaussian background and observation errors. The Gaussian assumption is replaced by considering that the errors come from a mixed distribution of Gaussian, lognormal, and reverse lognormal random variables. We detail the derivation for reverse lognormal errors and extend the results to mixed distributions, where the number of Gaussian, lognormal, and reverse lognormal state variables can change dynamically at every analysis time. We test the dynamical mixed Kalman filter extensively on two different systems based on the Lorenz 1963 model, and demonstrate that, compared with the Gaussian Kalman filter, the non-Gaussian techniques generally improve the analysis skill when the observations are sparse and uncertain.


INTRODUCTION
The Kalman filter (Jazwinski, 1970; Kalman, 1960) has been very successful in data assimilation applications due to its simplicity. Its development offered an analytical approach to estimate the optimal state of a dynamical system by linearizing the model equations and measurement operators, and paved the way for ensemble data assimilation (Evensen, 1994). The major drawback of this technique is that it assumes Gaussian-distributed background and observational errors. This Gaussian assumption lies at the basis of many data assimilation methods, even though it generally does not hold (Bannister et al., 2020; Foster et al., 2006; Hu et al., 2023; Perron & Sura, 2013; Poterjoy, 2022).
As numerical models and observing systems improve, this shortcoming is becoming more apparent, which has increased interest in developing non-Gaussian methods in recent years. Although significant advances have been made in this regard, most available non-Gaussian data assimilation techniques still have challenges to overcome before they are operationally viable. Approaches based on the particle filter (van Leeuwen, 2009) can in principle deal with any prior distribution and likelihood, but generally need large ensemble sizes to avoid ensemble degeneracy (Snyder et al., 2015). Another avenue is to replace the Gaussian prior and likelihood distributions explicitly with a different probability density function (Bishop, 2016; Fletcher & Zupanski, 2006a), which requires a way of deciding which distribution is optimal to use. Similarly, Gaussian anamorphosis (see, e.g., Amezcua & Van Leeuwen, 2014; Bertino et al., 2003; Bocquet et al., 2010; Simon & Bertino, 2009, 2012) can be used to transform ensembles into an approximately Gaussian form. This procedure can handle non-Gaussian errors, but discards skewness information, which can lead to biases in the analysis (Fletcher & Zupanski, 2007). Finally, hybrid approaches have been proposed that combine a particle filter with standard data assimilation methods (see, e.g., Nerger, 2022).
Due to these challenges, operational numerical weather prediction systems are still based on Gaussian prior distributions (Bannister, 2017). Two main approaches are used: variational methods and ensemble techniques. In the former, the negative log-likelihood of the posterior distribution is minimized to find the optimal analysis state. The latter are based on the ensemble Kalman filter, which allows for a flow dependence of the background-error covariance matrix, but running enough ensemble members is computationally expensive. To improve forecast skill, most operational numerical weather prediction centres use hybrid data assimilation methods that try to combine the best of both worlds.
In the case of lognormal errors, variational data assimilation methods are already well established (Fletcher & Zupanski, 2006a, 2006b; Fletcher, 2010, 2022), and machine learning has been suggested as a promising way of selecting the underlying error distribution (Goodliff et al., 2020; Goodliff et al., 2022). Recently, Fletcher et al. (2023) showed that it is also possible to adapt the Kalman filter to allow for lognormal errors. This paves the way for hybrid techniques that can better assimilate variables that are bounded from below.
The lognormal distribution is just one of many possibilities for the error distribution of state variables. It is a good approximation for variables that are positively skewed and have a set minimum value, such as water-vapour mixing ratio (Kliewer et al., 2016). However, atmospheric statistics can change over time and space (Foster et al., 2006; Perron & Sura, 2013), and in particular the skewness can be negative. One way to deal with this is to introduce the reverse lognormal distribution, which looks like the mirror image of a lognormal distribution, with a given upper bound and negative skewness. The reverse lognormal distribution has been successfully incorporated into variational data assimilation (Goodliff et al., 2023), allowing for a mixed distribution with Gaussian, lognormal, and reverse lognormal random variables.
In a sense, adding the reverse lognormal distribution to previous mixed data assimilation schemes completes the model, making it possible to account for variables with any skewness, positive or negative. Thus, in an effort to design operationally viable hybrid data assimilation methods, the mixed Kalman filter of Fletcher et al. (2023) should be extended with the reverse lognormal distribution, which is the motivation for this article.
We start by deriving the reverse lognormal Kalman filter in Section 2.1, where all errors are reverse lognormally distributed. Of course, such a filter is of limited use on its own, as in any real application different distributions are present at the same time. In Section 2.2, we outline the fully mixed version of the Kalman filter, which incorporates Gaussian, lognormal, and reverse lognormal background and observational errors. Moreover, we remark that the fully mixed Kalman filter is formulated in a dynamical way, where the underlying distribution of each state variable or observation can change at every analysis time. The details of the implementation of the new Kalman filter are presented in Section 3, and used in Section 4 in conjunction with two models based on the Lorenz-63 system (Lorenz, 1963) to investigate its capabilities. Finally, we provide conclusions in Section 5.

FORMULATION
The Kalman filter equations can be derived in multiple ways (Fletcher, 2022). Here, we present a formulation based on a least-squares approach, starting from a cost function describing the negative log-likelihood of the analysis errors. To show the details of the derivations, before presenting the full dynamical model with mixed Gaussian, lognormal, and reverse lognormal distributions, we derive the reverse lognormal Kalman filter, where all errors follow a reverse lognormal distribution. The extension to the fully mixed error distribution is straightforward, and will be summarized in Section 2.2. The derivation in this section is similar to the one presented by Fletcher et al. (2023) in the context of lognormal errors, and the reader is referred to that work if more details are necessary. The main difference is the use of a different probability density function for the prior and likelihood distributions: the reverse lognormal distribution. For an upper bound τ and parameters μ and Σ, its multivariate probability density function is given by

$$p(x) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}\,\prod_{i=1}^{N}(\tau_i - x_i)}\,\exp\left\{-\frac{1}{2}\left[\ln(\tau - x) - \mu\right]^{\mathrm T}\Sigma^{-1}\left[\ln(\tau - x) - \mu\right]\right\}, \qquad x < \tau. \quad (1)$$

The reverse lognormal distribution can be thought of as a mirror image of a lognormal distribution, and has a negative skewness. Reverse lognormal variables are semibounded, with a fixed upper bound τ, in contrast with lognormal variables, which have a lower bound.
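To make the shape of this distribution concrete, a short numerical sketch samples reverse lognormal variates and confirms the upper bound and negative skewness. The values of τ, μ, and σ below are illustrative choices, not taken from the article.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
tau, mu, sigma = 30.0, 1.0, 0.5  # illustrative upper bound and log-space parameters

# A reverse lognormal variate is the upper bound minus a lognormal variate:
# X = tau - exp(Y) with Y ~ N(mu, sigma^2), so X < tau always holds.
x = tau - np.exp(rng.normal(mu, sigma, size=100_000))

bounded = x.max() < tau          # semibounded from above
negatively_skewed = skew(x) < 0  # mirror image of a lognormal
```

Plotting a histogram of `x` would show the long tail pointing towards negative values, the mirror image of the familiar lognormal shape.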

Reverse lognormal Kalman filter
The starting point for deriving the Kalman filter in a least-squares approach is the cost function that needs to be minimized to find the analysis state. If the background and observational errors follow a reverse lognormal distribution (Fletcher, 2022; Goodliff et al., 2023), then the cost function is given by

$$J_{\mathrm{rl}}(x) = \frac{1}{2}\left[\ln(\tau_f - x) - \ln(\tau_f - x_b)\right]^{\mathrm T} P_{f,\mathrm{rl}}^{-1}\left[\ln(\tau_f - x) - \ln(\tau_f - x_b)\right] + \frac{1}{2}\left[\ln(\tau_o - y) - \ln(\tau_o - h(x))\right]^{\mathrm T} R^{-1}\left[\ln(\tau_o - y) - \ln(\tau_o - h(x))\right]. \quad (2)$$

Here, x_b is the background state, y the observations, h the observation operator, P_f,rl the forecast-error covariance matrix, R the observation-error covariance matrix, and τ_f and τ_o the upper bounds in state and observation space respectively. Note that, similar to the lognormal Kalman filter (Fletcher et al., 2023), we need to use the cost function whose minimum is the median of the posterior distribution. Although this is not always the best statistic to use (Fletcher et al., 2019), it is valid in the regime where the analysis errors are small. This is a necessary assumption to derive the Kalman filter equations: to find an analytical form of the minimizer of Equation 2, the gradient of the cost function needs to be linearized, as will be described below. Within this approximation (i.e., small analysis errors), the use of a median-based cost function is justified.
In contrast to the Gaussian Kalman filter, we do not assume the forecast model to be linear. Instead, we can define the analysis errors e_a and forecast errors e_f at time t = t_k through

$$\ln(\tau_f - e_a) = \ln(\tau_f - x_t) - \ln(\tau_f - x_a), \quad (3)$$
$$\ln(\tau_f - e_f) = \ln(\tau_f - x_t) - \ln(\tau_f - x_f), \quad (4)$$

with x_a the analysis state (the state that minimizes the cost function in Equation 2), x_f the forecast state, and x_t the true state. The forecast state at time t = t_k can be found by letting the forecast model act upon the perturbed analysis state at t = t_{k-1}, that is,

$$x_f^k = M\left[\tau_f - (\tau_f - x_a^{k-1}) \odot (\tau_f - e_a^{k-1})\right]. \quad (5)$$

The operator ⊙ denotes the Hadamard (elementwise) product, and M is the nonlinear model bringing the state variables from time t = t_{k-1} to t = t_k. The seemingly complicated form of the perturbed analysis state (the argument of the model M in Equation 5) emerges from the fact that the errors are multiplicative with an upper bound, and can be found by solving Equation 3 for x_t. More details about this transformation can be found in Section 3.2. With these definitions, the forecast-error covariance matrix P_f,rl can be calculated as

$$P_{f,\mathrm{rl}} = \overline{\ln(\tau_f - e_f)\,\ln(\tau_f - e_f)^{\mathrm T}} + Q, \quad (6)$$

where Q is the model-error covariance matrix, the overline denotes the expected value, and correlations between analysis and model errors are disregarded.
In contrast to the Gaussian Kalman filter, where the forecast-error covariance matrix is calculated by having the linearized forecast model act directly on the analysis-error covariance matrix, the definition of P f,rl relies on the nonlinear forecast model M, implicit in the calculation of the forecast errors defined in Equation 4.
For non-Gaussian errors, there is no general way of defining a forecast-error covariance matrix equivalent to the Gaussian definition, because the model operator does not commute with the logarithms (Fletcher et al., 2023). Using the fully nonlinear model has the added benefit of removing errors introduced by linearization. It also implies that the true state x_t is still present in the equations for the forecast error, although it is generally unknown. Therefore, we have to approximate the true state by running the analysis state of the previous assimilation step through the forecast model, x_t^k ≈ M(x_a^{k-1}). Finally, one should be aware that the forecast-error covariance matrix defined in Equation 6 is only an approximation to the covariance of the prior, as the true state and exact model errors are not known.
At each assimilation time, the cost function J_rl needs to be minimized or, equivalently, the root of the gradient of the cost function has to be found. Differentiating the cost function of Equation 2 with respect to x, one finds

$$\nabla_x J_{\mathrm{rl}}(x) = W_{f,\mathrm{rl}}^{\mathrm T}\,P_{f,\mathrm{rl}}^{-1}\left[\ln(\tau_f - x) - \ln(\tau_f - x_b)\right] + \widetilde H_{\mathrm{rl}}^{\mathrm T}(x)\,R^{-1}\left[\ln(\tau_o - h(x)) - \ln(\tau_o - y)\right], \quad (7)$$

with W_f,rl = −diag[1 ⊘ (τ_f − x)] and ⊘ denoting elementwise division. Here, we have introduced the scaled Jacobian of the observation operator, H̃_rl(x) = −diag[1 ⊘ (τ_o − h(x))] H(x), with H the Jacobian of h, arising from the derivatives of the logarithms in the cost function.
To find the best estimate of the analysis state, one is interested in the solution to ∇_x J_rl(x_a) = 0. As this equation cannot be solved analytically, we introduce the small parameter

$$r = \ln(\tau_f - x_a) - \ln(\tau_f - x_b), \quad (8)$$

assuming the analysis increments are small, and Taylor-expand the observational term in the gradient about the background state,

$$\ln(\tau_o - h(x_a)) \approx \ln(\tau_o - h(x_b)) + \widehat H_{\mathrm{rl}}\,r, \quad (9)$$

where Ĥ_rl = H̃_rl W_f,rl^{-1} is the scaled Jacobian expressed in the transformed variables. Using this Taylor expansion in the gradient of the cost function (Equation 7), one can rewrite ∇_x J_rl(x_a) = 0 as

$$\left(P_{f,\mathrm{rl}}^{-1} + \widehat H_{\mathrm{rl}}^{\mathrm T} R^{-1} \widehat H_{\mathrm{rl}}\right) r = \widehat H_{\mathrm{rl}}^{\mathrm T} R^{-1}\,r_o. \quad (10)$$

To find this form, the small parameter r was recognized in the first term of Equation 7, and the whole expression was multiplied from the left to remove the prefactor W_f,rl^T. Note that r_o = ln(τ_o − y) − ln(τ_o − h(x_b)) should be of a similar order to r, assuming the observational innovations are small. In order to keep the entire right-hand side of Equation 10 linear in r and r_o, the prefactors should be expanded as well: we omit all terms of order r², r_o², r r_o, or higher in the full expression of Equation 10. As the terms between brackets are already of order r or r_o, the prefactors can be expanded up to zeroth order in r, that is, evaluated at the background state. Performing some additional algebra (including the use of the Sherman-Morrison-Woodbury formula; see Fletcher (2022) and Fletcher et al. (2023) for a detailed derivation), the analysis state can be written as

$$\ln(\tau_f - x_a) = \ln(\tau_f - x_b) + K_{\mathrm{rl}}\left[\ln(\tau_o - y) - \ln(\tau_o - h(x_b))\right], \quad (13)$$

such that the gain matrix of the reverse lognormal Kalman filter is given by

$$K_{\mathrm{rl}} = P_{f,\mathrm{rl}}\,\widehat H_{\mathrm{rl}}^{\mathrm T}\left(\widehat H_{\mathrm{rl}}\,P_{f,\mathrm{rl}}\,\widehat H_{\mathrm{rl}}^{\mathrm T} + R\right)^{-1}. \quad (14)$$

This result is very similar to the gain matrix in a Gaussian (Kalman, 1960) or lognormal (Fletcher et al., 2023) framework, with the only difference being the scaling of the Jacobian of the observation operator. Lastly, in the Kalman filter scheme, the analysis-error covariance matrix should be updated, in order to estimate the forecast errors in the next assimilation step. The analysis errors are defined in Equation 3, and can be used to form the analysis-error covariance matrix by eliminating ln(τ_f − x_a) in favour of the Kalman gain matrix through Equation 13. Using a Taylor expansion similar to Equation 9, ln(τ_o − h(x_b)) can be written as a function of the true state x_t and the background error. This results in

$$P_{a,\mathrm{rl}} = \left(I - K_{\mathrm{rl}}\,\widehat H_{\mathrm{rl}}\right) P_{f,\mathrm{rl}}, \quad (15)$$

which is the inverse of the scaled Hessian matrix of the cost function, that is,

$$P_{a,\mathrm{rl}} = \left(P_{f,\mathrm{rl}}^{-1} + \widehat H_{\mathrm{rl}}^{\mathrm T} R^{-1} \widehat H_{\mathrm{rl}}\right)^{-1}. \quad (16)$$
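The gain matrix has the familiar algebraic form K = P_f H̃^T (H̃ P_f H̃^T + R)^{-1}, with only the scaling inside the Jacobian distinguishing the reverse lognormal filter from the Gaussian one. A minimal NumPy sketch (function and argument names are ours, with `H_tilde` standing in for the scaled Jacobian):

```python
import numpy as np

def reverse_lognormal_gain(P_f, H_tilde, R):
    """Kalman gain K = P_f H~^T (H~ P_f H~^T + R)^{-1}; only the
    scaling inside H_tilde distinguishes the reverse lognormal
    filter from the Gaussian one."""
    S = H_tilde @ P_f @ H_tilde.T + R  # innovation covariance
    return P_f @ H_tilde.T @ np.linalg.inv(S)

# Sanity check: identity Jacobian and equal unit covariances give K = I/2.
K = reverse_lognormal_gain(np.eye(3), np.eye(3), np.eye(3))
```

In practice one would solve the linear system instead of forming the explicit inverse, but the sketch keeps the correspondence with the equation transparent.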

Dynamical mixed Kalman filter
Using the results presented in the last section and combining them with the existing literature on the Gaussian and lognormal Kalman filters (Fletcher et al., 2023; Jazwinski, 1970; Kalman, 1960), it is straightforward to extend the equations to mixed distributions, where now the error of a specific state variable is a random variate from a Gaussian, lognormal, or reverse lognormal distribution. The starting point for the dynamical mixed Kalman filter is again the cost function, now defined as

$$J(x) = \frac{1}{2}\,(X - X_b)^{\mathrm T}\,P_f^{-1}\,(X - X_b) + \frac{1}{2}\left[Y - \mathbf H(x)\right]^{\mathrm T} R^{-1}\left[Y - \mathbf H(x)\right]. \quad (17)$$

The capitalized letters represent the mixed variables

$$X = \begin{pmatrix} x^{(g)} \\ \ln x^{(l)} \\ \ln(\tau_f - x^{(r)}) \end{pmatrix}, \quad (18) \qquad Y = \begin{pmatrix} y^{(g)} \\ \ln y^{(l)} \\ \ln(\tau_o - y^{(r)}) \end{pmatrix}, \quad (19)$$

where the superscripts (g), (l), and (r) denote the Gaussian, lognormal, and reverse lognormal variables respectively, which have been sorted without loss of generality, and H(x) is the mixed form of the observation operator defined in Section 3.2. The total number of state variables is still N, containing n_g Gaussian, n_l lognormal, and n_r reverse lognormal components.
We allow the underlying distributions to change over time, such that n_g = n_g(t), n_l = n_l(t), and n_r = n_r(t), while always keeping N = n_g + n_l + n_r fixed. Similarly, for the observed variables we denote N_o = n_g^o(t) + n_l^o(t) + n_r^o(t), where the number of observations within a certain distribution can change over time.
The derivation of the mixed Kalman filter equations is very similar to that of the reverse lognormal filter described above. By solving for the root of the gradient of the cost function, ∇_x J(x_a) = 0, and performing an appropriate Taylor expansion to linearize the equation, one finds the mixed Kalman gain matrix

$$K = P_f\,\widehat H^{\mathrm T}\left(\widehat H\,P_f\,\widehat H^{\mathrm T} + R\right)^{-1}, \quad (20)$$

from which the analysis state can be calculated:

$$X_a = X_b + K\left[Y - \mathbf H(x_b)\right]. \quad (21)$$

Here, the scaled Jacobian of the observation operator is defined as

$$\widehat H = \mathrm{diag}\left[\mathbf 1_{n_g^o};\; \mathbf 1 \oslash h^{(l)}(x_b);\; -\mathbf 1 \oslash \left(\tau_o - h^{(r)}(x_b)\right)\right] H(x_b)\,W_f^{-1}(x_b), \quad (22)$$

where 1_{n_g^o} denotes a vector of size n_g^o with ones for each element, ⊘ the elementwise division, H the Jacobian of h, and W_f the analogous diagonal scaling in state space. The analysis-error covariance matrix is updated as

$$P_a = \left(I - K\,\widehat H\right) P_f. \quad (23)$$

The formulation of the mixed Kalman filter is again analogous to the Gaussian, lognormal, or reverse lognormal filters. The main difference is that it is expressed in terms of the mixed variables, and thus the errors are also in a mixed representation. This needs to be remembered whenever the forecast model is used. Note that the mixed Kalman filter described here is a generalization of Kalman filters with a specific assumption for the prior distributions. For example, taking N = n_g and N_o = n_g^o (such that n_l = n_r = n_l^o = n_r^o = 0), a Gaussian Kalman filter is recovered. However, it should be kept in mind that the fully nonlinear model is used, and thus this situation coincides with a nonlinear Gaussian Kalman filter. The details of the implementation of the mixed Kalman filter are reported in the next section.

IMPLEMENTATION
The actual use of the Kalman filter in a data assimilation setting can be partitioned into three main steps: forecasting, transforming, and updating. Here, we provide details of these different steps and of how to implement the dynamical mixed Kalman filter.

Forecast step
At a given time t = t_k in a data assimilation cycle, one wants to find the optimal analysis state x_a^k, based on observations y^k and the knowledge of the previous states at t < t_k. To use the latter information, it is necessary to propagate the previous analysis state x_a^{k-1} through the forecast model. This defines the background state x_b^k = M(x_a^{k-1}). As a reminder, in contrast to the original formulation of the Kalman filter, we use the nonlinear model M, not the linearized version.
In order to calculate the forecast-error covariance matrix, one also needs the forecast errors. We define the forecast state as

$$x_f^k = M\left(x_a^{k-1} \star e_a^{k-1}\right), \quad (24)$$

where we introduced the short-hand notation

$$x \star e = \begin{pmatrix} x^{(g)} + e^{(g)} \\ x^{(l)} \odot e^{(l)} \\ \tau_f - (\tau_f - x^{(r)}) \odot (\tau_f - e^{(r)}) \end{pmatrix}. \quad (25)$$

This operation results from transforming the sum of two mixed variables X + E back to the original variable representation. Details of this transformation and a description of the mixed variables can be found in Section 3.2. The forecast-error covariance matrix P_f^k is defined through the mixed forecast errors,

$$P_f^k = \overline{E_f\,E_f^{\mathrm T}} + Q, \qquad E_f = X_b - X_f, \quad (26)$$

where the capital letters X_f, X_b, and E_f represent the mixed forms of x_f, x_b, and e_f, and the overline denotes the expected value. Note that the forecast-error covariance matrix as defined in Equation 26 is only an approximation, as it actually describes the errors of the forecast state with respect to the true state. As the true state is unknown in general, the approximation x_t^k ≈ x_b^k is made when calculating P_f^k. In each assimilation window, both the previous analysis state x_a^{k-1} and the perturbed state x_a^{k-1} ⋆ e_a^{k-1} have to be forecast, using the nonlinear model M. If k = 1, namely at the beginning of the data assimilation cycle, one does not have access to the previous analysis state and error. One usually starts with an initial guess for the previous analysis state x_a^0. Multiple choices can be made, however, for the initial perturbations e_a^0. One possibility is to start with random noise, where the errors are drawn from an unbiased distribution. Alternatively, one can base the initial perturbations on a known climatology, as in the National Meteorological Center (NMC) method (Parrish & Derber, 1992), where forecasts from different initial times are compared with each other to estimate the errors.

Transform step
From the forecast step, we have access to the background state x_b^k and the forecast state x_f^k. The mixed Kalman filter, however, is defined in terms of the mixed random variables. Some sort of decision function is necessary to determine which components need to be transformed. How to define this decision function is still an open question for most problems, although advances have been made with machine-learning techniques (Goodliff et al., 2020, 2022, 2023).
Assuming the decision function is known, the transformation X ≡ T(x) itself is straightforward: the Gaussian components do not change, X^{(g)} = x^{(g)}, while the lognormal variables are calculated as X^{(l)} = ln x^{(l)}, and the reverse lognormal variables as X^{(r)} = ln(τ_f − x^{(r)}). Although the state variables in Equation 18 are sorted with respect to their distribution for notational simplicity, this is not a necessary condition for the mixed variables and is not required to use the mixed Kalman filter.
A similar transformation also needs to be done in observation space, to define the mixed random variables Y ≡ T_o(y) given in Equation 19. Note that the observation operator h acts on the original (lower-case) variables directly, and thus the transformation needs to be done after applying h, that is, H(x) = T_o[h(x)]. In general, the decision function in observation space can differ from the decision function in forecast space, that is, T ≠ T_o.
As we will specify in the next section, the output of the update step from the Kalman equations is always in the mixed representation, namely one obtains X_a = T(x_a) and E_a = T(e_a). In the next forecast step, one instead needs the state variables themselves. They can easily be computed through the inverse variable transform

$$T^{-1}(X) = \begin{pmatrix} X^{(g)} \\ \exp X^{(l)} \\ \tau_f - \exp X^{(r)} \end{pmatrix}, \quad (27)$$

which can be used to find x_a = T^{-1}(X_a) and e_a = T^{-1}(E_a).
The operation x ⋆ e, introduced in Equation 25, is characterized through this inverse variable transform. Specifically, it is defined as x ⋆ e = T^{-1}[T(x) + T(e)], and thus represents a sum of mixed variables.
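The transform, its inverse, and the ⋆ operation can be sketched compactly. In the sketch below, the per-component labels in `dist` and the scalar upper bound `tau` are illustrative assumptions; the article allows a different bound per component.

```python
import numpy as np

def T(x, dist, tau=30.0):
    """Mixed-variable transform: identity for Gaussian ('g') components,
    log for lognormal ('l'), and log(tau - x) for reverse lognormal ('r')."""
    X = np.empty_like(x, dtype=float)
    for i, d in enumerate(dist):
        if d == 'g':
            X[i] = x[i]
        elif d == 'l':
            X[i] = np.log(x[i])
        else:
            X[i] = np.log(tau - x[i])
    return X

def T_inv(X, dist, tau=30.0):
    """Inverse transform back to the original variables."""
    x = np.empty_like(X, dtype=float)
    for i, d in enumerate(dist):
        if d == 'g':
            x[i] = X[i]
        elif d == 'l':
            x[i] = np.exp(X[i])
        else:
            x[i] = tau - np.exp(X[i])
    return x

def star(x, e, dist, tau=30.0):
    """The mixed 'sum' x * e = T^{-1}[T(x) + T(e)]: additive for Gaussian,
    multiplicative for lognormal, and tau - (tau-x)(tau-e) for reverse
    lognormal components."""
    return T_inv(T(x, dist, tau) + T(e, dist, tau), dist, tau)
```

For example, with `dist = ['g', 'l', 'r']`, the three components of `star(x, e, dist)` are x + e, x·e, and τ − (τ − x)(τ − e) respectively, matching Equation 25 componentwise.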

Update step
Finally, the mixed distributed random variables can be used to calculate the Kalman gain matrix defined in Equation 20, and the analysis state x_a^k can be determined through Equation 21. Moreover, the gain matrix can be used to update the analysis-error covariance matrix P_a^k, according to Equation 23.
In the original (Gaussian) formulation of the Kalman filter, the analysis-error covariance matrix is propagated directly to the next time step through the linearized forecast model, to define the forecast-error covariance matrix P_f^{k+1}. However, because the model operator M and the variable transform T do not commute, this is not possible here, and instead we define the forecast-error covariance matrix from the forecast errors e_f directly, as described in Section 3.1. Thus, instead of calculating P_a, we calculate the analysis errors directly as

$$E_a = \left(I - K\,\widehat H\right) E_f + K\,E_o, \quad (28)$$

with E_o the observation error. Note that the analysis-error covariance matrix (see Equation 23) can be calculated as P_a = E_a E_a^T (with the expectation implied), assuming the forecast and observation errors are uncorrelated. After performing the inverse variable transform, described in Section 3.2, the analysis state and errors can be used to compute the forecast state for the next time step, using Equation 24, and the cycle repeats.
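Putting the update step together, a schematic implementation is given below. The variable names are ours, `H_tilde` stands in for the scaled Jacobian, and the direct error propagation E_a = (I − K H̃) E_f + K E_o is one form consistent with the covariance update of Equation 23.

```python
import numpy as np

def update_step(X_b, Y, HX_b, E_f, E_o, K, H_tilde):
    """Mixed-representation analysis update (a sketch):
    X_a = X_b + K (Y - H(x_b)), plus a direct propagation of the
    analysis errors used in place of an explicit P_a update."""
    X_a = X_b + K @ (Y - HX_b)
    I = np.eye(len(X_b))
    E_a = (I - K @ H_tilde) @ E_f + K @ E_o
    return X_a, E_a
```

With a zero gain the update is the identity (the background and its errors pass through untouched), which is a convenient sanity check on any implementation.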

RESULTS
To test the implementation of the dynamical mixed Kalman filter, we perform experiments with a toy model. We choose the Lorenz-63 model (Lorenz, 1963), which has been shown to contain lognormal and reverse lognormal signals (Goodliff et al., 2020, 2022, 2023). The governing differential equations are given by

$$\frac{\mathrm d x}{\mathrm d t} = \sigma\,(y - x), \qquad \frac{\mathrm d y}{\mathrm d t} = x\,(\rho - z) - y, \qquad \frac{\mathrm d z}{\mathrm d t} = x y - \beta z, \quad (29)$$

with x = (x, y, z) the state variables and (σ, ρ, β) = (10, 28, 8/3) the parameters of the model, which have been chosen to produce chaotic behaviour. According to Goodliff et al. (2020), the x and y components of the Lorenz-63 system satisfy the Gaussian assumption, while z changes distribution over time, switching between Gaussian, lognormal, and reverse lognormal errors, making this an ideal system for testing the dynamical mixed Kalman filter.
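For reference, the model can be integrated with any standard scheme; the article does not specify its integrator, so the fourth-order Runge-Kutta stepper below is an assumption.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system."""
    x, y, z = state
    return np.array([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z])

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Integrate from the initial condition used in Section 4.1 with dt = 0.01.
state = np.array([-3.0, -3.0, 20.0])
for _ in range(100):
    state = rk4_step(lorenz63, state, 0.01)
```

Because the attractor is bounded, the trajectory stays finite for any reasonable step size, even though nearby trajectories diverge exponentially.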

Decision function
In order to assess the dynamical features of our mixed Kalman filter, a decision function is necessary to select the correct distribution to use for the z component at every assimilation step. We follow a technique similar to the one developed by Goodliff et al. (2022), based on machine-learning methods. First, a long control run for t ∈ [0, 1000] is generated from the initial conditions x_0 = (−3, −3, 20) with time step dt = 0.01. The first 100 time steps are removed to account for spinup. At a given time t_k, the skewness of the z variable is computed by performing a skewness test on the sample {z_{k−w}, …, z_k, …, z_{k+w}}, with w the sample window radius determining the number of values used for the skewness test. In the results shown in this section, we use w = 14.
The skewness test is a two-sided hypothesis test, with the null hypothesis being that the skewness of the sample is zero (the skewness of a Gaussian distribution). It outputs a z-score expressing the deviation of the skewness of the sample from that of a Gaussian distribution (no skew). We use a significance level of α ≃ 0.3, rejecting the null hypothesis if the z-score is outside the interval [−1, 1]. If the z-score is larger than 1, we assume the errors of the z variable of the Lorenz model to be lognormally distributed; when it is smaller than −1, we assume a reverse lognormal distribution; if the null hypothesis is accepted, Gaussian errors are assumed. The significance level is chosen in order to produce enough cases where lognormal and reverse lognormal errors are predicted, and is reproduced from Goodliff et al. (2022). For example, for α = 0.3, the z component behaves lognormally about 40% of the time and reverse lognormally about 15% of the time. This threshold could be optimized for better performance of the machine-learning model, but that goes beyond the scope of the current work.
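This decision rule maps directly onto `scipy.stats.skewtest`, whose test statistic is exactly such a z-score; a minimal sketch of the labelling function (the function name is ours):

```python
import numpy as np
from scipy.stats import skewtest

def classify_window(z_window):
    """Label a window of z values by the skewness z-score:
    > 1 -> lognormal, < -1 -> reverse lognormal, else Gaussian."""
    zscore, _ = skewtest(z_window)
    if zscore > 1.0:
        return 'lognormal'
    if zscore < -1.0:
        return 'reverse lognormal'
    return 'gaussian'
```

Note that `skewtest` requires at least eight samples, which the window radius w = 14 (29 values) comfortably satisfies.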
Once the data are divided into these three bins (Gaussian, lognormal, and reverse lognormal, based on the skewness test), we train a k-nearest-neighbour algorithm to detect the underlying distribution without the need for the skewness test or knowledge of z at multiple time steps. This is a classification algorithm that forms a decision boundary from a majority vote of the k nearest neighbours in the training data. In particular, we use the k-nearest-neighbour algorithm from scikit-learn (Pedregosa et al., 2011), with k = 15 and inverse-distance weights. The training input data are the randomized and standardized time series of the x and y components of the Lorenz model, and the training output is the underlying distribution of the z variable, decided by the skewness test. We keep 30% of the original control run for testing, and find an accuracy of 98.7% in predicting the correct distribution.
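The classifier setup can be sketched as follows. The training data here are a synthetic stand-in (a labelled linear boundary), not the Lorenz time series used in the article; only the classifier configuration (k = 15, inverse-distance weights, 30% test split) follows the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Synthetic stand-in for the (x, y) input features and distribution labels.
X = rng.normal(size=(1000, 2))
labels = np.where(X[:, 0] + X[:, 1] > 0.0, 'lognormal', 'gaussian')

# Hold out 30% for testing, as in the article.
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=15, weights='distance')  # k = 15, inverse-distance weights
knn.fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)  # accuracy on the held-out 30%
```

The fitted `knn.predict` then plays the role of the decision function: given (x, y) at analysis time, it returns the assumed distribution of z.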
Once the machine-learning algorithm is trained, it can be used to predict the underlying error distribution of the z variable, based on the input of x and y at a given time, and can thus be used as the decision function. The advantage of the k-nearest-neighbour algorithm is that it is very fast, both in training and in prediction, such that the extra computational time for deciding the z distribution is minimal.

Twin experiments
Now that we have all the ingredients, it is possible to evaluate the accuracy of the dynamical mixed Kalman filter.
To do so, we perform twin experiments with the Lorenz-63 model repeatedly to estimate the robustness of the proposed filter.For each assimilation cycle, the following steps are executed.
(i) Generate the true state x_t by integrating the Lorenz-63 model from t = 0 to t = t_max from the initial condition x_0.
(ii) Generate artificial observations by sampling from a distribution with mode x_t and standard deviation σ_x.
(iii) Apply the Kalman filter to the observations from some initial state x_0,DA.

For small observation periods τ_o and observation errors σ, the lognormal (b) and reverse lognormal (c) filters have higher RMSE, mainly because (reverse) lognormal signals in the background do not yet develop on these small timescales and a Gaussian approximation describes the evolution well. Moreover, for small observation errors the Gaussian Kalman filter performs very well, and not much improvement is possible. In isolated cases (see, for example, Figure 1b for τ_o = 160 dt and σ² = 3.5), there is an instance where the filter failed to converge, resulting in a high RMSE, as also reported in Fletcher et al. (2023).
Using the machine-learning model to switch between distributions, (d)-(f), solves these problems by only using (reverse) lognormal updates when necessary. In all but one case, the dynamical Kalman filters (d)-(f) outperform the Gaussian filter whenever the variation is significant. The dynamical filter switching between all three distributions (f) generally outperforms the filters where only two distributions are used, (d) and (e). It should be noted that the results depend on the choice of model-error covariance matrix, which we have taken to be the same for all filters. For the mixed filters, the model error dominates for small σ, resulting in large perturbations in the background state, which may affect the performance at small σ.
To show explicitly the improvement gained from using lognormal or reverse lognormal updates, Figure 2 presents an example of a specific twin experiment. There, we compare the fully Gaussian Kalman filter (a) with the fully dynamical Kalman filter (f). As before, the x and y components are always assumed to have Gaussian priors and likelihoods, while the z component switches between the three distributions considered. The observations are generated artificially from the true state and change distributions according to the machine-learning model applied to that true state. This can be seen in the bottom panel of Figure 2, where the filled circles show the distribution from which the observations are sampled. The circles are surrounded by an open circle, which shows the predicted distribution at the assimilation time, based on the observation itself. As can be seen, in three cases (t = 2, 2.4, and 7.2), the machine-learning model incorrectly predicts a Gaussian error when the observation was actually generated from a lognormal or reverse lognormal distribution.
Still, the dynamical Kalman filter (red dashed lines) outperforms the Gaussian filter (blue dotted lines) in this specific scenario. In particular, after t = 3.2, both backgrounds start to diverge from the true solution. At t = 3.6, a lognormal observation is available, which is predicted correctly by the machine-learning model and brings the background back towards the true state. The Gaussian filter, however, does not use the correct underlying statistics, such that it is unable to bring the background back towards the truth. The same phenomenon can be seen at later times, where the dynamical filter is able to assimilate lognormal errors correctly, while the Gaussian filter diverges from the true state. Only when enough Gaussian observations are available does the Gaussian filter converge back to the true solution.
To elucidate the improvements of the non-Gaussian filters further, we also perform experiments where only the x and y components of the Lorenz-63 model are observed, that is, h(x) = (x, y). To also test the robustness with respect to the specific settings of the machine-learning model, we use a significance level of α = 0.2 for the skewness test (rejecting the null hypothesis if the z-score is outside the interval [−1.28, 1.28]) and take a sample window radius of w = 12. The rest of the experimental details are identical to the experiments outlined above.
The results are shown in Figure 3 for different values of τ_o and σ. As can be seen, the dynamical Kalman filter (solid brown bars) always outperforms the Gaussian Kalman filter (hatched red bars), with particularly significant improvements at larger observation variances.

Coupled Lorenz system
Because the Lorenz-63 system is very simple, and only has a single variable that seems to follow non-Gaussian statistics, it cannot be used to assess the full strength of the mixed Kalman filter, where all three distributions are present at a single assimilation time. To overcome this, we build a higher-dimensional system by creating a chain of Lorenz-63 systems, with coupling between the different attractors, similar to Akhmet and Fen (2015). These coupled differential equations have the form

$$\frac{\mathrm d x_\mu}{\mathrm d t} = \sigma\,(y_\mu - x_\mu) + c_x x_{\mu+1}, \qquad \frac{\mathrm d y_\mu}{\mathrm d t} = x_\mu\,(\rho - z_\mu) - y_\mu + c_y y_{\mu+1}, \qquad \frac{\mathrm d z_\mu}{\mathrm d t} = x_\mu y_\mu - \beta z_\mu + c_z z_{\mu+1}, \quad (30)$$

with μ ∈ {1, …, n_p} an index selecting the attractor and n_p the number of attractors. Moreover, we assume the total system to be cyclic, such that x_{n_p+1} = x_1, y_{n_p+1} = y_1, and z_{n_p+1} = z_1. The state vector x has N = 3n_p elements, three variables for each attractor, and Equation 30 expresses a system of N coupled differential equations. We again choose (σ, ρ, β) = (10, 28, 8/3), and perform experiments for n_p = 10 and weak coupling c ≡ c_x = c_y = c_z = 0.1. The experiments performed with the coupled Lorenz system are very similar to the ones for the three-variable Lorenz system described above. Every separate Lorenz attractor in the chain starts from different initial conditions, chosen to be x_{μ,0} = −5 + G(0, 3), y_{μ,0} = −6 + G(0, 3), and z_{μ,0} = 22 + G(0, 3), with G(0, σ) denoting Gaussian noise with zero mean and standard deviation σ. The data assimilation run is initiated with x_{0,DA} = x_0 + G(0, 1).
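A vectorized right-hand side for such a chain can be written as follows. The additive nearest-neighbour coupling shown here is our reading of Equation 30 (each attractor forced by its successor, wrapping cyclically) and should be checked against Akhmet and Fen (2015); the function name is ours.

```python
import numpy as np

def coupled_lorenz(state, n_p=10, c=0.1, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of a cyclic chain of n_p Lorenz-63 attractors.
    Each attractor mu is coupled to attractor mu + 1, with the cyclic
    closure x_{n_p+1} = x_1 (an assumed additive coupling form)."""
    s = state.reshape(n_p, 3)
    x, y, z = s[:, 0], s[:, 1], s[:, 2]
    # np.roll(v, -1) maps index mu to mu + 1, wrapping cyclically.
    xn, yn, zn = np.roll(x, -1), np.roll(y, -1), np.roll(z, -1)
    dx = sigma * (y - x) + c * xn
    dy = x * (rho - z) - y + c * yn
    dz = x * y - beta * z + c * zn
    return np.stack([dx, dy, dz], axis=1).ravel()
```

Setting c = 0 decouples the chain, so each triple of components reduces to an independent Lorenz-63 system, which is a useful consistency check.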
A decision function is again defined through a machine-learning model based on the k-nearest-neighbour classifier. Because the different attractors have the same general behaviour, the machine-learning model can be trained on only one location κ = 1 and then used for all locations. Thus, the machine-learning model is trained on x_1 and y_1 as input and the skewness of z_1 as output, with the same parameters as before. We again compute the average root-mean-square error (RMSE) for N_c = 50 trials and N_a = 250 assimilation windows for different observation errors σ (assuming all state variables have the same standard deviation) and observation periods τ_o. We always assume x_κ and y_κ to be Gaussian, and test three different scenarios for the distributions of z_κ: (a) Gaussian {z_1, … , z_10} ∀t; (b) lognormal {z_1, … , z_5} and reverse lognormal {z_6, … , z_10} ∀t; (c) Gaussian-lognormal-reverse lognormal decided by the machine-learning model.
The all-Gaussian method is used as a baseline against which the other methods are compared. In scenario (b), we arbitrarily chose the first five attractors to behave lognormally and the last five to behave reverse lognormally. We have tested different variations of selecting five lognormal and five reverse lognormal z_κ components, with similar results, so the conclusions are robust to this choice. In scenario (c), only information from x_κ and y_κ is needed to predict the distribution of z_κ, and thus the attractors are assumed to be independent when making this decision.
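The shared decision function of scenario (c) can be sketched with a minimal k-nearest-neighbour classifier. The training data, labels, and value of k below are hypothetical; only the idea of training on (x_1, y_1) with the skewness class of z_1 as label, and reusing the same classifier at every location κ, comes from the text.

```python
import numpy as np

def knn_predict(train_X, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(d)[:k]               # indices of the k closest points
    classes, counts = np.unique(train_labels[nearest], return_counts=True)
    return classes[np.argmax(counts)]

# Hypothetical training set: (x_1, y_1) pairs labelled by the diagnosed
# skewness class of z_1
train_X = np.array([[-5.0, -6.0], [6.0, 7.0], [0.0, 0.5], [-4.0, -5.0]])
train_labels = np.array(["reverse lognormal", "lognormal",
                         "gaussian", "reverse lognormal"])

# The same classifier is then applied to (x_kappa, y_kappa) at every location
print(knn_predict(train_X, train_labels, np.array([-4.5, -5.5])))
# → reverse lognormal
```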
The results for the coupled Lorenz-63 model are summarized in Figure 4, where Figure 4a shows the root-mean-square error of the Gaussian filter; this is used as a reference for the mixed filters in Figure 4b,c. Again, a two-sided F-test is used to assess the significance of the differences between the Gaussian and mixed filters, with a significance level of α = 10⁻⁴. The mixed Kalman filters only perform worse than the Gaussian filter for small τ_o and σ. The decreased performance of the dynamical filter can mainly be attributed to the machine-learning model, as it has not been optimized for the coupled system and may therefore predict incorrect distributions. Moreover, the model error remains fixed for the entire data assimilation run and dominates the errors at small values of σ. For larger τ_o and σ, the mixed Kalman filters always perform better than the Gaussian one.
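The significance test used throughout can be sketched as follows, assuming the two-sided F-test compares the sample variances of the errors from two filters over the N_c independent trials; the function name and return values are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def two_sided_f_test(errors_a, errors_b, alpha=1e-4):
    """Two-sided F-test on the ratio of sample variances.

    Returns (F, p, significant): the F statistic, the two-sided p-value, and
    whether the variances differ at significance level alpha."""
    va = np.var(errors_a, ddof=1)
    vb = np.var(errors_b, ddof=1)
    F = va / vb
    dfa, dfb = len(errors_a) - 1, len(errors_b) - 1
    p_one = f_dist.cdf(F, dfa, dfb)           # lower-tail probability
    p = 2.0 * min(p_one, 1.0 - p_one)         # two-sided p-value
    return F, p, p < alpha

# Example: identical error samples give F = 1 and no significant difference
F, p, sig = two_sided_f_test([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(F, p, sig)                              # → 1.0 1.0 False
```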

CONCLUSION
We have derived a dynamical mixed Kalman filter that accounts for background and observation errors originating from a mixed distribution of Gaussian, lognormal, and reverse lognormal random variables. This generalizes the earlier work of Fletcher et al. (2023), where the lognormal Kalman filter was introduced, by now allowing for variables with negative skewness. An important aspect of the mixed Kalman filter is that it makes use of the nonlinear model, thereby eliminating any errors introduced by linearizing the model.
To test the performance of the dynamical mixed Kalman filter, we performed twin experiments with two different toy systems based on the Lorenz-63 model. If the time between observations and the observation errors are large, the dynamical mixed Kalman filter significantly improves the analysis skill over an all-Gaussian approach in the parameter space that was examined.
Our work makes an important step towards creating operationally viable data assimilation systems with skewed state variables that have upper and/or lower bounds. Specifically, the mixed Kalman filter equations can be used to develop non-Gaussian ensemble methods, for example by extending the maximum-likelihood ensemble filter (Zupanski, 2005) to accommodate non-Gaussian errors. In conjunction with similar non-Gaussian variational methods that use a mixed distribution of Gaussian, lognormal, and reverse lognormal random variables (Goodliff et al., 2023), this moreover creates the possibility to design hybrid non-Gaussian data assimilation techniques in the future.
with K_rl, the Kalman gain matrix, of size N × N_o. To do this, one needs a way of writing ln(μ_o − h(x_a)) in terms of ln(μ_f − x_a) and the background state, by making use of the geometric tangent linear model introduced by Fletcher and Jones (2014). Specifically, introducing the small parameter r = ln(μ_f − x_a) − ln(μ_f − x_b), one can perform a Taylor expansion for small r on the function f. In particular, Equation 9 can be used after replacing r → −ln(μ_f − ε_b) and x_a → x_t to find the correct Taylor expansion. With this, the analysis error can be written as

ln(μ_f − ε_a) = (I − K_rl H̃_rl) ln(μ_f − ε_b) + K_rl ln(μ_o − ε_o),   (15)

with ln(μ_o − ε_o) = ln(μ_o − y) − ln(μ_o − h(x_t)). Assuming the background and observational errors are uncorrelated, the analysis-error covariance matrix follows from Equation 15 in the standard form (I − K_rl H̃_rl) B_rl (I − K_rl H̃_rl)^T + K_rl R_rl K_rl^T, with B_rl and R_rl the covariances of the transformed background and observation errors.

Root-mean-square error (RMSE) of different Kalman filters for different values of the observation error σ and the observation period τ_o for the Lorenz-63 model. In (a), the RMSE for the Gaussian Kalman filter is shown. The other panels show the RMSE relative to the Gaussian RMSE for different distributions of the z component: (b) lognormal ∀t; (c) reverse lognormal ∀t; (d) Gaussian-lognormal decided by machine-learning model; (e) Gaussian-reverse lognormal decided by machine-learning model; (f) Gaussian-lognormal-reverse lognormal decided by machine-learning model. The circles and crosses show the parameters that perform significantly better or worse than the Gaussian filter, respectively, based on a two-sided F-test with α = 10⁻⁴.
Example of a twin experiment with the dynamical Kalman filter with the Lorenz-63 model for σ² = 3 and τ_o = 40 dt. Grey lines show the true solution, blue dotted lines the background from the Gaussian Kalman filter, and red dashed lines the background from the fully mixed Kalman filter, with the machine-learning model selecting the underlying distribution.
Root-mean-square errors of different Kalman filters for different values of the observation variance σ² and the observation period τ_o for the Lorenz-63 model, observing the x and y components only. The different bars depict different choices for the prior distribution of the z component: (a) Gaussian ∀t; (b) lognormal ∀t; (c) reverse lognormal ∀t; (d) Gaussian-lognormal decided by machine-learning model; (e) Gaussian-reverse lognormal decided by machine-learning model; (f) Gaussian-lognormal-reverse lognormal decided by machine-learning model.
Root-mean-square error of different Kalman filters for different values of the observation error σ and the observation period τ_o for the coupled Lorenz-63 model with n_p = 10 and c = 0.1. In (a), the root-mean-square error for the Gaussian Kalman filter is shown. The other panels show the root-mean-square error relative to the Gaussian one for different distributions of the z_κ components: (b) lognormal {z_1, … , z_5} and reverse lognormal {z_6, … , z_10} ∀t; (c) Gaussian-lognormal-reverse lognormal decided by machine-learning model. The circles and crosses show the parameters that perform significantly better or worse than the Gaussian filter, respectively, based on a two-sided F-test with α = 10⁻⁴.