State, global, and local parameter estimation using local ensemble Kalman filters: Applications to online machine learning of chaotic dynamics

In a recent methodological article, we showed how to learn chaotic dynamics along with the state trajectory from sequentially acquired observations, using local ensemble Kalman filters. Here, we investigate more systematically the possibility of using a local ensemble Kalman filter with either covariance localisation or local domains, in order to retrieve the state and a mix of key global and local parameters. Global parameters are meant to represent the surrogate dynamical core, for instance through a neural network, which is reminiscent of data-driven machine learning of dynamics, while the local parameters typically stand for the forcings of the model. Aiming at joint state and parameter estimation, a family of algorithms for covariance and local domain localisation is proposed. In particular, we show how to update global parameters rigorously using a local-domain ensemble Kalman filter (EnKF) such as the local ensemble transform Kalman filter (LETKF), an inherently local method. The approach is successfully tested on the 40-variable Lorenz model using several of the local EnKF flavours. A two-dimensional illustration based on a multilayer Lorenz model is finally provided. It uses radiance-like nonlocal observations and features both local domains and covariance localisation, in order to learn the chaotic dynamics and the local forcings. This article addresses more generally the key question of online estimation of both global and local model parameters.


1 Introduction

1.1 Parameter estimation and data-driven techniques for the geosciences
The recent upheaval generated by machine learning (ML), and in particular deep learning, has opened the way to a wealth of data-driven techniques, where not only the state of an observed dynamical system is estimated but also its key dynamical constitutive parameters, if not the full model. By the beginning of 2022, there were dozens of ML papers in the literature dealing with the problem of estimating the dynamics of a system from observations, even when only focusing on the low-order models used in the field of geoscience. The problem can be addressed by typical ML techniques, such as the projection on a regressor frame or basis, random forests, analogues, diffusion maps, reservoir computing, long short-term memory networks and other neural network approaches (e.g., Brunton et al., 2016; Lguensat et al., 2017; Harlim, 2018; Pathak et al., 2018; Dueben and Bauer, 2018; Fablet et al., 2018; Scher and Messori, 2019; Weyn et al., 2019; Arcomano et al., 2020; Nadiga, 2021). It can also be solved using a conjunction of ML and data assimilation (DA) techniques to exploit noisy and incomplete observations such as those met in realistic geoscience systems (Bocquet et al., 2019; Brajard et al., 2020; Bocquet et al., 2020; Arcucci et al., 2021; Gottwald and Reich, 2021). In the case of high-dimensional systems, the relative lack of information can be compensated by additionally using past trajectories or prior information on the system, such as an approximate model derived from physical laws (Wikner et al., 2020; Brajard et al., 2021; Farchi et al., 2021b).
However, this should not divert us from the even more abundant contributions focused on the problem of parameter estimation in meteorology, climate science, oceanography, atmospheric chemistry, glaciology, hydrology, solid earth physics, space weather, seismology, etc., using more traditional DA and inverse problem techniques. Compared to the ML view, this part of the geoscience and applied mathematics literature relies much more on a trusted numerical physical model of the system under scrutiny in order to make inferences.
Nonetheless, the ML scientific tsunami has blurred the frontiers between ML and DA approaches, for the better. Model error estimation, a classical topic of DA in which the main model is corrected through statistical procedures or via parameter estimation techniques, can now be addressed by the addition of an ML-based correction with many parameters that need to be learned. Hence, the coming of ML has pushed the limits of what was traditionally asked of DA, and in particular of DA focused on parameter estimation. This is the reason why this paper is targeted at filling some of the current theoretical and algorithmic gaps of DA methods meant for joint state and parameter estimation. The typical applications we have in mind are at that ML/DA frontier where part of, or the whole, model needs to be learned. Moreover, following Bocquet et al. (2021), we aim at addressing the difficult objective of learning state and parameters on the fly, i.e. online as observations are acquired, using sequential DA techniques such as the ensemble Kalman filter (EnKF) (Evensen, 2009) as an alternative to the variational methods, which are more common for parameter estimation and ML problems (Farchi et al., 2021a).

1.2 Local and global model parameters
In this paper, we assume that the model parameters are not directly observed, which is a common but implicit assumption in the geosciences. Their inference therefore stems indirectly from the observation of the state variables.
In an ensemble-based parameter estimation problem, a popular and universal approach consists in augmenting the state vectors to incorporate the parameters (Jazwinski, 1970). We adopted it in Bocquet et al. (2021) and we will keep doing so here. It was shown that this method also blends seamlessly with ensemble-variational DA approaches (Bocquet and Sakov, 2013; Bocquet et al., 2021). One can distinguish between two types of parameters, whose nature has a significant impact on the EnKF-based approaches.
First, one can consider global parameters, which do not depend on space. They are typically parameters of the intrinsic physics of the geophysical fluid, of its constituents, or of its dynamics. However, they are very different from the intrinsically local state variables, leading to substantial theoretical complications, especially when local EnKFs (LEnKFs) based on domain localisation, such as the local ensemble transform Kalman filter (LETKF), are used. This point has been addressed in Bocquet et al. (2021) to a large extent, although additional numerical tests and theoretical improvements will be proposed here.
Second, local parameters are in a sense simpler to estimate, since they are of the same nature as the state variables. However, their number can increase dramatically depending on the number of domains, yielding significantly larger augmented control vectors. The topic was not addressed in Bocquet et al. (2021), but in earlier contributions to the literature, as will be discussed in the following section.
In this paper, we will consider both global and local parameters, possibly a mix of them, and develop new EnKF-based algorithms, accounting for the need for localisation in high-dimensional systems. Local parameters could typically represent forcings (radiative forcing, species emissions, etc.), local physical parameters (friction or deposition coefficients) or a Coriolis term, while global parameters would represent the parametrised dynamics and micro-physics.
Moreover, in this paper, the parameters are assumed not to depend on time. This could be induced by an autonomous system, or it could be due to a known and explicit parametrised dependence on time of, e.g., the forcings, which would themselves be tuned by static parameters.

1.3 Parameter estimation techniques in the data assimilation geoscience literature
Although the literature on parameter estimation based on the EnKF applied to the geosciences in high dimensions is vast, the set of available techniques is rather limited. To our knowledge, the state augmentation principle is always used. Ruiz et al. (2013) have written a pedagogical review on parameter estimation with the EnKF, which explains the mechanisms at play. Significant issues with the algorithms arise when local EnKFs are considered. In principle, LEnKFs with covariance localisation (CL) handle global parameters well. However, the extension of the localisation operator to global parameters is not natural, while the addition of local parameters could have an excessive numerical cost. By contrast, LEnKFs with domain localisation (DL) handle local parameters very well but fail at rigorously estimating global parameters.
The latter issue, of considerable importance, has been approximately addressed. In Aksoy et al. (2006), Fertig et al. (2009) and Hu et al. (2010), the global parameters are made local in the DL update step, and their local approximations are later averaged in space to form new global parameters (an ad hoc procedure) in order to propagate the ensemble using these updated global parameters.
The former issue, where global parameters are estimated with CL LEnKFs, and which requires a definition of the localisation matrix in parameter space as well as of the cross-correlations, has been studied by Koyama and Watanabe (2010) and Ruckstuhl and Janjić (2018). These authors proposed a uniform localisation whenever global parameters are concerned. The localisation matrix associated with the global parameter-state cross-correlation matrix could have its entries set to 1 (absence of localisation for the global parameters) or to a specific tapering scalar coefficient, which would additionally ensure the positive definiteness of the localisation matrix (Ruckstuhl and Janjić, 2018). A generalisation will be proposed in appendix A.
In Bocquet et al. (2021), following these first papers, several solutions were proposed and tested for the CL LEnKF family. Moreover, theoretical solutions were proposed for the DL LEnKF family beyond the approximate solution of Aksoy et al. (2006), but with no numerical tests. These new types of EnKF were termed EnKFs-ML since they were meant to estimate not only the state but also the entire dynamics (through their parameters). Table 1 summarises the adequacy of each EnKF family for local and global parameters.

1.4 Outline
In section 2, we will recall, improve and propose parameter estimation techniques, in order to fill the gaps of the geophysical DA parameter estimation literature. In section 3, the new algorithms will be evaluated on the Lorenz-96 model with inhomogeneous local forcings. In section 4, these algorithms (and combinations thereof) will be tested on a 2D (horizontal and vertical) complex case where radiances are assimilated column-wise, which is reminiscent of a realistic meteorological DA setup. In these experiments, part of or the complete model will be learned alongside the state variables, which represents a challenging parameter estimation problem.

2 Algorithms
Following Bocquet et al. (2021), the algorithms derived and tested in this article are based on the augmented EnKF. The main idea is to extend the state vector x ∈ R^{N_x} into an augmented vector z ∈ R^{N_z} containing the state variables and all model parameters. The strength of this approach is that correlations between state variables and parameters implicitly develop during the forecast steps. Hence, the parameters get corrected during the analysis steps even though they are not observed. During the forecast steps, the state variables are updated using the parametrised model, while the parameters follow persistence, i.e. they are not updated.
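As an illustration, the forecast step of the augmented EnKF can be sketched as follows. Here `model_step` is a hypothetical resolvent of the parametrised model between two analysis times (an assumption of this sketch): only the state rows of the augmented ensemble are propagated, while the parameter rows follow persistence.

```python
import numpy as np

def forecast_augmented(E, Nx, model_step):
    """Forecast of the augmented ensemble E (Nz x Ne): the state rows
    are propagated with the parametrised surrogate model, while the
    parameter rows follow persistence (they are left untouched)."""
    Ef = E.copy()
    for i in range(E.shape[1]):
        x, params = E[:Nx, i], E[Nx:, i]
        Ef[:Nx, i] = model_step(x, params)  # parametrised model resolvent
        # parameter rows of Ef are kept as-is: persistence dynamics
    return Ef
```

The correlations between state and parameters that develop during this step are what later allows the analysis to correct the (unobserved) parameters.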
Without localisation, the implementation of the ML counterpart of an EnKF algorithm, the EnKF-ML, is very similar to that of the original algorithm, provided that the observation operator H has been adjusted for z instead of x. Hence, in order to avoid divergence, the ensemble size must be strictly larger than the number of neutral and unstable modes of the total (state and parameter) dynamics, equal to the number of neutral and unstable modes of the state dynamics plus the number of influential and independent parameters, as explained by Bocquet et al. (2021). Indeed, the parameter dynamics is entirely neutral since the parameters are not updated during the forecast steps (Bocquet et al., 2021).
However, adding localisation to an EnKF-ML algorithm is not obvious, because by definition global parameters cannot be localised. Exploiting the fact that parameters are not observed, we have shown that the EnKF-ML analysis can be written as a two-step process (Bocquet et al., 2021): (i) update the state using the observations and (ii) compute the parameter update from the state update using a linear regression based on the ensemble.
More generally, the posterior probability density function (pdf) p(z|y), which synthesises the analysis problem, can be written

p(z|y) = p(x, p|y) ∝ p(y|x) p(x, p),    (1)

where it has been assumed that the parameter vector p is independent of the observation vector y conditional on x.
Hence, the analysis can first be carried out on x by considering the marginal problem on x, integrating out eq. (1) over p, and later solving the problem on p once it is solved on x. A consequence of this decomposition is that localisation can be enabled as usual for the state update and disabled for the parameter update. Nevertheless, such an update scheme may be sub-optimal if some of the parameters are local.
In the following sections, we extend the local EnKF-ML algorithms introduced in Bocquet et al. (2021) and apply localisation to the parameter update when possible. The resulting algorithms are called EnKF-HML (for hybrid ML) to emphasise the fact that the ML part is partly localised and partly non-localised. Both CL and DL are presented, using the example of the ensemble square root Kalman filter (EnSRF) in the first case and of the ensemble transform Kalman filter (ETKF) in the second case (Evensen, 2009).

2.1.1 Partitioning of the augmented state
We assume that the augmented state z ∈ R^{N_z} is organised as follows:

z = [x^T, p^T, q^T]^T,    (2)

where x ∈ R^{N_x} is the state of the dynamical system, p ∈ R^{N_p} is the vector of global model parameters, and q ∈ R^{N_q} is the vector of local model parameters. The ensemble of the filter is a collection of N_e augmented states {z_i, i = 1, ..., N_e}. It is organised column-wise into the augmented ensemble matrix E ∈ R^{N_z×N_e}. The augmented state mean z̄ ∈ R^{N_z} and the augmented state perturbation matrix Z ∈ R^{N_z×N_e} are defined by

z̄ = E1/N_e,    Z = (E − z̄1^T)/√(N_e − 1),    (3)

where 1 ∈ R^{N_e} is the vector full of ones.
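These definitions can be sketched as below, assuming the usual 1/√(N_e − 1) normalisation of the perturbations (so that B = ZZ^T is the sample covariance); the normalisation convention is an assumption of this sketch.

```python
import numpy as np

def mean_and_perturbations(E):
    """Split an ensemble matrix E (Nz x Ne) into its mean and its
    normalised perturbation matrix, so that Z @ Z.T is the sample
    covariance of the ensemble."""
    Ne = E.shape[1]
    zbar = E.mean(axis=1)                       # ensemble mean, E 1 / Ne
    Z = (E - zbar[:, None]) / np.sqrt(Ne - 1)   # centred, normalised anomalies
    return zbar, Z
```

By construction the perturbations sum to zero along the ensemble dimension.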
Following eq. (2), E, z̄ and Z can be split according to the state (x), global parameter (p), and local parameter (q) subspaces into E_x, E_p, E_q, and so on (eq. 4). For these quantities, an "f" superscript is used to refer to the forecast (or prior) value and an "a" superscript to the analysis (or posterior) value. Using the same rationale, any matrix A ∈ R^{N_z×N_z} can also be split into the blocks

A = [A_xx A_xp A_xq; A_px A_pp A_pq; A_qx A_qp A_qq].    (5)

In particular, this is the case of the prior error covariance matrix B and of the localisation matrix ρ for the EnSRF algorithm.
Furthermore, since the model parameters are not observed, the observation equation can be written

H(z) = H_x(x),    (6)

where H is the augmented observation operator and H_x is the usual observation operator (which applies to the state only).
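The augmented operator and its tangent linear can be sketched as follows; `Hx` and `Hx_mat` stand for a generic observation operator and its tangent linear matrix, both placeholders of this illustration.

```python
import numpy as np

def augmented_obs_operator(Hx, Nx):
    """Augmented observation operator of eq. (6): the parameters are not
    observed, so H(z) = Hx(x) only acts on the state part of z."""
    return lambda z: Hx(z[:Nx])

def augmented_tl_operator(Hx_mat, Np, Nq):
    """Tangent linear of the augmented operator: the state operator is
    padded with zero blocks for the global and local parameters."""
    Ny, Nx = Hx_mat.shape
    return np.hstack([Hx_mat, np.zeros((Ny, Np + Nq))])
```

The zero blocks make explicit that an observation never acts directly on the parameters; their correction comes only through state-parameter correlations.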
The tangent linear operators of the maps x ↦ H_x(x) and z ↦ H(z) are written H_x and H, and they are related by

H = [H_x, 0, 0].    (7)

2.1.2 Matrix square root

Both the EnSRF and the ETKF are deterministic implementations of the EnKF which rely on a matrix square root.
Several definitions of the matrix square root are possible, not all of which are equivalent. In this paper, we use the following definition, chosen, e.g., by Bocquet and Farchi (2019) and Farchi and Bocquet (2019).
Let A be a diagonalisable real matrix with non-negative eigenvalues, written A = GDG^{-1}, where G is an invertible matrix and D a diagonal matrix with non-negative entries (the eigenvalues of A). The square root of A, written A^{1/2}, is defined as

A^{1/2} = G D^{1/2} G^{-1},    (8)

where D^{1/2} is the diagonal matrix containing the square roots of the entries of D, i.e. the square roots of the eigenvalues of A.
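This definition translates directly into code via an eigendecomposition; the sketch below assumes that any imaginary residue returned by the general eigensolver is numerical noise, which holds for the diagonalisable matrices with non-negative eigenvalues considered here.

```python
import numpy as np

def matrix_sqrt(A):
    """Principal matrix square root of eq. (8): if A = G D G^{-1} with
    non-negative eigenvalues, then A^{1/2} = G D^{1/2} G^{-1}."""
    D, G = np.linalg.eig(A)
    # multiply the columns of G by the square-rooted eigenvalues
    S = (G * np.sqrt(D.astype(complex))) @ np.linalg.inv(G)
    return np.real(S)  # imaginary residue is numerical noise here
```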

2.2 The ensemble square root Kalman filter

2.2.1 Generic (local) EnSRF analysis
The generic EnSRF analysis is given by the following set of equations:

z̄^a = z̄^f + BH^T (HBH^T + R)^{-1} [y − H(z̄^f)],    (9a)
Z^a = [I + BH^T R^{-1} H]^{-1/2} Z^f.    (9b)

Equation (9a) is known as the mean update and eq. (9b) as the perturbation update. In these equations, B ∈ R^{N_z×N_z} is the prior error covariance matrix, equal in this case to the forecast sample covariance matrix:

B = Z^f (Z^f)^T.    (10)

Although I + BH^T R^{-1} H may not be symmetric, it is diagonalisable with non-negative eigenvalues (see, for instance, Farchi and Bocquet, 2019), which makes the matrix square root in eq. (9b) well defined.
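A compact sketch of this analysis, under the assumption that the mean update is the standard Kalman update in gain form (the perturbation update follows eq. (9b) literally, with the matrix square root computed by eigendecomposition):

```python
import numpy as np

def ensrf_analysis(zbar, Z, H, R, y):
    """Sketch of the generic EnSRF analysis: Kalman mean update and
    square-root perturbation update Za = [I + B H^T R^{-1} H]^{-1/2} Zf,
    with B = Zf Zf^T as in the sample-covariance definition."""
    B = Z @ Z.T
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman gain
    zbar_a = zbar + K @ (y - H @ zbar)             # mean update
    A = np.eye(B.shape[0]) + B @ H.T @ np.linalg.inv(R) @ H
    D, G = np.linalg.eig(A)                        # A is diagonalisable
    sqrtA = np.real((G * np.sqrt(D.astype(complex))) @ np.linalg.inv(G))
    Za = np.linalg.solve(sqrtA, Z)                 # perturbation update
    return zbar_a, Za
```

One can check that the analysis perturbations satisfy Z^a (Z^a)^T = (B^{-1} + H^T R^{-1} H)^{-1}, the posterior covariance, which is the defining property of a square-root update.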
Following Bocquet and Farchi (2019), it can be shown using the matrix shift lemma (see, for instance, Asch et al., 2016, section 6.4.4) that the perturbation update eq. (9b) is equivalent to eq. (11), in which the linear algebra (matrix square root and inverse) is expressed in the observation space, which is usually much smaller than the augmented state space (N_y ≪ N_z), and in which the secant method is used to compute HZ^f, defined in eq. (12). We will use this notation throughout the entire manuscript, except in the formal algorithms, where its formula is explicit.
Furthermore, the EnSRF analysis eq. (9) can be written using the incremental formulation in observation space of eq. (13).

State, global and local parameter estimation, Q. Malartic, A. Farchi, and M. Bocquet. Preprint, March 28, 2022.

This update can be further simplified if R^{-1/2} is easy to compute, for instance if R is diagonal, as is often assumed in the geosciences. In this case, let us introduce the ancillary matrix T_y ∈ R^{N_y×N_y} defined in eq. (14). Introducing T_y in eq. (13) yields eq. (15), where the linear algebra operators now apply to symmetric matrices only.
Finally, CL can be included in the analysis by replacing eq. (10) with

B = ρ ∘ [Z^f (Z^f)^T],    (16)

where ρ ∈ R^{N_z×N_z} is the localisation matrix, a correlation matrix which depends on the geometry of all variables, and ∘ is the Schur/Hadamard product. The resulting analysis is called the local EnSRF (LEnSRF).
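A sketch of CL with the Schur product, using the Gaspari-Cohn compactly supported correlation function as an example taper (the choice of correlation function is an assumption of this illustration, not a prescription of this article):

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn compactly supported correlation function of the
    scaled distance r = d / c; it vanishes for r >= 2."""
    r = np.abs(r)
    f = np.zeros_like(r)
    m1 = r < 1
    m2 = (r >= 1) & (r < 2)
    x = r[m1]
    f[m1] = 1 - 5/3*x**2 + 5/8*x**3 + 1/2*x**4 - 1/4*x**5
    x = r[m2]
    f[m2] = 4 - 5*x + 5/3*x**2 + 5/8*x**3 - 1/2*x**4 + 1/12*x**5 - 2/(3*x)
    return f

def localise_covariance(B, dist, c):
    """Schur (Hadamard) product localisation: B is tapered entry-wise by
    a correlation matrix built from the pairwise distances dist and the
    localisation radius c."""
    rho = gaspari_cohn(dist / c)
    return rho * B
```

Since the distance of a variable to itself is zero, the diagonal (the variances) is left untouched, while long-range spurious covariances are zeroed out.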
In the following sections, we show how the EnSRF analysis (both global and local) can be efficiently implemented when the augmented state contains model parameters. In section 2.2.2, we only consider global model parameters, repeating Bocquet et al. (2021), and in section 2.2.3, we consider the general case with both global and local model parameters.

2.2.2 The (local) EnSRF-ML analysis
Let us start with global parameters only (i.e., N_q = 0). Following Bocquet et al. (2021), it is possible to separate the state and parameter updates in the analysis to make it more efficient.
To do this, we split B according to the state and parameter subspaces as in eq. (5). The ancillary matrix T_y is given by eq. (17). Let us now introduce the additional ancillary variables u_x ∈ R^{N_x} and U_x ∈ R^{N_x×N_e} defined in eq. (18). With these definitions, the mean update eq. (15a) becomes eq. (19) and the perturbation update eq. (15b) becomes eq. (20). Assuming that B_xx is invertible, the parameter update formulae, eqs. (19b) and (20b), can be written as eq. (21), which is the original parameter update derived by Bocquet et al. (2021). Note that B_xx^{-1} does not need to be computed, since it applies to N_e ≪ N_x vectors and only requires the solution of a linear system of equations with N_e ≪ N_x unknowns.
At this point, it is important to realise that the state-wise update, eqs. (19a) and (20a), is the usual EnSRF analysis, while the parameter update eq. (21) is a regression of the state update into the parameter subspace. Such a regression is very general and can be used regardless of the state update method. In our specific case, however, the introduction of the ancillary variables u_x and U_x permits us to bypass the matrix multiplication by B_xx^{-1}. In a way, one can think of u_x and U_x as the uncorrelated increments. Moreover, note that the entire analysis does not depend on B_pp, neither explicitly nor implicitly.
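The regression of the state update into the parameter subspace can be sketched as follows; consistently with the remark above, B_xx^{-1} is never formed explicitly, a linear system is solved instead.

```python
import numpy as np

def parameter_update(Zp, Zx, dx_mean, dX):
    """Regress the state analysis increments (mean dx_mean and
    perturbations dX) onto the parameter subspace:
    dp = Bpx Bxx^{-1} dx, with Bxx = Zx Zx^T and Bpx = Zp Zx^T
    the forecast sample (cross-)covariances."""
    Bxx = Zx @ Zx.T
    Bpx = Zp @ Zx.T
    # solve Bxx u = dx instead of forming Bxx^{-1}
    dp_mean = Bpx @ np.linalg.solve(Bxx, dx_mean)
    dP = Bpx @ np.linalg.solve(Bxx, dX)
    return dp_mean, dP
```

This works regardless of how the state increments were produced, which is what makes the two-step (state then parameters) analysis so flexible.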
When using CL, as for B, we split ρ in such a way that eq. (22) holds. The localisation matrix for the state subspace, ρ_xx, is the usual localisation matrix. It almost certainly makes B_xx positive definite, and in particular invertible. The localisation matrix for the state-parameter cross subspace, ρ_px, has to be row-wise uniform because the parameters are global. Hence it is of the form

ρ_px = ζ_p 1_x^T,    (23)

where 1_x ∈ R^{N_x} is the vector full of ones and ζ_p ∈ R^{N_p} is a vector of algorithmic parameters, one for each model parameter, which is more general than what was suggested in Bocquet et al. (2021). Nonetheless, for the sake of simplicity, we choose in the following this vector to be uniform, as in Bocquet et al. (2021), such that ρ_px = ζ_p Π_px, where ζ_p is a scalar algorithmic parameter (see appendix A for the multivariate generalisation). Looking at eqs. (19b) and (20b), we see that ζ_p tapers the parameter update in a linear way: using ζ_p = 1 does not alter the parameter update, while using ζ_p = 0 entirely disables it. For this reason, ζ_p is called the tapering parameter. For simplicity, and to emphasise the role of the tapering, we assume that ρ_px = Π_px and we introduce ζ_p directly into the parameter update, which is now written as eq. (24). Finally, since B_pp is not used during the analysis, ρ_pp does not need to be specified. This means that potential spurious correlations between global parameters are not mitigated, but this is not problematic because such correlations have no effect on the analysis ensemble. Nevertheless, since the localisation matrix ρ is by assumption a correlation matrix, it must be symmetric and positive definite. This means that the tapering parameter ζ_p cannot take arbitrary values (Ruckstuhl and Janjić, 2018). See Bocquet et al. (2021) for a detailed interpretation of ζ_p.
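Assembling the augmented localisation matrix for the state plus global parameters can be sketched as below (a scalar tapering parameter is assumed); since ρ_pp is never used in the analysis, its block is simply filled with ones here.

```python
import numpy as np

def augmented_localisation(rho_xx, Np, zeta_p):
    """Augmented localisation matrix for state + global parameters:
    the state-parameter cross blocks are row-wise uniform, equal to the
    scalar tapering parameter zeta_p, and the parameter-parameter block
    is filled with ones (it has no effect on the analysis)."""
    Nx = rho_xx.shape[0]
    rho = np.ones((Nx + Np, Nx + Np))
    rho[:Nx, :Nx] = rho_xx        # usual state localisation matrix
    rho[Nx:, :Nx] = zeta_p        # uniform taper of the cross block
    rho[:Nx, Nx:] = zeta_p        # symmetric counterpart
    return rho
```

For ρ to remain a valid correlation matrix, ζ_p cannot be chosen arbitrarily, as noted above.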
For completeness, let us mention that the update eq. (21) is also valid without localisation when B_xx is not invertible (i.e. when N_e ≤ N_x + 1), provided that the inverse is replaced by the Moore-Penrose pseudo-inverse. Indeed, in this case, as proven in Bocquet et al. (2021), the parameter update can be written as eq. (25), where the Moore-Penrose pseudo-inverse is indicated by a "+" superscript. Realising that eq. (26) holds, we conclude that the update eq. (25) is equivalent to eq. (21) upon replacing the inverse by the Moore-Penrose pseudo-inverse.

2.2.3 The (local) EnSRF-HML analysis
We now extend the EnSRF-ML analysis to the case where both global and local parameters are estimated. For this problem, we keep the state update and the global parameter update of the EnSRF-ML analysis, namely eqs. (19a), (20a) and (24), and an update for the local parameters needs to be provided.
Following the arguments of section 2.2.2, we choose to write the local parameter update as eq. (27). Without localisation, there is no distinction between local and global parameters. When using CL, in addition to eq. (22), we have eq. (28). Again, since B_qp and B_qq are not used during the analysis, ρ_qp and ρ_qq do not need to be specified. The localisation matrix for the state-local parameter cross subspace, ρ_qx, has to reflect the geometry of the local parameters and state variables, contrary to ρ_px which, as explained in section 2.2.2, is bound to be row-wise uniform. This point is important since specifying ρ_qx is the only way to include localisation in the local parameter update.
Finally, it is possible to normalise ρ_qx using its largest value ζ_q, writing ρ_qx = ζ_q ρ̂_qx with ρ̂_qx the normalised matrix. It turns out that ζ_q, defined as the largest value of ρ_qx, plays the same role in the local parameter update eq. (27) as the tapering parameter ζ_p in the global parameter update eqs. (19b) and (20b). Therefore, as in the previous section, we assume that ρ_qx = ρ̂_qx and we introduce ζ_q directly into the local parameter update, which is now written as eq. (29). Hereafter, ζ_q is called the local tapering parameter, not to be confused with the (global) tapering parameter ζ_p.
To conclude, the LEnSRF-HML analysis is summarised in algorithm 1. By construction, it is equivalent to the generic LEnSRF analysis described in section 2.2.1. In the limit where localisation is disabled (ρ = Π, the matrix full of ones), the EnSRF-HML analysis is retrieved, which is equivalent to the generic EnSRF analysis described in section 2.2.1. Note that this algorithm explicitly uses the tangent linear operator H_x of H_x, which may be mandatory to assimilate non-local observations. However, if the observations are local, it is possible to derive an alternative algorithm which does not explicitly use H_x, but is nonetheless equivalent to the original algorithm for local, linear observation operators. This algorithm is given in appendix C.

2.3 The ensemble transform Kalman filter
We now focus on the EnKFs with DL, for which the LETKF is the exemplar.

2.3.1 The generic (local) ETKF analysis
The generic ETKF analysis is given by the following set of equations:

z̄^a = z̄^f + Z^f w^a,    (30a)
Z^a = Z^f T_e.    (30b)

Equation (30a) is known as the mean update and eq. (30b) as the perturbation update. In these equations, Y ∈ R^{N_y×N_e}, T_e ∈ R^{N_e×N_e}, and w^a ∈ R^{N_e} are ancillary variables defined by

Y = HZ^f,    T_e = [I + Y^T R^{-1} Y]^{-1/2},    w^a = T_e^2 Y^T R^{-1} [y − H(z̄^f)].    (31)

Note that this definition of Y is consistent with the definition in line 3 of algorithm 1.
The main advantage of the ETKF analysis is that the linear algebra is expressed in the ensemble space (R^{N_e}), which is usually much smaller than both the observation space and the augmented state space (N_e ≪ N_y, N_z). Unfortunately, CL expressed in the augmented state space cannot be used in the analysis. Nevertheless, DL can be included in the ETKF by making the analysis local, following Hunt et al. (2007) and Nerger and Gregg (2007).
For each augmented state variable n ∈ {1, ..., N_z}, the inverse of the observation error covariance matrix is tapered:

R_n^{-1} = ρ_n ∘ R^{-1},    (32)

where ρ_n ∈ R^{N_y×N_y} is the localisation matrix in observation space for the n-th variable, a correlation matrix which depends on the geometry of the observations relative to the n-th variable. This yields local variants of T_e and w^a which are used to compute the n-th row of the mean and perturbation updates Δz̄ and ΔZ. This describes the LETKF analysis. By construction, the localisation matrix ρ_n is rigorously defined only when both the observations and the n-th variable are local.
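The localisation loop of the LETKF can be sketched as follows, assuming a diagonal R (so that only the diagonal of ρ_n ∘ R^{-1} matters) and perturbations normalised so that the transform reads T_e = (I + Y^T R_n^{-1} Y)^{-1/2}; these conventions are assumptions of this sketch.

```python
import numpy as np

def letkf_analysis(zbar, Z, Y, ybar, y, Rinv_diag, rho_rows):
    """Row-by-row LETKF sketch: for each variable n, the observation
    precision is tapered by the localisation coefficients rho_rows[n],
    and a local transform is computed in ensemble space."""
    Nz, Ne = Z.shape
    zbar_a, Za = zbar.copy(), Z.copy()
    for n in range(Nz):
        Rinv_n = rho_rows[n] * Rinv_diag       # tapered precision (diag)
        C = Y.T * Rinv_n                       # Ne x Ny
        lam, V = np.linalg.eigh(np.eye(Ne) + C @ Y)
        Te = (V / np.sqrt(lam)) @ V.T          # (I + Y^T Rn^{-1} Y)^{-1/2}
        wa = Te @ Te @ C @ (y - ybar)          # local mean weights
        zbar_a[n] = zbar[n] + Z[n] @ wa        # n-th row of the mean update
        Za[n] = Z[n] @ Te                      # n-th row of the perturbations
    return zbar_a, Za
```

When all taper rows equal one, every local transform coincides with the global one, and the generic ETKF analysis is recovered.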
A key asset of the ETKF is that eq. (30), which describes the generic ETKF analysis, can also be used to implement the ETKF-ML analysis in a very efficient way. By contrast, the generic LETKF analysis described above cannot be used to implement the LETKF-ML analysis, because the localisation matrix ρ_n cannot be rigorously defined for global model parameters.
Therefore, in the following sections, we derive an equivalent update for the ETKF-ML analysis. The goal is to provide an update scheme equivalent to eq. (30) in the global case while being generalisable to DL with global model parameters. In section 2.3.2, we only consider global model parameters, repeating but improving upon Bocquet et al. (2021), and in section 2.3.3, we consider the general case with both global and local model parameters.

2.3.2 The (local) ETKF-ML analysis
Let us start with global parameters only (i.e., N_q = 0). Following Bocquet et al. (2021), it is possible to separate the state and parameter updates in the analysis. The state update is performed using the same ensemble transform as in the generic LETKF (eq. 33), and the parameter update is performed using the pseudo-inverse formula eq. (25), which we recall here as eq. (34). When enforcing DL, the state update eq. (33) is made local (following the method described in section 2.3.1), while the parameter update eq. (34) is only indirectly localised. Indeed, as explained in section 2.2.2, the parameter update is a regression of the state update eq. (33), which is localised, into the parameter subspace. This update, combined with the state update eq. (33), defines the LETKF-ML analysis as originally proposed in Bocquet et al. (2021).
However, eq. (26) shows that, without localisation, the pseudo-inverse update is equivalent to eq. (21), in which B_xx should be localised with ρ_xx when localisation is enabled. This last localisation footprint is missing in the original LETKF-ML analysis of Bocquet et al. (2021); we have numerically checked that, although working as expected, it makes the LETKF-ML distinctively less accurate than the LEnSRF-ML algorithm. To fix this issue, we propose a more consistent approach for the parameter update.
Instead of using the pseudo-inverse formula eq. (34), we propose to use the parameter update of the EnSRF-ML analysis, namely eqs. (19b) and (20b). With the additional assumption H_x Z_x^f = HZ^f ≈ H Z^f, this update can be re-written as eq. (35), where the uncorrelated increments u_y and U_y, the counterparts of u_x and U_x in observation space, are given by eq. (36). This parameter update, combined with the state update eq. (33), defines the ETKF-ML analysis used in this paper. A proof of these formulae can be found in appendix B.
Enforcing DL in the parameter update of this new ETKF-ML analysis is straightforward. First, the construction of the uncorrelated increments u_y and U_y with eq. (36) is made local (following the method described in section 2.3.1), and then the parameter update is computed (globally) with eq. (35). The resulting LETKF-ML analysis has exactly the same localisation footprint as the LEnSRF-ML analysis. It theoretically improves upon the approximate technique proposed in Aksoy et al. (2006), as it makes the update rigorous.
Finally, as proposed in Bocquet et al. (2021), and taking again inspiration from the EnSRF-ML analysis, it is possible to taper the parameter update and hence to replace eq. (35) by eq. (37), where ζ_p is the global tapering parameter. In the EnSRF-ML analysis, the values of ζ_p are bounded by the fact that they are used in the definition of a positive definite matrix. By contrast, here there is no such constraint and ζ_p can take arbitrary values.

2.3.3 The (local) ETKF-HML analysis
We now extend the ETKF-ML analysis to the case where both global and local parameters are to be estimated. For this problem, we keep the state update and the global parameter update of the ETKF-ML analysis, namely eqs. (33) and (37), and we need to provide an update for the local parameters.
We choose to perform the local parameter update using the same ensemble transform as in the generic ETKF, with the addition of the local tapering parameter (eq. 38). This update, combined with the state update eq. (33) and the global parameter update eq. (37), defines the ETKF-HML analysis. The local tapering parameter ζ_q enables a full similarity between the ETKF-HML and EnSRF-HML analyses.
Enforcing DL in this ETKF-HML analysis is straightforward: the local parameter update is made local following the method described in section 2.3.1. Note, however, that we do not assume that the local parameters and the state variables follow the same geometry. Therefore, a rigorous definition of the LETKF-HML analysis could require two sets of localisation matrices: {ρ_n^x, n = 1, ..., N_x} for the state variables and {ρ_m^q, m = 1, ..., N_q} for the local parameters. Hence, the local state updates and local parameter updates are computed in two different localisation loops. If the geometry of the local parameters coincides with that of the state variables (i.e. if the local parameters and the state variables are co-located), then the two localisation loops can potentially be merged.
To conclude, the LETKF-HML analysis is summarised in algorithm 2. In this algorithm, we make the assumption that the observation operator is fully local. Specifically, we hypothesise the existence of a map h: p ↦ h(p) from [1 .. N_y] to [1 .. N_x] which, to each index p of any observation [y]_p, associates the index n = h(p) of the grid cell to which the observation belongs and of which it is representative. Hence, for any p ∈ [1 .. N_y], the p-th observation can be written [y]_p = H_{x,p} [x]_{h(p)}. If needed, the algorithm can be generalised to more complex observation operators, provided that they are local, typically interpolation operators.
In the limit where localisation is disabled (for all n ∈ {1, ..., N_x} and m ∈ {1, ..., N_q}, ρ_n^x = ρ_m^q = Π, the matrix full of ones), one recovers the ETKF-HML analysis, which is equivalent to the generic ETKF analysis described in section 2.3.1. Furthermore, the generic ETKF analysis being equivalent to the generic EnSRF analysis, we conclude that the ETKF-HML analysis is equivalent to the EnSRF-HML analysis. However, even though the LEnSRF-HML analysis is equivalent to the generic LEnSRF analysis, the LETKF-HML analysis is not equivalent to the generic LETKF analysis, which is not defined (because of the global parameters). Finally, the parameter localisation is somewhat similar between the LETKF-HML and the LEnSRF-HML analyses, which is why we expect the difference in performance between the LETKF-HML and the LEnSRF-HML algorithms to be of the same order as the difference in performance between the LETKF and the LEnSRF algorithms (Sakov and Bertino, 2011).
The algorithms presented in section 2.2 and section 2.3 are summarised in table 4 of appendix E.
Algorithm 2: LETKF-HML analysis for a fully local observation operator. Parameters: localisation matrices {ρ_n^x, n = 1, …, N_x} and {ρ_m^q, m = 1, …, N_q}, and tapering parameters ζ_p and ζ_q. Input: the forecast ensemble. For each grid cell n = 1, …, N_x, the observations p ∈ h⁻¹(n) are gathered and the local state updates (mean and perturbations) are computed; then, for each m = 1, …, N_q, the local parameter updates (mean and perturbations) are computed.

3 Illustration of the EnKF-ML algorithms with a 1D model

In this section, the EnKF-HML family of algorithms is first illustrated numerically using the Lorenz 1996 (L96) model (Lorenz and Emanuel, 1998). The standard L96 model with 40 variables is widely used in DA to test new methods, but we choose here an inhomogeneous variant to illustrate the need for local parameters.

The inhomogeneous Lorenz 1996 model
The L96 model is defined by a set of ODEs over a periodic domain with N_x variables, indexed by n = 1, …, N_x:

dx_n/dt = (x_{n+1} − x_{n−2}) x_{n−1} − x_n + F,

where F is the forcing coefficient and x_{N_x+1} = x_1, x_0 = x_{N_x}, and x_{−1} = x_{N_x−1} to ensure periodicity. The inhomogeneous L96 (L96i) model is a variant of the L96 model in which the constant forcing F is replaced by a local forcing F_n which depends on the state variable index n.
The standard L96 model uses N_x = 40 variables and F = 8. For our experiments we use the L96i model with N_x = 40 variables as well, and the local forcing F_n is given by eq. (40). The model is integrated using a fourth-order Runge-Kutta scheme with a time step of δt = 0.05. We checked that it has 13 positive Lyapunov exponents and a neutral one, yielding an unstable-neutral subspace of dimension 14.
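To make the integration setup concrete, here is a minimal sketch of the L96-type tendency and the fourth-order Runge-Kutta stepping with δt = 0.05. The flat forcing profile below is a placeholder: the paper's inhomogeneous profile F_n (eq. 40) is not reproduced here.

```python
import numpy as np

def l96i_tendency(x, F):
    """Tendency of the (inhomogeneous) L96 model:
    dx_n/dt = (x_{n+1} - x_{n-2}) x_{n-1} - x_n + F_n, periodic in n."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step of size dt."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

Nx = 40
# Placeholder forcing: a flat F_n = 8 (the standard L96 value) stands in for
# the paper's local forcing profile.
F = np.full(Nx, 8.0)
x = 8.0 * np.ones(Nx)
x[0] += 0.01  # small perturbation to leave the unstable fixed point
for _ in range(500):  # spin up onto the attractor (25 model time units)
    x = rk4_step(lambda s: l96i_tendency(s, F), x, 0.05)
```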

The surrogate model
As explained in the beginning of section 2, the EnKF-HML algorithms do not use the true model for the forecast but a surrogate model instead, whose parameters are estimated during the analysis. Following Bocquet et al. (2020, 2021), we choose to use the surrogate model designed in Bocquet et al. (2019). In this model, the tendencies are parametrised by a set of regressors called the monomials, and are then integrated in time to build the resolvent between two time steps. This model can in principle represent any homogeneous ODE, provided that the number of monomials (which is determined by L, the size of the local stencil) is sufficient. Note that this surrogate model has been implemented using neural networks in Bocquet et al. (2019).
In our experiments, we use a stencil of L = 2, we replace the global forcing coefficient by local forcing coefficients, and we use a fourth-order Runge-Kutta scheme with a time step of δt = 0.05 to integrate the tendencies. The surrogate model is defined on N_x = 40 state variables, in a one-to-one correspondence with those of the L96i, and has a total of 57 parameters: 17 monomial coefficients and 40 local forcing coefficients. By construction, it is possible to reproduce the L96i model with a specific and unique set of parameters, which we write a_t and f_t: sur(a_t, f_t) is the L96i model. The values of a_t lie in the set {−1, 0, 1} while the values of f_t are given by eq. (40). The sensitivity of the surrogate model sur(a, f) to a and f is illustrated in fig. 1 using the forecast skill, which is defined as the average integration error after a given lead time starting from the correct initial condition.
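The idea of a monomial-regressor tendency can be sketched as follows. This generic degree-≤2 basis over the stencil has 21 features for L = 2 and is only an illustration: the basis of Bocquet et al. (2019) is a reduced set of 17 monomials.

```python
import numpy as np
from itertools import combinations_with_replacement

def stencil_monomials(x, L=2):
    """All monomials of degree <= 2 in the stencil variables x_{n-L}, ..., x_{n+L},
    evaluated at every grid point n (homogeneous in n).  This generic basis has
    1 + (2L+1) + (2L+1)(2L+2)/2 = 21 features for L = 2."""
    shifts = [np.roll(x, -s) for s in range(-L, L + 1)]  # x_{n-L}, ..., x_{n+L}
    feats = [np.ones_like(x)] + shifts
    feats += [a * b for a, b in combinations_with_replacement(shifts, 2)]
    return np.stack(feats)  # shape (21, Nx) for L = 2

def surrogate_tendency(x, a, f, L=2):
    """Surrogate tendency: linear combination of the monomials plus a local
    forcing, dx_n/dt = sum_k a_k phi_k(x; n) + f_n."""
    return a @ stencil_monomials(x, L) + f
```

With this ordering, the L96 tendency is recovered by a sparse coefficient vector with values in {−1, 0, 1}: +1 on the x_{n−1}x_{n+1} monomial, −1 on x_{n−2}x_{n−1}, −1 on the linear x_n term, and f_n = F.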

The inference problem
The experiments consist of twin simulations. The truth is generated using the L96i model, or equivalently using sur(a_t, f_t). The system is fully observed (the 2D system used later on is not), H_x(x) = x, with a period of Δt = 0.05, and the observations are independently perturbed with a normal distribution of error covariance matrix R = I.
Three categories of experiments are performed, with an increasing number of parameters to estimate alongside the state.
1. In the first category, the goal is to estimate the 17 monomial coefficients a. This inference problem is very similar to the one considered in Bocquet et al. (2021).
2. In the second category, the goal is to estimate the 40 forcing coefficients f.

Figure 1: Forecast skill of the surrogate model sur(a, f) compared to sur(a_t, f_t), the true L96i model. Left panel: f = f_t and a = a_t + δa with δa ∼ N(0, σ²I) for increasing σ. Right panel: a = a_t and f = f_t + δf with δf ∼ N(0, σ²I) for increasing σ. Each experiment is repeated 5000 times, with different parameter perturbations and different initial conditions. The curves are stopped with a dot when at least one of the 5000 repetitions diverged. The RMSE is normalised by the variability of the true model.
3. In the third category, the goal is to estimate all 57 coefficients.
In all experiments, the main performance metric is the time-averaged root mean squared error (RMSE) of the state analysis. Since the set of true parameters is unique, it is also possible to compute an RMSE score for the parameter analysis. However, in such cycled experiments, we expected, and have numerically checked, that small RMSE scores for the state estimation can only be obtained with accurate models, i.e. with small RMSE scores for the parameter estimation. For this reason, we do not systematically report the parameter RMSE. Furthermore, the exact numbers of spin-up and assimilation cycles depend on the experiment and are specified later.
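The time-averaged metric can be sketched as follows (function names are illustrative):

```python
import numpy as np

def state_rmse(analysis, truth):
    """RMSE of a single analysis against the truth."""
    return np.sqrt(np.mean((np.asarray(analysis) - np.asarray(truth)) ** 2))

def time_averaged_rmse(analyses, truths, n_spinup):
    """Average the per-cycle state RMSE over the cycles after the spin-up."""
    scores = [state_rmse(a, t)
              for a, t in zip(analyses[n_spinup:], truths[n_spinup:])]
    return float(np.mean(scores))
```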

Tested algorithms
Our objective is to implement and test the LETKF-HML and LEnSRF-HML algorithms, for which we need to specify the sets of global and local parameters p and q to be estimated alongside the state. The 17 monomial coefficients a affect the model tendencies in a global way. Therefore, if they need to be estimated, they must be included in the set of global parameters p. By contrast, the 40 forcing coefficients f affect the model tendencies locally. This means that, if they need to be estimated, they can be included either in the set of global parameters p (i.e., ignoring their local nature) or in the set of local parameters q. In order to distinguish the different algorithmic variants, we replace the -HML suffix by a -ML suffix when there are only global parameters to estimate (N_q = 0) and by a -LML suffix when there are only local parameters to estimate (N_p = 0). This terminology is consistent with the definition of the EnKF-ML algorithms.
For comparison, we also implement and test the algorithm of Aksoy et al. (2006), hereafter called LETKF-Aksoy. This is a variant of the LETKF suited for parameter estimation, in which the global parameter update is performed through an empirical averaging of local updates. The original algorithm by Aksoy et al. (2006) included a mechanism to maintain the parameter spread above a certain threshold. For simplicity, we have not used this mechanism in our experiments, as we did not find it necessary.
The setup for all the LEnKF-HML variants tested in section 3.4 is summarised in table 2.

Ensemble initialisation
As shown in Bocquet et al. (2021), the ensemble initialisation may have an impact on the time-averaged metric (even with a very long run). In this paper, this is less critical because we only use localised ensemble DA algorithms. Nevertheless, we stick to the initialisation method described in Bocquet et al. (2021). Namely, the i-th ensemble member is initialised as

z_i = z_t + δz + δz_i,

where z_t is the true initial state, δz is the initial bias, and δz_i is the i-th asymptotically unbiased perturbation. The covariance matrix Σ is diagonal, equal to 1 for the state variables and to 0.2 for both the local and global parameters.
As shown in fig. 1, having a 0.2 bias in the parameters is sufficient to make the surrogate model inaccurate.
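A sketch of this initialisation for the augmented state (state variables followed by parameters), assuming both the bias and the perturbations are drawn from N(0, Σ) with the diagonal Σ described above; the exact centring convention is ours:

```python
import numpy as np

def init_ensemble(z_truth, n_state, n_param, n_ens, seed=0):
    """Initialise the augmented ensemble: truth + common bias + perturbations.
    Sigma is diagonal: std 1 for the n_state state variables and 0.2 for the
    n_param (local and global) parameters."""
    rng = np.random.default_rng(seed)
    std = np.concatenate([np.ones(n_state), 0.2 * np.ones(n_param)])
    bias = std * rng.standard_normal(n_state + n_param)
    pert = std * rng.standard_normal((n_ens, n_state + n_param))
    pert -= pert.mean(axis=0)  # centre the perturbations (exactly unbiased here)
    return z_truth + bias + pert

# 40 state variables and 57 parameters, as in the one-dimensional test series.
E0 = init_ensemble(np.zeros(40 + 57), 40, 57, n_ens=20)
```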

Algorithm parametrisation
For the LEnSRF-HML analysis, algorithm 1, we need to specify two localisation matrices: the classical localisation matrix between state variables, ρ^xx, and the cross-localisation matrix between state variables and local parameters, ρ^qx. In all our experiments, the geometry of the local parameters (if any) is the same as the geometry of the state variables. Therefore, and for the sake of simplicity, we enforce ρ^qx = ρ^xx, and ρ^xx is chosen as

[ρ^xx]_{mn} = GC(d(m, n)/r),

where GC is the Gaspari-Cohn piecewise rational function (Gaspari and Cohn, 1999), d(m, n) is the (circular) distance between the m-th and n-th variables, and r is the localisation radius, the only algorithmic parameter relative to localisation.
For the LETKF-HML analysis, algorithm 2, we also need to specify two sets of localisation matrices: the classical localisation matrices between observations and state variables, ρ_n^x, and the localisation matrices between observations and local parameters, ρ_n^q. For the same reasons as above, we enforce ρ_n^q = ρ_n^x, and the ρ_n^x matrices are chosen as

[ρ_n^x]_{ij} = √(GC(d(i, n)/r) GC(d(j, n)/r)),

where d(i, n) and d(j, n) are the distances between the i-th observation and the n-th variable and between the j-th observation and the n-th variable, respectively, and r is the localisation radius. Besides, having the same geometry for the state variables and the local parameters means that the two for-loops in algorithm 2 can be merged.
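A sketch of the Gaspari-Cohn function and of a CL matrix on the periodic domain. The convention that the argument is scaled by the radius r, giving compact support 2r, is assumed here.

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn piecewise rational function (Gaspari and Cohn, 1999),
    with compact support 2c."""
    r = np.abs(d) / c
    out = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    out[near] = 1 - 5/3*rn**2 + 5/8*rn**3 + 1/2*rn**4 - 1/4*rn**5
    out[far] = (4 - 5*rf + 5/3*rf**2 + 5/8*rf**3
                - 1/2*rf**4 + 1/12*rf**5 - 2/(3*rf))
    return out

def circular_distance(m, n, N):
    """Distance on a periodic domain of N grid points."""
    d = np.abs(m - n)
    return np.minimum(d, N - d)

def localisation_matrix(N, radius):
    """CL matrix with entries GC(d(m, n)/radius) on the periodic domain."""
    idx = np.arange(N)
    d = circular_distance(idx[:, None], idx[None, :], N)
    return gaspari_cohn(d.astype(float), radius)
```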
Finally, in order to mitigate the sampling errors, we use a multiplicative inflation on the prior with a uniform and constant-in-time coefficient λ. Preliminary experiments have shown that using different inflation coefficients for the model state and the model parameters does not significantly improve the scores, which is why we chose to use the same uniform inflation coefficient for all components of the augmented state. Note, however, that this result may not generalise to other experiments, as suggested by other studies in the literature (Kang et al., 2011).
To summarise, our algorithms depend on at most four scalar parameters: the localisation radius r (which parametrises the Gaspari-Cohn function), the inflation coefficient λ, and the two tapering coefficients ζ_p and ζ_q introduced in section 2. Unless otherwise mentioned, each algorithmic parameter is optimally tuned to yield the lowest state RMSE for each experiment.
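For illustration, the prior inflation and a relaxation reading of the tapering coefficients can be sketched as follows. Treating ζ as a fraction of the parameter increment is our simplified reading, suggested by the discussion in section 3.4.3 where ζ_p "can also be seen as a relaxation parameter"; it is not the paper's exact definition.

```python
import numpy as np

def inflate_prior(E, lam):
    """Multiplicative inflation: scale the perturbations about the mean by lam."""
    m = E.mean(axis=0)
    return m + lam * (E - m)

def taper_increment(p_forecast, p_analysis, zeta):
    """Relaxation reading of tapering: keep only a fraction zeta of the
    parameter increment (zeta = 1 means no tapering)."""
    return p_forecast + zeta * (p_analysis - p_forecast)
```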

Results
In this section, we present the results of our numerical experiments, organised according to the classification described in section 3.3.1.

Estimation of the 17 monomial coefficients
In this first test series, the goal is to estimate the 17 monomial coefficients a only. As explained in section 3.3.2, these coefficients affect the model tendencies in a global way, and hence must be included in the set of global parameters p, which means that N_p = 17. For these experiments, there is no local parameter: N_q = 0. The setup for each LEnKF-HML variant tested in this section is recalled in table 2 (first three rows).
There are only two minor differences between this inference problem and the one considered in Bocquet et al. (2021). The results are shown in fig. 2. The state analysis RMSE is averaged over 3000 cycles after a spin-up period of 3000 cycles, and over 8 repetitions of the experiments. This is empirically sufficient to ensure the convergence of the statistical indicators.
As expected from the similarity between the inference problems, the scores obtained with the LEnSRF-ML are overall similar to those reported by Bocquet et al. (2021). Indeed, the minimal ensemble size N_e for a successful run (analysis RMSE around 0.2) is 20. This could be interpreted as 17 members for the N_p = 17 global parameters (each global parameter is a neutral mode of the dynamics) plus a few additional members for the N_x = 40 state variables, for which the number of unstable and neutral modes is 14, but which benefit from localisation.
There is almost no difference between the scores of the LEnSRF-ML and those of the LETKF-ML. This is not a surprise, because the global parameter update of the LETKF-ML has been redesigned in section 2.3.2 to mimic that of the LEnSRF-ML, in such a way that the LEnSRF-ML and the LETKF-ML are as close to one another as the LEnSRF and the LETKF.
More surprisingly, the LETKF-Aksoy method yields very similar results. Of course, the LETKF-Aksoy method makes sense: within each local domain, we obtain an estimate of the global parameters, so defining the global estimate as the average of the local estimates is natural. At the same time, we cannot exclude the possibility that the global parameter estimates vary a lot over the local domains, in which case making an average may not be a good option. This is why we expected the LETKF-ML to be more robust than the LETKF-Aksoy: the LETKF-ML provides one estimate of the global parameters consistent with all local domains, which seems more rigorous. The similarity in scores suggests that there might be a deeper connection between the two methods. A further study is required to understand the mathematical justification of the global parameter update of the LETKF-Aksoy (for example using the alternating direction method of multipliers, see Boyd et al., 2011, and references therein) and its potential limitations.
Finally, even though N_e = 20 members are sufficient for a successful run, there is still at this point a small gap between the scores of the LEnKF-ML algorithms (i.e., with parameter estimation) and those of the LETKF (i.e., with known model). According to Bocquet et al. (2021), this gap comes from the use of a uniform (rather than adaptive) inflation, and indeed progressively vanishes as the ensemble size N_e grows.

Estimation of the 40 forcing coefficients
In this second test series, the goal is to estimate the 40 forcing coefficients f only. As explained in section 3.3.2, these coefficients affect the model tendencies in a local way, and hence they can be included either in the set of global parameters p or in the set of local parameters q. Our objective is to compare the two approaches and demonstrate that parameter localisation is effective. The setup for each LEnKF-HML variant tested in this section is recalled in table 2 (fourth and fifth rows).
The results are shown in fig. 3. The state analysis RMSE is averaged over 3000 cycles after a spin-up period of 5000 cycles, and over 8 repetitions of the experiments.
With the LETKF-ML, the local nature of the forcing coefficients f is ignored, and hence the algorithm uses N_p = 40 global parameters and no local parameter: N_q = 0. As in the previous test series, we expect the minimal ensemble size N_e for a successful run to be around N_p = 40 members for the global parameters (each global parameter is a neutral mode of the dynamics) plus a few additional members for the N_x = 40 state variables, for which the number of unstable and neutral modes is 14, but which benefit from the localisation. This is indeed what is observed in fig. 3. However, the divergence of the LETKF-ML for small ensembles (N_e ≤ 40) is much less pronounced here than in the first test series. This can be explained by the fact that the surrogate model sur(a, f) is more sensitive to a perturbation of the monomial coefficients a than to a perturbation of the forcing coefficients f, as illustrated by fig. 1. In particular, the initial bias in model parameters (as described in section 3.3.3) is much weaker in relative terms in this test series than in the first one.
With the LETKF-LML, the local nature of the forcing coefficients f is fully exploited. Hence the algorithm uses N_q = 40 local parameters and no global parameter: N_p = 0. Figure 3 shows that the localisation of the parameters is efficient. The minimal ensemble size N_e for a successful run has been reduced from about 44 (with the LETKF-ML) to about 20 (with the LETKF-LML). Furthermore, the scores obtained by the LETKF-LML are qualitatively close to those obtained by the LETKF (with known model), although there is a small gap, which corresponds to the estimation of one additional parameter per grid point.

Estimation of all 57 model coefficients
In this third test series, the goal is to estimate the 17 monomial coefficients a as well as the 40 forcing coefficients f. The monomial coefficients a must be included in the set of global parameters p, while the forcing coefficients can be included in the set of local parameters q. Hence, in these experiments there are N_p = 17 global parameters and N_q = 40 local parameters. The setup for each LEnKF-HML variant tested in this section is recalled in table 2 (last two rows).
The result of a first experiment with the LEnSRF-HML is shown in fig. 4. For this experiment, the ensemble size N e is set to 36 and the specific values for the algorithmic parameters (r, λ, ζ p , and ζ q ) are chosen by trial and error.
First of all, this experiment can be qualified as successful: after a spin-up period of several thousand cycles, the state analysis RMSE stabilises below 0.2. Second, the improvement of the analysis is overall rather slow. Parameter estimation in ensemble DA is slow in general, but it is here most likely due to a misspecification of the algorithmic parameters. For example, increasing the inflation factor λ could help at the beginning of the experiment, when the surrogate model is inaccurate, but would impair the analysis at the end of the experiment, when the surrogate model is more precise. Using an adaptive inflation would resolve this dilemma, but this is beyond the scope of this paper. Third, the algorithm improves the global parameter analysis before the local parameter analysis; the state analysis RMSE seems much more correlated with the global parameter analysis RMSE than with the local parameter analysis RMSE; and the final spread of the local parameter analysis RMSE is much larger than that of the global parameter analysis RMSE.
All three elements are related to the fact that the surrogate model sur (a, f ) is more sensitive to a perturbation of the monomial coefficients a (which are the global parameters p in this experiment) than to a perturbation of the forcing coefficients f (which are the local parameters q in this experiment).Finally, the small increase in the local parameter analysis RMSE at the beginning of the experiment is once again most likely due to a misspecification of the algorithmic parameters.
After this first successful experiment, we wish to better characterise the function of each algorithmic parameter. While the roles of the localisation radius r and of the multiplicative inflation factor λ are well documented in the DA literature, this is not the case for the tapering coefficients ζ_p and ζ_q. For this reason, we show how the accuracy of the analysis depends on ζ_p and ζ_q in fig. 5. The state analysis RMSE is averaged over 10 000 cycles after a spin-up period of 10 000 cycles, and over 8 repetitions of the experiments. The ensemble size N_e is kept at 36 and, for each of ζ_p and ζ_q, the values of the three other algorithmic parameters (r, λ, and the other ζ) are optimally tuned to yield the lowest time-averaged state analysis RMSE.
Let us first discuss the global tapering coefficient ζ_p. Without tapering (ζ_p = 1), the algorithm fails at estimating the global parameters, and therefore the state. The global parameter update per cycle is too strong compared to the amount of information brought to the system by a single batch of observations. In a way, the algorithm is constantly overfitting the single batch of observations at each cycle. This issue is most likely due to the ensemble being too small to accurately represent the cross-correlations between state variables and global parameters because, empirically, the need for tapering vanishes as the ensemble size grows (Bocquet et al., 2021). Hence, ζ_p can here also be seen as a relaxation parameter. The analysis progressively improves as ζ_p decreases, making the global parameter update slower but more robust. Finally, the state analysis RMSE reaches an optimal value and then grows again when the tapering is too strong. Indeed, for very small values of ζ_p, the global parameter update is very slow; slow enough that the number of cycles used in the experiment, even though already large, is not enough to ensure the convergence of the statistics. Furthermore, as can be seen in fig. 5, values of the global tapering ζ_p below about 0.1 yield numerical divergence of the filter, since the relaxation towards a better surrogate model is too slow.
The influence of the local tapering coefficient ζ_q is qualitatively similar to that of ζ_p, with one exception. Even if using ζ_q < 1 yields better scores, tapering is not mandatory, because the experiment is already successful without tapering (ζ_q = 1). This is most probably due to the fact that the cross-correlations between state variables and local parameters are easier to estimate thanks to parameter localisation. Finally, we show the accuracy of the analysis as a function of the ensemble size N_e in fig. 6. The state analysis RMSE is averaged over 10 000 cycles after a spin-up period of 10 000 cycles, and over 8 repetitions of the experiments.
First, there is almost no difference between the scores of the LEnSRF-HML and those of the LETKF-HML in the accurate-estimation part of the curves (higher N_e). The similarity between both algorithms can be explained using the same argument as for the similarity between the LEnSRF-ML and the LETKF-ML in section 3.4.1. Second, two regimes can be qualitatively distinguished for the LEnKF-HML variants. When the ensemble size N_e is smaller than 20, the algorithms diverge, in a way which is very similar to the divergence of the LEnKF-ML variants in section 3.4.1.
When the ensemble size N_e is larger than 20, the accuracy of the analysis progressively improves, in a way which is very similar to the LETKF-LML in section 3.4.2. These regimes can be explained as follows. In general, the LEnKF-HML estimates the most sensitive parameters first. In our experiments, the most sensitive parameters are the global parameters p, which correspond to the 17 monomial coefficients a. As for the LEnKF-ML, the minimal ensemble size N_e to estimate the state and the global parameters is around 20: 17 members for the N_p = 17 global parameters (each global parameter is a neutral mode of the dynamics) plus a few additional members for the N_x = 40 state variables (14 unstable and neutral modes, but the assimilation is localised). However, in this third test series, using N_e = 20 is not sufficient because we must also estimate the N_q = 40 local parameters. This explains the second regime, which is qualitatively similar to the LETKF-LML. In this regime, we must add 12 additional members to decrease the analysis RMSE to 0.2. This is fewer than the additional N_q = 40 local parameters to estimate, which shows that parameter localisation is efficient. Eventually, for larger ensembles, the scores obtained by the LEnKF-HML variants become close to those obtained by the LETKF (with known model), with a small but meaningful gap corresponding to the additional estimation of the N_p = 17 global parameters and of the N_q = 40 local parameters.

Two dimensional illustration with global and local parameters, covariance and domain localisations
In this section, we provide an illustration of a selection of EnKF-HML algorithms with the multilayer L96 (mL96) model (Farchi and Bocquet, 2019), which is a two-dimensional (horizontal and vertical) extension of the standard L96 model, with radiance-like (hence non-local) observations. This may seem a complicated example, but it actually reflects to a large extent the requirements of a realistic, high-dimensional application of our methods.

The multilayer Lorenz 1996 model
The mL96 model consists of a vertical stack of N_v = 32 coupled (atmospheric) layers, each layer being a one-dimensional L96 model with N_h = 40 variables. The total state dimension is hence N_v × N_h = 1280, and the model's equations are given by the following set of ODEs:

dx_{v,h}/dt = (x_{v,h+1} − x_{v,h−2}) x_{v,h−1} − x_{v,h} + F_{v,h} + Γ_{v+1,h} − Γ_{v,h},

where x_{v,h} is the h-th horizontal variable of the v-th vertical layer. The first terms in this equation correspond to the original L96 dynamics, where the horizontal index h applies periodically in {1, …, N_h}. The forcing term F_{v,h} is inhomogeneous; it is set constant over each layer and decreases from F_{1,h} = 8 for the bottom layer to F_{N_v,h} = 4 for the top layer. Finally, the last two terms, Γ_{v+1,h} and Γ_{v,h}, correspond to the vertical coupling between adjacent layers, as defined in Farchi and Bocquet (2019). The model is integrated using a fourth-order Runge-Kutta scheme with a time step of δt = 0.05. The dimension of the unstable and neutral subspace of the dynamics is about 50 (Farchi and Bocquet, 2019).
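The structure of such a multilayer model can be sketched as follows. The diffusive vertical coupling of strength gamma used here is a hypothetical stand-in: the paper's Γ terms (Farchi and Bocquet, 2019) are not reproduced.

```python
import numpy as np

def ml96_like_tendency(x, F, gamma=1.0):
    """Multilayer L96-like tendency: each of the Nv layers (rows of x, shape
    (Nv, Nh)) follows the L96 dynamics horizontally, plus a vertical coupling
    between adjacent layers.  The diffusive coupling below is an illustrative
    assumption, not the coupling of Farchi and Bocquet (2019)."""
    dx = (np.roll(x, -1, axis=1) - np.roll(x, 2, axis=1)) * np.roll(x, 1, axis=1) \
        - x + F
    above = np.vstack([x[1:], x[-1:]])   # layer v+1, top layer replicated
    below = np.vstack([x[:1], x[:-1]])   # layer v-1, bottom layer replicated
    dx += gamma * (above - 2.0 * x + below)
    return dx

Nv, Nh = 32, 40
F = np.linspace(8.0, 4.0, Nv)[:, None]  # forcing decreasing from bottom to top
```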

The surrogate model
For this two-dimensional illustration, we use the surrogate model presented in section 3.2, which we adapt in the following way.
1. The 17 monomial coefficients a are shared between all N_v layers.

2. In theory, the number of forcing coefficients of the model is N_v × N_h (one for each state variable). To reduce this number and avoid an excessive initial underdetermination, we parametrise the forcing as F_{v,h} = F_v(v) F_h(h), where F_v and F_h capture the vertical and horizontal variations of the forcing, respectively. The total number of forcing coefficients is hence N_v + N_h. However, to ensure the uniqueness of the decomposition, we rescale F_v and F_h in such a way that F_v(0) is always 1. This reduces the effective number of forcing coefficients to N_v + N_h − 1.

3. In the following, the vertical coupling terms Γ_{v+1,h} and Γ_{v,h} are hard-coded in the model. In more advanced experiments, we have successfully learnt those parameters just as the rest of eq. (44), but we do not report this for the sake of conciseness.
As in section 3.2, for convenience we introduce sur(a, f_v, f_h) as the surrogate model in which the 17 monomial coefficients are a, the N_v − 1 = 31 vertical forcing coefficients are f_v, and the N_h = 40 horizontal forcing coefficients are f_h. The total number of parameters of this surrogate model is 17 + 31 + 40 = 88. With this parametrisation, the true mL96 model is identifiable: by construction, it can be reproduced with a given set of parameters.
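The rank-one forcing parametrisation and its normalisation can be sketched as follows; the rescaling convention (absorbing the scale of F_v into F_h so that the first vertical coefficient is 1) is our reading of the uniqueness constraint.

```python
import numpy as np

def separable_forcing(f_v, f_h):
    """Rank-one forcing field F[v, h] = F_v(v) * F_h(h).  To make the
    decomposition unique, the vertical profile is rescaled so that its first
    entry is 1, the scale being absorbed into the horizontal coefficients."""
    f_v = np.asarray(f_v, dtype=float)
    f_h = np.asarray(f_h, dtype=float)
    scale = f_v[0]
    f_v_norm = f_v / scale
    f_h_scaled = f_h * scale
    return np.outer(f_v_norm, f_h_scaled), f_v_norm, f_h_scaled
```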

The inference problem
The experiments consist of twin simulations. The truth is generated using the mL96 model. At each time step Δt = 0.05, a total of N_y = 8 × 40 = 320 observations is generated, whose characteristics are described in the next section.
In addition to estimating the state variables (40 × N_v = 1280 scalars), the goal is to estimate the 17 monomial coefficients a, the 40 horizontal forcing coefficients f_h, and the 31 vertical forcing coefficients f_v. In all experiments, the main performance metric is the time-averaged RMSE of the state analysis since, in this context with very few parameters, a small state RMSE can only be obtained with a successful parameter estimation.

Observation setup
For this multilayer model, the observations consist of satellite soundings with 8 channels. Each channel is characterised by a vertical distribution of observation weights, also called an averaging kernel, which is applied to all N_h = 40 columns of state variables.
The averaging kernels are constructed using the Gaspari-Cohn function, with centres evenly spaced along the vertical direction, each with a half-width of 10 levels. They are independently normalised in such a way that the natural variability of each observation matches that of the L96i variables. The 8 non-normalised and normalised averaging kernels are displayed in fig. 7. Finally, the observations are perturbed with a normal distribution of error covariance matrix R = I.
Note that this observation setup is sparse, as there are four times as many state variables as observations.
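Such a radiance-like observation operator can be sketched as follows. Mapping the "half-width of 10 levels" to the Gaspari-Cohn radius parameter, and normalising each kernel to unit sum, are simplifying assumptions (the paper normalises against the natural variability of the variables instead).

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn piecewise rational function with compact support 2c."""
    r = np.abs(d) / c
    out = np.zeros_like(r)
    near, far = r <= 1.0, (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    out[near] = 1 - 5/3*rn**2 + 5/8*rn**3 + 1/2*rn**4 - 1/4*rn**5
    out[far] = (4 - 5*rf + 5/3*rf**2 + 5/8*rf**3
                - 1/2*rf**4 + 1/12*rf**5 - 2/(3*rf))
    return out

def averaging_kernels(n_levels=32, n_channels=8, width=10.0):
    """One vertical weighting function per channel, built from the
    Gaspari-Cohn shape with centres evenly spaced along the vertical;
    each kernel is normalised to unit sum (a stand-in normalisation)."""
    levels = np.arange(n_levels, dtype=float)
    centres = np.linspace(0.0, n_levels - 1.0, n_channels)
    W = gaspari_cohn(levels[None, :] - centres[:, None], width)
    return W / W.sum(axis=1, keepdims=True)

def observe_columns(x, W):
    """Radiance-like observations: apply the kernels to every horizontal
    column of the state x, shape (Nv, Nh); returns (n_channels, Nh)."""
    return W @ x
```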

The L2EnSRF-HML algorithm
With non-local observations such as the ones described above, using DL only yields suboptimal results. Therefore, following the approach of Farchi and Bocquet (2019), we include DL in the LEnSRF-HML, algorithm 1, but only in the horizontal direction. The resulting L²EnSRF-HML algorithm uses DL in the horizontal direction (in which observations are local) and CL in the vertical direction (in which observations are non-local). The analysis is summarised in algorithm 3, in which several simplifications have been made.
• In principle, four categories of parameters exist: global, horizontally local, vertically local, and both horizontally and vertically local. Algorithm 3 only uses two categories: q gathers the set of horizontally local parameters and p the set of horizontally non-local parameters. In both categories, the parameters can be vertically local or not.
• Vertical localisation is performed using CL with the matrices ρ xx , ρ px and ρ qx .
• Horizontal localisation is performed using DL with N h local analyses.Each local analysis updates the h-th column of state variables and model parameters, whose indices are written c (h), using a single localisation matrix ρ h , common to state variables and model parameters.
For our estimation problem, we split the surrogate model coefficients into p and q as follows: the 40 horizontal forcing coefficients f_h are included in q, while the 17 monomial coefficients a are concatenated with the 31 vertical forcing coefficients f_v to form the horizontally non-local parameters p.

Algorithm 3: L²EnSRF-HML analysis. After the local mean updates of the state and of the horizontally local and non-local parameters, the corresponding perturbation updates are performed, and the analysis ensemble is returned as E^a = (z̄^f + Δz̄) 1ᵀ + √(N_e − 1) (Z^f + ΔZ).

Ensemble initialisation
The ensemble initialisation is performed following the method described in section 3.3.3. The state is initialised with a standard deviation of 0.5, smaller than in the one-dimensional test series because the time-averaged analysis error is expected to be smaller in the present experiment. The standard deviations for the horizontal and vertical forcing coefficients have been set to 0.17 and 0.012, respectively, in such a way that the initial RMSE for the 1280 reconstructed forcing coefficients (the outer product of the horizontal and vertical coefficients) ranges between 0.15 and 0.2. Finally, the initial standard deviation of the monomial coefficients is set to σ_a = 0.1, once again smaller than in the one-dimensional test series, as it makes the convergence faster.

Algorithm parametrisation
As seen in algorithm 3, the L²EnSRF-HML analysis requires four kinds of localisation matrices.
First, the horizontal localisation matrices ρ_h are given by

[ρ_h]_{ij} = √(GC(d_h(i, h)/r_h) GC(d_h(j, h)/r_h)),

where d_h(i, h) and d_h(j, h) are the horizontal (circular) distances between the i-th observation and the h-th column and between the j-th observation and the h-th column, respectively, and r_h is the horizontal localisation radius.
Second, the vertical localisation matrix between state variables, ρ^xx, is given by

[ρ^xx]_{mn} = GC(d_v(m, n)/r_v),

where d_v(m, n) is the vertical distance between the m-th and n-th state variables and r_v is the vertical localisation radius. Finally, the vertical cross-localisation matrix between state variables and horizontally non-local parameters, ρ^px, is built in two blocks. The first block, corresponding to the cross-localisation between the monomial coefficients a and the state variables, is set to Π. Because the monomial coefficients are global, this matrix should be row-wise uniform, but we additionally assume it to be fully uniform for the sake of simplicity. The second block, ρ^vx, corresponding to the cross-localisation between the N_v − 1 = 31 vertical forcing coefficients and the state variables, is given by

[ρ^vx]_{mn} = GC(d_v(m, n)/r_v),

where d_v(m, n) is here the vertical distance between the m-th vertical forcing coefficient and the n-th state variable, and r_v is the same vertical localisation radius as in eq. (48), to reduce the number of algorithmic parameters. Note that, by construction, the m-th vertical forcing coefficient has the same vertical location as the state variables within the (m + 1)-th layer.
To summarise, the L²EnSRF-HML analysis depends on two localisation radii, r_h and r_v. The analysis also depends on the two tapering coefficients ζ_p and ζ_q. In order to further reduce the number of algorithmic parameters, and given the results of fig. 5, we set ζ_p = ζ_q ≡ ζ. Moreover, as in the one-dimensional test series, we use a multiplicative inflation on the prior with a uniform and constant-in-time coefficient λ. For each experiment, the algorithmic parameters (r_h, r_v, ζ, λ) are tuned so as to yield optimal scores.

Results
The goal of the present test series is to show not only that it is possible to estimate the parameters of the surrogate model alongside the state variables, but also that parameter localisation is effective. To that purpose, we perform four types of experiments. We first test the EnSRF-ML and the L 2 EnSRF-HML (with and without localisation). For comparison, we also test the EnSRF and the L 2 EnSRF with a known model. The results are described in the following sections and summarised in table 3.

Estimation of the state variables only
With a known model and without localisation, 50 ensemble members are necessary to accurately estimate the 1280 state variables alone, which corresponds roughly to the dimension of the unstable and neutral subspace (Bocquet and Carrassi, 2017). The time-averaged analysis RMSE is around 0.08. With localisation, only 8 ensemble members are required for a successful estimation (Farchi et al., 2021b), and the best scores are obtained with an ensemble of 10 members.

Estimation of both state variables and model parameters without localisation
Without localisation, an ensemble of 140 members is sufficient to accurately estimate the 88 model parameters alongside the 1280 state variables. The time-averaged state analysis RMSE is around 0.11. Although this experiment is not the main result of the present two-dimensional test series, we think that it provides a reasonable approximation of the best scores that can be obtained with the L 2 EnSRF-HML.
More precisely, we have found that the state analysis RMSE decreases almost linearly as the ensemble size N e increases (not shown here). The score stops improving when N e reaches a critical value, around N e = 140 here, which is close to the dimension of the unstable and neutral subspace of the augmented state dynamics (Bocquet et al., 2021). When the ensemble size N e is close to, but smaller than, 140, the algorithm is able to accurately estimate the monomial coefficients a but struggles to estimate the forcing coefficients f v and f h. This is once again related to the fact that the surrogate model is more sensitive to perturbations of a.

Estimation of both state variables and model parameters with localisation
With localisation, an ensemble of only 50 members is sufficient to accurately estimate the 88 model parameters alongside the state. From the previous experiments, we know that 10 members are sufficient to estimate the state variables. Additionally, 17 members are required to estimate the 17 monomial coefficients a, which are neutral modes of the augmented state dynamics and which do not benefit from localisation. This means that only about 50 − 10 − 17 = 23 additional members are necessary to estimate the 40 + 31 = 71 horizontal and vertical forcing coefficients f v and f h. This shows that parameter localisation is indeed effective.
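The principle behind joint state and parameter estimation can be sketched with a minimal augmented-state ensemble Kalman update, in which parameters are appended to the state vector and corrected through their sample cross-covariances with the observed state variables. This is a generic perturbed-observation EnKF sketch, not the paper's (deterministic) EnSRF; the dimensions, the exponential taper, and the observation setup are illustrative assumptions, with uniform (all-ones) localisation rows for the global parameters, mirroring the uniform matrix Π used in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Np, Ne = 8, 2, 20            # state size, number of global parameters, ensemble size
sigma_o = 0.1                    # observation error standard deviation

# Prior ensemble of augmented states z = [x; p]; columns are members
Z = rng.normal(size=(Nx + Np, Ne))

# Observation operator: observe every state variable, no direct view of the parameters
H = np.hstack([np.eye(Nx), np.zeros((Nx, Np))])
y = rng.normal(size=Nx)

# Localisation mask on the augmented covariance: a circular-distance taper for the
# state block, and uniform rows/columns for the global parameters
i = np.arange(Nx)
d = np.minimum(np.abs(i[:, None] - i[None, :]), Nx - np.abs(i[:, None] - i[None, :]))
rho = np.ones((Nx + Np, Nx + Np))
rho[:Nx, :Nx] = np.exp(-d / 2.0)  # hypothetical taper; the paper's choice differs

def enkf_update(Z, y, H, sigma_o, rho, seed=1):
    """Perturbed-observation EnKF analysis on the augmented state."""
    Ne = Z.shape[1]
    A = Z - Z.mean(axis=1, keepdims=True)
    B = rho * (A @ A.T) / (Ne - 1)                 # Schur-localised sample covariance
    S = H @ B @ H.T + sigma_o**2 * np.eye(len(y))  # innovation covariance
    K = np.linalg.solve(S.T, (B @ H.T).T).T        # Kalman gain B H^T S^{-1}
    Y = y[:, None] + sigma_o * np.random.default_rng(seed).normal(size=(len(y), Ne))
    return Z + K @ (Y - H @ Z)

Za = enkf_update(Z, y, H, sigma_o, rho)
```

Even though the parameters are unobserved, the last Np rows of the gain are non-zero whenever the ensemble carries state-parameter correlations, which is what allows the forcing and monomial coefficients to be updated by state observations.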
For this inference problem, fig. 8 shows the results of an experiment with 50 members. The conclusions are overall very similar to those in section 3.4.3. First, the experiment can be qualified as successful: after a spin-up period of several hundred cycles, the state analysis RMSE stabilises around 0.12. Second, the improvement of the analysis is rather slow, once again because the algorithmic parameters have been chosen to minimise the asymptotic analysis error. Third, the different components of the augmented state are learnt on different time scales: the algorithm first corrects the state and the monomial coefficients a, which are the most sensitive parameters. Finally, note that the time-averaged state analysis RMSE is slightly higher here (0.12) than in section 4.4.2 without localisation (0.11), but we have checked that better scores can be obtained with localisation when using larger ensembles.

Conclusions
In the wake of Bocquet et al. (2021), we have shown how the classical LETKF and LEnSRF can be generalised to estimate model parameters, both global and local, alongside the state variables. The assimilation of local parameters is natural with DL (i.e., the LETKF), especially when model parameters and state variables are co-located. By contrast, CL (i.e., the LEnSRF) is better suited than DL to the estimation of global parameters and to the assimilation of non-local observations, at the cost of having to perform linear algebra in the whole state space. Introducing the ancillary variables defined in eq. (18a) and eq. (18b) for the LEnSRF-HML, and in eq. (36a) and eq. (36b) for the LETKF-HML, we have optimised these algorithms in such a way that it is not necessary to compute B −1 xx when evaluating the global parameter update from the local state update. Moreover, we have shown how to rigorously assimilate global parameters within the DL-based LETKF, assuming the observations are local. The existing and proposed algorithms are summarised in table 4. Introducing the L96i, an inhomogeneous variant of the L96 model, we have numerically tested the LETKF-HML and the LEnSRF-HML. The results are overall consistent with those of Bocquet et al. (2021): they show that the algorithms are able to learn the dynamics of a fully parametrised surrogate model alongside the state variables, and that parameter localisation is beneficial, in the sense that it is possible to estimate a large number of parameters with reasonable ensemble sizes.
Finally, we have generalised the L 2 EnSRF algorithm proposed in Farchi et al. (2021b) to estimate model parameters alongside the state variables. The resulting algorithm, called the L 2 EnSRF-HML, combines DL in the horizontal direction with CL in the vertical direction, and is able to take into account the local nature of state variables and model parameters in both the horizontal and vertical directions. We have successfully illustrated the algorithm using a challenging multilayer L96 experiment with radiance-like, non-local observations, in which the surrogate model has global monomial coefficients as well as vertically local and horizontally local forcing coefficients.
Several research directions could be pursued following this paper. First, we showed that using non-adaptive algorithmic parameters can be critical because the optimal algorithmic parameters change over time as the surrogate model progressively improves; hence, adaptive algorithms should be investigated. Second, we have not demonstrated a clear practical advantage of the LETKF-ML over the empirical LETKF-Aksoy. We suspect that applying the methods to well-designed, sparsely observed systems could unveil accuracy differences. Third, the online character of the proposed algorithms could be exploited in situations with slow parameter time evolution, for which offline algorithms would be unfit. Finally, the next goal would be to learn the dynamics of more complex and non-identifiable surrogate representations, with a more realistic dynamical model (both in terms of physical interpretation and of state dimension).

Figure 2 :
Figure 2: Time-averaged state analysis RMSE as a function of the ensemble size N e for the first test series (estimation of the 17 monomial coefficients a) with the LEnSRF-ML (in blue), the LETKF-ML (in yellow), and the LETKF-Aksoy (in green). For reference, the red line shows the scores obtained with the LETKF when the model is known.

Figure 3 :
Figure 3: Time-averaged state analysis RMSE as a function of the ensemble size N e for the second test series (estimation of the 40 forcing coefficients f) with the LETKF-ML (in blue) and the LETKF-LML (in yellow). For reference, the red line shows the scores obtained with the LETKF when the forcing coefficients are known.

Figure 4 :
Figure 4: Time series of instantaneous analysis RMSE for the third test series (estimation of all 57 model coefficients) with the LEnSRF-HML. The state RMSE is shown in blue, the global parameter RMSE in yellow, and the local parameter RMSE in green. The experiment is repeated 1000 times, with different initial conditions and different observations. The thick line shows the average over all repetitions, and the thin lines stand for the average plus or minus one standard deviation.

Figure 6 :
Figure 6: Time-averaged state analysis RMSE as a function of the ensemble size N e for the third test series (estimation of all 57 model coefficients) with the LEnSRF-HML (in blue) and the LETKF-HML (in yellow). For reference, the red line shows the scores obtained with the LETKF when the model is known.

Figure 7 :
Figure 7: Averaging kernel for each of the 8 satellite channels without (left panel) and with (right panel) normalisation.

Figure 8 :
Figure 8: Time series of instantaneous analysis RMSE for the two-dimensional test series (with the mL96 model) using the L 2 EnSRF-HML. The state RMSE is shown in blue, the RMSE for the 17 monomial coefficients a in yellow, and the RMSE for the 1280 reconstructed forcing coefficients (the outer product of the horizontal and vertical coefficients) in green. In all cases, the RMSE is normalised by its initial value. The thick lines represent the median over 100 experiments, and the thin lines represent the 32nd and 68th percentiles.

Table 1 :
Adequacy (green) and inadequacy (red) between LEnKF types and the estimation of local, global, and mixed parameters. CL refers to covariance localisation and DL refers to domain localisation.

Table 2 :
Setup for the different algorithmic variants tested in section 3.4. For each experiment, we specify the inference problem (first column), the analysis algorithm (second column), the forecast model (third column), the definition of the sets of global and local coefficients (fourth and fifth columns), and their numbers (sixth and seventh columns).

Table 3 :
Summary of the results for the two-dimensional test series with the mL96 model. For each experiment, we specify the inference problem (first column), the associated dimension of the unstable and neutral subspace of the augmented state dynamics N 0 (second column), the analysis algorithm (third column), the forecast model (fourth column), whether localisation is used or not (fifth column), the ensemble size N e (seventh column), and the time-averaged state analysis RMSE (last column). The symbol ≥ in the ensemble size column means that increasing the ensemble size does not yield significantly better scores.