A localized weighted ensemble Kalman filter for high‐dimensional systems

To avoid filter collapse, a new localized weighted ensemble Kalman filter (LWEnKF) is presented. This filter is a nonlinear non‐Gaussian filter that combines some of the advantages of the particle filter (PF) and of the ensemble Kalman filter (EnKF). Additionally, the new method can overcome filter degeneracy in high‐dimensional system applications. Based on the weighted ensemble Kalman filter (WEnKF), we extend the scalar weight of each particle to a vector and limit the influence of distant observations through a localization function. According to the results of experiments using the Lorenz ‘96 model with 40 variables, the LWEnKF with only 10 particles can prevent filter degeneracy. In addition, tests of the new filter are also performed using a two‐layer quasi‐geostrophic model to demonstrate the feasibility of using the new method in high‐dimensional numerical weather prediction models. Comparisons among the LWEnKF, the local particle filter (LPF) and the localized perturbed observation EnKF (LEnKF) reveal that the proposed method can combine the advantages of the latter two in certain aspects, even providing better performance in some situations. This characteristic of the LWEnKF indicates its potential for data assimilation of different types of observations. Moreover, the new filter is compared to the block‐local ensemble Kalman particle filter (LEnKPF). Experiments showed that the LWEnKF has an obvious advantage over the LEnKPF when the number of particles is small, which indicates its potential for realistic applications limited by computing resources.


INTRODUCTION
The ensemble Kalman filter (EnKF) and the particle filter (PF) are sequential data assimilation methods based on statistical theory. Compared with the PF, the EnKF and its derived algorithms have been more widely applied and studied in the field of data assimilation, but some of the assumptions adopted by the EnKF limit the accuracy of the analysis. The EnKF (Evensen, 1994; Burgers et al., 1998) implicitly assumes that the numerical model is linear and that both the forecast error and the observation error are Gaussian-distributed, which is unlikely to be satisfied in most real geophysical systems. In contrast, the PF is not constrained by a linear model or Gaussian noise and can be applied to any nonlinear non-Gaussian dynamic system. Additionally, it adopts Monte Carlo sampling to approximate the full posterior probability density function (PDF) of the model variables and can thus better represent non-Gaussian information.
The PF has unique advantages in data assimilation, but it also has a serious drawback. As a generalization of the ensemble idea, the PF assigns a weight to each particle (ensemble member) to indicate its importance in representing the PDF. In a high-dimensional model with a large number of independent observations, as assimilation progresses forward in time, the weights of a few particles become increasingly large while the weights of most particles shrink to near (or exactly) zero. This occurs even in simple models (Farchi and Bocquet, 2018). Thus, only a small number of particles (occasionally only one) contribute to characterizing the PDF, while the rest are useless. This is called filter degeneracy, and it directly leads to a low effective sample ratio; the accuracy of the posterior estimate is therefore greatly reduced, and the forecast clearly cannot be improved, which is one of the main purposes of data assimilation. Resampling is a technique that equalizes the weights by duplicating particles with high weights while abandoning particles with very low weights. However, Snyder et al. (2008) proved that to prevent filter collapse, the number of particles must increase exponentially with the number of independent observations, the so-called "curse of dimensionality". They also contend that resampling cannot fully overcome the curse of dimensionality, which is clearly a hindrance to the application of the PF in high-dimensional geophysical systems.
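Filter degeneracy can be quantified by the effective sample size, N_eff = 1 / Σ_i w_i², which falls from N (uniform weights) toward 1 (collapse). The following Python sketch (illustrative only; synthetic Gaussian likelihoods, not an experiment from this paper) shows how N_eff drops as the number of independent observations grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_sample_size(weights):
    # N_eff = 1 / sum(w_i^2) for normalized weights: N when uniform, 1 when collapsed
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

N = 100  # number of particles
ess = {}
for n_obs in (1, 10, 100):
    # each particle's log-likelihood against n_obs independent unit-variance observations
    innovations = rng.standard_normal((N, n_obs))
    log_w = -0.5 * np.sum(innovations ** 2, axis=1)
    w = np.exp(log_w - log_w.max())  # shift by the maximum for numerical stability
    ess[n_obs] = effective_sample_size(w)
    print(f"{n_obs:4d} observations -> N_eff = {ess[n_obs]:.1f}")
```

With 100 independent observations, almost all weight typically concentrates on a handful of particles, mirroring the curse of dimensionality described above.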
There are currently four main techniques for improving the PF to avoid filter degeneracy in practical applications (van Leeuwen et al., 2019). One technique is to introduce a proposal density that depends on the past model variables and the current observations. Particles are sampled from the proposal density rather than the original transition density. The proposal density only needs to satisfy a very loose condition: its support must contain the support of the original transition density. Theoretically, there are therefore many possible choices of proposal density. Consider the class of proposal densities that, for each particle, depend only on the position of that particle at the previous time and the most recent observations. Snyder et al. (2015) argued that filters using sequential importance sampling with any proposal density in this class cannot avoid the curse of dimensionality, not even the so-called optimal proposal density, which yields the minimal degeneracy.
By allowing the proposal density to depend on all particles at the previous time, the equivalent-weights particle filter (EWPF) (Browne and van Leeuwen, 2015) and the implicit equal-weights particle filter (IEWPF) (Zhu et al., 2016) ensure that most or all particles, respectively, have the same weight. A comparison of the EWPF with the local ensemble transform Kalman filter (LETKF) revealed that the former produced larger root-mean-square errors than the latter (Browne, 2016). The experimental results of Zhu et al. (2016) showed that the IEWPF has advantages over the LETKF in some cases. Papadakis (2007) used the stochastic EnKF (Burgers et al., 1998; Houtekamer and Mitchell, 1998) as the proposal density, proposing the weighted ensemble Kalman filter (WEnKF) (Papadakis, 2007; Papadakis et al., 2010). In subsequent research, Beyou et al. (2013) used the ensemble transform Kalman filter as the proposal density. Although these two methods have the potential to combine the advantages of the PF and EnKF, van Leeuwen (2009) noted that they made a mistake in the formulas for calculating the weights. Even after the mistake is corrected, however, these two filters are still not applicable to high-dimensional problems. A localized WEnKF was discussed by van Leeuwen (2009), but the modifications made in this paper were not considered there.
Another technique to eliminate filter degeneracy is to use a transformation process that moves prior particles to posterior particles in a deterministic way. The ensemble transform PF (Reich, 2013) uses a single transformation step, which fails in high-dimensional settings but allows for localization. The variational mapping PF, which pushes prior particles through a sequence of mappings that represents a particle flow, appears to have potential for high-dimensional applications; van Leeuwen et al. (2019) give more information about transportation PFs.
The third way to relieve filter degeneracy is localization. The idea of calculating the particle weights locally was first introduced by Bengtsson et al. (2003) and van Leeuwen (2003) and was discussed in detail in van Leeuwen (2009). Farchi and Bocquet (2018) divided the localization of PFs into two categories: state-block domain localization and sequential observation localization. The idea of state-block domain localization is similar to that of the LETKF. For each grid point, only the observations around it are assimilated, which explicitly achieves localization but inevitably leads to discontinuities between adjacent grid points. Farchi and Bocquet (2018) summarized and presented some approaches to alleviate these discontinuities. The local PF (LPF) proposed by Penny and Miyoshi (2016) uses the state-block domain localization method and reduces discontinuities by smoothing the weights.
Sequential observation localization processes the observations serially, and only nearby grid points are updated. Such algorithms are difficult to parallelize but may mitigate the discontinuities. Building on the framework of the localized ensemble adjustment Kalman filter (Anderson, 2001), Poterjoy (2016) introduced another LPF that uses sequential observation localization to limit the impact of observations. This LPF expands the scalar weight of each particle into a vector, so that each model variable has its own weight. For each observation, the LPF updates only the weights of the model variables in the vicinity of the observation and then uses a merging step to modify only the model variables close to the observation, while model variables far from the observation retain the prior information. Recent research has applied this method to the Weather Research and Forecasting (WRF) model, assimilating artificial radar radial-velocity and reflectivity observations (Poterjoy et al., 2017). However, Shen et al. (2017) pointed out that Poterjoy (2016) uses spatially linear interpolation in the formula for the local weights, and they proposed a new formulation of the local weights using an exponential tapering of observation influence. With this new formulation, however, it is debatable whether the equations of the merging step in Poterjoy (2016) remain appropriate. In the following, LPF refers to the local particle filter developed by Poterjoy (2016).
Hybridisation is a fourth way to mitigate filter degeneracy. Combining the characteristics of the EnKF and PF, Frei and Künsch (2013) developed the ensemble Kalman particle filter (EnKPF), which was later modified by Shen and Tang (2015). Different from the WEnKF, the EnKPF divides the assimilation into two stages with different proportions. The two stages adopt the EnKF and PF, respectively, and their proportion is controlled by an adjustable parameter. The EnKPF constructs a Gaussian mixture model and can therefore assimilate non-Gaussian errors. By introducing localization, the local EnKPF (LEnKPF) has been applied to a convective-scale numerical weather prediction (NWP) model and has assimilated real observations. Numerical experiments showed that the LEnKPF can improve the accuracy of the EnKF in strongly nonlinear systems. Combined with anamorphosis and regularization jitter, the LEnKPF has also proved comparable to the LETKF in a linear system; Farchi and Bocquet (2018) give details. However, the computational cost of the adaptive choice of the proportion is relatively large.
In this paper, a localization approach similar to that of the LPF is applied to extend the WEnKF to high-dimensional systems. The localized EnKF is used to adjust the values of the particles, from which the proposal density is calculated to form part of the total weights. Similar to the LPF, the weights are extended to vector form, thereby limiting the spatial influence of the observations on the weights. The effectiveness of the new method was tested on the Lorenz '96 model and a quasi-geostrophic model. The results indicate that localization can effectively solve the filter degeneracy problem and that the new filter performs similarly to, or even better than, the LPF and the EnKF in both linear and nonlinear cases.
In Section 2, we briefly introduce the idea of the WEnKF; then, we expand the scalar weights of particles to vectors and propose the new local weighted ensemble Kalman filter in Section 3. The numerical experiments and results are presented in Section 4 before a summary and conclusions are provided in Section 5.

THE WEIGHTED ENSEMBLE KALMAN FILTER
Similar to general data assimilation methods, we assume that the forecast model is first-order Markov, that the observations depend spatially and locally on the model variables, and that the observation errors at different spatial locations or different times are independent. According to Bayes' theorem, the posterior PDF of the model variable x^n at time step n, given the observations y at time steps 1, 2, ..., n, can be written as

p(x^n | y^{1:n}) = \frac{p(y^n | x^n)}{p(y^n)} \int \frac{p(x^n | x^{n-1})}{q(x^n | x^{n-1}, y^n)} \, q(x^n | x^{n-1}, y^n) \, p(x^{n-1} | y^{1:n-1}) \, dx^{n-1},    (1)

where we have multiplied and divided the right-hand side by the PDF q(x^n | x^{n-1}, y^n), which is called the proposal density. We sample x^n from the proposal density q(x^n | x^{n-1}, y^n) rather than from the original transition density p(x^n | x^{n-1}); then, x_i denotes the value of each particle in model space, and the weight w_i indicates the relative importance of this particle in representing the posterior information.
The weight is divided into two parts, with w_i^o and w_i^* denoting the likelihood weight and the proposal weight, respectively:

w_i = \frac{p(y^n | x_i^n)}{p(y^n)} \, \frac{p(x_i^n | x_i^{n-1})}{q(x_i^n | x_i^{n-1}, y^n)} \, w_i^{n-1}.    (2)

The likelihood weight represents the probability density of the observations given a model variable. The proposal weight is a factor that arises from using the proposal density, rather than the original transition density, to evolve the particles; thus, it depends on the choice of the proposal and on the forecast model. Because we only need the relative values of the weights, we can ignore the terms on the right-hand side of Equation (2) that take the same value for each particle; the total weight is then

w_i \propto w_i^o \, w_i^*, \qquad w_i^o = p(y^n | x_i^n), \qquad w_i^* = \frac{p(x_i^n | x_i^{n-1})}{q(x_i^n | x_i^{n-1}, y^n)}.    (3)

Considering the stochastic EnKF (Houtekamer and Mitchell, 2005) as the proposal density, Papadakis et al. (2010) proposed the weighted ensemble Kalman filter (WEnKF). The formulae for the total weight are given in Papadakis et al. (2010), but for clarity and completeness of this paper, the derivation of the proposal weight is given in Appendix A. The mistake in Papadakis et al. (2010) and Beyou et al. (2013) is that the proposal density is taken to be approximately equal to the original transition density, so the proposal weight is omitted from the total weight. Consequently, the variance of the total weights is reduced, and the filter degeneracy problem is eased to some extent. However, owing to this incorrect calculation of the weights, the importance of each particle is misrepresented, and the correct posterior PDF cannot be obtained. The EnKF step used in the WEnKF only moves the particles closer to the observations; it does not ensure that the weights of the particles are evenly distributed. From current results, using the localized EnKF as the proposal is not valid in high-dimensional systems because it still cannot prevent filter collapse (Morzfeld et al., 2016).
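As a one-dimensional illustration of the total weight (likelihood weight times proposal weight), the sketch below draws particles from a Gaussian proposal and corrects the likelihood weights by the ratio of the transition density to the proposal density. This is a hypothetical linear-Gaussian example, not taken from the paper; for this case the weighted mean should recover the analytic posterior mean:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
m_p, s_p = 0.0, 1.0   # original transition density p(x^n | x^{n-1}) = N(0, 1)
m_q, s_q = 0.8, 0.7   # proposal density  q(x^n | x^{n-1}, y^n) = N(0.8, 0.49)
y, s_o = 1.0, 0.5     # observation and its error standard deviation

def log_gauss(x, m, s):
    # log of a Gaussian density up to an additive constant
    return -0.5 * ((x - m) / s) ** 2 - np.log(s)

x = rng.normal(m_q, s_q, N)                 # sample from the proposal, not from p
log_w = (log_gauss(y, x, s_o)               # likelihood weight  w_i^o
         + log_gauss(x, m_p, s_p)           # transition density (numerator of w_i^*)
         - log_gauss(x, m_q, s_q))          # proposal density   (denominator of w_i^*)
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean = float(np.sum(w * x))            # analytic posterior here is N(0.8, 0.2)
print(post_mean)
```

Dropping the proposal-weight correction (the last two terms of `log_w`), as in the uncorrected WEnKF formulas, would bias the weighted estimate whenever the proposal differs from the transition density.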

THE LOCAL WEIGHTED ENSEMBLE KALMAN FILTER
Using the localized EnKF as the proposal is not sufficient to prevent filter degeneracy; thus, we also introduce localization into the likelihood weight and propose the new local weighted ensemble Kalman filter (LWEnKF). If the observation errors are independent, that is, if the covariance matrix R is diagonal, then

p(y | x) = \prod_{j} p(y_j | x),

where y_j is the jth element of the observation vector y. Under this assumption, the observations can be assimilated serially by calculating the likelihood weights one by one. Because only the observations at time step n are considered here, we omit the superscript n for simplicity. Following Poterjoy (2016), we extend the likelihood weight of each particle from a scalar to a vector of length N_x. Using the vector weight of each particle as a column, we form an N_x × N matrix, each element of which is calculated, as in Poterjoy et al. (2019), through

w^o_{k,i} \propto l^c_{j,k} \, [\, p(y_j | x_i) - 1 \,] + 1,

where the subscript k = 1, ..., N_x is the model-variable index. The localization function l^c_{j,k} depends on the spatial locations of observation y_j and model variable x_{k,i}, and c denotes the cut-off coefficient. We adopt the Gaspari-Cohn (GC) function as the localization function, which is widely used in the localization schemes of EnKFs and PFs. The GC function has compact support and decays to 0 within a radius of 2c. Note that the total weight is the product of the likelihood weight and the proposal weight; thus, the likelihood weight must be normalized before the multiplication, which differs from the LPF.
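For reference, the GC fifth-order piecewise rational function (Gaspari and Cohn, 1999, their Equation (4.10)) can be coded directly. This is a standard implementation sketch, with z the separation distance normalized by the cut-off coefficient c:

```python
import numpy as np

def gaspari_cohn(dist, c):
    """GC localization: 1 at zero separation, compactly supported on [0, 2c]."""
    z = np.abs(np.asarray(dist, dtype=float)) / c
    out = np.zeros_like(z)
    near = z <= 1.0
    far = (z > 1.0) & (z < 2.0)
    zn = z[near]
    out[near] = (-0.25 * zn**5 + 0.5 * zn**4 + 0.625 * zn**3
                 - (5.0 / 3.0) * zn**2 + 1.0)
    zf = z[far]
    out[far] = ((1.0 / 12.0) * zf**5 - 0.5 * zf**4 + 0.625 * zf**3
                + (5.0 / 3.0) * zf**2 - 5.0 * zf + 4.0 - (2.0 / 3.0) / zf)
    return out
```

The function is 1 at zero separation, decays monotonically, and vanishes identically beyond 2c, which is what makes serial assimilation cheap: only model variables with l^c_{j,k} > 0 need to be touched by observation y_j.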
Similar to Poterjoy (2016), a parameter 0 < α < 1 is used to adjust the scalar proposal weights (Equation (4)) and the scalar likelihood weights (Equation (10)). In effect, this adjustment sets a lower limit on the weights, which can mitigate the occurrence of filter degeneracy. Generally, α is slightly less than 1. There are other, better-performing methods for avoiding filter degeneracy, such as those described by Farchi and Bocquet (2018) and Poterjoy et al. (2019); since the focus of this paper is on how to localize the WEnKF, we adopt this simple adjustment.
After the likelihood weight is extended to a vector, the total weight is also a vector, which needs to be resampled. A merging strategy was proposed by Poterjoy (2016) and Poterjoy et al. (2019); we use this strategy with appropriate changes to fit the LWEnKF. First, stochastic universal sampling is performed according to the proposal weights to obtain proposal-resampled particles; this resampling accounts for the influence of the proposal density on the weights. Then, for each observation y_j, the scalar total weights of all particles are calculated and normalized, and stochastic universal sampling is performed according to these total weights to obtain total-resampled particles. Finally, the corresponding vector total weights are evaluated, and the model variables are updated locally by a linear combination of the proposal-resampled particles and the total-resampled particles.
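Stochastic universal sampling draws N indices using a single random offset, so a particle with normalized weight w_i is selected either ⌊N w_i⌋ or ⌈N w_i⌉ times. A minimal sketch of the standard algorithm (not paper-specific code):

```python
import numpy as np

def su_sample(weights, rng):
    """Stochastic universal sampling: return N resampled particle indices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    N = w.size
    # N evenly spaced pointers sharing one random offset in [0, 1/N)
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)
```

Compared with independent multinomial resampling, SU sampling has lower sampling variance, which matters for the small ensembles targeted here.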
In addition, similar to Poterjoy (2016), a non-parametric probability mapping approach called kernel density distribution mapping (KDDM; McGinnis et al. 2015) is used after all the observations at time step n are assimilated sequentially. As explained by Poterjoy (2016), Farchi and Bocquet (2018) and Poterjoy et al. (2019), the posterior particles are adjusted by KDDM to have the same quantile as the proposal particles. Experiments will test the effectiveness of this approach in the LWEnKF.
In conclusion, when assimilating the observations y, the procedure for the LWEnKF is as follows:

1. Use the forecast model to evolve all the particles to the observation time step.
2. Perform the local perturbed-observation EnKF analysis on each prior particle to obtain the proposal particles.
3. Calculate the proposal weights (Equations (4), (A11) and (A12)) and adjust them by the parameter α.
4. Assume that the scalar observation y_{j−1} has already been assimilated, so that the total weights without localization for the updated model variables x^{j−1}_{k,i} are available to carry forward the impact of the proposal weights. Observation y_j is then processed as follows:
   a. For i = 1, 2, ..., N, calculate the likelihood weight according to y_j and normalize it.
   b. For each k in {k | l_{j,k} > 0}, calculate the total weights without localization for x^{j−1}_{k,i}.
   c. For each model variable k in {k | l_{j,k} > 0} and each particle i in (1, 2, ..., N), use the proposal particles x^0_i in Equation (8) to update the vector likelihood weights and derive the vector total weights w^j_{k,i}, with normalization factor Ω_k.
   d. Update the model variables by merging the particles via

      x^j_{k,i} = \bar{x}_k + r_{1,k} (x^{j-1}_{k,s_i} - \bar{x}_k) + r_{2,k} (x^{j-1}_{k,t_i} - \bar{x}_k),    (13)

      where s_i is the resampled index for the total weights w^j_{k,i}, and t_i is the resampled index for the total weights w^{j−1}_{k,i} of the previous step. The weighted average of the proposal particles is

      \bar{x}_k = \frac{1}{\Omega_k} \sum_{i=1}^{N} w^j_{k,i} \, x^0_{k,i}.

      The parameters r_{1,k} and r_{2,k} provide the linear combination of the total-resampled particles and the proposal-resampled particles (Equations (6) and (14)-(16)).
   e. After updating the particles x^j_k, the total weights also need to be updated for the next cycle through w^j_{k,i} = 1/N for k in {k | l_{j,k} > 0} and i in (1, 2, ..., N).
5. After all the observations at time step n have been assimilated sequentially, the updated particles obtained in the previous step are x^{N_y}_i. The KDDM step is then performed for each model variable x^{N_y}_k as follows:
   a. Use Gaussian kernels to approximate the prior and posterior densities from the particles with equal weights and with the vector total weights, respectively. The kernel bandwidth is chosen as the sample standard deviation of the particles.
   b. Use the trapezoid rule to form the prior (cdf^p_k) and posterior (cdf^u_k) cumulative distribution functions (cdfs).
   c. Use cubic spline interpolation to find the cdf values of each proposal particle and map the particles accordingly.

The derivation of Equations (6) and (14)-(16) is similar to that in Poterjoy (2016), with only minor changes (Appendix B). The motivation is that the mean and variance of the posterior ensemble should approximately equal the weighted mean and weighted variance of the proposal particles when vector weights are used.
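The KDDM step above can be sketched in Python as follows. This is an illustrative version (the paper follows Poterjoy (2016) and McGinnis et al. (2015)): it uses Gaussian kernels with the sample standard deviation as bandwidth, trapezoid-rule cdfs, and, for simplicity, linear rather than cubic-spline interpolation for the quantile mapping:

```python
import numpy as np

def kernel_cdf(particles, weights, grid, bandwidth):
    """Weighted Gaussian-kernel density on a grid, integrated by the trapezoid rule."""
    z = (grid[:, None] - particles[None, :]) / bandwidth
    pdf = (np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))) @ weights
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))])
    return cdf / cdf[-1]

def kddm_update(particles, post_weights):
    """Map particles so their distribution matches the weighted (posterior) kernel density."""
    n = particles.size
    w_prior = np.full(n, 1.0 / n)                 # prior: equal weights
    w_post = post_weights / post_weights.sum()    # posterior: vector total weights
    bw = np.std(particles, ddof=1)                # bandwidth = sample standard deviation
    grid = np.linspace(particles.min() - 4 * bw, particles.max() + 4 * bw, 400)
    cdf_p = kernel_cdf(particles, w_prior, grid, bw)
    cdf_u = kernel_cdf(particles, w_post, grid, bw)
    q = np.interp(particles, grid, cdf_p)         # cdf value of each proposal particle
    return np.interp(q, cdf_u, grid)              # invert the posterior cdf (linear interp)
```

With uniform posterior weights the mapping reduces to the identity, and because both cdfs are monotone the particle ordering is preserved.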
The value of the localization function l^c_{j,k} ranges from 0 to 1. When the spatial position of the model variable is close to the observation, the localization function tends to 1 and d_k → 0; thus, x^j_{k,i} ≈ x^{j−1}_{k,s_i}. The approximate equality holds because the variance of the total-resampled particles estimates the weighted variance of the proposal particles under the total weights. At the same time, lim_{l^c_{j,k} → 1} r_{2,k} = 0. In other words, for model variables whose position is very close to the observation, the updated particles are almost equal to the total-resampled particles, according to Equation (13). Likewise, when the distance between the model variable and the observation is very large (> 2c), l^c_{j,k} = 0 and d_k → ∞, which leads to lim_{l^c_{j,k} → 0} r_{1,k} = 0 and lim_{l^c_{j,k} → 0} r_{2,k} = 1. When l^c_{j,k} → 0, the posterior variance and mean equal the variance and mean based on the proposal weights, respectively. That is, when assimilating an observation, model variables close to the observation location are updated by the WEnKF, and model variables far from the observation are updated by the proposal. However, outside these domains, only the first two moments are considered by the merging strategy, so the KDDM is used additionally to adapt the particles to the higher-order moments.
The merging step enables the posterior particles to combine the proposal particles and the resampled particles to avoid filter degeneracy. It is also an artificial approach to adjust the posterior mean and covariance. The LWEnKF and LPF can still work without the merging step by using resampling instead; in that case, the observations could in fact be assimilated all at once, which would be equivalent to the state-block localization described by Farchi and Bocquet (2018). Experiments comparing the merging step with the adjustment-minimizing stochastic universal (SU) sampling are presented in Appendix C.
Note that, although the EnKF has a linear/Gaussian assumption, there is no such hypothesis in the likelihood weights calculation or the merging step. Therefore, this does not prevent the LWEnKF from being a nonlinear non-Gaussian filter.

NUMERICAL EXPERIMENTS
In this section, several observing system simulation experiments are used to test the LWEnKF with different observation operators, different ensemble sizes and different models. The simple but computationally fast 40-variable Lorenz '96 (L96) model (Lorenz, 2006) is used to compare the analysis results of the LWEnKF, the LPF of Poterjoy (2016), the localized perturbed-observation EnKF of Anderson (2003) and the block-local EnKPF of Robert and Künsch (2017). Sequential observation localization is used in all these filters. For convenience of presentation, LEnKF and LEnKPF denote the localized perturbed-observation EnKF and the block-local EnKPF, respectively.
First, the filters are tested under a mildly nonlinear configuration, which provides a fair comparison between the PFs and the LEnKF. Then, nonlinearity and non-Gaussianity are introduced, which makes the data assimilation problem difficult for the LEnKF but plays to the strengths of the PFs. We also examine the improvement that the KDDM brings to the LWEnKF and LPF, with experiments conducted on a 100-dimensional L96 model.
As a further comparison, the LWEnKF, LPF and LEnKF are applied to a two-layer quasi-geostrophic model (Pedlosky, 1987) to test the effectiveness of the new method.
The localization function used by all the filters is the GC function, and the localization length-scales are determined experimentally. As mentioned previously, in the proposal density of the LWEnKF, the observation error covariance matrix R is inflated by an inflation coefficient, which is chosen experimentally. The parameter α in the LPF and LWEnKF is chosen experimentally from the range 0.70-1.00.
Experiments using the L96 model are programmed in the MATLAB language, in which the inflation of the background-error covariance in the LEnKF and LEnKPF is multiplicative, with the coefficient tested from 1.00 to 1.20. The two-layer quasi-geostrophic model experiments are performed on the Data Assimilation Research Testbed (DART) and use the adaptive prior covariance inflation (Anderson, 2007) in the LEnKF with the prior standard deviation set to 0.6. To eliminate the influence of the random initial ensembles on the data assimilation, all the following experiments were repeated ten times with different initial ensembles.

Lorenz 96 model
The L96 model contains N_x variables whose positions are evenly spaced on a circle. The evolution of each model variable over time is governed by the differential equations

\frac{dx_k}{dt} = (x_{k+1} - x_{k-2}) \, x_{k-1} - x_k + F, \qquad k = 1, 2, ..., N_x,

with cyclic boundary conditions x_{k−N_x} = x_{k+N_x} = x_k. To ensure chaotic model dynamics, the forcing term of the truth model is set to the typical value F = 8.0. The time step of the L96 model is set to 0.05 (6 hr). The truth model is run for 10000 time steps to generate the truth and the observations. The observations are simulated by adding random errors drawn from N(0, R) to the truth, with R = I.
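The L96 dynamics, dx_k/dt = (x_{k+1} − x_{k−2})x_{k−1} − x_k + F with cyclic indices, can be integrated with a standard fourth-order Runge-Kutta scheme. The sketch below is an illustrative Python implementation (the paper's experiments are in MATLAB), using np.roll for the cyclic boundary conditions:

```python
import numpy as np

def l96_tendency(x, F=8.0):
    # dx_k/dt = (x_{k+1} - x_{k-2}) * x_{k-1} - x_k + F, cyclic indices via np.roll
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One fourth-order Runge-Kutta step of length dt (0.05 model units ~ 6 hr)."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# spin up a 40-variable truth run from a slightly perturbed rest state
x = np.full(40, 8.0)
x[19] += 0.01
for _ in range(1000):
    x = rk4_step(x)
```

Note that x = F everywhere is an (unstable) fixed point of the dynamics, which is why a small perturbation is added before spin-up.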

A mildly nonlinear configuration
For the first test, the L96 model with 40 variables is used with no model error. All the model variables are observed directly at every time step; that is, the observation operator is h = I. This configuration favours the EnKF but is challenging for PFs. The LWEnKF, LPF, LEnKF and LEnKPF are tested with ensemble sizes of 10, 20, 40, 80, 160 and 320. The average analysis RMSE, averaged over 10000 model time steps, is used to evaluate the performance of these filters. The average RMSEs as a function of ensemble size for each filter are shown in Figure 1. The LWEnKF is always significantly superior to the LPF. The LWEnKF is superior to the LEnKPF when the ensemble size is relatively small (N = 10, 20) and is comparable to the LEnKF when the ensemble size reaches 320. When the ensemble size exceeds 160, the RMSEs obtained by the LEnKPF are lower than those obtained by the other three filters, while the LPF obtains higher RMSEs than the other three filters.

A nonlinear/non-Gaussian configuration
In this test, observations occur every 4 time steps (i.e. every 24 hr), and only the model variables at odd positions are observed. The forecast model used in the data assimilation experiments has a forcing term of F = 8.4, which introduces model error caused by an inaccurate model parameter. The model error covariance matrix Q is estimated statistically from climatological samples of model error. Q is multiplied by a fixed (untuned) constant to ensure computational stability.
The L96 model with 40 variables is used to test the sensitivity of the LWEnKF, LPF, LEnKF and LEnKPF to the ensemble size and the observation type. All filters were tested with ensemble sizes of 10, 20, 40, 80, 160 and 320. Three sets of experiments were conducted, in which the only difference is the specification of the observation operator h. The first set of experiments used a linear function h(x) = x, while the last two sets used nonlinear functions h(x) = |x| and h(x) = ln(|x| + 1), of which the latter is more strongly nonlinear.
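The three observation operators and the simulated observations can be sketched as follows (illustrative code; the seed and the representation of "odd positions" as 1-based indices 1, 3, ..., 39 are assumptions):

```python
import numpy as np

# the three observation operators tested in this section
operators = {
    "linear":      lambda x: x,
    "absolute":    lambda x: np.abs(x),
    "logarithmic": lambda x: np.log(np.abs(x) + 1.0),
}

rng = np.random.default_rng(1)
truth = rng.standard_normal(40)
obs_idx = np.arange(0, 40, 2)   # 0-based indices of the odd positions 1, 3, ..., 39

observations = {}
for name, h in operators.items():
    # simulated observations: apply h to the truth, then add N(0, R) noise with R = I
    observations[name] = h(truth)[obs_idx] + rng.standard_normal(obs_idx.size)
```

The absolute-value operator destroys sign information and the logarithm additionally compresses large amplitudes, which is what makes the last two cases progressively harder for a Gaussian/linear filter.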
In this test, nonlinearity and non-Gaussianity are introduced in several ways: through temporally infrequent and spatially sparse observations, through nonlinear observation operators, and through non-Gaussian model errors. These set-ups make the data assimilation problems easier for PFs than for the EnKF.
We used the average analysis RMSE to evaluate the performance of the four filters. The RMSE is averaged over 2500 analysis cycles, i.e. 10000 model time steps. Figure 2 illustrates the average RMSE for the three observation types as a function of ensemble size for each method. In general, the LWEnKF yields effective results for ensemble sizes as small as 10: the filter does not collapse over the entire assimilation period, and the RMSE of the posterior ensemble is smaller than that of the prior ensemble (not shown). Because the WEnKF cannot achieve this, the experiments verify that the localization procedure enables small ensembles to prevent filter degeneracy. When the observation operator is linear, in Figure 2a, the LWEnKF matches the performance of the LEnKF, while the LPF exhibits relatively poor behaviour for the ensemble sizes tested here. This result follows from the Gaussian/linear assumption that the LEnKF makes. With more than 20 particles, the LEnKPF outperforms the other three filters.
The LWEnKF provides an advantage over the LPF, LEnKF and LEnKPF when the observation operator is the absolute value of the system variables, as in Figure 2b. The nonlinearity of the absolute-value function violates the Gaussian/linear assumption of the LEnKF, but it is not sufficient to bring out the advantages of the PFs; thus, the LPF performs worse than the LEnKF until the ensemble size reaches 40. When the logarithmic function is taken as the observation operator, as shown in Figure 2c, the strong nonlinearity allows the LPF to outperform the LEnKF using as few as 10 particles, while the LWEnKF provides results similar to the LPF, which confirms its ability to assimilate strongly nonlinear observations. For both the linear and nonlinear observation-operator cases, the LWEnKF has an obvious advantage over the other filters when the number of particles is small (N = 10, 20). For more than 40 particles, the LEnKPF is slightly better than the LWEnKF for the linear observation operator, but worse than the LWEnKF for the nonlinear observation operators. The LEnKF and LPF have shortcomings in the nonlinear and linear observation-operator cases, respectively, whereas the LWEnKF is no worse than either of these two methods under all three observation operators. In this sense, we believe that the proposed method effectively combines the advantages of the LEnKF and LPF while avoiding the disadvantages of the latter two filters to some extent.

Benefits of KDDM
To test the impact of the probability mapping method on the LWEnKF, we increase the dimension of the L96 model to a relatively high 100 dimensions. As in Poterjoy (2016), we choose the KDDM. The experiments were conducted with 80 particles. Similar to the set-up in Section 4.1.2, the model variables at odd positions are observed every 4 time steps, and the observation operator is h(x) = |x|. The model error caused by the forcing term F = 8.4 is also considered. Figure 3 shows that with the KDDM, both the LWEnKF and the LPF yield lower RMSEs than without it. The LWEnKF without the KDDM even outperforms the LPF with the KDDM. Comparison of the ratio of the average RMSE to the average spread, illustrated in Figure 4, indicates that the spread of the LEnKF is too low. Without the KDDM step, the LPF and LWEnKF exhibit too much and too little spread, respectively. Applying the KDDM yields a good adjustment of the relationship between the RMSE and the spread, as is clearly seen in Figure 4.
As a further comparison, Figure 5 presents rank histograms calculated for every fortieth model variable from the LEnKF, LPF and LWEnKF. The abscissae of these plots are discrete bins formed by the posterior particles sorted in ascending order, and the ordinate counts the number of times the true model variable falls within each bin (Hamill, 2000). A uniform distribution of counts across the bins reflects that the spread of the analysis ensemble matches the RMSE, which is a desirable quality of ensembles (Zhu et al., 2016). As shown here, the U-shaped histograms of the LEnKF indicate strong underdispersion. The histograms of the LWEnKF without the KDDM are slightly sunken, which is somewhat alleviated by the KDDM. The humped histograms of the LPF show overdispersion, which the KDDM modifies into a relatively uniform distribution. Some of the histograms have high values at both ends, which means that the truth falls outside the range of the particles; this also occurs for most PFs (Farchi and Bocquet, 2018). Note that all the histograms lean more or less to the left due to model errors, while the histograms of the LWEnKF appear to be the most balanced, which reflects the ability of the new algorithm to handle model errors.

[Figure 5. Rank histograms calculated from posterior particles generated by (a) the LEnKF, (b) the LPF without (first row) or with (second row) KDDM and (c) the LWEnKF without (first row) or with (second row) KDDM. The columns represent model variables 1, 41 and 81.]
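A rank histogram like those in Figure 5 can be accumulated in a few lines. This sketch is generic diagnostic code following Hamill (2000), not the paper's implementation; it counts, over many cycles, the bin into which the truth falls among the sorted posterior particles:

```python
import numpy as np

def rank_histogram(ens_series, truth_series):
    """ens_series: (cycles, N) posterior particles; truth_series: (cycles,) truths.
    Returns counts over the N + 1 bins defined by the sorted ensemble."""
    cycles, N = ens_series.shape
    counts = np.zeros(N + 1, dtype=int)
    for ens, truth in zip(ens_series, truth_series):
        counts[int(np.sum(ens < truth))] += 1  # rank = number of members below the truth
    return counts

# for a well-calibrated ensemble the truth is exchangeable with the members,
# so the counts should be roughly uniform across the bins
rng = np.random.default_rng(5)
ens = rng.standard_normal((5000, 9))
truth = rng.standard_normal(5000)
counts = rank_histogram(ens, truth)
```

High counts in the two end bins indicate the truth falling outside the ensemble envelope (underdispersion), and a humped centre indicates overdispersion, which is exactly how the panels of Figure 5 are read.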
The analysis errors of all the variables over the last 1000 time steps are shown in Figure 6. The errors of the LWEnKF with the KDDM are far smaller than those of the other experiments for almost all variables at almost all time steps, which strongly indicates that the LWEnKF can significantly outperform the LEnKF and the LPF with an affordable number of particles. In addition, even without the KDDM, the analysis errors of the LWEnKF are smaller than those of the LEnKF and the LPF with the KDDM, which further indicates that the new filter outperforms the other two to some extent.

The two-layer quasi-geostrophic model
The experimental results of the simple model in the previous section indicate that our proposed method can combine the advantages of the LEnKF for the linear observation operator and the LPF for the nonlinear observation operator. This subsection further applies the new method to a quasi-geostrophic model.
The two-layer quasi-geostrophic model considered in this subsection describes the atmospheric flow for geostrophic wind motions. It is a useful tool for data assimilation studies in NWP systems because it captures some of the dynamics of operational weather models, such as baroclinic instability. The model supports different resolution settings, has relatively low computational complexity and has been used in studies of the basic characteristics of various data assimilation methods (Fisher et al. 2011; Mussa et al. 2014; Bibov et al. 2015).
The equation and parameter settings of the two-layer quasi-geostrophic model can be found in Fisher et al. (2011). The time step of the model is 1 hr. The model uses a rectangular grid of 64 × 32 with 64 zonal grid points and 32 meridional grid points and two vertical layers. The model is added to the DART platform, on which the experiments in this subsection are conducted.
For the perfect model, we set the upper- and lower-layer depths to D_1 = 6000 and D_2 = 5000, respectively. The data assimilation tests use an imperfect model with D_1 = 5500 and D_2 = 4500. The model variables consist of the stream function ψ (dimensionless) and the potential vorticity q (dimensionless). The zonal wind u (m/s) and the meridional wind v (m/s) are calculated from the stream function using the centred finite-difference method. For simplicity of calculation and programming, the model error covariance matrix Q is simplified into a diagonal matrix. Based on a 50-day model run, the spatially averaged model error standard deviations (σ_u, σ_v, σ_ψ and σ_q) are statistically estimated using climatological samples, and their squares form the diagonal of Q. For oceanic or atmospheric data assimilation, the model error covariance can be constructed through control variable transforms.
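A minimal sketch of how such a diagonal model error covariance might be estimated from climatological samples. The function names and the block structure (one spatially averaged σ per physical variable) are our assumptions, not code from the paper.

```python
import numpy as np

def diagonal_model_error_cov(samples):
    """samples: (n_samples, n_state) climatological model-error samples
    (e.g. differences from a long control run).
    Returns the diagonal covariance approximation Q = diag(sigma^2)."""
    sigma = samples.std(axis=0, ddof=1)
    return np.diag(sigma ** 2)

def spatially_averaged_sigma(samples, blocks):
    """blocks: dict mapping a variable name (u, v, psi, q) to its state
    indices; returns one sigma per variable, averaged over its grid points."""
    return {name: samples[:, idx].std(axis=0, ddof=1).mean()
            for name, idx in blocks.items()}
```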
The model is spun up for 500 time steps. To generate the initial particles, the model variables are perturbed and the model is run for another 200 time steps to restore the model dynamic balance. Artificial observations are generated every 12 hr at 20 randomly chosen positions per layer, and only the wind components u and v are observed, with Gaussian error N(0, 0.25). The observations are sparse in both time and space, which mimics real geophysical data assimilation systems. The data assimilation tests were performed for 1000 days, with the first 50 days discarded as the filter adjustment period. A relatively small ensemble of 20 particles is used, which introduces sampling errors.
Inspired by Poterjoy and Anderson (2016), we use three sets of experiments to test the LWEnKF, LEnKF and LPF. Each set of experiments uses a different observation operator: the linear operator h(x) = x and the nonlinear operators h(x) = |x| and h(x) = ln(|x − x̄|), where x̄ is the climatological mean of the true model variables.
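The three observation operators can be written as simple functions. The small eps guard in the logarithmic operator is our addition to avoid log(0) when a variable coincides with its climatological mean; it is not part of the paper's definition.

```python
import numpy as np

def h_linear(x):
    # linear observation operator h(x) = x
    return x

def h_abs(x):
    # nonlinear observation operator h(x) = |x|
    return np.abs(x)

def make_h_log(x_clim, eps=1e-8):
    """Nonlinear operator h(x) = ln(|x - x_clim|); eps guards log(0)
    (the guard is our addition, not from the paper)."""
    def h_log(x):
        return np.log(np.abs(x - x_clim) + eps)
    return h_log
```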
The RMSE and spread of the analysis particles of the three filters are shown in Table 1. The data in the table are time averages over days 50 to 1000 of the assimilation period. The LPF and the LWEnKF outperform the LEnKF in the experiments using the logarithmic observation operator, which demonstrates the advantage of the particle filters. The LWEnKF yields the lowest RMSE when using the linear and absolute-value observation operators. When the nonlinear operator h(x) = |x| is used, the analysis RMSE of the LPF is higher than that of the LEnKF, which is expected given the small number of particles and reflects a shortcoming of the LPF. However, the LWEnKF alleviates this disadvantage to some extent.
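For reference, the two diagnostics reported in such tables, the RMSE of the ensemble mean and the mean ensemble spread, are typically computed as below. This is a generic sketch with our own naming, not code from the paper.

```python
import numpy as np

def rmse_and_spread(ensemble, truth):
    """ensemble: (n_times, n_members, n_state); truth: (n_times, n_state).
    Returns the RMSE of the ensemble mean and the time/space-averaged
    ensemble spread (standard deviation across members)."""
    mean = ensemble.mean(axis=1)
    rmse = np.sqrt(((mean - truth) ** 2).mean())
    spread = ensemble.std(axis=1, ddof=1).mean()
    return rmse, spread
```

A well-calibrated ensemble has a spread close to the RMSE, which is the ratio examined in Figure 4.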
The assimilated observations are the meridional and zonal winds rather than the model variables themselves, the potential vorticity and the stream function. Comparing the RMSEs of the observed and unobserved model variables in Table 1 reveals the ability of the filters to estimate the correlations between multiple variables. For the potential vorticity and the stream function, the performance of the LWEnKF is comparable to its performance for the wind components, which indicates that, for multivariate models with complex relationships, the LWEnKF can sample in the high-probability region of the posterior PDF.
The following presents some results for the experiments using the nonlinear observation operator h(x) = |x|. Figure 7 shows the RMSEs of the LEnKF, LPF and LWEnKF at each analysis time step; the RMSE of the LWEnKF is the lowest at most time steps. Moreover, the RMSE time series of the observed wind components and of the unobserved potential vorticity and stream function show similar trends, which is consistent with the time-averaged RMSEs in Table 1.
The truth and the analyses of the three filters for the potential vorticity and the stream function at day 1000 are illustrated in Figure 8. Subtracting the truth from the analyses in Figure 8 yields Figure 9, where it can be clearly seen that the analysis field of the LWEnKF is the closest to the truth among the three data assimilation methods.

CONCLUSIONS
This paper has presented a localized weighted ensemble Kalman filter, which is a particle filter that combines the advantages of the LEnKF and the LPF and can be applied to high-dimensional systems without filter degeneracy. The LWEnKF first uses the LEnKF to assimilate the observations locally and serially, and then calculates the proposal weights based on the obtained proposal ensemble and all the observations at the current assimilation time step. Meanwhile, the local likelihood weights are calculated serially. Finally, the merging steps are performed according to the product of the likelihood weights and the proposal weights; that is, the particles are updated by a linear combination of the resampled particles and the proposal particles. Optionally, a probability mapping approach is used to adjust the higher moments of the ensemble to obtain the analysis particles.
The new method computes local likelihood weights in vector form. The likelihood weights together with the proposal weights constitute the total weights in vector form, which can limit the effect of the observations within the specified spatial domain. In the vicinity of the observations, the updated particles approximate the original WEnKF. Far from the observations, the particles only retain information from the LEnKF and the proposal weights.
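The idea of limiting the effect of observations in space can be sketched as follows: a per-observation likelihood is turned into a vector of localized weights via the Gaspari–Cohn localization function, using a Poterjoy (2016)-style tempering form l·(p − 1) + 1 so that weights tend to uniform far from an observation. This is an illustration under our own naming and is not the paper's exact formulation.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn (1999) fifth-order localization; r = distance / half-radius.
    Equals 1 at r = 0 and 0 for r >= 2."""
    r = np.abs(r)
    f = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    x = r[m1]
    f[m1] = -0.25 * x**5 + 0.5 * x**4 + 0.625 * x**3 - (5 / 3) * x**2 + 1.0
    x = r[m2]
    f[m2] = ((1 / 12) * x**5 - 0.5 * x**4 + 0.625 * x**3
             + (5 / 3) * x**2 - 5.0 * x + 4.0 - (2 / 3) / x)
    return f

def localized_weights(likelihoods, dist, half_radius):
    """likelihoods: (n_obs, n_particles) values of p(y_j | x_i);
    dist: (n_state, n_obs) distances from grid points to observations.
    Returns vector weights (n_state, n_particles), normalized per grid point.
    With the tempering l*(p - 1) + 1, a grid point far from every
    observation (l = 0) keeps uniform weights."""
    l = gaspari_cohn(dist / half_radius)                       # (n_state, n_obs)
    w = np.prod(l[:, :, None] * (likelihoods[None, :, :] - 1.0) + 1.0, axis=1)
    return w / w.sum(axis=1, keepdims=True)
```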
Using the L96 model, we tested the effectiveness of the LWEnKF and compared it with that of the LPF, LEnKF and LEnKPF. In the experiments with a mildly nonlinear configuration, the LWEnKF is always significantly superior to the LPF and is superior to the LEnKPF when the ensemble size is relatively small. However, the LWEnKF needs a large ensemble size to be comparable to the LEnKF. The LEnKPF shows significant advantages over the LWEnKF when the ensemble size is large.
In the experiments with a nonlinear/non-Gaussian configuration, we set up different observation operators and different ensemble sizes. We find that the LEnKF is more efficient than the LPF when using the linear observation operator; more efficient here means that fewer particles are needed to achieve the same assimilation effect (measured by the RMSE). When the nonlinearity of the observation operator is strong, the LPF provides better results than the LEnKF when an acceptable number of particles is adopted, which is consistent with the conclusions in Poterjoy (2016). In both the linear and nonlinear observation operator cases, the LWEnKF combines the advantages of the LEnKF and the LPF and achieves similar or better assimilation efficiency. When the number of particles is small, the LWEnKF has an obvious advantage over the LEnKPF, LEnKF and LPF. This indicates the potential of the LWEnKF for realistic applications, since atmospheric or oceanic data assimilation systems can afford only a few particles due to limited computing resources. Moreover, the KDDM step can improve the relationship between the RMSE and the spread of the analysis ensemble, thereby improving the quality of the posterior ensemble.
Furthermore, the new filter is tested in a sufficiently realistic two-layer quasi-geostrophic model. The experimental results show that the LWEnKF can effectively avoid the filter degeneracy problem in high-dimensional systems. When the observation operator, the model and the number of particles are the same as in the LPF and LEnKF, the analysis fields obtained by the LWEnKF are the closest to the truth. This result reflects the efficiency of the new approach and its potential for application in NWP models.
The LWEnKF has been added to DART, where it can easily be combined with various models and readily compared with other data assimilation methods. However, the proposal weights are calculated using the model error covariance matrix and may be highly degenerate in a realistic geophysical model. Although weight inflation can mitigate this degeneracy, it is an artificial technique: if the inflation coefficient is set to zero, the proposal weights, and hence the unlocalized total weights of the LWEnKF, become equal weights, which causes the particles to represent the PDF incorrectly. To eliminate this degeneracy, the proposal weights could also be localized, and relevant research is under way.

ACKNOWLEDGEMENTS
The data used in the observation system simulation experiments are included within the paper. Thanks are extended to Dr. Zheqi Shen for providing the codes for the LPF with the L96 model used here. The codes for the LEnKPF are modified according to https://github.com/robertsy/assimilr (accessed 31 October 2019; Robert and Kunsch 2017). This paper is supported by the National Key R&D Program of China (2018YFC1406202), the National Natural Science Foundation of China (NSFC, grant no. 41675097), and the Hunan Provincial Innovation Foundation For Postgraduate (no. CX2017B034).

ORCID
Yan Chen https://orcid.org/0000-0003-2066-6064

APPENDIX

A. DERIVATION OF THE PROPOSAL WEIGHTS

The forecast model is

x_i^f = M(x_i^a) + η_i, (A1)
η_i ∼ N(0, Q), (A2)

where M(⋅) is the nonlinear deterministic forecast model and the model error η_i is Gaussian with zero mean and covariance matrix Q. The equations for the localized stochastic EnKF (Houtekamer and Mitchell, 2005) are

x_i^u = x_i^f + K_e (y + ε_i − h(x_i^f)), ε_i ∼ N(0, R), (A3)

where h(⋅) is the nonlinear observation operator, x_i^u denotes the updated particle and the observation error is denoted as ε_i. The Kalman gain matrix K_e is calculated from the ensemble and localized via the localization function l.
Considering that the model error is small relative to the model variables, there is the approximation

x_i^u ≈ M(x_i^a) − K_e h(M(x_i^a)) + K_e (y + ε_i) + (I − K_e H) η_i, (A4)

where H is the tangent linear observation operator corresponding to h. The right-hand side of Equation (A4) can be divided into two parts: the deterministic part M(x_i^a) − K_e h(M(x_i^a)) + K_e y, and the remaining stochastic part (I − K_e H) η_i + K_e ε_i. If the model error is not related to the observation error, the covariance matrix of the stochastic part is

Σ = (I − K_e H) Q (I − K_e H)^T + K_e R K_e^T.
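The covariance of the stochastic part can be sanity-checked by Monte Carlo: for independent Gaussian model and observation errors, the empirical covariance of (I − K_eH)η + K_eε should match the stated formula. The matrices Q, R, K and H below are random stand-ins, purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200_000

# random SPD stand-ins for the model- and observation-error covariances
A = rng.normal(size=(n, n)); Q = A @ A.T + n * np.eye(n)
B = rng.normal(size=(m, m)); R = B @ B.T + m * np.eye(m)
K = rng.normal(size=(n, m)) * 0.3   # stand-in localized Kalman gain
H = rng.normal(size=(m, n))         # stand-in tangent linear operator

IKH = np.eye(n) - K @ H
Sigma = IKH @ Q @ IKH.T + K @ R @ K.T   # covariance of the stochastic part

eta = rng.multivariate_normal(np.zeros(n), Q, size=N)   # model error draws
eps = rng.multivariate_normal(np.zeros(m), R, size=N)   # observation error draws
stoch = eta @ IKH.T + eps @ K.T                         # (I - K H) eta + K eps
Sigma_mc = np.cov(stoch.T)                              # empirical covariance
```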
Then, the denominator of the proposal weight (4) is the proposal density

q(x_i^u | x_i^a, y) = N(x_i^u; M(x_i^a) − K_e h(M(x_i^a)) + K_e y, Σ).

Finally, it is easy to derive the transition density related to the model Equations (A1) and (A2), which is also the numerator of the proposal weight:

p(x_i^u | x_i^a) = N(x_i^u; M(x_i^a), Q).

B. DERIVATION OF MERGING EQUATIONS
The purpose of this appendix is to derive the update equations for the kth model variable. The derivation is similar to those of Poterjoy (2016) and Poterjoy et al. (2019), which we include here for the completeness and clarity of the paper. Assume that the kth model variable of particle i is updated through

x_{k,i}^j = x̄_k^j + r_1 (x_{k,s(i)}^{j−1} − x̄_k^j) + r_2 (x_{k,i}^{j−1} − x̄_k^j), (B1)

where s(i) denotes the particle indices obtained by sampling with the total weights, and r_1 and r_2 are coefficients to be determined. Let the mean of the updated model variable be equal to the weighted mean of the proposal particles:

(1/N) Σ_i x_{k,i}^j = x̄_k^j. (B2)
In Equation (B2), the mean of the updated particles is approximated by the weighted mean of the particles before sampling. Here, x̄_k^j is the mean of the particles with total weights w_{k,i}^j, and x̄_k^{j−1} represents the mean of the particles with total weights w_{k,i}^{j−1}. Then, letting the variance of the updated model variable equal the weighted variance of the proposal particles, we obtain Equation (B3), whose positive solution can be derived as in Equation (B4). The updated particles can thus represent the proposal particles with vector total weights after each observation is assimilated.
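The mean- and variance-matching constraints can be illustrated numerically in a simplified special case: resample one variable with its total weights, then rescale deterministically so that the updated particles match the weighted mean and variance exactly. This drops the proposal-anchored term of the full merging step and is an illustration under our own assumptions, not the paper's exact update.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
x = rng.normal(size=N)                   # proposal particles for one variable
w = np.exp(-0.5 * (x - 1.0) ** 2)        # illustrative total weights
w /= w.sum()

mean_w = np.sum(w * x)                   # weighted mean of the proposal particles
var_w = np.sum(w * (x - mean_w) ** 2)    # weighted variance

# resample with the total weights, then centre and rescale so that the
# updated particles have exactly the weighted mean and variance
s = rng.choice(x, size=N, replace=True, p=w)
u = mean_w + np.sqrt(var_w / s.var()) * (s - s.mean())
```

The full merging step instead combines the resampled and proposal particles with the coefficients r_1 and r_2 determined by the same two constraints.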