Weight structure of the Local Ensemble Transform Kalman Filter: A case with an intermediate atmospheric general circulation model

The Local Ensemble Transform Kalman Filter (LETKF) computes analysis by using a weighted average of the first‐guess ensemble with surrounding observations within a localization cut‐off radius. Since overlapped observations are assimilated at neighbouring grid points, the LETKF results in spatially smooth weights. This study explores the spatial structure of the weights with the intermediate atmospheric model SPEEDY (Simplified Parameterizations, Primitive Equation Dynamics). Based on the characteristics of the weight structure, we also aim to improve the weight interpolation (WI) method, which we use to compute the weights at coarser reference points and interpolate the weights into higher‐resolution model grid points. The results show that larger localization and sparser observations result in spatially smoother weights. WI is less detrimental when weight patterns are spatially smoother. An advanced WI method with observation‐density‐dependent reference points results in better forecasts than those with uniformly distributed reference points. This improvement may be due to the spatially inhomogeneous localization function realized by the WI method with observation‐density‐dependent reference points. The spatial distribution of the optimal localization scales shows that larger (smaller) localization is beneficial in sparsely (densely) observed regions. The WI method is computationally more efficient with larger ensembles since the additional computational cost for the WI is lower than that for the LETKF.


INTRODUCTION
Recent advancements in high-performance computing (HPC) systems have enabled numerical weather prediction (NWP) at high spatial resolutions, such as the 870-m mesh global atmospheric simulation (Miyamoto et al., 2013). In addition, various types of satellites have been able to provide dense atmospheric observations due to advances in sensing technologies. These high-resolution NWP models and dense observations have rapidly increased the computational costs of data assimilation (DA), which seeks the optimal combination of NWP forecasts and massive observations. Therefore, computationally efficient DA methods are very important for current NWP systems. Ensemble DA is a computationally feasible method for parallelized supercomputers because ensemble forecasts are essentially parallel. The ensemble Kalman filter (EnKF; Evensen, 1994) is a common ensemble DA method for NWP. For high-dimensional NWP systems, the generally affordable ensemble size O (10 2 -10 3 ) is much smaller than the degrees of freedom >O (10 6 ) of the NWP models. However, the atmospheric dynamics present local low dimensionality (Patil et al., 2001;Oczkowski et al., 2005), so that practical EnKFs for high-dimensional systems use a localization technique. Among the various types of local EnKF, the Local Ensemble Transform Kalman Filter (LETKF; Hunt et al., 2007) is an efficient DA method for parallel computations. The LETKF has been used in operational NWP systems by the Deutscher Wetterdienst (DWD) and the Japan Meteorological Agency (JMA). In addition, recent studies have shown that the LETKF can be used to optimize tuneable parameters of NWP models (Ruiz et al., 2013;Ruiz and Pulido, 2015;Kotsuki et al., 2018Kotsuki et al., , 2020. The LETKF solves the analysis update equations of the Ensemble Transform Kalman Filter (ETKF; Bishop et al., 2001) at every model grid point independently. The ETKF computes the transform matrix, which consists of the weight vector for ensemble mean updates and the weight matrix for ensemble perturbation updates (cf. Section 1). Previous studies with the LETKF explored methods to reduce the computational costs. For example, Yang et al. (2009) proposed the weight interpolation (WI) method, in which the weight vector and the matrix (or simply weights) are computed at coarser model grid points and the weights interpolated to obtain analysis increments at higher-resolution model grid points. The WI method accelerates the computation because the spatial interpolation is generally cheaper than computing the weights. Yang et al. (2009) also performed a series of observing system simulation experiments (OSSEs) using a quasi-geostrophic model to demonstrate that the WI method has a clear advantage as compared with interpolating analysis increments. This WI technique is used operationally in DWD's global and regional LETKF systems (Roland Potthast, personal communication, 6th March 2016). Potthast et al. (2019) also applied the WI method to develop a local particle filter.
The LETKF computes the weight vector and matrix at every model grid point by assimilating surrounding observations within a prescribed localization cut-off radius. Since overlapped observations are assimilated at neighbouring grid points, the weights are expected to vary smoothly in space. With the quasi-geostrophic model, Yang et al. (2009) demonstrated that the weights actually varied in space more smoothly than the analysis increments.
This study aims to explore the spatial structure of the weights, or simply the weight structure, of the LETKF using an intermediate atmospheric global circulation model known as the SPEEDY (Simplified Parameterizations, Primitive Equation Dynamics) model (Molteni, 2003). The weight structure would be sensitive to factors such as observing density, ensemble size and localization scales. We perform a series of OSSEs using the SPEEDY-LETKF system to investigate the weight structure systematically with those factors. Based on the characteristics of the weight structure, we then aim to explore better WI methods. The weights tend to be smoother in space with sparser observations, and we could adaptively choose the density of the grid points where we compute the weights. This study will propose an efficient WI method by considering the spatial smoothness of the weights due to the spatial distribution of observations. This paper is organized as follows. Section 2 describes the LETKF and the methodology for investigating the weight structure. Section 3 presents results of the spatial patterns of weights and the WI experiments. Section 5 provides a discussion, which is followed by a summary in Section 6.

Local Ensemble Transform Kalman Filter (LETKF)
The LETKF, a computationally efficient version of the local EnKF (Ott et al., 2004), has been used in a number of studies on real-world atmospheric DA (e.g., Terasaki et al., 2015Terasaki et al., , 2019Kotsuki et al., 2017aKotsuki et al., , 2017bKotsuki et al., , 2019aKotsuki et al., , 2019c. Here we briefly introduce the LETKF based on the standard Kalman filter (Figure 1). The Kalman filter F I G U R E 1 Schematic of transformations/projections of error covariance matrices between model, observation and ensemble spaces in the Ensemble Transform Kalman Filter (ETKF). Black, blue and red circles denote the model, observation and ensemble spaces, respectively. The standard Kalman filter solves analysis equations between the model and observation spaces (Equations 3 and 4). The ETKF solves analysis equations between the ensemble and observation spaces (Equations 8 and 9) equation is given by where x is the model state vector (∈R n ), y is the observation vector (∈R p ), H is the linear observation operator matrix (∈R p × n ), R is the observation error covariance matrix (∈R p × p ), K is the Kalman gain matrix (∈R n × p ), P is the error covariance matrix (∈R n × n ), and I is the identity matrix. The superscripts b, a and o represent the background (prior), analysis (posterior) and observation, respectively. The number of model state variables and the number of observations are denoted by n and p, respectively. Assume that we have an ensemble with an ensemble mean vector and a perturbation matrix given by x (∈R n ) and where m is the ensemble size. The overbar indicates the ensemble mean. The EnKF approximates the error covariance matrix by sample estimates using the ensemble perturbations: With this approximation, the rank of P is equal to the rank of X, which is at most m − 1 (Hunt et al., 2007). Bishop et al. (2001) proposed an ETKF that considers the m-dimensional subspace (ensemble space) spanned by the background perturbation matrix X b . The ETKF transforms the background and analysis error covariances from ensemble space to model space: whereP is the error covariance matrix in the ensemble space (∈R m × m ). The tilde indicates the ensemble space. The error covariance matrices in the ensemble space can be projected onto the observation space by the background ensemble perturbation matrix in the observation space Using the equations above, the analysis update equations of the ETKF are given as where w (∈R m ) and W (∈R m × m ) are the ensemble transformation weights for the ensemble mean and the perturbation updates, respectively. In practical EnKF applications, the observation errors are usually assumed to be uncorrelated (i.e., R is diagonal). Additionally, covariance localization is necessary to remove long-range erroneous correlations and increase the rank of the background error covariance P b (Hamill et al., 2001;Houtekamer and Mitchell, 2001;Hunt et al., 2007).
The localization of the LETKF is realized by inflating the observation error variance of R for distant observations from the analysis model grid point (Hunt et al., 2007;Miyoshi et al., 2007). With the localization, the LETKF solves the update equations (Equations 10 and 12) at every model grid point by assimilating the surrounding observations within its localization cut-off radius. Solving the eigenvalue decomposition of (P a ) −1 is one of the most computationally expensive parts of the LETKF.

Properties of weights in the LETKF
The LETKF updates the ensemble mean and the perturbation using ensemble transform weights w and W. Here we describe why weights w and W vary in space smoothly in the LETKF. With the LETKF, an observation is assimilated at the model grid points i and j if the observation is included within the localization cut-off radii of the grid points i and j (Figure 2a). If the grid points i and j are close to each other, they share the same observations within the localization cut-off radii due to the largely overlapped localization areas. Therefore, the analysis error covariances in the ensemble space (P a in Equation 9) at neighbouring grid points are similar, since the corresponding local Y b and R are similar (Yang et al., 2009). Consequently, Equation 10 yields similar w at neighbouring grid points, resulting in spatially smooth variation of w.
In the EnKF, there are many possible choices of analysis perturbation X a that satisfy Equation 5 (e.g., Tippett et al., 2003). The LETKF generates analysis ensemble perturbations by (Equation 12), but the matrix W is not uniquely determined. Since the LETKF analyses every grid point independently, the smooth transition of W in neighbouring grid points is essential. Therefore, the LETKF uses the unique symmetric square root matrix, which ensures spatially smooth transition of W (Hunt et al., 2007). Hunt et al. (2007) pointed out that the symmetric square root matrix minimizes the mean square distance between W and the identity matrix, so that the analysis ensemble perturbations are consistent with the background ensemble perturbations given the constraint of analysis error covariance (cf. Section 2.3.3 of Hunt et al., 2007).

Measure of the spatial smoothness of weights
This subsection describes how we investigate the weight structures of the LETKF. Here we assume physical models that have spatial dimensions, such as atmospheric and oceanic models. This study focuses on the structures of the weight vector w that updates the ensemble mean. To measure the similarity of weights between two model grid points i and j (Figure 2a), we introduce the correlation of the weights at i and j (cw i,j ) given by where s_time and e_time denote the start and end times of the sampling period. The superscript sample represents the sampled weight vector. We also introduce the spatial smoothness of the weights (sw; km 2 ) given by where S j is the area of the jth model grid (km 2 ) and param is a tuneable parameter for computing sw. In this study, param = 0.6 (cf. Section 4.1). As shown later in Figure 6, sw can illustrate the spatial distribution of the smoothness of weights. Larger sw represents spatially smoother weights.
This simple measure sw is sufficient for the present purpose of having a qualitative measure of the spatial smoothness (cf. Section 5.1). A rigorous discussion on the measure of smoothness is beyond the scope of this study. Yang et al. (2009) proposed WI for the LETKF following a suggestion by Bowler (2006); see Section 6.4. Section 2.2 described how the LETKF yields the spatially smoother weights w and W. Based on this property, WI performs local analyses of the LETKF on coarser resolution grid points and distributes the information on the error statistics to the high-resolution grids through the interpolation of weights. Figure 2b,c compares the standard LETKF and the LETKF with WI. After an ensemble forecast, the background ensemble is available at all grid points, denoted by cross marks and circles. Cross marks denote the grid points where the LETKF analysis equations are explicitly solved. The standard LETKF solves the analysis equations (Equations 10 and 12) at all analysis grid points ( Figure 2b). In contrast, with WI we solve the LETKF analysis equations at coarser reference points (Figure 2c, cross marks) and obtain all components of w (1 × m) and W (m × m) by spatial interpolation to the high-resolution grid points (Figure 2c, circles). This interpolation is much simpler and faster than computing the eigenvalue decomposition of (P a ) −1 . Following Yang et al. (2009), we use the bilinear interpolation for WI. Once the weights w and W are computed, we can obtain the analysis ensemble using Equations 10 and 12. In this study, the reference points are selected from model grid points ( Figure 2c). However, the coarser analysis grids can be chosen independently of the model grid points (e.g., Potthast et al., 2019). Figure 2d compares the workflows for a single DA cycle both with and without WI. A single DA cycle has three components: an ensemble model forecast, a simulation-to-observation conversion, and DA. To implement WI, we need only modify the DA component. The detailed experimental settings with SPEEDY-LETKF are described in the next section.

SPEEDY-LETKF
The SPEEDY model is an intermediate global atmospheric model with primitive equations designed for climate simulations (Molteni, 2003). There are 96 × 48 grid points horizontally (T30) and seven vertical layers using the sigma coordinate system. Distances between zonally neighbouring grids are approximately 417 km at the Equator, 361 km at 30 • N and 30 • S, and 208 km at 60 • N and 60 • S. The prognostic variables consist of temperature, specific humidity, zonal wind, meridional wind at the seven layers, and surface pressure. Although running SPEEDY is computationally inexpensive, SPEEDY contains fundamental physical parameterization schemes including cloud, condensation, convection, radiation and surface flux parameterizations. This study uses a DA system that incorporates the LETKF with SPEEDY (referred to as SPEEDY−LETKF).

Experimental design
We perform a series of OSSEs with SPEEDY−LETKF under the perfect model assumption. We first perform the nature run initialized at 0000 UTC on January 1, 1981 by the standard atmosphere at rest. The first year is considered to be a spin-up period, so DA experiments are started at 0000 UTC on January 1, 1982. The observation data are produced by adding uncorrelated Gaussian random numbers to the nature run every 6 hr. The observation error standard deviations (SDs) are 1.0 K for temperature, 1.0 m⋅s −1 for zonal and meridional winds, 1.0 g⋅kg −1 for specific humidity, and 1 hPa for surface pressure. At the observing stations, temperature and winds are observed at all seven layers, and specific humidity is observed at layers 1-4. These experimental settings are determined following previous SPEEDY−LETKF experiments (Greybush et al., 2011;Miyoshi et al., 2014;Miyoshi, 2016, 2019). This study considers three different observing networks ( Figure 3): regularly distributed dense stations at every 2 × 2 model grid point (REG2), regularly distributed sparse stations at every 4 × 4 model grid point (REG4) and radiosonde-like stations (RAOB). The number of observing stations is 1,008 for REG2, 264 for REG4 and 415 for RAOB. All observation data are produced from the same nature run. The ensemble is composed of 16, 32 or 64 members, with initial conditions taken from the nature run on consecutive dates after 0000 UTC on February 1, 1982.
For the LETKF, the adaptive multiplicative inflation method of Miyoshi (2011) is used for covariance inflation. For horizontal and vertical covariance localization, we use the Gaussian-based functions given by  (Miyoshi et al., 2007). When the horizontal localization scale h is, for example, 1,000 km, the localization cut-off radius becomes approximately 3,651 km. The vertical localization scale v is fixed at 0.1 log(Pa) so that an observation at one of the seven model layers has a negligible impact on the other model layers (Greybush et al., 2011;Kondo et al., 2013). For the horizontal localization scale, several experiments with different values will be shown. Hereafter, the horizontal localization scale is simply referred to as the localization scale. Optimal localization scales were estimated manually prior to experiments so that the first-guess root mean square error (RMSE) of the fourth-layer temperature was minimized (cf. Appendix A). This study interpolates the weights only horizontally, not vertically. We solve the LETKF only at reference points, and the weights w and W are bilinearly interpolated into the remaining grid points from the surrounding four reference points comprising a rectangle. We consider uniformly referenced grid points first (cf. Figure 7a−c; Section 4.2) and inhomogeneously referenced grid points later (cf. Figure 7d−f; Section 5.1). At the northernmost and southernmost model grid points, only zonal (i.e., east-west) WIs are performed to avoid complex WIs beyond the Poles. The following Sections 4 and 5 discuss the results with different ensemble sizes, observing networks and localization scales. Hereafter, those experiments are denoted by M{ensemble size}{observing network}_L{horizontal localization scale}km, where the {ensemble size} is 16, 32 or 64, the {observing network} is REG2, REG4 or RAOB, and the {horizontal localization scale} is given as 0300-1400 (km). For example, M32REG4_L0700km represents the 32-member experiment with the REG4 observing network and the 700-km localization scale.

Spatial patterns of weights
Here we investigate systematically the spatial patterns of weights with different localization scales and observing networks. We perform 32-member LETKF experiments with the same first-guess ensemble produced by a 1-month spin-up DA cycle with the experimental setting of M32REG4_L0700km. Figure 4 compares the spatial patterns of weights of the first member (i.e., the first element of w) at the fourth model layer. Figure 4a−c shows the sensitivity to the localization scale with REG4. With h = 500 km (Figure 4a), we find isolated circular patterns around observing stations at lower latitudes (30 • S and 30 • N). This suggests that the weight is mainly dominated by the nearest observations from the model grid points for sparser observations with shorter localization scales. At higher latitudes we find spatially connected weight patterns, possibly because the observations are denser than at lower latitudes. As the localization scale increases, spatially connected weight patterns are more abundant (Figure 4b,c). As the same first-guess ensemble is used for DA, similar weight patterns are seen for h = 700 km and h = 1,000 km. With h = 1,000 km (Figure 4c), the LETKF produces the spatially smoothest pattern.
Figure 4d-f shows the sensitivity of the weight patterns to the observing network. Here, the localization scale is fixed at h = 700 km. Compared to REG4 and RAOB, the amplitude of weights tends to be stronger for REG2 because here are the densest observations (Figure 4d). With RAOB, we find spatially finer weight patterns over land, where denser observations are assimilated (Figure 4f). In contrast, we find isolated weight patterns over the ocean, where observations are sparsely distributed. Figure 4 helps us to understand how the spatial patterns of weights change depending on the localization scale and observing network. First, a larger localization scale results in spatially smoother weight patterns. Second, spatially finer weight patterns appear in densely observed regions. Finally, isolated weight patterns appear in sparsely observed regions. We further explore these tendencies quantitatively with the sw index defined in Section 2.3. Figure 5 compares the spatial patterns of the correlation of the weights (cw; Equation 14) at a model grid point near Japan. Two-month DA cycles are performed for three experiments, and cw is computed using samples from the second month (28 days in February = 112 DA cycles). We find higher cw in closer grids, suggesting that the LETKF produces similar weights in these grids. As the localization scale increases, the area with higher cw increases. This suggests that larger localization results in spatially smoother weights. Based on cw, the spatial smoothness of weights (sw; Equation 15) is computed at each model grid point. sw represents how far similar weight patterns spread spatially. Figure 6 compares sw for six experiments, as in Figure 4. Again, 2-month DA cycles are performed for the six experiments, and cw is computed using samples from the second month. For the REG4 experiments (Figures 6a-c), sw is larger in the tropics, where observations are relatively sparse compared to the midlatitudes. Also, sw is larger near the North and South Poles, where observations are sparser (cf. Figure 3c). This suggests that the weight pattern becomes smoother with sparser observations. As the localization scale increases, sw becomes larger (Figure 6a-c). We find smaller sw in regions of higher elevation, such as over the Tibetan Plateau, Greenland, the Rocky Mountains and the Andes Mountains. This occurs because sw is computed based on the weights at the fourth model layer. Since SPEEDY uses sigma vertical coordinates, the atmospheric pressure of the fourth model layer should be lower over higher mountains. In this study, the vertical localization scale is very small, so the weights over high mountains differ more from the surrounding grid points because the optimal ensemble weights at the fourth model layer show more spatial variations there.
Figure 6d-f compares the sensitivity of sw to the observing network. A comparison of REG2 and REG4 shows that a sparse observing network results in a spatially smoother weight pattern. This is also suggested by the spatially inhomogeneous sw pattern for RAOB (Figure 6f). Such patch patterns in sw are apparent in Figure 6f. These patch patterns are produced when relatively sparse observations are assimilated (cf. Appendix B).

WI experiments
This section explores the impacts of WI on the accuracy of the DA systems. Here we consider WI with spatially uniform reference points (Figure 7a-c). The reference points are distributed uniformly at every λ × λ model grid point (hereafter referred to as the uniform #λ experiment). The number of reference points for each experiment are summarized in Table 1. We perform DA cycles over 2 years from January 1982 to December 1983 and evaluate their accuracies over the preceding 23 months (699 days = 2,796 DA cycles). The ensemble mean first-guess temperature at the fourth model layer was verified relative to the nature run by the RMSE averaged over the globe: The subscript CTRL indicates the control experiment without WI, and TEST indicates the test experiment with WI. The superscript true refers to the true state from the nature run. All experiments use optimal localization scales tuned for every combination of ensemble size and observing network (cf. Appendix A). To test the robustness of the results, we repeated the experiments five times with different random numbers for observation errors. Figure 8 shows the relative changes in the first-guess RMSE due to WI for three ensemble cases with RAOB. As expected, WI degrades slightly (strongly) if denser (sparser) reference points are used. When m = 16, WI results in comparable RMSE to CTRL for uniform #2, moderate degradations of about 5% for uniform #3 and uniform #4, and significant degradations greater than 15% for uniform #5 and uniform #6. Compared to m = 16, sparser reference points have smaller degradation impacts on the first-guess accuracy for m = 64. With a larger ensemble, the optimized localization scale becomes longer, and the LETKF results in smoother weights. Therefore, WI would appear to be less detrimental when weights have spatially smoother patterns. Figure 9 shows the relative changes in the first-guess RMSE due to WI for three observing networks with m = 64. When densely observed (REG2), the first-guess accuracy is more sensitive to the number of reference points compared to the case with sparse observations (REG4). For REG4, the WI experiments show improvements with uniform #4. The reference points of uniform #4 essentially correspond to the observing stations of REG4, except in the polar regions. The results imply that it is better to set reference points at observed points. In the realistic case, it would be better to set reference points at densely observed points such F I G U R E 7 Reference grid points for WI experiments: (a) uniform #2, (b) uniform #3, (c) uniform #4, (d) regulated #2, (e) regulated #3 and (f) regulated #4. Small black dots represent SPEEDY model grid points. Green, blue and red cross marks represent reference grid points. Green, blue and red cross marks in panels (d−f) represent reference points selected by the boundary, observing points and filling steps (cf. The series of WI experiments provide two insights to consider more sophisticated WI. First, less detrimental impacts by WI are observed in sparsely observed experiments. Second, setting the reference points at the observing points seems beneficial. Therefore, we will investigate an additional WI method with observationstation-dependent reference points in the next section. Specifically, the additional experiments allocate reference points inhomogeneously, not uniformly.

Regulating reference points for WI
Here we consider spatially varying reference points for WI in RAOB experiments. We compare WI experiments with uniform and inhomogeneous references with the same number of reference points. Hereafter, WI experiments with regulated reference points are referred to as the regulated #λ experiment, which uses the same number of reference points as the uniform #λ experiment. We set regulated reference points for regulated #2 to regulated #6 by following three steps:  regulated #4, #5 and #6, reduce the reference points in densely observed regions. The reference points were selected so that there is only one reference point within the n search × n search SPEEDY grid points (Table 1). 3 Filling. The filling process selects reference grid points where sparse reference points exist. At all remaining grid points not selected as reference points, distances relative to the nearest reference points d max are computed. Then, the grid point with the largest d max is selected as an additional reference point. The filling process selects the reference points sequentially until the total number of reference points is equal to that of the uniform #λ experiment (red cross marks in Figure 7d−f).
Consequently, the regulated #λ experiment uses the same number of reference points as the uniform #λ for WI. Figure 7d−f shows the reference points for regulated #2, regulated #3 and regulated #4. We use these regulated reference points to perform the WI experiments. Figure 10 shows the changes in first-guess accuracy due to WI for RAOB observations with m = 64. As expected, the WI experiments with regulated references outperform those with uniform references, except for #6. In addition, regulated #2, #3 and #4 result in smaller RMSE than CTRL. Therefore, WI with regulated reference points improved accuracy with reduced computational costs. Figure 11 compares spatial patterns of relative change in the first-guess RMSE for the uniform #2, #3 and #4 and regulated #2, #3 and #4 experiments for RAOB observations with m = 64. With uniform reference points, degradation patterns appear mainly over land, such as over Eurasia, North America and South America, for uniform #3 and uniform #4. In such regions, the weights are similar in spatially finer scales (Figure 6f). Therefore, WI with sparse reference points would seem to degrade the analyses over land. In contrast, beneficial impacts are observed over the Pacific Ocean and Indian Ocean for uniform #2 and uniform #3. With regulated references, degraded impacts over land are less significant for #2, #3 and #4 because of the density of the reference points. In addition, WI shows beneficial impacts over the Pacific and Indian Oceans. In particular, regulated #3 shows improvement relative to CTRL over the globe.
Because of the interpolations of weights, WI would result in spatially smoother weights similar to the larger localization. WI would have an additional effect similar to increasing the localization scale in addition to reducing computational costs. Because WI shows beneficial impacts over the Pacific Ocean, the optimal localization scale may be larger in sparsely observed regions. In the next subsection, we will investigate whether optimal localization scales vary in space when the observing network is spatially inhomogeneous.

Impacts of spatially varying localization scale
Here we investigate a series of experiments for optimizing localization scales (Appendix A). To investigate the optimal localization scale spatially, we evaluate the first-guess accuracy locally (Kotsuki et al., 2019b). We introduce the local RMSE, defined by where D i is the local domain of the grid point i and is set as the surrounding grids within 1,000 km in this study. We simply estimate the optimal localization scales at every model grid point so that local RMSEs are minimized over the 5 months from January to June 1982. Figure 12 shows the spatial patterns of the optimal localization scale for m = 16, 32 and 64. As hypothesized, the optimal localization scales tend to be larger in sparsely observed regions (e.g., over the Pacific Ocean) and smaller in densely observed regions (e.g., over land). It is widely known that the optimal localization scale depends on the ensemble size and the NWP model resolution. This study demonstrates that observation density is also an important factor affecting the optimal localization scale. This is also suggested by the shorter optimized localization scales for REG2 relative to those for REG4 ( Figure A1). Hamrud et al. (2015) found that limiting the number of assimilated observations at analysis grid points improved forecast scores of the European Centre for Medium-Range Weather Forecasts (ECMWF)'s LETKF-based global NWP system. Similarly, Schraff et al. (2016) implemented an adaptive localization into the DWD's LETKF-based regional NWP system by limiting the number of assimilated observations; the localization radius is determined adaptively such that the sum over all localization-weighted observations (f of Equation 17) is equal to a constant tuning parameter. The adaptive localization of Schraff et al. (2016) realizes observation-density-dependent localization.
WI with regulated reference points can be an alternative candidate for observation-density-dependent adaptive localization. Optimizing the localization scale with the local RMSE is an alternative method, although it would be computationally expensive. Exploring better observation-density-dependent localization is thus an important topic for future research.

Computational time
Finally, we discuss the computational time for a single DA step. We performed 32-and 64-member RAOB experiments using 32 cores of a single-CPU machine equipped with an Intel Xeon Processor E5-4650v2 2.4 GHz and 512 GB of main memory. Mean times for the 6-hour SPEEDY ensemble forecast, simulation-to-observation conversion, and DA are 0.31, 0.16 and 4.04 (s) for the 32-member CTRL, and 0.45, 0.29 and 26.92 (s) for the 64-member CTRL, respectively. Namely, DA takes up most of the computational time in this experimental setting. Table 2 summarizes the computational times required for a single DA step, averaged over the initial 40 DA cycles. The results show that WI reduces the total computational time per DA step significantly. The most computationally expensive part of the LETKF is the eigenvalue decomposition of (P a ) −1 , whose computational cost is O(m 3 ). If we were to double the ensemble size from 32 to 64, we would expect an increase in computational time by a factor of 8. The actual computational time of the LETKF for the 64-member CTRL (26.07 sec) is about 7.24 times more than that for the 32-member CTRL (3.60 s), the factor being close to 8. All components of w (1 × m) and W (m × m) are interpolated in the WI experiment, and the computational cost for the interpolation is about O(m 2 ). Namely, with twice as many ensemble members, we would expect four times more computations for WI. The actual computational time of WI for the 64-member uniform #2 (5.33 s) is 3.53 times higher than that for the 32-member uniform #2 (1.51 s), the factor being close to 4. Therefore, WI is more computationally efficient for a larger ensemble. If we have more observations, matrix multiplications in the LETKF such as (Y b ) T R −1 Y b become more computationally expensive. Because we allocate more reference points in densely observed regions, computational costs with regulated reference points are slightly higher than with uniform reference points. However, WI results in significant computational reductions with respect to CTRL even with the regulated reference points. Note. "Other" includes MPI initialization/finalization and data input/output. The computational time should be the same for a given ensemble size.

SUMMARY
The LETKF computes the analysis update equations by assimilating surrounding observations within a localization cut-off radius. The analysis is given by a weighted average of the first-guess ensemble. Since overlapped observations are assimilated at neighbouring grid points, the LETKF results in spatially smooth weights. Yang et al. (2009) proposed the weight interpolation (WI) to accelerate the LETKF computations. This study aimed to explore the weight structure of the LETKF with the intermediate atmospheric model SPEEDY and to improve the WI method based on the characteristics of the weight structure. We performed a series of SPEEDY-LETKF experiments and reached the following conclusions: 1. Larger localization and sparser observations resulted in spatially smoother weights. 2. WI was less detrimental for larger localization scales and in regions where sparser observations were assimilated. This is to say that WI is less detrimental when weight patterns are spatially smoother. In contrast, WI was more detrimental in densely observed regions. 3. An advanced WI method that allocates reference points inhomogeneously was promising. WI experiments with observation-density-dependent reference points resulted in better forecasts than those with uniformly distributed reference points. This improvement may have occurred because of the spatially inhomogeneous localization function realized by WI with observation-density-dependent reference points. 4. The spatial distribution of the optimal localization scale showed that larger (smaller) localization was beneficial in sparsely (densely) observed regions. 5. WI is more computationally efficient for a larger ensemble since the additional computations for WI of O(m 2 ) are smaller than those for the LETKF of O(m 3 ).
Reducing the computational costs of DA is critical, especially for operational NWP systems at high resolutions with massive observations. With an intermediate atmospheric general circulation model, this study demonstrated that WI with observation-densitydependent reference points was a useful method for reducing computational costs without a significant degradation of accuracy. An important future study would be to implement and test observation-density-dependent WI with more realistic DA systems. In addition, exploring more sophisticated observation-density-dependent localization is an important direction for future research.
This study considered only the spatially homogeneous Gaussian-based localization function (Equation 17), which is a common choice for the EnKF in NWP. Recent studies have proposed spatially heterogeneous localization functions (Anderson, 2007;Ménétrier et al., 2015aMénétrier et al., , 2015bSchraff et al., 2016). With such alternative choices, the weight structure of the LETKF would be different, as would the optimal allocation of the reference points.

APPENDIX A: OPTIMAL LOCALIZATION SCALES
This study used brute-force manual optimizations for selecting optimal localization scales. We performed a series of SPEEDY-LETKF experiments with different localization scales and optimized horizontal localization scales so that the first-guess RMSE of the fourth layer temperature was minimized. We performed 6-month DA cycles and used the last 5-month period (February-June 1982) for the optimizations. The optimal localization scales increased with increasing ensemble size ( Figure A1). Among the three observing networks, the optimal localization scale is smallest for REG2 and largest for RAOB. This implies that the optimal localization scale would be smaller (larger) for densely (sparsely) observed experiments.

APPENDIX B: PATCH PATTERNS IN SPATIAL SMOOTHNESS OF WEIGHTS
In this appendix we explain the patch patterns in spatial smoothness of weights (sw) in Figure 6. For this purpose, spatial patterns of correlation in weight (cw) are shown for a sparsely observed region over the Pacific Ocean ( Figure B1). The four panels (a−d) show the cw at the centred grid point indicated by black triangles. Here the centred grid points of panels (a−d) are simply referred to as Grids A−D. The observations at grid points X, Y and Z are also referred to as Observations X, Y and Z.
The LETKF provides similar weights at two grid points when assimilating overlapped observations. In this example, Observations X and Z are assimilated at Grid A, Observations X and Y are assimilated at Grids B and C, and Observations X and Y are assimilated at Grid D. Because of the localization function (Equation 17), distant observations have less impact on the weight. Therefore, the weights are determined mainly by the assimilations of Observation X at Grids A and B. Consequently, LETKF results in similar weights at Grids A and B. In contrast, Observations X have less impact on weights at Grid D. Therefore, the correlation pattern cw at Grid D is dissimilar to that at Grids A and B. Both Observations X and Y have significant impacts on weights at Grid C. This example demonstrates that the weight pattern is domi-nated by the nearest observations in sparsely observed regions. This leads to the production of patch patterns in sw (Figure 6f). The mosaic patterns of the REG4 experiments (Figures 6a,b) would also be produced by the same mechanism. The patch patterns would not be observed with realistic LETKF-based NWP systems that assimilate much denser observations, such as those from satellites and aircraft.

F I G U R E A1
Time-mean first-guess root mean square errors (RMSEs) of the REG2, REG4 and RAOB experiments for temperature (K) at 500 hPa averaged over February-June 1982. Panels (a-c) show experiments with ensemble sizes of 16, 32 and 64, respectively. The abscissa shows the names of the observing networks. For each experiment, 12 localization scales (300-1,400 km at 100-km increments) were tested. Optimal localization scales are presented in the panels (a) (b) (c) (d)

F I G U R E B1
Spatial patterns of correlation in weight at the fourth model layer for M32RAOB_L0800km sampled over February 1982. Correlations from the centred grid point (black triangles) are shown for each panel. Black lines, black dots and black cross marks show the edges of localization, model grid points and observing points, respectively