Partial analysis increments as diagnostic for LETKF data assimilation systems

Local ensemble transform Kalman filters (LETKFs) allow explicit calculation of the Kalman gain, and thereby of the contribution of individual observations to the analysis field. Although this is a known feature, the information on the analysis contribution of individual observations (partial analysis increment) has not yet been used as a systematic diagnostic, despite providing valuable information. In this study, we demonstrate three potential applications based on partial analysis increments in the regional modelling system of Deutscher Wetterdienst and propose their use for optimising LETKF data assimilation systems, in particular with respect to satellite data assimilation and localisation. While exact calculation of partial analysis increments would require saving the large, five‐dimensional ensemble weight matrix in the analysis step, it is possible to compute an approximation from standard LETKF output. We calculate the Kalman gain based on ensemble analysis perturbations, which is an approximation in the case of localisation. However, this only introduces minor errors, as the localisation function changes very gradually among nearby grid points. On the other hand, the influence of observations always depends on the presence of other observations and on the settings for the observation error and for localisation. Nevertheless, the influence of observations behaves approximately linearly: the assimilation of other observations primarily decreases the magnitude of the influence, but it does not change the overall structure of the partial analysis increments. This means that the calculation of partial analysis increments can be used as an efficient diagnostic to investigate the three‐dimensional influence of observations in the assimilation system. 
Furthermore, the diagnostic can be used to detect whether the influence of additional experimental observations is in accordance with other observations without conducting computationally expensive single‐observation experiments. Last but not least, the calculation can be used to approximate the influence an observation would have when applying different assimilation settings.


INTRODUCTION
Ensemble data assimilation systems such as the local ensemble transform Kalman filter (LETKF; Hunt et al., 2007) have become a well-established approach for regional, convection-permitting numerical weather prediction (NWP) models, as they are computationally efficient and include flow-dependent estimates of error covariances. This is, for example, reflected in the operational implementation of an LETKF data assimilation system in the regional NWP system of Deutscher Wetterdienst in 2017 (Schraff et al., 2016). Due to computational constraints, however, the ensemble size is usually limited to about 20-250 members, which introduces spurious correlations and the need for covariance localisation (Necker et al., 2020a; Necker et al., 2020b). Furthermore, the LETKF computes the analysis locally, with localisation applied in observation space, which introduces difficulties for the assimilation of non-local satellite observations that provide vertically integrated information on atmospheric constituents emitting or scattering radiation. Nonlinearity, non-Gaussianity, systematic model deficiencies in the representation of hydrometeors and their radiative properties (Geiss et al., 2021), as well as significant uncertainty of radiative transfer models in cloudy situations (Scheck et al., 2018), add further complexity to the assimilation of cloud-affected satellite observations in convection-permitting assimilation systems (Hu et al., 2022). Nevertheless, these observations provide potentially very valuable information for convective-scale data assimilation (Gustafsson et al., 2018; Schroettle et al., 2020), and their assimilation is therefore a very active area of research (Zhang et al., 2016; Okamoto, 2017; Scheck et al., 2020). 
To overcome these difficulties for the assimilation of cloud-affected satellite radiance observations, several studies (e.g., Schomburg et al., 2015; Scheck et al., 2020; Bauer et al., 2010) conducted single-observation experiments to better understand the influence of such observations in data assimilation systems. Such experiments, however, require running a full, computationally expensive data assimilation experiment for the assimilation of just one observation or a very limited number of spatially well-separated observations that do not influence each other in the assimilation process. In this article, we propose a significantly more efficient approach for investigating the three-dimensional analysis influence of individual observations (partial analysis increment [PAI]) based on available LETKF analysis ensemble perturbations. This new diagnostic for the PAI of a single observation allows one to approximate the contribution of individual observations to the analysis, or the contribution that an observation would have with modified assimilation settings (e.g., a modified localisation scale or assigned observation error). The strength of the diagnostic is that it allows the three-dimensional structure of the analysis contribution of one observation to be investigated directly in model space. This makes it possible to detect where observations draw the analysis in opposite directions, which is especially interesting with respect to the assimilation of novel observations. Though detrimental observation influence is part of the statistical nature of data assimilation (especially when the model state is already very close to the truth), patterns or large values of detrimental observation influence in the analysis may indicate suboptimal data assimilation related to, for example, spurious correlations or inappropriate localisation settings. 
Other existing diagnostics, such as observation influence (Cardinali et al., 2004; Liu et al., 2009), focus on the relative contribution of observations to the analysis as dimensionless scalar quantities. Furthermore, several studies used ensemble forecast sensitivity to observations (EFSO; Kalnay et al., 2012; Sommer and Weissmann, 2014; Sommer and Weissmann, 2016; Kotsuki et al., 2019) to approximate the forecast impact of individual observations in a computationally cheap way, without running multiple experiments. Though PAIs are, in principle, included in the derivation of EFSO (cf. Ota et al., 2013; Hotta et al., 2017a), the focus of these EFSO studies is mainly on the statistical contribution of observations to the reduction of forecast error at typical lead times of the order of hours. The PAI diagnostic is limited to the investigation of analysis influence, but, in contrast to EFSO, it avoids inaccuracies related to the linearity assumption of the forecast evolution, issues with the localisation of the forecast error, and the verification of the forecast error (Necker et al., 2018). The objective of this study is to show that the PAI diagnostic can be used as an economical alternative to single-observation experiments and as a diagnostic to evaluate and even optimise the data assimilation system. Moreover, the derivation of PAI is given in detail, with a special focus on the approximations that are required to apply it to a near-operational LETKF data assimilation system.
The remainder of this article is structured as follows: Section 2 presents the detailed derivation of the PAI methodology, as well as a description of the modelling and assimilation system, the experimental set-up, and the applied metrics. In Section 3, we illustrate PAI results for several examples and discuss the effect of the approximations by comparison of PAI results with the analysis influence in single-observation experiments. Section 4 presents three potential applications of the PAI diagnostic; namely, the analysis of the contribution of different observations to analysis fields, the detection of detrimental observation influence, and the optimisation of assimilation settings. Finally, conclusions are provided in Section 5.

METHOD AND DATA
In this study, we employ the Kilometre-scale Ensemble Data Assimilation (KENDA) system of Deutscher Wetterdienst (Schraff et al., 2016). The KENDA system comprises an LETKF assimilation scheme after Hunt et al. (2007) that is coupled with a non-hydrostatic regional NWP model (in this study, the COSMO model of the Consortium for Small-scale Modelling). The LETKF provides the analysis ensemble in a computationally efficient way by transforming the problem from high-dimensional model space into low-dimensional ensemble space and by computing the analysis locally on a reduced analysis grid. Localisation not only makes the method more efficient but is also necessary to mitigate spurious correlations and to increase the degrees of freedom of the analysis. In the following, we derive the mathematical formulation of PAI from the LETKF equations and describe the approximations that are involved. In the derivation, we use the same notation as Hunt et al. (2007).

PAI formulation
Before getting to the PAI formulation for LETKF systems, we start with the general form of the sequential analysis equation, in which the analysis $\mathbf{x}^a$ is produced by a statistical combination of the background $\mathbf{x}^b$ and the observations $\mathbf{y}^o$ (Kalnay, 2003):

$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{K}\left[\mathbf{y}^o - H(\mathbf{x}^b)\right]. \qquad (1)$$

$H$ denotes the nonlinear observation operator, which transforms a vector from $n$-dimensional model space into $p$-dimensional observation space. $\mathbf{K}$ is often referred to as the Kalman gain matrix. The analysis increment is defined as the difference between the analysis and the background:

$$\mathbf{x}^a - \mathbf{x}^b = \mathbf{K}\left(\mathbf{y}^o - \mathbf{y}^b\right), \qquad (2)$$

where $(\mathbf{y}^o - \mathbf{y}^b)$ is called the innovation vector or background departure, with $\mathbf{y}^b = H(\mathbf{x}^b)$ being the model equivalent of the observations. From this expression it becomes clear that $\mathbf{K}$ is an $n \times p$ matrix that determines the weight of the correction and maps it from observation space back to model space. Assuming that $\mathbf{K}$ is known, the formulation of PAI follows directly from Equation (2). The PAI related to one single observation $y^o_j$ is defined as

$$\mathrm{PAI}_j = \mathbf{K}_{:,j}\left(y^o_j - y^b_j\right), \qquad (3)$$

where the index $j$ indicates that only the $j$-th column of $\mathbf{K}$ and the $j$-th element of the innovation vector are considered. The sum over all PAIs equals the total increment; that is,

$$\sum_{j=1}^{p}\mathrm{PAI}_j = \mathbf{x}^a - \mathbf{x}^b. \qquad (4)$$

Similarly, PAIs can be calculated for subsets of observations as the sum of the PAIs of all observations in the subset. In practice, however, this formulation cannot be used directly, since in the LETKF the analysis is carried out in ensemble space and $\mathbf{K}$ is never calculated explicitly. It is possible, though, to express $\mathbf{K}$ in terms of standard LETKF output as

$$\mathbf{K} = \frac{1}{k-1}\,\mathbf{X}^a(\mathbf{Y}^a)^\mathrm{T}\mathbf{R}^{-1}, \qquad (5)$$

where $\mathbf{X}^a$ and $\mathbf{Y}^a$ are the ensemble analysis perturbation matrices in model space and observation space, respectively, and $k$ is the number of ensemble members. This formulation of $\mathbf{K}$ has been used before in the context of observation influence, for example by Kalnay et al. (2012) and Hotta et al. (2017a), and can also be found in Gustafsson et al. (2018). In Kalnay et al. (2012), the derivation of Equation (5) assumes a linear observation operator $H$. In the following, we derive Equation (5) from the LETKF equations for a nonlinear $H$, using the linear approximation in ensemble space (Hunt et al., 2007) that is also employed for the computation of the analysis.

The LETKF approximates the background and analysis uncertainty by an ensemble and computes the analysis ensemble mean $\bar{\mathbf{x}}^a$ as an optimal linear combination of the background ensemble members. The analysis equation for the LETKF is (cf. Hunt et al., 2007, eq. 22)

$$\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \mathbf{X}^b\bar{\mathbf{w}}^a, \qquad (6)$$

where $\bar{\mathbf{w}}^a$ is the weight vector that minimises the LETKF cost function in ensemble space. Overbars indicate the ensemble mean. The $n \times k$ matrix $\mathbf{X}^b$ is the background ensemble perturbation matrix; its column $i$ is $\mathbf{x}^{b,i} - \bar{\mathbf{x}}^b$, that is, the deviation of ensemble member $i$ from the ensemble mean. From Hunt et al. (2007) we know that

$$\bar{\mathbf{w}}^a = \tilde{\mathbf{P}}^a(\mathbf{Y}^b)^\mathrm{T}\mathbf{R}^{-1}\left(\mathbf{y}^o - \bar{\mathbf{y}}^b\right), \qquad \tilde{\mathbf{P}}^a = \left[(k-1)\mathbf{I} + (\mathbf{Y}^b)^\mathrm{T}\mathbf{R}^{-1}\mathbf{Y}^b\right]^{-1}, \qquad (7)$$

where $\tilde{\mathbf{P}}^a$ is the analysis error covariance matrix in ensemble space, $\mathbf{Y}^b$ is the background ensemble perturbation matrix transformed into observation space (with dimensions $p \times k$) and $\mathbf{R}$ is the observation error covariance matrix. The individual ensemble members $\mathbf{x}^{a,i}$ are distributed around the ensemble mean such that their spread reflects the analysis uncertainty in ensemble space ($\tilde{\mathbf{P}}^a$), which can be computed explicitly. The weight vectors $\mathbf{w}^{a,i}$ of the individual ensemble members are obtained from the symmetric square root of $\tilde{\mathbf{P}}^a$:

$$\mathbf{W}^a = \left[(k-1)\tilde{\mathbf{P}}^a\right]^{1/2}, \qquad (8)$$

with $\mathbf{W}^a$ being the ensemble weight perturbation matrix in ensemble space, with columns $\mathbf{w}^{a,i} - \bar{\mathbf{w}}^a$. Hence, the individual ensemble members are given as

$$\mathbf{x}^{a,i} = \bar{\mathbf{x}}^b + \mathbf{X}^b\mathbf{w}^{a,i}. \qquad (9)$$

Taking the difference between Equation (9) and Equation (6) shows that the analysis ensemble perturbations are given by

$$\mathbf{X}^a = \mathbf{X}^b\mathbf{W}^a. \qquad (10)$$

If we now insert Equations (7), (8) and (10) into Equation (6), using $(k-1)\tilde{\mathbf{P}}^a = \mathbf{W}^a(\mathbf{W}^a)^\mathrm{T}$ for the symmetric $\mathbf{W}^a$, we get

$$\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \frac{1}{k-1}\,\mathbf{X}^a(\mathbf{W}^a)^\mathrm{T}(\mathbf{Y}^b)^\mathrm{T}\mathbf{R}^{-1}\left(\mathbf{y}^o - \bar{\mathbf{y}}^b\right). \qquad (11)$$

Instead of linearising $H$ around the ensemble mean, which would involve a large $p \times n$ Jacobian matrix, Hunt et al. (2007) make a linear approximation in ensemble space to relate perturbations in model space to observation space (cf. Hunt et al., 2007, eq. 18):

$$H\left(\bar{\mathbf{x}}^b + \mathbf{X}^b\mathbf{w}\right) \approx \bar{\mathbf{y}}^b + \mathbf{Y}^b\mathbf{w}. \qquad (12)$$

Using the same assumption,

$$H(\mathbf{x}^{a,i}) = H\left(\bar{\mathbf{x}}^b + \mathbf{X}^b\mathbf{w}^{a,i}\right) \approx \bar{\mathbf{y}}^b + \mathbf{Y}^b\mathbf{w}^{a,i}, \qquad (13)$$

and hence $\mathbf{Y}^a = \mathbf{Y}^b\mathbf{W}^a$. Inserting this into Equation (11) yields the desired expression for $\mathbf{K}$ as given in Equation (5):

$$\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \frac{1}{k-1}\,\mathbf{X}^a(\mathbf{Y}^a)^\mathrm{T}\mathbf{R}^{-1}\left(\mathbf{y}^o - \bar{\mathbf{y}}^b\right). \qquad (14)$$

In this context, it should be noted that the linear approximation in Equation (12) leads to a suboptimal analysis in the case of nonlinear observation operators. The PAI diagnostic described here, however, is consistent with the assumption of the LETKF and therefore reflects the actual analysis increment (apart from the approximation related to localisation discussed below).

So far, we have ignored the effects of localisation, even though it is a crucial part of the LETKF. Localisation means that the analysis is carried out independently for the individual model grid points (or on a reduced grid, as in KENDA). This is achieved by considering only the observations in a certain region around the respective grid point for the calculation of the analysis weights. To achieve a smooth and physically consistent analysis, neighbouring analysis points should largely use the same set of observations, and the influence of distant observations is reduced gradually. Mathematically, this means that the elements of $\mathbf{R}^{-1}$ are multiplied by a weighting factor, which equals one at the location of the analysis and decays to zero at a certain radius. Reducing elements of $\mathbf{R}^{-1}$ means increasing the assumed observation error and thus giving less weight to the respective observation. The weighting function used in the LETKF is the Gaspari-Cohn function, a compactly supported, Gaussian-like fifth-order piecewise rational function that decays to zero at the so-called cut-off radius.
The cut-off radius is defined as $r_c = 2\sqrt{10/3}\,\ell$, where $\ell$ is called the localisation length scale. For the PAI diagnostic, localisation has two implications:

1. The analysis in the LETKF is computed using a localised $\mathbf{R}$. Thus, for a diagonal $\mathbf{R}$, which is used in the KENDA system and throughout this study, the localised version of the Kalman gain from Equation (5) can be written as

$$\mathbf{K}_{\mathrm{loc}} = \boldsymbol{\rho} \circ \left[\frac{1}{k-1}\,\mathbf{X}^a(\mathbf{Y}^a)^\mathrm{T}\mathbf{R}^{-1}\right], \qquad (15)$$

where $\boldsymbol{\rho}$ is an $n \times p$ matrix of Gaspari-Cohn factors and $\circ$ is the Schur (element-wise) product. We note that PAIs can also be calculated for a non-diagonal $\mathbf{R}$.

2. As mentioned before, the analysis is carried out independently for every model grid point. This means that the model equivalents $\mathbf{Y}^a$ and the weights $\mathbf{W}^a$, and with them the Kalman gain, change from one grid point to another. In practice, however, $\mathbf{Y}^a$ and $\mathbf{W}^a$ are not stored entirely as output, since they are no longer required once the analysis has been computed. In fact, the five-dimensional field $\mathbf{W}^a$ (three spatial dimensions of the reduced analysis grid and two ensemble dimensions) is not stored at all, because in the KENDA set-up this would take the same effort and disk space as writing out about 60 additional three-dimensional variables. The analysis model equivalents of an observation are not stored for every grid point within the localisation cut-off radius of the observation, but only at the grid point closest to the nominal position of the observation. Non-local observations, such as satellite radiances, also have a nominal position that is used for localisation. In this study, we demonstrate that, up to the localisation length scale, it is a reasonable approximation to use the available $\mathbf{Y}^a$ at the nominal observation location to compute the Kalman gain at nearby grid points. This works because the LETKF weights by design only vary gradually from one grid point to another, and so do the model equivalents. 
The errors related to this approximation could be avoided by storing the full $\mathbf{Y}^a$ or $\mathbf{W}^a$ at every model grid point, but this would require significant additional disk space and would only be feasible for short experiments. In contrast, with the approximation, the PAI diagnostic can be applied to the standard output of the operational system, namely the full analysis ensemble and the model equivalents in observation space at the nominal positions of the observations.
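The chain from Equation (6) to Equation (5) can be checked numerically on a toy problem. The sketch below is a minimal, non-localised LETKF in NumPy, assuming a diagonal $\mathbf{R}$ and a hypothetical linear observation operator (so that the ensemble-space linearisation of Equation (12) is exact). It computes $\mathbf{K}$ from the analysis perturbations as in Equation (5), forms one PAI per observation as in Equation (3), and checks that the PAIs add up to the LETKF mean increment (Equation 4). The function names `letkf_global` and `partial_increments` are illustrative only and are not KENDA routines.

```python
import numpy as np

def letkf_global(xb_ens, yb_ens, yo, r_diag):
    """Non-localised LETKF analysis (after Hunt et al., 2007) for a diagonal R.

    xb_ens: (n, k) background ensemble in model space
    yb_ens: (p, k) background ensemble in observation space, H(x^{b,i})
    yo:     (p,)   observations
    r_diag: (p,)   observation error variances (diagonal of R)
    """
    k = xb_ens.shape[1]
    xb_mean = xb_ens.mean(axis=1)
    yb_mean = yb_ens.mean(axis=1)
    Xb = xb_ens - xb_mean[:, None]                    # X^b
    Yb = yb_ens - yb_mean[:, None]                    # Y^b
    Rinv = np.diag(1.0 / r_diag)
    Pa = np.linalg.inv((k - 1) * np.eye(k) + Yb.T @ Rinv @ Yb)   # tilde P^a, Eq. (7)
    wa_mean = Pa @ Yb.T @ Rinv @ (yo - yb_mean)                   # mean weight vector, Eq. (7)
    evals, evecs = np.linalg.eigh((k - 1) * Pa)
    Wa = evecs @ np.diag(np.sqrt(evals)) @ evecs.T                # symmetric square root, Eq. (8)
    xa_mean = xb_mean + Xb @ wa_mean                              # Eq. (6)
    return xb_mean, yb_mean, xa_mean, Xb @ Wa, Yb @ Wa            # X^a (Eq. 10) and Y^a

def partial_increments(Xa, Ya, r_diag, innovation):
    """Column j holds the PAI of observation j (Eq. 3), with K from Eq. (5)."""
    k = Xa.shape[1]
    # Dividing column j by r_j is the same as right-multiplying by diag(R)^{-1}.
    K = Xa @ Ya.T / r_diag[None, :] / (k - 1)
    return K * innovation[None, :]

# Synthetic test case with a hypothetical linear observation operator H.
rng = np.random.default_rng(0)
n, k, p = 6, 10, 3
H = rng.normal(size=(p, n))
xb_ens = rng.normal(size=(n, k))
yb_ens = H @ xb_ens
yo = H @ rng.normal(size=n)
r_diag = np.full(p, 0.5)

xb_mean, yb_mean, xa_mean, Xa, Ya = letkf_global(xb_ens, yb_ens, yo, r_diag)
pai = partial_increments(Xa, Ya, r_diag, yo - yb_mean)
print(np.allclose(pai.sum(axis=1), xa_mean - xb_mean))  # True: the PAIs sum to the increment
```

The equality holds up to rounding because $\mathbf{W}^a$ is the symmetric square root, so $\mathbf{X}^a(\mathbf{Y}^a)^\mathrm{T} = (k-1)\mathbf{X}^b\tilde{\mathbf{P}}^a(\mathbf{Y}^b)^\mathrm{T}$; with localisation, $\mathbf{W}^a$ would differ between grid points and the check would only hold approximately.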
Additionally, we want to point out that the PAI diagnostic allows for computationally cheap sensitivity experiments by modifying the localisation scale or the assigned observation error $\mathbf{R}$ used to compute $\mathbf{K}$. The result approximates the influence that an observation would have with a modified localisation length scale or assigned observation error. This is only an approximation, since with a different localisation or $\mathbf{R}$ the analysis products $\mathbf{X}^a$ and $\mathbf{Y}^a$ would also change. However, we will demonstrate that PAIs from non-localised LETKF experiments with retrospective localisation in the PAI calculation are a useful first-order approximation for PAIs in assimilation experiments with direct localisation (Section 4.3).
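The localisation-related steps, namely the Gaspari-Cohn factors, the localised gain of Equation (15) under the nominal-position approximation, and the retrospective modification of observation error or localisation just described, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: distances are horizontal only, $\mathbf{R}$ is diagonal, and `gaspari_cohn`, `localised_gain` and `retrospective_pai` are hypothetical helper names, not KENDA routines. The cut-off convention $2\sqrt{10/3}\,\ell$ follows the text; for the 25 km length scale used here it gives about 91 km, consistent with the ∼90 km cut-off quoted for the VIS experiments.

```python
import numpy as np

def gaspari_cohn(dist, length_scale):
    """Gaspari-Cohn fifth-order function: 1 at zero distance, 0 beyond the
    cut-off radius 2 * sqrt(10/3) * length_scale."""
    c = np.sqrt(10.0 / 3.0) * length_scale            # half of the cut-off radius
    r = np.atleast_1d(np.abs(np.asarray(dist, dtype=float))) / c
    rho = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    rho[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - 5.0 / 3.0 * ri**2 + 1.0
    rho[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + 5.0 / 3.0 * ro**2
                  - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return rho

def localised_gain(Xa, Ya, r_diag, dist, length_scale):
    """Eq. (15): K_loc = rho o [X^a (Y^a)^T R^{-1} / (k-1)].

    dist is an (n, p) array of distances between grid points and nominal
    observation positions. X^a and Y^a are used as if valid at every grid
    point, which is the approximation discussed in the text."""
    k = Xa.shape[1]
    K = Xa @ Ya.T / r_diag[None, :] / (k - 1)
    return gaspari_cohn(dist, length_scale) * K

def retrospective_pai(pai, r_old, r_new, rho_old=1.0, rho_new=1.0):
    """First-order PAI under modified settings, keeping X^a and Y^a fixed.

    K is proportional to rho / r in Eq. (15), so the PAI scales with
    (rho_new / rho_old) * (r_old / r_new). rho_old must be non-zero
    (rho_old = 1 for a non-localised experiment)."""
    return pai * (np.asarray(rho_new) / np.asarray(rho_old)) * (r_old / r_new)

length_scale = 25.0                                    # km, horizontal value used in this study
cutoff = 2.0 * np.sqrt(10.0 / 3.0) * length_scale
print(f"{cutoff:.1f}")                                 # 91.3

rng = np.random.default_rng(1)
Xa = rng.normal(size=(4, 10))                          # hypothetical analysis perturbations
Ya = rng.normal(size=(2, 10))
dist = np.array([[0.0, 50.0], [20.0, 95.0], [60.0, 10.0], [100.0, 100.0]])  # km
K_loc = localised_gain(Xa, Ya, np.array([0.5, 0.5]), dist, length_scale)
```

Grid points farther than the cut-off radius from an observation receive exactly zero gain for that observation, and doubling an assumed error variance halves the retrospective PAI, which is the first-order behaviour the text relies on.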

Description of the data assimilation system
The configuration of the KENDA simulations used in this study closely follows that of Scheck et al. (2020). The KENDA system consists of an LETKF assimilation scheme that, in this study, is coupled with the COSMO regional NWP model. Our experiments use 40 ensemble members and version 5.2 of the non-hydrostatic NWP model COSMO in its limited-area configuration (COSMO-DE). COSMO was operational at Deutscher Wetterdienst until April 2021. The COSMO-DE domain is depicted in Figure 1 (grey box). It extends from 44.7° to 56.5° N and from 1.0° to 19.4° E and comprises Germany and parts of its neighbouring countries. The numerical grid consists of 421 × 461 columns with a horizontal grid spacing of 2.8 km. In the vertical, COSMO has 50 hybrid layers, which are terrain-following in the lower atmosphere and flat at higher levels. The model top is at 22 km. Deep convection is resolved explicitly in the model, whereas shallow convection is parametrised. The lateral boundary conditions are interpolated from the ICON-EU model with 7 km horizontal grid spacing and parametrised convection. For more details about the model set-up, the reader is referred to Scheck et al. (2020) (cf. Sections 3.1 and 3.2 therein).

Experimental set-up
To validate the methodology and to illustrate potential applications, three different types of experiments were performed, which are described in the following.

Single-observation experiments (VIS)
In these experiments, visible satellite radiances of the 0.6 μm channel (REFL) were assimilated. The fast, look-up-table-based method of Scheck et al. (2016) and an approximation accounting for three-dimensional radiative transfer effects (Scheck et al., 2018) were used to generate model equivalents. The horizontal localisation length scale was set to 25 km, resulting in a cut-off radius of ∼90 km. The observation locations (shown in Figure 1) were chosen such that, with the given localisation length scale, the influences of the different measurements do not overlap. Since we are only interested in the analysis influence, this makes it possible to conduct multiple single-observation experiments in one model run.
In total, we have 29 single-observation experiments distributed over the four time points. In contrast to thermal infrared channels, the visible channel considered here is sensitive to clouds at all heights, and there is no peak in the weighting function that could be used for vertical localisation. Therefore, no vertical localisation was applied, and the nominal height of all satellite observations was set to 500 hPa. Consequently, each of the satellite observations influences the whole atmospheric column within the horizontal cut-off radius. Only individual satellite pixels were assimilated. The resulting analysis departures are verified against spatio-temporally close radiosonde observations that are not actively assimilated in this experiment. This experiment is used to verify the PAI diagnostic by comparing the computed partial increments with the increment obtained from the LETKF ($\mathbf{x}^a - \mathbf{x}^b$). Apart from the approximations in the PAI diagnostic due to localisation, both increments should be identical.

Combined experiments (RASO + VIS)
These experiments use the same set-up as those just described, but additionally assimilate nearby radiosonde observations. That is, 29 radiosonde profiles are assimilated with a horizontal localisation length scale of 25 km and a constant vertical localisation length scale of 0.3 in logarithmic pressure coordinates; that is, $\ell = 0.3$ in units of $\ln p$, where $p$ is the pressure in pascals. Each of the profiles consists of ∼30 measurements of temperature (T), the horizontal wind components (U and V), and relative humidity (RH) at different heights. This experiment shows how the influence of the satellite observations changes if additional observations are assimilated.

Combined and localised experiments (RASO + VISLOC)
These experiments use the same experimental set-up as the combined experiment, but the satellite observations are localised in the vertical using the Gaspari-Cohn function with a constant localisation length scale of 0.3 (in logarithmic pressure coordinates). The nominal positions of all satellite observations in the experiment are set to p = 500 hPa. This experiment is used to investigate the feasibility of retrospective localisation of the RASO + VIS experiment in the PAI diagnostic.
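For the vertical localisation in RASO + VISLOC, the Gaspari-Cohn function is applied to distances in $\ln p$ around the nominal height of 500 hPa with $\ell = 0.3$. The short sketch below (NumPy, hypothetical helper names, repeating the Gaspari-Cohn function so that the block is self-contained) converts the cut-off in $\ln p$ into a pressure range. With these settings the factors vanish above roughly 167 hPa, whereas the lower bound of about 1495 hPa lies below the surface, so the localised satellite observations can still influence all levels between roughly 167 hPa and the ground, with decreasing weight away from 500 hPa.

```python
import numpy as np

def gaspari_cohn(dist, length_scale):
    """Gaspari-Cohn fifth-order function (zero beyond 2 * sqrt(10/3) * length_scale)."""
    c = np.sqrt(10.0 / 3.0) * length_scale
    r = np.atleast_1d(np.abs(np.asarray(dist, dtype=float))) / c
    rho = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    rho[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - 5.0 / 3.0 * ri**2 + 1.0
    rho[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + 5.0 / 3.0 * ro**2
                  - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return rho

def vertical_weights(p_hpa, p_nominal_hpa=500.0, length_scale=0.3):
    """Vertical localisation factors for an observation at p_nominal_hpa; the
    distance is measured in ln(p), so the pressure unit cancels."""
    dist = np.log(np.asarray(p_hpa, dtype=float)) - np.log(p_nominal_hpa)
    return gaspari_cohn(dist, length_scale)

cutoff_lnp = 2.0 * np.sqrt(10.0 / 3.0) * 0.3           # about 1.1 in ln(p)
p_top = 500.0 * np.exp(-cutoff_lnp)
p_bottom = 500.0 * np.exp(cutoff_lnp)
print(f"{p_top:.0f} hPa to {p_bottom:.0f} hPa")        # 167 hPa to 1495 hPa
```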

Metrics and notation
For the experimental evaluation, we use three different metrics: (1) the differences between the computed PAIs and the increments as obtained from the LETKF; (2) statistics of PAIs (mean, standard deviation, and absolute mean); and (3) errors of the model state with respect to the radiosonde measurements; that is, negative background and analysis departures or, in the case of Figures 9 and 12, absolute values of departures. We assign the following sub- and superscripts to PAI in order to specify it unambiguously:

$$\mathrm{PAI}^{y}_{x,\,\mathrm{exp}}, \qquad (16)$$

where $y$ represents the measured variable, that is, $y \in \{\mathrm{T}, \mathrm{U}, \mathrm{V}, \mathrm{RH}, \mathrm{REFL}\}$ (REFL is the observed visible satellite radiance and the other variables are the radiosonde measurements), $x$ stands for the model variable for which the PAI is computed, and $\mathrm{exp} \in \{\mathrm{VIS}, \mathrm{RASO+VIS}, \mathrm{RASO+VISLOC}\}$ indicates the associated experiment. We evaluate the error $e$ of the model state based on the absolute value of the difference between independent radiosonde observations and model equivalents:

$$e^v = \left|\mathbf{y}^o_{\mathrm{RASO}} - H(\mathbf{x}^v)\right|, \qquad (17)$$

where $v \in \{a, b\}$ indicates whether the deviation from the radiosonde measurement is computed from the background or the analysis. It should be noted that the difference in Equation (17) also contains a contribution from the radiosonde observation error. However, as the radiosonde observation error is the same for the background and analysis departures and usually uncorrelated with model error, the error reduction by data assimilation can be approximated as

$$\Delta e = e^a - e^b. \qquad (18)$$

A negative $\Delta e$ indicates a reduction of the error and hence a beneficial impact of the assimilated observations. For all experiments, the results are evaluated up to 200 hPa. For the optimisation of the vertical localisation of satellite observations, we define a cost function $J$ that consists of the sum of the radiosonde analysis departures:

$$J(\ell, p) = \sum_i e^a_i(\ell, p), \qquad (19)$$

where $\ell$ corresponds to the vertical localisation length scale and $p$ to the pressure level around which the Gaspari-Cohn function is centred. 
With Equation (6), this can be expanded to More details about the optimisation are provided in Section 4.3.
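The error $e^v$ of Equation (17) and its change $\Delta e$ in Equation (18) amount to simple array operations on radiosonde departures. A minimal sketch, assuming the background and analysis model equivalents at the radiosonde locations are already available as arrays; the function name and the numbers in the usage lines are hypothetical:

```python
import numpy as np

def error_change(y_raso, hx_b, hx_a):
    """Delta e = e^a - e^b with e^v = |y_raso - H(x^v)| (Eqs. 17 and 18).

    Negative values mean the analysis is closer to the radiosonde
    observation than the background."""
    y = np.asarray(y_raso, dtype=float)
    e_b = np.abs(y - np.asarray(hx_b, dtype=float))
    e_a = np.abs(y - np.asarray(hx_a, dtype=float))
    return e_a - e_b

# Hypothetical temperature values (K) at two radiosonde levels.
delta_e = error_change([250.0, 260.0], [251.0, 259.0], [250.4, 259.8])
print(delta_e.round(2))  # [-0.6 -0.8]
```

Since the radiosonde observation error enters $e^a$ and $e^b$ equally, it largely cancels in the difference, which is why $\Delta e$ rather than $e^a$ alone is used to judge the benefit of the assimilation.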

ILLUSTRATION OF PAI
Throughout this study, we illustrate the PAI diagnostic using temperature increments as an example, as the main characteristics of the diagnostic are similar for all model variables. Increments of variables other than temperature are only shown in Section 4.1, where we demonstrate how to analyse the influence of observations on different model variables. Figure 2 shows the horizontal analysis temperature increment at model level 23 (mean pressure of around 500 hPa) from a single-observation experiment (VIS) that assimilated one satellite reflectance observation in the centre of the domain (Figure 2b), together with the corresponding PAI of this observation (Figure 2a). The comparison demonstrates that the PAI calculation reproduces both the structure and the magnitude of the analysis increment, except for small differences at larger distances, close to the localisation cut-off radius. These small differences are due to the approximation described in Section 2: instead of the LETKF weights at every model grid point, the PAI calculation uses the weights at the location of the observation, expressed by the analysis perturbations, to avoid storing additional quantities and for the sake of computational efficiency. In the presence of localisation, the LETKF weights gradually change from one grid point to the next, leading to a deviation of PAI from the analysis increment that grows with distance from the observation. This difference can also be seen in the comparison of PAI and the analysis increment as a function of horizontal distance from the observation in Figure 2d. However, by design, the LETKF weights only change very gradually from one grid point to the next. This means that the differences between the efficiently approximated PAI and the analysis increment are fairly small, and avoiding them does not seem to justify the additional storage of LETKF weights. 
In the vertical, the calculated PAI perfectly matches the analysis increment (except for very small rounding errors) as no vertical localisation was used for the assimilation of the satellite observation (Figure 2c).

Effect of approximating PAI with analysis perturbations
Besides the example shown in Figure 2, we conducted further single-observation experiments, for a total of 29 reflectance observations. Figure 3 compares the absolute analysis increment with the absolute difference between the analysis increment and the computed PAI as a function of distance from the observation for all single-observation experiments. On average, the difference between PAI and the analysis increment is less than 17% of the analysis increment up to the localisation length scale of 25 km and increases to about 40% at twice the localisation length scale.
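A distance-dependent comparison like the one in Figure 3 can be approximated by binning grid points by their horizontal distance from the observation. In the sketch below, the relative difference in a bin is assumed to be the summed absolute difference between analysis increment and PAI divided by the summed absolute analysis increment; the exact averaging used for Figure 3 may differ, and the helper name and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def relative_difference_by_distance(dist_km, increment, pai, bin_edges_km):
    """Sum |increment - PAI| / sum |increment| per distance bin (NaN for empty bins)."""
    dist = np.asarray(dist_km, dtype=float)
    incr = np.asarray(increment, dtype=float)
    abs_incr = np.abs(incr)
    abs_diff = np.abs(incr - np.asarray(pai, dtype=float))
    idx = np.digitize(dist, bin_edges_km) - 1           # bin index of every grid point
    out = np.full(len(bin_edges_km) - 1, np.nan)
    for b in range(out.size):
        in_bin = idx == b
        if abs_incr[in_bin].sum() > 0.0:
            out[b] = abs_diff[in_bin].sum() / abs_incr[in_bin].sum()
    return out

# Hypothetical increments and PAIs at four grid points, binned at 0-25 km and 25-50 km.
dist = [5.0, 15.0, 30.0, 45.0]
incr = [1.0, -2.0, 0.5, 1.0]
pai = [0.95, -1.9, 0.4, 0.8]
print(relative_difference_by_distance(dist, incr, pai, [0.0, 25.0, 50.0]))
```

Normalising by the summed increment rather than averaging point-wise ratios avoids division by near-zero increments at individual grid points.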

Relation of PAI with the increment from single-observation experiments
The PAI of an observation in an experiment that assimilates many other observations is, by nature, not the same as the analysis increment of a single-observation experiment with that observation, because the weight an observation receives decreases when further observations are assimilated. This reduction can be expressed by the ratio $\alpha_{n+1}/\alpha_n$, where $\alpha_{n+1}$ is the weight of an observation in the case of $n + 1$ assimilated observations and $\alpha_n$ is its weight in the case of $n$ observations. Figure 4 shows this effect for up to 40 assimilated observations. The number 40 reflects the local degrees of freedom of the 40-member LETKF system and, therefore, the order of magnitude of the number of observations that can be assimilated within the localisation scale. Assimilating two observations instead of one decreases the weight by a factor of 0.67. With more assimilated observations, the factor gradually increases to 0.91 for 10 observations and 0.98 for 40 observations. This means that adding an observation to a comprehensive data assimilation system with many assimilated observations has only a marginal effect on the weight of the other assimilated observations. It is important to keep this effect of modified weights in mind when interpreting PAI results. However, the addition of other observations only reduces the weight of an observation; it does not change the overall structure of its influence. Figure 5a shows an example of the temperature PAI of a satellite reflectance observation in an experiment assimilating this observation and, additionally, a full radiosonde profile (two wind components, temperature, and humidity at 39 levels, i.e., 156 additional observations; RASO + VIS), together with the analysis increment in a single-observation experiment with only the satellite observation (VIS). As expected, the PAI is smaller than the increment in the single-observation experiment, but both exhibit a similar structure. Figure 5b shows the corresponding mean absolute PAI and single-observation analysis increment averaged over all 29 assimilated satellite observations. 
On average, the magnitude of the single-observation analysis increment is roughly 30-50% higher than the corresponding PAI in the experiments with additional radiosonde observations. The structure of the profile, however, is very similar, with the largest increments and PAIs in the lowest and highest parts of the profile. This near-linear behaviour of the influence demonstrates that both PAI and single-observation experiments are useful approaches to investigate the three-dimensional influence of observations. The calculation of PAI, however, is computationally much more efficient. Furthermore, PAI reflects the influence in the presence of other assimilated observations, which is usually the primary quantity of interest, whereas single-observation experiments reveal the influence in the absence of other observations. Figure 6 shows an example of the contribution of different observations to the temperature increment (temperature PAI) as a function of pressure in the RASO + VIS experiment. As expected, radiosonde temperature observations exhibit the largest temperature PAI throughout most of the atmosphere. The satellite observation, however, also leads to a significant temperature increment in the boundary layer, which is likely related to the correlation of cloudiness and surface insolation.
Information such as the relative magnitude of increments and the strength of the downweighting effect through the assimilation of other variables cannot be retrieved from single-observation experiments alone. A statistical analysis of the PAI-estimated increments on different variables will be discussed further in the next section in the context of potential applications of the diagnostic.
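The weight-reduction factors quoted for Figure 4 (0.67, 0.91 and 0.98) can be reproduced with a simple scalar model: one state variable with background error variance $\sigma_b^2$, observed $n$ times with independent errors of variance $\sigma_o^2$, for which each observation receives the gain $\sigma_b^2/(\sigma_o^2 + n\sigma_b^2)$. Assuming $\sigma_b^2 = \sigma_o^2$, the ratio of the weights for $n$ and $n - 1$ observations is $n/(n+1)$. This scalar model is an illustrative assumption; the sketch does not claim that Figure 4 was computed this way.

```python
def obs_weight(n_obs, var_b=1.0, var_o=1.0):
    """Gain given to each of n_obs independent observations of a scalar state,
    all with error variance var_o, for background error variance var_b."""
    return var_b / (var_o + n_obs * var_b)

def weight_ratio(n_obs, var_b=1.0, var_o=1.0):
    """Factor by which each observation's weight changes when the number of
    assimilated observations increases from n_obs - 1 to n_obs."""
    return obs_weight(n_obs, var_b, var_o) / obs_weight(n_obs - 1, var_b, var_o)

for n in (2, 10, 40):
    print(n, round(weight_ratio(n), 2))
# 2 0.67
# 10 0.91
# 40 0.98
```

For larger ratios of observation to background error variance, the weights themselves become smaller, but the ratio still approaches one as more observations are added.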

Analysing the influence of observations on different model variables
The PAI diagnostic allows for analysing the influence of individual observations as well as the statistical contribution of observation types to changes in different variables. Especially for operational data assimilation, general information about the relative magnitude of increments is useful to evaluate the effectiveness of the assimilation. Moreover, statistics of PAI can be used to analyse trends (systematic increments) introduced by certain observation types. In particular, for novel observations such as satellite reflectance, it is important to monitor that the observations do not cause systematic changes in the model climatology, such as cooling/warming or drying/wetting at certain levels. Though similar information can be gained from single-observation experiments or EFSO, the PAI diagnostic can be considered either an economical alternative or an economical addition to such measures that is also capable of identifying systematic non-local effects, for example the systematic influence of the satellite observations on various vertical levels. The statistical analysis of the RASO + VIS experiments is shown in Table 1 and Figure 7. Similar to Scheck et al. (2020), this analysis shows that, in general, the assimilation of the visible satellite observations yields physically plausible results. Averaged over all assimilated satellite observations, the temperature PAI of satellite observations is about 5% of the total temperature increment above 750 hPa and increases gradually below to about 14% at the lowermost level (Figure 7a). The relative contribution of the satellite observations to the wind increment is, overall, of similar magnitude and structure as for the temperature increment, but with a less pronounced maximum at lower levels. In absolute terms, however, the satellite wind PAI is highest at upper levels, consistent with wind speed increasing with height (dashed line in Figure 7c). 
For RH, the satellite also contributes about 5% of the total increment above 550 hPa, but 10-15% of the total increment below 550 hPa (Figure 7a). As humidity has only a marginal effect on satellite reflectance in the visible range, the humidity PAI of the satellite observations is likely the result of correlations between cloudiness and humidity at the cloud level and beneath.
Averaged over the vertical profile, the satellite observations contribute about 7% of the total RH increment and roughly 5% of the temperature and wind increments (Table 1). This is remarkable given that only 0.9% of all assimilated observations are satellite observations and that neither wind, temperature, nor humidity has a pronounced direct influence on satellite reflectance in the visible range. Whether these increments also pull the analysis in the right direction is investigated in the next section. The largest relative PAIs of the satellite observations are found for cloud water (13.9%) and cloud ice content (8.7%), which directly influence reflectance in the solar channels. Furthermore, the satellite observations have a comparably large PAI of 7.6% for vertical velocity, which is linked to convection and thereby to convective clouds. For radiosonde observations, Table 1 shows that direct observations of wind components, temperature, and RH contribute about 60% of the increment in the respective variable, whereas the relative PAI of an observed variable on other variables is in the range 10-15%.
Table 1: Relative absolute PAI contributions (%) of all assimilated observations, averaged over all profiles in the RASO + VIS-Experiment, for the model variables temperature (T), zonal wind (U), meridional wind (V), relative humidity (RH), vertical velocity (WZ), specific humidity (Q), cloud ice (QI), and cloud water mixing ratio (CLWMR). Values are normalised by the total absolute increments $|\mathbf{x}^{a}_{\mathrm{RASO+VIS}} - \mathbf{x}^{b}|$ of the respective model variables.
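The normalisation used in Table 1 and the vertical profiles in Figure 7 can be sketched as follows. The sketch assumes that the PAIs of one observation type and the total increments $\mathbf{x}^a - \mathbf{x}^b$ of one model variable are stored as arrays of shape (profile, level), and that the relative contribution is the ratio of summed absolute PAIs to summed absolute total increments; the helper names are hypothetical. The signed mean PAI is included as a simple indicator of the systematic trends (e.g. drying or warming) mentioned above.

```python
import numpy as np


def relative_abs_contribution(pai, total_increment, axis=None):
    """Relative absolute PAI contribution in %.

    pai, total_increment: arrays of identical shape, e.g. (profile, level).
    axis=None pools all profiles and levels (one number per variable, as in Table 1);
    axis=0 pools profiles only and returns a vertical profile (as in Figure 7).
    """
    num = np.abs(np.asarray(pai, dtype=float)).sum(axis=axis)
    den = np.abs(np.asarray(total_increment, dtype=float)).sum(axis=axis)
    return 100.0 * num / den


def systematic_pai(pai, axis=0):
    """Mean signed PAI per level; a persistent non-zero value hints at a systematic trend."""
    return np.asarray(pai, dtype=float).mean(axis=axis)


# Two profiles x two levels of hypothetical satellite temperature PAIs and total increments
pai_sat = np.array([[0.1, -0.2], [0.3, 0.0]])
total = np.array([[1.0, -1.0], [2.0, 0.5]])
print(round(float(relative_abs_contribution(pai_sat, total)), 2))  # 13.33
print(relative_abs_contribution(pai_sat, total, axis=0))
print(systematic_pai(pai_sat))
```

Because absolute values are taken before summing, the relative contributions of different observation types need not add up to 100%.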

Detecting detrimental observation influence
For the assimilation of novel observation types, it is important to investigate whether the new observations have a beneficial or detrimental influence on the model state. In this study, we verify the first-guess and analysis states against the observed radiosonde profiles as described in Section 2.3.4. Although detrimental analysis increments are part of the statistical nature of data assimilation (Gelaro et al., 2010), extended or systematic patterns of detrimental influence indicate potential flaws in the data assimilation system and may provide guidance for optimising assimilation settings, such as the assigned observation error or localisation parameters.
As the influence of individual observations is often blurred in cycled experiments with many observations, previous studies used single-observation experiments that assimilated only a few observations separated by sufficiently large distances to avoid interaction between the observations (e.g., Schomburg et al., 2015). From single-observation experiments, Scheck et al. (2020) conclude that the assimilation of visible satellite reflectance is able to reduce errors in the model state in their selected cases, but that the effectiveness of this process is limited by ambiguity of the observations, spurious correlations, or nonlinearity of the observation operator. In this section, we demonstrate that similar information can be gained from the PAI diagnostic, so that the considerable effort of carrying out additional single-observation experiments can be avoided.
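The relation between a PAI and a single-observation increment can be illustrated with a small numerical sketch. It assumes a global ETKF without localisation, a linear observation operator `H`, and a diagonal observation-error covariance `R`; in this idealised case the gain computed from analysis perturbations, $\mathbf{K} = \mathbf{X}^a(\mathbf{H}\mathbf{X}^a)^{\mathrm{T}}\mathbf{R}^{-1}/(k-1)$, is exact, whereas with LETKF localisation it is only an approximation. The function names `etkf_analysis` and `partial_increments` are hypothetical. The PAI of observation j is column j of the gain multiplied by its innovation, the PAIs sum to the total analysis increment, and a single-observation experiment is emulated by rerunning the analysis with one observation only.

```python
import numpy as np


def etkf_analysis(xb_ens, H, R, y):
    """Global ETKF without localisation (illustrative sketch).

    xb_ens: (n, k) background ensemble; H: (p, n) linear observation operator;
    R: (p, p) observation-error covariance; y: (p,) observations.
    Returns the analysis mean increment, the analysis perturbations and the innovations.
    """
    k = xb_ens.shape[1]
    xb_mean = xb_ens.mean(axis=1)
    Xb = xb_ens - xb_mean[:, None]            # background perturbations (n, k)
    Yb = H @ Xb                               # perturbations in observation space (p, k)
    d = y - H @ xb_mean                       # innovations (p,)
    Rinv = np.linalg.inv(R)
    # Analysis error covariance in ensemble-weight space (k, k)
    Pa_w = np.linalg.inv((k - 1) * np.eye(k) + Yb.T @ Rinv @ Yb)
    w_mean = Pa_w @ Yb.T @ Rinv @ d           # mean weight vector
    # Symmetric square root of (k - 1) * Pa_w gives the perturbation weights
    evals, evecs = np.linalg.eigh((k - 1) * Pa_w)
    Wa = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    return Xb @ w_mean, Xb @ Wa, d


def partial_increments(Xa, H, R, d):
    """PAIs from analysis perturbations: column j is K[:, j] * d[j], K = Xa (H Xa)^T R^-1 / (k - 1)."""
    k = Xa.shape[1]
    K = Xa @ (H @ Xa).T @ np.linalg.inv(R) / (k - 1)
    return K * d[None, :]


# Toy set-up: 6 state variables, 10 members, 2 observations of variables 1 and 4
rng = np.random.default_rng(0)
xb_ens = rng.normal(size=(6, 10))
H = np.zeros((2, 6))
H[0, 1] = 1.0
H[1, 4] = 1.0
R = np.diag([0.5, 0.5])
y = np.array([1.0, -0.5])

dx_all, Xa_all, d_all = etkf_analysis(xb_ens, H, R, y)
pai = partial_increments(Xa_all, H, R, d_all)  # (6, 2): one PAI per observation

# Emulated single-observation experiment: assimilate observation 0 only
dx_single, Xa_single, d_single = etkf_analysis(xb_ens, H[:1], R[:1, :1], y[:1])

print("PAI of obs 0:        ", np.round(pai[:, 0], 3))
print("single-obs increment:", np.round(dx_single, 3))
```

The printed vectors correspond to the two quantities compared in this section: the PAI of observation 0 in the presence of the second observation, and the increment of observation 0 assimilated alone.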
To this end, we present PAI results as well as the analysis increments of single-observation experiments for two cases. Case 1 (profile 20 in Figure 1) corresponds to the same single-observation experiment as case 1 in Scheck et al. (2020). Our case 2 (profile 13 in Figure 1) differs from case 2 in Scheck et al. (2020). Figure 8 shows the estimated errors of the background and analysis model states with respect to the radiosonde observations for the two cases. In each panel of Figure 8, the blue line indicates the error of the background state with respect to the radiosonde observations (negative background departure), the red line indicates the error of the analysis state obtained in the single-observation experiment (VIS) with respect to the RASO measurement, and the green line shows the negative background departure plus the satellite PAI from the RASO + VIS-Experiment. The sum of the negative background departure and the satellite PAI thus reflects the approximate contribution of the satellite to the analysis departure in the RASO + VIS-Experiment.
In case 1 (Figure 8a), the analysis is closer to the radiosonde observation than the background at nearly all levels, indicating a beneficial influence of the satellite observation in the single-observation experiment. Similar information can be gained from the computationally much cheaper PAI diagnostic, which does not require additional experiments. The satellite PAI is usually smaller than the single-observation increment because other observations are assimilated as well. However, the satellite PAI nearly always points in the same direction as the single-observation analysis increment and also indicates a beneficial influence of the satellite throughout this vertical profile. In case 2 (Figure 8b), both the PAI and the single-observation experiment indicate a beneficial influence of the satellite observation around 900 and 300 hPa, whereas there is an indication of deterioration at 240 hPa. Figure 9 shows a scatter plot comparing the computed impact Δe of the satellite measurements on the model state in the combined experiment (RASO + VIS, y-axis) and in the single-observation experiment (VIS, x-axis) for all assimilation experiments at all radiosonde observation levels. The satellite impact in the RASO + VIS-Experiment was obtained from the PAI diagnostic. Negative values of Δe indicate that the satellite observation draws the model temperature closer to the radiosonde observation (beneficial impact), and positive values indicate a detrimental impact. Overall, the beneficial and detrimental impacts from the two approaches are clearly correlated. The slope of the linear fit is close to 0.5, indicating that the impact in the single-observation experiment is about twice as large. Most importantly, both approaches indicate the most beneficial and most detrimental impacts at the same locations. The largest beneficial impact occurs at low levels for profile 20 and in the mid-troposphere for profile 29.
The largest detrimental values occur at upper levels for profiles 13, 19, and 20 and at low levels for profiles 22, 25, and 29. As in previous studies, Figure 9 shows a large number of observations with a detrimental influence on the analysis. This is related, on the one hand, to verifying the analysis against radiosonde observations and, on the other hand, to the statistical nature of the data assimilation system. This application also has much in common with 0h-EFSO (Hotta et al., 2017a), as 0h-EFSO reflects the partial analysis increment projected onto a specific norm (e.g. total energy). This illustrates that the PAI diagnostic can be used to identify potentially detrimental effects that should be investigated in more detail, either with other diagnostics or by using the PAI diagnostic to approximate the effect of modified assimilation settings such as the localisation scale. The latter option is discussed in the next section.
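The impact measure in Figure 9 and its linear fit can be sketched as below. The exact definition of Δe is given in Section 2.3.4 and is not reproduced here; the sketch assumes, as a hypothetical stand-in, the change in absolute temperature error with respect to the radiosonde caused by an increment, so that negative values are beneficial, consistent with the sign convention above. The `error_profile` helper gives the background error (blue line in Figure 8) for a zero increment, and the error after adding the satellite PAI (green line) or the single-observation increment (red line) otherwise.

```python
import numpy as np


def error_profile(xb, increment, y):
    """Model-minus-radiosonde error after adding an increment to the background."""
    return xb + increment - y


def impact(xb, increment, y):
    """Hypothetical impact measure: change in absolute error caused by the increment.

    Negative values: the increment draws the model towards the radiosonde (beneficial).
    """
    return np.abs(error_profile(xb, increment, y)) - np.abs(error_profile(xb, 0.0, y))


def fit_slope(de_single, de_pai):
    """Ordinary least-squares fit de_pai = slope * de_single + intercept."""
    slope, intercept = np.polyfit(de_single, de_pai, 1)
    return slope, intercept


# Toy data: three radiosonde levels; the PAI is taken as half the single-observation increment
xb = np.array([1.0, 2.0, 0.0])
y = np.array([0.0, 2.5, 0.0])
inc_single = np.array([-0.6, 0.4, 0.2])
inc_pai = 0.5 * inc_single

de_single = impact(xb, inc_single, y)  # approx. [-0.6, -0.4, 0.2]
de_pai = impact(xb, inc_pai, y)        # approx. [-0.3, -0.2, 0.1]
print(fit_slope(de_single, de_pai))
```

A slope below 1 means that the impact estimated from the PAI is weaker than in the single-observation experiment, as found for Figure 9.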

Optimising localisation
In the previous section, we discussed that PAIs can be used to detect detrimental observation influence caused by suboptimal assimilation settings. In addition, the PAI diagnostic can be used to approximate the influence of observations assimilated with modified localisation settings or assigned observation errors without rerunning the assimilation cycle. To demonstrate this, we retrospectively localised the satellite PAI from the RASO + VIS-Experiment vertically with a localisation scale of 0.3 centred at 500 hPa and conducted assimilation experiments with a corresponding localisation for the satellite observations (RASO + VISloc). Figure 10 shows that, as expected, vertical localisation strongly reduces the influence of the satellite observations at lower and upper levels. Furthermore, Figure 10 demonstrates that the PAI with retrospective localisation (red line) is a good approximation of the PAI in the RASO + VISloc experiment with localisation for satellite observations (green line); only minor differences occur between the retrospective localisation in the PAI calculation and the localisation in the assimilation system. Consequently, retrospective vertical localisation makes it possible to approximate optimal localisation settings in a computationally cheap manner. The concept is to define a cost function based on the analysis departures of observations that are not assimilated and to minimise this function iteratively with respect to the localisation settings. The cost function J is defined in Equation (19). In our study, we used the analysis departures of passive radiosondes and the satellite PAIs computed in the VIS-Experiment to demonstrate the concept. For localisation with the Gaspari-Cohn function, both the localisation length scale and the height at which the Gaspari-Cohn function is centred can be optimised. Figure 11 shows the cost function computed for all profiles in the VIS-Experiment.
For the iterative optimisation with respect to the localisation length scale and the cloud height p, we find an optimal length scale of 0.4 and an optimal p of 800 hPa. Compared with no vertical localisation of the satellite observations, the optimal localisation with the Gaspari-Cohn function improves the analysis departure statistics by 1.5% (red dot in Figure 11).
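The retrospective localisation and the optimisation of its two parameters can be sketched as follows. Several assumptions are made that may differ from the set-up used in this study: vertical distance is measured as |ln p − ln p_c|, the localisation scale is used as the half-width c of the Gaspari-Cohn function (weights vanish beyond 2c), the localised PAI is obtained simply by multiplying the PAI by the Gaspari-Cohn weight, and the cost function is a hypothetical sum of squared analysis departures of passive radiosondes rather than Equation (19). An exhaustive grid search stands in for the iterative minimisation described above.

```python
import numpy as np


def gaspari_cohn(r):
    """Gaspari-Cohn (1999) fifth-order piecewise rational function of r = distance / c."""
    r = np.abs(np.atleast_1d(np.asarray(r, dtype=float)))
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - 5.0 / 3.0 * ri**2 + 1.0
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + 5.0 / 3.0 * ro**2
                - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w  # zero for r >= 2


def localise_pai(pai, p, scale, p_centre):
    """Retrospectively localise a PAI profile with a GC weight centred at p_centre."""
    dist = np.abs(np.log(p) - np.log(p_centre))  # vertical distance in ln(p)
    return pai * gaspari_cohn(dist / scale)


def cost(xb, pai, y, p, scale, p_centre):
    """Hypothetical cost: squared analysis departures of passive radiosondes."""
    return float(np.sum((xb + localise_pai(pai, p, scale, p_centre) - y) ** 2))


def optimise_localisation(xb, pai, y, p, scales, centres):
    """Grid search over (scale, p_centre); returns the best pair and the gain in %
    relative to no vertical localisation of the PAI."""
    j_best, s_best, pc_best = min(
        (cost(xb, pai, y, p, s, pc), s, pc) for s in scales for pc in centres
    )
    j_none = float(np.sum((xb + pai - y) ** 2))
    return s_best, pc_best, 100.0 * (1.0 - j_best / j_none)


# Toy profile: the PAI helps at 900 hPa and hurts at 300 hPa
p = np.array([900.0, 500.0, 300.0])  # radiosonde levels in hPa
xb = np.zeros(3)
y = np.array([1.0, 0.0, 0.0])
pai = np.array([1.0, 0.0, 1.0])
print(optimise_localisation(xb, pai, y, p, scales=[0.2, 5.0], centres=[900.0, 300.0]))
# (0.2, 900.0, 100.0)
```

Replacing the grid search with a proper minimiser, or the toy cost with Equation (19), leaves the structure unchanged: only the PAI, the background and the passive observations are needed, with no rerun of the assimilation.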
As in Figure 9, Figure 12 shows how the analysis increments from the satellite single-observation experiments and the corresponding satellite PAIs draw the model towards the radiosonde observations. The underlying light-grey dots are the previously shown results without localisation, and the coloured dots are the results with retrospective vertical localisation using the computed optimal localisation settings. Without localisation, detrimental effects mainly occurred in the boundary layer and at high levels. The localisation reduces large positive values in the upper atmospheric layers, whereas the largest negative (beneficial) values, which are linked to increments in the boundary layer, are only slightly modified. This illustrates that the PAI diagnostic can be used to efficiently test various localisation approaches without rerunning the assimilation experiments. However, the optimised satellite localisation in this study was derived from a small sample to illustrate the concept; deriving general conclusions for the localisation of satellite reflectance will require longer experiments, which are planned for future studies. Furthermore, in the VIS-Experiment, the PAIs are equal to the respective total analysis increments, as no other observations are assimilated and the observations are separated by more than the horizontal localisation radius. Nevertheless, we illustrated the approach in this setting because it is equally applicable to an experiment assimilating the full observing system, where PAIs would identify the influence of individual observations and thereby serve as a basis for optimising localisation. For our example, we also tested a cost function based on the assimilated radiosondes in the RASO + VIS-Experiment but achieved no meaningful results. We therefore think that independent (passive) observations are required for optimising localisation.
The implementation of this approach in a near-operational data assimilation system may also need to account for specific system settings, such as adaptive inflation or localisation. Moreover, we expect the results of the optimisation to depend on the region and the synoptic situation considered. In contrast to our experiments, covariance inflation is typically used in a near-operational set-up to counter overconfidence of the analysis and give more weight to the observations. How inflation is implemented in the PAI computation depends on the inflation technique used, for example prior or posterior inflation.
Figure 12: Same as Figure 9, but the coloured dots show the change in temperature errors due to the retrospectively localised satellite PAIs with the computed optimal localisation settings for the Gaspari-Cohn function. The grey dots are the same as in Figure 9 for the non-localised satellite PAIs from the RASO + VIS-Experiment.

CONCLUSIONS
This study proposes the use of PAIs as a diagnostic for LETKF data assimilation systems. The exact computation of these increments would require large amounts of additional LETKF output in the form of the five-dimensional weight matrix, which is not available in operational set-ups. However, the results presented here demonstrate that PAIs can be approximated efficiently using the ensemble analysis perturbations available from the standard LETKF output. We demonstrate that using analysis perturbations instead of ensemble weights introduces only very minor errors at larger distances from the observations. Furthermore, we analyse the difference between the observation influence in single-observation experiments with cloud-affected satellite observations in the visible spectrum and the PAI in experiments that assimilate both radiosondes and satellite observations. The influence of an observation is decreased by the presence of other assimilated observations, but we demonstrate that this effect primarily reduces the magnitude of the influence and does not significantly change its structure. Hence, both single-observation experiments and PAIs can be used to investigate the influence of promising additional observations such as satellite radiances. The PAI approach, however, is computationally much more efficient and has the advantage of directly reflecting the influence of observations in the presence of other assimilated observations, which is usually the primary quantity of interest. In addition, the study illustrates and discusses three potential applications of PAIs as a diagnostic. First, we show that PAIs can be used to analyse the contribution of different observations to the analysis. In contrast to other, scalar diagnostics of observation influence, PAI describes the full three-dimensional influence on the analysis state.
This means that non-local effects of observations, as well as their effects on variables other than the observed quantity, can also be analysed. We illustrate this approach with experiments that assimilated experimental satellite observations and radiosondes, which show that the satellite observations also contribute to, for example, the model temperature, in particular in the atmospheric boundary layer. Besides investigating the detailed effects of novel experimental observations as shown here, the diagnostic also appears valuable for monitoring more complex operational assimilation systems with multiple observation types. In contrast to monitoring based on departures and increments in observation space, this would also allow detection of, for example, non-local trends introduced by some observation types (e.g. systematic drying/wetting in some regions). The PAI diagnostic therefore offers a computationally inexpensive approach for monitoring and analysing operational data assimilation systems.
Second, we show that PAIs can be used to detect where different observations draw the analysis in opposite directions, as an indicator of suboptimal assimilation settings or erroneous observations. The approach is validated with single-observation experiments that show good overall agreement with the PAI diagnostic. Our study primarily focuses on the effect of the experimental satellite observations and determines where their influence is in the same or the opposite direction to that of the radiosondes. The same approach, however, could be used in an operational system to automatically detect large discrepancies between the influence of different observations or observation types.
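A minimal sketch of such an automatic check is given below. It flags grid points at which the PAIs of two observation groups (for example satellite and radiosondes) have opposite signs and both exceed a magnitude threshold, and it sums the weaker of the two magnitudes at those points into a single opposition score. The threshold, the score and the function names are hypothetical choices for illustration, not a method used in this study; a score of this kind could, for instance, serve as the quantity to be minimised when tuning localisation.

```python
import numpy as np


def opposing_mask(pai_a, pai_b, min_abs=0.0):
    """True where two PAIs have opposite signs and both magnitudes exceed min_abs."""
    pai_a = np.asarray(pai_a, dtype=float)
    pai_b = np.asarray(pai_b, dtype=float)
    opposite = np.sign(pai_a) * np.sign(pai_b) < 0
    large = (np.abs(pai_a) > min_abs) & (np.abs(pai_b) > min_abs)
    return opposite & large


def opposition_score(pai_a, pai_b, min_abs=0.0):
    """Sum of the weaker magnitude at flagged points, i.e. how much the groups cancel."""
    pai_a = np.asarray(pai_a, dtype=float)
    pai_b = np.asarray(pai_b, dtype=float)
    mask = opposing_mask(pai_a, pai_b, min_abs)
    return float(np.minimum(np.abs(pai_a), np.abs(pai_b))[mask].sum())


# Toy temperature PAIs (in kelvin) of two observation groups at four grid points
pai_sat = np.array([0.5, -0.2, 0.1, 0.0])
pai_raso = np.array([-0.3, -0.1, 0.4, 0.2])
print(opposing_mask(pai_sat, pai_raso))
print(opposition_score(pai_sat, pai_raso))
```

Taking the weaker magnitude means that a point only contributes strongly when both groups push firmly in opposite directions, rather than when one group makes a large increment that the other barely touches.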
Last but not least, we show that PAIs can also be used to approximate the influence that observations would have with modified assimilation settings, using the example of a modified vertical localisation scale for the satellite observations. This approach involves a second approximation, namely neglecting the change in the influence of other observations. The comparison with additional experiments with a modified localisation scale, however, again shows that this approximation has only a comparatively minor effect. For the experiments conducted, we show that vertical localisation removes the largest opposing influences of satellite and radiosonde observations, which are likely due to spurious ensemble covariances. However, this comes at the cost of also removing beneficial (corresponding) influences in some regions. How to optimally treat vertical localisation for cloud-affected satellite observations is the subject of other ongoing research projects, but the PAI diagnostic provides an efficient tool to investigate various potential approaches without the need for additional experiments for every configuration. Furthermore, it could be used to objectively optimise the localisation length scale by minimising opposing influences in a larger dataset, similar to the approach of Hotta et al. (2017b) for optimising the observation error covariance matrix. We did not discuss covariance inflation, although it is another major tuning parameter in data assimilation systems and should be the subject of further research in this context. We expect that the computation of PAIs can be extended to take inflation into account; the details, however, will depend on the inflation technique used in the data assimilation system. In principle, PAIs can indicate cases where observations have very little influence and might therefore also point to regions with too little ensemble spread.
Hence, the PAI diagnostic could also provide a basis for investigating adaptive inflation methods.