Strongly Versus Weakly Coupled Data Assimilation in Coupled Systems With Various Inter‐Compartment Interactions

Coupled data assimilation (CDA) has been attracting researchers' interests to improve Earth system modeling. The CDA methods are classified into two: weakly coupled data assimilation (wCDA), which considers cross‐compartment interaction only in a forecast phase, and strongly coupled data assimilation (sCDA), which additionally uses other compartment's information in an analysis phase. Although sCDA can theoretically provide better estimates than wCDA since sCDA fully uses inter‐compartment covariances, the potential of sCDA in practice is still in debate. We investigate conditions under which sCDA is effective by applying Local Ensemble Transform Kalman Filter (LETKF) to a coupled Lorenz96 model. Especially, we aim to generalize CDA's behavior on coupling strength basis. By continuously changing a cross‐compartment interaction in the coupled Lorenz96 model, we find that sCDA's superiority over wCDA is particularly evident with large cross‐compartment interactions which attenuates dynamics of each compartment (the increase of variables in one compartment induces the reduction of those in the other compartment) and does not intensify the chaoticity of the whole system. The performance of sCDA is more sensitive to the LETKF's hyperparameters (i.e., localization and inflation) than wCDA. Furthermore, through two imperfect model experiments in which model parameters are uncertain, we quantify sCDA's vulnerability to model inaccuracies. The findings of the relationship between the skill of sCDA and inter‐compartment interactions give implications for future CDA strategies.


Introduction
It has become increasingly important to predict a system of systems or coupled systems which have more than one compartment.Earth system models typically solve such a system of systems to understand the interactions of many different components of Earth such as atmosphere, land, and ocean (Taylor et al., 2012).Such coupled systems are not limited to the earth system models.They are found in a weather-wildfire coupled system (Bakhshaii & Johnson, 2019), a disaster-evacuation coupled system (Liu & Lim, 2018), socio-hydrology (Di Baldassarre et al., 2013;Jia et al., 2021), and socio-ecology (Sun & Hilker, 2021).As these coupled systems are strongly non-linear, the development of efficient and practical methods of estimating, predicting, and controlling such systems is a difficult challenge for earth system scientists.
Data assimilation (DA) has been recognized as a useful method to estimate the states of these complex coupled systems.It can provide the real-time estimates of state variables by combining model forecast and observation.DA can effectively maximize the potential of the sparsely distributed observations by integrating them into dynamic models, so that it is suitable for earth system models.DA methods are mainly classified into two: ensemble-based methods such as Ensemble Kalman Filter (EnKF; Evensen, 1994) and variational methods such as 4D-Var (Rabier et al., 2000).These two classes of DA methods have been intensively compared (e.g., Kalnay et al., 2007), and many studies proposed combining them (e.g., Zhang et al., 2009).Recently, there are further approaches for coupled systems, which are different from these conventional methods.For example, in an interface solver (Frolov et al., 2016), ones pick up only interface variables, which have high relevance between each other, and consider their correlation when assimilating observations.Laloyaux et al. (2016) suggested a DA in a coupled system based on an incremental variational DA scheme.DA methods in coupled systems are called "coupled data assimilation (CDA)."The CDA methods are classified into two: weakly coupled data assimilation (wCDA) and strongly coupled data assimilation (sCDA).The former considers cross-compartment interactions only in a forecast phase, while the latter additionally uses the other compartments' information in an analysis phase (see Penny et al. (2017) for detailed glossaries of terminology).Operational weather forecasting centers successfully implemented wCDA to provide consistent initial conditions of their coupled models (e.g., Browne et al., 2019;De Rosnay et al., 2022;Fujii et al., 2021;Lea et al., 2015) while sCDA has yet to be performed operationally.
sCDA recently attracts researchers' interests because it is expected to maximize the potential of observation using a cross-compartment covariance.Sluka et al. (2016) used atmospheric observations to update the state variables of a simplified ocean model in their observation system simulation experiment (OSSE).Suzuki et al. (2017) investigated the covariance between near land-surface temperature and snow temperature using a coupled atmosphere-land model.Sawada et al. (2018) assimilated river flow observations into an atmospheric model using a coupled atmosphere-river model.Note that they performed an OSSE, in which they did not use real observations.They assimilated observations in one compartment into the other compartment and they did not do the other way around (we called this type of one-way coupling in sCDA partial-sCDA (psCDA)).Thus, previous sCDA systems have not realized fully sCDA in the multiple compartments.As more realistic experiments, Lin and Pu (2020) conducted sCDA in a coupled atmosphere-land model and showed that assimilating 2-m humidity data into soil moisture contributed to improving forecast.O' Kane et al. (2019) performed climate reanalysis using sCDA in a coupled atmosphere-ocean-sea ice-land model.They tested some configurations of DA and decided not to assimilate oceanic observations into atmosphere to avoid degrading the performance.Tang et al. (2021) studied the influence of ocean observations in an atmosphere-ocean CDA system by assimilating sea surface temperature into an atmospheric model.To summarize, sCDA attracts researchers in various disciplines, and its application to various types of realistic coupled models has recently started.Despite many previous sCDA's applications, it is practically difficult to maximize the potential of sCDA.Previous works revealed sCDA is not necessarily more effective than wCDA in practice.If cross-compartment covariance cannot be adequately estimated, sCDA degrades the skill to estimate the states of a coupled system.Previous studies have extensively investigated whether sCDA is superior to wCDA, but their results are mixed.For example, Ballabrera-Poy et al. (2009) tested sCDA by applying EnKF to a multiscale Lorenz96 (Lorenz, 1995;Lorenz & Emanuel, 1998) model in which eight slower variables have 32 faster subcompartments.In this coupled system, assimilating observations from the fast compartments into the slow compartments could not improve the predictability of the whole system, and they attributed it to the spurious covariances due to the insufficient ensemble size.Han et al. (2013) coupled atmospheric Lorenz63, oceanic pycnocline, and sea-ice models, and they showed that it is difficult to assimilate slower compartment's observations into the faster compartments.Liu et al. (2013) found that in a simplified coupled atmosphere-ocean model, assimilating observation of the dominant compartment (namely atmosphere) to update oceanic variables was effective.Raboudi et al. (2021) used the one-way coupled multiscale Lorenz96 model to indicate that sCDA is superior to wCDA when the ensemble size is large.Tondeur et al. (2020) used Modular Arbitrary Order Ocean-Atmospheric Model (De Cruz et al., 2016), an atmosphere-ocean model with arbitrary spatial resolutions.They discussed error growth in fast and slow scales and clarified how intra-or inter-compartments effects change the performance of CDA.They suggested that frequent observations of a fast compartment could suppress error evolution and enabled sCDA to improve the estimation of state variables.They attributed it to the degree of model's instabilities.Moreover, their results can explain how a temporal scale mismatch between two model compartments impacts the relative accuracy of wCDA against sCDA.
Therefore, it is crucial to identify the conditions under which sCDA is more effective than wCDA, so that we can understand whether implementing sCDA is worthy of its computational/development cost in each specific coupled system.Although the previous studies which applied sCDA into various coupled systems have provided many case studies on the performance of sCDA against wCDA, generalized knowledge about superiority of sCDA is still lacking.To the authors' knowledge, no studies have summarized the effect of sCDA on couplingstrength basis even though it is a crucial indicator which characterizes coupled systems.Coupled systems in earth system sciences have diverse cross-compartment interactions from duplex to one-way, and they may spatiotemporally vary.Furthermore, they may cause non-linearity and chaotic nature of a coupled system.Therefore, we aim to contribute to summarizing the performance of sCDA based on the interactions between compartments.Through an extensive investigation using a coupled Lorenz96 model, we reveal the conditions under which sCDA is effective, and to figure out why and how it happens.

Method
This study uses the coupled Lorenz96 model in which two Lorenz96 models with 40 grid points are coupled.The Lorenz96 model is often used as a testbed of DA methods (e.g., Trevisan et al., 2010).Although this is a toy model, it captures important dynamics of the atmosphere: advection, dispersion, and attenuation.In addition, it shows chaoticity, which means that small errors of the initial states exponentially grow.Equation 1 is the traditional (non-coupled) 40-dimensional Lorenz96 model (Lorenz & Emanuel, 1998): Lorenz (1995) assumed a multiscale coupled system in both X and Y, where X k and Y k are colocated, shown below: In this study, we set the range of k from 1 to 40, b = c, added forcing term and set J = 1.This would lead: Both α and β are parameters for the magnitude of compartment interaction.In Equations 2a and 2b, the magnitude of compartment interaction is same when b is equal to c.However, we enabled these two parameters to be arbitrarily determined to simulate various types of cross-compartment interactions.F 1 and F 2 are forcing parameters for each compartment (in this study, both have a fixed value of 8.0).
Spatial scales of the two compartments are identical.During the time integration, timestep was set to 0.005.As 0.2 corresponds to one day, 40 timesteps correspond to one day.The 4th-order Runge-Kutta method was applied for time integration.We initially set c to 1, so that there is no time-scale difference between the compartments.As the previous studies showed that the time-scale differences are the cause of poorly estimated inter-compartment covariances (Han et al., 2013), we also used the values 0.5, 0.3, and 0.2 for c to analyze the impact of timescale differences on the performance of CDA.
We performed a twin experiment.It is often used as a method to verify the skills of newly developed DA methods (Penny, 2014;Raboudi et al., 2018;Yoshida & Kalnay, 2018).
The coupled system was initialized with random initial conditions, in which each state variable was sampled from the Gaussian distribution N(0.5 2 ), and spun up for 36,000 days (0.2/0.005*36,000 = 1.44 million timesteps).Then, DA was conducted, and observations were assimilated every 6 hr (10 timesteps) for 180 days.In both compartments, we assumed that 13 grid points were observable.Every three grid points can be observed starting from the first grid point.Gaussian observation errors were added to the synthetic truth.When c = 1, observation errors were set to 1.0, which was approximately 20% of the standard deviation of the state variables when α = β = 0 (a uncoupled version).When c is not equal to 1, we set observation error to 20% of the standard deviation of the state variables.
Local Ensemble Transform Kalman Filter (LETKF; Hunt et al., 2007;Miyoshi & Yamane, 2007) was used as a DA method.It is widely used in DA studies and has been used for sCDA in previous studies (e.g., Sluka et al., 2016;Sawada et al., 2018;Yoshida & Kalnay, 2018).The findings of this study are not limited to LETKF and can be applied to the other ensemble-based DA methods.In LETKF, the changes from wCDA to sCDA can be realized by changing the localization scheme.In wCDA, the localization of LETKF restricts observations in one compartment from impacting state variables in the other compartment in the analysis step while there are no such restrictions in sCDA.
In deterministic ensemble-based filteris including LETKF, the dimensions of the analysis space in a k-dimensional system are less than k 1, and should be more than the number of non-negative Lyapunov Exponents (LE) of the system for the accurate estimation (Carrassi et al., 2022;Ng et al., 2011).After calculating the number of positive Lyapunov Exponents (LEs) in the coupled system (results shown in the next section), this study used 20 ensemble members as the small ensemble size and 80 ensemble members as the large ensemble size.As the previous studies show that the score does not improve further when the ensemble member exceeds a certain threshold (Sakov & Oke, 2008;Yoshida & Kalnay, 2018), we did not conduct experiments with ensemble sizes larger than 80.
In LETKF, we have hyperparameters such as localization and covariance inflation.Note that the term "hyperparameter" was hereafter used to distinguish between the model parameters (e.g., α and β) and the LETKF parameters.In this study, the observation localization (Miyoshi et al., 2007) was applied, so that the observation covariance matrix R 1 was multiplied by the localization function shown in Equation 4.
where r is the distance between an analyzed grid point and an observed grid point and σ is the localization scale.
By this method, we can set L(r) to zero for faraway observations (i.e., r ≧ 2 Gaspari and Cohn (1999) or Hamill et al. (2001), for the other methods of covariance localization) and thus we can prevent those observations from being assimilated.Although some previous studies proposed dynamically estimating the localization scales (Wang & Bishop, 2003;Yoshida & Kalnay, 2018), we applied the hyperparameter sweep evaluating many combinations of the hyperparameters in this study.This is because our scope includes the comprehensive analysis of the LETKF's stability to the hyperparameters in the two settings of CDA (i.e., sCDA and wCDA).We used the σ values 3, 5, 7, 9 for each compartment and the values 1.02, 1.06, 1.10, and 1.14 for inflation factors.We applied multiplicative inflation (Anderson & Anderson, 1999).It should be noted that we applied different localization radii for different compartments.
Four types of the CDA methods are implemented.We defined partial-sCDA (psCDA) to distinguish it from full sCDA.In psCDA, we strongly coupled only one compartment and the other compartment was weakly coupled.For example, psCDA_A used observations from both compartment A and compartment B to update the compartment A's state variables but used only observations from compartment B to update compartment B's state variables (see Figure 1c).Therefore, the observations in compartment A were not used to update compartment B. Note that the previous studies of sCDA which used realistic models mainly performed psCDA.
Table 1 summarizes our experiments.In the perfect model experiments (the Experiments 1-1, 1-2, 1-3, and 1-4), we assumed that the process model had no errors.By changing the parameter β, we analyzed the performance of wCDA and sCDA with the different inter-compartment interactions.In addition to the two-way coupled models, we conducted DA experiments with one-way coupled models.Even in a one-way coupled system, it is possible to extract inter-compartment correlation information (to infer compartment A's states from compartment B's observations).In coupled systems where the effects of one compartment on the other compartment are very weak such as river-atmosphere, they can be reasonably recognized as a one-way coupled system.The CDA's behavior in the one-way coupled system, therefore, is important for many coupling systems.In the Experiments 2-1, 2-2, and 2-3, we assumed that the process model is imperfect, and the cross-compartment interaction parameters and the intra-compartment forcing parameters are assumed to be inaccurate.In Experiment 3, we assumed that we did not know the cross-compartment interaction parameters and tested if CDAs could estimate them.This experiment corresponds to the estimation of parameters related to the estimation of fluxes between two compartments.The  Parameter for forcing in each compartment (F) Note.Since in the Experiments 2-1, 2-2, and 2-3 we assumed the bias of parameters, "biased" means there are deviations from the parameters used in the forecast steps and the true parameters.Values in parenthesis show the true model parameter variables.In the Experiment 3, we estimated parameters by DA, so that "estimated" means those parameters are directly estimated by DA.The complete description of the experiments can be found in the manuscript.optimization of parameters which connect different compartments is beneficial in Earth system sciences.Both state variables and parameters were jointly estimated online by applying the state augmentation in which we concatenated the parameter vector to the state vector and then estimated this augmented vector in the analysis step.We used the relaxation to prior spread method (Whitaker & Hamill, 2012) to maintain the spread of the parameters.We reset the variance of the analysis ensemble of the parameters to the initially predefined variance following Kotsuki et al. (2018).The initial conditions of parameter ensembles were derived from Gaussian distribution whose mean is 0.2 larger than the truth and whose standard deviation is 0.01, ensuring that our initial guess does not include the true value of the parameter.Note that the spreads of the state variables were inflated with multiplicative inflation as explained above.
The DA experiments were performed for 180 days with five different initial conditions, all of which were made by repeating the aforementioned time integration procedure of 1.44 million times.Thus, all the initial conditions are located at different places on the attractor.As the length of DA windows was set to be 6 hr, DA was conducted 720 times.The first quarter of the analysis result was discarded as the spin-up period of DA, and the remaining period was used for evaluation.
The score metric used in this study is the compartment average root-mean-squared-error or tRMSE (tRMSE X ,tRMSE Y ), which is defined below: where X anl k,t is the analysis value of grid point k in time t, and X truth k,t is the true value of grid point k in time t.DA was conducted and tRMSE X was calculated from the five different initial conditions, and the score was defined as the trimmed mean of five tRMSE X (largest and smallest ones are discarded).Although this process contributed to removing the outliers, it should be noted that this process does not qualitatively change our conclusions.If some calculations failed due to filter divergence, the trimmed mean was calculated in the same manner.When only two or less experiments were converged, we considered that we could not get a result.
To summarize, this research used the four choices of localization radius in the compartment A, the four choices of localization radius in the compartment B, the four choices of covariance inflation, the five initial conditions, and the two patterns of ensemble size (20 and 80).Totally we had 640 runs of CDA for one of the choices of β.

Characteristics of the Coupled Lorenz96 Model
The time evolution of the coupled Lorenz96 model is shown in Figure 2 (c = 1, α = β).The values oscillate more largely with larger β, which implies the relative instability of the system.To show the chaoticity of the model, the Covariant Lyapunov Vectors (CLVs) and maximum Lyapunov Exponents (LEs) are computed.CLVs are computed following the method proposed by Sandri (1996).Lyapunov vectors indicate the direction where system's error grows rapidly.When the LE (which indicate the speed of error growth on the direction of corresponding CLVs) are positive, the system is unstable in that direction and small error grows exponentially.In addition, we calculated the value of dM/dt of Gutiérrez et al. (2008).In Gutiérrez et al. (2008), M is spatial mean of the growth of perturbations and dM/dt is a logarithm of the fastest error expansion speed when we added small perturbations to initial conditions on an attractor.Thus, it corresponds to the chaoticity of the system and can be recognized as the largest LE.For detailed explanation of dM/dt, see Gutiérrez et al. (2008).Figure 3 shows the number of positive LE and dM/dt.When β is negative, the number of positive LEs is not substantially changed by β, while when β is positive, the number of positive LEs is changed by β.The maximum number of positive LEs is 30, so that 20 ensembles can be considered small, while 80 is sufficient to run EnKFs in this model.The other chaoticicty index (dM/dt) shows similar trends.Although dM/dt depends on the absolute value of β, the chaoticity increment is smaller when β is negative, compared to positive β.
In addition, we measured non-Gaussianity by calculating mean absolute skewness (mas) and mean absolute kurtosis (mak), following Nerger (2022).We used ensemble data of the last 100 DA windows.For each assimilation window, mas and mak are calculated as follows: where σ denotes the sample standard deviation, that is, σ . Note that the ensemble size, N, is 80, and the inflation/localization settings with the best tRMSE x are used in this calculation.Figure 4 is the boxplot displaying the distribution of mas and mak.As shown in the figure, both mas and mak get higher when β is close to 0. In contrast to chaoticity, non-Gaussianity does not change with the sign of β.Non-linearity induces both chaoticity and non-Gaussianity and thus it can be reasonably expected that non-Gaussianity and chaoticity are correlated.However, in our coupled model, non-Gaussianity does not depend on β, while the chaoticity depends on β.Note that the characteristics of non-Gaussianity discussed here is not the consequence of DA (see Appendix A1).In addition, we calculated multivariate skewness and kurtosis in the two compartments (e.g., X 10 and Y 10 ) following Mardia (1970) and found no clear relationship between non-Gaussianity and β (not shown).

Perfect Model Experiments
Figure 5 shows the result of the Experiment 1-1 with the ensemble size of (a) 20 and (b) 80.The system is more likely to diverge when the absolute value of β is large, especially in the case of the small ensemble size.When the ensemble size is large (Figure 5a), sCDA stably scores better than wCDA.Even when we have no crosscompartment interaction by β = 0, sCDA can accurately provide the information of no inter-compartment correlation and does not degrade the score.sCDA is more stable than wCDA especially when the cross-compartment interaction is large.Importantly, when the ensemble size is small (Figure 5-b), sCDA scores better only when the absolute value of β is large.It indicates that the strong cross-compartment interaction is essential for sCDA to outperform wCDA.In addition, there are large variances of tRMSE with each β in the case of sCDA with the small ensemble size, which indicates that RMSE significantly changes as the hyperparameters change.Hence, the calibration of the hyperparameters is essential for the accurate prediction in sCDA.Figure 6 compares the minimum tRMSE x of both wCDA and sCDA and the chaoticity of the coupled system.It shows that the minimum tRMSE x for sCDA is well correlated with dM/dt, which has already been discussed in the previous study (Chen et al., 2021).Note that the magnitude of RMSE itself should be discussed separately from the discussion regarding the comparison of wCDA and sCDA in the present study.In this study, we focus more on the difference of RMSEs between wCDA and sCDA than the magnitude of RMSE.Even with small ensemble size, sCDA scores better than wCDA when β is large negative.In other words, sCDA is effective when large cross-compartment interaction attenuates the dynamics of both compartments (i.e., the increase of variables in one compartment induces the reduction of those in the other compartment) and does not contribute to increasing the chaoticity of the system.This is one of the crucial findings regarding the conditions under which sCDA works better than wCDA.
Figure 7 shows the average tRMSE X (compartment A's RMSE) as a function of compartment A's localization scales and the inflation factors.The behavior of sCDA and wCDA with different hyperparameters in our coupled Lorenz96 model is consistent to the known general properties of EnKF and does not strongly depend on the intercompartment interaction parameter β.Small inflation provides poor estimates when the ensembles is small, while large inflation provides poor estimates when the ensemble is large.The second and fourth graphs in the bottom row show that the unstable result is caused also by the inappropriate choice of localization scales in sCDA with the small number of ensembles.These results are qualitatively independent to β.It should be noted that the performance of CDAs is less sensitive to localization radius than an inflation hyperparameter.Although the performance of sCDA with the small ensemble size tends to be degraded by spurious correlations within localization radius more easily than wCDA, the side effect of taking a larger localization radius is not dominant.
Figure 8 shows the results of partially strongly coupled data assimilation (psCDA) as well as fully sCDA.When the ensemble size is sufficiently large, sCDA scores the best, psCDA_A (sCDA only in compartment A) and psCDA_B (sCDA only in compartment B) follow, and wCDA is the worst.Even when only compartment B is strongly coupled, compartment A can be improved to some extent because improved estimates of compartment B have a positive effect on the estimation of compartment A through the cross-compartment interaction in the forecast step.When the ensemble size is small, wCDA scores better unless the inter-compartment interaction is large.
Figure 9 shows the performance with the one-way coupled system (Experiment ID: 1-2) where only the compartment A influences the compartment B. Since there is no effect of the compartment B on the compartment   A, it is impossible that the effect of strongly coupled compartment B spills over to compartment A and make its estimates better.Since the performances of wCDA and psCDA_B are fully identical, the line of wCDA is not visible in Figure 9a.When β is negative, sCDA outperforms wCDA even when the ensemble size is small.However, compared to the difference between "wCDA" and "psCDA_B" in Figure 8a, the difference of them in Figure 9 is small.Compared to the one-way coupled system, the interdependent coupled system is more likely to be improved by sCDA.
In the Experiment 1-3, where the signs of β and α were different (hereafter called inverse coupling), Figure 10 shows that the results are analogous to the previous experiments.First, for the small ensemble size, wCDA is better when the inter-compartment interaction is small, and sCDA gets better when the inter-compartment interaction becomes strong.Second, for the large ensemble size, sCDA is better in most inter-compartment interaction parameters.In our model with the equal time scales, the magnitude of the compartment interaction is important for the difference of the performances between sCDA and wCDA.The difference of performance does not substantially depend on the direction of compartment interactions.
In the Experiment 1-4, the timescales of the two compartments were different (see Appendix A2 for the Hovmöller diagram).When compartment B is twice as fast as compartment A (Figure 11a: c = 0.5), the results are consistent to those in the previous experiments.sCDA outperforms wCDA when the cross-compartment interaction parameter is large (especially large negative).The minimum RMSEs show that wCDA outperforms sCDA in the slower compartment when the time-scale difference becomes large (Figures 11b and 11f).However, the score becomes highly sensitive to hyperparameters in wCDA and thus, in practice, it is difficult to reach this minimum RMSE.The score primarily depends on an inflation parameter (not shown).Although the effects of hyperparameters on the performance is similar to the Experiment 1-1 (see Figure 7), the sole and striking difference is that the change in RMSE induced by the change in the hyperparameters is larger with the larger ensemble size than the smaller one, contrary to the experiments with the same time scales.We got outliers by the inappropriately large inflation which substantially degrade the efficiency of LETKF.Although the searched parameter range is unchanged from the previous experiments and possibly inappropriate for the large ensemble case, we found the hyperparameters for the large ensemble case which realized more accurate simulation than the small ensemble case.It should be noted that we did not change the observation network (i.e., the number of observations and the frequency of observation) when we increased the speed of the dynamics.The frequency of   observations gets longer compared with the speed of the dynamics, which makes it difficult to maintain the filters especially for wCDA, which used only half of the available observations.This point also makes the hyperparameter tuning difficult.Further, the instability occurs when the observation in the fast compartment is not assimilated to the slower compartment (not shown).Assimilating the compartment with faster time scales into that with slower time scales significantly helps constrain the whole system.

Imperfect Model Experiments
In Figures 12a and 12b, the true model parameters were set to α = β = 0.5 (the Experiment 2-1, see Table 1).However, in the forecast phase, we used the incorrect interaction parameters, ranging from 0.1 to 0.9.The horizontal axis shows incorrect β, and the vertical axis shows RMSE.While sCDA performs better than wCDA in terms of the optimal scores, the performance of sCDA is more sensitive to hyperparameters than wCDA especially when the ensemble size is small.When the interaction is relatively weak, the mis-estimation of crosscompartment error covariance becomes larger under imperfect model settings.Figure 12-c shows the case where α = β = 1.0.The perfect model experiments (the experiment 1-1) showed that sCDA is effective with this inter-compartment interaction parameters.However, in the imperfect model experiment, the difference of the performances between sCDA and wCDA is small (see also Figure 4).This ineffectiveness of sCDA is due to the inaccurate background error covariance estimation.sCDA substantially degrades the score when the process model is highly inaccurate.
Figure 13 shows the results of the Experiment 2-2 which is the one-way coupling case with α = 0 and β = 1.0 and has inaccurate β in the forecast phase.When β is highly biased, wCDA outperforms sCDA even in the cases of the large ensemble size.In the one-way coupled system, if compartment A is not affected by the observations in compartment B, the simulation of compartment A stays accurate since compartment A is not affected by inaccurate β in the one-way coupling.However, if compartment A is strongly coupled, the incorrect state estimation of compartment B due to inaccurate β negatively affect the estimation of compartment A. This inaccurate adjustment Red and blue boxplots show the results of weakly coupled data assimilation and strongly coupled data assimilation, respectively.In this Experiment 2-2 (see Table 1), we assume the compartment interaction term β is inaccurate.The x-axis shows the inaccurate β.The ensemble size is set to 80.In this experiment, the true value of β is set to 1.0.We assume one-way coupling, namely α = 0.
of state variables in compartment A by sCDA spills over to compartment B, resulting in the worse performance than wCDA.
We also tested the effects of the uncertainty in the intra-compartment dynamics on the performance of sCDA and wCDA in the Experiment 2-3.
Figure 14 shows the results when we assume an inaccuracy in the external force term F. The inter-compartment interactions are set to α = β = 1.0.sCDA quickly loses its advantage against wCDA as the error in F increases.This result implies that the intra-compartment inaccuracy substantially worsens the estimation of both intra-compartment and inter-compartment covariances, and the erroneous estimation of inter-compartment covariance lose the advantage of sCDA.Therefore, sCDA cannot contribute to the better estimation of the system when the dynamics of each compartment is not accurately described by the process models.We calculated the severity of the mis-estimation of intra-compartment and inter-compartment covariances.Figure A3 shows that as model parameters deviated from the true value, the inter-compartment covariance becomes inaccurate.The mis-estimation of intra-compartment covariance is not so correlated with the extent of the model inaccuracy (Figures A3c and A3d).Therefore, when the model inaccuracy gets higher, the estimation error of inter-compartment covariance increases while intra-compartment covariance error remains almost the same, which makes sCDA, which explicitly used the inter-compartment covariance, worse than wCDA.
We provide the results of imperfect model experiments, in which the compartment interaction parameters were unknown and thus jointly estimated with the state variables (Experiment ID: 3).The rightmost columns in Figures 12a-12c show the performances of sCDA and wCDA which estimate the interaction term β by stateaugmentation.Note that we have prior knowledge that the interaction is symmetric and uniform in all 40 dimensions.In this case, the model inaccuracy can be corrected by DA when the ensemble size is large.
Figure 15 shows the results of the joint state-parameter estimation with the various β values.The results of both panels are similar to that of the perfect model experiment.sCDA works well when the cross-compartment interaction β is large (especially large negative) and the ensemble size is large.Considering that the deviation of the estimated β from the true value is approximately 0.05 in this experiment (not shown), the present experiment is equivalent to the case of the perfect model.sCDA performs well under imperfect model scenarios when we can utilize DA to mitigate/lessen the model inaccuracy.

Discussion
The extensive analysis has revealed the CDA's behaviors with various cross-compartment interactions.Although some of the findings can be interpreted to be in line with the expected behavior of LETKF which previous works suggested, we have new findings about the conditions in which sCDA can efficiently improve the estimation of states.Here we summarize our findings and compare them with the previous works.
First, we revealed when sCDA is superior to wCDA, which is the primary objective of this study.In the perfect model settings, in addition to the well-known importance of large ensemble sizes (see e.g., Kirchgessner et al., 2014 andespecially for sCDA, compare Figures 3 and4 in Raboudi et al., 2021), sCDA works better when the absolute value of the inter-compartment interaction parameter β is large.Since the accurate estimation of both intra-compartment and inter-compartment covariances is necessary for sCDA, the small number of ensemble members is not preferrable in sCDA because it increases the sampling error.However, for the coupled systems with the interactions in which the increase of variables in one compartment induces the reduction of those in the other compartment and thus inhibit the chaotic nature of the system, sCDA outperforms wCDA even with the small ensemble size.When we calculated the inter-compartment covariance (see Appendix A4), the magnitude of inter-compartment covariance does not heavily depend on the sign of β, so that the difficulty in estimating intercompartment covariance does not depend on the sign of β.Although inter-compartment covariance plays an important role in CDA (e.g., Han et al., 2013), the performance of sCDA is strongly dependent to the other factors which determines how sCDA effectively uses the estimated inter-compartment covariance in the analysis steps.This result may help to attribute the factor which controls the performance of sCDA to non-existence of chaoticity increment through inter-compartment interaction.We also found that non-Gaussianity is not affected by the change in β, so that the difference of the performances between sCDA and wCDA cannot be attributed to non-Figure 14. tRMSE X with respect to the types of the coupled data assimilation method.Red and blue boxplots show the results of weakly coupled data assimilation and strongly coupled data assimilation, respectively.In this Experiment 2-3 (see Table 1), we assume that there is a bias in intracompartment parameter, namely forcing (F).The true value of F is set to 8.0.
Gaussianity of the state estimation.Our finding is useful to identify the systems in which sCDA can be efficiently applied.For example, in the field of tropical disturbance where atmosphere-ocean interaction suppresses the evolution of the system, introducing sCDA may efficiently improve the predictability of the whole system.To the best of our knowledge, the efficiency of sCDA regarding the chaoticity has not been discussed in the previous studies.
Second, in a one-way coupled system, both leader and follower systems can be improved by assimilating observations in a follower system into the leader system that drives the whole system, which is consistent with the previous works (Z.Liu et al., 2013;Sawada et al., 2018).In addition, we suggest that the degree of effectiveness increases with the coupling strength, which is reasonable considering that chaoticity is hardly changed by the coupling strength in a one-way coupled system.Third, the intensive calibration of hyperparameters of covariance inflation and localization is necessary in sCDA.In terms of inflation, Han et al. (2013) used a fixed inflation hyperparameter and showed sCDA does not easily outperform wCDA with a seemingly sufficient ensemble size.Both our work and Han et al. (2013) indicate that inflation has played an important role.In our work, the performance of sCDA is more sensitive to inflation hyperparameter than that of wCDA when ensemble size is small.The inter-compartment localization is an important factor for successful sCDA as discussed in Stanley et al. (2021).Efficient hyperparameter tuning methods are necessary to maximize the potential of sCDA.
Fourth, sCDA is vulnerable to both intra-compartment and inter-compartment inaccuracies because an incorrect process model brings inappropriate estimates of background cross-covariances to accurately update states.The mis-estimation of covariance causes the deterioration of the sCDA's performance.In our coupled Lorenz96 model, the estimation of inter-compartment covariance is severely affected by the inaccuracy of the process model while errors in the estimated intra-compartment covariance are not so correlated with the inaccuracy of the process model.This is why sCDA, which additionally uses inter-compartment correlation, is worse than wCDA when the model is inaccurate.Although this behavior of sCDA with the imperfect models has an important implication for real-world applications, it should be noted that our coupled Lorenz96 model is highly idealized.
Figure 15.tRMSE X with respect to the types of coupled data assimilation method.Red and blue boxplots show the results of weakly coupled data assimilation and strongly coupled data assimilation, respectively.In the Experiment 3 (see Table 1), we assume that we do not know the true value of the compartment interaction term β.The x-axis shows the true β value.Panels (a, b) show the cases of large ensemble size (=80) and small ensemble size (=20), respectively.

10.1029/2022MS003113
Obviously, the models have low dimension, no physical balances, uniform spatial scales, univariate and has spatially homogeneous dynamics (Nerger, 2022).In addition, the interaction between components is linear, while non-linear and multivariate interactions exist in more realistic ESMs.We recommend carefully assessing the impact of model errors on both intra-compartment and inter-compartment covariances' estimation when sCDA is implemented.Since the impact of model errors is complex, it is difficult to infer the impact of model errors on inter-compartment covariances from the impact on intra-compartment covariance calculated by the uncoupled version of the models.When model errors severely deteriorate the accuracy of inter-compartment covariances, it is difficult to efficiently implement sCDA.
Fifth, the improvement of sCDA's score drastically decreased especially in the slow compartment when the time scale difference between compartments becomes large.As suggested in previous studies, the difficulty of CDA lies in the fact that timescales differ across compartments (Yoshida & Kalnay, 2018).To address this issue, some previous works suggested investigating how compartment coupling provokes new characteristics which are not present in the single compartment dynamics.For instance, Carlu et al. (2019) found that a coupled system has a unique mode that was not present in the single compartment.Carrassi et al. (2022) found that interactions between compartments generated a quasi-neutral subspace of higher dimensions which causes the increase of the minimum ensemble size necessary to prevent filter convergence.
The limitations of this work and future suggestions are discussed in the rest of this section.The first possible extension would be to incorporate and analyze the effects of the difference of spatial scales, which was not included in this study.The coupled Lorenz-96 model used in this study is highly idealized and not necessarily representing real coupled models with various spatial scales.
In addition to the strength of coupling, a design of observation networks substantially affects the performance of sCDA and wCDA, which was not discussed in this study.To maximize the potential of sCDA, it is important to discuss an efficient observation network (observation's distribution and frequency).The effect of observation errors should also be discussed more thoroughly.The relationship between efficient observation networks and coupling strength should be investigated as a future work.
We do not consider automatic/dynamic tuning of DA hyperparameters.It is computationally expensive to perform the hyperparameter sweep in real-world applications.Hence, it is essential to investigate the performance of sCDA and wCDA with dynamic localization and covariance inflation schemes (Miyoshi, 2011;Yoshida & Kalnay, 2018) as a function of inter-compartment interaction.
The findings of this study can currently be applied to ensemble DA.Considering that many operational CDA systems use variational DA methods, it is of paramount importance to investigate if our findings can be transferred to the variational DA methods.Future works should focus on the conditions in which sCDA can be accurately applied with variational DA-based Earth system prediction systems.

Conclusion
In this study, we investigated the characteristics of CDA in chaotic systems through the experiments with the coupled Lorenz96 model.sCDA easily outperformed wCDA in the case where the interaction between compartments is large and suppresses the dynamics of each compartment preventing the growth of errors.In this case, the cross-covariance become larger and can be easily estimated by sCDA.In addition, as the cross-covariance gets large, the performance of sCDA is less sensitive to hyperparameter (namely localization radius and covariance inflation).In the absence of a sufficient ensemble size, sCDA outperforms wCDA only in cases where the interaction can damp the dynamics of other compartments and the inaccuracy of the model is small.Experiments with various compartment interactions, such as unidirectional, symmetric, and inverse, revealed that the above properties of sCDA hold in different coupling styles.We concluded that the chaoticity induced by crosscompartment interactions is important to understand whether sCDA outperforms wCDA.Our imperfect model experiment implied that inter-compartment covariance is sensitive to model imperfection, which brings the inaccurate state estimation by sCDA.We systematically show that sCDA is more vulnerable to model imperfectness than wCDA in our coupled Lorenz96 model.Since sCDA inevitably uses erroneous inter-compartment covariance in real-world applications, it is challenging to apply it to complex and biased Earth system models where model parameters or dynamics are highly uncertain.
The novelty of this study is as follows.First, we summarized how sCDA and wCDA behaves under various compartment interactions and revealed that besides strong cross-compartment interactions, how they amplify the whole system' variables and their chaoticity was imperative for sCDA to work.Second, we discussed our findings with an imperfect model, showing the potential mechanism of why sCDA fail to outperform wCDA when the model imperfectness is severe.

A4. Absolute Value of Inter-Compartment Covariance
We measured how inter-compartment interaction influences inter-compartment covariance by calculating the ensemble-based correlation matrix.Using the ensemble data (the experiment with 80 ensembles which bring the smallest RMSE), the time-averaged absolute values of the correlation matrix were calculated over the last 100 assimilation windows.Figure A4 shows the average absolute value of correlation as a function of the distance from the diagonal elements with several β values.The horizontal axis shows the distance from the diagonal elements, and the vertical axis shows the average absolute inter-compartment correlation.The value of distance = 0 corresponds to the trace of inter-compartment correlation matrix, divided by the number of grid points.Figure A4 shows that the magnitude of cross-covariance does not greatly depend on the sign of β, in contrast to our experiment result of sCDA being better than wCDA especially when β is negative.Although intercompartment covariance plays an important role in sCDA, the difference of the performances between sCDA and wCDA in this study cannot be fully explained by the magnitude of covariance.

Figure 1 .
Figure 1.Tested covariance localizations.Areas with "+" signs show taking cross covariances between model components into account.

whereFigure 3 .
Figure 3.The number of positive Lyapunov Exponents (LEs) in the coupled system (gray solid line) and the value of dM/dt in both compartment (dashed lines).The x-axis shows the parameter β, and the y-axis shows the number of positive LEs as well as dM/dt.In this case, β equals to α.

Figure 4 .
Figure 4.The non-Gaussianity index in the coupled system.The x-axis shows the parameter β, and the y-axis shows the (a) mean abosulte skewness and (b) mean absolute kurtosis.In this case, β equals to α.The time-scale difference, c, is set to 1.

Figure 5 .
Figure 5.The performance of (blue) strongly coupled data assimilation and (red) weakly coupled data assimilation as a function of compartment interaction parameters.The y-axis (on the left) indicates tRMSE of the component A. The y-axis (on the right) shows the number of experiments in which Local Ensemble Transform Kalman Filter (LETKF) successfully works from total 320 experiments (64 hyperparameter settings * 5 initial conditions).In this model setup, the LETKF failure happens only when estimated state variables diverge.Note that it is different from the filter divergence in Ensemble Kalman Filter studies, which stands for extremely small variance of forecast ensembles.Panel (a) is for the experiment with 80 ensemble size, and panel (b) is that with 20 ensemble size.

Figure 6 .
Figure 6.The relationship between the minimum RMSE of (red) strongly coupled data assimilation and (blue) weakly coupled data assimilation and (green) the largest Lyapunov exponents in the coupled system.The results of compartment A are shown.Panel (a) is for the experiment with 80 ensembles, and panel (b) is that with 20 ensembles.

Figure 7 .
Figure 7. tRMSE X with respect to each Local Ensemble Transform Kalman Filter hyperparameter setting.The result is averaged over the four localization values in compartment B (namely, 3, 5, 7, and 9).Panel (a) is the case when β = 1.0, and ensemble size is 80. Panel (b) is the case when β = 0.0, and ensemble size is 80. Panel (c) is the case when β = 1.0, and ensemble size is 20.Panel (d) is the case when β = 0.0, and ensemble size is 20.

Figure 8 .
Figure 8. tRMSE X with respect to the types of the coupled data assimilation method (Local Ensemble Transform Kalman Filter hyperparameters are set to the optimal ones).(a) Large ensemble size (=80) is used.(b) Small ensemble size (=20) is used.Red, blue, black and green lines are results by weakly coupled data assimilation, strongly coupled data assimilation, psCDA_A, and psCDA_B, respectively.

Figure 9 .
Figure 9. tRMSE X and tRMSE Y with respect to the types of the coupled data assimilation method (Local Ensemble Transform Kalman Filter hyperparameters are set to the optimal ones).The coupling of compartments is one-way (namely, compartment A affects compartment B but not the other way around).(a, b) Large ensemble size (=80) is used.(c, d) Small ensemble size (=20) is used.Panels (a) and (c) show the performance in the compartment A, while panels (b) and (d) show that in the compartment (b) Red, blue, black, and green lines show the results of weakly coupled data assimilation, strongly coupled data assimilation, psCDA_A, and psCDA_B, respectively.

Figure 10 .
Figure10.tRMSE X with respect to the types of the coupled data assimilation method (Local Ensemble Transform Kalman Filter hyperparameters are set to the optimal ones).In this case, we assume suppressive compartment-coupling (namely, α = β).Panels (a) and (b) are results with large (=80) and small (=20) ensemble sizes, respectively.Red and blue lines show the performances of weakly coupled data assimilation and strongly coupled data assimilation, respectively.A green dashed line shows dM/dt.

Figure 11 .
Figure11.(a, c, e, g, i, k) tRMSE X and (b, d, f, h, j, l) tRMSE Y with respect to the types of the coupled data assimilation method.Red and blue boxplots show the results of weakly coupled data assimilation and strongly coupled data assimilation, respectively.We set the time-scale difference, c to (a-d) 0.5, (e-h) 0.3, and (i-l) 0.2.The ensemble size is set to 80 in (a, b, e, f, i, j) while the ensemble size is set to 20 in(c, d, g, h, k, l).

Figure 12 .
Figure12.(a) tRMSE X with respect to the types of the coupled data assimilation method.In this case, we assume compartment interaction term β is inaccurate.Red and blue boxplots show the results of weakly coupled data assimilation (wCDA) and strongly coupled data assimilation (sCDA), respectively.The x-axis shows the inaccurate β.In this Experiment 2-1 (see Table1), the true value of β is set to 0.5.Note that we assume symmetric coupling, so that α = β.The rightmost boxplot indicates the results when we estimated both the model state and the interaction term β by augmented state vector.(b) Same with (a), but the number of ensembles is 20.(c) Same with (a) but the true β is 1.0.Rightmost columns show the performances of sCDA and wCDA which estimate the interaction term β by state-augmentation.

Figure 13 .
Figure 13.(a) tRMSE X and (b) tRMSE Y with respect to the types of the coupled data assimilation method.Red and blue boxplots show the results of weakly coupled data assimilation and strongly coupled data assimilation, respectively.In this Experiment 2-2 (see Table1), we assume the compartment interaction term β is inaccurate.The x-axis shows the inaccurate β.The ensemble size is set to 80.In this experiment, the true value of β is set to 1.0.We assume one-way coupling, namely α = 0.

Figure A4 .
Figure A4.The average absolute value for the elements of inter-compartment correlation matrix as a function of the distance from the diagonal elements with β from 1.0 to 1.0.Horizontal axis shows the distance from the diagonal elements, and the vertical axis shows the average absolute inter-compartment correlation.Zero distance corresponds to the trace of intercompartment correlation matrix, divided by 40.

Table 1
Model Settings in the Experiments