Assimilation of Satellite Soil Moisture Products for River Flow Prediction: An Extensive Experiment in Over 700 Catchments Throughout Europe

In this study, we perform a data assimilation (DA) experiment on a very large number (>700) of small‐ and medium‐scale (150–10,000 km2) European catchments to assess the impact of satellite soil moisture (SM) DA on streamflow simulations for different climatic and hydrologic conditions. In the experiment, Climate Change Initiative SM active, passive and combined products are assimilated over a time period 2003–2016 via an Ensemble Kalman Filter (EnKF). The results show that, on average, the assimilation of the three products provides relatively small improvements as compared to analogous open loop (OL) results (i.e., with an increase on median Kling‐Gupta Efficiency equal to 0.0048, 0.0033, and 0.0022 [−] for the active, the passive, and the combined products, respectively). OL performance itself is found to be a significant driver of the assimilation results: greater improvements are observed in catchments with poor OL streamflow predictions and inaccurate precipitation estimates. The remotely sensed product accuracy also emerges as relevant for assimilation efficiency, while factors affecting SM retrievals such as vegetation density, topographic complexity and basin area are found to have only a limited impact on the spatial pattern of performance. Small and detrimental effects of SM assimilation are observed over fully humid catchments and at high latitudes where pre‐storm soil moisture has reduced control on runoff generation as well as in basins where the hydrological model contains structural limitations.

The sensitivity of the catchment response to SM can be varied and influenced by the dominant runoff mechanism operating in a particular catchment. For instance, SM DA usually has a negligible effect in catchments where the hydrological response is snowmelt-dominated (e.g., Koster et al., 2018;Reichle et al., 2019).
Moreover, models may not adequately describe the link between SM and runoff generation -thus limiting SM DA efficiency. This applies not only to conceptual hydrological models, but also to Land Surface Models (LSM) that do not always reflect the best hydrologic process understanding . Depending on model structure, simulated SM can be considered as an "index" of the wetness state that does not necessarily provide a measure corresponding to absolute soil water content (Koster & Milly, 1997). In this respect, hydrological models can often produce good runoff results in cases where their SM estimates contain systematic error, and a SM reanalysis may not necessarily benefit streamflow predictions. However, despite the model-specific nature of simulated SM, models tend to produce very similar temporal SM variability (Koster et al., 2009), confirming a high potential for SM DA provided that systematic differences between modeled and observed SM are treated appropriately. Furthermore, if pre-storm SM conditions in the subsurface layers are the most critical for runoff generation, then DA needs to rely on adequate vertical coupling between surface and root zone model layers for the satellite surface SM assimilation to be effective (e.g., Brocca et al., 2012;Chen et al., 2011).
Another key factor in SM DA performances is the accuracy of satellite SM for the purpose of hydrological simulations. In addition to errors in sensor measures and retrieval model, both pre-retrieval and post-retrieval quantities are impacted by the presence of urban areas, dense vegetation, open water, complex topography, snow coverage or frozen soils. These factors result in different types of retrieval errors (gross, systematic, and random) which must be properly corrected and/or characterized and may constitute a limit on DA results. In addition, the procedures used to deal with observation errors (e.g., approaches for bias correction and random error characterization) have been shown to have a significant impact on assimilation results (e.g., Massari et al., 2015).
Finally, different DA approaches rely on their own assumptions (e.g., observation errors uncorrelated in time, error normality, absence of bias, etc.), that, if violated, can preclude realizing any material advantage from SM retrieval skill.
Most studies addressing the impact of DA on streamflow prediction have been carried out on a small number of basins within a limited geographical area. Very few studies have reported results from more extensive analyses considering a large number of basins representing different (climatic, physiographic, etc.) conditions and, therefore, truly capable of providing truly general (i.e., not case-specific) results. In Kumar et al. (2014), the effects of assimilating satellite SM (retrieved from a range of radiometers including AMSR-E) were evaluated over the continental United States. In their experiment, a LSM coupled with a distributed routing module was employed. Minor streamflow improvements were obtained across most basins-perhaps due to the high skill of model SM estimates prior to SM assimilation. Fairbairn et al. (2017) evaluated the impact of assimilating ASCAT-derived SM into a LSM coupled with a distributed hydro-geological model. The investigation was conducted on over 500 stream-gauge stations in France, with the aim of correcting random errors in model initial conditions; however, the experiment resulted in spurious increases in runoff, which degraded the model performances, partly attributed to limitations in DA implementation. In Lievens et al. (2016), SMOS SM was instead assimilated into a LSM coupled with a lag-routing module. The experiment was conducted on a large basin in Australia with over 100 streamflow stations, generally obtaining minor improvements. Small assimilation impacts were attributed to: Limited streamflow sensitivity to antecedent SM conditions; vegetation, climatic, and orographic effects on satellite retrievals; and ambiguous procedural choices for bias correction and random errors characterization.
The extensive experiments in the above-mentioned studies have analyzed the use of data from only one type of microwave sensor (i.e., from only scatterometers or radiometers). For example, Fairbairn et al. (2017) assimilated ASCAT satellite soil moisture data; Kumar et al. (2014) used only a passive microwave sensor and Lievens et al. (2016) assimilated passive SMOS soil moisture retrievals. As a result, these studies have neglected active versus passive variations in SM retrieval quality  contexts (e.g., Spyrou et al., 2020;Viterbo et al., 2020), they are generally much more difficult to apply in near-real-time than simpler conceptual models (Wijayarathne & Coulibaly, 2020). For instance, the Sacramento model (SAC-SMA, Burnash et al., 1973) remains a key model used by the United States National Weather Service River Forecast System to issue river forecasts across the country. Likewise, the HBV model (Bergström, 1995) is widely used by the Swedish Meteorological and Hydrological Institute for operational hydrological forecasting and spillway design.
Other SM DA experiments involved a large number of basins, but as opposed to the above-mentioned studies, were performed on a decadal or monthly scale or without a routing module (e.g., Albergel et al., 2017;Koster et al., 2018;Reichle et al., 2019). These studies also tend to report a relatively small impact on streamflow simulations.
The primary task of this study is to contribute to the evaluation of satellite SM assimilation in a conceptual rainfall-runoff model and, by considering a large sample of catchments (>700) located throughout Europe, to allow for an exhaustive investigation of factors impacting SM assimilation performance. In this way, we also expand upon the geographic coverage of past studies that were mainly confined within national boundaries (e.g., Draper et al., 2011;Fairbairn et al., 2017;Ridler et al., 2014); moreover, the set of basins, we explore is representative of a wide range of climates, as well as vegetation and orography conditions for which the generalization of research findings and conclusions in different contexts is not obvious and cannot be automatically extended.
Here, indications about the added value of SM retrievals from both active and passive sensors under different catchment conditions are thoroughly investigated, focusing on the way in which the accuracy evaluated for the satellite SM data reflects on improvement or deterioration of simulated discharges. This systematic investigation is conducted using SM retrievals derived from active and passive microwave sensors with sufficiently extensive temporal data coverage. For this purpose, SM products from European Space Agency Climate Change Initiative (ESA CCI) are employed. These products are derived by merging observations from different sensors to generate an active, a passive, and a combined data product -each covering a multi-decadal period.
Besides basic catchment properties, data assimilation results also depend strongly on the baseline assimilation model used. Here, remotely sensed observations over the period 2003-2016 are assimilated via an Ensemble Kalman Filter (Evensen, 1994) into a conceptual rainfall-runoff model MISDc-2L . The central research question is whether remotely sensed observations (soil moisture in this case) can compensate for the process-level simplicity of a conceptual model-relative to the more comprehensive representation provided by LSMs. To further explore the effect of satellite SM DA on discharge predictions, the enhancement in efficiency indexes with reference to different streamflow conditions (high, medium, and low flows) are also evaluated.
Furthermore, we carry out a systematic investigation of different conditions possibly influencing the assimilation impact on runoff modeling. This analysis focuses on multiple factors including: (a) the role of baseline open loop (OL) model performance, (b) sensitivity to the use of three different ESA CCI SM data sets, and (c) the accuracy of remotely sensed data. Moreover, the influence of catchment characteristics (e.g., vegetation, topography, and climate conditions) as well as of runoff generation dynamics (e.g., snow-dominated catchments) are also considered.
The study is structured as follows. The materials employed are described in Section 2, and our methods are illustrated in Section 3. The results are presented and discussed in Section 4, and concluding remarks are provided in the last section.

Study Catchments and Hydrological Data
The DA experiment is conducted on a set of 775 European catchments, with areas ranging from 150 to 10,000 km 2 (Table 1 and Figure S1). Daily discharge measurements for these catchments are obtained from five different hydrological archives: The Global Runoff Data Base, the European Water Archive, and DE SANTIS ET AL.

10.1029/2021WR029643
3 of 21 national databases from Italy (ISPRA HIS), Spain (CEDEX CEH), and Portugal (SNIRH). Catchment boundaries, if not made available by data product developers, are obtained following the procedure described in Camici et al. (2020). The study period starts in 2003 (in order to match the daily availability of satellite SM retrievals) and ends on a basin-specific date between 2011 and 2016 (depending on the particular streamflow data record available within each catchment). Potential anthropogenic impacts on discharge time series are checked through an accurate visual inspection identifying clear patterns of flow regulations like constant flow regime to variable precipitation inputs and sudden changes in the flood hydrographs. Moreover, since the model simulates only natural flows the performance over highly regulated basins are very low and the chance of considering these basins is significantly reduced by verifying that the model's streamflow predictions reach a minimum performance benchmark. The list of basins employed in this experiment is reported in Table S1.

Climatic Data
Basin average precipitation and mean temperature at the daily scale are estimated from the gridded 0.25° E-OBS v19.0e data set (Haylock et al., 2008) provided by the European Climate Assessment & Data set (ECA&D) project for the period 2002-2016. E-OBS gridded data are obtained from the ground station observations collected as part of the ECA&D initiative, with station density varying significantly over the European domain. The current spatial interpolation method is based on the generation of a 100-member ensemble of realizations of each daily field. Unfortunately, the uncertainty directly predicted by this ensemble spread, which is closely related to the station density, turned out to be an underestimation in data-sparse regions (Cornes et al., 2018). For this reason, the provided ensemble spread values are only qualitatively considered in this study to capture spatial patterns in expected precipitation errors. Figure S2 plots the temporal average of this ensemble spread over all study catchments for the study period.

Remotely Sensed Soil Moisture Data
Within the ESA CCI project, multiple single-sensor active and passive microwave SM data sets are blended into three separate harmonized global surface soil moisture products: A merged active, a merged passive, and a combined one (Dorigo et al., 2017;Gruber et al., 2019;. All ESA CCI SM v04.4 products employed in this study are provided with a daily temporal sampling (reference time at 0:00 UTC) on a regular 0.25° grid.
Ancillary data are provided by the ESA CCI project containing information on the spatial distribution of factors affecting the accuracy of SM retrievals including vegetation density, expressed as temporally averaged vegetation optical depth (VOD), and topographic complexity. The spatial distributions of basin-averaged VOD and topographic complexity within the study domain are plotted in Figure 1.
SM observations for the period 2003-2016, consisting of retrievals not flagged for data inconsistency (e.g., temporary snow coverage or dense vegetation), are considered for the DA experiment. Until December 2006, active microwave retrievals are obtained from the ERS wind scatterometers (revisit time of several days) and, subsequently, from the ASCAT sensor on board MetOp satellites (with near-daily time coverage). Passive microwave data are based on observations from the AMSR-E sensor on the Aqua satellite, the Wind-Sat sensor on the Coriolis satellite, the SMOS radiometer, and the AMSR2 sensor on the GCOM-W1 satellite. Combined, these sources guarantee nearly daily data cover over our European spatial domain.
A preliminary evaluation of the accuracy of ESA CCI combined SM is performed ( Figure S3) in terms of Pearson correlation with modeled SM estimates acquired versus surface SM estimates acquired from the Global Land Data Assimilation System (GLDAS) (Rodell et al., 2004). These results show good agreement in the Mediterranean (particularly the Iberian Peninsula). However, mountain ranges (e.g., the Alps, Pyrenees) and Nordic areas (e.g., Scandinavia) are characterized by relatively poor temporal correlation between GLDAS and ESA CCI.

Hydrological Model
The rainfall-runoff model employed in this study is MISDc-2L (Brocca et al., 2012;Massari et al., 2018), which allows for the reproduction of different runoff generation mechanisms and simulates several hydrological processes including evapotranspiration, infiltration, percolation, snow melting, surface, and subsurface runoff.
The model consists of two components; the first one is a soil water balance model that simulates soil moisture temporal patterns and sets the initial conditions for the second component which is a rainfall-runoff model for flood hydrograph simulation.
The soil water balance component subdivides the soil into two zones: At each time step, the model simulates the temporal evolution of the soil water content in the upper (layer 1) and lower (layer 2) soil layers-expressed in terms of water depth ranging from 0 to W max,1 and W max,2 , respectively. To model the dynamics of fluxes and water storage processes, the two layers are assumed as interconnected: Specifically, a percolation rate between the two layers, calculated via a non-linear relation proposed by Famiglietti and Wood (1994), links the one-dimensional representation of the soil water balance in the upper soil zone to that of the lower soil zone. This two-layer approach allows for the simulation of SM control on the partitioning of rainfall into infiltration and surface runoff (infiltration excess) in the upper layer, as well as a saturation excess mechanism when water content exceeds the maximum capacity. Surface runoff generated from infiltration and saturation excess is convoluted through a Geomorphological Instantaneous Unit Hydrograph, while subsurface runoff is transferred to the outlet by a linear reservoir approach. For both routing schemes, a common lag time is applied using the relationship with catchment area proposed by Melone et al. (2002).
The model is applied in lumped mode, with a daily time step. W max,1 fixed at 45 mm (which roughly corresponds to 10 cm soil depth at a porosity of 0.45 m 3 m −3 ) for each catchment; this assumption is a tradeoff between the soil depths investigated by satellite sensors and the necessity to avoid numerical instabilities at a daily scale due to an excessively reduced soil thickness (Manfreda et al., 2014). Reducing the gap between modeled and observed surface layers allows for the direct assimilation of remotely sensed SM without a preprocessing step such as the exponential filter (Wagner et al., 1999) that will introduce time-correlated errors into observations. The soil water capacity of the lower layer, W max,2 , is calibrated for each catchment within the range 250-3000 mm, together with eight other parameters (Table S2). All model SM variables are expressed in relative saturation terms by normalizing water content with respect to layer capacity, and referred to as W 1% and W 2% , for the upper and lower soil layers, respectively.
DE SANTIS ET AL.

Methods
DA methods aim to merge forecasted model state (i.e., the background) and observations by considering their respective uncertainties to minimize the error variance in the updated model states (i.e., the analysis). For its computational efficiency and robustness, the EnKF, a sequential DA method well-suited for dealing with moderate nonlinearities typically found in hydrologic models, is employed for assimilating satellite data. The EnKF requires an appropriate ensemble of model states to represent the background error covariance at any given time.
The EnKF assumes the absence of bias in model and observational data sets and is designed to handle random, zero-mean, normally distributed errors. Following typical practice, systematic differences between modeled and remotely sensed SM are corrected prior to DA. A common approach is to rescale the observations to match some statistical properties of the model estimates (e.g., Reichle & Koster, 2004), with the implicit outcome that all bias is attributed to the observations. Here, a preprocessing phase is implemented with remotely sensed SM data being mapped into the space of the W 1% state variable through the CDF-matching approach. The "preprocessed" surface soil moisture observations (hereinafter referred to as SSM*)-properly characterized in terms of their random error variance   2 ,obs -are then assimilated to update W 1% and W 2% model states.
Calibration and DA evaluation are both performed over the entire period 2003-2016. The reason for considering a single time period is twofold: On the one hand there is a limited overlap between actual data availability of ESA CCI SM products and observed streamflow time series (that in many cases do not go beyond 2012); on the other hand, there is a need to perform the various calibration steps (including observations rescaling parameters, here evaluated on a monthly scale) over a sufficiently long time-window to obtain robust estimates. Figure 2 provides a schematic overview of the steps applied to perform DA, which are described in detail in the following sections.

Ensemble Kalman Filter Implementation
The model state vector for the i-th ensemble member consisting of the soil water contents in the two soil layers is: When an observation y (i.e., SSM*) is available at time k t , the model state vector is updated as follows: where: 1. The Kalman gain matrix is defined as follows: where H T is the transpose of the observation operator, and  k P and R are the covariance matrix of background and observation errors respectively. For the assimilation of only one type of observation, the R matrix reduces to the scalar   2 ,obs , while the time-variant 2 × 2  k P matrix is defined as: with: and N is the ensemble size.

Model Calibration and Error Representation
The model calibration is performed by optimizing the Kling-Gupta Efficiency index (KGE; Gupta et al., 2009) based on the Euclidian distance from the ideal point of three components, that is, correlation and ratios in mean and standard deviation: DE SANTIS ET AL.
10.1029/2021WR029643 7 of 21 To represent the background error covariance in the DA framework, a proper ensemble of SM states is generated considering an adequate model error characterization. Errors in model representation of SM state variables are a consequence of both errors in forcing data and errors in estimated parameters and model structure.
In this study, model errors are simulated through direct perturbation of model climate forcing (i.e., precipitation and temperature inputs) and SM states (W 1% and W 2% ) according to appropriate error distributions. The approach implicitly represents both parameter and model-structural errors by directly perturbing model states (e.g., Alvarez-Garreton et al., 2014;Mao et al., 2019;Massari et al., 2018). All generated errors are assumed temporally independent, and the ensemble size is set to 100 members. An open loop (OL) run without data assimilation, with ensembles generated as described above, is also considered as a reference for SM DA assessment (considering the ensemble mean).
A multiplicative error is assumed for precipitation (Tian et al., 2013) based on a lognormal distribution with mean one and a standard deviation of   ,P. Inaccuracies in precipitation data are typically considered as the main random error source in hydrologic model forecasts (e.g., Han et al., 2014;Massari et al., 2018); therefore, given the high impact of rainfall on SM and runoff variability, the dimensionless   ,P value is calibrated for every catchment to satisfy an ensemble test on simulated discharges. A test, proposed by De Lannoy et al. (2006) and adapted to river discharge, checks the consistency between the simulated streamflow uncertainty, quantified by the temporal average of the ensemble spread,   sp , and the observed forecasting error, expressed by the temporal average of the ensemble skill,   sk , with: where Q i,k is the simulated discharge of the i-th ensemble member at time k t ,   Q the mean over the ensemble, Q o the observed discharge, N is the ensemble size and T p the number of time intervals. The precipitation error parameter   ,P is calibrated to ensure that   sp /  sk approaches unity.
For the other forcing and state errors, an additive model is used based on an assumed zero-mean Gaussian distribution. Regarding temperature uncertainties, the error standard deviation   ,T is fixed equal to 1°C.
The state vector consisting of W 1% and W 2% model variables is instead perturbed according to the error covariance matrix Σ: where error variance   2 , 1% W corresponds to an error standard deviation of 0.01, and γ is the ratio of standard deviation in W 2% to standard deviation in W 1% time series, produced during the deterministic run. Defining γ in this way provides noise magnitudes in the two layers that is coherent in terms of their ratio with unperturbed SM variability.

Observations Rescaling and Error Characterization
For each ESA CCI SM product, daily SM data are averaged over the catchments. Time series of the W 1% ensemble mean obtained during the OL simulations are then considered as reference data for the remotely sensed SM rescaling.
DE SANTIS ET AL.

10.1029/2021WR029643 8 of 21
A widely used rescaling technique is CDF-matching (Reichle & Koster, 2004) that seeks to match the complete cumulative density function (CDF) of satellite and model estimates by applying a nonlinear operator. The CDF-matching is applied here following Drusch et al. (2005) and using a sixth order polynomial function to correct CDF differences.
CDF-matching is generally used in a "lumped" mode, by considering, as a whole, the model and observational multi-annual data sets. To account for possible changes in climatological differences at seasonal time scale (Kumar et al., 2015), we calculated the rescaling parameters separately for each month.
During the period considered, the actual temporal coverage of the remotely sensed SM data can be highly variable due to several factors: The temporal frequency of ESA CCI contributing missions (i.e., poor availability of active SM retrievals prior to 2007), seasonal factors (e.g., frozen or snow covered soil at high altitudes and latitudes during winter period) or data gaps attributable to the ESA CCI processing and quality assessment chain (sometimes found in the passive and combined data sets). Therefore, a minimum sample size of 150 remotely sensed SM retrievals is imposed to obtain reliable polynomial coefficients at a monthly scale.
Prior to DA, errors in rescaled SSM* must be characterized in statistical terms. Observation random errors are modeled as additive, temporally uncorrelated and Gaussian-distributed with zero-mean and stationary error variance   2 ,obs which needs to be estimated.
Several error characterization methods are available (e.g., Dorigo et al., 2010;Gruber et al., 2016). Here, the error characterization is performed with the aim of evaluating the optimal relative weight of satellite-based observations with respect to the background modeled SM as to improve streamflow simulations. In other words, the observation error variance is selected by individually maximizing assimilation efficiency (expressed through KGE index) in each study catchment (e.g., Loizu et al., 2018;Massari et al., 2015;Matgen et al., 2012;Patil & Ramsankaran, 2017).

Performance Metrics
The aforementioned KGE index is used as main indicator for DA assessment. Since it is sensitive to large errors (Garcia et al., 2017;Pool et al., 2018;Santos et al., 2018) commonly observed at flow peaks, an KGE index value computed on the untransformed discharges is considered here as a criterion for evaluating the high-flow conditions. To evaluate the performance of the hydrological simulations under different flow conditions, two transformed variants of the KGE skill metric are also adopted to put more emphasis on specific flow ranges. Specifically, the KGE index computed on root-squared-transformed discharges, referred to as KGE sqr , provides more balanced information (Garcia et al., 2017) and is considered here to be more suitable for the evaluation of average-flow conditions. Likewise, the KGE index is also applied to the inverse of the discharge, referred to as KGE inv , to put more weight on low-flow periods (Santos et al., 2018). To deal with zero discharge values, a small constant equal to 1/100 of the mean observed streamflow is added to the discharge values before transformation. Mutual differences in performance metrics between the DA and OL cases are also evaluated and indicated as ΔKGE, ΔKGE sqr , and ΔKGE inv , respectively; positive values imply an improvement over the OL performance.

Model Performances
Following individual calibration in each catchment, relatively good KGE index values (median value equal to 0.773) are obtained. Calibration runs also exhibit satisfactory KGE sqr (median equal to 0.763). However, low-flow conditions reveal poor KGE inv results (median equal to −0.888). The distributions of performance metrics across all catchments are presented in Figure 3. show very limited differences compared to KGE values for the calibration run. Although still unsatisfactory, larger improvements (vs. the calibrated run) are noted for KGE inv results ( Figure 3). The spatial distribution of the OL KGE index, shown in Figure 4, is largely explained by that of the calibrated precipitation error   ,P-which, in turn, show spatial similarities with the independent rainfall uncertainty estimates provided with the E-OBS rainfall product ( Figure S2). OL performance within each study catchment is summarized in Table S3.

Remotely Sensed Soil Moisture Rescaling
The rescaling between satellite observations and the ensemble mean of W 1% for the OL run is performed individually within each catchment during the 2003-2016 period with the described monthly CDF-matching approach.
DE SANTIS ET AL.   The rescaling of satellite data is not performed on catchments that do not meet the data availability threshold imposed above to obtain reliable polynomial coefficients. As a result, the number of catchments considered in DA experiments is 775, 716, and 770 for the assimilation of active, passive, and combined products, respectively. The final availability of rescaled observations for each ESA CCI SM product is shown in Figure S4.
Correlation coefficients r SM between the modeled W 1% reference and active-, passive-, and combined-derived SSM* retrievals are sampled. Having matched the mean and variance of these data via a rescaling method, r SM values will reflect the magnitude of mean-square-differences normalized with respect to a common variance.
The distribution of correlation coefficients within the catchment set is illustrated in Figure 5. Median r SM values are equal to 0.63, 0.44, and 0.56 for the active, passive, and combined products, respectively. The active-derived SSM* product captures the most temporal soil moisture variability in the reference modeled SM, with r SM values that are generally above 0.5. Correlation coefficients exhibit distinguishable spatial patterns, with the best agreement obtained in Spain and United Kingdom. The lowest r SM values are found at the highest latitudes (i.e., Iceland, Scandinavia) and in mountainous regions (i.e., the Alps, Pyrenees) DE SANTIS ET AL.  where climatic and topographic factors negatively influence the quality of satellite-based SM retrievals. The use of the combined product essentially confirms the same results-although with slightly lower r SM values overall. The assimilation of passive-only SM retrievals leads to significant deviations in SM temporal patterns, resulting in correlation coefficients that are generally below 0.5-with widespread higher exceptions in Spain.

Observation Error Variance Calibration Results
The observation error variance employed during assimilation,   ,obs values are 0.33, 0.45, and 0.41 for the active, passive, and combined products, respectively. The high variability in calibrated   ,obs underscores the importance of accurately specifying   ,obs to represent spatially varying observation errors for optimized DA performance.

Data Assimilation Impacts on Simulated Streamflow and Relation With SM Accuracy
The analysis of DA results is carried out by comparing the skill obtained for each of the three ESA CCI SM products within the DA experiment with respect to the OL both in space (i.e., spatial patterns) and time (i.e., seasonal variations).
Boxplots of ΔKGE, ΔKGE sqr , and ΔKGE inv metrics for each ESA CCI SM product are shown in Figure 7, with whiskers being extended between the 10% and 90% quantiles. Very slight improvements in the KGE index are generally found following DA, and median values of ΔKGE are equal to 0.0048, 0.0022, and 0.0033 for active, passive, and combined products, respectively. KGE differences larger than 0.01 are found in between only 15% and 35% of the study catchments-depending on the ESA CCI SM product assimilated. Conversely, deterioration in streamflow predictions are observed in about 25% of the catchments, regardless of the product used. These results confirm previous outcomes observed in the assimilation of satellite SM literature (e.g., Kumar et al., 2014;Lievens et al., 2016), in terms of the limited improvement in streamflow simulations associated with soil moisture DA. Specific DA performances within each study catchment are reported in Table S3. In this experiment, the SM assimilation mainly improves the KGE index through an enhancement in correlation between simulated and observed discharge; however, the latter is not generally due to an improvement in streamflow covariance, but rather to a decrease in the simulated discharge variance.
The assimilation of the active-derived retrievals generally leads to better DA results than the combined retrievals. While the assimilation of the passive retrievals provides the worst DA performance (as expected from its relatively poor quality over many catchments located in Central Europe and at high latitudes, see Figures 5 and S5). In this respect, Figure 8 highlights the agreement between distributions of r SM and ΔKGE values (i.e., higher correlations with modeled SM correspond to greater improvements in streamflow simulations). In other words, the assimilation of better products tends to produce better streamflow outcomes. Figure 9 shows the spatial distribution of differences in KGE between the DA and OL cases; blue shading indicates skill improvements and red shading skill degradations. The spatial pattern of improvements in KGE are quite similar across the three ESA CCI SM data sets. However, active and combined retrieval assimilation results in a net positive impact in central Europe and in the Mediterranean area (i.e., Spain, Italy) providing greater benefits to the discharge model predictions compared to passive product. These regional findings are consistent with those reported in other DA studies investigating the influence of different satellite soil moisture products on hydrological predictions (e.g., Azimi et al., 2020;Cenci et al., 2016; DE SANTIS ET AL.  Laiolo et al., 2016), which also highlighted the positive effect of active SM retrievals for improving runoff prediction over Mediterranean climates. In particular, Figure 9 illustrates that passive retrieval assimilation is somewhat less effective in central Europe. However, across all of Europe, the assimilation of the passive product essentially improves KGE within the same portion of catchments as the active product.
In general, at northern latitudes (e.g., the UK, Scandinavia) DA improvements are minimal and degradation is common. This could be due to snowmelt-dominated runoff in these regions (Berghuijs et al., 2019) that intuitively results in negligible SM data assimilation efficiency (e.g., Koster et al., 2018;Reichle et al., 2019). Some performance deteriorations after DA, not fully explained by the above-mentioned factor, can also be interpreted as a consequence of local climate conditions (as the deteriorations are also observed at lower latitudes during the cold season, see Figure 10).
Poor results can also be observed over western and northern hilly areas of UK. These areas are characterized by a relatively high rainfall accumulation and high soil moisture (for instance, in these areas, average values for the ensemble W 1% in OL range between 0.5 and 0.8, while in the rest of the study domain they usually vary between 0.05 and 0.35). A potential explanation for the less satisfactory performance after DA relies upon the sensitivity of runoff generation to SM: When SM is nearly saturated, runoff is largely controlled by precipitation inputs and less by pre-storm SM. This is confirmed by the study of Massari and Camici (2020) who show-in the same study area and with similar data-that, within the UK, pre-storm river discharge is a better proxy of basin initial conditions prior flood events than SM.
In Figure S5, the distribution of ΔKGE values is represented with reference to the main climate types present within the catchment data set (according to the Koppen climate classification provided in Chen & Chen, 2013). Mediterranean climates with dry summer are characterized by the greatest DA benefits, while the added value of satellite SM is less significant in fully humid climates, such as those occurring in the higher-middle latitudes (a condition characterizing more than half of the basins) and even more in high-latitude areas (i.e., Scandinavian basins). degradation is most common for the active product and less common for the assimilation of passive retrievals. The enhanced impact of SM DA for simulating floods, as opposed to low flow conditions, is consistent with Baugh et al. (2020).
The impact of DA during different seasons of the year is assessed by splitting the analysis of streamflow time series into warm-(April-September) and cold-(October-March) season halves. A seasonal analysis of OL performances (not shown) exhibits higher cold-season KGE values in some areas (e.g., the UK, France, Central Europe), while elsewhere no significant seasonal differences are found. In Figure 10, the DA results in terms of ΔKGE are reported for the combined data set; the other ESA CCI SM products confirm similar outcomes. Positive assimilations impacts are generally found during the warm season-reflecting the same spatial pattern found when considering the whole DA period. However, degradation is much more common during the cold season when most catchments register a negative effect after

Model Skills in Open Loop and Remotely Sensed Product Accuracy
Spatial and seasonal patterns within the study area suggest a strong relationship between OL model performance and DA effects. This is a well-known behavior when looking at SM assimilation impacts on LSM SM state estimates (e.g., Reichle et al., 2008). However, this effect has not been described in detail for the impact of SM assimilation on LSM fluxes. Figure 11 summarizes the results for assimilation of the active SM product, with other ESA CCI SM products showing similar outcomes. By comparing ΔKGE and ΔKGE sqr values with corresponding OL model scores, we find the largest DA improvement in medium-high flows within catchments characterized by poor OL performance. However, the opposite is true for ΔKGE inv results-which show the largest degradation in catchments where the OL performance is already poor during low-flow conditions. This suggests that during low-flow conditions the model structure can exert a non-negligible role on the DA efficiency. This is a noteworthy example that shows that the DA cannot correct for model structural deficiencies and can, in certain cases, even degrade already poor results.

Vegetation and Topographic Characteristics and Area of the Catchments
The impact of vegetation density and topographic complexity conditions known to affect the quality of satellite SM retrieval results is also analyzed.
The accuracy of satellite data is generally expected to be inversely proportional to vegetation density and topographic complexity. In Figure 12, the frequency of ΔKGE values corresponding to the ancillary data provided within ESA CCI SM products is reported. The color patterns for both active and passive products show only limited evidence of the expected trends for differences in runoff simulation performance. For both indices, larger improvements should concentrate in the upper left corner, while smaller ΔKGE or performance decrements should be located in the lower right corner.
In some cases, other factors appear to have a larger impact than vegetation density or topographic characteristics. For example, catchments with VOD and topographic complexity index higher than 0.5% and 12%, respectively, mostly located in the Alps,  with the lowest VOD (localized in Iceland, Scandinavia, Denmark, UK) and topographic complexity values (concentrated in Finland, southern England, northern Germany) exhibit negligible DA improvements, due to either climatic effects or good underlying OL performance.
To understand potential impacts of the area of the basins on assimilation results, and a possible relationship with SM spatial resolution limitations, we plotted ΔKGE as a function of basin area in Figure 13. However, results do not suggest a clear relationship exists between catchment area and ΔKGE.

Conclusions
In this study, the suitability of remotely sensed SM DA to improve hydrological model performances is evaluated through an experiment involving more than 700 European catchments with different climate, topographic and landcover characteristics. The use of the different ESA CCI SM products from active and passive microwave sensors is also investigated. Such an extensive analysis allows for the examination of exhaustive and non-case-specific indications concerning key conditions that determine the assimilation impact on runoff modeling in the study domain, as well as to the extent the streamflow simulations can generally be improved via SM DA.
The added value of remotely sensed SM assimilation compared to OL simulation appears to be weak for the majority of investigated catchments. The assimilation provided noticeable KGE improvements (i.e., larger than 0.01) in only 15%-35% of the catchments, depending on the ESA CCI SM product, whereas ∼25% of the catchments showed small, almost negligible, deteriorations in streamflow predictions. The DA improvement in the forecast skill is found to be higher in the warm season from April to September. These results are substantially confirmed when also considering the KGE sqr index, computed on rootsquared-transformed discharges to avoid applying excessive weight to high-discharge values. may be related to model structure and calibration practices. In this perception, SM assimilation can have a different impact if different model structures are considered as also found by Loizu et al. (2018). Thus, care should be taken in the result interpretation when using a fixed conceptual modeling of processes (under the "one model fits all" paradigms, i.e., a single general model applicable to every catchment, Fenicia et al., 2011) and a small sample of catchments.
Special attention is paid to the evaluation of various factors potentially affecting DA impacts on streamflow simulation. In this regard, our main findings can be summarized as follows.
• The OL model skill is found to play an important role in the assessment of DA effects and contributes significantly to differences in assimilation performances observed within the study area at a seasonal scale. In this sense, the integration of satellite SM information contributes more added value to poorer OL streamflow simulations. However, even in this case, only limited improvements are seen (e.g., for KGE values in OL lower than 0.55, the median ΔKGE is 0.0295). • Remotely sensed product accuracy is also found to play a role in DA effects on runoff. Among the investigated ESA CCI SM products, the active data set is generally the most accurate in reproducing true SM values and, when assimilated, provides the highest runoff improvement within the study catchments (median ΔKGE = 0.0048), followed by the combined (median ΔKGE = 0.0033) and passive (median ΔKGE = 0.0022) ESA CCI SM products.
DE SANTIS ET AL.  • Climatic factors (e.g., snow-dominated or fully humid conditions) are responsible for some local inefficiency of satellite observation integration related to negative effects on retrievals; moreover, the modification of runoff generation dynamics or prolonged periods of high saturation can also reduce the sensitivity of runoff generation on pre-storm SM. • Vegetation and orographic characteristics of the catchments provide only a partial explanation of differences in runoff estimation skill between the DA and OL.
Despite the large number of catchments considered in our analysis, we cannot exclude the possibility that additional factors (e.g., rescaling technique, choice of hydrological model and choice of assimilation filter) also significantly affected our results. Therefore, future investigations should consider the inclusion of different experiment configurations addressing specific issues to provide a clearer picture of the benefit of SM assimilation into hydrological models and the appropriate parameterization of a DA scheme based on catchment characteristics and data availability.

Data Availability Statement
Some results reported in the supporting information were generated using the QA4SM service (https://qa-4sm.eodc.eu/). The service was developed by TU Wien GEO and AWST GmbH. The results were generated on November 2019. The supporting data sets are available in Zenodo repository (http://doi.org/10.5281/ zenodo.3908392).