Flow‐ and scale‐dependent spatial predictability of convective precipitation combining different model uncertainty representations

Considering a whole summer season in central Europe, we find that the operational, convection‐permitting ICON‐D2 ensemble prediction system is spatially underdispersive in convective precipitation forecasts. The spatial spread of hourly precipitation is insufficient to capture the inherent error adequately across all scales (up to 300 km) and forecast times (up to 24 h). This lack of spread becomes more pronounced in the weak convective forcing regime. Using physically based stochastic perturbations in the planetary boundary layer is beneficial and leads to a reduction in spatial error at scales larger than 20 km and increases the spread at scales less than 50 km during weak forcing of convection, whereas the effect is almost neutral during strong forcing. Complementing the stochastic perturbations by perturbed parameters in the microphysics scheme shows an additive effect on spatial error and spread for a characteristic case study. Assessing the practical predictability of convective precipitation in a flow‐dependent manner is crucial, and our approach of combining multiple sources of uncertainty proves beneficial in mitigating the spatial underdispersion across scales, particularly during weak convective forcing.


INTRODUCTION
Convection-permitting models have led to a step-change in rainfall forecasting (Clark et al., 2016). Substantial progress has been achieved with improved data assimilation techniques introducing new types of observations and a better representation of the dynamics of atmospheric convection due to finer grid spacing that made it viable to turn off the parameterisation of deep convection.
Convection-permitting models provide on average more spatially accurate predictions than coarser-resolution models.
However, even with these advancements, the presence of inevitable uncertainties in a numerical weather prediction (NWP) system and the upscale growth of tiny perturbations due to the chaotic nature of the atmosphere limit the predictability in such high-resolution models. Ensemble forecasting techniques allow for an assessment of the limits of predictability, which depend largely on the atmospheric quantity analysed, the prevailing flow situation, and the geographical location considered, as well as the spatial and temporal scale of the predicted phenomena. In this article, we focus on evaluating the impact of forecast uncertainty of summertime convective precipitation in central Europe during varying convective forcing regimes on various spatial scales from hundreds to tens of kilometres. To do this we need three ingredients: (1) a state-of-the-art convection-permitting ensemble prediction system (CPEPS) that represents different sources of uncertainty, (2) an objective diagnostic to stratify different convective weather situations, and (3) a metric to identify spatial discrepancies between hourly precipitation fields.
In operational regional CPEPS, there are essentially three types of uncertainties represented. First, initial condition (IC) uncertainty is central in forecast ensembles and frequently realised by running multiple simulations starting with slightly perturbed initial states provided by, for instance, data assimilation procedures. Second, lateral boundary condition (LBC) uncertainty is typically represented by an ensemble of global model simulations providing diverse large-scale flow patterns encompassing the regional model's simulation domain. The third type is model uncertainty due to unresolved or poorly represented physical processes. Different strategies have been developed to include model uncertainty in CPEPSs. Unresolved, subgrid-scale physical processes and the chaotic nature of the atmosphere are included by applying stochastic perturbation methods, whereas structural uncertainties in the formulation of physical processes are frequently represented using "multiphysics" or perturbed parameter approaches (e.g., Berner et al., 2017; Fleury et al., 2022; Roberts et al., 2023, and references therein). Nevertheless, current CPEPSs are often underdispersive for near-surface variables (e.g., Bouttier et al., 2012; Raynaud & Bouttier, 2017) and the methodology for constructing ensembles that represent the numerous sources of uncertainty effectively remains an ongoing challenge.
Boundary-layer turbulence as well as cloud microphysics and their interaction with aerosols are key physical processes that represent important sources of model uncertainty in forecasts of convection (Clark et al., 2016). The initiation of convection is predominantly linked to boundary-layer processes, but those processes are not fully resolved due to their intrinsic small scales. In many current boundary-layer schemes the turbulent processes are represented by a mean state within a grid box. This leads to insufficient small-scale variability and inhibits or delays the initiation of convection, especially when convective forcing to trigger convection is missing (Kühnlein et al., 2014). In this study we apply the physically based stochastic perturbation (PSP) scheme (Hirt et al., 2019; Kober & Craig, 2016), which adds perturbations to the tendencies of T, w, and q in the boundary layer to increase turbulence. Independently, Clark et al. (2021) proposed a similar stochastic boundary layer (SBL) scheme representing turbulent eddies as random events following a Poisson distribution inside a bulk model of the convective boundary layer. The variability in precipitation rates introduced by the SBL scheme is beneficial for detecting potential risk of flooding better, whereas the variability in spatial distribution "fills the gap" left by insufficient representation of uncertainty by perturbations in initial and lateral boundary conditions (Flack et al., 2021).
Microphysical processes constitute another major source of model uncertainty. Among the large number of poorly constrained parameters used in current bulk microphysics schemes, cloud condensation nuclei (CCN) concentrations (Barthlott & Hoose, 2018; Dagan & Stier, 2020; Glassmeier & Lohmann, 2018) and the shape of the cloud droplet size distribution (CDSD; Igel & van den Heever, 2017a, 2017b) are known to be potentially influential parameters in precipitation forecasts. Thompson et al. (2021) and Barthlott et al. (2022a, 2022b) show that both CCN and CDSD uncertainty have non-negligible impact on accumulated precipitation using stochastic and perturbed parameter approaches, respectively. They also find that the combination of CCN and CDSD uncertainties has a larger impact than their univariate impact, although the synergistic impact is smaller than the simple sum of them.
The limits of predictability of convection are flow-dependent (Lorenz, 1963; Slingo & Palmer, 2011). The rate of convection is controlled by a variety of factors. First is the contribution to generation of instability by cooling of the troposphere, mainly by dynamically driven ascent, but with some contribution from radiation. The cooling aloft is complemented by heating and moistening of the boundary layer by surface fluxes or advection, leading to the creation of convective available potential energy (CAPE). However, the presence of CAPE does not guarantee the occurrence of convection. Triggering by mesoscale and local features is often required to overcome convective inhibition associated with a capping inversion at the top of the planetary boundary layer. Triggering features include convergence lines and other boundary-layer structures, but can also include perturbations from previous generations of convective clouds such as outflow boundaries and gravity waves. It is a challenge to identify characteristics of the meteorological situation that impact predictability.
The impact of the dynamical forcing of convection in the midlatitudes is often described in terms of strong or weak forcing. Typically this is identified based on the presence or absence of synoptic or mesoscale dynamical features that can drive ascent and the creation of CAPE. However, the identification of such features is usually subjective and it is useful to have a more precisely defined measure of the influence of the convective environment. For this purpose, Done et al. (2006) proposed the convective adjustment time-scale, which is a measure of the extent to which the convection is in equilibrium with the larger-scale forcing. If the inhibition of convection is weak and triggering disturbances are plentiful, convection will occur whenever instability is present and CAPE will be consumed as fast as it is created. The amount of convection, measured by say mass flux or precipitation, is in equilibrium with the forcing processes creating instability. On the other hand, if the inhibition of convection is strong and the triggering disturbances are weak or absent, CAPE can accumulate and potentially reach large values, that is, convection is in nonequilibrium. Equilibrium conditions often coincide with strong forcing, because dynamical ascent can weaken inversions while widespread convection provides an abundance of triggering disturbances to initiate new storms. But it is also possible that a very strong capping inversion or other sources of convective inhibition will prevent the convection from reaching equilibrium and the strong forcing will lead to rapid increases in CAPE. The convective adjustment time-scale of Done et al. (2006) uses an estimate of the ratio of CAPE to its rate of reduction by convection as a quantitative measure of the degree of equilibrium, as well as of synoptic forcing required for such equilibrium. This measure has been used by a number of studies to assess the flow dependence of different aspects of convection (Bachmann et al., 2020; Craig et al., 2012; Done et al., 2012; Flack et al., 2018; Keil et al., 2019). For instance, Keil et al. (2014) and Kühnlein et al. (2014) show that the summertime precipitation forecast skill during the nonequilibrium regime in central Europe is below the average for all days of the convective season.
The predictability of weather forecasts depends on the spatial and temporal scale considered. Generally, the forecast skill horizon is considerably longer for spatially and temporally averaged fields (Buizza & Leutbecher, 2015). Over the past decade, spatial metrics have become widely used to measure forecast skill of convection-permitting models acting on kilometre scales and to estimate its uncertainty (Casati et al., 2022; Ebert, 2009; Frogner et al., 2019). The scale at which convection-permitting models show decent skill remains large compared with the horizontal grid spacing and the typical convective rain extent. Reasonable forecast skill is achieved at scales of several tens to over a hundred kilometres for models (Keil et al., 2020; Mittermaier et al., 2013; Schwartz et al., 2009). A widely applied spatial measure to inspect the scale dependence is the fractions skill score (FSS; Roberts & Lean, 2008), which relaxes the pointwise comparison, introduces a spatial tolerance, and rewards the location proximity of forecast convective cells and observed ones in increasingly larger surroundings.
So far, only a few studies quantify the individual impact of boundary-layer and microphysical uncertainties on the spatial predictability of precipitation. Using a multiphysics approach, Cintineo et al. (2014) found that the contributions of different boundary-layer and microphysics schemes act on various scales. Different boundary-layer schemes have a greater spatial impact at the early stage of convection, followed by an increasing impact of microphysical uncertainty. Keil et al. (2019) inspect the relative impact caused by stochastic perturbations in the planetary boundary layer, varied assumptions of CCN concentration, and soil moisture heterogeneity on the uncertainty evolution of hourly precipitation rates. While the total amount of daily precipitation is hardly changed by the different perturbation approaches (less than 5%), the spatial variability of precipitation exhibits clear differences. The stochastic boundary-layer perturbations lead to the largest spatial variability, impacting precipitation from the initial time onwards with an amplitude comparable to the operational ensemble spread. Similarly, perturbed aerosol concentrations impact spatial precipitation variability shortly after model initialisation, but to a smaller degree. Matsunobu et al. (2022) show that initial and lateral boundary condition (IBC) uncertainties affect the spatial variability of convective cells differently compared with microphysical parameter perturbations (MPP). While the IBC perturbations irregularly mix the precise location of cells in nonequilibrium conditions, MPP mainly impact precipitation intensities and only shift those slightly.
The goal of the present study is to quantify the spatial predictability of convective precipitation under varying convective forcing regimes and inspect the individual as well as synergistic effect of different model uncertainty representations. To achieve this goal, we exploit two distinct ensemble forecast datasets generated with the convection-permitting ICON-D2 model. First, the three-month trial experiment testing the PSP scheme throughout summer 2021 (Puh et al., 2023) provides an unprecedented opportunity to perform a systematic assessment of practical predictability of operational precipitation forecasts and the impact of the stochastic scheme conditional on the flow situation. Second, a so-called "grand ensemble" approach, an ensemble of ensembles, is used to explore the impact of different formulations of model uncertainty: the physically based stochastic perturbation scheme PSP and parameter perturbations in a microphysics scheme, both in the presence of operational IBC uncertainties. Due to computational constraints, this latter analysis is only performed for two case studies representing varying convective forcing regimes taken from the trial period.
The next section contains details of the model, the sources of uncertainty, and an overview of forecast as well as observational datasets. The methods employed to address flow- and scale-dependent predictability are presented in Section 3. After this, we first discuss the individual impact of the PSP scheme for the three-month period in Section 4. Having demonstrated the beneficial impact of the PSP scheme on spatial predictability, in Section 5 we then focus on the synergistic impact by additionally including structural microphysical uncertainty, performed on two characteristic weather situations. Conclusions then follow in Section 6.

MODEL, ENSEMBLE DESIGN, AND DATASETS
Here we present details on the numerical model, the various sources of uncertainty, the ensemble design generating the two forecast datasets, and the observational dataset used to assess practical predictability.

ICON-D2 ensemble
All numerical simulations are performed with the ICOsahedral Nonhydrostatic (ICON) model in its limited-area mode ICON-D2, which has been used in operational weather forecasting at Deutscher Wetterdienst (DWD) since February 2021 (Reinert et al., 2021). ICON employs an unstructured icosahedral-triangular Arakawa C grid in the horizontal direction, formed by spherical triangular cells that cover a simulation domain seamlessly.
The ICON-D2 domain covers Central Europe with a grid spacing of 2 km (542,040 grid cells roughly encompassing 1400 km × 1600 km) and 65 vertically discretised layers from the ground up to 22 km above mean sea level. As described in Zängl et al. (2015), its dynamical core is based on the nonhydrostatic equations for fully compressible fluids. The prognostic variables are the edge horizontal wind speed, vertical wind speed, air density, virtual potential temperature, mixing ratios, and, when using the two-moment microphysics scheme (Seifert & Beheng, 2006), the number density of hydrometeors. Time integration is performed using a two-time level predictor-corrector scheme. Hourly ICON-D2 output data are interpolated onto a uniform, rotated pole coordinate system consisting of 651 × 716 grid points (466,116 in total) with a grid spacing of 2.2 km.
2.1.1 Initial and lateral boundary condition uncertainty
IC and LBC are the backbone in limited-area modelling and represent a major source of forecast uncertainty. In the operational ICON-D2 ensemble prediction system (ICON-D2-EPS) at DWD, IC uncertainty is provided by the Kilometer-scale ENsemble Data Assimilation system (KENDA; Schraff et al., 2016). In the 40-member ICON-D2-KENDA, the model state is updated hourly by assimilating observations into 1-h first-guess forecasts. In 2021, only conventional observations (synoptic stations, radiosondes, wind profilers, and aircraft) were assimilated operationally.
The uncertainty representation in LBCs stems from ensemble forecasts generated by the coarser grid model: ICON-EU, nested within the global ICON model. The global ICON-EPS has a horizontal grid spacing of 40 km (26.5 km since November 2022). An ICON-EU nest simulation is embedded online in the global ICON simulation and covers the entire Euro-Atlantic region with half the grid spacing. The ICON-EU ensemble provides the ICON-D2 LBCs. Forecast variability in the global ICON-EPS and ICON-EU-EPS is attained by 40-member IC perturbations generated by the ensemble data assimilation with an assimilation cycle of 3 h, and by ensemble physics perturbations where a random combination of tuning parameters is set for each of the ensemble members and fixed throughout the forecast horizon. As in DWD's operational setup (Reinert et al., 2021), ICON-EU ensemble forecasts initialised three hours before the initialisation time of the ICON-D2 ensemble are used. Therefore, the LBCs are updated hourly using the ICON-EU-EPS output at lead times 3-27 h. Since we focus primarily on the impact of model uncertainties, we consider the impact of IC and LBC uncertainty together and call it IBC uncertainty. In this study, the first 20 members of ICs and LBCs are used, owing to the limited computational resources.

2.1.2 The physically based stochastic perturbation scheme (PSP)
Subgrid-scale uncertainty in the planetary boundary layer is represented here using the PSP scheme presented by Kober and Craig (2016) and revised by Hirt et al. (2019). The rationale of the PSP scheme is to improve the coupling between subgrid variability and convective initiation in km-scale models. In convection-permitting models, such as the ICON-D2 model, deep convection is represented explicitly. However, processes that lead to convective triggering often occur below the grid scale and are not sufficiently parameterised. These subgrid processes include boundary-layer turbulence, subgrid orography, and density currents resulting from convective downdrafts. The PSP scheme re-introduces the missing effects of boundary-layer turbulence, the most influential among the three effects, in a physically consistent manner. The revised PSP scheme contains modifications for added tendencies: an autoregressive, continuously evolving random field, a limitation of the perturbations to the boundary layer that removes artificial convection at night, and a mask that turns off perturbations in precipitating columns to retain coherent structures (Hirt et al., 2019).
The perturbations are given by adding stochastic disturbances to the model tendencies of temperature, specific humidity, and vertical wind within the planetary boundary layer following

$$\left.\frac{\partial \Phi}{\partial t}\right|_{\mathrm{PSP}} = \alpha_{\mathrm{tuning}}\,\eta\,\frac{1}{\tau_{\mathrm{eddy}}}\,\frac{l_{\mathrm{eddy}}}{\Delta x}\,\sqrt{\overline{\Phi'^{2}}},$$

where $\Phi \in \{T, q_v, w\}$. $\eta$ is a spatio-temporal structure of a random eddy field correlated with a time-scale $\tau_{\mathrm{eddy}}$ and number density $l_{\mathrm{eddy}}/\Delta x$ of the eddy. $\alpha_{\mathrm{tuning}}$ is a scaling factor to magnify the amplitude of the perturbations. However, the total amplitude is inherently scaled by the standard deviation of the tendency $\sqrt{\overline{\Phi'^{2}}}$. The length of eddies $l_{\mathrm{eddy}}$ is set to 1000 m. The typical lifetime of convective eddies used for a temporal autoregressive process is 10 minutes. The scaling factor $\alpha_{\mathrm{tuning}}$ is 5.0. For all simulations using the PSP scheme, the number of the IBC ensemble members is used as a random seed generating a stochastic pattern, that is, the seed for ensemble member 1 is 1, and that for member 20 is 20 (as in Puh et al., 2023).
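As a rough illustration of how the amplitude factors described above combine, the following sketch multiplies a scaling factor, a random field, the inverse eddy time-scale, the eddy number density, and a subgrid standard deviation. It is not the ICON implementation: the function names, the use of uncorrelated white noise in place of the spatially and temporally correlated eddy field, and the choice of the nominal 2 km grid spacing for Δx are all assumptions made for this example. Only the values 5.0, 1000 m, 10 minutes, and the member-number seeding come from the text.

```python
import numpy as np

def psp_tendency(eta, phi_std, alpha_tuning=5.0, tau_eddy=600.0,
                 l_eddy=1000.0, dx=2000.0):
    """Perturbation tendency for one variable (T, qv or w), in units of phi per second.

    eta      : dimensionless random field, one value per grid point
    phi_std  : subgrid standard deviation sqrt(mean(phi'^2))
    tau_eddy : eddy lifetime in seconds (10 minutes)
    dx       : grid spacing in metres (assumed: nominal 2 km)
    """
    return alpha_tuning * eta * (1.0 / tau_eddy) * (l_eddy / dx) * phi_std

def member_noise(member, shape):
    """Stand-in for the eddy field: white noise seeded with the member number."""
    rng = np.random.default_rng(member)  # seed 1 for member 1, ..., 20 for member 20
    return rng.standard_normal(shape)

# Hypothetical usage: temperature perturbation for ensemble member 1
eta = member_noise(member=1, shape=(4, 4))
tendency_T = psp_tendency(eta, phi_std=0.5)  # phi_std value is made up
```

Seeding with the member number makes the stochastic pattern reproducible per member while differing between members.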

Microphysical parameter perturbations
To investigate the impact of microphysical uncertainty we use six combinations spanned by three different CCN concentrations and two different shape parameters of CDSD, similar to Matsunobu et al. (2022).These perturbations necessitate the use of the two-moment bulk-microphysics scheme (Seifert & Beheng, 2006), which is currently used in the pre-operational Rapid Update Cycle suite at DWD.
Perturbations in CCN concentrations consist of three pre-defined parameters: maritime ($N_{\mathrm{CN}} = 100\ \mathrm{cm}^{-3}$), continental ($N_{\mathrm{CN}} = 1700\ \mathrm{cm}^{-3}$), and polluted ($N_{\mathrm{CN}} = 3200\ \mathrm{cm}^{-3}$) aerosol load based on Hande et al. (2016). "Maritime" emulates clean, pristine conditions that have quite small numbers of CCN typical for the sea. "Continental" is the default setting that represents the typical CCN concentrations for the European continental regions. The "polluted" setting represents extremely polluted situations caused by, for example, massive wildfires and considerable anthropogenic emissions.
We also vary the shape parameters of CDSD estimation. The CDSD is approximated following the generalised gamma distribution,

$$f(x) = A\,x^{\nu}\exp\left(-\lambda x^{\mu}\right),$$

where $x$ is the particle mass, $A$ is dependent on the number density of hydrometeor particles, and $\lambda$ is a coefficient dependent on the average particle mass. The coefficients $\nu$ and $\mu$ are parameters that are pre-defined and fixed throughout a simulation. In this study we control the widths of the CDSD by varying the shape parameter $\nu$ between 0 and 8 to cover a wide spectrum of possible shape parameter values (Barthlott et al., 2022b). Since the parameters describing the CCN concentration and the shape of CDSD are kept temporally and spatially constant throughout a simulation, the MPP represent a structural uncertainty, mimicking model error due to incomplete knowledge of physical parameters rather than subgrid-scale variability.
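To make the effect of the shape parameter concrete, the sketch below evaluates the relative dispersion (standard deviation divided by mean) of a generalised gamma distribution of the form f(x) = A x^ν exp(−λ x^μ), using the closed-form moments of that distribution. The value μ = 1/3 is an illustrative assumption and is not taken from the text; the function names are hypothetical. The relative dispersion does not depend on A or λ, so both are set to 1.

```python
import math

def gg_moment(n, nu, mu, lam=1.0, A=1.0):
    """n-th moment of f(x) = A x^nu exp(-lam x^mu) over x in (0, inf)."""
    k = (n + nu + 1.0) / mu
    return A * math.gamma(k) / (mu * lam ** k)

def relative_dispersion(nu, mu):
    """Standard deviation divided by mean of the distribution."""
    m0, m1, m2 = (gg_moment(n, nu, mu) for n in range(3))
    return math.sqrt(m2 * m0 / m1 ** 2 - 1.0)

MU = 1.0 / 3.0  # assumed value, for illustration only

broad = relative_dispersion(0.0, MU)   # shape parameter nu = 0
narrow = relative_dispersion(8.0, MU)  # shape parameter nu = 8
```

Under this assumption the larger shape parameter yields the smaller relative dispersion, that is, a narrower distribution.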

Three-month trial run
The first part of this work builds on the ensemble forecast dataset of the trial run (Puh et al., 2023). This dataset facilitates a systematic evaluation of the flow and scale dependence during the three-month period from June-August 2021. The reference ensemble (denoted as "trial reference", TR) is identical to the operational 20-member ICON-D2-EPS, driven by operational ICs provided by ICON-D2-KENDA and LBCs provided by ICON-EU-EPS, using the one-moment bulk-microphysics scheme and including random physics parameter perturbations. In a second parallel ICON-D2 ensemble, we use this ICON-D2 ensemble configuration and additionally turn on the PSP scheme (denoted by TP henceforth). All simulations were initialised daily at 0000 UTC with 24-h forecast lead time and were performed on DWD's High Performance Computer (more details in Puh et al., 2023). The comparison of these ensembles (TR and TP) allows an estimation of the systematic impact of the PSP scheme. The results are presented in Section 4.

Grand ensemble case studies
Our second objective in this work is to gauge the individual and synergistic impact of different formulations of model uncertainty, the PSP scheme and parameter perturbations in the microphysics scheme, both in the presence of operational IBC uncertainty. To achieve this goal we designed a "grand ensemble" (similar to the superensemble of Flack et al., 2021) containing IBC uncertainty and two different flavours of model uncertainty. The "grand ensemble" enables the inspection of the synergistic impact, but also facilitates an estimation of the individual impact of the PSP scheme and MPP by applying subsampling (as in Craig et al., 2022). The results are displayed in Section 5.
Note: These ensembles form the second forecast dataset discussed in Section 5.
Here the "grand ensemble" is a 120-member ensemble consisting of 20 different IBCs, the PSP scheme turned on, and six realisations of microphysical uncertainty (denoted IPM, see Table 1). The IM ensemble lacks stochastic perturbations, while IP combines IBC uncertainty and the PSP scheme where each of the 20 ensemble members has different IBCs and different random seeds in the stochastic scheme, as described in Section 2.1.2. Additionally we performed three ensembles containing the different sources of uncertainty individually. In the I ensemble, IBC perturbations are the only source of uncertainty; the P ensemble has one IBC realisation (ensemble member 1 of I) but 20 different random seeds in the PSP scheme, whereas the M ensemble solely includes six MPPs. The I ensemble can be seen as a reference ensemble in this part of the work. In order to assess the impact of microphysical uncertainty, we used the two-moment bulk microphysics scheme (Seifert & Beheng, 2006) and turned off random parameter perturbations to focus purely on the impact given by the perturbations selected explicitly. Apart from these two differences, the model set-up is identical to the trial dataset described in Section 2.2. Since such an approach is computationally expensive, we restrict the numerical simulation to two cases representing varying convective regimes taken from the trial period.
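The member count of the IPM ensemble and the idea of subsampling can be sketched by enumerating the combinations explicitly. The dictionary layout and the helper `subsample` are hypothetical, and taking 0 and 8 as the two shape parameter values is an assumption based on the stated range; the CCN concentrations and the member-number seeding follow the text.

```python
from itertools import product

IBC_MEMBERS = range(1, 21)                       # 20 IBC realisations
CCN = {"maritime": 100, "continental": 1700, "polluted": 3200}  # cm^-3
SHAPE_NU = (0, 8)                                # assumed two shape values

# Every IBC member is combined with every microphysics setting;
# the PSP seed equals the IBC member number.
grand_ensemble = [
    {"ibc": ibc, "psp_seed": ibc, "ccn": ccn, "nu": nu}
    for ibc, (ccn, nu) in product(IBC_MEMBERS, product(CCN, SHAPE_NU))
]

def subsample(members, **fixed):
    """Keep the members whose settings match all fixed key/value pairs."""
    return [m for m in members if all(m[k] == v for k, v in fixed.items())]

# Fixing the microphysics leaves 20 members that differ in IBC and PSP seed;
# fixing the IBC member leaves the six microphysics realisations.
fixed_microphysics = subsample(grand_ensemble, ccn="continental", nu=0)
fixed_ibc = subsample(grand_ensemble, ibc=1)
```

The microphysics combination held fixed in `fixed_microphysics` is an arbitrary choice for this example.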

Verification dataset
An assessment of the practical predictability of convection, in particular its scale-dependent component, requires a sound spatio-temporal dataset of precipitation measurements. The observations used for verification in this study are obtained from DWD's observation network Radar-Online-Aneichung (RADOLAN; DWD, 2023). Near-surface reflectivities are observed every five minutes with 17 C-band radars at a spatial resolution of 1 km. The reflectivities are converted to five-minute precipitation rates and calibrated with ground ombrometer measurements. They are then accumulated to give hourly precipitation rates and regridded to a rotated pole grid identical to the ICON-D2 outputs for evaluation. Thus, the radar precipitation dataset (RY product) comprises a synthesis of two data sources, radar and ground measurement network.
The verification domain encompasses most of Germany and adjacent countries, as shown in Figure 1.

ANALYSIS METHODOLOGY
In this section, we introduce the methodology for investigating the flow and scale dependence of predictability.
The flow dependence is assessed by selecting cases under varying convective regimes classified by the convective adjustment time-scale.The scale dependence is determined using the FSS, which is a widely used spatial verification technique.

Flow-dependent analysis: convective adjustment time-scale
Modulations of the convective environment are distinguished using the convective adjustment time-scale $\tau_c$ (Done et al., 2006; Keil & Craig, 2011) describing a time-scale over which CAPE is consumed by precipitation and convective equilibrium is established. This objective measure to classify convective weather situations is defined as CAPE ($\mathrm{J\,kg^{-1}}$) over its removal, which is expressed by the precipitation rate $P$ ($\mathrm{kg\,s^{-1}\,m^{-2}}$):

$$\tau_c = \frac{1}{2}\,\frac{c_p\,\rho_0\,T_0}{L_v\,g}\,\frac{\mathrm{CAPE}}{P},$$

where $c_p$ (specific heat capacity), $\rho_0$ (reference density), $T_0$ (reference temperature), $L_v$ (latent heat of evaporation), and $g$ (gravitational acceleration) are constants.
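A minimal sketch of this diagnostic, assuming the form τ_c = ½ (c_p ρ_0 T_0)/(L_v g) · CAPE/P, is given below. The prefactor of one half follows the formulation commonly attributed to Keil and Craig (2011) and is an assumption here, as are the numerical values chosen for the constants and the function name; the prefactor is exposed as a parameter for that reason.

```python
import numpy as np

# Assumed constant values (not stated in the text)
C_P = 1005.0    # specific heat capacity, J kg^-1 K^-1
RHO_0 = 1.292   # reference density, kg m^-3
T_0 = 273.15    # reference temperature, K
L_V = 2.5e6     # latent heat of evaporation, J kg^-1
G = 9.81        # gravitational acceleration, m s^-2

def convective_timescale_hours(cape, precip_rate, prefactor=0.5):
    """Convective adjustment time-scale in hours.

    cape        : CAPE in J kg^-1
    precip_rate : precipitation rate in kg s^-1 m^-2
    Returns NaN where there is no precipitation.
    """
    cape = np.asarray(cape, dtype=float)
    precip = np.asarray(precip_rate, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        tau_s = prefactor * C_P * RHO_0 * T_0 / (L_V * G) * cape / precip
    return np.where(precip > 0, tau_s, np.nan) / 3600.0

# Hypothetical example: CAPE of 1000 J/kg and 1 mm/h of rain
tau_example = convective_timescale_hours(1000.0, 1.0 / 3600.0)
```

With these assumed constants, a rain rate of 1 mm per hour together with 1000 J/kg of CAPE gives a time-scale of several hours, well above the one-hour values typical of equilibrium conditions.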
As an example, fingerprints of representative precipitation patterns during different convective forcing regimes are displayed in Figure 1. During nonequilibrium conditions, there is no general uplift and CAPE can accumulate until local processes trigger convection. Being controlled by local factors, the resulting precipitation field typically has an intermittent spotty character (Figure 1a,b). The area-averaged $\tau_c$ attains fairly large values, especially before the onset of convective precipitation around noon (Figure 1c). On the other hand, during equilibrium, ascending motions driven by a large-scale flow cause widespread heavy rainfall (Figure 1d,e). In such conditions, CAPE generated by large-scale processes is immediately reduced by convective activity and $\tau_c$ usually attains small values of less than one hour (Figure 1f).

3.2 Scale-dependent analysis: fractions skill score

Spatial error and spread
In this study, a variant of the FSS (Roberts & Lean, 2008) is used as a metric to examine the scale-dependent forecast error and spread of precipitation. The FSS is a fuzzy scoring technique quantifying the similarity between two binary fields (denoted A and B, observation and forecast fields in error, or two distinct ensemble members in spread) in terms of a predefined neighbourhood scale. The definition of the FSS is given by

$$\mathrm{FSS} = 1 - \frac{\sum \left(f_A - f_B\right)^2}{\sum f_A^2 + \sum f_B^2},$$

where $f_A$ and $f_B$ represent the fraction of rainy grid points in fields A and B, respectively, at which the precipitation amount is above a certain threshold value, and the sums run over all grid points. The second term on the right-hand side is the ratio of the mean squared error (MSE) of the fraction fields A and B to the maximum possible MSE. If the number of grid points with a value of 1 within a certain neighbourhood of a grid point is equal between two fields, the FSS is 1.0, which means the two fields being compared are identical at the scale of the neighbourhood window. The FSS becomes smaller as the difference between the two fields gets larger, and it becomes 0.0 when only one of the fields has values and the other has a complete miss in the respective neighbourhood.
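The score described above can be sketched with NumPy alone, assuming the standard form FSS = 1 − Σ(f_A − f_B)² / (Σ f_A² + Σ f_B²) with square neighbourhoods of odd width. The function names, the zero padding at the domain edges, and the use of a summed-area table to obtain the neighbourhood fractions are choices made for this example, not details taken from the study.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of ones in the n x n window centred on each grid point.

    n must be odd; points outside the domain count as zero.
    """
    half = n // 2
    padded = np.pad(np.asarray(binary, dtype=float), half, mode="constant")
    # Summed-area table with a leading row and column of zeros
    table = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    table[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = table[n:, n:] - table[:-n, n:] - table[n:, :-n] + table[:-n, :-n]
    return window_sum / (n * n)

def fss(field_a, field_b, n):
    """Fractions skill score of two binary fields for window width n (grid points)."""
    fa = neighbourhood_fraction(field_a, n)
    fb = neighbourhood_fraction(field_b, n)
    reference = np.mean(fa ** 2) + np.mean(fb ** 2)
    if reference == 0.0:          # both fields empty: score undefined
        return np.nan
    return 1.0 - np.mean((fa - fb) ** 2) / reference

# Two single-cell "storms" displaced by four grid points
a = np.zeros((21, 21), dtype=bool)
b = np.zeros((21, 21), dtype=bool)
a[10, 8] = True
b[10, 12] = True
score_grid_scale = fss(a, b, 1)   # no overlap at the grid scale
score_window_9 = fss(a, b, 9)     # partial overlap in 9 x 9 windows
```

The displaced cells score zero at the grid scale and gain credit once the window is wide enough for their neighbourhoods to overlap, which is the spatial tolerance the FSS is designed to provide.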
It is known that the FSS is quite sensitive to the fraction of precipitating grid points in the entire field (Mittermaier & Roberts, 2010; Skok & Roberts, 2016, 2018). To remove the effect of frequency bias, we use the 95th percentile values of hourly precipitation as the threshold to generate binary fields. The percentile threshold keeps the number of grid points used for the FSS calculation constant. Throughout the study, hourly forecast and observed precipitation rates are used for the evaluation. If the number of rainy grid points is less than 5% in the evaluation domain (Figure 1), the field at that time is regarded as a complete miss. If both fields compared are a complete miss, that combination of fields is excluded from the analysis. The neighbourhood sizes are varied from 2.2 km (1 grid point) to 336.6 km (153 grid points) with an interval of 2 grid points. The largest window size corresponds to half of the shorter side of the evaluation domain, encompassing 294 × 341 grid points.
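The thresholding and the list of neighbourhood sizes can be sketched as follows. Treating a grid point as rainy when its hourly precipitation exceeds zero is an assumption (the text does not state the wet threshold), and the function and variable names are hypothetical.

```python
import numpy as np

GRID_KM = 2.2  # grid spacing of the regridded fields

def percentile_mask(precip, percentile=95.0, wet_threshold=0.0):
    """Binary field of grid points above the given percentile of the field.

    If the percentile value does not exceed wet_threshold, fewer than
    (100 - percentile)% of the points are rainy and the field is treated
    as a complete miss (all False).
    """
    precip = np.asarray(precip, dtype=float)
    threshold = np.percentile(precip, percentile)
    if threshold <= wet_threshold:
        return np.zeros(precip.shape, dtype=bool)
    return precip > threshold

# Window widths from 1 to 153 grid points in steps of 2
window_points = np.arange(1, 154, 2)
window_km = window_points * GRID_KM
```

This gives 77 window widths, the largest being 153 × 2.2 km = 336.6 km.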
While the FSS was originally developed for comparing observations and forecasts, it can also show the dispersion of two fields (Dey et al., 2014). In this study, spatial error and spatial spread are defined based on Dey et al. (2014)'s method in order to evaluate the spatial error-spread relationship. To enable the FSS to be directly comparable like a classical error-spread analysis, spatial error and spread are defined as one minus the ensemble mean of each FSS:

$$\mathrm{error} = 1 - \frac{1}{N}\sum_{i=1}^{N}\mathrm{FSS}_{of}(i), \qquad \mathrm{spread} = 1 - \frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\mathrm{FSS}_{ff}(i,j),$$

where $N$ is ensemble size and the subscripts $of$ and $ff$ mean that the FSS is calculated between observation versus forecast and forecast versus forecast, respectively. Following Dey et al. (2014), we calculate $\mathrm{FSS}_{ff}$ for all combinations of ensemble members belonging to an ensemble. For instance, $\mathrm{FSS}_{ff}$ for a 20-member ensemble can be calculated 20 × 19/2 = 190 times, and for a six-member ensemble 15 times. Given the robustness of the ensemble mean FSS to ensemble size (Necker et al., 2024), we can compare mean FSS for ensembles of different sizes.
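The bookkeeping behind these two quantities can be sketched as below: the error averages the score of every member against the observation, and the spread averages the score over every unordered pair of members. The function names are hypothetical, and `gridpoint_fss` is only a stand-in score at the grid scale so that the example runs on its own; any FSS routine with a fixed neighbourhood could be passed instead. Ignoring undefined scores with `nanmean` is an assumption that mirrors the exclusion of comparisons in which both fields are a complete miss.

```python
from itertools import combinations
import numpy as np

def gridpoint_fss(a, b):
    """Stand-in score: FSS with a neighbourhood of a single grid point."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    reference = np.mean(a ** 2) + np.mean(b ** 2)
    return np.nan if reference == 0.0 else 1.0 - np.mean((a - b) ** 2) / reference

def spatial_error_and_spread(obs, members, score=gridpoint_fss):
    """Return (error, spread, number of member pairs)."""
    fss_of = [score(obs, m) for m in members]                    # N values
    fss_ff = [score(a, b) for a, b in combinations(members, 2)]  # N(N-1)/2 values
    return 1.0 - np.nanmean(fss_of), 1.0 - np.nanmean(fss_ff), len(fss_ff)

# Synthetic 20-member example with random binary fields
rng = np.random.default_rng(1)
obs = rng.random((30, 30)) > 0.95
members = [rng.random((30, 30)) > 0.95 for _ in range(20)]
error, spread, n_pairs = spatial_error_and_spread(obs, members)
```

For the 20 synthetic members the pair count is 190, matching the arithmetic in the text.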
The error-spread relationship of the FSS was first illustrated by Zacharov and Rezacova (2009). Their result shows that FSS-based spread is smaller than FSS-based error for five case studies. However, their FSS-based spread might be underestimated because their ensemble spread calculation was centred on a single reference forecast.
Here we apply the method of Dey et al. (2014) to mitigate this problem.

Scale detection using a displacement scale
Although the FSS is a powerful measure for quantifying spatial dispersion of intermittent fields such as convective precipitation, it does not provide a direct measure in physical space. To identify scales where there is a certain degree of error or spread, we use a displacement scale (DS), defined as the smallest neighbourhood window size $n$ that satisfies

$$\mathrm{DS} = \min\left\{\, n : \mathrm{FSS}(n) \geq 0.5 + \frac{f_0}{2} \,\right\},$$

where $f_0$ is the fraction of grid points considered in the FSS calculation (the 95th percentile threshold gives $f_0 = 0.05$). Statistically, the DS is the smallest scale at which the forecast contains more useful information than a random forecast (Roberts & Lean, 2008). Half of the DS roughly corresponds to the distance of a displaced object between two compared fields with no frequency bias of binary fields (Mittermaier & Roberts, 2010; Skok & Roberts, 2018).
Although the actual displacement length depends on many factors such as the sizes and shapes of an object (Skok & Roberts, 2018), we use the displacement scale as a reasonable estimate of the doubled displacement length in this study. The DS can be considered directly as an estimate of displacement here, because we use percentile thresholds to calculate the scale for each combination of field comparisons. Note that the DS is based on the same definition as the "skillful" (Mittermaier & Roberts, 2010) and "believable" scales (Bachmann et al., 2020; Dey et al., 2014).
In the remainder of this article, we refer to the displacement scale of spatial error and spread as eDS and sDS, respectively. The eDS is the smallest scale at which an ensemble shows a better prediction than a random prediction. The sDS is the largest scale at which significant spatial variability is achieved among ensemble members. If the FSS never exceeds 0.5 + f_0/2, the displacement scale is set to be as large as the length of the evaluation domain. However, this no longer represents the scale of misforecasts: precipitation events are likely to be missed or false-alarmed in this case. For this reason, we examine the ensemble displacement scales using the median rather than the mean.
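The displacement scale and the reason for preferring the median can be illustrated with a short sketch. The function name, the dictionary input, the use of "greater than or equal" at the target value, the 300 km domain length, and all FSS numbers below are assumptions for illustration only.

```python
from statistics import mean, median


def displacement_scale(fss_by_scale, f0=0.05, domain_length_km=300.0):
    """Smallest scale (km) at which the FSS reaches 0.5 + f0/2.

    fss_by_scale maps neighbourhood size in km to an FSS value.
    If the target is never reached, the domain length is returned,
    which signals a missed or false-alarmed event rather than a
    genuine displacement.
    """
    target = 0.5 + f0 / 2.0
    for scale in sorted(fss_by_scale):
        if fss_by_scale[scale] >= target:
            return scale
    return domain_length_km


# Hypothetical FSS curves for three observation-forecast comparisons.
curves = [
    {10: 0.30, 20: 0.50, 40: 0.53, 80: 0.70},  # target reached at 40 km
    {10: 0.10, 20: 0.35, 40: 0.50, 80: 0.60},  # target reached at 80 km
    {10: 0.00, 20: 0.00, 40: 0.05, 80: 0.10},  # never reached: complete miss
]
scales = [displacement_scale(c) for c in curves]
print(scales, median(scales), mean(scales))
```

In this toy case the single complete miss pulls the mean well above the displacement of the two members that did capture the event, whereas the median stays at a scale that reflects an actual displacement.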

Classifying varying convective forcing regimes
The different convective environments during the three-month period are classified objectively with the convective adjustment time-scale (τ_c) diagnostic (Section 3.1). After excluding 12 days without precipitation, 80 days remain to be arranged in distinct convective forcing regimes. To be consistent with the original publication on the trial experiments (Puh et al., 2023), we use the τ_c diagnostic in a relative sense and group days with daily mean τ_c values belonging to the upper (lower) 20% of the distribution throughout the summer season into the weak (strong) convective forcing regime, respectively. This results in 16 days falling into each of these two categories. The remaining 48 days in the middle of the spectrum of τ_c values make up the intermediate category, for which a primary type of convective forcing is not unequivocally detectable. Figure 2 depicts the classification of the days along with daily accumulated and area-averaged ensemble mean precipitation amounts and radar-observed quantitative precipitation estimates. While the observed and forecast daily precipitation amounts agree reasonably, there are large variations of daily totals from day to day. Days governed by equilibrium conditions predominantly exhibit larger rainfall accumulations than nonequilibrium days. To enable a fair comparison of the spatial predictability, we need to take these amplitude changes into account. That is accomplished by using a percentile threshold in the FSS calculation as outlined in Section 3.2.1.
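The relative classification described above amounts to ranking the days by their daily mean τ_c and cutting off the lowest and highest 20%. A minimal sketch follows; the function name, the day labels, and the synthetic τ_c values are hypothetical and serve only to reproduce the 16/48/16 split.

```python
def classify_regimes(daily_tau_c, fraction=0.2):
    """Assign each day to a convective forcing regime by its rank in tau_c.

    daily_tau_c maps a day label to the daily mean convective adjustment
    time-scale (hours). The lowest `fraction` of days is labelled "strong"
    (equilibrium), the highest `fraction` "weak" (nonequilibrium), and the
    rest "intermediate".
    """
    ranked = sorted(daily_tau_c, key=daily_tau_c.get)
    k = int(round(fraction * len(ranked)))
    regimes = {day: "intermediate" for day in ranked}
    for day in ranked[:k]:
        regimes[day] = "strong"
    for day in ranked[len(ranked) - k:]:
        regimes[day] = "weak"
    return regimes


# Synthetic, distinct tau_c values for 80 precipitation days (illustration only).
tau_c = {f"day{i:02d}": ((i * 37) % 80) * 0.25 for i in range(80)}
regimes = classify_regimes(tau_c)
counts = {name: list(regimes.values()).count(name)
          for name in ("weak", "intermediate", "strong")}
print(counts)
```

Since the thresholds are percentiles of the season's own distribution, the class sizes are fixed by construction, and the absolute τ_c values that separate the regimes can differ from one season or domain to another.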

Spatial error and spread of operational forecasts
Having classified convective weather objectively into different categories, we can now inspect the flow-dependent spatial error and spread. The operational ICON-D2-EPS forecasts (TR ensemble) show the typical diurnal cycle during weak convective forcing, with the 95th percentile value of hourly precipitation attaining highest intensities in the afternoon (0.3 mm h⁻¹, Figure 3a). Not surprisingly, the largest spatial errors occur on the smallest scales (high error values shown in dark colours in Figure 3a). However, the spatial error exhibits a clear diurnal cycle, having a maximum around 0900-1000 UTC at the time of triggering of moist convection (eDS approx. 170 km) followed by a minimum at peak afternoon precipitation (eDS approx. 70 km). Thereafter the spatial error increases again as the afternoon precipitation fades. At first sight, the spatial spread behaves similarly to the spatial error (compare colours in Figure 3a,b) but with lower values, that is, smaller spatial variability. There is a relative maximum of the spatial spread at the time of convection initiation (0800-0900 UTC) followed by a slight decrease in the afternoon during the strongest convective activity. In particular, the spread at small scales grows more slowly than the spatial error. In terms of the displacement scale, the sDS shows a modest temporal evolution and amounts to about 40 km, hence smaller than the eDS (the bold line is below the dashed line in Figure 3b). A gap of several tens of kilometres between the sDS and eDS is evident throughout the forecast, even after the precipitation maximum in the afternoon, which indicates that the ensemble is spatially underdispersive. Hence the ensemble forecast is spatially overconfident, that is, the spatial spread is too low compared with the spatial error. This lack of spatial spread suggests that the perturbations in ICON-D2-EPS are suboptimal and that additional sources of model uncertainty ought to be added.
Under strong convective forcing regimes, the composite precipitation time series shows a continuous decay of intensities throughout the day. The maximum precipitation intensity is six times larger (1.8 mm h⁻¹, Figure 3e) than the maximum during nonequilibrium conditions and occurs at 0000 UTC (as in Puh et al., 2023). This temporal evolution is dominated by a few strongly forced cases with nighttime maxima (e.g., the heavy precipitation causing the flooding in the Ahr valley on July 14, not shown). The spatial error (eDS) increases almost continuously from model initialisation onwards (with the exception of a slight dip around 1200 UTC) but stays at smaller spatial scales than in the nonequilibrium regime. Similar to the eDS, the sDS increases steadily, too (Figure 3f). The proximity of spatial error and spread curves suggests a smaller spatial overconfidence compared with nonequilibrium. Moreover, the continual increase of spatial variability within the ensemble (measured by the sDS) caused by larger and larger displacements of rainfall patterns is a classic feature of error growth driven by scale interactions (Ying & Zhang, 2017).
For the intermediate-regime cases, time series of precipitation amount display a mixture between equilibrium and nonequilibrium categories in terms of amplitude and the diurnal cycle (Figure 3c,d). The evolution of the DS for error (eDS) and spread (sDS) is more similar to the equilibrium situation, but the values of the spatial error and spread are systematically larger and closer to those of the nonequilibrium regime. The weather-regime-independent spatial error and spread largely parallel those of the intermediate category (not shown). Hence, only a flow-dependent examination of the evolution of spatial error and spread unveils the contrasting behaviour in nonequilibrium conditions.
Overall, the spatial forecast skill is systematically higher (i.e., smaller spatial error, compare dashed lines in Figure 3a,c,e), indicating a superior forecast quality in equilibrium. Likewise, the spatial predictability is higher at most lead times (i.e., smaller spatial spread) during this particular flow pattern (compare bold lines in Figure 3f). However, there is one important exception to this generalisation. During nonequilibrium, the spatial predictability is higher at the time of afternoon precipitation (lowest sDS) at forecast lead times of 12-18 h (coinciding with time in UTC), which is strengthened by the orographic effect exerting a source of predictability (not shown). Enhanced uplift induced by orography promotes deep convection initiation and tends to organise convective cells along the orographic gradient, demonstrating a higher likelihood of convective activity in these regions. This structuring effect constrains the spatial variability of intense precipitation and increases the predictability. This effect has a greater impact in nonequilibrium regimes, where orographic triggering plays a more significant role, as observed over central Europe (Bachmann et al., 2020; Keil et al., 2020) and the British Isles (Flack et al., 2018), for instance.

Systematic impact of the PSP scheme
Building upon the flow-dependent assessment of the operational forecast system, we proceed to quantify the impact of incorporating a novel model uncertainty representation, specifically the PSP scheme, in the ICON-D2 ensemble. Usage of the PSP scheme shows a large systematic impact on the diurnal cycle of precipitation in the weak convective forcing regime (TP ensemble). Precipitation starts earlier, peak precipitation rates become stronger, and precipitation decays slightly earlier thereafter (Figure 4a,b). Note that the total daily rainfall is hardly affected by the PSP scheme (±3% relative difference). The impact on spatial error and spread is discernible from 0900 UTC onwards, once convection is initiated. For most of the time (0900-1900 UTC, including the convectively most active period) the PSP scheme reduces spatial forecast errors of precipitation at scales larger than 20 km (Figure 4a). The spatial error increase after 1900 UTC is presumably linked to an earlier decay of precipitation using the PSP scheme, a known issue in applying the scheme (Rasp et al., 2018). Despite this earlier reduction in precipitation intensity, the general precipitation structure persists, leading to the time lag between the decrease in precipitation and the deterioration of spatial error.
In general, and similar to the spatial error, the impact of PSP on spatial spread shows a distinct spread decrease at scales larger than roughly 50 km between 0900 and 1700 UTC (Figure 4b). Strikingly, there is a steady increase of spatial spread on scales smaller than 50 km in this time window. By design, the PSP scheme inserts perturbations at the effective model resolution (about 10 km), which consequently cause increased variability in the boundary layer, impacting convective processes among others. Due to upscale error growth, this affects larger and larger scales as time goes by, and leads to a continual spatial spread increase of hourly precipitation rates at spatial scales rising from the model's effective resolution to some tens of kilometres. The reduction of spatial error and spread at scales larger than 50 km can be attributed to the strong penalty associated with complete misses in forecasts when calculating the FSS. By fostering convection initiation in members of the TP ensemble that failed to predict convection in the TR ensemble, the number of complete misses is diminished and the FSS-based spatial error and spread are reduced across scales. Conversely, this effect causes an increase in spatial error and spread across scales when convection ceases earlier in the evening using the PSP scheme.
For the other forcing categories, the PSP scheme shows characteristics similar to the weak forcing (but at reduced levels).Figure 4b,d,f indicates that the spread shows a similar behaviour across all regimes but at successively smaller amplitudes for stronger forcing conditions.Comparison of the errors (Figure 4a,c,e) shows similar changes.

ADDITIVE EFFECT OF MICROPHYSICAL UNCERTAINTY
Building upon the beneficial impact of the PSP scheme shown in the systematic assessment, we strive to inspect the influence of additional sources of model uncertainty inserted into the full CPEPS.Due to computational constraints, we limit the analysis to selected cases representative of different flow situations.

Case studies representing different flow situations
We chose two case studies from the period of the trial run in summer 2021 representing both convective forcing regimes.Among the days characterised by weak convective forcing, June 10 was one of the most typical nonequilibrium days that showed a large impact of the PSP scheme on precipitation.On that day, the atmospheric flow over Germany was characterised by weak northwesterly winds due to a small geopotential gradient.Scattered convection was triggered around noon and reached its maximum intensity around 1400 UTC.The daily accumulated rainfall exhibits a popcorn-like pattern typical for weak convective forcing (Figure 1a,b).In the evening the atmosphere was stable again, ending the well-defined diurnal cycle of convection.
In contrast, weather on June 29 represents a typical day of the strong convective forcing regime, for a number of reasons. Although June 29 is not part of the equilibrium composite of the trial period, it was one of the most representative meteorological situations of its kind, with a strong geopotential gradient, strong southwesterly flow (Figure 1c), and the largest accumulated precipitation in the whole summer period, according to radar observations (Figure 2). The development of a mesoscale convective system along the cold front in southern Germany caused an outbreak of high-impact weather with severe winds, hail, and heavy precipitation. A mean daily area-averaged value of τ_c amounting to less than two hours clearly classifies this day as being in equilibrium (Figures 1f and 2). This case was also part of an intensive observation period (IOP 5, June 28-30) of the Swabian Modular Observation Solutions for Earth Systems (MOSES) field campaign (Kunz et al., 2022).

Area-averaged precipitation amount and spread
First, we inspect the impact of IBC, PSP, and MPP on ensemble mean precipitation and ensemble spread. Figure 5 shows ensemble- and area-averaged hourly precipitation amounts and their area-averaged spread for four experiments: the pure IBC perturbed ensemble (I), the combined IBC and PSP ensemble (IP), the IBC and MPP perturbed ensemble (IM), and the ensemble containing all uncertainty representations (IPM). On the nonequilibrium day, the onset of precipitation is earlier and the peak of precipitation intensity is enhanced when adding PSP compared with pure IBC uncertainty (Figure 5a). This is attributable to the more effective trigger mechanisms introduced by the PSP scheme. The addition of PSP also enhances the spread. Combining IBC and MPP (IM ensemble) gives a larger and prolonged spread. The spread is largest for the IPM ensemble during convection, followed by the IP and IM ensembles, respectively. The I ensemble exhibits the lowest spread and the peak is reached about one hour later than in ensembles IP and IPM. The effects of both model uncertainties, PSP and MPP, complement one another in terms of ensemble spread. During strong convective forcing the area-averaged precipitation amount is governed by IBC and only marginally influenced by any kind of model uncertainty representation. Such a dominant role of IBC in precipitation forecasts was previously shown, for instance, by Johnson and Wang (2020). Spread is hardly changed by adding PSP to IBC, and is even slightly reduced in the late afternoon (Figure 5b), while adding MPP enhances the spread considerably. Similar to the nonequilibrium case, the change in spread given by adding PSP and MPP shows a qualitatively additive character in IPM. Both cases illustrate that the synergistic impact of PSP and MPP (IPM ensemble) can be beneficial, since both model uncertainty representations partly compensate for the respective deficiencies.
To assess whether the earlier onset of convection due to PSP and the prolonged spread due to MPP are caused by their individual impact rather than the interaction with IBC perturbations, two additional ensembles are examined: the P and M ensembles.In the P ensemble, the only source of uncertainty is the PSP scheme, whereas the M ensemble contains only MPP.Both ensembles are run with one set of IBC (member 1 of the I ensemble), which does not allow for a quantitative comparison in Figure 5.However, the reader can compare these ensembles qualitatively in Figure 6.The P ensemble shows an earlier intensification of precipitation and spread (not shown), which is in line with Leoncini et al. (2010), who found that large perturbations in the planetary boundary layer lead to an earlier growth of perturbations in precipitation.The M ensemble shows a smaller but slower decay of spread at later stages of convection, as shown by Barthlott et al. (2022a).This indicates that PSP and MPP act independently at different stages during the diurnal cycle of convection.

Individual and synergistic impact on spatial error and spread
To set the scene for the scale-dependent examination of the relative influence of three uncertainty representations in the "grand ensemble" (IPM, see Table 1), we first display the time series of spatial spread due to the individual uncertainties (I, P and M ensembles) for both cases in Figure 6.The weakly forced case exhibits a typical diurnal cycle, with the most intense rainfall occurring soon after noon.The characteristic upscale error growth caused by displacements of convective cells is quantified by the spatial spread.Its temporal evolution confirms this steady increase (coloured tiles in Figure 6a).In the equilibrium regime, IBC clearly represents the dominant source of uncertainty in terms of amount (solid lines in Figure 5b) and in terms of location of heaviest precipitation (Figure 6d).The spatial spread of precipitation grows upscale shortly after model initialisation fuelled by strong nighttime rainfall and exhibits large displacements throughout the day.
The spatial spread of model uncertainty represented by pure PSP is displayed in Figure 6b,e.During weak convective forcing, the impact of PSP is in general stronger and conditioned by the daily cycle of convection.The perturbations added in the boundary layer start having an impact on precipitation at the onset of convection, when thermals start to trigger convective cells.As soon as that happens, the spatial scale of spread quickly grows upscale and reaches the maximum extent in the evening when forecast precipitation ceases.The magnitude of displacements in the P ensemble at the peak time of precipitation (1200-1400 UTC) is in parallel with the impact of the SBL scheme found by Flack et al. (2021).In strong convective forcing, the effect of PSP is comparatively small but grows continuously at a slower rate driven by continuing rainfall.
The M ensemble shows different growth patterns depending on varying convective forcing regimes.During weak forcing of convection, spatial spread grows from the smallest scale towards larger scales as for P, but the amplitude is smaller (Figure 6c).During strong forcing, MPP impact the spatial spread at small scales from the early morning onward, but this impact is largely comparable with that of the PSP scheme thereafter (Figure 6f).However, after 1900 UTC the MPP impact spreads across scales, in contrast to the PSP impact, which stays below 80 km.Compared with the impact of PSP, MPP induce similar upscale growth in the weak convective forcing case, but at a slightly later stage, after the onset of convection, since MPP only start acting when convection is ongoing, as shown in Matsunobu et al. (2022).
Finally, the impact of combining PSP and MPP with IBC on the spatial error and spread can be discussed by analysing differences of the IP, IM, and IPM ensembles from the I ensemble (Figure 7). Results are displayed for the weak convective forcing regime, as we found a larger influence in this flow situation (Section 4 and earlier in this section). To begin with, the time series of the 95th percentile of precipitation for the ensembles show again that both model uncertainties compensate each other in the afternoon (compare blue bold lines in Figure 7a,c,e), similar to the time series of spread depicted in Figure 5a. Turning towards the spatial pattern of hourly precipitation, we find that the contribution of the PSP scheme (Figure 7a,b) resembles the one discovered in the composite analysis (Figure 4a,b). The diurnal cycle of precipitation is shifted earlier by the PSP scheme, leading to a significant error reduction at the onset of convection (1000 UTC) at all scales. However, there is a slight spatial error increase in the afternoon. A similar signal can also be found in Figure 4a, discernible as slightly lighter colours at 1600-1700 UTC, presumably due to the earlier decay of convection. However, on average the PSP scheme systematically improves the spatial skill as shown in Figure 4a. The PSP impact on spatial spread, on the other hand, is systematically increasing and growing upscale from about 10 to 80 km after the onset of precipitation, in agreement with earlier findings (see Figure 3). The impact of adding MPP on spatial error is opposite to that of PSP. Including MPP reduces precipitation intensities on average and also indicates a detrimental impact on spatial error that is largest at the early stages of convection (1000-1200 UTC), rather than uniform across all scales (Figure 7c). Subsequently, the change in spatial error is still slightly positive on scales larger than 80 km and neutral on scales below 50 km. On the other hand, adding MPP increases the spatial ensemble variability almost throughout the forecast at all scales (Figure 7d). In this case, the up-amplitude effects of MPP are relatively independent of scale, potentially improving the location of the strongest precipitation (the 95th percentile) on scales below 50 km, but do not propagate towards larger scales as seen with PSP.
We hypothesise that this scale-independent effect of MPP is caused by the design of the MPP being fixed in space and time (i.e., fully correlated) and not being scaled in direct response to the occurrence and scales of physical processes.This is beneficial at smallest scales where spatial error is large and the ensemble is overconfident, but at the same time can deteriorate forecast skill at larger scales where IBC perturbations dominate and the spread-to-skill ratio is already converging.Nevertheless, the present implementation of MPP improves the overall spatial spread-to-skill ratio as it generally increases spatial variability, which can add value in an overconfident ensemble.A refined configuration of MPP, or a combination with other perturbations that represent spatio-temporally varying microphysical uncertainty better, ought to be pursued in future.
When comparing the contributions of PSP with those of IBC, we find that the gained spatial variability remains below 50 km, at a smaller scale than the variability obtained with IBC.This is similar to the effect of SBL in Clark et al. (2021) and Flack et al. (2021).However, PSP changes the time evolution of precipitation and largely reduces the spatial error at the time of the onset of convection.These improvements are key advantages of the PSP scheme and cannot be achieved using current postprocessing methods.On the other hand, MPP have a broad impact on the spatial spread throughout time and scales.They are systematically shifting the model climate rather than sampling the impact of random errors, as discussed in McTaggart-Cowan et al. (2022).Although this causes an increase in spatial variability, other metrics should be used to assess whether such an approach improves other aspects of the forecast, such as the bias in amplitude.
Overall, the joint impact of PSP and MPP in terms of spatial error and spread appears to be additive (Figure 7e,f), with PSP being the primary source of uncertainty. Precipitation is triggered earlier and its intensity is higher. The earlier decay of precipitation is attenuated by MPP. Spatial error is reduced at convection initiation, while spatial spread is increased across time and scales. Hence the spatial error-spread relationship is significantly improved compared with the pure IBC ensemble. These results imply that the impacts of both sources of model uncertainty (that is, PSP and MPP) on the spatial variability of precipitation are uncorrelated and rather orthogonal, suggesting that these perturbations can work effectively together. However, alternative adaptive and statistical postprocessing methods could be used instead to represent variability induced by the PSP scheme, and might also remedy the undesired impact of MPP at scales larger than 100 km, as shown in Blake et al. (2018) and Flack et al. (2021).
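One simple way to put a number on the "additive" character described above is to compare the change given by the combined ensemble with the sum of the changes given by each perturbation alone. The sketch below is a hypothetical check, not a diagnostic used in the study: the function name, the ratio it returns, and the difference values are all invented for illustration.

```python
def additivity_ratio(delta_ip, delta_im, delta_ipm):
    """Relative size of the non-additive part of the combined change.

    Each argument is a list of rows (scale x time) holding the change in
    spatial spread or error relative to the I ensemble. A value of 0 means
    the combined change equals the sum of the individual changes.
    """
    residual = 0.0
    total = 0.0
    for row_ip, row_im, row_ipm in zip(delta_ip, delta_im, delta_ipm):
        for d_ip, d_im, d_ipm in zip(row_ip, row_im, row_ipm):
            residual += abs(d_ipm - (d_ip + d_im))
            total += abs(d_ipm)
    return residual / total if total > 0 else float("nan")


# Synthetic differences on a 2 x 2 (scale x time) grid, illustration only.
delta_ip = [[0.25, 0.5], [0.0, -0.25]]       # IP minus I
delta_im = [[0.25, 0.25], [0.125, 0.125]]    # IM minus I
delta_ipm = [[0.5, 0.75], [0.125, -0.125]]   # IPM minus I, exactly additive here
print(additivity_ratio(delta_ip, delta_im, delta_ipm))
```

A small ratio would support the interpretation that the two perturbation types act largely independently, while a large ratio would point to interaction between them.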

CONCLUSIONS
The spatial predictability of precipitation depends strongly on the prevailing convective forcing regime. For a systematic assessment, we condense three months of operational ICON-D2-EPS forecasts into varying convective forcing categories applying the convective adjustment time-scale diagnostic. The scale-dependent aspect is assessed using spatial error (eDS) and spread (sDS) based on the FSS technique. During weak convective forcing, corresponding to the nonequilibrium regime, the spatial error and spread largely depend on the diurnal cycle of precipitation. The median eDS is about 70 km during the convectively most active period (1200-1800 UTC), whereas the sDS amounts to only 40 km. The scales between 40 and 70 km show no spatial skill and are underdispersive. During strong convective forcing, equivalent to the equilibrium regime, the large-scale flow constrains convection patterns more strongly and the gap between the median eDS and sDS becomes smaller. The spatial forecast quality is superior in this weather regime, with the eDS increasing fairly continuously from model initialisation. The lack of spatial spread suggests that the current perturbations in ICON-D2-EPS are suboptimal and additional sources of model uncertainty need to be added. The application of the PSP scheme systematically shows the anticipated beneficial impact. Whereas the total daily rainfall is hardly affected by the PSP scheme, it reduces spatial forecast errors of precipitation remarkably at scales larger than 20 km, in particular in the nonequilibrium regime during the diurnal cycle from 0900 UTC onwards. Whereas the spatial spread is increased by PSP at scales less than about 50 km, the variability is decreased at larger scales. For the other forcing categories, the PSP scheme shows similar characteristics but at successively smaller amplitudes for stronger forcing situations. The PSP scheme is effectively representing subgrid-scale uncertainty in the boundary-layer turbulence and reduces the systematic error as aspired to by stochastic parameterisations (Berner et al., 2017).
The effect of additionally including microphysical parameter perturbations is examined for two representative weather situations by constructing prototype "grand ensemble" ICON-D2 experiments. Univariate and multivariate IBC, PSP, and MPP ensemble simulations for representative cases allow us to disentangle individual and synergistic contributions of the sources of uncertainty. They confirm the strong flow-dependent impact of PSP, which has a significant impact on ensemble and spatial spread in nonequilibrium, while its effects are negligible in equilibrium. The time and scale of the maximum difference are at the onset of convection and at the scale of the perturbations, which is beneficial, because the operational ICON-D2-EPS struggles to introduce sufficient spatial variability in the precipitation field at that time. In comparison, IBC represent the primary source of variability in the ensemble in equilibrium. The joint impact of PSP and MPP in the presence of IBC uncertainty regarding spatial error and spread is additive, with PSP being the primary model uncertainty in nonequilibrium. Precipitation is triggered earlier and its intensity is higher. The spatial error is largely reduced at convection initiation and the spatial spread is increased across time and scales. Hence the spatial error-spread relationship is significantly improved compared with the pure IBC ensemble. The conclusions on the additive impact of MPP hold for these two cases and for this ensemble configuration. However, longer term testing of the synergistic impact of model uncertainties is required to achieve more robust results.
In summary, ICON-D2-EPS forecasts show insufficient spatial spread of precipitation compared with spatial forecast errors. This agrees with earlier findings for other CPEPSs of Rezacova et al. (2009) and Dey et al. (2014). Stratifying this shortcoming by convective forcing regimes reveals that the spatial underdispersion emerges especially in nonequilibrium conditions. The application of physically based stochastic perturbations in the planetary boundary layer reduces the spatial overconfidence considerably. This article emphasises the importance of the flow-dependent approach and supports previous results of variable predictability of area-averaged precipitation amounts (Barthlott et al., 2022b; Keil et al., 2019; Matsunobu et al., 2022) and scale-dependent results (Flack et al., 2018, 2019; Matsunobu et al., 2022). Tracing the varying sensitivity of uncertainty representations to different origins using a "grand ensemble" with many formulations of model uncertainty will be pursued in future work and is presently left as a challenging open question.

FIGURE 1 Characteristic fingerprint of precipitation during different convective forcing regimes in central Europe, illustrated by daily accumulation of precipitation for (a,b) a weak forcing case on June 10 and (d,e) a strong forcing case on June 29 of an ensemble member of (a,d) the TR ensemble and (b,e) radar observations. Vectors depict the wind velocity at 850-hPa pressure level. The black rectangle indicates the verification domain. The bottom row (c,f) shows hourly time series of ensemble mean precipitation (left axis) and the convective adjustment time-scale τ_c (right axis).

FIGURE 2 Time series of June, July, and August 2021 illustrating the day-to-day variability of 24-h accumulated precipitation (bars) and convective adjustment time-scale τ_c (dots). The colours of the dots represent weak (red), intermediate (white), and strong (blue) forcing regimes (see text for details). Green bars depict TR ensemble mean and grey bars the radar-observed daily area-averaged rainfall.

FIGURE 3 Weather-regime-dependent, spatio-temporal variability of ensemble mean spatial error and spread, based on 20-member operational ICON-D2-EPS hourly precipitation forecasts, averaged over (a,b) 16 weak forcing and (e,f) 16 strong forcing cases. The centre column (c,d) illustrates the intermediate cases. The black dashed lines show median eDS of spatial error and the bold lines show median sDS representing spatial spread. The blue lines indicate the 95th percentile values of hourly precipitation used in the FSS calculation.

FIGURE 4 Same as Figure 3, but the composite of FSS differences of TP minus TR ensembles. The blue lines indicate the 95th percentile values of hourly precipitation of TP (bold) and TR (dashed blue) forecasts. The black dashed (bold) line shows the median eDS (sDS) of the TP simulation.

FIGURE 5 Ensemble- and area-averaged hourly precipitation (solid) and its area-averaged spread (dashed) over Germany for (a) a weak forcing case on June 10 and (b) a strong forcing case on June 29, representing varying convective forcing regimes.

FIGURE 6 Spatial spread in ensembles I (top), P (middle), and M (bottom) on June 10 (left) and June 29 (right). The lines indicate the 95th percentile values of hourly precipitation of (a,d) I, (b,e) P (continuous), and (c,f) P (dashed) and M (continuous) ensembles. Masking is applied when the observed or forecast precipitation amount is below 0.01 mm h⁻¹.

FIGURE 7 Time series of difference in spatial error (top) and spread (bottom) between combined ensembles and the I ensemble for June 10. Panels show the change given by adding (a,b) PSP, (c,d) MPP, and (e,f) both PSP and MPP. Blue (red) shading indicates a reduction (increase) in spatial error and spatial spread, respectively. The lines indicate the 95th percentile values of hourly precipitation of the reference I ensemble (dashed) and (a,b) IP, (c,d) IM, and (e,f) IPM ensembles (bold).
TABLE 1 Overview of ensemble experiments performed to gauge the relative and synergistic impact of different sources of uncertainty: acronym, ensemble size, and perturbations.