The Contribution of Convection to the Stratospheric Water Vapor: The First Budget Using a Global Storm‐Resolving Model

The deepest convection on Earth injects water into the tropical stratosphere, but its contribution to the global stratospheric water budget remains uncertain. The Global Storm‐Resolving Model ICOsahedral Non‐hydrostatic (ICON) is used to simulate the moistening of the lower stratosphere for 40 days during boreal summer. The decomposition of the water vapor budget in the tropical lower stratosphere (TLS, 10°S–30°N, and 17–20 km altitude) indicates that the average moistening (+21 Tg) over the simulated 40‐day period results from the combined effect of the vertical water vapor transport from the troposphere (+27 Tg) and of microphysical phase changes and subgrid‐scale transport (+2 Tg), partly compensated by horizontal water vapor export (−8 Tg). The very deep convective systems, explicitly represented thanks to the 2.5 km grid spacing of the model, are identified using the very low Outgoing Longwave Radiation of their cold cloud tops. The water vapor budget reveals that the vertical transport, the sublimation and the subgrid‐scale transport at their top together contribute 11% of the water vapor mass input into the TLS.

to quantify. Only three studies from two groups have provided such estimates so far, two for boreal winter (Schoeberl et al., 2014, 2018) and one for the full year (Dauhut et al., 2015). Two studies from another group provided estimates for two 7-day periods in both boreal summer and winter, but only in the vicinity of the tropical cold-point tropopause at the 100 hPa level (Ueyama et al., 2015, 2018). To summarize the results of these studies, Schoeberl et al. (2014) found a 13% contribution of convection by applying a Lagrangian model that follows air mass trajectories to reanalysis and comparing the results with and without the effect of convection. In their study, the effect of convection is deduced indirectly from the resulting adjustment of the water vapor content of the air masses in the presence of cloud along the trajectory. Such a quantification is limited by the accuracy and the resolution of the cloud height dataset as well as of the wind and temperature fields used to advect the air masses. Using the same method, Schoeberl et al. (2018) revised their estimate using an updated and more accurate temperature dataset, with a colder tropopause, and found a convective contribution of only 2%. Also using a Lagrangian model, but with a more sophisticated and computationally expensive microphysical scheme, Ueyama et al. (2015) and Ueyama et al. (2018) found a 14% and 15% contribution of convection to the water vapor content at 100 hPa for their two 7-day periods in boreal winter and summer, respectively. The level on which their investigation focuses lies close to, though slightly below, the average cold-point tropopause.
The impact of convection on the full stratospheric water vapor budget might be larger than these estimates for two main reasons. First, the air masses are very close to saturation at the cold point, which limits the hydration by convection. Second, they can be hydrated by convection at higher altitudes, during their whole rise through the low stratosphere. Dauhut et al. (2015) found an 18% contribution by simulating one very deep convective system with a Large-Eddy Simulation (LES) and upscaling the subsequent stratosphere hydration to the whole tropics. Here the estimate relies on the ability of the LES model to represent the hydration, on the representativeness of the study case and on the estimated frequency of very deep convective systems in the tropics, which varies with season. It should be kept in mind that such studies are highly dependent on the chosen representation of the microphysics. Large uncertainty remains in quantifying the importance of convection for the global stratospheric water budget.
Here, to take up the challenge, we use a distinct approach. We take advantage of a Global Storm-Resolving Model (GSRM) that is able both to represent the convective-scale processes explicitly and to simulate them over the full globe. The global atmosphere is simulated with the ICOsahedral Non-hydrostatic model (ICON), integrated with a grid spacing of 2.5 km for 40 days in the framework of the DYAMOND intercomparison project (Stevens et al., 2019). The convection is resolved and the vertical resolution around the tropopause, about 600 m, is fine enough to capture the small-scale mixing at the top of the overshoot and the subsequent shallow hydration patches that form above the tropical tropopause, as will be shown. Taking advantage of these two aspects, we derive for the first time a global budget of the low-stratospheric water vapor using a model that resolves convection, and provide a new estimate of the convective contribution.
In Section 2 the observational data sets, the simulation and the analysis methods are presented. Section 3 describes the observed stratospheric humidity field and assesses the ability of our model to represent this field and its variations. Section 4 presents the budget of the stratospheric water vapor to identify the origin of its temporal variations. Section 5 quantifies the contribution of the very deep convective systems to this stratospheric water budget. Section 6 discusses the implications of our investigation, trying in particular to reconcile the various estimates of the convective contribution to the stratospheric water vapor as deduced from various observations and modeling studies. Section 7 gives the conclusions.

MLS Observations
To assess the representation of the stratospheric water vapor field and its variations in the simulation, we use the observations (version v4.22) from the Microwave Limb Sounder (MLS) instrument onboard NASA's Earth Observing System Aura satellite (Waters et al., 2006). MLS has a daily near-global coverage thanks to its near-polar orbit. We use the gridded water vapor product on the seven pressure levels located between 100 and 32 hPa, between 1 August and 9 September 2016 to match the simulation time period, as well as in eight other years to document the year-to-year variability. The vertical resolution ranges between 3.0 and 3.2 km, and the horizontal spacing between profiles ranges between 190 and 265 km. The zonal and temporal averaging of the MLS observations reduces much of the uncertainty resulting from the precision of the instrument, but the uncertainty due to its accuracy is unchanged and ranges from 4% to 9% (Livesey et al., 2017).

The recommendations for data screening given in Livesey et al. (2017) are followed. We do not apply the MLS averaging kernels to the simulated stratospheric water vapor. As shown by the study of Ploeger et al. (2013), using the averaging kernels only matters for the estimated stratospheric water vapor content at high latitudes. As our study is focusing on the tropics, using the averaging kernel is not necessary.

CERES Observations
The development of the very deep convective systems, characterized by their extremely cold cloud tops, can be monitored using space-borne observations of the Outgoing Longwave Radiation (OLR). Here we use the observations from the CERES project (Clouds and the Earth's Radiant Energy System; Wielicki et al., 1996) as given in the CERES SYN1deg Edition4A product (Doelling et al., 2016). The latter is a global dataset with a resolution of 1°. It contains, among other variables, hourly averaged OLR. The radiances are measured by the MODIS imager onboard the Terra and Aqua satellites, with daily global coverage. To account for local, hourly variations of OLR, measurements from new-generation geostationary satellite imagers are incorporated. The resolution of the CERES dataset is too coarse to capture individual overshoots, which have a width of about 10 km and a lifetime of 15 min (Dauhut et al., 2018), but the cold temperature of the anvil tops and of the rising or collapsing overshoots leaves a low-value, large-scale signature in the OLR field that allows us to detect the very deep convective systems. MODIS and geostationary infrared measurements have been extensively used to detect the overshooting tops in the past (e.g., Bedka et al., 2010; Sohn et al., 2009). The CERES product, which combines the two and provides an hourly global coverage, is used here to validate the geographical distribution of the deep convective activity in the simulation.

DYAMOND ICON Simulation
The global troposphere and stratosphere are simulated for 40 days starting on 01 August 2016 with the Eulerian ICON atmospheric model (Icosahedral Nonhydrostatic Weather and Climate Model, Zängl et al., 2015), under the framework of the DYAMOND intercomparison project (DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains, Stevens et al., 2019). We use here the simulation performed with a horizontal grid spacing of 2.5 km. The vertical grid spacing ranges between 20 m at the surface and 1.8 km at 44 km altitude (top of the physical domain, below the sponge layer), and is about 600 m at the tropopause. The DYAMOND set-up is described in detail in Stevens et al. (2019), while the set-up, the parameterizations used and the validation of this particular ICON simulation are presented in Hohenegger et al. (2020). In short, deep and shallow convection are explicitly represented, without the use of any convection parametrization. Cloud and precipitation are represented by the prognostic specific mass content of five hydrometeor species: cloud water, cloud ice, rain, snow and graupel, whose evolutions are calculated by a bulk one-moment microphysics scheme (Baldauf et al., 2011). The precipitating ice hydrometeors (snow and graupel) are assumed to have a size distribution that decays exponentially with particle size, and a fall speed that depends only on particle size. A comparison (not shown) to the size distributions reported by Woods et al. (2018) for Tropical Tropopause Layer (TTL) cirrus indicates that the number concentrations are in the range of the ones observed between −90°C and −70°C for the large precipitating hydrometeors (larger than 100 μm), whereas they are too low for the smaller hydrometeors. Turbulent fluxes are represented by a turbulence scheme based on a prognostic equation for turbulent kinetic energy (Raschendorfer, 2001).
Heating and cooling rates due to radiation are calculated every 15 min with the Rapid Radiative Transfer Model (Mlawer et al., 1997; Mlawer & Clough, 1998). Chemical reactions are not included. This is not an issue, since the source of stratospheric water vapor by methane oxidation is active well above the investigated area, which stops at 24-km altitude.
The atmosphere as well as soil moisture and temperature are initialized on 1 August 2016 at 00 UTC with the analysis from the European Centre for Medium-Range Weather Forecasts (ECMWF), and then evolve freely. Sea surface temperature and sea ice are prescribed from the ECMWF operational analysis. In order to account for the spin-up of the atmosphere, the first day of the simulation is discarded when computing the water vapor budget described in the following section. As the model is allowed to evolve freely, the simulated fields are expected to differ from the observed ones, especially after 5 days of simulation time when much of the atmospheric predictability is lost. For this reason, in the current study, the simulated fields are compared not only to the observed fields of 2016, but also to those of eight other years (all shown in Supporting Information S1).
As seen in Lee et al. (2019), a grid spacing of 2.5 km allows representing the convective overshoots into the stratosphere and the mixing leading to the local hydration. The results are however sensitive to the chosen grid spacing. Dauhut et al. (2015) analyzed the sensitivity of the transport by one very deep convective system to the horizontal grid spacing, varying it from 100 to 1600 m. They found that the updraft properties (vertical velocity, hydrometeor content) and the stratospheric hydration start to converge at a horizontal grid spacing finer than 200 m, with 20%-25% weaker transport at kilometric horizontal grid spacing. The current study may thus underestimate the hydration of the stratosphere by the convection. Not only the horizontal resolution, but also the vertical resolution, may affect the results. Dauhut et al. (2018) investigated the processes at the overshooting tops and found that the subsequent hydration at each top is determined by its maximal overshooting altitude, using 100-m vertical grid spacing. The 600-m vertical grid spacing used here certainly undersamples the full range of overshooting depths, although it is not clear whether this will lead to a high or low bias on the estimated convective hydration.
The output of the simulation, provided at a 3-hourly frequency, is initially on a native grid made of triangular cells. In order to apply the budget decomposition described below, we first regridded the data on a Cartesian, latitude-longitude grid, that has a grid spacing of 0.1°.

Water Vapor Budget
To investigate the causes of the variations in the water vapor content of the lower stratosphere, we decompose the water vapor budget at each grid cell with the following equation, consistent with the continuity equation used in ICON to achieve mass conservation (Equation 5 in Zängl et al., 2015):

$$\frac{\partial (\rho q)}{\partial t} = -\left[\frac{\partial (\rho q u)}{\partial x} + \frac{\partial (\rho q v)}{\partial y} + \frac{\partial (\rho q w)}{\partial z}\right] + s \qquad (1)$$

where ρ is the full air density, q is the specific humidity, u, v, and w are the zonal, meridional and vertical components of the wind, respectively, and s is the sink/source term due to microphysics (sublimation, condensation, deposition) and subgrid-scale transport (turbulent fluxes and coherent flows that are finer than 0.1°, the resolution of the analysis). The first two terms in brackets on the right-hand side denote the horizontal convergence of the moisture flux, and the third one is the vertical convergence of the moisture flux, all simulated by the explicit flow on the 0.1° analysis grid. The sink/source term is computed as a residue. With such a decomposition of the water vapor budget, the variations of water vapor mass in each grid cell equal the convergence of the fluxes across the edges, plus a sink/source term that must be understood as the local variations not accounted for by the explicit flow at 0.1°. This decomposition is consistent with the equation solved in the Eulerian ICON model, and differs from the equation of conservation formulated this way:

$$\frac{D(\rho q)}{Dt} = s_{Lag}$$

where $s_{Lag}$ is the water vapor sink/source term associated with a followed air parcel, and

$$\frac{D(\rho q)}{Dt} = \frac{\partial (\rho q)}{\partial t} + u\,\frac{\partial (\rho q)}{\partial x} + v\,\frac{\partial (\rho q)}{\partial y} + w\,\frac{\partial (\rho q)}{\partial z}$$

is the material derivative of water vapor mass. The two decompositions differ by a divergence term, which is non-zero since ICON is fully compressible. We opted for Equation 1 because it is the one used to include the microphysical water vapor variations in ICON (Reinert, 2020). In the whole study, the specific humidity and the integrals of the terms of Equation 1 are systematically converted into volume mixing ratios.
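As an illustration of the flux-form decomposition described above (local tendency = horizontal convergence + vertical convergence + sink/source, with the sink/source term obtained as a residue), the following is a minimal sketch only. It assumes a uniform Cartesian grid and centred finite differences, whereas the actual analysis is performed on a 0.1° latitude-longitude grid; the function names `time_midpoint` and `budget_terms` and the array layout are hypothetical.

```python
import numpy as np


def time_midpoint(field):
    """Average two consecutive outputs to centre a field in time."""
    return 0.5 * (field[1:] + field[:-1])


def budget_terms(rho_q, u, v, w, dx, dy, dz, dt):
    """Split the local water vapor tendency into flux convergences and a residue.

    All arrays have shape (time, z, y, x). The grid is ASSUMED uniform and
    Cartesian with spacings dx, dy, dz (m); dt is the output interval (s).
    """
    # Local tendency of water vapor mass per unit volume between two outputs.
    tendency = (rho_q[1:] - rho_q[:-1]) / dt

    # Resolved moisture fluxes, centred in time.
    flux_x = time_midpoint(rho_q * u)
    flux_y = time_midpoint(rho_q * v)
    flux_z = time_midpoint(rho_q * w)

    # Convergence is minus the divergence of the resolved flux.
    horizontal = -(np.gradient(flux_x, dx, axis=3)
                   + np.gradient(flux_y, dy, axis=2))
    vertical = -np.gradient(flux_z, dz, axis=1)

    # Sink/source term computed as a residue: the budget closes by construction.
    source = tendency - horizontal - vertical
    return tendency, horizontal, vertical, source
```

Because the sink/source term is a residue, it absorbs everything the resolved fluxes on the analysis grid do not explain, including discretization and time-sampling errors of such a diagnostic.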

Moistening of the Stratosphere
The distributions of the low-stratospheric water vapor in the MLS observations and in the ICON simulation are shown in Figures 1a-1c as zonal and temporal averages over the simulated period. Two years are shown for the MLS observations (2016 and 2017) to illustrate the year-to-year variability and more years can be found in the SI. The simulation correctly reproduces the contrast between the moist region up to 60 hPa in the northern hemisphere and the dry region between 80 and 40 hPa in the southern hemisphere. Except for the latitudes between 15° and 35°S, where the simulation exhibits a moist bias at the tropopause, the simulation produces values in agreement with the MLS observations within 1 ppmv (within 21%). Between 10°S and 30°N and between 90 and 50 hPa, that corresponds to the region of investigation defined below, the ICON moist bias with respect to MLS observation in 2016 (2017) reaches a maximum of +0.45 (+0.25) ppmv, and +0.4 (+0.1) ppmv on average. As a matter of reference, the inter-annual variability of humidity in this region, computed as the standard deviation between the MLS average volume mixing ratio in this region between 2011 and 2019, amounts to 0.25 ppmv.

The less-than-0.5-ppmv bias with respect to the MLS observations is also much smaller than the biases generally apparent in traditional global Eulerian models: many state-of-the-art General Circulation Models (GCMs) and Chemistry Climate Models (CCMs) exhibit very large biases, about ±2 ppmv (Eyring et al., 2006;Hardiman et al., 2015). Several factors contribute to these biases, in particular the propagation of incorrect humidity values from the upper troposphere by the vertical advection scheme (Hardiman et al., 2015) and biases in tropical tropopause temperatures (Eyring et al., 2006). A common limitation of GCMs and CCMs is their coarse vertical and horizontal resolutions. Besides inaccuracy in the large scale cross-tropopause transport (Stenke et al., 2008), their representation of convection is not designed to reproduce the overshoot transport up into the stratosphere, as illustrated for one convective parametrization in Dauhut et al. (2018).
In terms of the temporal evolution of the low-stratospheric humidity (Figures 1d-1f), both MLS observations and ICON simulation show an increase between 01 August and 09 September. The observations show for 2016 a moistening of the low stratosphere of 0.2-0.6 ppmv, up to a height of 50 hPa, over latitudes spanning 40°S to 60°N. Such a moistening is part of the annual cycle of the stratospheric water vapor, and can actually be of much larger amplitude in other years, as Figure S2 in Supporting Information S1 shows with up to 1.8 ppmv in 2017 (Figure 1e). A simple bulk computation can explain these variations solely based on a typical upward velocity $w = 4 \times 10^{-4}$ m/s and the water vapor gradients that one can estimate from Figure 1: the moistening is $\Delta q = -w \, (\partial q / \partial z) \, \Delta t$. For instance, in ICON at 10°N, between 62 hPa (4.9 ppmv) and 34 hPa (4.1 ppmv), $\partial q / \partial z \simeq -2 \times 10^{-4}$ ppmv/m. For Δt = 40 days this leads to Δq = +0.3 ppmv, consistent with the water vapor variation shown in Figure 1f. Above 50 hPa, and south of 40°S, the variation is negative, and is due to the upward and poleward advection of the dry phase of the lower tropical stratosphere by the Brewer-Dobson circulation. Similar bulk computations can explain why the Brewer-Dobson circulation induces less drying in the ICON simulation than in the observations: the ICON initial stratosphere above 40 hPa, and south of 40°S, is slightly drier than in the MLS observations and thus has smaller humidity gradients. To quantify the contribution of the tropical convection to this moistening, and more generally to the water vapor input into the stratosphere, our study focuses on a wide latitudinal band, between 10°S and 30°N, that encompasses well the Inter-Tropical Convergence Zone (ITCZ) during the simulated period (cf. Figure 3), where air enters the stratosphere from the troposphere (the head of the atmospheric tape-recorder, Mote, 1995).
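The bulk estimate of the moistening quoted in this section (upward velocity of 4 × 10⁻⁴ m/s, vertical gradient of about −2 × 10⁻⁴ ppmv/m, 40 days) can be checked with a few lines of arithmetic. This is a sketch only; the function name `bulk_moistening` is hypothetical and the inputs are the rounded values given in the text.

```python
SECONDS_PER_DAY = 86400.0


def bulk_moistening(w, dq_dz, days):
    """Bulk moistening (ppmv) from vertical advection of a humidity gradient.

    w      : upward velocity (m/s)
    dq_dz  : vertical gradient of the volume mixing ratio (ppmv/m)
    days   : duration of the period (days)
    """
    return -w * dq_dz * days * SECONDS_PER_DAY


# Rounded values quoted in the text for ICON at 10°N.
dq = bulk_moistening(w=4e-4, dq_dz=-2e-4, days=40)
print(f"Bulk moistening: {dq:+.2f} ppmv")
```

With these inputs the air rises by about 1.4 km in 40 days, which gives a moistening of roughly +0.28 ppmv, in line with the +0.3 ppmv quoted above.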
At these latitudes we define the tropical lower stratosphere (TLS) as the 3-km deep region above the tropopause, between 17 and 20 km altitude. This region is where the agreement between observations and simulation is best. The altitude range corresponds approximately to the observational pressure levels between 90 and 50 hPa. In this region and over the simulated period, the water vapor mass increase is +21 Tg in the ICON simulation and +12 Tg (+28 Tg) in MLS observations in 2016 (2017) (Figures 1d-1f). The interannual variability of this moistening is very large (Figure S2 in Supporting Information S1), with an average and a standard deviation both equal to 11 Tg over the 9 years 2011-2019. The ICON simulation lies in the upper range, typical of observed years like 2017 that exhibit a large TLS moistening between 1 August and 9 September.
The following section investigates the origin of the TLS moistening by decomposing the water vapor budget to shed some light on the contribution by the convergence of the moisture fluxes and by the sink/source term.

Origin of the Moistening
The decomposition of the water vapor budget following Equation 1 allows us to quantify the contributions of moisture flux convergence and of the sink/source term to the net variation of the simulated water vapor. The terms of Equation 1 are zonally averaged and integrated over the course of the simulation (Figure 2). The resulting water vapor mass variations per unit volume are converted into volume mixing ratio variations in order to facilitate the comparison with the background values. Mean residual meridional and vertical velocities are overlaid in order to visualize the low-stratospheric circulation, and to assess the strength of the Brewer-Dobson circulation as it drives the large-scale water vapor transport. The distribution of the mean residual vertical velocity at 70 hPa in ICON (near 18.5 km altitude) is in very good agreement with the climatological one in August, as given in Figure 4 of Butchart (2014). The values are upward up to 0.3 mm/s between 20°S and 40°N and downward elsewhere, down to −0.5 mm/s around 50°S. The moistening at the tropopause, below 17 km altitude, is first due to the vertical convergence of the moisture flux ( Figure 2b) with a strong maximum between 5° and 30°N, extending up to 19 km (around 470 K potential temperature). This region is situated at the same latitudes as the ITCZ, characterized by deep convection producing very cold cloud tops and very low OLR (Figure 3c). Part of the strong moistening by the vertical flux is compensated by horizontal divergence, which actually reaches its largest values below 17 km as well. This is expected as the horizontal moisture flux redistributes the water vapor brought there by the vertical moisture flux. In contrast, the sink/source term (Figure 2c) leads to a dehydration between 20°S and 30°N. The sink/source term is interpreted as the tendency resulting from all processes occurring at scales finer than 0.1° (about 10 km), the resolution of the regridded output. 
These processes include, from the finest to the largest scales: the microphysical processes, namely the sublimation and the condensation either by deposition or by nucleation, the turbulent mixing, and the mixing by coherent circulations finer than 0.1°. The dehydration around the tropopause is due to the condensation of the water vapor exceeding saturation. This dehydration largely compensates the vertical convergence of the moisture flux and explains why, despite the strong maximum in the vertical moisture flux convergence, the net variation does not exhibit such a strong maximum. It matches the concept of the cold trap, where moist air enters the stratosphere in the coldest regions and is strongly dehydrated there by freeze-drying processes (condensation). As will be shown and in line with previous studies, while the cold trap is efficient at removing water vapor around the tropopause, convection is able to bring water vapor above it.
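The conversion between specific humidity and volume mixing ratio mentioned for the budget terms can be sketched as follows. The text does not state the exact convention used, so this sketch assumes a mixing ratio defined relative to dry air and standard molar masses; the function names are hypothetical.

```python
M_DRY_AIR = 28.9647   # molar mass of dry air (g/mol)
M_WATER = 18.01528    # molar mass of water (g/mol)


def specific_humidity_to_ppmv(q):
    """Convert specific humidity (kg/kg of moist air) to ppmv.

    ASSUMPTION: the volume mixing ratio is taken relative to dry air.
    """
    mass_mixing_ratio = q / (1.0 - q)  # kg of vapor per kg of dry air
    return 1e6 * mass_mixing_ratio * M_DRY_AIR / M_WATER


def ppmv_to_specific_humidity(ppmv):
    """Inverse conversion, from ppmv back to specific humidity (kg/kg)."""
    mass_mixing_ratio = 1e-6 * ppmv * M_WATER / M_DRY_AIR
    return mass_mixing_ratio / (1.0 + mass_mixing_ratio)


# A typical lower-stratospheric value of 4.5 ppmv is about 2.8e-6 kg/kg.
q_example = ppmv_to_specific_humidity(4.5)
```

At stratospheric humidities the difference between specific humidity and mass mixing ratio is negligible, so the conversion is essentially a multiplication by the ratio of molar masses (about 1.61).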

Between 17 and 20 km, the sink/source term adds water vapor. In the next section, the contribution of convection to this TLS moistening is quantified, after having demonstrated the ability of the ICON model to simulate the very deep convective systems responsible for this hydration. The patterns of the horizontal and vertical moisture flux convergence terms are noisier, but together they also contribute to the moistening of the TLS. In quantitative terms (see the upper row of Table 1), the sink/source term contributes 9% to the moistening of the TLS, the vertical moisture flux convergence 128% and the horizontal moisture flux convergence −37%. As the integral of the moisture flux convergence over the box equals the flux across the edges of the region (divergence theorem), the moistening by the vertical moisture flux convergence is a direct result of the upward flux at 17 km altitude, the vertical flux at 20 km being small in comparison. In contrast, the integral of the horizontal convergence is due to the exchanges across the region boundaries at 10°S and 30°N and indicates a net poleward export of water vapor.
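The divergence-theorem argument above can be written out for the TLS box. This is a sketch, not the authors' formulation: it assumes the flux-form budget with resolved velocity components (v, w) and sink/source term s as defined for Equation 1, and that the box spans all longitudes so that the zonal flux term vanishes. The symbols $F_z$ and $F_y$ are introduced here for illustration only.

```latex
% Integral of the flux-form budget over the TLS box V
% (10°S-30°N, 17-20 km, all longitudes; zonal term vanishes by periodicity).
\frac{d}{dt}\int_V \rho q \, dV
  = \underbrace{F_z(17\,\mathrm{km}) - F_z(20\,\mathrm{km})}_{\text{vertical convergence}}
  + \underbrace{F_y(10^\circ\mathrm{S}) - F_y(30^\circ\mathrm{N})}_{\text{horizontal convergence}}
  + \int_V s \, dV ,
\qquad
F_z(z) = \int_{A_z} \rho q w \, dA , \quad
F_y(\varphi) = \int_{A_\varphi} \rho q v \, dA .
```

Here $A_z$ is the horizontal surface of the box at altitude z and $A_\varphi$ the vertical wall at latitude φ, with $F_y$ counted positive northward. When $F_z(20\,\mathrm{km})$ is small, the vertical term reduces to the upward flux through the 17 km surface, as stated in the text.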

Contribution of the Very Deep Convective Systems
The objective of this section is to isolate the contribution of the very deep convective systems to the stratospheric water vapor budget. The first step is to define a threshold, here based on OLR, to detect the very deep convective systems. Then the water vapor tendencies associated with the very deep convective systems will be explained and quantified.

Detection of the Very Deep Convective Systems
The ability of the ICON model to simulate deep convection is shown in Hohenegger et al. (2020). In particular, the simulation produces precipitation and top-of-the-atmosphere radiation fluxes averaged over 30°S to 30°N and over the full simulation period that match the observed values within 8% (see Table 2 in Hohenegger et al., 2020). The center of mass of the ITCZ is simulated in very good agreement with observations over both the Atlantic and Pacific oceans, although the simulated ITCZ is slightly wider (by less than 1°) than observed in 2016. The ability of ICON to simulate the deepest convection on Earth is assessed here using maps of the first percentile of the OLR time series at each grid point (a metric of the minimal OLR values). This quantity will nevertheless not be used to define the very deep convective systems, for which we rather select a fixed OLR threshold as detailed below.
The very deep convective systems are the few deepest systems whose tops reach the lower stratosphere. These tops are overshoots. Only the systems supplied with very unstable air at the surface and internally organized so that this air experiences little dilution by mixing with environmental air during its ascent are able to produce such overshoots into the stratosphere (Dauhut et al., 2016). Iwasaki et al. (2010) and C. Liu and Zipser (2015) documented the climatological distribution of these systems based on a combination of CloudSat satellite radar and CALIOP satellite lidar observations, and on TRMM satellite radar observations, respectively. Rysman et al. (2017) characterized their microphysical profiles using the spaceborne Microwave Humidity Sounder. These systems can be identified from space thanks to their cold cloud tops, and the associated very low longwave radiation exiting the atmosphere aloft. The first percentile of the OLR time series at each grid point allows us to spot the low-OLR regions where these very deep convective systems developed, both in the CERES observations for 2016 and 2017 and in the ICON simulation (Figure 3). The simulated OLR first percentiles are slightly larger than the observed ones in 2016 (by 10-20 W/m² in the West Pacific), but in fair agreement with the observed ones in 2017. This is true over the Asian monsoon region in particular, which indicates that the deepest monsoon convective systems in ICON are not deeper than the observed ones. ICON thus does not produce excessively deep monsoon convection, which could otherwise have led to too large an injection of convective water into the stratosphere over this region. The larger OLR obtained for the 2016 comparison is consistent with the too low frequency of the lowest IR brightness temperatures found in Senf et al. (2018), who used the same model and the same period, but integrated the model over the Tropical Atlantic only.
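The first-percentile metric used to map the lowest OLR values can be illustrated with NumPy. This is a minimal sketch assuming the OLR is available as an array ordered (time, latitude, longitude); the function name `olr_first_percentile` is hypothetical, and the default linear interpolation of `numpy.percentile` is an assumption about how the percentile is evaluated.

```python
import numpy as np


def olr_first_percentile(olr):
    """First percentile of the OLR time series at each grid point.

    olr : array of shape (time, lat, lon), in W/m².
    Returns an array of shape (lat, lon).
    """
    return np.percentile(olr, 1, axis=0)


# Synthetic example: 101 time steps with OLR from 100 to 200 W/m² everywhere.
series = np.arange(100.0, 201.0)
olr_demo = np.tile(series[:, None, None], (1, 2, 3))
p1_demo = olr_first_percentile(olr_demo)
```

Unlike the absolute minimum, the first percentile is not set by a single output time, which makes the maps less sensitive to isolated extreme values.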
The geographical distribution of simulated OLR is in excellent agreement with the observations from the 2 years and with the climatologies of Iwasaki et al. (2010) and C. Liu and Zipser (2015). The simulation reproduces the deepest convection on Earth over the regions where it has been observed.

Table 1 Contribution of the Different Budget Terms to the Simulated Net Variation of Water Vapor (in Tg), for All Points (First Line) and Over the Very Deep Convective Systems Only (Second Line)
Note. The budget is computed between 10°S and 30°N, between 17 and 20 km altitude, and integrated between 2 August and 8 September included. Last column is the water vapor input calculated as the sum of the vertical convergence and the sink/source terms. The percentage is with respect to the water vapor input for all points.
At the top of the very deep convective systems, the water vapor volume mixing ratio is anomalously large with more than 6 ppmv between 17 and 18 km (Figure 4a). Just below, the air is particularly dry, with less than 4 ppmv. This is explained by the processes occurring inside the overshoots: the overshoots are extremely cold because of the strong adiabatic cooling during their ascent. As they are colder than their environment, they are also drier, that is, they contain less water vapor. Most of their water content is in the ice phase. At their top, the entrainment of subsaturated stratospheric air warms the cloud, leads to intense sublimation of the ice crystals and the development of a moist anomaly (Dauhut et al., 2018). The vertical velocity at the top of the very deep convective systems (Figure 4c) is at least one order of magnitude larger than in the other regions, reaching 300 m/h on average up to 21 km.

Stratosphere Hydration by Convection
Once the very deep convective systems are detected, it is possible to visualize their impact on the stratospheric water vapor field. Figure 5 illustrates the injection of water by some very deep convective systems into the stratosphere, at 19-km altitude. Note that water vapor anomalies can be clearly seen at 19 km. As shown in Hassim and Lane (2010) and Lee et al. (2019), the water vapor anomalies lie at higher altitudes than the overshooting tops, typically 0.5 to 1.5 km above, because of top entrainment of stratospheric air (Dauhut et al., 2018) or hydraulic jump (O'Neill et al., 2021). Individual overshoots reaching as high as 19 km also exist (e.g., C. Liu & Zipser, 2015), although they are extremely infrequent. In Figure 5 the propagation of the hydration patches injected by the very deep convective systems over the West Pacific on 5 August (labeled 1), over the border region between Pakistan and India on 8 August (labeled 2), and over Halong Bay and Hainan coasts on 11 August (labeled 3), can be visually tracked. The strong easterlies in the low stratosphere quickly advect the hydration patches away from their genesis location, over large-OLR regions. Diffusion by turbulence, itself fostered by the vertical shear of the horizontal wind, decreases the water vapor volume mixing ratio of the hydration patches, from more than 11 ppmv shortly after the injection to about 8 ppmv 3 days later. Similar advection and decrease of the water vapor anomalies shortly after the injection were already reported by Dauhut et al. (2015) and Lee et al. (2019).

Contribution to the Water Vapor Budget
Summing up the previous results, ICON produces water vapor variations in line with the observations of boreal summer (Figure 1), its TTL vertical velocities are consistent with the ones reported in this region (Figure 2), the model produces the deepest convection where it is observed (Figure 3), it produces overshoot cloud tops and the associated dry and moist anomalies in line with previous studies (Figure 4) and finally the moist patches produced downstream of the overshoot injections are as expected ( Figure 5). In short, the ICON model is able to reproduce the key phenomena at play, as well as their overall impact. We can thus now assess the contribution of convection to the water vapor budget based on this ICON simulation.
By integrating the terms of the water vapor budget over all time steps, between 10°S and 30°N, and plotting them as a function of OLR, it is possible to disentangle the contribution of the different OLR regions to the stratospheric water budget (Figure 6). Here the focus is on the very deep convective systems, defined above as the regions with OLR lower than 90 W/m². The OLR coordinate in Figure 6 is used to primarily distinguish these very deep convective systems from the other regions, but it should not be read as a proxy for the horizontal distance from the overshooting tops. As illustrated in Figure 5, intermediate OLR values like 110-180 W/m² can also correspond to regions far away from the very deep convective systems. Figure 6 indicates that, in contrast to the bin averages shown in Figure 4, the few points at very low OLR contribute significantly to the different budget terms, despite their much lower frequency (cf. Figure S3 in Supporting Information S1).
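The distinction between bin averages and bin contributions (bin averages weighted by the frequency of each bin) can be made concrete with a short sketch. It assumes that a budget term and the collocated OLR are available as flat arrays of grid-point values; the function name `bin_by_olr`, the bin edges and the synthetic numbers are hypothetical and serve illustration only.

```python
import numpy as np


def bin_by_olr(olr, term, edges):
    """Bin a budget term by OLR and return bin averages and bin contributions.

    olr, term : 1-D arrays of collocated grid-point values.
    edges     : increasing OLR bin edges (W/m²).
    The contributions sum to the mean of `term` over all binned points.
    """
    index = np.digitize(olr, edges) - 1
    nbins = len(edges) - 1
    valid = (index >= 0) & (index < nbins)

    counts = np.bincount(index[valid], minlength=nbins)
    sums = np.bincount(index[valid], weights=term[valid], minlength=nbins)

    # Bin average: mean of the term inside each bin (NaN for empty bins).
    averages = np.divide(sums, counts, out=np.full(nbins, np.nan),
                         where=counts > 0)
    # Bin contribution: bin average weighted by the bin frequency.
    contributions = sums / valid.sum()
    return averages, contributions


# Synthetic example: two very cold points (OLR < 90 W/m²) with large values.
olr_demo = np.array([80.0, 85.0, 100.0, 150.0, 200.0, 250.0])
term_demo = np.array([6.0, 4.0, 1.0, 1.0, 0.0, 0.0])
edges_demo = np.array([70.0, 90.0, 110.0, 300.0])
avg_demo, contrib_demo = bin_by_olr(olr_demo, term_demo, edges_demo)
```

In this synthetic case the lowest-OLR bin holds only a third of the points but carries most of the total, which is the kind of behavior the frequency-weighted view is meant to reveal.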
The very deep convective systems do not lead to an obvious moistening (Figure 6d). In the stratosphere above them, the hydration by microphysics and subgrid-scale transport up to 19 km (Figure 6c) is compensated by the vertical moisture flux (Figure 6b) that transports part of this hydration upward, between 18 and 20 km. No net hydration is visible in these regions because of the efficient transport out of the very deep convective regions by the divergent horizontal moisture flux (Figure 6a): this corresponds to the fast advection and spread of the hydration patches by the stratospheric winds, visible in Figure 5. In contrast, in regions with slightly larger OLR, between 90 and 110 W/m², a net moistening is visible and can be explained by the convergence of the horizontal moisture flux. Table 1 summarizes the contribution of the very deep convective systems (OLR lower than 90 W/m²) to the net moistening of the TLS. The net water vapor mass variation above the very deep convective systems corresponds to an increase of +0.57 Tg of water vapor, that is, around 3% of the water vapor mass increase in the whole TLS. The sink/source term actually adds 4.0 Tg of water vapor. Most of the hydration by the sink/source term does not translate into a net increase of water vapor because of the strong horizontal divergence of the moisture flux above the very deep convective systems, which contributes −2.6 Tg.
Figure 6 (caption fragment). In contrast to Figure 4, which shows bin averages, bin contributions are shown here, weighting each bin by its frequency (cf. Figure S3 in Supporting Information S1). Values are in ppmv.
In order to differentiate the impact of convection itself from the impact of the diverging stratospheric winds (advection and spread of the hydration patches), we define here the stratospheric water vapor input as the sum of the vertical moisture flux convergence term and the sink/source term. Comparing the input of +3.2 Tg in the regions with OLR lower than 90 W/m², attributed to the very deep convective systems, to the input over the full TLS gives an 11% contribution of the very deep convective systems to the input of water vapor into the stratosphere.
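The arithmetic behind the 3% and 11% figures can be retraced with the values quoted in the text. The sketch below is only a consistency check, not the analysis code of the study. The vertical term above the very deep convective systems is not quoted in the text; it is inferred here under the assumption that the three budget terms sum to the net variation.

```python
# Values quoted in the text, in Tg of water vapor over the 40-day period.
deep_net = 0.57          # net variation above very deep convective systems
deep_source = 4.0        # sink/source term above these systems
deep_horizontal = -2.6   # horizontal moisture flux term above these systems
tls_vertical = 27.3      # vertical moisture flux term, full TLS
tls_horizontal = -8.0    # horizontal moisture flux term, full TLS
tls_source = 2.0         # sink/source term, full TLS

# Assumption: the budget closes, so the vertical term is the remainder.
deep_vertical = deep_net - deep_source - deep_horizontal

# Water vapor input as defined in the text: vertical term plus sink/source term.
deep_input = deep_vertical + deep_source
tls_input = tls_vertical + tls_source

# Net variation of the full TLS: sum of the three budget terms.
tls_net = tls_vertical + tls_horizontal + tls_source

input_share = deep_input / tls_input   # share of the water vapor input
net_share = deep_net / tls_net         # share of the net moistening

print(f"input above deep convection: {deep_input:.2f} Tg")
print(f"share of TLS input: {100 * input_share:.0f}%")
print(f"share of TLS net moistening: {100 * net_share:.0f}%")
```

Under this closure assumption the inferred vertical term is negative, in line with the upward export of part of the hydration described for Figure 6b.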
At large OLR, the net moistening between 17 and 20 km visible in Figure 6d is due to the horizontal convergence of water vapor and the sink/source term (Figures 6a and 6c). The contribution of microphysics to this latter term is expected to be small, as the ice concentrations are very low in this range of OLR values (cf. Figure 4b). The moistening of these regions is thus due to the horizontal transport by the flow and the subgrid-scale transport, including the turbulent fluxes. The spread of the hydration patches (visible in Figure 5, Figure 2 of Dauhut et al. (2015), and Figures 4 and 6 of Lee et al. (2019)) is an illustration of such transport, which can occur thousands of kilometers away from the very deep convective systems, over regions with large OLR (white-hatched areas in Figure 5). This moistening by the horizontal moisture fluxes and the sink/source term is partly compensated by the vertical moisture flux, which is divergent there (Figure 6b).

Discussion
The decomposition of the water vapor budget in the TLS and the integration of its terms over the very deep convective systems allowed us to quantify the contribution of convection to the water vapor input in the TLS. Our result, an 11% contribution, lies in the range of the estimates by previous studies. It is lower than the estimate of 18% made by Dauhut et al. (2015), who upscaled the stratospheric water vapor input from one very deep convective system. It is, however, not as low as the latest estimate of 2% by Schoeberl et al. (2018). The latter is computed for boreal winter, as the difference in the stratospheric humidity between Lagrangian simulations considering and not considering the transport of water by convection. Using this method, Ueyama et al. (2015) and Ueyama et al. (2018) computed the convective contribution to the water vapor at the very specific 100 hPa level. Their estimates of 14% and 15% for boreal winter and summer, respectively, are slightly larger than our estimate.
The differences between the estimates can result from differences in: (a) the time period considered, (b) the level considered, (c) the methods used to get the estimate, and (d) the models used. The different time periods considered could explain why the estimate in Schoeberl et al. (2018) is lower than in our study, as they considered a boreal winter period. Given the compilation of observations by C. Liu and Zipser (2005), the climatological abundance of overshooting convection during the month of August, investigated here, is larger than during boreal winter, but certainly not by the factor of more than 5 that would be needed to reconcile the estimates. Also, the two estimates from Ueyama et al. (2015) and Ueyama et al. (2018) suggest a very limited seasonal variation of the convective contribution to the low-stratospheric water vapor, with the caveat that these latter two estimates are based on short time periods (7 days). For the comparison to the Ueyama et al. (2015) and Ueyama et al. (2018) estimates in particular, the different levels considered could be a plausible explanation, as they only considered the 100 hPa level. However, as stated in the introduction, considering only the 100 hPa level should bias the estimate low, not high relative to our estimate, as is found here. Furthermore, Ueyama et al. (2015) and Ueyama et al. (2018) ran diabatic back trajectories to convection, using monthly averaged heating rates inconsistent with the wind fields. Finally, about one third of their back trajectories end up in the stratosphere, where they set them to MLS water vapor values. This implicitly ties their results to MLS, so uncertainty remains about the ability of their simulation to quantify the water vapor variations. This suggests that the different methods used to get the estimates might explain much of the differences as, in contrast to these other studies, we computed here a full budget of water vapor in the low stratosphere.
Precisely isolating the reason for the differences between the various estimates is beyond the scope of this study and would require a concerted international effort, for example, defining an experiment where the hydration would be computed using the different methods on the same field/time period. Finally, the differences in the models used, and especially in their representation of the microphysics, are also crucial. A way to evaluate this uncertainty would be to apply our analysis to the other storm-resolving models participating in the DYAMOND intercomparison project, something left for future work.
Besides the representation of microphysical processes, there are other sources of uncertainty associated with our results. As mentioned in Section 2, whereas the horizontal grid spacing of 2.5 km may lead to an underestimate of the convective injection of water vapor into the stratosphere (Dauhut et al., 2015), the use of a vertical grid spacing of 600 m may lead to an over- or underestimation of the convective hydration of the stratosphere. Also, the convective contribution to the water vapor input into the stratosphere is deduced from the sink/source term of the water vapor budget (Equation 1), a term that is computed as a residual. Using an in-line budget of the condensation/sublimation processes would provide a more direct quantification. Finally, further uncertainties arise from the shortness of the simulated period. Performing a similar quantification with other GSRMs and using longer simulations, which are now becoming available, would help reduce such uncertainties.
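Why a residual term inherits the errors of the other terms can be shown with a minimal sketch. It assumes a budget of the generic form tendency = horizontal convergence + vertical convergence + sink/source, which is our reading of the text rather than a transcription of Equation 1. The function name `residual_source` and all values are hypothetical.

```python
import numpy as np

def residual_source(tendency, horiz_conv, vert_conv):
    """Sink/source term diagnosed as what the resolved terms do not explain."""
    return tendency - horiz_conv - vert_conv

# Synthetic budget terms in arbitrary units (not model output).
true_source = np.array([0.20, -0.10, 0.05])
horiz_conv = np.array([-0.50, 0.30, 0.10])
vert_conv = np.array([0.80, -0.10, -0.20])
tendency = horiz_conv + vert_conv + true_source

# With exact resolved terms, the residual recovers the true sink/source term.
recovered = residual_source(tendency, horiz_conv, vert_conv)

# A hypothetical bias of +0.01 in the vertical term leaks entirely into the
# residual, with the opposite sign.
biased = residual_source(tendency, horiz_conv, vert_conv + 0.01)
leak = biased - recovered
```

An in-line budget, in which the model accumulates the condensation and sublimation tendencies directly, would avoid this propagation of errors.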
Our analysis also draws attention to the fact that deducing the convective contribution to the water transport into the stratosphere from the anomalies of humidity observed above the tropical very deep convective systems (like the ones in Figure 5) can be misleading. Indeed, whereas the local humidity increases at the top of the very deep convective systems correspond to +0.6 Tg in the simulation, that is, 3% of the water mass increase in the full TLS (Table 1), the budget indicates that this does not reflect the 11% contribution of convection to the water mass input. The difference is primarily due to the horizontal moisture fluxes, which efficiently advect the water vapor anomalies out of the very deep convective systems (Table 1 and Figure 5). This strong ventilation contrasts with the paradigm of containment, which holds at the scale of the Asian monsoon anticyclone, that is, at a much larger scale than the typical size of the moist anomalies due to individual convective injections (about 100 km wide). The strong ventilation explains why a relatively limited increase of water vapor (less than +8 ppmv) is found in the stratosphere over the individual tropical very deep convective systems, as underlined in E. J. Jensen et al. (2020).

Conclusions
We used a simulation of the global atmosphere performed with the ICON model at a grid spacing of 2.5 km to assess the contribution of deep convection to the water vapor budget of the lower stratosphere. For the first time, global estimates of this contribution can be made without having to rely on a convective parameterization or a trajectory model. The considered period lasts 40 days, from 1 August to 9 September 2016. The simulation reproduces the structure of the zonal and temporal average of the water vapor field. It also reproduces the moistening of the stratosphere observed over the simulated period, with too large an amplitude compared to the observations in 2016, but in better agreement with observations from other years like 2017.
We quantified the budget of the water vapor in the TLS and disentangled its different contributions. The main source of hydration comes from the vertical fluxes of water vapor that converge up to 19 km altitude, with a net maximum localized above the ITCZ between 5° and 30°N. The hydration due to these vertical moisture fluxes is compensated locally by the horizontal moisture fluxes that redistribute the humidity southwards and northwards. Finally, the sink/source term due to the microphysics and mixing by the smaller-scale circulations subtracts water vapor from the tropopause up to 17.5 km and adds water vapor above, spreading the hydration vertically up to 20 km. For the TLS and for the simulated period, this translates into an increase of +27.3 Tg water vapor by the vertical moisture fluxes, −8.0 Tg by the horizontal ones and +2.0 Tg by the sink/source term.
To quantify the contribution of the very deep convective systems to this hydration, we first identified them by their associated very low OLR. Their distribution is in excellent agreement with observations. The lowest OLR regions, with OLR lower than 90 W/m², exhibit typical characteristics of very deep convective systems that overshoot into the stratosphere: significant ice content (larger than 1 eq. ppmv) and a positive water vapor anomaly (larger than 6 ppmv) above the tropopause. The computation of the water vapor budget based on the ICON simulation indicates an 11% contribution of the very deep convective systems to the water vapor input into the stratosphere. This water vapor input includes both the direct hydration by the overshoots (i.e., the sink/source term) and the vertical convergence term. The seasonal and interannual variability of this estimate remains to be investigated, as well as the sensitivity of our results to the representation of the microphysical processes. 81% of the water vapor input by the very deep convective systems is directly transported away from these systems by the horizontal moisture fluxes, decreasing the amplitude of the stratospheric water vapor anomalies. Our results underline that, despite the relatively small increase of stratospheric water vapor concentration observed above the tropical very deep convective systems, a full-budget computation greatly helps to quantify the contribution of convection to the stratospheric water vapor input.
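The 81% figure follows from the values quoted earlier in the text for the regions with OLR lower than 90 W/m². The sketch below is a consistency check using those rounded values (in Tg), not a recomputation from model output.

```python
# Rounded values quoted in the text for OLR lower than 90 W/m², in Tg.
deep_input = 3.2         # water vapor input (vertical term plus sink/source term)
deep_horizontal = -2.6   # horizontal moisture flux term (export)
deep_net_quoted = 0.57   # net variation quoted from Table 1

# Fraction of the input that the horizontal fluxes carry away.
exported_fraction = -deep_horizontal / deep_input

# What remains above the systems should match the quoted net variation,
# up to the rounding of the quoted values.
retained = deep_input + deep_horizontal

print(f"exported: {100 * exported_fraction:.0f}% of the input")
print(f"retained: {retained:.2f} Tg (quoted net variation: {deep_net_quoted} Tg)")
```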

Data Availability Statement
The Aura-MLS data are archived online (https://mls.jpl.nasa.gov/). The CERES SYN1deg product is available on the CERES data portal (https://ceres.larc.nasa.gov/data/). Model output supporting the conclusions of this article is archived by the German Climate Computing Centre (DKRZ) and made available through the ESiWACE project webpage (https://www.esiwace.eu/services/dyamond). The scripts for the analysis will be available online (http://hdl.handle.net/21.11116/0000-0007-989A-0) upon publication.