Recreating Observed Convection‐Generated Gravity Waves From Weather Radar Observations via a Neural Network and a Dynamical Atmospheric Model

Convection‐generated gravity waves (CGWs) transport momentum and energy, and this momentum is a dominant driver of global features of Earth's atmosphere's general circulation (e.g., the quasi‐biennial oscillation, the pole‐to‐pole mesospheric circulation). As CGWs are not generally resolved by global weather and climate models, their effects on the circulation need to be parameterized. However, quality observations of GWs are spatiotemporally sparse, limiting understanding and preventing constraints on parameterizations. Convection‐permitting or ‐resolving simulations do generate CGWs, but validation is not possible as these simulations cannot reproduce the CGW‐forcing convection at correct times, locations, and intensities. Here, realistic convective diabatic heating, learned from full‐physics convection‐permitting Weather Research and Forecasting simulations, is predicted from weather radar observations using neural networks and a previously developed look‐up table. These heating rates are then used to force an idealized GW‐resolving dynamical model. Simulated CGWs forced in this way closely resembled those observed by the Atmospheric InfraRed Sounder in the upper stratosphere. CGW drag in these validated simulations extends 100s of kilometers away from the convective sources, highlighting errors in current gravity wave drag parameterizations due to the use of the ubiquitous single‐column approximation. Such validatable simulations have significant potential to be used to further basic understanding of CGWs, improve their parameterizations physically, and provide more restrictive constraints on tuning with confidence.

Despite the importance of CGWs in climate and seasonal prediction, they remain largely unresolved in global prediction models, and their forcings on large-scale circulations must be parameterized (Bushell et al., 2022; Richter et al., 2020). The sparsity of quality observations of CGWs has prevented development of quantitative constraints on parameterizations (Alexander et al., 2021; Lee et al., 2022). As a result, these parameterizations are highly simplified using numerous idealizations and are typically tuned to minimize a handful of global error metrics depending on the application (Richter et al., 2022). As an alternative to observations, convection-permitting and -resolving simulations internally generate CGWs and could be used to further fundamental understanding of CGWs and improve parameterizations. However, such simulations cannot reproduce the timings, locations, and intensities of actual convective sources, preventing their validation against the few CGW observations that exist. Without such validation, it is difficult to make progress in CGW research with confidence.
Here, a recently developed method is used to force an idealized GW-resolving model with reasonably realistic diabatic heating at the correct locations and times so that the simulated CGWs can be directly compared with observations, following Grimsdell et al. (2010), C. Stephan and Alexander (2015), C. C. Stephan et al. (2016), and Bramberger et al. (2020). This diabatic heating is predicted from weather radar observations of actual cases. Two methods are used to predict diabatic heating: the previously developed look-up table (LT) method of Bramberger et al. (2020) and a new simple neural network (NN) model. This radar-derived heating is then provided to a GW-resolving idealized configuration of the Weather Research and Forecasting (WRF) model, which responds dynamically to the diabatic forcing in all ways the non-linear dynamical core and resolution allow. This method is tested against Atmospheric InfraRed Sounder (AIRS) and Project Loon super-pressure balloon observations in two cases. These two cases highlight the methods' abilities to reproduce observed CGWs. Previous work suggested the gravity wave spectrum above convection in WRF simulations was only modestly sensitive to the choice of microphysics parameterization (C. Stephan & Alexander, 2015), while the depth and strength of the convective latent heating are key determinants of the gravity wave spectrum (Bramberger et al., 2020). Our study also addresses how sensitive the CGWs are to the LT and NN methods based on specific locations/conditions. The overall method to simulate actual cases of CGWs, the two tools used to predict convective diabatic heating, and the training data sets used for both tools are described in Section 2. The skill of the look-up table and NN models in predicting WRF-simulated diabatic heating profiles is presented in Section 3. Idealized model runs forced with the different diabatic heatings are then performed and compared to two cases of observed CGWs, one observed by AIRS and one by Loon super-pressure balloons, in Section 4. Finally, Section 5 is a discussion of the results and conclusions. Details on the accessibility of data, NNs, WRF source codes, and analysis codes are given in Section 6.

Overall Summary of the Method
CGWs are simulated within an idealized WRF configuration solely forced by convective diabatic heating. This diabatic heating, Q, is derived from the Multi-Radar, Multi-Sensor (MRMS) data set, which merges numerous radar-derived quantities from all weather radars in the contiguous United States onto a single 0.01° latitude-longitude (≈1-km resolution) grid every 2 minutes (Zhang et al., 2016). Similar methods have been previously used to force CGWs from other weather radar data sets over the mid-latitude, Midwestern US (C. Stephan & Alexander, 2015; C. C. Stephan et al., 2016) and near Darwin, Australia (Bramberger et al., 2020; Grimsdell et al., 2010).

Training Data
Two methods are used to predict profiles of Q given radar-observed quantities: the look-up table method of Bramberger et al. (2020) and a neural network (NN) method developed here. While radar reflectivities provide observations of falling convective precipitation, there are no observations of Q for training the methods. To work around this issue, the two methods are trained on full-physics, realistic convection-permitting (Δx = 2-km, Δz < 500-m resolution) WRF simulations of observed convective events. Within these simulations, the two methods are trained to predict simulated diabatic heating given simulated radar-observable quantities.
Two sets of full-physics, realistic WRF simulations were used for training: simulations of a case of significant deep tropical convection used by Bramberger et al. (2020) over Darwin, Australia (hereafter the Darwin run) and a simulation of typical diurnal convection over Florida (hereafter the Florida run).
The Darwin run simulated a 48-hr period, beginning 11 January 2003 at 12 UTC. The inner-most domain used a Δx = 2-km resolution, was 408 km by 408 km wide, and was run three times with three slightly different model tops, effectively producing three ensemble members of the same case. A 10-km-deep upper sponge layer was used to prevent GW reflection off the top of the domain. The tropical physics suite was used (https://www2.mmm.ucar.edu/wrf/users/physics/ncar_tropical_suite.php). Initial and boundary conditions were provided by the ERA-Interim reanalysis. All three "ensemble members" of this case were included in the training and are together referred to as the Darwin run. The outer 20 km of the 2-km resolution domain were excluded from training, as were the first 12 hr of the simulations, during which initial imbalances dissipate and convection becomes well developed. For complete details, see Bramberger et al. (2020).
The Florida run was completed as part of this work using WRFv4.4. A single Δx = 2-km resolution domain was set up, with initial and boundary conditions from the ERA5 reanalysis (Hersbach et al., 2020). The domain was 1,200 km by 1,200 km wide, had a top at 1 hPa (z ≈ 45 km), and had 110 vertical levels, resulting in a nearly constant resolution of Δz ≈ 500 m above the tropopause. A 10-km-deep upper sponge layer was again specified. The tropical physics suite was again used. The period simulated was 72 hr, beginning 14 June 2018 at 12 UTC. Given the large difference in resolution between the forcing reanalysis used for boundary conditions (Δx ≈ 31 km) and WRF (Δx = 2 km), the outermost 200 km of the domain were excluded from training. The first 12 hr of the simulation were also excluded.
To train the two methods described below, simulated radar quantities (i.e., inputs) and diabatic heating profiles (outputs) were paired at each grid point and time, but only for convective grid points. Grid points were deemed convective if the simulated rain rate exceeded 1 mm (10 min)⁻¹. In the Darwin and Florida runs, 1,558,031 and 180,247 convective grid points were extracted, respectively.

Look-Up Table
The look-up table (LT) used here was the same as used by Bramberger et al. (2020). Briefly, to create their LT, convective grid points were binned by rain rate (RR) and echo top height (ET). Simulated diabatic heating profiles were then averaged within the simulated RR and ET bins. Then, given an RR and ET, a diabatic heating profile, Q(z), is predicted via 2-D linear interpolation. The LT used here was trained only on the Darwin run and is referred to as "DALT" in the figures.
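The binning, averaging, and 2-D interpolation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the Bramberger et al. (2020) implementation: the function names, the bin edges in the usage note, the zero profile assigned to empty bins, and the clamping of inputs that fall outside the range of bin centers are all choices made here for illustration.

```python
from bisect import bisect_right


def build_table(samples, rr_edges, et_edges):
    """Average heating profiles within each (RR, ET) bin.

    samples: list of (rr, et, profile) tuples with equal-length profiles.
    Empty bins are filled with zeros (an assumption of this sketch).
    """
    n_rr, n_et = len(rr_edges) - 1, len(et_edges) - 1
    nz = len(samples[0][2])
    sums = [[[0.0] * nz for _ in range(n_et)] for _ in range(n_rr)]
    counts = [[0] * n_et for _ in range(n_rr)]
    for rr, et, profile in samples:
        i = bisect_right(rr_edges, rr) - 1
        j = bisect_right(et_edges, et) - 1
        if 0 <= i < n_rr and 0 <= j < n_et:  # skip samples outside the bins
            counts[i][j] += 1
            for k in range(nz):
                sums[i][j][k] += profile[k]
    return [[[s / counts[i][j] if counts[i][j] else 0.0 for s in sums[i][j]]
             for j in range(n_et)] for i in range(n_rr)]


def _bracket(centers, x):
    """Return (lower index, weight of upper neighbour), clamped to the table."""
    if x <= centers[0]:
        return 0, 0.0
    if x >= centers[-1]:
        return len(centers) - 2, 1.0
    i = bisect_right(centers, x) - 1
    return i, (x - centers[i]) / (centers[i + 1] - centers[i])


def predict_profile(table, rr_centers, et_centers, rr, et):
    """Bilinearly interpolate bin-mean profiles to a given (RR, ET)."""
    i, a = _bracket(rr_centers, rr)
    j, b = _bracket(et_centers, et)
    nz = len(table[0][0])
    return [(1 - a) * (1 - b) * table[i][j][k]
            + a * (1 - b) * table[i + 1][j][k]
            + (1 - a) * b * table[i][j + 1][k]
            + a * b * table[i + 1][j + 1][k]
            for k in range(nz)]
```

With hypothetical bin edges of `[0, 10, 20]` in both RR and ET, the bin centers are `[5, 15]`, and a query halfway between the four centers returns the mean of the four bin-mean profiles. The vertical smoothness of such predictions follows directly from the averaging in `build_table`.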

Neural Networks
The LT method likely introduces errors due to the averaging applied within RR and ET bins, the dimensions of which are imposed. Additionally, it is not straightforward to expand the look-up table to take advantage of additional radar-observable quantities. Neural network architectures, and machine-learning methods in general, can provide a few advantages over an LT method. For example, NN training provides a flexible framework to increase the number of input quantities and more fully make use of available data. Additionally, averaging or compositing of heating profiles over RR and ET is not imposed, which may allow NNs to be more sensitive to input variables and distinguish between different diabatic heating regimes. Finally, the inherently non-linear nature of NN prediction has the potential to increase skill by better representing the complex structures of heating profiles. Here, five radar-observable quantities were used to predict diabatic heating profiles at a given point: radar reflectivities at the 0°C, −10°C, and −20°C isotherms in addition to the RR and ET used by the LT method. Prior to use with the NNs, all input variables and diabatic heatings were demeaned and then normalized by their standard deviations.
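The demeaning and normalization step can be illustrated with a short sketch. The function names are hypothetical, and two details are assumptions made here because the text does not specify them: the statistics are computed per column (per input variable or per heating level), and the population standard deviation is used.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Return per-column means and standard deviations of a list of rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std (assumption)
    return means, stds


def standardize(row, means, stds):
    """Demean a row and divide by the standard deviations."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def unstandardize(row, means, stds):
    """Invert standardize(), e.g. to recover heating rates from NN output."""
    return [v * s + m for v, m, s in zip(row, means, stds)]
```

The same means and standard deviations fitted on the training data would be reused when the network is applied to radar observations, so that inputs are scaled consistently.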
Here, a 40-neuron-wide, 6-layer-deep fully connected NN with a hyperbolic tangent activation function was used to predict diabatic heating profiles gridpoint by gridpoint. Given the two sets of simulations to train on, three NNs were trained to predict diabatic heating: one trained on the Darwin run only, one trained on the Florida run only, and one trained on both, represented by "DANN," "FLNN," and "DAFLNN," respectively. The DANN was trained on all Darwin run convective grid points. The FLNN was trained on 90% of the Florida run convective grid points. The DAFLNN was trained on convective grid points from both simulations. Given the much smaller number of convective grid points in the Florida run, the Florida run profiles were duplicated until the number of Florida profiles was equal to the number of Darwin profiles to avoid data imbalance. A mean-squared error (MSE) loss function and a learning rate of 0.005 were used for training. Weights were updated after every batch of 10,000 input-output pairs. Training continued until the epoch-accumulated MSE reduced by less than 0.01%. These three NNs trained on the two training sets allow some inference of how generally applicable an NN trained on a single case of deep, tropical convection (e.g., the Darwin run) might be when used, for example, on a case of subtropical convection over the southeast US.
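A minimal sketch of the network shape, the duplication of Florida profiles, and the stopping rule is given below. It is illustrative only and makes several assumptions not stated in the text: "6-layer-deep" is read as six hidden layers of 40 tanh neurons followed by a linear output layer, the weight initialization is arbitrary, the number of output levels is a free parameter, and all function names are hypothetical. No training loop or optimizer is shown.

```python
import math
import random


def init_mlp(n_in, n_out, width=40, depth=6, seed=0):
    """Create weights for `depth` hidden layers plus one linear output layer."""
    rng = random.Random(seed)
    sizes = [n_in] + [width] * depth + [n_out]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)  # arbitrary initialization (assumption)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """Map normalized radar inputs to a normalized heating profile."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:  # tanh on hidden layers only
            x = [math.tanh(v) for v in x]
    return x


def oversample(profiles, target_count):
    """Duplicate a smaller training set until it has target_count members."""
    reps, rem = divmod(target_count, len(profiles))
    return profiles * reps + profiles[:rem]


def converged(prev_mse, curr_mse, tol=1e-4):
    """True once the epoch MSE has decreased by less than 0.01%."""
    return (prev_mse - curr_mse) / prev_mse < tol
```

For example, `init_mlp(n_in=5, n_out=30)` builds a network taking the five radar-observable inputs and returning a profile on 30 hypothetical vertical levels; `oversample(florida_profiles, len(darwin_profiles))` mirrors the duplication used to balance the combined training set.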
Limited hyperparameter optimization was performed in this problem. An NN with double the neurons (80 neurons, 6 layers) and an NN with an extra two layers (40 neurons, 8 layers) were trained on Darwin run profiles to predict a subset of convective profiles, also from the Darwin run. Changes in validation profiles (similar to Figure 2, not shown) were minute, so the 40-neuron-wide, 6-layer-deep NN architecture was chosen. Further hyperparameter optimization is left to future work.

Evaluations of Diabatic Heating Predictions
The four methods of predicting diabatic heating are tested against the 10% of the Florida run profiles withheld from training. These withheld profiles were compiled by first binning all of the Florida-run convective grid points into RR bins of 5 mm (10 min)⁻¹ and then withholding a randomly chosen 10% of the profiles in each RR bin for testing. This process ensures the RR probability density function of the testing data is the same as in the training data and also ensures that the rarest, but most important, profiles with the highest rain rates do not all end up being withheld from training. Rain rate is a good proxy for the magnitude of the diabatic heating above, which forces CGWs. Note that the two NNs that include profiles from the Florida run in training are being evaluated against Florida run profiles withheld from the same simulation.
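The stratified withholding described above can be sketched as follows. This is an illustration rather than the code used in the study; the function name, the random seed, and the rounding of the per-bin test count are assumptions.

```python
import random
from collections import defaultdict


def stratified_holdout(rain_rates, bin_width=5.0, fraction=0.1, seed=0):
    """Withhold `fraction` of the samples in every rain-rate bin.

    Returns (train_indices, test_indices) into `rain_rates`.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, rr in enumerate(rain_rates):
        bins[int(rr // bin_width)].append(idx)
    test = set()
    for members in bins.values():
        n_test = round(fraction * len(members))  # rounding is an assumption
        test.update(rng.sample(members, n_test))
    train = [i for i in range(len(rain_rates)) if i not in test]
    return train, sorted(test)
```

Because the draw is made bin by bin, a bin holding only ten high-rain-rate profiles contributes one profile to testing and keeps nine for training, rather than risking that all ten land in the test set as a purely random 10% draw could.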
WRF-simulated diabatic heating profiles and predictions from the four methods are shown for five randomly chosen profiles within the five RR bins in Figure 1. By eye, the NNs predict WRF-simulated Q similarly. The DALT predictions are somewhat distinct, being smoother in the vertical, which might be expected given the averaging inherent in the LT method. Encouragingly, all of the NNs represented the negative heatings near the surface due to evaporative cooling in the smallest RR profiles (Figure 1a), whereas the DALT did not. While the DALT did not represent this feature here, look-up tables can be constructed to represent it (Lang & Tao, 2018; Tao et al., 2019).

Profiles of the mean absolute error (MAE) and bias validation statistics (MAE = $\overline{|Q_{\mathrm{pred}} - Q_{\mathrm{WRF}}|}$ and bias = $\overline{Q_{\mathrm{pred}} - Q_{\mathrm{WRF}}}$, where the overbar is an average over the profiles in a bin and Q is diabatic heating) are presented in Figure 2. Here, the bin-mean WRF-simulated diabatic heating profile is shown in black for reference, averaged over the number of profiles given in each panel title. In the smallest RR bin, all methods perform the worst, with MAE significantly larger than the bin-mean diabatic heating. This lack of predictive skill may be due to insufficient information within the input quantities. Also, at these low RRs, not all of the profiles might be convective in nature, leading to errors when trying to predict a non-convective diabatic heating profile. At larger rain rates (Figures 2b-2e), diabatic heatings are much larger and all methods perform much better, with MAEs smaller than the mean heating rates.
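For concreteness, level-by-level MAE and bias profiles for one RR bin could be computed as in the sketch below. The function name is hypothetical, and the sign convention for the bias (predicted minus WRF-simulated) is an assumption of this sketch.

```python
def mae_and_bias(predicted, simulated):
    """Return (mae_profile, bias_profile) over a set of heating profiles.

    predicted, simulated: lists of equal-length profiles (one per grid point).
    Bias is predicted minus simulated (assumed sign convention).
    """
    n = len(predicted)
    nz = len(predicted[0])
    mae = [sum(abs(p[k] - s[k]) for p, s in zip(predicted, simulated)) / n
           for k in range(nz)]
    bias = [sum(p[k] - s[k] for p, s in zip(predicted, simulated)) / n
            for k in range(nz)]
    return mae, bias
```

The two statistics are complementary: errors of opposite sign cancel in the bias but not in the MAE, so a method can show near-zero bias at a level while still having a large MAE there.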
Figure 2 allows the predictive skill of the Darwin-trained LT and the Darwin-trained NN to be compared. Across all larger RR bins with more of a signal to predict, the two methods have very similar performance. The DALT perhaps has slightly better skill than the DANN, with incrementally higher MAE by the DANN near the diabatic heating maxima, apparently due to a weak bias in heating. However, the NNs perform notably better than the DALT for the smallest RRs, with smaller MAEs and biases in the lower half of the troposphere. Perhaps this is a reflection of the NNs' abilities to better represent more complex profiles of heatings, due to less averaging or compositing of the majority of profiles at these smaller RRs used in training, or a result of more information about the profile being used as input (i.e., reflectivities at 0°C, −10°C, and −20°C used by the NNs and not the DALT).
Comparison of the validation profiles for the DANN, FLNN, and DAFLNN allows some inferences to be made on how generally applicable an NN trained on a single case of deep, tropical convection might be. In all RR bins except the lowest, the FLNN outperforms the DANN, with MAE reduced by about 33% relative to the DANN. This might not be too surprising, as the FLNN was trained on the same run from which these testing data were withheld. For all but the highest RR bin, including the Darwin-run profiles in the NN training did not change the predictive skill much. However, at the highest RRs, inclusion of the Darwin profiles in training did notably increase the predictive skill of the NN. This is likely because the Darwin run included much stronger (RRs of 65+ mm (10 min)⁻¹) and deeper (tropopause at z = 18 km near Darwin vs. z = 15 km over Florida) convection, providing more convective grid points at these higher rain rates from which to learn.
To summarize, convective diabatic heating exhibits significant point-to-point variability and is a challenge to predict skillfully given only a handful of radar-observable quantities. Both the LT and NN methods have similar predictive skill at larger RRs. The NNs appear to be better able to predict the complex heating profiles at the smallest RRs. More representative training data (e.g., from the Florida run) increase predictive skill. Finally, as largely expected, more training data (i.e., including both runs in the training) can further increase skill incrementally.

Idealized WRF Configuration
The four tools described above were used to predict convective diabatic heating from MRMS data. Then, these heatings were supplied to the same idealized configuration of WRF used by C. Stephan and Alexander (2015) and Bramberger et al. (2020). Briefly, the 3-D supercell idealized case within WRFv3.7 was the starting point. The initialization code was modified to remove the default initial warm bubble. All physical parameterizations were disabled. WRF's "open" boundary conditions were used, designed to allow small-amplitude GWs to propagate out of the domain without affecting the interior solution. The Coriolis parameters were constant across the domain and set using a latitude of 28.5° north. The namelist parameter "pert_coriolis" was set to true so that the Coriolis forces are applied only to the wind speed deviations from the initial profiles. Initial profiles were taken from MERRA2 (Gelaro et al., 2017), averaged between 25° and 34° latitude and 77° and 84° longitude at times closest to the measurements of interest (see cases below). A key modification was made to the WRF variable registry, which allowed WRF to read the internal diabatic heating variable, "h_diabatic," from a file via an auxiliary input stream. The modified WRF source code, along with a diff relative to the original source code, is provided. See the Open Research section below for details.
The four tools were used to create 3-D diabatic heating files readable by WRF on the 2-km resolution WRF grid every 2 minutes. Heatings were only provided within the dashed box in Figure 3e, tapered from zero to the full amounts between the dashed and solid boxes. Additionally, the small heatings produced by the NNs above the echo top heights (e.g., Figure 1) were set to zero. These heating files were read by WRF every 2 minutes, updating the diabatic heating used to force changes in temperature, which was held constant between updates. As WRF integrates forward in time (a Δt = 10 s was used), WRF's dynamical core responds to this heating in every way the governing equations and resolution allow. Convective updrafts and compensating subsidence are forced. All mechanisms that generate CGWs (i.e., diabatic heating, obstacle, mechanical oscillator) act to the extent possible, as forced by the provided diabatic heating.
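The lateral tapering and the zeroing of heating above echo top could look like the sketch below. The text does not give the taper shape, so a linear ramp applied separably in x and y is assumed here, as is the reading that the dashed box is the outer one; function names and box coordinates are hypothetical.

```python
def taper_weight(x, y, outer, inner):
    """Weight in [0, 1]: 0 on or outside `outer`, 1 inside `inner`.

    outer, inner: (xmin, xmax, ymin, ymax) boxes, with inner nested in outer.
    A linear ramp between the boxes is an assumption of this sketch.
    """
    def ramp(v, o_lo, i_lo, i_hi, o_hi):
        if v <= o_lo or v >= o_hi:
            return 0.0
        if v < i_lo:
            return (v - o_lo) / (i_lo - o_lo)
        if v > i_hi:
            return (o_hi - v) / (o_hi - i_hi)
        return 1.0

    return (ramp(x, outer[0], inner[0], inner[1], outer[1])
            * ramp(y, outer[2], inner[2], inner[3], outer[3]))


def zero_above_echo_top(profile, heights, echo_top):
    """Set heating to zero at levels above the radar echo top height."""
    return [q if z <= echo_top else 0.0 for q, z in zip(profile, heights)]
```

A heating column would then be multiplied by `taper_weight` at its horizontal location after `zero_above_echo_top` is applied, so that the forcing falls smoothly to zero at the edge of the forced region instead of ending abruptly.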

The Case of Interest
In order to evaluate the idealized WRF simulations, an attempt was made to reproduce CGWs observed by the Atmospheric InfraRed Sounder (AIRS) in the stratosphere. Brightness temperature perturbations from AIRS radiance measurements averaged over 42 channels with wavelengths near 4 μm and 2 channels with wavelengths near 15 μm are shown in panel (a) of Figures 3 and 4. For details on the brightness temperature products, see Hoffmann et al. (2013, 2014, 2017). Vertical observational filter kernels, averaged over all channels included in each product, are shown in Figure 5; these depict the relative importance of different altitudes in emitting radiation at the selected wavelengths to the AIRS sensor. The 4 μm channel set is most sensitive to stratospheric temperature perturbations at about 30-40 km of altitude. The 15 μm channel set is most sensitive at about 40-45 km. Note the different vertical width and sensitivity of the two kernel functions.
In both of these products, small-scale perturbations within eastward-directed semicircular GWs are apparent just north of the Gulf Coast and over northern Florida. These observations are consistent with localized convective sources below, which was the case as seen in the MRMS lowest reflectivity mosaic at 18 UTC (2 p.m. local) on 22 July 2018 in Figure 6, valid about 40 min prior to the AIRS data being collected overhead. Earlier analyses of reflectivities indicate these two convective features initiated approximately 6 hours earlier (8 a.m. local) and so were rapidly developing up to the time of the AIRS overpass.
To simulate this case, the idealized WRF model was configured with 110 evenly spaced vertical levels extending up to z = 80 km (Δz ≈ 727 m), with a 10-km-deep GW-absorbing sponge at the top. This depth was chosen in order to cover as much of the AIRS observational kernels within a physically interpretable portion of the domain as possible. The idealized model was initialized at 6 UTC, 22 July 2022 with the wind (Figure 7) and stability (not shown) profiles from MERRA2 and integrated forward 30 hr in time. Four simulations were completed, forced by diabatic heatings produced by the four tools described above and updated every 2 minutes. Variables were output every 10 min.

Application of AIRS Observational Filters to WRF Output
In order to validate the four runs against AIRS data, both vertical and horizontal observational filters were applied to the WRF output to approximate the brightness temperature perturbations that would be seen by the AIRS sensor viewing through the simulated atmosphere. The vertical observational filter was applied first by taking the vertically weighted average of WRF temperature perturbations (T′) using the kernels in Figure 5 as weights. Temperature perturbations were computed by first applying spatial high-pass filtering following Kruse and Smith (2015) to retain scales smaller than 500 km, similar to the high-pass filtering applied when removing background brightness temperature (T_b) from AIRS swaths (Hoffmann et al., 2013, 2014).
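The vertical observational filter amounts to a kernel-weighted average of each temperature perturbation column. A minimal sketch is given below, assuming evenly spaced model levels (so that no layer-thickness weighting is needed) and a kernel already sampled on those levels; the function name is hypothetical.

```python
def apply_vertical_kernel(t_prime_column, kernel):
    """Kernel-weighted vertical average of a T' column.

    t_prime_column, kernel: values on the same evenly spaced levels
    (uniform spacing is an assumption of this sketch).
    """
    total = sum(kernel)
    return sum(t * w for t, w in zip(t_prime_column, kernel)) / total
```

A consequence worth noting is that waves with vertical wavelengths short relative to the kernel width largely cancel in this average, so only deeper waves remain visible in the simulated brightness temperature.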
After application of the vertical observational filter, the simulated T′_b field is still at 2-km horizontal resolution, containing small-scale, large-amplitude T′_b. However, the field of view of individual AIRS footprints is ≈13.5 km × 13.5 km at nadir, increasing to 41 km × 21.4 km at the edges of cross-track scans within an AIRS swath (Aumann et al., 2003; Hoffmann et al., 2013). Cross-track scans are ≈18 km apart, leading to a slight underlap of footprints in this direction. To roughly approximate the AIRS horizontal observational filters and scanning geometries, the 2-km resolution WRF-simulated T′_b were coarsened to 16-km resolution.
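Coarsening from 2-km to 16-km resolution corresponds to a factor of 8 in each horizontal direction. One simple way to do this is block averaging, sketched below; the text does not state which coarsening operator was used, so block averaging and the dropping of incomplete edge blocks are assumptions, and the function name is hypothetical.

```python
def coarsen(field, factor=8):
    """Block-average a 2-D field (list of rows) by `factor` in each direction.

    Rows and columns that do not fill a complete block are dropped.
    """
    ny, nx = len(field), len(field[0])
    coarse = []
    for j0 in range(0, ny - ny % factor, factor):
        row = []
        for i0 in range(0, nx - nx % factor, factor):
            block = [field[j][i]
                     for j in range(j0, j0 + factor)
                     for i in range(i0, i0 + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse
```

Grid-scale variability averages out strongly under this operation, which is one way to understand why simulated extrema directly above convection depend on the chosen coarsening while larger-scale wave amplitudes are less affected.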

WRF Validation Against AIRS
The WRF-simulated T′_b approximately visible to the AIRS sensor are shown in panels (b-e) in Figures 3 and 4. The model output time was 18:50 UTC, ≈7 min after the AIRS overpass over the region. Overall, the CGWs in WRF do resemble the CGWs emanating from the two regions of convection in the AIRS observations. Small-scale T′_b features are apparent in both the observations and all WRF simulations. The larger-scale eastward-propagating GW to the east of the convective sources also closely resembles that seen in the data. The minimum and maximum T′_b in WRF, due to the small-scale perturbations right above convection, are very comparable to those in the observations. However, in the WRF output, T′_b minima and maxima were very sensitive to the degree to which WRF was coarsened. For example, coarsening to 20-km resolution reduced the simulated extrema by about half, due to significant small-scale CGW variability unresolved by AIRS. The amplitude of the CGW features to the east of the convection is quite comparable to that seen in the observations and not sensitive to the degree to which output was coarsened.
Several differences between the models and the observations can be noted as well, however. Observed phase lines of the CGWs southeast of the convection appear slightly rotated clockwise relative to those in WRF. This might be due to latitudinal variations in the background flow (e.g., ∂_y U, ∂_y ∂_z U) in reality that were unrepresented by the horizontally homogeneous profiles used to initialize WRF (i.e., Figure 7). Additionally, observed large-scale GWs with northeast-southwest-oriented phase lines in the northern part of the domain are not present in the models. These GWs are likely due to sources outside of the spatiotemporal domain represented by WRF or outside of the region where convective forcing was supplied (Figure 3e) and, hence, were not represented. Finally, the observations include significant noise, particularly in the 15 μm product (Figure 4), where only two AIRS channels were averaged.

Figure 5 (caption, partial). The 4 (15) μm kernel plotted here is the average of kernels of 42 (2) individual channels (Hoffmann et al., 2013, 2014, 2017) to reduce noise. These kernels were computed assuming climatological midlatitude atmospheric conditions.

10.1029/2023MS003624
Brightness temperature perturbations along 29°N are shown in Figure 8. East of 79°W, the simulated CGW amplitudes and phases are very similar to the observations, at least in the 4 μm T′_b (panel a). The comparison east of 79°W is not as good in the 15 μm product (panel b), though the significant noise in the observations (∼0.3 K), potentially of similar amplitude to the CGWs according to WRF, obscures the comparison. While the CGWs do not obviously emerge from noise in such a transect, CGWs are visible through the noise when plotted spatially in Figure 4a. (Note that noise in the 4 μm channel is smaller, ∼0.1 K.) The simulated CGWs (Figures 4b-4e) do resemble those visible through the noise in the observations. West of 79°W, the high-amplitude, small-scale perturbations in WRF do not match in phase with those observed (Figure 8). Simulated perturbation amplitudes are comparable to the observations, being similar in the 4 μm product and slightly smaller in the 15 μm product. Perhaps the simulated amplitudes could be made more comparable with the observations with a more realistic treatment of AIRS footprint geometries and sizes and/or the addition of noise to the WRF output; however, this was not performed here. Still, the exact locations and phases of these small-scale CGWs right above the sources are likely inherently unpredictable, meaning matching simulated phases with observations may not be realistic.
While the evaluations of diabatic heating predictions by the four tools could suggest one tool is better than another (e.g., comparing MAE from the DANN vs. the DAFLNN in Figure 2), the CGWs produced by the four diabatic heatings are quite similar between the four runs (Figures 3, 4, and 8). It is unclear if the small differences in AIRS-visible simulated CGWs between the four simulations are significant, being attributable to differences in the diabatic heatings, or if these differences are essentially within an ensemble spread where only diabatic heatings were perturbed (i.e., indistinguishable). As such, it is difficult to claim one tool is better than another when validating the simulated CGWs against the AIRS observations. However, the similarity of CGWs between the four solutions, all resembling the observations quite well, allows the conclusion that if a reasonable diabatic heating, in this case learned from a microphysics parameterization within a convection-permitting and not convection-resolving simulation, is supplied to a GW-resolving model at correct locations and times, the CGWs resulting from this forcing can be quite realistic.

WRF Validation Against Loon Super-Pressure Balloons
For further evaluation, another case was simulated using the modified idealized WRF configuration. Here, a case of typical diurnal convection over Florida was simulated that happened to have two super-pressure balloons, flown by Project Loon (hereafter Loon), advecting from east to west overhead near z = 19.4 km. Loon was a Google project, and later an Alphabet subsidiary, that flew 2,131 super-pressure balloons nearly globally in order to provide wireless internet access to rural areas (Rhodes & Candido, 2021). Loon balloons carried a payload with instruments measuring pressure, temperature, and horizontal velocities from GPS (Friedrich et al., 2017) at 1 Hz. These balloons also had the capability of changing their density, allowing some altitude control and steering by catching winds at different altitudes. A flag recorded when vertical maneuvering occurred. These data have been used in a handful of studies up to this point (Conway et al., 2019; Friedrich et al., 2017; Lindgren et al., 2020; Schoeberl et al., 2017). Only the 1 Hz location, height, and horizontal wind observations are used here.
For this case, an 800 × 800 × 55 km domain was used at Δx = 2-km horizontal resolution and Δz = 500 m vertical resolution via 110 evenly spaced vertical levels. The idealized configuration was initialized at 12 UTC on 16 June 2018 and integrated 24 hr in time. Diabatic heatings were again computed from MRMS data via the same LT and three NNs and supplied to WRF every 2 minutes. Figure 9 shows the zonal wind perturbations relative to the … The WRF output was then 4-D linearly interpolated to the time, altitude, latitude, and longitude of the observations taken by both Loon flights during the 24 hr of the four simulations. The Loon height, zonal wind perturbation, and meridional wind perturbation time series for both flights are shown in Figure 10. Initially, there are no perturbations at the Loon locations, as the convective forcing did not begin immediately and, when it did occur, it was some distance to the northwest. When the CGWs do reach the Loon locations, as noted in the previous case, the differences in heatings provided by the four tools do not seem to result in significant differences in the simulated CGWs they force.
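Sampling gridded model output along a balloon track by 4-D linear interpolation can be sketched as below. This is a generic multilinear interpolator on a regular grid, not the analysis code of the study; the function name, the nested-list data layout, and the (time, altitude, latitude, longitude) axis order are assumptions, and points outside the grid are linearly extrapolated from the nearest cell.

```python
from bisect import bisect_right
from itertools import product


def interp_nd(axes, values, point):
    """Multilinear interpolation on a regular grid.

    axes: list of ascending coordinate lists, one per dimension.
    values: nested lists indexed in the same order as `axes`.
    point: coordinates at which to sample, one per dimension.
    """
    lower, frac = [], []
    for axis, p in zip(axes, point):
        i = min(max(bisect_right(axis, p) - 1, 0), len(axis) - 2)
        lower.append(i)
        frac.append((p - axis[i]) / (axis[i + 1] - axis[i]))
    total = 0.0
    for corner in product((0, 1), repeat=len(axes)):  # 16 corners in 4-D
        weight, v = 1.0, values
        for d, c in enumerate(corner):
            weight *= frac[d] if c else 1.0 - frac[d]
            v = v[lower[d] + c]
        total += weight * v
    return total
```

For a balloon observation at time `t`, altitude `z`, latitude `lat`, and longitude `lon`, a call such as `interp_nd([times, heights, lats, lons], u_wind, (t, z, lat, lon))` would return the simulated zonal wind at that point, assuming `u_wind` is stored in that axis order.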
In this case, none of the idealized WRF simulations reproduced the observations well. Here, simulated wind speed perturbations are relative to the initial wind at the altitude of interest. The Loon perturbations are relative to the mean over the 24-hr period presented. The simulated u′ = u(t) − u(t_init) amplitudes were generally notably higher than in the Loon observations. The simulated v′ compared, perhaps, slightly better to the observations. Likely the best point of comparison was in the arrival times of the CGWs at the Loon locations. For example, about 8 hr after initialization, significant simulated CGW perturbations appear. This timing roughly corresponds to when higher-frequency variability appears in the Loon data as well. It is difficult to say whether or not the overall method of recreating CGWs failed in this case. While the time series comparisons are poor (Figure 10) and wind speed uncertainty is reported to be much smaller (0.23 m s⁻¹; Friedrich et al., 2017) than the observed variations, data in this case are limited to only two transects. Comparisons of GWs along individual transects can be misleading, as small differences in the location of interest relative to the GWs can lead to significant differences in the apparent GW field sampled on a transect when, spatially, the GW fields are similar (cf. Figures 8 and 4). Additionally, the data quality is somewhat questionable in this particular case. The portions of the Loon time series highlighted in red indicate times when the super-pressure balloon was vertically maneuvering by changing its density. It is unknown if this maneuvering was performed to steer the balloons or was an automated response to oppose the influences of CGWs.

GW Analysis of the AIRS Case
A primary motivation for the overall method of forcing an idealized model with weather-radar-derived diabatic heating was to produce validatable simulations of CGWs and then use these validated simulations to study CGWs. Here, the CGWs within the AIRS-validated case above are briefly analyzed. The objectives are to see how far laterally CGWs can propagate in this case, where they dissipate, and how strong the drag decelerations are. All of these objectives are currently relevant to the development and improvement of GW parameterizations in weather and climate models, which have not been well constrained by observations or by directly validated CGW-resolving simulations such as these.
Over the entire 30-hr AIRS-validated WRF simulation, the convective diabatic forcings were fairly compact. The height- and time-integrated diabatic heating over the entire simulation is shown from the DALT and DAFLNN predictions in Figure 11. The corresponding maps from the DANN and FLNN predictions were largely similar and so are not shown. The most intense, prolonged heating resulted from the convective region over northern Florida, with more localized and weaker forcings scattered elsewhere within the domain. This localization of CGW forcing simplifies interpretation of GW analyses somewhat, as the CGWs can largely be interpreted as being generated by a single localized source.
GW amplitudes are illustrated in Figure 12. Amplitudes were computed using the discrete Hilbert transform following Eckermann et al. (2015) and Mercier et al. (2008), allowing phase-averaged quantities to be produced in physical (e.g., x, y) space. These amplitudes were then averaged over all output times during the 30-hr simulation, with output every 2 min. At z = 40 km, CGWs are most apparent over and to the east of the diabatic forcing (cf. Figures 11b, 12a-12c, and 12g). The prevalence of CGW activity to the east is largely expected, considering the strong easterly wind shear in the ambient winds below this altitude (Figure 7) forcing critical-level dissipation of the westward-propagating CGWs. The CGW activity is most spread out according to u′ amplitudes (Figure 12a) and most localized according to w′ amplitudes (Figure 12b), with the spread of vertical fluxes of horizontal momentum ($MF_x = \rho\,\widehat{u'w'}$, $MF_y = \rho\,\widehat{v'w'}$, with hats indicating phase averaging via the Hilbert transform) in between. In terms of momentum flux, CGWs can clearly propagate O(1,000) km away from their source, consistent with the modeling study of Sun et al. (2023) and the observational study of Corcos et al. (2021), and inconsistent with the conventional column approximation in parameterizations.
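As an illustration of phase averaging with a discrete Hilbert transform, the sketch below builds the analytic signal of a 1-D series with an FFT and forms the phase-averaged flux as half the real part of the product of one analytic signal with the conjugate of the other. This is a simplified, 1-D periodic-domain sketch and not the 2-D procedure of the cited references; the function names, the density value, and the test wave are hypothetical.

```python
import numpy as np

def analytic_signal(f):
    """Analytic signal of a real 1-D series (FFT-based discrete Hilbert transform)."""
    n = f.size
    spectrum = np.fft.fft(f)
    h = np.zeros(n)
    h[0] = 1.0                      # keep the mean
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep the Nyquist component
        h[1:n // 2] = 2.0           # double positive wavenumbers
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)   # negative wavenumbers are zeroed

def amplitude(f):
    """Local wave amplitude: modulus of the analytic signal."""
    return np.abs(analytic_signal(f))

def phase_averaged_flux(rho, u_prime, w_prime):
    """rho times the phase average of u'w', i.e. 0.5 * rho * Re(U * conj(W))."""
    U = analytic_signal(u_prime)
    W = analytic_signal(w_prime)
    return 0.5 * rho * np.real(U * np.conj(W))

# Hypothetical monochromatic wave: 64-km wavelength sampled every 2 km
x = np.arange(256) * 2.0                 # km
k = 2.0 * np.pi / 64.0                   # rad/km
u_p = 3.0 * np.cos(k * x)                # m/s
w_p = 0.5 * np.cos(k * x)                # m/s
flux = phase_averaged_flux(0.004, u_p, w_p)   # density in kg/m^3
```

The raw product of the two wind perturbations oscillates between zero and its peak along the wave, whereas the phase-averaged flux is uniform for this monochromatic case, which is what allows such quantities to be mapped in physical space.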
Vertical fluxes of zonal (b-d) and meridional (f-h) momentum are shown at z = 20 km, 40 km, and 60 km in Figure 12 to give a sense of how CGWs both dissipate and spread with height. The color shading scales are reduced with height, implying CGW dissipation and momentum deposition. Alternatively, lateral spreading can also spread and reduce fluxes (Eckermann et al., 2015). However, the spatial extents do not appear to change significantly with height, suggesting GW dissipation. 13b). This spread is also seen in the contours of vertical flux of zonal and meridional momentum (Figures 13c and 13d), though this spread with height is more subtle in this variable.
The zonal and meridional CGW drag was quantified via and shown in panels (e-f). The influences of lateral divergences of lateral fluxes of horizontal momentum can be important (Sun et al., 2023), but were not investigated here. The vertical profiles of zonal drag are largely consistent with linear GW theory. Westward-propagating GWs producing negative zonal momentum flux encountered critical levels and dissipated in the region of strong negative zonal wind shear between z = 15 and 20 km (Figure 7). This results in negative drags of ≈1 m s⁻¹ day⁻¹, though these values of drag are somewhat subjective, as they depend on the choice of areas over which fluxes were averaged. The eastward-propagating waves do not encounter critical levels, but do grow with altitude, gradually reach overturning amplitudes, and dissipate, as indicated by the general increase in positive drag with height (Figure 13e). However, zonal drags rise sharply in the layers of positive shear above z = 35 km, as CGWs propagating into these layers encounter shear that brings the environment a bit closer to their phase speeds, forcing the GWs toward steepening and saturating (see Kruse et al. (2016) for further discussion of this effect, but for orographic GWs).
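The drag expression itself does not appear in the text above. As a hedged illustration only, the sketch below assumes the conventional form in which drag is the vertical divergence of the vertical momentum flux divided by density, GWD_x = −(1/ρ) ∂MF_x/∂z, with lateral flux divergences ignored. The function name `gw_drag`, the exponential density profile, and all numbers are hypothetical and were chosen only to give a drag of order 1 m s⁻¹ day⁻¹.

```python
import numpy as np

def gw_drag(mf, rho, z):
    """Drag (m s^-2) from the vertical divergence of the vertical momentum flux.

    Assumed form: GWD = -(1/rho) * d(MF)/dz, where MF = rho * <u'w'>.
    """
    return -np.gradient(mf, z) / rho

# Hypothetical profiles on a 500-m grid
z = np.arange(0.0, 60000.0, 500.0)     # m
scale_height = 7000.0                  # m (assumed)
rho = 1.2 * np.exp(-z / scale_height)  # kg m^-3
uw = 0.1                               # phase-averaged u'w' held fixed (m^2 s^-2)
mf_x = rho * uw                        # flux decays with height, so momentum is deposited

drag_per_day = gw_drag(mf_x, rho, z) * 86400.0   # m s^-1 day^-1
```

With this sign convention, a positive flux that decreases with height gives positive (eastward) drag, while a negative flux whose magnitude decreases with height gives negative drag, matching the qualitative behavior described for the eastward- and westward-propagating waves.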
The growth in amplitudes due to these local zonal wind maxima can be seen in Figures 13a and 13b. Interestingly, the zonal and meridional drags are fairly invariant in latitude despite localized forcing, except at the highest altitudes, highlighting the effects of lateral propagation on drag. It should be noted that the idealized WRF configuration used no physical parameterizations and so did not use a turbulence parameterization. Also, the vertical resolution of Δz = 727 m may be coarse relative to the scales of motions involved in CGW breakdown. Both simulation characteristics will likely affect some details of how and where these simulated CGWs break and deposit momentum. Testing how turbulence parameterizations and vertical resolution affect drag on the mesoscales is certainly warranted, but is left to future work.

Discussion and Conclusions
If reasonably realistic diabatic heating is supplied at the correct locations and times in a GW-resolving model, the CGWs generated within that model can resemble observed CGWs quite well. This overall method (i.e., forcing CGW-resolving simulations with observations of convection) shows significant promise in furthering CGW research and parameterization development with confidence, as it allows full 4-D fields of realistic CGWs to be generated and analyzed rigorously.
Here, diabatic heating was learned from full-physics, Δx = 2-km, Δz < 500-m resolution WRF simulations. These simulations were convection-permitting, but not convection-resolving (Jeevanjee, 2017; Jeevanjee & Zhou, 2022), and diabatic heatings are predicted by the WRF Single-Moment 6-class (WSM6) microphysics scheme (Hong & Lim, 2006). The good agreement between simulated and observed CGWs (Figures 3 and 4) suggests the convection permitted by these resolutions and the heatings predicted by this microphysics scheme are reasonably realistic, at least when it comes to CGW forcing. The look-up table method and NNs had similar skill at predicting the WRF-simulated diabatic heating profiles at larger rain rates, while the NNs showed promise at being better able to represent complexities in heating profiles (e.g., evaporative cooling layers) at smaller rain rates. The vast majority of gridpoints deemed "convective" (i.e., having a rain rate exceeding 1 mm (10 min)⁻¹) had these smaller rain rates. This increased performance by NNs at smaller rain rates could be attributable to the inherent ability of such an architecture to represent such profiles, to the increased information contained in the additional radar reflectivities used as input, or simply to the NNs being trained mostly on small-rain-rate profiles. Perhaps a proper hyperparameter optimization, a loss function that emphasizes skill for the larger-amplitude heating profiles, or an architecture more appropriate for this application (e.g., one that uses spatial input to account for the 3-D tilting of convection observations due to wind shear) could enhance skill over all rain rates.
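To make the look-up table idea concrete, the sketch below builds a table of mean heating profiles keyed on rain-rate bin and uses it for prediction. It is a deliberately simplified stand-in: it assumes the table depends on rain rate alone, with 5 mm (10 min)⁻¹ bins, which may not match the inputs of the previously developed look-up table. All names and numbers are hypothetical.

```python
import numpy as np

def build_lookup_table(rain_rate, heating, bin_edges):
    """Mean heating profile per rain-rate bin.

    rain_rate : (n_profiles,) rain rates
    heating   : (n_profiles, n_levels) training heating profiles
    bin_edges : increasing bin edges; bin b covers [edges[b], edges[b+1])
    """
    n_bins = len(bin_edges) - 1
    which = np.digitize(rain_rate, bin_edges) - 1
    table = np.full((n_bins, heating.shape[1]), np.nan)
    for b in range(n_bins):
        members = which == b
        if members.any():
            table[b] = heating[members].mean(axis=0)
    return table

def predict(table, rain_rate, bin_edges):
    """Return the stored mean profile for the bin each rain rate falls in."""
    which = np.digitize(rain_rate, bin_edges) - 1
    which = np.clip(which, 0, table.shape[0] - 1)  # out-of-range rates use end bins
    return table[which]

# Tiny made-up training set: three profiles on two levels
edges = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
rates = np.array([2.0, 3.0, 7.0])
profiles = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])

lut = build_lookup_table(rates, profiles, edges)
predicted = predict(lut, np.array([4.0, 6.0]), edges)
```

A table of this kind returns the same profile for every rain rate in a bin, which is one way to see why a method with more inputs could capture features such as evaporative cooling layers that vary within a bin.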
While machine learning methods still have significant potential to further improve skill in predicting convective diabatic heating beyond conventional methods (e.g., look-up tables), variations in CGWs generated by the different heatings predicted here were small. It is unclear if better heatings will be significant when it comes to CGW forcing.
In the Δx = 2-km, Δz = 727-m resolution idealized WRF configuration used here, larger-scale CGWs that apparently propagate more laterally validated best against AIRS observations, with both phases and amplitudes reproduced reasonably well quantitatively. The WRF configuration was also able to reproduce the smaller-scale, more vertically propagating CGWs above convective sources, at least in amplitudes. Still, these small-scale CGWs were highly sensitive to the details of sampling a simulation as if AIRS were viewing through it.
A more accurate treatment of how AIRS might sample these simulated CGWs that takes into account viewing geometries of individual footprints, variations in horizontal observational filtering with viewing zenith angle, and perhaps even radiative transfer would likely alter how a hypothetical AIRS sensor would see these CGWs. This is particularly relevant as these small-scale CGWs right over the convection are responsible for much of the momentum flux (Figure 12).
Finally, CGWs are inherently non-stationary and propagate away from the convection. A spectrum of horizontal and vertical group velocities is generated. In the validated simulation presented here, it is clear CGWs propagate 100s of kilometers away from the convective sources. The largest momentum fluxes and drags do occur above the convective sources, but significant drags still occur 100s of kilometers away. These results provide more evidence for relaxing the commonly employed single-column approximation in GW parameterizations, which assumes GWs propagate only vertically.

Figure 1. Individual profiles of WRF-simulated (black) and predicted (colors) latent heating. Profiles were randomly chosen from within the five, 5-mm rain rate bins from the Florida run. The tropopause was near z = 15 km for this case.

Figure 2. Validation statistics plotted as a function of height for the three NNs and the LT. All methods are tested against Florida run convective profiles from WRF (e.g., Figure 1) that were withheld from training. Mean-absolute errors (MAE) are plotted as solid, colored lines. Mean errors (i.e., biases) are dashed. The mean latent heating profiles within the 5 mm (10 min)⁻¹ bins are plotted in solid black.
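The two validation statistics in Figure 2 can be computed per model level as in the following minimal sketch. The function name and the toy arrays are hypothetical; the arrays are assumed to be shaped (profiles, levels).

```python
import numpy as np

def profile_errors(predicted, truth):
    """Mean-absolute error and mean error (bias) at each vertical level."""
    err = np.asarray(predicted, dtype=float) - np.asarray(truth, dtype=float)
    mae = np.abs(err).mean(axis=0)
    bias = err.mean(axis=0)
    return mae, bias

# Two made-up profiles on two levels
pred = np.array([[1.0, 2.0], [3.0, 2.0]])
true = np.array([[0.0, 2.0], [4.0, 2.0]])
mae, bias = profile_errors(pred, true)
```

In this toy case the errors at the first level cancel in the bias but not in the MAE, which is why both statistics are worth showing together.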

Figure 3. Maps of observed (a) and WRF-simulated (b-e) T′ b . AIRS observations shaded in (a) were collected over 18:41 to 18:45 UTC on 22 July 2018. The WRF-simulated T′ b were computed using output at 18:50 UTC. Approximate vertical and horizontal AIRS observational filters were applied to WRF in (b-e). Diabatic heating, Q, supplied to WRF was limited to within the boxes in (e), with a cosine ramp transitioning predicted Q from zero to its full amount between the dashed and solid lines.
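A cosine ramp of the kind described in this caption can be sketched as a weight that rises smoothly from 0 at the dashed line to 1 at the solid line. The exact functional form used in the study is not given, so the half-cosine below, the function name, and the 50-km ramp width are assumptions for illustration.

```python
import numpy as np

def cosine_ramp(dist_inside, ramp_width):
    """Weight applied to Q as a function of distance inside the dashed line.

    Returns 0 at or outside the dashed line, 1 at or inside the solid line
    (ramp_width further in), and a half-cosine transition in between.
    """
    s = np.clip(np.asarray(dist_inside, dtype=float) / ramp_width, 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * s))

# Distances (km) measured inward from the dashed line; 50-km ramp is assumed
weights = cosine_ramp(np.array([-10.0, 0.0, 25.0, 50.0, 80.0]), 50.0)
```

Multiplying the predicted heating by such a weight avoids an abrupt edge in the forcing, which could otherwise act as a spurious wave source.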

Figure 4. As in Figure 3, but for the 15 μm product. Note the gray-shading range is twice that in Figure 3.

Figure 6. Multiple Radar, Multiple Sensor (MRMS) mosaic of lowest weather radar reflectivity valid 18 UTC on 22 July 2018, approximately 40 min prior to the AIRS observations in Figures 3 and 4.

Figure 7. MERRA2 wind components area-averaged between 25 and 34°N, 77 and 84°W, valid 18 UTC on 22 July 2018. This wind (and stability, not shown) profile was used to initialize all idealized WRF simulations.

Figure 8. Brightness temperature comparison along 29°N over Florida and to the east. WRF output was coarsened to 16 km to approximate an average horizontal observational filter of AIRS.

Figure 9. Horizontal cross-section of u′ at z = 19,400 m at 22 UTC on 16 June 2018. Here, the entire WRF domain is shown. The idealized WRF model was initialized 10 hr prior to the valid time. The two Loon super-pressure balloon tracks are shown during the 24-hr period beginning at 12 UTC, 16 June 2018. The circles indicate the positions of the super-pressure balloons at the valid time. The height was chosen to be an approximate average height of the balloons (cf. Figure 10).

Figure 10. Time series of Loon super-pressure balloon GPS altitude, zonal wind perturbations, and meridional wind perturbations (black) for two Loon flights that happened to drift over Florida on 16 June 2018. The four WRF runs were 4-D linearly interpolated (colors) to the latitudes, longitudes, heights, and times for comparison with the observations (obs). The portions of the observed time series (black) highlighted in red indicate periods where the super-pressure balloon was maneuvering vertically.

Figure 11. Height- and time-integrated latent heating (Q) predicted by (a) the look-up table method and (b) the DAFLNN on the WRF domain for the AIRS case. Latent heating was zeroed outside of the dashed line in Figure 3e.

Figure 12. Phase-averaged (via Hilbert transform, $\widehat{(\cdot)}$), time-averaged GW amplitudes of (a) u′, (e) w′, (b-d) vertical flux of zonal momentum, and (f-h) vertical flux of meridional momentum at selected levels indicated in the panel titles. These analyses are of the DAFLNN-forced WRF run. Comparison with Figure 11d gives an indication of how different variables tend to spread laterally and how this spread varies with height. Note every panel has an individual color shading range.

Figure 13. Time- and zonal-mean (a) u′ amplitude, (b) w′ amplitude, (c) vertical flux of zonal momentum, (d) vertical flux of meridional momentum, (e) zonal GWD, and (f) meridional GWD. The entire 30-hr simulation was included in the time averaging. The outer 200 km of the domain were excluded. The vertical fluxes of horizontal momentum and zonal drags were smoothed along latitude with a 42-km moving average smoother. The thick black contours depict the time- and zonal-mean latent heating at 1, 6, and 12 K day⁻¹.
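For reference, a 42-km moving average on a 2-km grid corresponds to a 21-point window. The sketch below shows one way such a smoother could be applied along latitude; the function name and the choice to drop partially covered edge points are assumptions, not details from the study.

```python
import numpy as np

def moving_average(f, window_km, dx_km):
    """Boxcar smoother; only points with a full window are returned."""
    n = int(round(window_km / dx_km))     # 42 km / 2 km -> 21 points
    kernel = np.ones(n) / n
    return np.convolve(f, kernel, mode="valid")

# Made-up series on a 2-km grid: a linear trend plus grid-scale noise
y = np.arange(100) * 2.0                            # km
series = 0.01 * y + 0.5 * (-1.0) ** np.arange(100)
smoothed = moving_average(series, 42.0, 2.0)
```

An odd window length keeps the smoother centered on each grid point, so slowly varying structure is preserved while grid-scale noise is strongly damped.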