Cloud Botany: Shallow Cumulus Clouds in an Ensemble of Idealized Large-Domain Large-Eddy Simulations of the Trades

.

The goal of improving our understanding of the mesoscale, marine trades has mobilized an entire community, centered around the EUREC 4 A (Elucidating the role of clouds-circulation coupling in climate) field campaign (Stevens et al., 2021).Fortunately, advances in computational capabilities now allow these observations to be complemented by (a) global and regional models running at a sufficiently fine resolution to begin resolving shallow convection (e.g., Stevens et al., 2019) and (b) detailed process models-"classical" LES codes-running on sufficiently large domains to capture the mesoscale (e.g., Lamaakel & Matheou, 2022;Narenpitak et al., 2021;Seifert et al., 2015).In particular, these models facilitate understanding of the degree to which mesoscale cloud patterns originate in larger-scale dynamics, which set the environment in which clouds form, or small-scale processes, which govern individual cumulus structures.Regional simulations running at less than a kilometer resolution are beginning to appear (Schulz, 2021;Schulz & Stevens, 2023); these attain a detailed representation of the larger scale and are therefore well-suited to investigate the importance of those scales.However, we still miss a systematic exploration of large-domain (>100 km) LES that maintains a simple representation of the larger-scale environment, but does not compromise on its turbulence-resolving resolution of around 100 m.
To bridge that gap, this paper presents Cloud Botany, an ensemble of 103 simulations on domains of 150 km at 100 m horizontal resolution, enabled by the computing capabilities of supercomputer Fugaku.With Cloud Botany, we take a step back from the pursuit of realistic regional or global simulations.Instead, we hypothesize that if we wish to understand the role played by cumulus convection in organizing the tropical mesoscale, it is helpful to begin by idealizing and fixing the larger-scale environment and boundary forcings on a mesoscale domain, and study the response of freely developing cloud patterns to variations in these idealized forcings.Therefore, we will parameterize the vertical structure of the trade-wind environment with six parameters.We then co-vary these parameters across the range of typically observed conditions in the trades, which results in the ensemble of initial conditions and boundary forcings that our simulations run under.Such ensembles successfully explain parameter-dependencies in small-domain simulations of the subtropics (e.g., Bellon & Stevens, 2012;Feingold et al., 2016;Glassmeier et al., 2021;Nuijens & Stevens, 2012;Schalkwijk et al., 2013;Shen et al., 2022); we designed Cloud Botany to test if extending this approach to large LES domains can help understand the origins of mesoscale cloud patterns.
The construction of the simulation ensemble and description of the resulting data products are the main focus of the present manuscript.We aim to use the data to investigate targeted questions, such as how the smallest energetic scales of motion self-organize into mesoscale structures (e.g., Bretherton & Blossey, 2017;Janssens et al., 2022;Narenpitak et al., 2021;Seifert et al., 2015) under varying conditions.However, the simulations also come forth from a general curiosity as to which trade-wind cloud structures our LES model can actually produce (and which not), and how we might describe and classify these.It is in this sense that our exploration comes closest to paralleling the botanist's quest.Most importantly, we hope the data set is useful to a community with a broad range of research questions pertaining to the understanding of the detailed dynamics of the mesoscale trades.
10.1029/2023MS003796 3 of 25 The paper is organized as follows.We begin by describing the creation of the initial and boundary conditions that define our simulation ensemble (Section 2).Running each simulation still requires the choice of several other parameters which we hold fixed over the ensemble.These are outlined in Section 3. Section 4 describes the workflow of setting up and running the simulations on Fugaku, and how its output is translated into accessible data sets.Section 5 describes the salient features of these data products, before Section 6 gives a brief overview of some frequently recurring cloud patterns.A conclusion is offered in Section 7.

Creating an LES Ensemble in a Parameter Space
To study how self-organized cloud patterns in LES respond to variations in the larger-scale environment, we will initialize and force LESs with simple, functional representations of the vertical structure of the trade-wind environment ("profiles").The parameters that control these profiles will span a "parameter space," which we will explore by co-varying the parameters.To cover this space with around 100 simulations, we must keep its dimensionality as low as possible.Therefore, we wish to find a set of profiles which is controlled by a minimal number of parameters.At the same time, we want these profiles retain enough realism to remain useful for comparing variability over our simulation ensemble to variability in the real-world sub-tropics.
In this section, we will elaborate on how we design a parameter space that strikes this balance.We will first present our chosen set of idealized profiles and their free parameters (Section 2.1).We will then judge the realism of these profiles by analyzing how well we can fit them to reanalysis and observations (Section 2.2).Finally, we will use the variability in the parameters as fitted to observations to inform the ranges we will co-vary our parameters over, resulting in the set of initial conditions and forcings that make up our ensemble (Section 2.3).

Idealizations of the Trade-Wind Environment
Cloud Botany is based on simulations conducted with the Dutch Large Eddy Simulation (DALES; Heus et al., 2010;Ouwersloot et al., 2017).In the configuration used here, DALES solves numerical approximations of the anelastic equations of atmospheric motion in a three-dimensional domain over a sea surface with a homogeneous temperature.The domain is discretized by a staggered grid.To initialize our idealized DALES simulations, we specify vertical profiles for five of its prognostic quantities: Liquid-water potential temperature θ l , total specific humidity q t , horizontal velocity in west-east (u) and south-north (v) directions, and sub-filter scale (SFS) turbulent kinetic energy e; vertical velocities w are zero when horizontally averaged and do not require initialization.Similarly, we will parameterize scales larger than the simulation domain with idealized profiles for (a) geostrophic horizontal wind (u g , v g ), (b) a large scale vertical velocity (w ls ), and (c) large scale tendencies of moistening and heating, which we keep constant over 2.5 days of simulation.We will model these profiles of initial conditions and large-scale forcings using profiles that capture basic aspects of the trade-wind environment's expected, physical structure with at most two free parameters.Thus, our parameter space will contain both parameters that set the initial state of the atmosphere in our simulations, and parameters that explicitly force the atmospheric state; their common denominator is that they all explain an appreciable amount of variability in the environment, and are thought to be important cloud-controlling variables.Parameters that are kept fixed over the ensemble are listed in Table 1.
We set both the initial profiles and geostrophic wind profiles of horizontal velocities u and v to where u 0 is the initial near-surface wind and u z = ∂u/∂z denotes the initial vertical shear of horizontal wind speed.
The geostrophic wind is assumed to remain constant in time during each simulation.Except for a few exceptions, all simulations will be initialized with the same zonal shear strength.As our analysis is positioned in the downstream trades, we assume v 0 = 0, v z = 0 for all our experiments, that is, the geostrophic wind is predominantly east-west.
Profiles of the initial liquid water potential temperature θ l follow a similar, linear approximation.However, the lowest θ l levels are expected to co-vary with the surface conditions.Indeed, upon consulting the global ERA5 reanalysis (Hersbach et al., 2020), we find a Pearson correlation of r = 0.57 between θ l at the lowest ERA5 level and the surface.To avoid long model spinups where surface fluxes attempt to re-calibrate an out-of-equilibrium 10.1029/2023MS003796 4 of 25 mixed-and cloud layer, we therefore initialize θ l with a residual layer of constant height z ml = 500 m.Having chosen a (potential) sea-surface temperature (SST), θ l0 , we simply set the residual layer's value to the sea surface temperature minus the reanalysis-mean difference in θ l between the lowest ERA5 level and the surface, Δθ l0 .This gives the following definition for θ l : Hence, the initial profile of θ l is fully determined by setting θ l0 and Γ.In observations, u 0 and Γ seem to be important control parameters for the size and degree of clustering of trade-wind clouds (Bony et al., 2020;Schulz et al., 2021).To test whether similar dependencies can be observed in our LES setup, we have deliberately chosen u and θ l to be specified by these parameters.
Profiles of the total humidity q t are modeled with a similar initial well-mixed layer, but drop off exponentially above z ml , following Vogel et al. (2020): The free parameters of this parameterization are the initial mixed-layer moisture q t,ml and the moisture scale height    .The surface moisture is assumed to be at saturation, and thus follows from θ l0 and the surface pressure, and the difference in moisture between the first model level and the surface may be diagnosed in turn.
Finally, we will impose profiles of the large scale vertical velocity w ls that includes two terms: (a) a term representing the downwelling branch of the Hadley cell, modeled by exponential decay with height following for Varying w 1 captures a substantial amount of the mesoscale variability in vertical velocity in the trades (George et al., 2023).Therefore, we fix the free-tropospheric, asymptotic subsidence w ∞ and its scale height   ∞ .Furthermore, we assume (a) that the vertical depth of the circulations, encapsulated by   1 , scales with the boundary-layer height, which LES studies of the phenomenon indicate to be reasonable (Bretherton & Blossey, 2017;Janssens et al., 2022;Narenpitak et al., 2021), and (b) that it to first order is constant in time.This leaves the strength of the sinusoidal term w 1 as a free parameter in our large-scale vertical velocity profiles.Importantly, we do not fix the large scale vertical velocity profiles to satisfy a Weak Temperature Gradient (WTG) constraint on the mean flow in the free troposphere, in which horizontally averaged vertical motion is diagnosed given a radiative heating rate and Γ (Bellon & Stevens, 2012;Nuijens & Stevens, 2012).Not enforcing WTG allows richer responses of the boundary layer to its forcing (Betts & Ridgway, 1989), and is more representative of the real trades, where free-tropospheric tendencies in heating and moistening are not usually small (e.g., Nitta & Esbensen, 1974).This choice prevents the free-troposphere from acquiring quasi-equilibrium, and requires us to add a subtle nudging to prevent the tendencies from becoming overly adventurous; we return to this in Section 3.
In all, our idealized framework has six free parameters that set the environment we launch our simulations in, spanning a six-dimensional parameter space: surface wind u 0 , surface temperature θ l0 , temperature lapse rate Γ, surface humidity q t,ml , humidity scale height    and large-scale vertical velocity variability w 1 .

Quality of Fits
To assess how idealized the chosen functional forms are with respect to the vertical structure of the real trade-wind environment, we will compare them with the ERA5 global reanalysis, sampled every 3 hr between 9.8-16.8°N and 62.22-54.22°Wbetween 1 January 2020 and 31 March 2020.This domain is representative for the trades in general (Medeiros & Nuijens, 2016) and the period spans the winter during which the EUREC 4 A campaign was conducted (20 January 2020 to 20 February 2020).We complement use of ERA5 with the JOANNE (Joint dropsonde Observations of the Atmosphere in tropical North atlaNtic meso-scale Environments) data set (George et al., 2021), gathered during the campaign by launching densely spaced meteorological dropsondes from an aircraft along the perimeter of a 200 km circle.This spatial scale roughly fits that of our horizontal domain size.Therefore, we will directly use this data at the spatial scale of the circle and time scale of a day's flights.We fit all profiles in our ERA5 database with Equations 1-4, using a non-linear least squares algorithm.The results are shown in the top three rows of Table 2.  Quality of fit is assessed in terms of the coefficient of determination (R 2 ), averaged over all fits for θ l , q t , u, and w ls .While R 2 cannot formally be interpreted as the explained variance for our non-linear least-squares regressions, the top row of Table 2 still illustrates that the fits of θ l are excellent, those of q t and u are adequate, and those of w 1 are inadequate.The poor fit of w 1 reflects both significant deviations from the prescribed functional form, and variability in higher-order modes in the ERA5 database than our simple approximation captures.Since ERA5 assimilates the JOANNE data, and agrees well with the observed velocity even when it is not assimilated (George et al., 2023;Savazzi, Nuijens, Sandu, et al., 2022), these higher-frequency w fluctuations are unlikely to be spurious.Therefore, we revisit the design of w ls below.
By excluding v, we artificially remove momentum from our simulated environment.To investigate the consequences, we have fit profiles of v in the same manner as for u.The resulting meridional surface wind (v 0 ) is on average around 15% the strength of the zonal surface wind (u 0 ), while the meridional shear v z ≈ 0. We compensate for this general lack of momentum in the simulations by also investigating marginally broader ranges in u 0 in our parameter sweeps (see below).

Chosen Parameter Ranges
To keep our simulation number manageable while capturing as much of the variability that occurs in the winter trades as possible, Cloud Botany consists primarily of simulations conducted at the corners of a hypercube in our six-dimensional parameter space, that is, 2 6 = 64 simulations.These stem from considering all possible combinations of our environmental control parameters, at a minimum and a maximum point informed by the 10th and 90th percentile of each parameter's variability over the ERA5 fits (second-to-last row in Table 2).This choice makes our simulations indicative of the envelope of conditions observed in the trades; they are thus not to be confused with the climatology that would have resulted from sampling the multivariate probability distribution functions of the fitted parameters.To still capture parameter dependencies in more typically observed conditions, we supplement the hypercube corners with "sweeps" of every parameter: Runs that span the range between the extrema in several steps for one control parameter, with all other parameters held at the center of the hypercube.
The sweep runs thus explore the sensitivity of the central point in the parameter space to perturbations in a single parameter.
Since the chosen parameters will be varied independently of each other, it is prudent to quantify their independence in observations, that is, whether they each capture a unique aspect of the environment's variability.Pairwise Pearson correlations of our ERA5 fits broadly confirm this: All coefficients are below 0.4, with the largest correlations existing between θ l0 and Γ (−0.31), θ l0 and q t,ml (0.33), Γ and    (−0.26), q t,ml and w 1 (0.32), and q t,ml and    (0.35).All other correlations are below 0.25.
The final ranges over which we run each control parameter are given in the bottom row of Table 2.For Γ,    and u 0 , these directly result from rounding the 10th and 90th percentile values.Variability in θ l0 subsumes both variability in surface pressure and SST.Since we keep the surface pressure over our ensemble fixed at 1,016.05 hPa, we adjust the rounded range over which we vary θ l0 to better match the variability in SST.This results in a downwards adjustment of 0.5 K.In preliminary experiments, combinations of high-end free-tropospheric moisture and free-running free-tropospheric tendencies would sometimes produce clouds near our domain tops, which after spurious boundary interactions with our radiation scheme would yield temperatures exceeding the local boiling point and crash our thermodynamics scheme.Conversely, simulations with less cloud-layer moisture than the ERA5 envelope would often not even develop clouds.To avoid these situations, we narrow the envelope of q t,ml slightly to avoid unrealistically dry and moist free-tropospheric moisture profiles and initial profiles that exceed a relative humidity of 100%.As we shall see in Section 6, even the final ensemble still contains some runs that fail in this manner.
There are certain inherent limitations to modeling variability in w ls with a framework as simple as ours: it does not adequately represent high-frequency vertical modes, nor does prescribing w ls allow the convection developing in our simulations to interact with vertical velocity structures of scales larger than our domain.Our compromise aims to (a) capture sufficient w variability to satisfy our main objective-studying environmental dependencies-and (b) ensure that the variability we capture is more representative of the reanalysis than traditional exponential (Bellon & Stevens, 2012;Blossey et al., 2013) or linear (Siebesma et al., 2003;Stevens et al., 2001;Yamaguchi et al., 2019) approximations.Therefore, we set w ∞ to a number characteristic of the ERA5 mean in 10.1029/2023MS003796 7 of 25 the free troposphere, where its variation is not expected to be important for the current study, and vary w 1 according to how it varied between the moistest and driest 50% of circles flown by the HALO aircraft during EUREC 4 A (George et al., 2023).
We separate the vertical velocity variability by moisture variability and not by the vertical velocity itself.This results in a smaller w ls variability than observed during EUREC 4 A and in the reanalysis.We justify this by noting that large values in w ls are not expected to remain constant over the multiple days we simulate, since they arise from shorter-lived circulations (George et al., 2023).For instance, applying a constant 2 cm/s cloud-layer subsidence velocity (the 10th percentile in ERA5), would imply a subsidence heating rate of roughly 10 K/day and a drying rate of 10 g/kg/day, which heat and dry our cloud layers beyond conditions that can sustain cumulus convection within a few hours.Indeed, we found that spanning a wider range of the w 1 parameter led to many corner points in the parameter hypercube becoming either too dry for clouds to form, or too moist, resulting in deep convection and model crashes.Multiple-day cloud-free or deeply-convecting situations are hardly observed in the winter trades, nor are they our primary interest.Thus, we elect to fit w 1 on the more stable, larger-scale water-vapor variability, resulting in the profiles shown in Figure 1.We performed additional tests varying w ∞ with w 1 = 0, which indicate that w ls in the cloud layer controls the cloud formation, and that the shape of w ls is less important.
The remaining parameters needed to complete Equations 1-4 are reported in Table 1, and the complete ensemble of initial and boundary conditions that emerges is plotted in Figure 2.

Design of Fixed LES Parameters
While Section 2 describes the set of initial and lower boundary conditions that vary over our simulation ensemble, running a simulation still requires the prescription of a model grid, a precipitation model, a radiation model, a prescribed large-scale advective forcing, and a free tropospheric nudging.These are all kept the same for all simulations; we briefly describe them in turn below.
Our simulations run on horizontally square domains of 153.6 km, with a height of 7 km.The domains have periodic boundary conditions in the two horizontal directions.To discretize this cuboid, we use a grid with a horizontal spacing of 100 m, and vertical spacing of 20 m in our first model level, stretched by 1% in each level above.This yields 1,536 grid points on a horizontal side, and 175 vertical grid levels.The grid's size limits our simulations' integration time to 60 hr, which is substantially shorter than the 20 days that for example, M. Zhang et al. (2013) run their small-domain simulations to come into equilibrium with their larger-scale environment.Our simulations do not reach such an equilibrium.We choose to accept this limitation, since it is not obvious whether 150 km fields of trade cumuli are in equilibrium with their larger-scale environment in nature.Furthermore, we will show in Section 6 that the liquid-water path (LWP) is essentially in equilibrium after two simulated days, by extending a single simulation to 9 days.Advection of momentum, θ l and e is discretized with a sixth order scheme, advection of q t and precipitation species with a fifth order scheme which is nearly monotonous (Wicker & Skamarock, 2002).The sources and sinks of precipitation are modeled with a warm microphysics scheme based on Seifert and Beheng (2001), whose two moments we prognose.We prescribe a (fixed) cloud-droplet number concentration of 7 ⋅ 10 7 /m 3 .
Radiative heating rates are calculated interactively with RRTMG (Iacono et al., 2008;Pincus et al., 2003).As the importance of diurnal, radiative variability in the downstream trades has recently been emphasized (Albright et al., 2021;Narenpitak et al., 2023;Vial et al., 2019Vial et al., , 2021Vial et al., , 2023)), we include in the model's shortwave component the diurnal cycle representative for Feb-01-2020 at 13.1°N and 52°W (representative of the EUREC 4 A domain and period).Required input profiles for water vapor and temperature above the LES domain were derived  1 and 2. from ERA5, averaged over the EUREC 4 A region and period.These were stitched to the prognosed profiles of temperature and water vapor within our numerical domain from the 7 km domain top until a height corresponding to the 100 Pa pressure level (which we refer to as the top of the atmosphere-TOA).Default profiles supplied with RRTMG were adopted for other gases (O 3 , CO 2 , CH 4 , N 2 O, and O 2 ).
We add two large-scale forcings to the simulations.The first are (horizontally constant) tendencies that aim to be representative of the typical drying (for q t ) and cooling (for θ l ) of our region of interest through advection on a horizontal scale larger than we simulate.We estimate these tendencies from JOANNE following a linear approximation, held at zero once they cross the ordinate (Figure 3): These tendencies display variability around the fixed, approximate state we have chosen, which would have made their inclusion in our parameter space interesting.We excluded such variations to keep the required simulation number tractable, but recommend investigating their importance in future extensions.
Finally, our rich ensemble of initial conditions combine with our variation of w ls to form a rather broad variety of w ls -induced heating and drying tendencies forced on our slab-averaged prognostic variables in the free troposphere.To prevent these tendencies from driving the initial state outside the ERA5 envelope, we impose a nudging tendency on our prognostic variables (u, v, θ l , and q t ) that forces them back toward their initial state with a height-dependent nudging time scale τ: Profiles of (a) θ l , (b) q t , (c) relative humidity (RH), (d) u, (e) v (kept at zero in the ensemble), and (f) w ls over the 10th-90th percentile envelope in ERA5 (reanalysis, shading), its mean (gray), and for initial and large-scale forcing of Cloud Botany simulations: the center (purple) and corners (pink) of the six-dimensional hypercube.

Workflow to Create the Data Set
To turn the LES ensemble design into accessible data products, four steps need to be taken: (a) creating a set of input files for each ensemble member, (b) running each simulation, (c) converting simulation output to an easily accessible format, and (d) uploading the data set to a data repository.In this section, we briefly document how we carry out these steps.
To produce the input files required to run each ensemble member, we used a Python script and EasyVVUQ (Groen et al., 2021), a framework for uncertainty quantification.EasyVVUQ can sample a parameter space using different sampling strategies, for example, based on quadrature methods suitable for uncertainty quantification methods.EasyVVUQ then produces model input files, using a template where the varied parameters are substituted.We use this mechanism to produce a Fortran namelist, which is the main DALES configuration file, for each ensemble member.The input files for the initial vertical profiles of the prognostic variables are produced with a Python script.The setup for using EasyVVUQ to run DALES experiments was presented in Jansson et al. (2021).
All simulations were run with DALES on supercomputer Fugaku.Fugaku is based on the Fujitsu A64FX CPU, built on the ARM architecture with Scalable Vector Extension.Each node of Fugaku has 48 CPU cores and 32 GB RAM, and is characterized by a high memory bandwidth and fast node interconnect.Fugaku is a CPU-only system, that is, it does not rely on accelerators such as GPUs (graphics processing units).These properties seem to be a good fit for DALES, a CPU-only code, which in our experience is often memory-bandwidth limited and able to benefit from vectorized floating point mathematical operations.Porting DALES to Fugaku did not require extensive changes to the program, and was mostly a matter of adding the option to use the Fujitsu Fortran compiler.For improved performance, the possibility to store the prognostic fields in single precision was implemented.The single precision version is faster and requires less memory for storing the prognostic fields.The latter is particularly important on Fugaku which has relatively little RAM memory per core (around 600 MB).Further optimizations included rewriting and simplifying some loops for better vectorization, based on profiling the program.These modifications have been found to be beneficial on other architectures as well, enabling us to maintain a single version of the code for all architectures.See Data Availability Statement section for details of the DALES version used and for accessing the code.DALES is parallelized using Message Passing Interfaces (MPIs) in x and y, the two horizontal directions.Each simulation was run on 24 nodes, with 24 × 48 MPI processes.The simulations lasted around 5 days (wall-clock time of running the simulation) per ensemble member.More details on the computational requirements of one specific ensemble member, are shown in Table 3, compared with a similar run on Snellius, the Dutch national supercomputer.The results show that at the beginning of the simulation,  DALES runs faster on Fugaku, comparing time required per grid point and time step.Further into the simulation, DALES runs slower on both systems, with a larger slowdown seen on Fugaku.This behavior is result of the cloud microphysics and precipitation scheme which is activated when precipitation occurs.This scheme has not been tuned for Fugaku yet, and appears to vectorize poorly.What the table doesn't show is that the scaling efficiency for using more nodes is better on Fugaku.
Each MPI process writes the output data for its own part of the simulation domain in the netCDF format.We used the uncompressed netCDF3 format, since it was found to require less RAM memory than netCDF4 during simulation.These netCDF tiles were then merged and converted to compressed netCDF4 using CDO 2.0.4 (Schulzweida, 2021).
Finally, the netCDF files were converted to the Zarr format (Miles et al., 2022) and uploaded to the German Climate Computing Center (DKRZ)'s SWIFT object storage for easy access, as described further in Section 5.As a backup, the netCDF files are kept on the tape archive of the European Center for Medium-Range Weather Forecasts (ECMWF).

Data Set Description
Cloud Botany contains a rich set of idealized large-eddy simulations that provide valuable resources to study the dependency of shallow cumulus convection to environmental conditions.In addition to the vast range of environments, the large domain-size itself allows an investigation of scales that remain uncaptured in previous simulation studies of trade-wind cumuli centered around the RICO (Rain in Cumulus over the Ocean) (vanZanten et al., 2011) and BOMEX (Barbados Oceanographic and Meteorological Experiment) (Siebesma et al., 2003) campaigns.Due to these large opportunities, we put additional effort into providing an easy and free access to these simulations.
We acknowledge that the download of 40 TB of simulation output is a burden and most users will only access portions of this data set, for example, specific timesteps, specific members or height levels.To allow for a more modular access, the data set has been chunked along all its dimensions and saved as Zarr files which support these chunks.The Zarr file format allows hosting Cloud Botany on the DKRZ SWIFT object storage.The combination of the Zarr format with an object storage leads to faster access rates compared to traditional file system based hosted data sets and make the Cloud Botany data set analysis ready.
An analysis in Python can be started by accessing the EUREC 4 A intake catalog (https://howto.eurec4a.eu/botany_dales.html):import eurec4a cat = eurec4a.get_intake_catalog()botany_cat = cat.simulations.DALES.botanyFurther details on how to visualize and analyze this data set can be found in the interactive How-To-EUREC 4 A book at https://howto.eurec4a.euamong other EUREC 4 A related data sets.
All the simulations in the Cloud Botany ensemble are listed in Table A1 together with their parameters.Run 1 is at the center of the parameter hypercube, runs 2 to 65 are at its corners.The remaining runs 66 to 103, labeled "sweep," lie on lines through the center of the hypercube, where one parameter at a time is varied.The remark column gives subjective description of the clouds and cloud organization (from the second day on, when patterns and organization generally occur) based on visual inspection.
The data is divided into several data sets, according to output frequency and dimensionality.Each data set is indexed by ensemble member, time and spatial coordinates.The data sets and their variables are summarized in Tables A2-A9.In general, we have stored 3D fields and 2D radiation fields hourly, and 2D fields such as the liquid water path as well as horizontal cross sections of the prognostic variables every 5 min.As an aid to navigating the ensemble, we have prepared a web page with a set of plots and animations for each member.This page and the images and animations can be downloaded and used offline (Jansson, Janssens, Grönqvist, Siebesma, et al., 2023).

Results
In this section, we include a preliminary exposition of the simulation results, in terms of horizontally averaged cloud properties, and the development of mesoscale cloud patterns.Our aim will be to give a brief impression of the ensemble's variability, both within a single simulation and between ensemble members, to motivate future quantitative, in-depth analyses of the data set.

Time-Evolution in a Single Ensemble Member
We begin with Figure 5, which shows the evolution of several quantities of interest and snapshots of the cloud cover and precipitation in simulation 1, the center point of our parameter space.Its evolution is qualitatively similar to that of many ensemble members, and the Lagrangian simulations by Narenpitak et al. (2021).All simulations depart from cloud-free states at midnight UTC.The first 10 hr are characterized by the onset of convection and the development of small, unorganized cumuli.These non-precipitating clouds then gradually cluster into larger structures.This evolution is modulated by the diurnal cycle of shortwave radiation.After sunrise, shortwave heating gradually stabilizes the domain to convection, reducing both the cloud fraction and horizontally averaged LWP.After sunset, the cloud structures rapidly grow vertically and begin to precipitate around 24 hr from the start of the simulation.The second diurnal cycle is then dominated by larger, precipitating convection cells, organized along cold pools and frequently topped by thin inversion clouds.
The cloud cover reached by simulation 1 (0.12, equal to the ensemble-average) is akin to earlier LESs of the trades (0.13 in Radtke et al. (2021) or 0.19 in vanZanten et al. ( 2011)).However, it is lower than what was observed during the EUREC 4 A field campaign (0.42 including optically thin clouds, based on both satellites and aircraft-borne lidar measurements, Mieslinger et al., 2022).This underprediction of cloud cover remains across the ensemble (Figure 6).Understanding the causes of this bias, and whether it affects the sensitivities of the cloudiness to changing the environment, are worthwhile topics of follow-up research.

Variability in Cloudiness, Rain and Mesoscale Organization Between Simulations
While most simulations develop qualitatively similarly, they produce rather different cloud fractions, and, somewhat independently, different surface rain rates (Figure 6).Hence, changing our environmental control parameters influences both the shallow convection's coverage and depth.This is further illustrated by Figure 7, which shows how vertical profiles of cloud fraction vary under parameter sweeps.At a glance, many of these sensitivities appear to match earlier smaller-domain LES sensitivity studies: the cloud fraction profiles are rather insensitive to changes in θ l0 , in agreement with for example Bretherton et al. (2013), and stronger surface winds give deeper cloud layers with more inversion cloud, in line with Nuijens and Stevens (2012).Following Bellon and Stevens (2012), we observe that stabilizing the free troposphere through increasing Γ lowers the inversion and increases its cloudiness, while moistening the free troposphere through increasing    lowers the inversion and cloud base.Finally, reducing the subsidence strength through increasing w 1 gives rise to increased inversion cloud, in line with Vogel et al. (2020).The apparent similarities between the sensitivity of cloud fraction in our large-domain simulations, which include mesoscale organization, and the small-domain simulations in literature, which do not, motivate more rigorous analysis, to understand the specific role played by self-organization in setting the cloudiness in the trades.Therefore, we plan to repeat all simulations presented herein in small (10 km) domains.
Figure 8 shows a few examples of cloud patterns that develop under different parameter combinations.As anticipated by Figure 6, most of these develop precipitating convection, almost always paired with cold pools.When they occur, such cold pools visually appear to substantially modulate the cloud patterning.We find at least three different ways in which this happens.First, we sometimes find cold pools lined by arcs of cumuli (e.g., Figure 8a).Second, in simulations with strong surface wind, large lapse rates, small moisture scale heights and positive large-scale vertical velocity (e.g., runs 37, 45 in Figures 8b and 8c and 13, 79, not shown), cold pools are produced by sufficiently vigorous convective cells that they produce large (>50 km) sheets of thin, stratiform outflow layers, reminiscent of the structures termed "Flowers" by Stevens et al. (2020).At a glance, the appearance of such structures under stronger stratification and higher surface winds appears consistent with the observations by Bony et al. (2020), Schulz et al. (2021), and follows from Figures 7b  and 7c.Third, in runs 84 to 91 the wind shear is varied.At strong wind shears of both positive and negative signs, cold pools are observed to deform into bands (Figure 8d); such features are also found in runs 4 and 100.
We also find a set of simulations with less vigorous, at most weakly precipitating convection (lower left corner in Figure 6), often at lower surface winds.When the winds blow weakly, the large-scale vertical velocity has a strong switching effect on the cloud formation: Negative w 1 often results in very weak, sometimes cloud-free convection (e.g., runs 18 and 50); merely switching w 1 to its positive counterpart (runs 19 and 51) makes them produce deeper, precipitating convection.Yet strikingly, all non-or weakly precipitating simulations that support a cumulus layer still see their convection organize into mesoscale patterns (e.g., runs 8, 66-68), such as bands aligned with the mean wind (Figure 8e) or into quasi-circular clusters (Figure 8f). Figure 9 shows 3D renderings of parts of scenes b and e of Figure 8.
In summary, we can identify at least five visually distinct forms of convective patterning in our ensemble, several of which appear to match visually identified categories of cloud patterns in nature: "Sugar," "Gravel," the aforementioned "Flowers" and "Fish" (Stevens et al., 2020).In order of increasing visual complexity, we find simulations with (a) no clouds, (b) small, randomly spaced cumulus, (c) clustered, non-precipitating cumulus (both ii and iii seem to fit the "Sugar" category), (d) precipitating convection and subsequent cold pools ("Gravel"), and mesoscale convective systems topped by thin stratiform clouds ("Flowers").Patterns larger than our domain size, such as "Fish" we cannot simulate; our data set therefore cannot shed light on their formation.However, since Schulz et al. (2021); Aemisegger et al. (2021) show that "Fish" originate in extratropical, synoptic disturbances, there is also no reason to expect such structures to form spontaneously in even larger-domain simulations forced by conditions that characterize the trades.

Length of Integration Period
To see the longer-time development we have repeated the simulation for the central point of the parameter space (run 1) for a longer period of 9 days, shown in Figure 10.Hours 24 to 60 of the simulation appear similar to the longer time behavior.As the long simulation was done on a different computer (Snellius, x86 architecture) and the model was compiled with a different compiler, the simulation trajectory is different, and thus comparing the two simulations indicates the randomness or internal variability in these simulations.The timing in the liquid water path peaks differ slightly, though their shapes and amplitudes are similar, and the mean statistics are unaffected (mean liquid water path and cloud fraction over 24-60 hr are within 1% of each other).

Truncated Simulations
Runs 7, 15, 38, 39, and 47 did not finish due to a crash in the thermodynamics routine, when temperature and moisture reach non-physical values.All these runs have a low Γ, sometimes allowing the convection to permeate through to our domain top, where their spurious interactions with our boundary conditions makes them fail (see Section 2.3).Since we do not expect such deep convection to frequently occur during the suppressed conditions we aim to study, we recommend disregarding these runs.Additionally, runs 11, 14, 43, 46, and 87 only span 48 hr due to their computing jobs being interrupted.

Conclusions
There are several approaches to improve understanding of the processes that underpin the rich spectrum of cloud patterns over the tropical ocean.Many attempts rest on the construction of models that strive for maximum realism across the entire relevant scale range, from the synoptics to the large-eddy scales of turbulence.In this paper, we have presented Cloud Botany, an ensemble of large-eddy simulations on 150 km domains that instead represents the larger-scale environment in a highly idealized manner.We do this to elucidate the processes through which shallow convection can self-organize into mesoscale cloud patterns, and to study systematically how these processes vary as the larger-scale environment changes.
We design our idealized large-scale environment by fitting functional forms of the vertical structure of liquid-water potential temperature, total specific humidity and horizontal wind from reanalysis, and vertical velocity from observations.For most of these, reasonable fits can be attained with very simple approximations, allowing us to span the range of observed conditions by varying only six parameters: these span a parameter space that we explore by simulating (a) all possible combinations of high and low values in the parameters that are representative for observed variability over the boreal winter of 2020, and (b) sweeps of single parameters.
In the Cloud Botany simulations, 93 out of 103 runs support cumulus-topped boundary layers.Strikingly, all those that do also self-organize into mesoscale cloud patterns.We typically first observe small, randomly spaced cumuli, which quickly begin self-aggregating into mesoscale clusters.After a marked diurnal cycle, we often observe the onset of precipitation after around 24 hr of simulation; subsequent cloud pattern varieties are dominated by cold pools and layers of thin inversion cloud.We also observe ample variability in the self-organized cloud patterns when we vary the parameters controlling the large-scale environment, all of which are closely reminiscent of cloud patterns observed in nature.We take these results to be early indications that parameter ensembles will prove fruitful for understanding the processes that govern the variability of the mesoscale trades, under a range of larger-scale conditions.
We hope this makes Cloud Botany a valuable community resource for studies that simultaneously require the resolution of individual cloud structures, a mesoscale environment and variability over a range of conditions characteristic for the trades.It also serves as a point of departure for using parameter ensembles to study variability in convective clouds in other regions of the world, or in warmer climates.Another use of the ensemble is as a benchmark for convective parameterizations for GCMs, since the task of such parameterizations is to compute cloud properties from vertical profiles of the model's prognostic variables.Finally, we see Cloud Botany as sitting on the abstract side of a spectrum of modeling approaches, which include simulation setups under time-varying forcings derived from a numerical weather prediction model (Savazzi, Nuijens, de Rooy, & Siebesma, 2022), on  (Dauhut et al., 2022), Lagrangian LES (Narenpitak et al., 2021), mesoscale models with parameterized convection (Beucher et al., 2022) and regional and global models with partially resolved convection (Schulz & Stevens, 2023;Stevens et al., 2019).All these will be needed to fully elucidate the subtleties that govern the interactions between clouds, their environment and climate at the trade-wind mesoscales.

Appendix A
This appendix contains tables describing the Cloud Botany data set.Table A1

Figure 1 .
Figure1.Envelopes and mean of daily-averaged vertical velocity in the JOANNE data set(George et al., 2021).Daily averages are obtained from the six circles flown on a day by the HALO aircraft during the EUREC 4 A campaign.The data is separated into the 50% moistest (blue) and driest (red) flight days.Dashed lines indicate the profiles used in the simulation, constructed with Equation 4 and the parameters reported in Tables1 and 2.

Figure 3 .
Figure 3. Inter-quartile range (IQR, shading) and mean (gray line) of JOANNE-derived tendencies of (a) heating and (b) moistening, and idealized fit used to force Cloud Botany simulations (blue line).

Figure 4 .
Figure 4. Height-dependence of the nudging time scale τ in Equation 7.

Figure 5 .
Figure 5.Time series of simulation 1, the central point of the parameter hypercube.The snapshots show cloud albedo in white (as parameterized by Y. Zhang et al. (2005)) and rain water path in blue.The time series curves show the liquid water path (LWP), rain water path (RWP) and cloud fraction over time.The shaded background shows the diurnal cycle, the darker regions are night (18-06 hr in local time).The times of the snapshots are indicated by gray vertical lines.

Figure 6 .
Figure 6.Scatter plot of the cloud fraction and surface precipitation of the (numbered) ensemble members, averaged over the last 24 hr of the simulations.Ensemble members that did not reach 60 hr are excluded.

Figure 7 .
Figure 7. Variation of vertical profiles of cloud fraction with sweeps of the environmental control parameters (a) surface liquid water potential temperature, (b) nearsurface wind, (c) temperature lapse rate, (d) humidity height scale, and (e) the subsidence parameter (colors), averaged over the last 24 hr of the simulations.

Figure 8 .
Figure 8. Different types of cloud organization seen in the Cloud Botany ensemble.(a) Run 6, cold pools, (b) run 37, large cloud cluster topped by stratiform outflow, (c) run 45, multiple such clusters, on the edges of cold pools, (d) run 84, line of precipitation, (e) run 40, non-precipitating cumulus in bands, and (f) run 66, non-precipitating cumulus in aggregated in quasi-circular clusters.The wind is easterly, that is, from the right side of the image.

Figure 9 .
Figure 9. Rendered 3D view of (a) the central, large cloud structure in Figure 8b and of (b) the stripes in Figure 8b, above a reflecting plane representing the ocean.The rendered domain is 70 km × 70 km.The rendering shows an isosurface of q l = 2 ⋅ 10 −5 kg/kg.

Figure 10 .
Figure 10.A 9-day simulation of the central point of the parameter space (purple), compared to the 60 hr simulation which is part of the ensemble (blue).

Table 2
Properties of Environmental Control Parameters, Fit to the ERA5 Database, and Selected for Cloud Botany

Table 3 Computational
Resources Used for the Simulation of Ensemble Member 1, the Central Point in the Parameter Hypercube, on Supercomputer Fugaku (One 2.0 GHz 48-Core A64FX CPU Per Node), and on the Dutch National Supercomputer Snellius (Two 2.6 GHz 64-Core AMD Rome 7H12 CPUs Per Node) list the parameters of the ensemble members.Tables A2-A9 give details of the variables stored in the different data sets.

Table A3
Variables in the Profiles Data Set Containing Horizontally Averaged Profiles, Sampled Every 5 Min, Part 1 Note.Dimensions: (member, time, z).

Table A4
Variables in the Profiles Data Set Containing Horizontally Averaged Profiles, Sampled Every 5 Min, Part 2

Table A5
Variables in the Profiles Data Set Containing Horizontally Averaged Profiles, Sampled Every 5 Min, Part 3

Table A6
Variables in the 2D Data Set, Containing Horizontal Fields Sampled Every 5 Min, 2D Note.Dimensions: (member, time, y, x).Variables in the 3D Data Set, the Full 3D Fields of the Model Sampled Every Hour Note.Dimensions: (member, time, z, y, x).

Table A8
Variables in the Cross_xy Data Set, Horizontal Cross-Sections of the Prognostic Variables Sampled Every 5 Min

Table A9
Variables in the Radiation Data Set, 2D Radiation Fluxes Sampled Every Hour