Ensembles of Global Climate Model Variants Designed for the Quantification and Constraint of Uncertainty in Aerosols and Their Radiative Forcing

Tropospheric aerosol radiative forcing has persisted for many years as one of the major causes of uncertainty in global climate model simulations. To sample the range of plausible aerosol and atmospheric states and perform robust statistical analyses of the radiative forcing, it is important to account for the combined effects of many sources of model uncertainty, which is rarely done due to the high computational cost. This paper describes the designs of two ensembles of the Met Office Hadley Centre Global Environment Model‐U.K. Chemistry and Aerosol global climate model and provides the first analyses of the uncertainties in aerosol radiative forcing and their causes. The first ensemble was designed to comprehensively sample uncertainty in the aerosol state, while the other samples additional uncertainties in the physical model related to clouds, humidity, and radiation, thereby allowing an analysis of uncertainty in the aerosol effective radiative forcing. Each ensemble consists of around 200 simulations of the preindustrial and present‐day atmospheres. The uncertainty in aerosol radiative forcing in our ensembles is comparable to the range of estimates from multimodel intercomparison projects. The mean aerosol effective radiative forcing is −1.45 W/m2 (credible interval of −2.07 to −0.81 W/m2), which encompasses but is more negative than the −1.17 W/m2 in the 2013 Atmospheric Chemistry and Climate Model Intercomparison Project and −0.90 W/m2 in the Intergovernmental Panel on Climate Change Fifth Assessment Report. The ensembles can be used to reduce aerosol radiative forcing uncertainty by challenging them with multiple measurements as well as to isolate potential causes of multimodel differences.

Our ability to robustly quantify the effects of aerosols on the climate ultimately rests on being able to reliably simulate aerosol properties and radiative effects on regional and global scales. However, despite major improvements in global observing systems and climate models representing aerosols more realistically, the estimated net aerosol RF over the industrial period ranges from near zero to around −2 W/m 2 . This uncertainty has persisted through all intergovernmental panel assessment reports since 1996 and significantly limits our ability to understand the causes of historical climate change and our confidence in climate change projections (Andreae et al., 2005;Collins et al., 2013;Schwartz, 2018;Seinfeld et al., 2016). In other words, while our knowledge about aerosols and the driving processes has improved, this has not yet been translated into more reliable climate model simulations.
Multimodel intercomparisons provide valuable information about the differences between models, often called the model diversity (Bellouin et al., 2016;Mann et al., 2014;Myhre et al., 2013;Randles et al., 2013;Shindell et al., 2013;Stier et al., 2013). The AeroCom intercomparison of aerosol microphysical properties (particle concentrations and size distributions) showed that global aerosol models can capture many of the important spatial and temporal variations in aerosols on a global scale . Nevertheless, the models have a very large spread (a factor 2 to 30, depending on region) in their simulations of aerosol microphysical properties. Although model intercomparisons provide valuable information about model diversity, it is difficult to attribute these differences to particular processes or model inputs when there are many structural differences among the models . Most importantly, the small ensemble size used in multimodel comparison studies inhibits drawing statistically robust conclusions (Carslaw et al., 2018).
A perturbed parameter ensemble (PPE) is a set of simulations created using a single model structure with perturbations made to uncertain physical parameters (Collins et al., 2011) or inputs to the model, like aerosol emissions . PPEs are widely used in many fields of science to understand and quantify model uncertainty, such as in environmental models (Salter & Williamson, 2016), hydrological models (Liu & Gupta, 2007), astrophysics (Rodrigues et al., 2017), and disease transmission (Andrianakis et al., 2017). PPEs have been used to develop an understanding of uncertainties in climate change projections (Collins et al., 2011;Fletcher et al., 2018;Murphy et al., 2004;Sexton et al., 2012) as well as to explore the causes of uncertainty in global aerosol concentrations (Lee et al., 2011(Lee et al., , 2012, aerosol RF Regayre et al., 2014Regayre et al., , 2015, precipitation (Qian et al., 2015), and radiative fluxes (Shiogama et al., 2012). PPEs can also be used to systematically identify model structural deficiencies (McNeall et al., 2016;Yokohata et al., 2010).
Causes of uncertainty in the aerosol component of a global model have previously been identified using ensembles of "offline" models (often called chemical transport models), where the couplings of aerosols to the radiation and clouds were not accounted for (Carslaw, Lee, Reddington, Mann, et al., 2013, Regayre et al., 2014. Here we go further by constructing PPEs in a global climate model in which aerosols are coupled to physical climate processes. This enables us to analyze the uncertainty in aerosol RF including atmospheric rapid adjustments (i.e., semidirect effect and cloud lifetime effect; Boucher et al., 2013). conditions so were therefore nudged toward analyzed meteorology. Both PPEs also included equivalent ensembles for preindustrial emissions so that the uncertainty in aerosol RF can be quantified. The first "aerosol" ensemble (hereafter referred to as the AER PPE) perturbs 26 parameters related to aerosol processes and emissions and was run in such a way that changes in aerosols did not affect the meteorology. The AER PPE therefore allows the aerosol RF to be diagnosed. The second "aerosol-atmosphere" ensemble (hereafter referred to as the AER-ATM PPE) was designed to quantify the relative importance of uncertain parameters in the aerosol model and the host physical atmosphere model. The AER-ATM PPE therefore allows the main causes of uncertainty in aerosol effective RF (ERF, Boucher et al., 2013) to be diagnosed.
Outputs from PPEs are subsequently used to create Gaussian process emulators (Lee et al., 2011;O'Hagan, 2006) to enable several million "model variants" to be generated across the multidimensional parameter spaces. This approach enables a sufficiently high sampling density to perform robust statistical analyses of model parametric uncertainty. In our analysis we represent aerosol forcing uncertainties using standard deviations or 90% credible intervals calculated using samples of one million model variants.
In this article, we describe the design of the two PPEs and examine the major differences between them by comparing the spatial patterns of uncertainty and by contrasting the sensitivities of global mean values to the parameter perturbations. In section 2 we summarize the base model setup, modifications we made to the model structure, the aerosol emissions used in all simulations, the parameters we perturbed, and the design of the PPEs. In section 3 we analyze the major differences between the PPEs in terms of aerosols and aerosol RF. In section 4 we evaluate the potential for these ensembles to inform multiple aspects of climate research, including identifying priorities for model development, model structural deficiencies, and regions where new, targeted observations will better constrain aerosol forcing uncertainty.

Base Model
This work uses global simulations of the HadGEM3 (Hewitt et al., 2011) with N96 horizontal resolution (1.25°in latitude and 1.875°in longitude) and 85 vertical levels up to 85 km above sea level. The model uses the fourth Global Atmosphere configuration (Walters et al., 2014) of the Met Office Unified Model. Sea surface temperatures and sea ice were prescribed using reanalyzed monthly varying fields based on the methodology of Reynolds et al. (2007) as used in phase 2 of the Atmospheric Model Intercomparison Project.
We used version 8.4 of the UKCA model, which is a whole atmosphere chemistry and aerosol model within the HadGEM3 host model (Abraham, 2014;O'Connor et al., 2014). Within the UKCA model the evolution of particle size distribution and size-resolved chemical composition of aerosols are calculated using the modal version of the GLObal Model of Aerosol Processes model (GLOMAP-mode; Mann et al., 2010). The GLOMAP-mode simulates new particle formation, gas-to-particle transfer, aerosol coagulation, cloud processing of aerosols, and both dry and wet deposition of gases and aerosols. In our model setup GLOMAP resolves five aerosol components (sulfate, organic carbon, black carbon, sea salt, and dust) in seven internally mixed modes with a lognormal number size distribution (soluble modes in nucleation, Aitken, accumulation, and coarse size ranges and insoluble modes in all but the nucleation size range).
Particles form through binary homogeneous nucleation (Vehkamäki et al., 2002) throughout the atmosphere and through organically mediated nucleation (Metzger et al., 2010), which occurs mainly in the planetary boundary layer close to biogenic volatile organic carbon compound emissions. Particles grow following microphysical processes such as condensation of gas species and coagulation between particles. They are moved from one mode to another when the mean modal size becomes larger than a prescribed threshold or when insoluble particles are aged to become soluble. Secondary organic carbon is produced from the first stage oxidation products of biogenic monoterpenes under the assumption of zero vapor pressure. Secondary organic material is then combined as one chemical tracer with other organic carbon after condensing to the particle phase.
Hygroscopic growth of soluble particles occurs according to the local relative humidity following Zadanovskii (1948) and Stokes and Robinson (1966) using water activity coefficient data from Jacobson et al. (1996). Activation of aerosols to cloud droplets is calculated applying κ-Köhler theory (Petters & Kreidenweis, 2007) with composition-dependent hygroscopicity factors (κ; kappa) and an updraft velocity probability density function (PDF) representing subgrid variability of vertical velocities (West et al., 2014).
Aerosol optical properties (extinction, absorption, and asymmetry parameter) for each of the GLOMAP internally mixedsize modes are calculated using the RADAER module (Bellouin et al., 2013), and aerosol radiative effects are calculated via the Met Office Unified Model radiative transfer module (Edwards & Slingo, 1996).
Note that to improve the fidelity of predicted aerosol RFs in these ensembles we evolved the GLOMAP codebase in UM-UKCA for a number of key improvements (the updated model GLOMAP v8.1). We document these improvements as part of this article, and the changes described are added as a progression from the v8.0 codebase, which enabled GLOMAP to be applied across both tropospheric and stratospheric conditions (Dhomse et al., 2014).
Tropospheric ozone, HO x , NO x , and VOC (volatile organic compound) chemistry are calculated using the mechanism described by O'Connor et al. (2014), which includes reactions of odd oxygen (O x ), nitrogen oxides (NO y ), hydrogen radicals (HO x = OH + HO 2 ) and CO, as well as methane and other short chain nonmethane VOCs. The scheme has been extended to include additional sulfur (Mann et al., 2010) and monoterpene (Spracklen et al., 2006) chemistry. The Fast-JX scheme is implemented to calculate photolysis rates based on simulated aerosols, liquid and ice cloud water content, and ozone. Fast-JX is a version of the Fast-J scheme (Wild et al., 2000) with an extended wavelength range suitable for stratospheric conditions (Abraham et al., 2012;Neu et al., 2007;Telford et al., 2013). Dry and wet deposition of gas-phase species is described in O'Connor et al. (2014). The UKCA chemistry module is currently one-way coupled to the aerosol module, which means that prognostic aerosols are calculated within the atmospheric chemistry scheme but these prognostic aerosols do not affect atmospheric chemistry (through either atmospheric oxidants, photolysis, or heterogeneous reactions) during a model time step.

Aerosol Sources
Emission fluxes of sea salt and dust are calculated within the model as a function of surface wind speed and the emissions of other primary particles and aerosol precursor gases are prescribed using emission inventories. Emissions of dimethyl sulfide (DMS), a precursor of sulfate aerosol, are calculated as a function of surface wind speeds following Nightingale et al. (2000), based on prescribed surface seawater DMS concentrations given by Kettle and Andreae (2000). The emission inventories used for aerosols and precursor gases in recent decades (1978, 1998, and 2008) and for the preindustrial atmosphere are shown in Tables S1 and S2 in the supporting information.
Sulfur dioxide (SO 2 ) emissions from anthropogenic (fossil fuel [FF] and biofuel) and volcanic sources, and carbonaceous (i.e., organic carbon and black carbon) emissions from combustion (biomass burning [BB]) sources are prescribed distinctly for each period (Tables S1 and S2). However, in AER-ATM PPE the surface emission of anthropogenic sulfur dioxide was erroneously duplicated and given as high-level emissions representing the industrial and half of energy sector emissions. As a result, global, annual mean high-level sulfur dioxide emission is 7.67e-13 kg·m −2 ·s −1 , which is smaller by a factor of 3 than it should be (2.31e-12 kg·m −2 ·s −1 ) in this PPE. This caused differences in global budgets of sulfate aerosol between two PPEs (Tables S7 and S8) and led to weaker forcing over industrial regions in AER-ATM PPE (section 3). Biogenic VOC emissions of isoprene and monoterpenes are the same for both periods following Guenther et al. (1995). As stated in section 2.4, the meteorology is identical in all AER simulations. Therefore, the emissions of aerosols and precursors (sea salt, dust, and DMS flux) are simulated identically before perturbation. In the AER-ATM simulations, however, rapid adjustments in the boundary layer as well as perturbations to host model physical parameters cause the calculated emission fluxes of sea salt, dust, and DMS to differ between simulations.

Structural Modifications to the Model
The baseline GLOMAP-mode aerosol model in UKCA is essentially as described in Mann et al. (2010) with small adjustments to the widths of the particle modes . In this section we describe the improvements made to the release version of the model and how they benefit the model setup used to create the ensembles. Structural improvements in model release version 8.4 were identified during the design phase of these ensembles. The main changes to the model for this study were the following.

New Particle Formation
The release version of the model has multiple options for new particle formation (particle nucleation) in the boundary layer. Here we use an organically mediated parametrization (Metzger et al., 2010). In the original formulation it was assumed that 13% of α-pinene first-stage oxidation products produce secondary organic aerosol (SOA) and also enter into the nucleation rate expression, which gave realistic results (Metzger et al., 2010). In the PPEs we now allow this 13% SOA yield to vary over a wide range to represent a realistic uncertainty in total SOA production. To span realistic ranges of observed particle concentrations we therefore found it necessary to allow only one tenth of this SOA to participate in nucleation (see section 2.5.2).

Organic Aerosol Water Uptake
We previously assumed that water uptake per mole of the organic component of the particles was 65% of that of sulfate (Mann et al., 2010). In this study water uptake is calculated using the κ-Köhler approach (Petters & Kreidenweis, 2007) to be consistent with the use of κ in cloud droplet activation (West et al., 2014).

Cloud Droplet Activation
Cloud droplet activation is calculated using the parameterization of Abdul-Razzak and Ghan (2000;West et al., 2014), with a varying updraft velocity, including Gaussian PDF for subgrid variability whose width is linked to the turbulent kinetic energy in the model. In the PPEs we made this width independent of turbulent kinetic energy so that its value could be defined more precisely. Above the boundary layer a fixed value of 0.1 is assumed as in the base model.

In-Cloud Aerosol Scavenging
In our previous studies we perturbed the size of aerosol particles at which scavenging occurs Carslaw, Lee, Reddington, Mann, et al., 2013) and in later studies the drizzle rate (Regayre et al., 2014(Regayre et al., , 2015. Within UM-UKCA, the host general circulation model has large-scale precipitation and cloud schemes, so we elected to perturb the parameter controlling the raining fraction of the grid box (Rain_Frac). This is applied to large-scale precipitating cloud but not to convective precipitation. We did not perturb the autoconversion rate which is a potentially important parameter (see section 4).

Aerosol Scavenging in Mixed-Phase Clouds
The scavenging of aerosol material in dynamic rain systems is controlled partly by the rain formation process. Our previous work has shown that if aerosol scavenging scales with total water removal (as was originally the case in the base HadGEM3-UKCA model), then too little aerosol is transported into the Arctic (Browse et al., 2012). In our previous studies in the GLOMAP model without any cloud microphysics we therefore introduced an adjustable in-cloud temperature threshold below which scavenging was switched off (to account for less efficient scavenging in mixed-phase clouds). To exploit the HadGEM3-UKCA cloud microphysics we instead define an ice mass fraction threshold above which no nucleation scavenging occurs. We assume that the wet scavenging of all aerosol particles (soluble and insoluble) is set to 0 in large-scale raining clouds if the simulated ice to total water mass fraction is higher than a fixed value (determined by the new parameter Cloud_Ice_Thresh).
In our preliminary experiment, the newly implemented Rain_Frac and Cloud_Ice_Thresh parameters were found to improve the agreements of simulated aerosol concentrations with station observations in the Arctic (see Figure S1).

Dust in the Modal Aerosol Scheme
In the release version of HadGEM3 dust is simulated in the CLASSIC bin scheme (Bellouin et al., 2007;Woodward, 2001) separately from all other aerosols (which are treated in GLOMAP) so that interactions between dust and other aerosol species are not simulated. However, recent research shows that dust emissions can have important regional influences on cloud drop concentrations (Karydis et al., 2017). Therefore, in our ensembles we emit dust into the GLOMAP modal scheme where the particles can be aged through condensation of soluble materials (such as sulfate and organic carbon). Dust particles are transferred into soluble modes when they acquire sufficient soluble materials to be considered hygroscopic (as defined by the "Aging" parameter; sections 2.4 and 2.5). Emissions in bins 1 (0.0316-0.1 μm in radius) and 2 (0.1-0.316 μm) and half of the emission in bin 3 (0.316-1 μm) are transferred into accumulation insoluble mode. The remaining emissions in bin 3 and emissions in bins 4 (1-3.16 μm) and 5 (3.16-10 μm) are transferred into the coarse insoluble mode. Dust in bin 6 (10-31.6 μm) is not currently transferred to the GLOMAP modal scheme because this size of particles is not properly represented the GLOMAP modal scheme. These particles have very short atmospheric life time and hence are not affected by microphysical processes.

Aerosol Optical Properties
We replaced the lookup tables of aerosol optical properties (Bellouin et al., 2016) that include optical properties for mineral dust (Balkanski et al., 2007) with a higher resolution in the increments of the imaginary part of the refractive index to better resolve the specific absorption coefficient of aerosols, especially at the lowabsorption end. This development was necessary to obtain aerosol optical depth (AOD) in accumulation and coarse insoluble modes where dust is the only aerosol component. However, the optical properties for mineral dust used in AER-ATM are based on Woodward (2001), leading to increased absorption at visible wavelengths. This results in higher-absorption AOD shown in Table S10. Since the absorption by dust applies to both present-day and preindustrial simulations, this has a minimal effect on aerosol forcing.

Model Setup
As described in section 1 we have created two complementary PPEs designed to tackle multiple aspects of aerosol forcing uncertainty quantification and reduction. The HadGEM-UKCA model was set up differently for each ensemble, as summarized in Table 1.

The AER Aerosol PPE
The AER PPE was designed to understand and constrain the uncertainty in model output resulting from perturbations to uncertain aerosol parameters using high spatial and temporal resolution observations (Reddington et al., 2017). For this purpose, we wanted to have identical meteorology for all ensemble members to remove the noise caused by differences in dynamics and cloud properties (internal model variability).
In AER the double call configuration (Bellouin et al., 2013) is used to calculate radiative fluxes. In this configuration the radiation scheme is called twice to calculate radiative fluxes with and without simulated aerosols. The model dynamics and physics in the following time step is calculated using the "prognostic call" with no radiative perturbation by aerosols. Therefore, the changes in aerosol concentrations and radiative properties occurs only in the "diagnostic call" and do not affect temperature, clouds, humidity, or precipitation during the simulations. Although atmospheric oxidants are calculated interactively within UKCA, the radiative effects of these oxidants in the diagnostic call do not affect the simulation in the next time step. Thereby, meteorological fields are completely decoupled from aerosols and chemistry and are identical across all simulations in AER. In addition, we use the same trace gas concentrations such as CO 2 , CH 4 , N 2 O, and chlorofluorocarbons and other boundary conditions for the preindustrial (1850) and present-day (2008) simulations. Since these pairs of simulations also share all parameter values, the sole difference between each pair of simulations is anthropogenic and volcanic aerosol emissions (section 2.2). Radiative forcing is calculated as the difference in shortwave and longwave radiation at the top of the atmosphere in the diagnostic call for pairs of simulations with 2008 and 1850 anthropogenic emissions. Clear-sky RF has been diagnosed in the same way using clear-sky radiative fluxes that have been calculated within the model ignoring radiative effects of clouds.
Horizontal wind and temperature fields are relaxed (Telford et al., 2008) toward European Centre for Medium-Range Weather Forecasts Re-Analysis (ERA)-Interim values between model levels 12 (around 1 km) and 80 (around 60 km). This makes the model behave like GLOMAP-mode coupled to a global chemistry-transport model driven by meteorological fields from reanalyses. The benefit of coupling GLOMAP to HadGEM-UKCA is that, among others such as better convective transport, the atmospheric model simulates cloud fields explicitly rather than being prescribed according to observed climatologies.

The AER-ATM Aerosol-Atmospheric PPE
The AER-ATM PPE has two main purposes. First, it is designed to allow the investigation of uncertainty in aerosol ERF. Therefore, rapid atmospheric adjustments to clouds, water vapor, and temperature resulting from the radiative effects of aerosols (e.g., the cloud lifetime effect and the semidirect effect) are active in these simulations. Second, it is designed to quantify the relative importance of aerosol and physical 10.1029/2019MS001628

Journal of Advances in Modeling Earth Systems
atmosphere model parameters as causes of aerosol ERF uncertainty. Therefore, the nudging is applied in a manner that retains the effects of physical atmosphere parameters within the boundary layer, while ensuring all ensemble members have identical synoptic-scale features. Specifically, atmospheric temperatures evolve freely in these simulations and horizontal wind fields are relaxed toward ERA-Interim values at and above model vertical level 17 (around 2 km).
In AER-ATM the radiative flux double-call method is configured so that changes in aerosols interact with meteorology and the above-cloud aerosol radiative effects can be accounted for (Ghan, 2013). Aerosol ERF is calculated as the difference in outgoing shortwave and longwave radiation at the top of the atmosphere between pairs of simulations with distinct anthropogenic emissions (1850, 1978, 1998, and 2008). Similarly, the components of aerosol ERF (the aerosol-cloud interaction ERF ACI and aerosol-radiation interaction ERF ARI ) were calculated as the difference between clear-sky, clean-sky, and clear-clean-sky radiative fluxes (following Ghan, 2013) from simulations with different anthropogenic emissions.

Parameter Uncertainty 2.5.1. Expert Elicitation of Uncertain Parameters
We followed the recommendations for expert elicitation of uncertainty outlined in the Sheffield Elicitation Framework (Oakley & O'Hagan, 2010) to determine the choice of uncertain parameters to be included in the ensembles and the probability distributions of their values. Multiple expert elicitation meetings were conducted over the course of several months. During these meetings statisticians facilitated discussions between small panels of experts to elicit their understanding of the uncertainties in the model parameters as evidenced by the current state of research in the relevant field. Probability distributions based on the discussions were created as the discussion progressed and were used to elicit further information from the experts. We predominantly use trapezoidal probability density functions in "R" (Hetzel, 2012) to represent parameter uncertainty, in a similar way to Sexton et al. (2018). Using univariate trapezoidal distributions avoids oversampling the central region of multidimensional parameter space. Furthermore, experts more readily converged on agreement about important aspects of parameter uncertainty when using trapezoidal distributions (see Figure S2) than they did with other types of distribution.
Probability distributions of aerosol parameter values perturbed in AER are shown in Figure S3. Probability distributions for parameters included in previous ensembles (e.g., Lee et al., 2013;Regayre et al., 2014) were reelicited for this study to account for improved process understanding from recent research activity.

The AER Parameters
The 26 aerosol parameters and their perturbation ranges in AER are summarized in Table S3. Thirteen aerosol parameters are related to microphysical aerosol processing, and another 13 are related to aerosol emission and deposition fluxes. Twenty-one parameters have previously been identified as important causes of model uncertainty in our earlier studies (Carslaw, Lee, Reddington, Mann, et al., 2013;Lee et al., 2013;Regayre et al., 2014;Regayre et al., 2015). These parameters are described briefly here and more fully in Lee et al. (2013).

Journal of Advances in Modeling Earth Systems
1. BL_Nuc: The boundary layer nucleation rate. We use the parameterization of Metzger et al. (2010) in which the nucleation rate depends on concentrations of gaseous sulfuric acid and condensable organic molecules. We perturb the nucleation rate coefficient. 2. Particle aging: The aging rate. Black carbon, organic carbon, and dust are nonhygroscopic when freshly emitted but assumed to become soluble following condensation of sulfuric acid and condensable organic matter. In UKCA particles in the insoluble modes are transferred to soluble modes after acquiring soluble materials. This parameter defines the amount of soluble material required for this transfer to occur as the number of monolayers. 3. Acc_Width, Ait_Width: Width of accumulation and Aitken modes. GLOMAP uses fixed geometric widths of the lognormal size distribution modes defined by the standard deviation of the size distribution. The uncertainty range was chosen based mainly on Heintzenberg et al. (2004) and Birmili et al. (2001). Changing the mode width modifies the size distribution for particles in that mode, which in turn affects aerosol properties and processes such as dry and wet deposition rates, cloud drop activation, and aerosol optical properties as well as aerosol diagnostics such as aerosol number concentrations in given size ranges. 4. Cloud_pH: The acidity of cloud droplets. The rate of in-cloud sulfate formation through oxidation of SO 2 by ozone is highly sensitive to the pH of cloud droplets, thereby affecting the sulfate aerosol mass and aerosol radiative effects (Turnock, 2016). In contrast to the previous studies where cloud pH parameters were set separately for clean and polluted environments, here a single parameter is specified globally since we perturb cloud pH in different periods. 5. Carb_FF_Ems, Carb_BB_Ems, Carb_Res_Ems: The carbonaceous particle emission fluxes from FF, BB, and residential fuel (Res) consumption. The primary emission fluxes of black carbon and organic carbon are scaled by the same amounts. 6. Carb_FF_Diam, Carb_BB_Diam, Carb_Res_Diam: The carbonaceous particle emission diameters from FF, BB, and Res consumption. These parameters directly control the number of emitted particles for a given mass flux and therefore affect various aerosol microphysical processes. Although recent measurements provide some information about emitted particle number concentrations (Janhäll et al., 2010), the particle size remains highly uncertain. The size of FF combustion particles depends on the source. BB particle size depends on burning efficiency (Janhäll et al., 2010) among other factors that are encompassed within our ensemble range. 7. Prim_SO4_Frac, Prim_SO4_Diam: Fraction and diameter of subgrid-scale formation of sulfate particles. These parameters describe the formation of particles in subgrid-scale plumes, such as power plants (Luo & Yu, 2011;Mather et al., 2003;Stevens et al., 2012). Prim_SO4_Frac defines the fraction of the SO 2 mass that enters the model grid square as new sulfate particles, and Prim_SO4_Diam defines the size of these particles. Previous studies have shown this to be an important source of global cloud condensation nuclei (CCN; Spracklen et al., 2005, Pierce & Adams, 2006, Luo & Yu, 2011, but other studies suggest a more limited effect (Stier et al., 2006), likely because of very different assumptions about the emission size distributions used. We base our ranges on the plume-scale study of Stevens et al. (2012). 8. Sea_Spray: Sea spray emission flux. Uncertainties in the wind-driven mass flux of sea spray particles are accounted for by scaling the interactively computed flux within the model with this parameter. This parameter conflates multiple sources of uncertainty: the function describing the wind-speed dependence of the flux, processes that are unaccounted for in the existing parameterizations (such as fetch and sea state), the wind speed itself, and the effect of spatial resolution of the wind fields used by the model. 9. Anth_SO2: Anthropogenic SO 2 emission flux. Uncertainty in anthropogenic emission mass flux of SO 2 is accounted for by scaling the baseline emission from Granier et al. (2011) by this parameter. 10. Volc_SO2: Volcanic SO 2 emissions. Volcanic SO 2 emissions from continuously degassing volcanoes and time-averaged sporadic eruptions are scaled by this parameter. The baseline emissions are based on Andres and Kasgnoc (1998) and Halmer et al. (2002) data sets. 11. BVOC_SOA: Biogenic emission for SOA production. In our model, SOA material is produced through multistep oxidation reactions of the precursor gases (i.e., biogenic monoterpenes) by atmospheric oxidants. The elicited quantity was the global annual SOA production (teragrams per year) and in the model we scale the percentage yield, which therefore conflates the uncertainties in the precursor gas emissions and in the yield of the SOA itself. The baseline emission is 47.5 Tg per year.
12. DMS: Dimethyl sulfide emission flux. DMS emissions are calculated interactively based on the seawater concentration of DMS from Kettle and Andreae (2000) and the wind-driven sea-air exchange parameterization (Nightingale et al., 2000). We conflate uncertainties in these two factors by varying the calculated sea-air transfer flux by a given factor. 13. Dry_Dep_Ait, Dry_Dep_Acc: Dry deposition of Aitken (Ait) and accumulation (acc) mode particles.
GLOMAP calculates the dry deposition velocities due to Brownian diffusion, impaction, and interception depending on wind speeds, particle sizes, and surface types following Slinn (1982). In the perturbed runs, the calculated dry deposition velocities in a time step over different surface type are scaled by parameters for particle size ranges. 14. Dry_Dep_SO2: The dry deposition velocity of gaseous SO 2 . SO 2 is a precursor for aerosol sulfate (SO 4 ) and new particle formation. Therefore, the removal of atmospheric SO 2 can limit SO 4 formation and the growth of existing aerosols. 15. Five parameters were either newly introduced because they were shown to be important in recent studies or have been included as new parameters corresponding to the structural advances in the modeling framework. These five parameters are described below. The probability distributions of parameter values are presented in Figure S3. 16. Kappa_OC: Hygroscopicity parameter of organic carbon. In the model aerosol water uptake efficiency is determined by κ-Köhler theory using size-and composition-dependent hygroscopicity factors (κ; Petters & Kreidenweis, 2007). Perturbations to this parameter will change CCN concentration and cloud drop activation and hence affect the indirect RF (Liu & Wang, 2010;Rastak et al., 2017). 17. Sig_W: Updraft vertical velocity standard deviation. This parameter controls the width of the probability distribution of subgrid vertical velocities used to calculate the activation of aerosols into cloud drops (West et al., 2014). A higher value of Sig_W enables higher maximum subgrid updraft velocity and hence higher maximum supersaturation, which allows smaller particles to activate. This in turn leads to changes in cloud properties and precipitation such as higher cloud droplet concentration, lower precipitation efficiency through lower autoconversion rates, higher liquid water path, and higher cloud albedo. Sig_W has greater impacts on cloud properties in regions of relatively high aerosol concentrations where droplet activation is updraft limited rather than aerosol limited. 18. Dust: Dust emission flux scale factor. The emission flux of mineral dust is calculated interactively within the model using the CLASSIC bin scheme then transferred into the UKCA modal scheme (as described in section 2.3). The dust emission scheme relies on a tuning factor to account for emission dependence on model resolution and the representation of microenvironment (Woodward, 2001). 19. Rain_Frac: The fraction of the cloud covered area where rain forms. As described in section 2.3 we introduced this parameter as the fraction of the cloudy part of the grid box where rain sees the aerosol and thereby aerosol scavenging is allowed to take place. In the default version of the model the raining fraction of the grid box is assumed to be the same as the cloud fraction, and therefore, it corresponds to the case with Rain_Frac = 1. An overestimation of aerosol scavenging in the base model leading to insufficient aerosol loadings over Arctic regions (e.g., Figure S1) stimulated the inclusion of this parameter in our ensembles. 20. Cloud_Ice_Thresh: Threshold of cloud ice-phase fraction for scavenging. We introduced this parameter in conjunction with the Rain_Frac parameter. It is designed to suppress aerosol scavenging by both cloud liquid and ice water in dynamic clouds when the cloud ice water fraction is higher than the threshold defined by the parameter value.

The AER-ATM Parameters
In AER-ATM, 27 uncertain parameters are perturbed. Sixteen of the 18 aerosol parameters included here are also included in AER (as indicated in Table S3). Two additional aerosol parameters and nine physical atmosphere parameters were also included in AER-ATM. The effect of these parameter perturbations would be suppressed by the prescribed meteorological fields in AER. We did not include a comprehensive set of physical atmosphere parameters (e.g., Sexton et al., 2018) in AER-ATM because the prescribed meteorological fields force synoptic-scale features to be very similar across the ensemble. Additional parameters included in AER-ATM are summarized in Table S4, and the corresponding probability distributions are presented in Figure S4. The climatic effects of AER-ATM parameters are described in Regayre et al. (2018), and further detail on the elicitation of physical atmosphere parameters can be found in Sexton et al. (2018). We briefly describe the additional aerosol and physical atmosphere parameters below.

10.1029/2019MS001628
Journal of Advances in Modeling Earth Systems 1. Rad_Mcica_Sigma: The fractional standard deviation of the subgrid cloud condensate as seen by radiation. This parameter controls the inhomogeneity of cloud condensate within vertically overlapping subgrid clouds, which is used to calculate cloud radiative fluxes. This parameter affects the cloud albedo calculated in the radiation module and thus alters atmospheric temperatures in AER-ATM, which can induce changes in precipitation rates and cloud amount. 2. C_R_Correl: Cloud-rain subgrid variability correlation coefficient. Different PDFs are used to represent the subgrid variability of cloud and rain, as detailed in Boutle et al. (2014). This parameter determines the degree of correlation between those PDFs, which is typically less than 1 due to differences in how turbulence evolves the PDFs, and the fact that rain can be falling through cloud from which it did not originate. The correlation determines the accretion rate of cloud droplets by rain droplets, which in turn affects the scavenging of aerosol, in-cloud aerosol concentrations, and cloud amount. 3. Niter_Bs: The number of cloud microphysics substeps. This discrete parameter controls the number of substeps over which the cloud microphysics is called within each model time step. Increasing the frequency of calls allows more accurate treatment of any nonlinear interactions. An example of this is the interaction of drizzle drops with the environmental air; with more frequent calculations there is more opportunity for evaporation of these drops. Although this is a discrete model parameter it is treated as continuous in the emulation process. 4. Ent_Fac_Dp: The entrainment amplitude scale factor. This convection scheme parameter controls the shape of the convective mass flux and the sensitivity of convection to relative humidity. This parameter affects convective depth, precipitation, cloud amount, and hence the spatial distribution of aerosols and clouds. Increased values lead to the reduction of convection depth and to some extent suppression of active precipitating convection. 5. Amdet_Fac: The mixing detrainment rate scale factor. This parameter controls the rate of humidification of the atmosphere and the shape of the convective heating profile. Increasing this parameter increases large-scale atmospheric humidity and affects temperatures and thus convective processes. 6. Dbsdtbs_Turb_0: The cloud erosion rate. This parameter defines the rate at which unresolved subgrid motions mix clear and cloudy air. Therefore, this parameter affects cloud liquid water content, the autoconversion of cloud droplets to rain drops and cloud amount. Increasing this parameter value tends to reduce the atmospheric lifetimes of aerosols and precursor gases. 7. Mparwtr: The maximum value of the function controlling convective parcel maximum condensate. In the model precipitation is triggered in convective parcels near the Earth's surface when the amount of moisture reaches a threshold set by this parameter. This parameter affects cloud amount and lifetime by altering convective precipitation. 8. Dec_Thres_Cld: The threshold for cloudy boundary layer decoupling. This parameter defines a threshold of buoyancy flux at which the boundary layer decouples from the rest of the atmosphere. It therefore affects the extent of boundary layer mixing, cloudiness, and ultimately the time available for in-cloud aerosol processing. 9. Fac_Qsat: The rate of change of convective parcel maximum condensate with altitude. The maximum amount of moisture a convective parcel can hold transitions from the threshold set by the parameter Mparwtr at the surface to a much smaller threshold at high altitudes. This parameter affects precipitation efficiency and cloud amount by altering the rate at which the moisture threshold decreases with altitude. 10. BC_RI: The imaginary part of the black carbon refractive index. This additional aerosol parameter controls the absorption of radiation as it passes through black carbon-containing aerosol particles. The real part of the refractive index was adjusted according to the imaginary part assuming a linear relationship between real and imaginary parts (Bond & Bergstrom, 2006). Therefore, this parameter affects the absorption, reemission, refraction, and reflection of incoming radiation by black carbon aerosols. 11. OC_RI: The imaginary part of the organic carbon refractive index. This parameter controls the absorption of radiation by organic carbon. The real part of the organic carbon refractive index is not perturbed. 12. Some of the AER parameters (BL_Nuc, Acc_Width, Ait_Width, Carb_FF_Ems, Carb_Res_Ems, Carb_FF_Diam, Carb_Res_Diam, Prim_SO 4 _Frac, Prim_SO 4 _Diam, and Dry_Dep_Ait) were omitted from the AER-ATM design during the expert elicitation process. Uncertainty in these parameters was neglected (by setting them to their median parameter values in Table S3) based on evidence indicating they are only important for aerosols and/or aerosol forcing in a small number of regions and/or seasons (e.g., Lee et al., 2013;Regayre et al., 2014, and one-at-a-time screening experiments using the current model setup). The subset of AER aerosol parameters included in AER-ATM were identified by experts as being the most important for achieving the research goals of AER-ATM (section 2.4.2).
Reducing the dimensionality of the ensemble means that the remaining parameter combinations more densely fill the multidimensional parameter space (with the same computational cost) and potentially provide better quality information for statistical model emulator construction (as described in sections 2.6 and 2.10). However, the effect of removing the parameters from the design cannot be known a priori. It is possible that the robustness of the statistical model emulation approach is sufficient to accommodate more parameters and a less dense coverage of the parameter space than we chose to implement here.

PPE Design
For each ensemble, Maximin Latin Hypercube sampling McKay et al., 1979) was used to create a parameter combination design, of k * N points, that spans the N-dimensional uncertain parameter space (where N = 26, k = 7 for AER and N = 27, k = 6 for AER-ATM). Experience with the GLOMAP model suggests that with six ensemble members per uncertain parameter (k) a design that sufficiently spans the Ndimensional parameter space (for emulation and sensitivity analysis purposes; section 2.10) can be constructed. Twenty-six members in AER-ATM ensemble failed to complete the 1-year simulation in at least one of three periods 1850, 1978, and 2008 because parameter combinations produced values that exceeded model performance thresholds. Therefore, k was increased to 7 for AER. A simulation with all parameters set to their median values (Tables S3 and S4) was additionally included in each ensemble to ensure coverage of an important region of parameter space. A further set of 2 * N simulations with parameter combinations that augment the original design were created to validate the emulators (Bastos & O'Hagan, 2009). In total (k + 2) * N + 1 full-year simulations (ensemble members) were created for each anthropogenic emission period. This equates to 235 simulations for AER and 217 for AER-ATM. Preliminary parameter combination screening tests revealed that combined high values of two physical atmosphere parameters (Ent_Fac_Dp > 1.8 and Amdet_Fac > 8.0) cause AER-ATM simulations to fail. This part of the 27dimensional parameter space (a corner of a 2-D plane) was removed from the AER-ATM design. Some AER-ATM ensemble members (usually with parameter values close to the omitted part of parameter space) also failed to complete a full simulation year for all of the 1850, 1978, and 2008 anthropogenic emission periods. Therefore, the AER-ATM ensemble is made up of the remaining 191 simulations. Additional simulations failed in the AER-ATM for 1998, which were performed after simulations for 1850, 1978, and 2008 periods had been completed and some analyses had been made for them. As a result, there are only 174 simulations for the 1998 period.

Simulations
For each ensemble, the effect of parameter perturbations was allowed to spin up before output was collected.
In AER, all simulations for 2008 and 1850 were started from a spun-up base job previously run for multiple years and branched at December of the previous year. Parameter perturbations were applied to all AER ensemble members including the median simulations and then these were run for a further 13 months. In AER-ATM the median simulation was run for 4 months before parameter perturbations were applied and then run for a further 15 months. The final 12 months of outputs for each emission period were used in each PPE to create the PPE data and analyzed in section 3.

Model Output Variables
Monthly, daily, and 3-hourly mean outputs were produced for AER and vertical profiles were produced using hourly means. For AER-ATM only monthly mean values were output (see Table S5). The data storage saving allowed the AER-ATM ensemble to be run for four anthropogenic emission years (1850, 1978, 1998, and 2008), whereas output from AER is available for only two anthropogenic emission years (1850 and 2008). The additional temporal resolution in AER is crucial for comparison to a large suite of observations.
Monthly mean outputs include around 100 global two-dimensional fields (e.g., AOD, top-of-atmosphere radiative fluxes, aerosol surface emission, and deposition fluxes) and around 350 global three-dimensional fields (e.g., aerosol number and mass mixing ratios by mode and component, aerosol optical properties,

Journal of Advances in Modeling Earth Systems
and aerosol microphysical processes such as nucleation and condensation fluxes). Daily mean outputs include around 10 two-dimensional and around 60 three-dimensional fields with global coverage (including AODs and mixing ratios). The same two-dimensional fields and surface values of three-dimensional fields were also output at the 3-hourly temporal resolution. Hourly vertical profiles of around 40 fields (such as aerosol mixing ratios and optical properties) were output at 24 locations coinciding with long-term aerosol observation sites. Daily, 3-hourly and hourly outputs are intended to be used to diagnose aerosol indirect effects that can readily be compared to observations. Because all AER simulations share the same meteorology, meteorological fields were only output in the median simulation. However, in AER-ATM temperatures evolve freely throughout the atmosphere, as do horizontal winds in the boundary layer. Therefore, multiple meteorological fields of interest were additionally output for all AER-ATM ensemble members.

Derived Fields
Some variables of scientific interest (such as aerosol mass concentrations and aerosol column loadings) need to be derived from model output. Major derived fields with their time resolutions and spatial domains are listed in Table S6. Model outputs such as aerosol number mixing ratio (ratio of particle number to number of air molecules) for all seven aerosol modes as well as mass mixing ratios of five aerosol components in each mode are used in the derivation of other important fields.
N3 and N50 are number concentrations of particles whose dry diameters are larger than 3 and 50 nm, respectively. These values were derived using the number mixing ratios within each lognormally distributed aerosol mode. Since the widths (geometric standard deviations) of the Aitken and accumulation modes are treated as uncertain parameters in AER (as AIT_WIDTH and ACC_WIDTH), mode widths used to derive N3 and N50 from each ensemble member needed to be varied accordingly.
Similarly, PM2.5 has been derived by calculating the mass fractions of dry particle size larger than 2.5 μm of all aerosol modes. Different widths of Aitken and accumulation modes were used to match the values used in the simulations.
CCN is number concentration of CCN at given altitude and supersaturation. In this study we use the cloud base height (assumed to be around 650-m elevation) and 0.2% supersaturation. We derive CCN using the concentration of soluble particles and mass mixing ratios of internally mixed aerosol components. We used the Köhler theory with hygroscopicity factors (κ) specific to each aerosol component. Since the hygroscopicity factor for OC (KAPPA_OC) was treated as an uncertain parameter in both ensembles, member-specific values of this parameter were used in the CCN derivation. The perturbed values of AIT_WIDTH and ACC_WIDTH were also used in the CCN derivation for AER.
The derivation of aerosol ERF and its components as well as the equivalent values without rapid atmospheric adjustments are described in sections 2.4.1 and 2.4.2.
The raw simulation outputs and derived fields from both ensembles comprise about 160 TB of data. See section 5 for details.

Emulation and Sensitivity Analysis
We use Gaussian process emulation (Lee et al., 2011;O'Hagan, 2006) to statistically represent model output across the N-dimensional parameter space bounded by the upper and lower limits of the uncertain model parameters. Here emulators were created for multiple fields in order to compare uncertainty in the two ensembles. Emulators were created for monthly and annual mean aerosol mass concentrations, AOD at 550 nm, CCN, as well as for 1850 to 2008 aerosol ERF and its components in AER-ATM and their equivalents without rapid atmospheric adjustments in AER. In each case the emulators were created using the k * N ensemble members then validated using the 2 * N additional model runs designed for this purpose (see section 2.6) ensuring that the emulator uncertainty around its mean was low compared with the parametric uncertainty (see Figure S5). All ensemble members were combined postvalidation to produce new emulators with better space-filling properties, under the assumption that emulators created using denser parameter spaces provide an equally good or better representation of the model output.
To compare the model sensitivities within these ensembles we use the Extended-FAST sampling method (Saltelli et al., 1999)

Journal of Advances in Modeling Earth Systems
decompose the variance into its components. We create emulators of every model output in individual model grid boxes. To make this approach tractable, the horizontal resolution has been reduced to N48 (2.5°in latitudes and 3.75°in longitudes). The use of reduced spatial resolution will not seriously affect the results of analyses of global or regional mean RF. It may, however, lessen the quality of data comparisons and parameter constraints using point measurements. In this case it is an option to extract and use simulation data at given location from the original resolution data. Figure 1 and Table 2 summarize the magnitude and uncertainty in global, annual mean 1850-2008 aerosol RF (i.e., direct effect plus cloud albedo effect; see section 2.4.1) in the AER PPE and the aerosol ERF (RF plus rapid adjustment) in the AER-ATM compared to equivalent ranges in Forster et al. (2007), Shindell et al. (2013), Myhre et al. (2013), and Boucher et al. (2013). We also show ERF ARI and ERF ACI as well as clear-sky components of RF and ERF ARI . As summarized in  (Table S3) cause as much additional uncertainty as the physical atmosphere parameters included in AER-ATM (Table S4).  (Tables S7-S9), optical properties of carbonaceous aerosols (perturbed in AER-ATM only), and rapid atmospheric adjustments (in response to the radiative effects of these absorbing aerosols). The contribution of clear sky to the all-sky forcing is clear-sky forcing multiplied by clear-sky fraction (1-cloud fraction). Therefore, in both PPEs, the clear-sky forcing components make smaller contributions to net aerosol forcing (RF and ERF) than the cloudy-sky components. The negative clear-sky ERF ARI is nearly canceled by accounting for the positive above-cloud absorption (Ghan, 2013) to make global average net ERF ARI only −0.03 W/m 2 . The uncertainty in clear-sky forcing is also much larger in AER (90% credible range of 0.75 W/m 2 ) than the ERF ARI-Clr uncertainty in AER-ATM (credible range is 0.38 W/m 2 for clear sky only). However, the small magnitude and uncertainty in global, annual mean clear-sky forcing is the result of regions of positive and negative ERF ARI canceling each other . At the regional level ERF ARI is likely to be climatically important, so the causes of important regional ERF ARI uncertainties need to be examined and understood.

Magnitude and Uncertainty in Global, Annual Mean Aerosol Radiative Forcing
The global, annual mean 1850-2008 aerosol forcing is stronger (more negative) in our models than the mean forcing assessments from the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP; Shindell et al., 2013), the Aerosol Comparisons between Observations and Models phase II (AeroCom II; Myhre et al., 2013) and the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5; Boucher et al., 2013; see Figure 1 and  Myhre et al., 2013) was produced by the modeling group (CAM5.1) who calculated ERF in the same method as we did here (Ghan et al., 2012). Their all-sky RF is nearly 0, which is in agreement with our result. In this method absorption by above-cloud aerosols is included in ERF ARI . Table 2 shows that the aerosol forcing uncertainty in our samples is comparable to the uncertainty from model intercomparisons. These results suggest that our approach has sufficiently sampled the uncertainty in a single model to characterize the model's range of behaviors.

Spatial Distributions 3.2.1. Aerosol Spatial Distributions
Monthly means and 90% credible intervals of CCN number concentrations calculated using AER ensemble member output at locations coinciding with observation stations (Schmale et al., 2018) are compared to observations in Figure 2. Stations are located across Europe, Asia, and Alaska and represent coastal, rural, alpine, and urban environments. These results show that the ensemble output overlaps the observed values and ranges at most of the stations. However, the observed and ensemble mean values differ markedly in some locations. For example, from winter to spring the ensemble mean CCN is much smaller than the  Table 2 where applicable. (a) Net ERF, (b) ERF due to aerosol-radiation interaction and clear-sky component of it, (c) ERF due to aerosol-cloud interaction, (d) net aerosol RF, and (e) aerosol direct radiative forcing and clearsky RF.

Journal of Advances in Modeling Earth Systems
observed values at many sites (e.g., Cabauw, Mace Head, and Melpitz). However, in Finokalia the ensemble mean CCN is much larger than the observed value especially in early summer. The lack of modelobservation agreement in these locations can be used to identify model structural deficiencies. For example, it may provide strong evidence that our factor 2 uncertainty in primary aerosol from FF sources is conservative in parts of Europe, although there are likely to be other compensating factors.  (Shindell et al., 2013) and satellite-assimilated simulations  especially when we account for the emulator standard deviation, which is about 0.1-0.15 over oceans and as high as 0.5 over China. A comparison with observations will enable implausible parts of the sampled parameter space to be identified, thereby helping to reduce the standard deviation of our AOD estimates and in forcing estimates .
The spatial distributions of CCN at cloud base height with the 2008 emissions are similar in the two ensembles. However, peak CCN concentrations over the largest anthropogenic emission sources (China and India) are higher in AER than in AER-ATM. Similarly, the mean AOD is higher near anthropogenic emission sources in AER. These are mainly due to the low anthropogenic SO 2 emission in AER-ATM (section 2.2) with possible contributions of other factors such as primary sulfate aerosol emission sizes and numbers perturbed only in AER PPE. However, aerosols are also advected further from anthropogenic sources over marine environments in AER-ATM. This is partly due to the longer atmospheric lifetime of sulfate aerosol in AER-ATM that arises from less effective condensation and coagulation (see Table S7-S9). Also, surface wind speeds and precipitation rates are noticeably different in the two PPEs over many marine areas. The wind speed differences largely arise from whether or not physical parameters are perturbed and whether or not rapid adjustment of boundary layer is allowed. Larger surface wind speeds in AER-ATM produce larger emission fluxes of sea spray aerosols including regions near anthropogenic emission sources ( Figure 5). Use of different years of ERA-Interim wind speed for nudging causes much smaller differences in emissions of sea salt and dust than different model setups in AER and AER-ATM (see Tables S7-S9). Higher aerosol concentrations reduce precipitation rates in AER-ATM, which includes aerosol feedbacks on clouds, radiation, and dynamics. Perturbations to the optical properties of carbonaceous aerosols additionally have the potential to produce nonlinear aerosol and cloud lifetime effects, and physical atmosphere parameters can additionally affect surface wind speeds. In summary, the means and standard deviations of CCN and AOD are higher over marine regions in AER-ATM because of higher surface wind speeds and aerosol  Myhre et al. (2013) 1850-2010 IPCC AR5; Boucher et al. (2013) 1750

Journal of Advances in Modeling Earth Systems
feedbacks, while continental CCN and AOD are higher in AER due to higher SO 2 emission and nonlinear responses to perturbations in sulfate aerosol emission size and number. Concentrations of natural aerosols are higher in AER-ATM in both PD and PI atmospheres, whereas anthropogenic aerosol concentrations are lower in AER-ATM in the PD. Both of these lead to weaker aerosol ERF in AER-ATM than aerosol RF in AER.

Forcing Spatial Distributions
Annual mean total forcing (RF and ERF) is more negative than −2 W/m 2 over large areas of the North Atlantic and Europe and −5 W/m 2 over large areas of East Asia and North Pacific in both ensembles. The regional peaks of aerosol RF and ERF are much stronger than the multimodel mean in Shindell et al. (2013, Figure 11), consistent with the comparisons of global means shown in Table 2. For example, negative forcing just exceeds −1 W/m 2 over east and south Asia, Europe, and eastern North America in the

10.1029/2019MS001628
Journal of Advances in Modeling Earth Systems multimodel mean, whereas negative RF and ERF here are up to −10 and −5 W/m 2 in the same regions. Over the North Atlantic and the North Pacific, our PPEs estimates stronger forcing than over these continental regions, while the multimodel mean RF is weaker over these ocean basins. Off the west coast of South America is one of the regions with strongest negative forcing in both of our PPEs, but the forcing in this region is not particularly strong in Shindell et al. (2013). The strength of the aerosol forcing in our PPEs compared to the multimodel mean suggests that our base model may be more sensitive to changes in aerosols than other global climate models.
The cleanest comparison between ensemble members can be made using the simulations where all parameters are set to their median values. However, there are multiple differences between these two "median" simulations: the degree of atmospheric nudging, the meteorological years used in the nudging process, the inclusion of rapid atmospheric adjustments, and the inclusion of nonzero imaginary indices for carbonaceous aerosols, on top of the difference in high-level SO 2 emissions. These two median model simulations show similar spatial patterns to the emulator mean distributions (not shown). Both CCN and AOD are consistently lower over most of the continents and higher over most of the oceans and the Antarctica in AER-

Journal of Advances in Modeling Earth Systems
ATM compared to AER. Lower SO 2 emission in AER-ATM tends to lead to longer atmospheric lifetime of anthropogenic aerosols through less efficient condensation of sulfate gas onto existing particles leading to slower particle growth and lower particle hygroscopicity. The meteorological difference caused by rapid atmospheric adjustments with contribution of the radiative properties of carbonaceous aerosols (which affect aerosol and cloud lifetimes; see sections 2.4 and 2.5 and Table 1) are additional factors contributing the different aerosol distributions in two PPEs. Ensemble mean aerosol distributions may also be affected by the skewness of the probability distribution functions of parameter values, nonlinearity in the model sensitivity to the parameter perturbations, and interactions between multiple parameters.
As we saw in section 3.1, the global, annual mean 1850-2008 aerosol RF in AER is stronger (−2.12 W/m 2 ) than the mean aerosol ERF in AER-ATM (−1.45 W/m 2 ). Inspection of the forcing maps (Figures 3e and  3f) shows that the AER aerosol RF is stronger than the AER-ATM aerosol ERF over anthropogenic source regions (in line with higher CCN concentrations and AOD) as well as over remote marine regions (such as oceans in the Southern Hemisphere, North Atlantic Ocean, and much of Africa and Siberia). This stronger remote aerosol RF in AER occurs despite the fact that anthropogenic aerosols are advected further from source in AER-ATM.

Journal of Advances in Modeling Earth Systems
Remote aerosol forcing is weaker in AER-ATM because that ensemble has higher (by 16%; see Tables S7 and  S8) natural aerosol (sea salt and dust) concentrations than AER. Figure 5 shows that sea salt loadings are higher in AER-ATM (especially over southern high-latitude regions). The spatial pattern of increased sea salt loadings roughly corresponds to the spatial pattern of weaker aerosol forcing in AER-ATM (Figures 5c and  5d). Surface DMS concentration is also higher in AER-ATM than in AER over most of the oceans in the Northern Hemisphere (not shown). Natural aerosol emissions are some of the most important causes of 1850-2008 aerosol forcing uncertainty in both AER and AER-ATM (see Figure 6 and Table 3). These results are consistent with the fact that high natural aerosol emissions in both 1850 and 2008 atmospheres weaken the aerosol-cloud-interaction component of forcing because of the highly nonlinear relationship between emissions and cloud albedo .
The forcing uncertainty is larger over anthropogenic emission regions in AER (where the mean forcing is also stronger) but larger in other part of the globe especially over remote marine regions in AER-ATM (Figures 4e and 4f). Both aerosol and physical atmosphere parameters are important causes of aerosol ERF uncertainty , and the combined effect of these uncertainties (which determine the magnitude of rapid atmospheric adjustments to aerosols) makes aerosol ERF more uncertain in marine regions, despite having weaker (less negative) global and regional mean forcing values.

Sources of Uncertainty in Aerosol Radiative Forcing
Here we analyze the importance of individual parameters as causes of uncertainty in aerosol RF in the two ensembles by using variance decomposition (Lee et al., 2011). Parameters that cause a similar magnitude of uncertainty in both ensembles can be considered to have effects on the model that are insensitive to the inclusion of rapid atmospheric adjustments and a dynamically responsive boundary layer, which were included in AER-ATM but not AER.

Journal of Advances in Modeling Earth Systems
The main causes of aerosol ERF uncertainty in AER-ATM were analyzed in Regayre et al. (2018). In that study emissions of natural (Sea_Spray and DMS) and anthropogenic (Anth_SO2) aerosols, cloud process (Sig_W) and radiation (Rad_Mcica_Sigma) parameters were found to cause around 70% of total ERF variance. However, the percentage of uncertainty depends on total uncertainty in each PPE and which other parameters were perturbed. In Figure 6 and Table 3 the absolute uncertainties caused by individual parameters are summarized as standard deviations of annual mean aerosol RF and ERF in the AER and AER-ATM, respectively. The causes of global and European mean variance were calculated by creating emulators for global and regional averages (taken from individual ensemble members) and then sampling from the emulators and performing sensitivity analyses as outlined in section 2.10.
Many parameters included in both ensembles (circles in Figure 6a) cause larger magnitudes of uncertainty in global, annual mean aerosol forcing in AER than AER-ATM. For example, anthropogenic emission parameter (Anth_SO2) and the updraft velocity parameter (Sig_W) are among the most important parameters causing forcing uncertainty in both ensembles but cause more than twice as much uncertainty in forcing in AER than in AER-ATM despite being perturbed within the same uncertainty ranges. In contrast, the sea spray emission parameter (Sea_Spray) causes around half as much uncertainty in forcing in AER as it does in AER-ATM. Sea spray emissions cause more forcing uncertainty in AER-ATM through the higher aerosol loading as well as their effect on rapid adjustments and interaction with physical atmosphere parameters, which affect the properties of low-level clouds in marine environments. Among parameters causing smaller forcing uncertainties, cloud droplet acidity (Cloud_pH) and organic carbon hygroscopicity (Kappa_OC) cause many times larger uncertainty, whereas accumulation mode dry deposition (Dry_Dep_Acc) causes many times smaller uncertainty in AER-ATM than in AER. Causes of these differences include the lower high-level anthropogenic SO 2 emissions within AER-ATM and higher natural aerosols in AER-ATM described in section 3.2.2. Furthermore, the emission size and amount of primary sulfate aerosols are perturbed in AER. These effects are likely to be combined to make anthropogenic emissions a more important source of uncertainty in AER. Other parameters such as DMS and the diameter of emitted BB particles (Carb_BB_Diam) cause a similar amount of uncertainty in the two ensembles. This suggests that the aerosol forcing uncertainty to these parameter perturbations is relatively insensitive to rapid atmospheric adjustments, meteorological factors, and other differences in ensemble setup (sections 2.4.1 and 2.4.2).
Aerosol parameters included in AER but not in AER-ATM are relatively unimportant for global, annual mean aerosol forcing. However, at least two important causes of forcing uncertainty (BL_Nuc and Ait_Width) were neglected in AER-ATM. These results suggest the global, annual mean aerosol ERF uncertainty would be larger if all uncertain aerosol parameters were included in AER-ATM. However, interactions between the parameters and dependencies on model setup (e.g., including rapid adjustments) mean that an increase in uncertainty is not assured. The importance of aerosol parameters included in AER but not in AER-ATM is more pronounced for European aerosol RF than in the global mean case. Because aerosols have a relatively short atmospheric lifetime, aerosol parameters have a highly region-dependent influence on aerosol RF uncertainty (Regayre et al., 2015). Over Europe parameters that alter the size and emission flux of primary sulfate particles (Prim_SO4_Diam and Prim_SO4_Frac) are the largest causes of aerosol RF uncertainty. In response to the constant primary sulfate emissions in AER-ATM (the associated parameters are held constant), parameters related to aerosol removal (such as Dry_Dep_Acc) and production (Cloud_pH) processes cause more aerosol ERF uncertainty.
Physical atmosphere parameters were not perturbed in AER because their effects would be suppressed by the model setup (section 2.4.1). However, Figure 6 and Table 3 show that physical atmosphere parameters are among the largest causes of aerosol ERF uncertainty. Indeed, the cloud condensate parameter (Rad_Mcica_Sigma) is the largest cause of aerosol ERF uncertainty in AER-ATM. Regayre et al. (2018) showed that aerosols collectively cause more of the global mean aerosol ERF uncertainty than physical atmosphere parameters, and in fact Figure 6 and Table 3 show that physical atmosphere parameters are less important for aerosol ERF over Europe. Although we have previously shown that the uncertainty in European aerosol ERF could be reduced by as much as 30% using multiple observations , the underlying uncertainty would be larger if additional aerosol parameters were included in AER-ATM.
In summary, neither ensemble comprehensively samples model uncertainty. Because physical atmosphere parameters were perturbed in AER-ATM and rapid atmospheric adjustments were included within that ensemble, it can be used to inform our understanding of which processes have the greatest influence on aerosol ERF. However, because AER contains additional aerosol parameters that are important causes of uncertainty, a more detailed understanding of the effect of aerosol processes on aerosol RF can be gained by analyzing output from this ensemble.

Summary and Discussions
We have presented two PPEs of the HadGEM atmosphere-only global climate model coupled to the UKCA chemistry and aerosol model. These ensembles were designed to be complementary and address different aspects of the aerosol RF uncertainty reduction challenge. The AER (aerosol) ensemble was designed to be compared extensively to atmospheric measurements mainly on high time resolutions. Model internal variability was completely suppressed in this ensemble by prescribing wind and temperature fields and neglecting aerosol radiative feedbacks on meteorology, that is, the model was run essentially as a chemical transport model as in previous studies (Carslaw, Lee, Reddington, Mann, et al., 2013;Lee et al., 2011Lee et al., , 2012Lee et al., , 2013Regayre et al., 2014Regayre et al., , 2015. As such, the effects of perturbing 26 aerosol emission, microphysical and deposition parameters on aerosol concentrations, and aerosol RF can be quantified in the absence of additional uncertainties from meteorological adjustments. The AER-ATM (aerosol/atmosphere) ensemble was designed to quantify the relative importance of 27 aerosol and physical atmosphere parameters for annual and monthly mean aerosol ERF (including rapid atmospheric adjustments) at the global and regional level. Therefore, in this ensemble the boundary layer (containing low-level clouds important for aerosol RF) was free to respond to changes caused by parameter perturbations.
Uncertainty in output from both ensembles can potentially be constrained by observations Regayre et al., 2018). Here we performed a preliminary comparison of model-derived CCN to observe values at regionally representative stations (Schmale et al., 2018). We find that the modeled CCN uncertainty (in large samples of model variants based on statistical emulators) spans the observed CCN at most locations. However, at some stations, for some seasons we note a lack of model-observation agreement, which could be used to identify areas for model structural development.
The global mean 1850-2008 aerosol RFs are −2.12 (credible interval −2.76, −1.47) W/m 2 (RF) based on the AER PPE and −1.45 (−2.07, −0.81) W/m 2 (ERF) based on the AER-ATM PPE. There are several factors that contribute to the weaker forcing in AER-ATM. First, more natural aerosols (sea spray, DMS, and dust) are typically emitted in AER-ATM than in AER because of their dependence on surface wind speeds that are larger over most marine environments in AER-ATM, largely due to perturbed physical atmosphere parameters and rapid adjustments in the boundary layer. Larger natural aerosol emissions in both 1850 and 2008 reduce the magnitude of the aerosol-cloud interaction . Second, uncertainty in the radiative properties of carbonaceous aerosols are included in AER-ATM. Rapid atmospheric adjustments to carbonaceous aerosols produce regions of positive and negative forcing that cancel and cause a near-zero global mean aerosol-radiation interaction component of forcing across the ensemble (Figure 1, see also Regayre et al., 2018). Third, anthropogenic emissions from industrial and energy sectors were too low in AER-ATM simulations. This weakened the aerosol forcing over anthropogenic sources in AER-ATM.
The global, annual mean aerosol forcings are stronger in both of our ensembles than in model intercomparison studies Myhre et al., 2013;Shindell et al., 2013). The mean and uncertainty of forcing in our samples will change substantially by identifying all model variants (parts of parameter space) that are plausible compared to aerosol, cloud, and radiation observations; a task begun in Regayre et al. (2018) and Johnson et al. (2018). We are also aware that we do not include some potentially important marine sources of organic aerosols (Gantt et al., 2015) and additional sources of preindustrial fire emissions (Hamilton et al., 2018) which, if included, would increase preindustrial aerosol concentrations and weaken the aerosol forcing . The autoconversion rate (the rate at which cloud droplets convert into rainfall) is also a potentially important parameter that we did not perturb in our ensembles. The relationship between the autoconversion rate and the droplet number (which is controlled by the updraft-driven CCN activation), for example, the exponent in the Khairoutdinov and Kogan (2000) parameterization, would be another interesting parameter to perturb in future studies.
The credible aerosol forcing ranges in our ensembles are larger than or comparable to those obtained from model intercomparison projects ACCMIP (Shindell et al., 2013), AeroCom II , and IPCC AR5 ; Figure 1 and Table 2), even though inclusion of observationally implausible model variants somewhat contributes to additional uncertainty. Multimodel intercomparison studies are useful to identity structural issues in the representation of aerosol cloud interactions in global models that PPEs cannot address. However, the large uncertainty within a single model suggests that PPEs can be used to identify the key processes causing model uncertainty and differences among multimodel ensemble members. Additionally, the large parametric uncertainty sampled here suggests a need for caution when interpreting conclusions drawn from multimodel intercomparison studies that represent unknown combination of structural and parametric uncertainties from small size of ensembles (Carslaw et al., 2018). Both structural and parametric uncertainties need to be understood and interpreted in order to properly develop a strategy for reducing the uncertainty and improve confidence in climate projections.
Regionally, uncertainty in aerosol forcing is larger over most marine regions in AER-ATM (Figure 4) with exceptions of regions near anthropogenic sources. However, global, annual mean aerosol forcing uncertainties are roughly the same in two PPEs (Figure 1 and Table 2), as a result of the cancelation between regional effects of opposite signs in AER-ATM. These results suggest that uncertainty in the spatial redistribution of aerosols between regions is important for understanding the climate effects of aerosols at the regional level. We also neglected several aerosol parameters (related to the size and number of primary sulfate emissions) in AER-ATM that are important causes of global and regional uncertainties of aerosol RF in AER ( Figure 6 and Table 3). The regional aerosol ERF uncertainty in AER-ATM would be even larger if these additional aerosol parameters were included.
The AER and AER-ATM ensembles were designed to help understand and reduce the persistent uncertainty in aerosol RF. These ensembles are the first step toward this goal because they densely sample the uncertainty in our model. Our aim now is to challenge the model uncertainty using multiple observations related to key processes driving model uncertainty (e.g., Johnson et al., 2018;Regayre et al., 2018). Comparison of the ensemble output to observations will also help to identify model structural deficiencies (e.g., McNeall et al., 2016) and prioritize areas of model development. Because we have sampled the full parametric uncertainty range, to our best knowledge, we can be more confident in the attribution of biases to model structural error than if we had only a handful of sensitivity tests to rely on.
Once observationally constrained, these ensembles will provide multiple opportunities for analysis in other aspects of climate research. For example, the constrained parameter space and sensitivity analysis results will inform field campaigns by highlighting the most useful new measurements for reducing the remaining forcing uncertainty. Additionally, observationally constrained aerosol ERF over, say, the North Pacific in recent decades, would provide values to feed into simulations designed to examine the full dynamical response to those regional forcings. Finally, understanding the causes of uncertainty in AER and AER-ATM will inform the development of climate models in general because other climate models will have parametrizations that correspond to the processes that cause uncertainty in our ensembles.

Data Availability Statement
Raw simulation output data in both AER and AER-ATM ensembles are available as Met Office postprocessing data format (.pp; Met Office, 2013) from the JASMIN data infrastructure (http://www.jasmin.ac.uk). Some of the climate-relevant fields are derived and stored in netCDF files (.nc) containing data for all ensemble members and made available as a community research tool. Variables contained in raw outputs and derived data as well as their spatial domains and temporal resolutions are summarized in Tables S5 and S6. The total data sizes are around 85 TB for AER and 75 TB for the AER-ATM.