Energy Budget Constraints on the Time History of Aerosol Forcing and Climate Sensitivity

An observationally constrained time series of historical aerosol effective radiative forcing (ERF) from 1750 to 2019 is developed in this study. We find that the time history of aerosol ERFs diagnosed in CMIP6 models exhibits considerable variation and explore how the time history of aerosol forcing influences the probability distributions of present‐day aerosol forcing and emergent metrics such as climate sensitivity. Using a simple energy balance model, trained on CMIP6 climate models and constrained by observed near‐surface warming and ocean heat uptake, we derive estimates for the historical aerosol forcing. We find 2005–2014 mean aerosol ERF to be −1.1 (−1.8 to −0.5) W m−2 relative to 1750. Assuming recently published historical emissions from fossil fuel and industrial sectors and biomass burning emissions from SSP2‐4.5, aerosol ERF in 2019 is −0.9 (−1.5 to −0.4) W m−2. There is a modest recovery in aerosol forcing (+0.025 W m−2 decade−1) between 1980 and 2014. This analysis also gives a 5%–95% range of equilibrium climate sensitivity of 1.8°C –5.1°C (best estimate 3.1°C) with a transient climate response of 1.2°C –2.6°C (best estimate 1.8°C).

Given its large uncertainty, aerosol forcing has remained an active research area. Several studies have quantified the aerosol ERF in the present day relative to pre-industrial based on observations, models, energy balance arguments, or a combination of approaches (Andrews & Forster, 2020;Boucher et al., 2013;Fiedler et al., 2019;Forest, 2018;Forest et al., 2002Forest et al., , 2006Myhre, Shindell, et al., 2013b;Skeie et al., 2018;Smith et al., 2020;Zelinka et al., 2014). Fewer studies have attempted to diagnose a time series of historical aerosol forcing. Murphy et al. (2009) used observations of ocean heat uptake, surface temperature, and radiative forcing of long-lived greenhouse gases and volcanic eruptions since 1950 to determine the residual forcing, which was mostly attributed to aerosols. Skeie et al. (2011) and Lund et al. (2018) used chemistry-transport models with prescribed meteorology, evaluated at frequent time slice years since pre-industrial, to determine historical aerosol forcing since 1750. Shindell et al. (2013Shindell et al. ( ) used timeslices from 1850Shindell et al. ( , 1930Shindell et al. ( , 1980Shindell et al. ( , and 2000 in full-complexity climate models to estimate the historical aerosol forcing. The most complete historical aerosol forcing time series, for 1750-2011, is given in Annex II (Prather et al., 2013) of the Intergovernmental Panel on Climate Change Working Group 1 Fifth Assessment Report (AR5), which takes in multiple lines of evidence including the Skeie et al. (2011) and Shindell et al. (2013) modeling studies.
Our goal is define an aerosol ERF time series from 1750 to 2019 that is consistent with energy balance constraints from observations; effectively to provide an update to AR5 Annex II that takes into account more recent evidence. Here we take a combination of the climate modeling and energy balance approaches. Under the Radiative Forcing (RFMIP) and Aerosol Chemistry (AerChemMIP) Model Intercomparison Projects, historically time-varying aerosol forcing can be diagnosed directly from CMIP6 models. However, in the multi-model mean, CMIP6 model simulations of global-mean surface air temperature (GSAT) are cooler than observations throughout the latter part of the 20th Century before recovering to near-present levels of warming today (Flynn & Mauritsen, 2020). One hypothesis is that aerosol forcing in the 20th Century may be too strong in some CMIP6 models, coupled with high transient climate response (TCR) that causes implausibly rapid recent warming in many models (Tokarska et al., 2020). Nevertheless, CMIP6 models remain an important line of evidence in determining historical aerosol forcing. Unlike for greenhouse gases, proxy records for aerosol forcing are sparse before widespread surface radiation measurements became available in the 1950s Moseid et al., 2020). No global observations of aerosols were available until the satellite era (late 1970s), whereas CMIP6 models provide aerosol forcing estimates from 1850. Therefore, we use CMIP6 model forcing over the industrial era to inform our estimates of historical aerosol ERF, and "correct" for these responses by constraining the forcing estimates to observations of GSAT and Earth energy uptake (EEU).

Methods and Data
This section describes how the historical ERF time series are generated and how observational constraints are used with a simple energy-balance climate model to produce the best estimate and range of historical aerosol forcing estimates. A number of historical aerosol forcing time series are investigated. The primary focus of this study is an ensemble of time series generated from a simple relationship of global annual emissions to global annual historical aerosol ERF, using historical aerosol emissions time series and tuning this relationship (which we call a "forcing emulator") on CMIP6 models where historical aerosol forcing estimates exist. Following the observational constraining, we refer to this time series as "CMIP6-constrained. " We also investigate replacing the CMIP6 emissions time series with scaled estimates of historical aerosol forcing from 11 CMIP6 models and one chemistry-transport model. This provides a total of 13 different historical aerosol forcing scenarios.
Throughout this paper, a probabilistic approach is taken, sampling 100,000 historical forcing timeseries per scenario with the same number of simple climate model configurations. Uncertainties in the non-aerosol components of historical forcing (greenhouse gases, land-use change, black carbon (BC) deposition on snow, aviation contrail and contrail cirrus, and natural forcings) are also taken into account. The resulting SMITH ET AL. GSAT and EEU time series from each ensemble member is compared to the observational constraints and a weighted posterior distribution produced of historical aerosol ERF.
In all cases, aerosol ERF is diagnosed as the top-of-atmosphere radiation flux difference between parallel climate model experiments, one with time-varying aerosols, and one with pre-industrial aerosols (Table 2). In the RFMIP models, SSTs, sea ice, and non-aerosol forcings are given as a pre-industrial climatology in both the transient (CMIP6 name piClim-histaer) and control (piClim-control) experiments, with aerosols following the 1850-2014 (or 2100, in models running the SSP2-4.5 extension) emissions from CMIP6 in the transient run (Hoesly et al., 2018;van Marle et al., 2017). The pre-industrial control is a 30 yr climatology. In AerChemMIP models, we use historical (1850-2014) SSTs, sea ice and forcings (histSST) as the perturbation experiment, and historical SSTs, sea-ice, and non-aerosol forcings with pre-industrial aerosols (histSST-pi-Aer) as the control. E3SM also followed this method (described in Golaz et al., 2019). Figure 1 shows the aerosol ERF with respect to 1850 from CMIP6 models. Most models show a peak in negative aerosol forcing at some point between 1975 and 2010 before recovering in recent years.

Separation of Aerosol Components
For each CMIP6 model, the shortwave (SW) aerosol-radiation and aerosol-cloud interaction components of the ERF (ERFari SW and ERFaci SW ) are determined using the Approximate Partial Radiative Perturbation (APRP) method (Taylor et al., 2007;Zelinka et al., 2014). The LW ERF from aerosol-cloud interactions (ERFaci LW ) was determined using the difference between all-sky and clear-sky forcing (difference in cloud radiative effect) with the LW ERF from aerosol-radiation interactions (ERFari LW ) estimated as the difference between ERF LW and ERFaci LW . The APRP is not exact and a small residual term arises that varies over time and by model ( Figure S1), some of which is related to a small surface albedo adjustment (Ghan, 2013), but only the time-varying shapes and relative magnitudes of ERFari and ERFaci to each other are important for this decomposition.
For the RFMIP models, we calculate the APRP using the difference of each year of the piClim-histaer run against every year of the piClim-control run before averaging across the 30 piClim-control years to determine the ERFari and ERFaci from each year of 1850-2014 (or 2100). This method removes some nonlinearities SMITH ET AL.     preted as the radiative efficiency of emission of each aerosol species. Strictly, Equations 1 and 2 represent radiative effects rather than radiative forcings. Radiative forcings are given by the difference of radiative effects in Equations 1 and 2 calculated between the emissions in the year of interest and the pre-industrial year (either 1850 or 1750).
Equation 1 follows from studies showing that ERFari scales linearly with emissions (Johnson et al., 2019;Lund et al., 2018;Mahajan et al., 2013;Rap et al., 2013). Equation 2 is an extension of the simple relationship of Stevens (2015) and is based on the understanding that the change in cloud albedo is logarithmic with sulphate burden, and that burden scales with emissions Charlson et al., 1992). Including carbonaceous aerosol in Equation 2 represented by the sum of BC and OC emissions is useful as some CMIP6 models include the effects of BC and/or OC on the change in cloud condensation nuclei. The resulting aerosol-cloud interactions can be substantial, for example, a negative ERFaci to BC emissions in the MIROC6 model (Thornhill et al., 2021). Equation 2 is found to give a good heuristic approximation of global-mean ERFaci to a more sophisticated aerosol indirect effect model  as shown in . The sum of BC and OC emissions is used following the original Ghan et al. (2013) aerosol indirect model which considers primary anthropogenic emissions to be BC + OC, and to limit the number of free parameters in Equation 2 to three.
parameters in Equations 1 and 2 are estimated using a least squares curve fit of each CMIP6 model's ERFari and ERFaci (Table 3). A multi-model mean emulation is performed where the ERFari and ERFaci from the 11 models is averaged and Equations 1 and 2 applied. The multi-model mean  coefficients (−2.5, +28.5, and −8.5 mW m −2 (Tg yr −1 ) −1 for SO 2 , BC, and OC respectively) are of similar magnitudes to the radiative efficiencies from Aerocom models (Myhre, Samset, et al., 2013) for SO 2 and BC, and a little stronger for OC here. The radiative efficiency coefficients are negative for BC and positive for OC in the IPSL-CM6A-LR model. However, using coefficients derived from the AerChem-MIP single-forcing experiments for BC, OC, and SO 2 in IPSL-CM6A-LR-INCA gives a much less good fit to the historical aerosol forcing in IPSL-CM6A-LR than our fitted coefficients, and while the fitted coefficients may not be representative of the true physical behavior in this model, allowing these values as part of the prior sampling allows for a larger diversity of aerosol forcing time series. We do not attribute a nitrate forcing to avoid overfitting and because most models do not include the effects of nitrogen compounds on aerosol formation. In reality, nitrate formation may compete with sulphate formation for available ammonium. Model evidence suggests this is of limited importance historically (Thornhill et al., 2021), but may become more important in future scenarios where the nitrate to sulphate emissions ratio is projected to increase (Bellouin et al., 2011;Hauglustaine et al., 2014 The degree of logarithmic behavior that ERFaci exhibits to emissions differ considerably between GCMs (Wilcox et al., 2015), and the possibility that ERFaci may be linear with emissions in some CMIP5 models was discussed in Booth et al. (2018). Figure 2 shows the emulated fits to each CMIP6 model from Equations 1 and 2 as colored curves with the APRP-derived forcing from the GCMs in gray using an 1850 reference. The gray curves in Figure 2 show that most models show a peak in ERFari that is weakening in recent years, whereas ERFaci is approximately constant up to 2014 or shows a slower weakening trend. It can be seen that the emulated relationships (colored curves) give good representations of each component of the aerosol forcing in each model. We extrapolate these CMIP6 model-specific emulations back to 1750 in each model, resulting in a small positive forcing in 1750 relative to 1850. Where models do not provide an SSP2-4.5 future projection, we also extend these time series forward using Equations 1 and 2. Finally, we re-base all emulated time series to a 1750 reference (Figure 3), where the impacts of the different shapes for historical aerosol forcing due to different parameter combinations are more clearly seen. One notable feature in all time series is an increased forcing between 2014 and 2015 in the emulated curves owing to a 16% reduction in global SO 2 emissions from the CMIP6 historical to SSP2-4.5 data sets over 1 year.

Ensemble Generation
To simulate time-varying aerosol forcing we take probabilistic ensembles for both the magnitude and the shape of the historical aerosol forcing. To generate historical shapes, we take 100,000 samples of  SO 2 ,  BC ,  OC , SO 2 s , and based on their distributions from the 11 participating GCMs (Table 3). A joint kernel-density estimate of the  coefficients is used to derive a distribution which is then sampled from for ERFari (Figures 4a-4c).
Accounting for correlation between the coefficients maintains the connection that different aerosol species are often co-emitted. For ERFaci, we take    Table 3   of these parameters span several orders of magnitude.  is not a degree of freedom in this setup because the resultant ERFaci time series will be scaled as described below.
We combine the sampled  and s coefficients with 100,000 samples of the absolute values of the 1850 to 2005-2015 ERFari and ERFaci from the process-based assessment in Figure 8 of ; (hereafter the "Ringberg assessment"), using the distributions that do not account for energy budget constraints. We run the emissions emulator in Equations 1 and 2 using a update of the historical CEDS emissions to 2019 (O'Rourke et al., 2020) for energy and industrial sectors and BB4CMIP (for biomass burning) under a historical + SSP2-4.5 assumption (the emissions are much less sensitive to the choice of scenario for biomass burning than for energy and industry). Critically, the updated CEDS emissions account for phenomena such as an earlier and more gradual reduction in SO 2 emissions from China than were in the original CMIP6 data set (Paulot et al., 2018), as well as other corrections, and avoids arbitrarily choosing a scenario for post-2014 emissions. Differences in these time series are plotted in Figure S2.
This process rescales the  coefficients and selects  in each ensemble member consistent with the present-day ERFaci. These 100,000 sampled time series are then rebased to a 1750 baseline by subtracting the 1750 forcing from the 1850 forcing, producing 100,000 candidate historical time series of aerosol forcing for the period 1750-2019 that differ in shape and magnitude ( Figure 5).

Scaled Model Forcing
Alongside the emissions-based aerosol forcing time series, we repeat the analyses using scaled historical ERFari and ERFaci from each CMIP6 model, where the shapes of the historical forcing derived from the APRP method ( Figure 2) in each model are fixed, but the pre-industrial to present-day magnitudes are allowed to vary. We also use RFari and RFaci time series generated from the Oslo-CTM3 chemistry transport model (Lund et al., 2018(Lund et al., , 2019

Non-Aerosol Forcing Time Series
The non-aerosol component forcings are also generated from a 100,000-member Monte Carlo ensemble.
As they are either less uncertain or smaller in magnitude (or both) than the aerosol forcing they are not the main focus of this paper but are generated to provide a consistent view of the total ERF, including uncertainty estimates, so that energy budget constraints can be applied. Detailed information is provided in Text S1 with a summary of the key data sources in Table 4. SMITH ET AL.
10.1029/2020JD033622 9 of 20 . The grayscale 2D hexbin histogram of points represents a density of the 100,000 drawn sample sets and the colored points are fits to each CMIP6 model.

Simple Climate Model
We use our 100,000 member ensemble of aerosol and non-aerosol forcings and run them in a two-layer energy balance model, including efficacy of deep-ocean heat uptake Held et al., 2010). To perform this many simulations precludes the use of a comprehensive GCM and similar constrained Monte Carlo ensemble methods using reduced-complexity models have been done previously (Meinshausen et al., 2009;. The structural uncertainty associated with the choice of simple climate model has been found to have limited impact on global mean temperature projections in historical simulations . We choose to use the two-layer model due to its computational efficiency, inclusion of both EEU and GSAT, and its proven ability as a useful emulator of complex GCMs (Palmer et al., 2018). We use the formulation from the two Geoffroy et al. papers, hereafter G13a and G13b. The two-layer model can be written as SMITH ET AL.

10.1029/2020JD033622
10 of 20 where mix C and deep C (W yr m −2 K −1 ) are the heat capacities of the ocean mixed layer and deep ocean, mix T and deep T (K) are the respective layer temperature anomalies,  (W m −2 K −1 ) is the climate feedback parameter (using the convention that negative values indicate stabilizing feedbacks), F (W m −2 ) is the ERF,  (dimensionless) is the efficacy of deep ocean heat uptake and  (W m −2 K −1 ) the heat transport between the two layers. The relatively small heat capacity of the land surface and atmosphere compared to the ocean means that it can be neglected and the GSAT anomaly is given by mix T . All available 44 CMIP6 models that published both abrupt-4xCO 2 and piControl simulations to the Earth System Grid Federation as of July 2, 2020 were used to tune the two-layer model using the method set out in G13a and G13b. Models that were available on different resolutions (e.g., NorESM2-LM and NorESM2-MM), and physical and Earth system models from the same group (e.g., CNRM-CM6-1 and CNRM-ESM2-1) were treated as separate models, but different physics versions of the same model were not (e.g., r1i1p1f1 and r1i1p3f1 from GISS-E2-1-G). The two-layer model is tuned to the GSAT and TOA radiation imbalance of each CMIP6 model's abrupt-4xCO 2 run. There are five free parameters in the G13b model:  ,  , , mix C , and deep C . Radiative forcing from a quadrupling of CO 2 (  4 F ) is also calibrated using this method with the abrupt-4xCO2 experiments. Table S1 sets out the parameters for each model and Figure S3 shows the temperature evolution in CMIP6 models for the model output and the simulated two-layer model fits for abrupt-4xCO2. The equilibrium climate sensitivity (ECS) is an emergent parameter from this model and is calculated as . The inclusion of the efficacy of ocean heat uptake in the two-layer model leads to different and often larger estimates of climate sensitivity than from a "Gregory regression" of TOA energy imbalance against global mean temperature anomaly for the first 150 yr of abrupt-4xCO 2 (the so-called "effective" climate sensitivity, EffCS). The strengthening of climate feedbacks over time as temperatures approach equilibrium in abrupt-4xCO 2 experiments results in the long-term ECS being in the region of 10%-30% larger than EffCS (Rugenstein et al., 2020). The version of the two-layer model that includes efficacy of ocean heat uptake captures this effect somewhat, and we cautiously refer to our derived climate sensitivity as ECS for this reason.
A joint kernel density estimate distribution of the six input parameters to the two-layer model are sampled based on the values resulting from the 44 CMIP6 model tunings (marginal distributions for each parameter are shown in Figure S4). The joint distribution allows for correlation between model parameters to be taken into account when sampling (Table S2).
Internal variability in GSAT is simulated by analyzing the piControl run in all available models (49 as of July 2, 2020). Global mean temperatures from the piControl simulations are detrended to remove any residual drift with the residuals around the long term trend taken to be the internal variability. The autocovariance matrix of these residuals in each model is constructed by regressing the time series with itself at lag 1 yr, lag 2 yr, …, lag n years where n is the length of the model's piControl simulation. This input is then used as the covariance of a multivariate Gaussian distribution that governs the time-dependent internal variability of temperatures in each model. For each ensemble member simulated, one of the 49 CMIP6 models is selected at random, and 270 yr (1750-2019) of internal variability generated based on the underlying distribution in the selected GCM. This allows for a more realistic construction of internal variability than a memoryless process noting that in reality events such as ENSO tend to cluster warm and cool years, and also provides for the recreation of low-frequency long-term internal variability that is present in some models such as CNRM-ESM2-1 ( Figure S5).

Observational Constraints
We use GSAT estimates from 1850 to 2019 and total Earth system energy uptake observations from 1971 to 2018 to constrain our simulations. To estimate GSAT we use the Cowtan & Way (2014) data set of infilled global-mean surface temperatures (GMSTs; near-surface temperatures over land and sea-ice and SSTs over SMITH ET AL.

10.1029/2020JD033622
11 of 20 open ocean) for 1850-2019, multiplied by a time-varying ratio of GSAT/GMST from CMIP5 models under the historical and RCP8.5 pathways from 1861 to 2014 (Richardson et al., 2016). This ratio converges toward 1.08 for the recent past  and we extend this ratio forward to 2019. Both observations and model output are rebased to the 1850-1900 mean as a proxy for pre-industrial following the IPCC Special Report on 1.5°C. This results in a central estimate of GSAT warming of 1.10°C for 2010-2019 relative to 1850-1900 with a warming trend of 0.30°C per decade since 2010.
For observations of total EEU, we use data from the Global Climate Observing System (GCOS) observational data set that includes estimates from ocean heat uptake, cryosphere, atmosphere, and land surface (Von Schuckmann et al., 2020). The ocean has absorbed 89% of the total energy uptake in the Earth system since 1960 owing to its larger thermal capacity compared to other components of the Earth system. The two-layer model only tracks heat uptake into the ocean, but to be physically consistent with the total observed EEU we assume that the heat uptake of the atmosphere, cryosphere, and land is taken into account in the heat uptake of the mixed layer of the ocean. The GCOS data set extends back to 1960, but we focus on the period from 1971 to 2018 due to the limited coverage of deep ocean temperature observations before the advent of expendable bathythermographs (XBTs) in the late 1960s (Palmer, 2017). Our constraint is based on the agreement of EEU calculated in the two-layer model with the total EEU from 1971 to 2018 in GCOS of 358 ± 37 ZJ (1 s.d.).
Following the running of the two-layer model, each of the 100,000 ensemble members is assigned a weight i w based on how well it reproduces GSAT and EEU observations. The weighting is based on the model weighting technique of Knutti et al. (2017): where , X i r is a measure of how well the model reproduces observations for variable X for ensemble member i, and  ,D X is the "radius of model quality" (Sanderson et al., 2015). The radius of model quality is a subjective choice and for both GSAT and EEU we use assessed uncertainties. For GSAT, GSAT,i r represents the root-mean-square error of each ensemble member's simulated temperature compared to observations and  GSAT,D = 0.12°C based on the "likely" (>66%) range of GMST from the 1850-1900 period to 2006-2015 in IPCC Special Report on Global Warming of 1.5°C. For EEU, we use EEU,i r as the difference in EEU between the model ensemble member and the GCOS best estimate of 358 ZJ  and use the GCOS uncertainty of  EEU,D = 37 ZJ. Unlike in Knutti et al. (2017) we do not downweight similar ensemble members.
Following calculation of each i w , the ensemble weights are normalized such that   1 i w . Although subjective, our choices for the goodness-of-fit to the constraints ensure that GSAT and EEU have approximately equal influence on the total weighting given to each ensemble member ( Figure S6). An ensemble member will receive a high weight if it is close to both observed GSAT and observed EEU resulting in fewer highweight ensemble members than using a single constraint only. Results using just one constraint are reported in the supporting information. Figure 6 shows the aerosol ERF time series that best fit the observational constraints of GSAT and EEU for the CMIP6-constrained ensemble plus 12 climate models. Also shown are the projections of GSAT and EEU with the applied ensemble weighting. The weighted 5%-95% range from the CMIP6-constrained time series using both GSAT and EEU as constraints is shown as a gray band with the weighted mean as a gray line. The 1750-2019 aerosol ERF is determined to be −0.90 W m −2 (−1.55 to −0.35 W m −2 5%-95% range), comprised from −0.31 (−0.62 to −0.08) W m −2 for ERFari and −0.59 (−1.18 to −0.10) W m −2 for ERFaci. The central estimate of total aerosol forcing is equal to the −0.9 (−1.9 to −0.1) W m −2 assessed for 1750-2011 in AR5, with a narrower "very likely" (>90%) range in this study. SMITH ET AL.

10.1029/2020JD033622
12 of 20 With the CMIP6-constrained time series, aerosol ERF exhibits a slight recovery between 1980 and 2014 of +0.025 W m −2 decade −1 . This is a lower aerosol recovery than seven of the 11 CMIP6 models, although the constrained 5%-95% range is wide (−0.074 to +0.111 W m −2 decade −1 ) and includes the means from all but the UKESM1-0-LL model (Figure 7). These results indicate that a rapid aerosol forcing recovery is unlikely and not consistent with the energy budget constraints, but whether aerosol forcing has been strengthening, weakening, or stable in recent decades is not conclusive. Figure 8 uses the GSAT and EEU constraints to show the present-day distributions of aerosol forcing. Alongside this, we use the ensemble weights to calculate distributions of ECS and TCR from the ensemble given the two-layer model parameter distributions and ensemble weights. To calculate TCR we take the approach of Jiménez-de-la-Cuesta and Mauritsen (2019) noting the TCR is approximately The mean, 68% and 90% range for these parameters along with their unconstrained (prior) distributions are shown in Table 5. All 13 historical time series are shown in Table S4 for both GSAT and EEU constraints, and in Tables S5 and S6 using only GSAT or EEU respectively.
A diversity of constrained present-day aerosol ERF distribution shapes is possible for the aerosol forcing from each model's historical time evolution. In particular, there are a group of models (the two from SMITH ET AL.

10.1029/2020JD033622
13 of 20 GFDL, HadGEM3-GC31-LL, GISS-E2-1-G, and NorESM2-LM) where present-day ERFari is relatively weak and few values less than −0.5 W m −2 satisfy the observational constraints. For ERFaci, neither the CMIP6 models nor the CMIP6-constrained time series support a strong negative forcing and the constrained distributions are less skewed than the Ringberg assessment range. All historical aerosol forcing time series constrain the present-day aerosol forcing to a narrower range than the full process-based distributions of the Ringberg assessment. As discussed in , energy budget constraints do not favor a present-day aerosol forcing more negative than −2 W m −2 , and this is also borne out by our distributions in Figure 8c.
High values of ECS and TCR are also constrained out when applying observational constraints (Figures 8d and 8e). The prior distributions in thin black curves allow for ECS values much larger than those seen in CMIP6 models as values of net feedback  sampled in the prior distribution ( Figure S4) can be close to zero. Interestingly, the choice of historical aerosol forcing time series is less important for constraining ECS and TCR than for the present-day aerosol forcing, and in every case the constrained distribution favors lower climate sensitivity than the energy-balance derived prior. The constrained distribution of ECS is not tight (1.8°C -5.1°C 5%-95% range with a best estimate of 3.1°C), which is lower than the raw CMIP6 model range inferred from the two-layer model fits (1.9°C -7.1°C; Table S1). The best estimates reported in the main body of the text, Table 4 and Figures 6 and 7 relate to the weighted mean of the constrained distributions; the median estimates are provided in Tables S4-S6. Transient climate response falls in the 5%-95% range of 1.2°C -2.6°C (best estimate 1.8°C).

Discussion and Conclusions
Comprehensive climate models are the best tools available for determining global aerosol forcing where other spatially complete lines of evidence do not exist, such as prior to the satellite era (∼1980). However, model-derived aerosol forcing depends on a chain of processes, and ultimately on spatially resolved aerosol emissions time series that are themselves uncertain (Hoesly et al., 2018). It is not possible to determine here whether the tendency for some models to project strong aerosol forcing in the second half of the 20th Century and/or too much of a recent aerosol recovery, at least compared to what would be implied by the observational constraints, is due to model processes or uncertainties in the emissions.
It is likely that regional effects are significant and aerosols emitted from different source regions affect the global energy balance in different ways which is not captured in a global emissions to forcing relationship (Kretzschmar et al., 2017). Including how global forcing changes to regional aerosol emissions could be an improvement to the forcing emulator, although may be difficult to implement for aerosol-cloud interactions in marine stratocumulus cloud decks which may be thousands of kilometers from the emissions source (Regayre et al., 2014). However, we show for the 11 GCMs that our relationship is trained on, the forcing emulator works well ( Figure 2) and is useful as a first-order global relationship that incorporates sufficient flexibility in its forcing response to emissions.
Our results are consistent with the conclusions of previous studies that have attempted to constrain present-day aerosol forcing based on energy balance arguments. We find that very large negative values of pre-industrial to present-day aerosol ERF are inconsistent with observed warming and total Earth system energy gain (Andrews & Forster, 2020;Forest, 2018;Forest et al., 2002Forest et al., , 2006Skeie et al., 2018;. Our best estimate 1750-2019 aerosol forcing of −0.90 W m −2 is similar to recent observationally constrained studies that put aerosol forcing in the −0.8 to −0.9 W m −2 range (Andrews & Forster, 2020; SMITH ET AL.  Table S3. Skeie et al., 2018;, and the review of 19 inverse estimates in Forest (2018) of −0.77 W m −2 . We also find that the most likely shape of recent  aerosol forcing is approximately constant or with a slightly positive trend, which is in line with reanalysis-derived estimates of aerosol RFari and RFaci (Bellouin, Davies, et al., 2020), and a rapid recovery in aerosol forcing is not likely. SMITH ET AL.   The shape of historical aerosol forcing time series -whether from our simple emulator or provided by a particular CMIP6 model -does not provide a constraint on the overall 1750-2019 aerosol forcing. However, the choice of aerosol forcing data set used matters less for constraining ECS and TCR than it does for the shape or magnitude of present-day forcing (Figure 8). It is difficult to constrain the upper bound of ECS due to the heavy tail of the prior distribution, and 95th percentile values of ECS range from 4.3°C to 6.0°C depending on the aerosol forcing time series used (Table S4). Other studies also show that these constrained climate sensitivity distributions are sensitive to the priors used (Sherwood et al., 2020). For example, our prior sample space informed by CMIP6 models has very few ensemble members with ECS < 1.5°C and TCR < 1.0°C, both lower bounds of the respective "likely" range in AR5, and it is possible that this area of the distribution is undersampled. Note that we do not perform a full Bayesian analysis in this paper. However, our ECS estimate of 3.1°C (1.8°C -5.1°C) is in line with, although with wider uncertainty, than the Bayesian estimate of 3.2 (2.3°C -4.7°C) in Sherwood et al. (2020), which takes into account several lines of evidence in their assessment.
The two-layer model used in this study includes the efficacy of deep ocean heat uptake, which can partially account for the "pattern effect" Sherwood et al., 2020) in which evolving patterns of sea surface temperature change can influence estimates of climate feedback as warming approaches equilibrium. In the notation of Sherwood et al. (2020) the total effective climate feedback can be written     with   the contribution from the pattern effect. Across the sample of CMIP6 two-layer model calibrations in this study, the pattern effect   varies from −0.1 to +0.6 W m −2 K −1 (mean +0.2 W m −2 K −1 ) between 1980 and 2050 (assuming SSP2-4.5 forcing), similar to previous studies (e.g., Armour 2017). This forced contribution to the pattern effect arises through the efficacy of deep ocean heat uptake in the two-layer model, and occurs where  > 1  as it is in the majority of CMIP6 model calibrations. The pattern effect means that historically, the effective climate feedback tends to be slightly weaker than the equilibrium feedback, . However, this historical pattern effect is not as large as that determined from observed SSTs from AMIP models  of about +0.5 W m −2 K −1 , as the historical pattern effect also includes an unforced component that is related to internal variability (Zhou et al., 2021). If we included the unforced component of the pattern effect through a further adjustment to   , it is likely our derived historical aerosol forcing would be weaker. However, applying this historical pattern effect approach to our projections is not straightforward. Calculating   this way requires a 30 yr moving window regression , and historical simulations in CMIP6 are provided only to 2014 meaning   can only be estimated until around 2000.   is sensitive to volcanic eruptions and aerosol forcing , and the historical SST record is only one realization. We therefore do not take the historical approach, but account for the pattern effect through the sampling of deep ocean heat uptake efficacy, and for internal variability through the temporal autocorrelation of piControl runs.
Our results suggest that the limited number of CMIP6 models considered here have a stronger aerosol forcing than may have actually occurred during the 20th Century and this effect may be responsible for the modest warming in the CMIP6 ensemble mean over this time period (0.24 ± 0.22°C for 1961-1990 relative to 1850-1900 in CMIP6 models compared to reconstructed GSAT observations of 0.39 ± 0.06°C over the same period; Figure S7). The diagnosis of historical aerosol forcing in more CMIP6 models to confirm or disprove this would be welcomed. Inclusion of uncertainties in historical emissions would be useful to determine whether this is a factor in suppression of warming. Re-running of GCMs with updated emissions inventories could determine how sensitive models are to emissions uncertainties, and the importance of regional effects. The time history of aerosol forcing and its present-day magnitude both constrain key climate system uncertainties such as climate sensitivity and the rate of recent warming (Tanaka & Raddatz, 2011). Reducing uncertainty in both will reduce uncertainty in climate projections.