Compensation Between Cloud Feedback and Aerosol‐Cloud Interaction in CMIP6 Models

The most recent generation of climate models (the 6th Phase of the Coupled Model Intercomparison Project) yields estimates of effective climate sensitivity (ECS) that are much higher than past generations due to a stronger amplification from cloud feedback. If plausible, these models require substantially larger greenhouse gas reductions to meet global warming targets. We show that models with a more positive cloud feedback also have a stronger cooling effect from aerosol‐cloud interactions. These two effects offset each other during the historical period when both aerosols and greenhouse gases increase, allowing either more positive or neutral cloud feedback models to reproduce the observed global‐mean temperature change. Since anthropogenic aerosols primarily concentrate in the Northern Hemisphere, strong aerosol‐cloud interaction models produce an interhemispheric asymmetric warming. We show that the observed warming asymmetry during the mid to late 20th century is more consistent with low ECS (weak aerosol indirect effect) models.

feedback in models. The ECS ranges from 1.8 to 5.6 K in the CMIP6 models, with seven of them having an ECS greater than 4.7 K, the upper bound of ECS in CMIP5 (Flato et al., 2014).
In addition to changes in GHGs, climate forcing over both the historical era and projected future scenarios involves changes in aerosols. Interactions between clouds and aerosols are complex and also influence the radiation budget (Penner et al., 1992). Aerosols affect the radiation directly by scattering and absorbing incoming sunlight. Additionally, aerosols can act as cloud condensation nuclei, change the cloud droplet size, and alter cloud albedo and cloud lifetime, thereby modulating the radiation budget (Rotstayn & Penner, 2001; Twomey, 1977). The indirect effects are both highly uncertain and often larger than the direct radiative impact of aerosols (Lohmann et al., 2010; Myhre et al., 2013; Smith et al., 2020; Zelinka et al., 2014).
In this study, we show that models with a more positive cloud feedback in response to increasing GHGs also tend to have a stronger cooling effect from aerosol-cloud interactions (ACI). These two effects offset each other during much of the 20th century, when both anthropogenic aerosol and GHG emissions increased. Thus, models with both low and high ECS are able to reproduce the observed changes in global-mean temperature. However, this compensation does not occur in future emission scenarios where aerosols are projected to decrease as CO₂ and other GHGs continue to increase. We will show that the interhemispheric temperature contrast over the historical period provides a way to distinguish between low and high ECS models. We also find that models with a lower ECS (and weaker ACI) are more consistent with the observed interhemispheric asymmetric warming pattern during the 20th century.

Data and Methods
We use monthly model data from historical, piControl, abrupt-4xCO2, and 1pctCO2 experiments in CMIP6 (Eyring et al., 2016). We limit our analysis to models that have the variables necessary to compute cloud feedback parameters in all four experiments, and a piControl experiment longer than 450 years. This leaves 30 models as listed in Table S1. All anomalies in this study are referenced to the monthly climatology of the piControl experiment. The abrupt-4xCO2 experiment is used to estimate climate sensitivity and cloud feedback. The 1pctCO2 experiment is used to quantify the cloud radiative response to surface warming under a transient emission scenario. The global analysis of surface temperature observations is obtained from the GISS Surface Temperature Analysis (GISTEMP v4) (Lenssen et al., 2019).
Following Gregory et al. (2004), we calculate the ECS by regressing the global-mean top of the atmosphere (TOA) radiation anomaly on the global-mean surface temperature anomaly of the first 150 years in the abrupt-4xCO2 experiment. Half of the x-intercept of the regression is taken as the ECS, which is defined for a doubling of CO₂. Note that this method tends to underestimate the equilibrium climate sensitivity (Armour et al., 2013; Winton et al., 2020).
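The regression step described above can be illustrated with a minimal sketch. It assumes annual global-mean anomalies are already available as plain lists; the function names `ols_fit` and `gregory_ecs` and the synthetic numbers are hypothetical and are not taken from the study.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gregory_ecs(delta_t, delta_n):
    """ECS estimate: half the x-intercept of the regression of TOA anomaly on warming."""
    slope, intercept = ols_fit(delta_t, delta_n)
    x_intercept = -intercept / slope  # warming at which the fitted TOA imbalance is zero
    return 0.5 * x_intercept          # abrupt-4xCO2 corresponds to two CO2 doublings


# Synthetic, illustrative data only (not model output): N = 8.0 - 1.0 * T
temps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toa = [8.0 - 1.0 * t for t in temps]
ecs = gregory_ecs(temps, toa)
```

In this synthetic case the fitted line crosses zero imbalance at 8 K of warming, so the sketch returns an ECS of 4 K.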
To calculate the cloud feedback strength, we use the radiative kernels from the GFDL model (Soden et al., 2008) to decompose the radiative response at the TOA into the components due to changes in temperature, water vapor, surface albedo, and clouds. The cloud feedback is defined as the regression slope of the cloud radiative response on global-mean temperature anomaly in the abrupt-4xCO2 experiment.
The radiative response due to changes in clouds in the historical simulation arises from both surface warming induced changes (i.e., cloud feedback) and ACI. Following Soden and Chung (2017), we decompose the total cloud radiative response ($\Delta R_c^{\mathrm{tot}}$) in the historical experiment into two parts: the part due to global-mean surface temperature change and the part due to ACI (the aerosol-mediated cloud radiative response, $\Delta R_c^{\mathrm{aer}}$). The aerosol-mediated cloud radiative response includes both the aerosol indirect effect and nonlocal changes in clouds that result from aerosol-induced changes in the large-scale circulation. The surface temperature driven part can be estimated by multiplying the global-mean temperature anomaly ($\Delta T_s$) and the normalized cloud radiative response parameter $\alpha$ obtained from the corresponding 1pctCO2 experiment for each model. Therefore, the aerosol-mediated cloud radiative response can be expressed as:
$$\Delta R_c^{\mathrm{aer}} = \Delta R_c^{\mathrm{tot}} - \alpha \, \Delta T_s$$
We note that clouds also exhibit a fast response to CO₂ forcing (Andrews & Forster, 2008); this is included in the $\alpha$ parameter estimated from the 1pctCO2 experiment, but not in the cloud feedback estimated from the abrupt-4xCO2 experiment. As shown by Soden and Chung (2017), this approach successfully reproduces the estimates of aerosol-induced cloud radiative response calculated using single forcing (i.e., aerosol-only) experiments with fixed SSTs to suppress the surface temperature driven cloud feedbacks. As a further test of the method we use to estimate ACI, our results are highly correlated with the ACI cooling effect estimated by the approximate partial radiative perturbation method in Smith et al. (2020) (Figure S11).
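As a rough illustration of this decomposition, the sketch below subtracts the temperature-driven part from the total cloud radiative response year by year. The function name, the value of the scaling parameter, and the time series are hypothetical; the parameter is assumed to have already been estimated from the 1pctCO2 experiment.

```python
def aerosol_mediated_cloud_response(delta_r_tot, delta_t, alpha):
    """Residual cloud radiative response after removing the temperature-driven part.

    delta_r_tot : total cloud radiative response per year (W m^-2)
    delta_t     : global-mean surface temperature anomaly per year (K)
    alpha       : normalized cloud radiative response (W m^-2 K^-1), assumed given
    """
    return [r - alpha * t for r, t in zip(delta_r_tot, delta_t)]


# Synthetic, illustrative values only (not model output)
total_response = [-0.2, -0.6, -0.5]
warming = [0.1, 0.3, 0.8]
aer_response = aerosol_mediated_cloud_response(total_response, warming, alpha=0.5)
```

With a positive scaling parameter and positive warming, the residual is more negative than the total response, which is the sense in which a warming-driven cloud response can mask part of the aerosol-mediated cooling.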
The independent two-sample t-test is applied to distinguish the statistically significant features between the nine most positive cloud feedback models (the "top nine" or T9) and the nine weakest cloud feedback models (the "bottom nine" or B9). In the main text, we compute the differences in cloud feedback, radiation and temperature between the multimodel ensemble mean of T9 and B9 models. All the plots only show the differences that reject the null hypothesis that the T9 and B9 models have the same multimodel ensemble mean (using a two-sided t-test with a p-value<0.05). As shown in the supplementary material, the conclusions of this study are not sensitive to the number of models chosen to represent the top or bottom range of the intermodel spread in cloud feedback.
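A minimal sketch of such a test for two groups of nine models is given below. It assumes the pooled-variance (Student) form of the t-test, which the text does not specify, and compares the statistic with the standard two-sided 5% critical value for 16 degrees of freedom instead of computing a p-value. The function names and the sample values are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 9 + 9 - 2 = 16 degrees of freedom
T_CRIT_DF16 = 2.120


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean_a - mean_b) / standard_error


def differs_at_5_percent(a, b):
    """True if the group means differ at the two-sided 5% level (nine values per group)."""
    return abs(pooled_t_statistic(a, b)) > T_CRIT_DF16


# Synthetic, illustrative values at one grid point (not model output)
t9_values = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
b9_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
t_stat = pooled_t_statistic(t9_values, b9_values)
significant = differs_at_5_percent(t9_values, b9_values)
```

Applied at every grid point or time step, a mask built from `differs_at_5_percent` would correspond to showing only the differences that reject the null hypothesis.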
We evaluate how well the models simulate the global-mean historical warming by the GOOD HIST index: the absolute difference in historical warming between CMIP6 models and GISTEMP data. The historical warming is defined as the averaged surface temperature in 1990-2014 minus that in 1880-1909. So, the models that are good at simulating the historical warming have a small GOOD HIST index (see values in Table S1).
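The index can be sketched as follows, assuming annual global-mean temperature anomalies stored in dictionaries keyed by year. The function names and the step-like synthetic series are hypothetical and serve only to show the arithmetic.

```python
def period_mean(series, start_year, end_year):
    """Mean of an annual series (dict: year -> value) over an inclusive year range."""
    values = [series[year] for year in range(start_year, end_year + 1)]
    return sum(values) / len(values)


def historical_warming(series):
    """Mean over 1990-2014 minus mean over 1880-1909, as defined in the text."""
    return period_mean(series, 1990, 2014) - period_mean(series, 1880, 1909)


def good_hist_index(model_series, obs_series):
    """Absolute difference in historical warming between a model and observations."""
    return abs(historical_warming(model_series) - historical_warming(obs_series))


# Synthetic, illustrative series only (not GISTEMP or CMIP6 data)
years = range(1880, 2015)
obs = {year: (0.0 if year < 1950 else 0.9) for year in years}
model = {year: (0.0 if year < 1950 else 1.1) for year in years}
index = good_hist_index(model, obs)
```

Here the synthetic model warms by 1.1 K against 0.9 K in the synthetic observations, giving an index of about 0.2 K.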

Cloud Feedback and ECS
In response to increasing CO₂, models show warming and substantial climate changes that feed back onto the warming, including changes in the amount and distribution of clouds (Wetherald & Manabe, 1988). The part of cloud radiative response (units of W m⁻²) due to a change in global-mean surface temperature (units of K) is defined as the cloud feedback (W m⁻² K⁻¹). In CMIP6, the cloud feedback tends to be positive and there is a strong relationship between cloud feedback and ECS: models with more positive cloud feedback show higher ECS (Figure 1a, r² = 0.69) (Meehl et al., 2020; Zelinka et al., 2020). This strong ECS-cloud feedback relationship is consistent with previous studies showing that cloud feedback is the dominant source of the uncertainty of climate sensitivity (Cess et al., 1990; Colman, 2003; Dufresne & Bony, 2008; Soden & Held, 2006; Webb et al., 2013; Zelinka et al., 2020).
The spatial pattern of cloud feedback (Figures 2a-2c) differs considerably between the models with the most positive cloud feedback (the "top nine" or T9) and those with the least positive cloud feedback (the "bottom nine" or B9). The more positive global-mean cloud feedback in the T9 models arises principally from a substantially more positive cloud feedback in the Southern Hemisphere. The differences are statistically significant in the southeast regions of the Pacific, Atlantic and Indian Ocean as well as in the Southern Hemisphere midlatitudes (Figure 2c). The more positive cloud feedback, mostly due to more positive shortwave low cloud feedback in the Southern Hemisphere midlatitudes, is the primary cause of the substantially higher ECS in CMIP6 compared to previous coupled model ensembles (Meehl et al., 2020; Zelinka et al., 2020).

Aerosol-Cloud Interaction in the Historical Period
To better understand the cloud radiative response in the historical period, we investigate CMIP6 models forced by the historical radiative forcing over 1850-2014. The historical experiments allow us to: (i) examine the behaviors of clouds in response to more complex emission scenarios that involve both aerosols and GHGs; and (ii) ascertain the extent to which observations can constrain the range of cloud feedbacks and/or ECS.
In contrast to the GHGs-only forcing experiment, the total cloud radiative response ($\Delta R_c^{\mathrm{tot}}$) to the more complex historical forcings involves both surface temperature driven and aerosol-mediated changes in clouds.
WANG ET AL.
10.1029/2020GL091024
Figure 3a), $\Delta R_c^{\mathrm{tot}}$ actually exhibits a cooling effect in the historical simulations; that is, $\Delta R_c^{\mathrm{tot}} < 0$. Even more surprising, the models with the most positive cloud feedback (T9; thick red line) have a larger cloud-induced cooling effect than the models with a weaker cloud feedback (B9; thick blue line in Figure 3a). That is, the models with a more positive cloud feedback in response to CO₂ show a more negative cloud radiative response in historical simulations. This is particularly evident after 1950 (gray shading in Figure 3).
The negative $\Delta R_c^{\mathrm{tot}}$ in T9 models arises almost entirely from a negative aerosol-mediated cloud radiative response ($\Delta R_c^{\mathrm{aer}}$; Figure 3b). In contrast, the B9 models (with weak cloud feedback) have a very small aerosol-mediated cloud radiative response. In other words, models with a more positive cloud feedback (T9) tend to have a more negative $\Delta R_c^{\mathrm{aer}}$ compared to those with a weaker cloud feedback (B9). This occurs despite the B9 models having a more negative value in the TOA clear-sky shortwave forcing ($F_{\mathrm{clr}}^{\mathrm{sw}}$; Figure 3c), which represents the shortwave clear-sky aerosol direct forcing (given that GHGs have only a tiny effect on shortwave radiation and that the change in solar irradiance is small during the historical period). Since the difference between T9 and B9 models in aerosol direct forcing is much smaller than that in ACI, the intermodel difference in the total aerosol forcing is dominated, at least for the shortwave, by ACI and not by the aerosol direct forcing. Aerosol emissions, especially sulfur dioxide, are almost fixed after 1980, while GHG emissions continue to increase (Hoesly et al., 2018). Thus, $\Delta R_c^{\mathrm{tot}}$ increases with increasing global-mean temperature in T9 models (red line in Figure 3a) after 2000, while $\Delta R_c^{\mathrm{aer}}$ remains nearly constant, reflecting a more dominant role of cloud feedback in determining the total cloud radiative response.
Four of the T9 models were developed at the same modeling center, which raises the possibility that a lack of model independence and a single family of models may bias the composite model results discussed above. To evaluate this possibility, we repeat the analysis using two other groupings. First, we select only one model per modeling center for our composites and consider the top and bottom six models (T6 and B6). Second, we select a broader range of models (i.e., further from the extremes) while also restricting the analysis to one model per center (the top and bottom eight models, T8 and B8). These alternative composite analyses (see Supporting Information) reproduce the main aspects of the T9/B9 composite results, indicating robustness relative to the details of model selection within this ensemble.
Figure 3b imply a compensation between the cloud feedback from CO₂-induced surface warming and the aerosol-mediated cloud radiative response. This anticorrelation is more clearly shown in Figure 1b, which compares the global-mean cloud feedback for each model from the abrupt-4xCO2 simulations with the corresponding $\Delta R_c^{\mathrm{aer}}$ from the historical simulations. Models with a more positive cloud feedback tend to have a more negative $\Delta R_c^{\mathrm{aer}}$ (r² = 0.60). This helps to explain why models with a higher ECS also tend to have a larger net aerosol radiative cooling effect (Meehl et al., 2020).
The spatial pattern of the cloud radiative response in the historical experiment also differs from that obtained in the abrupt-4xCO2 experiment because of the regional imbalance in aerosol emissions during the historical period. In Figure 2d, the cloud radiative response (1950-2000 mean) is negative in the Northern Hemisphere in high cloud feedback models (T9). The cooling effect of clouds is as large as −4 W m⁻² over many of the northern extratropics and subtropics in the T9 models, while the B9 models have little change in    (Figures 2f and 2i). The intermodel differences in the spatial pattern of $\Delta R_c^{\mathrm{aer}}$ lead to a distinct warming pattern and can be useful for constraining the aerosol indirect effect with observed temperature.

Interhemispheric Warming Asymmetry
Due to the larger cooling effect of the ACI, T9 models simulate slightly colder surface temperature anomalies during the mid to late 20th century compared to the B9 models (Figure 4a), even though the T9 models have a more positive cloud feedback and a higher ECS. While this difference between the B9 and T9 models' surface temperature anomaly is small when globally averaged (and only a few scattered years are significantly different, as indicated by the gray shading), the hemispheric asymmetry of the historical aerosol forcing induces substantial differences in the interhemispheric warming asymmetry (Figure 4b). Here, we use the surface temperature change in the Northern Hemisphere (NH) minus that in the Southern Hemisphere (SH) to evaluate the interhemispheric warming asymmetry. The meridional asymmetry in the temperature evolution over the late 20th century distinguishes the T9 and B9 models: the T9 models warm more in the SH than the NH during the last century, and the differences in the interhemispheric warming asymmetry between the T9 and B9 models are significant during 1950-2000 (gray shading in Figure 4b).
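The asymmetry metric can be sketched as the difference of two area-weighted hemispheric means. The example below assumes zonal-mean temperature changes on a regular latitude grid and uses cosine-of-latitude weights, a common choice that the text does not state explicitly. The function names and values are hypothetical.

```python
import math


def weighted_mean(lats, values, keep):
    """Cosine-of-latitude weighted mean over the latitudes selected by keep(lat)."""
    numerator = 0.0
    denominator = 0.0
    for lat, value in zip(lats, values):
        if keep(lat):
            weight = math.cos(math.radians(lat))
            numerator += weight * value
            denominator += weight
    return numerator / denominator


def warming_asymmetry(lats, delta_t):
    """Northern Hemisphere mean temperature change minus Southern Hemisphere mean."""
    nh_mean = weighted_mean(lats, delta_t, lambda lat: lat > 0)
    sh_mean = weighted_mean(lats, delta_t, lambda lat: lat < 0)
    return nh_mean - sh_mean


# Synthetic, illustrative zonal means only: the SH warms more than the NH
latitudes = [-75.0, -45.0, -15.0, 15.0, 45.0, 75.0]
temperature_change = [0.8, 0.8, 0.8, 0.5, 0.5, 0.5]
asymmetry = warming_asymmetry(latitudes, temperature_change)
```

A negative value, as in this synthetic case, means the Southern Hemisphere warms more than the Northern Hemisphere.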
The observed interhemispheric warming asymmetry over the 20th century is more consistent with the models with weaker cloud feedback and aerosol indirect effect (B9) than those with more positive cloud feedback and aerosol indirect effect (T9). Although the observed global- and annual-mean temperature anomalies are broadly consistent with both sets of models (Figure 4a), the B9 ensemble mean more closely reproduces the observed hemispheric contrast in warming over most of the historical period (Figure 4b). The rank of the observed NH-SH temperature anomaly pooled from the B9 model ensemble produces an approximately uniform distribution, but pooling from the T9 model ensemble produces a skewed distribution, indicating that the B9 model ensemble is a more reliable representation (Figure S1).
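One common way to build such a rank distribution is a rank histogram, sketched below. This is an assumption about the general technique, not a description of the exact pooling used for Figure S1. For each year, the observation is ranked among the ensemble members, and the ranks are then counted. The function names and values are hypothetical.

```python
def observation_rank(obs_value, member_values):
    """Number of ensemble members that fall below the observed value."""
    return sum(1 for member in member_values if member < obs_value)


def rank_histogram(obs_series, ensemble_series):
    """Count how often the observation takes each possible rank.

    obs_series      : list of observed values, one per year
    ensemble_series : list of member series, each a list aligned with obs_series
    """
    n_members = len(ensemble_series)
    counts = [0] * (n_members + 1)  # ranks run from 0 to n_members
    for t, obs_value in enumerate(obs_series):
        members_at_t = [member[t] for member in ensemble_series]
        counts[observation_rank(obs_value, members_at_t)] += 1
    return counts


# Synthetic, illustrative NH-SH anomalies only (not model output)
ensemble = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
observed = [0.05, 0.15, 0.25, 0.35]
histogram = rank_histogram(observed, ensemble)
```

A roughly flat histogram suggests the observations are statistically indistinguishable from the ensemble members, whereas a histogram piled up at one end suggests a systematic offset between the ensemble and the observations.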

Discussion
The seeming consistency of global-mean temperature evolution between more positive cloud feedback (high ECS) models and observations requires a strong aerosol indirect cooling effect that leads to an interhemispheric temperature evolution that is inconsistent with observations. Because of the strong negative correlation between a model's cloud feedback in response to CO₂ (and its CO₂-induced ECS) and its aerosol indirect effect (Figure 1b), the global-mean temperature evolutions in more positive and less positive cloud feedback models are not well separated over the historical period (Figure 4a) as both CO₂ and aerosols increase. Both more positive (high ECS) and less positive (low ECS) cloud feedback models are able to simulate the observed global-mean temperature record, but T9 models do it through a combination of strong warming from GHGs and strong cooling from aerosols, while B9 models do it with moderate warming from GHGs and modest cooling from aerosols. Because historical aerosol forcing has been larger in the Northern Hemisphere, the strong ACI cooling effect in T9 models produces a distinctive historical interhemispheric surface temperature evolution (red line in Figure 4b), which is inconsistent with that in observations over 1950-2000 (black line in Figure 4b). These results support the recent finding that the CMIP6 models that more faithfully capture the observed evolution of surface anomalies across a range of quantities over 1980-2014 tend to have lower 21st century projected warming (Brunner et al., 2020).
Reproducing the observed global-mean temperature evolution over the 20th century is an important test for climate models. It seems unlikely that a model with a more positive cloud feedback and a weak ACI, or vice versa, could achieve this important benchmark. Thus, the compensation could result from implicit or explicit efforts to tune the representation of clouds in models to reproduce the observed global-mean temperature record when forced with historical emissions (Mauritsen & Roeckner, 2020; Schmidt et al., 2017). Dividing the 30 models into two groups based upon how well they simulate the observed global-mean historical warming (defined in Data and Methods) suggests that model tuning toward observed global-mean surface temperature changes during model development is a potential cause of this compensation relationship. The correlation between the warming effect of cloud feedback and the cooling effect of ACI is higher for the 15 models that better reproduce the observed warming (r² = 0.83, filled circles in Figure 1b) compared to those that do not (r² = 0.17, open circles in Figure 1b). This helps to reconcile why Meehl et al. (2020) found a significant positive correlation between the total aerosol forcing and ECS, while Smith et al. (2020), in which many of the models used are less consistent with the observed global-mean surface warming, did not. In previous generations of models, which largely did not include the aerosol indirect effect, a significant correlation was found between the aerosol direct forcing and climate sensitivity (Kiehl, 2007; Knutti, 2008). In CMIP5 models, Forster et al. (2013) also found a significant intermodel correlation between the total aerosol forcing and ECS for the models that simulate the historical warming well.
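The subgroup comparison can be sketched as follows. The sketch assumes the two groups are formed by ranking models on the GOOD HIST index and splitting the list in half, which is one plausible reading of the text. The dictionary keys, function names, and six synthetic models are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def split_by_skill(models):
    """Split models into (better, worse) halves by ascending GOOD HIST index."""
    ranked = sorted(models, key=lambda m: m["good_hist"])
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def group_r_squared(group):
    """r^2 between cloud feedback and aerosol-mediated cloud response in a group."""
    feedback = [m["cloud_feedback"] for m in group]
    aci = [m["aci"] for m in group]
    return r_squared(feedback, aci)


# Synthetic, illustrative models only (not CMIP6 values)
models = [
    {"good_hist": 0.05, "cloud_feedback": 0.2, "aci": -0.2},
    {"good_hist": 0.10, "cloud_feedback": 0.5, "aci": -0.5},
    {"good_hist": 0.15, "cloud_feedback": 0.8, "aci": -0.8},
    {"good_hist": 0.40, "cloud_feedback": 0.2, "aci": -0.6},
    {"good_hist": 0.50, "cloud_feedback": 0.5, "aci": -0.2},
    {"good_hist": 0.60, "cloud_feedback": 0.8, "aci": -0.5},
]
better, worse = split_by_skill(models)
r2_better = group_r_squared(better)
r2_worse = group_r_squared(worse)
```

In this synthetic example the compensation is exact in the better-performing half and nearly absent in the other half, mirroring the qualitative contrast described above.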
An alternative interpretation of the intermodel correlation between ECS and ACI is that there could be a physical process that is intrinsic to models that links cloud feedback and ACI. For example, models with a more positive cloud feedback produce a very different temperature change pattern than those with a weak cloud feedback (Figure S3). Such hemispheric asymmetry in warming can induce changes in the large-scale circulation (Allen, 2015; Allen et al., 2015; Hwang et al., 2013; Ming & Ramaswamy, 2011; Wang et al., 2016), which might lead to differences in aerosol transport and lifetime, or to different changes in cloud regime. Additionally, both ACI and cloud feedbacks can be affected by the mean cloud field, and it is possible that a mean state that is advantageous to larger positive cloud feedbacks also makes large ACI more likely, so that mean cloud biases could impact ACI and cloud feedback, and lead to the intermodel correlation. However, we have not been able to find evidence for this interpretation in our analyses.
The differences in the spatial pattern of warming induced by strong ACI also impact many other aspects of the simulated response to anthropogenic forcing. For example, the meridional structure of sea surface temperature and heating changes has been connected to the evolution of tropical rainfall (Deser et al., 2020; Jacobson et al., 2020; Kang et al., 2008, 2014; Xie et al., 2010; Yang et al., 2019) and tropical cyclone activity (Booth et al., 2012; Merlis et al., 2013; Vecchi & Soden, 2007; Villarini & Vecchi, 2012; Yang et al., 2019). The extent to which observed changes in the meridional structure of rainfall and tropical cyclone activity can be ascribed to past radiative forcing change will also depend in part on the realism of the cloud response to historical aerosol forcing (Booth et al., 2012; Zhang et al., 2013).