Why Do CO2 Quadrupling Simulations Warm More Than Twice as Much as CO2 Doubling Simulations in CMIP6?

We compare abrupt CO2-quadrupling (abrupt-4xCO2) and -doubling (abrupt-2xCO2) simulations across 10 CMIP6 models. Two models (CESM2 and MRI-ESM2-0) warm substantially more than twice as much under abrupt-4xCO2 as under abrupt-2xCO2, which cannot be explained by the non-logarithmic scaling of CO2 forcing. Using an energy balance model, we show that the increased warming rates in these two models are driven by both less-negative radiative feedbacks and a smaller global effective heat capacity under abrupt-4xCO2. These differences are caused by a decrease in low cloud cover and shallower ocean heat storage, respectively; both are linked to smaller fractional declines in the Atlantic Meridional Overturning Circulation (AMOC) under abrupt-4xCO2 (relative to abrupt-2xCO2). On a global scale, higher climate sensitivity under larger forcing can be explained by a feedback-temperature dependence; however, we find that forcing-dependent spatial warming patterns due to AMOC decline are an important physical mechanism that reduces warming in a way that is not captured by a global-mean framework.


Introduction
Understanding Earth's climate response to greenhouse gas forcing is essential for climate adaptation and mitigation. A key metric of the global climate response, which is correlated with transient warming in climate models (Grose et al., 2018; Sherwood et al., 2020), is the effective climate sensitivity (EffCS). EffCS is often calculated by extrapolating to equilibrium the regression between the global top-of-atmosphere (TOA) radiation anomaly and the global near-surface air temperature anomaly of abrupt CO2-quadrupling experiments, often referred to as a "Gregory regression" (Gregory et al., 2004). However, quadrupling CO2 concentrations from pre-industrial greatly exceeds the present radiative forcing change; thus, it is an open question whether abrupt-4xCO2 experiments properly represent changes at lower warming levels.
In this paper, we compare how and why the transient response to abrupt changes of CO2 varies across forcing levels using an ensemble of 10 general circulation models (GCMs) from the Coupled Model Intercomparison Project Phase 6 (CMIP6; Eyring et al., 2016) for which experiments of both abrupt-doubling (abrupt-2xCO2) and abrupt-quadrupling (abrupt-4xCO2) from pre-industrial CO2 levels are available. We focus on two models in particular, CESM2 and MRI-ESM2-0, which show the largest differences in climate response at higher forcing.

Abrupt-2xCO2 and Abrupt-4xCO2 Simulations
This study uses annually averaged CMIP6 data (Eyring et al., 2016; Tables S1-S3 in Supporting Information S1). The ensemble mean refers to a multimodel mean excluding CESM2 and MRI-ESM2-0, such that the two models of interest can be compared to the rest of the ensemble. Occasionally, we use the entire 10-model ensemble mean in statistical significance testing. When performing an ensemble mean across zonal or global structure, each model is interpolated to a common latitude-longitude grid (taken from CanESM5). We calculate anomalies by subtracting the mean of the final 100 years of the pre-industrial control (piControl) from the abrupt-4xCO2 or abrupt-2xCO2 data, since we did not find significant drift in the last 100 years of the piControl surface air temperature. Since our analysis compares each model with itself at a different forcing, time-dependent drift will not impact the results as long as the scenarios have the same branch point (see Table S1 in Supporting Information S1).
Using the mean ERF (estimated as the y-intercept of a Gregory regression over years 1-150) across our 10 models, we find that the mean scaling factor (μ, the ratio of the abrupt-2xCO2 ERF to the abrupt-4xCO2 ERF) is 0.488 (Table 1c), which is similar to the IPCC AR6 value (μ = 0.476). Compared to ERF values calculated using Fixed-SST abrupt-4xCO2 runs (Smith et al., 2020), our abrupt-4xCO2 ERF values differ by 0.064 ± 0.33 W/m2. However, Fixed-SST runs do not exist for abrupt-2xCO2, so we cannot determine the precision of our scaling factors.
We have performed this analysis using several scaling factors, including μ = 0.5, the IPCC scaling value (μ = 0.476), and the model-specific ratio for years 1-150 (an ensemble mean value of μ = 0.488). Our conclusions are robust across these scaling factors. Hereafter, we use the model-specific scaling factor calculated over years 1-150 in Table 1c, and we refer to results from abrupt-4xCO2 experiments multiplied by the model-specific scaling factor as "scaled" simulations.
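As a minimal sketch of the scaling step, assuming the scaling factor is simply the ratio of the two ERF estimates, the "scaled" abrupt-4xCO2 series can be formed as below. The function names and the ERF values are hypothetical and chosen for illustration only; they are not taken from Table 1c.

```python
def scaling_factor(erf_2x, erf_4x):
    """Model-specific ratio mu = F_2x / F_4x of effective radiative forcings."""
    if erf_4x == 0:
        raise ValueError("abrupt-4xCO2 ERF must be non-zero")
    return erf_2x / erf_4x


def scale_series(series_4x, mu):
    """Multiply an abrupt-4xCO2 anomaly time series by mu to get the 'scaled' series."""
    return [mu * value for value in series_4x]


# Illustrative (made-up) ERF values in W/m2, not values from the paper.
mu = scaling_factor(erf_2x=3.9, erf_4x=8.0)
scaled = scale_series([2.0, 4.0, 6.0], mu)
```

The same factor would be applied to any abrupt-4xCO2 anomaly (temperature or TOA flux) before comparing it with the abrupt-2xCO2 run.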

Transient Warming and Effective Equilibrium Climate Sensitivity
The time series of the globally averaged annual surface air temperature anomaly shows that the transient warming in abrupt-2xCO2 experiments is smaller than the scaled warming in abrupt-4xCO2 experiments in CMIP6 models (Figure 1a). To quantify warming differences across forcings, we consider a percent difference defined as:

$$\text{Percent difference} = \frac{T_{4x} - T_{2x}}{T_{4x}} \times 100\% \tag{1}$$

where $T_{4x}$ and $T_{2x}$ are the final 20-year average temperatures in the scaled abrupt-4xCO2 and abrupt-2xCO2 runs, respectively. Comparing the final 20-year averages of the temperature anomaly, CESM2 and MRI-ESM2-0 have 44.8% and 37.9% less warming in abrupt-2xCO2 than in scaled abrupt-4xCO2, while the ensemble average difference is 4.5% (Figure 1a).

Note (Table 1). (a) The slope of the temperature anomaly against time for years 21 to 150, (b) the effective climate sensitivity, (c) the effective radiative forcing estimated from years 1 to 150, and (d) the net radiative feedback parameter estimated from years 1 to 150 for abrupt-2xCO2. Each statistic includes the (a, b, d) scaled abrupt-4xCO2 value or (c) the unscaled abrupt-4xCO2 value, and (a, b, d) the percent difference (Equation 1) or (c) the ratio of abrupt-2xCO2 and abrupt-4xCO2. The ensemble mean percent difference, ratio, and their respective standard deviations are calculated as the mean of each model's percent difference/ratio and the standard deviation of that column. All of the linear regressions have p < 0.05 (*) except for column (a) MRI-ESM2-0 abrupt-2xCO2, which has p = 0.0503. Using a t-test, we find that the difference in the radiative feedback parameter between forcings is significant (p < 0.05) for CanESM5, CESM2, IPSL-CM6A-LR, HadGEM3-GC31-LL, and MRI-ESM2-0 (Table S4 in Supporting Information S1). Our models of interest, CESM2 and MRI-ESM2-0, are bolded.
The substantially smaller abrupt-2xCO2 warming (by 37.9% to 44.8%) relative to the scaled abrupt-4xCO2 warming in CESM2 and MRI-ESM2-0 cannot be explained by the non-logarithmic scaling of CO2 forcing in these models, because we scaled the simulations using model-specific ERF ratios (Table 1c). Additionally, CESM2 and MRI-ESM2-0 have the largest warming-rate difference between abrupt-2xCO2 and abrupt-4xCO2; that is, the most suppressed warming rate at abrupt-2xCO2 compared to abrupt-4xCO2 (Table 1a). The rate of change of temperature (the slope of a linear regression of annual mean temperature against time) for MRI-ESM2-0 is negative for years 21-150 of abrupt-2xCO2 (the period after the initial rapid warming; Figure 1a). In other words, warming in MRI-ESM2-0 stalls after the initial rapid warming in its abrupt-2xCO2 run (but not in its abrupt-4xCO2 run). In contrast, GISS-E2-2-G shows the reverse behavior, wherein the rate of warming at abrupt-4xCO2 is near zero.
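The two diagnostics used above (the percent difference of the final 20-year means and the warming rate over years 21-150) can be sketched as follows. This is an illustrative sketch only: the percent difference is assumed to be normalized by the scaled abrupt-4xCO2 value, which matches the "less warming in abrupt-2xCO2 than scaled abrupt-4xCO2" wording, and all function names are hypothetical.

```python
def final_mean(series, n_years=20):
    """Average of the last n_years entries of an annual-mean series."""
    tail = series[-n_years:]
    return sum(tail) / len(tail)


def percent_difference(t_4x_scaled, t_2x):
    """Percent by which the 2x value falls short of the scaled 4x value (assumed form)."""
    return 100.0 * (t_4x_scaled - t_2x) / t_4x_scaled


def linear_trend(series, first_year=21, last_year=150):
    """Ordinary least-squares slope (per year) over years first_year..last_year (1-indexed)."""
    ys = series[first_year - 1:last_year]
    xs = list(range(first_year, first_year + len(ys)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic 150-year series that warms by 0.01 K per year (illustration only).
synthetic = [0.01 * year for year in range(1, 151)]
slope = linear_trend(synthetic)
```

A negative slope from `linear_trend` over years 21-150 would correspond to the stalled warming described for MRI-ESM2-0 in abrupt-2xCO2.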
We see a similar pattern when comparing the EffCS across forcings. We perform a Gregory regression assuming the standard model of global energy balance, defined as:

$$R = F + \lambda T \tag{2}$$

where $R$ is the TOA radiative anomaly, $T$ is the near-surface air temperature anomaly, $F$ is the ERF, and $\lambda$ is the radiative feedback parameter. If the forcing is a doubling of CO2 ($F_{2x}$) or scaled to a doubling of CO2 (e.g., $\mu F_{4x}$, where $\mu$ is the model-specific scaling factor), the EffCS can be estimated as the x-intercept, or the temperature at which the excess TOA radiative flux reaches zero:

$$\mathrm{EffCS}_{2x} = -\frac{F_{2x}}{\lambda_{2x}}, \qquad \mathrm{EffCS}_{4x} = -\frac{\mu F_{4x}}{\lambda_{4x}} \tag{3}$$

where $\lambda_{2x}$ and $\lambda_{4x}$ are the net radiative feedbacks for abrupt-2xCO2 and abrupt-4xCO2, respectively. The abrupt-2xCO2 EffCS is smaller than the abrupt-4xCO2 EffCS in nearly all CMIP6 models studied here (Table 1b); however, CESM2 and MRI-ESM2-0 have the largest differences (47.3% and 23.7%, respectively), owing to large differences in their radiative feedback parameters (Table 1d). Interestingly, GISS-E2-2-G has only a 5.5% difference in EffCS and a 5.7% difference in the radiative feedback parameter between abrupt-2xCO2 and abrupt-4xCO2 (Table 1b, d).
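A Gregory regression of this kind reduces to an ordinary least-squares line through annual (T, R) pairs: the intercept estimates the ERF, the slope estimates the feedback parameter, and the x-intercept estimates EffCS. The following is a minimal sketch on synthetic data; the function name and the numbers are illustrative and not the authors' code.

```python
def gregory_regression(temperature, toa_flux):
    """Fit R = F + lambda * T by least squares; return (F, lambda, EffCS)."""
    n = len(temperature)
    t_mean = sum(temperature) / n
    r_mean = sum(toa_flux) / n
    sxx = sum((t - t_mean) ** 2 for t in temperature)
    sxy = sum((t - t_mean) * (r - r_mean) for t, r in zip(temperature, toa_flux))
    feedback = sxy / sxx                    # slope, W/m2/K (negative for a stable climate)
    forcing = r_mean - feedback * t_mean    # y-intercept, W/m2
    effcs = -forcing / feedback             # x-intercept, K
    return forcing, feedback, effcs


# Synthetic points lying exactly on R = 4 - 1.0 * T (illustration only).
temps = [0.5, 1.0, 2.0, 3.0]
fluxes = [4.0 - 1.0 * t for t in temps]
forcing, feedback, effcs = gregory_regression(temps, fluxes)
```

Multiplying both T and R of an abrupt-4xCO2 run by the scaling factor leaves the slope unchanged and scales the intercept, which is why the scaled EffCS differs from the abrupt-2xCO2 EffCS only through the feedback parameter and any mismatch in the scaled forcing.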
While previous studies (e.g., Bloch-Johnson et al., 2021) have focused on the dependence of transient warming on differences in radiative feedbacks (and thus EffCS) between warming levels, here we also consider the global effective heat capacity. The role of the effective heat capacity can be seen by extending the linear energy balance model (Equation 2) to include time dependence (Donohoe et al., 2014):

$$C \frac{dT}{dt} = F + \lambda T \tag{4}$$

where $C$ is the heat capacity of the Earth system and $t$ is time. In our energy balance equation, the time evolution of temperature is controlled by three variables: the heat capacity $C$ (in units of J/m2/K), the ERF (W/m2), and the radiative feedback parameter (W/m2/K). We imagine a model in which warming signals penetrate downward through the ocean over a depth that may depend on climate state. Heat absorbed by the Earth system corresponds to either an increase in surface air temperature or an increase in the effective depth over which a temperature anomaly is distributed (an increase in effective heat capacity). We can calculate the heat capacity using the following equation:

$$C(t) = \frac{1}{T(t)} \int_{0}^{t} R(t')\, dt' \tag{5}$$

In what follows, we compare CESM2 and MRI-ESM2-0 to the remaining eight-model ensemble mean to explore the following research questions:
1. What are the roles of radiative feedbacks and effective heat capacity in driving reduced warming in abrupt-2xCO2 relative to the scaled abrupt-4xCO2?
2. What mechanisms cause differences in radiative feedbacks and effective heat capacity between abrupt-2xCO2 and scaled abrupt-4xCO2 scenarios?
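One way to picture the effective heat capacity is as accumulated heat per degree of surface warming. The sketch below assumes that the effective heat capacity is the time-integrated TOA imbalance divided by the temperature anomaly; this definition is an assumption made for illustration, as are the function name and the use of annual means with a fixed 365-day year.

```python
SECONDS_PER_YEAR = 365.0 * 24 * 3600


def effective_heat_capacity(toa_flux, temperature):
    """Return C(t) in J/m2/K for annual-mean TOA flux (W/m2) and temperature anomaly (K).

    Assumed definition: cumulative heat uptake divided by the temperature anomaly.
    """
    capacity = []
    stored = 0.0  # cumulative heat uptake, J/m2
    for flux, temp in zip(toa_flux, temperature):
        stored += flux * SECONDS_PER_YEAR
        capacity.append(stored / temp if temp != 0 else float("nan"))
    return capacity


# Two synthetic years: constant 1 W/m2 imbalance, warming of 1 K then 2 K.
capacity = effective_heat_capacity([1.0, 1.0], [1.0, 2.0])
```

Under this assumed definition, a run that stores the same heat with less surface warming has a larger effective heat capacity, which is the sense in which deeper heat storage slows transient warming.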

Net Radiative Feedback Across Forcings
A more-negative radiative feedback parameter (and consequently, lower EffCS) in abrupt-2xCO2 than in abrupt-4xCO2 contributes to reduced warming in abrupt-2xCO2 compared to scaled abrupt-4xCO2. In Table 1d, we see that the ensemble-mean radiative feedback parameter (calculated over 150 years) is more negative in abrupt-2xCO2 (-1.12 ± 0.38 W/m2/K) than in abrupt-4xCO2 (-1.06 ± 0.40 W/m2/K), with an ensemble-mean percent difference of 8.20% ± 8.36%. GISS-E2-1-G and GISS-E2-2-G are the only models in which the radiative feedback is more positive in abrupt-2xCO2. CESM2 and MRI-ESM2-0 are once again outliers, in that the radiative feedback parameter has the largest difference across forcings, with percent differences of 92.1% and 30.8%, respectively (Table 1d; Figure 1b; Table S4 in Supporting Information S1).

The Effective Heat Capacity Across Forcings
The time series of heat capacity for CESM2 and MRI-ESM2-0 also differs from the ensemble mean (Figure 1c). The ensemble mean heat capacity is nearly identical across forcings; the final 20-year average effective heat capacity is 0.050 ± 0.28 GJ/m2/K larger at abrupt-4xCO2 than at abrupt-2xCO2 (increasing from 1.85 GJ/m2/K at abrupt-2xCO2 to 1.90 GJ/m2/K at abrupt-4xCO2), or, equivalently, the effective depth of heat storage is 11 ± 65 m deeper in abrupt-4xCO2 than in abrupt-2xCO2 (Table 2a). However, in CESM2 and MRI-ESM2-0, the abrupt-2xCO2 heat capacity is substantially larger than that of abrupt-4xCO2 at all times (Figure 1c). The final heat capacity in CESM2 and MRI-ESM2-0 is 0.8 and 1.5 GJ/m2/K larger in abrupt-2xCO2 than in abrupt-4xCO2, corresponding to heat storage that is nearly 190 and 350 m deeper, respectively. In contrast, GISS-E2-2-G has a larger effective heat capacity at abrupt-4xCO2 than at abrupt-2xCO2 and consequently a deeper effective depth of heat storage (155 m deeper; Table 2a); this may be related to the suppressed abrupt-4xCO2 warming rate in GISS-E2-2-G.
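The correspondence between heat capacity and effective depth quoted above can be checked to order of magnitude by dividing by the volumetric heat capacity of water. The constant below (about 4.2e6 J/m3/K) is an assumed round value, not one stated in the text, so the result is only expected to agree approximately with the quoted depths.

```python
RHO_CP_WATER = 4.2e6  # J/m3/K, assumed volumetric heat capacity of water


def equivalent_depth(capacity_gj_per_m2_k):
    """Convert an effective heat capacity in GJ/m2/K to an equivalent water depth in m."""
    return capacity_gj_per_m2_k * 1.0e9 / RHO_CP_WATER


depth_cesm2 = equivalent_depth(0.8)  # heat capacity difference quoted for CESM2
```

With this assumed constant, a difference of 0.8 GJ/m2/K corresponds to roughly 190 m of water, in line with the value quoted for CESM2; the rounded inputs mean the other quoted depths are reproduced only approximately.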

The Relative Importance of the Net Radiative Feedback and the Effective Heat Capacity
To explore the relative importance of feedbacks and heat capacity in setting the differences in global warming rate between forcing levels, we consider the energy balance model (Equation 4) in the following form:

$$\frac{dT}{dt} = \frac{F + \lambda T}{C} \tag{6}$$

We use values of the radiative feedback parameter, the effective radiative forcing, and the effective heat capacity calculated over 150 years (Table S6 in Supporting Information S1) to recreate a global-mean time series for each model. To fit the temperature time series, we use a 1-year time step and heat capacity data smoothed with a Savitzky-Golay filter (Savitzky & Golay, 1964), a form of locally estimated scatterplot smoothing (LOESS; Chen et al., 2004), using a first-degree polynomial fit and a 5-year window to reduce the impact of natural variability. The EBM fits the global-mean annual temperature data well, with an average error of ±0.08 K and ±0.14 K for the abrupt-4xCO2 and abrupt-2xCO2 ensemble mean data, respectively (Table S6, Figure S1 in Supporting Information S1).
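A minimal sketch of such an energy balance integration is given below, assuming a forward-Euler update with a 1-year step. The smoothing uses a centered 5-year moving average, which at interior points coincides with a first-degree Savitzky-Golay fit; the shrinking window at the edges is a simplification. Function names and parameter values are illustrative only.

```python
SECONDS_PER_YEAR = 365.0 * 24 * 3600


def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the edges (a simplification)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def run_ebm(forcing, feedback, capacity, n_years=150):
    """Integrate C dT/dt = F + lambda * T with a 1-year forward-Euler step.

    forcing: W/m2; feedback: W/m2/K; capacity: list of J/m2/K, one value per year.
    Returns the temperature anomaly (K) at the end of each year.
    """
    temps = []
    temperature = 0.0
    for year in range(n_years):
        tendency = (forcing + feedback * temperature) / capacity[year]  # K/s
        temperature += SECONDS_PER_YEAR * tendency
        temps.append(temperature)
    return temps


# Illustrative constants: F = 4 W/m2, lambda = -1 W/m2/K, C = 1 GJ/m2/K.
capacity = moving_average([1.0e9] * 150)
temps = run_ebm(forcing=4.0, feedback=-1.0, capacity=capacity)
```

With these constants the temperature relaxes toward F/|λ| = 4 K on a timescale of C/|λ| (about 32 years), so the final value sits just below 4 K.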
We run two EBM experiments to calculate the relative importance of the radiative feedback parameter and the effective heat capacity: (a) the radiative feedback parameter from abrupt-2xCO2 and the effective heat capacity from abrupt-4xCO2 (referred to as the "λ2x/C4x" experiment), and (b) the radiative feedback from abrupt-4xCO2 and the effective heat capacity from abrupt-2xCO2 (referred to as the "λ4x/C2x" experiment). The EBM time series are shown in Figure 1f. From these experiments, we compute the relative importance of the radiative feedback (Equation 7), the relative importance of the effective heat capacity (Equation 8), and a residual (Equation 9). To estimate each relative importance, we use the final 20-year average of the global-mean temperature time series (Table 2b). Five models have a relative importance of effective heat capacity larger than the ensemble average: CESM2, GISS-E2-1-G, GISS-E2-2-G, HadGEM3-GC31-LL, and MRI-ESM2-0 (Table 2b). For CESM2, the radiative feedback parameter accounts for 80.8% and the effective heat capacity accounts for 42.2% of the difference between the scaled abrupt-4xCO2 and abrupt-2xCO2 runs (with a residual of 23.0%). In MRI-ESM2-0, the radiative feedback parameter accounts for 49.5% and the effective heat capacity accounts for 69.7% of that difference (with a residual of 17.3%). Interestingly, GISS-E2-1-G, GISS-E2-2-G, and HadGEM3-GC31-LL also show a large role for heat capacity changes, with the radiative feedback parameter accounting for 11.0%, 30.7%, and 65.6% and the effective heat capacity accounting for 88.0%, 63.7%, and 39.2% of that difference (with residuals of 1.7%, 6.2%, and 4.8%), respectively, although these three models show little difference between the abrupt-2xCO2 and scaled abrupt-4xCO2 EffCS (0.01, 0.13, and 0.48 K difference in EffCS; Table 1a, b). In contrast, for the ensemble, the radiative feedback parameter accounts for 102.9% and the effective heat capacity accounts for only 3.1% of the difference in warming between the scaled abrupt-4xCO2 and abrupt-2xCO2 runs (with a residual of 0.3%).

Note (Table 2). The final 20-year average of (a) the effective heat capacity, (b) the relative importance of the radiative feedbacks (Equation 7) and the effective heat capacity (Equation 8), (c) the AMOC decline from pre-industrial, and (d) the global-mean depth of 55% heat storage for (a, c, d) abrupt-2xCO2 and abrupt-4xCO2, with (a, d) the difference between abrupt-2xCO2 and abrupt-4xCO2, (b) the residual relative importance (Equation 9), and (c) the ratio of the abrupt-2xCO2 and the unscaled abrupt-4xCO2 values, represented as a percent. See Table S6 in Supporting Information S1 for the average fit error of the EBM results. GISS-E2-1-G does not have piControl streamfunction data available, so the first 10 and final 20 years are used. The ensemble mean is only included for analyses where data from all 10 models are available. Our models of interest, CESM2 and MRI-ESM2-0, are bolded.
The EBM has a nonlinear dependence on the parameters λ and C. Thus, we expect models with large differences in these parameters (the radiative feedback parameter and the effective heat capacity) across forcings to have large residuals, as is the case for CESM2 and MRI-ESM2-0. In summary:
1. For CESM2, a more-negative radiative feedback parameter is the dominant mechanism and a larger effective heat capacity is a significant mechanism (compared to the ensemble) in the slowed warming in abrupt-2xCO2 simulations.
2. For MRI-ESM2-0, the effective heat capacity is the dominant mechanism causing less warming in abrupt-2xCO2 simulations.
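The parameter-swap experiments can be illustrated with a short sketch. The attribution formulas below are an assumption, not necessarily the paper's definitions: each relative importance is taken as the share of the final 20-year warming gap (scaled abrupt-4xCO2 minus abrupt-2xCO2) that is closed when only that parameter is swapped to its abrupt-2xCO2 value, and the residual is whatever is needed for the three terms to sum to 100%. A constant heat capacity and made-up parameter values are used for brevity.

```python
SECONDS_PER_YEAR = 365.0 * 24 * 3600


def ebm_final_mean(forcing, feedback, capacity, n_years=150, average_over=20):
    """Integrate C dT/dt = F + lambda * T (1-year Euler step); return the final-years mean T."""
    temperature = 0.0
    history = []
    for _ in range(n_years):
        temperature += SECONDS_PER_YEAR * (forcing + feedback * temperature) / capacity
        history.append(temperature)
    tail = history[-average_over:]
    return sum(tail) / len(tail)


def relative_importance(forcing, lam_2x, lam_4x, cap_2x, cap_4x):
    """Return (feedback %, heat capacity %, residual %) under the assumed attribution."""
    t_4x = ebm_final_mean(forcing, lam_4x, cap_4x)
    t_2x = ebm_final_mean(forcing, lam_2x, cap_2x)
    t_lam2x_c4x = ebm_final_mean(forcing, lam_2x, cap_4x)  # "lambda_2x/C_4x" experiment
    t_lam4x_c2x = ebm_final_mean(forcing, lam_4x, cap_2x)  # "lambda_4x/C_2x" experiment
    gap = t_4x - t_2x
    ri_feedback = 100.0 * (t_4x - t_lam2x_c4x) / gap
    ri_capacity = 100.0 * (t_4x - t_lam4x_c2x) / gap
    residual = 100.0 - ri_feedback - ri_capacity
    return ri_feedback, ri_capacity, residual


# Made-up parameters: same (scaled) forcing, more-negative feedback and larger C at 2x.
ri_feedback, ri_capacity, residual = relative_importance(
    forcing=4.0, lam_2x=-1.2, lam_4x=-1.0, cap_2x=2.5e9, cap_4x=2.0e9
)
```

Because the temperature response depends nonlinearly on λ and C, the two single-parameter swaps generally do not add up to the full gap, which is what the residual term captures.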

Mechanisms of Radiative Feedback Changes
The phenomenon of EffCS increasing under larger forcing is often characterized by a global-mean second-order feedback-temperature dependence (e.g., Bloch-Johnson et al., 2021). However, we know that the spatial pattern of warming also has an impact on sensitivity (e.g., Armour et al., 2013; Dong et al., 2020; Rose et al., 2014; Zhou et al., 2017). If the spatial pattern of warming changes with forcing level (Good et al., 2015; Lin et al., 2019; Mitevski et al., 2023), then a global-mean framework would not be sufficient to understand changes in sensitivity with forcing. Thus, we analyze the spatial structure of the global temperature anomaly (Figure 2a) and radiative feedbacks (Figure 2b).
Our decomposition of the global-mean Gregory regression finds that CESM2 and MRI-ESM2-0 exhibit the largest difference in the shortwave cloud feedback across forcings, increasing by 0.23 and 0.19 W/m2/K from abrupt-2xCO2 to abrupt-4xCO2, which accounts for 36.8% and 121.2% of the total feedback difference across forcings, respectively (Table S5 in Supporting Information S1; Andrews et al., 2012). Therefore, we focus on the shortwave cloud feedback hereafter. Both models also see an increase in the longwave clear-sky feedback with forcing (Table S5 in Supporting Information S1), consistent with previously proposed mechanisms for a positive second-order temperature dependence (Bloch-Johnson et al., 2021).
We estimate the spatial structure of the shortwave cloud feedback using the approximate partial radiative perturbation (APRP) method (Taylor et al., 2007), wherein we divide the changes in the spatial energy budget calculated from APRP by the global mean temperature anomaly for years 1-150. We find that CESM2 and MRI-ESM2-0 have a strongly negative shortwave cloud feedback in the North Atlantic, south of Greenland, in abrupt-2xCO2, likely due to an increase in low cloud cover (Figure 2b; Ceppi et al., 2016, 2017). In contrast, the ensemble shortwave cloud feedback is more negative in abrupt-4xCO2 than in abrupt-2xCO2. Additionally, CESM2 and MRI-ESM2-0 stand out as having cooling in the North Atlantic indicative of the North Atlantic Warming Hole (NAWH) in abrupt-2xCO2 and substantially less warming in the same location in abrupt-4xCO2 (Figure 2a). The enhanced low cloud cover at the NAWH (Figures 2a and 2b; Jackson et al., 2015; Zhang et al., 2010) under abrupt-2xCO2 for CESM2 and MRI-ESM2-0 is consistent with Atlantic Meridional Overturning Circulation (AMOC) decline. AMOC decline has a marked impact on global energy transport (Drijfhout, 2015; Kostov et al., 2014; Liu et al., 2020; Mecking & Drijfhout, 2023; Poletti et al., 2024); in particular, AMOC decline affects the spatial pattern of warming by reducing the northward transport of heat, leading to colder SSTs in the subpolar North Atlantic, increased estimated inversion strength, and enhanced low cloud cover at the NAWH (Drijfhout, 2015; Drijfhout et al., 2012; Hu et al., 2020; Lin et al., 2019; Mitevski et al., 2023; Rugenstein et al., 2013; Trossman et al., 2016), just as we see in CESM2 and MRI-ESM2-0 in abrupt-2xCO2. The APRP and spatial feedback results for clear sky, albedo, and net effects are shown in Figures S3 and S4 in Supporting Information S1.

Ocean Dynamics and the Atlantic Meridional Overturning Circulation
Differences in AMOC decline across models can be linked to differences in hemispheric warming (Lin et al., 2019) and in ocean heat storage and uptake (Frolicher et al., 2015). Given the impact of North Atlantic Ocean dynamics on the spatial pattern of warming, we further investigate the AMOC strength anomaly and the effective depth of heat storage. Lin et al. (2023) found that models with a stronger climatological AMOC experience larger declines at abrupt-4xCO2 and in 1% ramping experiments. However, AMOC decline may not scale with forcing: some models may approach a nearly complete AMOC shutdown under abrupt-2xCO2, leaving little room for further decrease in abrupt-4xCO2. Since the impact of AMOC decline on transient global warming scales with the magnitude of the decline (Hu et al., 2020), a disproportionate AMOC decline at lower forcing would introduce a disproportionate decrease in the global warming rate in abrupt-2xCO2 compared to abrupt-4xCO2.
We define the AMOC strength (Ψ) as the maximum streamfunction anomaly value north of 30°N and below 500 m (Lin et al., 2019) and the AMOC decline as the difference between the piControl and final 20-year average AMOC strength. In abrupt-4xCO2, the ensemble average AMOC strength decreases, while in abrupt-2xCO2, the ensemble average AMOC strength decreases and then partially recovers (Figure 1d). The abrupt-2xCO2 simulations show larger differences across models, particularly during the decades of AMOC recovery.
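The AMOC strength and the decline ratio described above can be sketched as follows. The data layout (a nested list indexed by depth, then latitude), the inclusive 30°N and 500 m thresholds, and all names and numbers are assumptions made for illustration.

```python
def amoc_strength(streamfunction, latitudes, depths, min_lat=30.0, min_depth=500.0):
    """Maximum of streamfunction[depth_index][lat_index] north of min_lat and below min_depth."""
    candidates = [
        streamfunction[k][j]
        for k, depth in enumerate(depths)
        for j, lat in enumerate(latitudes)
        if lat >= min_lat and depth >= min_depth
    ]
    if not candidates:
        raise ValueError("no grid points north of min_lat and below min_depth")
    return max(candidates)


def decline_ratio(psi_control, psi_2x, psi_4x):
    """AMOC decline under abrupt-2xCO2 as a percent of the decline under abrupt-4xCO2."""
    return 100.0 * (psi_control - psi_2x) / (psi_control - psi_4x)


# Tiny synthetic section (values in Sv): depths 100 m and 1000 m, latitudes 20N and 40N.
section = [[30.0, 25.0], [22.0, 18.0]]
strength = amoc_strength(section, latitudes=[20.0, 40.0], depths=[100.0, 1000.0])
ratio = decline_ratio(psi_control=20.0, psi_2x=8.0, psi_4x=4.0)
```

A ratio near 50% would indicate a decline roughly proportional to the forcing, whereas ratios well above 50% indicate a disproportionately large decline under abrupt-2xCO2.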
CESM2 and MRI-ESM2-0 have the most disproportionate AMOC declines in abrupt-2xCO2 compared to abrupt-4xCO2 (Figure 1d). In CESM2 and MRI-ESM2-0, the AMOC decline under abrupt-2xCO2 forcing is 72% and 95% of the abrupt-4xCO2 decline, respectively (Table 2c). CNRM-CM6A-LR also shows a disproportionately large AMOC decline in abrupt-2xCO2 relative to abrupt-4xCO2, with a ratio of 62% (Table 2c), but without large changes in EffCS (Table 1b) or effective heat capacity (Table 2a) across forcings. Since we only have a subset of the total ensemble in our AMOC analysis, we cannot compare the AMOC decline ratios to an ensemble mean. Therefore, it is possible that CNRM-CM6A-LR has a typical abrupt-2xCO2 AMOC decline (when compared to the theoretical ensemble mean) that is not large enough to cause the nonlinearities seen in CESM2 and MRI-ESM2-0. In contrast, the abrupt-2xCO2 AMOC decline in CanESM5 is 38% of the abrupt-4xCO2 value. GISS-E2-1-G is the only model to experience AMOC recovery to its initial strength in abrupt-2xCO2 (that is, the AMOC strength declines and then strengthens). Mitevski et al. (2021) show that GISS-E2-1-G has a disproportionately large AMOC decline and a smaller EffCS at abrupt-3xCO2, suggesting a large degree of model dependence in the dynamics identified here.
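We can use an analysis of column-integrated heat storage to find the depth over which most of the anomalous heat is stored:

$$H(t, z, \lambda, \theta) = \rho c_p \int_{0}^{z} T(t, z', \lambda, \theta)\, dz' \tag{10}$$

where $T$ is the ocean temperature anomaly as a function of time $t$, depth $z'$, longitude $\lambda$, and latitude $\theta$ (here $\lambda$ is not the feedback parameter), and $\rho c_p$ is the volumetric heat capacity of seawater. We can solve for the depth $z$ at which some percentage of the heat storage has accumulated. In comparing the heat capacity time series and the global depth of heat storage time series, we found that the depth of 55% heat storage was most representative of the heat capacity (see Supporting Information S1); however, these results are robust between 50% and 80% (Figure S5 in Supporting Information S1).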
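Solving for the depth at which a given fraction of the column heat storage has accumulated can be sketched for a single ocean column as below. The sketch assumes non-negative layer temperature anomalies, integrates downward from the surface, and interpolates linearly within the layer where the target fraction is reached; the density and specific heat cancel in the fraction. The function name and the layer values are illustrative.

```python
def depth_of_heat_fraction(layer_thickness, temp_anomaly, fraction=0.55):
    """Depth (m) above which `fraction` of the column-integrated heat anomaly is stored.

    layer_thickness: layer thicknesses in m, ordered from the surface downward.
    temp_anomaly: layer-mean temperature anomalies in K (assumed non-negative).
    """
    heat = [dz * t for dz, t in zip(layer_thickness, temp_anomaly)]  # proportional to J/m2
    total = sum(heat)
    if total <= 0:
        raise ValueError("column has no positive heat anomaly")
    target = fraction * total
    cumulative = 0.0
    depth = 0.0
    for dz, layer_heat in zip(layer_thickness, heat):
        if layer_heat > 0 and cumulative + layer_heat >= target:
            # Linear interpolation inside the layer that crosses the target.
            return depth + dz * (target - cumulative) / layer_heat
        cumulative += layer_heat
        depth += dz
    return depth


# Three 100 m layers with anomalies of 2, 1, and 1 K (illustration only).
depth_55 = depth_of_heat_fraction([100.0, 100.0, 100.0], [2.0, 1.0, 1.0])
```

Applied at each grid point and then averaged zonally or globally, a calculation of this type would give a depth-of-heat-storage diagnostic that can be compared across forcings.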
While the Southern Ocean is known to dominate ocean heat storage (Frolicher et al., 2015; Winton et al., 2013), we focus our analysis on the depth of heat storage, since it is correlated with the effective heat capacity (Donohoe et al., 2014). We find that the difference in depth of ocean heat storage between abrupt-4xCO2 and abrupt-2xCO2 is roughly equivalent in the Southern and Northern oceans.
All of the models show larger peaks in depth of heat storage in the Southern Ocean and the North Atlantic/North Pacific in abrupt-2xCO2 simulations. CESM2 has enhanced depth of heat storage in abrupt-2xCO2 at all latitudes, compared to the ensemble mean, but especially in the North Atlantic/North Pacific and the Southern Ocean. However, the difference in depth of heat storage for MRI-ESM2-0 is substantially larger than the ensemble mean only in the North Atlantic/North Pacific. Thus, AMOC decline may act as a mechanism of enhanced depth of heat storage through surface cooling in the subpolar North Atlantic (Kostov et al., 2014; Winton et al., 2013).
Given the Northern Hemisphere signal in the zonal structure, we next consider the final 20-year average depth of heat storage in the Northern Hemisphere (Table 2d; Figure S7 in Supporting Information S1). CESM2, MRI-ESM2-0, and GISS-E2-1-G have the deepest heat storage in abrupt-2xCO2 (but not necessarily at abrupt-4xCO2) and the largest difference between abrupt-2xCO2 and abrupt-4xCO2. Thus, the three models with the largest effective heat capacity (Figure 1c), the largest importance of heat capacity in determining transient warming, and the deepest abrupt-2xCO2 ocean heat storage are also the three models with disproportionate AMOC declines (either at abrupt-2xCO2 or abrupt-3xCO2; Table 2; Mitevski et al., 2021). Possibly, ocean stratification, more broadly, scales nonlinearly with forcing; however, the exact mechanisms linking AMOC decline to changes in global depth of heat storage and effective heat capacity deserve further study.

Conclusions
We began this study with the observation that the rate of transient warming in CESM2 and MRI-ESM2-0 is significantly reduced in abrupt-2xCO2 compared to scaled abrupt-4xCO2 simulations, even when accounting for the non-logarithmic scaling of CO2 forcing with concentration. Previous literature attributes differences in warming rates across forcing levels to differences in radiative feedbacks and thus EffCS (e.g., Meraner et al., 2013). These differences are often characterized in terms of a global-mean second-order feedback-temperature dependence (e.g., Bloch-Johnson et al., 2021). However, we find that both radiative feedbacks and the global effective heat capacity are necessary to explain warming differences between abrupt-2xCO2 and abrupt-4xCO2 simulations of most models. Additionally, radiative feedbacks and the heat capacity appear to be linked by spatial warming pattern changes, possibly caused by nonlinearities in AMOC decline with forcing. Thus, a global-mean framework would be insufficient for explaining higher model sensitivity at higher forcings.
For CESM2 and MRI-ESM2-0, the more-negative radiative feedback parameter in abrupt-2xCO2 is in part caused by reduced shortwave cloud feedbacks from disproportionately large AMOC decline. The larger effective heat capacity in abrupt-2xCO2 is reflected in enhanced depth of ocean heat storage, particularly in the Southern Ocean and Atlantic Ocean. Therefore, a large effective heat capacity and enhanced depth of Northern Hemisphere heat storage are correlated with models that exhibit large AMOC declines at abrupt-2xCO2. This indicates a potential mechanism whereby AMOC decline can slow global warming through the depth of ocean heat storage and enhanced low cloud cover. This mechanism is relevant at any forcing, not just abrupt-2xCO2. For instance, GISS-E2-1-G experiences disproportionate AMOC decline at abrupt-3xCO2 (Mitevski et al., 2021) rather than abrupt-2xCO2. Thus, in other GCMs, it may be other forcing levels that experience suppressed warming due to AMOC nonlinearity.
This study is constrained first by 150-year runs (while ocean interactions can occur on centennial to millennial timescales; Ackerman et al., 2020; Cheng et al., 2013; Weaver et al., 2012), second by the number of available abrupt-2xCO2 data sets, and third by the lack of Fixed-SST runs for abrupt-2xCO2 scenarios to more accurately estimate the ratio of the radiative forcings.
Ultimately, nonlinearities in the magnitude of AMOC decline across forcings are a possible cause of reduced warming in CESM2 and MRI-ESM2-0 under an abrupt doubling of CO2, through both radiative feedbacks and the effective heat capacity. The phenomenon whereby AMOC decline is proportionally larger at lower forcings (Drijfhout, 2015; Good et al., 2015; Mitevski et al., 2021; Rugenstein et al., 2016) is well documented. The magnitude of AMOC decline is highly variable across models (Cheng et al., 2013; Frolicher et al., 2015; Lin et al., 2023) and introduces significant variability in the global temperature pattern (Lin et al., 2019). In some GCMs, the AMOC declines by well over 50% of its initial strength under an abrupt doubling of CO2, which makes a proportional decline under abrupt quadrupling impossible unless the circulation were to reverse. Thus, forcing-dependent changes in ocean dynamics can dramatically impact the rate of transient warming across GCMs while introducing significant variability across models.
Abrupt-4xCO2 simulations are still used definitionally to calculate effective climate sensitivity (EffCS), and these EffCS values have been argued to govern future warming independent of the scenario used (e.g., Sherwood et al., 2020; IPCC AR6). First, we have demonstrated that this is a poor assumption for a subset of CMIP6 models in which AMOC decline is nonlinear with forcing level. Second, we have analyzed the mechanisms by which a nonlinear AMOC decline with forcing level causes warming under 2xCO2 to be substantially less than warming under 4xCO2 (after normalizing by ERF differences). Here, we find links between AMOC decline, spatial patterns of SST changes, and radiative feedbacks, similar to several previous analyses (e.g., Drijfhout, 2015; Drijfhout et al., 2012; Lin et al., 2019; Trossman et al., 2016) but at odds with the common assumption that feedback differences between 2xCO2 and 4xCO2 scale with global-mean warming (Bloch-Johnson et al., 2021).
In summary, our analysis reveals that AMOC nonlinearity with CO2 forcing level affects global-mean warming in large part through changes in the effective heat capacity, rather than through feedback changes alone. When AMOC changes are substantially different under lower forcing, abrupt-quadrupling experiments alone will not constrain effective equilibrium climate sensitivity. As we exceed 1.5 times pre-industrial CO2 concentrations (UK Met Office, 2021), simulations of climate and ocean dynamics across different forcings become increasingly relevant for projecting global warming in the 21st century.

Figure 1. (a) Surface temperature anomaly against time, (b) the net TOA radiative flux against surface temperature anomaly (see Equation 2), and (c) the effective heat capacity against time for abrupt-2xCO2 and scaled abrupt-4xCO2. The legend beneath the first row corresponds to panels (a)-(c). (d) The AMOC strength anomaly against time. (e) The difference in the zonal-mean depth of 55% heat storage between abrupt-2xCO2 and abrupt-4xCO2 (abrupt-2xCO2 minus abrupt-4xCO2) after the first 20 years. (f) The EBM surface temperature anomaly against time. All panels use annual mean data; in (a, c, d, f), the data are smoothed with a Savitzky-Golay filter using a 31-year window (temperature) or a 5-year window (heat capacity and AMOC strength) to enhance readability. The abrupt-4xCO2 data are scaled in (a-c, f) and unscaled in (d).

Table 1
Comparison of Warming, ERF, and Feedbacks in

Table 2
Comparison of Heat Capacity, AMOC Decline, and Depth of Heat Storage in