Time‐Evolving Radiative Feedbacks in the Historical Period

We investigate the time‐dependence of radiative feedback in the historical period (since the late 19th century), by analyzing experiments using coupled atmosphere–ocean climate models with historical greenhouse gas, anthropogenic aerosol, and natural forcings, each separately. We find that radiative feedback depends on forcing agent, primarily through the effect of cloud on shortwave radiation, because the various forcings cause different changes in global‐mean tropospheric stability per degree of global‐mean temperature change. The large time‐variation of historical feedback driven by observed sea surface temperature change alone, with no forcing agents, is also consistent with tropospheric stability change, and differs from the similarly large and significant historical time‐variation of feedback that is simulated in response to all forcing agents together. We show that the latter results from the varying relative sizes of individual forcings. We highlight that volcanic forcing is especially important for understanding the time‐variation, because it stimulates particularly strong feedbacks that tend to reduce effective climate sensitivity. We relate this to stability changes due to enhanced surface temperature response in the Indo‐Pacific warm pool.

The more recent CMIP6 data set (Phase 6 of CMIP) includes the experiments of the single-forcing Detection and Attribution Model Intercomparison Project (DAMIP; Gillett et al., 2016) with coupled atmosphere-ocean GCMs (AOGCMs), and the Radiative Forcing Model Intercomparison Project (RFMIP; Pincus et al., 2016) with atmosphere-only GCMs (AGCMs).The latter allow model-specific diagnosis of historical effective radiative forcings as a function of time.This provides an advantage over the CMIP5 data set available to Gregory et al. (2020), who assumed the same forcing for all AOGCMs in the absence of further information.Dong et al. (2021) showed that feedbacks in CMIP6 AOGCM historical experiments are mainly driven by GHG forcing from around 1940 onward.In the present work, we use CMIP6 experiments to explain time-dependence of historical feedback in terms of the time-variations of and differences among the forcing agents.
CMIP5 and CMIP6 AOGCMs generally fail to reproduce the observed historical SST pattern of recent decades, especially in the Pacific, and the consequent large positive α (Andrews et al., 2018(Andrews et al., , 2022;;Dong et al., 2019Dong et al., , 2021;;Gregory & Andrews, 2016;Gregory et al., 2020;Wills et al., 2022).Since AOGCMs are essential for producing physically based estimates of future warming, we need to know why they do not reproduce the past in this respect.Not having a multi-centennial observational SST record for comparison, we cannot tell whether the relevant SST pattern is an unusual unforced variation, or a response to recent forcing.In either case, improved understanding of simulated historical feedbacks will be useful in explaining the difference from the observed record.
The objectives of this paper are to investigate the evolution of radiative feedbacks in the historical period, and to explain this through the time-varying mix of historical forcing agents.The following sections are organized thus: Section 2 details the methods and data used.Section 3 is a short section confirming the additivity of individual forcings.Section 4 is an analysis of historically relevant forcing agents and their mean associated radiative feedbacks across the historical period.Section 5 discusses how feedbacks for different historical forcing agents evolve over time, whilst Section 6 assesses the extent to which we can explain the evolution of historical feedbacks through the mix of historical forcing agents.Section 7 then discusses possible reasons for the significantly less-amplifying feedbacks from volcanic eruptions, which are relevant to the evolution of historical feedbacks.Finally, Section 8 provides a summary and conclusions of this work, and outlines needed future work.

Experiments, Models, and Data Processing
The data used in this paper are detailed in Table 1, and were obtained from the Earth System Grid Federation archives, which hold CMIP data from multiple centers (Cinquini et al., 2014).The data include historical single-forcing simulations with both AOGCMs (hist-x, from DAMIP; Gillett et al., 2016) and AGCMs with fixed SSTs (piClim-histx, from RFMIP; Pincus et al., 2016), where x is a short name for the forcing agent.We chose variants with the same initialization ("i1" variant label), physics and forcing definition as in piControl, except where noted in Table 1.We included all realizations that contained all of the required variables, and our results from each experiment of each model are ensemble averages.We calculated multi-model mean (MMM) values from the individual model ensemble averages, with equal weighting for each model.Also included are data from the amip-piForcing experiments which use AGCMs with SSTs prescribed according to the Atmosphere Model Intercomparison Project II data set, derived from historical observations (Andrews et al., 2018(Andrews et al., , 2022;;Gregory & Andrews, 2016;Hurrell et al., 2008;Webb et al., 2017).Only single ensemble members were included for each model for these experiments, but this is not expected to have a substantial impact on the results, because feedback time-variations in amip-piForcing are largely dictated by the prescribed SSTs, since atmospheric unforced variability is comparatively small (Gregory & Andrews, 2016).The set of models with amip-piForcing experiment outputs differs slightly from that used for historical single-forcing experiments (Table 1), but this is unimportant for the results discussed below.
Global averages refer to area-weighted averages of the variables of interest, except for stability change S which refers to the average change in EIS (Wood & Bretherton, 2006) over oceans equatorwards of 50°S and 50°N.This choice follows Ceppi and Gregory (2019) who found that area-mean EIS change in this region is well correlated with the global radiative response in observations and in two GCMs.

Calculating Forcing and Climate Response
We calculate the forcing and the surface-temperature-mediated climate response from the data detailed in Table 1 for each forcing x.
The TOA radiative imbalance, N piClimx , from the fixed-SST experiment piClim-x relative to piClim-control equals the effective radiative forcing F x according to Equation 1, since R ≈ 0 with SST not allowed to change.The surface forcing is evaluated in a similar way, as the difference in net downward surface heat flux between piClim-x and piClim-control, and includes rapid adjustment (like the ERF).Unlike TOA forcing, the surface forcing has sensible and latent heat as well as radiative flux components.The time-mean global averages of the TOA and surface forcings are equal if energy is conserved, since the global heat content of the atmosphere must have zero trend in the steady state of the piClim-x experiment, but locally they are not equal.
The TOA radiative imbalance, N histx , from the coupled experiment hist-x relative to piControl includes both forcing and response that is, to isolate TOA radiative changes R x that are mediated by SST change due to agent x.

Volcanic Events
We identify radiatively significant volcanic forcing events by inspecting the time series of effective forcing in piClim-histnat.We used SciPy's peak finding algorithm, which uses a simple comparison of neighboring values, to find negative peaks of greater than 0.3 W m −2 magnitude in effective forcing.Identified peaks of negative forcing are shown in red in Figure 1.The first one, which occurs in 1861, lines up roughly with the Dubbi (Wiart & Oppenheimer, 2000) and Makian (Picas & Grab, 2020) eruptions of that year, but is unidentified in the data set of Durack and Taylor (2022) based on ice core data (Arfeuille et al., 2014).We call this one
We regard negative effective forcing in hist-nat (gray bars in Figure 1) as indicating the presence of substantial volcanic forcing.Zero forcing in hist-nat indicates no forcing difference from piControl, which contains a constant background volcanic aerosol (equivalent to the historical average over 1850-2015; Eyring et al., 2016).Consequently, hist-nat has a weak positive forcing relative to piControl during non-volcanic periods (Gregory et al., 2013), such as 1920-1960, when warming occurs in hist-nat (Figure 1b).Solar forcing is also present in hist-nat, but is comparatively small.

Confirming That Forcing and Response Combine Linearly
In order to check that historical N is a linear combination of F and R (as assumed by Equation 2), we use the TOA radiative imbalance: amip-hist =  −  from the amip-hist experiment, which prescribes the historical observed SST and historical forcing agents (T.Zhou et al., 2016b),  amip-piForcing = − from the amip-piForcing experiment, which prescribes the same SST but no forcing agents so F = 0, and  piClim-histall =  from the piClim-histall experiment, which prescribes historical forcing agents but no SST change, relative to piClim-control (as in Section 2.2).

We expect
amip-hist =  −  = piClim-histall + amip-piForcing. (3) We test Equation 3 through Figure 2, which shows that this holds over the historical period.An orthogonal distance regression (Boggs & Rogers, 1990) The seven largest peaks of negative forcing of the period (corresponding to the seven most climatologically significant volcanic eruptions) are marked by red dots, whilst years of measured negative forcing (i.e., volcanic forcing, which makes up "hist-nat-volc," defined in the text) are shaded in gray.Instantaneous radiative forcing from changes in downwelling shortwave at the TOA is shown in blue.).The linear fit gives a gradient m of correlation ρ.This linear fit uses orthogonal distance regression, which finds a least squares fit according to error along both axes.This is done by accounting for approximately double the variance in the y-axis, which is a sum of two experimental outputs, compared to along the x-axis, which is obtained from a singular experimental output.The same amip-hist variants were used as for amip-piForcing.
is performed, using the fact that the variance in the y-axis should be approximately double the variance in the x-axis since the y-axis is the sum of two model integrations with independent unforced variability, as opposed to the single model output used in the x-axis.This fit has high correlation (ρ = 0.965) and gradient (0.962) close to the 1:1 line.Thus the climate response in AGCMs to the combination of historical forcings can be accurately obtained by subtracting the forcing-only experiment from the experiment involving both forcing and SST change.
We therefore assume the same applies for individual forcing agents in AOGCMs, for which there are not corresponding experiments to test it.

Dependence of Radiative Feedback on Forcing Agent
For each time-dependent forcing in the MMM of historical coupled experiments, R and T are well-correlated during 1850-2015 (Figure 3), but the best-fit constant α from the regression slope of R against T over the whole period depends on the forcing agent.As in Salvi et al. (2022), historical aerosol forcing is found to cause more amplifying feedbacks (smaller magnitude α) than historical GHG forcing, and historical net forcing causes less amplifying feedback (larger magnitude α) than historical GHG forcing.As argued in Salvi et al. (2022), these two findings are related under the assumption that the net historical forcing is primarily a combination of positive GHG forcing and negative aerosol forcing.This argument is shown geometrically in Figure 4: approximating the historical response as a sum of historical aerosol and GHG responses, as in panel (a), gives a gradient of R against T that is larger than the gradient for historical GHG forcing alone.Panel (b) shows that, if the volcanic response is relatively small then it can have little impact on the historical gradient even if its associated α is large.
In order to understand the dependence of α on historical forcing agents, we use the refined model of Ceppi and Gregory (2019).The Earth energy budget model of Section 1 gives  A geometric (not to scale) explanation of why the historical all-forcing feedbacks do not lie between the historical anthropogenic aerosol (AA) and GHG feedbacks.(a) If the historical all-forcing case is roughly a linear combination of the historical AA (blue) and GHG (orange) results, then its feedback parameter (the gradient of green) will be more positive than the hist-GHG feedback parameter.(b) If natural forcing has a large feedback parameter (gradient of pink) but is significantly weaker than AA and GHG forcing, then the historical all-forcing feedback parameter will be less positive than without natural forcing, but still more positive than for GHG forcing.
SALVI ET AL.

10.1029/2023JD038984
6 of 17 if α is constant.To account for the variation of α shown by GCM experiments, Ceppi and Gregory (2019) proposed a model wherein R is described as a combined response to both T and global average atmospheric stability change S: where and Through uniform warming experiments in two models, Ceppi and Gregory (2019) found that τ and σ are approximately constant in each model, i.e. the radiative response and stability change are both roughly linear with uniform global surface temperature change.Therefore, in their experiments, differences in α = dR/dT for different forcing agents are explained by differences in dS/dT (Equation 5).
To test whether this explanation applies to CMIP6 historical forcings, we calculate linear regression slopes α = dR/dT and dS/dT from the MMM R(t), S(t), and T(t) of each of the hist-x experiments and the historical experiment, using data for 1850-2015, except that for"hist-nat-volc" we use the volcanic forcing years only.There is a positive linear relationship between α and dS/dT (Figure 5a).The relationship is consistent with Equation 5and with the hypothesis of Ceppi and Gregory (2019).It indicates that MMM feedback differences between historical natural and anthropogenic forcings are due to different stability responses, as previously found for hist-aer and hist-GHG (Salvi et al., 2022).
Furthermore, there is a strong and significant relationship (ρ > 0.75) between dS/dT and α across historical forcing agents in every model by itself, except in MIROC6.Although τ and σ (Equation 5) might be different in each model (we have not investigated this statistically), when all models and forcing agents are considered together the relationship between dS/dT and α remains significant (ρ = 0.72, p < 0.01 of the null hypothesis of no relationship, Figure 6).Therefore from both the individual models, as well as the MMM, we conclude that the differences among the forcing agents in feedback for the full historical period can be explained by the different changes in stability that they produce.
Taking the difference between Equation 5 for forcing agent x and GHG gives if τ and σ are the same for all forcing agents in the MMM, as found for individual models by Ceppi and Gregory (2019).Equation 8is the equation of a straight line through the origin with slope σ.The regression line of α x − α GHG against dS/dT x − dS/dT GHG (Figure 5a) has this form.Hence we can identify σ for the MMM with the slope of the regression line, which is 5.05 ± 0.7 Wm −2 K −1 , and is consistent, within uncertainty, with σ for both models analyzed by Ceppi and Gregory (2019).
To gain further physical insight, we break α down into cloud and non-cloud feedbacks i.e. those from water vapor, lapse rate, and albedo changes.For SW components, we use the "approximate partial radiative perturbation" method (APRP, Taylor et al., 2007), and for LW the radiative kernel method (Pendergrass et al., 2018;Soden et al., 2008).The kernel method can also be used for SW and is expected to agree with APRP (Zelinka et al., 2020), but we find that the APRP method gives a smaller residual and so is preferred.
Net non-cloud feedbacks (Figure 5b), mostly in the LW (Figure 5h), show some contribution to overall differences in α between forcing agents.However, SW cloud feedbacks (Figure 5f) are the most important, with both the highest spread in and correlation between α and dS/dT.This is in agreement with the findings from Ceppi and Gregory (2019)   (which are the same as in Figure 5a).Correlation and p-value is across models and experiments, excluding the MMM.

Time-Evolution of Radiative Feedback
Having looked at radiative feedbacks calculated using regressions across the full historical period (1850-2015), we next assess how historical feedbacks change with time during this period.Figure 7 shows the ERF of each forcing agent as a function of time.The historical period can be roughly divided into three: an early period where both anthropogenic aerosol (AA) and GHG forcings are small but several significant eruptions occur (1850-1914); a middle period where both AA and GHG forcing are substantial but no major volcanic eruptions occur ; and a late period which contains strong forcing from both AAs and GHGs alongside multiple major volcanic eruptions .
Natural forcing exhibits a highly variable temporal structure (pink in Figure 7), dominated by the contribution of volcanic forcings, while solar forcing remains weak throughout (Figure 1a).By contrast to the smoothly evolving hist-GHG and hist-aer forcings, natural forcing varies over very brief timescales as forcing from volcanic eruptions decays rapidly over a few years and many years can pass between significant volcanic eruptions (Ammann et al., 2003).
Figure 8 shows the corresponding time-evolving feedbacks and dS/dT for regressions over 30-year sliding windows, following the approach of Gregory and Andrews (2016) and Gregory et al. (2020), compared with the feedbacks from regression over the full time series (dashed horizontal lines).Both α and dS/dT vary significantly over time in all cases except hist-GHG.Furthermore, except for hist-GHG and excluding the middle period in hist-nat (because it contains no significant forcing), there is a high correlation (ρ > 0.75) between variations in time of α and dS/dT for each forcing.
However, the significance of correlations between α(t) and dS/dT(t) cannot be determined from the overlapping 30-year regression windows due to autocorrelation, which is difficult to allow for.Hence, we examine the relationship between α(t) and dS/dT(t) using non-overlapping windows.We divide the 1865-2015 period into five 30-year windows for hist-aer, hist-GHG, and the historical cases.The amip-piForcing simulation only has four windows, due to a later start date.We also only use four windows for the hist-nat case as we ignore the 1925-1955 window, which is free from significant volcanic eruptions.
The bottom row of Figure 8 (panels k-o) shows linear fits of α(t) and dS/dT(t) for all sliding windows (dots, dashed gray line) and using only non-overlapping windows (black circles with error bars, solid black line).From panels k and l, the different best-fits when using overlapping or non-overlapping windows suggest that there is some over-fitting when using overlapping windows.Using non-overlapping windows, the correlation in time of α(t) and dS/dT(t) is high (ρ > 0.8) in all cases except for hist-GHG forcing, and the regression slopes of α(t) against dS/dT(t) are significantly greater than zero (p < 0.05) for hist-aer and hist-all.For hist-nat and amip-piForcing they might not be significant but are nonetheless unlikely to arise from random noise (p < 0.11 and p < 0.06, respectively).Because the four experiments are independent, we can multiply together their p-values to obtain the probability of obtaining a positive correlation in all four cases if there were actually no relationship between α and dS/dT in models.The product is very small (p ≃ 10 −5 ).We interpret this as strong evidence of the existence of a relationship.
In the following subsections, we go into more detail for each of the forcings.

Hist-aer
Historical AA forcing (Figure 8a) causes a relatively large α between around 1920-1930, in contrast to the lower α (compared to under GHG forcing) that was calculated over the full historical period (Figure 5).This feature is robust in dS/dT across models (not shown) but not obviously related to the evolving patterns of aerosol forcing, which are known to have shifted from being mainly over Europe and north America to their increasing prominence in east and south Asia (Lamarque et al., 2010;Wang & Wen, 2021), despite shifting forcing patterns being tied to different efficacies (Persad & Caldeira, 2018).The reason for the change in feedbacks for hist-aer could be related to changing relative concentrations of different AAs, as highlighted by an increase in radiative forcing from sulphates relative to black carbon over the historical period (Myhre et al., 2014, their Figure 8.8).A shift  2019) of larger α (lower sensitivity) from black carbon than sulphate forcing.This is an uncertain explanation, however, as other works found opposite results (Nordling et al., 2021;Richardson et al., 2019).Furthermore, feedbacks remain closer to the value from regression across the full period after forcing becomes more significant (i.e., after the year 1900), which suggests that this could be the result of internal variability.

Hist-GHG
For hist-GHG (Figures 8b and 8g), the low correlation (ρ = −0.44)across time between dS/dT and α is due to the lack of forced variations in these two parameters.Feedbacks vary closely around the value obtained from regression across the full period.With the hist-GHG forcing pattern remaining homogeneous throughout the historical period, only its magnitude slowly increasing, the inter-annual evolution of dS/dT and α is dominated by unforced variability, unlike the other cases where the forcing patterns and/or magnitudes vary more rapidly in time.This is highlighted by the good correlation of R with T over a larger T range than for the other experiments (Figure 3), such that for hist-GHG both dR/dT and dS/dT are nearly constant, and their correlation is  1).Horizontal dashed lines show regressions across the full time series (as in Figure 5a).Pearson correlation coefficients of these fits to α (ρ), as well as regression p-values, are shown.In brackets, we show ρ and p for fits using overlapping windows.For hist-nat, again, windows over the period free from eruptions are ignored in the fit.Peaks of volcanic forcing are shown by red crosses for the hist-nat and historical cases.Vertical gray lines demarcate the three periods of different forcing flavors.Bottom row: year-by-year scatter of α and dS/dT from 30-year windows.Circles with black edges show non-overlapping windows, whilst dots show windows that overlap with these.Linear fit coefficients for the gradient σ and intercept τ are shown, across independent windows (black, solid line fit) and all available windows (gray, dotted line fit).The window free of significant volcanic eruptions is ignored for hist-nat as above, and is shown by a white circle with a black edge in panel (m).
SALVI ET AL.
10.1029/2023JD038984 10 of 17 negligible in time.The lack of dS/dT and α variability in hist-GHG can be seen clearly from the tight scatter of points compared to the greater spread seen in other cases (Figure 8l compared with other panels of the bottom row).

Hist-nat
The hist-nat case is dominated by volcanic forcing (Figures 8c and 8h).This causes substantial variation in feedbacks across the period.Low correlation between dS/dT and α in the middle period lines up with a lack of significant eruptions, which means there is no forced signal then.In other periods, the hist-nat experiment shows significant variations in α and dS/dT which are possibly related to different eruptions causing different feedbacks, for example, via differences in the spatial distributions of stratospheric aerosol and hence radiative forcing.
Strong volcanic forcing periods tend to have higher α than for hist-GHG forcing, which is consistent with larger alpha when considering the entire period.

Historical
In the historical all-forcing case (Figures 8d and 8i) before 1900, variations in α and dS/dT very closely mirror those found in hist-nat, because AA and GHG forcing are relatively small.After 1980, α tends toward similar values as in hist-GHG.This could be due to a combination of factors, namely that hist-GHG forcing dominates toward the end of this historical period, whilst the strong Pinatubo eruption, with higher α, and aerosol forcing, with lower α, likely cancel out in their effects on radiative feedbacks.
The reason for the large variations in α and dS/dT in the middle period in the historical all-forcing case is less clear.One possible explanation is that AA and GHG forcing tendencies roughly cancel out (Figure 7), such that regressions in temperature over this period are sensitive to unforced variations in T. Alternatively, variations in α for hist-aer may explain it as follows: high α from hist-aer during 1910-1930 results in α being less positive in the historical all-forcing case than for hist-GHG over the same period, whilst low α for hist-aer during 1935-1955 results in more positive α for the historical all-forcing case (refer to Figure 4).

Amip-piForcing
The amip-piForcing experiment is different from the DAMIP historical experiments, since it prescribes observed historical SSTs instead of prescribing forcings and allowing SSTs to evolve accordingly.The observed SST pattern imposed in amip-piForcing differs substantially from that simulated by coupled models under historical forcing (Wills et al., 2022).Consistent with this, Figures 8e and 8j shows a different time-evolution of radiative feedbacks in amip-piForcing to the historical all-forcing case.While we do not expect the coupled MMM to reproduce the variability found in the observed single realisation of historical climate, the disagreement may still reflect a failure of models to recreate historical SST patterns, whether forced or unforced (Dong et al., 2021;Gregory et al., 2020;Wills et al., 2022).
Given that volcanic forcing seems to have strong effects on the evolution of feedbacks in the hist-nat and historical cases, we compare the evolution in hist-nat to amip-piForcing.In both, α is high at the start (but the earliest period is missing in amip-piForcing, limiting the comparison), decreases in the early 20th century, and rises steadily toward the end of the historical period.The timing of these features does not line up well, however.This could point to a couple of issues.Considering that volcanic eruptions are highly variable in the forcing and response that they produce among individual ensemble members of historical and hist-nat simulations, historical simulations might be unable to capture the actual historical volcanic response because of unforced variability affecting the state of the atmosphere at the time of the eruption.Otherwise, there may be inaccuracy in the prescribed volcanic forcing in the historical simulations; this particularly applies to the very strong Krakatoa eruption, since this occurred well before the satellite record.
Observational errors in the AMIP-II SST boundary condition data set may also play a role (Andrews et al., 2022;Fueglistaler & Silvers, 2021).

Dependence of σ on Forcing Agent
From the linear regression to time-variation in non-overlapping windows, we obtain slopes σ and intercepts τ for each forcing agent (Equation 5).Values differ, but are not generally statistically incompatible between pairs of forcing agents.It is relevant to note that the smallest standard errors are estimated in the historical case, which involves multiple forcing agents.While the relative importance of forcings varies in time, the historical α(t) can be explained well with constant τ and σ, suggesting that these parameters do not depend strongly on the forcing agent.The value here of σ = 3.88 ± 0.18 is not significantly different to the σ = 5.05 ± 0.7 obtained in Section 4.

Time-Dependence of Feedback Related to Variations in Forcing
To quantify contributions of different forcing agents, which drive different stability changes and thus different feedbacks, to historical radiative feedbacks, we reconstruct the historical response R rec (t) from the results of historical single-forcing experiments through that is, we sum the temperature changes predicted by the hist-aer and hist-GHG experiments multiplied by fixed feedback parameters calculated from the full 1850-2015 period (the same values as in Figure 3), and add in the response contributions from the hist-nat R nat (t) case directly.We follow a similar procedure to reconstruct stability change using the same constant  (∕ ) GHG = 0.125 and  (∕ ) aer = 0.072 as shown by the horizontal dashed lines in the middle row of Figure 8).
Fixed α GHG and α AA are justified since there is little variability in α GHG (t) over time, and much of the variability in α AA (t) occurs earlier on when the AA forcing is relatively weak; later on, variations are relatively small as aerosol forcing becomes stronger.Hence, these variations should not contribute strongly to the historical evolution of feedbacks since the early period is dominated by volcanic forcing.α nat , however, varies significantly in time such that fixed α nat does not accurately apply.Finally, we reconstruct temperature change T rec (t) through (11) The reconstructed historical R rec and S rec are shown in Figures 9a and 9b in red.Figures 9a-9c show the sums R sum , S sum and T sum from the hist-x experiments in blue (as 3-year running means, to reduce short-term unforced variability): The CMIP6 MMM historical R hist , S hist and T hist are shown in black.
The left column of Figures 9a-9c shows that the reconstruction explains R hist and S hist well, with correlations of ρ(R rec , R hist ) = 0.90 and ρ(S rec , S hist ) = 0.79.The correlation is not expected to be perfect because of unforced variability, missing forcings in the reconstruction (ozone, land-use change) and non-linearities responses to different forcings, though the high correlations suggest these factors are minor.Nonetheless, these are higher correlations than ρ(R sum , R hist ) = 0.83 and ρ(S sum , S hist ) = 0.67.This may be due to summed unforced variability in the sum of responses, with greater unforced variability in R and T.
The time-dependent α = dR/dT and dS/dT (Figures 9d and 9e) are calculated in three ways: in each case by regression in a 30-year sliding window.For α and dS/dT, the first estimate represents the actual time-variation; the second estimate tests the reconstruction assuming α constant in time but different for GHG and AA; and the third estimate tests the additivity of the individual responses.Note that α hist is the same here (Figure 9d) as in Figure 8d.
The top-right panels of Figures 9d and 9e shows that α hist and dS/dT hist for the CMIP6 MMMs are quite well reproduced by both the reconstruction and the sum, with statistically significant correlations (p < 0.01 of zero regression slope): ρ(α rec , α hist ) = 0.53, ρ(dS/dT rec , dS/dT hist ) = 0.62, ρ(α sum , α hist ) = 0.56, ρ(dS/dT sum , dS/dT hist ) = 0.72.In the early period, α rec and dS/dT rec are closer than α sum and dS/dT sum to α hist and dS/dT hist .This may point to this period containing significant unforced variability, reflected in the sum of responses but less present in the reconstruction as unforced variability in T is smaller than in R. Around 1940, the reconstruction and the sum both give underestimates, especially for α.These periods also have small trends in T (Figure 9f), which suggests that the discrepancies could be due to low signal-to-noise.Nonetheless, the reconstruction captures the shape of the time-variation of α hist and dS/dT hist .This supports the reasoning that AOGCM-derived historical feedbacks can be interpreted in terms of a time-varying mix of GHG, AA and volcanic forcings.Hence, any non-linearities that arise from the combination of these agents do not play a first-order role for the time-variation in historical feedbacks.

Explaining the Lower Climate Sensitivity of Volcanic Forcing
As highlighted by Figure 5, the overall effect of volcanic forcing is to produce significantly less amplifying radiative feedback (lower climate sensitivity) than GHGs.This can be seen when looking at α for hist-nat (all years, pink) and is emphasized by regressions over only volcanic forcing years (hist-nat-volc, red).In order to understand this, we restrict our attention to the two largest volcanic eruptions of this period: Krakatoa (1883) and Pinatubo (1991).These eruptions have a clear signal in all of the models analyzed here, with correspondingly the largest responses in the MMM (Figure 1).
In the analysis that follows, we consider 1882-1886 (for Krakatoa) and 1990-1994 (for Pinatubo), i.e. for each of these eruptions we have taken the year of the eruption, and the three subsequent years.These years following the peak forcing are included to allow the volcanic forcing signal to decay to a level which cannot be distinguished from unforced variability.The Krakatoa and Pinatubo eruptions are qualitatively similar in terms of radiative forcing, as both were equatorial and began in NH summer months (Mass & Portman, 1989;Stenchikov et al., 1998).
For other forcings, we consider 1995-2015.Figure 10 shows time-mean forcing and temperature change for all the forcings, computed from time means each divided by their signed area means; hence they are mostly positive and the area-mean is +1 in all cases, including negative hist-aer and volcanic forcing.Figures 10a and 10b show zonal-mean forcing at the TOA and at the surface, evaluated as described in Section 2.2.Volcanic forcing does not exhibit any unusual feature as a function of latitude that could account for the low climate sensitivity, in contrast to hist-aer, which has strong tropical-extratropical and NH-SH contrasts.The differences among forcings are reflected in the zonal mean surface temperature changes (Figure 10c).Temperature change is greater at higher northern latitudes (around 40N) for hist-aer compared to hist-GHG, whilst for volcanic forcing it is more similar to hist-GHG.Large temperature changes are seen at high southern latitudes for volcanic forcing, which might be due to Antarctic sea ice change.This may stem from the negative SH forcing under volcanic eruptions, which does not occur in the other cases, with the sea ice response being larger under cooling than under warming.
Low climate sensitivity (large α) is correlated with large dS/dT (Equation 5), which results from enhanced surface warming in tropical regions of deep convection, especially the West Pacific warm pool (Andrews & Webb, 2018;Ceppi & Gregory, 2017;Fueglistaler, 2019), where climatological SST is highest.In these regions, warming propagates easily from the surface to the atmosphere.Warming in less convective regions, by contrast, causes little change in free-tropospheric temperature.Thus the difference between warming in the warm pool and elsewhere is indicative of the change in tropospheric stability.To focus on this effect, we evaluate the distribution of TOA forcing, surface forcing, and temperature change over the sea in the tropics (30°S-30°N) as a function of the area-weighted distribution of tropical climatological SST (Figures 10d and 10e).In these distributions, equal intervals in SST percentile (horizontal axis) correspond to equal surface areas, so that a field which is uniform in space would have a constant value as a function of SST percentile.Following Fueglistaler (2019), we define the warmest 30% of tropical SSTs as the "warm pool."At the TOA (Figure 10d), in contrast to other historical forcings, volcanic forcing is greater over the warm pool than in the rest of the tropics.This might explain why SST change (Figure 10f) in the warm pool is greater than elsewhere, which could then explain the significantly greater dS/dT for volcanic forcing.Such enhanced warm pool SST temperature change has been found in other work when looking at Pinatubo-style eruptions (Günther et al., 2022).Evidence against this argument is that volcanic forcing at the surface does not seem to be significantly larger in the warm pool than in the rest of the tropics when contrasted with other historical forcings (Figure 10e), while we expect surface temperature change to be more related to surface than TOA forcing.
We propose another mechanism that may contribute to the significantly larger dS/dT and hence smaller temperature change for volcanic forcing than for GHG forcing.The response of the upper ocean mainly determines the SST change to volcanic forcing, because it appears and disappears in only a few years, too quickly for the slower processes of heat exchange with the deeper ocean (Gregory et al., 2015(Gregory et al., , 2016;;Held et al., 2010).This is less relevant for historical AA and GHG forcings, which evolve slowly over time.Therefore the geographical variation of upper-ocean heat capacity or wind-driven advection might dominate the pattern of SST change for fairly uniform forcings over very short timescales.If this pattern correlates with climatological SSTs, stability changes per unit of temperature change may be larger.Rugenstein et al. (2016) provide insight into what this pattern might look like.For abrupt 4×CO 2 , they find greater warming in the warm pool and less outside of deep convective regions in early years (see their Figure 4).This relatively larger warm pool warming disappears after a few years, suggesting a strong SST pattern-driven stability response initially that becomes weaker over time, per unit of warming.If this is the characteristic pattern of the response to short-lived forcing, it could explain why volcanic events cause greater dS/dT (and α) than other historical forcing agents.

Summary and Conclusions
The findings of this work can be summarized as follows.
• In the MMM of CMIP6 historical experiments, the time-variation of the radiative feedback parameter α (a positive number, larger for smaller effective climate sensitivity) is substantial, and driven by volcanic forcing and the changing ratio of AA and greenhouse-gas (GHG) forcing.Volcanic forcing dominates before about 1915, then AA and GHG up to about 1960.All three kinds of forcings contribute in more recent years.• We confirm that the dependence of radiative feedback on forcing agent can be explained through differences in dS/dT, the sensitivity of global-mean tropospheric stability to global-mean surface temperature (Ceppi & Gregory, 2019;Salvi et al., 2022).• Time-variation of dS/dT explains the forced time-variation in α for each forcing agent individually, and for all agents together, as long as the forcing signal is sufficiently large to cause variations in dS/dT and α that exceed unforced variability.• Time-variation of dS/dT in the MMM of amip-piForcing experiments (with observed SST and no radiative forcing) explains the time-variation of α in this case, which differs from the CMIP6 MMM historical α, possibly because AOGCMs fail to recreate observed SST patterns (especially in recent decades).• Shortwave (SW) cloud feedbacks play the largest role for net feedback differences among forcing agents.
Non-cloud feedbacks (SW and longwave combined) contribute significantly but to a lesser extent to the differences.This augments the body of research on the SST pattern effect (Stevens et al., 2016) being related to radiative feedbacks through stability and SW cloud feedbacks (Ceppi & Gregory, 2017, 2019;Kamae et al., 2019;Mackie et al., 2021).• Analysis of the strongest two eruptions of the 1850-2015 period (Pinatubo and Krakatoa) highlight very large dS/dT for volcanic eruptions, which results in lower climate sensitivity from their forcing.• Large dS/dT for volcanic eruptions is tied to strong SST response (cooling) in the west Pacific warm pool.
This might be linked to volcanic forcing being stronger over the warm pool, or it might be characteristic of the rapid response of the ocean to forcing.
Our findings are broadly consistent with those of Gregory et al. (2020), who analyzed CMIP5 data.We reaffirm that α for GHG forcing has little time-variation, but that there is significant time-variation in α for the combination of historical forcings, which is to some extent a result of the different time-evolution of the magnitudes of the various forcings, at least in the MMM.
Unlike Gregory et al. (2020), we find significant time-variation of α early in hist-nat, maybe because we have knowledge of evolving effective forcing from the CMIP6 piClim experiments with individual models, while they had to estimate it from aerosol optical depth, which was prescribed in all models.Knowledge of the strong forcing from Pinatubo (1991) also leads to a different explanation of the final years of the historical simulation, when Gregory et al. (2020) posited that GHG forcing dominated.We suggest that α being less positive (larger effective climate sensitivity) for AA than for GHG, and more positive for volcanic forcing than for GHG, together leave historical α similar to GHG α in these years.
Our results are mostly consistent with Dong et al. (2021), who also used CMIP6 data, though we find more significant time-variation in feedbacks for the historical experiment, especially during the period of slowly varying T around 1940.This is likely related to our use of regressions on the MMM as opposed to Dong et al. taking the mean of model regressions; we expect their method to result in regression dilution (Gregory et al., 2020), which causes the magnitude of the slope α to be underestimated.We do not expect the difference to be due to their having included data from GFDL-CM4, which we did not.
Other works have studied the evolution of feedbacks following an abrupt change in CO 2 forcing, and how this is related to the pattern effect (Andrews et al., 2012(Andrews et al., , 2015;;Paynter et al., 2018;Rohrschneider et al., 2019;Rugenstein et al., 2020).It may be helpful to compare these abrupt CO 2 forcing cases to cases involving abrupt forcings from other agents, such as anthropogenic or volcanic aerosols.Experiments could also be performed with prescribed step TOA forcings in various patterns, in order to identify how climatological SSTs relate to SST changes under different forcing patterns with the same forcing timescales.Such work would give information about whether differences in forcing timescale are the main driver of differences in dS/dT between GHG and volcanic forcings, by disentangling the contributions of forcing patterns and forcing timescales from SST pattern changes.
The small number of large-magnitude volcanic forcing events in the historical period leaves it unclear whether such eruptions would generally cause high dS/dT, rather than it happening by chance in the cases we have studied.Further insight could be gained from experiments with synthetic data sets of prescribed stratospheric aerosol to mimic volcanic eruptions, perhaps at different latitudes or times of the year (as done in Zhuo et al. (2021)), or by simulating stronger versions, or larger ensembles, of historically weaker volcanic eruptions.The Model Intercomparison Project on the climatic response to Volcanic forcing (VolMIP; Zanchettin et al., 2016) aims to address many aspects of uncertainty with regards to volcanic forcings, but runs of volc-long-eq (long-term response to volcanic forcing) are only available for four models and there is not a concurrent effective forcing data set.
Experiments with a complete set of individual historical forcing agents would also be useful, including volcanic and solar separately (e.g., hist-volc and hist-sol; Gillett et al., 2016), land use, ozone, black carbon and sulphate aerosols.For each agent, historical AOGCM and AGCM experiments are needed, the latter with climatological SST in order to get the time-evolving effective radiative forcing required to calculate feedbacks.These further experiments would help complete the explanation of the historical pattern effect on radiative feedback.
refers to the CMIP6 r variant labels, with noted physics (p) and forcing (f) labels.All variants used the same i1 initialisation label.Unavailable data are marked by an ×. a All piControl runs used the p1f1 labels, except for CNRM-CM6-1 which used p1f2.b

Figure 1 .
Figure1.Annual means of effective top-of-atmosphere (TOA) radiative forcing over time (top) and global average surface temperature change (bottom) in hist-nat over the 1850-2015 historical period.The seven largest peaks of negative forcing of the period (corresponding to the seven most climatologically significant volcanic eruptions) are marked by red dots, whilst years of measured negative forcing (i.e., volcanic forcing, which makes up "hist-nat-volc," defined in the text) are shaded in gray.Instantaneous radiative forcing from changes in downwelling shortwave at the TOA is shown in blue.

Figure 2 .
Figure2.Sum of flux imbalances (N) for piClim-histall and amip-piForcing experiments against N for the amip-hist experiment.Each dot is a yearly average, for the model mean of models that had data for all required experiments for this figure (CanESM5, IPSL-CM6A-LR, CNRM-CM6-1, and MIROC6).The linear fit gives a gradient m of correlation ρ.This linear fit uses orthogonal distance regression, which finds a least squares fit according to error along both axes.This is done by accounting for approximately double the variance in the y-axis, which is a sum of two experimental outputs, compared to along the x-axis, which is obtained from a singular experimental output.The same amip-hist variants were used as for amip-piForcing.

Figure 3 .
Figure 3. Multi-model mean responses R against surface temperature change T, for (a) hist-aer, (b) hist-GHG, (c) hist-nat, and (d) the historical experiments.Each dot marks a yearly average, with linear best-fits (of gradient α, with standard regression error) in solid gray.

Figure 4 .
Figure4.A geometric (not to scale) explanation of why the historical all-forcing feedbacks do not lie between the historical anthropogenic aerosol (AA) and GHG feedbacks.(a) If the historical all-forcing case is roughly a linear combination of the historical AA (blue) and GHG (orange) results, then its feedback parameter (the gradient of green) will be more positive than the hist-GHG feedback parameter.(b) If natural forcing has a large feedback parameter (gradient of pink) but is significantly weaker than AA and GHG forcing, then the historical all-forcing feedback parameter will be less positive than without natural forcing, but still more positive than for GHG forcing.

Figure 5 .
Figure5.Multi-model mean radiative feedbacks α against dS/dT, with values calculated from regression over the full historical period (1850-2015) for hist-GHG, hist-aer, hist-nat, and historical.Also shown are values for regression over only volcanic years of hist-nat ("hist-nat-volc," see Figure1for selection).We show here a breakdown into non-cloud and cloud feedback components, in SW (from the APRP method) and LW (from radiative kernels).Values are shown relative to the hist-GHG experiment, for easier comparison between non-cloud and cloud contributions to combined feedbacks.Axes are shared throughout.Gradients m of least squares fits are shown in each panel.Error bars denote estimated regression errors.

Figure 6 .
Figure 6.Radiative feedback α against dS/dT for each forcing in each individual model and in the multi-model mean (MMM) (which are the same as in Figure5a).Correlation and p-value is across models and experiments, excluding the MMM.

Figure 7 .
Figure 7. Forcing over time in the historical case and for simulations of component historical forcing agents.Values were calculated from RFMIP piClim-histx experiments with piClim-control as the control.Vertical gray lines demarcate the three different periods of different forcing flavors.

Figure 8 .
Figure8.Top and middle rows: Time-evolving feedback parameter α (top row) and dS/dT (middle row) for various historical experiments as calculated using 30-year sliding windows.Colored shaded regions denote regression errors.Dashed time-series in the top row show best linear fits of dS/dT to α using the non-overlapping windows from 1855 to 2015 in all experiments, except hist-nat where the fit omits the 1915-1945 window during which no significant volcanic forcing occurs (see Figure1).Horizontal dashed lines show regressions across the full time series (as in Figure5a).Pearson correlation coefficients of these fits to α (ρ), as well as regression p-values, are shown.In brackets, we show ρ and p for fits using overlapping windows.For hist-nat, again, windows over the period free from eruptions are ignored in the fit.Peaks of volcanic forcing are shown by red crosses for the hist-nat and historical cases.Vertical gray lines demarcate the three periods of different forcing flavors.Bottom row: year-by-year scatter of α and dS/dT from 30-year windows.Circles with black edges show non-overlapping windows, whilst dots show windows that overlap with these.Linear fit coefficients for the gradient σ and intercept τ are shown, across independent windows (black, solid line fit) and all available windows (gray, dotted line fit).The window free of significant volcanic eruptions is ignored for hist-nat as above, and is shown by a white circle with a black edge in panel (m).

Figure 9 .
Figure 9. Model results from the historical case (black) compared to the sum of model results from hist-nat, hist-GHG, and hist-aer cases (blue), along with the reconstruction using fixed α for anthropogenic aerosol and GHG forcings but time-varying α for natural forcing (red, detailed in the text).(a-c) 3-year running means of (a) the radiative response, (b) 50S-50N EIS changes over ocean, (c) surface air temperature change.(d-f) 30-year regressions of same variables as to their left (d-e) over surface temperature and (f) over time.Note that for T and dT/dt, the reconstruction and sum are identical as the reconstruction uses the sum of model output temperatures.

Figure 10 .
Figure10.(a-c) Zonal patterns of time-mean historical (a) forcing at the TOA, (b) surface forcing, and (c) surface temperature change, each divided by its global average, during 1995-2015 from hist-aer, hist-ghg and all historical agents, and during the years with significant forcing from the two largest volcanic events in the historical period (Krakatoa and Pinatubo).(d-f) Distributions of TOA effective forcing (d), surface forcing (e) and surface temperature change (f) over the sea within the tropics, as functions of climatological sea surface temperature, time mean over the same years as (a-c) and each divided by its area mean i.e. such that all values would be unity for a uniform field.Vertical dashed lines delimit the 30th percentile bounds as used in the SST# measure, whilst horizontal dashed lines denote average values within the temperature ranges that they span.

Table 1
Variant Labels for the Model and Experiment Realizations Used in This Paper