The Green's Function Model Intercomparison Project (GFMIP) Protocol

The atmospheric Green's function method is a technique for modeling the response of the atmosphere to changes in the spatial field of surface temperature. While early studies applied this method to changes in atmospheric circulation, it has also become an important tool to understand changes in radiative feedbacks due to evolving patterns of warming, a phenomenon called the “pattern effect.” To better study this method, this paper presents a protocol for creating atmospheric Green's functions to serve as the basis for a model intercomparison project, GFMIP. The protocol has been developed using a series of sensitivity tests performed with the HadAM3 atmosphere‐only general circulation model, along with existing and new simulations from other models. Our preliminary results have uncovered nonlinearities in the response of the atmosphere to surface temperature changes, including an asymmetrical response to warming versus cooling patch perturbations, and a change in the dependence of the response on the magnitude and size of the patches. These nonlinearities suggest that the pattern effect may depend on the heterogeneity of warming as well as its location. These experiments have also revealed tradeoffs in experimental design between patch size, perturbation strength, and the length of control and patch simulations. The protocol chosen on the basis of these experiments balances scientific utility with the simulation time and setup required by the Green's function approach. Running these simulations will further our understanding of many aspects of atmospheric response, from the pattern effect and radiative feedbacks to changes in circulation, cloudiness, and precipitation.


Introduction
The response of the atmospheric state to changes in surface temperature is one of the most critical and extensively studied connections in the coupled climate system.For example, Arrhenius (1896), a seminal global warming study, argued that the amount of warming that occurs in response to a CO 2 change is determined in part by how the net downward top-of-atmosphere (TOA) radiative flux, N, depends on the surface temperature, T. This dependence in turn depends on how numerous aspects of the atmosphere and surface, such as the spatial distribution of water in all its phases, respond to surface temperature changes (Hansen et al., 1984;Soden et al., 2008;Stevens & Bony, 2013).
The simplest way to model the dependence of some aspect of the atmospheric state on surface temperature is to assume that the quantity in question scales linearly with global-mean surface warming, so that the dependence can be summarized as a constant scalar.For example, suppose N and T are the global-mean values of N and T respectively.If we define the radiative feedback parameter λ to be the dependence of N on T, ∂N/∂T (where a negative value implies a stabilizing radiative feedback; the opposite-signed value is sometimes called the climate feedback parameter), then λ is often assumed to be constant (Forster et al., 2021;Gregory et al., 2004).
Counter to this, λ in many coupled atmosphere-ocean model simulations with constant forcing evolves with time (e.g., Andrews et al., 2015;Murphy, 1995;Senior & Mitchell, 2000;Williams et al., 2008).Such a variation could be caused by a nonlinear response of N to global-mean surface temperature, for example, due to feedback temperature dependence (Bloch-Johnson et al., 2015, 2021;Colman & McAvaney, 2009;Meraner et al., 2013).However, these changes in λ typically occur after a few decades (Andrews et al., 2015).This timing is consistent even under different amounts of CO 2 forcing, so that the global-mean warming ΔT undergone during this time is different (e.g., Figure S1 of Bloch-Johnson et al., 2021).As a result, this change in λ is likely not due to the total global-mean warming, but rather to some other aspect of the warming that varies similarly with time regardless of the level of CO 2 forcing.Armour et al. (2013) proposed that temporal variations in λ are due to changes in the spatial pattern of surface temperature, with the shift in λ under constant CO 2 forcing occurring when regions of deep ocean heat uptake begin to warm (Armour, 2017;Rose et al., 2014), which occurs at similar times under different constant CO 2 forcings (Rohrschneider et al., 2019).Under this theory, surface temperature changes in different locations set off different atmospheric responses, and therefore different changes in N.
This interpretation is supported by results with atmosphere-only models.A number of studies have found that running such models with prescribed sea surface temperature and sea ice boundary conditions from a coupled simulation induces similar changes in radiative fluxes as the original coupled simulation (e.g., Andrews et al., 2015;Haugstad et al., 2017;Qin et al., 2022;Ringer et al., 2014).As a result, atmosphere-only models can serve as useful tools for understanding the response of N to different patterns of warming.Studies in which these atmosphere-only models have been run with reconstructions of historical temperatures and sea ice find that the value of λ has varied significantly across the last century (Andrews et al., 2022;Gregory & Andrews, 2016), having a much larger range of values than in simulations with constant CO 2 forcing despite having a much smaller range of global-mean surface temperature.
These papers have specifically suggested that changes in the radiative feedback can be explained by a linear spatial model, for example, ΔN ∝ ΔT(ϕ,θ), where ϕ and θ are latitude and longitude respectively.A similar model has also been used in the dynamics literature, in which various aspects of the atmosphere and its circulation are assumed to depend linearly on the spatial field of sea surface temperature (e.g., Barsugli & Sardeshmukh, 2002;Barsugli et al., 2006;Branstator, 1985;Deser & Phillips, 2006;Schneider et al., 2003;Zhou et al., 2020).
The linear spatial assumption allows one to use a Green's function method approach, in which the response of an atmospheric variable to the full pattern of surface temperature change can be thought of as the sum of the responses of that variable to the change in surface temperature in each location taken in isolation (Branstator, 1985;Holzer & Hall, 2000).Using this insight, Barsugli and Sardeshmukh (2002) and Barsugli et al. (2006) overlaid the surface of the tropical oceans with a lattice of patches, for each of which they ran an atmosphere-only model simulation in which the surface temperature in that patch was perturbed.They used the resulting output to estimate the sensitivity of various atmospheric quantities to tropical surface temperature changes.
More recently, Zhou et al. (2017) adopted the approach of Barsugli and Sardeshmukh (2002) to specifically understand the dependence of the cloud radiative effect (the contribution of cloudiness to net top-of-atmosphere radiative fluxes) on tropical surface temperature.Dong et al. (2019) expanded this approach to cover all top-ofatmosphere (TOA) fluxes and all regions of the Earth.Both studies found that the most distinct feature of the response of TOA fluxes is that strong negative feedbacks occur in response to regions of tropical convection (in particular the western tropical Pacific), a finding that has been supported by subsequent studies (Alessi & Rugenstein, 2023;Zhang, Zhao, & Tan, 2023).Zhou et al. (2023) showed that these Green's functions could be used to explain the different efficacies associated with different forcing agents.Patch perturbations are not the only method for estimating the linear spatial dependence of atmospheric state on sea surface temperature.For example, Li et al. (2012) used random patterns of surface temperature change instead of patches to more efficiently estimate the most dominant aspects of this dependence.Li et al. (2012)'s random perturbation method (RPM) has been used in dynamical studies (e.g., Baker et al., 2019;Li & Forest, 2014;Patterson et al., 2022), though we are not aware of any of that applied it to radiative feedbacks.Liu, Lu, Garuba, Leung, et al. (2018) applied the fluctuation-dissipation theorem to a preindustrial control simulation to estimate a climatic variable's dependence on the field of surface fluxes.Bloch-Johnson et al. (2020) applied a similar method to find the dependence of TOA radiative fluxes on sea surface temperature, finding that five of six coupled climate models had strong negative radiative feedbacks in regions of tropical convection, as above.
While these methods provide various benefits, patch simulations have the advantage that they clearly demonstrate the physical, causal relationship between a given region's surface temperature changes and resulting atmospheric changes, and do not have contamination from the response to temperature changes elsewhere, as can occur with these other approaches.As a result, many modeling groups have run or are planning on running these simulations.These simulations provide a simple way of comparing the diversity of atmospheric responses across models, such as different responses to warming in tropical ascent regions in the Atlantic (e.g., Dong et al., 2019;Zhou et al., 2017) or generally (e.g., NASA GISS-E2-R in Bloch-Johnson et al., 2020).Dong et al. (2020) provides further evidence of this diversity, showing that Green's functions from one model do not always show skill in explaining the feedbacks of other models.However, it is unclear if this diversity of behavior is due to true differences in model physics, or simply differences in the experimental setups used to generate the Green's functions.
The protocol is presented in Table 1, whose variables are defined as in Figure 1.In Section 2, we review the atmospheric Green's function method, showing the choices that must be made in performing such an analysis.In Section 3, we present the protocol in more detail, showing the reasoning behind the decisions made and explaining how readers can participate in the project.In Section 4, we present some nonlinear results found in the course of developing this protocol.Finally, in Section 5 we summarize the proposed project.

The Atmospheric Green's Function Method
We now present the atmospheric Green's function method.We write "atmospheric" to distinguish this method from the use of Green's functions to understand the response of the ocean (Khatiwala et al., 2001;Newsom et al., 2020;Zanna et al., 2019) or a coupled system with an atmosphere and a slab ocean (Liu, Lu, Garuba, Huang, et al., 2018) to sea surface perturbations.While it is possible to use Green's functions to understand the time evolution of the atmospheric response (Holzer & Hall, 2000), we focus our discussion on monthly and longer time-scales, for which we assume that the response is effectively instantaneous.
Let f represent some aspect of the atmosphere whose value depends in part on the spatial field of surface temperature.For simplicity, we assume f is a scalar, such as a global average (like N above) or a value at a fixed location (e.g., the net TOA radiative flux at (0°, 0°)), but similar ideas apply when f is a spatial variable itself (e.g., see Dong et al., 2019;Zhang, Zhao, & Tan, 2023).
Suppose the response of f to perturbations in surface temperature around some initial state is proportional to the size of those perturbations, and that the response of f to multiple surface temperature perturbations is equal to the sum of the responses of f to the individual perturbations.This would then imply that the response of f to a full, global pattern of surface temperature can be found by subdividing this pattern into many individual, localized perturbations, and taking the sum of the responses of f to these perturbations.
The atmospheric Green's function method builds on this insight by first estimating the linear dependence of f on local changes in surface temperature, and then estimating the response of f to general patterns of surface temperature change by summing across the response to these local changes.The method is specifically applied to atmosphere-only general circulation models.Given that the dependence of f on surface temperature differs between models, in this explanation we assume that the same atmospheric model is used throughout.
Since most atmospheric models can only prescribe surface temperature over the ice-free ocean (while they can set the sea surface temperature in ice-covered regions, they cannot directly set the temperature of the ice surface), we focus on the dependence of f on the surface temperature in these regions.Direct dependence of f on land and ice surface temperatures can be estimated using statistical techniques (e.g., Bloch-Johnson et al., 2020;Ceppi & Nowack, 2021), but in the atmospheric Green's function literature, land and ice surface temperatures are assumed to depend on ice-free ocean surface temperature and other boundary conditions, just like atmospheric variables.
Although the sea surface temperature field, SST(ϕ, θ), is continuous for the real Earth, it is discretized for atmospheric models.We therefore express SST, and other spatial variables based on it, as vectors (e.g., SST ̅→ ) whose Step 4. The response of f to a given pattern of surface temperature change ΔSST ̅→ can be estimated using the normalized derivative, ∂f /∂SST * ̅̅→ and the area in each grid cell, a i .A Green's function setup can be evaluated by comparing the response to a surface temperature time series simulated by an atmospheric model (black line) and estimated by the Green's function method (orange line).In this paper, the simulated time series is typically an ensemble mean of realizations with different initial conditions.elements correspond to the atmospheric model's surface grid cells, and estimate the dependence of f on SST ̅→ .For an alternate interpretation of the method in terms of continuous fields, see Appendix A.
Figure 1 illustrates how we estimate the linear dependence of f on variations in SST ̅→ .First, we run a control simulation of the atmospheric model.We must choose boundary conditions for this run to serve as the base state for our method, specifically a set of 12 maps of SST ̅→ and sea ice fraction SIC ̅→ representing a repeating climatology of each variable (some models may also prescribe sea ice thickness).We write these as sets where m is an index representing the month and c indicates these represent the control state.For plotting purposes, we define SST ̅→ c and SIC ̅→ c to be annual averages of these two sets, respectively.We also must choose values for various forcing agents (e.g., CO 2 and other long-lived greenhouse gases, aerosols), which we keep constant across all experiments ({F} c ).We run this control simulation for an initial spinup of s c years, followed by y c post-spinup years.We then take the average value of f during the post-spinup years as the control value, f c .
In the second step, we cover the surface with a lattice of patches.Each patch p is a perturbation of the sea surface temperature field over a region of latitudinal width δϕ p and longitudinal width δθ p centered at (ϕ p , θ p ). Suppose we have a grid cell i centered at (ϕ i , θ i ).Patch p's perturbation for grid cell i will then be: such that the patch reaches an extreme amplitude of A p at (ϕ p , θ p ).Note that we assume the appropriate arithmetic is used in calculating θ i θ p for cells that straddle the discontinuity in longitude.Negative values of A p imply a cooling patch, while positive values imply a warming patch.We explore alternative patch shapes in Section 3.3.4.
For ) and forcing agents ({F} c ) as the boundary conditions of a new atmosphere-only simulation.For simplicity, we are not exploring sea ice changes in the first phase of GFMIP (for an example of incorporating sea ice changes in the Green's functions method, see Dong et al., 2019).We run each patch simulation for an initial spinup period of s p years, and then an additional y p years.
We then average f during the last years of y p of each patch simulation and subtract the control value, f c , to give the change in f caused by the patch warming or cooling, Δf p .
In the third step, we estimate the linear dependence of f on perturbations of SST ̅→ around the base state.There are different ways to formulate this dependence.For example, for each grid cell i, we could calculate ∂f/∂SST i , the infinitesimal change in f caused by an infinitesimal change in the SST value of grid cell i.This derivative has a few disadvantages.First, ∂f/∂SST i depends on the area of the grid cell (i.e., all else being equal, bigger grid cells will cause a larger change in f ).As a consequence of this, values of ∂f/∂SST i can significantly change under regridding.Second, we are ultimately interested in the response of f to the full pattern of warming, that is, the global derivative, ∂f /∂〈SST ̅→ 〉, where 〈⋅〉 is the area-weighted spatial average over the ice-free ocean.Since ∂f/∂SST i is typically orders of magnitude smaller than ∂f /∂〈SST ̅→ 〉, it is difficult to intuitively understand how a given value of ∂f/∂SST i affects the overall response.
To address these issues, we instead use the normalized derivative, ∂f /∂SST ̅→ * .For a given grid cell i, we define the normalized derivative as where a i is the area of the grid cell, and a tot ≡∑ i a i is the total area of the ice-free ocean.The normalized derivative has the advantage that it is "intensive"-that is, its value does not depend on the area over which it is calculated, so that regridding a map of the normalized derivative does not affect its values (aside from the smoothing that generally comes from regridding).
The factor of a tot ensures that normalized derivative values have the same order of magnitude as the global derivative; specifically, assuming linearity, so that the global derivative is simply the weighted average of the normalized derivative, where the weights are given by each grid cell's size and SST perturbation.In this sense, a grid cell's normalized derivative gives the value the global derivative would have if that grid cell were representative of the whole ice-free ocean.
Put another way, if we perturb the SST in a given grid cell, we change both f and 〈SST〉, and the normalized derivative is the ratio of these changes.This follows from Equation 2, in that if we perturb the cell's SST by an infinitesimal amount ∂SST i , the ice-free ocean-mean perturbation will be (a i /a tot )∂SST i .This implies that we can estimate the normalized derivative in a grid cell by considering the response of f to the ice-free ocean mean SST change caused by patches that include that cell.For example, if patch p includes grid cell i, then we could estimate ∂f /∂SST * i as Δf p /〈ΔSST p ̅→ 〉, where we know 〈ΔSST p ̅→ 〉 by construction, and we estimated Δf p above.
However, multiple patches will typically include grid cell i.As a result, we can estimate ∂f /∂SST * i by taking a weighted average across all patches (third step of Figure 1), where the weights are ΔSST p,i , which is patch p's SST perturbation in grid cell i.ΔSST p,i = 0 for all patches that do not include grid cell i, so these patches do not contribute to this average, while patches whose centers are close to the grid cell i's center contribute the most (because of the shape of the patch perturbation given in Equation 1).Note that in the absence of land and sea ice, patches of the form in Equation 1have where a p is the area of the patch.However, most patches overlap land and sea ice, so that this approximation cannot be generally applied.
Various steps can then be taken to improve our estimate of ∂f /∂SST * i : • One can make two estimates, one using warming patches with A p > 0 (e.g., those in the "+" row of Figure 2) and the other using cooling patches with negative A p < 0 (e.g., the "-" row of Figure 2), and then take the average of the two (e.g., the "±" row of Figure 2).While these derivatives should be identical under linearity, they typically are not, as discussed in Section 4. • One can construct derivatives for different times of year (Alessi & Rugenstein, 2023;Bloch-Johnson et al., 2020;Dong et al., 2019), such as for different seasons or months, by considering only the patch anomalies for the relevant time periods.• One can perform significance tests on ∂f /∂SST * ̅̅→ , removing values that are not statistically significantly different from zero (Alessi & Rugenstein, 2023;Dong et al., 2019;Zhang, Zhao, & Tan, 2023), on the grounds that such values may arise from noise rather than representing the forced response.Zhang, Zhao, and Tan (2023) find evidence that doing so can improve the method's accuracy, but also caution that performing such tests on individual flux components of N causes their derivatives to no longer sum to N itself.To keep our analysis simple, we do not consider the utility of such tests in this paper.
In the final step in Figure 1, we use the normalized derivative to estimate the changes in f that would be caused by an arbitrary pattern of temperature change ΔSST ̅→ : We can repeat this for each entry in a time series of surface temperature change, ΔSST ̅→ (t), to estimate the associated time series of Δf(t) (e.g., the orange line in Step 4 in Figure 1).Note that if monthly or seasonal derivatives of f are used, they can be cyclically applied to monthly or seasonal averages of ΔSST ̅→ (t) to create estimates of Δf(t).
If we wish to test the accuracy of the method, we could then compare this estimate of Δf(t) to the time series produced by running the atmosphere-only model with {SST ̅→ m } c + ΔSST ̅→ (t) as its boundary condition, once more keeping sea ice and forcing agents fixed.The simulated value of Δf(t) may exhibit variations due to internal variability of the atmosphere and land, which may cause Green's function reconstructions of Δf(t) to differ from simulated values.We can reduce the influence of internal variability by running an ensemble of simulations, each with different initial conditions.The time series of the simulated ensemble mean (black line) and Green's function estimate (orange line) of Δf(t) can then be compared, for instance by calculating the root mean square error (e.g., Equation 6 in the next section).

Journal of Advances in Modeling Earth Systems
10.1029/2023MS003700

Applying the Method to N
A number of recent papers have used this Green's function method, or variations on it, to investigate the dependence of TOA radiative fluxes on sea surface temperatures (Alessi & Rugenstein, 2023;Dong et al., 2019;Zhang, Zhao, & Tan, 2023;Zhou et al., 2017).For example, Figure 2 shows the normalized derivative of the globally averaged net TOA radiative flux, N, for a variety of atmosphere-only models.The top row shows the half-amplitudes of the patches used to construct these derivatives (i.e., the contours within which the patch perturbation is at least A p /2).Some patch layouts were designed to be regularly spaced in latitude and longitude, while others have patches with a consistent area over some or all of the Earth.
The bottom rows show derivatives of N with respect to SST ̅→ (i.e., the response of the global-mean net TOA radiative flux in a given location, not to be confused with the response of local values of N to local or global SST changes) constructed using a variety of Green's function method setups.There is a fairly consistent picture across the derivatives: the most negative dependence N on local surface temperature occurs in areas of tropical convection, especially over the western tropical Pacific.A similar pattern over the tropical Pacific is also seen for the ICON model (Figure S1 in Supporting Information S1).
Figure 3 illustrates the cause of these common features using some example patches (see also Andrews & Webb, 2018;Ceppi & Gregory, 2017;Dong et al., 2019;Zhou et al., 2017).Panel a shows the surface temperature change, ΔSST ̅→ , associated with applying an A p = +2K warming patch to a region of the western equatorial Pacific, where deep convection occurs.Warming in this region propagates upwards and then outwards to broadly warm the free tropical troposphere.This increases outgoing longwave radiation, and also increases lower tropospheric stability in subsiding regions, thus promoting low cloudiness (Wood & Bretherton, 2006).The resulting change in net TOA radiative flux, ΔN → , is shown in panel d.Thus, the more negative regions in Figure 2 are primarily a result of nonlocal lapse rate and low cloud feedbacks (i.e., even though the cloud response occurs primarily in subsiding regions, this response is due to warming in convecting regions, so that convecting regions have negative values This response is in contrast to the response to surface warming in the subsiding tropics and extratropics, which is mostly local.While warming in these regions can have large nonlocal responses, these responses are typically mediated by surface temperature changes elsewhere (i.e., these teleconnections typically involve an oceanic response, and thus an atmosphere coupled to a slab or dynamic ocean, e.g.Kang et al., 2008;Feldl & Roe, 2013;Liu, Lu, Garuba, Leung, et al., 2018;Kim et al., 2022;Luongo et al., 2023;Zhang, Zhao, He, et al., 2023).Since the atmospheric Green's function method aims to capture the atmospheric response to a given region's surface temperature changes independent of surface temperature changes anywhere else, the ocean response is deliberately suppressed (i.e., SST is prescribed in patch simulations), and only direct atmospheric effects are captured.
Keeping in mind that we are limiting our attention to the atmospheric response, surface temperature perturbations in the subsiding tropics typically cannot propagate beyond the boundary layer (panel b), such that they mostly cause a local decrease in lower tropospheric stability and subsequent loss of low clouds (panel e).In the extratropics, surface warming is often balanced by local horizontal circulation changes which can be maintained via the Coriolis force (panel c; see all Hoskins & Karoly, 1981).This circulation further inhibits cloud formation through advection of colder, drier air to the warming region (Williams et al., 2022).In both cases, responses are mostly local.Given that the ΔN in panels e and f are of a similar order of magnitude as in panel d but cover a much smaller area, the resulting change in globally averaged N is smaller, so that the derivative of N associated with SST changes in these regions is also smaller.
In spite of the similarities between derivatives in Figure 2, substantial differences remain (e.g., the strength of negative feedbacks in the Caribbean).Some of these differences may reflect variations in the application of the Green's function method.For example, the two HadAM3 derivatives (both created for this study) differ only in their patch layouts, but have differently sized negative regions in the tropical Pacific.Similar differences are seen for the two CanESM5 derivatives over the tropical Pacific (Figure S1 in Supporting Information S1).By proposing a common protocol, GFMIP ensures differences in derivatives are due only to the atmospheric models themselves.

The GFMIP Protocol
Table 1 summarizes the GFMIP protocol, where variables are defined as in Figure 1.The protocol consists of a control simulation, a set of patch simulations, and a pair of diagonistic simulations, as well as some optional simulations.Boundary conditions for all can be found at https://gfmip.org.We present the parameters for this protocol below, but first discuss the sensitivity tests used to inform our choices of these parameters.

HadAM3 Sensitivity Tests
Each element of the protocol was chosen by assessing existing setups and performing sensitivity tests, most with the atmosphere-only component of the HadCM3 model, HadAM3 (Pope et al., 2000), which can be run inexpensively while still having a realistic coupled control climatology (Gordon et al., 2000;Tett et al., 2022).
For each sensitivity test with HadAM3, all experimental setups are as in Table 1 unless otherwise mentioned, with the exception that we use preindustrial (as opposed to year 2000) values of all forcing agents.Note that we use annually averaged derivatives for our Green's functions, as using monthly or seasonal averages does not significantly improve our results (Figure S2 in Supporting Information S1), though it can for other models (Alessi & Rugenstein, 2023;Bloch-Johnson et al., 2020;Dong et al., 2019).
We evaluate each setup's skill by seeing how well it reconstructs the response of HadAM3 to historical sea surface temperatures.We first run an ensemble of n e = 9 simulations of HadAM3, each with the same time-evolving SST boundary conditions, SST ̅→ hist (t), but different initial conditions, generated by branching the run every 10 years from the setup's control simulation (the ensemble helps remove the influence of internal variability on N).We construct SST ̅→ hist (t) by starting with a monthly climatological base state, {SST m } c , and then adding the monthly anomalies of the AMIP data set between 1871 and 2015 (Gates, 1992), ΔSST ̅→ hist (t).These anomalies are calculated relative to the monthly climatology from 1971 to 2020 (the protocol's {SST ̅→ m } c ), so that with the exception of the sensitivity test in Section 3.2.1,SST ̅→ hist (t) is just the AMIP time series.We once more keep sea ice and forcing agents fixed to their control values.Each ensemble member produces a time series of the change in N, ΔN hist,sim,e (t), and ΔN hist,sim (t) ≡ ∑ e ΔN hist,sim,e (t)/ n e ) is the ensemble-mean time series.
For each Green's function setup, we calculate the derivative of N with respect to SST ̅→ and then apply it to ΔSST ̅→ hist (t) as in Equation 5to estimate the time series of the change in N, ΔN hist,GF (t).The setup skill is then defined as the root mean square error between this reconstruction and the ensemble-mean simulated response, where n t is the number of years in the historical time series, that is, n t = 145.Generally, we seek experimental setups that balance minimizing RMSE hist with minimizing computational expense.

Boundary Conditions
As shown in Step 1 in Figure 1, we must choose which monthly climatologies of sea surface temperature ) and sea ice fraction ) to use as boundary conditions for our control simulation.Two options have been used for the climatology source in the past-the piControl simulation of the coupled GCM associated with the atmospheric model in question (e.g., the top row of Figure 4; this was also used for CanESM5 and HadAM3 in Figure 2), or recent decades of the AMIP data set (Gates, 1992), which is based on observations (bottom row of Figure 4, which specifically uses years 1971-2020; similar periods were used for the rest of the models in Figure 2).These two base states have different sea surface temperature (panels a and e in Figure 4) and sea ice fraction (panels b and f in Figure 4) climatologies.When used to perform the Green's function method with HadAM3, they result in different derivatives of N (panels c and g in Figure 4).As mentioned above, we must also choose {SST ̅→ m } c and {SIC ̅→ m } c when we run our ensemble of historical simulations, also leading to different values of ΔN hist,sim (t) (black lines in panels d and h in Figure 4).The different derivatives of N lead to different Green's function estimates of these time series, ΔN hist,GF (t) (orange lines in panels d and h in Figure 4).
Our results suggests that, as long as the same base state is used to make both ΔN hist,sim (t) and ΔN hist,GF (t), the RMSE hist will be similar, for example, 0.27 and 0.23 Wm 2 K 1 for the piControl and AMIP climatologies respectively.This skill decreases if we use different base states for the simulation and its Green's function reconstruction, with an error of 0.36 Wm 2 K 1 if AMIP is used for ΔN hist,GF (t) and HadCM3 piControl is used for ΔN hist,sim (t).This implies that differences in base states may help explain differences in atmospheric response between models, rather than differences in model physics.For example, this could help explain the discrepancies between true and reconstructed CMIP6 radiative feedbacks in Dong et al. (2020).
Given the similar RMSE hist for these two base states, we have chosen the AMIP state, as it is model agnostic, allowing us to generate a common set of boundary conditions for the control, patch, and diagnostic experiments requested by the GFMIP protocol.While we specifically have chosen a climatology over 1971 to 2020, we expect small changes in this range to have modest effects, so that existing simulations using roughly the same period will be considered as fitting the protocol.
We have recalculated the HadAM3 derivative of N with a CO 2 concentration four times the preindustrial level (panels d-f in Figure S3 in Supporting Information S1).The resulting derivative has only modest differences, and using it instead of the preindustrial derivative has a negligible effect on RMSE hist (panel k in Figure S3 in Supporting Information S1).While differences in aerosols may be more impactful, as they are more spatially heterogeneous, the version of HadAM3 used for this analysis does not allow us to test this.We ask that participants set forcing concentrations to year 2000 values so as to be close to existing setups, and will explore the impact of different background aerosol concentrations on atmospheric Green's functions in future work.

Spinup Years (s c )
It may be useful to exclude the initial years of the control simulation, since it takes time for the atmosphere and land to adjust to imposed boundary conditions.In Figure 1, the variable s c represents the number of years of the control simulation we exclude in performing our method.We recalculate the RMSE hist associated with the Green's function method in panels d and h of Figure 4 for values of s c ranging from 0 to 100 years (panel a of Figure 5; note that we use the same scale for the y-axis in all four panels).Each point in panel a differs in that the control value of N is the average value in the interval from year s c + 1 to year s c + y c , where y c = 20 years.We use such a large range of values of s c to demonstrate that the variation in RMSE hist due to the initial atmospheric/land adjustment is much smaller than the subsequent variations due to internal variability alone, which occurs simply due to the fact that the average value of N in a given 20-year window varies over the course of the control simulation.However, the value of N in the first year of the control simulation for the HadAM3 piControl base state is a clear outlier (Figure S4 in Supporting Information S1).To be conservative, we therefore propose including a year of spinup in control simulations.

Post-Spinup Years (y c )
Next, we consider how many years to run the control simulation beyond the spinup period.We considered potential control simulation lengths, y c , of 1-40 years.To test these values, we ran a control simulation for 120 years and discarded the first year as a spinup.For each value of y c , we constructed an ensemble by dividing the remaining 119 years into an ensemble of floor(119/y c ) members, each y c years long (e.g., for y c = 40, we had an ensemble of two intervals, one from year 1-40, the other year 41-80).For each ensemble member e, we averaged the value of N over this interval to give us our control value.We then proceeded with the Green's function method, resulting in a value for RMSE hist,e .For each y c we then took the ensemble-mean RMSE hist , which we show in panel b of Figure 5.
For both base states, there is a steady decay in error that saturates after about a decade.The general shape of this curve can be derived by considering the factors that contribute to the RMSE hist .Suppose we make the simplifying assumption that annual averages of natural variations in f are normally, identically, and independently distributed, with standard deviation σ f , then: • the standard error of our estimate of f c will be σ f ̅̅̅̅̅̅̅̅̅ 1/ y c √ • the standard error of our estimate of f p will be σ f ̅̅̅̅̅̅̅̅̅ ̅ 1/ y p √ • the standard error of their difference, Δf, will be σ f ̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅ 1/ y c + 1/ y p √ • the standard error of our estimate of a sinusoidal patch's sensitivity of f to surface temperature change, Δf /〈ΔSST p ̅→ 〉, which we denote by σ p , will be where we have used Equation 4, which assumes the patch is land and free.
If we also assume that patches are roughly the same size (i.e., that a p is the same for all patches), then σ p is the same for all patches.In this case, the standard error of the normalized derivative in a given cell ∂N/∂SST * i will be proportional to σ p (from consideration of Equation 3), as will the standard error of ΔN hist,GF (t) (by Equation 5).For large historical simulation ensembles (i.e., large n e ), the standard error of ΔN hist,sim will approach 0, in which case RMSE hist will also be proportional to σ p (by Equation 6).Therefore, Equation 7implies that as we vary y c , RMSE hist should vary roughly linearly with ̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅ 1/ y c + 1/ 10 √ (where we've set y p at 10 years), which it appears to do (left panel, Figure S5 in Supporting Information S1).As a result, RMSE hist decays roughly with the square root of the inverse of y c in panel b of Figure 5. Equation 7illustrates that the skill associated with a given experimental setup depends on σ f , which can vary greatly with the variable in question, and to a lesser extent on the model as well.As a result, experimental setups that work well for studying one variable may be insufficient for studying other variables.
For example, for HadAM3, we estimate σ N is 0.16 Wm 2 .We estimate similar values for other atmospheric models (e.g., 0.14 Wm 2 for CanESM5, 0.24 Wm 2 for ICON), and so given that the experimental setup in Table 1 is primarily calibrated using reconstructions of N with HadAM3, we expect it will work similarly well for reconstructions of N in other models.
However, N is a global average.If one wanted to study the net TOA radiative flux, N, in a specific location, this setup may not be sufficient.For example, let us refer to N in the grid cell that includes the location (0°, 0°) as N 0,0 .For HadAM3, the standard deviation of annual averages of this value is 2.21 Wm 2 .This number is larger for N 0,0 than for N because there is some spatial cancellation in the influence of atmospheric noise on N. If we wanted to achieve the same skill at reconstructing N 0,0 as N using parameters from the GFMIP protocol, Equation 7 implies we must have 1/y c + 1/y p = 0.00078 years 1 , which would require both y c and y p to be at least a thousand years.This atmospheric noise also means that the simulated response of local N to a given surface temperature field may have a larger value not explained by those surface temperature changes than the response of global N.For both reasons, Green's function recreations of spatial maps of TOA radiative flux can have large errors (Zhang, Zhao, & Tan, 2023), although the nonlinearities we discuss in Section 4 may also play a role.While the random perturbation method mentioned above may more efficiently estimate the response of local values to sea surface temperature changes (Li et al., 2012), we note that this may have limitations due to these same nonlinearities, as we discuss below.
Returning to panel b of Figure 5, 10 years appears sufficient to reduce noise in our estimates of N c .However, there is only one control simulation, while there are many patch simulations.We therefore feel the cost of being conservative with a control simulation is minimal (i.e., increasing y c is much less costly than increasing y p ), and so specify that y c be 20 years.

Spinup Years (s p )
We now recalculate RMSE hist with a patch simulation spinup period ranging from 0 to 20 years (panel c of Figure 5), once more using a large range of values to show the effects of internal variability.Instead of considering different base states, we plot different values of the patch amplitude parameter, A p .Note that we use the HadAM3 piControl base state in panels c and d, for which we calculated more values of the patch amplitude A p .Due to the nonlinearities discussed in Section 4, we use averages of cooling and warming derivatives, specifically A p = ±1K, ± 2K, and ± 4K.
There appears to be little variation of RMSE hist with s p , suggesting a spinup period may not be needed.We believe this is because our patch experiments were branched directly from the end of the control simulation.We assume that if the patch simulations start with different enough initial conditions, they might still need a year of spinup.Our protocol recommendation is therefore to either branch patch simulations from the control simulation, or to leave out the first year of each patch simulation as a spinup year.We note that branching from the end of the control simulation allows the resulting experiment to be used to understand to the temporal response of the atmosphere (e.g., Holzer & Hall, 2000), which may be of interest to GFMIP participants, and so we encourage this option.

Post-Spinup Years (y p ) and Maximum Perturbation (A p )
Panel d of Figure 5 is analogous to panel b, except we consider the post-spinup length of patch simulations.For this test, we ran each patch simulation for a total 40 years.Since our patch simulations branched directly from our control simulations, we do not leave out a spinup year.As above, for each value of y p , we divided the length of the Journal of Advances in Modeling Earth Systems 10.1029/2023MS003700 patch simulation into an ensemble of floor(40/y p ) intervals (e.g., for y p = 15, there were two intervals, one from years 1-15, the other from years 16-30).For each ensemble member e, we calculated each patch p's ΔN p by taking its average over that interval, and then used these values to generate a derivative of N and subsequently an RMSE hist,e of the historical reconstruction.For each y p we then took the ensemble mean to create RMSE hist , which we show in panel d of Figure 5.
Equation 7 suggests that using larger temperature perturbations and running longer patch experiments are both ways to improve the skill of the Green's function reconstruction, and so we consider the best values of y p and A p simultaneously.As suggested by Figure 2, the derivatives of N associated with positive and negative values of A p differ, with the latter typically being more positive than the former, and in fact the top panel of Figure S6 in Supporting Information S1 shows that cooling and warming derivatives by themselves both poorly reconstruct ΔN sim (t).While we save the discussion of the nonlinearity of these results for Section 4, for now we rule out using only one sign of A p in the protocol.
For y p = 1 years, the root mean square error in excess of the asymptotic value of ∼0.25Wm 2 is roughly inversely proportional to A p (middle panel of Figure S5 in Supporting Information S1), in keeping with Equation 7, which also successfully predicts the reduction in error proportional to ̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅ 1/ y p + 1/ 20 √ (where we've set y c to 20 years; right panel of Figure S5 in Supporting Information S1).However, the reduction in error associated with the A p = ±4K asymptotes at a higher error than ±1 and ± 2K.We believe this is because the ± 4K perturbations are large enough to cause nonlinear behavior not associated with the response to the more modest historical temperature perturbations.
As a result, we face two tradeoffs when planning patch simulations-more years give more accurate reconstructions, but take more computing resources; and higher perturbations have better signal-to-noise ratios, but can introduce nonlinear effects.For HadAM3, p y = 10 years and A p = ±2K appears to be an optimal spot for both tradeoffs.For GFDL-AM4, extending p y from 10 to 30 years with A p = 1.5 K does not significantly affect the reconstruction of the response to observed temperatures, at least when sensitivity tests are not applied (left column of Figure 12 in Zhang, Zhao, & Tan, 2023).We therefore have chosen p y = 10 years and A p ± 2K for the protocol.

Patch Layout (ϕ p , θ p , δϕ p , δθ p )
The choice of the placement and size of the patches used in the patch experiments presents us with another set of tradeoffs: patch layouts with smaller patches increase the total number of patches that must be run and decrease the signal associated with any given patch (i.e., decrease a p in Equation 7), thus increasing the error for a given experimental setup.However, patches that are too large may obscure the very spatial variations in response to SST ̅→ that we wish to study with our Green's functions.
Figure 6 illustrates this tradeoff.We have performed a case study over the tropical Pacific (specifically 100°W to 60°E and 30°S to 30°N) in which we consider seven patch layouts whose half-amplitudes (i.e., locations where the patches reach half of A p ) are shown in the top row of -the first consisting of a single, uniform patch across the entire study domain, and the rest consisting of sinusoidal patches using Equation 1(for full details on each setup, see Table S1 in Supporting Information S1).Note that since these simulations were performed before the protocol was developed, they differ from the protocol in that they use HadCM3 piControl boundary conditions, a 120-year control simulation with s c = 1 year and y c = 119 years, and 40-year patch simulations with s p = 0 and y p = 40 years.
The second row shows the ± 2K derivatives of N for each patch setup, and the bottom panel shows the resulting RMSE hist from using these derivatives (note that for this case study, the ΔSST ̅→ hist (t) used to calculate both ΔN hist,sim (t) and ΔN hist,GF (t) is set to 0 outside of the case study domain).As expected, "low resolution" layouts (i.e., with a few, relatively large patches), have worse reconstruction skill.However, note that while the Zhou et al. setup uses smaller patches, it also uses fewer patches, as its patches overlap less than the Dong et al. setup.Having overlaps between patches does not seem to improve the skill score.
We can also see this by comparing the full-domain HadAM3 derivatives of N from Figure 2  All of the patch layouts considered for the tropical Pacific case study were equally sized and spaced in terms of latitude and longitude.Patches with the same longitudinal width δθ p can become vanishingly small close to the poles.As a result, for equal latitude/longitude layouts, many more patches are used to cover the same area at high latitudes, and since these patches cover smaller areas, they must be run for longer to achieve the same degree of signal (e.g., Equation 7).Dong et al. (2019) addressed this issue by using a larger A p and δϕ p for extratropical patches, while Alessi and Rugenstein (2023) adopted "equal-area" patches, in which δθ p (ϕ p ) is a function of the latitude of the patch (Figure 2).
Figure S7 shows a case study in which two different patch layouts are used around the Southern Ocean.The top row uses the equal-area patch layout from Alessi and Rugenstein (2023), while the bottom row shows an equal lat./lon.layout grid as in Dong et al. (2019), for which patches polewards of 50°have δϕ p = 40°and δθ p = 80°.The equal-area layout uses only 14 patches to cover the area covered by 36 patches in the equal-lat/lon layout, greatly decreasing the number of simulations needed.Neither layout produces a derivative of N with particularly large values (note that the colorbar scale matches those used in other figures).As discussed above in relation to Figure 3, extratropical warming can have large effects on global climate, but these are typically mediated by the oceanic response, which is not included in the atmospheric Green's function method.We also note that there is more spatial variation in the equal lat./lon.grid, but it is unclear if this is physical or due to internal variability.
Because of the lack of strong atmosphere-only feedbacks, we have opted for the computational savings and decreased error of using equal-area feedbacks to cover extratropical regions.However, for the tropics, where area differences are minimal, we use an equal lat./lon.grid, as it is more intuitive and maximizes the use of existing experiments.The resulting hybrid patch layout is outlined in Table 1 and shown as the "Zhou et al. (equal-area extratropics)" layout in the top row, third column of Figure 2. Note that extratropical patches have roughly the same size as patches at the equator.

Shape (ΔSST(ϕ, θ))
The sinusoidal patch shape (Equation 1) avoids sharp, unphysical gradients of surface temperature (Barsugli & Sardeshmukh, 2002).To test the utility of doing so, we have run patches with two other shapes: a uniform rectangular perturbation with no smoothing at the edge (i.e., consisting of Heaviside step functions), and the same with smoothing at the edge, taking the form of tanh functions with e-folding scales of 1°.We have performed this for three patch layouts defined in Table S1 in Supporting Information S1: 40°by 120°, Dong et al. shifted, and 20°b y 80°.All other setup details are the same as the sinusoidal case study above.Note that a rectangular patch that is entirely over ice-free ocean has a globally averaged temperature perturbation four times larger than a sinusoidal patch with the same patch size and A p .
Figure S8 shows the results.As with Figure 6, using smaller patches improves the skill at computational expense.The tanh smoothing has a minimal (or even deleterious) effect on the skill.None of the setups achieve a similar skill to the sinusoidal patches.Since sinusoidal patches are more strongly peaked at their centers, they may capture finer spatial detail in derivatives of N than rectangular patches.This may also explain why rectangular patches produce derivatives with smaller spatial variation.In any event, we see no advantage to abandoning what has become the conventional patch shape, and therefore choose Equation 1 for the protocol.

Diagnostic Simulations
Diagnostic simulations help us assess the skill of the Green's function method.We request participants run two diagnostic simulations, each with sea ice {SIC ̅→ m } c and forcing agents {F} c as in the control and patch simulations, but with the SST boundary conditions set to the following time series: • historical: time-evolving SST using the AMIP time series, from 1871 up to 2020 (Gates, 1992).This is the same as the ΔSST ̅→ hist (t) described above.The GFMIP historical experiment is similar to the CFMIP amip-piForcing experiment (Webb et al., 2017), but with fixed sea ice, and with forcing held at values from year 2000.
• abrupt4x: time-evolving SST based on the CMIP6 ensemble average of the first 150 years of the abrupt4x experiment.
For both simulations, we ask that participants run ensembles with different initial conditions, if possible, to reduce uncertainty.

Optional Simulations
We also suggest a series of optional, "Tier 2" experiments: • ± 4K patches: same as the A p = ±2K patch simulations, but with A p = ±4K.
• uniform perturbations: same as the patch simulations, except instead of patch perturbations, there are uniform SST perturbations of either ± 2K or ± 4K.• modes of interannual variability: same as the patch simulations, but with SST patterns corresponding to the dominant modes of ENSO, PDO, IOD, and AMO.

Journal of Advances in Modeling Earth Systems
10.1029/2023MS003700 BLOCH-JOHNSON ET AL.

Requested Variables
Data submitted to GFMIP should follow CMIP6 conventions, and be CMORized, with variable names and units consistent with their CMIP6 values.We request monthly averages of the 2D and 3D variables given in Tables 2 and 3 respectively for all control, patch, diagnostic, and optional simulations.We also welcome higher temporal resolution data from the initial years of patch simulations branched from the control simulation, which may be of use in constructing temporal Green's functions to understand atmospheric response at the sub-seasonal to seasonal time scale.The analogous CMIP6 variables can be found at https://clipc-services.ceda.ac.uk/dreq/u/MIPtable::Amon. html.If an atmosphere model's native grid is not along lines of latitude and longitude, we ask that output first be interpolated to a latitude-longitude grid.
We ask participants to upload data as netCDF files (We acknowledge Wing et al., 2018, which we used as the basis for this description.).

Download and Upload Information
The boundary conditions required for the control, patch, and diagnostic simulations are available for download at https://gfmip.org.This site also contains instructions for data upload and download.Upon publication of a paper summarizing results, uploaded data will become publicly available.

Nonlinearity (ΔN ∝ Δ̸ SST ̅→ )
In Figure 2, the derivatives of N with respect to SST ̅→ are generally more negative in experiments with warming than with cooling patches.This nonlinear result is consistent across models, suggesting that it is due to fundamental physics.
The top row of Figure 7 shows derivatives of N for HadAM3 with a range of values of A p (these were calculated using the HadCM3 piControl base state, so that panel h of this figure is the same as panel c of Figure 4 above).The sign of A p has a much larger effect on the derivative than the magnitude of A p , and there is a jump in the global mean value (given in the title of each panel) of 1.82Wm 2 between A p = 1K and A p = +1K, so that the asymmetry between warming and cooling exists even for small perturbations.
The magnitude of A p still has some effect, such that the negative regions of

∂N/∂SST
̅→ * associated with areas of tropical convection get more negative and extend spatially as A p increases from +1K to +4K. Figure 8 in Zhang, Zhao, and Tan (2023) shows a similar intensification between A p = 1.5 and 4K for GFDL-AM4.For HadAM3, these changes are not quite symmetric, such that the ± 1K derivative of N (panel g of Figure 7) is different than the ± 4K derivative (panel l of Figure 7 and panels c and d of Figure 5).
In addition, there appears to be a nonlinearity associated with patch size.In Figure 6, the third and fourth rows show the derivatives of N for A p = +2 and 2K respectively.The title of each panel is the average value of the derivative of N over the case study region demarcated by the black rectangle.We note the following properties: • If warming (+2K) patches are used, smaller source patches cause more negative derivatives of N. • If cooling ( 2K) patches are used, smaller source patches cause more positive derivatives of N.   • Consequently, the smaller the source patches, the larger the asymmetry between warming and cooling.For example, the average derivative for a uniform warming over the study region is 0.35 Wm 2 K 1 more negative than for a uniform cooling (first column).However, for the Dong et al. shifted layout, the average derivative is 3.37 Wm 2 K 1 more negative (last column).Extending our perturbation to a larger domain, the average  derivative for a uniform warming over the entire globe is actually 0.17 Wm 2 K 1 more positive than for a uniform cooling (not shown), in keeping with this spatial trend.
All of these nonlinearities may have a common explanation, proposed by Williams et al. (2023), which documented similar nonlinearities with the ICON model.In order to set off deep convection, the surface conditions of tropical regions must reach a threshold in which their subcloud moist static energy is larger than the saturated moist static energy aloft (Williams & Pierrehumbert, 2017).Areas that can deeply convect will serve as sources of higher saturated moist static energy for the surrounding region, with values falling off as one gets further from the convecting source.This is consistent with all three nonlinearities seen above: • Asymmetry: asymmetry between warming and cooling patches occurs because warming a convecting region increases the moist static energy supplied to the tropical free troposphere, increasing lower tropospheric stability and thus low cloudiness in the manner depicted in the first column of Figure 3.Alternatively, cooling a small patch of a convecting region may drop that region below the convective threshold, removing the location in question as a source of moist static energy.This might not have large effects because other nearby regions might still convect and supply moist static energy of a similar value.• Magnitude-dependence: larger values of A p may cause regions that would not otherwise pass the threshold for convection to do so.In particular, notice how the map of A p = +4K in panel f of Figure 7 has negative feedbacks over a much broader area than A p = +1K in panel d.Such an effect could work symmetrically, so that under cooling, fewer and fewer regions would convect deeply if cooled in isolation, and thus fewer would be able to affect the full free troposphere (e.g., panel a vs. panel c in Figure 7).• Patch size-dependence: many of the arguments made in the previous two bullet points relied on the notion that small patches were being warmed in isolation, allowing them to exceed the surface moist static energy of other nearby regions.If we instead perturb these small patches in unison, then the relative values of the tropospheric moist static energy between them will stay fairly similar, and fewer regions will switch from convecting to not convecting and vice versa, so that the impact per total warming will be smaller.Not only would this explain the range of behavior seen in Figure 6, it would also explain the difference between it and Figure S4 in Supporting Information S1: rectangular patches with perturbations over a broader area having less extreme and more linear responses than their sinusoidal equivalents.
Alternatively, the patch size-dependence could also be explained by the "Laplacian of warming" mechanism, which finds changes in vertical velocity and surface convergence are proportional to both SST ̅→ and its Laplacian (Back & Bretherton, 2009;Duffy et al., 2020).Patch size has an especially large influence on the Laplacian of SST: for a given amplitude, smaller patches have a larger Laplacian of SST, and therefore may have larger vertical velocity anomalies.Therefore, small patches may have large artificial anomalies of vertical velocity, which could affect cloudiness, and therefore N.This mechanism may also explain why some models experience asymmetries in the extratropics (Figure 2).Asymmetry to cooling versus warming may be stronger still if sea ice is allowed to vary (Liu et al., 2020).Further work is need to understand the importance and interaction of these mechanisms.
These mechanisms may also help us understand why the Green's function method often overestimates the magnitude of the change in N in response to the pattern of warming found in abrupt4x simulations in some models.Though the method successfully recreates the response of CAM5 to a range of forcing-induced SST patterns (Zhou et al., 2023), it produces too strong a response to the abrupt4x pattern in CAM4 (Dong et al., 2019), HadAM3 (panel e of Figure 8), and GFDL-AM4 (Figure 12 in Zhang, Zhao, & Tan, 2023, though note that when sensitivity tests are used, the result is too weak a response, as in their Figure 2).The black solid line shows the ensemble-mean simulated ΔN(t) (again over nine simulations with different initial conditions) of HadAM3 run with ΔSST ̅→ (t) from the first 150 years of an abrupt4x simulation of HadCM3, and the solid orange line shows the Green's function estimate using the GFMIP protocol, which is increasingly lower than the black line over time.While we might not expect the Green's function estimate to work during the initial years of rapid warming when the atmosphere is not equilibrated, the method overestimates the response for the entire simulation.
A potential correction would be to use derivatives of N estimated using only warming patches, since ΔSST ̅→ (t) is typically positive for abrupt4x simulations.However, using a derivative derived from patches with A p = +2K leads to an even worse underestimate (e.g., see Figure 7 and Figure S6 in Supporting Information S1).Similar

Journal of Advances in Modeling Earth Systems
10.1029/2023MS003700 results hold for CanESM5 and ICON (Figure S9 in Supporting Information S1), such that using only warming patches results in large overestimates of the magnitude of the response of N to abrupt4x warming over the tropical Pacific.
To understand better why the Green's function method is successful in reproducing the response of N to the historical pattern but unsuccessful for the abrupt4x pattern, we decompose these warming patterns, ΔSST ̅→ (t), into two components: a uniform perturbation, which has the same time-varying ocean-mean value as the full pattern, 〈ΔSST ̅→ (t)〉, but perturbs the SST field uniformly (dotted lines in Figure 8), and a zero-mean term ΔSST ̅→ (t) 〈ΔSST ̅→ (t)〉, which is the anomaly in the full pattern when this uniform field is subtracted (dashed lines in Figure 8).
If we perform ensembles of simulations with HadAM3 for the uniform perturbations and zero-mean patterns analogous to those we ran for the full patterns and calculate the resulting ensemble-mean time series of ΔN(t), the sum of the uniform perturbation ΔN(t) and the zero-mean pattern ΔN(t) is quite close to the ΔN(t) associated with the full warming pattern (Figure S10 in Supporting Information S1).This additivity suggests that as long as the Green's function method can recreate the responses to the uniform perturbation and zero-mean pattern individually, it should be able to recreate the response to the full warming pattern.
We consider the uniform perturbation first.One way of interpreting the area-mean values of derivatives of N seen in the titles of many panels in Figures 6 and 7 is as the estimated change in N that would result from a uniform 1K warming over the domain in question.While accounting for asymmetry in warming versus cooling reduces the variation in these values, the ±2 row in Figure 6 still exhibits substantial variation for different patch layouts.In fact, the Green's function estimate of N is biased negatively for the abrupt4x uniform perturbation (dotted lines in panel e, Figure 8), although it does better with the response to the more modest uniform perturbations of the historical time series (panel b of Figure 8).The negative bias for the abrupt4x uniform perturbation is even larger than for the full pattern itself (solid lines in panel e, Figure 8).
As for the zero-mean pattern, while the Green's function method successfully recreates the historical time series of N (dashed lines in panel c, Figure 8), it unsuccessfully recreates the response to the abrupt4x time series (dashed lines in panel e).However, in this case, its estimates are biased positively, not negatively.Panels d and f show the last years of the zero-mean patterns of the historical and abrupt4x time series, respectively.In this case, the abrupt4x pattern appears more, rather than less, spatially varying than the patches used to construct our derivatives of N.
These results suggest that more heterogeneous warming tends to cause more of a decrease in N, and there is some evidence for this in Figure 6, in that the patch layouts with smaller patches tend to have more negative area-mean "±2" and "+2" feedbacks.However, the fact that the "±2" uniform tropical Pacific patch layout does not fit with this trend suggests the picture is more complicated, and needs further investigation.Regardless, it is clear that in addition to a "pattern effect," whereby changes in the location of warming alters the global-mean feedback strength, there is a "patchiness effect," whereby changing the heterogeneity of the warming pattern also alters the feedback strength (see also Rugenstein et al., 2016, who found that surface flux heterogeneity affected the strength of the radiative feedback).
The nonlinearities associated with patch simulations are therefore not an artifact of these simulations' experimental setups.Instead, patchiness is a general feature of the response of the atmosphere to a given field of temperature change that must be accounted for in some way.For example, derivatives estimated using the random perturbation method (Li et al., 2012) will also depend on the heterogeneity of the random perturbations (determined perhaps both by the number and proximity of the patches), such that, just as with the Green's functions method, the resulting derivatives may not apply directly to all warming patterns.
Future work should extend the linear model of the atmospheric response to sea surface temperatures to account for the nonlinearities associated with the heterogeneity of the temperature pattern.We hope that GFMIP will not only standardize our analysis of the linear response to surface warming, but provide results that help in the development of this nonlinear extension.Such an extension may incorporate alternative simple models of the timeevolution of the radiative feedback, such as those using SST# (Fueglistaler, 2019), lower-tropospheric stability S (Ceppi & Gregory, 2019), or 500 hPa tropical temperature (Dessler et al., 2018).

Journal of Advances in Modeling Earth Systems
10.1029/2023MS003700

Conclusions
The dependence of atmospheric state on the sea surface temperature is a matter of critical scientific interest.In particular, the "pattern effect" has emerged as a key source of uncertainty in our projections of global warming, and the atmospheric Green's function method is a uniquely helpful tool for studying it.It allows us to decompose an atmospheric response to surface temperature changes into responses to changes in specific regions, making clear the local and nonlocal effects associated with these changes.The method has already reshaped our understanding of why the radiative feedback changes over time, both for the case of historical warming and under constant CO 2 forcing.While the Green's function results so far have pointed to certain qualitative similarities between models, it is unclear how much their differences are due to true differences in atmospheric model physics as opposed to differences in experimental setup.
The Green's Function Model Intercomparison Project will provide a standard for performing the atmospheric Green's function method, so that differences in participants' results will reflect true model differences.The protocol has been developed such that every choice reflects experimental tests measuring the sensitivity of the process to the choice in question.The development of the protocol underscored some principles involved in making a Green's function setup: the importance of using both warming and cooling patch experiments; the tradeoffs between the magnitude and size of patch perturbations and the length of the control and patch simulations; and the fact that different variables might require higher precision (and thus potentially longer simulations) than others.
Our analysis joins a growing body of literature establishing that the Green's function method can successfully recreate the response of an atmospheric model's net TOA radiation flux to historical changes in the surface temperature pattern.Not only does the method allow us to establish a causal relationship between surface temperature changes in different regions and an atmosphere response, it also allows us to trace the pathways and mechanisms by which the surface temperature changes cause those responses.Moreover, the qualitative consistency in the derivatives seen in Figure 2 suggests that the arguments of Zhou et al. (2017) and Dong et al. (2019) are robust across models, giving confidence that these studies provide fundamental physical insight into the pattern effect.
On the other hand, the preliminary results and sensitivity tests compiled for this paper show that the response of the atmosphere to surface temperature perturbations has nonlinearities associated with the sign, magnitude, and size of the perturbation.Recent work suggests that this nonlinearity may arise from convective thresholds working in conjunction with the weak temperature gradient, as well as from influences of the Laplacian of SST on vertical velocity over the perturbation in question.As with some previous studies, we find that for many models, the Green's function method estimates a response of the global-mean net outgoing TOA radiative response, N, to the warming caused by large CO 2 changes (such as in abrupt4x simulations) that is too negative.Our findings suggest the response of N to a pattern of SST change depends on the spatial heterogeneity of the pattern, with some evidence that more heterogeneous patterns cause a more negative N.
In conclusion, we think that the GFMIP results will be useful for analyzing the atmospheric response to historical climate change and for accounting for nonlinearities in the response to warming under further CO 2 increases.A refined understanding of the net TOA radiative response will improve our projections of both near and long-term warming, and can aid in the construction of linear response functions (Hassanzadeh & Kuang, 2016;Liu, Lu, Garuba, Leung, et al., 2018), which are operators that determine the time-evolution of climatic variables of interest, such as the surface temperature field.GFMIP could also provide insight into many other aspects of the atmosphere's response to surface temperature changes, such as changes in atmospheric circulation and heat transport, precipitation, and land warming.For all of these reasons, we hope that other groups will join us in carrying out the GFMIP protocol.
Suppose we have a continuous field SST(ϕ, θ), defined over the ice-free ocean.We can then define a field ∂ 2 f/ ∂SST∂a| (ϕ,θ) , which is the infinitesimal change in f due to an infinitesimal change in SST over an infinitesimal area a around the point (ϕ, θ).We call this the continuous derivative of f with respect to SST, and it has units of f divided by K and m 2 .
The change in f associated with perturbing a cell's temperature, ∂f/∂SST i , can then be calculated as an integration over the cell as follows: where r is the radius of the Earth, (ϕ i , θ i ) is the center of the cell, and δϕ i and δθ i are the latitude and longitude widths of the cell.(Note that we're assuming that cells have rectangular shapes in lat-lon space, and also that the appropriate arithmetic is applied when dealing with cells that straddle the discontinuity in longitude.)Thus, ∂f/ ∂SST i has the units of f divided by K.
As discussed in the text, ∂f/∂SST i is an ideal metric, as it depends on the size of the grid cell (i.e., it is an "extensive" variable).For instance, if we assume ∂ 2 f/∂SST∂a| (ϕ,θ) is constant over a given grid cell i, Equation A1 becomes ∂f/∂SST i = a i ∂ 2 f/∂SST∂a, where a i is the grid cell's area.
To remedy this, in the text we define a quantity called the normalized derivative.For a given cell, we define it as follows: where a tot ≡∑ i a i is the total area of the ice-free ocean.
The normalized derivative can be thought of as the value that the global derivative ∂f/∂〈SST〉 (the derivative of f with respect to the average SST value over the ice-free ocean) would have if grid cell i were representative of the whole globe.∂f /∂SST * i does not depend on grid cell size (i.e., it is an "intensive" variable), it has the same units as the global and grid-cell derivatives (∂f/∂〈SST〉 and ∂f/∂SST i respectively), and it has a similar order of magnitude as the global derivative.
However, there's a simpler way of thinking about the normalized derivative.The units of the continuous derivative have an extra m 2 in the denominator compared to the global derivative we are ultimately interested in studying.To make them comparable, we can multiply the continuous derivative by some characteristic area.Choosing the area of the whole ice-free ocean for this characteristic area will give a similar order of magnitude as the global feedback.Physically, choosing this area is the same as assuming that the given point at which the continuous derivative is being evaluated is representative of the whole ice-free ocean, and then calculating the global feedback under this assumption.
As a result, the normalized derivative used throughout this paper is just a discretized version of the continuous derivative multiplied by the area of the ice-free ocean; that is, a continuous version of the normalized derivative could be defined as: Since it is typically discretized, this means that the normalized derivative will only have the approximate value of the continuous derivative, but this approximation will get better the higher the resolution of the grid.

Figure 1 .
Figure 1.A schematic illustrating the Green's function method for modeling the dependence of an atmospheric variable, f, on the sea surface temperature field, SST ̅→ .Step 1. Run a control simulation of an atmospheric model with fixed climatologies of sea surface temperature (SST ̅→ c ) , sea ice fraction (SIC ̅→ c ) , and forcing agents {F} c , with a spinup of s c years and a post-spinup period of y c years, to estimate f c .Step 2. For each patch in a lattice overlaying the ocean surface, run the atmospheric model with that patch as a perturbation to the SST ̅→ field with a spinup period of s p years and a post-spinup period of y p years to estimate the resulting change in f, Δf p .Each patch has amplitude, shape, and position parameters (A p , δϕ p , δθ p , ϕ p , θ p in Equation 1).Step 3. Define the normalized derivative of f with respect to SST in a given grid cell i, ∂f /∂SST * i , as the average of each patch's dependence of f on its ocean-averaged SST perturbation, 〈SST ̅→ p 〉, weighted by how strong the patch perturbation is in that cell.Step 4. The response of f to a given pattern of surface temperature change ΔSST ̅→ can be estimated using the normalized derivative, ∂f /∂SST * ̅̅→ and the area in each grid cell, a i .A Green's function setup can be evaluated by comparing the response to a surface temperature time series simulated by an atmospheric model (black line) and estimated by the Green's function method (orange line).In this paper, the simulated time series is typically an ensemble mean of realizations with different initial conditions.

Figure 2 .
Figure 2. Normalized derivatives of N with respect to SST ̅→ , ∂N /∂SST ̅→ * , estimated using the Green's function method.The black ellipses in the top row show the halfamplitudes (the contours within which the patch perturbation is at least A p /2) for the patches used to estimate that column's derivatives.The bottom three rows shows the resulting derivatives.The third row shows derivatives estimated using positive values of A p (warming patches), the fourth row shows derivatives estimated using negative values of A p (cooling patches), and the second row shows their average.Data attribution is given by the names underneath each atmospheric model name in each of the column's titles.Note that there are two HadAM3 derivatives shown that differ only in patch layout.

Figure 3 .
Figure 3. Three examples of patch warming and resulting changes in N → , modeled on Figure 2 in Zhou et al. (2017).Black ellipses show half-amplitudes of the patch perturbation, as in the first row of Figure 2. Surface warming in a region of tropical ascent warms the entire tropical troposphere (panel a), increasing lower tropospheric stability elsewhere, which promotes low cloudiness and leads to broad decreases in N → (panel d).This causes the ubiquitous negative values associated with tropical convecting regions in Figure 2 of this paper.Subsidence inhibits warming from propagating upwards (panel b), while warming in the extratropics can be balanced by local circulation via the Coriolis force (panel c), such that warming in these regions mostly results in a destabilization of the local boundary layer and a loss of low clouds (and therefore an increase in local N, panels e and f).

Figure 4 .
Figure 4.A comparison of using two base states to generate Green's functions: a climatology of HadCM3's piControl simulation (panels a-d), and the most recent decades of the AMIP (observational) time series (panels e-h).Differences in the sea surface temperature (panels a and e) and sea ice fraction (panels b and f) climatologies can lead to differences in the normalized derivative of N with respect to SST ̅→ (panels c and g).Panels d and h show time series of ΔN for ensemble means of 9 simulations of HadAM3 run with a time series of historical temperature anomalies added to each row's respective base state (black lines).Applying the Green's function method results in time series estimates (gold lines) with root mean square errors as calculated by Equation 6.

Figure 5 .
Figure 5. Root mean square error in reconstructing historical ΔN, RMSE hist , calculated with Equation 6 using different values of spinup and post-spinup length for control and patch simulations.Panel a shows the dependence of RMSE hist on control simulation spinup length, s c , where large values of s c are included to show how RMSE hist can vary due to internal variability.Aside from the changing values of s c , the Green's function setup follows the GFMIP protocol (Table 1) except that the brown values use the HadCM3 piControl base state, as in the top row of Figure 4. Panel b shows the same, but for variations in the post-spinup simulation length, y c .Panels c and d shows the dependence of RMSE hist on patch simulation spinup length, s p , and patch simulation post-spinup length, y p , respectively, calculated with different magnitudes of A p .Note that the Green's function setups in this row use the HadCM3 piControl base state.Figure S5 in Supporting Information S1 shows rescalings of panels b and d using Equation 7.
using the Zhou et al. and Dong et al. shifted layouts, which require 109 and 147 patches respectively, and whose ± 2K derivatives reconstruct historical N(t) with errors of 0.27 Wm 2 K 1 and 0.26 Wm 2 K 1 respectively (once more using the HadCM3 piControl base state), compared to 0.4 Wm 2 K 1 for a uniform perturbation.Based on these results, we Journal of Advances in Modeling Earth Systems 10.1029/2023MS003700 recommend using the Zhou et al. patch layout for the tropics, which have a good balance of reconstruction skill and number of patches required.

Figure 6 .
Figure 6.Normalized derivatives of N over the tropical Pacific (100°W to 60°E and 30°S to 30°N, as shown by the black rectangles in the maps above) calculated for HadAM3 using a variety of patch layouts and values of A p .First row shows patch half-amplitudes for each column as in Figure 2, where the first column has a uniform perturbation over the study region and the rest have patches as defined by Equation 1.The next three rows show derivatives of N estimated with A p = +2K (third row), 2K (fourth row), and their average (second row).Each panel's title gives the study region's ocean-mean value of the derivative.The bottom panel shows the scatterplot of the root mean square error in reconstructing historical ΔN, RMSE hist , calculated with Equation 6 using the ±2K derivatives in the second row, against the number of patches associated with each setup.

Figure 7 .
Figure 7. Normalized derivatives of N for HadAM3, calculated using a range of values of A p .Derivatives in the top row were calculated using cooling (panels a-c) or warming (panels d-f) patch perturbations.Panels g-i show averages of the cooling and warming derivatives, while panels j-l show their differences.Numbers at the end of each panel's title give the ocean-mean value of the derivative.Note that the HadCM3 piControl base state was used to estimate these derivatives.

Figure 8 .
Figure 8. HadAM3's ensemble-mean response of N to historical (top row) and abrupt4x (bottom row) surface warming patterns (ΔSST ̅→ (t); black solid lines in panels a and e), as well as the ensemble-mean response of N to ΔSST ̅→ (t)'s decomposition into a uniform perturbation (〈ΔSST ̅→ (t)〉; black dotted lines in panels b and e) and a zeromean pattern (ΔSST ̅→ (t) 〈ΔSST ̅→ (t)〉; black dashed lines in panels c and e).Orange lines show the reconstruction of these time series using the function method, following the GFMIP protocol.Panels d and f show the zero-mean patterns in the final year of each simulation, t f .