Model Biases in the Atmosphere‐Ocean Partitioning of Poleward Heat Transport Are Persistent Across Three CMIP Generations

The observed partitioning of poleward heat transport between atmospheric and oceanic heat transports (AHT and OHT) is compared to that in coupled climate models. Model ensemble mean poleward OHT is biased low in both hemispheres, with the largest biases in the Southern Hemisphere extratropics. Poleward AHT is biased high in the Northern Hemisphere, especially in the vicinity of the peak AHT near 40°N. The significant model biases are persistent across three model generations (CMIP3, CMIP5, CMIP6) and are insensitive to the satellite radiation and atmospheric reanalysis products used to derive observational estimates of AHT and OHT. Model biases in heat transport partitioning are consistent with biases in the spatial structure of energy input to the ocean and atmosphere. Specifically, larger-than-observed model evaporation in the tropics adds excess energy to the atmosphere that drives enhanced poleward AHT at the expense of weaker OHT.


Introduction
The combined meridional heat transport (MHT) by the ocean and atmosphere moderates spatial gradients in temperature on Earth. In the absence of MHT, the equator-to-pole temperature gradient would be approximately three times larger than observed based on radiative considerations alone (Pierrehumbert, 2010), rendering the tropics uninhabitably warm and the high latitudes uninhabitably cold. Observational estimates of the partitioning of MHT between poleward atmospheric heat transport (AHT) and poleward oceanic heat transport (OHT) show that OHT exceeds AHT in the deep tropics (equatorward of 10°) while AHT dominates in the mid- and high latitudes of both hemispheres (Mayer et al., 2021; Oort & Haar, 1976; Trenberth & Caron, 2001; Vonder Haar & Oort, 1973).
The partitioning of MHT between AHT and OHT impacts climate and its changes. For example, the convergence of OHT in the extratropics is inherently linked to the surface energy budget and thus demands a surface temperature response, whereas the convergence of the same quantity of AHT in the atmosphere can be radiated to space with less impact on surface climate (Cardinale et al., 2020). Indeed, previous work by Enderton and Marshall (2009) has shown that aquaplanets with nearly identical total MHT but different AHT-OHT partitioning can have very different climates (e.g., different surface temperature and sea ice distributions).
Given the dependence of climate on MHT and its partitioning between AHT and OHT, we ask here: how well do coupled climate models represent the observations of these quantities? This question was briefly addressed in Chapter 9 of the Intergovernmental Panel on Climate Change fifth assessment report (Flato et al., 2013), which concluded that model OHT was within the wide range of observational OHT estimates. Comparison of observational and model AHT-OHT partitioning is difficult because the standard methodology for partitioning MHT between AHT and OHT differs between observations and models due to the contrasting reliability and availability of the climate fields used to calculate AHT and OHT. Recent work (Donohoe et al., 2020) demonstrated a near equivalence of the model and observational approaches used to calculate AHT-OHT partitioning in a model setting, enabling a comprehensive observational-model comparison. In this study we apply these methods to three generations of coupled model simulations (Phases 3, 5, and 6 of the Coupled Model Intercomparison Project, CMIP) and to several observational radiation and atmospheric reanalysis products. Our aim is to determine whether the models accurately capture MHT and its partitioning between AHT and OHT derived from observational datasets.
In Section 2 we provide an overview of the observational and model methodologies for partitioning MHT into AHT and OHT and demonstrate the near equivalence of these two approaches. In Section 3, we compare the observational and model MHT partitioning across the three different model generations (CMIP3, CMIP5, and CMIP6) and examine the sensitivity of our findings to the choice of observational data sets used to partition MHT. In Section 4 we consider an alternative method for comparing AHT-OHT partitioning in models and observations from the processes that contribute to spatial gradients in energy input to the atmosphere and ocean. A summary and discussion follows.

Methods for Partitioning MHT Into AHT and OHT in Observations and Coupled Models
The methodology used to partition MHT into AHT and OHT in coupled climate models and observations is described in detail in Donohoe et al. (2020). Here we summarize the conceptual approach. In climate models, AHT and OHT are diagnosed from the transport required to balance the energy input into the atmosphere and ocean, respectively. Specifically,

$$\mathrm{HT}(\Theta) = -2\pi a^{2}\int_{\Theta}^{\pi/2}\cos\theta\,F^{*}(\theta)\,d\theta, \tag{1}$$

where HT is the northward heat transport across latitude Θ, a is the radius of the Earth, Θ is the latitude (with θ a latitude variable of integration), and F is the net energy input to the atmosphere, ocean, or combined atmosphere-ocean system. The total MHT can be found by taking F to be the radiative flux at the TOA (RAD_TOA), OHT by taking F to be the net surface heat flux (SHF = radiative plus turbulent flux into the ocean), and AHT by setting F to be the net energy input to the atmosphere (RAD_TOA − SHF). The * denotes that the global mean of each energy flux term has been removed to ensure heat transport goes to zero at both poles. This adjustment is necessary because climate models do not conserve energy globally (imbalances of up to 1 W m⁻²) in both the atmosphere and ocean (Lucarini & Ragone, 2011).
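As a rough illustration of the model-based diagnosis summarized above, the following Python sketch integrates a zonal-mean energy input over the polar cap after removing its area-weighted global mean. The function name `implied_transport`, the trapezoidal quadrature, the 1° grid, and the idealized flux profile are assumptions made for illustration only; they are not taken from Donohoe et al. (2020).

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # m, mean Earth radius


def implied_transport(lat_deg, flux):
    """Northward heat transport (W) implied by a zonal-mean energy input.

    lat_deg : latitudes in degrees, increasing from -90 to 90
    flux    : zonal-mean net energy input F (W m^-2) at each latitude
    """
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    flux = np.asarray(flux, dtype=float)
    cos_lat = np.cos(lat)
    dlat = np.diff(lat)

    def trapezoid_steps(y):
        # Contribution of each latitude interval to a trapezoidal integral.
        return 0.5 * (y[1:] + y[:-1]) * dlat

    # Remove the area-weighted global mean (the * in the text) so that
    # the transport vanishes at both poles.
    global_mean = (trapezoid_steps(flux * cos_lat).sum()
                   / trapezoid_steps(cos_lat).sum())
    anomaly = (flux - global_mean) * cos_lat

    # Integral of cos(lat) * F* from the south pole up to each latitude.
    from_south = np.concatenate(([0.0], np.cumsum(trapezoid_steps(anomaly))))
    # Polar-cap integral from each latitude to the north pole.
    polar_cap = from_south[-1] - from_south
    return -2.0 * np.pi * EARTH_RADIUS**2 * polar_cap


lat = np.linspace(-90.0, 90.0, 181)
# Idealized net TOA radiation: surplus in the tropics, deficit at the poles.
rad_toa = 300.0 * np.cos(np.deg2rad(lat)) - 240.0
mht_pw = implied_transport(lat, rad_toa) / 1e15  # petawatts
```

Because the global mean is removed with the same quadrature that is used for the polar-cap integral, adding a uniform offset to `rad_toa` leaves `mht_pw` unchanged, which mirrors the role of the * adjustment.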
In contrast to models, the observed surface energy budget is not closed (Stephens et al., 2012; Trenberth et al., 2009). Thus, observational OHT cannot be estimated from SHF in Equation 1. Instead, we use an approach following Trenberth and Caron (2001) and Vonder Haar and Oort (1973): observational MHT is calculated using Equation 1 with satellite RAD_TOA (Loeb et al., 2018); observational AHT is calculated from high-frequency atmospheric reanalysis as the vertically and zonally integrated meridional energy flux; and observational OHT is calculated as the residual of satellite-derived MHT and reanalysis-derived AHT. In the AHT calculation, the vertically averaged moist static energy is removed before integrating (Cardinale et al., 2020; Donohoe & Battisti, 2013; Donohoe et al., 2020), effectively applying a mass correction needed to make the AHT calculation physically meaningful (Liang et al., 2018; Trenberth & Stepaniak, 2003). Donohoe et al. (2020) demonstrated that the "observational" and "model" approaches calculated nearly identical AHT when applied to an NCAR CESM1 coupled simulation. We extend this result to show that the two approaches give nearly identical partitioning of MHT into AHT and OHT (cf. the dashed and solid red and blue lines in Figure S1 in Supporting Information S1), with a root mean squared difference in AHT (and OHT) between the two methods of 0.07 PW. We use this result to justify the examination of potential model biases in MHT partitioning using these two methodologies.
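The residual calculation and the root mean squared comparison described above can be sketched in a few lines of Python. The array values are made up for illustration, the function names are hypothetical, and the unweighted mean over latitude is an assumption (the text does not state whether the 0.07 PW figure is area weighted).

```python
import numpy as np


def residual_oht(mht, aht):
    """Ocean heat transport as the residual of total and atmospheric transport."""
    return np.asarray(mht, dtype=float) - np.asarray(aht, dtype=float)


def rms_difference(a, b):
    """Root mean squared difference between two transport curves."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


# Made-up transports (PW) at a few latitudes, for illustration only.
mht_satellite = np.array([-5.0, -2.0, 0.3, 3.0, 5.5])
aht_reanalysis = np.array([-4.2, -1.0, -0.2, 1.6, 4.6])
oht_residual = residual_oht(mht_satellite, aht_reanalysis)

# Hypothetical OHT from the surface-flux ("model") approach, offset by 0.07 PW.
oht_surface_flux = oht_residual + 0.07
method_mismatch = rms_difference(oht_residual, oht_surface_flux)
```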

Results: Model Biases in MHT Partitioning
Climate model biases in MHT partitioning are analyzed using pre-industrial control simulations from three different CMIP generations (Eyring et al., 2016; Meehl et al., 2007; Taylor et al., 2012) and several different sets of observational products (see Supporting Information for details). We focus on pre-industrial simulations in order to compare results between CMIP generations (which have different years of simulation and forcing for historical simulations) and evaluate the potential differences between historical and pre-industrial climate in Section 3.2. The presentation of our results is organized as follows. Section 3.1 presents the observational estimate of MHT partitioning using the most contemporary and high-resolution data available, which is compared against the MHT partitioning in the three CMIP ensembles. Section 3.2 analyzes the sensitivity of our results to the observational data used by comparing eight different observational estimates of MHT partitioning against the multi-generation CMIP ensemble mean. The results show that the sign and spatial structure of model biases in MHT partitioning are consistent across model generations and the observational data sets used.

Consistent Model Biases in AHT-OHT Partitioning Across Three CMIP Generations
In this section, we use CERES Energy Balanced and Filled (EBAF) TOA radiation (Loeb et al., 2009) and the ERA5 atmospheric reanalysis (Hersbach et al., 2020) to calculate an observational estimate of MHT and its partitioning over the period 2001-2020. This observational estimate (solid line) is compared against each of the three CMIP ensembles (in each row of Figure 1, with dotted lines showing individual models and thick dashed lines showing CMIP ensemble averages). We focus here on robust differences between the model ensemble mean in each CMIP generation and the observational estimate, defined as biases greater than two standard errors of the ensemble mean (the ensemble standard deviation divided by the square root of the number of ensemble members, corresponding to roughly a 95% confidence interval; see Figure S6 in Supporting Information S1). Observational errors on AHT due to inter-annual variability are less than 0.1 PW (see Figure S9 in Supporting Information S1), whereas it remains unclear how structural errors in satellite radiation project onto MHT uncertainty (Wunsch, 2005).
Poleward MHT peaks near 35° in both hemispheres in both models and observations (Figure 1), due to Earth-Sun geometry constraints (Stone, 1978). However, across all three CMIP generations, the amplitude of the ensemble mean poleward MHT in models is biased low in the mid-latitudes of the Southern Hemisphere (SH), whereas the model bias in peak MHT in the Northern Hemisphere (NH) is only significantly low in CMIP5. The inter-model spread in peak MHT (2 standard deviations) is as large as 23% of the ensemble mean in the SH and about half as large in the NH. Donohoe and Battisti (2011) demonstrated that the inter-model spread and bias in MHT in CMIP3 result from biases and spread in the albedo of clouds, which impact the equator-to-pole gradient of absorbed solar radiation.
We next analyze the partitioning of MHT between OHT and AHT. In the NH, the model ensemble mean is significantly biased toward too little poleward OHT and too much poleward AHT in all three CMIP generations. The model bias toward smaller than observed OHT extends poleward to the Arctic, where OHT has been demonstrated to have large impacts on sea ice extent (Docquier & Koenigk, 2021; Holland et al., 2006; Seager et al., 2002).
In the SH, poleward OHT in the models is biased low relative to the observational estimate in all three CMIP generations. The largest biases in OHT are found in the vicinity of 40°S. The observational estimate of poleward OHT is only exceeded in three model simulations (two in CMIP3 and one in CMIP5). In contrast, the poleward AHT in the SH is not significantly different between the model ensemble means and observational estimates. We note that while a minority of individual models have poleward OHT as strong as the observed estimate, these models are biased toward too much total poleward MHT; no single model matches the observed MHT, AHT and OHT simultaneously (Figure S8 in Supporting Information S1) despite the model ensemble spreads overlapping the observational estimates for either MHT, AHT or OHT independently.
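The two-standard-error criterion described above can be sketched as follows. This is a minimal illustration with made-up numbers; the function name `robust_bias_mask` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not specify the normalization.

```python
import numpy as np


def robust_bias_mask(model_transports, observed):
    """Flag latitudes where the ensemble-mean bias exceeds two standard errors.

    model_transports : array of shape (n_models, n_lat)
    observed         : array of shape (n_lat,)
    """
    models = np.asarray(model_transports, dtype=float)
    obs = np.asarray(observed, dtype=float)
    n_models = models.shape[0]
    ensemble_mean = models.mean(axis=0)
    # Assumption: sample standard deviation (ddof=1); the text does not specify.
    standard_error = models.std(axis=0, ddof=1) / np.sqrt(n_models)
    bias = ensemble_mean - obs
    return bias, np.abs(bias) > 2.0 * standard_error


# Made-up example: four models, two latitudes (values in PW).
models = np.array([[1.0, 5.0], [1.2, 5.5], [0.8, 4.5], [1.0, 5.0]])
obs = np.array([1.05, 6.0])
bias, is_robust = robust_bias_mask(models, obs)
```

In this toy case the first latitude has a bias of -0.05 PW, well inside two standard errors, while the second has a bias of -1.0 PW and is flagged as robust.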
These results suggest that in the SH, the majority of the model biases in MHT are a result of biases in OHT, whereas in the NH the models generally simulate too much poleward AHT and too little poleward OHT.
Alternatively, the fractional contribution of AHT and OHT to total MHT (i.e., normalizing each model by the model-specific MHT) is biased toward too much poleward AHT and too little poleward OHT, with biases that are nearly symmetric between the two hemispheres (not shown). Importantly, the sign and spatial structure of model biases in MHT and AHT-OHT partitioning are remarkably consistent across the three CMIP generations spanning over 20 years of progress in climate modeling.

Sensitivity of Results to Observational Data Sets Used
We next consider whether the identified model biases in AHT-OHT partitioning are sensitive to the choice of observational data sets (TOA radiation and atmospheric reanalysis) used to partition MHT. We use the mean of all ensemble members across all three CMIP generations, referred to as the CMIP-mean, as a reference for all analyses in this subsection.
We begin by analyzing the MHT and AHT/OHT partitioning estimated using two additional satellite-derived observational estimates of TOA radiation (see Supporting Information S1 for details): the unadjusted CERES single scanner footprint (SSF) data and the ERBE satellite data (Barkstrom & Hall, 1982), which span 1984-1990 (left panels of Figure 2, bordered by the black box). In these three panels, the choice of TOA radiation product alters the calculated observational MHT (solid black line) whereas the AHT is unchanged between panels (ERA5 is used in each). Because the observational OHT is calculated from the difference of MHT and AHT, the observational OHT estimate (solid blue line) also varies between panels. Observational MHT calculated from the three different TOA radiation products is consistently larger than the CMIP-mean in both hemispheres. Model biases in MHT are largest when the CERES SSF product is used (Figure 2e) and smallest when the ERBE product is used (Figure 2c). The CMIP-mean OHT is biased low compared to that derived from all three TOA radiation datasets, with the largest biases when CERES SSF is used, especially in the SH. The sign of the model biases in AHT/OHT partitioning is insensitive to the observational TOA radiation data set used: the three products give a consistent estimate of MHT despite their substantial (≈5 W m⁻², or 2.5 PW globally integrated) differences in global mean TOA radiative balance associated with absolute calibration uncertainty (Loeb et al., 2009).
We next analyze the sensitivity of our results to the choice of atmospheric reanalysis used to calculate the AHT (Figure 2 panels (a), (b), (d), (f) and (h)). In these five panels, the MHT is identical (calculated using the CERES EBAF product) whereas the AHT is calculated from the ERA5, ERA-Interim (Dee et al., 2011), NCEP (Kalnay et al., 1996), MERRA2 (Gelaro et al., 2017), and JRA-55 (Kobayashi et al., 2015) reanalyses. Since OHT is calculated from the residual of MHT and AHT, the OHT differences between panels are equal and opposite to the inter-panel differences in AHT. The CMIP-mean bias toward too much poleward AHT and too little poleward OHT is apparent using all five observational estimates of AHT. Poleward AHT is largest when using ERA5, followed closely by JRA-55, MERRA2 and then ERA-Interim, whereas using NCEP produces the smallest poleward AHT, with the most notable difference near the peak in the SH at 40°S. Therefore, model biases in the AHT-OHT partitioning are smallest using ERA5 and largest using NCEP. These results suggest that the sign and spatial structure of model biases in MHT partitioning are consistent across atmospheric reanalysis datasets, whereas the magnitude of the bias depends on the reanalysis product used. Differences in AHT calculated between the different reanalyses are not impacted by differences in the spatial resolution (see analysis and Figure S2 in Supporting Information S1), as even the coarsest product (NCEP) resolves the spatial scales responsible for the vast majority of AHT. The similarity of the AHT derived from different reanalyses is consistent with the conclusions of Liu et al. (2020).
Finally, we evaluate whether heat storage due to the transient response to anthropogenic forcing impacts our observational estimates of OHT. The Earth is not in equilibrium but, rather, is accumulating energy at an average rate of 0.7 W m⁻² globally (Johnson et al., 2016). The vast majority of this energy accumulation is stored in the ocean (Von Schuckmann et al., 2016) and it is possible that the spatial structure of this energy storage projects onto our diagnoses of observational OHT for the following reason: observed "implied" OHT is calculated from the spatial integral of inferred surface heat fluxes (TOA radiation plus AHT convergence) and the latter is balanced by the sum of OHT divergence and ocean heat storage in a transient system. We diagnose the impact of observed ocean heat storage on the implied OHT (OHT_STORAGE) from the trend in ocean heat content, derived from the UK Hadley Center EN4 objective ocean analysis (Good et al., 2013) over the CERES period (see Supporting Information for details). OHT_STORAGE is removed from the "implied" OHT to estimate the "true" OHT (solid teal line in Figure 1f) that must be transported laterally in the ocean to close the ocean energy budget. OHT_STORAGE is very small (<0.1 PW in magnitude) and, thus, the diagnosed "true" OHT is visually indistinguishable from the observational "implied" OHT (solid blue line in Figure 1f). The global mean ocean heat uptake of 0.7 W m⁻² translates to 0.4 PW of global energy input to the ocean, but the implied OHT of ocean heat storage is much smaller in magnitude because ocean heat uptake is more globally uniform than regionally isolated. The negligible impact of ocean heat storage on "implied" OHT over the historical period is consistent with the small (<0.1 PW) differences between OHT in the ensemble mean of historical CMIP5 simulations averaged over the 2000-2018 time period and that in the pre-industrial control simulations using the same models (Figure S3 in Supporting Information S1).
Collectively, these results suggest that the sign of model biases in AHT-OHT partitioning is robust to different observational products (satellite TOA radiation and atmospheric reanalysis) used to partition MHT. Additionally, the spatial pattern of transient heat uptake by the ocean has a negligible impact on estimated OHT. However, the magnitude of the model bias in AHT-OHT partitioning does vary with the observational datasets used. In this regard, the use of CERES EBAF and ERA5 data for our primary analysis (Figure 1) gives a conservative estimate of model biases in AHT-OHT partitioning (a smaller OHT bias is found only when using the combination of ERBE and ERA5 products).
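The conversions between global-mean fluxes and globally integrated power quoted in this subsection (0.7 W m⁻² to about 0.4 PW, and about 5 W m⁻² to about 2.5 PW) follow from multiplying by the surface area of the Earth. A minimal check, assuming a mean Earth radius of 6.371 × 10⁶ m (the helper name is hypothetical):

```python
import math

EARTH_RADIUS = 6.371e6  # m, assumed mean Earth radius
EARTH_AREA = 4.0 * math.pi * EARTH_RADIUS**2  # m^2, about 5.1e14


def global_flux_to_pw(flux_w_m2):
    """Convert a global-mean flux (W m^-2) to a globally integrated power (PW)."""
    return flux_w_m2 * EARTH_AREA / 1e15


ocean_heat_uptake_pw = global_flux_to_pw(0.7)   # roughly 0.36 PW
calibration_spread_pw = global_flux_to_pw(5.0)  # roughly 2.55 PW
```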

Biases in Energy Input to the Atmosphere and Ocean and Inferred AHT and OHT Biases
Here we evaluate potential causes of the persistent model biases in AHT and OHT in terms of model biases in the spatial structure of energy input into the ocean and atmosphere. Starting in the ocean, energy conservation demands that OHT across a latitude band balances the net surface heat flux out of the ocean (−SHF by our sign convention) integrated over the polar cap bounded by that latitude, which from Equation 1 is represented by:

$$\mathrm{OHT}(\Theta) = -2\pi a^{2}\int_{\Theta}^{\pi/2}\cos\theta\,\mathrm{SHF}^{*}(\theta)\,d\theta, \tag{2}$$

where we have neglected the ocean heat uptake in Equation 2 (as discussed in Section 3.2). SHF is equal to the net downward surface radiation (RAD_SURF) into the ocean minus the upward turbulent energy fluxes of sensible (SENS) and latent heat (L_v E):

$$\mathrm{SHF} = \mathrm{RAD}_{\mathrm{SURF}} - \mathrm{SENS} - L_{v}E. \tag{3}$$

Substitution of Equation 3 into Equation 2 allows the OHT to be decomposed into the implied transports of each term contributing to SHF:

$$\mathrm{OHT} = \mathrm{OHT}_{\mathrm{RAD,SURF}} + \mathrm{OHT}_{\mathrm{SENS}} + \mathrm{OHT}_{E}, \tag{4}$$

where, for example, the OHT implied by evaporation (OHT_E) is:

$$\mathrm{OHT}_{E}(\Theta) = 2\pi a^{2}\int_{\Theta}^{\pi/2}\cos\theta\,L_{v}E^{*}(\theta)\,d\theta, \tag{5}$$

where, as in Equations 1 and 2, the * indicates that the global (ocean domain) mean has been removed from the term. Because SENS* is small compared to the other terms (Figure 3c) and RAD_SURF is dominated by solar input to the surface (Figures S5E and S5F in Supporting Information S1), the predominant energy balance is between solar radiation and evaporation. Most solar radiation is absorbed in the tropics, where it is only partially compensated by local evaporative cooling, leaving some portion of the absorbed energy to be transported poleward by the ocean. The degree of compensation between tropical absorption of solar radiation and tropical evaporation determines the strength of the OHT, with weaker compensation requiring more OHT. With this picture in mind, we can use Equations 2 and 5 to understand the role of radiation and evaporation biases in creating model OHT biases.
The latitudinal structure of CMIP-mean L_v E, SENS and RAD_SURF over the ocean domain is compared to observational estimates of the same quantities, with L_v E and SENS taken from the WHOI Objectively Analyzed (OA) Air-Sea Flux product (Yu et al., 2004) and RAD_SURF estimates from the CERES EBAF surface product (Kato et al., 2018), in Figure 3c. Evaporation is biased high in models (relative to the observational estimate) at all latitudes except the Arctic (Figure S4 in Supporting Information S1). Evaporation biases are largest (>20 W m⁻²) in the subtropics of both hemispheres and are much smaller in the high latitudes. These evaporation biases manifest as enhanced subtropical ocean energy loss by E* in the models (cf. the dashed and solid lines in Figure 3c) and an implied model bias toward too little (by approximately 0.4 PW) poleward OHT due to evaporation in each hemisphere (OHT_E, green line in Figure 3d). Thus, evaporation biases alone explain the majority of the model bias in OHT identified in Section 3 (compare green and dashed black lines in Figure 3d).
The observational RAD*_SURF has a stronger equator-to-pole gradient than that in climate models (cf. the solid and dashed orange lines in Figure 3c), especially in the SH. Model biases in RAD*_SURF are associated with larger than observed downwelling solar radiation into the extratropical Southern Ocean (Figure S5E in Supporting Information S1) due to clouds that are optically thinner than observed (Donohoe & Battisti, 2012). As a result, observed poleward OHT_RAD,SURF is larger than that in models, with larger magnitude (0.4 PW) biases in the SH. The model biases in OHT_RAD,SURF mirror the impact of TOA radiation biases on MHT (left panels of Figure 1), including the partitioning between shortwave and longwave biases within each hemisphere, suggesting that model biases in MHT and OHT in the SH are due to biases in shortwave absorption whereas those in the NH are due to biases in OLR and net surface longwave (Figures S5B and S5F in Supporting Information S1).
The sum of model biases in OHT_E, OHT_RAD,SURF and OHT_SENS (solid black line in Figure 3d) indicates that models would have weaker than observed poleward OHT by 0.6 PW in the NH and 0.8 PW in the SH based on biases in energy input to the ocean. This overall inferred OHT bias is primarily due to a nearly hemispherically mirror-imaged bias in OHT_E, which is enhanced by the bias in poleward OHT_RAD,SURF in the SH. The bias in OHT inferred from surface flux biases matches the spatial structure but exceeds in magnitude the OHT biases calculated in Section 3 from TOA radiation and atmospheric reanalysis (dashed black line in Figure 3d). These two calculations of model OHT biases do not have to match as they use different conceptual approaches and rely on completely independent observational climate fields. Nonetheless, the consistency of the sign, spatial pattern, and magnitude of the OHT biases calculated using the two different approaches suggests that the model biases in surface energy fluxes are large enough to account for the AHT-OHT partitioning biases inferred from the residual TOA radiation and AHT estimates.
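The decomposition of OHT into the transports implied by surface radiation, sensible heat, and evaporation can be illustrated numerically. The sketch below is not the authors' code: the helper `polar_cap_transport`, the idealized flux profiles, and the treatment of every latitude as entirely ocean are assumptions chosen only to show that the component transports sum to the total because the polar-cap integral is linear.

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # m


def polar_cap_transport(lat_deg, flux_out):
    """Northward transport (W) that balances an energy loss `flux_out`
    (W m^-2) over the polar cap, after removing its area-weighted mean."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    flux_out = np.asarray(flux_out, dtype=float)
    cos_lat = np.cos(lat)
    dlat = np.diff(lat)

    def steps(y):
        # Trapezoidal contribution of each latitude interval.
        return 0.5 * (y[1:] + y[:-1]) * dlat

    mean_out = steps(flux_out * cos_lat).sum() / steps(cos_lat).sum()
    anomaly = (flux_out - mean_out) * cos_lat
    from_south = np.concatenate(([0.0], np.cumsum(steps(anomaly))))
    return 2.0 * np.pi * EARTH_RADIUS**2 * (from_south[-1] - from_south)


lat = np.linspace(-90.0, 90.0, 181)
phi = np.deg2rad(lat)
# Idealized, made-up zonal-mean surface fluxes (W m^-2).
rad_surf = 40.0 + 150.0 * np.cos(phi) ** 2   # net downward surface radiation
sens = 10.0 + 5.0 * np.sin(phi) ** 2         # upward sensible heat flux
lv_e = 20.0 + 100.0 * np.cos(phi)            # upward latent heat flux
shf = rad_surf - sens - lv_e                 # net flux into the ocean

# The flux out of the ocean is -SHF; each term enters with its own sign.
oht_total = polar_cap_transport(lat, -shf) / 1e15
oht_rad = polar_cap_transport(lat, -rad_surf) / 1e15
oht_sens = polar_cap_transport(lat, sens) / 1e15
oht_e = polar_cap_transport(lat, lv_e) / 1e15
```

With these profiles the radiative term implies poleward transport while the evaporative term implies equatorward transport, so the total OHT reflects the partial compensation between tropical solar absorption and tropical evaporation described above.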
We use a similar calculation of the model biases in implied AHT from the spatial structure of energy input to the atmosphere to compute an alternative estimate of AHT biases to those calculated in Section 3. The AHT analog to Equation 4 is:

$$\mathrm{AHT} = \mathrm{AHT}_{\mathrm{RAD,ATMOS}} + \mathrm{AHT}_{\mathrm{SENS}} + \mathrm{AHT}_{E}, \tag{6}$$

where the atmospheric analog to Equation 5 for the AHT due to evaporation (AHT_E) is:

$$\mathrm{AHT}_{E}(\Theta) = -2\pi a^{2}\int_{\Theta}^{\pi/2}\cos\theta\,L_{v}E^{*}(\theta)\,d\theta. \tag{7}$$

The spatial integral is over a global (land plus ocean) domain. Here RAD_ATMOS is the net radiative heating of the atmospheric column, which is equivalent to the net radiation at TOA minus RAD_SURF. Fajber et al. (2023) demonstrated that poleward AHT is primarily determined by evaporation (AHT ≈ AHT_E) because L_v E* dominates the spatial structure of energy input to the atmosphere. We note that L_v E* spatially integrated over the ocean domain has opposing impacts on AHT_E versus OHT_E (and likewise for SENS* and AHT_SENS vs. OHT_SENS). This arises because excess evaporation over the low latitudes (E* > 0) adds energy to the atmosphere to enhance the demand for poleward AHT at the expense of removing energy from the low-latitude ocean to reduce the demand for poleward OHT. This mechanism (OHT_E = −AHT_E) is the foundation of the compensation between changes in AHT and OHT proposed by Bjerknes (1964).
To more clearly see the compensation between biases in AHT and OHT due to model biases in L_v E* (and SENS*) over the ocean domain, we take the following approach to compare models and observations of AHT via Equations 6 and 7. First, AHT_E and AHT_SENS are calculated from the observational WHOI OA evaporation and sensible heat flux data over the ocean domain only, and are compared to analogous model calculations over the ocean domain. Then, the contribution of turbulent energy fluxes over land to the combined AHT_E and AHT_SENS is estimated from the CERES EBAF net surface radiation spatially integrated over land. This approach assumes that (via surface energy balance) surface radiative gain is balanced by turbulent loss. These calculations are compared to analogous calculations in the models. Finally, AHT_RAD,ATMOS is calculated from the CERES EBAF TOA and surface data over the global domain and is compared to the analogous global domain calculation in models (orange lines in Figures 3a and 3b). This strategy circumvents the lack of reliable observational estimates of turbulent energy fluxes over land, instead inferring them from a like-with-like observational-to-model comparison of surface radiation over land and assuming that RAD_SURF is balanced by upward turbulent fluxes from the land to the atmosphere (the latter assumption has been validated in models).
Model biases in AHT_E compose the vast majority of AHT biases diagnosed from Equation 6 (cf. the green and solid black lines in Figure 3b) and suggest that the stronger than observed poleward AHT in models is driven by an enhanced equator-to-pole gradient in evaporation. Model RAD*_ATMOS is more negative in the deep tropics as compared to observations (due to stronger longwave cooling in the models; see Figure S5 in Supporting Information S1), which contributes to smaller AHT_RAD,ATMOS export from the tropics in the models that generally opposes the low-latitude biases in AHT_E (orange line in Figure 3b). Interestingly, shortwave absorption in the atmosphere is biased low in the models, which reduces the demand for poleward AHT by nearly 0.4 PW in both hemispheres (red line in Figure S5D in Supporting Information S1). However, this model deficit in atmospheric heating of the tropics is nearly compensated for by weaker than observed longwave cooling of the atmosphere such that there is almost no bias in AHT_RAD,ATMOS at the equator-to-pole scale. Turbulent energy fluxes over the land inferred from net surface radiation are nearly identical in models and observations and make a negligible impact on AHT biases (cf. purple dashed and solid lines in Figures 3c and 3d).
These calculations demonstrate that the model biases in the partitioning of poleward heat transport between AHT and OHT that were inferred in Section 3 are consistent (in sign, spatial structure, and magnitude) with the model biases in energy input into the atmosphere and ocean by radiative fluxes and turbulent exchange between the atmosphere and ocean. Stronger than observed evaporation in the models contributes to enhanced poleward AHT at the expense of reduced OHT that is nearly hemispherically symmetric, whereas radiative biases due to thinner than observed clouds in the extratropical Southern Ocean result in too weak poleward MHT that is primarily manifested in the surface energy budget and implied OHT bias.
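The compensation between the evaporative contributions to AHT and OHT can be sketched with idealized profiles. In the illustration below the evaporation curves are made up, the helper name `transport_from_input` is hypothetical, every latitude is treated as ocean, and the bias is defined as model minus "observed" (the opposite sign convention to Figure 3).

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # m


def transport_from_input(lat_deg, energy_input):
    """Northward transport (PW) implied by a zonal-mean energy input
    (W m^-2) with its area-weighted global mean removed."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    energy_input = np.asarray(energy_input, dtype=float)
    cos_lat = np.cos(lat)
    dlat = np.diff(lat)

    def steps(y):
        # Trapezoidal contribution of each latitude interval.
        return 0.5 * (y[1:] + y[:-1]) * dlat

    mean_in = steps(energy_input * cos_lat).sum() / steps(cos_lat).sum()
    anomaly = (energy_input - mean_in) * cos_lat
    from_south = np.concatenate(([0.0], np.cumsum(steps(anomaly))))
    polar_cap = from_south[-1] - from_south
    return -2.0 * np.pi * EARTH_RADIUS**2 * polar_cap / 1e15


lat = np.linspace(-90.0, 90.0, 181)
phi = np.deg2rad(lat)
# Made-up latent heat flux (W m^-2): the "model" evaporates more at low latitudes.
lv_e_obs = 20.0 + 100.0 * np.cos(phi)
lv_e_model = lv_e_obs + 20.0 * np.cos(phi) ** 2

# Evaporation is an energy input to the atmosphere and an energy loss for the ocean.
aht_e_model = transport_from_input(lat, lv_e_model)
oht_e_model = transport_from_input(lat, -lv_e_model)

# Bias defined here as model minus "observed" (an illustrative convention).
aht_e_bias = aht_e_model - transport_from_input(lat, lv_e_obs)
oht_e_bias = oht_e_model - transport_from_input(lat, -lv_e_obs)
```

In this toy setting the excess low-latitude evaporation strengthens poleward AHT_E in both hemispheres and weakens poleward OHT_E by exactly the same amount.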

Summary and Discussion
Coupled climate models have too little poleward OHT in both hemispheres and too much AHT in the NH, compared to observational estimates. These model biases are remarkably consistent across three generations of coupled model ensembles (CMIP3, CMIP5, and CMIP6) and across different sets of observational TOA radiation and atmospheric reanalysis data. These conclusions are not impacted by observed transient energy accumulation in the ocean.
We note that there are multiple different ways of computing AHT from atmospheric reanalysis, with subtle differences depending on the method used to balance the mass budget. Our method uses the energy budget with respect to a fixed mass of atmosphere (Donohoe & Battisti, 2013; Liang et al., 2018), and differs from that used by Trenberth and Stepaniak (2004) and Mayer et al. (2017). We have justified our decision by calculating the heat transport using the "observational" and "model" methods in the context of a single model (Figure S1 in Supporting Information S1), and showed that they are nearly equivalent. We emphasize that all choices made here were aimed at creating a consistent way to compare observational and model MHT and AHT-OHT partitioning despite the different climate fields that go into each calculation.
This work focused on model biases in the vertical, zonal, and time integral of atmospheric moist static energy fluxes that comprise AHT, without regard for biases in the underlying atmospheric circulations and associated temperature and humidity structures of the atmosphere. Donohoe et al. (2020) demonstrated that model biases in poleward AHT primarily result from larger than observed dry (sensible) heat transport by transient eddies in the mid-latitudes of both hemispheres (their Figure 4D in Supporting Information S1) and, in the NH, smaller than observed dry heat transport by stationary eddies; the moisture (latent heat) transport has negligible biases. Model biases in evaporation are expected to be manifested as biases in both moist and dry AHT because dry AHT is set by the spatial pattern of condensational heating of the atmosphere, which represents the portion of AHT_E that is not transported poleward as latent heat (Fajber et al., 2023); while spatial patterns of evaporation directly demand poleward moist AHT, the energy input to the atmosphere via evaporation is handed off to dry AHT where precipitation forms and the atmosphere is heated condensationally. Therefore, our finding that model biases toward too much AHT result from stronger than observed evaporation is consistent with the finding that excess poleward AHT in the models is expressed as a bias toward too much dry heat transport.
Remarkably, the model OHT bias inferred from observational estimates from satellite TOA radiation and atmospheric reanalyses is in decent agreement with model biases in the energy exchange between the ocean and atmosphere calculated from independent observational estimates of surface heat fluxes. The latter bias is due primarily to stronger than observed low-latitude evaporation in the models. We note that the community has been reluctant to diagnose OHT from the observed surface energy balance because of uncertainty in the turbulent energy fluxes. Yet, our analysis paints a consistent picture of the model biases in turbulent energy fluxes, whether these are inferred from the residual of TOA radiation and AHT or from bulk formulae. We also note that observational estimates of global mean evaporation and its equator-to-pole gradient vary substantially (Stephens et al., 2012), with reanalysis products generally having more evaporation than the bulk-formula-based estimates such as WHOI OA flux (Yu et al., 2004) and SEAFLUX (Curry et al., 2004). We chose to use WHOI OA flux for the analysis in Section 4 because the bulk formulae in this product are optimized to match buoy observations, making it the most observationally constrained estimate of evaporation. Additionally, the global constraint of evaporation balancing precipitation is nearly satisfied: the combination of the WHOI OA flux evaporation over the ocean (62.8 W m⁻² contribution to global mean) plus the ERA5 reanalysis evaporation over land (12.9 W m⁻², for a global total evaporation of 75.7 W m⁻²) nearly balances the best observational estimate of global mean precipitation (77.9 W m⁻²) from the NOAA GPCP (Adler et al., 2018). The lack of closure of the observed global mean surface energy budget suggests that observational surface radiation and/or turbulent energy fluxes are poorly constrained, and one hypothesized solution is that both global mean evaporation and precipitation are substantially underestimated (Stephens et al., 2012). Our analysis circumvents this debate by removing global mean quantities, showing that the equator-to-pole gradient of surface energy fluxes is consistent with that inferred from TOA radiation and AHT divergence. This suggests that the meridional structure of surface energy fluxes constrained by TOA radiation and AHT could be used in conjunction with global mean imbalances to give an additional constraint for reconciling which terms in the observed surface energy budget are most uncertain and/or biased.

AD, KCA, TC, GHR, and DSB acknowledge support from National Science Foundation Award AGS-2019647. AD acknowledges funding from National Science Foundation Award AGS-2311154. RF was supported by the NOAA Climate and Global Change Postdoctoral Fellowship programs for the Advancement of Earth System Science (CPAESS) under award NA18NWS4620043B. We thank three anonymous reviewers and Editor Kristopher Karnauskas for thoughtful comments and suggested clarifications in the presentation of the manuscript.

Figure 1. Observational and model (left panels) total meridional heat transport (MHT) and (right panels) its partitioning between the atmosphere (AHT, red) and ocean (OHT, blue). Results from the CMIP3, CMIP5, and CMIP6 models are shown in the top, middle and bottom panels respectively. The observational estimates are shown by the heavy solid line, individual coupled models are shown by the dotted lines and the model ensemble mean is shown by the heavy dashed line.

Figure 2. Comparison of MHT, OHT and AHT in models and observations using eight different observational estimates of MHT (black solid), AHT (red), and OHT (blue). The left panels show the sensitivity of the transports to the TOA radiation product used, with CERES EBAF in the top panel, ERBE in the second panel, and the unadjusted CERES SSF in the bottom panel, and with the ERA5 AHT estimate across all panels. The right panels show the observational transports calculated using CERES EBAF TOA radiation in all panels but using different atmospheric reanalysis products in each panel: (b) ERA-Interim; (d) NCEP; (f) MERRA2; and (h) JRA-55. Panel (g) shows the impact of observed spatial patterns in ocean heat storage on implied OHT using EN4 ocean heat content changes over 2000-2018. The model mean is the average over all models in CMIP3, CMIP5, and CMIP6 (CMIP-mean).

Figure 3. Model and observational estimates of the energy input into the atmosphere and ocean and the implied AHT and OHT biases resulting from each input. (a) Global anomaly energy input into the atmosphere in models (dashed) and observations (solid). See text in Section 4 for definition of terms. (b) Implied AHT bias (observations minus models) due to each energy input. The solid black line shows the sum of all terms. The dashed black line shows the bias in total heat transport inferred from CERES and ERA5 data as discussed in Section 3. (c) As in (a) but for the energy input to the ocean. (d) As in (b) but for the implied OHT bias.