We evaluate the annual mean downward shortwave radiative flux at the surface (RSDS) and the reflected shortwave (RSUT) and outgoing longwave (RLUT) radiative fluxes at the top of the atmosphere (TOA) from the twentieth century simulations of the Coupled Model Intercomparison Project Phase 5 (CMIP5) and Phase 3 (CMIP3), as well as from the NASA GEOS5 model and the Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalysis. The results show that a majority of the models have significant regional biases in the annual means of RSDS, RLUT, and RSUT, ranging from −30 to 30 W m−2. The global average CMIP5 ensemble mean biases of RSDS, RLUT, and RSUT are reduced relative to CMIP3 by about 32% (e.g., from −6.9 to 2.5 W m−2 for RSDS), 43%, and 56%, respectively. However, this reduction arises from a more complete cancellation between pervasive negative biases over ocean and new, larger positive biases over land. In fact, based on these annual mean biases, Taylor diagram metrics, and RMSE, there is virtually no progress in the simulation fidelity of RSDS, RLUT, and RSUT from CMIP3 to CMIP5. A persistent systematic bias in both CMIP3 and CMIP5 is the underestimation of RSUT and overestimation of RSDS and RLUT in the convectively active regions of the tropics. The total ice and liquid atmospheric water content in these areas is also underestimated. We hypothesize that at least part of these persistent biases stems from the common practice in global climate models of ignoring the effects of precipitating and/or convective core ice and liquid in radiation calculations.
 Atmospheric radiative structures, such as fluxes and the vertical/horizontal distributions of radiative heating, are among the most important factors determining global weather and climate. Clouds exert a strong influence on the regional radiative balance by reflecting shortwave (SW) radiation back to space and by trapping longwave (LW) radiation and re-emitting it toward the surface, providing one of the strongest feedback mechanisms in the climate system. The balance of these fluxes is essential for understanding Earth's climate system and for constraining the energy balance of climate models [Stephens, 2005]. However, representing clouds and cloud-radiation feedbacks in global climate models (GCMs) well enough to reduce and quantify the uncertainties in climate change projections remains a major challenge. Global constraints for developing and evaluating clouds and radiation in GCM simulations have typically been derived from radiation budget observations from the Earth Radiation Budget Experiment and the Clouds and the Earth's Radiant Energy System (ERBE/CERES) [Wielicki et al., 1996] and from cloud cover observations from the International Satellite Cloud Climatology Project (ISCCP) and related products [e.g., Han et al., 1999; Rossow and Zhang, 1995; Rossow and Schiffer, 1999]. Over the last decade, the first satellite simulator became available for ISCCP [Rossow and Schiffer, 1999], enabling the evaluation and intercomparison of climate model clouds [e.g., Norris and Weaver, 2001; Lin and Zhang, 2004; Zhang et al., 2005; Schmidt et al., 2006; Cole et al., 2011; Kay et al., 2012].
More recently, the Cloud Feedback Model Intercomparison Project (CFMIP) [e.g., Bony et al., 2011] has coordinated the development of the CFMIP Observation Simulator Package (COSP), which includes simulators for a number of newer satellite observations from the Multiangle Imaging Spectroradiometer (MISR), the Moderate Resolution Imaging Spectroradiometer (MODIS), CloudSat, the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO), and Polarization and Anisotropy of Reflectances for Atmospheric Sciences Coupled With Observations From a Lidar (PARASOL). COSP has been widely used to understand and quantify climate model cloud biases [e.g., Chepfer et al., 2008; Bodas-Salcedo et al., 2008, 2011; Zhang et al., 2010; Kay et al., 2012; Kodoma et al., 2012; Nam and Quaas, 2012]. For example, using the ISCCP simulator, Zhang et al.  identified the “too few, too bright” problem in earlier generations of GCMs, while, more recently, Klein et al.  showed that the representation of clouds, and in particular the “too few, too bright” problem, has improved from CMIP3 to CMIP5.
 A key element of obtaining an accurate top of atmosphere (TOA) and surface radiation budget is the representation of clouds, which for GCMs and Earth radiation budget considerations can be roughly broken down into cloud cover, cloud water mass, and cloud particle size. While ISCCP and other products have provided very useful constraints on cloud cover, the latter two quantities have remained largely unconstrained owing to the lack of observations of cloud water mass and particle size. This is especially true for their vertical structure, which leaves too many degrees of freedom unconstrained. The ramifications of this issue for cloud mass are clearly evident in the wide disparity in the cloud ice and liquid water content (CIWC and CLWC) values exhibited by present-day models, including those contributing to Phases 3 and 5 of the Coupled Model Intercomparison Project (CMIP3 and CMIP5) [e.g., Waliser et al., 2009; Li et al., 2012].
 The recent availability of the first vertically resolved tropospheric cloud radar reflectivity and derived ice/liquid water profiles from CloudSat [Austin et al., 2009], and of retrievals combining CloudSat with CALIPSO [Deng et al., 2010, 2013; Delanoë and Hogan, 2008, 2010], provides new means for global evaluation of cloud mass [e.g., Chepfer et al., 2008; Li et al., 2012; Waliser et al., 2009; Chen et al., 2011; Jiang et al., 2012; Gettelman et al., 2010; Klein and Jakob, 1999; Webb et al., 2001; Delanoë and Hogan, 2008, 2010; Delanoë et al., 2011; Bodas-Salcedo et al., 2008, 2011; Zhang et al., 2010; Kay et al., 2012; Kodoma et al., 2012]. Among these studies, Li et al. [2011, 2012] and Waliser et al. [2009, 2011] emphasized that considerable care is required to make judicious comparisons and interpretations of atmospheric liquid/ice and its interactions with radiation. This is because most GCMs represent only the “suspended” hydrometeors associated with some or most clouds, whereas satellite observations include clouds, falling hydrometeors (e.g., rain or snow), and convective core cloud mass. For example, the CloudSat radar and CERES instruments are sensitive to a broader range of ice and liquid particles, including clouds, falling snow and rain, and convective core water mass. In contrast, most GCMs, including all CMIP3 models and most CMIP5 models, account only for the radiative effects of cloud-related hydrometeors and, in some cases, not even of all clouds (e.g., deep convection). Because most CMIP3 and CMIP5 models significantly underestimate (or do not explicitly model all of) the total atmospheric water mass, their radiation fields may be biased.
Evidence for systematic biases in GCM radiation fields, for example in CMIP3, was given in the analysis by Trenberth and Fasullo  showing too much absorbed SW radiation and too much outgoing LW radiation in regions of heavy precipitation, e.g., the Intertropical Convergence Zone (ITCZ). An observation-based modeling study by Waliser et al.  led to the hypothesis that the typical practice of ignoring the radiative impacts of precipitating hydrometeors accounts for at least a portion of this systematic bias. Because this practice persists, it is worth examining whether the same systematic bias is evident in CMIP5.
 In this study, we examine systematic radiation budget biases in CMIP3 and CMIP5 by evaluating RSDS at the surface and the RSUT and RLUT fluxes at the TOA. These three quantities were chosen because they are the radiative fluxes most directly influenced by convection and clouds, and we are interested in examining their biases in light of the liquid and ice water biases mentioned above. Other fluxes, such as the downward longwave flux at the surface, are also strongly influenced by clouds, but they depend as well on other species, such as water vapor, and are not considered in this study. The model simulations considered here are the twentieth century CMIP3 and CMIP5 simulations, a NASA GEOS5 AGCM simulation with prescribed SSTs, and the Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalysis. Observation-based reference data for the TOA fluxes are derived from contemporary satellite radiation measurements, while the surface fluxes are derived from satellite-constrained radiative transfer model calculations. In section 2, we describe these observational data sets, including how the different retrievals and other methodologies are combined to form a robust observational estimate with some quantitative information on uncertainty. In section 3, we briefly describe the models and the MERRA reanalysis data sets used in this evaluation. In section 4, we present and discuss the results of the model evaluation. Section 5 summarizes the results and draws conclusions.
2 Radiation Data Sources
 To evaluate model radiative fluxes and to help account for observational uncertainty, we use reflected shortwave and outgoing longwave radiative fluxes at the TOA from satellite measurements and downward shortwave fluxes at the surface computed with two different algorithms, as follows.
1. Surface downward shortwave radiative flux (RSDS). The RSDS reference data are taken from two products: EBAF-Surface and ISCCP-FD.
(a) EBAF-Surface. This surface radiative flux product is constrained by CERES-derived TOA fluxes with Energy Balanced and Filled (EBAF) adjustments [Kato et al., 2011, 2012a, 2012b] and is based on two CERES data products: Edition 3-lite SYN1deg-Month provides the computed irradiances to be adjusted, and EBAF Ed2.6r [Loeb et al., 2009, 2012] provides the constraint. The temperature and humidity profiles used in the computations are from the Goddard Earth Observing System Data Assimilation System (GEOS-4 and GEOS-5) reanalyses. MODIS-derived cloud properties [Minnis et al., 2011] are combined with geostationary satellite (GEO)-derived cloud properties [Minnis et al., 1994] to resolve the diurnal cycle in the SYN1deg-Month flux computations. Unlike TOA irradiances, global surface irradiances can only be estimated with a radiative transfer model, so errors in the input cloud and atmospheric properties directly affect the accuracy and stability of the modeled surface irradiances [Kato et al., 2012a]. In addition, model-computed TOA irradiances do not necessarily agree with the CERES-derived observed TOA irradiances. To mitigate these problems, the computed fluxes in the EBAF-Surface product are constrained to be consistent with the CERES-derived observed TOA fluxes to within their uncertainties. In the constraining process, CloudSat radar-derived and CALIPSO lidar-derived cloud vertical profiles [Kato et al., 2010], as well as Atmospheric Infrared Sounder (AIRS)-derived temperature and humidity profiles, are used to determine the uncertainty in cloud and atmospheric properties. The EBAF-Surface fluxes have been extensively compared with surface observations at a number of sites. Kato et al. [2012b] showed that the bias (root-mean-square difference) between computed and observed monthly mean irradiances, calculated with 10 years of data, is 4.7 (13.3) W m−2 for downward shortwave and −2.5 (7.1) W m−2 for downward longwave irradiance over ocean, and −1.7 (7.8) W m−2 for downward shortwave and −1.0 (7.6) W m−2 for downward longwave irradiance over land. These bias values are within the monthly gridded modeled flux uncertainty of 10 W m−2 for shortwave and 14 W m−2 for longwave estimated by Kato et al. [2012a]. They are also within the uncertainty of surface measurements at a given site: the measurement uncertainty at a buoy reported by Colbo and Weller  is 5–6 W m−2 for daily or annual mean downward shortwave flux and 4 W m−2 for daily and annual mean downward longwave flux [cf. Waliser et al., 1999; Medovaya et al., 2002]. The data used in this study are monthly means from January 2000 to December 2010.
(b) ISCCP-FD. These products are calculated with an advanced radiation scheme using ISCCP-D1 input data, which include global observations of the key variables, with better treatment of ice clouds, a revised aerosol climatology, diurnal variations of surface skin and air temperatures, revised water vapor profiles, and refined land surface albedos and emissivities [Zhang et al., 2004]. An 18 year flux record has been created at 3 h time steps on a global grid with 280 km spacing and evaluated through comparisons of monthly and regional mean values with ERBE and CERES TOA fluxes and with Baseline Surface Radiation Network (BSRN) surface fluxes. Zhang et al.  reported overall uncertainties of 10–20 W m−2 for shortwave fluxes, and comparisons with BSRN suggest that biases in monthly shortwave fluxes are less than 5 W m−2. The data used in this study span January 1984 to December 2004.
 The A-Train fluxes derived from EBAF-Surface are especially noteworthy because, unlike other estimates such as ISCCP-FD, they are based on actual cloud profile observations, notably including critical new information about cloud base derived from CloudSat and CALIPSO [Mace et al., 2009]. In this study, the RSDS reference used for direct model evaluation is EBAF-Surface, which has been adjusted and constrained with CloudSat- and CALIPSO-derived cloud vertical profiles, AIRS-derived temperature/humidity profiles, and CERES-derived TOA fluxes to within their uncertainties. The ISCCP-FD values provide an additional measure of observational uncertainty in the model-observation comparisons.
2. Outgoing longwave (RLUT) and reflected shortwave (RSUT) radiative fluxes at the TOA. The emitted longwave and reflected solar radiation at the TOA, referred to in this study as RLUT and RSUT, respectively, are from the CERES EBAF product (CERES_EBAF-TOA_Ed2.6r) [Loeb et al., 2009, 2012]. The CERES EBAF product includes the latest instrument calibration improvements, algorithm enhancements, and other updates. CERES TOA SW and LW fluxes in the EBAF product are adjusted within their range of uncertainty to remove the inconsistency between the global average net TOA flux and the heat storage in the Earth–atmosphere system, as determined primarily from ocean heat content anomaly (OHCA) data (see Supplementary Information in Loeb et al.  for more details). The data used in this study are monthly means from January 2000 to December 2010.
 In this study, the RLUT and RSUT fluxes used for direct model evaluation are from CERES EBAF, which are based on direct measurements adjusted for consistency with the global energy budget.
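The EBAF-style adjustment described above (shifting the SW and LW fluxes within their uncertainties until the global net TOA flux matches the observed heat storage) can be illustrated with a generic uncertainty-weighted least-squares correction. This is a minimal sketch, not the CERES EBAF algorithm: the function name `constrain_toa_fluxes`, the weighting rule, and all numbers below are hypothetical choices made for illustration.

```python
def constrain_toa_fluxes(solar_in, rsut, rlut, net_target, sigma_sw, sigma_lw):
    """Hypothetical uncertainty-weighted adjustment of global mean TOA fluxes.

    Illustration only (not the CERES EBAF procedure): the adjustments d_sw and
    d_lw minimize (d_sw / sigma_sw)**2 + (d_lw / sigma_lw)**2 subject to the
    constraint that the adjusted net downward TOA flux equals net_target.
    All fluxes are in W m-2.
    """
    net = solar_in - rsut - rlut            # observed net downward TOA flux
    imbalance = net - net_target            # excess energy that must be removed
    w_sw = sigma_sw**2 / (sigma_sw**2 + sigma_lw**2)
    d_sw = w_sw * imbalance                 # the less certain flux absorbs more
    d_lw = imbalance - d_sw
    return rsut + d_sw, rlut + d_lw


# Hypothetical global means: net flux of 1.5 W m-2 vs. a 0.6 W m-2 target.
rsut_adj, rlut_adj = constrain_toa_fluxes(
    solar_in=340.0, rsut=99.0, rlut=239.5, net_target=0.6,
    sigma_sw=2.0, sigma_lw=1.0)
print(round(rsut_adj, 2), round(rlut_adj, 2))  # prints: 99.72 239.68
```

Because the SW uncertainty is assumed to be twice the LW uncertainty here, 80% of the 0.9 W m−2 correction goes to RSUT; with equal uncertainties, the correction would be split evenly.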
 Figure 1 shows the annual mean maps of RSDS estimated from EBAF-Surface (panel 1) and ISCCP-FD (panel 2) and of the difference between ISCCP-FD and EBAF-Surface (panel 3). ISCCP-FD RSDS is biased high relative to EBAF-Surface over most of the globe (global average difference 2.6 W m−2; Figure 1, panel 3), except in a few subtropical regions dominated by stratocumulus and trade wind shallow cumulus clouds, where it is biased low. Figure 2 shows the annual mean map of RLUT (W m−2) from CERES EBAF, and Figure 3 shows the annual mean map of RSUT from CERES EBAF. For more details on the observed radiation reference data sets and their uncertainties, readers are referred to Kato et al. [2012a, 2012b] for RSDS and to Loeb et al.  for RSUT and RLUT.
3 Modeled Values of Radiative Fluxes
 Using the observations described in the previous section, we evaluate RSDS, RLUT, and RSUT in coupled atmosphere-ocean GCMs (CGCMs) from CMIP3 and CMIP5 and in one additional GCM, the NASA GEOS5 AGCM. The CMIP5 simulations are listed in Table 1, and Table 2 outlines the SW/LW radiation parameterizations used in the selected CMIP5 models and the GEOS5 model. The GEOS5 model includes land interactions and complex interactions within the atmosphere (clouds, boundary layer, dynamics, etc.); the CMIP5 models additionally include ocean interactions. Thus, the performance of their radiation simulations is determined not solely by the radiation schemes used but by complex interactions within a fully coupled system. The behavior is therefore unlikely to be explained by any single component or scheme; rather, it depends on the details of each model's schemes and the coupling among them, as well as on interactions with sea surface temperatures (SSTs) [e.g., Donner et al., 2011; Ma et al., 2012; Gettelman et al., 2010; Li et al., 2012; Randall et al., 2007; Webb et al., 2001]. The CMIP5 experiment used is the historical twentieth century simulation, which is driven by observed twentieth century greenhouse gas, ozone, aerosol, and solar forcing. The long-term mean is computed over 1970–2005, and if a model provided an ensemble of simulations, only one member was used. For both the GCM and observational data sets, all fields have been regridded onto a common 2° × 2° latitude-longitude grid.
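The paper does not state which remapping method was used to bring fields onto the common 2° × 2° grid. The sketch below assumes the simplest case, in which the native grid nests exactly inside the target (e.g., a 1° × 1° field), so that conservative remapping reduces to an area-weighted block average; native model grids that do not nest (e.g., Gaussian grids) would need a general conservative remapper. The helper names (`cell_areas`, `coarsen`, `global_mean`, `long_term_mean`) are hypothetical, and averaging months with equal weight ignores differences in month length.

```python
import numpy as np


def cell_areas(lat_edges_deg, nlon):
    """Relative areas of regular lat-lon cells (proportional to d(sin lat))."""
    edges = np.deg2rad(np.asarray(lat_edges_deg, dtype=float))
    band = np.sin(edges[1:]) - np.sin(edges[:-1])   # one value per latitude row
    return np.repeat(band[:, None], nlon, axis=1)    # shape (nlat, nlon)


def coarsen(field, area, factor):
    """Area-weighted block average onto a grid `factor` times coarser.

    Conservative when fine cells nest exactly in coarse cells; nlat and nlon
    must be divisible by `factor`. Returns the coarse field and coarse areas.
    """
    nlat, nlon = field.shape
    shape = (nlat // factor, factor, nlon // factor, factor)
    weighted = (field * area).reshape(shape).sum(axis=(1, 3))
    coarse_area = area.reshape(shape).sum(axis=(1, 3))
    return weighted / coarse_area, coarse_area


def global_mean(field, area):
    """Area-weighted global average of a 2-D field."""
    return float((field * area).sum() / area.sum())


def long_term_mean(monthly):
    """Average monthly fields (time, lat, lon), giving each month equal weight."""
    return monthly.mean(axis=0)


rng = np.random.default_rng(0)
fine_area = cell_areas(np.linspace(-90.0, 90.0, 181), 360)      # 1 deg x 1 deg
monthly = 200.0 + 50.0 * rng.standard_normal((24, 180, 360))    # synthetic fluxes
clim_1deg = long_term_mean(monthly)
clim_2deg, coarse_area = coarsen(clim_1deg, fine_area, 2)       # 2 deg x 2 deg
print(clim_2deg.shape)  # (90, 180)
print(np.isclose(global_mean(clim_1deg, fine_area),
                 global_mean(clim_2deg, coarse_area)))  # True
```

The second check illustrates why a conservative (area-weighted) remap is preferred for flux evaluation: the global area-weighted mean is unchanged by the regridding.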
Table 1. Model Label, Resolution, Institution, and Model Name for the CMIP5 GCMs Examined in This Study (See Section 3 for More Details)
Institution, Country | Model Name | Resolution
Beijing Climate Center, China | BCC-CSM1-1 | 64 × 128 × 26
Beijing Climate Center, China | BCC-CSM1-1_esm | 64 × 128 × 26
Canadian Centre for Climate Modelling and Analysis, Canada | CanESM2 | 64 × 128 × 35
National Center for Atmospheric Research, USA | CCSM4 | 288 × 192 × 26
Centre National de Recherches Meteorologiques, France | CNRM-CM5 | 128 × 256 × 17
Australian Commonwealth Scientific and Industrial Research Organization, Australia | CSIRO-Mk3-6-0 | 96 × 192 × 18
NASA/Goddard Institute for Space Studies, USA | GISS-E2-H | 90 × 144 × 29
NASA/Goddard Institute for Space Studies, USA | GISS-E2-R | 90 × 144 × 29
Hadley Centre for Climate Prediction and Research/Met Office, UK | HadGEM2-ES | 145 × 192 × 38
Institute for Numerical Mathematics, Russia | Inmcm4 | 120 × 180 × 21
Institute for Numerical Mathematics, Russia | Inmcm4_esm | 120 × 180 × 21
Institut Pierre Simon Laplace, France | IPSL-CM5A-LR | 96 × 96 × 39
University of Tokyo, NIES, and JAMSTEC, Japan | MIROC-ESM-CHEM | 64 × 128 × 80
University of Tokyo, NIES, and JAMSTEC, Japan | MIROC4h | 320 × 640 × 56
University of Tokyo, NIES, and JAMSTEC, Japan | MIROC5 | 128 × 256 × 40
Meteorological Research Institute, Japan | MRI-CGCM3 | 160 × 320 × 35
Norwegian Climate Centre, Norway | NorESM1-M | 96 × 144 × 26
Max Planck Institute for Meteorology, Germany | MPI-ESM-LR | 96 × 192 × 25
(not listed) | (not listed) | 90 × 144 × 40
(not listed) | (not listed) | 192 × 288 × 30
Table 2. Outline of Shortwave and Longwave Radiation Representations Used in the Selected CMIP5 Models, as Well as GEOS5 AGCM and MERRA
Prognostic Cloud Variables
Shortwave Radiation Scheme
Longwave Radiation Scheme
Single mixing ratio of cloud condensate
Absorption due to water vapor, O3, O2, CO2, clouds, and aerosols. Interactions among the absorption and scattering by clouds, aerosols, molecules (Rayleigh scattering), and the surface are fully taken into account. Fluxes are integrated over virtually the entire spectrum, from 0.175 to 10 µm [Chou and Suarez, 1999].
This includes the absorption due to major gaseous absorption (water vapor, CO2, O3) and most of the minor trace gases (N2O, CH4, and CFCs), as well as clouds and aerosols with optical properties specified as input parameters.
Collins et al., ; Rasch and Kristjánsson, ; Sundqvist, 
Mixing ratio of cloud liquid and ice
A two-stream solver, along with a delta-Eddington approximation, is used for calculations of radiative transfer in the atmosphere, which leads to a numerically efficient approach that has linear dependence on the number of vertical levels.
A two-stream solver is also used along with a methodology to efficiently account for the scattering by cloud and aerosol particles [Li, 2002].
Single mixing ratio of total water
Delta-Eddington with exponential-sum fit representation of near-IR, truncated ICA (TICA) for all-sky
RRTMG absorptivity-emissivity method for clear skies, plus truncated independent column approximation (TICA) for all-sky
Based on Fouquart and Bonnel ; the scheme integrates the fluxes over the whole shortwave spectrum between 0.2 and 4 µm. It includes Rayleigh scattering; absorption by water vapor and ozone, both varying in space and time; and absorption by CO2, N2O, CO, CH4, and O2, which are treated as uniformly mixed gases.
Rapid Radiation Transfer Model (RRTM) [Mlawer et al., 1997], as included in the ECMWF IFS model. The radiative transfer equation is solved with a two-stream method. The RRTM scheme computes fluxes in the spectral range 10–3000 cm−1. The computation is organized in 16 spectral bands and includes line absorption by H2O, CO2, O3, CH4, N2O, CFC-11, and CFC-12, as well as aerosols.
Mixing ratio of cloud liquid and ice, diagnostic falling snow
The shortwave radiation scheme is a two-stream code with 12 bands and 24k terms [Grant and Grossman, 1998; Grant et al., 1999].
The longwave scheme has 10 bands—a combination of k distribution and precomputed transmittances [Grant et al., 1999].
Single mixing ratio of total water, diagnostic falling snow
Single mixing ratio of total water, diagnostic falling snow
Cloud ice water content (both cloud ice and snow);
Mixing ratio of cloud liquid and ice
Mixing ratio of cloud liquid and ice
Single mixing ratio of total water
MIROC (MIROC-ESM; MIROC-ESM-CHEM)
Mixing ratio of cloud liquid and ice
Mixing ratio of cloud liquid and ice
Mixing ratio of cloud liquid and ice
Mixing ratio of cloud liquid and ice
Single mixing ratio of total water
Mixing ratio of cloud liquid and ice
Mixing ratio of cloud liquid and ice
Number concentrations (Nc, Ni) and mixing ratios (qc, qi) of cloud droplets (subscript c) and cloud ice (subscript i). Diagnostic two-moment treatment of rain and snow
 There are 20 CMIP5 models used in this study; they are listed in Tables 1 and 2. Among them, 16 are conventional models, i.e., models whose radiation calculations include neither convective core cloud mass nor falling hydrometeors [Li et al., 2012]. The GFDL-CM3, CESM1-CAM5, and two GISS models, which include the radiative effects of falling hydrometeors, are excluded from the multimodel mean (Figure 4, model 17). Figure 4 shows the long-term annual mean spatial distributions of the simulated RSDS biases, relative to EBAF-Surface, for the 16 conventional CMIP5 CGCMs (see Table 1, models 1–16), together with the multimodel ensemble mean bias of these 16 models (17), the four CGCMs that include precipitation-radiation interactions (18–21), and the GEOS5 atmosphere-only model (22). The magnitudes of the RSDS bias differ considerably across regions and models, ranging from −25 to more than 30 W m−2. Except for CNRM (4), NorESM (8), MIROC (10), MIROC5 (14), and CCSM4 (16), most of the models and the CMIP5 multimodel mean overestimate RSDS in the strongly convective regions of the tropics (e.g., the ITCZ, the South Pacific Convergence Zone (SPCZ), the Indian Monsoon region, and the warm pool), in the storm tracks and the Southern Ocean, and over most landmasses, especially convectively active continental areas such as central Africa and South America, as well as over Antarctica. Several models, namely IPSL (1), Inmcm4 (2), Inmcm4ESM (3), HadGEM (11), MRI (15), GISS-E2H (18), and GISS-E2R (19), significantly overestimate RSDS almost everywhere (by up to 30 W m−2), except over the subtropical oceans. On the other hand, the two BCC models (6, 7) and NorESM (8) underestimate RSDS almost everywhere, except over the convective regions of the ITCZ and the tropical landmasses. The GEOS5 AGCM (22) significantly underestimates RSDS (by more than 25 W m−2) over the warm pool and the Indian Ocean.
Although the model-observation differences in RSDS are substantial in many respects, the CMIP5 multimodel ensemble mean RSDS bias (17) appears to improve on that of the CMIP3 ensemble (Figure A1, model 13); this is discussed and quantified further below. The two GISS models (18 and 19) and the GFDL-CM3 model (20), as well as NCAR CESM1-CAM5 (21), include the radiative impacts of either diagnostic precipitating grid mean mass or convective core mass. The GFDL model further includes shallow cumulus, deep cumulus cells, and convective mesoscale clouds in the radiation calculation, weighted by their respective area fractions. However, precipitating ice that has fallen out of large-scale stratiform clouds into clear areas is not included in GFDL-CM3 [Donner et al., 2011]. The GFDL-CM3 model exhibits relatively good RSDS in the tropical ITCZ and convectively active continental regions but still underestimates RSDS in the warm pool and continental regions. The NCAR CESM1-CAM5 model (21), which includes diagnostic radiative snow, shows patterns similar to those of CCSM4 (16) but with smaller biases over the Southern Ocean, and it also underestimates RSDS in the warm pool and continental regions.
 To summarize the multimodel performance of CMIP3 and CMIP5 models in representing the time-mean pattern of RSDS, Figure 5 shows the multimodel mean bias (top) and the multimodel standard deviation of the error (bottom) relative to the observed estimate, both calculated across the models of each ensemble at each grid point. The standard deviation of the error is defined as the root-mean-square error with the mean bias removed (hereafter SDE). Overall, CMIP3 RSDS (Figure 5a) is biased low nearly everywhere and more uniformly, except in the ITCZ, off the coasts of Peru and California, and over the Indian Monsoon region. Although the CMIP5 global area average bias (Figure 5b) is reduced (~30%) to 2.5 W m−2 from −6.9 W m−2 in CMIP3, it exhibits sharper spatial gradients, larger local extreme biases, and a higher bias over most land regions. While the bias maps emphasize the sign and magnitude of systematic biases in the two model archives, the SDE maps emphasize errors irrespective of sign. The similar SDE patterns of the two ensembles indicate that CMIP3 and CMIP5 share many of the same systematic errors in simulating radiative fluxes, and the comparable SDE magnitudes indicate little improvement from CMIP3 to CMIP5. The high SDE values in the equatorial Pacific and Atlantic (Figures 5c and 5d), combined with relatively small mean biases in the same regions (Figures 5a and 5b), indicate significant disparity among the models in how they represent surface radiation there. The same is true over the mountainous regions of Asia and South America and, to a lesser extent, over the storm tracks.
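The bias and SDE maps described above can be computed grid point by grid point across the models of an ensemble. The sketch below follows the stated definition (SDE is the RMSE with the mean bias removed); the function name `multimodel_bias_and_sde` is hypothetical, and the use of a 1/N (rather than 1/(N − 1)) normalization is an assumption, since the paper does not specify it.

```python
import numpy as np


def multimodel_bias_and_sde(models, obs):
    """Across-model error statistics at every grid point.

    models: array (n_models, nlat, nlon) of time-mean model fields
    obs:    array (nlat, nlon) of the observed reference field
    Returns the multimodel mean bias, the SDE (RMSE with the mean bias
    removed, using a 1/N normalization), and the RMSE.
    """
    err = models - obs[np.newaxis, :, :]              # error of each model
    bias = err.mean(axis=0)                           # multimodel mean bias
    sde = np.sqrt(((err - bias) ** 2).mean(axis=0))   # spread of the errors
    rmse = np.sqrt((err ** 2).mean(axis=0))
    return bias, sde, rmse


# Three hypothetical models on a 1 x 1 grid with errors of +1, +2, +3 W m-2.
obs = np.zeros((1, 1))
models = np.array([1.0, 2.0, 3.0]).reshape(3, 1, 1)
bias, sde, rmse = multimodel_bias_and_sde(models, obs)
print(bias[0, 0], round(sde[0, 0], 3), round(rmse[0, 0], 3))  # prints: 2.0 0.816 2.16
```

With this definition, RMSE² = bias² + SDE² at every grid point, which is why a small multimodel mean bias can coexist with a large SDE when model errors of opposite sign cancel in the ensemble mean.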
 To further quantify and synthesize the comparative information discussed above, we use a Taylor diagram [Taylor, 2001], with CERES EBAF-Surface (Figure 1, panel 1) as the reference data set and the ISCCP-FD RSDS estimate used to help quantify observational uncertainty. The Taylor diagram used in this study relates two statistical measures of model fidelity: the spatial correlation and the spatial standard deviation [Taylor, 2001]. These statistics are calculated for the long-term time mean over the global domain (area weighted). The reference data set is plotted on the x axis at a value of 1.0. The radial distance from the origin is proportional to the ratio of the standard deviation of a given data set to that of the reference data set, and the azimuthal angle, measured from the x axis, is the arccosine of the spatial correlation between the given data set and the reference data set. The standard deviation ratio measures the relative amplitude of the simulated and reference spatial variations, whereas the correlation measures the similarity of their patterns.
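The Taylor diagram statistics described above can be sketched as follows. This is a minimal illustration, assuming area weights proportional to cos(latitude) at 2° cell centers; the function names `taylor_statistics` and `taylor_point` are hypothetical and not taken from the paper. The centered RMS difference is included because it is the distance from a model's point to the reference point on the diagram.

```python
import numpy as np


def taylor_statistics(test, ref, area):
    """Area-weighted pattern statistics of `test` against `ref` (2-D fields).

    Returns the standard deviation ratio, the pattern correlation, and the
    centered RMS difference normalized by the reference standard deviation.
    """
    w = area / area.sum()
    d_test = test - (w * test).sum()          # remove area-weighted means
    d_ref = ref - (w * ref).sum()
    sd_test = np.sqrt((w * d_test ** 2).sum())
    sd_ref = np.sqrt((w * d_ref ** 2).sum())
    corr = (w * d_test * d_ref).sum() / (sd_test * sd_ref)
    crmsd = np.sqrt((w * (d_test - d_ref) ** 2).sum())
    return sd_test / sd_ref, corr, crmsd / sd_ref


def taylor_point(sd_ratio, corr):
    """x, y position on a normalized Taylor diagram (reference at (1, 0))."""
    theta = np.arccos(np.clip(corr, -1.0, 1.0))   # azimuth from the x axis
    return sd_ratio * np.cos(theta), sd_ratio * np.sin(theta)


rng = np.random.default_rng(0)
lat = np.deg2rad(np.arange(-89.0, 90.0, 2.0))               # 2 deg cell centers
area = np.repeat(np.cos(lat)[:, None], 180, axis=1)         # cos(lat) weights
ref = 200.0 + 60.0 * np.cos(lat)[:, None] * np.ones((1, 180))
test = 1.2 * ref + 10.0 * rng.standard_normal(ref.shape)    # synthetic "model"
ratio, corr, e = taylor_statistics(test, ref, area)
# Law of cosines underlying the diagram: e**2 = ratio**2 + 1 - 2 * ratio * corr
print(np.isclose(e ** 2, ratio ** 2 + 1.0 - 2.0 * ratio * corr))  # True
```

Because the statistics use anomalies about the area-weighted mean, the Taylor diagram is insensitive to the global mean bias; that bias has to be assessed separately, as in the bias maps.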
 Figure 6a shows the Taylor diagram for CMIP3 and CMIP5 annual mean RSDS, and Figure 6b shows the same information as bar charts. Not surprisingly, the ISCCP-FD (23) observational estimate agrees best with EBAF-Surface, with a correlation of about 0.98 or better and a standard deviation ratio close to 1.0. Most of the CMIP models have high correlations, between about 0.91 and 0.97, with standard deviation ratios from 0.9 (MIROC5) to 1.3 (IPSL). None of the individual CMIP5 models falls within the range of observational uncertainty, represented here by the ISCCP-FD (23) value, although the CMIP3 and CMIP5 multimodel means come very close. Also notable, and consistent with the discussion above, is that there is very little difference in overall performance between the CMIP3 and CMIP5 multimodel mean RSDS. Among the individual CMIP5 models, the best performing model by this metric is HadGEM (11), although most models fare relatively well according to Figure 6. One significant outlier is IPSL (1), with a standard deviation ratio of about 1.3 and a correlation of 0.93, which is relatively low compared with the others.
 Next, we examine RLUT. Figure 7 shows the long-term annual mean spatial distributions of the RLUT biases relative to CERES EBAF (Figure 2, panel 1) for the 16 CMIP5 CGCMs (see Table 1, models 1–16), together with the multimodel ensemble mean bias of these 16 models (17) and GEOS5 (22). The magnitudes of the bias differ considerably across models and regions, ranging from −25 to about 30 W m−2. As for RSDS, most of the models and the CMIP5 multimodel mean (17) overestimate RLUT in many of the strongly convective regions of the tropics, such as the warm pool, the ITCZ/SPCZ, and convectively active continental regions. Inmcm4 (2), Inmcm4ESM (3), CNRM (4), MRI (15), and CCSM4 (16) overestimate RLUT over southern middle and high latitudes, including the Southern and Antarctic Oceans. The GEOS5 AGCM (22) underestimates RLUT (by more than 25 W m−2) in the subsiding regions of the tropical Pacific and Indian Oceans, while it overestimates RLUT over much of the Southern and Antarctic Oceans. MERRA (22) exhibits an overall high RLUT bias, except in the warm pool region. The CMIP5 ensemble mean RLUT bias (17) appears to show some improvement compared with the CMIP3 ensemble mean (Figure A2); this is discussed and quantified further below. The GFDL-CM3 model (20) again does relatively well over most of the globe compared with the other models. It reproduces the spatial distribution of RLUT relatively well in the tropical ITCZ and convectively active continental regions but moderately underestimates RLUT over most oceans. The CESM1-CAM5 model (21) represents RLUT best over most of the globe, with biases within −5 to 5 W m−2. It reproduces the spatial distribution of RLUT relatively well over the Southern Ocean and convectively active continental regions but moderately overestimates RLUT over the warm pool.
Note that most of the CMIP5 models simulate cloud top heights that are biased high relative to cloud top heights estimated from CloudSat/CALIPSO IWC. Because higher, and therefore colder, cloud tops emit less longwave radiation to space, this suggests that the high RLUT bias does not arise from underestimated cloud top heights.
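The reasoning that higher cloud tops reduce RLUT can be checked with a back-of-the-envelope Stefan-Boltzmann estimate. This is a deliberately idealized sketch: it treats the cloud top as an opaque blackbody, ignores emission and absorption by the atmosphere above the cloud, and assumes a hypothetical 300 K surface temperature with a constant 6.5 K km−1 lapse rate. The function names are illustrative only.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m-2 K-4)


def cloud_top_temperature(height_km, surface_temp_k=300.0, lapse_rate_k_per_km=6.5):
    """Temperature at a given height for a hypothetical constant lapse rate."""
    return surface_temp_k - lapse_rate_k_per_km * height_km


def blackbody_flux(temp_k):
    """Upward flux (W m-2) emitted by an opaque blackbody cloud top at temp_k."""
    return SIGMA * temp_k ** 4


for top_km in (8.0, 10.0, 12.0):
    t = cloud_top_temperature(top_km)
    print(f"cloud top {top_km:4.1f} km: T = {t:5.1f} K, "
          f"flux = {blackbody_flux(t):5.1f} W m-2")
```

In this idealization, raising the cloud top by 2 km lowers the emitted flux by tens of W m−2, so cloud tops that are too high would push RLUT low rather than high.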
 To summarize the multimodel performance of CMIP3 and CMIP5 in representing the time-mean pattern of RLUT, Figure 8 shows the multimodel mean bias (top) and the multimodel SDE (bottom) relative to the observed estimate. Interestingly, CMIP3 (Figure 8a) and CMIP5 (Figure 8b) exhibit very similar RLUT bias patterns and magnitudes, both showing a high bias over the ITCZ, the SPCZ, part of the Southern Ocean, the tropical continents, and the Indian Monsoon region. The CMIP5 global area average RLUT bias of −1.9 W m−2 is about half the CMIP3 value of −3.9 W m−2. The SDE maps show a similar pattern of systematic errors in the tropics, with no substantial change in the global mean SDE (9.8 W m−2 for CMIP3 and 8.9 W m−2 for CMIP5). Together, the bias and SDE indicate that the models perform relatively well in the middle and high latitudes but exhibit significant shortcomings in the tropics. The presence of both bias and SDE suggests that some errors share a common cause across models (reflected in the bias), whereas others have different causes in different models (reflected in the SDE). Given that all but one of these models are coupled, part of the SDE could result from differences among models in the location of the ITCZ and the associated cloud structure, which have a substantial impact on the RLUT field.
 Figure 9a shows a Taylor diagram for CMIP3 and CMIP5 annual mean RLUT, and Figure 9b presents the same information as bar charts. The reference for this Taylor diagram is the CERES EBAF observational estimate shown in Figure 2 (model 1). The CMIP5 models generally have correlations of 0.9 or better and standard deviation ratios ranging from 0.8 to 1.15. The two GISS models and IPSL are the greatest outliers within the group. The CESM1-CAM5 model (21) exhibits the highest correlation (0.97) and the best standard deviation ratio (0.99) among the models. Similar to RSDS, the CMIP5 multimodel mean (18) does not exhibit any significant improvement over the CMIP3 mean (17) in representing RLUT.
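The Taylor diagram metrics used above (pattern correlation and standard deviation ratio, plus the centered RMS difference that sets the distance from the reference point) can be computed as in the following sketch. It assumes the standard Taylor diagram definitions, fields on a common latitude-longitude grid, and cos(latitude) area weights; whether the figures here use exactly this weighting is an assumption, and `taylor_stats` is a hypothetical helper.

```python
import numpy as np

def taylor_stats(model, ref, lat):
    """Area-weighted Taylor diagram statistics for two (lat, lon) fields.

    Returns (R, sigma_ratio, crmsd_norm): the pattern correlation, the ratio
    sigma_model / sigma_ref, and the centered RMS difference divided by sigma_ref.
    """
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], model.shape)
    w = w / w.sum()                        # normalized area weights
    m = model - np.sum(w * model)          # remove area means: pattern only
    r = ref - np.sum(w * ref)
    sd_m = np.sqrt(np.sum(w * m**2))
    sd_r = np.sqrt(np.sum(w * r**2))
    corr = np.sum(w * m * r) / (sd_m * sd_r)
    crmsd = np.sqrt(np.sum(w * (m - r) ** 2))
    return float(corr), float(sd_m / sd_r), float(crmsd / sd_r)
```

For example, a model field equal to twice the observed anomaly pattern plus a constant offset yields R = 1, a standard deviation ratio of 2, and a normalized centered RMS difference of 1, because the offset is removed and only the pattern amplitude differs. In general the three numbers satisfy (E′/σ_ref)² = σ̂² + 1 − 2σ̂R, with σ̂ the standard deviation ratio, which is what lets a single Taylor diagram display all of them.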
 Finally, we examine RSUT. Figure 10 shows the long-term annual mean spatial distributions of simulated RSUT bias for 16 CMIP5 CGCMs (see Table 1, models 1–16), together with the multimodel ensemble mean bias of the 16 CMIP5 models (17), four CGCMs that include precipitation-radiation interactions (18–21), and the GEOS5 atmosphere-only model (22). It is apparent that most of the CMIP5 models tend to reflect too much shortwave radiation to space in the tropics and subtropics and too little in the middle to high latitudes. This indicates either clouds that occur too frequently or clouds that are overly reflective, owing to their cloud cover and/or microphysical characteristics. Notable, however, is that even within this overall high bias in the tropics, there is a systematic underestimation of RSUT in the strongly convective regions of the tropics, such as the ITCZ/SPCZ and the Indian Monsoon, and over the land surfaces. For example, the relative minimum in the ITCZ region is evident in the multimodel mean and in most individual models, with exceptions such as CanESM and MIROC5. The CMIP5 models also tend to be biased low over many of the subtropical stratocumulus regimes (e.g., the northeastern and southeastern Pacific off the coasts of North America and South America). Both of these latter features imply too few clouds or insufficiently reflective clouds in these regions. Similar to RSDS and RLUT, the RSUT bias pattern of the CMIP5 ensemble mean (17) is very similar to that of the CMIP3 ensemble mean (Figure A3, model 13). Note that despite GEOS5 having the advantage of specified SSTs, and despite the GFDL model (20) and the two GISS models (18, 19) considering total LWC/IWC in their radiation calculations, none of them exhibits much improvement relative to the other CGCMs examined here. The CESM1-CAM5 model (21) overestimates RSUT over the subtropical and tropical oceans and underestimates it elsewhere.
 Figure 11 illustrates the multimodel mean biases (top) and SDE (bottom) relative to the observed estimate. As for RSDS and RLUT, the RSUT bias and SDE in CMIP3 and CMIP5 exhibit very similar patterns, with a clear systematic underestimation over the ITCZ, the SPCZ, parts of the Southern Ocean and the tropical continents, and the Indian Monsoon region. Although the CMIP5 global area average RSUT bias (2.5 W m−2) is nearly a factor of 2 smaller than the CMIP3 value (4.5 W m−2), the two ensembles share very similar spatial distributions of bias and SDE. The global mean SDE indicates only a slight improvement from CMIP3 (14.7 W m−2) to CMIP5 (14.1 W m−2).
 A Taylor diagram for CMIP3 and CMIP5 annual mean RSUT is shown in Figure 12a, and the same information is shown as bar charts in Figure 12b. Of the three quantities examined (RSDS, RLUT, and RSUT), the overall performance for RSUT is clearly the worst in terms of the Taylor diagram metrics (compare Figures 6 and 9). Most of the CMIP5 models have poor correlations, between about 0.4 and 0.8, with standard deviation ratios ranging from 0.8 to 1.1. The CMIP3 (16) and CMIP5 (17) multimodel means do not differ much in representing RSUT. By this metric, the best performing CMIP5 model is HadGEM (11), followed by a group comprising CanESM2 (5), BCC (6), BCCesm (7), and NorESM (8), with standard deviation ratios ranging from 0.9 to 1.0. The GEOS5 model (22) performs poorly relative to most of the other models, with a correlation of 0.65 and a standard deviation ratio of 1.1.
5 Summary and Discussion
 The objective of this study is to evaluate the representation of surface and TOA fluxes of atmospheric radiation in GCMs, specifically the CMIP GCMs, with a focus on the fluxes most strongly influenced by clouds (i.e., RSDS, RSUT, and RLUT). Apart from a general assessment of the fidelity of the models’ representation of radiation, we seek to relate the model biases to the practice, common in most of the CGCMs contributing to CMIP3 and CMIP5, of ignoring the radiative interaction with precipitating and convective clouds (see cloud liquid and ice evaluations in Li et al. ). Observational reference values and their uncertainties for RSDS are addressed by using two different estimates, each based on satellite plus model retrieval approaches: the EBAF-Surface and ISCCP products [see section 2]. For the TOA radiative fluxes RLUT and RSUT, the CERES EBAF–based estimates are used. Although both the CERES and ISCCP RSDS values are model-derived estimates, only the CERES data are used for direct model-data comparison (e.g., to produce difference maps); the ISCCP data serve only to give a sense of observational uncertainty. The models evaluated (see section 3 and Tables 1 and 2) include 20 simulations of present-day climate available to date in CMIP5 and one additional GCM of interest (the GEOS-5 AGCM). The evaluation of RLUT also includes one modern reanalysis (MERRA). In addition, we include comparative evaluations of 12 of the GCMs from CMIP3, allowing for an assessment of whether GCM fidelity in regard to RSDS, RLUT, and RSUT has improved between CMIP3 and CMIP5.
 Overall, there is a fairly wide disparity in the fidelity of the RSDS, RLUT, and RSUT representations in the models examined. Even in the annual mean bias maps, local biases easily reach 30 W m−2 in magnitude (from −30 to 30 W m−2). Further, in comparing the three quantities, it is worth noting the significantly poorer and more varied fidelity measures for RSUT compared to RSDS and RLUT in the Taylor diagrams (Figures 6a, 9a, and 12a). Considering individual models, some perform rather well in terms of the Taylor diagram metrics for RSDS, RLUT, and RSUT; these include, for example, HadGEM (RSDS, see Figures 4 and 6; RSUT, see Figures 10 and 12) and MIROC5 (RLUT, see Figures 7 and 9). HadGEM has the best correlations for RSDS and RSUT (0.97 and 0.82, respectively), the second best standard deviation ratio for RSDS (1.02), and a standard deviation ratio above 0.9 for RSUT. On the other hand, four models (MIROC, IPSL, INMCM4, and INMCM4-ESM) perform particularly poorly, having a very large standard deviation ratio and/or a small correlation, which reflects a poor representation of the spatial variability relative to the observational reference data. The GEOS5 GCM was found to perform relatively poorly compared to most of the CMIP5 models for RSDS (Figure 6a), RLUT (Figure 9a), and RSUT (Figure 12a). This is despite the fact that the GEOS5 simulation used observed SSTs, while the CMIP5 models were coupled ocean-atmosphere models.
 One topic of interest in this study is a comparison of the overall performance of CMIP3 with that of CMIP5. Based on a number of diagnostics, there has been a small degree of improvement in the representation of RSDS, RLUT, and RSUT from CMIP3 to CMIP5. In terms of the global mean, this is demonstrated by reductions in the multimodel mean bias of about 30% for RSDS, about 50% for RLUT, and about 40% for RSUT. In particular, the multimodel mean bias of RSDS has been reduced from −6.9 W m−2 in CMIP3 to ~2.5 W m−2 in CMIP5. However, this reduction arises mostly because the positive biases over land became larger, and thus cancel more of the negative biases over ocean, which remained about the same. Moreover, no overall improvement from CMIP3 to CMIP5 in the representation of the quantities studied is evident in the SDE computed across the models (Figures 5, 8, and 11) or in the Taylor diagram metrics (Figures 6, 9, and 12).
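The percentage reductions and the land-ocean cancellation described above involve only simple arithmetic, sketched below in plain Python. The RLUT and RSUT global mean biases are the values quoted earlier in this section (−3.9 to −1.9 W m−2 and 4.5 to 2.5 W m−2). The regional land and ocean biases and the land fraction of 0.29 are hypothetical numbers, chosen only to show how a larger positive land bias can shrink the global mean bias while the ocean bias stays unchanged; the helper names are illustrative.

```python
def percent_reduction(bias_old, bias_new):
    """Percentage reduction in the magnitude of a global mean bias."""
    return 100.0 * (abs(bias_old) - abs(bias_new)) / abs(bias_old)

def global_from_regions(land_bias, ocean_bias, land_fraction=0.29):
    """Area-weighted global mean bias from land-only and ocean-only means.

    land_fraction = 0.29 is an approximate value assumed for illustration.
    """
    return land_fraction * land_bias + (1.0 - land_fraction) * ocean_bias

# Global mean biases quoted in the text (W m-2), CMIP3 -> CMIP5
print(round(percent_reduction(-3.9, -1.9)))  # RLUT: 51
print(round(percent_reduction(4.5, 2.5)))    # RSUT: 44

# Hypothetical regional biases: ocean bias unchanged, land bias grows
print(round(global_from_regions(5.0, -5.0), 2))   # -2.1
print(round(global_from_regions(12.0, -5.0), 2))  # -0.07
```

The reductions of 51% and 44% are consistent with the approximate values of 50% and 40% given above. The last two lines show why a smaller global mean bias is not, by itself, evidence of improvement: the global mean moves toward zero even though the regional bias over land has more than doubled.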
 Persistent and systematic spatial biases, found across most of the models and in the multimodel ensemble mean, are an underestimation of RSUT and an overestimation of RSDS and RLUT in the convectively active regions of the tropics, i.e., the ITCZ/SPCZ, the warm pool, the Indian Monsoon region, South America, and central Africa (Figures 5, 8, and 11). Given that a number of these RSDS, RLUT, and RSUT biases occur in conjunction with heavy precipitation and with biases in cloud liquid and ice water [Li et al., 2012], we hypothesize that at least part of these persistent radiation biases stems from GCMs ignoring the effects of precipitating and/or convective core ice and liquid in their radiation calculations [e.g., Waliser et al., 2011].
 That the biases remain sizable, even though viable observed estimates of TOA radiation fields and observation-driven modeled values at the surface have been available for many years, suggests either that modeling groups face challenges in utilizing the observations or that too many degrees of freedom remain unconstrained (e.g., cloud cover, cloud mass, particle size, vertical structure, and particle shape). There is certainly evidence of this in regard to cloud liquid and ice content [e.g., Waliser et al., 2009; Li et al., 2012]. In addition, GCMs typically tune their radiation and cloud fields to observations that are naturally sensitive to all or most hydrometeors in the atmosphere, even though most models represent only the suspended hydrometeors associated with clouds, usually excluding ice and liquid in convective cores. Thus, contributions from falling/precipitating hydrometeors are either unaccounted for or erroneously compensated by other processes, for example, in the radiation calculations and the hydrological cycle.
 To illustrate the possible shortcomings of GCMs that ignore the cloud mass associated with precipitating hydrometeors (e.g., snow and rain) and with convective cores, Figures 13a and 13b present the CMIP3 and CMIP5 multimodel mean biases, respectively, of total ice water path (TIWP = cloud + convective core + precipitating). The observed estimate is the ensemble mean of three TIWP retrievals: the standard CloudSat product, a version of DARDAR [Delanoë and Hogan, 2008, 2010], and a version of 2C-ICE [Deng et al., 2013]. For details, see Li et al. . Given that most models do not represent convective core or precipitating ice, the strong negative bias in the tropics shown in Figure 13 is not surprising; it is nonetheless important to highlight, because this missing water mass would interact with radiation.
 The modeled ice water path (IWP) values include only the contributions of suspended cloud ice, as that is all the models typically represent, with no contribution from convective cores or precipitation. The observed IWP values are based on an ensemble of CloudSat + CALIPSO estimates [Li et al., 2012], which do include contributions from precipitation and all clouds, including convective cores. Figure 13 shows that both CMIP3 and CMIP5 models significantly underestimate ice mass over the ITCZ, the SPCZ, parts of the Southern Ocean and the tropical continents, and the Indian Monsoon region. Thus, the models are trying to achieve radiative balance without representing all the ice mass in the atmosphere [e.g., Waliser et al., 2011]. Figure 14 shows the multimodel mean cloud liquid water path (CLWP) bias relative to the Advanced Microwave Scanning Radiometer (AMSR-E) liquid water path for all conditions (TLWP = nonprecipitating + convective + precipitating) for the CMIP3 GCMs (Figure 14a) and the CMIP5 GCMs (Figure 14b). Similar underestimates of liquid water occur in the same regions [see Li et al., 2012], exacerbating the challenge of obtaining an accurate representation of the radiation fields.
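The distinction between the modeled IWP (suspended cloud ice only) and the observed TIWP (cloud + convective core + precipitating) can be made concrete with a column integral. The sketch below assumes that ice is given as a mass mixing ratio on pressure levels and evaluates WP = (1/g)∫q dp with the trapezoidal rule; for retrievals given as ice water content versus height, the equivalent integral is ∫IWC dz. The profile values are invented for illustration only, and `water_path` is a hypothetical helper, not part of any model's diagnostics.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s-2)

def water_path(mixing_ratio, pressure):
    """Column condensate path in kg m-2: WP = (1/g) * integral of q dp.

    mixing_ratio: condensate mass mixing ratio (kg kg-1) on each level.
    pressure: pressure (Pa) of the same levels, in any monotonic order.
    The integral is evaluated with the trapezoidal rule.
    """
    p = np.asarray(pressure, dtype=float)
    q = np.asarray(mixing_ratio, dtype=float)
    order = np.argsort(p)  # integrate from low to high pressure
    p, q = p[order], q[order]
    return float(np.sum(0.5 * (q[1:] + q[:-1]) * np.diff(p)) / G)

# Hypothetical tropical column (values invented for illustration only)
p = np.array([150e2, 250e2, 350e2, 500e2, 700e2])
q_cloud = np.array([1e-5, 3e-5, 2e-5, 0.0, 0.0])    # suspended cloud ice
q_conv = np.array([0.0, 2e-5, 3e-5, 1e-5, 0.0])     # convective core ice
q_precip = np.array([0.0, 1e-5, 4e-5, 5e-5, 1e-5])  # precipitating ice (snow)

iwp_cloud_only = water_path(q_cloud, p)               # what most models represent
tiwp = water_path(q_cloud + q_conv + q_precip, p)     # cloud + convective core + precipitating
print(f"cloud-only IWP = {iwp_cloud_only:.3f} kg m-2, TIWP = {tiwp:.3f} kg m-2")
```

Because the integral is linear, TIWP is simply the sum of the three component paths, so any component a model omits lowers its column ice mass relative to an observed estimate that senses all three.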
 The combined results of Figures 13 and 14 suggest that the total (liquid and ice) cloud mass is underestimated over the ITCZ, the SPCZ, parts of the Southern Ocean and the tropical continents, and the Indian Monsoon region, all of which tend to be regions of high precipitation. In addition, Figures 13 and 14 suggest that the IWP and LWP biases relative to the observed total IWP/LWP are both significantly worse in the CMIP5 multimodel mean than in the CMIP3 multimodel mean, yet the radiative fields are slightly improved overall in terms of the global area average (e.g., Figure 5). However, the CMIP5 RSDS bias (Figure 5b) exhibits sharper spatial gradients, more extreme local biases, and a higher bias over most land regions than the CMIP3 RSDS bias (Figure 5a).
 To illustrate the effect of underestimating cloud mass by excluding the mass associated with precipitation and convective cores in conventional GCMs, Figure 15 presents a conceptual sketch of cloud-precipitation-radiation interactions in the real world versus those in conventional GCMs. The sketch suggests that the underestimated cloud mass in the CMIP3 and CMIP5 multimodel means might directly account for part of the overestimation of RSDS (Figure 5) and RLUT (Figure 8) and the underestimation of RSUT (Figure 11) across the models and in the multimodel mean in heavily precipitating regions. This conjecture is supported by Figure 16, which shows that the latitude of the maximum zonal mean precipitation (over 180° to 360° longitude, within 0°N to 15°N) is strongly correlated with the latitude of the maximum (RSDS and RLUT) or minimum (RSUT) bias of the CMIP5 models. In other words, the RSDS, RLUT, and RSUT biases in CMIP3 and CMIP5 might be attributable, at least in part, to ignoring the radiative impacts of precipitating clouds and convective core water mass. As cloud-climate feedback will undoubtedly remain a key uncertainty in the next Intergovernmental Panel on Climate Change assessment report, it is essential that cloud and radiation observations be used to their full extent, and in concert, to provide more complete constraints, and that clouds, convection, precipitation, and radiation be treated consistently, as depicted in Figure 15 (left).
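One way to quantify the relationship summarized in Figure 16 is sketched below. It assumes (the text does not spell out the procedure) that, for each model, the latitude of the maximum zonal mean precipitation and the latitude of the bias extremum are located within the same sector (180° to 360° longitude, 0°N to 15°N, on a 0 to 360 longitude grid), and that the correlation is then computed across models. The function names are hypothetical.

```python
import numpy as np

def lat_of_zonal_extreme(field, lat, lon, lon_range=(180.0, 360.0),
                         lat_range=(0.0, 15.0), find="max"):
    """Latitude of the max (or min) of the sector zonal mean of a (lat, lon) field."""
    lon_mask = (lon >= lon_range[0]) & (lon <= lon_range[1])
    lat_mask = (lat >= lat_range[0]) & (lat <= lat_range[1])
    zonal_mean = field[:, lon_mask].mean(axis=1)[lat_mask]  # average over the sector
    idx = np.argmax(zonal_mean) if find == "max" else np.argmin(zonal_mean)
    return float(lat[lat_mask][idx])

def latitude_correlation(precip_fields, bias_fields, lat, lon, bias_extreme="max"):
    """Correlate, across models, the latitude of peak precipitation with the
    latitude of the bias extreme ("max" for RSDS/RLUT, "min" for RSUT)."""
    lat_p = [lat_of_zonal_extreme(p, lat, lon, find="max") for p in precip_fields]
    lat_b = [lat_of_zonal_extreme(b, lat, lon, find=bias_extreme) for b in bias_fields]
    return float(np.corrcoef(lat_p, lat_b)[0, 1]), lat_p, lat_b
```

Using the argmax or argmin of a gridded zonal mean limits the precision to the grid spacing; on a coarse grid, fitting a parabola through the peak and its two neighbors would be a natural refinement.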
 One GFDL GCM (CM3), the two GISS models, and CESM1-CAM5 stand apart technically from the other GCMs, as they attempt to model the atmospheric ice/liquid water associated with convective clouds and precipitation and include its effects on radiation. The evaluations of RSDS (Figure 6a), RLUT (Figure 9a), and RSUT (Figure 12a) show that while NCAR CESM1-CAM5 has the best RLUT among the models (Figures 7 and 9), GFDL CM3 slightly outperforms the other GCMs. The two GISS models do not perform better than the other GCMs for any of the three radiative fluxes. The ability of CESM1-CAM5 and GFDL CM3 to represent and output cloud, precipitating, and convective ice profiles and to perform a more realistic radiation calculation, combined with the observational capability to roughly distinguish these types of ice mass, provides an additional means of constraining the model physics. Note that the RSDS, RLUT, and RSUT biases in the two GISS models result from a substantial positive bias in CIWP/CIWC in the tropics. This positive tropical bias arises as compensation for a negative bias in clouds and CIWP/CIWC in the extratropics, combined with the need for a globally averaged radiation budget consistent with observations [Li et al., 2012].
 While it is beyond the scope of this study to probe the causes of the model-to-model differences and model-to-observation biases in radiation, based on the results we hypothesize that the lack of an explicit representation of the cloudy, precipitating, and convective core components of the ice (and liquid) mass might play an important role in the biases in RSDS, RLUT, and RSUT. Our recent study [Waliser et al., 2011] has shown that ignoring the radiative effects of the precipitating components of the ice mass can result in nontrivial biases in the shortwave and longwave radiation budgets at the surface and top of atmosphere and in even more significant impacts on the vertical radiative heating profile. While more work is needed in this area, these studies strongly suggest that GCMs should strive to explicitly represent a broader range of ice and liquid hydrometeors, namely, the larger falling hydrometeors (rain, snow) as well as convective core mass, and include their effects, which are currently largely ignored, in the radiative heating calculations. Moreover, the evaluation results of this study show that the radiation balance in the CMIP class of GCMs is still underconstrained and, in many cases, is likely to have been achieved in unrealistic ways.
 Taken together, these points indicate the need for additional observational resources to adequately characterize and constrain cloud-precipitation-radiation interactions. Potentially useful observational resources include multichannel radar/lidar measurements to characterize the profiles and size spectra of cloud and precipitation particles, as well as Doppler radar capability to provide information on cloud and precipitation dynamics. In addition, satellite observations are affected by spatiotemporal sampling, instrument sensitivity, and retrieval assumptions. Simulators are one method available to emulate these idiosyncrasies within a climate model and thus can be an invaluable tool for robust evaluation of model-simulated clouds. In the future, we plan to integrate these methodologies into our evaluation studies. Used in conjunction with systematic model experimentation, these additional observational resources are likely to provide a constructive strategy for improving the representation of the cloud-precipitation-radiation interactions discussed above.
 We would like to thank the Editor and two reviewers for their very insightful and helpful comments and suggestions. We thank Prof. M-D Chou/NCU and Dr. W-L Lee/RCEC-Academia Sinica for useful comments, and Dr. Anthony Del Genio/NASA GISS, Dr. Ken Lo/NASA GISS, Dr. Voldoir Aurore/CNRM, Dr. Masahiro Watanabe and Dr. Shingo Watanabe/MIROC, Dr. Leon Rotstayn/CSIRO, Dr. Knut von Salzen/CCCma, Dr. Gary Strand/NCAR, Dr. Alf Kirkevag/NCC, Dr. Seiji Yukimoto/MRI, Dufresne Jean-Louis/IPSL, Dr. Tongwen Wu/BCC-CMA, and many other colleagues from climate modeling centers for providing model information. Thanks also to Prof. W-T Anne Chen/NTU (while at JPL) and Gregory Huey/JPL for help with data. The contributions by DEW and JLL to this study were carried out on behalf of the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The contribution of Hsi-Yen Ma to this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.