Corresponding author: J.-L. F. Li, Jet Propulsion Laboratory, California Institute of Technology, MS 233-306K, 4800 Oak Grove Dr., Pasadena, CA 91109, USA. (email@example.com)
 We perform an observationally based evaluation of the cloud ice water content (CIWC) and path (CIWP) of present-day GCMs, notably 20th century CMIP5 simulations, and compare these results to CMIP3 and two recent reanalyses. We use three different CloudSat + CALIPSO ice water products and two methods to remove the contribution from the convective core ice mass and/or precipitating cloud hydrometeors with variable sizes and falling speeds so that a robust observational estimate can be obtained for model evaluations. The results show that for annual mean CIWP, there are factors of 2–10 in the differences between observations and models for a majority of the GCMs and for a number of regions. However, there are a number of CMIP5 models, including CNRM-CM5, MRI, CCSM4 and CanESM2, as well as the UCLA CGCM, that perform well compared to our past evaluations. Systematic biases in CIWC vertical structure occur below the mid-troposphere where the models overestimate CIWC, with this bias arising mostly from the extratropics. The tropics are marked by model differences in the level of maximum CIWC (∼250–550 hPa). Based on a number of metrics, the ensemble behavior of CMIP5 has improved considerably relative to CMIP3, although neither the CMIP5 ensemble mean nor any individual model performs particularly well, and there are still a number of models that exhibit very large biases despite the availability of relevant observations. The implications of these results on model representations of the Earth radiation balance are discussed, along with caveats and uncertainties associated with the observational estimates, model and observation representations of the precipitating and cloudy ice components, relevant physical processes and parameterizations.
Representing clouds and cloud climate feedback in global climate models (GCMs) remains a pressing challenge in reducing and quantifying the uncertainties associated with climate change projections. Until recently, useful (global) constraints for developing and evaluating clouds in GCMs were derived from radiation budget observations from Earth Radiation Budget Experiment/Clouds and the Earth's Radiant Energy System (ERBE and CERES) [Wielicki et al., 1996], cloud cover observations from International Satellite Cloud Climatology Project (ISCCP) and related products [Han et al., 1999; Rossow and Schiffer, 1999], and other indirect observations such as precipitation observations [Xie and Arkin, 1997; Adler et al., 2003]. A key element of obtaining an accurate top of the atmosphere (TOA) radiation budget is the representation of clouds, which for GCMs and earth radiation budget considerations can be roughly broken down into realistic cloud cover, cloud water mass, cloud optical thickness, and cloud particle sizes. In general, the observed TOA radiative fluxes (e.g., CERES, ERBE) are used to constrain GCMs by selecting the coefficients in the convective cloud and stratiform cloud fraction parameterizations. In addition, in a coupled atmosphere-ocean GCM, it is important to consider radiative effects/budgets with emphasis on implied ocean heat transports and the global and regional patterns of TOA outgoing longwave radiation (OLR) and absorbed shortwave, with some emphasis on shortwave cloud forcing and longwave cloud forcing. In most cases, models such as those in the Coupled Model Intercomparison Project Phase 3 (CMIP3) tend to underestimate cloud fraction when compared with observations, an underestimate that is compensated by optically thick clouds. While ISCCP and other products have provided some information and guidance on cloud cover, there has been significant flexibility in tuning the latter two quantities due to the lack of observations for cloud water mass and particle size.
This is especially the case for vertical profiles of information for cloud/hydrometeor particle sizes and water mass – leaving too many degrees of freedom unconstrained. The gap in available observations for cloud water mass and/or their lack of use in constraining the models was clearly evident from the wide disparity in the cloud ice and liquid water path (CIWP and CLWP) values exhibited in the CMIP3 GCMs [Li et al., 2008; Waliser et al., 2009].
 In this study, we take the first approach mentioned above, and perform the evaluation in terms of the model representations of CIWC/CIWP. We utilize the experience we have gained from a number of studies performed in recent years on cloud ice and liquid [e.g., Li et al., 2005, 2007, 2008; Waliser et al., 2009; Chen et al., 2011; Ma et al., 2012a]. This includes developing a measure of observational uncertainty (discussed in section 2), and applying an illustrative and quantitative set of evaluation diagnostics. A prominent goal of the study is to examine how the fidelity of the models may have changed between CMIP3 and CMIP5. Moreover, we attempt to discriminate CMIP5 models that achieve a threshold capability of model fidelity via the Taylor diagram [Taylor, 2001] construct and the observational uncertainties just mentioned. In addition, as reanalyses data for some quantities such as the basic atmospheric state variables have become nearly synonymous, in some contexts, with “observations,” we also incorporate two recent reanalysis data products in our evaluation to provide some assessment of this questionable perception – particularly for quantities such as CIWP that are not strongly constrained by observations.
 In the following section, we describe the observational resources we use for this study, including the way the different retrievals and other methodologies are combined to form a robust observational estimate with some quantitative information on uncertainty. In section 3, we briefly describe the models and reanalyses data sets utilized in this evaluation study. In section 4, we illustrate and discuss the results of our model evaluation. Section 5 summarizes and draws conclusions.
2. Observed Estimates of IWC and IWP
The A-Train constellation of satellites, which includes CloudSat and Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) flying only 15 s apart, provides a global view of the vertical structure of clouds, including cloud condensate such as IWC. CloudSat provides vertical profiles of radar reflectivity measured by a 94 GHz cloud profiling radar (CPR) with a minimum sensitivity of ∼−30 dBZ. The profiles extend between the surface and 30 km altitude with a vertical resolution of 240 m and have a footprint of about 2.5 km along track and 1.4 km across track. The CALIPSO lidar measures parallel and perpendicular backscattered laser energy at 532 nm and total backscattering at 1064 nm at altitude-dependent vertical resolutions and footprints (75 m vertically with about 0.3 km along track footprint above 8.2 km and 30 m vertically with about 1.0 km along track footprint below 8.2 km). To date, a series of retrieval algorithms using the CloudSat radar, the CALIPSO lidar, or both provide global retrievals of IWC, effective radius (Re), and the extinction coefficient from the thinnest cirrus (seen only by the lidar) to the thickest ice cloud [Austin and Stephens, 2001; Hogan, 2006; Delanoë and Hogan, 2008, 2010; Mace et al., 2009; Young and Vaughan, 2009; Sassen et al., 2009; Stein et al., 2011; M. Deng et al., Evaluation of several A-Train ice cloud retrieval products with in situ measurements collected during the SPartICus campaign, submitted to Journal of Applied Meteorology and Climatology, 2012].
 In this study, IWC and IWP products retrieved from three different algorithms are used to help account for observational uncertainty. They are:
1. 2B-CWC-RO4 [Austin et al., 2009]. The CloudSat Science Team radar-only product provides estimates of IWC and Re using measured radar reflectivity from CloudSat 2B-GEOPROF to constrain the retrieved IWC. The retrieved IWC profiles are obtained by assuming constant ice particle density with a spherical shape and a lognormal particle size distribution (PSD). An a priori PSD is specified based on its temperature dependencies obtained from European Centre for Medium-Range Weather Forecasts (ECMWF) operational analyses. The cloud water contents for both liquid and ice phases are retrieved for all heights using separate assumptions, then a composite profile is created by using the retrieved ice properties at temperatures colder than −20°C, the retrieved liquid water content at temperatures warmer than 0°C, and a linear combination of the two at intermediate temperatures. This reduces the total IWC as the temperature approaches 0°C. The sensitivity and uncertainty of this retrieval algorithm are discussed in Austin et al. The time period of this data set is from January 2007 to December 2010. The vertical and horizontal resolutions are the same as those of the CloudSat instrument discussed above. Our previous CMIP3 model evaluations [Waliser et al., 2009] were based on an earlier version of this product.
2. DARDAR. DARDAR (raDAR/liDAR) [Hogan, 2006; Delanoë and Hogan, 2008, 2010] is a synergistic ice cloud retrieval product derived from the combination of the CloudSat radar and CALIPSO lidar using a variational method for retrieving profiles of the extinction coefficient, IWC and Re of the ice cloud. DARDAR assumes a “unified” PSD given by Field et al. The mass-size and area-size relations of non-spherical particles are considered using in situ measurements [Brown and Francis, 1995; Francis et al., 1998; Delanoë et al., 2011; Stein et al., 2011]. DARDAR uses CALIPSO backscatter and temperature to identify supercooled water in the 0°C to −40°C range, since the depolarization is too noisy to use at the CALIPSO resolution [Delanoë and Hogan, 2010]. The time period of this data set is from July 2006 to June 2009.
3. 2C-ICE (Deng et al., submitted manuscript, 2012). Similar to DARDAR, the CloudSat 2C-ICE cloud product is a synergistic ice cloud retrieval derived from the combination of the CloudSat radar and CALIPSO lidar using a variational method for retrieving profiles of the extinction coefficient, IWC, and Re in ice clouds. The CALIPSO attenuated backscattering coefficients are collocated to the CloudSat vertical and horizontal resolutions. The ice cloud microphysical model assumes a first-order Gamma particle size distribution of idealized non-spherical ice crystals [Yang et al., 2000]. The Mie scattering of radar reflectivity is calculated in a forward model look-up table according to a discrete dipole approximation calculation [Hong, 2007]. The 2C-ICE cloud identification is provided by the CloudSat CLDCLASS-Lidar product, which takes advantage of CALIPSO lidar backscatter (sensitive to water clouds), lidar depolarization (sensitive to non-spherical ice particles) and CloudSat radar (sensitive to large ice particles). 2C-ICE treats mixed-phase clouds as ice clouds, using the radar only to perform the retrieval. For mixed-phase clouds in deep convection, where the lidar is usually attenuated, the retrieval of clouds above −6°C relies mainly on the CloudSat radar [Deng et al., 2010]. Readers desiring a more in-depth description of the 2C-ICE algorithm should refer to Deng et al. The time period of this data set is from January 2007 to December 2008.
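The temperature-based compositing described for the 2B-CWC-RO4 product in item 1 can be illustrated with a minimal sketch. The text states only that ice is used colder than −20°C, liquid warmer than 0°C, and a linear combination in between; the function names below are hypothetical, and the assumption that the weights vary linearly with temperature is ours, not a statement of the operational algorithm.

```python
def ice_weight(temp_c, t_all_ice=-20.0, t_all_liquid=0.0):
    """Weight given to the ice retrieval at temperature temp_c (deg C).

    Assumption: the weight ramps linearly from 1 at t_all_ice to 0 at
    t_all_liquid. The source only says "a linear combination".
    """
    if temp_c <= t_all_ice:
        return 1.0
    if temp_c >= t_all_liquid:
        return 0.0
    return (t_all_liquid - temp_c) / (t_all_liquid - t_all_ice)


def composite_water_content(temp_c, iwc_retrieved, lwc_retrieved):
    """Return (composite IWC, composite LWC) for one range bin."""
    w = ice_weight(temp_c)
    return w * iwc_retrieved, (1.0 - w) * lwc_retrieved
```

Under this assumption the ice contribution shrinks steadily as the temperature approaches 0°C, which is consistent with the reduction in total IWC noted in item 1.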
There are several differences between 2C-ICE and DARDAR. First, DARDAR is retrieved using the CALIPSO vertical resolution (60 m) instead of the CloudSat vertical resolution as in 2C-ICE. Second, the multiple scattering in the lidar signal is accounted for with a fast multiple-scattering code [Hogan, 2006] instead of assuming a constant multiple scattering factor as in 2C-ICE. Third, the lidar backscatter to extinction ratio is retrieved rather than assumed to be a constant as in 2C-ICE. Fourth, no parameterizations of radar or lidar signals are used for the lidar-only or radar-only regions of the ice cloud profile; instead, the DARDAR algorithm relies heavily on empirical relationships for those regions. Fifth, the DARDAR product assumes a “unified” PSD given by Field et al. The mass-size and area-size relation of non-spherical particles is considered using relationships derived from in situ measurements [Francis et al., 1998; Brown and Francis, 1995], while 2C-ICE uses the first-order Gamma PSD of idealized ice particles in Yang et al. with consistent size, area, mass, and scattering property relations (Deng et al., submitted manuscript, 2012).
There are two important aspects to keep in mind regarding model and observation compatibility. First, because of the use of CALIPSO data in the latter two products described above, they will have more sensitivity to thin/cirrus ice clouds. However, thin ice clouds make very little contribution to the total ice mass and water content of clouds in a time-mean sense. Second, and more importantly, all three products, to first order, represent total tropospheric ice including “floating” ice and the precipitating cloud hydrometeors with variable size and falling speed, as the measurements are sensitive to a wide range of particle sizes, including small (quasi-suspended/cloud) particles and large (falling/precipitating) particles. The latter, including those particles associated with convective clouds, are generally not included as prognostic variables in most current GCMs [e.g., Li et al., 2008; Waliser et al., 2009]. It is generally assumed that convective core areas in a GCM grid box are small for a GCM whose grid box size is commonly one hundred km² or larger. Thus, their contribution to total water content is not very large. Even if they are prognostically determined, the relative contribution does not change. However, the grid box resolution in most current state-of-the-art GCMs is much higher, with grid box sizes from less than 100 km² down to tens of km², so that the IWC contribution from the convective core should be considered. Thus, for a meaningful model-observation comparison between the satellite-estimated and model-simulated IWC, an estimate of the convective/precipitating ice mass needs to be removed from the satellite-derived IWC/IWP values.
 In this study, we use two independent approaches to distinguish ice mass associated with clouds from ice mass associated with precipitation and convection. The two approaches include:
1. FLAG method [Li et al., 2008; Waliser et al., 2009]. We exclude all the retrievals in any profile that are flagged as precipitating at the surface and exclude any retrieval within the profile whose cloud type is classified as “deep convection” or “cumulus” (from CloudSat 2B-CLDCLASS data). By excluding these portions of the ice mass, we obtain an estimate of the cloud-only portion of the IWP/IWC (hereafter referred to as CIWP/CIWC). This methodology of estimating CIWP/CIWC was used in our previous CMIP3 model-data comparisons [e.g., Li et al., 2008; Waliser et al., 2009] and for model cloud parameterization improvements in the ECMWF Integrated Forecast System (IFS) (R. M. Forbes, personal communication, 2011), National Aeronautics and Space Administration/The Goddard Earth Observing System Model, Version 5 (NASA/GEOS5) GCM, NASA Goddard finite-volume multiscale modeling framework (fvMMF) GCM [Chen et al., 2011], National Center for Atmospheric Research, Community Atmosphere Model, version 5 (NCAR CAM5) [Gettelman et al., 2010; Song et al., 2012], Geophysical Fluid Dynamics Laboratory, Coupled Model, version 3 (GFDL CM3) [Donner et al., 2011], and University of California at Los Angeles (UCLA) GCM [Ma et al., 2012, also Sensitivity of global tropical climate to land surface processes: Mean state and interannual variability, submitted to Journal of Climate, 2012].
2. PSD method [Chen et al., 2011]. We use the ice Particle Size Distribution (PSD) parameters associated with each CloudSat retrieval to separate the total IWC into mass with particle sizes smaller and larger than a selected particle size threshold. Based on the analysis in Chen et al. and references therein, the size separation of cloud ice and precipitating ice on a global mean basis likely falls between 100 μm and 200 μm in diameter. A threshold of 150 μm is chosen for the present study, and the integrated mass of particles with diameter smaller than this size is considered representative of the CIWC/CIWP. In this case, such estimates are based on a quantitative, microphysical characterization (i.e., PSD) regardless of the presence of surface precipitation or cloud type, and thus the vertical distributions of cloud ice versus precipitating ice mass can be derived from each CloudSat profile. The CIWC derived by this method has been shown to agree well with estimates based on the FLAG method, and has been applied to evaluate the atmospheric ice in the ECMWF IFS and NASA Goddard fvMMF GCM [Chen et al., 2011].
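As an illustration of the PSD method in item 2, the sketch below splits a total IWC into mass below and above the 150 μm threshold. It assumes, as stated above for 2B-CWC-RO4, spherical particles of constant density with a lognormal number distribution in diameter, so that mass scales with the cube of the diameter. The function names and the parameterization by geometric mean diameter and log-width are hypothetical; the exact procedure of Chen et al. is not reproduced here.

```python
import math


def mass_fraction_below(d_threshold_um, d_geometric_um, sigma_log):
    """Fraction of ice mass in particles with diameter below d_threshold_um.

    Assumes a lognormal number distribution in diameter (geometric mean
    d_geometric_um, log-width sigma_log) and mass proportional to D**3.
    """
    # Weighting a lognormal by D**3 gives another lognormal whose
    # log-median is shifted upward by 3 * sigma_log**2.
    mu_mass = math.log(d_geometric_um) + 3.0 * sigma_log ** 2
    z = (math.log(d_threshold_um) - mu_mass) / sigma_log
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def split_iwc(total_iwc, d_geometric_um, sigma_log, d_threshold_um=150.0):
    """Return (cloud ice, precipitating ice) parts of total_iwc."""
    frac = mass_fraction_below(d_threshold_um, d_geometric_um, sigma_log)
    ciwc = frac * total_iwc
    return ciwc, total_iwc - ciwc
```

Because the split depends only on the PSD parameters of each retrieval, it can be applied level by level within a profile, independent of surface precipitation flags or cloud type.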
 It should be underscored that with present satellite/retrieval technology, it is not possible to absolutely separate floating/cloudy forms of ice from falling/precipitating forms, yet models often try to make this distinction. Specific retrievals of this sort will require co-located vertical velocity information, such as from a Doppler radar capability, and/or multiple frequency radar to better characterize particle size.
 To account for observational uncertainty in our study, we produce four different estimates of “cloud” ice (i.e., CIWP/CIWC) from the three retrieval products and two precipitation/convection filtering methods described above. These include the FLAG method applied to all three of the retrieval products as well as the PSD method applied to the 2B-CWC-RO4 product. We use the ensemble mean of these four estimates as the “observed” or “reference” values, hereafter referred to as such, and the spread among the four estimates as a measure of observational uncertainty.
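A minimal sketch of how the reference value and the observational spread could be formed from the four estimates is given below. The grid representation (nested lists of rows) and the function name are hypothetical, and the use of the population standard deviation is an assumption; the text does not state which normalization was used.

```python
from statistics import mean, pstdev


def ensemble_reference(estimates):
    """Return (ensemble mean, spread) grids from equally shaped 2-D grids.

    estimates: list of grids, each a list of rows of CIWP values.
    Assumption: spread is the population standard deviation across
    the estimates at each grid point.
    """
    n_rows = len(estimates[0])
    n_cols = len(estimates[0][0])
    reference = [
        [mean([e[i][j] for e in estimates]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    spread = [
        [pstdev([e[i][j] for e in estimates]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return reference, spread
```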
Figure 1 shows long-term annual mean maps of IWP quantities associated with our observational estimates. The four columns represent estimates of total ice (TIWP: Figures 1a–1d), precipitating and convective ice (PCIWP: Figures 1e–1h), cloud ice (CIWP: Figures 1i–1l) and ensemble information (Figures 1m–1p). Overall, it is evident that the cloud ice (Figures 1i–1l) represents a smaller contribution to the total ice mass (Figures 1a–1d) than the precipitating/convective contribution (Figures 1e–1h), ranging from 10 to 30% depending on the product and location. It accounts for a smaller contribution in the two radar + lidar products (Figures 1b, 1c, 1f, 1g, 1j, 1k, 1n, and 1o) and in the tropics and storm track regions in all products. In general, the CIWP estimates agree relatively well, typically within a factor of two, and most of the differences can be explained by the different microphysical assumptions [Delanoë et al., 2011; Deng et al., submitted manuscript, 2012]. The differences in the mass-size and area-size relations and in the cloud occurrence identification criteria between CloudSat 2C-ICE and DARDAR contribute to some subtle differences between those two data sets (Deng et al., submitted manuscript, 2012). Figures 1m and 1n show the ensemble mean and standard deviation of the three observational estimates of TIWP (Figures 1a–1c), while the same is shown in Figures 1o and 1p for the four estimates of CIWP. It is the latter two maps, which are based on the four individual estimates in Figures 1i–1l, that represent the observational basis for CIWP and the GCM evaluations in this study.
Figure 2 shows a synthesis of sorts of Figure 1, with bar charts of regional averages of TIWP, PCIWP and CIWP, which demonstrates that irrespective of the retrieval method, cloud/precipitation-filtering method, or region, there is generally good agreement among the different estimates of CIWP. That said, there are some latitudinal bands where the four estimates of CIWP can differ up to 30% or so (e.g., N. Midlatitudes, S. Polar) and thus it is prudent that this uncertainty be accounted for in the model evaluations.
Figure 3 is similar to Figure 1, but displays the data as zonally averaged annual mean IWC as a function of height, rather than vertically integrated IWP. The general commonalities and differences between the products and different filtering methods are the same as described above for CIWP and Figure 1. Apart from that, the overall vertical structure of IWC in each product exhibits three local maxima; one is in the tropics near 300 hPa and is associated with deep convection, and the other two are in the northern and southern midlatitudes at approximately 600 hPa and correspond to the storm tracks. In general, these maxima tend to be lower in the two radar + lidar products (2nd and 3rd row) and higher in the radar-only products (1st and 4th row). Moreover, there is a tendency for the maxima to be higher in the CIWC profiles (3rd column) compared to the PCIWC profiles (2nd column), particularly in the midlatitudes, which has some intuitive merit as the larger, precipitating particles should be biased in the cloud(s) present at lower altitudes. As with Figure 1, the four estimates of CIWC (3rd column) and their ensemble information (Figures 3o and 3p) are used to evaluate the model/reanalysis representations of CIWC. The information in Figures 1m, 1n, 3m and 3n is used to compare with two GCMs (i.e., GFDL AM3, CM3) examined in this study that provide outputs of TIWC/TIWP.
Apart from the uncertainty of the retrieval method, an additional uncertainty to consider in light of making model-observation comparisons concerns the differences in the spatial and temporal sampling between the observations and the GCMs, such as those in the CMIP archives. The former are based on the suborbital footprint of the CloudSat and CALIPSO sensors as they fly in formation in the polar-orbiting A-Train constellation, while the latter are well- and regularly sampled means across the diurnal cycle and the globe. To estimate the impact of this sampling, we utilize the 3-hourly IWC at a number of levels from Modern Era Retrospective-analysis for Research and Applications (MERRA) and sample them along the A-Train orbit to examine the sampling bias. The unsampled MERRA values are taken to represent the true state of the geophysical variable, and the sampled data represent estimates of the time-space subset of the variable observable from the A-Train (i.e., as the actual observed values are). From these two data sets, two monthly IWC climatologies are then created and compared in the Taylor diagram shown in Figure 4 (red dots). The results show that the correlations between the two data sets are greater than 0.9 at all vertical levels examined. Standard deviation ratios fall between 1 and 1.2 for all vertical levels, indicating that the sampled data exhibit slightly more variability, which in this case would be reflective of sampling noise. This comparison suggests that the two climatologies closely resemble each other, particularly when considering the differences that are exhibited between the various GCMs examined in this study and their difference with the observed estimates that are shown in the next section.
Given the above test, which does assume some fidelity by MERRA in representing the diurnal cycle, it is reasonable to compare the observed, satellite-sampled, ice water estimates to those from the GCMs without the need to sample the GCMs along the A-Train satellite track. Also shown in Figure 4 are the four observed CIWP estimates (blue dots) shown in Figures 1i–1l plotted against their ensemble mean. Taken together, these sampling and retrieval uncertainties provide some quantitative information on the overall uncertainty in our observational estimate. This uncertainty is shown as the red outline in Figure 4, which is roughly delineated by a lower bound of 0.85 for correlation and a range of 0.8 to 1.5 for the standard deviation ratio. The high standard deviation of 2C-ICE in Figure 4 could be caused by its smaller data volume compared to the other observational data sets. In the Taylor plots illustrated below, we will use these criteria as the observational uncertainty and as a guideline for good model performance.
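The two statistics summarized on a Taylor diagram, and the check against the observational uncertainty bounds quoted above (correlation of at least 0.85 and a standard deviation ratio between 0.8 and 1.5), can be sketched as follows. The function names are hypothetical, and for brevity the fields are flattened lists with no area weighting, which is a simplifying assumption rather than a description of the analysis actually performed.

```python
import math


def taylor_statistics(model, reference):
    """Return (pattern correlation, std ratio model/reference).

    Both inputs are equal-length sequences of values on a common grid.
    Assumption: unweighted statistics (no area weighting).
    """
    n = len(reference)
    m_mean = sum(model) / n
    r_mean = sum(reference) / n
    m_anom = [m - m_mean for m in model]
    r_anom = [r - r_mean for r in reference]
    m_std = math.sqrt(sum(a * a for a in m_anom) / n)
    r_std = math.sqrt(sum(a * a for a in r_anom) / n)
    corr = sum(a * b for a, b in zip(m_anom, r_anom)) / (n * m_std * r_std)
    return corr, m_std / r_std


def within_observational_uncertainty(corr, std_ratio,
                                     min_corr=0.85, ratio_range=(0.8, 1.5)):
    """True if a point falls inside the bounds quoted for Figure 4."""
    return corr >= min_corr and ratio_range[0] <= std_ratio <= ratio_range[1]
```

A model whose pattern matches the reference but whose amplitude is doubled would have a correlation near 1 and a ratio near 2, and so would fall outside the bounds.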
3. Modeled Values of IWC and IWP
Using the observations described in the previous section, we evaluate CIWP/CIWC in reanalysis data sets, including ECMWF (ERA-Interim) [Dee et al., 2011] and NASA MERRA analyses, coupled atmosphere-ocean GCMs (CGCMs) from the CMIP3 (for CIWP only), CGCMs from CMIP5, and two additional state-of-the-art GCMs, including the UCLA CGCM (Ma et al., submitted manuscript, 2012) and the NASA GEOS5 GCM. The CMIP3 simulations are the same as those described in Waliser et al., although excluding the two UKMO models, which we have since learned provided output on CIWP that was inconsistent with the experimental protocol's specifications [cf. Li et al., 2011]. Note that CMIP3 model output did not include CIWC. The CMIP5 simulations are listed in Table 1a. Table 1b is an outline of cloud microphysics parameterizations used in the selected CMIP5 models, as well as UCLA CGCM, GEOS5, MERRA and ERA-Interim model. In a single column model intercomparison [e.g., Ghan et al., 2000; Xie et al., 2002, 2005; Xu et al., 2002, 2005; Klein et al., 2009; Morrison et al., 2009], which is conducted with the same large-scale conditions and forcing for each model, an undertaking to elicit the reasons for the different model behaviors is more plausible within the scope of a single paper. For the CMIP5 models in this study, however, the performance of simulated cloud properties arises from a fully coupled system (land, ocean, atmosphere, etc.), and the behavior is not likely to be explained simply by any single component or scheme, but rather by details of the model's specific schemes and the coupling among schemes related to a particular process, such as cloud and turbulence for boundary layer clouds, and cloud and convection for deeper clouds, as well as the interactions with SSTs. However, material on attempting to explain the behavior of some of the best/worst performing models is discussed in the Summary.
This includes, for example, discussion of the specifics associated with GFDL-CM3, GISS, MRI, UCLA CGCM and CAM5. In some cases, these are updates relative to the CMIP5 and thus provide possible explanations for good/poor model behavior and thus insight that modeling groups may find useful in improving their model physics. The specific experimental scenario is the historical 20th century simulation, which used observed 20th century greenhouse gas, ozone, aerosol and solar forcing. The time period used for the long-term mean is 1970–2005, and if a model provided an ensemble of simulations, only one of them was chosen for this evaluation. Unlike all other models examined in this study, which do not include ice mass from convective-type clouds in their CIWC, the two GFDL models include grid means over shallow cumulus, deep cumulus cells, and convective mesoscale clouds, weighted by their respective area fractions. In GFDL-CM3, however, precipitating ice that has fallen out of large-scale stratiform clouds and into clear areas is not included. Thus, the GFDL models should be considered somewhat carefully with respect to the others, as they include cloud mass from clouds whose contribution has typically been ignored, and their IWC/IWP fields would be more commensurate with TIWC/TIWP. In the CSIRO model, diagnostic falling precipitation is considered, while cloud hydrometeors from convective-type clouds are not included. Thus, the CSIRO model should be considered as lying somewhere between cloud-only and total ice water content/path. For both the GCM and observational data sets, all fields have been re-gridded to 40 levels (with a constant pressure interval of 25 hPa) and mapped onto common 8° × 4° longitude by latitude grids.
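The common grid described above can be sketched as follows. The text specifies only 40 levels at a constant 25 hPa interval and an 8° × 4° longitude by latitude grid; the choice of levels running from 25 to 1000 hPa, the use of cell-center coordinates, and linear interpolation in pressure are all assumptions made for illustration, and the function names are hypothetical.

```python
from bisect import bisect_left


def target_pressure_levels(top_hpa=25.0, step_hpa=25.0, n_levels=40):
    """40 levels spaced 25 hPa apart (assumed span: 25 to 1000 hPa)."""
    return [top_hpa + k * step_hpa for k in range(n_levels)]


def target_grid(dlon=8.0, dlat=4.0):
    """Cell-center longitudes and latitudes of an 8 x 4 degree grid."""
    lons = [dlon / 2 + i * dlon for i in range(int(360 / dlon))]
    lats = [-90.0 + dlat / 2 + j * dlat for j in range(int(180 / dlat))]
    return lons, lats


def interp_profile(src_p, src_v, dst_p):
    """Linearly interpolate a profile in pressure.

    src_p must be ascending. Target levels outside the source range
    are returned as None rather than extrapolated.
    """
    out = []
    for p in dst_p:
        if p < src_p[0] or p > src_p[-1]:
            out.append(None)
            continue
        i = bisect_left(src_p, p)
        if src_p[i] == p:
            out.append(src_v[i])
            continue
        w = (p - src_p[i - 1]) / (src_p[i] - src_p[i - 1])
        out.append(src_v[i - 1] + w * (src_v[i] - src_v[i - 1]))
    return out
```

With these assumptions the target grid has 45 longitudes, 45 latitudes and 40 pressure levels.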
Table 1a. Institution, Model Resolution and Label for the CMIP5 GCMs Examined in This Study
Institution/Model | Model resolution
Beijing Climate Center, China/BCC-CSM1-1 | 64 × 128 × 26
Beijing Climate Center, China/BCC-CSM1-1_esm | 64 × 128 × 26
Canadian Centre for Climate Modeling and Analysis, Canada/CanESM2 | 64 × 128 × 35
National Center for Atmospheric Research, USA/CCSM4 | 288 × 192 × 26
Centre National de Recherches Meteorologiques, France/CNRM-CM5 | 128 × 256 × 17
Australian Commonwealth Scientific and Industrial Research Organization, Australia/CSIRO-Mk3-6-0 | 96 × 192 × 18
NASA/Goddard Institute for Space Studies, USA/GISS-E2-H | 90 × 144 × 29
NASA/Goddard Institute for Space Studies, USA/GISS-E2-R | 90 × 144 × 29
Hadley Centre for Climate Prediction and Research/Met Office, UK/HadGEM2-ES | 145 × 192 × 38
Institute for Numerical Mathematics, Russia/Inmcm4 | 120 × 180 × 21
Institute for Numerical Mathematics, Russia/Inmcm4_esm | 120 × 180 × 21
Institute Pierre Simon Laplace, France/IPSL-CM5A-LR | 96 × 96 × 39
University of Tokyo, NIES, and JAMSTEC, Japan/MIROC-ESM-CHEM | 64 × 128 × 80
University of Tokyo, NIES, and JAMSTEC, Japan/MIROC4h | 320 × 640 × 56
University of Tokyo, NIES, and JAMSTEC, Japan/MIROC5 | 128 × 256 × 40
Meteorological Research Institute, Japan/MRI-CGCM3 | 160 × 320 × 35
Norwegian Climate Centre, Norway/NorESM1-M | 96 × 144 × 26
Table 1b. Outline of Cloud Microphysics Parameterizations Used in the Selected CMIP5 Models, as Well as UCLA CGCM, GEOS5, MERRA and ERA-Interim Model
Bulk single moment; mixing ratio of cloud condensate with temperature dependent partitioning; “anvil” cloud, originates in detraining convection. “large-scale cloud,” originates in a probability distribution function (PDF) based condensation calculation.
Bulk single moment; mixing ratio of cloud condensate with temperature dependent partitioning; “anvil” cloud, originates in detraining convection. “large-scale cloud,” originates in a probability distribution function (PDF) based condensation calculation.
Single mixing ratio of total water; diagnostic falling snow
Bulk single moment; mixing ratio of cloud condensate with temperature dependent partitioning (The bounds are adjustable constants with current settings ice at T = −35°C and liquid at T = −4°C over ocean; T = −35°C and liquid at T = −10°C over land).
Single mixing ratio of total water; diagnostic falling snow
Bulk single moment; mixing ratio of cloud condensate with temperature dependent partitioning (The bounds are adjustable constants with current settings ice T = −35°C and liquid at T = −4°C over ocean; T = −35°C and liquid at T = −10°C over land).
Prognostic cloud fraction; PDF based; The Tiedtke (1993) parameterization includes among its assumptions a simple PDF for humidity in clear air (uniform distribution). The Donner (1993) cumulus parameterization generates a discrete PDF (defined by its ensemble of cumulus elements), each member of which has a joint distribution of thermodynamic and microphysics properties and vertical velocities.
Mixing ratio of cloud liquid and ice; Diagnostic falling snow
Bulk single moment; ice crystal number concentration is diagnosed; mixing ratio of cloud condensate with temperature dependent partitioning (The bounds are adjustable constants with current settings ice at T = −40°C);
Figures 5a–5z show the long-term annual mean spatial distributions of simulated values of CIWP from fifteen CMIP5 CGCMs (see Table 1a and Figures 5a–5o), the multimodel ensemble mean from the fifteen CMIP5 models (Figure 5p), and GEOS5 (Figure 5s), UCLA CGCM (Figure 5t), and two analyses including ECMWF-Interim (Figure 5u) and MERRA (Figure 5v) as well as the ensemble mean (Figure 5y) and standard deviation (Figure 5z) of the four observed estimates of CIWP discussed above. Overall, the multimodel mean CMIP5 CIWP values are spatially similar to observations but nonetheless are biased high. Individually, most models tend to qualitatively capture the global and regional CIWP patterns. This includes the relatively high values of CIWP in the ITCZ, warm pool, and storm tracks from the subtropics to high latitudes, and over convectively active continental areas over central Africa and South America. Note that the relative magnitudes between tropical and midlatitude values can be quite different across models; this will be more evident when discussing Figure 8 below. Three of the CMIP5 models do a good job at representing both the observed patterns and magnitudes of CIWP (i.e., CNRM-CM5, CanESM2, MRI). A number of models, however, significantly (∼factor of 2) underestimate tropical CIWP (i.e., NorESM, BCC, BCC-CSM1, CCSM4) and two severely (∼factor of 10) underestimate CIWP (i.e., Inmcm4, Inmcm4ESM). The two GISS GCMs greatly overestimate (∼factor of 5) tropical CIWP. The IPSL, CSIRO, MIROC5, MIROC4h and the two GISS GCMs moderately overestimate CIWP in the extra-tropics. For the non-CMIP5 GCMs, the GEOS5 AGCM significantly underestimates (∼factor of 3) CIWP in the storm tracks while the UCLA CGCM does remarkably well over most of the globe. The two reanalyses, ECMWF and MERRA, show relatively good CIWP patterns and magnitudes, with MERRA being biased a bit low in midlatitudes, which is not surprising given that the base model (GEOS5) exhibits such a strong negative bias.
While the above model-observation differences are still substantial in many regards, it is worth noting that the ensemble of CMIP5 CIWP values examined here appears to exhibit improvement compared to the ensemble of CMIP3 models evaluated in our previous study (Appendix A, Figure A1); this will be discussed and quantified further below. The two GFDL models (Figures 5q and 5r) that simulate and provide output on TIWC each exhibit fairly good TIWP in the tropical ITCZ, warm pool and convectively active continental regions, but significantly underestimate TIWP in the extra-tropical storm track regions compared to the ensemble mean TIWC shown in Figure 5w.
Figure 5aa shows a bar chart illustrating CMIP5 CGCM global and regional CIWP averages for the midlatitude (30°–60° N/S), tropical (30°N–30°S) and polar (60°–80° N/S) latitudinal bands. Included are CMIP5 CGCMs, multimodel CMIP5 and CMIP3 CGCM ensemble means, GEOS5, UCLA AGCM, UCLA CGCM, ECMWF-Interim and MERRA, and also observed ensemble means. It is evident, and expected, that the model disagreement with observations is larger for regional values. For example, the GISS, IPSL, MIROC4h, MIROC5 and CSIRO models have extra-tropical IWP values that are larger than tropical values, often by factors of two or more. The two Inmcm4 models and GEOS5 exhibit CIWP underestimates for all latitudinal bands. Interestingly, while the two GISS CGCMs grossly overestimate the CIWP in the tropics for CMIP5, with moderate overestimates for the extratropics, the opposite was the case for CMIP3 as shown in Figure A1. With the exception of the tropical values, the multimodel mean CIWP values from CMIP5, for the different latitudinal bands shown, are closer to observations than those from CMIP3. This demonstrates a quantitative improvement of CIWP simulations from CMIP3 to CMIP5; this will be further demonstrated below. The two GFDL models simulate TIWP well in tropical latitudinal bands but underestimate TIWP by 50% or more in the extra-tropical storm tracks and polar latitudinal bands.
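As an illustration of how such latitudinal-band averages can be formed, the following is a minimal sketch, not the procedure actually used for Figure 5aa. It assumes a zonal-mean field on a regular latitude grid, cosine-of-latitude area weights, and pooling of both hemispheres; the function name `band_mean`, the `BANDS` table and the sample values are hypothetical.

```python
import math

# Band limits in degrees of absolute latitude, as quoted in the text.
BANDS = {
    "tropical": (0.0, 30.0),
    "midlatitude": (30.0, 60.0),
    "polar": (60.0, 80.0),
}

def band_mean(lats_deg, zonal_mean, band):
    """Cosine-latitude-weighted mean of a zonal-mean field over one band.

    Both hemispheres are pooled (an assumption of this sketch). The upper
    band limit is exclusive so that no latitude is counted in two bands.
    """
    lo, hi = BANDS[band]
    num = 0.0
    den = 0.0
    for lat, value in zip(lats_deg, zonal_mean):
        if lo <= abs(lat) < hi:
            weight = math.cos(math.radians(lat))  # proportional to grid-cell area
            num += weight * value
            den += weight
    if den == 0.0:
        raise ValueError("no grid latitudes fall inside band " + band)
    return num / den

# Hypothetical zonal-mean CIWP values (g m-2) on a coarse 30-degree grid.
lats = [-75.0, -45.0, -15.0, 15.0, 45.0, 75.0]
ciwp = [10.0, 20.0, 30.0, 30.0, 20.0, 10.0]
for name in BANDS:
    print(name, band_mean(lats, ciwp, name))
```

On a full latitude-longitude grid the same weights would be applied to every grid cell in the band rather than to zonal means.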
To summarize the multimodel performance of CMIP3 and CMIP5 in representing the time-mean pattern of CIWP, Figures 6a–6d illustrate the multimodel mean biases against the observed estimate and the root mean square error (RMSE) (Figure 6e) calculated across the models for each of the ensembles. In CMIP3, the high latitudes were biased high, while in CMIP5 the bias is more uniform. In the case of the latter, only the subtropics and deserts exhibit a low bias due, in part, to the relatively small cloud ice values in these regions. While the bias figure emphasizes the sign and magnitude of systematic biases across the two model archives, the RMSE figure emphasizes systematic errors irrespective of the sign, with larger errors emphasized as well because of the squaring operation. Again, high latitudes were more systematically incorrect in CMIP3 while low latitudes were more problematic in CMIP5. However, we would like to point out the impact of the more extreme outliers on these figures. Examination of Figure 5 indicates that the GISS model is an extreme outlier, and biased very high, relative to the ensemble of CMIP5 models examined here. In addition, because two different versions of it are analyzed here that do not differ much in regard to their CIWP representation, the effect of this model's extreme bias is doubled. The same holds for the two INMCM GCMs, but with a large negative CIWP bias. Figures 6e and 6g show the impact on the spatial patterns of mean bias and RMSE if the two GISS models are removed from the ensemble, while the two right panels show the impact if the two GISS and the two INMCM models are removed. In this case, the bias of the CMIP5 ensemble analyzed here changes from a large, somewhat uniform (except the subtropics), positive value (∼20–60 g m−2), to a somewhat smaller positive bias in the extra-tropics with an offsetting negative bias in the tropics.
In addition, the RMSE is reduced substantially across the tropics and to a lesser extent in high latitudes. This change in the pattern of the overall bias and RMSE is substantial. What is left resembles a more canonical pattern associated with the observed ITCZ and storm tracks, possibly indicating a more straightforward bias/error to try to reconcile and improve (i.e., the cloud patterns are roughly realistic, but tropical clouds are too infrequent or have too little ice, with the opposite being the case for storm track clouds).
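The effect of outliers on the multimodel mean bias and on the RMSE calculated across the models can be sketched as follows. This is an illustrative example under stated assumptions rather than the code behind Figure 6: fields are flattened to plain lists on a common grid, and the function name `ensemble_bias_rmse`, the model labels and all numbers are hypothetical.

```python
import math

def ensemble_bias_rmse(models, obs, exclude=()):
    """Per-grid-point multimodel mean bias and across-model RMSE.

    models : dict mapping a model name to a list of values on a common grid
    obs    : list of observed values on the same grid
    exclude: names of models to leave out (e.g., suspected outliers)
    """
    kept = [field for name, field in models.items() if name not in exclude]
    if not kept:
        raise ValueError("all models were excluded")
    n = len(kept)
    bias, rmse = [], []
    for i, o in enumerate(obs):
        errors = [field[i] - o for field in kept]
        bias.append(sum(errors) / n)                             # signed mean error
        rmse.append(math.sqrt(sum(e * e for e in errors) / n))   # sign-independent
    return bias, rmse

# Hypothetical CIWP values (g m-2) at two grid points.
obs = [25.0, 45.0]
models = {
    "model_a": [30.0, 50.0],
    "model_b": [20.0, 40.0],
    "outlier": [130.0, 150.0],
}
bias_all, rmse_all = ensemble_bias_rmse(models, obs)
bias_trim, rmse_trim = ensemble_bias_rmse(models, obs, exclude=("outlier",))
print(bias_all, rmse_all)
print(bias_trim, rmse_trim)
```

In this toy case a single high-biased member dominates both statistics, and removing it leaves a near-zero mean bias with a much smaller RMSE, which is the qualitative behavior described above for the GISS and INMCM models.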
To further quantify and synthesize the comparative information discussed above, we can use a Taylor diagram [Taylor, 2001] to summarize both the degree of agreement in overall CIWP spatial pattern correlations along with the absolute sizes of the biases among the CMIP5 CGCMs, including their multimodel mean, two analyses, three other GCMs and four observed CIWP estimates. As with Figure 4, the ensemble mean of the latter is used as the reference data set and their spread is used to help quantify observational uncertainty. The Taylor diagram relates three statistical measures of model fidelity: the “centered” root mean square error, the spatial correlation, and the spatial standard deviations. These statistics are calculated for the long-term time mean and over the global domain (area-weighted). The reference data set is plotted along the x axis at the value 1.0.
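The three Taylor diagram statistics named above can be computed as in the following minimal sketch. It assumes fields flattened to lists with one area weight per grid point, and normalization by the reference standard deviation so that the reference plots at 1.0; the function name `taylor_stats` and the sample data are hypothetical and are not taken from the study.

```python
import math

def taylor_stats(model, ref, weights):
    """Area-weighted Taylor diagram statistics for one model field.

    Returns (pattern correlation, standard deviation ratio,
    centered RMSE normalized by the reference standard deviation).
    """
    wsum = sum(weights)
    m_mean = sum(w * m for w, m in zip(weights, model)) / wsum
    r_mean = sum(w * r for w, r in zip(weights, ref)) / wsum
    m_var = sum(w * (m - m_mean) ** 2 for w, m in zip(weights, model)) / wsum
    r_var = sum(w * (r - r_mean) ** 2 for w, r in zip(weights, ref)) / wsum
    cov = sum(w * (m - m_mean) * (r - r_mean)
              for w, m, r in zip(weights, model, ref)) / wsum
    sigma_m, sigma_r = math.sqrt(m_var), math.sqrt(r_var)
    # "Centered" means the area-weighted means are removed before differencing.
    crmse = math.sqrt(sum(w * ((m - m_mean) - (r - r_mean)) ** 2
                          for w, m, r in zip(weights, model, ref)) / wsum)
    return cov / (sigma_m * sigma_r), sigma_m / sigma_r, crmse / sigma_r

# Hypothetical data: four grid points with cosine-latitude weights.
lats = [-60.0, -20.0, 20.0, 60.0]
weights = [math.cos(math.radians(lat)) for lat in lats]
ref = [1.0, 2.0, 3.0, 4.0]
model = [2.0 * r + 3.0 for r in ref]   # same pattern, doubled amplitude
corr, ratio, crmse = taylor_stats(model, ref, weights)
print(corr, ratio, crmse)
```

The three quantities are linked by the law-of-cosines relation that underlies the diagram: the squared normalized centered RMSE equals 1 plus the squared standard deviation ratio minus twice the product of the ratio and the correlation. A constant offset between model and reference does not affect any of them, which is why the mean bias is discussed separately.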
Figure 7 shows Taylor diagrams for CMIP3 (Figure 7a) and CMIP5 (Figure 7b), as well as select information from both as bar charts (Figure 7c). The observed estimates are plotted in blue, the CMIP GCMs in red, their ensemble means in green and the reanalyses and non-CMIP GCMs in black. The red, roughly rectangular region illustrates a measure of observational uncertainty developed and shown in conjunction with Figure 4. Not surprisingly, the four individual observed estimates, reanalyses and AGCM simulations (i.e., specified SST; GEOS5 version 2.5) perform as a group considerably better than the CMIP coupled GCMs. The former all tend to have correlations of around 0.9 or better and standard deviation ratios of between about 0.8 and 1.5. Most of the CMIP values (red) have correlations between about 0.4 and 0.7 with standard deviation ratios well above 1, with some well above 3 and even up to 5 (see Figure 7c). The CMIP3 and CMIP5 multimodel means do not exhibit the best overall performance relative to the individual models due to the few strong outliers in the ensembles. Noteworthy, however, is that the CMIP3 and CMIP5 multimodel means (green) have correlations of 0.54 and 0.76, and standard deviation ratios of about 3.1 and 1.4, respectively, indicating a rather considerable performance improvement from CMIP3 to CMIP5 for representing CIWP. While this progress is encouraging, keep in mind also that all models shown still exhibit a very poor correlation against the reference data set, with values less than 0.8, and that none of the CMIP GCMs fall within the range of observational uncertainty (red “box”).
In regard to specific models, the best performing CMIP5 model by this metric is CNRM-CM5, with a correlation of about 0.8 and a standard deviation ratio of about 1.0, with MRI and CanESM2 the next best performers. As mentioned above, GEOS2.5 performs well relative to the others in this group but it has the advantage in this case of being an AGCM-only run, and thus uses specified SSTs, while all other models examined here are fully coupled. Noteworthy in this regard is the very good performance of the UCLA CGCM (non-CMIP5), with metric values nearly identical to the best performing CMIP5 GCM. The two GISS models have standard deviation ratios that are off the scale of the Taylor plot, with values of about six. Other poorly represented CIWP fields as measured by this metric are exhibited by the CMIP5 Inmcm4, Inmcm4ESM, IPSL and CSIRO models. Along the lines of the discussion above regarding the impact of the biggest outliers, we note that even if the two GISS models and the two INMCM models are removed, the multimodel ensemble still does not quite mimic the observations within the estimated range of observational uncertainty (see caption). While expected, it is still encouraging to see the relatively good performance of the ECMWF-Interim and MERRA CIWP (e.g., correlations ∼0.8 and standard deviations ∼0.8–1.0), as neither does any direct assimilation of cloud ice observations. In summary, while there is progress in the overall performance of CMIP5 relative to CMIP3, and there is at least one model that would be considered quite good (i.e., CNRM-CM5) although not within the observational uncertainty used here, these disparities demonstrate a need for continued work in refining the model CIWP representations.
Next, we examine the fidelity of the models' CIWC vertical structure. Figure 8 shows the CIWC zonal and annual mean values from thirteen CMIP5 CGCMs (note that the CNRM-CM5 CGCM CIWC was not available from the CMIP5 data portal at the time), GEOS5 AGCM (Figure 8q) and UCLA CGCM (Figure 8p), as well as ECMWF-Interim (Figure 8r) and MERRA (Figure 8s). These models provide output specifically on cloud ice. The two GFDL GCMs, on the other hand, are shown in Figures 8n and 8o and provide output for TIWC. Overall, there are significant disparities among the CMIP5 CGCMs against the observed ensemble mean (Figure 8v) with overall discrepancies ranging from multiplicative factors of about 0.25 of the observations (i.e., Inmcm4) to factors of 10 (i.e., GISS GCMs). Moreover, the general character of their vertical distributions with respect to pressure levels is considerably different. For example, the IPSL model exhibits significant overestimates of CIWC over the storm track regions. About six of the CMIP5 models do a fair job at representing the vertical structure and magnitude of IWC (i.e., CanESM2 (Figure 8f), BCC-ESM1 (Figure 8g), NorESM1 (Figure 8h), MIROC5 (Figure 8k), MRI (Figure 8l) and CCSM4 (Figure 8m)). The rest of the models (CSIRO (Figure 8i), MIROC4h (Figure 8j), ECMWF-Interim (Figure 8o)) generally tend to qualitatively capture the patterns but overestimate CIWC over midlatitudes and below 700 hPa. The GEOS5 model, on the other hand, tends to slightly overestimate CIWC in the tropics but significantly underestimates CIWC in the mid to high latitudes by about a factor of 2 to 3. The analyses from MERRA as well as the simulation from the UCLA CGCM show a realistic CIWC vertical structure with values close to the observed ensemble mean, albeit not extending as close to the surface as the observed estimate.
However, it is reasonable to exercise caution when considering the robustness of the observed values in these lower tropospheric regions, or anywhere below the freezing level, as there are artificial limitations applied to the retrievals that involve separating ice from liquid contributions. Compared to the observed TIWC (Figure 8s), the two GFDL models both capture the ITCZ in tropical regions fairly well but significantly underestimate TIWC in the extra-tropical storm track and polar regions. A realistic ITCZ is found in the GFDL uncoupled AGCM (Figure 8o) while a more notable double ITCZ is evident in the GFDL CGCM (Figure 8n). Also, based on the caution above, the higher values in the mid-tropospheric tropics in these models relative to the observed value may not be an error. To summarize, apart from gross qualitative agreement with observations among many CMIP5 models, it is still apparent from Figures 4, 5, 6 and 8 that significant disparities exist not only horizontally but also in the vertical structure.
Figure 9 summarizes some of the basic features of Figure 8 by showing the global (80°N–80°S) mean vertical profiles of the models against the observed ensemble mean CIWC (thick black). Again, the two GISS models and IPSL significantly overestimate CIWC while the two Inmcm4 models significantly underestimate CIWC at all levels. The best simulated CIWC vertical profile is from CanESM2 followed by the MRI, CCSM4, NorESM and BCC-ESM models (although keep in mind that CNRM-CM5 CGCM is not available here). Note that CanESM2 and MRI also performed very well in terms of horizontal structure. Both MERRA and GEOS5 overestimate CIWC in the upper troposphere, but values sharply decrease below 350 hPa. By decomposing Figure 9 into various regional averages (a subset of those used in Figure 5) shown in Figure 10, we find that the bias mainly comes from the mid- and high latitudes of both hemispheres, especially the southern high latitudes. Besides Inmcm4 and GEOS5, which are biased low relative to the observed estimate throughout the troposphere, the other models are biased high below 500 hPa, and increasingly so in the lowest levels. However, it is wise, as mentioned above, to exercise some caution regarding the observed estimates of cloud ice at low levels. Simulation fidelity in the midlatitudes is similar to that in high latitudes, although four models (BCC, CCSM4, NorESM and CanESM) compare quite favorably with observations in this region. Over the convectively active regions, the models generally are close to capturing the correct CIWC peak, but CIWC values vary greatly from model to model. Similar to Figure 10, Figure 11 shows GFDL CM3z global and regionally averaged vertical TIWC profiles in the tropics and northern mid- and high latitudes (southern high latitudes are similar). Compared to the ensemble mean TIWC (black profile), we find that the GFDL model (green) significantly underestimates TIWC throughout the troposphere in the mid- and high latitudes.
Over the convectively active regions, the model overestimates TIWC below 450 hPa but underestimates it above 450 hPa. The spread of IWC profiles in the spaghetti diagram of Figure 9 is similar to that shown in Xu et al. and Xie et al. These differences might be related to the cloud microphysics or convective parameterization used in each model.
Finally, in order to determine if there are systematic biases across the models in the vertical structure of their CIWC fields, we examine the bias and RMSE of the models, and of their multimodel mean, against the observed CIWC values at each of the 700, 600, 500, 400, 300, 200, 150, and 100 hPa levels. A Taylor diagram representing each pressure layer of the annual mean CIWC for the CMIP5 multimodel mean (red), MERRA (blue), and ECMWF-Interim (green) is shown in Figure 12. The information in Figure 12 is also given in Figure 13, which provides the Taylor diagram correlations (Figure 13a) and standard deviation ratios (Figure 13b) for the CMIP5 multimodel mean of CIWC at each pressure level. In addition, Figure 13 also includes the values for each individual model in the CMIP5 ensemble analyzed here, along with the values for the UCLA and GEOS5 GCMs and the MERRA and ECMWF-Interim analyses, and correlation and standard deviation values for CIWP as well (leftmost column). For Figure 13, we utilize the combined sampling error estimates and observation data set sensitivity discussed in section 2 and shown in Figure 4 to classify reasonable model performance (pink boxes) relative to observed estimates for these Taylor diagram metrics. In this case, the threshold values for reasonable performance are 0.85 or better for correlation and 0.8 to 1.5 for standard deviation ratios.
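The threshold test described above can be expressed compactly. The following is a minimal sketch, assuming the two thresholds are applied independently and inclusively; the function name `classify` and the constant names are hypothetical, and the example inputs are the approximate CIWP values quoted in the text for Figure 7.

```python
# Thresholds for "reasonable performance" as stated in the text.
CORR_MIN = 0.85
STD_RATIO_RANGE = (0.8, 1.5)

def classify(corr, std_ratio):
    """Flag whether each Taylor metric lies within the observational range."""
    lo, hi = STD_RATIO_RANGE
    return {
        "correlation_ok": corr >= CORR_MIN,
        "std_ratio_ok": lo <= std_ratio <= hi,
    }

# Approximate (correlation, standard deviation ratio) pairs quoted for CIWP.
examples = {
    "CMIP3 multimodel mean": (0.54, 3.1),
    "CMIP5 multimodel mean": (0.76, 1.4),
    "CNRM-CM5": (0.8, 1.0),
}
for name, (corr, ratio) in examples.items():
    print(name, classify(corr, ratio))
```

With these inputs, the CMIP5 multimodel mean and CNRM-CM5 pass the standard deviation test but not the correlation test, while the CMIP3 multimodel mean passes neither.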
Considering Figure 12 and Figure 13, it is evident that CIWC is better simulated by most of the models in the lower to mid-tropospheric layers (i.e., correlation values are greater than 0.85 for 700, 600 and 500 hPa) and to some extent around 200 hPa. Interestingly, the models that tend to do well in terms of their pattern in the lower to mid-troposphere tend to simulate it more poorly in the upper troposphere, and vice versa (see Figures 13a and 13b). All the models exhibit relatively poor scores at the 400 hPa level. From Figure 13b, it is evident that models do not uniformly overestimate or underestimate the magnitude of ice through the column. In some cases, the range of the standard deviation ratio for a given GCM across the different levels can easily be up to a factor of five (e.g., CSIRO, IPSL, BCC-ESM, GISS). In regard to CIWP, and as shown also in Figure 7c, a number of CMIP5 models exhibit good performance (i.e., shaded box) for the standard deviation ratio, while none of these models exhibit good performance for the correlation value, as indicated by the observational uncertainty level used here. Evident again here is the improvement in CMIP5 compared to CMIP3 in regard to both measures.
5. Summary and Discussion
The objective of this study is to evaluate the representation of atmospheric ice in present-day GCMs, with an emphasis on: (1) the cloud portion of the ice (i.e., CIWC/CIWP), which is the quantity typically simulated and output by GCMs, and (2) model simulations of present-day climatology of CIWC/CIWP for CMIP5 and their comparison to CMIP3. Observational reference values and their uncertainty were addressed by using four different estimates of CIWC/CIWP that accounted for different approaches to the retrieval and to the methods of filtering out the contribution to the ice mass in the retrievals due to large-particle/precipitating components, which is a contribution to the mass that is typically not represented in GCMs as a prognostic or column-resolved quantity (see section 2). In addition, observational uncertainty also included considerations of the effects from the temporal sampling of the observations, i.e., the sun-synchronous, polar-orbiting A-Train satellites. The models evaluated (see section 3) included 15 simulations of present-day climate available to date in CMIP5 and two other GCMs of interest (GEOS-5 AGCM and UCLA CGCM). The evaluation also included two modern reanalyses (MERRA and ECMWF-Interim). In addition, we included comparative evaluations of fourteen of the GCMs from CMIP3 that we analyzed in our previous study [Waliser et al., 2009], allowing for an assessment of whether GCM fidelity in regard to CIWP/CIWC has improved from CMIP3 to CMIP5.
Overall, there is a fairly wide disparity in the fidelity of CIWP representations in the models examined. Even for the annual mean maps considered, the differences between observations and modeled values easily reach factors of 2, and nearly up to 10, for most of the GCMs in a number of regions (Figures 5–7). That being said, there are about 3 models in the CMIP5 ensemble examined here that perform rather well in regard to the Taylor diagram metrics (i.e., standard deviation ratio, pattern correlation) for CIWP; these include CNRM-CM5, MRI and CanESM2.
Another five perform particularly poorly, GISS-E2-H, GISS-E2-R, IPSL, INMCM4, and INMCM4-ESM, with the former three being biased very significantly high and the latter two very significantly low in terms of overall CIWP magnitude. The remaining seven exhibit performance in between (BCC, BCC-CSM1, NorESM, CSIRO, MIROC5, MIROC4h and CCSM4). As expected, the two reanalyses examined performed relatively well compared to the group as a whole due to their use of observed SSTs and the incorporation of a wide array of constraining observations; this result is still notable, though, since they do not assimilate cloud ice observations and thus rely on (parameterized) model physics to represent this quantity. However, even with the assimilation of many other/related quantities and the benefit of observed SSTs, neither MERRA's nor ECMWF-Interim's performance was within the uncertainty of the observations for both the standard deviation ratio and pattern correlation. The GEOS5 AGCM also performed well, but it also benefits from specified, observed SSTs. Remarkable is the performance of the UCLA CGCM, which was one of the best performing CGCMs of all those examined, along with the three identified above. Considering even these results alone and the remaining disparities between the observations and modeled values of CIWP, it is evident that while the models may be providing roughly the correct radiative energy budget, many are accomplishing it by means of unrealistic cloud characteristics, of cloud ice mass at a minimum, which in turn likely indicates unrealistic cloud particle sizes and cloud cover.
 Examination of the vertical structure of CIWC in terms of global, zonal and large-region averages (e.g., high, mid and tropical latitudes) indicates similar findings in terms of overall performance across the models and re-analyses examined here. Setting aside the differences in magnitude discussed above, most of the systematic error in the global-mean vertical profile of CIWC occurs below the mid-troposphere where the models tend to overestimate CIWC compared to the observed estimate. When considered in more detail, it is evident that this bias arises mostly from the mid and high latitudes (Figures 8–10). Note that some caution is warranted here in regards to the observations as discriminating the boundary between ice and liquid and quantifying the contributions from each in mixed-phase clouds is very difficult using present-day observing capabilities. Apart from the differences in magnitude, the models also display considerable differences in the pressure level of maximum CIWC, ranging from about 250 to 550 hPa. Finally, using the observational uncertainty to discriminate good from poorly performing models in terms of CIWC and CIWP (Figure 13) also indicated that in terms of pattern correlations, the bulk of the CMIP5 models tend to perform best in the mid-to-lower troposphere (∼500–700 hPa) with a few also performing well at about 200 hPa (Figure 13a).
 A specific objective of this study was to compare the overall performance between CMIP3 and CMIP5. Based on a number of diagnostics (e.g., Figures 5a–5z versus Figure A1, Figure 6, Figure 7) there has been significant and quantitative improvement in the representation of CIWP between CMIP3 and CMIP5 possibly due to the use of two-moment cloud microphysics schemes in more GCMs. This is clearly demonstrated by the reduction (by about 50% or more) in the multimodel mean bias and RMSE of the annual mean maps of CIWP between CMIP3 and CMIP5 (Figures 6a–6d and Figures 6e–6h). Similarly, when viewed on a Taylor plot, the multimodel mean values for the annual mean maps of CIWP are considerably better in CMIP5 than CMIP3 with the distance between the multimodel ensemble value and the reference point reduced by about a factor of 3, although the CMIP5 ensemble value (even with outliers removed – see more below) did not yet quite lie in the range of the observational uncertainty discussed above.
While there are a number of models that perform quite well (e.g., CNRM-CM5, Can-ESM, and MRI-CGCM3), there are still a number that exhibit very significant biases in their mean spatial structure of CIWP (and CIWC). This includes differences up to factors of ∼5 or more in a number of regions (e.g., Figure 5; GISS-E2-H, GISS-E2-R, INMCM4 and INMCM4ESM). The impact of these outliers on the multimodel mean bias and RMSE is substantial. For example, when excluding these outliers, not only is the overall magnitude of the bias and RMSE significantly reduced, the pattern of the bias changes markedly. With all models, the bias is generally positive over most regions where high clouds occur, and there is a notable double ITCZ imprint (Figures 6a–6d). When these outliers are removed, the bias is positive (negative) in the extra-tropics (tropics) and the pattern echoes a more canonical pattern of the ITCZ and storm tracks (Figures 6e–6h). Note that the substantial positive bias in CIWP/CIWC in the tropics in the two GISS models arises due to a compensation for a relative negative bias in clouds and CIWP/CIWC in the extra-tropics (Figures 5a–5z and A. Del Genio, GISS, personal communication, 2012) in conjunction with the need to have a globally averaged radiation budget consistent with observations.
Viable observed estimates of CIWP/CIWC have been available for about 4–5 years from CloudSat, about 2–3 years from combined CloudSat + CALIPSO retrievals, and even longer for CIWC values from MLS [e.g., Wu et al., 2008, 2009; Li et al., 2005], yet there are still GCMs exhibiting such large biases; this indicates that all the modeling groups face challenges in utilizing the observations. In addition, deficiencies in the simulated temperature, humidity, dynamics and radiation can also contribute to the large IWC/IWP biases relative to the reference values. These challenges likely include the painstaking work of model development and improvement, the relatively few people/resources available to perform such work within a modeling institute and group, the reduction in the above mentioned time periods of data availability due to the need to have “final” model versions ready for CMIP5 at least a year ago, and the non-trivial effort of obtaining and getting acquainted with the data [e.g., Waliser et al., 2009; Gleckler et al., 2011]. As cloud feedback will undoubtedly still represent a key uncertainty in the next IPCC assessment report, it is essential that these observations be utilized to their full extent to provide the key constraints they offer (see section 1). The instantiation of the Climate Metrics Panel sponsored by the Working Group on Coupled Modeling (WGCM) and Working Group on Numerical Experimentation (WGNE), along with the development of the “obs4MIPs” activity [Gleckler et al., 2011], is designed to facilitate the use of observations to provide multiple constraints and points of evaluation for climate model performance. Thus, it can be expected that there will be fewer and fewer extreme outliers in model performance as these activities and resources become fully implemented.
Our study also included evaluating the performance of two GFDL GCMs (AM3 and CM3) that stand apart from the others as they attempt to also model and include in their output atmospheric ice water associated with convective clouds and precipitation. Thus, the comparison in these cases is on TIWC/TIWP, and the unfiltered versions of the satellite retrievals can be utilized (e.g., Figure 5). For these two GCMs, the comparisons of TIWP show that these models perform very well in the tropics, but underestimate the TIWP by a factor of ∼2–4 in the extra-tropics. Examining the TIWC profiles shows modest pattern biases that come from the upper troposphere in the tropics (e.g., enhanced double ITCZ) with rather uniform underestimates of TIWC throughout the troposphere in the extra-tropics. The abilities of the GFDL models to represent and output both cloudy and precipitating ice profiles, combined with the observational capabilities to roughly distinguish these two types of ice mass, provide an additional means for constraints on the model physics.
While it was beyond the scope of this study to probe the causes of the model-to-model differences and model-to-observation biases, highlighting a few recent developments is instructive to help keep in mind the complexities associated with modeling atmospheric ice. One issue is that the inconsistency, sensitivity and uncertainties of the temperature thresholds used for the temperature-dependent mixed-phase assumptions for cloud ice and cloud liquid water in observations and models can also contribute to the biases [Cheng et al., 2012; Ma et al., 2012]. Note, for example, that MRI, one of the CMIP5 models that performs quite well, is the only model using a double-moment cloud scheme with two separate prognostic equations for cloud liquid and ice. The model treats atmospheric transport (including cumulus convection) of aerosols and detailed cloud-aerosol effects [Yukimoto et al., 2012].
Recent progress in this regard includes a study by Gettelman et al., who reported improvements from the incorporation of a process-based ice cloud representation that allowed supersaturation with respect to ice and an ice cloud cover consistent with the treatment of ice microphysics. The sensitivities and improvements found in association with ice supersaturation were also reported by Tompkins et al. for a recent version/update of the ECMWF Integrated Forecast System. In another development, Ma et al. [2012a] indicate that the notable fidelity of simulated CIWC/CIWP in the UCLA GCM stems from better representation of deep convection and the associated detrainment, as well as cloud macro- and microphysics schemes. Song et al. incorporated a two-moment microphysical scheme into the convective areas of grid cells, helping to rectify the negative CIWP bias evident in CCSM4.
We would like to thank the editor and three reviewers for their very comprehensive review and constructive comments and suggestions. We thank Anthony Del Genio/NASA GISS, Ken Lo/NASA GISS, Voldoir Aurore/CNRM, Masahiro Watanabe and Shingo Watanabe/MIROC, Leon Rotstayn/CSIRO, Knut von Salzen/CCCma, Gary Strand/NCAR, Alf Kirkevag/NCC, Seiji Yukimoto/MRI, Jean-Louis Dufresne/IPSL, Tongwen Wu/BCC-CMA and many other colleagues from climate modeling centers for providing model information. Thanks also go to Gregory Huey for help with the data and Matthew Lebsock for reading and providing comments on the manuscript. The contributions by D.E.W. and J.L.L. to this study were carried out on behalf of the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The contribution of Hsi-Yen Ma to this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.