Improvements in Cloud and Water Vapor Simulations Over the Tropical Oceans in CMIP6 Compared to CMIP5

Clouds and water vapor are among the most difficult quantities for global climate models to simulate because they are affected by physical processes that operate over scales unresolved by current climate models. We use NASA satellite data to assess the representation of clouds and water vapor structures in 28 climate models that participate in the Coupled Model Intercomparison Project Phase 6 (CMIP6). Each model is assigned numerical scores based on its performance in simulating spatial mean, variance and pattern correlation of multi‐year mean clouds and water vapor structures in lower, middle, upper troposphere, and near the tropopause over tropical oceans. We find measurable improvements in CMIP6 models relative to CMIP5 models for both clouds and water vapor. The differences between models and satellite observations and the spread across the models are reduced. In addition, we find that the models' equilibrium climate sensitivity (ECS) is correlated with overall performance scores for both CMIP5 and CMIP6 models, with a weaker correlation in CMIP6, suggesting that the models that capture better tropical clouds and water vapor distributions tend to have higher ECS. The physical processes responsible for the apparent correlation between ECS and model performance score warrant further study.


Introduction
Clouds and water vapor are important players in modifying surface warming caused by increasing greenhouse gases (IPCC AR5). However, their spatial and temporal variations in the atmosphere are driven by processes that occur over multiple scales, some of which are smaller than the grid sizes of global climate models (GCMs) and thus are difficult to simulate. Previous studies documented the large errors in simulated clouds and water vapor structures in GCMs and their relations with large-scale dynamic and thermodynamic conditions (e.g., Dolinar et al. 2015; Jiang et al. 2012; Lebsock & Su, 2014; Li et al. 2005; Su et al. 2006, 2013; Tian et al., 2013; Waliser et al. 2009). These model evaluation studies provided useful references for climate model improvements. For example, Jiang et al. (2012) evaluated simulations of cloud and water vapor profiles from a number of models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). They showed that GISS model E2 produced upper tropospheric ice water content (IWC) about a factor of 10 greater than satellite retrievals, which motivated the development of a new ice cloud parameterization scheme with updated ice particle fall velocity and particle size assumptions in the GISS model that drastically reduced the IWC biases (Elsaesser et al. 2017).
For the Coupled Model Intercomparison Project Phase 6 (CMIP6), increased spatial (both horizontal and vertical) resolutions and arguably more sophisticated model physics are implemented in the newest generations of GCMs (Meehl et al. 2020). Satellite retrievals of clouds and water vapor have also been updated recently with better accuracy and precision (e.g., Austin et al. 2018; Livesey et al. 2020; Tian & Hearty, 2020). A comprehensive assessment of CMIP6 model simulations of clouds and water vapor against the best-available observations is clearly needed to help gauge the fidelity of climate change projections and guide further model improvements.
Clouds and water vapor are an indispensable part of global energy and hydrological cycles, and quantitative assessment of their representation is justified. Moreover, model performance in simulating clouds and water vapor is directly relevant to constraining the estimates of equilibrium climate sensitivity (ECS), the equilibrium global-mean surface temperature increase under a doubling of CO2. For instance, Fasullo and Trenberth (2012) showed that climatological subtropical tropospheric relative humidity (RH) is highly correlated with ECS and that drier and less cloudy subtropics are associated with higher ECS. Su et al. (2014) found that the similarity between the modeled and observed zonal-mean structures of cloud fraction (CF) and RH is positively correlated with ECS in the CMIP5 models, with better performing models having higher ECS. Qu et al. (2018) elucidated that the CF and RH model performance metrics constructed in Su et al. (2014) are highly correlated with shortwave cloud feedbacks and thus ECS. Hence, model performance scores in reproducing climatological clouds and water vapor structures may serve as emergent constraints on ECS. A recent study (Schlund et al. 2020) evaluated a number of CMIP5-based emergent constraints on ECS in CMIP6 models and found that their correlations with ECS changed from CMIP5 to CMIP6. This suggests that there are substantial structural differences between the CMIP5 and CMIP6 models. An update of the model performance scores for clouds and water vapor and their relations to ECS is warranted.
In this study, we provide a quantitative assessment of CMIP6 simulated water vapor mixing ratio (H2O) and cloud water content (CWC) structures using the best-available observations from NASA satellites. We follow the performance scoring system used in Jiang et al. (2012) and present both CMIP5 and CMIP6 results whenever possible. We focus on the climatological mean structures of H2O and CWC in terms of spatial mean, variance and correlation between the modeled and observed fields. Four vertical pressure levels are evaluated separately. A final integrated score is computed for each model by averaging all individual scores for each statistical property of H2O and CWC at the four pressure levels. The relationship between the final scores and models' ECS values is presented.

The CMIP6 Climate Models
A total of 28 coupled atmosphere-ocean models from the CMIP6 archive available at the time of the analysis are included in this study (Table 1). The vertical profiles of clouds (ice and liquid) and water vapor from the historical runs are used. The cloud outputs used for this study are cli for ice water mixing ratio and clw for liquid water mixing ratio, both vertically resolved (Taylor et al., 2012, 2018). As defined, cli (clw) includes both large-scale and convective cloud, and is calculated as the mass of cloud ice (liquid) in the grid cell divided by the mass of air (including the water in all phases) in the grid cell. Precipitating hydrometeors are considered in the calculation only if they affect the calculation of radiative transfer in the model, but they are not included in the model output of cli and clw.
The total cloud water content (CWC) is the sum of clw and cli. The water vapor data in the model outputs are the specific humidity profiles hus, or the water vapor mixing ratio. The models' ECS values are taken from Meehl et al. (2020).

The Satellite Data
For satellite observations, we use NASA's A-Train satellites (Aqua, Aura, and CloudSat) that provide nearly simultaneous and co-located measurements of cloud and moisture profiles (L'Ecuyer & Jiang, 2010). The measurement parameters used in this study are similar to the previous study of Jiang et al. (2012), which include (a) water vapor (H2O) profiles from 1,000 hPa to 300 hPa from the Atmospheric Infrared Sounder
For CWC, CloudSat IWC and LWC from the 2B-CWC-RO (version R05) data set are used. The retrieved IWC and LWC include contributions from precipitating particles. We thus construct noPcp IWC/LWC at each grid box by removing the precipitating CWC profiles (rain, snow, drizzle and graupel) indicated by precipitation flags in the CloudSat 2C-PRECIP-COLUMN product (Haynes et al., 2009). The uncertainty of both IWC and LWC is about a factor of 2 due to the particle size assumptions used in the retrieval. Therefore, the range of observed CWC is within 0.5× to 2.0× of the retrieved values.
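The factor-of-2 retrieval uncertainty stated above can be turned into a simple range check. The following is a minimal sketch, not the procedure used in this study: the function name, the sample values, and their units are illustrative assumptions.

```python
import numpy as np

def within_retrieval_uncertainty(model_cwc, retrieved_cwc, factor=2.0):
    """Flag model values lying inside [retrieved / factor, retrieved * factor].

    The default factor of 2 follows the CloudSat IWC/LWC uncertainty quoted
    in the text; the function itself is a hypothetical helper.
    """
    model_cwc = np.asarray(model_cwc, dtype=float)
    retrieved_cwc = np.asarray(retrieved_cwc, dtype=float)
    lower = retrieved_cwc / factor   # 0.5x of the retrieved value
    upper = retrieved_cwc * factor   # 2.0x of the retrieved value
    return (model_cwc >= lower) & (model_cwc <= upper)

# Illustrative values only (arbitrary units), not data from this study.
retrieved = np.array([10.0, 10.0, 10.0, 10.0])
model = np.array([4.0, 5.0, 20.0, 25.0])
inside = within_retrieval_uncertainty(model, retrieved)
```

In this example the two middle values fall inside the 0.5× to 2.0× range and the two outer values fall outside it.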
All the above-mentioned data sets were re-gridded onto a common 144 (longitude) × 91 (latitude) grid and 40 pressure levels, as done for the model outputs. The 40 pressure levels are from the surface to 24 hPa, with intervals of 50 hPa in the middle troposphere and finer in the boundary layer and near the tropopause. The original CMIP6 outputs are interpolated linearly with respect to log-pressure from their native vertical levels to standard pressure levels. We have carried out sensitivity studies to test different vertical interpolation methods and compare the results. We find the maximum error due to the vertical interpolation is <20%, mostly near the tropopause.
The model results used for comparison with satellite data are multiyear averages from the CMIP6 "historical" runs, which were generated from coupled atmosphere-ocean experiments under historical forcings (Eyring et al. 2016). We use the 20-year averages from 1995 to 2014 to represent the present-day climatological mean. The multi-year mean satellite measurements used in evaluating the models are averages over the following time periods: 4.5 years (June 2006 to December 2010) for CloudSat; 13 years (August 2004 to September 2016) for AIRS and MLS. Although these time periods do not overlap with those of the model outputs, no significant trends in clouds and water vapor are found in the model outputs from 1995 to 2015.
The A-Train satellites are sun-synchronous with equatorial crossings at ∼1:30 p.m. and ∼1:30 a.m., which leads to some sampling biases associated with diurnal variations. To reduce the effects of diurnal sampling biases, we focus on the tropical oceans (30ºN to 30ºS) when quantitatively scoring the model performances, as diurnal variations are much smaller over ocean than over land. In previous studies (e.g., Jiang et al. 2015), we estimated the magnitude of diurnal bias in earlier versions of CMIP5 models, as well as in reanalysis data, by comparing standard monthly mean IWCs from the GCMs with the monthly mean IWCs constructed by sampling 3-hourly model outputs onto A-Train satellite tracks. We found that the differences between the two monthly means over the tropical ocean are generally <2%, compared to differences of up to 200% over land. We thus assume that diurnal variation introduces a bias of less than 2% in the tropical oceanic means, significantly smaller than the measurement uncertainties. In addition, as satellites cannot accurately retrieve clouds and moisture near the surface, we limit our analysis to altitudes of 900 hPa and above.
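The vertical interpolation described above (linear with respect to log-pressure) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in this study: the function name and the sample levels are hypothetical, and target levels outside the native range are simply held at the end values by `np.interp`.

```python
import numpy as np

def interp_log_pressure(values, p_native, p_target):
    """Interpolate a single vertical profile linearly in log(pressure).

    values   : profile on the native levels
    p_native : native pressure levels in hPa (any order)
    p_target : standard pressure levels in hPa
    """
    values = np.asarray(values, dtype=float)
    logp_native = np.log(np.asarray(p_native, dtype=float))
    logp_target = np.log(np.asarray(p_target, dtype=float))
    # np.interp requires increasing x coordinates, so sort by log-pressure.
    order = np.argsort(logp_native)
    return np.interp(logp_target, logp_native[order], values[order])

# Illustrative profile that is exactly linear in log(p), so the
# interpolation should reproduce it at the target levels.
p_native = np.array([1000.0, 850.0, 500.0, 200.0, 100.0])
profile = 2.0 * np.log(p_native) + 1.0
p_target = np.array([900.0, 600.0, 215.0])
result = interp_log_pressure(profile, p_native, p_target)
```

Interpolating in log-pressure rather than pressure is a common choice because many atmospheric quantities vary more smoothly with height, which is roughly proportional to the logarithm of pressure.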

Results
We first examine the vertical structures of clouds and moisture averaged over the globe. The colored lines in Figure 1 show the multiyear mean vertical profiles of CWC (top panels) and H2O (lower panels) from the 28 CMIP6 models. For clouds (Figure 1, top panels), there are large spreads among CMIP6 model CWCs, although the multi-model mean (MMM) CWCs are very close to the observed values at all altitudes and latitude regions, with overall differences between MMM and observation less than 20%. When compared with CMIP5 (Jiang et al. 2012), the CMIP6 CWC spreads are reduced in the upper troposphere (pressure <400 hPa), especially in the tropics. The exceptions are the two GISS e2 models, GISS e2-1-g and GISS_e2-1-h, whose CWCs between 500 and 200 hPa are about five times larger than the MMM and observation. Another difference between the CMIP6 and CMIP5 results is that the CWCs at pressure <500 hPa are reduced in all CMIP6 models, except for GISS e2-1-g and GISS_e2-1-h, resulting in smaller MMM in CMIP6 than in CMIP5. Near the tropopause (pressure <150 hPa), the CWCs from CSIRO-access-cm2 and the two MOHC models, MOHC_hadgem3-gc31-ll and MOHC_ukesm1-0, are about 2-4 times larger than the MMM and the observations.
and lower stratosphere. Model differences from the observations are small (<10%) in the middle and lower troposphere, but range from 1% to 100% near 100 hPa. The most notable result is that the CMIP6 models, on average, are drier than the CMIP5 models in the upper troposphere. In CMIP5, modeled H2O profiles are mostly biased high compared to the observations in the mid- and upper troposphere between 150 and 700 hPa in all latitude bands. This bias has been corrected in the CMIP6 models. In CMIP6, however, the H2O spread in the upper troposphere has increased, compared to CMIP5.
The similarity between the CMIP6 MMM and observations can be further illustrated by the maps of the CWC and H2O multi-year means (Figure 2
Figure S1. It is clear that the differences among the models, and between each model and the observations, are very large. However, as shown in Figure 2, the ensemble means from all models effectively average out individual models' biases, delivering a reasonable representation of the climatological clouds and water vapor distributions.
To quantitatively assess each model's performance against the observations, we focus on the tropical oceanic regions (30ºN to 30ºS) and use the scoring system as in Jiang et al. (2012). We focus on the models'
In the mid- (600 hPa) and lower (900 hPa) troposphere, the CMIP6 model-simulated H2O is all within the observational uncertainty of 25%. CWCs from CMIP6 have a larger spread than those from CMIP5. Relative to the observed CWCs, the CMIP6 CWC spatial means are 4%-250% at 600 hPa and 10%-250% at 900 hPa. However, the MMM CWCs at 600 and 900 hPa are very close to the observed values and within the observational uncertainties.
Figure 4 shows the Taylor diagrams that illustrate the spatial variances and correlations for CWC and H2O at the 100, 215, 600, and 900 hPa pressure levels. The definitions of the symbols are the same as those in Figure 3.
For CWC at 100 hPa, there are large differences among the CMIP6 models in the simulated spatial variance and in the spatial correlation with the observation. The CMIP6 CWCs all have weak correlations (<0.4) and a large normalized root-mean-squared error (RMSE), greater than the observed spatial standard deviation. The CMIP6 MMM CWC has a correlation of 0.3 and an RMSE of 0.9. In comparison, the CMIP5 MMM has a correlation of 0.84 and a normalized RMSE (relative to the observed standard deviation) of ∼2.
At 215 hPa, the CMIP6 CWCs yield spatial correlations with the observation around 0.8 to 0.9, with the highest being 0.93 and the lowest being 0.72. This result is similar to CMIP5, suggesting that the spatial locations of deep convection are well captured. However, all CMIP6 models except the two GISS e2 models have smaller standard deviation and RMSE due to the generally smaller CWC values compared to the observation (see Figure 3). The CMIP6 modeled standard deviations and RMSE for CWC at both 600 hPa and 900 hPa are quite scattered, with spatial correlations from 0.6 to 0.8 at 600 hPa and 0.3 to 0.9 at 900 hPa. Both GISS e2 models have negative spatial correlations at 900 hPa, which indicates a problem in simulating the locations of marine stratiform clouds (see Figure S1).
Figure 4. Taylor diagrams for tropical (30ºN-30ºS) oceanic multi-year mean CWC (left column) and H2O (right column) simulations from the CMIP6 and CMIP5 models (definitions for the symbols are the same as those in Figure 3) as compared to the satellite observations (the black dot on the horizontal axis with the value of 1 = the standard deviation of the observed variable). The horizontal axis represents the fraction of the modeled spatial variation pattern that can be explained by the observed spatial pattern. The vertical axis represents the standard deviation of the modeled spatial pattern orthogonal to the observation, which is normalized by the observed standard deviation. The distance to the origin from each point in the Taylor diagram corresponds to the spatial standard deviation of the modeled variable, and the distance of each point to the observed point (1, 0) on the x axis is the RMS of the difference between the modeled and observed quantities, as scaled by the green arc-lines. The correlation between the modeled and observed quantities is marked by the numbers on the black arc.
is about 0.5-1.5 times of the observed. At 600 and 900 hPa, the simulated and observed spatial variances are very comparable.
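The three quantities read off a Taylor diagram (normalized standard deviation, pattern correlation, and normalized centered RMS difference) are linked by a law-of-cosines relation. The sketch below shows one way to compute them. It assumes unweighted statistics (no area weighting by latitude) and uses made-up sample values; the function name is hypothetical and this is not the code behind Figure 4.

```python
import numpy as np

def taylor_stats(model, obs):
    """Return (normalized std, correlation, normalized centered RMS difference).

    All three are computed from anomalies about the spatial mean and
    normalized by the observed standard deviation, so the observation
    sits at the point (1, 0) of the diagram.
    """
    model = np.asarray(model, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    m_anom = model - model.mean()
    o_anom = obs - obs.mean()
    sigma_m = m_anom.std()
    sigma_o = o_anom.std()
    corr = np.mean(m_anom * o_anom) / (sigma_m * sigma_o)
    crmsd = np.sqrt(np.mean((m_anom - o_anom) ** 2))
    return sigma_m / sigma_o, corr, crmsd / sigma_o

# Illustrative fields only, not data from this study.
obs_field = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model_field = np.array([2.0, 2.0, 4.0, 5.0, 7.0])
s, r, e = taylor_stats(model_field, obs_field)

# Coordinates of the model point on the diagram.
x_coord = s * r
y_coord = s * np.sqrt(1.0 - r ** 2)
```

With these definitions, e² = s² + 1 − 2sr, which is why the distance from a model point to the observed point (1, 0) equals the normalized centered RMS difference.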
We now quantitatively score the model performances. The model performance is ranked following a scoring system in Jiang et al. (2012), in which the distances between the modeled and observed spatial mean, variance and correlation are scaled with respect to the observational data uncertainty to values between 0 and 1, with 1 being the perfect skill and 0 being no skill. The model performance scores are given at four vertical pressure levels, representing the atmospheric boundary layer (900 hPa), mid-troposphere (600 hPa), upper troposphere (215 hPa), and the tropopause layer (100 hPa) for both CWC and H2O. Figure 5 summarizes the 28 CMIP6 models' performances. We find that most CMIP6 models perform better at lower- to mid-tropospheric levels (900 and 600 hPa) than in the upper troposphere (215 and 100 hPa), especially for water vapor simulations. However, six models have larger errors in simulating the locations of low clouds, resulting in low spatial correlations for CWC at 900 hPa: BCC csm2-mr and esm1, CAMS csm1-0, GISS e2-1-g and e2-1-h, and MIROC es2l. We note that the new GISS-e3trn705a model has improved performance scores at all pressure levels for both water vapor and clouds. Most models do not simulate the observed spatial mean and variance of CWC or H2O, or both, at 215 and 100 hPa very well. However, they have better scores for spatial correlation at 215 hPa for both CWC and H2O, and at 100 hPa for H2O, indicating that models generally capture the climatological locations of deep convection but have difficulties in reproducing the magnitude of convective influence on the upper troposphere. The ensemble model means (the last row) exhibit relatively superior performance at all pressure levels, except for the 215 hPa spatial mean and 100 hPa spatial variance of H2O, which are both below 0.2. The low scores for H2O at these levels reflect the fact that most models have high biases at 215 hPa and low biases at 100 hPa in H2O, compared to the Aura MLS observations. To compare the model performance scores between CMIP5 and CMIP6, we show in Figure 6 the histograms of CMIP6 and CMIP5 models' performance scores for spatial mean, variance and correlation at each pressure level from 100 hPa to 900 hPa. Five bins are used to compute the histograms: 0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, and 0.8-1.0.
Figure 5. Color-coded summary of CMIP6 model performance scores for simulating CWC and H2O at 100, 215, 600, and 900 hPa pressure levels, based on the scoring method of Jiang et al. (2012). For each model, M = spatial mean performance scores, V = spatial variance performance scores, and C = spatial correlation performance scores. The color-bar: 1 means perfect skill and 0 means no skill. CWC, cloud water content.
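The five-bin histograms of performance scores described above can be reproduced with a few lines of NumPy. This is a minimal sketch with made-up scores; the function name is hypothetical, and the handling of values that fall exactly on a bin edge follows NumPy's convention rather than anything stated in this study.

```python
import numpy as np

# Bin edges for the five score bins: 0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0.
BIN_EDGES = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])

def score_histogram(scores):
    """Count how many model scores fall into each of the five bins.

    np.histogram treats every bin as half-open [lo, hi) except the last,
    which also includes its right edge, so a perfect score of 1.0 is
    counted in the 0.8-1.0 bin.
    """
    counts, _ = np.histogram(np.asarray(scores, dtype=float), bins=BIN_EDGES)
    return counts

# Illustrative scores only, not values from Figure 5 or Figure 6.
scores = [0.05, 0.15, 0.35, 0.5, 0.55, 0.7, 0.9, 1.0]
counts = score_histogram(scores)
```

Comparing such counts between two model generations, for the same variable, statistic, and pressure level, shows whether the distribution of scores has shifted toward the higher bins.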
Overall, both CWC and H2O show improvements in CMIP6, compared to CMIP5. Near the tropopause at 100 hPa, better spatial mean and variance in CWC and better spatial correlation in H2O are shown. Despite the fact that the improvements in spatial mean for CWC and spatial mean and variance for H2O are encouraging, the cloud and moisture amounts in deep convective regions are still poorly captured in CMIP6. At 215 hPa, there is improvement in CMIP6 H2O spatial variance and correlation relative to CMIP5, but the spatial mean still has low scores due to large moist biases in the upper troposphere (also see Figure 2). The 215 hPa CWC shows notable improvement in correlation but most models have a low CWC bias, resulting in poor scores in spatial mean and variance. Overall, CMIP6 models show only slight improvement at 215 hPa.
There are clear improvements from CMIP5 to CMIP6 in the mid- and lower troposphere (600 and 900 hPa) in both CWC and H2O, especially at 600 hPa where the spatial mean, variance and correlation performance scores for both CWC and H2O are improved in CMIP6. Similarly, at 900 hPa, overall improvements in both CWC and H2O are shown, although the CMIP6 CWC spatial mean scores show little change from CMIP5 due to the large spread in simulated CWC (Figure 2).
Not surprisingly, the overall score for the MMM turns out to be the best (0.8) among all the models. This is reassuring, as the use of multi-model ensembles in climate projections is a common practice and the MMM is generally perceived as closer to the "truth" than any single model alone, as found in previous model evaluation studies (e.g., Gleckler et al. 2008).
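The overall score of a model is described in this paper as the average of all individual scores (spatial mean, variance and correlation for CWC and H2O at the four pressure levels). A minimal sketch of that averaging is given below. It assumes equal weights for all 24 individual scores and a hypothetical array layout; the sample values are made up.

```python
import numpy as np

# Assumed (hypothetical) layout of the individual scores for one model:
# axis 0: variable (CWC, H2O)
# axis 1: pressure level (100, 215, 600, 900 hPa)
# axis 2: statistic (M = mean, V = variance, C = correlation)
EXPECTED_SHAPE = (2, 4, 3)

def overall_score(scores):
    """Average the 24 individual scores with equal weights."""
    scores = np.asarray(scores, dtype=float)
    if scores.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected shape {EXPECTED_SHAPE}, got {scores.shape}")
    if np.any((scores < 0.0) | (scores > 1.0)):
        raise ValueError("individual scores must lie between 0 and 1")
    return float(scores.mean())

# Illustrative scores only: perfect CWC scores, H2O scores of 0.5.
example = np.full(EXPECTED_SHAPE, 0.5)
example[0] = 1.0
final = overall_score(example)
```

Because every individual score lies between 0 and 1, the overall score does too, with 1 again meaning perfect skill.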
Lastly, we examine the relationship between the models' performance scores in simulating CWC and H2O and their ECS values for both CMIP5 and CMIP6. We find that the model performance scores are correlated with ECS, with correlation coefficients of 0.27 for CMIP6 and 0.75 for CMIP5 (Figure 8). The positive correlations are primarily contributed by the correlations of the CWC performance scores at all levels with ECS (figure not shown). The models that more closely match the observed cloud and water vapor structures tend to have higher ECS than the models that deviate more from the observations, especially for CMIP5, consistent with many previous studies of the emergent constraints on ECS in CMIP5 (Fasullo & Trenberth, 2012; Sherwood et al., 2014; Su et al., 2014; Tian, 2015; Zhai et al., 2015).
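The correlation coefficients quoted above are of the kind returned by a standard Pearson correlation between the per-model overall scores and ECS values. The sketch below assumes that definition, which the text does not state explicitly, and uses made-up numbers rather than values from Figure 8; the function name is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    x_anom = x - x.mean()
    y_anom = y - y.mean()
    return float(np.sum(x_anom * y_anom)
                 / np.sqrt(np.sum(x_anom ** 2) * np.sum(y_anom ** 2)))

# Illustrative values only: overall scores and ECS (K) for five
# hypothetical models.
model_scores = np.array([0.45, 0.50, 0.60, 0.65, 0.70])
model_ecs = np.array([2.6, 3.4, 3.1, 4.2, 3.9])
r_value = pearson_r(model_scores, model_ecs)
explained_variance = r_value ** 2
```

For reference, in a linear fit the squared correlation gives the fraction of variance explained: r = 0.75 corresponds to about 56% and r = 0.27 to about 7%.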
The much weaker correlation between the model performance scores and ECS in CMIP6 than in CMIP5 is consistent with the results in Schlund et al. (2020), in that many emergent constraints derived from CMIP5 models do not correlate significantly with ECS in CMIP6. The reason for the weaker correlation is not clear. One explanation could be that the overall improvement in the model performances (and thus a smaller range in the scores) would tend to explain less variance of the ECS. Alternatively, the increased complexity in CMIP6 model systems may dilute the contribution of certain moist processes related to the water vapor and cloud climatology to the spread of ECS, as many more processes are at play. The physical mechanisms responsible for the different correlations are the subject of ongoing research.

Summary and Conclusions
Using satellite observations (CloudSat, AIRS, and MLS), we assess the simulated multi-year mean cloud and water vapor profiles from the historical simulations of 28 CMIP6 models. We use the grading scheme in Jiang et al. (2012) to quantitatively evaluate model performance in simulating clouds and water vapor at four pressure levels (from the boundary layer to the tropopause) over the tropical (30°N-30°S) oceans in terms of spatial mean, correlation and standard deviation. We find that both cloud water content (CWC) and water vapor volume mixing ratio (H2O) simulations have improved from CMIP5 to CMIP6 in the lower (900 hPa) and mid-troposphere (600 hPa), where the performance scores are significantly higher than those for the upper troposphere and near the tropopause. CMIP6 shows overall improvement in CWC and H2O at 100 hPa. However, little improvement is found at 215 hPa. A prevailing moist bias is found at 215 hPa for H2O over the tropical deep convective regions. The implication of this moist bias for water vapor feedback and cloud radiative effects merits further study.
In addition, we find that the models' equilibrium climate sensitivity (ECS) is positively correlated with the integrated CWC and H2O performance scores for both CMIP6 and CMIP5 models, but the correlation becomes much weaker in CMIP6. Quantitative evaluation of model simulations using best-available observations has direct relevance to constraining future warming magnitude. Research efforts are underway to disentangle the exact physical processes that underpin the apparent correlation between ECS and model performance scores and why the correlation weakens substantially in CMIP6.

Data Availability Statement
All the climate model data used for this research can be downloaded from the PCMDI website at https://pcmdi.llnl.gov/CMIP6/. The satellite observational datasets can be downloaded from https://airs.jpl.nasa.gov/ for AIRS data, http://www.cloudsat.cira.colostate.edu/ for CloudSat data and https://mls.jpl.nasa.gov/ for MLS data. For additional questions regarding the data sharing, please contact the corresponding author at Jonathan.H.Jiang@jpl.nasa.gov.