Land climate is important for the human population since it affects inhabited areas. Here we evaluate the realism of simulated evapotranspiration (ET), precipitation, and temperature over continental areas in the multimodel ensemble of the Coupled Model Intercomparison Project Phase 5 (CMIP5). For ET, a newly compiled synthesis data set prepared within the Global Energy and Water Cycle Experiment-sponsored LandFlux-EVAL project is used. The results reveal systematic ET biases in the CMIP5 simulations, with an overestimation in most regions, especially in Europe, Africa, China, Australia, Western North America, and part of the Amazon region. The global average overestimation amounts to 0.17 mm/d. This bias is more pronounced than in the previous CMIP3 ensemble (overestimation of 0.09 mm/d). Consistent with the ET overestimation, precipitation is also overestimated relative to existing reference data sets. We suggest that the identified biases in ET can explain the corresponding systematic biases in temperature in many of the considered regions. The biases additionally display a seasonal dependence and are generally of opposite sign (ET underestimation and temperature overestimation) in boreal summer (June–August).
 The multimodel climate experiments conducted as part of the different phases of the Coupled Model Intercomparison Project (CMIP) are essential contributions for the evaluation of past and future climate change in recent and on-going reports of the Intergovernmental Panel on Climate Change (IPCC) [e.g., Solomon et al., 2007]. It is thus of critical importance to evaluate the reliability of these simulations.
 While there exist good-quality observations for several climate variables, large uncertainties remain in the evaluation of land-surface fluxes [e.g., Seneviratne et al., 2010; Mueller et al., 2011; Jimenez et al., 2011; Sheffield et al., 2012; Seneviratne, 2012]. In particular, observations of evapotranspiration (ET) have traditionally been scarce, and global-scale data sets can only be derived indirectly. Nonetheless, there have been substantial advances in this research area in recent years, and new data sets are now available, each combining observational information and numerical algorithms in alternative ways [Mueller et al., 2011; Jimenez et al., 2011, see www.iac.ethz.ch/url/research/LandFlux-EVAL]. As part of the Global Energy and Water Exchanges Project (GEWEX)-sponsored LandFlux activity, the LandFlux-EVAL project recently compiled several synthesis data sets based on existing global ET data sets [Mueller et al., 2013], providing for the first time the “best-guess” estimate of this quantity.
 In the present article, we use one of these recently compiled data sets to evaluate the realism of land ET fields in simulations of the fifth phase of the CMIP project (CMIP5), which serve as the basis for the fifth assessment report of the IPCC. These analyses are complemented with comparisons to the earlier CMIP3 multimodel ensemble (which served as the basis for the fourth assessment report of the IPCC). Several recent studies have shown that soil moisture and ET have a strong influence on temperature in many regions of the world [Seneviratne et al., 2006; Koster et al., 2006; Mueller and Seneviratne, 2012], since the latent heat flux associated with ET uses a substantial fraction of the available net radiation on land. Whenever ET is reduced, the corresponding energy is instead used by the sensible heat flux, leading to higher air temperatures. Because of this strong relation between evaporation and temperature, we also validate temperature fields over land and assess where their biases can be related to biases in ET. We also include an evaluation of precipitation in the CMIP simulations in order to investigate the causes of the ET biases.
2 Data and Methods
2.1 Observational Estimates
 As reference ET data, we use here the recently compiled LandFlux-EVAL synthesis product based on all data set categories and computed for the years 1989–1995 [see Mueller et al., 2013]. A total of 40 observation-based data sets, land-surface model output, and atmospheric reanalyses are included in this synthesis product. The statistics used are the median and interquartile range from the long-term mean or monthly values. Note that the median of most included data sets agreed well with observational estimates of precipitation minus runoff for large river basins [Mueller et al., 2011].
 We further use combined precipitation data from the Global Precipitation Climatology Project (GPCP) [Adler et al., 2003] and precipitation from the Climate Research Unit (CRU) [Mitchell and Jones, 2005] Version 3.1 as a reference for the analysis of the CMIP simulations. For temperature, we use the Willmott and Matsuura data set [Willmott and Robeson, 1995, University of Delaware] based on Global Historical Climatology Network and Legates and Willmott data [Legates and Willmott, 1990], as well as data from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis ERA-Interim [Dee et al., 2011].
2.2 CMIP3 and CMIP5 Simulations
 Simulations of 14 models contributing to the CMIP5 [Taylor et al., 2012] and 11 models contributing to the CMIP3 multimodel experiments have been used (see Table 1 for details). We evaluate the variable “surface latent heat flux” (i.e., ET, including both evaporation and sublimation), near-surface air temperature, and precipitation from the historical CMIP5 and the twentieth century CMIP3 simulations, respectively. All data have been linearly interpolated to a 1° × 1° resolution, corresponding to the resolution of the LandFlux-EVAL synthesis ET product.
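 Since the models provide ET as a surface latent heat flux (in W/m²) while the biases in this article are quoted in mm/d, a unit conversion is implied. The following is a minimal sketch of such a conversion, assuming a constant latent heat of vaporization of 2.45 MJ/kg; this value and the function names are illustrative assumptions and are not taken from the article, and sublimation would strictly require a different constant.

```python
SECONDS_PER_DAY = 86_400.0
# Assumed constant latent heat of vaporization (J/kg); not specified in the article.
LATENT_HEAT_VAPORIZATION = 2.45e6


def latent_heat_flux_to_et(le_w_m2: float, lam: float = LATENT_HEAT_VAPORIZATION) -> float:
    """Convert a latent heat flux (W/m^2) to ET in mm/d.

    1 kg of water per m^2 corresponds to a water depth of 1 mm.
    """
    return le_w_m2 * SECONDS_PER_DAY / lam


def et_to_latent_heat_flux(et_mm_d: float, lam: float = LATENT_HEAT_VAPORIZATION) -> float:
    """Inverse conversion: ET in mm/d to latent heat flux in W/m^2."""
    return et_mm_d * lam / SECONDS_PER_DAY
```

 Under this assumption, a flux of about 28.4 W/m² corresponds to 1 mm/d, so an ET bias of 0.17 mm/d is equivalent to roughly 4.8 W/m².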
Table 1. List of Included CMIP Models, Their Original Resolution, and the Respective Root-Mean-Square Error (From Monthly Values) of ET, Precipitation, and Temperature (Excluding Greenland and Antarctica for ET) During 1989–1995 [table body not recovered; surviving row labels: Mean of CMIP5, Mean of CMIP3]
3.1 Biases in ET and Precipitation
 Figure 1 (top row) displays the difference in ET between the CMIP simulations and the LandFlux-EVAL synthesis product for annual means (Figure 1, left column: CMIP3 and Figure 1, middle column: CMIP5) and June to August means (JJA; Figure 1, right column: CMIP5; see Figure S1 in the supporting information for relative differences). Regions where at least 66% of the models (i.e., 10 out of 14 for CMIP5 and 8 out of 11 for CMIP3) agree on the sign of the difference are overlaid with stripes. In the annual average, the CMIP simulations display an overall systematic overestimation of ET in most regions, in particular in Europe, Africa, China, Australia, Western North America, and part of the Amazon region. This bias is slightly more pronounced in CMIP5 than in the previous CMIP3 ensemble (Figure 1; see also Mueller et al.). India is one of the few regions where the CMIP simulations are too dry. The CMIP5 simulations display consistent (and thus shared) biases in more regions than the CMIP3 simulations.
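 The sign-agreement criterion used for the striping can be sketched as follows. This is an illustrative reading of the 66% threshold applied to a single grid cell; the function names are hypothetical, and the treatment of exactly zero differences (counted toward neither sign) is an assumption.

```python
import math
from typing import Sequence


def min_models_for_agreement(n_models: int, threshold: float = 0.66) -> int:
    """Smallest number of models that reaches the agreement threshold."""
    return math.ceil(threshold * n_models)


def sign_agreement(diffs: Sequence[float], threshold: float = 0.66) -> bool:
    """True if at least `threshold` of the models share the sign of the bias.

    `diffs` holds one (model minus reference) difference per model for a
    single grid cell. Zero differences count toward neither sign.
    """
    if not diffs:
        raise ValueError("at least one model difference is required")
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    return max(n_pos, n_neg) / len(diffs) >= threshold
```

 With this reading, the threshold corresponds to 10 of 14 models for CMIP5 and 8 of 11 models for CMIP3, matching the counts given in the text.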
 A more in-depth evaluation of the CMIP5 and CMIP3 ET fields (annual means) for 25 regions based on the definition of the IPCC Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX) [see Seneviratne et al., 2012] is provided in Figure 2. This regional analysis compares the respective spreads of the CMIP3 and CMIP5 ensembles (box plots) with the median (full lines) and interquartile range (dotted lines) of the LandFlux-EVAL synthesis product (red lines for CMIP5 underestimation, blue lines for overestimation). The overall systematic overestimation of ET is confirmed for most considered regions. CMIP simulations (medians) are consistent with the LandFlux-EVAL estimates (interquartile ranges) in 16 (CMIP3) and 12 (CMIP5) out of the 25 regions. The intermodel spread is on average similar for the CMIP3 and the CMIP5 ensembles, indicating no systematic improvement in the newer multimodel ensemble in terms of ET. This is confirmed by the average root-mean-square errors for the respective ensembles (see Table 1).
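 The consistency check between the ensemble median and the reference interquartile range can be illustrated with a short sketch. The function name, the category labels, and the example values are hypothetical; the sketch only assumes that a regional ensemble counts as consistent when its median lies within the reference interquartile range, bounds included.

```python
from statistics import median
from typing import Sequence


def classify_region(model_values: Sequence[float], ref_q25: float, ref_q75: float) -> str:
    """Compare the ensemble median with the reference interquartile range.

    `model_values` holds one regional-mean ET value per model (mm/d);
    `ref_q25` and `ref_q75` bound the interquartile range of the reference.
    """
    ensemble_median = median(model_values)
    if ensemble_median > ref_q75:
        return "overestimation"
    if ensemble_median < ref_q25:
        return "underestimation"
    return "consistent"


# Hypothetical values for one region, for illustration only.
print(classify_region([1.9, 2.1, 2.3], ref_q25=1.5, ref_q75=2.0))
```

 Applying such a check to each of the 25 regions and counting the "consistent" cases would give tallies of the kind reported above.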
 To assess the possible causes and implications of this systematic bias, we also compare the CMIP5 and CMIP3 simulations with observational data sets of precipitation and temperature in Figure 1 (middle and bottom rows). This analysis reveals a systematic overestimation of precipitation in the CMIP simulations in the annual mean values (Figure 1, left and middle columns, see also Figures S2 and S3) in most of the regions displaying a systematic overestimation of ET, thus highlighting precipitation biases as a likely driver of the ET biases. Indeed, most of these regions are characterized by a soil moisture-limited ET regime (Figure S4), and thus additional precipitation is generally used by ET in these regions [e.g., Teuling et al., 2009; Seneviratne et al., 2010]. Nonetheless, a few tropical regions display an underestimation of precipitation despite an overestimation of ET, which is plausible in regions where ET is not limited by water availability, but rather by the incoming net radiation [Nemani et al., 2003]. It should also be noted that a few regions display systematic dry biases rather than systematic wet biases (i.e., underestimation of ET and precipitation), such as India, the Mississippi River Basin, and Argentina.
 Hence, though there are some exceptions in a few regions, the identified overestimation of ET over a large fraction of land areas is found to be driven by a corresponding overestimation of precipitation in these regions. In a global average, the overestimation of ET amounts to 0.17 mm/d (CMIP5) and 0.09 mm/d (CMIP3). The biases for precipitation are similar (0.16 mm/d for CMIP5 and 0.05 mm/d for CMIP3) when GPCP is employed as a reference data set. However, precipitation in GPCP is relatively high compared to other reference data sets. For instance, the precipitation biases of the CMIP5 and CMIP3 ensembles relative to the Climate Research Unit (CRU) precipitation data set are 0.27 mm/d and 0.22 mm/d, respectively. Overall, a higher overestimation of precipitation than of ET appears consistent with the assumption that the former drives the latter (although the presence of feedback cannot be excluded).
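 For readers who wish to reproduce a global land average on a regular latitude-longitude grid, the following sketch weights each cell by the cosine of its latitude. The article does not state how the global averages were computed, so the weighting, the function name, and the use of None to mark non-land cells are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def area_weighted_mean(
    field: Sequence[Sequence[Optional[float]]], lats_deg: Sequence[float]
) -> float:
    """Cosine-latitude weighted mean of a gridded field.

    `field[i][j]` is the value at latitude `lats_deg[i]` and longitude index j;
    cells set to None (e.g., ocean cells) are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))  # proportional to grid cell area
        for value in row:
            if value is None:
                continue
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no valid grid cells to average")
    return weighted_sum / weight_total
```

 An unweighted mean over a 1° × 1° grid would overrepresent high-latitude cells, which cover less area than cells near the equator.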
3.2 Possible Relation to Temperature Biases
 The evaluation of temperature in the CMIP simulations (Figure 1, bottom row) suggests a tendency for a systematic cold bias in the multimodel ensembles (0.4 and 1.3 K colder than the reference on a global average for CMIP5 and CMIP3, respectively). The temperature biases are even more pronounced when compared to a different reference data set (ERA-Interim, not shown). In regions with water-limited regimes (see Figure S4), the underestimation of temperature is likely related to the identified systematic ET overestimation, linked with enhanced evaporative cooling in the model simulations [e.g., Seneviratne et al., 2010]. For similar reasons, it is likely that the temperature overestimation in South America (Figure 1, bottom left and middle) is related to the ET underestimation there (Figure 1, top left and middle), i.e., to an underestimation of evaporative cooling. One of the few regions where the temperature and ET biases are not of opposite sign is India, where previous studies disagreed on the existence of strong soil moisture-temperature coupling [Koster et al., 2006; Mueller and Seneviratne, 2012].
 Analyses of ET and temperature biases for Northern Hemispheric summer (JJA) are provided in Figure 1 (right column). Interestingly, these analyses reveal biases of opposite sign compared to those on annual time scale in some regions (ET underestimation and temperature overestimation in most regions during JJA) and more pronounced biases in others. Results for the boreal winter are provided in Figure S5. They are generally more consistent with the annual biases but also display a tendency for low ET and high temperatures in Southern Hemisphere midlatitudes.
 As has been shown in recent studies [Boberg and Christensen, 2012; Christensen and Boberg, 2012], CMIP5 simulations display a tendency to overestimate warm temperatures, which is consistent with our seasonal results despite the cold and wet biases identified on annual time scale. Quantile-quantile plots of ET and temperature from CMIP5 versus reference data sets (Figures 3 and S6) show an overestimation of ET in the wettest months as well as an underestimation (overestimation) of temperature in cold (warm) months. A tendency to underestimate ET in dry months is present in a few regions only, which may be due to the bounded lower limit of ET (zero flux). We note that most CMIP5 models display these patterns, but that they are more pronounced in some of the models. These analyses confirm the presence of systematic biases in both the ET and temperature distributions in most regions.
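 The quantile-quantile comparison can be sketched as below, pairing matching quantiles of the monthly model and reference values. This is only an illustration: the function name, the choice of deciles, and the "inclusive" interpolation method of Python's statistics module are assumptions and not details taken from the article.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def qq_pairs(
    model: Sequence[float], reference: Sequence[float], n: int = 10
) -> List[Tuple[float, float]]:
    """Return (reference quantile, model quantile) pairs at n - 1 cut points."""
    q_model = quantiles(model, n=n, method="inclusive")
    q_ref = quantiles(reference, n=n, method="inclusive")
    return list(zip(q_ref, q_model))


# Hypothetical monthly values (mm/d), for illustration only.
reference = [float(i) for i in range(1, 13)]
model = [x + 0.5 for x in reference]
for q_ref, q_model in qq_pairs(model, reference):
    # Points above the 1:1 line (q_model > q_ref) indicate overestimation.
    print(q_ref, q_model)
```

 In such a plot, an overestimation confined to the upper quantiles would correspond to the wettest or warmest months, whereas a uniform offset would shift all points by the same amount.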
4 Summary and Conclusions
 This article provides an evaluation of ET in the recent CMIP5 and CMIP3 multimodel ensembles. These first analyses reveal systematic land hydrological and climate biases in the CMIP simulations, characterized by an overestimation of ET in the yearly average in many land regions. These biases can be generally related to a corresponding overestimation of precipitation in most of the identified regions, which is driving the ET bias. Concomitantly, the excess evaporative cooling leads to a cold bias in most of the identified regions on the annual time scale. These biases are generally found in regions with soil moisture-limited ET regimes, with a few exceptions, and tend to change sign in the warm season in midlatitude regions.
 The possible underlying causes of the identified biases, as well as the consequences for CMIP5 projections, would need to be investigated in follow-up studies. Direct interaction between precipitation (clouds) and temperature may also be possible. Furthermore, the direction of causality cannot always be fully established, and it is possible that in some regions the diagnosed biases are unrelated. For instance, in energy-limited regions, one would expect the opposite relationship, i.e., temperature biases would cause ET biases (of the same sign rather than opposite sign). However, these results highlight the presence of remaining systematic biases in state-of-the-art climate models, which are similar in magnitude for the CMIP5 and the CMIP3 ensembles (and even larger for the former). Previous studies pointed out that extensive model developments were undertaken between the two series of experiments [e.g., Knutti and Sedlacek, 2013]. Our results suggest that these new model developments and ensembles did not necessarily lead to a better performance for key variables such as ET, precipitation, and temperature over land.
 This work was funded by the European Commission's 7th Framework Programme, under grant agreement 282672, EMBRACE project. We also thank the WCRP GEWEX project for support of the LandFlux initiative. The LandFlux-EVAL project (http://www.iac.ethz.ch/url/research/LandFlux-EVAL) was additionally supported by the Integrated Land Ecosystem-Atmosphere Process Study (ILEAPS). We acknowledge the WCRP's Working Group on Coupled Modelling and the climate modeling groups for producing and making available their model output. We also thank Jan Sedlacek, Urs Beyerle, and Thierry Corti (IAC ETH and C2SM) for support with downloading, storage, and regridding of CMIP5 data. GPCP combined precipitation data were developed by the NASA/Goddard Space Flight Center's Laboratory for Atmospheres as a contribution to the GEWEX Global Precipitation Climatology Project. We acknowledge the ECMWF for the dissemination of ERA-Interim data. CRU data were obtained from the University of East Anglia Climate Research Unit, British Atmospheric Data Centre, 2008, available from http://badc.nerc.ac.uk/data/cru. Willmott air temperature (University of Delaware) data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.
 The Editor thanks Ray Anderson and an anonymous reviewer for their assistance in evaluating this paper.