• Open Access

Evaluation of vegetation cover and land-surface albedo in MPI-ESM CMIP5 simulations

Authors


Corresponding author: V. Brovkin, Max Planck Institute for Meteorology, Bundesstr. 53, 20146 Hamburg, Germany. (victor.brovkin@zmaw.de)

Abstract

[1] In recent-generation Earth system models (ESMs), land-surface grid cells are represented as tiles covered by different plant functional types such as trees or grasses. Here, we present an evaluation of the vegetation-cover module of the ESM developed at the Max Planck Institute for Meteorology in Hamburg, Germany (MPI-ESM), for present-day conditions. The vegetation continuous fields (VCF) product, based on satellite observations in 2001, is used to evaluate the fractional distributions of woody vegetation cover and bare ground. The model performance is quantified using two metrics: the square of the Pearson correlation coefficient, r2, and the root-mean-square error (RMSE). On a global scale, r2 and RMSE of modeled tree cover are 0.61 and 0.19, respectively, which we consider satisfactory. The model simulates tree cover and bare ground with higher r2 for the Northern Hemisphere (0.66) than for the Southern Hemisphere (0.48–0.50). We complement this analysis with an evaluation of the simulated land-surface albedo in terms of the difference in net surface solar radiation. On a global scale, the correlation between modeled and observed albedos is high in all seasons, whereas the main disagreement occurs in spring in the high northern latitudes. This discrepancy can be attributed to the high sensitivity of land-surface albedo to the simulated snow cover and to the snow-masking effect of trees. By contrast, the tropics are characterized by very high correlation and relatively low RMSE (5.4–6.5 W/m2) in all seasons. The presented approach could be applied to evaluate the vegetation cover and land-surface albedo simulated by other ESMs.

1. Introduction

[2] Terrestrial ecosystems substantially affect near-surface heat and moisture fluxes as well as the exchange of greenhouse gases between the land surface and the atmosphere. Land-surface models (LSMs) are nowadays considered a standard component of atmospheric models. LSMs have been continually developed from the very simplified concepts used in the 1980s toward current-generation models that include more ecological processes, such as the effect of climate and CO2 changes on the composition of land vegetation [Arora, 2002; Sellers et al., 1997]. Models that include the transient response of vegetation cover to climate change (dynamic global vegetation models (DGVMs)) were first developed as stand-alone models without feedback to climate [Cramer et al., 2001; Sitch et al., 2003]. DGVM approaches were quickly adopted into LSMs to calculate interactive vegetation cover in response to the climate change simulated by atmospheric models. Changing fractions of woody and herbaceous vegetation cover, as well as of bare land, affect land-surface albedo and evapotranspiration, which, in turn, modify near-surface temperature and precipitation. These land-atmosphere interactions are pronounced in several large-scale “hot spots” [Claussen et al., 2004]: high northern latitudes, where the snow-masking effect of forests leads to additional warming compared with herbaceous cover; tropical forests, where extensive transpiration enhances atmospheric moisture recycling; and subtropical deserts, where the presence of vegetation cover shifts the climate toward moister conditions. Some local-scale feedbacks, such as the formation of vegetation patterns in semideserts, are not yet represented in global models but have been proposed for inclusion [Rietkerk et al., 2011].

[3] The vegetation-cover composition affects not only land biophysics but also land-atmosphere exchange of CO2 and other greenhouse gases such as CH4 and N2O. In particular, changes in tree cover strongly affect the amount of carbon stored in biomass and soil. This has an effect on the atmospheric CO2 concentration and operates as a biogeochemical feedback between vegetation dynamics and climate [Bathiany et al., 2010; Port et al., 2012]. Feedbacks between forests and atmospheric chemistry through emissions of volatile organic compounds such as isoprene are an emerging research field as well [e.g., Arneth et al., 2010].

[4] Through biophysical and biogeochemical effects, changes in vegetation cover affect the simulated climate in future climate projections, such as those performed in the framework of the Coupled Model Intercomparison Project Phase 5 (CMIP5). Several CMIP5 Earth system models (ESMs) include dynamic vegetation-cover models [Collins et al., 2011; Watanabe et al., 2011]. These models simulate fractions of plant functional types (PFTs) in response to the climate simulated by the atmospheric models. Since ESMs inevitably simulate temperature and precipitation fields that differ from observations, vegetation patterns in ESMs differ from those simulated by vegetation models driven by observed climate data (stand-alone models). The latter also have biases, as the parameterizations of vegetation dynamics in these models are far from perfect.

[5] These biases need to be quantified. Quantitative assessment of LSM performance is part of the long-term modeling strategy. Several projects, such as the Project for Intercomparison of Land-surface Parameterization Schemes (PILPS), are dedicated to the intercomparison of LSMs [Schlosser et al., 2000; Slater et al., 2001]. More recently, LSMs have been assessed quantitatively through benchmarking, which uses scoring metrics that express the simulation quality of numerous aspects of the vegetation cover as a single scalar value. Scoring metrics have been suggested for physical characteristics of LSMs [Abramowitz et al., 2008], for the terrestrial carbon cycle and hydrology [Blyth et al., 2011; Randerson et al., 2009], and for atmospheric CO2 observations [Cadule et al., 2010].

[6] The aim of this study is to evaluate the vegetation cover and land-surface albedo simulated by the ESM developed at the Max Planck Institute for Meteorology in Hamburg, Germany (MPI-ESM). As the outcome of a model evaluation is always specific to the model version and resolution, we focus here on the CMIP5 model version at spatial resolution T63 (1.9° × 1.9°), which was used for future climate and vegetation-cover projections.

2. Methods

2.1. Atmosphere-Ocean Model

[7] We used MPI-ESM in its low-resolution configuration. It includes the atmospheric model ECHAM6 at T63 resolution with 47 vertical levels, described by Stevens et al. [2013], the ocean model MPI-OM at approximately 1.6° resolution with 40 vertical layers [Jungclaus et al., 2006], and the LSM JSBACH [Raddatz et al., 2007], which shares the horizontal grid of the atmospheric model. All MPI-ESM components are coupled directly, without flux adjustment. A detailed description of the model and an evaluation of its performance regarding temperature and precipitation fields are given by M. A. Giorgetta et al. (Climate change from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project 5, submitted to Journal of Advances in Modeling Earth Systems, 2012).

2.2. Land-Surface Model

[8] The LSM of MPI-ESM, JSBACH, simulates fluxes of energy, water, momentum, and CO2 between land and atmosphere. The modeling concept is based on a tiled (fractional) structure of the land surface. Each land grid cell is divided into tiles covered with eight natural PFTs (i.e., different types of trees, shrubs, and grasses) and four anthropogenic PFTs (crop and pasture types; see Table 1). Two types of bare surface are taken into account: seasonally bare soil and permanently bare ground (i.e., desert).

Table 1. A Correspondence Between JSBACH and VCF Vegetation Classes
JSBACH PFT                            VCF Type
Tropical broadleaf evergreen trees    Tree
Tropical broadleaf deciduous trees    Tree
Extratropical evergreen trees         Tree
Extratropical deciduous trees         Tree
Deciduous shrubs                      Tree
C3 grass                              Grass
C4 grass                              Grass
C3 crop                               Grass
C4 crop                               Grass
C3 pasture                            Grass
C4 pasture                            Grass

[9] Land-surface albedo is calculated separately for visible and near-infrared solar radiation. The calculation considers the fractional cover of each PFT, the desert fraction, the leaf area index of each PFT, and snow on the soil as well as on the vegetation canopy. The masking of underlying snow by forest canopies is accounted for. A detailed description of the albedo scheme is given by Otto et al. [2011], except for the albedo of snow-covered surfaces, which is described by Dickinson et al. [1993].

[10] The vegetation model in JSBACH includes an efficient module for vegetation dynamics [Brovkin et al., 2009]. It is based on the assumption that competition between different PFTs is determined by their relative competitiveness expressed in annual net primary productivity, as well as natural and disturbance-driven mortality (fire and wind disturbance).

[11] Anthropogenic land use is prescribed, including the extent of pasture and cropland. Transitions from natural to anthropogenic land cover and vice versa follow the New Hampshire harmonized land use protocol by Hurtt et al. [2011]. The dynamic vegetation model of JSBACH only affects the natural vegetation distribution and defines the type of pasture (C3 or C4). The implementation of the harmonization protocol into MPI-ESM is described by C. Reick et al. (The representation of natural and anthropogenic land cover change in MPI-ESM, submitted to Journal of Advances in Modeling Earth Systems, 2012).

2.3. Observations of Vegetation Cover Used for the Model Evaluation

[12] We chose the vegetation continuous fields (VCF) data set [Hansen et al., 2003, 2007], derived from moderate resolution imaging spectroradiometer (MODIS) data, for comparison with the model results. Two reasons were decisive for selecting the VCF product. First, the product provides global coverage at 1 km × 1 km resolution for a relatively recent time period (the inputs date from 31 October 2000 to 9 December 2001). Second, the VCF data set describes the land surface in fractions of vegetation-cover types: woody vegetation, herbaceous vegetation, and bare ground. This is very similar to the DGVM approach, which describes the vegetation cover in fractions of PFTs. The only difference is that DGVMs usually have more than two PFTs per grid cell. For comparison purposes, the model output is lumped into three broad classes: woody PFTs (trees and shrubs), herbaceous PFTs (grasses, crops, and pastures), and bare (nonvegetated) ground (see Table 1).
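Since Table 1 is a lookup from PFT to VCF class, the lumping step reduces to summing cover fractions per class. The Python sketch below illustrates this under stated assumptions: the PFT keys and the function `lump_to_vcf` are hypothetical names (not JSBACH output variables), all fractions are taken relative to the land area of the grid cell, and the bare class is computed as the unvegetated remainder so that the three classes sum to one.

```python
import numpy as np

# Hypothetical PFT keys mapped to VCF classes following Table 1
# (trees and shrubs -> woody "tree"; grasses, crops, pastures -> "grass").
PFT_TO_VCF = {
    "trop_evergreen_tree": "tree",
    "trop_deciduous_tree": "tree",
    "extratrop_evergreen_tree": "tree",
    "extratrop_deciduous_tree": "tree",
    "deciduous_shrub": "tree",
    "c3_grass": "grass",
    "c4_grass": "grass",
    "c3_crop": "grass",
    "c4_crop": "grass",
    "c3_pasture": "grass",
    "c4_pasture": "grass",
}


def lump_to_vcf(pft_fractions):
    """Sum PFT cover fractions (arrays over grid cells, each relative to the
    land area of the cell) into tree, grass and bare classes."""
    first = next(iter(pft_fractions.values()))
    classes = {"tree": np.zeros_like(first, dtype=float),
               "grass": np.zeros_like(first, dtype=float)}
    for pft, frac in pft_fractions.items():
        classes[PFT_TO_VCF[pft]] += frac
    # Bare ground is the unvegetated remainder; clip guards against rounding.
    classes["bare"] = np.clip(1.0 - classes["tree"] - classes["grass"], 0.0, 1.0)
    return classes
```

For example, a cell with 0.5 evergreen trees, 0.3 C3 grass, and 0.1 C3 crop yields tree = 0.5, grass = 0.4, and bare = 0.1.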

2.4. Observations of Albedo Used for the Model Evaluation

[13] MODIS surface albedo observations (MCD43C3, version 5) for 10 years (2001–2010) [Schaaf et al., 2002] are used for comparison with JSBACH results. The MODIS surface albedo has an absolute error on the order of 0.02, which is slightly higher over snow-covered areas. A brief summary of validation studies for the MODIS albedo product can be found in Liu et al. [2009]. The albedo observations are filtered according to the product quality flags so that only the best-quality observations enter the reference data set. The data are then reprojected onto the Gaussian T63 grid of the LSM. The monthly mean surface albedo and its variance are calculated from the 10 year time series for each grid cell. Ten years of observations is a relatively short period for estimating the climatological mean surface albedo, but the MODIS observations started only in 2001. On the other hand, vegetation cover and its characteristics can already change on subdecadal timescales [e.g., de Jong et al., 2012; Fensholt and Proud, 2012]. Such changes can significantly alter the surface albedo on decadal timescales and therefore affect climate [Govaerts and Lattanzio, 2008; Knorr et al., 2001; Loew and Govaerts, 2010; Myhre et al., 2005]. We analyzed the effect of sampling both the MODIS observations and the MPI-ESM simulations over shorter periods (5 years) and found results very similar to those based on the 10 year time series. For the simulated land-surface albedo, we were not limited by the length of the record and used the climatological mean of a 30 year period (1971–2000).
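The construction of the reference climatology (quality filtering, then a per-cell monthly mean and variance across years) can be sketched in Python as follows. This is an illustration under assumptions, not the processing chain used here: it assumes the albedo has already been reprojected onto the model grid and arranged as an array of shape (years, 12 months, grid cells), and that the quality-flag screening has been reduced to a hypothetical boolean mask `good_quality`. The variance is the population variance across years (NumPy's default `ddof=0`).

```python
import warnings

import numpy as np


def monthly_climatology(albedo, good_quality):
    """Monthly mean and interannual variance of surface albedo per grid cell.

    albedo:       array of shape (n_years, 12, n_cells)
    good_quality: boolean array of the same shape; False marks observations
                  rejected by the (hypothetical) quality screening.
    Returns (mean, variance), each of shape (12, n_cells); month/cell pairs
    without any valid observation are NaN.
    """
    masked = np.where(good_quality, albedo, np.nan)
    with warnings.catch_warnings():
        # All-NaN slices legitimately yield NaN; silence the RuntimeWarning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(masked, axis=0)      # average over years
        variance = np.nanvar(masked, axis=0)   # interannual variance
    return mean, variance
```

The 5 year sensitivity test would correspond to calling the same function on a slice, e.g. `monthly_climatology(albedo[:5], good_quality[:5])`.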

3. Evaluation of the Vegetation Cover

[14] Here, we evaluate the vegetation cover simulated by the fully interactive MPI-ESM as the ensemble mean of three CMIP5 historical simulations (1850–2005). For evaluation, we use the VCF product based on MODIS satellite data for the year 2001 [Hansen et al., 2007]. Consequently, we selected the vegetation cover simulated for the year 2001 from the historical simulations. Comparing matching years is especially important because anthropogenic land use, which significantly affects vegetation cover in many regions, was prescribed for the historical period. Interannual variability in the tree cover and bare ground fractions due to interannual climate variability is relatively small because vegetation dynamics are slow.

3.1. Evaluation Metrics

[15] Agreement between simulated and observed vegetation cover can be evaluated using several metrics. To quantify the spatial correlation between modeled and observed vegetation patterns, we use the square of the Pearson correlation coefficient (r2). In a linear approximation, this metric quantifies the fraction of the variance explained by the model:

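$$
r^2 = \frac{\left[\sum_{i=1}^{N} w_i \left(M_i - \bar{M}\right)\left(O_i - \bar{O}\right)\right]^2}{\sum_{i=1}^{N} w_i \left(M_i - \bar{M}\right)^2 \sum_{i=1}^{N} w_i \left(O_i - \bar{O}\right)^2}, \qquad \bar{M} = \sum_{i=1}^{N} w_i M_i, \quad \bar{O} = \sum_{i=1}^{N} w_i O_i,
$$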

where Mi and Oi are the vegetation fractions simulated by the model and observed, respectively, in grid cell i; wi is the normalized weight (area) of grid cell i (∑i wi = 1); and N is the total number of grid cells under evaluation.

[16] The amplitude of the difference between two data sets is measured using the root-mean-square error (RMSE):

$$
\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} w_i \left(M_i - O_i\right)^2} \tag{1}
$$

[17] The two metrics, r2 and RMSE, are calculated separately for each vegetation class as well as for different regions.
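Both metrics follow directly from the area-weighted sums defined above. The Python sketch below assumes that grid-cell areas (for example, those of the T63 Gaussian grid) are supplied as an array; the weights are normalized inside the functions so that they sum to one. The function names are illustrative.

```python
import numpy as np


def weighted_r2(model, obs, area):
    """Square of the area-weighted Pearson correlation between model and obs."""
    w = area / area.sum()                  # normalized weights, sum to 1
    m_anom = model - np.sum(w * model)     # deviations from weighted means
    o_anom = obs - np.sum(w * obs)
    cov = np.sum(w * m_anom * o_anom)
    return cov**2 / (np.sum(w * m_anom**2) * np.sum(w * o_anom**2))


def weighted_rmse(model, obs, area):
    """Area-weighted root-mean-square error, Eq. (1)."""
    w = area / area.sum()
    return np.sqrt(np.sum(w * (model - obs) ** 2))
```

Because r2 is insensitive to a constant offset or scaling between model and observations (a model equal to twice the observations still gives r2 = 1), it is complemented by RMSE, which measures the amplitude of the difference.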

[18] As the sum of all cover fractions (trees, grass, and bare ground, or nonvegetated fraction) is equal to one, only two out of the three vegetation classes are independent. For the evaluation, we chose the two classes: (i) fraction of trees and (ii) fraction of bare ground. The rationale for considering bare ground instead of grass cover is that bare ground plays an important role in the albedo-based feedback between vegetation cover and climate.

3.2. Global Evaluation

[19] The simulated tree cover fraction is in good overall agreement with the observations (Figure 1). The main patterns of boreal forest match the data. A noticeable disagreement occurs in the northern polar regions (above 60°N), where the simulated tree fraction is too high in West and East Siberia as well as in northwestern Canada (Figure 2, top). This mismatch has two likely causes. First, the vegetation model parameterizations, in particular the simple disturbance module and the absence of permafrost, produce too much tree cover in high latitudes. Second, the simulated climate is too warm in northwestern Canada (except in British Columbia, where the modeled annual mean surface-air temperature is 2°C–4°C lower than observed). The bare ground fraction is overestimated in the regions above 60°N, likely owing to the low productivity and reduced canopy cover simulated by the model.

Figure 1.

(top) Vegetation classes simulated by MPI-ESM for the year 2001 and (bottom) VCF [Hansen et al., 2007] upscaled to the model resolution: (left) tree cover and (right) bare ground, as grid cell fractions.

Figure 2.

Differences between the model and observations in (top) tree and (bottom) bare ground fractions.

[20] Tropical forests are also simulated reasonably well by the model. The main deficiencies are too low tree cover fractions in the Amazon region and too extensive tree cover in northeastern Brazil, the western Sahel, tropical and southern Africa, and Australia (Figure 2). Bare ground is underestimated in subtropical deserts, especially in central Asia (CEAS). A comparison of the zonal vegetation-cover distributions shows that the model slightly overestimates tree cover at all latitudes and underestimates bare ground in subtropical and temperate regions (Figure 3), but the general agreement between model and data is remarkable. The overestimation of tree cover is likely due to the simplified parameterization of tree-grass competition in the model and to missing processes such as permafrost.

Figure 3.

Comparison of zonal averages between MPI-ESM results for the year 2001 (blue) and VCF (red): (left) tree fraction and (right) bare ground fraction.

[21] The quality of the model simulations is quantified in Table 2. On a global scale, r2 and RMSE of tree cover are 0.61 and 0.19, respectively, which we consider satisfactory. The model simulates tree cover and bare ground with higher r2 for the Northern Hemisphere (0.66) than for the Southern Hemisphere (0.48–0.50). Grass cover, an intermediate class between tree cover and bare ground, is reproduced less reliably in both the Northern and Southern Hemispheres (r2 of 0.44 and 0.17, respectively).

Table 2. Evaluation of Simulated Vegetation Cover in Terms of r2 and RMSE
Region               Vegetation Class  r2    RMSE  N (Number of Grid Cells)
Global               Tree              0.61  0.19  4354
Global               Grass             0.39  0.24  4354
Global               Bare ground       0.65  0.22  4169
Northern Hemisphere  Tree              0.66  0.16  3514
Northern Hemisphere  Grass             0.44  0.23  3514
Northern Hemisphere  Bare ground       0.66  0.21  3413
Southern Hemisphere  Tree              0.48  0.25  840
Southern Hemisphere  Grass             0.17  0.24  840
Southern Hemisphere  Bare ground       0.48  0.21  756
Tropics              Tree              0.63  0.21  1556
Tropics              Grass             0.57  0.20  1556
Tropics              Bare ground       0.86  0.16  1429
Extratropics         Tree              0.57  0.16  2798
Extratropics         Grass             0.24  0.26  2798
Extratropics         Bare ground       0.43  0.27  2740

[22] Furthermore, we calculated an overall score including tree cover and bare ground for the tropics and extratropics (Table 3) by taking the mean of the metrics for both vegetation classes in both regions. For scoring, we use 1 − RMSE instead of RMSE, because it increases as the error decreases. Scaling both metrics to a maximum of 100 with equal weights yields scores of 62 for r2 and 80 for 1 − RMSE, and an overall score of 71 out of 100 for MPI-ESM (Table 3). This relative score could be used to compare different models or model versions with respect to their performance for these two vegetation classes.

Table 3. Evaluation of Vegetation Cover on the Global Scale
Region                  Vegetation Class  r2    RMSE
Tropics                 Tree              0.63  0.21
Tropics                 Bare ground       0.86  0.16
Extratropics            Tree              0.57  0.16
Extratropics            Bare ground       0.43  0.27
Weighted global                           0.62  0.20
Score (max: 100)                          62    80
Total score (max: 100)                    71
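The score arithmetic described in paragraph [22] can be reproduced from the Table 3 values. In this Python sketch, the four regional entries are combined with equal weights, as in the text; no area weighting is applied at this step.

```python
import numpy as np

# r2 and RMSE from Table 3, ordered as: tropics tree, tropics bare ground,
# extratropics tree, extratropics bare ground.
r2 = np.array([0.63, 0.86, 0.57, 0.43])
rmse = np.array([0.21, 0.16, 0.16, 0.27])

r2_score = 100 * r2.mean()                  # 62.25, reported as 62
rmse_score = 100 * (1 - rmse.mean())        # about 80, reported as 80
total_score = (r2_score + rmse_score) / 2   # about 71.1, reported as 71

print(round(r2_score), round(rmse_score), round(total_score))  # prints: 62 80 71
```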

3.3. Regional Evaluation

[23] For the regional evaluation, we used the 14 geographical regions of the Global Fire Emissions Database (GFED; see van der Werf et al. [2006]). The metrics r2 and RMSE were calculated separately for each region and are presented in Figure 4 and in Figure S1 in the supporting information for tree and bare ground cover, respectively. Tree cover in boreal Asia is reproduced well by the model, with an r2 of 0.67. North African tree cover is represented with the highest r2, 0.78. RMSE of bare ground is very low for two regions (6 and 10, Equatorial Asia and Northern Hemisphere South America, respectively); however, this result is not meaningful because the bare ground fraction in these regions is very low (Figure 4).

Figure 4.

A diagram of model-data agreement for tree and bare ground fractions. Statistics were computed for 15 regions (the 14 GFED regions and the globe), and a number was assigned to each region. The position of each number on the plot quantifies how closely the modeled vegetation patterns of that region match the observations (perfect agreement: r = 1, RMSE = 0). The region acronyms are as follows: Australia and New Zealand (AUST), boreal Asia (BOAS), boreal North America (BONA), central America (CEAM), central Asia (CEAS), Equatorial Asia (EQAS), Europe (EURO), Middle East (MIDE), Northern Hemisphere Africa (NHAF), Northern Hemisphere South America (NHSA), Southeast Asia (SEAS), Southern Hemisphere Africa (SHAF), Southern Hemisphere South America (SHSA), Temperate North America (TENA), and Global (GLOB).

4. Evaluation of the Land-Surface Albedo

[24] An evaluation of the land-surface albedo, αs, in a climate model should consider not the absolute albedo values but their significance for the surface radiation budget. For example, in polar winter there is little incoming solar radiation, SWdown, and, therefore, albedo differences in these regions are of limited consequence for the surface radiation budget. Hence, it is more appropriate to analyze the net surface solar radiation:

$$
Q_{SW} = SW_{down} - SW_{up} = \left(1 - \alpha_s\right) SW_{down}, \tag{2}
$$

where SWup is the upward flux of solar radiation. A climatological mean seasonal cycle of SWdown derived from MPI-ESM simulations for 2001–2005 was used to weight the surface albedo. Alternatively, SWdown estimates from long-term satellite records could have been used instead of the MPI-ESM output. Hagemann et al. [2013] analyze the accuracy of the simulated SWdown fields in detail and conclude that MPI-ESM does not differ significantly from existing observational data sets. Therefore, we used the simulated SWdown fields for both the observed and the modeled albedo values.
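Eq. (2) implies that, for a given SWdown, an albedo difference translates into a QSW difference of opposite sign, scaled by the incoming radiation. The minimal Python sketch below assumes that albedo and SWdown are given as arrays (or scalars) on the same grid and months; the function names are illustrative.

```python
import numpy as np


def net_shortwave(albedo, sw_down):
    """Net surface solar radiation Q_SW = (1 - albedo) * SW_down, Eq. (2)."""
    return (1.0 - np.asarray(albedo)) * np.asarray(sw_down)


def qsw_difference(albedo_model, albedo_obs, sw_down_model):
    """Model-minus-observation difference in Q_SW, with both albedos weighted
    by the same simulated SW_down climatology."""
    return (net_shortwave(albedo_model, sw_down_model)
            - net_shortwave(albedo_obs, sw_down_model))
```

An albedo overestimate of 0.1 under 200 W/m2 of incoming radiation gives a QSW difference of −20 W/m2, whereas the same albedo error under zero incoming radiation (polar night) gives no difference at all.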

4.1. Evaluation Metrics

[25] As for vegetation cover, r2 and RMSE were used for the analysis of spatial correlation and amplitude of difference between the model and observations. The surface albedo shows a higher temporal variability than the vegetation cover. Therefore, an additional evaluation metric is applied, which takes into account the different temporal dynamics of surface albedo in different regions. A normalized error variance (e2) is calculated according to Reichler and Kim [2008] as follows:

$$
e^2 = \sum_{i=1}^{N} w_i \, \frac{\left(\bar{Q}_{m,i} - \bar{Q}_{o,i}\right)^2}{\sigma_{o,i}^2}, \tag{3}
$$

where $w_i$ is the area weight of grid cell i ($\sum_i w_i = 1$); $\sigma_{o,i}^2$ is the interannual variance of the observations in grid cell i; $\bar{Q}_{m,i}$ and $\bar{Q}_{o,i}$ are the values of the variable (net surface solar radiation) averaged over the analysis period in grid cell i in the model and the observations, respectively; and N is the total number of land grid cells.
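Eq. (3) reduces to a single weighted sum. The Python sketch below assumes that the time-mean model and observed QSW fields and the observed interannual variance are already available for each land grid cell; it does not guard against cells with zero observed variance, which would have to be masked beforehand. The function name is illustrative.

```python
import numpy as np


def normalized_error_variance(model_mean, obs_mean, obs_interannual_var, area):
    """Normalized error variance e^2 as written in Eq. (3)."""
    w = area / area.sum()  # area weights normalized to sum to 1
    return np.sum(w * (model_mean - obs_mean) ** 2 / obs_interannual_var)
```

For an identical model-observation difference, halving the observed interannual variance doubles a cell's contribution to e2, which is why regions with small observed variability can obtain large e2 values.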

4.2. Global Evaluation

[26] Maps of the difference between model and observations in net surface solar radiation, QSW, for the four seasons are presented in Figure 5. Bluish (reddish) colors correspond to regions where the simulated surface albedo is higher (lower) than observed. In the high latitudes of the Northern Hemisphere, the differences between model and data are most pronounced during boreal winter (December–January–February (DJF)) and spring (March–April–May (MAM)) (Figure 5, top). The albedo in this period is dominated by the snow cover and its masking by the tree cover [e.g., Bonan, 2008; Essery et al., 2009]. In the northwestern part of North America (British Columbia), the model underestimates the tree cover fraction (Figure 2, top). This likely explains the too-high modeled albedo, which results in an underestimation of net surface solar radiation in this region in all seasons (Figure 5). In the regions above 60°N, the model generally overestimates tree cover (Figure 2, top). In boreal winter, this does not influence the surface radiation budget because there is too little solar radiation. In MAM, the radiation flux is much more significant. The patterns of too-low surface albedo in North America (the Canadian territory of Nunavut) and Eurasia (West Siberia) are complemented by regions with too-high surface albedo (Alaska, East Siberia). These mixed patterns could be explained by the interplay of too-high tree cover (West Siberia, Nunavut) and too-high snow albedo (e.g., East Siberia). In summer, the land-surface albedo is overestimated by the model almost everywhere except in the Sahel and some desert regions in Asia and Australia.

Figure 5.

The difference between model and observations in the net surface radiation flux QSW (W/m2) averaged for different seasons (top left, DJF; top right, MAM; bottom left, June–July–August (JJA); bottom right, SON). Areas with statistically significant differences (p < 0.05) are stippled.

[27] The correlation between simulated and observed albedos on global and hemispheric scales is high (Table 4). The tropics are characterized by high correlation and relatively low RMSE (5.4–6.5 W/m2) in all seasons. The albedo has the highest r2 during boreal winter on the global scale and in the Southern Hemisphere. During boreal spring (MAM), the correlation is lowest on both global and hemispheric scales. This presumably reflects the albedo biases induced by the biases in snow and tree cover in high and temperate northern latitudes discussed above. Boreal autumn (September–October–November (SON)) is characterized by the lowest RMSE in the Northern Hemisphere and on the global scale, and by the highest correlation in the Northern Hemisphere.

Table 4. Evaluation of Land-Surface Albedo on the Global Scale
                     r2                          RMSE (W/m2)
Region               DJF   MAM   JJA   SON       DJF   MAM   JJA   SON
Global               0.99  0.86  0.90  0.98      11.1  12.3  12.6  7.4
Northern Hemisphere  0.88  0.84  0.89  0.97      9.2   14.3  14.3  6.7
Southern Hemisphere  1.00  0.81  0.85  0.99      14.4  5.2   5.4   8.7
Tropics              0.93  0.97  0.98  0.96      6.5   6.2   5.5   5.4

4.3. Regional Evaluation

[28] Our model generally overestimates albedo in all seasons. The box-and-whisker plots in Figure 6, therefore, show a slightly negative net shortwave radiation bias for most regions. The bias is most pronounced in boreal North America (BONA) in spring (MAM), when the model substantially underestimates the net radiation flux (Figure 6). The scores of the 14 GFED regions are presented in a Taylor diagram [Taylor, 2001] in Figure 7 and in table format in Figure S2. Here, the model variance is normalized by the variance of the observations to make the results for different regions more comparable. The BONA region has the lowest r2 and the highest RMSE, both in absolute units (W/m2) and relative to the net radiation flux (RMSE/QSW). The region has very low correlation coefficients r in all seasons, although CEAS, Southeast Asia (SEAS), and Europe show even lower correlation coefficients in single seasons. The e2 metric provides a single error value for each region and allows the regions to be ranked relative to each other. The results shown in Figure 8 indicate that four regions (BONA, central America, CEAS, and SEAS) have relatively high errors, owing to pronounced differences in these regions throughout the annual cycle. BONA, for instance, shows its maximum e2 in summer. During this season, the observed interannual albedo variability is small, resulting in relatively large e2 values compared with other regions.
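The quantities displayed in a normalized Taylor diagram can be computed as below. This Python sketch assumes that the statistics are taken over the grid cells of one region in one season, with area weights; it is not the plotting code behind Figure 7. It returns the model standard deviation normalized by the observed one, the correlation coefficient r, and the centered RMS difference normalized by the observed standard deviation. These satisfy Taylor's [2001] relation E'^2 = σm^2 + σo^2 − 2σmσo r, which is what allows all three to be shown on one diagram.

```python
import numpy as np


def taylor_statistics(model, obs, area):
    """Area-weighted statistics for a normalized Taylor diagram.

    Returns (normalized model std, correlation r, normalized centered RMSD).
    """
    w = area / area.sum()                  # normalized area weights
    m_anom = model - np.sum(w * model)     # anomalies from weighted means
    o_anom = obs - np.sum(w * obs)
    sd_m = np.sqrt(np.sum(w * m_anom**2))
    sd_o = np.sqrt(np.sum(w * o_anom**2))
    r = np.sum(w * m_anom * o_anom) / (sd_m * sd_o)
    crmsd = np.sqrt(np.sum(w * (m_anom - o_anom) ** 2))
    return sd_m / sd_o, r, crmsd / sd_o
```

A perfect model plots at a normalized standard deviation of 1, r = 1, and zero centered RMS difference.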

Figure 6.

Box-and-whisker plots for the difference between model and observations in the net surface radiation flux QSW (W/m2) for different regions. For season acronyms, see Figure 5; and for region acronyms, see Figure 4.

Figure 7.

Taylor diagram of model-data agreement for the net surface radiation flux QSW (W/m2) for different regions and seasons. The region and season acronyms are as in Figures 4 and 5, respectively. Global average is shown in red.

Figure 8.

Normalized error (e2) for the net surface radiation flux QSW (W/m2) for different regions and seasons. The region and season acronyms are as in Figures 4 and 5, respectively. Global average is shown in red.

5. Discussion and Conclusions

[29] The vegetation-cover dynamics simulated by MPI-ESM have been evaluated for present-day conditions. The patterns generated by the model generally agree well with the data; however, the model tends to overestimate tree cover and to underestimate bare ground. The model simulates tree cover and bare ground with higher r2 for the Northern Hemisphere (0.66) than for the Southern Hemisphere (0.48–0.50). Grass cover, an intermediate class between tree cover and bare ground, is reproduced less reliably in both the Northern and Southern Hemispheres (r2 of 0.44 and 0.17, respectively). The vegetation model parameterization assumes that trees dominate over grasses; this simple approach partly explains the lower correlation for grass cover.

[30] The global-scale patterns of land-surface albedo simulated by the model and weighted by the incoming solar radiation match the observations well. Compared with vegetation cover, land-surface albedo generally shows a higher correlation with observations. At the same time, the agreement between model and data differs among regions and seasons. The main disagreement occurs during boreal spring in the Northern Hemisphere, where biases in the simulated snow cover and tree cover cause substantial errors and increased variance. In general, the model slightly overestimates land-surface albedo, and it does so in all seasons. Hagemann et al. [2013] provide a more detailed assessment of MPI-ESM surface albedo and show that the CMIP5 version of MPI-ESM is considerably improved compared with previous model versions.

[31] We evaluated vegetation cover using continuous metrics (r2 and RMSE). Another approach, recently used by Poulter et al. [2011] to evaluate simulated vegetation cover, is based on a β-diversity metric (mean Euclidean distance) that accounts for more than one vegetation class simultaneously. We have focused here on two classes, trees and bare ground, because they have the strongest effect on biophysical land-surface properties such as albedo and evapotranspiration. Vegetation models that use discrete vegetation classes could be evaluated using a discrete metric such as the κ statistic [Monserud and Leemans, 1992]. The choice of vegetation-cover metrics, therefore, depends on the model setup and on the research question.

[32] The metrics applied here are useful as relative measures of mismatches between model and observations. They quantify the general performance of the model and help to identify regional and seasonal differences between model and observations. While we found that the general patterns of albedo and vegetation cover are well represented by the model, we also identified regions where the model performance could be considerably improved. In addition, these metrics could be applied to benchmark different models or model versions. In particular, such studies could help to identify differences between model versions driven by observed and by simulated climate and allow a deeper analysis of the mechanisms behind mismatches between model and observations.

Acknowledgments

[33] This work was partly supported through the Cluster of Excellence “CliSAP” (EXC177), University of Hamburg, funded by the German Science Foundation (DFG). This study is a contribution to the International Land Model Benchmarking Project (ILAMB). The authors thank Soenke Zaehle and two anonymous reviewers for their constructive comments.
