Large Variability in Simulated Response of Vegetation Composition and Carbon Dynamics to Variations in Drought‐Heat Occurrence

The frequency of heatwaves, droughts and their co‐occurrence vary greatly in simulations of different climate models. Since these extremes are expected to become more frequent with climate change, it is important to understand how vegetation models respond to different climatologies in heatwave and drought occurrence. In previous work, six climate scenarios featuring different drought‐heat signatures have been developed to investigate how single versus compound extremes affect vegetation and carbon dynamics. Here, we use these scenarios to force six dynamic global vegetation models to investigate model agreement in vegetation and carbon cycle response to these scenarios. We find that global responses to different drought‐heat signatures vary considerably across models. Models agree that frequent compound hot‐dry events lead to a reduction in tree cover and vegetation carbon stocks. However, models show opposite responses in vegetation changes for the scenario with no extremes. We find a strong relationship between the frequency of concurrent hot‐dry conditions and the total carbon pool, suggesting a reduction of the natural land carbon sink for increasing occurrence of hot‐dry events. The effect of frequent compound hot and dry extremes is larger than the sum of the effects when only one extreme occurs, highlighting the importance of studying compound events. Our results demonstrate that uncertainties in the representation of compound hot‐dry event occurrence in climate models propagate to uncertainties in the simulation of vegetation distribution and carbon pools. Therefore, to reduce uncertainties in future carbon cycle projections, the representation of compound events in climate models needs to be improved.

Extreme weather and climate events can strongly influence carbon dynamics and may even lead to shifts in vegetation composition (Felton & Smith, 2017;Reichstein et al., 2013). In particular, droughts and heatwaves are among the most damaging hazards for terrestrial vegetation (Allen et al., 2010;Arend et al., 2021;Buras et al., 2020;Frank et al., 2015;Senf et al., 2020;Sippel et al., 2018;von Buttlar et al., 2018;Zscheischler et al., 2014), and often co-occur as compound events (Bastos et al., 2014;Zscheischler et al., 2018Zscheischler et al., , 2020. In most cases, impacts from compound events are not simply a linear combination of the univariate impacts Ribeiro et al., 2020;Zscheischler et al., 2014) and disentangling their individual effects from longterm observations is difficult. The impacts of droughts and heatwaves can vary substantially, depending on the vegetation type, location, and phenology of the vegetation (Bastos et al., 2020;Flach et al., 2021;Sippel et al., 2016). Furthermore, in some instances, their individual effects can cancel each other out while in other cases they compound each other, again depending on the location and the underlying vegetation type and state. For example, productivity is increased under hot and dry conditions in high latitudes where warmer temperatures leading to a longer growing season counteract the negative effects of the drought, while productivity is decreased in mid-latitudes, where the high temperatures as well as the low precipitation have a negative impact on productivity (Li et al., 2022).
To understand the effects of extreme events on vegetation, it is important to know which factors influence the distribution of vegetation. At first order, these are climate conditions such as temperature, precipitation, and light availability as well as other environmental conditions such as atmospheric CO 2 concentrations, nutrient availability, or topography (Peng, 2000). However, the frequency of extreme conditions can also impact vegetation distribution but their effects are much less well researched. Controlling all of these confounding factors in experiments in the real world is expensive or in many cases infeasible and therefore, field experiments or observations typically focus on individual species, often different types of grasslands (Hoover et al., 2014), although there are studies focusing on the effects of drought and heat on trees as well (Adams et al., 2015).
An alternative to long-term observations and field experiments are process-based vegetation models. Dynamic global vegetation models (DGVMs) incorporate key ecological processes such as tree growth, nutrient cycling, competition, and mortality and simulate the distribution of vegetation types and their response to climate variability at global scale. DGVMs are able to predict vegetation structure, carbon pools, and fluxes over time and space. Despite being developed to answer similar questions, DGVMs can differ significantly in their temporal resolution, selection of processes, and parameterizations. DGVMs are also an excellent tool to explore new hypotheses that could then be tested in experiments.
Here we aim to explore how DGVMs respond to different drought-heatwave occurrences, everything else being equal. We therefore force a suite of different DGVMs with 100-year long stationary climate scenarios that are similar in their climatology (mean annual temperature and precipitation) but differ in the occurrence rate of droughts, heatwaves, and compound drought-heatwave events (Tschumi et al., 2022a). This builds on the work of Tschumi et al. (2022b) who forced the DGVM LPX-Bern with these different climate scenarios and found that LPX-Bern simulated a much higher forest cover in scenarios with few or no hot and dry extremes and more grasses when frequent compound drought-heatwave events occur. We extend this analysis and conduct simulations with six DGVMs, which will help uncover uncertainties in the modelled vegetation response to different drought-heat frequencies. Additionally, we link the likelihood of experiencing a compound event to changes in total carbon stored in vegetation. While most analyses in this study are done globally, we also included some local analyses to illustrate the large variability in responses between locations.

Climate Forcing
We use the forcing scenarios generated from EC-Earth climate model output by Tschumi et al. (2022a). They consist of a set of six 100-year long daily climate scenarios with similar climatologies but varying drought-heat signatures, originally derived from long stationary climate model simulations whose global mean temperature is approximately at the level of observed 2011-2015 temperatures. The scenarios differ in the occurrence of droughts and heatwaves during the 3 months with maximum net primary production (NPP), based on simulations conducted with the DGVM LPX-Bern ( Figure A1). In those 3 months the effects of hot and dry extremes will presumably cause the largest effects.
Besides a control scenario representing natural variability (Control), one scenario has neither heat nor drought extremes (Noextremes), one has univariate extremes such as heatwaves and droughts but no compound extremes (Nocompound), one has only heat extremes but few droughts (Hot), one has only droughts but few heatwaves (Dry), and one has many compound heat and drought extremes (Hotdry). See Table 1 for an overview of the sampling design.
The scenarios differ only moderately in their annual global mean climate (about 0.3°C in temperature and 6% in precipitation across all scenarios) and do not contain any long-term trends. For Hot, Dry, and Hotdry, 50 years were always sampled randomly from the complete data set in order to keep mean climate comparable and reduce the number of time years resampling of years. Furthermore, at the local level, climatologies are similar among scenarios, which differ primarily in the occurrence of droughts and heatwaves (Tschumi et al., 2022a). The data are provided on a daily time step over land (except Antarctica and parts of Greenland) on a regular 1° × 1° grid. Due to the sampling design, there is no spatial coherence in the climate fields, that is, the climate in one pixel is independent of the climate in the neighboring pixel. A complete description of the scenarios, including a quantification in how they differ in terms of droughts and heatwaves as well as access to the forcing data can be found in Tschumi et al. (2022a).

Modeling Setup
This model intercomparison project (MIP) aims at comparing the response of different vegetation models to varying likelihoods of droughts, heatwaves, and compound drought-heatwave events, while keeping everything else equal. The goal is to better understand uncertainties in the simulation of vegetation composition and carbon dynamics stemming from the occurrence of those climate extremes and compound events. The following models were used in this analysis: CABLE-POP (Haverd et al., 2018), JULES Clark et al., 2011), LPJ-GUESS , LPX-Bern (Lienert & Joos, 2018), OCN , and ORCHIDEE-MICT (Guimberteau et al., 2018). A short description of each model is provided in Section 2.3.
For all models, six simulations are run with the forcing variables sampled as described in Section 2.1. All models are run with dynamic vegetation, except for CABLE-POP, where vegetation distribution is constant over time Table 1 Sampling Design for the Six Climate Scenarios (Tschumi et al., 2022a) Scenario name Sampling procedure

Control
Hundred randomly selected years representing present-day climate Noextremes Only years where temperature and precipitation lie between the 40th and 60th percentile, respectively.

Nocompound
No years where both temperature and precipitation lie above the 85th percentile or below the 15th percentile Hot Fifty years where temperature exceeds the 85th percentile and precipitation lies between the 40th and 60th percentiles, 50 years randomly selected from the rest Dry Fifty years where precipitation lies below the 15th percentile and temperature lies between the 40th and 60th percentile, 50 years randomly selected from the rest Hotdry Fifty years where temperature lies above the 85th percentile and precipitation lies below the 15th percentile, 50 years randomly selected from the rest Note. Sampling is based on average temperature and precipitation during the 3 months in which vegetation is most productive, in terms of maximum NPP. The table is taken from Tschumi et al. (2022b). but differs between the scenarios as its vegetation distribution is calculated from mean climate conditions at the beginning of the runs. Here, the models only simulate natural vegetation, based on the corresponding plant functional types (PFTs) that are represented by each model. This typically includes tropical broadleaf PFTs, temperate broadleaf and needleleaf, boreal broadleaf and needleleaf and grasses as well as, for CABLE-POP and JULES, shrubs, which are counted toward the tree class in this analysis. CO 2 is kept constant at 389.78 ppm (level of 2011). An input file for nitrogen deposition is provided (from NMIP, Tian et al. (2018)). The nitrogen deposition is also given for the year 2011 and is kept constant. Each model uses its own approach to distribute nitrogen deposition over the year. No nitrogen fertilization is included. The forcing data is provided on a 1° × 1° grid. It is important that all models use the spatial resolution of the forcing data, since there is no spatial coherence in the climate forcing due to the nature of the sampling (Section 2.1). The spin-up for the scenarios consists of the 100 years of data for each scenario, recycling it as often as needed to ensure that vegetation and carbon pools are in equilibrium at the beginning of each of the 100-year simulations.

Model Descriptions
In the following we provide a short description of each vegetation model that participated in the MIP.

CABLE-POP
CABLE-POP (Haverd et al., 2018) has been developed around a biogeophysics core module (Y.-P. Wang & Leuning, 1998) and a biogeochemistry module including nitrogen cycling (Y.-P. Wang et al., 2010). The 'POP' module (Haverd et al., 2013) simulates woody demography, which represents forest population dynamics such as establishment and mortality, but not competition among vegetation types. The model distinguishes eight plant functional types which can co-occur in a grid cell. The model disaggregates daily meteorological forcing into 3-hourly time steps using a weather generator. Extreme heat affects carbon fluxes through photosynthesis and respiration. Photosynthesis is modeled following Farquhar et al. (1980), which predicts a hump-shaped temperature dependency with a well-defined temperature optimum. Extreme heat beyond the photosynthetic temperature optimum reduces carbon uptake and at the same time increases plant maintenance respiration rates. In CABLE, both photosynthesis and plant respiration were assumed to acclimate to growth temperature, which mitigates the negative effects of extreme heat to some extent. Soil respiration increases with soil temperature, provided that enough water is available. Drought affects physiology mainly through a "water stress factor", which is linearly dependent on soil moisture and which scales down the stomatal slope parameter (g 1 ) (Medlyn et al., 2011) as well as the maximum carboxylation rate (Vcmax). Drought thus reduces carbon uptake via reduced stomatal conductance and photosynthesis. Drought further inhibits microbial activity which leads to reduced heterotrophic respiration. Both extreme heat and drought reduce carbon uptake and thus the available C that can be allocated to leaves. The consequence is a reduction in LAI and thus foliar protective cover (fpc), which is calculated as a function of LAI.

JULES
The Joint UK Land Environment Simulator (JULES) model Clark et al., 2011) is a community model and is used in coupled or stand-alone mode forced by meteorological variables. Since JULES runs on sub-daily timesteps, we made use of the JULES disaggregator (Williams & Clark, 2014), which is based on the IMOGEN method (Huntingford et al., 2010). The model parameters (science settings i.e., excluding driving data, 1° × 1° grid, simulation dates, ancillary data, prescribed data and spin-up method that were specified for this model intercomparison) are described in Mathison et al. (2022). For this study, only the (nine) natural plant functional types were simulated and we do not use the fire module. Leaf photosynthesis is modeled according to Collatz et al. (1991) and G. J. Collatz et al. (1992), and includes a PFT-dependent temperature optimum in the maximum rate of carboxylation of Rubisco and dark respiration (with no acclimation). Leaf phenological status is also temperature dependent. There is piecewise-linear soil moisture stress function, weighted by plant roots in each soil layer, which acts on photosynthesis and maintenance respiration. Vegetation competition uses the TRIFFID model (Cox, 2001). In addition, soil respiration is both temperature and soil moisture dependent. Leaf Area Index is modeled separately for each plant functional type.

LPJ-GUESS
The Lund-Potsdam-Jena General Ecosystem Simulator (LPJ-GUESS) (Smith et al., 2001) is a DGVM simulating processes such as establishment, growth, mortality, and competition of PFTs of various age cohorts and their carbon-, water-and nutrient (N)-cycling. PFTs are distributed in a specified number of patches across each gridcell. For this study, a spin-up time of 1,000 years is used to reach equilibrium of soil pools. Disturbances are modeled as patch-destroying disturbances with an average return time of 300 years (Pugh et al., 2019). Here, we simulate 25 replicate patches to account for the distribution of vegetation stands of different stages after disturbance. In addition, fire is modeled via the GLOBFIRM fire model (Thonicke et al., 2001). To model the vegetation, global PFT parameterizations are used . As forcing, daily surface air temperature, precipitation, and downward shortwave radiation from this MIP are used. Foliar projective cover in LPJ-GUESS is computed by multiplying a cohort's number of individuals with their crown area and the amount of light that is attenuated by the foliage. The latter is computed using LAI and the Lambert-Beer-law for light extinction. Drought and heat affect productivity in a way that photosynthesis is down-regulated under high temperatures and low water availability. Low water availability in addition leads to increased allocation of carbon belowground.
In the longer-term, low productivity leads to increased mortality due to low growth efficiency. Boreal PFTs are affected by heat-stress mortality.

LPX-Bern
The Land surface Processes and eXchanges (LPX-Bern v1.4) model (Lienert & Joos, 2018) is a DGVM based on the Lund-Potsdam-Jena (LPJ) model . It needs as forcing daily or monthly data of temperature, precipitation, and radiation, as well as information on soil type (Wieder et al., 2014), CO 2 , and nitrogen deposition to model water, carbon, and nitrogen cycling in each grid cell. The model represents 10 different natural vegetation types (eight tree PFTs and two grass PFTs) on mineral soils. PFTs grow within their bioclimatic limits and compete for resources. Foliar projective cover lies between zero and one, with a maximum limit of 0.95 for tree coverage and trees given priority over grasses as long as conditions are favourable for trees. Land-use classes for cropland, pastures, and urban area, and for wetlands and peat lands are not enabled in this study. The fire disturbance module and the nitrogen module were activated during the runs. Drought and heat extremes affect vegetation distribution and carbon dynamics in various ways. Apart from direct effects by exceeding bioclimatic limits and leading to vegetation mortality, they can, for example, affect photosynthesis and respiration through low soil moisture, leading to reduced carbon storage in vegetation.

OCN
The terrestrial biogeochemical model O-CN (referred to as OCN here) is originally based on the ORCHIDEE model (Krinner et al., 2005) but was extended through the addition of dynamic nitrogen cycle processes coupled to the carbon cycle as described in  and Zaehle et al. (2011). Biological nitrogen fixation was dynamically simulated with the OPT scheme described by Meyerholt et al. (2016). The model represents 13 PFTs (eight tree types, natural C3 and C4 grasses, C3 and C4 crops and bare-soil). The version of OCN used in this study simulates dynamic vegetation processes (mortality, competition, and establishment) based on the LPJ model  and includes fire disturbance dynamics based on Thonicke et al. (2001). Only natural vegetation types were included, that is, 11 PFTS, excluding the two crop types. A spin-up simulation was performed by recycling the 100-year climate forcing with random sampling until vegetation and soil carbon pools were in equilibrium. Fire disturbances and nitrogen dynamics were activated during the spin-up and runs.

ORCHIDEE-MICT
ORCHIDEE-MICT (Organising Carbon and Hydrology in Dynamic Ecosystems-aMeliorated Interactions between Carbon and Temperature) has been developed from ORCHIDEE, a land surface component of the French Institut Pierre Simon Laplace (IPSL) Earth system model (ESM) that simulates water, energy, and carbon processes (Krinner et al., 2005). The ORCHIDEE-MICT incorporates a new vertical soil parameterization scheme, snow processes, and a fire module, improving the representation of high-latitude processes such as permafrost physics and hydrology (Guimberteau et al., 2018). A spin-up simulation following Guimberteau et al. (2018) was performed to reach the equilibria for soil conditions and carbon pools. The model discretizes the vegetation into 13 PFTs (eight for trees, two for natural C3 and C4 grasses, two for crops, and one for bare-soil type). Foliar projective cover ranges from 0 to 0.9 and is related to the mean individual leaf area index by the Lambert-Beer law (Smith et al., 2001). Daily forcings provided by the MIP were used for the simulations. Only the natural PFTs (trees and natural grasses) were represented and the anthropogenic processes such as grass grazing and crop harvesting were disabled. Drought and heat affect the assimilation in various processes such as photosynthesis, leaf phenology and carbon allocation. For example, water limitation reduces the photosynthesis capacity and trees can shed leaves under the conditions of severe water stress.

Vegetation Cover, Carbon Pools and Fluxes in the Control
Total mean global vegetation coverage, based on the foliar projective cover, which is the percentage of ground area occupied by the vertical projection of foliage, ranges from 43% to 89% in the Control scenario, depending on the model. Most models simulate larger total tree coverage than total grass coverage ( Figure 1). Overall vegetation coverage is the lowest for LPJ-GUESS (43%), OCN simulates mainly trees (70%, with 78% of vegetated area), whereas ORCHIDEE-MICT simulates the highest grass cover (46%, with 89% of vegetated area), with CABLE-POP, LPX-Bern, and JULES being somewhere in-between. Mirroring the variability in global vegetation cover, the models also differ strongly in their spatial patterns of vegetation distribution ( Figure 2). White areas in the maps represent land areas with bare soil or sparsely vegetated. Under the EC-Earth model-based Control forcing, most models agree on grass coverage in Australia, western USA, and central Asia, with some dominantly grass-covered regions in South Africa and southern South America. Some models (particularly ORCHIDEE-MICT) simulate grass cover in the Sahara desert. Tropical regions as well as most temperate to higher latitudes are mainly covered in trees. OCN simulates that nearly all land regions are dominated by tree cover, which is likely a consequence of the wet bias in the extra-tropics in the forcing data (Tschumi et al., 2022a). The models simulate different types of trees (tropical, temperate, broadleaf etc.) and grasses that are, however, not differentiated in this comparison. Please note that the prescribed control climate has strong biases compared to observational data at the with strong impacts on simulated baseline vegetation distribution. Additionally, the modeled vegetation distribution also looks very different to observational data because only natural vegetation is being simulated.
Global gross primary production (GPP), net primary production (NPP), and heterotrophic respiration (RH) show some variation across models in the Control simulation, with GPP ranging from 134 to 195 PgC per year, NPP ranging from 68 to 96 PgC per year and RH ranging from 57 to 84 PgC per year (Figure 3a). For most models, soil carbon pools (ranging between 1,540 PgC and 2,078 PgC, ORCHIDEE-MICT being an exception with 3,827 PgC) are generally about twice as large as the vegetation carbon pools (674 PgC to 1,876 PgC) (Figure 3b). ORCHIDEE-MICT simulates a soil carbon pool about five times larger than the vegetation carbon pool (3,827 PgC in soils compared to 674 PgC in vegetation). The sizes of the vegetation and soil carbon pools correlate with the vegetation distribution; models with a high tree coverage also simulate a large vegetation carbon pool. The high soil carbon value in ORCHIDEE-MICT is probably related to the fact that it includes permafrost carbon in the soil carbon variable.

The Effect of Varying Drought-Heat Signatures
The responses in vegetation coverage to the different scenarios vary strongly between models (Figure 4). The strongest agreement in direction of change between models is found for the Hotdry scenario, for which all models agree on an increase in grass cover and nearly all models agree on a decrease in tree cover. In the Dry scenario, all models simulate a reduction in tree cover but models disagree as to whether grasses increase or decrease. In contrast, nearly all models simulate an increased tree cover in the hot scenario. Again, model results vary in the grassland response to this scenario.
Models show relatively weak and inconsistent response to the Nocompound scenario. Finally, the responses to the Noextremes scenario, which represents a climate with both temperature and precipitation always between the 40th and 60th percentile during the three months of highest growth, are rather large but vary strongly across models: whereas LPX-Bern (red) and ORCHIDEE-MICT (pink) simulate a strong increase in tree coverage, the other models generally show a slight decrease or no change at all (in the case of CABLE-POP, purple). Overall, CABLE-POP generally shows a relatively weak response for most scenarios, most likely related to the fact that it does not simulate vegetation dynamically but uses fixed vegetation determined by the climate in the spin-up (Section 2.3).   In absolute terms, the largest relative differences are simulated by the OCN grass response (light blue) to the Noextremes (+10.5%), Dry (+8.9%) and Hotdry (+7.7%) scenarios, which is likely due to the fact that the overall grass fraction is very low in the Control (Figure 1). LPJ-GUESS (green) also simulates an increase in grass cover of 9.5% for Hotdry whereas LPX-Bern (red) and ORCHIDEE-MICT (pink) both simulate a decrease by about 7% in grass cover for Noextremes. Regarding changes in tree cover, JULES (orange) simulates the strongest decrease (over −7%) for Hotdry extremes whereas LPX-Bern and ORCHIDEE-MICT simulate a more than 5% increase for Noextremes.
Similar to the response in vegetation cover, the models show diverse responses in total vegetation and soil carbon pools relative to the Control ( Figure 5). For the Noextremes and the Nocompound scenario, most models agree on an increase in both vegetation and soil carbon, with LPX-Bern generally showing the strongest increase followed by ORCHIDEE-MICT. For the Hot scenario the responses across models are more mixed, with LPX-Bern showing an increase in both vegetation and soil carbon pools, JULES showing a slight decrease in both pools, and the remaining models showing both increases and decreases in the carbon pools. The Dry and the Hotdry scenarios overall lead to stronger carbon losses, especially in the vegetation pool, for which nearly all models agree on a loss. For soil carbon in these two scenarios, CABLE-POP, JULES, and ORCHIDEE-MICT show a decrease while the other models show an increase. The amount of decrease or increase is generally largest for the Hotdry scenario, followed by the Dry scenario. An exception is the change in the vegetation carbon pool in LPX-Bern and also to a lesser extent ORCHIDEE-MICT for the Noextremes and Nocompound scenario, which is relatively large and mirrors the increase in forest cover (Figure 4). The spatial patterns of these changes can be seen in Figures B1 and B2. The largest positive effect on trees by the Hot and the Hotdry scenario occurs in very high latitudes. For most models, the effect of the Hotdry scenario on carbon pools exceeds the combined effect from both the Hot and Dry scenario. The effect from the Hotdry scenario on carbon pools is generally larger than the effect from the Dry scenario, and the effect from the Hot scenario shows an opposite response compared to the effect from the Dry scenario for many models, meaning that it would be difficult to predict the effect of the Hotdry scenario from the individual effects of the Hot and the Dry scenario.
In the previous sections we found an indication that the occurrence of more frequent compound hot and dry conditions may lead to a reduction in the overall carbon pools (vegetation and soil carbon combined). The occurrence rate of concurrent hot and dry extremes can be approximated by the seasonal correlation between temperature and precipitation, with a stronger negative correlation indicating more frequent compound hot and dry conditions (Zscheischler & Seneviratne, 2017). We therefore test whether the correlation between temperature and precipitation in the months with highest productivity can serve as an indicator of total carbon accumulation by computing the correlation of the correlation between temperature and precipitation and total carbon pools across scenarios r(r(T,P),C) for all six models. Pooling the results of all available models using the approach for averaging correlations suggested by Corey et al. (1998), we find that for most land regions, this is indeed the case ( Figure 6). In most regions, we see a rather strong positive relationship. Since temperature and precipitation are generally negatively correlated over land ( Figure C1), this means that the stronger negatively correlated temperature and precipitation are, the smaller is the total carbon pool in that region. In other words, in those regions climatologically more frequent concurrent hot and dry conditions reduce the carbon pools at equilibrium in the dynamic vegetation models used in this study.
In some high-latitude regions and in mountainous regions, such as the Himalayas or the Alps, and dry regions such as the Sahara desert, the correlation in Figure 6 is slightly or even strongly negative. In these regions, more frequent compound hot and dry conditions lead to higher carbon pools. Generally, this effect seems to hold for regions that are cold and/or regions that have little vegetation coverage to begin with. The magnitude of the correlation can be interpreted as a sensitivity of the dynamic vegetation and in particular the carbon pools to the occurrence of compound hot and dry events. So far we have focused on the average and large-scale responses of the models to the different scenarios. However, local analyses might provide additional insights on model differences. Figure 7 shows the variability in tree cover (first row), grass cover (second row), and GPP (third row) across years for all models and all scenarios for a location in the Australia (−20.5°N 130.5°E). The Control simulation has a bias of −0.4°C in annual mean temperature and −2.3% in annual mean precipitation compared to observations at this location. The plot on the top left shows the cooling degree days (CDD) against the mean standardized precipitation index (SPI) as indica tors for heat stress and drought intensity, respectively, for the different scenarios. CDD, here used as a  . Correlation between the correlation of temperature and precipitation averaged over the 3 months with highest vegetation activity (as simulated by LPX-Bern) and total carbon pools (vegetation carbon plus soil carbon) (r(r(T,P),C)). The correlation for each model based on six data points was transformed using Fisher z transformation, then the values were averaged over all models and transformed back (Corey et al., 1998). In white are the areas where the correlation is not significant at the 5% level (calculated based on 36 data points). proxy for heat stress, is calculated as the sum of all temperature exceedances over a high threshold, in this case the 90th percentile of the Control scenario at each pixel. SPI is calculated using a 3-months timescale based on monthly precipitation values which is fitted to a Gamma distribution and then transformed to a standard normal distribution (Tschumi et al., 2022a).
Models strongly vary in a number of characteristics: the distribution between tree and grass cover, the interannual variability in vegetation cover and GPP, and their response magnitude to the different scenarios. While OCN generally simulates a higher tree cover than grass cover, the opposite is true for all other models. However, tree versus grass cover does not seem to affect the difference in GPP much between the models. Some models, like LPX-Bern and to some extent also LPJ-GUESS, OCN, and ORCHIDEE-MICT, show a large interannual variability in vegetation coverage (as shown by the length of the boxes), indicating a high sensitivity or fast response to year-to-year variations in weather conditions. This high interannual variability is also visible in GPP. Regarding the response to the different scenarios, OCN and ORCHIDEE-MICT agree on less tree coverage for the scenarios Dry and Hotdry and an associated increase in grass coverage. JULES, on the other hand, simulates a reduction in grass coverage for these two scenarios. CABLE-POP simulates no difference between scenarios for this location.
Pixel-based simulations for other locations are shown in Figures D1 (South Africa), D2 (Siberia), and D3 (USA) with their corresponding locations indicated in Figure D4. The pixel in South Africa has a temperature bias of −3.2°C and a precipitation bias of +78.6% in the Control scenario compared to observations. JULES, LPJ-GUESS and OCN simulate a dominant tree cover for all scenarios (>50%), whereas CABLE-POP, LPX-Bern and ORCHIDEE-MICT simulate mainly grass (>60%). JULES simulates a pronounced reduction of tree cover for the Hotdry scenario and a corresponding increase in grass cover. Overall, LPX-Bern shows the strongest response to the different scenarios in vegetation cover, though GPP is rather similar in all scenarios. Despite the differences between the models and scenarios in tree or grass coverage, GPP is comparable for all models and most scenarios, with small declines for Hotdry.
The pixel in Siberia has a +3.3°C temperature bias and a +28.7% precipitation bias. Here, LPX-Bern simulates mainly trees, with large variations between the scenarios, resulting in moderate tree (50%) cover in the Control and the Nocompound scenario and very high tree cover in the other sscenarios (80%). OCN also simulates a relatively high tree cover (50%), with increasing cover in the Hot and Hotdry scenario. CABLE-POP simulates relatively low tree cover in the Control and Nocompound but strong increases in all other scenarios. Grass cover shows the opposite response. JULES, LPJ-GUESS, and ORCHIDEE-MICT simulate a dominance of grasses in this location, with JULES showing a strong increase in tree cover for Hot. Again, interannual variability is largest in LPX-Bern followed by LPJ-GUESS and OCN.
The pixel in the United States has a +0.2°C temperature bias and a +60% precipitation bias. While JULES, LPJ-GUESS, OCN, and ORCHIDEE-MICT generally simulate a higher tree cover than grass cover, the opposite is true for LPX-Bern and CABLE-POP. Some models, in particular LPX-Bern and to some extent also LPJ-GUESS, OCN, and ORCHIDEE-MICT, show a large interannual variability in vegetation coverage. Regarding the response to the different scenarios, LPX-Bern, OCN, and JULES agree on less tree coverage for the scenarios Dry and Hotdry. Also LPJ-GUESS shows a slight reduction in tree coverage for the Dry scenario but similar coverage in the Control and the Hotdry scenario. Consistent with the earlier large-scale analysis, LPX-Bern simulates a much higher tree cover in the Noextremes scenarios. OCN and JULES simulate a much weaker response in the same direction. CABLE-POP simulates no difference between scenarios for this location.
Overall, the pixel-based analysis highlights the large variability of model responses to the different forcing scenarios. Mean vegetation and carbon fluxes differ greatly, but models also vary strongly in their interannual variability of vegetation composition and carbon uptake in response to climate variability.

Discussion
Modeled vegetation distribution and terrestrial carbon dynamics are strongly affected by the occurrence rate and intensity of extreme climate events. In this study we investigate how state-of-the-art global vegetation models simulate changes in vegetation distribution and carbon dynamics to differences in the occurrence rate of heatwaves, droughts, and compound drought-heatwave events during the 3 months of largest vegetation activity. To this end, we forced DGVMs by different climate scenarios and evaluated results over a hundred-year period after models have been brought to equilibrium. We computed responses as differences in outcomes over the hundred years between a scenario with different extreme event statistics and a Control simulation. Annual mean temperature and precipitation are approximately equal across scenarios (variation of about 0.3°C in mean global temperature and about 6% in mean global precipitation). We find overall a large variability across models in the response of vegetation distribution and carbon uptake to change in the frequency of extreme events. The differences in responses across models are typically more pronounced than the differences in responses across the six selected scenarios for a given model. Furthermore, the linear sum of the effects from single extreme hot or dry conditions are less that the effects from compound hot and dry extremes. This means that it would be difficult to extrapolate effects from single events to effects from compound events.
We observe the largest effect in the Dry and Hotdry scenarios, where models agree that more frequent droughts/ more frequent compound drought-heatwave events lead to a reduction in tree cover and increase in grass cover ( Figure 4). Likewise, most models simulate a reduction in the vegetation carbon pool by up to −7.5% (−2% in mean reduction for Dry and −4.8% for Hotdry) for those scenarios ( Figure 5). This indicates that globally, more frequent droughts lead to the terrestrial biosphere being a smaller carbon sink. Only the high latitudes show an increase in tree coverage for some scenarios, in particular those with many heatwaves, since these regions are usually energy-limited ( Figure B1). The results indicate that in a climate with frequent droughts and compound drought-heatwave events, trees cannot thrive and are outcompeted by grasses, which are less dependent on a stable climate and can adapt easier to strong variations in water availability. Large-scale tree mortality has been linked to extreme droughts in observations (Senf et al., 2020), compound hot-dry conditions Hartmann et al., 2022) and sequences of hot and dry years . Some processes concerning mortality, such as plant hydraulics for example, might be missing from some models or are differently implemented in different models, leading to varying results. Although the current set of global vegetation models lacks many processes that are important for vegetation mortality, for example, plant hydraulics, concerning biogeochemical cycling and vegetation composition (Bugmann et al., 2019;McDowell et al., 2018;Meir et al., 2015), our results indicate that the models are able to simulate reduced forest cover when droughts and heatwaves are very frequent in the long-climatology.
The responses to a climate with more frequent heatwaves are much less pronounced at the global scale and are mostly an effect of increased forest cover and vegetation productivity in energy-limited regions such as the high latitudes and reduced tree cover in regions that already reach temperature limits in the control climate. For the Nocompound scenario, responses are generally weak. In contrast, for the Noextremes scenario, model responses are strong but in high disagreement. For both scenarios models tend to simulate more vegetation carbon. The variations in the responses to the Noextremes scenario could be an indication to differences in how models deal with the effect of extremes on vegetation and carbon dynamics. Trees in LPX-Bern and ORCHIDEE-MICT seem to thrive under stable conditions with few extremes (Tschumi et al., 2022b) whereas all other models simulate reduced tree cover. This could be due to the fact that the Noextremes scenario excludes some frequently warm temperature which are actually beneficial for C3 plant photosynthesis. Excluding these warmer temperatures may lead to lower foliar projective cover of trees.
In most regions, total carbon stocks are strongly correlated with the likelihood of experiencing compound drought-heatwave events ( Figure 6). In most tropical and mid-latitude regions, more frequent compound drought-heatwave events lead to lower carbon stocks in vegetation and soils, whereas the opposite is true for the high latitudes. The temperature-precipitation correlation-which determines the likelihood of experiencing compound drought-heatwave events (Zscheischler & Seneviratne, 2017)-can vary substantially across climate models  due to differences in how atmospheric and land surface processes are simulated (Berg et al., 2015). Climate models can have substantial biases in the temperature-precipitation coupling compared to observations (Vrac et al., 2022). Furthermore, varying long-term trends in the temperature-precipitation coupling have been identified in climate models (Zscheischler & Seneviratne, 2017), which may add to reductions in future crop yields caused by warming temperatures (Lesk et al., 2021). Through the link between total carbon stocks and precipitation-temperature coupling in vegetation models illustrated in our study, we demonstrate how uncertainties in the representation of the temperature-precipitation coupling and changes therein can contribute to uncertainties in the projection of terrestrial carbon stocks (Friedlingstein et al., 2014).
Our model intercomparison shows very high variability in model responses, which is not uncommon in vegetation model intercomparison studies (Paschalis et al., 2020). All models were run with the same forcing, reducing uncertainties related to the choice of forcing data (M. Wang et al., 2021). Nevertheless, strong differences between models in the Control simulations ( Figure 2) could be related to the fact that we used raw model output from a climate model as forcing, which-despite matching the observed global mean temperature of 2011-2015-is characterized by regional biases in temperature and precipitation ( Figure E1). Vegetation models are often calibrated to represent observed vegetation well when forced with observed climate (e.g., when used to estimate the land carbon sink, Friedlingstein et al. (2022)) so regional climate biases can lead to very different simulations of vegetation distribution and carbon dynamics for the different models. For instance, Teckentrup et al. (2022) found large differences in the simulation of carbon fluxes and stocks for raw climate model forcing compared to a bias-corrected forcing in water-limited regions of Australia. Not all variables were equally sensitive to the bias, and not all PFTs responded in a similar way, indicating that the bias could have an influence on vegetation composition. To test whether a bias in forcing could affect our conclusions, we restricted the analysis to regions with small biases in the climate forcing (less than 1°C and 20 mm difference compared to CRU data). The general response patterns look very similar ( Figure F1) which leads us to conclude that the bias on the forcing data does not strongly affect our findings on the relative changes. Uncertainties in the model responses may also be related to the fact that we based the sampling of the scenarios on the three most productive months as simulated by LPX-Bern (Tschumi et al., 2022a). Other models might have strong shifts in the most productive months and thus be sensitive to climate extremes in different seasons. We find that in most of the extratropics the time shift between the most productive months is small ( Figure A1). The largest differences occur in the tropics and subtropics, which are regions where the seasonal cycle is not very pronounced and therefore differences in the months does not necessarily mean large differences in productivity.

Conclusion
This model comparison aims at investigating how different vegetation models simulate vegetation distribution and carbon dynamics to climates with few or no droughts and heatwaves, only univariate extremes, and frequent compound drought-heatwave events. Even though all models are run with exactly the same forcing data, we find that model responses vary greatly. Despite large differences, the models generally agree that a climate with more frequent compound hot-dry events would lead to a reduction in forest cover and carbon stocks. Furthermore, the size of the total carbon pool is generally strongly related to the likelihood of experiencing compound hot-dry events. Overall our study highlights how uncertainties in the simulation of compound hot-dry events in climate models can propagate to uncertainties in simulated vegetation distribution, carbon uptake and carbon pools. This suggests that in order to reduce uncertainties in future carbon cycle projections, in addition to improving the representation of land surface processes, the representation of compound weather events in climate model requires attention.

Appendix A: Most Productive Month for All Models in Comparison to LPX-Bern
Differences in the most productive months across models ( Figure A1).  5°N 120.5°E). The top left panel shows the Standardized Precipitation Index (SPI) as a drought indicator and the Cooling Degree Days (CDD) as a heat indicator for all scenarios. The other panels show tree coverage in the top row, grass coverage in the middle row and GPP in the bottom row for all models. The boxes depict the variation over the years. The temperature bias for the Control scenario is +3.3°C and the precipitation bias is +28.7% compared to CRU.   in the bottom row for all models. The boxplots depict the interannual variations. The temperature bias for the Control scenario is +0.2°C and the precipitation bias is +60% compared to CRU climate data (Harris et al., 2014).

Appendix E: Bias between EC-Earth and Observations for Temperature and Precipitation
Bias between Control scenario and CRU data ( Figure E1). Figure E1. Biases in EC-Earth simulations with respect to observation-based data from CRU (Harris et al., 2014). (a) Difference in annual mean temperature between EC-Earth and CRU in °C. (b) Relative difference in annual precipitation between EC-Earth and CRU in %. The time period 1988-2017 was used for CRU and randomly sampled 100 years (representing 2011-2015) for EC-Earth. The land regions depicted in gray in (b) are desert regions with a mean annual precipitation of less than 250 mm in the CRU data set and were excluded in the maps to avoid dividing by very small numbers. Taken from Tschumi et al. (2022a).