Investigating spatial climate relations using CARTs: An application to persistent hot days in a multimodel ensemble



[1] This study introduces Classification and Regression Trees (CARTs) as a new tool to explore spatial relationships between different climate patterns in a multimodel ensemble. We demonstrate the potential of CARTs by a simple case study based on time-aggregated patterns of circulation (represented by average levels and variabilities of sea level pressure, SLP) and land surface conditions (diagnosed from the time-averaged surface water balance) from regional climate model simulations (ENSEMBLES) over Europe. These patterns are systematically screened for their relevance to the spatial distribution of persistent hot days. Present-day (ERA40) and future (A1B) climate conditions are analyzed. A CART analysis of the ERA40 reanalysis complements the results for the present-day simulations. In many models, long persistent hot days concur with low variabilities of SLP and high water balance deficits both in present and future. However, for the change patterns (A1B minus ERA40) the analysis indicates that the most robust feature is the link between aggravating persistent hot days and increasing surface water deficits. These results highlight that the factors controlling (in our case spatial) variability are not necessarily the same as those controlling associated climate change signals. Since the analysis yields a rather qualitative output, the model bias problems encountered when studying ensemble averages are alleviated.

1. Introduction

[2] Changes in extreme weather and climate events count among the most crucial themes in climate change research, both for science and society. While they have strong and visible impacts on health and economy [Halsnaes et al., 2007], their simulation in current Global and Regional Climate Models (GCMs and RCMs) still suffers from substantial uncertainty [Durman et al., 2001; Beniston et al., 2007; Kjellström et al., 2007]. The uncertainty arises from deficiencies of the models such as too coarse spatiotemporal resolutions and unsatisfactory parametrizations of subgrid-scale processes. This is of particular concern, since extreme events are by definition rare and therefore limited observational evidence is available to assess their representation in climate models.

[3] Persistent hot days often occur conditional on persistent circulation patterns and can be aggravated by land-atmosphere interactions [Black et al., 2004; Seneviratne et al., 2006a; Diffenbaugh et al., 2007; Jaeger and Seneviratne, 2010; Lorenz et al., 2010; Teuling et al., 2010; Hirschi et al., 2011; Wang et al., 2011]. The wide range of future trends in circulation patterns found in climate model simulations [Van Ulden and Van Oldenborgh, 2006] is therefore one reason for the uncertainty of changes in extreme heat events. Substantial uncertainties also exist in the representation of land-climate interactions [Boe and Terray, 2008; Pitman et al., 2009; Orlowsky and Seneviratne, 2010], to a large degree due to the scarcity of representative and long-term surface observations (e.g., of soil moisture or evapotranspiration; see Seneviratne et al. [2010] for a general overview).

[4] A common approach to at least partly overcome these shortcomings is to analyze ensembles of climate model simulations and their average behavior, which is expected to be less affected by biases than single model simulations [Hagedorn et al., 2005; Doblas-Reyes et al., 2005; Intergovernmental Panel on Climate Change (IPCC), 2007]. While this works well in certain cases [e.g., Weisheimer et al., 2009], it is difficult to systematically assess the validity of this assumption (for an overview, see, e.g., Tebaldi and Knutti [2007]).

[5] In this paper we propose a complementary approach to analyze multimodel ensembles, with a focus on qualitative spatial relations between different time-aggregated climate fields. Our approach is based on Classification and Regression Trees (CARTs) [see Breiman et al., 1993]. In climate research, CARTs have been used for downscaling of large-scale weather information (e.g., climate model output as in work by Zorita and von Storch [1999]) or for upscaling of point observations to the regional scale [Jung et al., 2009].

[6] In contrast to these applications, we are less interested in the predictive skill of CARTs than in the structures that CARTs detect in the predictor phase space. The simplicity of these structures (see below) allows for an intuitive understanding of the relations within the data. It further allows these structures to be compared across different climate model simulations and thus helps to identify models which behave similarly. We apply this approach in a simple case study to evaluate spatial relations between different time-aggregated climate patterns, representing circulation and land-atmosphere interactions, and spatial distributions of persistent hot days. Cross correlations of the spatial predictor and predictand patterns are used to evaluate the consistency of the CARTs with respect to more traditional approaches.

[7] After introducing the reanalysis and climate model data used in this study (section 2.1), we describe CARTs and their application in section 2.2. The results of our analysis are presented in section 3. Finally, a discussion and conclusions are provided in section 4.

2. Data and Methods

[8] Our study is based on time-aggregated spatial patterns, for example seasonal averages of sea level pressure (SLP) and maximum persistence of heat extremes. The elimination of the temporal dimension excludes important aspects of the involved processes, such as the succession of different states in predictors and predictand. Furthermore, relations are derived for spatially cooccurring features and any nonlocal connections remain hidden. The range of questions we address with our setup is thus reduced to which general climatic conditions constrain the spatial distribution of persistent hot days. This relatively simple setting provides a suitable test bed for an exploration of the potential of CARTs, one of the main motivations of our study. We therefore defer the extension to the temporal dimension to future work. The following subsections describe the data from which we derive the spatial patterns, and introduce CARTs and how they are applied in our study.

2.1. Data

2.1.1. ERA40

[9] We use daily sea level pressure (SLP), evapotranspiration, precipitation and maximum near surface air temperature from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA40 reanalysis [Uppala et al., 2005]. The data are provided on a reduced Gaussian grid with grid cells of approximately 1.25° × 1.2°. The analyzed time period is 1961–2000.


2.1.2. ENSEMBLES

[10] We analyze coordinated RCM simulations with prescribed forcings and boundary conditions, which were run as part of the ENSEMBLES project [Hewitt, 2005]. These simulations are performed by a range of RCMs and GCM/RCM pairs from the main European GCMs and RCMs. Here we focus on RCM simulations and two sets of experiments: simulations of the 1961–2000 period, driven by the ERA40 reanalysis, and simulations for the 2061–2100 period, driven by SRES-A1B scenario runs with different GCMs [IPCC, 2001, 2007]. Eight RCMs (Table 1) have runs for both of these periods with the required variables on the daily time scale (SLP, evapotranspiration, precipitation and maximum near surface air temperature). If available from the ENSEMBLES homepage, Table 1 also indicates the land surface models implemented in the RCMs (in order to evaluate whether the results show any dependence on them). For the remaining RCMs, the information is taken from Jacob et al. [2007] and enclosed in parentheses. Five different land surface models are implemented in the eight analyzed RCMs. Nonetheless, these different schemes are not necessarily independent given shared development and model frameworks.

Table 1. Regional Climate Models (RCMs), Their Host Institutions, Driving Global Climate Models (GCMs), and the Implemented Land Surface Models

RCM Number  RCM Name  Institution  GCM (A1B)  Land Surface Model
1           RCA3      C4I          HadCM3     RCA development
3           HIRHAM5   DMI          ARPEGE     (Dümenil and Todini [1992])
7           REMO      MPI          ECHAM5     (Dümenil and Todini [1992])
8           RCA       SMHI         HadCM3     RCA development

2.1.3. Derived Fields

[11] The choice of potential predictors and predictands is always arbitrary to a certain extent and the results necessarily depend on it. Here we aim at investigating influences from circulation and land-atmosphere interactions onto persistent heat events over Europe in summer and to identify possible seasonal components in the inferred relationships. Consequently, the following predictor fields are calculated from the ERA40 and ENSEMBLES data: (1) For SLP, seasonal average and standard deviation (daily time scale, after removing the annual cycle) in spring (MAM) and summer (JJA). Including the information of spring SLP allows us to investigate the relevance of the atmospheric state preceding the persistent hot days in summer. (2) For the climatological water balance WBAL, defined as precipitation minus evapotranspiration, seasonal (MAM and JJA) and annual (entire year) averages are calculated. The entire year average in addition to the seasonal averages accounts for the general hydrological state of the soil, which in contrast to atmospheric circulation has a memory of several weeks to months [Koster and Suarez, 2001; Seneviratne et al., 2006b]. Note that this multiyear averaging yields one single number per grid cell for each of these predictors. Together with elevation (accounting for much of the time-independent boundary conditions), these fields are used as predictors in the CART analysis. As predictand, patterns of persistent hot days are calculated, given as the maximum number of consecutive hot days, where a hot day is defined by Tmax > 30°C. This index is calculated for the summers (JJA) of each single year and is then averaged for each of the ERA40 and A1B periods. We note that these extremes are based on an absolute threshold, and indeed, our present analysis focuses on spatial variations in such absolute extremes. In some regions, they are thus not extremes in the sense of statistical outliers.
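
The predictand definition above can be illustrated with a minimal Python sketch for a single grid cell. It assumes that the daily JJA Tmax values are already grouped by year; the function names and the toy values are hypothetical and not taken from the study.

```python
from statistics import mean

HOT_DAY_THRESHOLD = 30.0  # degrees C, the absolute threshold stated in the text


def longest_hot_spell(tmax_summer, threshold=HOT_DAY_THRESHOLD):
    """Maximum number of consecutive days with Tmax above the threshold."""
    longest = current = 0
    for t in tmax_summer:
        # Extend the running spell on a hot day, reset it otherwise.
        current = current + 1 if t > threshold else 0
        longest = max(longest, current)
    return longest


def persistent_hot_days(tmax_by_year):
    """Average the yearly maximum spell length over all years of a period."""
    return mean(longest_hot_spell(days) for days in tmax_by_year)


# Toy data (hypothetical): two very short "summers" for one grid cell.
toy = [
    [29.0, 31.0, 32.5, 30.0, 33.0, 34.0, 35.0],  # spells of 2 and 3 days
    [28.0, 27.5, 31.0, 29.0, 26.0, 25.0, 24.0],  # one spell of 1 day
]
index = persistent_hot_days(toy)
```

In this toy case the yearly maxima are 3 and 1 days, so the index for the period is 2 days. Note that a day with Tmax of exactly 30°C does not count as hot under the strict inequality.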

2.1.4. Land-Sea Mask

[12] One question we address in this study is the distinction between the influences from circulation and land-atmosphere interactions. Since evaporation, the key mediator of these latter interactions, is fundamentally different over open water compared to land, only grid cells consisting entirely of land should ideally be considered in the analysis. While the resolution of the ENSEMBLES simulations is high enough to provide a sufficient number of 100% land pixels (approximately 8000), we slightly relaxed this requirement to 90% for the ERA40 data in order to obtain a sufficiently large sample (only 15 grid cells over Europe are 100% land, while approximately 400 have at least 90% land).
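
A minimal sketch of this selection step, assuming the land fraction of each grid cell is available as a number between 0 and 1. The function name and the toy land fractions are hypothetical.

```python
def select_land_cells(land_fraction, min_fraction):
    """Return indices of grid cells whose land fraction is at least min_fraction."""
    return [k for k, f in enumerate(land_fraction) if f >= min_fraction]


# Hypothetical land fractions for six grid cells.
land_fraction = [1.0, 0.95, 0.40, 0.90, 0.0, 1.0]

strict = select_land_cells(land_fraction, 1.0)   # criterion used for the RCM data
relaxed = select_land_cells(land_fraction, 0.9)  # relaxed criterion used for ERA40
```

With these toy values, the strict criterion keeps two cells and the relaxed criterion keeps four.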

2.2. CARTs

[13] This section introduces CARTs and the assessment of their structural robustness by bootstrapping, followed by a brief description of their usage in our study.

2.2.1. CART Basics

[14] For many kinds of predictive tasks, one seeks to establish a functional dependence d between predictors from a (multidimensional) predictor domain X and a predictand from a (unidimensional) predictand space Y, $d: X \to Y$. In our case, X has 8 dimensions and corresponds to the phase space spanned by the predictors (2 + 2 SLP statistics, 3 WBAL averages and elevation) and Y corresponds to the unidimensional phase space of the persistent hot days. Multilinear regression analysis is often used to model such a dependence d. However, it can be advantageous to define d piecewise, that is, to represent the dependence by several models, which are defined over different subdomains of X. See, for example, Schomburg et al. [2010] for an application of threshold based rules for downscaling of RCM output to spatial scales of less than a kilometer. CARTs also belong to this class of methods and are especially suited where piecewise constant models over disjoint subdomains of X are appropriate.

[15] An exemplary CART outcome is illustrated in Figure 1a for two predictors (spatial patterns of elevation and time-averaged water balance), which explain the predictand “pattern of heat waves.” In this example, strong heat waves occur only in regions of low elevation and mostly in those low elevation regions with a negative water balance. As shown, CARTs provide disjoint rectangular subdomains of X, $L_i \subset X$ (hereafter called “leaves,” see below), and predict for each x within a given $L_i$ one common value of y.

Figure 1.

Conceptual output from a Classification and Regression Tree (CART) explaining heat waves by elevation and the surface water balance. (a) Partition of the domain of explaining variables, consisting of four subdomains (leaves) L1–L4, divided by three splitting lines S1–S3. (b) Tree-like representation of the partition with leaves L1–L4 and splits S1–S3. See also section 2.2.1.

[16] For a given training sample $(x, y)_k$, $k = 1 \ldots N$ (where in our case k is the index of the grid points), CARTs iteratively construct the set of $L_i \subset X$ which minimizes the sum of squared residuals,

$$\sum_{k=1}^{N} \left( y_k - d(x_k) \right)^2, \tag{1}$$

although this minimum is possibly a local one, see below. d is piecewise constant over the $L_i$, simply consisting of the averages of the $y_k$ values associated with the $x_k$ within the $L_i$ subdomains (or leaves),

$$d(x) = \frac{1}{\#(x_k \in L_i)} \sum_{x_k \in L_i} y_k \quad \text{for } x \in L_i, \tag{2}$$

where $\#(x_k \in L_i)$ denotes the number of $x_k$ within $L_i$. In contrast to correlation and multilinear regression analysis, CARTs do not attempt to find one model that describes the dependence between x and y over the entire domain X, which enables them to naturally adapt to nonlinearities and interactions in the predictors.
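
For a fixed partition, the leaf averages and the sum of squared residuals described above can be computed as in the following minimal sketch. It assumes that the leaf membership of every grid point is already known; the function names, leaf labels and toy values are hypothetical.

```python
def leaf_means(leaf_of, y):
    """Average of the y values whose x fall into each leaf (the prediction d)."""
    sums, counts = {}, {}
    for leaf, yk in zip(leaf_of, y):
        sums[leaf] = sums.get(leaf, 0.0) + yk
        counts[leaf] = counts.get(leaf, 0) + 1
    return {leaf: sums[leaf] / counts[leaf] for leaf in sums}


def sum_squared_residuals(leaf_of, y):
    """Sum of squared residuals of the piecewise constant model."""
    d = leaf_means(leaf_of, y)
    return sum((yk - d[leaf]) ** 2 for leaf, yk in zip(leaf_of, y))


# Toy sample: five grid points assigned to two leaves.
leaf_of = ["L1", "L1", "L2", "L2", "L2"]
y = [1.0, 3.0, 10.0, 12.0, 14.0]

means = leaf_means(leaf_of, y)
ssr = sum_squared_residuals(leaf_of, y)
```

Here the two leaves predict 2 and 12 days, respectively, and the residual sum of squares is 10.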

[17] As seen in Figure 1a, the $L_i$ leaves are determined by binary splits $S_l = (p_l, v_l)$ in single predictors, defining which predictor p splits a (sub)domain $\tilde{X} \subseteq X$ and at which value v it does so. In Figure 1a they are represented by the dashed separating lines. Splits and subdomains/leaves can be conveniently represented in a tree-like structure (see Figure 1b), where the branches of the tree represent the left or right sides of the $S_l$ splits and the leaves at the very ends of the branches correspond to the $L_i$ subdomains. The tree-like representation is especially suitable for high-dimensional predictor domains (a representation such as the one in Figure 1a is obviously limited to the two-dimensional case) and can assume any shape, not necessarily symmetric as in Figure 1b.

[18] The CART algorithm starts by screening possible binary splits of all predictors (in terms of a “greedy search” [see Breiman et al., 1993]) and chooses the one that minimizes equation (1). The splitting is recursively repeated for all of the resulting subdomains, until preset parameters such as a required minimum number of observations per subdomain prevent further splitting. To avoid overfitting of the data, the size of a tree (that is, the number of subdomains, or leaves, Li) is reduced (“pruned”) after growing the tree, usually based on cross validation. For growing and pruning of the trees, the rpart package is used, available for R (a statistical computing environment) [R Development Core Team, 2008].
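
The screening step for a single split can be sketched as follows. This is a minimal, illustrative Python version of an exhaustive search over predictors and cut points, not the rpart implementation used in the study. The function names, the choice of midpoints as candidate cut points, the convention that values below the cut go to the left side, and the toy data are assumptions. Recursion and pruning are omitted.

```python
def ssr(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, min_leaf=1):
    """Screen all predictors and cut points; return (ssr, predictor index, value)."""
    best = None
    for p in range(len(X[0])):
        levels = sorted({row[p] for row in X})
        # Candidate cut points lie midway between neighbouring observed values.
        for lo, hi in zip(levels, levels[1:]):
            v = (lo + hi) / 2
            left = [yk for row, yk in zip(X, y) if row[p] < v]
            right = [yk for row, yk in zip(X, y) if row[p] >= v]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # respect the minimum number of observations per side
            total = ssr(left) + ssr(right)
            if best is None or total < best[0]:
                best = (total, p, v)
    return best


# Toy sample: columns are (elevation in m, water balance), y is spell length.
X = [(100, -1.0), (200, 0.5), (1500, -0.8), (1800, 0.2)]
y = [12.0, 8.0, 1.0, 0.0]
best = best_split(X, y)
```

For this toy sample the search selects elevation (predictor index 0) with a cut at 850 m, which separates the two low-elevation points with long spells from the two high-elevation points. Because each split is chosen in isolation, repeating this step recursively yields a greedy search that does not guarantee a global minimum.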

[19] Complementing standard multilinear regression analysis, in CARTs both interactions and nonlinear relations in the data are taken into account in a natural way, see for example the (hypothetical) amplifying effect of low elevation with negative water balances onto heat waves in Figure 1.

[20] The fact that CARTs define the splits by screening the predictors separately can cause problems if predictors are correlated and the predictor choice for a given split becomes unstable. This is why ideally CARTs are applied to independent predictors. In our case, however, some of the predictor patterns are quite similar to each other, resulting for example in the high correlations between WBALANN and WBALMAM in Table A1. Another source for unstable splits is that CARTs often detect only secondary minima of equation (1), as a consequence of the “greedy search” approach. Bootstrapping can be used to assess the robustness of a tree in such cases.

2.2.2. Assessing the Robustness of CARTs

[21] Since CARTs are inherently unstable, an assessment of their robustness can be necessary. We are primarily interested in the structural robustness (see below) and try to evaluate it. To this end, for an individual RCM simulation, 50 bootstrap trees are grown, based on random subsamples consisting of 66% of the original (x, y)k data. If the structures of these 50 trees turn out to be similar, we assume that the tree gives a robust representation of the relationships within the complete (x, y)k sample. As described in the following, structural similarity is evaluated by analyzing the predictors chosen for the Sl splits, which define the Li leaves.
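
The subsampling step can be sketched as below. The text does not state whether the 66% subsamples are drawn with or without replacement; sampling without replacement is assumed here. The function name, the fixed seed and the number of grid cells are likewise illustrative.

```python
import random


def bootstrap_subsamples(n_cells, n_trees=50, fraction=0.66, seed=0):
    """Draw one index set per bootstrap tree (assumed: without replacement)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    size = round(fraction * n_cells)
    return [sorted(rng.sample(range(n_cells), size)) for _ in range(n_trees)]


# Roughly the number of 100% land cells quoted for the RCM grids.
subsamples = bootstrap_subsamples(n_cells=8000)
```

Each of the 50 index sets would then be used to grow one tree from the corresponding subset of the (x, y)k sample.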

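[22] Formally, the $L_i$ leaves can be obtained by a concatenation of left and right operators which return the subdomain of any domain $\tilde{X} \subseteq X$ that lies left (right) of a split $S_l = (p_l, v_l)$, that is,

$$\mathrm{left}(S_l, \tilde{X}) = \{ x \in \tilde{X} : x_{p_l} < v_l \}, \qquad \mathrm{right}(S_l, \tilde{X}) = \{ x \in \tilde{X} : x_{p_l} \geq v_l \}. \tag{3}$$

[23] For example, $L_3$ in Figure 1b can be written as $L_3 = \mathrm{left}(S_3, \mathrm{right}(S_1, X))$.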
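
The concatenation of left and right operators can be mimicked on a finite sample of points, as in the following minimal sketch. The convention that "left" collects values below the split value, the split values themselves and the toy points are assumptions made for illustration.

```python
def left(split, domain):
    """Points of the domain on the left side of the split (assumed: value < v)."""
    p, v = split
    return [x for x in domain if x[p] < v]


def right(split, domain):
    """Points of the domain on the right side of the split (assumed: value >= v)."""
    p, v = split
    return [x for x in domain if x[p] >= v]


# Hypothetical splits, each given as (predictor, value).
S1 = ("elevation", 500.0)
S3 = ("water balance", 0.0)

# Hypothetical sample of three grid points.
X = [
    {"elevation": 100.0, "water balance": -0.5},
    {"elevation": 900.0, "water balance": -0.5},
    {"elevation": 900.0, "water balance": 0.5},
]

L3 = left(S3, right(S1, X))
```

The predictors appearing in the nested calls, here elevation from S1 and water balance from S3, are the ones that characterize the leaf.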

[24] If one assumes two of the bootstrap trees to have the same number of Li leaves, which are additionally sorted in ascending order of y, then a direct pairwise comparison of the leaves from the two trees is possible. A measure of structural similarity for such a pair of leaves is the fraction of agreeing predictors chosen for their splits. That is, for each leaf of these two trees, the Sl splits in the concatenations of left and right operators defining the respective leaf are analyzed and the pl predictors from the involved Sl splits are extracted. In the example of L3 in Figure 1b, this would yield the set {elevation, water balance}, where elevation is from S1 and water balance from S3. If in one of the bootstrap trees L3 happened to be defined by elevation only, the intersection set of these two predictor sets ({elevation, water balance} and {elevation}) would be the set {elevation}. We define the similarity of the two leaves as the number of different predictors in the intersection set divided by the averaged numbers of different predictors in the two individual predictor sets. In the example, this would be $1 / \left( \tfrac{1}{2} (2 + 1) \right) = 2/3$. A value of 1 of this similarity measure indicates total agreement, while a value of 0 indicates total disagreement. The average of these values from all leaf pairs characterizes the similarity of the two bootstrap trees. A perfect structural similarity in this sense would be given if the predictors defining the leaves of the two trees agreed for each leaf. Such assessment requires two properties of the compared trees. First, the number of leaves has to be the same, and second, the leaves within each compared pair must correspond to each other. This is detailed in section 2.2.3.
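
The similarity measure defined above is simple enough to state in a few lines of Python. In this sketch a tree is reduced to a list of predictor sets, one per leaf, already sorted in ascending order of y; the function names and this representation are illustrative choices.

```python
def leaf_similarity(predictors_a, predictors_b):
    """Shared predictors divided by the average size of the two predictor sets."""
    a, b = set(predictors_a), set(predictors_b)
    # Each leaf is assumed to be defined by at least one split, so a and b are non-empty.
    return len(a & b) / (0.5 * (len(a) + len(b)))


def tree_similarity(leaves_a, leaves_b):
    """Average leaf similarity over corresponding leaf pairs of two trees."""
    if len(leaves_a) != len(leaves_b):
        raise ValueError("trees must have the same number of leaves")
    sims = [leaf_similarity(a, b) for a, b in zip(leaves_a, leaves_b)]
    return sum(sims) / len(sims)


# The worked example from the text: L3 in two bootstrap trees.
example = leaf_similarity({"elevation", "water balance"}, {"elevation"})
```

The example reproduces the value of 2/3 given in the text. Identical predictor sets give 1 and disjoint sets give 0.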

[25] This metric is very rudimentary, but it captures the predictor choices which characterize the extreme event patterns within the trees, which is one of our main interests here. More sophisticated metrics could e.g. include the distance between the vl values in the corresponding Sl splits of two trees, the order of the chosen predictors from the “trunk” to the leaves or the distances of the splits from the trunk. Such extensions are left to future studies.

[26] We apply this framework to each RCM simulation and compare the structural similarities found between the bootstrap trees of one single RCM to the structural similarities between the trees of the different RCM simulations. If the similarities between the bootstrap trees are high, and in particular, are higher than the inter-RCM similarities, this supports the robustness of the structures identified for each RCM. See section 3.1 for the results of this assessment.

2.2.3. CARTs in Our Study

[27] Here we grow an individual CART for each RCM simulation (and the ERA40 reanalysis). As a result of the cross validation pruning, not all of the trees will have the same number of leaves. To enable their pairwise comparability, they are therefore first pruned further down to the minimum number of leaves found across all trees. Note that this pruning may lead to an overgeneralization for certain trees; however, it is needed for their intercomparability. The Li leaves are reordered with ascending values of y for a direct pairwise correspondence between the Li leaves from different trees. This pruning and reordering is applied both for the trees from the RCM ensembles and for the bootstrap tree ensembles used to evaluate their robustness.
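
The two bookkeeping steps, finding the common number of leaves and reordering the leaves, can be sketched as follows. Trees are represented here simply as lists of leaves with a predicted value and a predictor set; this representation, the names and the toy values are hypothetical, and the additional pruning itself requires the full tree structure and is not shown.

```python
def common_leaf_count(trees):
    """Smallest number of leaves found across all trees."""
    return min(len(tree) for tree in trees)


def order_leaves(tree):
    """Sort the leaves of one tree by ascending predicted value of y."""
    return sorted(tree, key=lambda leaf: leaf["prediction"])


# Hypothetical trees, each given as a list of leaves.
tree_a = [
    {"prediction": 9.0, "predictors": {"elevation", "wbal_mam"}},
    {"prediction": 0.5, "predictors": {"slp_sd_jja"}},
    {"prediction": 3.0, "predictors": {"slp_sd_jja", "wbal_mam"}},
]
tree_b = [
    {"prediction": 2.0, "predictors": {"slp_sd_mam"}},
    {"prediction": 7.5, "predictors": {"wbal_mam"}},
    {"prediction": 0.2, "predictors": {"slp_sd_mam"}},
    {"prediction": 4.0, "predictors": {"elevation"}},
]

target = common_leaf_count([tree_a, tree_b])
ordered_a = order_leaves(tree_a)
```

In this toy case the second tree would have to be pruned from four to three leaves before its ordered leaves can be paired with those of the first tree.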

[28] As already mentioned, our main interest lies in the partition of the predictor domain that is defined by a CART, and in the relationships between the predictors and the extreme event contained in it. In order to capture the qualitative aspects of these relationships (such as “long persistent hot days concur with pronounced water balance deficits”), we express the partition in terms of thresholds based on quantiles, which can thereby differ between the models. Thus in our evaluation, attributes like “long” or “pronounced” always refer to the predictor distributions of the individual models, which makes the evaluation independent of possible model biases and facilitates a consistent qualitative synthesis of the relations in the models.

[29] Note that even the most robust CART, like any other statistical method in general, cannot detect the causality of relations between different variables [see, e.g., Orlowsky and Seneviratne, 2010]. CARTs also cannot evaluate true physical relations if these are misrepresented in the RCMs. Physical interpretations of CART outcomes therefore always reflect the relations within the model world, like any other multimodel output analysis (e.g., based on ensemble averages).

3. Results

[30] This section describes the outcome of the CART analysis of persistent hot days (defined as the averaged yearly maximum length of consecutive days with Tmax > 30°C, see section 2.1). After assessing the robustness of the CARTs in our setup (section 3.1), CARTs are analyzed for the ERA40 (1961–2000) and the A1B (2061–2100) periods. Additionally, CART ensembles are grown for the patterns of change between the late 20th century (ERA40) and future (A1B) periods in order to investigate the roles of the different patterns of predictor changes for the patterns of persistent hot days changes (section 3.2). All results are further compared to classical cross correlations.

[31] Generally, the variance explained by the CARTs is rather high, e.g., in the ERA40 RCM ensemble it ranges between 66% and 84% with a median of 75%. The analyses for the future (A1B) patterns and the patterns of change between the ERA40 and future periods show similar values.
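
The text does not state how the explained variance is computed. A common definition, assumed in the following sketch, is one minus the ratio of the residual sum of squares to the total sum of squares of the predictand. The function name and the toy values are hypothetical.

```python
def explained_variance(y, y_hat):
    """Fraction of the predictand variance reproduced by the tree predictions."""
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    residual = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - residual / total


# Toy sample: observed spell lengths and piecewise constant tree predictions.
y = [1.0, 3.0, 10.0, 12.0, 14.0]
y_hat = [2.0, 2.0, 12.0, 12.0, 12.0]

fraction = explained_variance(y, y_hat)
```

For the toy values the residual and total sums of squares are 10 and 130, so about 92% of the variance is explained.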

3.1. Robustness of the CARTs

[32] To evaluate the robustness of the CARTs, trees are grown for 50 bootstrap replications of the data of single models, and their similarity is assessed as described in section 2.2.2. From the 50 bootstrap trees, 1225 different pairs of trees can be compared. Depending on the model, the average similarities of all bootstrap tree pairs range approximately between 0.77 and 0.9 with typical standard deviations of 0.12, indicating a large structural similarity between the bootstrap trees of one model. The median over all of these intramodel similarities is approximately 0.87. The similarities between the tree pairs grown from the different models in the analysis of section 3.2 are generally lower, being highest in the ERA40 simulations with an average of 0.51 (standard deviation 0.15).
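
The number of tree pairs quoted above follows from counting the unordered pairs among the 50 bootstrap trees:

$$\binom{50}{2} = \frac{50 \cdot 49}{2} = 1225.$$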

[33] We therefore conclude that, despite the nonnegligible correlations between different predictors (Tables A1–A3) and the possibility that the “greedy search” algorithm returns secondary minima, the trees for the single models indeed reveal robust structures of the predictor domain, which allows us to attempt a physical interpretation.

3.2. Findings and Interpretations

3.2.1. ERA40 Period

[34] For the persistent hot days in the ERA40 simulations, an average map of the persistent hot days predictions by the CARTs of the individual models is shown in Figure 2a. The pattern is in close agreement with the ERA40 and single-model persistent hot days patterns and the RCM ensemble average persistent hot days pattern (not shown), with the longest persistent hot days in Southern Europe, in particular around the Mediterranean and the Black Sea. While the averaged CART ensemble predictions thus do not add any value to classical ensemble averages, the CARTs themselves contain a large amount of information regarding the relationships between predictors and predictands. In the following, this information is evaluated in table-like figures.

Figure 2.

Averaged predictions from the trees grown for the eight RCMs (see Table 1). The numbers on the color bars give the averaged lengths (in days) of persistent hot days predicted by the CARTs of all RCM simulations for that tree leaf. Only grid cells consisting of 100% land are shown. (a) ERA40 period (1961–2000) persistent hot days. (b) The same as Figure 2a but for the A1B period (2061–2100). (c) Δ change patterns (A1B minus ERA40).

[35] For the ERA40 period, Figure 3 shows that SLP standard deviations in both spring and summer are the most frequently chosen patterns and show a clear signal. Only in the regions with the shortest lengths of persistent hot days, the standard deviation is high in a few models, and the longer the persistent hot days, the smaller the values of the standard deviation in almost all models. The water balance in spring, and, to a smaller degree, in summer also shows a distinct relation in the majority of the models: long persistent hot days concur with water balance deficits. The relation to the SLP average patterns and elevation is ambiguous. The systematic features found in the RCM ensemble agree with the tree evaluation of the ERA40 data (the left-most entry of each column), except for water balance in spring.

Figure 3.

Persistent hot days leaves, their predictors, and associated ranges (ERA40). In each of the two halves, the first row lists the predictors. The numbers in the second row refer to the RCMs in Table 1; “E” stands for ERA40. The following rows summarize the CARTs outcomes for each leaf (colors in the first column correspond to Figure 2a). For each Li leaf, the second and third columns indicate the fraction of grid cells belonging to that leaf and the predictions from the CARTs for the length of periods with persistent hot days, both denoted as average and range over all considered simulations (comprising RCM simulations and ERA40 reanalysis). Symbols in the four columns to the right are indicative of the cutoff values which are induced by the Sl splits associated with a leaf Li for predictor p. If for a given predictor p in leaf Li the associated cutoff value is greater than q80p (denoting the 80% quantile of the predictor values (xp)k, k = 1 … N), then a double up-arrow is used. Similarly, single up-arrows, single down-arrows, and double down-arrows correspond to cutoff values within [q60p, q80p] or [q20p, q40p] or below q20p, respectively. If predictor p is used in the Sl splits of leaf Li, but its associated cutoffs do not fall into any of the above categories, then a circle is used. If predictor p does not occur at all in the Sl splits, then a dash is used. See also section 2.2.1 and Figure B1 for the actual threshold values. The last row indicates the cross correlations between the predictor and the patterns of persistent hot days in each RCM/data set, using an open up-triangle (open down-triangle) for positive (negative) correlations and a solid up-triangle for correlations >0.5 (solid down-triangle for correlations <−0.5).
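
The symbol rules in this caption can be written as a small classification function. The sketch below is illustrative only: it assumes one cutoff value per predictor and leaf, uses the default quantile estimator of Python's statistics module, and replaces the arrow symbols by text labels; none of these choices are taken from the study.

```python
from statistics import quantiles


def cutoff_symbol(cutoff, predictor_values):
    """Classify a split cutoff relative to the model's own predictor quantiles."""
    if cutoff is None:
        return "-"  # predictor does not occur in the splits of this leaf
    q20, q40, q60, q80 = quantiles(predictor_values, n=5)
    if cutoff > q80:
        return "double up"
    if q60 <= cutoff <= q80:
        return "up"
    if q20 <= cutoff <= q40:
        return "down"
    if cutoff < q20:
        return "double down"
    return "circle"  # predictor is used, but the cutoff lies between q40 and q60


# Toy predictor values 1..99, so that the quintile boundaries are 20, 40, 60, 80.
values = list(range(1, 100))
symbols = [cutoff_symbol(c, values) for c in (90, 70, 50, 30, 10, None)]
```

Because the quantiles are computed from each model's own predictor values, the same symbol can correspond to different absolute thresholds in different models.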

[36] Interestingly, three of the CARTs (for RCMs 2, 4 and 7) do not choose spring WBAL as predictor at all, but instead summer WBAL and elevation. In contrast, none of the CARTs which choose spring WBAL also chooses elevation, and only one uses summer WBAL. The bootstrap tree ensembles of the individual models show the same behavior, indicating true differences between the models. These differences, however, do not appear to be related to the implemented land surface models (listed in Table 1), although a more detailed investigation of the respective models would be necessary to fully address this question.

[37] These findings overall agree with the correlations between the predictors and the predictand from Figure 3 and Table A1: negative correlations with SLP variability and WBAL and around zero correlations with SLP and elevation. However, some interesting differences are found. For example, if predictors were chosen based on the strength of the cross correlations, WBALJJA and WBALANN would appear more important than WBALMAM. In the CART analysis, however, WBALMAM is much more prominent. This possibly reflects that the correlations measure the link between two entire fields, while the piecewise constant representation in the CARTs takes a much more local perspective. A low WBALMAM is of particular importance in the regions of long persistent hot days, which is why its influence is detected in the CARTs. Since these areas are rather small (see Figure 3) and the cross correlations also consider the large areas where WBALMAM does not matter as much, its influence is not so visible in the cross correlations.
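
For comparison, the cross correlation between a predictor pattern and the predictand pattern is a single number computed over all grid cells. The following sketch assumes an unweighted Pearson correlation and maps it to the triangle categories used in the figures; the function names, the text labels and the toy patterns are hypothetical.

```python
from math import sqrt


def pattern_correlation(x, y):
    """Unweighted Pearson correlation between two spatial patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_symbol(r):
    """Solid triangles beyond +/-0.5, open triangles otherwise."""
    if r > 0.5:
        return "solid up"
    if r < -0.5:
        return "solid down"
    # The sign of an exactly zero correlation is left undefined in the text.
    return "open up" if r > 0 else "open down"


# Toy patterns: water balance and spell length at five grid cells.
wbal = [-2.0, -1.0, 0.0, 1.0, 2.0]
hot = [10.0, 8.0, 5.0, 3.0, 1.0]

r = pattern_correlation(wbal, hot)
symbol = correlation_symbol(r)
```

Every grid cell enters this measure with equal weight, which is why a relation confined to a small region can be prominent in a tree and yet contribute little to the correlation.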

[38] The CART results suggest the following physical interpretation. Low SLP variabilities correspond to a steady atmosphere in the months preceding and during the season of the extreme event. Together with the water balance deficits in spring, this could support the development of conditions such as low soil moisture (because of less rain-bringing storms, which would be associated with higher pressure variabilities), which are likely to favor the persistent high temperatures. A positive feedback with the water balance deficit in summer could lead to further temperature increases. This interpretation is in agreement with Black et al. [2004], who discuss these processes in the context of the 2003 heat wave, and Della-Marta et al. [2007] and Vautard et al. [2007], who analyze multidecadal observations.

3.2.2. Future (A1B) Period

[39] Simulations by the same RCMs for the 2061–2100 period, driven by different GCMs from the IPCC AR4 ensemble (see Table 1), are analyzed analogously. The average persistent hot days pattern shows a similar spatial distribution (Figure 2b), with the longest events occurring around the Mediterranean and the Black Sea. In agreement with several studies on extreme heat events in climate model simulations [Tebaldi et al., 2006; Beniston et al., 2007; Fischer and Schär, 2010; Orlowsky and Seneviratne, 2011], the persistent hot days see a substantial increase in length.

[40] Figure 4 evaluates the trees for the future (A1B scenario) simulations. Generally, the same relationships are found as for the ERA40 simulations (except for summer WBAL); however, the agreement between the models is much lower. One possible reason for the stronger agreement among the models in the ERA40 period simulations is the identical forcing applied to all models, whereas for the A1B future simulations, forcings from three different GCMs are used for the RCM simulations (see Table 1).

Figure 4.

Same as Figure 3 but for the future A1B period persistent hot days. Arrows are indicative of the cutoff values which are induced by the Sl splits associated with the leaves. See Figure B2 for the defining threshold values.

[41] Also for the future (A1B) simulations, the relations in the CARTs overall agree with the correlations between the predictors and the predictand (Figure 4 and Table A2): negative correlations with SLP variability and WBAL, and around zero correlations with SLP and elevation. While for the water balance the correlations between WBALANN and persistent hot days are the strongest and most consistent ones, similarly as for the ERA40 simulations, the CART analysis identifies WBALMAM as being more important. This again reflects the local focus of the CART analysis, a feature which is not accessible through correlation or multilinear regression analysis. Note that the correlations between the predictors and the predictand are generally stronger compared to the ERA40 simulations. This is probably a consequence of the threshold-based definition of the persistent hot days, which leads to more pronounced patterns in the warmer A1B simulations.

3.2.3. Differences Between Two Periods (Δ Changes)

[42] The results so far do not give any clear indications regarding the mechanisms that cause the changes between the persistent hot days patterns of the two periods. We therefore additionally subject the change patterns of predictors and predictand to the same CART analysis. The “Δ change” patterns consist of the difference patterns between the future A1B and ERA40 periods for the extreme event and predictor patterns (except for elevation). Figure 2c reveals that the increase in persistent hot days is strongest in Southern Europe, that is, in the regions with the longest persistent hot days in the ERA40 period.

[43] In Figure 5, triangles summarize the Δ changes: The symbols open up-triangle (open down-triangle) are used for strictly positive (negative) changes associated with a given leaf, and the symbols solid up-triangle and solid down-triangle are used for ‘strong’ Δ changes. For a given model and predictor, the threshold of a strong change is defined as the maximum of two numbers, namely the median of all positive Δ changes and the absolute value of the median of all negative Δ changes. See Figure B3 in Appendix B for the actual threshold values.
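The threshold rule for “strong” Δ changes can be illustrated with a minimal Python sketch. It assumes that the Δ changes of one model and one predictor are available as a plain list of numbers. The function names, the symbol labels, and the choice to count a change equal to the threshold as strong are illustrative assumptions and are not taken from the analysis itself.

```python
from statistics import median


def strong_change_threshold(deltas):
    """Threshold for a 'strong' change of one model and one predictor.

    Rule from the text: the maximum of (a) the median of all positive
    delta changes and (b) the absolute value of the median of all
    negative delta changes. Returns None if no nonzero change exists.
    """
    positive = [d for d in deltas if d > 0]
    negative = [d for d in deltas if d < 0]
    candidates = []
    if positive:
        candidates.append(median(positive))
    if negative:
        candidates.append(abs(median(negative)))
    return max(candidates) if candidates else None


def triangle_symbol(delta, threshold):
    """Map the delta change of one leaf to a symbol label.

    Assumption: a change whose magnitude equals the threshold already
    counts as strong; the text does not specify this boundary case.
    """
    if delta == 0 or threshold is None:
        return "none"
    direction = "up" if delta > 0 else "down"
    fill = "solid" if abs(delta) >= threshold else "open"
    return f"{fill} {direction}-triangle"


# Hypothetical delta changes, for illustration only.
deltas = [-4.0, -2.0, -1.0, 0.5, 1.0, 3.0]
threshold = strong_change_threshold(deltas)  # max(1.0, |-2.0|) = 2.0
print(threshold, triangle_symbol(3.0, threshold), triangle_symbol(-1.0, threshold))
```

Taking the maximum of the two medians yields a single symmetric threshold, so that positive and negative changes are judged against the same magnitude.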

Figure 5.

Same as Figure 3 but for persistent hot days Δ change leaves and their predictor Δ changes (A1B minus ERA40). Triangles are indicative of (strong) positive/negative changes (except for elevation): open up-triangle (open down-triangle) for Δ changes >0 (<0); solid up-triangle (solid down-triangle) for “strong” Δ changes; see section 3.2 for details and Figure B3 for the threshold values.

[44] Figure 5 shows that in the regions where the periods of persistent hot days lengthen most, a decrease of the spring water balance occurs in half of the models, while all other predictors have either inconsistent or extremely weak signals limited to single models. This is also in agreement with the cross correlations in Figure 5 and Table A3, where spring WBAL shows negative correlations in all RCMs, whereas the other predictors either have low correlations and/or range from positive to negative correlations.

[45] This means that for the Δ changes from ERA40 to future A1B conditions, it is a different predictor that is most robust across the RCM ensemble compared to the outcome of the analysis for the single periods. Namely, the most robust driver for the lengthening of the periods of persistent hot days is a decreasing water balance, while in the ERA40 and the A1B periods taken separately, patterns of persistent hot days are more strongly explained by SLP variability (although the signal is weaker in the future A1B simulations). The decrease in the spring water balance suggests soil moisture depletion, and thus reduced evaporative cooling, as an explanatory mechanism underlying the lengthening of periods of persistent hot days. This interpretation is in agreement with several model studies [Seneviratne et al., 2006a; Diffenbaugh et al., 2007]. Furthermore, Rowell and Jones [2006] find in a high-resolution GCM that the soil moisture decline in spring is a main contributor to future summer drying and associated surface heating.

[46] Generally, the systematic features of Figures 3–5 are not directly attributable to any specific selection of land surface models. The same holds for the driving GCMs, which do not systematically steer the CART outcome, although more in-depth investigation is needed to address these questions.

3.2.4. Summary

[47] To summarize the results for the periods of persistent hot days, Figure 6 aggregates the symbols of Figures 3–5 and gives the ranges of the quantile thresholds associated with the symbols over the model ensemble (compare to Figures B1–B3 in Appendix B). Figure 6 shows at a glance that in the RCMs the periods of persistent hot days concur with low variabilities of SLP. Water balance is of importance for persistent hot days in the ERA40 and future A1B period. It also appears critical for the Δ changes (lengthening) of the periods of persistent hot days.

Figure 6.

Summary table on the factors related to the persistence of hot days. For a particular predictor, the down-arrow for the ERA40 and A1B periods indicates that in at least one of the two strongest extreme event leaves at least four models show the down-arrow or double down-arrow and no model shows the up-arrow or the double up-arrow. The double down-arrow indicates that this condition holds for at least seven models. Up-arrows and double up-arrows are used analogously. For the Δ changes, the triangles correspondingly indicate agreement on strictly positive or negative changes. A circle means that the predictor is part of the trees but does not show a consistent signal. A dash indicates that the predictor is not selected. The numbers give the ranges over all models of the thresholds corresponding to the 60% (40%) quantile, defining the up-arrow (down-arrow) in Figures 3–5. Units are hPa for SLP and mm/d for WBAL.
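The aggregation rule of Figure 6 can be sketched in Python as follows. The sketch assumes that, for one predictor, each of the two strongest extreme event leaves is given as a list with one entry per model ("up", "down", or None if the model shows no symbol), with single and double per-model arrows collapsed into one direction. The function names, the output labels, and the treatment of two leaves pointing in opposite directions (mapped to a circle here) are illustrative assumptions.

```python
def leaf_signal(symbols):
    """Ensemble signal of one leaf, or None if the models do not agree.

    A direction is reported if at least four models show it and no model
    shows the opposite one; it is 'double' for at least seven models.
    """
    n_up = sum(s == "up" for s in symbols)
    n_down = sum(s == "down" for s in symbols)
    for direction, n_same, n_opposite in (("down", n_down, n_up),
                                          ("up", n_up, n_down)):
        if n_same >= 4 and n_opposite == 0:
            return ("double " if n_same >= 7 else "") + direction
    return None


def summary_symbol(leaf_a, leaf_b):
    """Summary symbol from the two strongest extreme event leaves."""
    if all(s is None for s in leaf_a + leaf_b):
        return "dash"  # predictor not selected by any model
    signals = {leaf_signal(leaf_a), leaf_signal(leaf_b)} - {None}
    directions = {s.split()[-1] for s in signals}
    if len(directions) != 1:
        return "circle"  # selected, but no consistent signal (assumption for conflicts)
    direction = directions.pop()
    return f"double {direction}" if f"double {direction}" in signals else direction


# Hypothetical eight-model ensemble, for illustration only.
print(summary_symbol(["down"] * 5 + [None] * 3, [None] * 8))
print(summary_symbol(["down"] * 5 + ["up"] + [None] * 2, [None] * 8))
```

Because a single contradicting model suppresses the arrow, the summary symbols are conservative: they report agreement only where no model opposes it.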

[48] The ranges of the water balance values are especially large. This reflects the known climate model biases and uncertainties regarding precipitation and land-atmosphere interactions (see section 1). Ensemble averages of these quantities thus have to be interpreted carefully [see also Orlowsky and Seneviratne, 2010]. The more qualitative perspective adopted in our study partly avoids this bias problem (though it cannot compensate for any misrepresentation of physical processes in the models).

4. Conclusions

[49] We explore the potential of Classification And Regression Trees (CARTs) for investigating spatial relations between different time-aggregated climate patterns and the spatial distribution of persistent hot days in a multimodel ensemble, consisting of RCM simulations from the ENSEMBLES project and ERA40 reanalysis data. Our goal is a spatial investigation of the regions where seasonally averaged patterns of sea level pressure (SLP) and its variability (together representing circulation), as well as average water balance (assessed from precipitation minus evapotranspiration, representing the influences of soil moisture changes and corresponding land-atmosphere interactions) are of importance to patterns of persistent hot days. Note that by excluding the temporal dimension from our analysis, we miss important relationships between predictors and the predictand. However, this simplified setting provides a suitable test bed for the proposed approach.

[50] Ensembles of CARTs are grown for RCM simulations of two periods, namely the ERA40 period (1961–2000, ERA40-driven) and a corresponding period at the end of the 21st century (2061–2100, A1B driven). The ERA40 reanalysis itself is added to the RCM ensemble of the ERA40 period. The CART ensembles of these two periods describe the relation between the patterns for the periods of persistent hot days and several predictor patterns in two different climate states. By also analyzing the Δ change patterns (future A1B minus ERA40 patterns), the relevance of the different predictor Δ patterns for the Δ patterns of the persistent hot days is evaluated.

[51] Generally, the results for the ERA40 reanalysis agree with those for the ERA40-driven RCM simulations. While not being fundamentally different in the future A1B simulations, the detected relations are much clearer in the ERA40 simulations, probably as a consequence of the uniform forcing from the ERA40 reanalysis.

[52] 1. The longest periods with persistent hot days occur in the Mediterranean and around the Black Sea and increase in the future (A1B) simulations.

[53] 2. The SLP variability is a robust predictor for both periods. It tends to be low in regions with long periods of persistent hot days (Mediterranean and Black Sea). This makes sense since a certain atmospheric stability seems to be necessary for persistent hot days to develop.

[54] 3. Long persistent hot days are associated with high water balance deficits. This hints at a role of limited soil moisture availability in these regions (Mediterranean and Black Sea), associated with low evaporative cooling.

[55] 4. For the analysis of the two periods, the models agree more on the relevance of circulation than with respect to the water balance and possible land-atmosphere interactions, although this signal is clearer in the ERA40 simulations.

[56] 5. By contrast, the patterns of Δ changes of persistent hot days lengths are mainly related to Δ changes of the water balance, where the regions of strong lengthening are also found to show systematic decreases in the spring water balance. This is plausible because of enhanced soil moisture depletion and thereby suppressed evaporative cooling.

[57] Thus, the applied approach allows for a direct comparison of the relations within climate states (like in the ERA40 and A1B periods) versus the changes from one state to the other. For the considered patterns of persistent hot days, this comparison reveals that the main driving patterns indeed differ (steady atmosphere in the climate state versus increasing water deficits in the Δ change analysis).

[58] Since CARTs are sensitive to correlations between the predictors and can get stuck in suboptimal partitions, the robustness of the CARTs is evaluated by a bootstrap validation. The structural similarity between the bootstrap CARTs from one model turns out to be much higher than the similarity between trees from different models, supporting the robustness of our findings.
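The idea behind such a bootstrap check can be sketched in Python under strong simplifications. Instead of growing full CARTs and comparing their structure, the sketch below only determines the predictor chosen for the first (root) split of a regression tree and counts how often each predictor wins over bootstrap resamples of the grid cells. This root-split frequency is a stand-in assumed here for illustration and is not the similarity measure used in the analysis; all names and the synthetic data are hypothetical.

```python
import random


def best_root_split(rows, target, predictors):
    """Return (predictor, threshold) of the single split with minimal squared error."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_name, best_threshold, best_cost = None, None, float("inf")
    for name in predictors:
        levels = sorted({row[name] for row in rows})
        # Candidate thresholds lie midway between neighboring predictor values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [row[target] for row in rows if row[name] <= threshold]
            right = [row[target] for row in rows if row[name] > threshold]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best_name, best_threshold, best_cost = name, threshold, cost
    return best_name, best_threshold


def root_split_frequencies(rows, target, predictors, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each predictor defines the root split."""
    rng = random.Random(seed)
    counts = {name: 0 for name in predictors}
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in range(len(rows))]  # resample with replacement
        name, _threshold = best_root_split(sample, target, predictors)
        if name is not None:
            counts[name] += 1
    return {name: count / n_boot for name, count in counts.items()}


# Synthetic grid cells: long hot day periods where SLP variability is low.
rows = [{"sigma_slp": float(i), "noise": float(i % 2),
         "hot_days": 10.0 if i < 10 else 0.0} for i in range(20)]
print(root_split_frequencies(rows, "hot_days", ["sigma_slp", "noise"], n_boot=50))
```

A predictor that defines the root split in nearly all resamples indicates a partition that does not depend on the particular sample of grid cells, which is the kind of stability the bootstrap validation is meant to probe.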

[59] Our results generally agree with findings from a spatial cross correlation analysis between the predictors and predictands, which further supports the robustness of the CARTs. However, the cross correlations can only resolve the linear part of the relations and do not allow for the detailed characterization of extreme event classes (including their occurrence in space), which is possible from the structural analysis of CARTs.

[60] In contrast to the classical ensemble average, the information of the individual models remains traceable in the output from our CART analysis, which can be helpful for model improvement. Since the analysis focuses on qualitative relationships between predictors and predictand, it is less affected by model biases, which at times make ensemble averages difficult to interpret. Furthermore, the analysis is not restricted to spatial patterns as applied in this study, and in particular, could be generalized to time dependent data. We therefore suggest that this methodological framework can serve as a useful extension of multimodel analyses, in order to yield transparent information on relations between climate variables without the need for computationally expensive numerical experiments.

Appendix A: Cross Correlations

[61] Tables A1–A3 contain the cross-correlation matrices of the predictors and the persistent hot days for the ERA40 simulations, the A1B simulations and the Δ changes between them. Given are the median and min/max range across the RCM ensemble (including the ERA40 reanalysis in Table A1).
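How one entry of these tables is obtained can be sketched in Python: for each model, the spatial (Pearson) correlation between two patterns is computed over the grid cells, and the median and min/max range are then taken over the ensemble. The data layout (a dictionary of models, each mapping a variable name to a list of grid cell values), the function names, and the numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equally long lists of grid cell values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ensemble_correlation(models, var_a, var_b):
    """Median and min/max range over the ensemble of the spatial correlation."""
    values = [pearson(fields[var_a], fields[var_b]) for fields in models.values()]
    return {"median": median(values), "min": min(values), "max": max(values)}


# Hypothetical three-model ensemble with four grid cells each.
models = {
    "model_1": {"hot_days": [1.0, 2.0, 3.0, 4.0], "wbal": [4.0, 3.0, 2.0, 1.0]},
    "model_2": {"hot_days": [1.0, 2.0, 3.0, 4.0], "wbal": [1.0, 2.0, 3.0, 4.0]},
    "model_3": {"hot_days": [1.0, 2.0, 3.0, 4.0], "wbal": [2.0, 1.0, 4.0, 3.0]},
}
print(ensemble_correlation(models, "hot_days", "wbal"))
```

Reporting the range alongside the median shows at a glance whether the sign of a correlation is shared by all models or changes across the ensemble.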

Table A1. Cross Correlations in Space Between All Predictor and Predictand Pairs for the ERA40 Persistent Hot Days Data^a

^a Bold numbers give the median over all RCMs and the ERA40 reanalysis; the line below contains the minimum/maximum range.

Hot days1.−0.7.−0.4−0.7.−0.5−−−−0.6.−0.3−0.5.−0.2−
σ (SLPMAM)−0.6.-0.1−0.5.-0.1−−−0.5.-0.4
σ (SLPJJA)−0.7.−0.1−0.6.−0.2−−−0.5.−0.4
WBALANN       1.00.4
Elevation        1.0
Table A2. Cross Correlations in Space Between All Predictor and Predictand Pairs for the A1B Persistent Hot Days Data^a

^a Bold numbers give the median over all RCMs; the line below contains the minimum/maximum range.

Hot days1.−0.8.−0.6−0.7.−0.6−−−0.6.−0.1−0.8.−0.0−0.6.−
σ (SLPMAM)−−−−0.5.−0.5
σ (SLPJJA)−−−−−0.5.−0.5
WBALANN       1.00.2−
Elevation        1.0
Table A3. Cross Correlations in Space Between All Predictor and Predictand Pairs for the Δ Change Persistent Hot Days Data^a

^a Bold numbers give the median over all RCMs; the line below contains the minimum/maximum range.

Hot days1.−−−−−0.7.-0.2−−0.7.-
σ (SLPMAM)−0.2−0.10.1−−−−−0.6.-0.1−−
σ (SLPJJA)−−−−−−
WBALANN       1.00.3−
Elevation        1.0

Appendix B: Quantile Thresholds and Symbols

[62] Figures B1–B3 contain the quantile-derived thresholds for the arrow and triangle symbols in Figures 3–5. See also the Δ change paragraphs of section 3.2.

Figure B1.

Quantile thresholds and corresponding symbols for Figure 3, analyzing patterns in the ERA40 period.

Figure B2.

Quantile thresholds and corresponding symbols for Figure 4, analyzing patterns in the future A1B period.

Figure B3.

Quantile thresholds and corresponding symbols for Figure 5, analyzing Δ-change patterns (except for elevation).


[63] Support from the German Science Foundation DFG (grant Or256/1-1) and ETH Zurich is acknowledged. The authors are grateful to three anonymous reviewers, whose comments and suggestions substantially helped to improve the manuscript.