New cloud and microphysics parameterisation for use in high-resolution dynamical downscaling: application for summer extreme temperature over Belgium


Abstract

We explore the use of high-resolution dynamical downscaling as a means to simulate extreme values of summer maximum surface air temperature over Belgium (TMAX). We use ALADIN, the limited area version of the ARPEGE-IFS model. Our approach involves a sequence of daily integrations driven at the lateral boundaries by perfect boundary conditions provided by the ERA40 reanalysis. Three recent-past (1961–1990) simulations are evaluated against different station datasets: (1) at 40 km spatial resolution (ALD40), (2) at 10 km spatial resolution (ALD10), and (3) at 4 km spatial resolution (ALR04), using a new parameterisation of deep convection and microphysics that allows ALADIN to be used at resolutions ranging from a few tens of kilometers down to less than 4 km. The validation of ALD40 reveals a positive summer bias (2.2 °C). Despite a fourfold increase in spatial resolution, ALD10 only slightly reduces the warm bias (1.7 °C). This warm bias in TMAX is strongly correlated with the representation of cloud cover. The results show that ALD10 overestimates clear-sky occurrences and produces a solar radiation overestimate that grows through the diurnal cycle, reaching a maximum of 116 W m−2 at noon. ALR04 considerably reduces the warm bias (−0.2 °C), which suggests that it is able to simulate weakly forced convective clouds in summer over Belgium. In addition, the Generalized Pareto Distribution (GPD) of the extreme high temperatures produced by the different simulations has been compared with observations over the same period. ALD40 and ALD10 gave GPDs that did not replicate the observed distribution well and thus overestimated the extremes. This study shows that a consistent treatment of deep convection and cloud–radiation interaction when increasing the horizontal resolution is very important for studying extremely high temperature events. Copyright © 2011 Royal Meteorological Society

1. Introduction

Understanding feedbacks involving cloud–radiation interactions is essential for simulating anthropogenic climate change with climate models. As noted in the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (Randall et al., 2007), large differences exist between climate models in their simulated cloud–radiation feedbacks, which are the main source of uncertainty in climate model sensitivity to a doubling of atmospheric CO2. Clouds play an important role in long-term climate change through their impact on the surface radiation budget, which is one of the main controls on key surface variables such as surface air temperature. Because the simulated surface radiation budget is mainly controlled by downwelling shortwave and longwave radiation, it is highly dependent on the representation of cloud amounts, microphysical processes, and cloud–radiation interactions. Increased cloud cover, particularly of low clouds, leads to a greater fraction of reflected solar radiation and therefore to a cooling of daily maximum temperature (TMAX) (Groisman et al., 2000; Sun et al., 2000). In comparison, clouds have a relatively small net effect on daily minimum temperature (TMIN) (Dai et al., 1999). A systematic bias in the simulated surface radiation budget can lead to errors in surface air temperature, with the potential for subsequent error propagation throughout the simulated climate system (Liang et al., 2008). Lobell et al. (2007) analysed changes in mean daily maximum temperature and their relation with cloud cover using projections from 12 climate models under the A2 emission scenario. They found that the intermodel standard deviations of June–August mean daily maximum temperature are more than 50% larger than those of mean daily minimum temperature.
Model differences in cloud changes, which exert a relatively greater influence on TMAX during summer, were identified as the main source of this disparity. This highlights the importance of considering projections of daily maximum temperature separately when assessing climate change impacts, even in cases where the average projected changes are similar. Parey (2008) compared high summer temperature distributions in the current climate from observations in France with those from the climate models of the European PRUDENCE project (Prediction of Regional scenarios and Uncertainties for Defining European Climate change risks and Effects, Christensen et al., 2007) using the Generalized Extreme Value distribution. She found that none of the models reproduced all features of the high-temperature distribution, although they correctly reproduced the mean and standard deviation of the whole sample of summer temperatures.

A number of studies have evaluated simulated cloud amounts and surface radiation budgets with different modelling tools, such as global climate models (GCMs; Martin et al., 2006; Williams et al., 2006) or regional climate models (RCMs). Although GCMs have improved in their simulation of the large-scale behaviour of the atmosphere, they still have difficulty capturing small-scale processes because of their coarse resolution. GCMs can also suffer from circulation errors, often originating far from the region of study, which makes it difficult to evaluate the simulated surface energy budget against point observations. It is widely recognized that RCMs are more skillful than the driving GCMs at resolving the fine-scale systems that define the local climate, especially for near-surface variables. This improvement is a direct result of the higher spatial resolution of RCMs (Leung and Qian, 2003). Certain GCM systematic biases, however, cannot be removed simply by increasing spatial resolution (Risbey and Stone, 1996; Marshall et al., 1997). More importantly, besides employing a higher grid resolution than GCMs, RCMs have more realistic topography, better resolve convective-scale processes, and incorporate more realistic representations of surface–atmosphere and cloud–radiation interactions (Leung et al., 2003; Han and Roads, 2004). Global reanalyses, such as the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis ERA40 (Uppala et al., 2005), are a special type of simulation of the past, in which data assimilation methods are used to find optimal estimates of past atmospheric states that are consistent with both meteorological observations and the model dynamics. ERA40 covers the last 40 years and can be used to provide so-called perfect boundary conditions for RCMs.
Thus, the simulated RCM processes resulting from local interactions between the model parameterisations and the resolved dynamics can be validated against observations in order to detect systematic biases of the RCM before simulating future climate change.

The limited area version of the ARPEGE-IFS forecast system (ALADIN, Bubnova et al., 1995) is currently being evaluated for use as a new operational RCM for regional climate change projections over Belgium. It has been used for numerical weather prediction (NWP) by a wide community since the 1990s and, more recently, for regional climate modelling (Farda et al., 2007; Radu et al., 2008). Within the framework of the CECILIA (Central and Eastern Europe Climate Change Impact and Vulnerability Assessment, www.cecilia-eu.org) project, Skalák et al. (2008) and Farda et al. (2010) evaluated the ALADIN-Climate model over the Czech Republic at a high resolution of 10 km, using the ERA40 reanalysis as perfect lateral boundary conditions. They identified summer biases in the maximum 2-m temperature. Using the limited-area version of the Global Environmental Multiscale Model (GEM-LAM, Zadra et al., 2008), Markovic et al. (2008, 2009) also identified summer biases in 2-m temperature over North America, associated, respectively, with an overestimation of downward shortwave and longwave radiation and with an underestimation of cloud fraction (CF).

Recently, Gerard et al. (2009) tested a new parameterisation of clouds and precipitation that allows the ALADIN model to be used in the resolution range of 3–8 km (the so-called gray zone), where deep convection is neither fully subgrid nor fully explicit. The options commonly used in current NWP and climate models, either keeping a standard parameterisation or assuming explicit convection, do not appear to give a universal solution that works for various atmospheric situations (Gerard et al., 2009). A new approach was therefore proposed, with an integrated sequential treatment of resolved condensation, deep convection, and microphysics, together with the use of prognostic variables. This new parameterisation produces consistent and realistic results at resolutions ranging from a few tens of kilometers down to less than 4 km. It also handles feedback mechanisms that are present in nature but were, until now, only treated in models whose mesh size permits a fully explicit simulation of convective clouds (more details can be found in Gerard et al., 2009). A systematic verification of this new version of the model, called ALARO, against observations at 7 km resolution has been carried out since January 2010 at the Royal Meteorological Institute of Belgium. The results (not shown) show better skill scores over Belgium with the new parameterisation than with the diagnostic convection scheme based on Bougeault (1985) and improved as described in Gerard and Geleyn (2005). Nearly all the differences between the two configurations reside in the convection scheme and its interactions with the moist physics.

The objective of this study is therefore to use reanalysis-driven simulations (1) to explore the ability of high-resolution dynamical downscaling, with a finest grid size of 4 km and a sophisticated physics scheme, to better represent summer maximum surface air temperature over Belgium, with emphasis on reproducing the extremes, and (2) to gain a better understanding of the origin of the warm biases found in the aforementioned ALADIN-Climate studies. Work with GCM-driven simulations is underway to investigate postulated changes in extreme surface air temperature under anthropogenic global warming, within an interdisciplinary study of the effects of climate change and land-use change on heat stress and air quality over Belgium (Delcloo et al., 2010).

2. Data and methods

2.1. Model and experimental design

We dynamically downscale the ERA40 reanalysis (Uppala et al., 2005), produced by the ECMWF, using the limited area model (LAM) ALADIN developed by the ALADIN international team (1997); ALADIN runs operationally at the Royal Meteorological Institute of Belgium. In the first step, ALADIN is coupled to the ERA40 data and run at a resolution of 40 km on a domain (DOM1, Figure 1) encompassing most of Western Europe (ALD40, referred to as the control run). To increase the spatial resolution over Belgium, a nested domain (DOM2, Figure 1) is added and two sensitivity runs are conducted: (1) at 10 km spatial resolution (ALD10), and (2) at 4 km spatial resolution using the more sophisticated physics that allows the ALADIN model to be used in the resolution range of 3–8 km (ALR04). The differences provide a measure of the uncertainty in the downscaling technique with respect to the effect of (1) decreasing the horizontal grid spacing (ALD40 versus ALD10), and (2) decreasing the horizontal grid spacing while using the more sophisticated physics scheme (ALD40 versus ALR04). The new cloud and precipitation parameterisation was also tested in this study (not shown) at 40 and 10 km horizontal resolution (ALR40, ALR10), and our experiments showed comparable skill scores over Belgium with the new parameterisation of Gerard et al. (2009) and with the diagnostic convection scheme of Bougeault (1985). At those resolutions, the model can be considered to be outside the gray zone. An ALD04 simulation, i.e. running the model with the diagnostic cloud parameterisation at 4 km resolution, is not physically justifiable because of (1) the missing or unsatisfactory representation of some physical phenomena, and (2) the use of the parameterisation in conditions where its assumptions become invalid, which would lead to erroneous conclusions.

Figure 1.

Top: Domains for the ALADIN simulations; DOM1 represents the 40 km horizontal resolution domain, whereas DOM2 represents the nested domain with 10 and 4 km horizontal resolution. Bottom: Map of Belgium showing the locations of the 50 climatological stations and the names of the 9 stations that also belong to the synoptic network of the Royal Meteorological Institute of Belgium

Our procedure is to interpolate the original ERA40 files to 40 km resolution. These 6-hourly files serve as initial and boundary conditions for a 48-h ALADIN run at 40 km resolution (ALD40), started at 00 UTC every day. The 3-hourly output from this first run serves as input for the high-resolution 10 km run (ALD10) and for the very high-resolution 4 km run (ALR04). To avoid spin-up problems, however, the first 12 h of the 40 km run are not used, leaving 36 h of driving data for the 4 and 10 km runs (which therefore start at 12 UTC). Finally, we again discard the first 12 h of these runs, arriving at 24 h of 3-hourly output at 4 and 10 km resolution, and we re-initialise and integrate over each subsequent 24-h period during the summer months of June–July–August, 1961–1990.
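As a check on the arithmetic of this nested set-up, the following Python sketch reproduces the time windows from the numbers given above (a 48-h driving run started at 00 UTC and two 12-h spin-up periods). It performs no model integration, and the function name `retained_window` is hypothetical. Under these assumptions, the 24 h retained from the cycle started on a given day run from 00 UTC on the following day to 00 UTC the day after.

```python
from datetime import date, datetime, timedelta

def retained_window(day, driver_hours=48, spinup_hours=12):
    """Time bookkeeping for one daily cycle of the nested set-up (illustrative only).

    day: date on which the 40 km driving run (ALD40) starts at 00 UTC.
    Returns (nested_start, keep_start, keep_end) as datetimes.
    """
    driver_start = datetime(day.year, day.month, day.day, 0)      # ALD40 starts at 00 UTC
    driver_end = driver_start + timedelta(hours=driver_hours)     # end of the 48 h run
    nested_start = driver_start + timedelta(hours=spinup_hours)   # first 12 h of ALD40 skipped
    keep_start = nested_start + timedelta(hours=spinup_hours)     # first 12 h of ALD10/ALR04 skipped
    return nested_start, keep_start, driver_end

nested_start, keep_start, keep_end = retained_window(date(1961, 6, 1))
print("nested runs start:", nested_start)
print("retained output:", keep_start, "to", keep_end)
print("nested run length:", keep_end - nested_start)
```

The same function applied to each day of June–August gives a contiguous sequence of retained 24-h windows, which is what makes the daily re-initialisation seamless in the output time series.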

We recognize that daily re-initialisation prevents the surface physics (soil moisture and temperature) from equilibrating, which is desirable in long-term regional climate modelling (e.g. Giorgi and Mearns, 1999). The benefit of re-initialisation, however, is that it prevents error growth through a succession of convective-scale interactions and feedbacks that, consistent with Lorenz's (1969) theoretical prediction, can erroneously saturate the solution. Qian et al. (2003), for example, showed that a regional model should not be run unattended for a long period of time. After a number of days, the model diverges from the coupling data and settles at some (constant) bias (Nicolis, 2003, 2004). The feedback mechanism between the new cloud and microphysics parameterisation and the land surface scheme of ALARO will be studied in subsequent work using long-term continuous runs.

2.2. Station data

The station data used in this study were retrieved from the climatological network of the Royal Meteorological Institute (RMI) of Belgium and consist of daily maximum temperatures (TMAX) over the period 1961–1990. The climatological network currently comprises more than 250 stations; however, to obtain a homogeneous network without substantial interruptions, 50 stations were selected. The selected stations are geographically dispersed across Belgium (Figure 1) and represent a mixture of coastal and inland locations. Of the 50 stations, 9 (named in Figure 1) also belong to the RMI synoptic network, where many more variables are measured at a much higher frequency. This network started operating in 1952; however, it is less dense (39 stations in total) than the climatological network, and some stations have long periods of missing data. Measurements from the synoptic and climatological networks have been used extensively for climatological analysis (Sneyers, 1975, 1986) and, more recently, by Hamdi and Van de Vyver (2011) and Van de Vyver (2011).

A further dataset used in this study is the ground-based measurement of solar radiation. The RMI has performed uninterrupted 30-min average measurements at Uccle, some 6 km south of the centre of Brussels, since 1951. Uccle is one of the 22 Regional Centers established within the WMO Regions. The solar radiation parameters usually measured at the ground are the global solar irradiance (the rate of total incoming solar energy, both direct and diffuse, on a horizontal plane at the Earth's surface), the direct solar irradiance (the rate of solar energy arriving at the Earth's surface from the Sun's direct beam, on a plane perpendicular to the beam), and the sunshine duration (the sum of all time periods during the day when the direct solar irradiance equals or exceeds 120 W m−2).
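The sunshine-duration definition above can be illustrated with a minimal Python sketch that counts the time steps of a regularly sampled direct-irradiance series reaching 120 W m−2. The function name, the 30-min sampling and the sample values are illustrative assumptions; applying the threshold to averaged values only approximates the definition, which refers to the instantaneous direct irradiance, and this is not a description of the RMI processing chain.

```python
SUNSHINE_THRESHOLD = 120.0  # W m-2, direct irradiance threshold of the definition above

def sunshine_duration_hours(direct_irradiance, step_minutes):
    """Sunshine duration (h) from a regularly sampled direct irradiance series.

    Every time step whose value equals or exceeds the threshold counts as sunny.
    """
    sunny_steps = sum(1 for value in direct_irradiance if value >= SUNSHINE_THRESHOLD)
    return sunny_steps * step_minutes / 60.0

# Hypothetical 30-min averages of direct irradiance (W m-2) over part of a day
series = [0.0, 40.0, 130.0, 450.0, 620.0, 118.0, 300.0, 0.0]
print(sunshine_duration_hours(series, 30))  # 4 sunny steps of 30 min -> 2.0
```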

2.3. Grid point selection

The first step in comparing models with observations consists of extracting from the model simulations a time series corresponding to each observational series. This presents a difficulty because a model value represents the value of that parameter in a grid box centred on the given geographic coordinates, while the station data represent a single site within a grid box. Different approaches can be used to identify an appropriate method for comparison with the observed values (Mooney et al., 2010). The technique considered here is the simplest: the value observed at a station is compared with the series of the nearest grid box. As the aim of this work is to study the behaviour of extreme values, this method was preferred to any averaging or combination of grid points, which could smooth or alter extreme values (Parey, 2008). The nearest grid point was selected automatically by computing the distances between the grid points and the location of the observational series and choosing the nearest grid point over land. Fifty time series are extracted from each model simulation, and each temperature time series is corrected to the altitude of the corresponding observational series. This was done by applying the standard-atmosphere gradient of 6.5 K km−1 to the original model temperature values according to the altitude difference between the model grid point and the observational station.
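A minimal Python sketch of the nearest-land-point selection and altitude correction described above is given below. The great-circle (haversine) distance, the list-of-dictionaries grid layout and all names and values are assumptions made for illustration; the text does not specify the distance measure used.

```python
import math

LAPSE_RATE_K_PER_M = 6.5e-3  # standard-atmosphere gradient, 6.5 K/km

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance (km) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))

def nearest_land_point(station_lat, station_lon, grid):
    """Nearest grid point flagged as land (hypothetical dict-based grid layout)."""
    land = [g for g in grid if g["is_land"]]
    return min(land, key=lambda g: great_circle_km(station_lat, station_lon, g["lat"], g["lon"]))

def height_corrected(t_model, z_model, z_station):
    """Shift a model temperature to the station altitude (warmer if the station is lower)."""
    return t_model + LAPSE_RATE_K_PER_M * (z_model - z_station)

# Hypothetical coastal example: the closest point is sea, so the nearest land point is used.
grid = [
    {"lat": 51.00, "lon": 2.50, "is_land": False, "z": 0.0},
    {"lat": 51.05, "lon": 2.70, "is_land": True, "z": 10.0},
    {"lat": 50.50, "lon": 3.50, "is_land": True, "z": 60.0},
]
point = nearest_land_point(51.00, 2.55, grid)
print(point["lon"], height_corrected(25.0, point["z"], 4.0))
```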

2.4. Extreme value analysis with threshold models

We summarize the basic elements of modelling extreme events with Peak-Over-Threshold (POT) methods; the full mathematical description can be found in the classical textbooks of Coles (2001), Embrechts et al. (1997), and Beirlant et al. (2004).

2.4.1. The Generalized Pareto Distribution

Suppose we have a sequence of independent and identically distributed random variables X1, …, Xn with an unknown distribution function F. We are interested in the excesses over a high threshold t. Let x0 be the finite or infinite right endpoint of F. We define the distribution function of the excess Y = X − t over the threshold t by

$$F_t(y) = P\{X - t \le y \mid X > t\} = \frac{F(t + y) - F(t)}{1 - F(t)}, \tag{1}$$

where 0 ⩽ y < x0 − t. Ft(y) is thus the conditional probability that the threshold is exceeded by no more than an amount y, given that it is exceeded. The distribution function that naturally arises in the modelling of excesses is the Generalized Pareto Distribution (GPD)

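$$G_{\sigma,\gamma}(y) = \begin{cases} 1 - \left(1 + \gamma\,\dfrac{y}{\sigma}\right)^{-1/\gamma}, & \gamma \neq 0,\\[4pt] 1 - \exp\left(-\dfrac{y}{\sigma}\right), & \gamma = 0, \end{cases} \qquad y \ge 0,\ 1 + \gamma\,\frac{y}{\sigma} > 0, \tag{2}$$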

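where σ > 0 is the scale parameter and γ the shape parameter. A well-known result (the Pickands–Balkema–de Haan theorem) is that, provided F lies in the domain of attraction of an extreme value distribution, the GPD is the limiting distribution of the excesses, Equation (1), as the threshold tends to the right endpoint: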

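$$\lim_{t \to x_0}\ \sup_{0 \le y < x_0 - t} \left| F_t(y) - G_{\sigma(t),\gamma}(y) \right| = 0, \tag{3}$$

for a suitable positive scale function σ(t).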

Then the tail of the distribution function F can be easily approximated as follows. For x > t, it follows that

$$F(x) \approx 1 - \zeta_t \left[1 + \gamma\,\frac{x - t}{\sigma}\right]^{-1/\gamma}, \tag{4}$$

where ζt = P{X > t}.
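For reference, Equations (2) and (4) can be evaluated with a few lines of Python. This is a sketch with hypothetical function names and illustrative parameter values, not fitted estimates; for γ < 0 the distribution function is set to 1 beyond the finite upper endpoint −σ/γ.

```python
import math

def gpd_cdf(y, sigma, gamma):
    """Generalized Pareto distribution function G_{sigma,gamma}(y), Equation (2)."""
    if y < 0.0:
        return 0.0
    if gamma == 0.0:
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + gamma * y / sigma
    if z <= 0.0:  # beyond the finite upper endpoint -sigma/gamma (gamma < 0)
        return 1.0
    return 1.0 - z ** (-1.0 / gamma)

def tail_probability(x, t, zeta_t, sigma, gamma):
    """Approximation (4) of P{X > x} = 1 - F(x) for x > t."""
    return zeta_t * (1.0 - gpd_cdf(x - t, sigma, gamma))

# Hypothetical parameters: threshold 30 degC exceeded on 5% of days, sigma = 1.5, gamma = -0.2
print(tail_probability(33.0, 30.0, 0.05, 1.5, -0.2))
```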

The GPD parameters can be estimated with well-known techniques such as maximum likelihood (ML), the method of moments (MOM), the method of probability-weighted moments (PWM), and the L-moments method. The level that is exceeded on average once every T years is called the T-year event (or quantile) and is denoted by xT. It can be calculated with the following formula (Stedinger et al., 1993)

$$x_T = t + \frac{\sigma}{\gamma}\left[(\lambda T)^{\gamma} - 1\right], \tag{5}$$

where λ is the average number of exceedances of the threshold t per year. One easily shows that

$$\lim_{\gamma \to 0} x_T = t + \sigma \ln(\lambda T). \tag{6}$$
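Equation (5) can be sketched in Python as follows; the names and parameter values are hypothetical. `math.expm1` is used only to keep the expression (λT)^γ − 1 accurate for small γ, and the case γ = 0 is handled through its logarithmic limit t + σ ln(λT). A quick consistency check is that λ times the GPD exceedance probability at xT equals 1/T.

```python
import math

def return_level(T, t, sigma, gamma, lam):
    """T-year event x_T of Equation (5).

    T: return period (years); t: threshold; sigma, gamma: GPD parameters;
    lam: average number of threshold exceedances per year.
    """
    log_rate = math.log(lam * T)
    if gamma == 0.0:
        return t + sigma * log_rate            # logarithmic limit for gamma -> 0
    # (lam*T)**gamma - 1 written with expm1 for accuracy when gamma is small
    return t + sigma * math.expm1(gamma * log_rate) / gamma

# Hypothetical parameters with lam = 5 exceedances per year
for T in (5, 20):
    print(T, round(return_level(T, 30.0, 1.5, -0.2, 5.0), 2))
```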

2.4.2. Bivariate threshold excess model

A bivariate version of Equation (4) (Coles, 2001) can be obtained as follows. Suppose $(x_{1,1}, x_{2,1}), \ldots, (x_{1,n}, x_{2,n})$ are independent realisations of a random vector $(X_1, X_2)$ with joint distribution F. For suitable thresholds t1 and t2, the marginal (i.e. univariate) distributions of F each have an approximation of the form of Equation (4), with respective parameter sets $(\zeta_i, \sigma_i, \gamma_i)$, i = 1, 2. Introducing the transformations

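$$\tilde{x}_i = -\left(\log\left\{1 - \zeta_i\left[1 + \gamma_i\,\frac{x_i - t_i}{\sigma_i}\right]^{-1/\gamma_i}\right\}\right)^{-1}, \qquad i = 1, 2, \tag{7}$$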

we get

$$F(x_1, x_2) \approx G(x_1, x_2) = \exp\{-V(\tilde{x}_1, \tilde{x}_2)\}, \qquad x_1 > t_1,\ x_2 > t_2, \tag{8}$$

where

$$V(z_1, z_2) = 2\int_0^1 \max\left(\frac{w}{z_1}, \frac{1 - w}{z_2}\right) \mathrm{d}H(w), \tag{9}$$

and H is a distribution function on [0, 1] such that $\int_0^1 w\,\mathrm{d}H(w) = 1/2$. The function V is called the pairwise extremal dependence function and captures the bivariate dependence structure. A popular class is the logistic family

$$V(z_1, z_2) = \left(z_1^{-1/\alpha} + z_2^{-1/\alpha}\right)^{\alpha}, \tag{10}$$

for a parameter α ∈ (0, 1]. The value α = 1 corresponds to independent variables, while the limit α → 0 corresponds to complete dependence.
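A minimal Python sketch of the transformation (7), the logistic dependence function (10) and the approximation (8) is given below. Function names and the marginal parameters in the example are hypothetical; the example only illustrates how the joint probability of two values both staying below a high level moves between independence (α = 1) and near-complete dependence (small α).

```python
import math

def to_unit_frechet(x, t, zeta, sigma, gamma):
    """Transformation (7): map x > t to the (approximately) unit Frechet scale."""
    if gamma == 0.0:
        tail = zeta * math.exp(-(x - t) / sigma)
    else:
        tail = zeta * (1.0 + gamma * (x - t) / sigma) ** (-1.0 / gamma)
    return -1.0 / math.log(1.0 - tail)   # tail approximates 1 - F(x), Equation (4)

def v_logistic(z1, z2, alpha):
    """Logistic dependence function, Equation (10), for 0 < alpha <= 1."""
    return (z1 ** (-1.0 / alpha) + z2 ** (-1.0 / alpha)) ** alpha

def joint_cdf(z1, z2, alpha):
    """Equation (8) on the Frechet scale: G = exp{-V(z1, z2)}."""
    return math.exp(-v_logistic(z1, z2, alpha))

z = to_unit_frechet(32.0, 30.0, 0.05, 1.5, -0.2)
# alpha = 1 gives the product of the margins; small alpha approaches exp(-1/z).
for alpha in (1.0, 0.5, 0.05):
    print(alpha, joint_cdf(z, z, alpha))
```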

The maximum likelihood method is used to estimate the dependence parameter α. Observe that model Equation (8) is only specified on the region xi > ti, and hence does not apply directly to observations outside that region. It may happen, however, that for an observation (x1, x2) we have, for example, x1 > t1 and x2 ⩽ t2. The likelihood function that accounts for such situations is given by

$$L(\alpha) = \prod_{i=1}^{n} \psi\left(\alpha; (x_{1,i}, x_{2,i})\right), \tag{11}$$

where

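$$\psi\left(\alpha; (x_1, x_2)\right) = \begin{cases} \dfrac{\partial^2 F}{\partial x_1\,\partial x_2}\Big|_{(x_1, x_2)}, & x_1 > t_1,\ x_2 > t_2,\\[6pt] \dfrac{\partial F}{\partial x_1}\Big|_{(x_1, t_2)}, & x_1 > t_1,\ x_2 \le t_2,\\[6pt] \dfrac{\partial F}{\partial x_2}\Big|_{(t_1, x_2)}, & x_1 \le t_1,\ x_2 > t_2,\\[6pt] F(t_1, t_2), & x_1 \le t_1,\ x_2 \le t_2, \end{cases} \tag{12}$$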

with F and its partial derivatives computed according to the right-hand side of Equation (8).

The maximum likelihood estimation for α is obtained by maximising the log-likelihood

$$\ell(\alpha) = \log L(\alpha) = \sum_{i=1}^{n} \log \psi\left(\alpha; (x_{1,i}, x_{2,i})\right). \tag{13}$$
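The censored likelihood (11)–(13) for the logistic model can be sketched in Python on the unit Fréchet scale obtained from transformation (7). Assumptions: the marginal parameters have already been estimated (the two-step inference of Coles, 2001), the Jacobian factors of the transformation are dropped because they do not depend on α, and α is maximised by a simple grid search over [0.01, 1] rather than by any particular optimiser used in this study. The closed-form derivatives follow from differentiating exp{−V} with V given by Equation (10); all names are hypothetical.

```python
import math

def log_s(z1, z2, alpha):
    """log(z1**(-1/alpha) + z2**(-1/alpha)), computed in log space to avoid underflow."""
    a, b = -math.log(z1) / alpha, -math.log(z2) / alpha
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_psi(z1, z2, u1, u2, alpha):
    """Log of the censored contribution (12) for the logistic model, on the Frechet scale.

    z1, z2: transformed data; u1, u2: transformed thresholds.
    """
    above1, above2 = z1 > u1, z2 > u2
    if not (above1 or above2):                    # F(t1, t2)
        return -math.exp(alpha * log_s(u1, u2, alpha))
    if not above2:                                # dF/dx1 evaluated at (x1, t2)
        z2 = u2
    if not above1:                                # dF/dx2 evaluated at (t1, x2)
        z1 = u1
    ls = log_s(z1, z2, alpha)
    v = math.exp(alpha * ls)                      # V(z1, z2), Equation (10)
    k = 1.0 / alpha + 1.0
    if above1 and above2:                         # mixed derivative d2F/dx1dx2
        return (-v - k * (math.log(z1) + math.log(z2))
                + (alpha - 2.0) * ls + math.log(v + 1.0 / alpha - 1.0))
    z_exceed = z1 if above1 else z2
    return -v + (alpha - 1.0) * ls - k * math.log(z_exceed)

def fit_alpha(pairs, u1, u2):
    """Maximise the log-likelihood (13) over the grid alpha = 0.010, 0.011, ..., 1.000."""
    grid = [k / 1000.0 for k in range(10, 1001)]
    return max(grid, key=lambda a: sum(log_psi(z1, z2, u1, u2, a) for z1, z2 in pairs))

# Hypothetical, perfectly dependent pairs on the Frechet scale, thresholds u1 = u2 = 10
pairs = [(z, z) for z in (3.0, 5.0, 8.0, 11.0, 13.0, 20.0, 50.0)]
print(fit_alpha(pairs, 10.0, 10.0))  # 0.01, the lowest grid value
```

For perfectly dependent pairs every contribution increases as α decreases, so the grid search returns its lowest value, consistent with α → 0 denoting complete dependence. At α = 1 the joint-exceedance term reduces to the product of two unit Fréchet densities, as expected under independence.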

3. Results and Discussion

3.1. Bias

3.1.1. 2-m temperature

Figure 2 shows the 30-year average summer biases of daily maximum temperature obtained by comparing the three ERA40-driven simulations ALD40 (Figure 2(a)), ALD10 (Figure 2(b)), and ALR04 (Figure 2(c)) with observations from the 50 climatological stations. The ALD40 values are higher than observed, particularly near the coast, where the warm bias exceeds 2 °C. Despite the fourfold increase in spatial resolution, ALD10 reduces the warm bias only slightly: the average bias over the 50 climatological stations is 2.2 °C for ALD40 and 1.7 °C for ALD10. This indicates that the summer warm bias cannot be removed simply by increasing spatial resolution. Previous downscaling studies for other geographical areas have shown error magnitudes comparable to or larger than those computed in this study (Lim et al., 2007; Caldwell et al., 2009; Lynn et al., 2010; Paquin-Ricard et al., 2010). Skalák et al. (2008) performed ERA40-driven simulations with the regional climate model ALADIN at 10 km resolution over the Czech Republic and found a summer warm bias of 1.9 °C in maximum 2-m temperature. They also reported that the error magnitude in summer TMAX tends to be larger than in winter, indicating that simulating TMAX correctly is more challenging in summer.

Figure 2.

Spatial distribution of 30-year average summer biases (model minus observed) of the daily maximum temperature obtained with ALD40 (a), ALD10 (b), and ALR04 (c). The mean bias over the 50 climatological stations is indicated at the top of each sub-figure

ALR04 considerably reduces the warm bias. Although the average biases of ALD10 and ALR04 have opposite signs, ALR04 produces values much closer to the observations over the whole country, reducing the magnitude of the overall average bias to 0.2 °C. This suggests that the major biases of the ALD40 and ALD10 simulations are produced mainly by processes internal to the ALADIN domain, and not by large-scale circulation patterns near its lateral boundaries. Thus, high-resolution dynamical downscaling with a finest grid size of 4 km and a sophisticated physics scheme is able to correct these summer biases.

3.1.2. Cloud fraction

To investigate possible causes of the warm bias, the simulated cloud cover from the ALD10 and ALR04 runs is compared with the observed cloud cover over the period 1961–1990.

Figure 3 shows the normalized frequency distribution of 3-h mean cloud fraction at the 9 synoptic stations. Although the general shape of the distribution is well captured by the ALD10 and ALR04 runs, the figure clearly reveals an overestimate of clear-sky (CF < 10%) occurrences by ALD10 at all stations. ALD10 generally underestimates CF, by amounts ranging from 27% in Spa to 43% in Koksijde and Middelkerke at the coast. This underestimation is reduced to less than 15% in the ALR04 simulation, which suggests that ALR04 is able to simulate weakly forced convective clouds in summer over Belgium. However, both simulations underestimate the occurrence of overcast conditions (CF > 80%). In ALADIN, the diagnostics of total and partial cloud cover (low, medium, high, and convective) can be computed under one of two assumptions: (1) random overlap of adjacent clouds or (2) maximum overlap of adjacent clouds. With the maximum overlap assumption, the occurrence of cloud cover near 100% is clearly underestimated with respect to the observed frequencies. Recently, Wittmann (2010, personal communication) improved the quality of the total and partial cloud cover diagnostics by introducing a near-maximum overlap solution. A systematic verification at 9 stations in Austria for June 2008 showed that cloud cover near 100% is represented much better (not shown). This solution will be used for the GCM-driven simulations. It is important to underline, however, that this new solution affects no model fields other than the 1D fields of total and partial cloud cover, and therefore has no implication for the results of this work.
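The difference between the two overlap assumptions can be illustrated with their textbook limiting forms, sketched in Python below. This does not reproduce the ALADIN cloud diagnostic or the near-maximum overlap solution mentioned above; it applies one assumption to all layers and uses hypothetical layer fractions. Because 1 − ∏(1 − c_k) is never smaller than max c_k, the maximum overlap assumption always yields the smaller total cover, which is consistent with the underestimated frequency of near-overcast situations.

```python
def total_cloud_random(layer_fractions):
    """Random overlap: the clear-sky probabilities of the layers multiply."""
    clear = 1.0
    for c in layer_fractions:
        clear *= 1.0 - c
    return 1.0 - clear

def total_cloud_maximum(layer_fractions):
    """Maximum overlap: cloudy parts of all layers are stacked, so the largest fraction wins."""
    return max(layer_fractions, default=0.0)

layers = [0.5, 0.6, 0.3]  # hypothetical low, medium and high cloud fractions
print(total_cloud_random(layers), total_cloud_maximum(layers))  # about 0.86 versus 0.6
```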

Figure 3.

Frequency of occurrence of 3-h mean cloud fraction for the 9 stations belonging to the synoptic network of the Royal Meteorological Institute of Belgium (Figure 1 gives their spatial locations)

3.1.3. Surface solar radiation

In this section, we compare simulated and observed 30-year mean diurnal cycles of surface solar radiation at the Uccle station. Unfortunately, measurements of downwelling longwave radiation are not available, so a complete analysis of the observed and modelled surface radiation balance cannot be given in this study. To compare the representation of the solar radiation fluxes in isolation from the confounding effects of errors in either the cloud fraction or the cloud–radiation parameterisation, we analyse in Figure 4 surface solar radiation fluxes for all-sky conditions and for clear-sky conditions only (CF < 10%). ALD10 shows a solar radiation overestimate that grows through the diurnal cycle, reaching a maximum of 116 W m−2 at noon. This overestimate is reduced to 31 W m−2 in the ALR04 simulation. The all-sky overestimate of the total solar radiation is substantially reduced when only clear-sky conditions are considered: for ALD10, the all-sky overestimate of 116 W m−2 is reduced to 6 W m−2 around noon, whereas for ALR04 the all-sky overestimate of 31 W m−2 becomes an underestimate of 6 W m−2 around noon. Thus, the simulated downwelling solar radiation in clear-sky conditions does not appear to be the main cause of the overestimate of surface air temperature (Figure 2).

Figure 4.

30-year mean diurnal cycle of 3-h mean shortwave solar radiation observed at Uccle and simulated with ALD10 and ALR04, (a) for all-sky conditions, and (b) for clear-sky conditions

The underestimation of cloud cover may not explain the whole temperature bias, but a more detailed examination of the physical causes of this positive summer bias is beyond the scope of this study.
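The all-sky versus clear-sky compositing behind Figure 4 can be sketched in Python as below. The text does not state whether the CF < 10% criterion was applied to the observed cloud fraction, the simulated one, or both; the sketch assumes both, and all names and sample values are hypothetical.

```python
from collections import defaultdict

def mean_diurnal_bias(hours, obs, model, keep=None):
    """Mean (model - obs) for each hour of the day, optionally restricted to selected steps."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, hour in enumerate(hours):
        if keep is None or keep[i]:
            sums[hour] += model[i] - obs[i]
            counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(counts)}

def clear_sky_mask(cf_obs, cf_model, limit=0.10):
    """Assumption: a step is clear-sky only if observed AND simulated CF are below 10%."""
    return [o < limit and m < limit for o, m in zip(cf_obs, cf_model)]

# Hypothetical 3-hourly samples of surface solar radiation (W m-2) and cloud fraction
hours = [12, 12, 12, 15]
obs = [500.0, 300.0, 700.0, 400.0]
model = [620.0, 610.0, 705.0, 420.0]
cf_obs = [0.05, 0.90, 0.00, 0.50]
cf_model = [0.00, 0.20, 0.05, 0.60]

print(mean_diurnal_bias(hours, obs, model))                                    # all-sky
print(mean_diurnal_bias(hours, obs, model, clear_sky_mask(cf_obs, cf_model)))  # clear-sky
```

In this toy example the cloudy step with a large positive bias drops out of the clear-sky composite, which mirrors how the ALD10 all-sky overestimate shrinks once cloud-fraction errors are excluded.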

3.2. Extreme value analysis

3.2.1. Calibration of the classical POT-models

One should keep in mind that classical extreme value theory assumes that the underlying variables are independent, a condition that is usually not met in the physical world: extremes often occur in clusters of large values. A common approach is to retain only the highest value in each cluster of threshold exceedances. Two successive POT events are considered independent when they are separated by at least four days, in accordance with other studies (e.g. Brabson and Palutikof, 2002; Kyselý et al., 2010; Van de Vyver, 2011); this corresponds to the duration of mesoscale heat waves. After declustering, we calibrate the univariate GPD, Equation (2), to the selected extremes of the observations and of the simulations ALD40, ALD10, and ALR04. The threshold for each dataset is chosen such that there are, on average, λ = 5 excesses per year. We used the classical two-sample Kolmogorov–Smirnov test to investigate whether the underlying probability distributions of the POT events of the observations and of a model differ. The test statistic is

$$D_{n_1,n_2} = \max_x \left| F_{1,n_1}(x) - F_{2,n_2}(x) \right|,$$

where $F_{1,n_1}(x)$ and $F_{2,n_2}(x)$ are the empirical distribution functions of the observations and the model, respectively, and $n_i$ is the corresponding number of samples. The null hypothesis is rejected at level α if

$$\sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, D_{n_1,n_2} > K_\alpha, \tag{14}$$

where $K_\alpha$ is the critical value of the Kolmogorov distribution at level α:

$$P(K \le K_\alpha) = 1 - \alpha, \qquad P(K \le x) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 x^2}. \tag{15}$$
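The declustering rule (a minimum separation of four days between independent events) and the choice of a threshold giving on average λ = 5 excesses per year can be sketched in Python as follows. This is a simplified runs-declustering heuristic with hypothetical names and data, not necessarily the exact procedure used in this study; since the number of clusters is not strictly monotone in the threshold, the sketch simply returns the lowest candidate threshold that meets the target.

```python
def decluster(days, values, threshold, min_sep_days=4):
    """Runs declustering: exceedances less than min_sep_days apart share a cluster;
    only the cluster maximum (day, value) is kept."""
    peaks, last_day = [], None
    for day, value in sorted(zip(days, values)):
        if value <= threshold:
            continue
        if last_day is not None and day - last_day < min_sep_days:
            if value > peaks[-1][1]:
                peaks[-1] = (day, value)      # new maximum of the current cluster
        else:
            peaks.append((day, value))        # start of a new, independent event
        last_day = day
    return peaks

def threshold_for_rate(days, values, n_years, rate=5.0, min_sep_days=4):
    """Lowest observed value which, used as threshold, leaves at most rate * n_years peaks."""
    target = rate * n_years
    for t in sorted(set(values)):
        if len(decluster(days, values, t, min_sep_days)) <= target:
            return t
    raise ValueError("empty series")

# Hypothetical day indices and TMAX values (degC)
days = [0, 1, 2, 10, 11, 20, 30, 31, 35]
tmax = [30.0, 32.0, 31.0, 29.0, 33.0, 28.0, 31.0, 30.0, 34.0]
print(decluster(days, tmax, 29.5))
print(threshold_for_rate(days, tmax, n_years=1, rate=2.0))
```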

In Figure 5 we plot the test statistic (the left-hand side of Equation (14)) for the observations versus each model output at the 50 climatological stations. The null hypothesis that the extremes of the observations and the model output belong to the same population is not rejected at the 5% significance level at only 4 and 2 locations for ALD40 and ALD10, respectively. In contrast, it is not rejected at 25 locations for ALR04. Overall, the statistic is generally much smaller for ALR04 than for the other simulations, indicating that this simulation produces the best estimates of the extreme events. We calculate the 5- and 20-year events with Equation (5); the estimates are displayed in Figure 6. The ALR04 estimates are fairly close to those from the observations, while the other simulations clearly fail to reproduce the extreme value statistics.
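The statistic shown in Figure 5 (left-hand side of Equation (14)) and the asymptotic critical value Kα of Equation (15) can be computed with the short Python sketch below. Names and sample values are hypothetical; the significance level is called `level` to avoid confusion with the dependence parameter α, and `scipy.stats.ks_2samp` offers an established implementation of the same two-sample test.

```python
import bisect
import math

def ks_two_sample(sample1, sample2):
    """Two-sample KS statistic D and the scaled statistic on the left of Equation (14)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    for x in s1 + s2:  # the supremum of |F1 - F2| is reached at one of the data points
        f1 = bisect.bisect_right(s1, x) / n1
        f2 = bisect.bisect_right(s2, x) / n2
        d = max(d, abs(f1 - f2))
    return d, math.sqrt(n1 * n2 / (n1 + n2)) * d

def kolmogorov_cdf(x, terms=100):
    """P(K <= x) = 1 - 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 k^2 x^2), Equation (15)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, terms + 1))

def kolmogorov_critical(level=0.05):
    """Critical value K_alpha with P(K <= K_alpha) = 1 - level, found by bisection."""
    lo, hi = 0.2, 5.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < 1.0 - level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d, k_stat = ks_two_sample([29.1, 30.4, 31.0, 32.2], [30.9, 31.8, 33.0, 34.5])
print(d, k_stat, k_stat > kolmogorov_critical(0.05))
```

The critical value is asymptotic; for short POT series, exact small-sample tables give somewhat different thresholds.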

Figure 5.

Two-sided Kolmogorov–Smirnov test, Equation (14). The POT events of the observations versus the model runs (ALD40, ALD10, and ALR04) are examined at every location. The horizontal red line represents the critical value at the 5% significance level

Figure 6.

Estimation of 5-year (a), and 20-year (b) events in every location from observations and model runs (ALD40, ALD10, and ALR04)

Next, we examine the dependence between the POT extremes of the observations and those of the model simulations through the dependence parameter α of the bivariate threshold model, Equation (8). Inference can be simplified by carrying out the marginal estimation, followed by the transformation of Equation (7), as a preliminary step (Coles, 2001). In this case, the likelihood of Equation (11) is a function of α only. In Figure 7 we have plotted α at each climatological station. As expected, the dependence between the extremes of the observations and ALR04 is generally stronger (smaller α) than for the other simulations. On the one hand, this means that the extremes in the ALR04 run and in the observations coincide more often than those in the other runs; on the other hand, the extremes are much better reproduced (i.e. smaller bias and rmse, shown in Section 3.2.2). For comparison, the spatial averages of α (denoted by $\bar{\alpha}$) for ALD40, ALD10, and ALR04 are equation image, equation image, and equation image, respectively.

Figure 7.

Estimation of the dependency parameter α of the bivariate threshold excess model of observations versus model runs (ALD40, ALD10, and ALR04) in every location

3.2.2. Model error versus extreme value analysis

Here we examine the error in the extremes produced by the models, and whether this error can be related to the extreme value statistics above. At each location, a threshold is selected that corresponds to the 95th percentile of the empirical distribution of the observations. An important difference from the preceding analysis is that no distributional assumptions are made, so there is no need to decluster the extremes. To assess the ability of the models to reproduce the extremes, we consider the bias and rmse between the observed extremes and the corresponding model output, as shown in Figure 8. Clearly, the best reproduction of extreme values is produced by ALR04.
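The scores of Figure 8 can be sketched in Python as follows. The percentile convention (linear interpolation between order statistics, as in NumPy's default) is an assumption, since the text does not specify it; names and data are hypothetical. As stated above, no declustering is applied.

```python
import math

def percentile(values, q):
    """Empirical q-th percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def extreme_scores(obs, model, q=95.0):
    """Threshold, bias and rmse of model output on days when obs exceed their q-th percentile."""
    thr = percentile(obs, q)
    diffs = [m - o for o, m in zip(obs, model) if o > thr]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return thr, bias, rmse

obs = [float(v) for v in range(1, 41)]                   # hypothetical daily TMAX series
model = [o + (3.0 if o == 40.0 else 1.0) for o in obs]  # hypothetical model output
print(extreme_scores(obs, model))
```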

Figure 8.

Scores (a) bias and (b) rmse between observations higher than the 95th percentile and the corresponding model output (ALD40, ALD10, and ALR04) at every location

It is remarkable that the quantile estimates provided by the extreme value statistics (Figure 6) of the observations and ALR04 are much closer than the scores in Figure 8 would suggest. To examine this point further, we return to the POT series of the univariate analysis. The n ordered POT values of the observations are denoted by equation image, and analogously for the extremes of ALR04. The closer the ranked POT series equation image and equation image are, the closer the corresponding GPDs are. The QQ-plot of equation image and equation image at one station (Hives, Figure 9) indicates close agreement between the respective extreme value statistics. To put this in perspective, the model error scores should be compared with the difference between the GPDs. However, the scores of Figure 8 are based on all values above the 95th percentile, which, unlike the POT series, are not declustered. For a fair comparison, we therefore return to the bivariate pairs $(x_{1,i}, x_{2,i})$ of the preceding bivariate threshold excess model and examine two differences: the difference between $x_{1,i}$ and $x_{2,i}$, and the difference between the ordered data equation image and equation image. The first measures the model error on coinciding events, whereas the second reflects the difference between the two extreme value distributions. Figure 10 shows the rmse of both differences. In conclusion, the presence of a substantial model error in the extremes does not seriously affect the extreme value statistics of the ALR04 output. Even if the modelled and observed maxima do not necessarily coincide, the two sets of statistics agree well.
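The contrast between the two rmse values in Figure 10 can be illustrated with a short sketch. It assumes that both series of extremes have the same length, as they do for the bivariate pairs. The function name and the example values are hypothetical. The paired rmse compares values that occur at the same time, while the ranked rmse compares sorted values, as in a QQ-plot.

```python
import numpy as np


def paired_and_ranked_rmse(obs_ext, model_ext):
    """Return (paired rmse, ranked rmse) for two equally long series of extremes.

    Paired: compares values occurring at the same time (model error).
    Ranked: compares the sorted values, i.e. the two empirical distributions,
    as in a QQ-plot.
    """
    obs = np.asarray(obs_ext, dtype=float)
    mod = np.asarray(model_ext, dtype=float)
    paired = np.sqrt(np.mean((mod - obs) ** 2))
    ranked = np.sqrt(np.mean((np.sort(mod) - np.sort(obs)) ** 2))
    return float(paired), float(ranked)


# Hypothetical example: the model produces the right values on the wrong days.
obs = [30.5, 31.0, 32.2, 33.1, 34.0]
mod = [34.0, 33.1, 32.2, 31.0, 30.5]
print(paired_and_ranked_rmse(obs, mod))
```

In this example the ranked rmse is zero while the paired rmse is large. This is the situation described above, where a sizeable model error coexists with matching extreme value statistics.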

Figure 9.

QQ-plot of the ranked POT values of observations versus ALR04 at station Hives. This figure is available in colour online at wileyonlinelibrary.com/journal/joc

Figure 10.

Rmse between $x_{1,i}$ and $x_{2,i}$ over the pairs $(x_{1,i}, x_{2,i})$ of the bivariate threshold excess model for observations and ALR04, and rmse between the ranked POT values of the marginal threshold excess models for observations and ALR04

3.2.3. Heat wave event

A single long-duration, moderately extreme event can also have a greater impact, particularly on health, than multiple short-lived but more extreme events (WHO, 2004). Here we use the definition of heat waves proposed by Huth et al. (2000) and employed in recent European studies (Hutter et al., 2007; Kyselý, 2010). Two thresholds, T1 and T2, are applied: a heat wave is defined as a continuous period of at least 5 days during which (1) TMAX is higher than T1 on at least 3 days, (2) the mean TMAX over the whole period is higher than T1, and (3) TMAX does not drop below T2. The threshold values were set to T1 = 30 °C and T2 = 25 °C, in accordance with Belgian climatological practice, which calls days with TMAX reaching or exceeding 30 °C and 25 °C tropical days and summer days, respectively.
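The three conditions above can be checked as in the following sketch for a single daily TMAX series. The definition does not say how overlapping candidate periods are resolved into separate events. As an assumption of this sketch, we start at the earliest possible day, take the longest qualifying period, and continue after its end. The function name `heat_waves` is hypothetical.

```python
def heat_waves(tmax, t1=30.0, t2=25.0, min_len=5, min_hot=3):
    """Detect heat waves as (start, end) index pairs, end exclusive.

    A period qualifies if it lasts at least min_len days, has at least min_hot
    days with TMAX > t1, has mean TMAX > t1, and no day with TMAX < t2.
    Greedy rule (an assumption): from the earliest possible start, take the
    longest qualifying period, then continue after its end.
    """
    events = []
    n = len(tmax)
    i = 0
    while i < n:
        best_end = None
        total, hot, j = 0.0, 0, i
        # Condition (3): extend only while TMAX does not drop below t2
        while j < n and tmax[j] >= t2:
            total += tmax[j]
            hot += int(tmax[j] > t1)
            j += 1
            length = j - i
            # Conditions (1) and (2), plus the minimum duration
            if length >= min_len and hot >= min_hot and total / length > t1:
                best_end = j
        if best_end is None:
            i += 1
        else:
            events.append((i, best_end))
            i = best_end
    return events
```

Counting heat waves at a station is then `len(heat_waves(series))`, applied to each summer separately so that no event spans two summers.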

At the Uccle station in Brussels, which represents the main population centre in Belgium (population of 1 031 215 on 1 January 2007, as estimated by the National Institute of Statistics), 8 heat wave events were observed over 1961–1990. The ALR04 simulation reproduces exactly this number, whereas the ALD10 and ALD40 values are much higher, 29 and 41, respectively. To characterize heat wave intensity, we calculate the 5-day cumulative TMAX excess above 30 °C, which is probably the most appropriate measure of heat wave severity, and sum these values over all summers of 1961–1990. The ALR04 value (110 °C) is very close to the observed one (111 °C), whereas the ALD10 value is strongly exaggerated (350 °C). All these results indicate that ALR04 simulates the number and intensity of heat wave events much more realistically than ALD10, and it can therefore also be considered more credible for projecting future climate change.
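The intensity measure can be sketched as follows. The text does not spell out how the 5-day window enters the calculation. This sketch therefore assumes one plausible reading: the positive TMAX excess above 30 °C is summed over every day of every heat wave in the series. The function name `cumulative_excess` and the (start, end) event representation are hypothetical.

```python
def cumulative_excess(tmax, events, t1=30.0):
    """Sum of the TMAX excess above t1 over all days of the given heat waves.

    events are (start, end) index pairs with end exclusive; days of an event
    with TMAX <= t1 contribute nothing.
    """
    return sum(max(tmax[day] - t1, 0.0)
               for start, end in events
               for day in range(start, end))
```

Summing the per-summer results then gives one number per run, comparable to the 110 °C (ALR04) and 111 °C (observed) quoted above.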

4. Conclusion

In the present study, high-resolution downscaling simulations have been analysed and validated for the means and extremes of the maximum surface air temperature in each summer season (June–July–August) of the period 1961–1990. The simulations were performed with ALADIN, the limited area version of the ARPEGE-IFS forecast system, which is presently being evaluated as a new operational regional climate model for regional climate change projections over Belgium. ALADIN was first run at a horizontal resolution of 40 km on a domain covering most of Western Europe (ALD40), in hindcast mode, driven by the ERA40 reanalysis at a resolution of 1.125°. To increase the spatial resolution over Belgium, a nested domain was added and two sensitivity runs were conducted: (1) at 10 km spatial resolution (ALD10), and (2) at 4 km spatial resolution (ALR04). ALR04 uses more sophisticated model physics that allows ALADIN to be used in the resolution range of 3–8 km (the so-called gray zone). This physics follows a new approach recently proposed by Gerard et al. (2009): an integrated sequential treatment of resolved condensation, deep convection, and microphysics together with the use of prognostic variables. The new parameterisation produces consistent and realistic results at resolutions ranging from a few tens of kilometers down to less than 4 km. The differences between the three simulations (ALD40, ALD10, ALR04) provide a measure of the uncertainty in the downscaling technique with respect to two effects: (1) refining the horizontal grid spacing, and (2) refining the horizontal grid spacing together with a more sophisticated model physics scheme. The high resolution of the simulations and the availability of a dense dataset of 50 climatological stations over Belgium make it possible to validate the different model runs against observations.

The validation of ALD40 reveals a positive summer bias, which is most pronounced near the coast, where the warm bias exceeds 2 °C. Despite the fourfold increase in spatial resolution, ALD10 reduces the warm bias only slightly. The ALR04 run, however, reduces the warm bias considerably, with an overall average absolute bias of 0.2 °C. Thus, high-resolution dynamical downscaling with the finest grid size of 4 km and the more sophisticated model physics scheme is able to correct this summer bias. To investigate the possible causes of the warm bias, the simulated cloud cover from the ALD10 and ALR04 runs is compared with the observed Cloud Fraction (CF) over the period 1961–1990. Although both runs capture the general shape of the frequency distribution well, the results clearly reveal that the ALD10 run overestimates clear-sky occurrences at all stations. This suggests that, unlike ALD10, ALR04 is able to simulate weakly forced convective clouds in summer over Belgium. To evaluate the cloud–radiation interaction in the different configurations, we compared the simulated and observed 30-year mean diurnal cycles of surface solar radiation at the Uccle station (some 6 km south of the centre of Brussels). ALD10 shows an overestimate of solar radiation that grows through the diurnal cycle and peaks at 116 W m−2 at noon. The overestimate of the total solar radiation under all-sky conditions is significantly reduced when only clear-sky conditions are considered. Thus, the simulated downwelling solar radiation under clear-sky conditions does not appear to be the main cause of the overestimated surface air temperature, which instead points to the representation of cloud cover.

Next, the Peak-Over-Threshold (POT) distributions of extreme summer temperature produced by the different model runs were compared with observations over the same 1961–1990 period. We used the classical two-sample Kolmogorov–Smirnov test to investigate whether the POT events of the observations and of a given model differ statistically. The null hypothesis that the extremes of the observations and of the model output belong to the same population is hardly ever accepted at the 95% level for ALD40 and ALD10, whereas it is accepted at half of the locations for ALR04. Furthermore, the quantile estimates based on the ALR04 run are fairly close to those of the observations. Finally, we assessed how well the different model runs reproduce the number of heat wave events observed at the Uccle station in Brussels, which represents the main population centre in Belgium. The ALR04 simulation reproduces exactly the same number (8 events), whereas the ALD10 and ALD40 values are much higher, 29 and 41, respectively.
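The two-sample Kolmogorov–Smirnov test compares the largest distance between two empirical distribution functions with a critical value. Below is a minimal NumPy sketch that assumes the large-sample critical value at the 95% level (c ≈ 1.358). For small POT samples, an exact test such as `scipy.stats.ks_2samp` would be preferable. The function name is hypothetical, and the sketch is not the authors' implementation.

```python
import numpy as np


def ks_two_sample(a, b, c_alpha=1.358):
    """Two-sample Kolmogorov-Smirnov statistic with the large-sample 95% critical value.

    Returns (D, accepted), where accepted is True when the hypothesis that both
    samples come from the same population is not rejected.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every pooled data point
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    d_crit = c_alpha * np.sqrt((a.size + b.size) / (a.size * b.size))
    return d, bool(d <= d_crit)
```

Applied to each station's observed and simulated POT series in turn, the `accepted` flags would give the kind of per-location acceptance count summarised above.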

This study shows that a consistent treatment of deep convection and cloud–radiation interaction when increasing the horizontal resolution is very important in impact studies of extremely high temperature events. In this regard, RCM downscaling with a finer resolution and a more complete physics representation can significantly reduce the biases of the driving GCM in the present climate and thus enhance the credibility of future climate change projections. Our results therefore indicate that it is highly questionable to apply biased GCM projections of future climate change directly in impact studies of extremely high temperatures at the regional scale.
