#### 4.1. Validation framework

Statistical approaches implicitly assume stationarity, either in their transfer functions (indirect DECMs) or in the model error characteristics (direct DECMs) (Wilby, 1997; Benestad *et al.*, 2009). If this assumption is violated, statistical models cannot account for changes described by predictor forcings. As this assumption cannot be verified in advance, a temporal cross-validation framework is applied, which repeatedly divides the data period into a calibration period (10 years) and an independent validation period (1 year). By this means, each year is estimated and evaluated independently, with the remaining 10 years used for model calibration (sometimes denoted as ‘leave one out’ cross-validation; see Figure 3). For evaluation purposes, model skill scores as well as model error characteristics are used. The models' performances are analysed via mean skill scores and mean model error characteristics, averaged over all validation periods, as well as using the entire, non-averaged, 11-year validation time series. Mean skill scores and model error characteristics are presented in Figures 5–9. The models' performances represent the station scale, as each statistical model is calibrated and evaluated separately, station by station. For graphical representation, the station-wise evaluation results are spatially averaged in sub-region 6, in sub-region 8 (compare Figure 2(b)), and for entire Austria. The two sub-regions are selected because of their different climate characteristics. Sub-region 6 is mainly dominated by westerly flows from the Atlantic with high precipitation amounts, whereas sub-region 8 features drier, more continental characteristics with additional influence from the Mediterranean Sea.
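The splitting scheme described above can be sketched as follows. This is a minimal Python illustration rather than the authors' implementation: the function names, the year labels 0 to 10, the synthetic data, and the simple scaling "model" standing in for a DECM are all assumptions made for the example.

```python
import numpy as np

def leave_one_year_out(years):
    """Yield (year, calibration_mask, validation_mask) for every distinct year."""
    years = np.asarray(years)
    for year in np.unique(years):
        validation = years == year
        yield year, ~validation, validation

def cross_validate(years, predictor, observed, fit, predict):
    """Estimate each year with a model calibrated on all other years."""
    predictor = np.asarray(predictor, dtype=float)
    observed = np.asarray(observed, dtype=float)
    estimate = np.full(observed.shape, np.nan)
    for _, calibration, validation in leave_one_year_out(years):
        model = fit(predictor[calibration], observed[calibration])
        estimate[validation] = predict(model, predictor[validation])
    return estimate

# Hypothetical toy data: 11 years of daily values (labels 0..10 are placeholders).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(11), 365)
rcm = rng.gamma(shape=0.8, scale=5.0, size=years.size)
obs = 0.7 * rcm  # synthetic "observations" with a purely multiplicative error

# Stand-in correction model (assumption): a single scaling factor.
estimate = cross_validate(
    years, rcm, obs,
    fit=lambda x, y: y.mean() / x.mean(),
    predict=lambda factor, x: factor * x,
)
```

Concatenating the eleven validation years in this way yields the full out-of-sample series, which corresponds to the "entire, not averaged" evaluation mentioned in the text; per-year scores can be obtained by evaluating inside the loop instead.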

The results are divided into three parts. The first describes the general characteristics of the uncorrected RCM within all regions. The second focusses on the characteristics of each DECM, and the third analyses the effectiveness of DECMs compared to uncorrected RCM results.

#### 4.2. RCM evaluation

Gobiet *et al.* (2006) have already compared the MM5 precipitation data from 1981 to 1990 with HISTALP observations (Auer *et al.*, 2007) on a monthly scale. The same comparison is shown in Figure 4 on a seasonal basis. Regarding Austria, the RCM features seasonally and regionally varying error characteristics, with strong precipitation overestimation along the Alpine crest in winter (DJF), an overall good performance in summer (JJA), and underestimation at the southern Austrian border in autumn (SON). Gobiet *et al.* (2006) argue that, besides possible model deficiencies, the well-known difficulty of measuring precipitation at high altitudes, especially in DJF, may partly cause the pronounced overestimation. Furthermore, the reduced SON precipitation in south-eastern parts of Austria is probably related to an under-representation of northern Mediterranean cyclones and a consequent lack of humidity. These findings further motivate the selection of sub-regions 6 and 8 for evaluation, as these regions cover the problematic areas.

#### 4.3. Characteristics of the applied DECMs

Regarding the MLR predictor selection, Table II shows the most important seasonal predictors for the considered study regions. All three regions indicate precipitation (accrnon, accrcon, pre), humidity-related parameters at the surface (q2, pwat), eastward (u) and northward (v) wind at 10 m and 700 hPa, and surface vapour pressure (e2) to be the dominant predictors for local precipitation. The composition of the predictor set varies seasonally, with increased importance of convective precipitation (accrcon) and northward wind in the summer months, which corresponds reasonably to the regional climate characteristics (see Section 2). The dominance of RCM precipitation as a predictor supports the assumption that RCM precipitation integrates large parts of the relevant information for local precipitation. Further, the frequently called-for inclusion of humidity as a predictor (e.g. Giorgi and Mearns, 1991; Wilby and Wigley, 2000; Fowler *et al.*, 2007) is supported.

Table II. Seasonal predictor variables for MLR approaches in sub-region 6, sub-region 8, and for entire Austria according to their occurrence probability (Prob) given in percent after objective predictor selection.

| Season | Sub-region 6: Predictor | Prob | Sub-region 8: Predictor | Prob | Entire Austria: Predictor | Prob |
|---|---|---|---|---|---|---|
| DJF | pre_sfc | 19.5 | q2_2m | 16.2 | pre_sfc | 17.6 |
| DJF | accrnon_sfc | 17.0 | accrnon_sfc | 13.1 | accrnon_sfc | 12.1 |
| DJF | u_700hPa | 10.0 | e2_2m | 13.1 | q2_2m | 10.8 |
| DJF | iclc_sfc | 8.2 | pre_sfc | 12.9 | v_700hPa | 9.8 |
| DJF | e2_2m | 6.0 | v_700hPa | 12.8 | e2_2m | 8.9 |
| MAM | q2_2m | 18.0 | accrnon_sfc | 19.6 | accrnon_sfc | 15.0 |
| MAM | accrnon_sfc | 12.7 | accrcon_sfc | 14.4 | q2_2m | 13.3 |
| MAM | pre_sfc | 12.7 | q2_2m | 8.6 | pre_sfc | 10.0 |
| MAM | e2_2m | 11.7 | e2_2m | 7.5 | e2_2m | 9.4 |
| MAM | u10_10m | 10.0 | zg_700hPa | 6.1 | accrcon_sfc | 6.6 |
| JJA | pre_sfc | 14.5 | v_700hPa | 19.2 | v_700hPa | 17.4 |
| JJA | v_700hPa | 14.2 | pre_sfc | 17.6 | pre_sfc | 15.4 |
| JJA | u_700hPa | 10.7 | accrcon_sfc | 12.0 | accrcon_sfc | 7.7 |
| JJA | accrnon_sfc | 10.2 | iclc_sfc | 8.1 | accrnon_sfc | 7.4 |
| JJA | pwat_sfc | 8.5 | u_700hPa | 7.4 | v_850hPa | 6.3 |
| SON | pre_sfc | 20.7 | q2_2m | 23.0 | accrnon_sfc | 16.6 |
| SON | accrnon_sfc | 16.0 | e2_2m | 22.8 | pre_sfc | 14.5 |
| SON | q2_2m | 11.0 | accrnon_sfc | 16.7 | q2_2m | 13.0 |
| SON | e2_2m | 10.2 | accrcon_sfc | 9.7 | e2_2m | 12.0 |
| SON | v10_10m | 7.2 | pre_sfc | 9.3 | accrcon_sfc | 8.6 |

As with the point-wise predictor selection for MLR, Table III indicates that, for the conditional resampling approaches, principal components (PCs) of precipitation fields are by far the most important predictors of local precipitation. Further relevant predictors are pressure-related parameters at the surface (es, psfc, pslv), geopotential height at 500 hPa (zg), and vertical velocity at 700 hPa (w).

Table III. As in Table II but with atmospheric predictor fields for the analogue methods. PC indicates the principal component used (given in parentheses after each predictor).

| Season | Sub-region 6: Predictor (PC) | Prob | Sub-region 8: Predictor (PC) | Prob | Entire Austria: Predictor (PC) | Prob |
|---|---|---|---|---|---|---|
| DJF | accrnon_sfc (PC2) | 17.2 | accrnon_sfc (PC3) | 15.6 | accrnon_sfc (PC1) | 11.3 |
| DJF | accrnon_sfc (PC1) | 16.7 | es_2m (PC2) | 14.7 | pre_sfc (PC1) | 10.6 |
| DJF | w_700hPa (PC3) | 10.2 | accrcon_sfc (PC1) | 13.8 | accrnon_sfc (PC2) | 9.8 |
| DJF | pre_sfc (PC1) | 7.7 | pre_sfc (PC1) | 8.3 | accrnon_sfc (PC3) | 7.7 |
| DJF | psfc_sfc (PC2) | 6.7 | zg_500hPa (PC2) | 6.1 | pre_sfc (PC2) | 6.0 |
| MAM | accrnon_sfc (PC2) | 16.5 | pre_sfc (PC2) | 23 | pre_sfc_D2 (PC2) | 15.6 |
| MAM | accrcon_sfc (PC1) | 16.2 | accrnon_sfc (PC2) | 17.1 | accrnon_sfc (PC2) | 14.1 |
| MAM | pre_sfc (PC2) | 11.5 | pre_sfc (PC1) | 8.8 | accrcon_sfc (PC1) | 10.1 |
| MAM | accrnon_sfc (PC1) | 6.5 | pre_sfc (PC3) | 7.4 | pre_sfc (PC1) | 5.8 |
| MAM | | | iclc_sfc (PC1) | 6.3 | | |
| JJA | accrnon_sfc (PC1) | 18.0 | pre_sfc (PC2) | 18.9 | pre_sfc (PC1) | 13.0 |
| JJA | zg_500hPa (PC2) | 10.7 | w_700hPa (PC3) | 12.9 | pre_sfc (PC2) | 12.1 |
| JJA | w_700hPa (PC3) | 10.5 | accrcon_sfc (PC1) | 11.9 | accrnon_sfc (PC1) | 10.8 |
| JJA | pre_sfc (PC1) | 8.0 | pre_sfc (PC1) | 11.5 | accrcon_sfc (PC1) | 6.0 |
| JJA | pslv_slv (PC3) | 5.2 | accrnon_sfc (PC1) | 7.9 | w_700hPa (PC3) | 5.9 |
| SON | accrnon_sfc (PC3) | 22.5 | accrcon_sfc (PC3) | 20.9 | accrnon_sfc (PC3) | 22.1 |
| SON | accrcon_sfc (PC1) | 22 | accrnon_sfc (PC3) | 18.5 | accrcon_sfc (PC1) | 14.6 |
| SON | pre_sfc (PC2) | 15.5 | accrcon_sfc (PC1) | 15.5 | accrcon_sfc (PC3) | 9.4 |
| SON | | | pre_sfc (PC1) | 7.5 | pre_sfc (PC2) | 7.6 |
| SON | | | pre_sfc (PC2) | 6.6 | pre_sfc (PC1) | 7.0 |

Figure 5 illustrates the annual evolution of the wet-day thresholds *WT*^{mod} and the scaling factors *S* used in LOCI. Both parameters feature distinct annual cycles, which indicate frequency overestimation in winter and intensity underestimation in summer in the RCM. *WT*^{mod} ranges from 1.5 mm/day to 5 mm/day, which differs from the results of Schmidli *et al.* (2006), who found wet-day thresholds around 1 mm/day for the same region with LOCI, although applied to the coarser ERA-40 reanalysis and calibrated on the entire year. *S* varies around one with a reversed pattern compared to *WT*^{mod} and is of comparable magnitude to the values of Schmidli *et al.* (2006). The ranges of *S* and *WT*^{mod} indicate that, except for the summer season, the RCM precipitation error is dominated overall by frequency overestimation.
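To make the roles of *WT*^{mod} and *S* concrete, the following Python sketch follows the general LOCI formulation of Schmidli *et al.* (2006) as it is commonly described: the model wet-day threshold is chosen so that the model exceeds it as often as the observations exceed their own threshold, and the scaling factor matches the mean wet-day intensity above the thresholds. The observational threshold of 1 mm/day, the function names, and the synthetic data are assumptions for illustration, and the seasonally varying calibration implied by the annual cycles in Figure 5 is not reproduced.

```python
import numpy as np

def fit_loci(obs, mod, wt_obs=1.0):
    """Return (wt_mod, scale) from calibration data (sketch, single sample)."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    wet_fraction = np.mean(obs >= wt_obs)
    # Model threshold that is exceeded as often as wt_obs is in the observations.
    wt_mod = np.quantile(mod, 1.0 - wet_fraction)
    obs_wet = obs[obs >= wt_obs]
    mod_wet = mod[mod >= wt_mod]
    # Ratio of mean wet-day intensities above the respective thresholds.
    scale = (obs_wet.mean() - wt_obs) / (mod_wet.mean() - wt_mod)
    return wt_mod, scale

def apply_loci(mod, wt_mod, scale, wt_obs=1.0):
    """Shift and scale model precipitation; negative results are set to zero."""
    mod = np.asarray(mod, dtype=float)
    return np.maximum(wt_obs + scale * (mod - wt_mod), 0.0)

# Synthetic example (assumption): a "drizzly" model with too many weak wet days.
rng = np.random.default_rng(1)
n = 4000
obs = np.where(rng.random(n) < 0.35, rng.gamma(0.9, 8.0, n), 0.0)
mod = rng.gamma(0.6, 4.0, n)

wt_mod, scale = fit_loci(obs, mod)
corrected = apply_loci(mod, wt_mod, scale)
```

In this formulation a threshold *WT*^{mod} above the observational threshold signals too many wet days in the model, while *S* above one signals too weak wet-day intensities, which is the reading applied to Figure 5.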

The seasonal correction functions of QM in Figure 6 show the differences between the observed and modelled calibration *ecdfs* at all percentiles for all study regions. The respective precipitation quantities are indicated on the *x*-axes. Generally, and particularly in winter in sub-region 6, the RCM overestimates wet-day precipitation intensities, which leads to partly significant negative correction values, especially at the highest precipitation intensities (i.e. at the highest percentiles). By contrast, particularly in summer in sub-region 8, significant positive correction values at the highest precipitation intensities indicate a lack of extreme precipitation events in the RCM data. The largest corrections are applied to the highest percentiles and range from −12 mm/day (in winter in sub-region 6) to +15 mm/day (in summer and autumn in sub-region 8). For entire Austria the correction function is strongly damped, which illustrates the importance of point-wise application, where local error characteristics are taken into account instead of a broad spatial average. Abrupt changes of the correction function at the highest modelled precipitation amounts, as seen in winter, spring and summer in sub-region 6, are more likely related to statistical noise at these percentiles than to RCM error characteristics.
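A correction function of this kind can be sketched in Python as the percentile-wise difference between observed and modelled calibration quantiles, added to new model values according to their position in the modelled distribution. The choice of the 1st to 99th percentiles, the linear interpolation between them, the constant correction beyond the outermost percentiles, and the synthetic wet-day samples are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

PERCENTILES = np.arange(1, 100)  # assumed resolution of the correction function

def fit_qm(obs, mod):
    """Return modelled quantiles and the correction (observed minus modelled)."""
    q_obs = np.percentile(obs, PERCENTILES)
    q_mod = np.percentile(mod, PERCENTILES)
    return q_mod, q_obs - q_mod

def apply_qm(values, q_mod, correction):
    """Add the correction belonging to each value's modelled quantile.

    np.interp keeps the outermost correction constant beyond the calibrated
    range. q_mod must be strictly increasing, so samples with many ties
    (e.g. many dry days) would need extra handling.
    """
    values = np.asarray(values, dtype=float)
    return values + np.interp(values, q_mod, correction)

# Synthetic wet-day intensities (assumption): the model is too intense.
rng = np.random.default_rng(2)
mod_wet = rng.gamma(2.0, 3.0, 5000)
obs_wet = rng.gamma(2.0, 2.0, 5000)

q_mod, correction = fit_qm(obs_wet, mod_wet)
corrected = apply_qm(mod_wet, q_mod, correction)
```

Because the uppermost percentiles are estimated from very few days, their correction values are the least robust, which is consistent with the abrupt changes attributed to statistical noise in the text.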

#### 4.4. DECM evaluation

To assess the skill of the considered DECMs, their performances are evaluated regarding the median, variability, and indicators for extremes. Boxplots in Figure 7 display the median seasonal and annual differences between models and observations as lines in the middle of 25th and 75th quantile boxes derived from daily differences. Standardized Taylor diagrams (Figure 8; Taylor, 2001) show the normalized centred root-mean-square (RMS) difference of the different DECMs compared to observations as the distance to point 1 on the abscissa, the variance ratio between models and observations as the radial distance to the zero point, and the correlation between models and observations as the angle between the abscissa and the position vector (i.e. a perfect model would be displayed on point 1 of the abscissa). Error diagrams in Figure 9 illustrate the performances of the methods regarding precipitation intensity (SDII), wet-day frequency (Freq), the 95th percentile of all modelled days (Q95), and the 75th percentile on wet days (RQ75), where the latter two represent moderately extreme conditions. The results in Figure 9 are colour-coded; lighter colours indicate smaller errors. Finally, a quantile-quantile plot in Figure 10 compares the modelled with the observed 11-year seasonal and annual distributions using all station time series within the respective region. This enables the analysis of the DECMs' performances for absolute extreme conditions. Linear regression models also produce negative precipitation values. Though these are unphysical, we did not replace them by zeros, in order to avoid introducing biases or reducing variability in the evaluation statistics.
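The three quantities read from a standardized Taylor diagram can be computed from a pair of daily series as sketched below in Python. In Taylor (2001) the radial coordinate is the ratio of standard deviations; this sketch assumes that the "variance ratio" in the text refers to that quantity. The function name and the synthetic series are illustrative assumptions.

```python
import numpy as np

def taylor_statistics(model, obs):
    """Return (sd_ratio, correlation, normalized centred RMS difference)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sd_ratio = model.std() / obs.std()            # radial distance
    correlation = np.corrcoef(model, obs)[0, 1]   # cosine of the polar angle
    anomalies = (model - model.mean()) - (obs - obs.mean())
    centred_rms = np.sqrt(np.mean(anomalies ** 2)) / obs.std()  # distance to point 1
    return sd_ratio, correlation, centred_rms

# Synthetic example (assumption): a model that is correlated with the
# observations but too variable.
rng = np.random.default_rng(3)
obs_series = rng.gamma(0.8, 5.0, 2000)
model_series = 1.4 * obs_series + rng.normal(0.0, 4.0, 2000)

sd_ratio, correlation, centred_rms = taylor_statistics(model_series, obs_series)
perfect = taylor_statistics(obs_series, obs_series)
```

Because the mean is removed before the RMS difference is computed, the diagram carries no information about bias, which is why the median differences are evaluated separately in Figure 7.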

In Figure 7 the leftmost bars display the regional average RCM error characteristics. They indicate the largest error ranges in sub-region 6, as expected. The error range shows a high seasonality, which is related to overestimated temporal variability, shown in Figure 8. The results from Figures 5 and 6, showing that higher modelled precipitation sums are positively biased, can be identified by the positive skewness of the difference bars.

In comparison, all DECMs except MLR virtually correct the median error of daily precipitation to zero, independent of season and region. QM systematically yields the best results followed by LOCI, AM and NNAM. MLR partly even degrades error characteristics, which is probably related to nonlinear relations between predictors and local daily precipitation as well as to non-normally distributed and heteroscedastic residuals (compare Wilks, 1995). However, with the simple extension of MLR to MLRR this deficiency can be removed due to the incorporation of error residuals (Equation (2)). MLRT corrects the median difference to nearly zero, but shifts the error distribution to negative values, whereas all other statistical approaches show nearly equally distributed differences around the median. Though only two sub-regions are presented here in detail, all DECMs show similar performances in all sub-regions shown in Figure 2.
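The effect of adding residuals to a regression estimate can be illustrated with a generic Python sketch. Equation (2) is not reproduced in this section, so the code below does not claim to match the authors' MLRR; it only assumes, as a plausible reading, that residuals from the calibration fit are drawn at random and added to the regression prediction. All names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with intercept; returns coefficients and residuals."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef

def predict_mlr(coef, X):
    """Deterministic regression estimate."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def predict_mlrr(coef, residuals, X, rng):
    """Regression estimate plus randomly resampled calibration residuals."""
    noise = rng.choice(residuals, size=len(X), replace=True)
    return predict_mlr(coef, X) + noise

# Synthetic predictors and predictand (assumption): a noisy linear relation.
rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(0.0, 1.5, 5000)

coef, residuals = fit_mlr(X, y)
mlr = predict_mlr(coef, X)
mlrr = predict_mlrr(coef, residuals, X, rng)

sd_ratio_mlr = mlr.std() / y.std()
sd_ratio_mlrr = mlrr.std() / y.std()
corr_mlr = np.corrcoef(mlr, y)[0, 1]
corr_mlrr = np.corrcoef(mlrr, y)[0, 1]
```

In this toy setting the deterministic estimate has too little spread, while the resampled version recovers the spread of the target at the price of a lower day-to-day correlation, since the added residuals are unrelated to the actual daily errors.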

The effect of DECMs on variability is displayed in Figure 8. In general, the RCM tends to overestimate day-to-day variability, but also shows pronounced underestimation in sub-region 8. These deficiencies are removed by most DECMs. Major problems remain for MLR, which strongly underestimates variability, and for MLRT, which shows non-systematic errors in variability with a tendency towards underestimation. However, by adding error residuals (MLRR) the variability is modelled adequately. Minor problems are shown for LOCI, where, especially for entire Austria, a tendency to variability overestimation is indicated. Additionally, LOCI was more sensitive to a reduced window size than QM concerning the variance ratio (not shown). None of the DECMs is able to increase correlation. This is expected for direct DECMs as they solely rely on temporal characteristics of climate model precipitation. AM, NNAM and MLRR even degrade correlation. In the case of MLRR this is caused by the random resampling of residuals, whereas for the conditional resampling methods this might be an indication that the mesoscale fields used as predictors do not fully explain local precipitation. Furthermore, with the exception of MLR, DECMs show no systematic reduction of the RMS, and sometimes even enlarge it. However, an increasing RMS does not necessarily indicate worse model skill, as at low correlation levels an underestimated variance ratio lowers the RMS (compare MLR in Figure 8). In summary, most DECMs drastically reduce seasonal precipitation biases, some strongly improve the temporal variability, but none improves temporal correlation on a daily basis. However, since this study focusses on climate applications, the improvement of temporal correlation is not the objective.
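The statement that an underestimated variability lowers the RMS at low correlation follows from the geometric relation underlying the Taylor diagram (Taylor, 2001): with the standard-deviation ratio r and the correlation R, the normalized centred RMS difference is sqrt(1 + r² − 2rR). The short Python sketch below evaluates this relation; the correlation of 0.2 and the two ratios are hypothetical values chosen only for illustration, not results from Figure 8.

```python
import math

def normalized_centred_rms(sd_ratio, correlation):
    """Normalized centred RMS difference from the law-of-cosines relation."""
    return math.sqrt(1.0 + sd_ratio ** 2 - 2.0 * sd_ratio * correlation)

low_correlation = 0.2  # hypothetical daily correlation

# Correct variability (ratio 1.0) versus strongly damped variability (ratio 0.3).
rms_full = normalized_centred_rms(1.0, low_correlation)    # about 1.26
rms_damped = normalized_centred_rms(0.3, low_correlation)  # about 0.98
```

For a fixed correlation the RMS difference is smallest when the ratio equals the correlation, so at low correlation a strongly damped series is rewarded by this score even though it misrepresents variability.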

Figure 9 depicts several further performance indices. The uncorrected RCM overestimates wet-day frequency (Freq), as already demonstrated. Daily precipitation intensity (SDII), in contrast, shows regional variations, but tends to be underestimated by the RCM. This behaviour of Freq and SDII in the RCM is characteristic of the ‘drizzle’ problem in climate models (e.g. Gutowski *et al.*, 2003; Fowler and Kilsby, 2007). LOCI and QM correct these errors to virtually zero. Resampling approaches, particularly NNAM, show significant skill, but a slight systematic underestimation of the analysed indicators. Although MLRR improves on MLR, both regression approaches fail to reproduce intensity and frequency, with drastic intensity underestimation (up to −4.6 mm per wet day) and overestimation of frequency (up to about 12 days per month). MLRT shows similar results for intensity, but underestimates frequency.
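The four indices can be computed from a daily series as in the following Python sketch. The wet-day threshold of 1 mm/day is an assumption, as are the function names and the tiny hand-made series; Freq is returned here as a fraction of days, whereas the text reports it in days per month.

```python
import numpy as np

def precipitation_indices(daily, wet_threshold=1.0):
    """Return SDII, Freq, Q95 and RQ75 for one daily precipitation series."""
    daily = np.asarray(daily, dtype=float)
    wet = daily[daily >= wet_threshold]
    return {
        "SDII": wet.mean() if wet.size else np.nan,             # mean wet-day intensity
        "Freq": wet.size / daily.size,                          # fraction of wet days
        "Q95": np.percentile(daily, 95),                        # all days
        "RQ75": np.percentile(wet, 75) if wet.size else np.nan  # wet days only
    }

def index_errors(model, obs, wet_threshold=1.0):
    """Model minus observation for each index."""
    m = precipitation_indices(model, wet_threshold)
    o = precipitation_indices(obs, wet_threshold)
    return {name: m[name] - o[name] for name in m}

# Hand-made example (assumption): a "drizzly" model with many weak wet days.
obs_days = [0, 0, 2, 4, 6, 0, 8, 0, 0, 10]
model_days = [0.5, 1.2, 1.5, 2, 3, 1.1, 4, 0.2, 1.3, 5]

obs_indices = precipitation_indices(obs_days)
errors = index_errors(model_days, obs_days)
```

In the hand-made example the model has a positive Freq error and a negative SDII error at the same time, which is the signature of the drizzle problem described above.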

Towards extreme precipitation (Q95, RQ75), the uncorrected RCM shows an inhomogeneous picture, with overestimation in sub-region 6 and underestimation in sub-region 8. Only in summer do all regions agree in underestimating higher precipitation amounts. QM and LOCI, but also AM and NNAM, systematically reduce RCM error characteristics in these moderately extreme precipitation indices, which is also demonstrated by the quantile-quantile plots in Figure 10. MLR and MLRT underestimate Q95 and RQ75 significantly, demonstrating their deficiencies in estimating the distribution of daily precipitation. MLRR captures Q95 surprisingly well, whereas RQ75 is heavily biased. This is related to MLRR resampling, which correctly broadens the entire distribution, as seen in Figure 10, but does not correct the general MLR problem of estimating the right wet-day probability. The latter is confirmed by the underestimation of RQ75. The problematic characteristics of MLRT become obvious in Figure 10, which shows a significant curvature in the quantile-quantile relation. Figure 10 also confirms the superior performance of LOCI, AM, NNAM and particularly QM for higher quantiles. However, minor deficiencies still remain; e.g. in winter in sub-region 8, LOCI significantly overestimates heavy precipitation events greater than or equal to 30 mm/day. This is caused by scaling factors which adequately correct for the mean, but fail to correct these more extreme precipitation intensities in the RCM, where the error characteristics change from under- to overestimation (compare Figure 10, upper leftmost panel).