### Abstract

- Abstract
- 1. Introduction
- 2. Data
- 3. Motivation of the Damage Function
- 4. Parametric Simplification
- 5. Modeling Results
- 6. Discussion
- Acknowledgments
- References

[1] Analyzing insurance-loss data, we derive stochastic storm-damage functions for residential buildings. At the district level we fit power-law relations between daily loss and maximum wind speed, typically spanning more than 4 orders of magnitude. The estimated exponents for 439 German districts roughly range from 8 to 12. In addition, we find correlations among the parameters and socio-demographic data, which we employ in a simplified parametrization of the damage function with just 3 independent parameters for each district. A Monte Carlo method is used to generate loss estimates and confidence bounds of daily and annual storm damages in Germany. Our approach reproduces the annual progression of winter storm losses and enables the estimation of daily losses over a wide range of magnitudes.

### 1. Introduction

[2] A storm-damage function describes losses as a function of observable meteorological parameters, typically maximum wind speed. For winter storms occurring in central Europe, several storm-damage functions for residential buildings are described in the literature. The reinsurance company *Münchener Rückversicherungs-Gesellschaft* [1993, 2001] found a power-law damage function of maximum wind speed with varying exponents of roughly 3 as well as 4–5, depending on the storm event and country being analyzed. *Klawa and Ulbrich* [2003] proposed a power-law damage function with exponent 3, refined by *Donat et al.* [2011], using excess wind speed over a threshold instead of absolute maximum wind speed. Similarly, *Heneka and Ruck* [2008] used a power-law damage-propagation function of excess wind speed with an exponent of either 2 or 3, assuming proportionality to the force or the kinetic energy of the wind, respectively. Both groups define the threshold wind speed as the empirical 98th percentile of the wind distribution. For the Netherlands, *Dorland et al.* [1999] derived a damage function for residential property that can be reformulated as a power law of maximum wind speed with exponent 0.5. When comparing these studies with literature on hurricane losses in the United States (see *Watson and Johnson* [2004] for an overview), one must be aware of the many differences in building structure and the nature of the hazard. However, following a similar approach to this article, *Huang et al.* [2001] describe an exponential damage model for residential property in the Southeastern United States based on 10-min-averaged wind speed.

[3] Our work is based on daily insurance-loss data (years 1997–2007) with a regional resolution of administrative districts. From theoretical considerations we propose a stochastic power-law damage function depending on maximum daily wind speed to describe empirical losses. We find exponents typically ranging from 8 to 12. Statistical deviations are modeled by a spatially correlated stochastic variable drawn from a log-normal distribution. Correlations among parameters and with socio-demographic data are exploited to reduce the number of independent parameters to three per district. The model quality is assessed by out-of-sample calculations based on Monte Carlo simulations of losses in daily and annual resolution. We demonstrate good agreement between annual model results and empirical values, although high losses may be slightly underestimated. For the majority of districts we find high correlations between annual loss estimates and data. For the three most severe storms, the predicted absolute daily losses in Germany agree well with recorded losses across 4 orders of magnitude.

[4] This article is structured as follows: After a brief discussion of data, we describe motivation and details of the damage function in section 3. A simplified parametrization of the damage function is demonstrated in section 4. Finally, we present modeled loss estimates and close with the discussion of our results in sections 5 and 6, respectively.

### 3. Motivation of the Damage Function

[7] The aim of this section is to derive a stochastic small-scale damage function that, for each district *i* and on a daily basis, relates the loss ratio *D*_{i} (recorded storm loss over insured value) to the maximum wind speed *v*_{i}. As all calculations are performed at the district level, the subscript *i* will be omitted for simplicity.

[8] A damage function should naturally have a sigmoid shape with a steep initial increase and saturation at large wind speeds. Such growth processes are often modeled by a logistic function

$$d(x) = \frac{d_{\max}}{1 + e^{-c x}}, \tag{1}$$

where *d*_{max} is the asymptotic upper bound and the exponent *c* determines the steepness of the function. We apply the transformation *x* = ln(*v*/*b*_{v}), with maximum daily wind speed *v* scaled by a local constant *b*_{v}. Taking the logarithm reduces the broadness and skewness of the distribution of daily maximum wind speeds and ensures that lim_{v→0} *d*(*v*) = 0. Since recorded data show that for Germany *d* ≪ *d*_{max}, *d*(*v*) can be approximated as

$$d(v) \approx \left(\frac{v}{b}\right)^{c}, \tag{2}$$

where constants were combined to the new scaling parameter *b* ≈ *b*_{v}*d*_{max}^{−1/c}.
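As a quick numerical check of this approximation, the sketch below compares a logistic curve in *x* = ln(*v*/*b*_{v}) with the corresponding power law. It assumes the standard logistic form *d*_{max}/(1 + e^{−cx}); the parameter values are purely illustrative and are not fitted values from this study.

```python
import math

def logistic_damage(v, d_max, b_v, c):
    """Logistic damage curve evaluated at x = ln(v / b_v)."""
    x = math.log(v / b_v)
    return d_max / (1.0 + math.exp(-c * x))

def power_law_damage(v, d_max, b_v, c):
    """Power-law approximation (v / b)**c with b = b_v * d_max**(-1/c)."""
    b = b_v * d_max ** (-1.0 / c)
    return (v / b) ** c

# Illustrative (hypothetical) parameters, not taken from the fitted districts.
D_MAX, B_V, C = 1.0, 60.0, 10.0

for v in (10.0, 20.0, 30.0, 40.0):
    full = logistic_damage(v, D_MAX, B_V, C)
    approx = power_law_damage(v, D_MAX, B_V, C)
    print(f"v = {v:4.0f} m/s  logistic = {full:.3e}  "
          f"power law = {approx:.3e}  relative error = {approx / full - 1:.1e}")
```

Under this assumed form the relative error of the power law equals (*v*/*b*_{v})^{c}, so it stays small as long as *v* is well below *b*_{v}, that is, as long as *d* ≪ *d*_{max}.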

[9] Figure 1a shows the empirical loss data for an arbitrarily chosen district. By inspection we see that the logarithmically binned data reveal a strong increase for wind speeds higher than approximately 13 m s^{−1} and an approximately constant regime for lower wind speeds. To capture this behavior an additional constant offset *a* is introduced, giving

$$d(v) = \left(\frac{v}{b}\right)^{c} + a. \tag{3}$$

Calculating the residuals between empirical data and *d*(*v*), we find an approximately log-normal distribution of residuals with nearly constant scale parameter *σ*. For simplicity, we utilize this finding for modeling statistical deviations ϵ and hence describe losses via a stochastic variable

$$D_{\epsilon}(v) \sim \mathcal{LN}\big(\mu(v), \sigma\big), \tag{4}$$

where $\mathcal{LN}$ represents the log-normal distribution and *μ*(*v*) = ln *d*(*v*). *μ*(*v*) and *σ* are the mean and standard deviation, respectively, of the variable's natural logarithm.
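The stochastic description above can be illustrated by drawing loss ratios from a log-normal distribution centered on the damage curve. This is a minimal sketch, assuming *d*(*v*) = (*v*/*b*)^{c} + *a*; the function names and all parameter values are hypothetical.

```python
import math
import random
import statistics

def damage_curve(v, a, b, c):
    """Deterministic damage curve: power law plus constant offset."""
    return (v / b) ** c + a

def sample_loss_ratio(v, a, b, c, sigma, rng):
    """Draw one loss ratio whose logarithm has mean ln d(v) and std sigma."""
    mu = math.log(damage_curve(v, a, b, c))
    return rng.lognormvariate(mu, sigma)

rng = random.Random(42)
A, B, C, SIGMA = 1e-7, 90.0, 10.0, 1.0  # hypothetical district parameters
WIND = 25.0                             # maximum daily wind speed in m/s

draws = [sample_loss_ratio(WIND, A, B, C, SIGMA, rng) for _ in range(10_000)]
print("d(v)          :", damage_curve(WIND, A, B, C))
print("sample median :", statistics.median(draws))
print("sample mean   :", statistics.fmean(draws))
```

With this parametrization *d*(*v*) is the median of the simulated losses, while their mean is larger by the factor exp(*σ*²/2), a general property of the log-normal distribution.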

[10] So far the analysis accounts for the loss intensity given a loss event, leaving aside the probability of an event. An empirical occurrence rate of loss events (Figure 1b) was calculated from linearly binned binary data, where a loss event was coded as ‘1’ and days without loss as ‘0’. While the empirical occurrence rate is approximately 1 at high *v*, it drops to a constant base rate for *v* → 0. Ideally, the occurrence rate could be derived from *D*_{ϵ}(*v*) as the probability of exceeding a certain loss threshold. We were not able to identify such a threshold via censored-regression modeling and hence chose to fit the data with an empirical occurrence-probability function *p*(*v*) (equation (5))

with base probability (1 − *α*), shift *β*, and slope *γ*. Multiplying *D*_{ϵ}(*v*) with a stochastic weight function *w*(*v*) based on *p*(*v*), we obtain the complete stochastic damage function

$$D(v) = w(v)\, D_{\epsilon}(v). \tag{6}$$

Maximum-likelihood estimation was applied to calculate the parameters of *D*_{ϵ}(*v*) in an iterative process, alternating between computing parameters *a*, *b*, *c* while keeping the scale parameter *σ* constant and vice versa (see the pseudo-likelihood algorithm by *Ruppert et al.* [2003]). A least-squares approach was used to fit the parameters of *p*(*v*).
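The combination of occurrence probability and loss intensity can be sketched as follows. Two assumptions are made here that are not confirmed by the text: *p*(*v*) is taken to be the logistic form 1 − *α*/(1 + e^{γ(v − β)}), which tends to the base probability (1 − *α*) for *v* ≪ *β* and to 1 for *v* ≫ *β*, and the weight *w*(*v*) is taken to be a Bernoulli variable with success probability *p*(*v*). All parameter values are hypothetical.

```python
import math
import random

def occurrence_probability(v, alpha, beta, gamma):
    """Assumed logistic occurrence probability (not the published form)."""
    return 1.0 - alpha / (1.0 + math.exp(gamma * (v - beta)))

def sample_daily_loss(v, a, b, c, sigma, alpha, beta, gamma, rng):
    """One realization of the loss ratio: Bernoulli weight times log-normal loss."""
    if rng.random() >= occurrence_probability(v, alpha, beta, gamma):
        return 0.0  # no loss event on this day
    mu = math.log((v / b) ** c + a)
    return rng.lognormvariate(mu, sigma)

rng = random.Random(7)
PARAMS = dict(a=1e-7, b=90.0, c=10.0, sigma=1.0,   # hypothetical intensity parameters
              alpha=0.9, beta=15.0, gamma=1.0)      # hypothetical occurrence parameters

for v in (5.0, 15.0, 30.0):
    losses = [sample_daily_loss(v, rng=rng, **PARAMS) for _ in range(5_000)]
    share = sum(loss > 0.0 for loss in losses) / len(losses)
    print(f"v = {v:4.1f} m/s  share of days with loss = {share:.2f}")
```

In this sketch calm days produce a loss only at the base rate, while days with high wind speeds almost always produce a loss drawn from the log-normal intensity distribution.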

[11] As some wind stations may not be representative for a given district, the wind station featuring the highest predictive power was chosen from a set of 5 wind stations closest to the geographical center of the district. The coefficient of determination for non-linear regression models, generalized *R*^{2}, was chosen as a measure of predictive power. For the given shape of the damage curve *d*(*v*), *R*^{2} values related to nearby wind stations indicate the level of variance inherent to the specific combination of district loss and wind data. Due to the high level of statistical deviation around *d*(*v*), low *R*^{2} scores would be expected for any smooth damage curve. In fact, all estimated *R*^{2} scores lie within the interval [0.2, 0.6], with an average of 0.42. High *R*^{2} is seen for north-western coastal regions, which often experience high winds. Regions with an *R*^{2} score of 0.4 and below largely coincide with the German low mountain ranges (Mittelgebirge) and the southern alpine border. Best scores are hence generally obtained for regions with homogeneous elevation and a high frequency of strong winds.

[12] The spatial distribution of the exponent *c* estimated for all German districts is shown in Figure 2. We find a slightly right-skewed distribution with mean 9.8, and 80% of values are contained within the interval from 8.3 to 11.8. Values of 15 and beyond can be regarded as outliers, occurring in districts where wind measurements insufficiently differentiate losses even at high wind speeds. Geographically, values of *c* below 10 predominate in Western, Central, and Northern Germany, while values above 10 are most often found across Southern Germany and the southern districts of East Germany.

[13] Our analysis is based on the assumption that maximum wind speed is the dominating criterion for the occurrence and severity of storm damages. It was not feasible to quantify the effects from other potential factors (e.g., storm duration, precipitation, or turbulent winds). However, the presence of systematic large-scale deviations should be reflected in spatial correlations of the statistical deviations ϵ. In fact, calculations of Spearman's correlation coefficient from normalized residuals showed significant spatial rank correlations between districts, ranging from −0.30 to 0.67. While insignificant for the estimation of loss in single districts, these correlations must be accounted for when spatially accumulating loss across Germany. In order to reproduce the spatial correlations during the Monte Carlo calculations, the empirically estimated rank correlations were enforced on the random deviations ϵ of *D*_{ϵ}. The algorithm was implemented as follows:

[14] 1. Determine pairwise Spearman's correlation coefficients *ρ*_{i,j} between every possible combination of districts and thus populate the empirical rank-correlation matrix.

[15] 2. Determine the nearest positive-definite correlation matrix **M** using the algorithm derived by *Higham* [2002].

[16] 3. Use the iterative procedure by *Iman and Conover* [1982] to create spatially correlated random deviations ln(*ϵ*).
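Steps 1 and 3 above can be sketched in plain Python as follows. This is a simplified illustration rather than the implementation used in the study: the reordering is a single-pass variant of the *Iman and Conover* [1982] idea without the iterative refinement, the target matrix is assumed to be positive definite already (so the *Higham* [2002] step is skipped), and the two-district target correlation of 0.6 is hypothetical.

```python
import math
import random
from statistics import NormalDist

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = mean_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    return cov / math.sqrt(var_x * var_y)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                low[i][j] = (matrix[i][j] - partial) / low[j][j]
    return low

def impose_rank_correlation(columns, target, rng):
    """Reorder each column so that its ranks follow correlated normal scores."""
    n, k = len(columns[0]), len(columns)
    base = [NormalDist().inv_cdf((i + 1) / (n + 1)) for i in range(n)]
    scores = []
    for _ in range(k):
        column = base[:]
        rng.shuffle(column)  # independent score columns
        scores.append(column)
    low = cholesky(target)
    reordered = []
    for j in range(k):
        # Row j of the Cholesky factor mixes the independent scores.
        mixed = [sum(low[j][m] * scores[m][i] for m in range(j + 1)) for i in range(n)]
        position = ranks(mixed)
        ordered = sorted(columns[j])
        reordered.append([ordered[int(position[i]) - 1] for i in range(n)])
    return reordered

rng = random.Random(1)
TARGET = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical rank correlation of two districts
log_eps = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
correlated = impose_rank_correlation(log_eps, TARGET, rng)
print("rank correlation before:", round(spearman(log_eps[0], log_eps[1]), 2))
print("rank correlation after :", round(spearman(correlated[0], correlated[1]), 2))
```

The reordering only permutes the values within each column, so the marginal distribution of ln(*ϵ*) in each district is left unchanged while the rank correlation between districts moves close to the target.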

[17] We assume that two main processes give rise to the statistical deviations seen in Figure 1a. Firstly, the correlation between wind-speed measurements at separate sites is known to decrease significantly with growing distance. To assess the significance of this effect on small scales, we compared two closely situated wind stations within the same district (Berlin Tempelhof and Berlin Tegel, distance ≈ 11 km). From the empirical distribution we estimate that 75% of statistical deviations lie within the interval [−1.5 m s^{−1}, 1.4 m s^{−1}], while roughly 5% fall outside [−3 m s^{−1}, 2.9 m s^{−1}]. Hence, a significant part of the observed deviations may be attributed to this source of error. Secondly, insurance data may be subject to statistical fluctuations caused by incorrect or delayed reporting of losses. We expect, however, that for large losses the latter errors are small and negligible.

### 4. Parametric Simplification

[18] In order to simplify the parametrization of the damage model we identified global statistical relationships and reduced the number of local fitting parameters. As additional predictors we used the number of residential buildings per district *h*, the long-term damage rate *δ*, defined as the share of days with recorded damages during the observation period, and the wind speed *ν* = *ba*^{1/c} at the intersection of the constant *a* and the power-law term in *d*(*v*). The raw data for the 439 districts and the corresponding least-squares fits are shown in Figures 3a–3c. Parameters *a*, *α*, and *β* could hence be replaced with the fitted global relationships (equations (7a)–(7c)).

Intuitively, the inverse proportionality between loss offset *a* and number of buildings *h* (equation (7a)) follows from the definition of the loss ratio as the absolute loss divided by the insured value, since the insured value scales linearly with the total number of houses. This suggests a common minimum noise level for all districts. Furthermore, the approximate direct proportionality between *ν* and *β* in equation (7c) hints at a common threshold that separates the regime of noise at lower wind speeds from storm-driven losses at high wind speeds. In line with this proposition, we interpret (1 − *α*) as the probability of a random loss event in the noisy regime of the curve. Accordingly, equation (7b) shows that the regional differences of the long-term damage rate *δ* are dominated by random loss events. The remaining third parameter of *p*(*v*), *γ*, could furthermore be replaced by its mean value over all districts. As *p*(*v*) generally increases rapidly from (1 − *α*) to 1, results were insensitive to the error induced by this replacement.
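The crossover wind speed *ν* = *ba*^{1/c} used as a predictor above is the speed at which the power-law term equals the constant offset. A short sketch, assuming *d*(*v*) = (*v*/*b*)^{c} + *a* and hypothetical parameter values, makes this explicit:

```python
import math

def crossover_wind_speed(a, b, c):
    """Wind speed at which the power-law term (v / b)**c equals the offset a."""
    return b * a ** (1.0 / c)

A, B, C = 1e-7, 90.0, 10.0  # hypothetical district parameters
nu = crossover_wind_speed(A, B, C)

print(f"crossover wind speed nu = {nu:.2f} m/s")
print(f"power-law term at nu    = {(nu / B) ** C:.3e}")
print(f"constant offset a       = {A:.3e}")
```

Below *ν* the offset dominates the damage curve (the noisy regime), and above *ν* the power-law term takes over.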

[19] In summary, the above global relationships can be used to reduce the model parametrization to three local parameters (exponent *c*, and scaling parameters *b* and *σ*). Additionally, we observe a weak dependence of scale parameter *b* on the elevation of the respective wind stations above mean sea level (Figure 3d). However, it is expected that *b* comprises a multitude of scaling effects due to orography or land use, and that hence the altitude dependence is not sufficient for a robust approximation.

[20] In the following, all calculations are based on the full parametric model unless we refer to the *reduced model.*

### 5. Modeling Results

[21] In order to assess the predictive power of the proposed damage function, calculations of regional and country-wide loss figures were compared to empirical values. Due to the availability of only 11 years of spatially resolved loss data, an out-of-sample-test algorithm was implemented as follows:

[22] 1. Exclude year *x* from empirical loss data.

[23] 2. Train the storm damage model on the remaining data.

[24] 3. Predict country-wide daily and cumulated losses for year *x* based on daily maximum wind speeds.

[25] 4. Vary *x* and repeat all calculations.

[26] In order to estimate the distribution of daily losses, the Monte Carlo method was used and 500 realizations of daily loss estimates were calculated.
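The out-of-sample procedure can be sketched on synthetic data as follows. This is a deliberately reduced illustration, not the study's model: it treats a single district, omits the offset *a* and the occurrence probability, replaces the maximum-likelihood fit by a log-log least-squares fit, and uses toy data generated from hypothetical parameters. Only the leave-one-year-out loop and the 500 Monte Carlo realizations mirror the steps described above.

```python
import math
import random
import statistics

def make_toy_data(rng, years=range(1997, 2008), days=60, b=90.0, c=10.0, sigma=1.0):
    """Synthetic (wind speed, loss ratio) pairs per year; purely illustrative."""
    data = {}
    for year in years:
        winds = [rng.uniform(8.0, 35.0) for _ in range(days)]
        data[year] = [(v, rng.lognormvariate(c * math.log(v / b), sigma)) for v in winds]
    return data

def fit_power_law(days):
    """Least-squares fit of ln(loss) = c * ln(v) - c * ln(b); returns (b, c, sigma)."""
    xs = [math.log(v) for v, _ in days]
    ys = [math.log(loss) for _, loss in days]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - c * mx  # equals -c * ln(b)
    b = math.exp(-intercept / c)
    residuals = [y - (c * x + intercept) for x, y in zip(xs, ys)]
    return b, c, statistics.pstdev(residuals)

def simulate_annual_loss(winds, params, n_real, rng):
    """Monte Carlo realizations of the loss accumulated over one year."""
    b, c, sigma = params
    return [sum(rng.lognormvariate(c * math.log(v / b), sigma) for v in winds)
            for _ in range(n_real)]

def leave_one_year_out(data_by_year, n_real=500, seed=0):
    """For each year: train on all other years, then predict the held-out year."""
    rng = random.Random(seed)
    results = {}
    for year in data_by_year:
        train = [day for y, days in data_by_year.items() if y != year for day in days]
        params = fit_power_law(train)
        winds = [v for v, _ in data_by_year[year]]
        totals = simulate_annual_loss(winds, params, n_real, rng)
        deciles = statistics.quantiles(totals, n=10)
        # 10% quantile, median and 90% quantile span an 80% uncertainty bound.
        results[year] = (deciles[0], statistics.median(totals), deciles[-1])
    return results

toy = make_toy_data(random.Random(2))
for year, (low, median, high) in leave_one_year_out(toy).items():
    observed = sum(loss for _, loss in toy[year])
    print(f"{year}: observed = {observed:.2e}  median = {median:.2e}  "
          f"80% bound = [{low:.2e}, {high:.2e}]")
```

Because the held-out year never enters the fit, agreement between the observed annual sums and the simulated bounds is a genuine out-of-sample check, which is the point of steps 1 to 4.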

[27] Figure 4 shows daily loss predictions in Germany for the time periods around the three major storm events named ‘Lothar’ (24–27 December 1999), ‘Jeanett’ (26–29 October 2002), and ‘Kyrill’ (17–19 January 2007). These storms are of particular interest, as they caused the largest insurance losses during the period under consideration. For most days empirical values lie within the uncertainty bounds of the model estimates. Peak empirical losses of storm events ‘Lothar’ and ‘Kyrill’ are contained within the 80% uncertainty bound, while ‘Jeanett’ is found in the 95% interval. The results demonstrate the model performance for predicting losses over 4 orders of magnitude.

[28] Annual loss estimates during winter months are shown in Figure 5a. Regarding absolute loss figures, we estimate a very high Pearson correlation of 0.99 between the model estimates (median) and the empirical values, which indicates a good reproduction of the annual progression of empirical storm-loss data. Annual losses are dominated by the storm events ‘Lothar’, ‘Jeanett’, and ‘Kyrill’ in the years 1999, 2002, and 2007, respectively. Loss estimates for these years hence reflect the peaks seen in Figure 4. Additionally, we observe a small positive bias for years with loss ratio below 10^{−4}, which may be due to ignoring correlations in the estimation of *p*(*v*) (equation (5)). In total, we find approximately 12% underestimation of absolute loss accumulated over 11 years.

[29] Figure 5b summarizes the correlation per district between the median of the annual loss estimates and the empirical values. Approximately 1/3 of all districts show high Pearson correlation coefficients above 0.9. The mean correlation over all districts is 0.74. The correlations allow for a comparison of the full model and the reduced model with only three fit parameters per district. The histogram shows an increase of correlations between 0.5 and 0.9, while the number of correlations with values above 0.9 is slightly decreased. Together with a slight increase of mean correlation to 0.76 this demonstrates the sufficiency of the three remaining fit parameters. Since both the original and the reduced model produce nearly identical quantitative loss estimates for Germany, we show results for the original model only.

### 6. Discussion

[30] Empirical data of daily insurance losses across German administrative districts show a strong increase of losses with maximum daily wind speed. We find that these losses are well described by power-law damage functions with regionally varying exponents that typically range between 8 and 12. For the out-of-sample calculations we generated successive parameter fits based on varying time slices of the available data. The estimated parameters were insensitive to these variations, thus demonstrating model robustness even under exclusion of the major loss events.

[31] While these results are in contrast to damage functions published before, a direct comparison of the exponents may be misleading. In fact, excess-over-threshold models, as applied by *Klawa and Ulbrich* [2003] and *Heneka and Ruck* [2008], imply a much steeper increase of loss in the threshold vicinity than pure power-law models of absolute wind speed [e.g., *Münchener Rückversicherungs-Gesellschaft*, 1993, 2001]. The basic conjecture of our approach is a monotonic relationship between insured loss and maximum wind speed applicable to both small and extreme storm loss, which enables us to exploit information from a wide range of recorded losses. Since we found a universal power-law increase of loss for all districts, we think that the use of damage functions with differing asymptotic shape may result in significant extrapolation error.
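One way to make the comparison of exponents concrete is to compute the local log-log slope of an excess-over-threshold model. For a loss proportional to (*v* − *v*_{thr})^{n}, the slope d ln *L* / d ln *v* equals *n v*/(*v* − *v*_{thr}), which grows without bound as *v* approaches the threshold, whereas a pure power law has a constant slope. The sketch below uses a hypothetical threshold of 20 m s^{−1}.

```python
import math

def local_exponent_excess(v, v_threshold, n):
    """Log-log slope d ln L / d ln v of L ~ (v - v_threshold)**n, for v > v_threshold."""
    return n * v / (v - v_threshold)

V_THRESHOLD = 20.0  # hypothetical threshold wind speed in m/s
N = 3               # exponent of the excess-over-threshold model

for v in (22.0, 25.0, 30.0, 40.0):
    slope = local_exponent_excess(v, V_THRESHOLD, N)
    print(f"v = {v:4.1f} m/s  effective power-law exponent = {slope:5.1f}")
```

Under these assumptions an excess-wind model with exponent 3 behaves locally like a power law of absolute wind speed with a much larger exponent near the threshold, so a nominal exponent of 3 and fitted exponents of 8 to 12 are not directly comparable.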

[32] In Figure 4 we demonstrated in an out-of-sample test that daily modeled losses across Germany closely match empirical values ranging over four orders of magnitude. Judging from the comparison of median loss estimates and empirical data, peak losses may be slightly underestimated while still being within the uncertainty bounds of the model. Besides being a purely statistical effect (e.g., insufficient length of time series), this may be due to other aspects such as underdetermination of the model based on maximum wind speed only. Where available, empirical data regarding such aspects as the temporal wind profile, storm duration, or gustiness may be used to improve loss estimation. Socio-economic effects, such as demand surges [see, e.g., *Olsen and Porter*, 2011], are expected to play a minor role.

[33] Inspired by other studies, the proposition of an exponential damage function was tested, but rejected due to strong overestimation of damages for large wind speeds. Bearing in mind that the damage function was fitted on the whole range of available loss data and thus not specifically calibrated to extreme losses, we conclude that the model results demonstrate good reproduction of both daily and annual extremes.

[34] Strong country-wide correlations of model parameters support the universal applicability of our damage function and permit the separation of the damage curve into an approximately constant noisy regime and a physical power-law regime. Employing these correlations, the model parametrization was successfully reduced to three independent parameters determining the basic shape of the damage curve. While the power-law exponent determines the curve's steepness, the scale parameter accounts for regional variation between districts and wind stations (e.g., distance and orography). The third parameter specifies the width of the log-normal loss distribution around the central curve and thus relates to the expected level of statistical deviations. In particular, the value of the exponent may be interpreted as an indicator for regional vulnerability to extreme winds. Its spatial distribution indicates reduced vulnerability within Western and Northern Germany. As these regions, and especially the coastal regions, are highly exposed to extreme winds, the relatively low values of the exponent suggest a greater level of adaptation to the current wind climate than for Southern Germany.

[35] All model calculations were deliberately based on raw measurements of maximum wind speed as provided by DWD. While most wind stations are known to be subject to inhomogeneities due to changes in measurement apparatus, location, or surrounding surface roughness, they may nonetheless possess predictive power for neighboring districts. Due to the selection criterion of maximizing generalized *R*^{2}, wind stations with inhomogeneities causing significant additional variance were excluded. Unlike for temperature or pressure data, correction of inhomogeneities in maximum wind speed data would require case-specific non-linear transformations that are beyond the scope of this study.

[36] Additional insight, in particular regarding the significance for extreme loss modeling, could be gained from a dedicated model intercomparison on the basis of common meteorological and insurance-loss data. In further work we moreover intend to apply our model to loss data for other European countries and regions. A cross-national comparison of model parameters could enable the identification of clusters of similar vulnerability and reveal regional adaptation potential.