## 1 INTRODUCTION

The importance of wind energy has increased significantly in the past decades. In 2011, approximately 21% of installed power capacity in Europe was from wind power. [1] One problem of integrating wind power into the electricity grid is the volatility of wind speed and, consequently, of power production. Prediction of power production is therefore crucial for energy trading and management. In this context, probabilistic forecast methods have recently received increased attention because they are of higher value in decision making than single-valued (point) forecasts. [2-4] Probabilistic forecasts can take the form of, e.g., quantile or interval forecasts, full predictive distributions, or risk indices that accompany point forecasts.

The general approach to making probabilistic power production forecasts with lead times ≥ 6 hours is to statistically post-process forecasts (mainly wind speed forecasts) from numerical weather prediction (NWP) models. [5] In the atmospheric sciences, this approach is termed model output statistics (MOS). [6] However, standard linear regression analysis, as typically used for MOS, is complicated by two major problems:

1. The relationship between wind speed and power production is clearly non-linear (Figures 1 and 2).

2. The range of power production is limited between zero and nominal power so that typical parametric distribution assumptions (e.g., Gaussian) are inappropriate.

To overcome these problems, non-linear and often also non-parametric regression methods are frequently used in the literature. For example, a variety of non-linear quantile regression methods have been proposed, such as locally weighted quantile regression, [4, 7] quantile regression with spline basis functions, [8] or time-adaptive quantile regression. [9] Other widely used approaches are kernel density estimators and variants thereof, [7, 10-12] ensemble post-processing, e.g., with kernel dressing [13, 14] or quantile correction, [15, 16] and adaptive resampling. [17] The disadvantage of such non-parametric non-linear models is that a large number of parameters generally have to be estimated, so that the estimates can be unstable, especially when few data are available. Furthermore, the resulting models are sometimes hard to interpret and, more importantly, neglect the available information about the form of the power curve and the censoring.

Therefore, we propose a new line of approach that consists of two steps:

1. Transform the power observations into wind speed observations prior to MOS regression modeling by using the inverse of the power curve function. Note that this transforms the limited range from zero to nominal power into the limited range from cut-in wind speed to nominal wind speed.

2. Exploit the information about this limited range by using censored models in ‘wind space’, where typically much simpler (more linear) regressions can be used and parametric distributions work well.
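The first step can be illustrated with a minimal sketch. It assumes a power curve that is given as a table and interpolated linearly; the table values, the turbine parameters, and the function names `power_curve` and `inverse_power_curve` are hypothetical and not taken from the turbine used in this study. Cut-out behavior at very high wind speeds is ignored.

```python
from bisect import bisect_right

# Hypothetical tabulated power curve (illustrative values only):
# wind speed in m/s from cut-in (3) to nominal (13), power in kW.
WIND = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
POWER = [0.0, 150.0, 550.0, 1250.0, 1800.0, 2000.0]


def _interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the end points of the table.

    xs must be strictly increasing.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    weight = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + weight * (ys[i + 1] - ys[i])


def power_curve(wind_speed):
    """Map wind speed (m/s) to power (kW); zero below cut-in, nominal above."""
    return _interp(wind_speed, WIND, POWER)


def inverse_power_curve(power):
    """Map power (kW) to wind speed (m/s).

    Zero power maps to the cut-in wind speed and nominal power to the
    nominal wind speed, so the result is censored at these two values.
    """
    return _interp(power, POWER, WIND)


observed_power = [0.0, 350.0, 1250.0, 2000.0]
transformed = [inverse_power_curve(p) for p in observed_power]
```

Because the power curve is strictly increasing between cut-in and nominal wind speed, the inverse is well defined on that range. All observations of zero power collapse onto the cut-in wind speed and all observations of nominal power onto the nominal wind speed, which is the censoring that the second step accounts for.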

Figure 3 shows the relationship between power observations transformed with the inverse power curve (*y*-axis) and NWP wind speed forecasts (*x*-axis). The relationship is almost linear, so a regression model only has to account for the censoring of the transformed power observations at cut-in and nominal wind speed. While such censored regression techniques are not used very frequently for MOS, they are among the standard regression models in statistics and econometrics and are readily available in many statistical software packages. Thus, we can obtain probabilistic forecasts in ‘wind space’ with a relatively simple model and then employ the power curve again to transform these to probabilistic power production forecasts.
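As a rough sketch of what such a censored regression involves, the following block writes down the likelihood of a linear model with a latent Gaussian response that is censored at both ends, together with the resulting predictive quantiles in wind space. The Gaussian assumption, the censoring points, and all names (`censored_neg_log_lik`, `predictive_quantile`, the parameter layout) are illustrative assumptions rather than the specific models of this study.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical censoring points in wind space (m/s); illustrative values only.
CUT_IN = 3.0
NOMINAL_WS = 13.0


def censored_neg_log_lik(params, x, y, lower=CUT_IN, upper=NOMINAL_WS):
    """Negative log-likelihood of a doubly censored Gaussian linear regression.

    params = (intercept, slope, log_sigma)
    x: NWP wind speed forecasts
    y: power observations transformed to wind space (censored at lower/upper)
    """
    intercept, slope, log_sigma = params
    sigma = math.exp(log_sigma)  # log parameterization keeps sigma positive
    nll = 0.0
    for xi, yi in zip(x, y):
        mu = intercept + slope * xi
        if yi <= lower:
            # Left-censored: the latent value is only known to be <= lower.
            prob = STD_NORMAL.cdf((lower - mu) / sigma)
            nll -= math.log(max(prob, 1e-300))
        elif yi >= upper:
            # Right-censored: the latent value is only known to be >= upper.
            prob = STD_NORMAL.cdf((mu - upper) / sigma)
            nll -= math.log(max(prob, 1e-300))
        else:
            # Uncensored: ordinary Gaussian log-density.
            z = (yi - mu) / sigma
            nll += 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return nll


def predictive_quantile(params, x_new, tau, lower=CUT_IN, upper=NOMINAL_WS):
    """Quantile tau of the censored predictive distribution in wind space.

    Censoring is a monotone map, so the quantile of the censored variable is
    the latent Gaussian quantile clipped to [lower, upper].
    """
    intercept, slope, log_sigma = params
    mu = intercept + slope * x_new
    q = mu + math.exp(log_sigma) * STD_NORMAL.inv_cdf(tau)
    return min(max(q, lower), upper)
```

Minimizing `censored_neg_log_lik` over `params` with any general-purpose numerical optimizer gives the maximum likelihood estimates. Because the power curve is monotone between cut-in and nominal wind speed, a quantile in wind space can be passed through the power curve to obtain the corresponding quantile of power production.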

We are not the first to suggest usage of the known power curve to address the non-linearity issue. However, previous approaches employed the power curve itself rather than its inverse to transform the NWP wind speed forecasts into power forecasts prior to regression modeling. [15, 18, 19] While this is also very easy to carry out (refer to Figure 4 for an example), it has a crucial disadvantage: in the steep parts of the power curve, errors in the NWP wind speed forecasts are strongly amplified while errors of low and high NWP wind speed forecasts are suppressed. Hence, the resulting relationship between the (wind-to-power) transformed NWP wind speed forecasts and observed power production exhibits strong heteroskedasticity, which leads to less reliable estimates in regression models. Note the higher variance in the center of Figure 4 as compared with the lower variance on the left and the right sides. In contrast, the inverse power-to-wind transformed relationship in Figure 3 has a rather low and stable variance (only limited by censoring at cut-in and nominal speed).
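The amplification argument can be made explicit with a first-order approximation. The notation is introduced here for illustration only: $g$ denotes the power curve, $v$ the true wind speed, and $\hat{v} = v + \varepsilon$ the NWP forecast with an error $\varepsilon$ of variance $\sigma^2$.

$$
g(\hat{v}) \approx g(v) + g'(v)\,\varepsilon
\quad\Longrightarrow\quad
\operatorname{Var}\bigl(g(\hat{v}) - g(v)\bigr) \approx g'(v)^2\,\sigma^2
$$

Under this approximation, the error variance in power space scales with the squared slope of the power curve. It vanishes where the curve is flat (below cut-in and above nominal wind speed, where $g'(v) = 0$) and is largest in the steep part of the curve, which is the heteroskedasticity pattern described above.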

In this study, we demonstrate how both parametric and non-parametric censored (linear) regression models can be employed for inverse power curve transformed data (i.e., in wind space). The resulting models are assessed and compared with previously suggested approaches for untransformed data as well as wind-to-power transformed data (i.e., in power space), showing that in many situations, we can get similar or even better performance from models that are easier to compute and interpret. As observation data, we use 3 years of data from a wind turbine located in Austria. As NWP forecasts, high-resolution and ensemble forecasts of wind at different heights from the European Centre for Medium-Range Weather Forecasts (ECMWF) are employed.

The remainder of the paper is organized as follows: In Section 2, the data used for testing the transformations and models are described briefly. The regression models are introduced in Section 3. The verification measures are specified in Section 4, and the corresponding results are shown in Section 5. Finally, a conclusion of the paper is provided in Section 6.