Probabilistic wind power forecasts with an inverse power curve transformation and censored regression


  • The copyright line for this article was changed on 29 February 2016 after original online publication.


Forecasting wind power is an important part of a successful integration of wind power into the power grid. Forecasts with lead times longer than 6 h are generally made by using statistical methods to post-process forecasts from numerical weather prediction systems. Two major problems that complicate this approach are the non-linear relationship between wind speed and power production and the limited range of power production between zero and nominal power of the turbine. In practice, these problems are often tackled by using non-linear non-parametric regression models. However, such an approach ignores valuable and readily available information: the power curve of the turbine's manufacturer. Much of the non-linearity can be directly accounted for by transforming the observed power production into wind speed via the inverse power curve so that simpler linear regression models can be used. Furthermore, the fact that the transformed power production has a limited range can be taken care of by employing censored regression models.

In this study, we evaluate quantile forecasts from a range of methods: (i) using parametric and non-parametric models, (ii) with and without the proposed inverse power curve transformation and (iii) with and without censoring. The results show that with our inverse (power-to-wind) transformation, simpler linear regression models with censoring perform as well as or better than non-linear models with or without the frequently used wind-to-power transformation. © 2013 The Authors. Wind Energy published by John Wiley & Sons Ltd.


The importance of wind energy has increased significantly in the past decades. In 2011, approximately 21% of installed power capacity in Europe was from wind power. [1] One problem of integrating wind power into the electricity grid is the volatility of wind speed and, consequently, of power production. Prediction of power production is therefore crucial for energy trading and management. In this context, probabilistic forecast methods have been receiving increased attention recently because of their higher value in decision making when compared with single value (point) forecasts. [2-4] Probabilistic forecasts can, e.g., be quantile or interval forecasts, full predictive distributions or risk indices in addition to point forecasts.

The general approach to make probabilistic power production forecasts with lead times ≥ 6 hours is to statistically post-process forecasts (mainly wind speed forecasts) from numerical weather prediction (NWP) models. [5] In the atmospheric sciences, this approach is termed model output statistics (MOS). [6] However, standard linear regression analysis, as typically used for MOS, is complicated by two major problems:

  • The relationship between wind speed and power production is clearly non-linear (Figures 1 and 2).

  • The range of power production is limited between zero and nominal power so that typical parametric distribution assumptions (e.g., Gaussian) are inappropriate.

Figure 1.

Power curve function f() of the turbine manufacturer: power production by wind speed.

Figure 2.

Normalized power production (black points) by the ECMWF (European Centre for Medium-Range Weather Forecasts) wind speed forecasts with power curve (gray line).

To overcome these problems, non-linear and often also non-parametric regression methods are used frequently in the literature. For example, a variety of non-linear quantile regression methods have been proposed. Examples are locally weighted quantile regression, [4, 7] quantile regression with spline basis functions [8] or time-adaptive quantile regression. [9] Other widely used approaches are kernel density estimators and variations thereof, [7, 10-12] ensemble post-processing, e.g., with kernel dressing [13, 14] or quantile correction, [15, 16] or adaptive resampling. [17] The disadvantages of such non-parametric non-linear models are that a large number of parameters generally have to be estimated, and therefore, the estimates can be unstable, especially in cases where few data are available. Furthermore, the resulting models are sometimes hard to interpret and, more importantly, neglect the available information about the form of the power curve and the censoring.

Therefore, we propose a new (line of) approach(es):

  • Transform the power observations into wind speed observations prior to MOS regression modeling by using the inverse of the power curve function. Note that this transforms the limited range from zero to nominal power into the limited range from cut-in wind speed to nominal wind speed.

  • Exploit the information about this limited range by using censored models in ‘wind space’ where typically much simpler (more) linear regressions can be used, and parametric distributions work well.

Figure 3 shows the relationship between power observations, transformed with the inverse power curve on the y-axis and NWP wind speed forecasts on the x-axis. This relationship is almost linear, so only the censoring of the transformed power observations at cut-in and nominal wind speed has to be accounted for in a regression model. While such censored regression techniques are not very frequently used for MOS, they are among the standard regression models in statistics and econometrics and are readily available in many statistical software packages. Thus, we can obtain probabilistic forecasts in ‘wind space’ with a relatively simple model and then employ the power curve again to transform these to probabilistic power production forecasts.

Figure 3.

Inverse power curve transformation (power to wind): transformed power production by the ECMWF wind speed forecasts.

We are not the first to suggest usage of the known power curve to address the non-linearity issue. However, previous approaches employed the power curve itself rather than its inverse to transform the NWP wind speed forecasts into power forecasts prior to regression modeling. [15, 18, 19] While this is also very easy to carry out (refer to Figure 4 for an example), it has a crucial disadvantage: in the steep parts of the power curve, errors in the NWP wind speed forecasts are strongly amplified while errors of low and high NWP wind speed forecasts are suppressed. Hence, the resulting relationship between the (wind-to-power) transformed NWP wind speed forecasts and observed power production exhibits strong heteroskedasticity, which leads to less reliable estimates in regression models. Note the higher variance in the center of Figure 4 as compared with the lower variance on the left and the right sides. In contrast, the inverse power-to-wind transformed relationship in Figure 3 has a rather low and stable variance (only limited by censoring at cut-in and nominal speed).

Figure 4.

Power curve transformation (wind to power): observed power production by transformed ECMWF wind speed forecasts.

In this study, we demonstrate how both parametric and non-parametric censored (linear) regression models can be employed for inverse power curve transformed data (i.e., in wind space). The resulting models are assessed and compared with previously suggested approaches for untransformed data as well as wind-to-power transformed data (i.e., in power space), showing that in many situations, we can get similar or even better performance from models that are easier to compute and interpret. As observation data, we use 3 years of wind turbine data from a turbine located in Austria. As NWP forecasts, high resolution and ensemble forecasts of wind in different heights from the European Centre for Medium-Range Weather Forecasts (ECMWF) are employed.

The remainder of the paper is organized as follows: In Section 2, the data used for testing the transformations and models are described briefly. The regression models are introduced in Section 3. The verification measures are specified in Section 4, and the corresponding results are shown in Section 5. Finally, a conclusion of the paper is provided in Section 6.


As observation data, we utilize power production data from a wind turbine in eastern Austria with a nominal power of 2000 kW. Measurements with 10 min temporal resolution are available from 2006 to 2009. Data values when the turbine was off because of maintenance are removed.

As input for the statistical models, we use NWP forecasts from the ECMWF. In particular, we use wind speed forecasts, linearly interpolated from neighboring model levels to turbine hub height, as this has been shown to be the best predictor from ECMWF for wind speed on wind turbines. [20] No further variables (such as wind direction or air density) are added because they do not improve forecasts significantly for the data considered. To capture heteroskedasticity (i.e., inhomogeneous, input-dependent standard deviation of the observations), some of our models additionally employ the 10 m wind speed ensemble standard deviation from the ECMWF ensemble prediction system. To combine the observation data (with temporal resolution of 10 min) with the NWP data (with resolution of 3 h), means of the observation data are computed for 1 h around the times for which forecasts are available.

Thus, for each lead time, 1340 forecast-observation pairs are available. For a lead time of 24 h, the data are plotted in Figure 2. Note that all ECMWF forecasts used are initialized at 00 UTC. The 12 and 36 h forecasts are therefore always for midday, while the 24 and 48 h forecasts are always for midnight.

In the next sections, the following notations are used:


$n$: Number of forecast-observation pairs.

$p_i$: Power production; $i = 1, \ldots, n$.

$v_i^*$: Wind speed; $i = 1, \ldots, n$.

$v_{CI}$: Cut-in wind speed (wind speed where turbine starts to rotate).

$v_{nom}$: Nominal wind speed (wind speed where turbine reaches maximum power).

$f()$: Power curve function given by the turbine manufacturer; $p_i = f(v_i^*)$ (Figure 1).

$v_i = f^{-1}(p_i)$: Inverse-transformed power production [refer also to Equation (1)].

$x_i$: Vectors of input variables (NWP forecasts); $i = 1, \ldots, n$.

$q_\pi(y_i \mid x_i)$: $\pi$-quantile of $y_i$ given the regressor variables $x_i$.

Note that the inverse-transformed power production ($v_i$) can be interpreted as wind speed censored at cut-in and nominal wind speed (Figure 3). That means

$$ v_i = \begin{cases} v_{CI} & \text{if } v_i^* \le v_{CI} \\ v_i^* & \text{if } v_{CI} < v_i^* < v_{nom} \\ v_{nom} & \text{if } v_i^* \ge v_{nom} \end{cases} \tag{1} $$

Note that an inverse-transformed power production of vi = vCI can also occur at very high wind speed when the turbine has to be switched off in order to avoid damage. However, the turbine was never switched off because of excessive wind speed in our data, so this case is not considered in the following.
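The inverse transformation and the censoring at cut-in and nominal wind speed can be sketched in a few lines of Python. The piecewise-linear power curve below (cut-in 3 m/s, nominal wind speed 12 m/s, nominal power 2000 kW) is a hypothetical stand-in, not the manufacturer's curve used in this study, and all function names are illustrative.

```python
from bisect import bisect_right

# Hypothetical power curve as (wind speed in m/s, power in kW) support points.
# Assumption: strictly increasing between cut-in and nominal wind speed.
CURVE = [(3.0, 0.0), (5.0, 200.0), (8.0, 900.0), (10.0, 1600.0), (12.0, 2000.0)]
V_CI, V_NOM = CURVE[0][0], CURVE[-1][0]
P_NOM = CURVE[-1][1]


def power_curve(v):
    """f(v): power (kW) for wind speed v; flat below cut-in and above nominal."""
    if v <= V_CI:
        return 0.0
    if v >= V_NOM:
        return P_NOM
    k = bisect_right([s for s, _ in CURVE], v)
    (v0, p0), (v1, p1) = CURVE[k - 1], CURVE[k]
    return p0 + (p1 - p0) * (v - v0) / (v1 - v0)


def inverse_power_curve(p):
    """f^{-1}(p): wind speed (m/s) for power p, censored at V_CI and V_NOM."""
    if p <= 0.0:
        return V_CI   # zero production: wind speed at or below cut-in
    if p >= P_NOM:
        return V_NOM  # nominal production: wind speed at or above nominal
    k = bisect_right([q for _, q in CURVE], p)
    (v0, p0), (v1, p1) = CURVE[k - 1], CURVE[k]
    return v0 + (v1 - v0) * (p - p0) / (p1 - p0)
```

All zero-production observations map to the cut-in wind speed and all nominal-production observations map to the nominal wind speed, which is why the transformed response has to be treated as censored.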


To obtain probabilistic forecasts of power production, we consider a range of different regression models that lead either to conditional quantiles or full predictive distributions (from which conditional quantiles can easily be extracted). More formally, all models yield predictions of specific quantiles $q_\pi(p_i \mid x_i)$ of power production $p_i$ given a vector of regressor variables $x_i$ (e.g., forecasts of wind speed, etc.). We divide the models into parametric and non-parametric models. All models except some benchmark models are estimated in wind space. That means that quantiles $q_\pi(v_i^* \mid x_i)$ of wind speed given some regressor variables are first estimated. Subsequently, they are transformed to quantiles of transformed power by considering cut-in and nominal wind speed of the turbine and finally transformed into quantiles of power production by employing the power curve of the turbine:

$$ q_\pi(v_i \mid x_i) = \max\left(v_{CI}, \min\left(q_\pi(v_i^* \mid x_i), v_{nom}\right)\right) \tag{2} $$
$$ q_\pi(p_i \mid x_i) = f\left(q_\pi(v_i \mid x_i)\right) \tag{3} $$
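A minimal sketch of this two-step back-transformation follows, assuming that Equation (2) clamps the wind speed quantile to the range between cut-in and nominal wind speed and that Equation (3) applies the power curve. The cubic `toy_curve` and its parameter values are made up for illustration and do not describe the turbine in this study.

```python
def power_quantile(q_wind, v_ci, v_nom, f):
    """Censor a wind speed quantile, then map it to a power quantile."""
    q_censored = max(v_ci, min(q_wind, v_nom))  # censoring step, Equation (2)
    return f(q_censored)                        # power curve step, Equation (3)


def toy_curve(v, v_ci=3.0, v_nom=12.0, p_nom=2000.0):
    """Hypothetical cubic power curve (kW), valid between v_ci and v_nom."""
    return p_nom * (v**3 - v_ci**3) / (v_nom**3 - v_ci**3)


low = power_quantile(1.7, 3.0, 12.0, toy_curve)    # quantile below cut-in
mid = power_quantile(8.0, 3.0, 12.0, toy_curve)    # quantile inside the range
high = power_quantile(14.2, 3.0, 12.0, toy_curve)  # quantile above nominal
```

Because the power curve is monotone between cut-in and nominal wind speed, quantile levels are preserved by the transformation, so the result can be read directly as a quantile of power production.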

3.1 Parametric models

For parametric models, it is assumed that the response follows a specific distribution, and here, the normal (or Gaussian) distribution is used. If such an assumption is appropriate, these models are easy to estimate, and with every forecast, a full predictive distribution is given. Arbitrary quantiles are very easy to compute by inverting this distribution. The main disadvantage of parametric models is that it is sometimes difficult to find an appropriate parametric distribution.

3.1.1 Tobit model

The tobit model was first introduced by Tobin [21] and is a widely used linear model for censored data. For this model, it is assumed that the true wind speed $v_i^*$ follows a normal distribution with a mean $\mu_i$ that depends linearly on some input variables $x_i$ and typically a constant standard deviation $\sigma_i = \gamma$:

$$ v_i^* \sim N(\mu_i, \sigma_i^2) \tag{4} $$
$$ \mu_i = x_i^\top \beta \tag{5} $$
$$ \sigma_i = \gamma \tag{6} $$

However, as outlined above, the wind speed obtained by transforming the observed power production ($v_i$) is censored at cut-in and nominal wind speed [Equation (1)]. Thus, the coefficients $\beta$ and $\gamma$ are not estimated with standard least squares regression but with maximum likelihood estimation with the likelihood function

$$ L(\beta, \gamma) = \prod_{i=1}^{n} P(v_i = v_{CI})^{I(v_i = v_{CI})} \; g(v_i)^{I(v_{CI} < v_i < v_{nom})} \; P(v_i = v_{nom})^{I(v_i = v_{nom})} \tag{7} $$

where the indicator function $I(a)$ is 1 if the argument $a$ is true and is 0 if it is not. Furthermore

$$ P(v_i = v_{CI}) = \Phi\left(\frac{v_{CI} - \mu_i}{\sigma_i}\right) \tag{8} $$
$$ g(v_i) = \frac{1}{\sigma_i}\, \phi\left(\frac{v_i - \mu_i}{\sigma_i}\right) \tag{9} $$
$$ P(v_i = v_{nom}) = 1 - \Phi\left(\frac{v_{nom} - \mu_i}{\sigma_i}\right) \tag{10} $$

where $\Phi$ and $\phi$ are the cumulative distribution function and the probability density function of the standard normal distribution, respectively. With this model, conditional quantile forecasts for $v_i^*$ can be computed with

$$ q_\pi(v_i^* \mid x_i) = \mu_i + \sigma_i\, \Phi^{-1}(\pi) \tag{11} $$
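The censored likelihood and the quantile forecast can be sketched with the Python standard library alone. This is an illustrative sketch rather than the estimation code used in this study: it assumes a normal latent wind speed with left censoring at cut-in and right censoring at nominal wind speed, takes the means and a constant standard deviation as given, and leaves the numerical maximization over the coefficients to an optimizer of the reader's choice.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf, pdf and inv_cdf


def tobit_loglik(v, mu, sigma, v_ci, v_nom):
    """Log-likelihood of censored responses v given means mu and std. dev. sigma."""
    total = 0.0
    for v_i, mu_i in zip(v, mu):
        if v_i <= v_ci:      # left-censored: probability of latent speed <= v_ci
            total += log(STD_NORMAL.cdf((v_ci - mu_i) / sigma))
        elif v_i >= v_nom:   # right-censored: probability of latent speed >= v_nom
            total += log(1.0 - STD_NORMAL.cdf((v_nom - mu_i) / sigma))
        else:                # uncensored: normal density
            total += log(STD_NORMAL.pdf((v_i - mu_i) / sigma) / sigma)
    return total


def tobit_quantile(pi, mu_i, sigma):
    """pi-quantile of the latent (uncensored) wind speed."""
    return mu_i + sigma * STD_NORMAL.inv_cdf(pi)
```

For example, `tobit_quantile(0.5, 7.0, 2.0)` returns the predicted median of the latent wind speed, which equals the mean of 7.0 for a normal distribution.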

3.1.2 Heteroskedastic tobit model

The standard tobit model assumes a constant residual variance $\sigma_i$ over all $v_i^*$. This assumption can be relaxed with an additional regression equation for the standard deviation $\sigma_i$. Thus, Equation (6) is generalized to

$$ \log(\sigma_i) = z_i^\top \gamma \tag{12} $$

where $z_i$ is an additional vector of input variables, not necessarily equal to $x_i$. The log link is used to assure positive variances. All remaining equations, Equations (4)–(11), can still be applied as before.
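The effect of the log link can be illustrated with a short Python sketch. It assumes that $z_i$ contains a constant and the ensemble standard deviation; the coefficient values are made up for illustration and are not estimates from this study.

```python
from math import exp


def predictive_sd(z, gamma):
    """sigma_i = exp(z_i' gamma); the exponential keeps sigma_i positive."""
    return exp(sum(z_k * g_k for z_k, g_k in zip(z, gamma)))


# z_i = (1, ensemble standard deviation in m/s); gamma is hypothetical.
gamma = (0.2, 0.6)
sd_small_spread = predictive_sd((1.0, 0.5), gamma)
sd_large_spread = predictive_sd((1.0, 2.0), gamma)
```

With a positive coefficient on the ensemble standard deviation, a larger ensemble spread yields a wider predictive distribution, and the predicted standard deviation stays positive for any coefficient values.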

The heteroskedastic version of the tobit model is used less frequently in the literature. However, e.g., Thorarinsdottir and Gneiting [22] proposed a closely related model with the main difference being that the parameters are estimated by minimizing the continuous ranked probability score [23] instead of maximizing the likelihood function. Their method is a modified version of Gneiting et al. [24] considering the truncation of wind speed at zero. The method of Gneiting et al. [24] has proven to perform very well for temperature and precipitation forecasts. [25]

3.2 Non-parametric models

Non-parametric models are more flexible than parametric ones since no distribution of the response has to be assumed. Therefore, they are preferable when no good approximation of the response distribution is known. The price for this flexibility is that only specific quantiles can be estimated and that the model has to be fitted separately for each quantile. If more than one quantile is required, this means that more parameters have to be estimated.

3.2.1 Quantile regression

Similar to the mean in least squares regression, specific quantiles can be estimated with quantile regression. Instead of the quadratic loss function in least squares regression, Koenker and Bassett [26] proposed to weight residuals above or below the quantile differently, namely,

$$ \rho_\pi(u) = u\,\left(\pi - I(u < 0)\right) \tag{13} $$

The $\pi$-quantile can be estimated by

$$ q_\pi(v_i^* \mid x_i) = x_i^\top \beta_\pi \tag{14} $$

with parameters $\beta_\pi$ minimizing

$$ \sum_{i=1}^{n} \rho_\pi\left(v_i - q_\pi(v_i^* \mid x_i)\right) \tag{15} $$
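The asymmetric loss and the resulting objective can be written compactly in Python. The sketch below assumes the usual check-function form of the loss and a linear model for the quantile; the function names are illustrative, and the minimization itself would normally be delegated to a linear programming or quantile regression routine.

```python
def check_loss(u, pi):
    """Check function: residual u weighted by pi if u >= 0, else by pi - 1."""
    return u * (pi - (1.0 if u < 0 else 0.0))


def quantile_objective(beta, x_rows, y, pi):
    """Sum of check losses for the linear quantile model q_i = x_i' beta."""
    total = 0.0
    for x_i, y_i in zip(x_rows, y):
        q_i = sum(b * x for b, x in zip(beta, x_i))
        total += check_loss(y_i - q_i, pi)
    return total


# Intercept-only example: the objective for pi = 0.5 is smallest at the median.
y = [1.0, 2.0, 3.0, 4.0, 10.0]
x_rows = [[1.0]] * len(y)
at_median = quantile_objective([3.0], x_rows, y, 0.5)
below_median = quantile_objective([2.5], x_rows, y, 0.5)
above_median = quantile_objective([4.0], x_rows, y, 0.5)
```

In this small example, the objective is lower at the sample median (3.0) than at the neighboring candidate values, regardless of the outlying observation at 10.0.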

Although the censoring of the transformed power production vi is not considered explicitly in this model, we employ it for comparison to assess the importance of censoring in the regression. Additionally, we use several benchmark models based on quantile regression for observed power production pi directly as these are used frequently in the wind energy literature. [4, 7-9] Refer to Section 3.3 for details on the different models.

3.2.2 Censored quantile regression

As for the parametric models, it is also possible to consider censoring with quantile regression. As suggested by Powell, [27] Equations (13) and (14) still apply, and in Equation (15), $q_\pi(v_i^* \mid x_i)$ is replaced by $q_\pi(v_i \mid x_i)$ from Equation (2). Note that further approaches to estimate censored quantile regression exist [28-30] besides the approach of Powell. [27]
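A sketch of the censored objective is given below, under the assumption that the only change relative to ordinary quantile regression is that the linear predictor is clamped to the range between cut-in and nominal wind speed before the check loss is evaluated. Function and argument names are illustrative.

```python
def censored_quantile_objective(beta, x_rows, v, pi, v_ci, v_nom):
    """Check loss with the predicted quantile censored at v_ci and v_nom."""
    total = 0.0
    for x_i, v_i in zip(x_rows, v):
        q_latent = sum(b * x for b, x in zip(beta, x_i))  # linear predictor
        q_censored = max(v_ci, min(q_latent, v_nom))      # censoring step
        u = v_i - q_censored
        total += u * (pi - (1.0 if u < 0 else 0.0))
    return total


# A zero-production observation (v_i = v_ci) with a predicted quantile below cut-in.
loss = censored_quantile_objective([1.0], [[1.0]], [3.0], 0.5, 3.0, 12.0)
```

In this example the loss is zero because the censored quantile coincides with the censored observation, whereas the uncensored objective would penalize the same forecast. The clamping makes the objective non-convex in the coefficients, so dedicated algorithms rather than plain linear programming are typically needed to minimize it.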

3.3 Choice of regressors

In wind space (Figure 3), a simple linear model that uses NWP wind speed forecasts (vNWP,i) as the sole regressor is certainly justifiable. However, despite the inverse-transformed response, slight non-linearities appear to remain at the lower and upper ends. These are much weaker than the non-linearities in the untransformed power-by-wind space (Figure 2) and can be captured very well by a low-order polynomial. Therefore, we consider a number of models that employ not only the linear term vNWP,i but additionally the corresponding squared and cubic terms, i.e., a polynomial of order three. In addition to these regressors for the mean/quantiles of the predicted wind distribution, the heteroskedastic model also allows for regressors for the standard deviation of the wind distribution. A natural candidate is the ensemble standard deviation of the 10 m wind speed (σ(vEPS,i)).

Combining these ideas, we consider a number of models listed in Table 1. The tobit model with NWP wind speed forecasts as the single regressor variable is the simplest model and already produces a reasonable fit of the data (see tobit1 in Figure 5). Adding the second and third powers to the regressor improves the fit somewhat (tobit3). Neither polynomials with higher powers nor the inclusion of further NWP variables (e.g., air density, 10 m wind speed ensemble mean, sine and cosine of wind direction) as regressors leads to further significant improvements for the data considered. Hence, we confine ourselves to linear functions and order three polynomials in vNWP,i for all models in wind space. Only in power space or power-by-wind space may stronger non-linearities have to be accounted for by using spline basis functions for (transformed) NWP wind speed. More specifically, we assess three benchmark quantile regression models (rq3p, srq3p, and srq4wp) for $p_i$ (i.e., replacing $v_i$ and $v_i^*$ with $p_i$ in Equations (14) and (15)). As regressor variables, they use either three polynomial basis functions of transformed NWP wind speed forecasts (rq3p), spline basis functions (for details, refer to the work of Nielsen et al. [8]) of transformed wind speed forecasts with three degrees of freedom (srq3p) or spline basis functions of wind speed forecasts with four degrees of freedom (srq4wp). The benchmark models srq3p and srq4wp were chosen because they are similar to the models proposed in [8] and [4, 7], respectively. The srq4wp model is not a local quantile regression model but is similar in that it is a non-linear quantile regression model in the ‘wind-to-power’ space. Model rq3p was chosen to investigate differences between spline and polynomial basis functions.

Table 1. List of models considered.

Model | Description | Response | Regressors
tobit1 | Tobit model | $v_i^*$ | $x_i = v_{NWP,i}$
tobit3 | Tobit model | $v_i^*$ | $x_i = (v_{NWP,i}, v_{NWP,i}^2, v_{NWP,i}^3)$
htobit1 | Heteroskedastic tobit model | $v_i^*$ | $x_i = v_{NWP,i}$, $z_i = \sigma(v_{EPS,i})$
htobit3 | Heteroskedastic tobit model | $v_i^*$ | $x_i = (v_{NWP,i}, v_{NWP,i}^2, v_{NWP,i}^3)$, $z_i = \sigma(v_{EPS,i})$
rq3 | Quantile regression | $v_i^*$ | $x_i = (v_{NWP,i}, v_{NWP,i}^2, v_{NWP,i}^3)$
crq1 | Censored quantile regression | $v_i^*$ | $x_i = v_{NWP,i}$
crq3 | Censored quantile regression | $v_i^*$ | $x_i = (v_{NWP,i}, v_{NWP,i}^2, v_{NWP,i}^3)$
rq3p | Quantile regression in power space | $p_i$ | $x_i = (f(v_{NWP,i}), f(v_{NWP,i})^2, f(v_{NWP,i})^3)$
srq3p | Quantile regression in power space | $p_i$ | $x_i$ = 3 spline basis functions of $f(v_{NWP,i})$
srq4wp | Quantile regression in power space | $p_i$ | $x_i$ = 4 spline basis functions of $v_{NWP,i}$

Note: The first seven models are all estimated in wind space, and all except rq3 incorporate censoring information. The remaining models are either estimated entirely in power space (rq3p, srq3p) or in power-by-wind space (srq4wp).

Figure 5.

Different model fits (median) for the full data set at lead time 24 h plotted in wind space (left) and power-by-wind space (right).


In this section, several measures are described to compare the performance of the different models. First, a score is introduced in Section 4.1 to measure the value of a forecast in a simplified energy market. Such a single value score is very convenient in comparing the performance of different forecast methods but unfortunately cannot fully characterize the performance of a forecast. [23] Therefore, two important properties of quantile forecasts, reliability and sharpness, are discussed in the following subsections. Reliability is the crucial property of a good forecast that the forecast probabilities match the observed relative frequencies. A test to check whether this property is fulfilled is presented in Section 4.2. For two reliable forecasts, the one with the narrower predictive distribution is preferable. This property is termed sharpness for which measures are defined in Section 4.3.

To evaluate these measures and their variances in an empirical setting, we employ a bootstrapping [31] approach as suggested by Hothorn et al. [32]:

  1. Sample n times with replacement from the entire data set (bootstrap sample).

  2. Fit the models on this bootstrap sample.

  3. Compute performance measures on the ‘out-of-bootstrap’ data, i.e., the observations not contained in the bootstrap sample (approximately 36.8% of the data).

  4. Repeat steps 1–3 k times.

With this approach, we obtain k = 250 values for each verification measure, which can be interpreted as a sample from the associated distribution.
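The resampling scheme can be sketched as follows. The sketch only generates the index sets for fitting and evaluation; model fitting and scoring are left to the caller. The function name and the fixed seed are illustrative choices, not part of the original procedure.

```python
import random


def out_of_bootstrap_splits(n, k, seed=0):
    """Yield k (bootstrap, out-of-bootstrap) index pairs for n observations."""
    rng = random.Random(seed)
    for _ in range(k):
        train = [rng.randrange(n) for _ in range(n)]     # sample n times with replacement
        in_bag = set(train)
        test = [i for i in range(n) if i not in in_bag]  # observations never drawn
        yield train, test


splits = list(out_of_bootstrap_splits(n=1340, k=250))
oob_share = sum(len(test) for _, test in splits) / (250 * 1340)
```

The average out-of-bootstrap share is close to $(1 - 1/n)^n \approx e^{-1} \approx 0.368$, which is where the figure of approximately 36.8% comes from.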

4.1 A simple market model score

Since one important application for wind power forecasts is energy trading, the value of a forecast in an energy market can serve as a direct indicator of forecast performance. Instead of a real energy market, we use a simplified market model [4, 33]: First, the provider has to bid an amount $\hat{p}_i$ of energy. The actual production, however, is $p_i$. The provider always receives a fee $c$ for the energy $p_i$ that is eventually produced. If less than the bid $\hat{p}_i$ is produced, a penalty $c^-$ has to be paid for each missing energy unit. If too much is produced, each unit of surplus energy is penalized with $c^+$. Thus, this simple market can be described by the expected income or revenue

$$ E[R_i] = E\left[ c\,p_i - c^-\max(\hat{p}_i - p_i, 0) - c^+\max(p_i - \hat{p}_i, 0) \right] \tag{16} $$

In the work by Bremnes, [4] it is shown that the expected income is maximized when $\hat{p}_i = q_\pi(p_i \mid x_i)$, with $\pi = c^+ / (c^+ + c^-)$. When dividing Equation (16) by $(c^+ + c^-)$, replacing $\hat{p}_i$ by $q_\pi(p_i \mid x_i)$, and using $\pi = c^+ / (c^+ + c^-)$, it can be seen that for a specific price combination $c$, $c^-$ and $c^+$, the best forecast is the one that minimizes

$$ S_{i,\pi} = \left(p_i - q_\pi(p_i \mid x_i)\right)\left(\pi - I\left(p_i < q_\pi(p_i \mid x_i)\right)\right) \tag{17} $$

Note that this equation is equivalent to the loss function used for quantile regression [Equation (13)], which is sometimes also referred to as quantile score. [34]

A simple performance measure for wind power forecasts would be to compute the income of a specific forecast for a test data set (e.g., as in the work of Bremnes [4]). However, to do so, specific market prices have to be assumed. Because prices can vary over different markets and days, we avoid assuming specific market prices by taking the sum of $S_{i,\pi}$ for a range of possible price combinations:

$$ S_i = \sum_{\pi \in \{0.1, 0.2, \ldots, 0.9\}} S_{i,\pi} \tag{18} $$

Here, small values of $S_i$ denote good performance. The mean value of $S_i$ over the test dataset is denoted as $\bar{S}$. Note that this score also fits into the framework of Pinson et al. [2] and Gneiting and Raftery [34] for a unique skill score.
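The link between market prices and quantile levels, and the resulting score, can be sketched in Python. The sketch assumes that the score sums the quantile loss over the nine decile levels 0.1 to 0.9; the prices and forecast values are hypothetical.

```python
def quantile_level(c_minus, c_plus):
    """Quantile level pi = c+ / (c+ + c-) of the income-maximizing bid."""
    return c_plus / (c_plus + c_minus)


def quantile_score(p_obs, q_forecast, pi):
    """Check loss of the observed power relative to the forecast quantile."""
    u = p_obs - q_forecast
    return u * (pi - (1.0 if u < 0 else 0.0))


def market_score(p_obs, decile_forecasts):
    """Sum of quantile scores over pi = 0.1, ..., 0.9 (assumed set of levels)."""
    levels = [k / 10 for k in range(1, 10)]
    return sum(quantile_score(p_obs, q, pi)
               for q, pi in zip(decile_forecasts, levels))


pi = quantile_level(c_minus=1.0, c_plus=3.0)  # surplus penalized three times as much
calm_score = market_score(0.0, [0.0] * 9)     # zero production, all deciles at zero
```

With these hypothetical prices, the income-maximizing bid is the 0.75 quantile. The second call shows why cases with zero production and all forecast quantiles at zero contribute a score of zero.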

4.2 Reliability

Reliability is the property of the forecast probabilities to be in accordance with the observed relative frequencies. For example, 75% of the observations should be on average below the 0.75 quantile. The set of quantile forecasts $q_{0.1}(v_i^* \mid x_i), \ldots, q_{0.9}(v_i^* \mid x_i)$ forms 10 intervals with a nominal probability of 1 ∕ 10 for an observation $v_i$ to fall into one of these intervals. To test the reliability, the relative frequencies of observations falling into specific intervals can be compared with their nominal probability by a Pearson's χ2-test as proposed by Bremnes. [7]

A problem occurs for the censored regression models when the observation falls on one of the censoring points (zero or nominal power). If one or more quantiles are below cut-in or above nominal wind speed, it is not clear in which interval the observation falls. Thus, in the χ2-test, such censored observations are split up proportionally into the intervals from which they may stem. To illustrate this split-up strategy, consider the following example (Figure 6): If the uncensored wind quantiles are $q_{0.1}(v_i^* \mid x_i) = 2.5$ and $q_{0.2}(v_i^* \mid x_i) = 4.5$, and the observation is censored at cut-in wind speed $v_{CI} = 3$ (i.e., $f^{-1}(p_i = 0)$), then it may either come from the first decile (10%) or the first quarter of the second decile (2.5% = (3 − 2.5) ∕ (4.5 − 2.5) × 10%). Thus, the first decile receives weight 0.8 = 0.1 ∕ (0.1 + 0.025) and the second decile receives 0.2 = 0.025 ∕ (0.1 + 0.025) for this event.

Figure 6.

Schematic of how censored observations are split up in a χ2-test.
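The split-up example can be reproduced with a short Python sketch. It assumes linear interpolation of probability mass within an interval, as in the example, and handles only observations censored at cut-in wind speed; the unbounded interval above the highest quantile is ignored for simplicity. The function name and the decile values beyond the first two are hypothetical.

```python
def left_censored_weights(quantiles, v_ci, step=0.1):
    """Weights of the intervals a cut-in-censored observation may stem from.

    quantiles: increasing uncensored wind quantiles for levels step, 2*step, ...
    Returns len(quantiles) + 1 weights that sum to 1.
    """
    mass = [0.0] * (len(quantiles) + 1)
    mass[0] = step  # the lowest interval always receives its full nominal mass
    for k in range(1, len(quantiles)):
        lo, hi = quantiles[k - 1], quantiles[k]
        if hi <= v_ci:    # interval lies entirely below cut-in
            mass[k] = step
        elif lo < v_ci:   # interval straddles cut-in: take the share below it
            mass[k] = step * (v_ci - lo) / (hi - lo)
    total = sum(mass)
    return [m / total for m in mass]


deciles = [2.5, 4.5, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0]
weights = left_censored_weights(deciles, v_ci=3.0)
```

For these deciles, the first interval receives weight 0.8 and the second 0.2 (up to floating-point rounding), matching the worked example.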

Note that this analysis is done in wind space [before applying Equations (2) and (3)]. Exceptions are the models that are estimated in power space for which the analysis is also done in power space. Since censoring is not considered in these models, the split-up of observations is not necessary. Like in the study of Bremnes, [7] we declare forecasts to be unreliable if the p-value of the χ2-test is below 0.05.

For the quantile regression models, quantile crossing may occur, which makes this reliability analysis difficult. As a simple solution, we therefore sort the quantiles before testing. For example, if the 0.2 quantile is higher than the 0.3 quantile, they are interchanged.

4.3 Sharpness

Sharpness is a further property that can be used to characterize forecast performance. Here, we follow the definition of Pinson et al. [2]: define a central prediction interval as

$$ I_\alpha(x_i) = \left[\, q_{(1-\alpha)/2}(p_i \mid x_i),\; q_{(1+\alpha)/2}(p_i \mid x_i) \,\right] \tag{19} $$

The probability of the observation to fall within this interval is $\alpha$. Given a reliable forecast, it is preferable that this prediction interval is as narrow as possible, which is related to a small forecast uncertainty. This property is measured by the mean width of the prediction interval over the dataset, $\bar{\delta}_\alpha = \frac{1}{n}\sum_{i=1}^{n}\left( q_{(1+\alpha)/2}(p_i \mid x_i) - q_{(1-\alpha)/2}(p_i \mid x_i) \right)$, which is hereafter denoted as sharpness.
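A minimal Python sketch of this measure follows, assuming that the central interval with coverage $\alpha$ is bounded by the quantiles at levels $(1-\alpha)/2$ and $(1+\alpha)/2$. The forecast values in the example are invented.

```python
def central_levels(alpha):
    """Quantile levels that bound the central interval with coverage alpha."""
    return (1 - alpha) / 2, (1 + alpha) / 2


def sharpness(lower, upper):
    """Mean width of prediction intervals from paired lower/upper quantiles."""
    widths = [u - l for l, u in zip(lower, upper)]
    return sum(widths) / len(widths)


lo_level, hi_level = central_levels(0.8)  # approximately (0.1, 0.9)
mean_width = sharpness(lower=[100.0, 200.0], upper=[500.0, 400.0])  # kW
```

For the two prediction intervals considered in this study, α = 0.4 corresponds to the 0.3 and 0.7 quantiles and α = 0.8 to the 0.1 and 0.9 quantiles.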


In this section, the verification measures introduced in the previous section are used to compare the performance of the different models. Since reliability is the crucial property for a good probabilistic forecast, it is assessed first. Table 2 shows the medians of the 250 reliability p-values from bootstrapping of all tested models and lead times. First, it can be seen that all tobit models have worse p-values for lead times 12 and 36 h than for 24 and 48 h. While the heteroskedastic tobit model is still reliable for these lead times, the p-value of the standard tobit model drops below the 0.05 level. A probable reason for this difference can be found in Figure 7, which shows the relative frequencies of observations falling into the intervals formed by the predicted deciles from the heteroskedastic tobit model. For both the 12 and 24 h lead times, the observations fall slightly too often into intervals in the center and too rarely into intervals in the margins (for 36 and 48 h, figures look very similar to 12 and 24 h, respectively, and are therefore not shown). This suggests that, in fact, the response follows a distribution with somewhat heavier tails than the normal distribution. Although this problem is apparent for both lead times, it is less pronounced for lead time 24 h, which results in higher reliability p-values than for 12 h.

Table 2. Median p-values (from 250 bootstrap samples) for different lead times (h) from the reliability test for models listed in Table 1.
Figure 7.

Relative frequencies of observations $v_i$ falling into the intervals formed by the predicted deciles $q_{0.1}(v_i^* \mid x_i), \ldots, q_{0.9}(v_i^* \mid x_i)$ for the heteroskedastic tobit model (htobit3) for lead time 12 (left) and 24 h (right).

When regarding the non-parametric models in Table 2, it can be seen that censored quantile regression is reliable for all lead times. A comparison with the uncensored quantile regression shows that not considering the censoring clearly deteriorates the reliability. Finally, it can be seen that all models in the power space seem to have problems with reliability, whereas the model in the untransformed space (srq4wp) is reliable throughout all lead times.

Similar features as in Table 2 are shown in Figure 8 where a more detailed picture of reliability at lead time 24 h is plotted. As in Table 2, it can be seen that all censored models in wind space (i.e., using the inverse power curve transformation) and the model in the untransformed space are rather reliable, while the uncensored quantile regression in wind space (rq3) and the models in power space are not.

Figure 8.

Reliability p-values of different models (Table 1) for lead time 24 h. A horizontal line is plotted for 0.05. The boxes indicate the interquartile ranges of the 250 values from the bootstrapping approach, the whiskers show the most extreme values that are less than 1.5 times the length of the box away from the box and points are plotted for values that are outside the whiskers.

In Figure 9, the market score for different lead times is plotted. Not surprisingly, the market score increases with lead time. Apparently, the models predict more poorly for 24 and 48 h (nighttime) than for 12 and 36 h. This can be attributed to the fact that in our data set, more events with zero production can be found for day time than for night time. For these events, mostly a large number of the regarded quantiles are also 0 and therefore Si small [Equation (18)]. When comparing the models among each other, the differences are small and seem mostly not significant when compared with the uncertainty. To determine if differences are significant, the 250 values from bootstrapping need to be considered pairwise. This can be done, e.g., by using skill scores.

$$ SS = 1 - \frac{\bar{S}}{\bar{S}_{ref}} \tag{20} $$

where $\bar{S}_{ref}$ is the market score of a reference model, which in our case is the quantile regression model with spline basis functions in power space (srq3p). Figure 10 shows this market skill score for the different models. The heteroskedastic tobit model (htobit3) performs clearly better than the reference model, and the censored quantile regression (crq3) is still somewhat better. The remaining models are neither clearly better nor clearly worse than the reference except for the quantile regression model with spline basis functions of wind speed forecasts in the power space (srq4wp), which performs worst. Note that the better performance of the heteroskedastic tobit model (htobit3) stems from additional predictive information in the form of the ensemble standard deviation of the 10 m wind speed ensemble forecast.
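The pairwise comparison can be sketched as follows, assuming the common skill score form of one minus the ratio of the model score to the reference score, computed separately for each bootstrap replication. The score values are invented for illustration.

```python
def skill_scores(scores, ref_scores):
    """Pairwise skill score 1 - S / S_ref for matching bootstrap replications."""
    return [1.0 - s / r for s, r in zip(scores, ref_scores)]


# Hypothetical market scores of a model and the reference on three replications.
model = [90.0, 100.0, 95.0]
reference = [100.0, 100.0, 100.0]
ss = skill_scores(model, reference)
share_better = sum(1 for value in ss if value > 0) / len(ss)
```

Pairing the scores by replication removes the variation that both models share on a given out-of-bootstrap sample, which is why differences can be clear in the skill score even when the raw score distributions overlap.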

Figure 9.

Market score ($\bar{S}$; smaller is better) for different models (Table 1) and lead times.

Figure 10.

Market skill score relative to the reference model srq3p (larger is better) for different models (Table 1) and all lead times. Market skill scores greater than 0 indicate better performance than the reference model.

The sharpness for two different prediction intervals and lead time 24 h is shown in Figure 11. One feature of this figure is that the non-parametric models have clearly better sharpness, especially for the small 0.4 prediction interval. This can again be attributed to the fact that the assumption of a normal distribution in the parametric models does not apply perfectly (Figure 7).

Figure 11.

Sharpness (smaller is better) of interval forecasts with interval probabilities α = 0.4 (left) and α = 0.8 (right) for different models (Table 1) and lead time 24 h.

Finally, we show in Figure 12 the market scores for different training sample sizes. For computation, we used the same bootstrapping approach as described in Section 4 but with smaller samples in step 1. Clearly, the performance increases with a larger training sample. Fewer parameters have to be estimated for the parametric models. Therefore, it is not surprising that they perform better than the non-parametric models if only few data are available for fitting. The spline model with the completely untransformed data (srq4wp) has the most degrees of freedom and is therefore the worst of all models for small training sample sizes. While the simplest tobit model (tobit1) seems to be the best for very small training sample sizes, the heteroskedastic tobit model is already best for training sample sizes ≥ 100. However, note that as for the full dataset, the differences are mostly relatively small compared with the uncertainty.

Figure 12.

Market score (math formula; smaller is better) for different training sample sizes, selected models and lead time 24 h.


A combination of new approaches for improving probabilistic wind power forecasts is proposed: (i) Exploit the readily available information from the power curve of the turbine to transform observed power production to wind speed (inverse power curve transformation). (ii) Respect the limited range of power production between zero and nominal power (in power space) or between cut-in and nominal wind speed (in wind space) with censored regression models. The resulting combined strategy has the advantage that almost all non-linearity and heteroskedasticity of the observations is directly captured. Consequently, relatively simple linear regression models with normally distributed responses can be used.
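The two ingredients, the inverse power curve transformation and the censoring limits, can be illustrated with a short Python sketch. The power curve values, the piecewise-linear interpolation and the function name `power_to_wind` are hypothetical choices made for illustration; they are not the manufacturer's curve or the code used in the study.

```python
from bisect import bisect_left

# Hypothetical power curve as (wind speed in m/s, power in kW) pairs, strictly
# increasing from cut-in speed to nominal speed. Illustrative values only.
POWER_CURVE = [(3.0, 0.0), (5.0, 150.0), (7.0, 500.0),
               (9.0, 1100.0), (11.0, 1700.0), (13.0, 2000.0)]


def power_to_wind(power):
    """Map observed power to wind speed via the inverse power curve.

    Returns (wind_speed, censoring), where censoring is "left" at zero
    power (cut-in speed), "right" at nominal power (nominal speed) and
    "none" in between.
    """
    speeds = [s for s, _ in POWER_CURVE]
    powers = [p for _, p in POWER_CURVE]
    if power <= powers[0]:
        return speeds[0], "left"
    if power >= powers[-1]:
        return speeds[-1], "right"
    i = bisect_left(powers, power)  # first index with powers[i] >= power
    p0, p1 = powers[i - 1], powers[i]
    s0, s1 = speeds[i - 1], speeds[i]
    return s0 + (power - p0) * (s1 - s0) / (p1 - p0), "none"
```

Observations flagged "left" or "right" carry only the information that the underlying wind speed lies at or beyond the respective limit, which is what a censored regression model accounts for.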

To assess this new strategy, a wide range of combinations of parametric and non-parametric regression models, with and without inverse power curve transformation and with and without censoring, is considered for data from a wind turbine in Austria. For all models, wind speed forecasts and their transformations are used as regressor variables; furthermore, some heteroskedastic models additionally use the standard deviation of the ECMWF ensemble forecasts. It is shown that the censored regression models obtained in wind space with the inverse-transformed power production are more reliable than uncensored regression models in all spaces considered (i.e., in wind space, power space and power-by-wind space). As for the comparison of parametric versus non-parametric censored models in wind space, it can be shown that the more parsimonious parametric models already perform well for relatively small training samples, while the non-parametric models perform somewhat better for large training samples. However, the performance of the parametric models may potentially be improved in future work by using a response distribution with heavier tails (e.g., logistic or Student-t instead of Gaussian) so that the sharpness can be enhanced.

We have not applied the inverse power curve transformation in combination with non-parametric regression models other than quantile regression. Nevertheless, the inverse power curve transformation could be applied in combination with any other approach for probabilistic wind power forecasting (e.g., ensemble post-processing [15-17] or kernel density estimators [7, 10-12]). However, accounting for the censoring might be more difficult with these approaches.

In addition to data from the wind turbine presented in this manuscript, data from another wind turbine were assessed but are not presented, as they led to very similar outcomes. Hence, similar results can be expected for other turbines/regions, but, of course, this still has to be tested in future work. One special feature of the tested turbines is that the wind speeds are relatively low, and thus, right-censoring (at nominal speed/power) does not play an important role, although it is supported by our models. Furthermore, switching off the turbine because of excessively high wind speed never happened in our data and is therefore not considered in our models. For turbines where this plays a role, it would generally be possible to consider an additional right-censoring. Instead of the manufacturer's power curve, an empirical power curve computed from observation data [35] could be used as well. This is particularly important if forecasts are required for entire wind parks that consist of different types of turbines. Our results are based on a global NWP model rather than a limited area model, as employed by most other studies. However, given the results of Louka et al. [36] and Müller, [37] we do not expect the findings to change much when based on a different NWP model.

6.1 Computational details

Our results were obtained on Ubuntu (Canonical Group Limited, London, United Kingdom) and Debian GNU/Linux (Debian Project) using R 2.15.1 [38] with the package quantreg 4.79 [39] for (censored) quantile regression and with numerical optimization of the likelihood for the (heteroskedastic) tobit models via optim() with method = "BFGS". A proper package for the latter is under development; in the meantime, the code is available upon request.
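To make the likelihood being optimized more concrete, the following is a minimal Python sketch (not the R code used in the study) of a negative log-likelihood for a doubly censored, heteroskedastic normal regression. The parameterization, with a linear mean in one regressor `x` and a log-linear scale in one regressor `z`, as well as all names, are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf
TINY = 1e-300              # floor that keeps log() finite in extreme tails


def htobit_negloglik(params, y, x, z, lower, upper):
    """Negative log-likelihood of a censored heteroskedastic normal model.

    Assumed (hypothetical) parameterization:
      mean       mu    = beta0 + beta1 * x
      log scale  log s = gamma0 + gamma1 * z
    Responses at or beyond `lower`/`upper` are treated as censored.
    """
    beta0, beta1, gamma0, gamma1 = params
    total = 0.0
    for yi, xi, zi in zip(y, x, z):
        mu = beta0 + beta1 * xi
        sigma = math.exp(gamma0 + gamma1 * zi)
        if yi <= lower:    # left-censored: probability mass below the limit
            prob = STD_NORMAL.cdf((lower - mu) / sigma)
            total += math.log(max(prob, TINY))
        elif yi >= upper:  # right-censored: probability mass above the limit
            prob = 1.0 - STD_NORMAL.cdf((upper - mu) / sigma)
            total += math.log(max(prob, TINY))
        else:              # uncensored: ordinary normal density
            dens = STD_NORMAL.pdf((yi - mu) / sigma) / sigma
            total += math.log(max(dens, TINY))
    return -total
```

A function of this kind could be passed to a general-purpose quasi-Newton optimizer, analogous to optim() with BFGS in R; setting `gamma1` to zero gives a homoskedastic variant with constant scale.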


This study was supported by the Austrian Science Fund (FWF): L615-N10. The first author was also supported by a PhD scholarship from the University of Innsbruck, Vizerektorat für Forschung. We are also very grateful to WEB Windenergie AG for providing the wind turbine data. Data from the ECMWF forecasting system were obtained from the ECMWF Data Server. Finally, we thank four anonymous reviewers for their comments and suggestions.