Weibull wind speed distribution: Numerical considerations and use with sodar data



[1] Two analyses have been performed of the use of the Weibull distribution to describe wind speed statistics. The first is a combination of theoretical considerations in a common domain of c and k parameters concerning some robust indicators of position, spread, skewness, and kurtosis. The second is a calculation of the Weibull parameters using three different methods based on a 3-a sodar database. The modified maximum-likelihood method is direct, the method of probability weighted moments considers order statistics, and the method based on the minimum RMSE is iterative. As a result of the theoretical analyses, we propose some simple relationships involving Weibull parameters and the range of a fraction of central data, the variation coefficient, and the Yule-Kendall index, which may be applied practically. The calculation of Weibull parameters has revealed the sharp contrast between day, when the fit was highly satisfactory, and night, mainly below 300 m. Moreover, a seasonal pattern was also observed. The comparison between the methods used also proved satisfactory, particularly during the day, whereas a slight disagreement was observed during the night for the method based on the minimum RMSE. Finally, randomly generated samples were used to check the accuracy of the Weibull parameters in the domain analyzed, resulting in small residuals and standard deviations of the values calculated.

1. Introduction

[2] The rapid expansion of wind power in recent years has led to the need for suitable descriptions of wind speed. Various expressions have been used [García et al., 1998], amongst which the Weibull distribution has gained widespread acceptance [Sahin, 2004]. This function has two adjustable parameters whose values determine its appearance; hence analyses of this function and the methods of parameter calculation always prove to be of interest.

[3] Pérez et al. [2004] used 1 month of observations to analyze wind behavior in the low atmosphere and to calculate, among other variables, Weibull parameters by four methods, which may be considered typical from the analysis of references cited in that paper. One restriction of this study is the size of the database. Other methods have also recently been used to calculate Weibull parameters, evidencing the fact that this research line is active, since new methods are being proposed which need to be tested. This paper aims to pursue such relatively unexplored lines of investigation using a database large enough to observe daily and seasonal parameter evolution.

[4] The computational procedure to obtain the Weibull parameters may be key when a wide database is involved and the calculation complexity can be used to establish a simple classification based on the following three classes: direct methods, methods based on order statistics and iterative procedures. Some of the methods are direct since their values are provided by simple expressions involving little handling of data. This is the case with the method of moments, based on the mean and the standard deviation of data [Justus, 1978]. Other methods are based on order statistics, and slow down calculation when extremely long data series are concerned, although the expression used for the parameters may be simple. An easy example of this second kind is the method based on median and quartile wind speeds [Justus et al., 1978]. Finally, iterative procedures imply more complex calculation, which may also prove slower with wide databases, such as the maximum-likelihood method, which is sometimes considered [Seguro and Lambert, 2000]. In any case, the wide diversity of methods currently used evidences the fact that there is no single universally accepted procedure to calculate the Weibull parameters and that the choice of whatever method should be conditioned by computational requirements.

[5] In this paper two approaches have been adopted. The first is a theoretical analysis of the Weibull distribution to obtain and condense information into useful expressions. This approach is not common, as most papers simply present the Weibull distribution without any further consideration. However, we have followed a numerical calculation in order to gain an insight into the behavior of this function. Moreover, the domain of the Weibull parameters is another key point, since considerations concerning this function with parameters far from their usual values hold no interest in terms of practical application.

[6] The second approach compares the values of the Weibull parameters using three methods corresponding to the classification considered above. Two of the procedures have rarely been used, justifying their study in an attempt to gain a better understanding of them. The third method is proposed as an alternative which may prove as useful as the more well-known and applied procedures.

[7] The database considered is obtained from a sodar. Extensive experience related to this technique is available at present [Coulter and Kallistratova, 2004] and offers two main advantages. First, no hypothesis or upscaling of data in height is required as is usually the case in wind analysis [Altaii and Farrugia, 2003; Archer and Jacobson, 2003], since data are provided by the device. Second, it provides information on an atmospheric region which is close to the ground, although as yet not well researched, since most analyses are restricted to only a few tens of meters in height, corresponding to the surface layer.

[8] Finally, random samples have been generated to calculate the accuracy of the Weibull parameters and to investigate a possible trend in this accuracy.

2. Experimental Description

[9] The sodar transmits acoustic pulses of a certain frequency (2200 Hz) into the atmosphere and receives the backscattered signal, whose frequency is shifted according to the wind component parallel to the propagation of the acoustic waves (Doppler Effect). Wind speed is obtained from this frequency shift.

[10] The equipment used is a DSDPA.90-24 sodar built by METEK GmbH, installed in the Low Atmosphere Research Center (CIBA), 41°49′2″N, 4°56′15″W and about 30 km NW of Valladolid (Spain). The location is a very extensive plateau 840 m above MSL, with no relief elements as a result of which horizontal homogeneity is guaranteed. Nonirrigated crops and grass make up the surrounding vegetation, the roughness length thus being only a few centimeters.

[11] The 3-a measuring period commenced 1 August 2002. Data were continuously acquired over this period, the only noticeable interruptions occurring for about 25 d (10 d in May 2003 and 15 d in June 2003). The minimum height proposed for our measurements was 40 m and the maximum 500 m, measurements being limited to 20 m levels. Two kinds of files are generated by the device: the first comprising raw data, with values each 15 s, and the second considering 10-min averages provided simultaneously for the levels investigated. Only the latter files have been used in this paper.

[12] The files yielded abundant information for each level measured, although only wind speed was considered in this paper. Each value is accompanied by its plausibility code, which represents the results of the plausibility test performed on the averaged power spectra and is used to control data quality. Data were rejected according to standard criteria considered by the manufacturer.

3. Theoretical Bases

3.1. Weibull Distribution

[13] The Weibull distribution is a two-parameter function expressed mathematically as

$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1}\exp\left[-\left(\frac{v}{c}\right)^{k}\right] \tag{1}$$

where v is the wind speed, c is the scale factor, and k is the dimensionless shape parameter. For k < 1 the function decreases monotonically, k = 1 giving an exponential distribution with mean value c and a maximum away from the origin appearing if k > 1. Its cumulative distribution function is

$$F(v) = 1-\exp\left[-\left(\frac{v}{c}\right)^{k}\right] \tag{2}$$

Hence the percentile vp corresponding to the p fraction of data is expressed as

$$v_p = c\left[-\ln(1-p)\right]^{1/k} \tag{3}$$

[14] Consequently, some numerical summary measures may easily be obtained [Wilks, 2006], some being a function of both parameters. However, other theoretical relationships, which involve quotients of values provided by equation (3) are only a function of the shape parameter.

[15] The mode, vM, corresponds to the maximum of equation (1)

$$v_M = c\left(\frac{k-1}{k}\right)^{1/k} \tag{4}$$

The theoretical relationship between the median, a position robust estimator corresponding to p = 0.5, and the Weibull parameters is

$$v_{0.5} = c\,(\ln 2)^{1/k} \tag{5}$$

The interquartile range, IQR, a robust estimator of spread which establishes the range of half of the data is

$$IQR = v_{0.75}-v_{0.25} = c\left\{(\ln 4)^{1/k}-\left[\ln(4/3)\right]^{1/k}\right\} \tag{6}$$

A robust alternative for the variation coefficient may be considered as

$$VC = \frac{IQR}{v_{0.5}} = \frac{(\ln 4)^{1/k}-\left[\ln(4/3)\right]^{1/k}}{(\ln 2)^{1/k}} \tag{7}$$

The Yule-Kendall index, a robust skewness estimator, is obtained by

$$YK = \frac{v_{0.25}-2v_{0.5}+v_{0.75}}{IQR} \tag{8}$$

[16] Finally, a robust kurtosis may be written as

$$K = \frac{v_{0.75}-v_{0.25}}{2\left(v_{0.9}-v_{0.1}\right)} \tag{9}$$
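As an illustration of how the indicators of this section follow from the percentile expression, a minimal Python sketch is given here. It assumes the usual percentile definitions: the variation coefficient taken as IQR over median, the Yule-Kendall index built from the three quartiles, and the kurtosis as IQR over twice the range between percentiles 10 and 90 (the form for which a normal distribution gives about 0.263). The function names are hypothetical and are not part of the original work.

```python
import math


def percentile(p, c, k):
    """Weibull percentile v_p for a fraction p of data (0 <= p < 1)."""
    return c * (-math.log(1.0 - p)) ** (1.0 / k)


def robust_indicators(c, k):
    """Robust position, spread, skewness and kurtosis of a Weibull distribution.

    Assumed definitions (see the lead-in): VC = IQR / median,
    YK = (v25 - 2 v50 + v75) / IQR, K = IQR / (2 (v90 - v10)).
    """
    v10, v25, v50, v75, v90 = (
        percentile(p, c, k) for p in (0.10, 0.25, 0.50, 0.75, 0.90)
    )
    iqr = v75 - v25
    return {
        # For k <= 1 the density decreases monotonically, so the mode is at 0.
        "mode": c * ((k - 1.0) / k) ** (1.0 / k) if k > 1.0 else 0.0,
        "median": v50,
        "IQR": iqr,
        "VC": iqr / v50,
        "YK": (v25 - 2.0 * v50 + v75) / iqr,
        "K": iqr / (2.0 * (v90 - v10)),
    }
```

Calling `robust_indicators(8.0, 2.0)` returns all six indicators at once. Position and spread scale with c, whereas VC, YK and K are quotients of percentiles and therefore depend on k only.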

3.2. Calculation of the Weibull Parameters

[17] Three differing methods are considered in this paper. Two are rarely used, and the third is an original contribution. The first is direct and requires no previous handling of data. The second involves order statistics, which slows down the process, and the third is an iterative procedure based on minimizing a quantity.

3.2.1. Modified Maximum-Likelihood Method

[18] The maximum-likelihood method is based on an iterative solution of the equations

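$$k = \left[\frac{\sum_{i=1}^{N} v_i^{k}\ln v_i}{\sum_{i=1}^{N} v_i^{k}}-\frac{1}{N}\sum_{i=1}^{N}\ln v_i\right]^{-1}, \qquad c = \left(\frac{1}{N}\sum_{i=1}^{N} v_i^{k}\right)^{1/k}$$

where $v_i$ are the $N$ observed wind speeds.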

This method was modified by Christofferson and Gillette [1987] by replacing the iterative calculation of the shape parameter by

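$$k = \frac{\pi}{\sqrt{6}}\left[\frac{N(N-1)}{N\sum_{i=1}^{N}\ln^{2} v_i-\left(\sum_{i=1}^{N}\ln v_i\right)^{2}}\right]^{1/2}$$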

which requires neither iteration nor sorting of data. For this reason, this method has been selected by Ahmed Shata and Hanitsch [2006].
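A minimal Python sketch of this direct calculation is given below. It assumes that the shape parameter is taken as π/√6 divided by the sample standard deviation of ln v (the closed form usually attributed to Christofferson and Gillette), and that the scale parameter then follows from the maximum-likelihood expression for c. Zero wind speeds are assumed to have been removed beforehand, since their logarithm is undefined. The names `modified_mle` and `synthetic_speeds` are hypothetical.

```python
import math
import random


def modified_mle(speeds):
    """Direct estimate of (c, k); needs neither iteration nor sorting.

    Assumes every speed is strictly positive.
    """
    n = len(speeds)
    logs = [math.log(v) for v in speeds]
    s1 = sum(logs)
    s2 = sum(x * x for x in logs)
    # Sample variance of ln v is (n*s2 - s1**2) / (n*(n-1)).
    k = (math.pi / math.sqrt(6.0)) * math.sqrt(n * (n - 1) / (n * s2 - s1 * s1))
    c = (sum(v ** k for v in speeds) / n) ** (1.0 / k)
    return c, k


def synthetic_speeds(c, k, n, seed=0):
    """Hypothetical helper: n Weibull-distributed values for testing."""
    rng = random.Random(seed)
    return [rng.weibullvariate(c, k) for _ in range(n)]
```

For instance, `modified_mle(synthetic_speeds(8.0, 2.0, 20000))` should return values close to the pair (8, 2) used to generate the sample.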

3.2.2. Method of Probability Weighted Moments

[19] Stedinger et al. [1993] and Aksoy et al. [2004] have considered this procedure whose equations are

$$k = \frac{\ln 2}{L_{2,(\ln v)}}, \qquad c = \exp\left(L_{1,(\ln v)}+\frac{0.5772}{k}\right)$$

where L1,(ln v) and L2,(ln v) are L1 and L2 moments of the logarithm of the wind speed, which are expressed as

$$L_{1,(\ln v)} = b_0, \qquad L_{2,(\ln v)} = 2b_1-b_0$$

with b0 and b1 given by

$$b_0 = \frac{1}{N}\sum_{j=1}^{N}\ln v_j, \qquad b_1 = \frac{1}{N(N-1)}\sum_{j=1}^{N-1}(N-j)\ln v_j$$

where the time series has previously been sorted in descending order according to ln vN ≤ … ≤ ln vj ≤ … ≤ ln v1. From a computational point of view, the need to order is a disadvantage of this method when extensive data series are handled.
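The procedure may be sketched in Python as follows. The sketch assumes the standard L-moment relations for the logarithm of a Weibull variable, k = ln 2 / L2 and c = exp(L1 + γ/k), with γ the Euler constant (about 0.5772), and the unbiased estimators b0 and b1 computed from the series sorted in descending order. The function and helper names are hypothetical, and zero speeds are assumed to have been excluded.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def pwm_weibull(speeds):
    """Estimate (c, k) from the L1 and L2 moments of ln v."""
    # Sorting is the costly step for long series: x[0] is the largest value.
    x = sorted((math.log(v) for v in speeds), reverse=True)
    n = len(x)
    b0 = sum(x) / n
    # Weight (n - j) applies to the j-th largest value, j = 1 .. n-1.
    b1 = sum((n - j) * x[j - 1] for j in range(1, n)) / (n * (n - 1))
    l1 = b0
    l2 = 2.0 * b1 - b0
    k = math.log(2.0) / l2
    c = math.exp(l1 + EULER_GAMMA / k)
    return c, k


def synthetic_speeds(c, k, n, seed=0):
    """Hypothetical helper: n Weibull-distributed values for testing."""
    rng = random.Random(seed)
    return [rng.weibullvariate(c, k) for _ in range(n)]
```

The only step whose cost grows faster than linearly with the number of data is the call to `sorted`, which is the disadvantage noted above for extensive series.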

3.2.3. Method Based on the Minimum RMSE

[20] This is an iterative method which first calculates the histogram with observed frequencies, oi, of the experimental wind speed distribution. Second, an initial value of both parameters is selected, c0 and k0. Eight points around (c0, k0) are then also considered corresponding to c0 + nΔc and k0 + nΔk with n equal to −1, 0 or 1. For each of the nine points (ci, ki), the Weibull distribution frequencies, yi, are calculated. Also the root-mean-square error (RMSE) is obtained by means of

$$\mathrm{RMSE} = \left[\frac{1}{M}\sum_{i=1}^{M}\left(y_i-o_i\right)^{2}\right]^{1/2}$$

where M is the number of histogram classes. If the lowest RMSE is at the center of the nine (ci, ki), convergence is then reached. If not, the pair (c, k) corresponding to the lowest RMSE is considered as the new initial point and the procedure begins again in an attempt to reach convergence.
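A minimal Python sketch of this nine-point search is given below. Several details are assumptions made for illustration only: the theoretical frequencies are taken as expected counts per class obtained from differences of the cumulative distribution, the steps are Δc = 0.1 m s−1 and Δk = 0.01, the class width is 0.1 m s−1 up to a hypothetical upper limit of 40 m s−1, and the starting point is (10, 2.5). All function names are hypothetical.

```python
import math
import random


def histogram(speeds, width, n_classes):
    """Observed counts o_i in classes [i*width, (i+1)*width)."""
    counts = [0] * n_classes
    for v in speeds:
        i = int(v / width)
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts


def rmse(counts, c, k, width, n_total):
    """RMSE between observed counts and Weibull expected counts (assumed y_i)."""
    total = 0.0
    for i, o in enumerate(counts):
        lo, hi = i * width, (i + 1) * width
        y = n_total * (math.exp(-((lo / c) ** k)) - math.exp(-((hi / c) ** k)))
        total += (y - o) ** 2
    return math.sqrt(total / len(counts))


def min_rmse_fit(speeds, c0=10.0, k0=2.5, dc=0.1, dk=0.01,
                 width=0.1, v_max=40.0, max_iter=10000):
    """Move to the best of the nine grid points until the center is the best."""
    n_classes = int(round(v_max / width))
    counts = histogram(speeds, width, n_classes)
    n_total = len(speeds)
    # Integer grid indices avoid accumulating floating-point drift.
    ic, ik = int(round(c0 / dc)), int(round(k0 / dk))
    for _ in range(max_iter):
        # Center first, so that it is kept when a neighbor only ties with it.
        points = [(ic, ik)] + [
            (ic + a, ik + b)
            for a in (-1, 0, 1) for b in (-1, 0, 1)
            if (a, b) != (0, 0) and ic + a >= 1 and ik + b >= 1
        ]
        best = min(points, key=lambda p: rmse(counts, p[0] * dc, p[1] * dk,
                                              width, n_total))
        if best == (ic, ik):
            break  # convergence: lowest RMSE is at the center
        ic, ik = best
    return ic * dc, ik * dk


def synthetic_speeds(c, k, n, seed=0):
    """Hypothetical helper: n Weibull-distributed values for testing."""
    rng = random.Random(seed)
    return [rng.weibullvariate(c, k) for _ in range(n)]
```

Since every move strictly lowers the RMSE, the search cannot cycle. The resolution of the result is limited by the steps chosen, and a coarser first pass followed by a finer one would be a natural refinement.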

4. Results

4.1. Weibull Distribution Analysis

[21] Our initial purpose was to investigate the behavior of the Weibull distribution in a wide and common range of values of both parameters using numerical methods. c values from 0.1 to 20 m s−1 in 0.1 m s−1 steps and k values from 1 to 4 in 0.01 steps were selected. The mode of the Weibull distribution, equation (4), was then calculated and is presented in Figure 1, in which each line is drawn for the corresponding value of the mode. Two branches appear, one horizontal and the other vertical, which remain when the value of c increases. Additionally, for low winds, i.e., low mode, and small k, this parameter is nearly constant in a wide range of c values. The opposite behavior is observed for low modes and small c, which remain almost constant over a wide range of k values. The median provided by equation (5) is represented in Figure 2. In this case, medians turn to the right at low k for all values of c.

Figure 1.

Mode of the Weibull distribution calculated from equation (4) for c from 0.1 to 20 m s−1 in 0.1 m s−1 steps and k from 1 to 4 in 0.01 steps.

Figure 2.

Median of the Weibull distribution calculated from equation (5) for c from 0.1 to 20 m s−1 in 0.1 m s−1 steps and k from 1 to 4 in 0.01 steps.

[22] IQR is shown in Figure 3. Although equation (6) does not appear linear, Figure 3a shows the almost linear relationship between the variables considered. This linear relationship is even maintained when the range between two given percentiles increases. Figure 3b shows an extreme example for a range of 90% of data, from percentile 5 to percentile 95. This linear behavior suggests the possibility of a close relationship between c, k, the fraction of data at the middle of the distribution, fr, and its range, rg. We have considered fr from 0.5 to 0.9 in 0.1 intervals, and rg in 1 m s−1 steps up to 10 m s−1 for fr = 0.5, 15 m s−1 for fr = 0.6, and 20 m s−1 for the remaining fractions of data, according to the region of c and k covered by these values and the usual wind speeds. For each fr and rg, the parameters of a linear fit, a1 and b1, have been calculated according to

equation image

As a result, we observed that a1 may be expressed as a linear function of only fr,

equation image

and b1 was a linear function of $rg^{-1}$

equation image

with a3 proving so small (about $10^{-3}$) that it may be ignored, and b3 being a linear function of $fr^{2}$,

equation image

Figure 3.

Range of central data as a function of the Weibull parameters for (a) a fraction of data equal to 0.5 and (b) a fraction of data equal to 0.9.

[23] The final expression is an easy relationship

equation image

[24] The satisfactory agreement of this equation is confirmed by a comparison between the initial k and that calculated by the equation. Consequently, this is a useful expression since it relates four variables over wide intervals of values. By way of an example, once c and k are known, this equation provides the range for a given fraction of data.

[25] The variation coefficient, equation (7), may be fitted by two expressions

equation image
equation image

[26] Considering the same approach followed for the interquartile range, the relationship between a fraction of central data, fr, and the median is investigated as a function of k by means of equations (21) and (22). Tables 1 and 2 present the parameters and the squared correlation coefficient, $r^{2}$, whose high values reveal the goodness of the fit.

Table 1. Parameters of the Variation Coefficient in Equation (21) for Several Fractions of Central Data fr
Table 2. Parameters of the Variation Coefficient in Equation (22) for Several Fractions of Central Data fr

[27] Since k is related to a simple wind classification, this may be also established by means of VC:

equation image

As additional information obtained from the VC fit, k below 1.5 corresponds to an IQR above the median, whereas k below 3 corresponds to an IQR greater than 0.5 times the median.

[28] The Yule-Kendall index, equation (8), has also been calculated. Its values are extremely low, revealing the high symmetry of distributions. As a singular case, k = 3.3 corresponds to the null value of this index. As with the previous variables, the Yule-Kendall index was satisfactorily fitted by a linear function of 1/k

equation image

[29] The kurtosis, K, was the final indicator considered, although unfortunately no easy relationship with k is possible, as is reflected by Figure 4. However, since K = 0.263 corresponds to a normal distribution, a value that the Weibull distribution reaches at k = 1.35, flat distributions are obtained for k below this value. The maximum is reached at k = 2.27, and above this value, the kurtosis falls slowly.
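The two characteristic values of k quoted for the kurtosis can be checked numerically with a short scan over the shape parameter. The Python sketch below assumes that K is the quotient of the interquartile range and twice the range between percentiles 10 and 90, which is the form that gives about 0.263 for a normal distribution. Since K is a quotient of percentiles, c cancels and only k is needed. The variable names are hypothetical.

```python
import math


def kurtosis_index(k):
    """Percentile kurtosis of a Weibull distribution (independent of c)."""
    def q(p):
        # Percentile divided by the scale parameter c.
        return (-math.log(1.0 - p)) ** (1.0 / k)
    return (q(0.75) - q(0.25)) / (2.0 * (q(0.90) - q(0.10)))


# Scan k from 1 to 4 in 0.01 steps, as in the domain considered in the text.
ks = [1.0 + 0.01 * i for i in range(301)]
values = [kurtosis_index(k) for k in ks]

# Shape parameter at which K is largest.
k_at_max = ks[values.index(max(values))]

# First k at which K reaches the value 0.263 quoted for a normal distribution.
k_normal = next(k for k, val in zip(ks, values) if val >= 0.263)
```

The maximum is very flat, so the location of `k_at_max` is sensitive to the step of the scan and to the exact threshold adopted for the normal distribution.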

Figure 4.

Kurtosis, K, as a function of the shape parameter, k, in the range considered.

[30] As a final remark in this section, correlations proposed linked to the Weibull distribution are valid when this distribution is used, whereas empirical correlations are established from measurements [Kavak Akpinar and Akpinar, 2004].

4.2. Results and Comparison of the Weibull Parameter Calculation

[31] First, a short meteorological description is presented in Figure 5, where wind speed medians are represented as a height function. Medians have been obtained by season and period of day in order to observe the yearly pattern and the contrast between day and night. According to atmospheric stability analyses, day is defined as the period from 1 h after sunrise until 1 h before sunset [Mohan and Siddiqui, 1998]. Figure 5 may be analyzed according to wind speed profiles and values. The main result of profile analysis is the sharp contrast between day and night. Wind speed increases slightly with height during the day, summer profiles being nearly flat. However, low-level jets are well defined during the night, the jet core being located at around 300 m, with a wind speed around 12 m s−1. Summer is an exception, since the jet core is lower, at 200 m, as is wind speed, around 10 m s−1. A seasonal pattern is also observed in wind speed values. At lower levels, below 100 m, wind speeds are similar in autumn and winter, with a low contrast between day and night. However, they are higher in spring and summer, the day-night contrast proving particularly high in the latter season. As regards the whole profile, wind speed values are similar in spring and autumn. They are slightly lower in winter and lowest in summer. Finally, at the jet core, the day-night contrast is similar in the four seasons.

Figure 5.

Daily and seasonal cycles of wind speed profile in the low atmosphere.

[32] Figure 6 presents the Weibull parameters calculated with the three methods previously described. Greater c values, around 11 m s−1, are reached from 300 to 400 m. k values remain stable above 200 m. They are similar, slightly greater than 1.8 for the modified maximum-likelihood method and the method of probability weighted moments. The method based on the minimum RMSE provides values between 1.7 and 1.8. However, some small amplitude fluctuations are present below 200 m. These fluctuations hide the k maximum demonstrated by Wieringa [1989] and Emeis [2001].

Figure 6.

Profiles of the Weibull parameters calculated from the complete data set with the three methods considered.

[33] A fast result is obtained for the modified maximum-likelihood method. However, calculations of the method of probability weighted moments are slow due to the large database used. The method based on the minimum RMSE, although iterative, has a relatively fast convergence.

[34] In order to observe daily and seasonal patterns, Figure 7 shows the Weibull parameters calculated with the modified maximum-likelihood method and the number of data used. The RMSE between the experimental histogram with 0.1 m s−1 class width and the theoretical distribution is also presented.

Figure 7.

Results of the calculation of the Weibull parameters with the modified maximum-likelihood method.

[35] Although the number of available data decreases with height due to a quality control test, results for the higher levels have been retained since the trend does not change and outliers are not present.

[36] The scale parameter, c, has lower values during the day (around 10 m s−1 above 300 m) than at night (around 13 m s−1 above 300 m), where a maximum is present linked to the development of the nocturnal low-level jet. Summer values do not display the same trend as the rest of the seasons, as they are slightly lower during the day, around 1 m s−1, and about 2 m s−1 lower above the 200 m level during the night.

[37] The shape parameter, k, which is more variable during the night, takes values from 1.2 to 2.6, and the 300 m level establishes a difference in its behavior. Below this level, dispersion of values is greater during the night than during the day and two groups of seasons are observed: winter and autumn, and spring and summer. Above 300 m, summer and autumn values are close only during the day.

[38] RMSE values justify the two diurnal periods, particularly at lower levels where RMSE during the night is more than two or three times greater than during the day. This result implies that the Weibull distribution is a suitable description of wind during the day. However, discrepancies between experimental and theoretical distributions are noticeable during the night, indicating that another distribution should be investigated.

[39] In order to consider the influence of data availability in k values, this variable is presented in Figure 8, in which differences between day and night are plotted as a continuous line. As a result, day and night availabilities are similar below 100 m and above 400 m. Only positive differences are obtained at intermediate levels, in a height range from 180 to 260 m, with maxima from 12% in spring to 16% in summer. Consequently, data availability as a cause of the contrast between day and night must be discarded due to the low differences obtained, especially for k at the lower levels.

Figure 8.

Seasonal evolution of the data availability and differences between day and night availabilities.

[40] As a previous step to developing the iterative method, RMSE values were calculated in the domain considered: c from 0.1 to 20 m s−1 in 0.1 m s−1 steps and k from 1 to 4 in 0.01 steps. Figure 9 presents the results for winter, and night, at two extreme levels 40 and 500 m. The RMSE values are smaller at 500 m revealing a better fit between experimental and theoretical distributions. However, the relevant fact is the existence of a well-defined minimum, although it is better described at 40 m than at 500 m. For this reason, as a starting point the iterative method considers the center of the domain investigated and checks the RMSE of the eight adjacent points until convergence is reached following the procedure described above.

Figure 9.

RMSE values calculated from Weibull distributions with the corresponding parameters and the experimental distribution. Values above the highest RMSE are not presented in the grey region.

[41] Figure 10 shows the comparison of the c parameter between the three methods considered in this paper. In this plot, daily and nighttime seasonal values for each level have been drawn. The modified maximum-likelihood method and the method of probability weighted moments provide good agreement, and the points of the plot are aligned with extremely low dispersion. Since the slope of the fit is nearly 1, the intercept, −0.36, represents the deviation of the second method against the first. This deviation is so low that both methods may be deemed equivalent. Comparison between the modified maximum-likelihood method and the method based on the minimum RMSE reveals a greater dispersion of points and an overestimation by the third method, c3, with respect to the first, c1, that increases with c1, being around 1 m s−1 for c = 12 m s−1. However, this discrepancy is caused by the nocturnal data since the diurnal values do not show any noticeable deviation between the two methods. A comparison between this third method and the method of probability weighted moments gives low point dispersion.

Figure 10.

Comparison among the values of the scale parameter, c, calculated by the three methods considered.

[42] Comparison of the k values is presented in Figure 11. It is worth remembering that the range for this variable is narrow. For this reason, agreement may be considered satisfactory, although the Pearson correlation coefficient, r, is occasionally low. As a general rule, the range of values is smaller during the day than during the night. The modified maximum-likelihood method and the method of probability weighted moments provide the same results for this variable. However, the method based on the minimum RMSE overestimates k against the value calculated by the modified maximum likelihood method. This behavior is not observed during the day, meaning that overestimation is caused by the nocturnal values, particularly during spring and summer and, to a lesser extent, in autumn. An additional result is the seasonal coupling observed: spring and summer have higher k values than winter and autumn. Comparison between the method of probability weighted moments and the minimum RMSE reveals the same pattern, but with smaller data dispersion.

Figure 11.

Comparison among the values of the shape parameter, k, calculated by the three methods considered.

[43] Although the RMSE values are slightly below 30 during the night and 10 during the day for the method based on the minimum RMSE at the lower levels, that is, slightly lower than for the previous methods, one disadvantage of this procedure is the iterative process, which proves less direct.

4.3. Accuracy of the Weibull Parameter Calculation

[44] Finally, a bootstrap method has been used to check the accuracy of the calculated Weibull parameters in the three procedures used, although we only present the results for the modified maximum-likelihood method.

[45] For each pair of parameters, c and k, in the domain considered, 20 samples of 5000 data each sample are used. These data have been randomly generated by means of the expression

$$v = c\left[-\ln(1-x)\right]^{1/k}$$

where x is a random variable uniformly distributed in [0,1). For each sample, the new parameters, c′ and k′, have been calculated by the modified maximum-likelihood method, with the result that for each initial couple, c and k, 20 new pairs, c′ and k′, are obtained. The new parameters were averaged and residuals, $\bar{c}'-c$ and $\bar{k}'-k$, and standard deviations, s(c′) and s(k′), were also obtained.
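The check described in this paragraph may be sketched in Python for a single pair (c, k). The sketch assumes that the samples are generated by inverting the cumulative distribution, v = c[−ln(1 − x)]^(1/k), and that the parameters are recalculated with the closed-form shape estimate based on the standard deviation of ln v. The function names and the seed are hypothetical and only serve the illustration.

```python
import math
import random
import statistics


def weibull_sample(rng, c, k, n):
    """n values generated by inversion of the Weibull cumulative distribution."""
    out = []
    while len(out) < n:
        x = rng.random()  # uniform in [0, 1)
        if x > 0.0:  # x = 0 would give v = 0, whose logarithm is undefined
            out.append(c * (-math.log(1.0 - x)) ** (1.0 / k))
    return out


def modified_mle(speeds):
    """Direct estimate of (c, k) from strictly positive speeds (assumed form)."""
    n = len(speeds)
    logs = [math.log(v) for v in speeds]
    s1 = sum(logs)
    s2 = sum(x * x for x in logs)
    k = (math.pi / math.sqrt(6.0)) * math.sqrt(n * (n - 1) / (n * s2 - s1 * s1))
    c = (sum(v ** k for v in speeds) / n) ** (1.0 / k)
    return c, k


def accuracy_check(c, k, n_samples=20, n_data=5000, seed=0):
    """Residuals of the averaged parameters and their standard deviations."""
    rng = random.Random(seed)
    fits = [modified_mle(weibull_sample(rng, c, k, n_data))
            for _ in range(n_samples)]
    c_new = [f[0] for f in fits]
    k_new = [f[1] for f in fits]
    return {
        "residual_c": statistics.mean(c_new) - c,
        "residual_k": statistics.mean(k_new) - k,
        "std_c": statistics.stdev(c_new),
        "std_k": statistics.stdev(k_new),
    }
```

Repeating `accuracy_check` over the grid of c and k values would give the kind of residuals and standard deviations discussed in the following paragraphs, although with only 20 samples per pair each standard deviation is itself a rough estimate.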

[46] Figure 12a shows the results for $\bar{c}'-c$, where first, second, and third quartiles are also represented for each c considered. As Figure 12a shows, the values of $\bar{c}'-c$ are noticeably lower than c. At low c values, $\bar{c}'-c$ decreases when c increases. Above the lower c values, the spread of points is greater when c increases, mainly at positive $\bar{c}'-c$. The median, in white points, remains steady beyond c = 1. The other quartiles undergo minor changes when c increases.

Figure 12.

Results of calculations of the parameters with random samples. The values corresponding to c = 0.1 m s−1 have been excluded as they may be considered outliers. Medians are white points, and first and third quartiles are black points. (a) Residuals of the scale parameter, c. (b) Residuals of the shape parameter, k. (c) Standard deviation of the scale parameter. (d) Standard deviation of the shape parameter.

[47] Figure 12b represents $\bar{k}'-k$. A surface plot was chosen for this variable since this diagram was clearer than a scatterplot of $\bar{k}'-k$ as a function of k. As in Figure 12a, $\bar{k}'-k$ is also very small and its greater values are confined to c below 1 m s−1.

[48] Figure 12c presents the c′ standard deviation. In this case, the only noticeable feature is the linear trend observed for quartiles, which may be parameterized according to

equation image
equation image
equation image

In equations (25) to (27), the comparison between slopes of quartiles and the median provides information about the positive skewness of data.

[49] When s(c′)/c quartiles are calculated, almost constant values are obtained for c above 1 m s−1; 0.0047 for the first quartile, 0.0061 for the median and 0.0087 for the third quartile. Slight departures from these values are obtained when c is below 1 m s−1.

[50] Finally, the standard deviation of k′ has been represented as a k function in Figure 12d. The behavior is the same as the previously presented result. In this case the linear equations are

equation image
equation image
equation image

When s(k′)/k is calculated, quartiles have an increasing trend below k equal to 2, and are only constant above this value, 0.0128 being the first quartile, 0.0145 the median, and 0.0162 the third quartile.

[51] The same procedure has been used with the method of probability weighted moments, although for each pair of parameters, c and k, 20 samples of 1000 data have been randomly generated due to a slower convergence of this procedure. Results are twice the value in this case for $\bar{c}'-c$, s(c′) and s(k′), with the same behavior as for the previous method and very similar for $\bar{k}'-k$.

[52] Finally, accuracy has also been verified for the method based on the minimum RMSE. However, a certain amount of caution should be exercised: only five samples of 1000 data have been used, since convergence is extremely slow. Additionally, some data have not been considered when convergence for a given couple c and k was reached outside the domain investigated, meaning that $\bar{c}'-c$ residuals are only available for c above 1.4 m s−1. With these restrictions, results are similar to the method of probability weighted moments.

5. Conclusions

[53] Some robust indicators of position, spread, skewness, and kurtosis have been calculated for wind speed in a domain of common c and k values. A relationship between c, k, the fraction of central data and its range has successfully been established. Additionally, the variation coefficient and the Yule-Kendall index have been parameterized by means of simple functions of k. These expressions are useful when a database may be described by a Weibull distribution.

[54] The Weibull parameters have been obtained with the whole database. However, the calculation of the modified maximum-likelihood method was the fastest whereas the method of probability weighted moments was the slowest due to the ordering procedure of the long time series used.

[55] The modified maximum-likelihood method has been used to calculate the time evolution of the Weibull distribution parameter profile. The parameter c has lower values during the day and the seasonal contrast is given by the lower summer values. Below 300 m, k was more variable during the night and, according to its values two seasonal groups may be established: spring–summer and autumn–winter.

[56] The RMSE used as a goodness of fit estimator between experimental and theoretical distributions evidenced that the Weibull description of wind speed was better during the day than during the night at lower levels.

[57] The modified maximum-likelihood method has been compared with the method of probability weighted moments and our proposed method based on the minimum RMSE. Agreement between the first and second methods proved satisfactory, and slight discrepancies were observed during the night with the third method.

[58] Finally, an accuracy check has been performed by considering random samples. Residuals of parameters and standard deviation of new parameters calculated were very low, indicating the accuracy of the calculations.


[59] The authors wish to acknowledge the financial support from the Interministerial Commission of Science and Technology and the Regional Government of Castile and Leon.