## 1. Introduction

Weather forecasts are typically made and reported as an expected value of the attribute of interest at a particular time and location. Numerical weather prediction (NWP) models are advanced computer simulation systems that provide such expected-value forecasts for a number of attributes. Although the deterministic interactions of simulated physical processes in such systems yield real-valued numbers with high precision, these values are uncertain owing to inaccurate initial conditions, the parameterization of sub-grid-scale processes, and various simplifying assumptions (Palmer, 2000; Orrell *et al.*, 2001; Lange, 2003). However, such uncertainty information is not available in the immediate outputs of the system. Yet, in many applications it is desirable for forecasts to be accompanied by the corresponding uncertainties. Information about forecast uncertainty may be as significant as the forecast itself, and it can play an important role in planning and decision-making processes that rely on the forecasts (Chatfield, 1993; Richardson, 2000).

The uncertainty of a forecast is typically formulated and communicated using prediction intervals (PIs) accompanied by a percentage expressing the level of confidence or nominal coverage rate [e.g., *T* = (2, 14 °C), conf = 95%] (Hahn and Meeker, 1991; Chatfield, 1993). The confidence level specifies the probability with which the actual observation is expected to fall inside the PI range. This type of forecast (sometimes called a central credible interval forecast or forecast interval) may be harder for a non-specialist to interpret and evaluate, but it provides the user with a more complete description of the predicted phenomenon than a point forecast (Chatfield, 2001). In spite of the clear value of PI forecasts, this format of forecast ‘…has been largely overlooked by meteorologists and would benefit from some attention…’ (Jolliffe and Stephenson, 2003).
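As a concrete illustration of the coverage notion (with synthetic numbers invented for this sketch, not taken from any forecasting system), the nominal coverage of a PI such as *T* = (2, 14 °C) can be checked empirically against a sample of observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed temperatures (deg C), for illustration only:
# mean 8, standard deviation 3.
obs = rng.normal(8.0, 3.0, size=10_000)

# A 95% PI such as T = (2, 14 deg C); these bounds sit two standard
# deviations either side of the mean, so roughly 95% of the
# observations should fall inside the interval.
lower, upper = 2.0, 14.0

# Empirical coverage rate: fraction of observations inside the PI.
coverage = np.mean((obs >= lower) & (obs <= upper))
```

A reliable interval forecast is one whose empirical coverage, computed this way over many cases, matches its stated confidence level.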

A major category of solutions for uncertainty analysis and PI estimation, especially in meteorology, is based on ensemble predictions (Ehrendorfer, 1997). In this method, the individual predictors are members of an ensemble of forecasts run with different parameters and/or initial conditions, and the forecast uncertainty is linked to the dispersion among the members (Richardson, 2000; Toth, 2003). However, ensemble executions of an NWP model incur a high computational cost, making this approach infeasible in many applications, especially when updated uncertainty analyses are required at short temporal intervals.

PIs can also be obtained by statistical modelling of the forecast error using the historical performance of relevant past forecasts made by the system (Chatfield, 2001; Jørgensen and Sjøberg, 2003). In this approach, the dynamics of the forecast uncertainty are essentially learned from the recorded accuracy of past forecasts, which are available for many deterministic forecasting systems. The current study focuses on this approach as a potentially efficient method that has received relatively little attention in the literature.
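The basic idea can be sketched in a few lines; the archive below is synthetic and the variable names are illustrative, but the mechanism (a PI built from the empirical quantiles of past forecast errors) is the one described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical archive of past point forecasts and matching observations.
past_forecast = rng.uniform(0.0, 20.0, size=5_000)
past_obs = past_forecast + rng.normal(0.5, 2.0, size=5_000)  # biased, noisy
errors = past_obs - past_forecast

# Empirical error quantiles yield a (1 - alpha) PI around a new point
# forecast; shifting by the error quantiles also corrects systematic bias.
alpha = 0.05
q_lo, q_hi = np.quantile(errors, [alpha / 2, 1 - alpha / 2])

new_forecast = 10.0
pi = (new_forecast + q_lo, new_forecast + q_hi)
```

Situation-dependent refinements of this scheme, in which the error archive is first partitioned by weather situation, are the subject of the following paragraphs.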

It is a well-known fact that the extent of forecast uncertainty varies with the weather situation (Palmer, 2000). For example, low-pressure systems are known to be less predictable than the more stable high-pressure systems. It is expected that such patterns of dependency of the uncertainty on the forecasted attributes can be discovered from the historical performance of NWP forecasts (Lange, 2003; Nielsen *et al.*, 2006; Pinson *et al.*, 2006). Lange *et al.* discovered such dependencies by clustering the performance records into six separate groups and characterizing the attributes of each group's error distribution individually (Lange, 2003, 2005; Lange and Heinemann, 2003). However, that analysis has not been applied and evaluated in practice for deriving PIs from a deterministic forecasting system.

A practical application of weather classification to obtain PIs was proposed by Pinson *et al.* (2006) and Pinson and Kariniotakis (2010). The authors used two predicted variables, wind speed and wind power, to categorize wind-energy forecast records into four manually defined classes (Pinson, 2006). PIs were then computed using the empirical quantiles of the error distribution in each group, together with the fuzzy membership values of a new forecast in each of the predefined groups. Experimental evaluation of the resulting PIs demonstrated the applicability of the historical forecast-grouping approach: it provided skilful and relatively reliable PIs from the initial point forecasts.
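The flavour of that computation can be sketched as follows. The class error samples, membership values and weighting scheme below are invented for illustration and do not reproduce the authors' actual data or exact method; they only show how class-wise empirical quantiles can be blended by fuzzy memberships:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical forecast-error archives for four predefined weather classes.
class_errors = [rng.normal(mu, sd, size=1_000)
                for mu, sd in [(0.0, 1.0), (0.5, 2.0), (-0.3, 1.5), (1.0, 3.0)]]

# Per-class empirical quantile bounds for a 90% interval.
alpha = 0.10
bounds = np.array([np.quantile(e, [alpha / 2, 1 - alpha / 2])
                   for e in class_errors])          # shape (4, 2)

# Fuzzy membership of a new forecast situation in each class (sums to one).
membership = np.array([0.1, 0.6, 0.2, 0.1])

# Membership-weighted combination of the class-wise interval bounds.
lo, hi = membership @ bounds

point_forecast = 12.0
pi = (point_forecast + lo, point_forecast + hi)
```

In this scheme a forecast situation that straddles two classes inherits an interval between the two class-wise intervals, rather than jumping discontinuously from one to the other.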

To improve the quality of the resulting PIs and to alleviate the problem of manually grouping weather forecasts, we investigate the application of automatic, objective-based clustering algorithms to obtain optimally defined groups of forecast records that follow the inherent structures in the data. Because these clusters are based on actual similarities between past forecast situations, they are expected to lead to PIs of higher quality. Moreover, this approach does not suffer from the limitations of expert-based partitioning, which becomes a daunting task as the dimensionality of the influential variables increases. In this study, the application of crisp clustering algorithms (*K*-means, CLARA and hierarchical clustering) is examined and the resulting PIs are assessed. Fuzzy C-means clustering is also applied as a natural alternative to the crisp allocation of forecast records to clusters. The next step involves fitting an appropriate probability distribution function to the observed error distribution in each cluster; suitable statistical techniques are examined, along with the modifications required when the fuzzy approach is used. Inherent in all these models is the dynamic calibration of forecasts through uncovering the ‘situation-based’ forecast bias.
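A minimal end-to-end sketch of this pipeline follows. The records are synthetic, a hand-rolled *K*-means and a simple Gaussian error fit stand in for the actual algorithms, feature sets and data used in this study, and the two "regimes" are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic forecast records: two feature dimensions (e.g. predicted
# pressure and wind speed) plus the realized error of each past forecast.
features = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
                      rng.normal([5.0, 5.0], 1.0, size=(200, 2))])
errors = np.concatenate([rng.normal(0.0, 1.0, size=200),    # calm regime
                         rng.normal(1.0, 3.0, size=200)])   # volatile regime

def kmeans(X, k, iters=50, seed=0):
    """Minimal K-means: returns a cluster label for each row of X."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

labels = kmeans(features, k=2)

# Fit a Gaussian to the errors within each cluster and derive
# cluster-specific 90% PI offsets (mean +/- 1.645 standard deviations)
# to be added to a new point forecast assigned to that cluster.
z90 = 1.645
pi_offsets = {}
for j in range(2):
    e = errors[labels == j]
    mu, sd = e.mean(), e.std(ddof=1)
    pi_offsets[j] = (mu - z90 * sd, mu + z90 * sd)
```

The cluster whose past errors are more dispersed produces a wider interval, which is exactly the situation-dependent behaviour the clustering step is intended to capture; the cluster-wise mean offset simultaneously corrects the situation-based bias.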

The evaluation of PI forecasts, and of probabilistic forecasts in general, is more complex than that of point forecasts. To test the proposed approaches empirically, the PI models are applied to two real-world data sets, and a comprehensive evaluation framework covering all major measures found in the PI evaluation literature is developed. This framework also brings some new insights to the PI verification process, leading to fairer judgements.

The applicability and quality of the resulting PIs in practical scenarios are also investigated. The results provide insight into the roles of different aspects of the uncertainty modelling process, such as the clustering algorithm, the number of clusters, the feature set and the distribution-fitting method, and into their appropriate choice. In addition, the higher skill and quality of the output PIs compared with some baseline PI approaches and the raw point predictions of the NWP system demonstrate the advantages and value of the proposed models.

The next section provides basic definitions of PIs and explains the density-fitting methods used in the quantile calculation process. Section 'Using clustering techniques for uncertainty modelling' explains the application of clustering algorithms in the uncertainty modelling process. PI forecast quality measures and the verification framework are explained in Section 'An evaluation framework for PI forecasts'. Finally, Section 'Experimental study' reports the experimental results, while conclusions and future directions are provided in the last section.