## 1. Introduction

Verification of numerical weather prediction (NWP) models plays a central role in the improvement of both short- and medium-range forecasts. Deterministic and ensemble forecasts are assessed against observations to ascertain their skill, and the observations are assumed to be exact. Although this assumption does not hold in general, it is widely accepted in the context of verification. In recent years, many papers have discussed its validity and concluded that it may be legitimate at longer forecast ranges, when the forecast error is much larger than the observation uncertainty; nevertheless, it is inconsistent to accept that the model is uncertain while treating the observations as exact.

Earlier attempts to account for observation uncertainty can be found in Ciach and Krajewski (1999), Briggs *et al.* (2005) and Roberts and Lean (2008), who use information on the observation error to define the uncertainty. Bowler (2008) takes a slightly different approach, in which standard-deviation estimates from the data assimilation are used to quantify the error and hence the uncertainty.

Saetra *et al.* (2004) investigate the effects of observation errors on ensemble spread and reliability statistics by adding normally distributed noise, with a predefined standard deviation, to each ensemble member. The addition of this uncertainty reduces the number of outliers, leading to flatter rank histograms in the short-range forecast. Moreover, Saetra *et al.* show that rank histograms are highly sensitive to the inclusion of observation error in the verification process, whereas reliability diagrams are less sensitive: perfect observations and observations with added noise produce almost identical results. Similar results are discussed in Pappenberger *et al.* (2009), who classify observation uncertainty as a result of measurement errors, inhomogeneous observation density, or model or observation interpolation.
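The effect described by Saetra *et al.* can be illustrated with a minimal sketch. The code below is not their implementation; it assumes an idealized scalar variable, an illustrative ensemble size of 20, and illustrative spread and error values, and simply perturbs every member with Gaussian noise before ranking the observation, as their method prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_histogram(ensemble, obs, obs_error_std=0.0):
    """Rank of each observation within the (optionally perturbed) ensemble.

    ensemble: (n_cases, n_members) array of member forecasts
    obs: (n_cases,) array of verifying observations
    obs_error_std: assumed observation-error standard deviation; Gaussian
    noise with this spread is added to every member before ranking.
    """
    n_cases, n_members = ensemble.shape
    perturbed = ensemble + rng.normal(0.0, obs_error_std, ensemble.shape)
    ranks = (perturbed < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

# Idealized setup: the ensemble is reliable with respect to the true state,
# but the observations carry an error that is ignored in raw verification.
state = rng.normal(0.0, 1.0, 5000)                        # true state
obs = state + rng.normal(0.0, 1.0, 5000)                  # noisy observations
ens = state[:, None] + rng.normal(0.0, 0.3, (5000, 20))   # ensemble members

h_raw = rank_histogram(ens, obs)                  # U-shaped: many outliers
h_noisy = rank_histogram(ens, obs, obs_error_std=1.0)  # flatter histogram
```

In the raw case the observation frequently falls outside the ensemble envelope, inflating the extreme ranks; adding noise commensurate with the observation error reduces the outlier counts, which is the flattening reported by Saetra *et al.*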

Candille and Talagrand (2008; hereafter CT08) validate an ensemble prediction system by introducing the ‘observational probability’ method (hereafter OP) into the verification process. They find that reliability and discrimination are degraded, while resolution is improved. Observation uncertainty is represented by a normal distribution whose expectation and standard deviation are obtained by random draws from a normal and a lognormal distribution, respectively, which guarantees that the uncertainty varies in both mean and spread.
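The CT08 construction can be sketched as follows. This is a schematic reading of the scheme described above, not CT08's code: the function name and all parameter values (the spread of the mean, the lognormal parameters, the number of draws) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def observational_probability(y_obs, mu_spread=0.2,
                              log_sigma_mean=-1.0, log_sigma_std=0.3,
                              n_draws=1000):
    """Sample an 'observational probability' distribution around y_obs.

    The expectation is drawn from a normal distribution centred on the
    measured value, and the standard deviation from a lognormal
    distribution, so the resulting Gaussian uncertainty varies in both
    mean and spread from one observation to the next.
    """
    mu = rng.normal(y_obs, mu_spread)          # random expectation
    sigma = rng.lognormal(log_sigma_mean, log_sigma_std)  # random spread
    return rng.normal(mu, sigma, n_draws)      # samples of the "true" value

samples = observational_probability(10.0)
```

Because the spread itself is random, two observations with the same measured value can be assigned different uncertainties, which is the property the O-OP method later replaces with an empirical, station-based estimate.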

The present paper discusses the uncertainty associated with precipitation observations due to the inability of such observations to be representative of the area around them. This uncertainty is often referred to in the literature as ‘representativeness error’ and is linked to the large spatial variability of rainfall. The verification approach presented in this paper is applied to precipitation forecasts from two ensemble forecasting systems and includes the uncertainty as sampled from the spatial variability of observed precipitation. As an extension of OP, and because of its empirical nature, it is termed ‘observed observational probability’ (O-OP). This methodology aims at extending previous attempts to include uncertainty in the verification process to variables that are not Gaussian distributed. Particular attention has been paid to the asymmetry of the precipitation distribution within each grid box: the stronger the asymmetry, the more likely it is that representativeness issues play an important role in the computation of the scores. Synthetic data experiments are used to compare, in a theoretical context, O-OP with the methods assessed in CT08, as well as to discuss the dependence of model performance on the asymmetry of the observations' distribution. These experiments help in understanding the behaviour of different verification methods and support the results obtained with forecast data experiments.

Observation uncertainty is defined using information from high-density observation networks available in Europe. The stations contained within each model grid box are used to define the uncertainty, while their average, assigned to each grid point, is taken as the observed state of the atmosphere.
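The grid-box construction above can be sketched in a few lines. This is an illustrative reading, not the paper's implementation: the function name, the choice of quantiles and the crude mean-minus-median asymmetry indicator are all assumptions made here for clarity.

```python
import numpy as np

def grid_box_observation(station_values):
    """Summarize the stations falling inside one model grid box.

    Returns the box average (taken as the observed state at the grid
    point), empirical quantiles of the station values (which sample the
    representativeness uncertainty without assuming a Gaussian, or any
    symmetric, distribution), and a simple asymmetry indicator.
    """
    v = np.asarray(station_values, dtype=float)
    mean = v.mean()                                   # observed state
    quantiles = np.percentile(v, [10, 25, 50, 75, 90])  # empirical spread
    asymmetry = mean - np.median(v)                   # >0 if right-skewed
    return mean, quantiles, asymmetry

# A typical convective case: most stations are dry and a few report heavy
# rain, giving a strongly right-skewed within-box distribution (mm).
mean, q, asym = grid_box_observation([0.0, 0.0, 0.2, 0.4, 1.1, 12.5])
```

In such skewed cases the box average sits well above the median of the station values, which is precisely when, as argued above, representativeness issues matter most for the scores.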

Section 2 describes the observation database and the forecasting systems. Section 3 describes the metrics and the O-OP methodology applied to synthetic and real data, results are discussed in Section 4, and conclusions are drawn in Section 5.