A spatial consistency test for surface observations from mesoscale meteorological networks



A spatial consistency test (SCT) is applied to temperature observations of a high-resolution meteorological network composed of automatic surface weather stations. The SCT's purpose is twofold: preventing gross errors (GEs) from entering automatic numerical elaborations and returning a spatial consistency flag to an external quality-control system. The algorithm is based on Bayesian concepts and exploits the existing objective analysis scheme by comparing each observed value with the corresponding cross-validation (CV) analysis value. Local data density is automatically taken into account to allow a less restrictive test for isolated stations that provide precious information on poorly observed areas. Thresholds and parameters are estimated statistically for large datasets, thus eliminating any subjective and ad hoc tuning. Misjudgment rates are estimated for both missed and false rejections. Special attention is devoted to the problem of large representativity errors which, being dependent on a prescribed scale, do require multiple cross-checks to avoid confusion with proper GEs. Copyright © 2010 Royal Meteorological Society

1. Introduction

The presence of a robust and objective data quality-control (DQC) system is very important for local, automatic meteorological networks, which, owing to their high sampling density and frequency, often produce an abundance of observations from measurement sites that may not fully comply with World Meteorological Organization (WMO) standard criteria on observation-site location and maintenance. Such observations may frequently be affected by gross errors, systematic errors and large representativity errors.

Guidelines on the minimum specific necessities of quality controls on automated stations have been specified by the WMO (Zahumenský 2004; WMO 2008), both at station locations and at data collection centres (internal, temporal and spatial consistency). Data quality, which must be specified and guaranteed by the organization that disseminates the data, can only be monitored and maintained by a comprehensive quality-assurance (QA) system.

This can consist of operational procedures regarding siting specifics, field inspections and station maintenance, and should include a structured system of quality-control tests and decisional algorithms. Generally, results should be reviewed by human operators. An operational example is given by the Oklahoma Mesonet (Shafer et al. 2000; Fiebrich and Crawford 2001; McPherson et al. 2007). Efforts to document and compare quality-control systems can also be found in Steinacker et al. (2000), Vejen et al. (2002) and Schroeder et al. (2005).

Most of the work on the subject of DQC has been devoted to synoptic observations (Lorenc 1984; Gandin 1988). This is partly due to the fact that the synoptic scale has been the traditional spacing for national networks. However, this is also due to the stringent requirements on data quality posed by research on past climate and numerical model initialization. In the framework of a data-assimilation cycle, a model forecast with known error statistics provides useful and reliable a priori information. This provides the means for basing the quality control on a theoretical Bayesian framework (Lorenc and Hammon 1988; Dee et al. 2001), thus relying less on ad hoc, parameter-by-parameter tuning of coefficients and thresholds.

This work focuses on an automatic spatial consistency test (SCT). The basic assumption of any SCT is that observables are spatially correlated on scales that can be observed by the network: then, neighbouring measurements can be used to produce an a priori estimate for each observation. As a consequence, an SCT is particularly well suited for high-density networks, the resolution of which is sufficient to detect the dynamic scales of interest correctly. In practice, the application of an SCT may still be problematic for rapidly decorrelating parameters such as, for example, cumulative convective precipitation as measured by rain-gauges, though an SCT might still provide some useful information for long cumulation periods (i.e. one day or longer) and it is sometimes used operationally (Shafer et al. 2000).

An SCT algorithm, largely based on the work of Lorenc and Hammon (1988) (hereafter LH88) and exploiting the availability of the objective analysis scheme described by Uboldi et al. (2008) (hereafter ULS08), is here applied to Lombardia's (Italy) meteorological observation network, shown in Figure 1. The area covered by the network extends south from the Alpine ridge, spanning the wide and urbanized Po Plain, and is delimited by the northernmost slopes of the Apennines. It is a densely populated area, with complex orography and land-use. Elevations range from 15 to 4000 m above mean sea level (a.m.s.l.) and, though landlocked, the area is characterized by the presence of important water masses (three main glacial lakes and major rivers).

Figure 1.

Orography (grey shades), main glacial lakes (cyan) and temperature station locations (red triangles) in Lombardia's domain. The bold black line is the administrative boundary. Toponyms cited in the text are also indicated. The inset shows the geographical location of Lombardia (Italy), longitude 8.5–11.5 E, latitude 44.6–45.7 N.

The network density, though not uniform, is quite high, at least for rain-gauges and thermometers. Other variables are measured more sparsely, as the station instrumentation is not homogeneous through the network, for historical reasons. Sampling frequency also varies; in the current operational setting, only hourly averaged observations are distributed and available for numerical elaborations.

At present (2009), the QA system at ARPA Lombardia includes the SCT as one of the routine checks for temperature. For all observed variables, a climatological range check is performed at first. Then the various tests (persistence, step, SCT) are performed in parallel and their results are combined by means of a simple decisional algorithm. The automatic DQC filters all data entering real-time automatic numerical elaborations. A human operator periodically reviews the results of the automated tests and assigns a final QA flag before certified dissemination.

The SCT described in this article is currently applied to hourly temperature observations only, though it can in principle be applied to more frequent sampling times and to other variables.

In the temperature field, Lombardia's network efficiently resolves most mesoscale features up to the γ-mesoscale (Thunis and Bornstein 1996) such as major valley cold pools, surface effects of Po Plain thermal inversions (ULS08), widespread foehn warming or heat-island effects of major cities; the network is unable to resolve correctly phenomena with length-scales below 10–20 km, such as cold outflows associated with isolated convective cells or katabatic warming in small lateral valleys.

ULS08 showed that the network can actually provide useful meteorological information, which is correctly retrieved by an objective analysis scheme based on a modified optimal interpolation (OI) algorithm. ULS08 also pointed out the importance of automatic data-quality procedures for preventing deleterious gross errors from entering the analysis. The objective analysis scheme itself is a powerful tool in developing an SCT unambiguously based on statistical hypothesis.

This work describes how the SCT is applied to temperature measurements using, as a priori information, the OI-based cross-validation (CV) analysis. Section 2 describes the error model, formulates the test and discusses its characteristics and limitations. Section 3 presents the statistical estimation of parameters and a quantitative evaluation of test performance. To clarify the advantages and limitations of the SCT with special regard to the occurrence of large-amplitude representativity errors, SCT application to a critical weather case is discussed in Section 4.

2. The error model and the spatial consistency test

2.1. Error model

An OI scheme combines background and observations in an analysis field, efficiently dealing with normally distributed errors, provided that the estimates of the a priori error covariances are correct. In operational practice, in addition to biases and large representativity errors (Section 2.3), observations can be affected by rough errors of various origin that completely destroy any informational content from those particular measurements (Gandin 1988).

Some rough errors are so large that they make an observation completely unrealistic. Such errors are easily detected by a simple climatological check: any observed value lying outside the climatological interval $[y^{\min}, y^{\max}]$ is judged to be affected by such an error and rejected or flagged, depending on the operational choices. It is assumed here that such a climatological check is always performed on every observation before it undergoes the SCT and that observations failing the climatological check are rejected so that they do not undergo the SCT.

Observations that pass the climatological check have plausible values. However, they can still be affected by rough errors, making their values useless. Gross errors (GEs) are rough errors that affect plausible observations (LH88). Our goal is to detect such GEs, to reject the corresponding observations from the automatic OI procedure (i.e. not to use them to estimate an objectively analyzed field) and to return a spatial consistency flag to the external QA system.

An observation affected by a GE can assume any value within the climatological interval. In other words, the conditional probability density for an observation affected by a GE is uniform over the climatological interval $[y^{\min}, y^{\max}]$. If $O$ is the (differential) event ‘the observed value is between $y^o$ and $y^o + \mathrm{d}y^o$’, and $G$ is the (discrete) event ‘a GE occurs’, then the probability of event $O$, conditional on event $G$, is

$$P(O \mid G) = \frac{\mathrm{d}y^o}{y^{\max} - y^{\min}} \qquad (1)$$

It is also assumed that if a GE does not occur (i.e. the event $\bar{G}$, complementary to $G$, occurs) then the observed values are normally distributed around the (unknown) true atmospheric value, with appropriate variance. Furthermore, it is assumed that some kind of a priori information concerning the true value is available and that it is modelled using a Gaussian distribution. In LH88, such information is provided by the (model) background value. Here the observation $y^o_m$ is compared, instead, with the cross-validation (CV) analysis $\check{y}^a_m$, realized using all other observations but without using the $m$th observation itself. In other words, the CV analysis contains all the information available on the observed quantity, with the exception of the observation that undergoes the SCT: as a consequence, the $m$th observation error and the $m$th CV analysis error are uncorrelated. In the present approach, the CV analysis represents the a priori information. The probability of event $O$, conditional on event $\bar{G}$, is then

$$P(O \mid \bar{G}) = N\!\left(y^o_m \mid \check{y}^a_m,\ \sigma^2_{o,m} + \check{\sigma}^2_{a,m}\right)\mathrm{d}y^o \qquad (2)$$

where the functional expression $N(y \mid \bar{y}, \sigma^2)$ indicates the normal (Gaussian) distribution for the variable $y$ with mean $\bar{y}$ and variance $\sigma^2$. Since the observational error and the CV analysis error are uncorrelated, the variance of the CV residual has been written as the sum of the observation-error variance $\sigma^2_{o,m}$ and the CV analysis-error variance $\check{\sigma}^2_{a,m}$.

Finally, the unconditional probability of event $O$ is

$$P(O) = P(O \mid G)\,P(G) + P(O \mid \bar{G})\,[1 - P(G)] \qquad (3)$$

where $P(G)$ is the prior probability of a GE occurrence, which summarizes the global reliability of the observational network.

2.2. Spatial consistency test

Based on the error model described in Section 2.1 and following LH88, the SCT is stated here as follows: when the square of the CV residual, $(y^o_m - \check{y}^a_m)^2$, is larger than an appropriate threshold, the observation is judged to be affected by a GE and rejected. The amplitude of the threshold is proportional to the variance of the CV residual:

$$(y^o_m - \check{y}^a_m)^2 > T^2\left(\sigma^2_{o,m} + \check{\sigma}^2_{a,m}\right) \qquad (4)$$

The choice of the threshold is then transferred to choosing a value for the parameter $T^2$. LH88, by applying Bayes' theorem to an error model analogous to that of Section 2.1, showed that a test condition like Eq. (4) is equivalent to $P(G \mid O) > 0.5$ (i.e. the conditional probability that the observation is affected by a GE, once the actual observed value is known, exceeds 50%). Moreover, they linked the choice of a value for $T^2$ to $P(G)$. In our case we obtain

$$T^2 = 2\ln\left[\frac{1 - P(G)}{P(G)}\right] + 2\ln\left[\frac{y^{\max} - y^{\min}}{\sqrt{2\pi\left(\sigma^2_{o,m} + \check{\sigma}^2_{a,m}\right)}}\right] \qquad (5)$$

We do not replicate here the calculations of LH88. We simply remark that, with respect to their discussion, here the CV analysis substitutes the background value as prior information. Consequently, the CV analysis-error variance, $\check{\sigma}^2_{a,m}$, replaces the background-error variance in Eqs (4) and (5).

The first term in Eq. (5) depends on $P(G)$. A ‘reliable’ network, characterized by a smaller $P(G)$, will have a larger $T^2$, resulting in a less restrictive test than for an observational network characterized by a larger $P(G)$.
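The equivalence between the threshold test and the condition that the posterior GE probability exceeds 50% can be checked numerically. The following Python sketch is illustrative only and is not the operational code: it assumes the uniform/Gaussian mixture of Section 2.1, uses hypothetical names (`y_min`, `y_max`, `p_ge`, `var_o`, `var_cv`) for the climatological bounds, the prior GE probability and the two error variances, and computes the tolerance by solving the 50% posterior condition for that mixture.

```python
import math


def gaussian_pdf(y, mean, var):
    """Normal probability density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_ge(y_obs, y_cv, var_o, var_cv, p_ge, y_min, y_max):
    """Posterior probability of a gross error, P(G|O), from Bayes' theorem.

    Assumes a uniform density over [y_min, y_max] when a GE occurs and a
    Gaussian centred on the CV analysis otherwise.
    """
    like_ge = 1.0 / (y_max - y_min)
    like_good = gaussian_pdf(y_obs, y_cv, var_o + var_cv)
    weighted_ge = like_ge * p_ge
    return weighted_ge / (weighted_ge + like_good * (1.0 - p_ge))


def threshold_t2(var_o, var_cv, p_ge, y_min, y_max):
    """Tolerance T^2 for which 'squared CV residual > T^2 * variance'
    coincides with P(G|O) > 0.5 under the assumed mixture."""
    prior_term = 2.0 * math.log((1.0 - p_ge) / p_ge)
    scale_term = 2.0 * math.log(
        (y_max - y_min) / math.sqrt(2.0 * math.pi * (var_o + var_cv))
    )
    return prior_term + scale_term
```

With made-up inputs, an observation whose squared CV residual equals `threshold_t2(...) * (var_o + var_cv)` has a posterior GE probability of one half, and lowering `p_ge` raises the tolerance, as the text states for a more reliable network.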

The estimation of $T^2$ and $\sigma^2_o$ for Lombardia's observational network is discussed in Section 3. Equations (4) and (5) depend on the error variance, $\sigma^2_{o,m}$, characteristic of each station. We anticipate here that we chose to replace it with $\sigma^2_o$, a quantity characterizing the network globally rather than each single station. The SCT is nevertheless specific for each station, depending on the local data density: this point is discussed in Section 2.4.

Due to the statistical nature of the SCT, it is possible that some observations affected by a GE pass the test, and that some ‘good’ observations are rejected. As a consequence, the probability of a GE occurrence, $P(G)$, differs from $P(F)$, the probability of the event $F$ that an observation fails the test:

$$P(F) = \left[1 - P(\bar{F} \mid G)\right]P(G) + P(F \mid \bar{G})\,[1 - P(G)] \qquad (6)$$

Eq. (6) is easily verified by using the definition of conditional probability. The probability for a GE to pass the SCT, $P(\bar{F} \mid G)$, and the probability for ‘good’ observations to fail the SCT, $P(F \mid \bar{G})$, are defined and discussed as follows.

2.2.1. Probability for a gross error to go undetected

An observation affected by a GE can pass the test if the observed value is close enough to the CV analysis. This may happen when the amplitude of the GE is rather small with respect to the random noise level in the data. Such an observation should not, in principle, enter the analysis, but this kind of GE is almost impossible to detect by means of a single test (Collins and Gandin 1990). Since the GE conditional-probability distribution, $P(O \mid G)$, is constant over the climatological range, the conditional probability that an observation affected by a GE passes the SCT is given by the ratio of the length of the ‘pass’ interval around the CV analysis, $2T\sqrt{\sigma^2_{o,m} + \check{\sigma}^2_{a,m}}$, to the length of the climatological range:

$$P(\bar{F} \mid G) = \frac{2T\sqrt{\sigma^2_{o,m} + \check{\sigma}^2_{a,m}}}{y^{\max} - y^{\min}} \qquad (7)$$

A realistic example of a small-amplitude GE passing the SCT is that of a temperature-observing station, the reports of which are for some reason ‘stuck’ at a plausible value, say 0 °C in winter. Along a 24 hour period, this observation will fail the SCT in some hours and pass the SCT in some others, depending on changes in the real temperature field observed by the other stations. Such a GE may be easily detected by other tests in the external quality-control scheme (such as a simple persistence test).
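The 'stuck sensor' example can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions: the function and argument names are hypothetical, the pass interval is taken as the CV analysis plus or minus $T$ times the standard deviation of the CV residual, and truncation of that interval at the climatological bounds is ignored.

```python
import math


def prob_ge_passes(t2, var_o, var_cv, y_min, y_max):
    """Probability that a uniformly distributed gross error passes the test:
    length of the pass interval divided by the climatological range."""
    half_width = math.sqrt(t2 * (var_o + var_cv))
    return min(1.0, 2.0 * half_width / (y_max - y_min))


def stuck_sensor_passes(cv_series, stuck_value, t2, var_o, var_cv):
    """For each hour, True if a report stuck at `stuck_value` passes the test
    given that hour's CV analysis value."""
    limit = t2 * (var_o + var_cv)
    return [(stuck_value - y_cv) ** 2 <= limit for y_cv in cv_series]
```

For instance, with a tolerance of 16 and a total residual variance of 1, a report stuck at 0 °C passes whenever the CV analysis is within 4 °C of zero and fails otherwise, which is why a persistence test is a useful complement.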

2.2.2. Probability for a good observation to fail the SCT

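The CV residual can also be written as the difference between the observation error and the CV analysis error:

$$y^o_m - \check{y}^a_m = (y^o_m - y^t_m) - (\check{y}^a_m - y^t_m) = \varepsilon^o_m - \check{\varepsilon}^a_m \qquad (8)$$

where $y^t_m$ is the (unknown) true value at station $m$.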

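If the observation is not affected by a GE, i.e. conditional on the event $\bar{G}$, the CV analysis error and the observational error are random errors, normally distributed and uncorrelated. As a consequence (Dee et al. 2001; Tarantola 2005), their squared and normalized difference

$$\tau^2 = \frac{(\varepsilon^o_m - \check{\varepsilon}^a_m)^2}{E\left[(\varepsilon^o_m - \check{\varepsilon}^a_m)^2\right]} = \frac{(y^o_m - \check{y}^a_m)^2}{\sigma^2_{o,m} + \check{\sigma}^2_{a,m}} \qquad (9)$$

is distributed according to the well-known $\chi^2$ distribution (here with one degree of freedom, since a single residual is involved). Here $E[\cdot]$ is the expectation operator.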

Let $\alpha$ be the probability that a good observation (event $\bar{G}$) is rejected by the SCT. The failure condition, Eq. (4), is equivalent to the condition $\tau^2 > T^2$. Then

$$\alpha = P(F \mid \bar{G}) = P(\tau^2 > T^2 \mid \bar{G}) \qquad (10)$$

In other words, good observations failing the SCT lie beyond the $(1 - \alpha)$ quantile of the $\chi^2$ distribution:

$$T^2 = \chi^2_{1-\alpha} \qquad (11)$$

where $\alpha$ can be easily determined as a function of $T^2$. This value can be taken as a minimum estimate: as is discussed in Sections 3 and 4, the simple error model of Section 2.1 does not take into account the presence of large-amplitude representativity errors, which should not be mistaken for GEs but are necessarily flagged as such by the SCT.
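As a minimal sketch of how the false-rejection rate follows from the tolerance, the snippet below assumes that the normalized squared residual of a good observation follows a $\chi^2$ distribution with one degree of freedom, for which the upper tail probability can be written with the complementary error function. The function names are hypothetical, and the inverse is obtained by simple bisection rather than by a library quantile function.

```python
import math


def false_rejection_rate(t2):
    """P(tau^2 > T^2) for a chi-square variable with one degree of freedom,
    i.e. P(|Z| > T) for a standard normal Z."""
    return math.erfc(math.sqrt(t2 / 2.0))


def tolerance_for_rate(alpha, lo=0.0, hi=100.0, iterations=200):
    """Invert false_rejection_rate by bisection (the rate decreases with T^2)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if false_rejection_rate(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the rate drops quickly with the tolerance, even moderate values of $T^2$ correspond to very small nominal false-rejection rates under the Gaussian error model; as noted in the text, the actual rate is higher when large representativity errors are present.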

2.3. Systematic error and representativity error

The simple error model outlined in Section 2.1 allows the formulation of an SCT aimed at discriminating between GEs and normally distributed observational errors. Observations from any real network are known, however, to be affected by other kinds of errors, primarily systematic errors (biases) and representativity errors. This is particularly true for networks aiming to resolve the mesoscale in complex terrain.

Bias-estimation techniques exploit the characteristics of systematic errors, usually small and persistent in time, by analyzing the observational time series. No bias is introduced by OI, but the presence of systematic errors in either the observations or the background field can influence the parameter estimation, typically leading to an overestimation of the error variances (Dee and Da Silva 1998).

The main limitation regarding the assumption of a normal distribution of observational errors is perhaps due to representativity. Representativity (or ‘representativeness’) errors, which are typically flow-dependent and non-Gaussian in their distribution, are not measurement errors (LH88): they are always defined with reference to a ‘model’ (in the most general sense of the word) of reality. In the case of spatial interpolation, the ‘model’ is represented by the specification of error covariances, which determine the scale of the analysis field. The possibility of resolving small scales by means of simple correlation functions is determined by the network station distribution. Smaller scales and very local responses to mesoscale dynamics may affect just one or very few observations and determine large innovations and CV residuals. A representativity error is in this case the component of an observed value due to small scales that the analysis scheme, with the prescribed covariances, is unable to resolve.

Examples concerning surface-temperature observations in Lombardia are tertiary circulation in lateral alpine valleys, steep slopes differently exposed to solar radiation, proximity to important water masses and many other micrometeorological phenomena strongly dependent on land use.

If the analysis scheme is improved, and smaller-scale features are included, representativity errors are reduced. It may be instructive to consider briefly the case of urban heat islands. Recently, the error-covariance model was modified to account for the heat-island effect around large urban areas (Lussana et al. 2009), by introducing anisotropic correlations. With the previous scheme, based on isotropic correlations, heat-island effects gave rise to large representativity errors within and around large- and medium-sized urban areas. The current scheme is able to represent these phenomena, effectively reducing such errors.

In principle, representativity errors are treated in OI as an additional component of observation error, so that very small spatial scales influencing the measurements are filtered out. In practice, though, it may happen that very large representativity errors determine the presence of unrealistic features in the objectively analyzed fields (Kalnay et al. 1996). Section 4 presents a detailed discussion of our operational choices with reference to an application case where representativity errors are particularly important.

2.4. Alternative form of the SCT

Exploiting the properties of the OI scheme and under the same assumptions, in particular that the observation-error covariance matrix is diagonal, we obtain an equivalent form for the SCT that depends on a smaller number of parameters than Eq. (4). This form is more convenient to show how the SCT discriminates between isolated stations and stations in data-dense areas.

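The OI analysis estimation for station points is

$$\mathbf{y}^a = \mathbf{y}^b + \mathbf{W}(\mathbf{y}^o - \mathbf{y}^b) \qquad (12)$$

where $\mathbf{y}^b$ is the vector of background values and $\mathbf{W}$ is the influence matrix (i.e. the gain matrix for the analysis on station points):

$$\mathbf{W} = \mathbf{S}(\mathbf{S} + \mathbf{R})^{-1} \qquad (13)$$

The components of the symmetric matrix $\mathbf{S}$ are the background-error covariances between pairs of station points and $\mathbf{R}$ is the observation-error covariance matrix:

$$\mathbf{S} = E\left[(\mathbf{y}^b - \mathbf{y}^t)(\mathbf{y}^b - \mathbf{y}^t)^\top\right] \qquad (14)$$
$$\mathbf{R} = E\left[(\mathbf{y}^o - \mathbf{y}^t)(\mathbf{y}^o - \mathbf{y}^t)^\top\right] \qquad (15)$$

where $\mathbf{y}^t$ is the vector of the unknown ‘true’ values.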

A widely used assumption, reasonable for station data, is that the observational errors at different stations are uncorrelated, so that $\mathbf{R}$ is a diagonal matrix:

$$\mathbf{R} = \mathrm{diag}\left(\sigma^2_{o,1}, \ldots, \sigma^2_{o,M}\right) \qquad (16)$$

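In the present conditions, as is shown in Appendix A, the analysis residual $y^o_m - y^a_m$ and the CV residual $y^o_m - \check{y}^a_m$ are linked by

$$y^o_m - \check{y}^a_m = \frac{y^o_m - y^a_m}{\sigma^2_{o,m}\, z_m} \qquad (17)$$

where $z_m$ is the $m$th diagonal element of the inverse of $\mathbf{S} + \mathbf{R}$:

$$z_m = \left[(\mathbf{S} + \mathbf{R})^{-1}\right]_{mm} \qquad (18)$$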

As a consequence, the knowledge of the diagonal elements $z_m$ allows the evaluation of each CV analysis $\check{y}^a_m$ directly from the analysis $y^a_m$, without performing a complete OI solution for each observation. Explicit knowledge of the diagonal elements (in fact of the whole matrix) is made possible by using the solution algorithm described in ULS08.
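The shortcut can be illustrated with a small NumPy sketch, which is not the ULS08 solution algorithm. It assumes a diagonal observation-error covariance matrix and uses the leave-one-out identity that the CV residual equals the analysis residual divided by the product of the station's observation-error variance and the corresponding diagonal element of the inverse of the sum of the two covariance matrices. A brute-force version, which redoes the OI without each station in turn, is included only as a cross-check; all function names are hypothetical.

```python
import numpy as np


def oi_analysis(S, R, yb, yo):
    """OI analysis at station points: ya = yb + S (S + R)^-1 (yo - yb)."""
    return yb + S @ np.linalg.solve(S + R, yo - yb)


def cv_analysis_shortcut(S, R, yb, yo):
    """CV analysis from a single OI solution (R must be diagonal)."""
    ya = oi_analysis(S, R, yb, yo)
    z = np.diag(np.linalg.inv(S + R))   # diagonal of (S + R)^-1
    sig2o = np.diag(R)                  # station observation-error variances
    return yo - (yo - ya) / (sig2o * z)


def cv_analysis_bruteforce(S, R, yb, yo):
    """CV analysis obtained by leaving out one observation at a time."""
    n = yo.size
    ycv = np.empty(n)
    for m in range(n):
        keep = np.arange(n) != m
        A = S[np.ix_(keep, keep)] + R[np.ix_(keep, keep)]
        ycv[m] = yb[m] + S[m, keep] @ np.linalg.solve(A, yo[keep] - yb[keep])
    return ycv
```

The shortcut needs one matrix inversion for the whole network, whereas the brute-force version solves one linear system per station.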

In the same conditions ($\mathbf{R}$ diagonal), it is shown in Appendix B that

$$\sigma^2_{o,m} + \check{\sigma}^2_{a,m} = \frac{1}{z_m} \qquad (19)$$

By substituting Eqs (17) and (19) in Eq. (4), an equivalent form is obtained for the SCT:

$$(y^o_m - y^a_m)(y^o_m - \check{y}^a_m) > T^2 \sigma^2_{o,m} \qquad (20)$$

Since, in particular, $\sigma^2_{o,m}$ includes the effects of representativity errors (Section 2.3), it should vary in space (from station to station) and in time (because the representativity error is flow-dependent). It is also possible to specify different values for the observation-error variance, to characterize subnetworks with different instrumental or operational characteristics. In the current implementation of the OI scheme, however, all observations are assumed to have the same observational variance, $\sigma^2_o$ (thus averaging the effect of representativity error), and

$$\mathbf{R} = \sigma^2_o\,\mathbf{I} \qquad (21)$$

where $\mathbf{I}$ is the identity matrix. The form of the test used in Section 3 and in the operational implementation is then

$$(y^o_m - y^a_m)(y^o_m - \check{y}^a_m) = \tau^2 \sigma^2_o > T^2 \sigma^2_o \qquad (22)$$

where $\tau^2$ has been defined in Eq. (9). Because each observation affects the CV analysis at nearby stations, when more than one observation fails the SCT only the one having the largest square residual (left-hand side of Eq. (22)) is flagged. All the CV analyses are then recomputed without using that observation and the SCT is repeated until no observation fails the test.
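The iterative procedure just described can be sketched as follows. This is a simplified illustration, not the operational implementation: it assumes a uniform observation-error variance `sig2o`, a background-error correlation matrix `S_tilde` with a known variance ratio `eps2` (observation-error variance divided by background-error variance), and a single tolerance `t2` for all stations. The test quantity is the product of the analysis residual and the CV residual, compared with `t2 * sig2o`; all names and values are hypothetical.

```python
import numpy as np


def iterative_sct(S_tilde, yb, yo, eps2, sig2o, t2):
    """Return the indices of flagged observations, in order of rejection.

    At each pass only the observation with the largest test quantity is
    flagged; residuals are then recomputed without it.
    """
    active = np.ones(yo.size, dtype=bool)
    flagged = []
    while active.sum() > 1:
        idx = np.flatnonzero(active)
        A = S_tilde[np.ix_(idx, idx)] + eps2 * np.eye(idx.size)
        A_inv = np.linalg.inv(A)
        innov = yo[idx] - yb[idx]
        ares = eps2 * (A_inv @ innov)            # analysis residual yo - ya
        cvres = ares / (eps2 * np.diag(A_inv))   # CV residual yo - ycv
        score = ares * cvres                     # test quantity per station
        worst = int(np.argmax(score))
        if score[worst] <= t2 * sig2o:
            break                                # no observation fails
        flagged.append(int(idx[worst]))
        active[idx[worst]] = False
    return flagged
```

Flagging one observation per pass avoids rejecting the good neighbours of a bad station, whose CV analyses are contaminated by the erroneous value until it is removed.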

It is now instructive to comment on the extreme (and idealized) cases of observations that are completely isolated and totally redundant.

2.4.1. Completely isolated (decorrelated) observations

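For a completely isolated observation, the background errors of which are completely decorrelated from all others, the CV analysis coincides with the background value: $\check{y}^a_m = y^b_m$. From Eq. (19),

$$\frac{1}{z_m} = \sigma^2_o + \check{\sigma}^2_{a,m} = \sigma^2_o + \sigma^2_{b,m}$$

where $\sigma^2_{b,m}$ is the background-error variance at station $m$. As a consequence,

$$y^o_m - y^a_m = \frac{\sigma^2_o}{\sigma^2_o + \sigma^2_{b,m}}\,(y^o_m - y^b_m) \qquad (23)$$

The SCT, Eq. (22), becomes in this case

$$(y^o_m - y^b_m)^2 > T^2\left(\sigma^2_o + \sigma^2_{b,m}\right) \qquad (24)$$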

The SCT for completely isolated stations is then equivalent to an SCT that makes use of the background as prior information.

2.4.2. Totally redundant observations

A totally redundant observing station is such that the CV analysis is always identical to the analysis: $\check{y}^a_m = y^a_m$. In this case, from Eqs (17) and (21), $\sigma^2_o z_m = 1$ and the SCT reduces to

$$(y^o_m - \check{y}^a_m)^2 > T^2 \sigma^2_o \qquad (25)$$

This particular case verifies the general expression of the test, Eq. (4), because the theoretical estimate of the analysis-error variance for such a station is zero: $\check{\sigma}^2_{a,m} = 0$.

It is perhaps worthwhile remarking that the concept, used here, of ‘totally redundant’ observations is extremely idealized: ‘redundancy’ is an essential requirement of any observational system (Gandin 1988).

These extreme and idealized cases show that the SCT automatically accounts for inhomogeneity in the data distribution. In fact, the test is less restrictive, Eq. (24), for an isolated observation that provides precious information in a case in which alternative information inferred from other sources is less reliable. The test is more restrictive, Eq. (25), for observations located in densely observed areas, where alternative information is abundant. This property is also demonstrated in Section 4, where the SCT is applied to a critical case study.

Between these idealized extrema, real stations are located in sites that differ in data density. One of the parameters that can be used to characterize the local data density is the ratio $\check{\sigma}^2_{a,m}/\sigma^2_o$. This quantity (also useful for the discussion in Section 3.3) is known for each station, as it is calculated from the OI covariances by Eq. (19). Its extreme values are 0 for totally redundant stations and (with the current assumptions for the OI scheme) 2.00 for completely isolated stations. The distribution of $\check{\sigma}^2_{a,m}/\sigma^2_o$ among the network stations has median 0.31 and first and third quartiles 0.22 and 0.47, respectively. This distribution is quite asymmetrical: the network is more ‘redundant’ than ‘isolated’. This is of course a consequence of the choice of parameter values for the OI correlations (ULS08).
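As a hedged sketch of how such a data-density ratio could be computed, the snippet below assumes a uniform observation-error variance, a background-error correlation matrix `S_tilde` and a variance ratio `eps2` (observation-error variance over background-error variance); these names are hypothetical. Under these assumptions the ratio of the CV analysis-error variance to the observation-error variance is the inverse of `eps2` times the diagonal of the inverse of `S_tilde + eps2 * I`, minus one. With `eps2 = 0.5`, a value chosen here so that the isolated limit matches the 2.00 quoted in the text, a decorrelated station gives exactly 2.

```python
import numpy as np


def density_ratio(S_tilde, eps2):
    """Ratio of CV analysis-error variance to observation-error variance
    for each station (0: totally redundant, 1/eps2: completely isolated)."""
    n = S_tilde.shape[0]
    A_inv = np.linalg.inv(S_tilde + eps2 * np.eye(n))
    return 1.0 / (eps2 * np.diag(A_inv)) - 1.0


def gaussian_correlation(x, length_scale):
    """Toy isotropic correlation between stations at 1-D positions x."""
    dx = x[:, None] - x[None, :]
    return np.exp(-0.5 * (dx / length_scale) ** 2)
```

In a toy network with four closely spaced stations and one far away, the clustered stations get a ratio below 2 while the distant one reaches the isolated limit.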

3. Estimation of threshold parameters

The test threshold parameters appearing in Eq. (22), $\sigma^2_o$ and $T^2$, are estimated as outlined in Sections 3.2 and 3.3. Since the technique described in Section 3.2 is strictly valid for observations that are not affected by GEs, an iterative procedure is applied. The procedure of Section 3.2 is applied first to the raw dataset to provide a tentative estimate of $\sigma^2_o$. This value is used to estimate $T^2$ (Section 3.3) so that the SCT can be applied (to the whole dataset), then $\sigma^2_o$ is estimated again by only using observations that pass the SCT. Convergence is quickly obtained, as the corrected $\sigma^2_o$ does not change the $T^2$ estimate.

3.1. Statistical dataset and uncorrelated processes

The tuning of statistical parameters and thresholds used in the SCT must rely on large statistical datasets. As discussed by Gandin et al. (1993), it is reasonable to assume that the whole set of GEs behaves like random errors only if the number of observations used is large enough to compensate for the fact that the (rare) GE occurrences may be correlated in time.

For this reason, the statistical set used in this section is composed of the (hourly mean) temperature observations for the years 2006, 2007 and 2008. The entire set contains about $7 \times 10^6$ hourly observations corresponding to 26 304 network measurement realizations, one every hour.

This large dataset is not well suited for parameter estimation because adjacent realizations are correlated in time. It is possible, though, to extract several uncorrelated processes from this dataset.

A process is defined as a sequence of network realizations corresponding to states of the atmosphere that are meteorologically uncorrelated. Each process is composed of 360 elements, randomly extracted from the overall statistical set using the uniform distribution. The size of the processes is chosen in such a way that the average interval between each sampled time and the next is about three days. On the one hand, three days is enough to assume that two realizations (mostly referring to different hours of the day) are uncorrelated. On the other hand, the 360 realizations contained in a process are enough to include a wide range of meteorological phenomena. By repeating the estimation procedure with different processes it is also possible to evaluate the uncertainties associated with the estimated parameters ($\sigma^2_o$ and $T^2$).
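The sampling of a process can be sketched in a few lines of Python. This is an illustration under simple assumptions: realizations are indexed by hour from 0 to 26 303, sampling is uniform without replacement, and the function names are hypothetical.

```python
import random

N_HOURS = 26304       # hourly realizations in 2006-2008
PROCESS_SIZE = 360    # realizations per process


def draw_process(rng, n_hours=N_HOURS, size=PROCESS_SIZE):
    """Draw one process: sorted hour indices sampled uniformly, no repeats."""
    return sorted(rng.sample(range(n_hours), size))


def mean_spacing_days(n_hours=N_HOURS, size=PROCESS_SIZE):
    """Average interval between consecutive sampled times, in days."""
    return n_hours / size / 24.0
```

With these numbers the average spacing is 26 304 / 360 ≈ 73 hours, i.e. about three days, consistent with the choice described above. Drawing several processes with different seeds gives the spread of the estimated parameters.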

A possible drawback of the statistical estimation procedure is that infrequent events may originate large residuals which can then be misleadingly interpreted as GEs by the SCT. To address this issue, the behaviour of the SCT in weather situations characterized by strong gradients is discussed in Section 4.

3.2. Maximum-likelihood estimation of the observation-error variance

All assumptions made for the OI scheme are made for the SCT as well, with one difference. The OI scheme implemented at ARPA Lombardia (ULS08) assumes implicit knowledge of the observation-error variance, $\sigma^2_o$, as only the ratio between observation- and background-error variances actually enters the interpolation scheme. For the SCT, as stated in Eq. (22), the value of $\sigma^2_o$ must instead be explicitly estimated. To do this we make use of a maximum-likelihood technique, similar to that described by Dee and Da Silva (1999).

At each time $t$, the innovation vector $\mathbf{d}_t = \mathbf{y}^o_t - \mathbf{y}^b_t$ is regarded as the realization of an unbiased Gaussian random variable $\mathbf{d}_t \sim N(\mathbf{0}, \mathbf{D}_t)$ (thus neglecting the presence of non-Gaussian errors such as GEs). If a sequence of observation times $\{t_1, \ldots, t_K\}$ is selected in such a way that the resulting process $\{\mathbf{d}_t\}$ is white and Gaussian, then its probability density function (pdf) is

$$p(\{\mathbf{d}_t\}) = \prod_{t} (2\pi)^{-M_t/2}\,|\mathbf{D}_t|^{-1/2} \exp\left(-\tfrac{1}{2}\,\mathbf{d}_t^\top \mathbf{D}_t^{-1} \mathbf{d}_t\right) \qquad (26)$$

where $M_t$ is the number of observations available at time $t$. $\mathbf{D}_t$ is the innovation covariance matrix,

$$\mathbf{D}_t = \mathbf{S}_t + \mathbf{R}_t \qquad (27)$$

where observation and background errors are assumed to be uncorrelated. In the OI scheme it is assumed that error covariances are stationary in time: here the subscript $t$ only represents changes in the $\mathbf{D}_t$ matrix dimension due to missing data.

It is now necessary to recall other assumptions made for the OI implementation. The observation-error covariance matrix is estimated using Eq. (21). The (station–station) background-error covariance is estimated as $\mathbf{S} = \sigma^2_b\,\tilde{\mathbf{S}}$, where the elements of $\tilde{\mathbf{S}}$ are (correlation) functions of geographical coordinates and elevation of a pair of stations, possibly also accounting for other topographic details (Lussana et al. 2009). In this way, the OI influence matrix (Eq. (13)) becomes

$$\mathbf{W} = \tilde{\mathbf{S}}\left(\tilde{\mathbf{S}} + \varepsilon^2\mathbf{I}\right)^{-1} \qquad (28)$$

where the ratio $\varepsilon^2 = \sigma^2_o/\sigma^2_b$ is assumed to be known, uniform in space (i.e. the same for all observations) and stationary in time.

Here, for the purpose of the SCT, $\sigma^2_o$ itself is assumed to be stationary in time. The sensitivity of the $\sigma^2_o$ estimate to different sample time sequences is discussed below.

For the SCT, an average $\sigma^2_o$, the same value for all observations, is used on the right-hand side of Eq. (22). Even with this assumption, as is discussed at the end of Section 2.4, the SCT treats each observation differently depending on the local data density.

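By accounting for these assumptions in Eq. (27), the innovation covariance matrix becomes

$$\mathbf{D}_t = \frac{\sigma^2_o}{\varepsilon^2}\left(\tilde{\mathbf{S}}_t + \varepsilon^2\mathbf{I}\right) \qquad (29)$$

The resulting pdf depends on the parameter $\sigma^2_o$. By maximizing $p(\{\mathbf{d}_t\})$ with respect to $\sigma^2_o$, the maximum-likelihood estimate for $\sigma^2_o$ can be obtained. In order to do that, it is convenient to write the OI analysis residual as

$$\mathbf{y}^o_t - \mathbf{y}^a_t = (\mathbf{I} - \mathbf{W})\,\mathbf{d}_t = \varepsilon^2\left(\tilde{\mathbf{S}}_t + \varepsilon^2\mathbf{I}\right)^{-1}\mathbf{d}_t \qquad (30)$$

By substituting Eqs (29) and (30) in Eq. (26), the pdf becomes

$$p(\{\mathbf{d}_t\}) = \prod_{t} \left(\frac{2\pi\sigma^2_o}{\varepsilon^2}\right)^{-M_t/2} \left|\tilde{\mathbf{S}}_t + \varepsilon^2\mathbf{I}\right|^{-1/2} \exp\left[-\frac{1}{2\sigma^2_o}\,\mathbf{d}_t^\top(\mathbf{y}^o_t - \mathbf{y}^a_t)\right] \qquad (31)$$

Finally, the condition $\partial p/\partial\sigma^2_o = 0$ leads to the maximum-likelihood estimator of $\sigma^2_o$, constrained by the assumptions made in the OI scheme on error-covariance matrices:

$$\sigma^2_o = \frac{\sum_t \mathbf{d}_t^\top(\mathbf{y}^o_t - \mathbf{y}^a_t)}{\sum_t M_t} \qquad (32)$$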

Eq. (32) (strictly valid in the absence of GEs) was also proposed by Desroziers et al. (2005) as a hypothesis test for variational data assimilation, after deriving it by remarking that the expected value of the matrix $(\mathbf{y}^o - \mathbf{y}^a)(\mathbf{y}^o - \mathbf{y}^b)^\top$ should coincide with $\mathbf{R}$ if the hypotheses on covariances are correct. Li et al. (2009) proposed using a similar expression to adaptively adjust the observation-error variance in sequential assimilation schemes.

Several different processes equation image are extracted from the overall statistical set, as described in Section 3.1. By applying Eq. (32) to each process, it is possible to obtain a distribution of equation image estimates.
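The sampling procedure can be illustrated with a minimal sketch. It does not reproduce Eq. (32); it uses the scalar form of the Desroziers et al. (2005) diagnostic, in which the mean product of the analysis residual and the innovation estimates the observation-error variance. The single isolated station, the variance values and all function names are assumptions made only for illustration.

```python
import random
import statistics

def sigma_o2_estimate(innovations, analysis_residuals):
    """Mean product of (yo - ya) and (yo - yb) over one sample."""
    products = [r * d for r, d in zip(analysis_residuals, innovations)]
    return sum(products) / len(products)

def simulate_sample(rng, n_obs, sigma_b2, sigma_o2):
    """Synthetic sample for one isolated station (toy setting, no GEs)."""
    gain = sigma_b2 / (sigma_b2 + sigma_o2)       # scalar OI weight
    innovations, residuals = [], []
    for _ in range(n_obs):
        eb = rng.gauss(0.0, sigma_b2 ** 0.5)      # background error
        eo = rng.gauss(0.0, sigma_o2 ** 0.5)      # observation error
        d = eo - eb                               # innovation yo - yb
        innovations.append(d)
        residuals.append((1.0 - gain) * d)        # analysis residual yo - ya
    return innovations, residuals

rng = random.Random(0)
TRUE_SIGMA_O2 = 1.3   # hypothetical value, degC^2
TRUE_SIGMA_B2 = 4.0   # hypothetical value, degC^2

# One estimate per extracted sample, then median and interquartile range.
estimates = []
for _ in range(30):
    d, r = simulate_sample(rng, 5000, TRUE_SIGMA_B2, TRUE_SIGMA_O2)
    estimates.append(sigma_o2_estimate(d, r))

q1, median, q3 = statistics.quantiles(estimates, n=4)
iqr = q3 - q1
print(f"median = {median:.3f}, interquartile range = {iqr:.3f}")
```

In this toy setting the estimate is unbiased only because the gain is computed with the correct variance ratio, which mirrors the assumption in the text that the ratio is known.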

When the described procedure is applied to the raw dataset, i.e. without applying the SCT, the median of the empirical equation image distribution is equation image and its interquartile range is equation image. After making use of the value equation image to estimate T2 as described in Section 3.3, the SCT is applied to the whole dataset. Observations failing the SCT are excluded when the equation image estimation is repeated: the median of the empirical equation image distribution is then equation image and its interquartile range is equation image.

In both cases the variability of the estimate is quite small and the equation image value is stable with respect to the choice of a particular sequence. A successive application of the procedure of Section 3.3 does not change the value of T2.

3.3. Estimation of the test tolerance

When the value of equation image is known, Eq. (22) states that the rejection threshold for the SCT is determined by the tolerance T2, the value of which must also be estimated.

The estimation is realized by imposing consistency between the empirical rejection frequency equation image and the analytically estimated a priori probability that an observation fails the SCT, equation image (Eq. (6)).

The starting point for the estimation of T2 is its relation to the prior probability of GEs in the observational network, equation image (Eq. (5)). This value is unknown, however, and must itself be estimated. The uncertainty in its estimate should also be evaluated.

3.3.1. Estimation of T2

For a given equation image, T2 depends for each station on local data density through equation image:

equation image(33)

where equation image is the value of T2 for a totally redundant station, with equation image. The maximum range of variability is given by the difference between the T2 value for totally redundant and completely isolated stations and it is about 1.1. The interquartile range of T2 among the network stations is evaluated from the distribution of equation image (Section 2.4) as about 0.19.

To estimate the a posteriori empirical rejection frequency equation image, our operational choice is to neglect the dependence of T2 on station density and to let T2 assume a sequence of integer values. For each of them, we run the SCT over a three-year period and determine the rejection frequency (of the whole network) as the ratio between the number of rejections and the total number of observations. By extracting several different processes for each value of the rejection tolerance T2 it is also possible to quantify the uncertainty: for each value of T2 a sampling distribution of equation image is obtained.
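A minimal sketch of this counting step follows. It is not the operational procedure: the test statistic is drawn from a synthetic mixture (squared standard normal values for good observations, an arbitrary uniform range for GEs), the sequential recomputation after each rejection is ignored, and every name and number is hypothetical.

```python
import random

def rejection_frequency(tau2_values, tolerance):
    """Fraction of observations whose statistic exceeds the tolerance."""
    rejected = sum(1 for value in tau2_values if value > tolerance)
    return rejected / len(tau2_values)

def synthetic_tau2(rng, n_obs, ge_fraction):
    """Synthetic statistics: mostly good observations plus a few GEs."""
    values = []
    for _ in range(n_obs):
        if rng.random() < ge_fraction:
            values.append(rng.uniform(0.0, 400.0))    # placeholder for GEs
        else:
            values.append(rng.gauss(0.0, 1.0) ** 2)   # good observation
    return values

rng = random.Random(1)
samples = [synthetic_tau2(rng, 20000, 0.005) for _ in range(10)]
tolerances = [5, 9, 13, 17, 21, 25]

# For each tolerance, one rejection frequency per sample: a sampling distribution.
frequencies = {t2: [rejection_frequency(s, t2) for s in samples]
               for t2 in tolerances}

for t2 in tolerances:
    values = sorted(frequencies[t2])
    print(t2, f"min = {values[0]:.4f}", f"max = {values[-1]:.4f}")
```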

Figure 2 shows the sampling distributions of equation image and the curves of equation image as a function of T2 (making use of Eqs (5), (6), (7) and (10)) when equation image is evaluated for completely isolated stations (left curve), totally redundant stations (right curve) and using the median (middle curve) of its distribution.

Figure 2.

Grey boxes: quartiles of rejection frequency distributions, evaluated for different values of the SCT tolerance T2 by using uncorrelated samples extracted from three years of hourly temperature observations. Whiskers indicate extreme points. Lines show the analytical a priori probability that an observation fails the SCT, equation image, for different values of equation image, as follows: left, equation image, completely isolated stations; middle, equation image estimated by the median of its distribution in the station network; right, equation image, totally redundant stations. Correspondence is obtained at T2 ≃ 17. This figure is available in colour online at www.interscience.wiley.com/journal/qj

Correspondence is obtained for T2 = 17 (corresponding to equation image), which is the estimated tolerance for the SCT. The T2 estimate does not change after iterating the estimation of equation image (Section 3.2). Figure 2 was obtained by using the final value equation image.

Choosing T2 < 17 implies estimating equation image much larger than the actual rejection frequency, underestimating the network reliability. Choosing T2 > 17 implies equation image: the SCT rejects more observations than expected, thus the reliability of the network would in this case be overestimated. Based on the error model of Section 2.1, the best estimate is then T2 = 17. While there are no reasons to accept a value T2 < 17 as plausible, we do know that the error model of Section 2.1 does not account for representativity errors (frequent in mesoscale networks). As a consequence, large representativity errors will be flagged as GEs by the SCT, increasing the actual rejection frequency over that expected a priori. To take this knowledge into account, the QA system (or the data user) should consider the T2 = 17 value as a minimum tolerance for identifying suspect observations (see Sections 3.4 and 4).

3.3.2. Estimation of P(G)

The empirical rejection frequency, equation image, has median 0.0038 and interquartile range 0.00041 for T2 = 17. For this tolerance value, equation image. So, by solving Eq. (6) for P(G) and substituting for equation image, equation image and equation image, we can obtain an estimate for P(G).

For fixed T2 and equation image, equation image depends on station density through equation image. For a climatological interval of equation image, equation image varies over the network from 0.125 (totally redundant stations) to 0.216 (completely isolated stations), with median 0.143 and interquartile range 0.013.

The probability for a ‘good’ observation to fail the SCT is a function of T2 only: for T2 = 17, equation image.
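The order of magnitude of this probability can be sketched under one assumption that is not taken from the text: that, for good observations, the normalized square residual behaves as the square of a standard normal variable. Under that assumption the tail probability depends on T2 only, as stated above. The function name is hypothetical and the value given in the article is not reproduced here.

```python
import math

def prob_good_fails(tolerance):
    """P(tau2 > T2) when tau2 is the square of a standard normal variable."""
    return math.erfc(math.sqrt(tolerance / 2.0))

for t2 in (9, 13, 17, 21):
    print(t2, f"{prob_good_fails(t2):.1e}")
```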

The estimate of equation image is obtained by neglecting equation image in Eq. (6):

equation image(34)

Although equation image does not depend on data density, the uncertainty in equation image originates both from the uncertainty in equation image (estimated using several samples) and the dispersion of equation image among the stations. Inserting the equation image values sampled for T2 = 17 and the station values of equation image in Eq. (34), an empirical distribution is obtained for equation image, with median 0.0044 and interquartile range 0.00052.
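The arithmetic can be checked with a short sketch. It rests on two assumptions that are not confirmed by the text of this section: that Eq. (6) is the law of total probability for the event "the observation fails the SCT", and that the quantity ranging from 0.125 to 0.216 is the probability that a GE passes the test. With these readings, the reported medians reproduce the reported P(G) estimate. The function name is hypothetical.

```python
def prior_ge_probability(reject_freq, prob_ge_passes, prob_good_fails=0.0):
    """Solve P(F) = [1 - P(pass|GE)] P(G) + P(F|good) [1 - P(G)] for P(G).

    With prob_good_fails = 0 this reduces to P(G) = P(F) / [1 - P(pass|GE)].
    """
    prob_ge_fails = 1.0 - prob_ge_passes
    return (reject_freq - prob_good_fails) / (prob_ge_fails - prob_good_fails)

REJECT_FREQ = 0.0038   # median empirical rejection frequency for T2 = 17

p_median = prior_ge_probability(REJECT_FREQ, 0.143)      # median station
p_redundant = prior_ge_probability(REJECT_FREQ, 0.125)   # totally redundant
p_isolated = prior_ge_probability(REJECT_FREQ, 0.216)    # completely isolated

print(f"{p_redundant:.4f}  {p_median:.4f}  {p_isolated:.4f}")
```

The spread between the redundant and isolated cases gives a rough idea of how the dispersion among stations contributes to the uncertainty in the estimate.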

The network management activity can benefit from information on equation image. For instance, in the case of ARPA Lombardia, the meteorological network has been obtained by merging observations from several subnetworks, originally intended (and still used) for different purposes: an urban environmental (air-quality) network, an agrometeorological network, a hydro-geological monitoring network (itself composed of various local subsubnetworks), etc. The design of the network's routine maintenance could in principle be improved by producing, and comparing, disaggregated estimates of equation image for each subnetwork.

3.3.3. Distribution of rejection residuals

The grey histogram in Figure 3 shows the distribution of τσo for rejected observations. This quantity in °C is the geometric mean of the CV residual and the analysis residual, and it does not depend on local data density (τ2 has been defined in Eq. (9)). The thick solid line (red in the online article) shows, for comparison, the histogram profile obtained when rejected ‘observations’ are synthetically generated, in the same number, from the uniform distribution assumed for GEs (Eq. (1)). If the error model of Section 2.1 were correct, the distribution of CV residuals for rejected observations would be fitted by this thick solid line (with the exception of a small Gaussian tail corresponding to good observations failing the test, as estimated in Section 2.2.2). The difference between the two histograms is mainly due to the occurrence of large-amplitude representativity errors, which cannot be distinguished from proper GEs by the SCT alone, and which determine the high frequency of small residuals (below 10 °C) above the test threshold.

Figure 3.

Histogram of the ‘geometric mean’ residual τσo for observations that fail the SCT with T2 = 17 (grey bins). The thick solid line shows the histogram profile obtained by generating synthetic observations from the uniform distribution assumed for GEs. The effect of including a station affected by a systematic error for the whole period is shown by white bins. This figure is available in colour online at www.interscience.wiley.com/journal/qj

A further remark concerns the peak shown by white bins in Figure 3. This histogram has been obtained in the same way as the grey one, but before removing systematic errors from the dataset. It is shown here in order to stress that the method used to estimate T2 and presented in Figure 2 is sensitive to large systematic errors in the dataset. Though this calls for caution, the construction of a histogram such as that of Figure 3 enables us to detect such biases and may prove to be a very useful screening tool, not only for online applications such as the present one but also for historical data to be used in climate studies.

3.4. Operational configuration

The error model of Section 2.1 does not account for the occurrence of representativity errors. As a consequence, the SCT alone cannot discriminate between true GEs and representativity errors of large amplitude.

Large representativity errors are not GEs, and should not be flagged as such in the final decision by the whole QA system. The information returned to the external QA system is composed of the pass/fail SCT flag and the ‘geometric mean’ residual τσo. Referring to Figure 3, an observation failing the SCT with a large τσo is, most probably, affected by a proper GE. An observation failing the SCT with a small τσo is not likely affected by a proper GE: it is however non-representative of the analysis scale in that particular meteorological situation.

The pair formed by the SCT flag and τσo summarizes all results of the SCT for each observation. The decisional algorithm included in the external QA system will make use of this information, by cross-comparing it with the results of all other tests, to discriminate between large representativity errors and true GEs and to assign a final ‘quality’ flag to each observed value.

Observations failing the SCT are directly eliminated from the OI procedure. The effect of including or rejecting observations affected by a large representativity error in the spatial interpolation is discussed in Section 4 with reference to a critical case study.

4. The SCT in the north foehn case of 12 March 2006

The quality of data in weather situations characterized by strong gradients and intense phenomena is particularly important for forecasters and for the general public, and must be described as accurately as possible. In such situations it is especially important to detect GEs while avoiding the rejection of observations that contain valuable information. The main difficulty is that all weather situations characterized by strong gradients push the assumptions underlying the OI scheme and, consequently, the SCT close to their limits. A typical example for Lombardia is that of a sudden increase in temperature due to the adiabatic compression associated with a north foehn event, which, especially at its beginning, may be detected by just a few observations located in mountain valleys. The case of 12 March 2006, already discussed in some detail by ULS08, helps to demonstrate the behaviour of the SCT in the presence of large representativity errors. The adiabatic warming starts by affecting the alpine valleys in the northwestern part of the region during the night, at about 0200 UTC + 1, and progressively extends to the northwestern Po Plain during the following morning. Every hour the SCT is applied sequentially to temperature observations, so that all CV analyses are recomputed after each rejection. At 0400 UTC + 1, five observations fail the SCT: a summary of rejected observations is presented in Table I.
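The sequential logic can be sketched as follows. This is not the OI-based implementation: the CV analysis is replaced by a toy stand-in (the mean of the other active observations), the rule of removing the largest statistic first is our assumption, and all names and values are hypothetical. The sketch only illustrates why the statistics are recomputed after each rejection.

```python
def toy_tau2(values, active, sigma2):
    """Toy stand-in for the SCT statistic: squared departure of each active
    observation from the mean of the other active ones, divided by sigma2."""
    result = {}
    for i in active:
        others = [values[j] for j in active if j != i]
        cv_estimate = sum(others) / len(others)
        result[i] = (values[i] - cv_estimate) ** 2 / sigma2
    return result

def sequential_sct(values, tolerance, sigma2=1.0):
    """Reject the worst observation, recompute, and repeat until all pass."""
    active = list(range(len(values)))
    rejected = []
    while len(active) > 2:
        tau2 = toy_tau2(values, active, sigma2)
        worst = max(tau2, key=tau2.get)
        if tau2[worst] <= tolerance:
            break
        rejected.append(worst)
        active.remove(worst)
    return rejected

temps = [10.0, 10.2, 9.9, 10.1, 30.0]   # last value mimics a gross error

first_pass = toy_tau2(temps, list(range(len(temps))), 1.0)
simultaneous = sorted(i for i, t in first_pass.items() if t > 17)
sequential = sequential_sct(temps, tolerance=17)

print("failing without recomputation:", simultaneous)
print("rejected sequentially:", sequential)
```

Without recomputation the single bad value contaminates the estimates at all other stations, which then fail as well; the sequential version removes only the bad value.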

Table I. Values of τ2 (larger than T2 = 17) and residuals of observations failing the SCT in the test case of 12 March 2006, 0400 UTC + 1. The local data density determines the difference between the analysis residual equation image and the CV residual equation image; τσo is their geometric mean; equation image.

No.  Station name          τ2     CV residual (°C)   Analysis residual ya − yo (°C)   τσo (°C)
1    Brescia–Broletto      54.4   −13.0              −5.4                             8.4
2    Cantù–Asnago          30.6   +6.7               +5.9                             6.3
3    Lavena–Ponte Tresa    28.1   +7.1               +5.1                             6.0
4    Osnago                22.4   +5.9               +4.9                             5.4
5    Cortenova             21.0   +6.3               +4.3                             5.2
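The internal consistency of Table I can be checked with a few lines. The geometric-mean relation is stated in the text; the observation-error variance computed here is only implied by the table under our reading that τ2 equals the product of the two residuals divided by that variance, and is not a value quoted in this section. Variable names are hypothetical.

```python
import math

# (station, tau2, CV residual, analysis residual, tau*sigma_o), from Table I.
TABLE_I = [
    ("Brescia–Broletto", 54.4, -13.0, -5.4, 8.4),
    ("Cantù–Asnago", 30.6, 6.7, 5.9, 6.3),
    ("Lavena–Ponte Tresa", 28.1, 7.1, 5.1, 6.0),
    ("Osnago", 22.4, 5.9, 4.9, 5.4),
    ("Cortenova", 21.0, 6.3, 4.3, 5.2),
]

checks = []
for name, tau2, cv_res, an_res, tau_sigma in TABLE_I:
    geometric_mean = math.sqrt(cv_res * an_res)
    implied_sigma_o2 = cv_res * an_res / tau2   # assumed relation, see above
    residual_ratio = an_res / cv_res            # closer to 1: more redundant
    checks.append((name, geometric_mean, tau_sigma, implied_sigma_o2,
                   residual_ratio))

for name, gm, tabulated, s2, ratio in checks:
    print(f"{name:20s} {gm:5.2f} {tabulated:5.1f} {s2:5.2f} {ratio:5.2f}")
```

The ratio between the analysis residual and the CV residual is smallest for Brescia–Broletto, which suggests that this station is the least redundant of the five.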

The plot in Figure 4 is a graphical representation of the SCT, showing the CV residuals versus the analysis residuals for this case, before any rejection. In Figure 4, the hyperbola represents the SCT threshold. Totally redundant observations lie on the diagonal and completely isolated observations lie on the solid line with slope 1 + 1/ε2. This representation clearly shows how the SCT is more restrictive for stations located in densely observed areas (near the diagonal) and less restrictive for isolated stations.
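The geometry of this plot can be sketched numerically. The sketch assumes that the test rejects an observation when the product of the CV residual and the analysis residual exceeds T2 times the observation-error variance, and that ε2 is the ratio of observation-error to background-error variance; both are our reading, and the numerical values of the variance and of ε2 are hypothetical.

```python
import math

def fails_sct(cv_residual, analysis_residual, sigma_o2, tolerance=17.0):
    """True when the point lies beyond the threshold hyperbola."""
    return cv_residual * analysis_residual > tolerance * sigma_o2

def critical_cv_residual(slope, sigma_o2, tolerance=17.0):
    """|CV residual| where the line cv = slope * analysis meets the hyperbola."""
    return math.sqrt(tolerance * sigma_o2 * slope)

SIGMA_O2 = 1.29   # hypothetical value, degC^2
EPS2 = 0.5        # hypothetical variance ratio

redundant = critical_cv_residual(1.0, SIGMA_O2)              # diagonal
isolated = critical_cv_residual(1.0 + 1.0 / EPS2, SIGMA_O2)  # steepest line

print(f"critical CV residual, totally redundant station: {redundant:.2f} degC")
print(f"critical CV residual, completely isolated station: {isolated:.2f} degC")
```

With these illustrative numbers an isolated station tolerates a CV residual about 1.7 times larger than a totally redundant one before being rejected.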

Figure 4.

12 March 2006, 0400 UTC + 1. Temperature CV analysis residual equation image versus the corresponding analysis residual equation image. The short-dashed hyperbola represents the SCT threshold; totally redundant observations lie on the dashed diagonal; completely isolated stations lie on the solid line, with slope 1 + 1/ε2. The test is more permissive for isolated stations and more restrictive for stations located in a densely observed area (near the diagonal). Residuals shown here are calculated before the first rejection, which concerns the observation marked at the extreme bottom and left of the graph (observation 1 in Table I). Point positions in the graph may change after each rejection, because each observation affects both the analysis and the CV analysis at stations nearby. This figure is available in colour online at www.interscience.wiley.com/journal/qj

The ‘puddleplot’ in Figure 5 shows in grey shades the integral data influence (IDI) field (ULS08), which is a measure of data coverage based on the three-dimensional correlation functions used in the OI algorithm. Triangles mark the station locations, with observations failing the SCT shown as solid triangles. The radius of the ‘raindrop’ circle is proportional to the normalized square residual τ2 (defined in Section 2.2.2), the quantity compared with the tolerance T2 in the SCT. Such graphical representation has proved to be a useful tool for the subjective component of the operational QA system.

Figure 5.

12 March 2006, 0400 UTC + 1, temperature observations at the first SCT. The integral data influence (IDI) field (see the explanation in the text) is shown by grey shades. Triangles mark the station locations, with observations failing the SCT marked by solid triangles. Each circle radius is proportional to τ2, the normalized square residual that is compared with tolerance T2 = 17 in the SCT. Numbers 1–5 indicate the rejection order of observations failing the test, as listed in Table I. Circle radii may change after each rejection, because each observation affects both the analysis and the CV analysis at stations nearby.

A close inspection of the discarded observations and of their corresponding time sequences (not shown) reveals that in fact only the first rejection, Brescia–Broletto, corresponds to a true GE. All the other four stations experience a delay in the arrival of the foehn warming due to a slower erosion of the pre-existing cold pool when compared with other observations in the same areas (station locations are shown in Figure 5 and toponyms cited in the following are indicated either in Figure 1 or in Figure 8 later). In all four cases the delay is due to topographic details in the neighbourhood of the station locations (shallow depressions). The last (5th) rejected observation, for example, is from Cortenova, located at 456 m a.s.l. in Valsassina, a lateral alpine valley east of Lake Como, the altitude of which is about 200 m a.s.l.: the foehn warming is recorded here two hours later.

The large deviations of the observed values from the CV analysis for rejections 2–5 are due to large representativity errors. Their effect can be seen by comparing the analysis maps shown in Figures 6 and 7. Figure 6 shows the analysis map obtained without the SCT, including all available observations. Figure 7 shows the analysis map obtained after the SCT has rejected the five observations listed in Table I and marked in Figure 5. Figure 8 shows the difference between the analysis field before (Figure 6) and after (Figure 7) the SCT.

Figure 6.

12 March 2006, 0400 UTC + 1. Temperature (equation image) analysis before the SCT. Locations of rejected observations are marked in Figure 5.

Figure 7.

12 March 2006, 0400 UTC + 1. Temperature (equation image) analysis after the SCT. Locations of rejected observations are marked in Figure 5.

Figure 8.

12 March 2006, 0400 UTC + 1. Difference (equation image) between the temperature analysis fields of Figure 6 (before the SCT) and Figure 7 (after the SCT). Toponyms cited in the text are also indicated. Locations of rejected observations are marked in Figure 5.

The mesoscale feature that can be correctly described in the analysis maps is the adiabatic warming associated with the foehn event, progressively extending from the major alpine valleys in the northwest (Valchiavenna, Lake Como) to the northwestern part of the Po Plain. The application of the SCT does not erase this effect. Instead, discarding observations 2–5 allows a better coherence of the analysis field. Including station 5, Cortenova, disrupts the continuity of the foehn warming over Lake Como (the major valley), while discarding it introduces an error on a smaller scale, regarding just the bottom of the lateral valley (Valsassina). Both analysis errors appear small in amplitude (in spite of the large CV residual) because of the filtering ability of the OI scheme, but the first one has a larger scale.

In conclusion, such large representativity errors should not enter the OI procedure. They should not be flagged as GEs, however, by the external QA system, which takes a final decision after comparing the SCT results with the results of all other checks.

5. Conclusions

Local mesoscale networks can provide useful information at dynamic scales of interest for meteorological analysis and forecast, operational forecast verification and assimilation in non-hydrostatic numerical weather predictions. Quality-control procedures are necessary for any observational network, and automatic quality-control tests are critically important for automatic, high-resolution networks operationally providing large amounts of data in real time.

The availability of a reliable objective analysis scheme, such as that described in ULS08, enables us to set up an automatic procedure capable of testing the spatial consistency between each observation and its estimate based on all other observations (the CV analysis). The statistical hypotheses underlying the OI and SCT schemes allow the objective estimation of all parameters and thresholds, avoiding subjective ad hoc tuning. Moreover, estimating the prior probability of a GE, equation image, provides a measure of the network's reliability, a useful indicator for monitoring the network performance. Probabilities of misjudgments by the SCT have been estimated, both for a good observation failing the test and for a GE passing the test. Because these probabilities are functions of the test tolerance, their sensitivity to its variations can also be evaluated.

The test is based on a comparison of each observation with the corresponding CV analysis, hence the local data density is automatically taken into account: the test is comparatively more permissive for isolated stations providing precious information and, conversely, more restrictive for stations located in a densely observed area.

The SCT has been tested with three years of hourly temperature data from Lombardia's network, proving to be robust and reliable. The close inspection of rejections corresponding to particularly intense weather phenomena, such as the north foehn case presented here, showed that, besides proper GEs, some of the rejections concern large-amplitude representativity errors. Discarding such observations from the objective analysis procedure improves the analysis fields. However, the purpose of the SCT in the context of a comprehensive data QA system is to flag data for spatial consistency only. No spatial consistency test can discriminate between large representativity errors and GEs. This task must be performed by other components of the overall system. Cross-comparison among flags from many independent tests (characterization of individual time-series, for example, in this case) always remains a necessary requirement for a QA system intended to assign a final single quality flag to each observation.

Because of the link with the objective analysis scheme, any future improvements in this would automatically be transferred to the SCT. In particular, the inclusion in the covariance model of local effects depending on local topographic features is expected to yield a reduction in the corresponding representativity errors. This has already been the case for the urban heat-island effect.

If a new observing station is inserted in the network, the SCT would immediately work for it without the need for a ‘training’ period. Of course, in the event of a major reorganization of the observational network the parameter estimation should in principle be repeated: equation image is expected to change in that case, and representativity errors would also change.

In principle, the extension of the SCT to other variables that are observed by the network and that can be analyzed by the OI scheme should be straightforward. In practice, spatial correlation scales typical of each variable should be compared with the network resolution and its ability to describe them correctly. This limitation (of both the analysis scheme and the SCT) might be overcome, in some cases, by settling for analyzing larger scales: an increase of the representativity component of errors would have to be accepted in this case.

The implementation of the SCT presented is simple if the computation of the CV analysis is efficient (as with the algorithm presented in ULS08, for example). The graphical representation of residuals presented in Figures 4 and 5 proved to be useful in operational network-management practice. The objective tolerance estimation and the construction of the rejection histogram of Figure 3 are robust and easily reproducible techniques. These can also be applied to historical datasets (for climate studies) and to recently installed networks, even if obtained by merging pre-existing subnetworks.

The effort involved in setting up an efficient and reliable quality-control system, including an advanced spatial consistency component, is affordable, even by a small operational group, and it is a key element in maximizing the benefit of the information derived from data (WMO 2008), in consideration of the resources demanded by the operational acquisition of meteorological observations.

Appendix A. Relation between the CV residual and the analysis residual

Assume, without loss of generality, that the observation undergoing the SCT is the last component of the vector equation image. The matrix equation image can be written as a block matrix:

equation image(A1)

where c is the last diagonal element of equation image, equation image is the column vector obtained by excluding the last element (c) from the last column, and equation image is the row-vector obtained by transposing equation image. The matrix equation image is obtained from equation image by taking its last row and column away.

The inverse of equation image appearing in Eq. (13) can be consequently written (ULS08) as

$$\left({\bf S}+{\bf R}\right)^{-1}=\left[\begin{matrix}{\bf X} & {\bf q}\\ {\bf q}^{\rm T} & z\end{matrix}\right]$$ (A2)

so that

equation image(A3)
equation image(A4)
equation image(A5)

Since these relations can be written for any vector component (not only for the last one) by choosing the corresponding row and column, the index m is attached hereafter to the covariance and inverse-covariance block components appearing in Eqs (A3)–(A5).

If the observation-error covariance is assumed diagonal as in Eq. (16), the mth CV analysis is obtained by using the appropriate covariance matrices as follows. If equation image, equation image and equation image are the vectors obtained from equation image, equation image and equation image by excluding their mth components, equation image, equation image and equation image, then

equation image(A6)
equation image(A7)

These are the block components appearing in Eq. (A1), referred to the generic mth observation instead of the last one. The CV analysis is then obtained as

equation image(A8)

An expression for the CV residual equation image is obtained by subtracting the mth observation:

equation image(A9)

All components of the innovation vector, equation image, appear now in Eq. (A9). The whole innovation vector can be collected on the right by again making use of the ‘block matrix’ technique:

equation image(A10)

where a block row vector appears (and the mth component has again been placed at the last position). By making use of Eq. (A4),

equation image(A11)

where (compare with Eq. (A2)) the row vector equation image is the mth row of the matrix equation image. By defining the auxiliary vector

equation image(A12)

the CV residual is finally written as

equation image(A13)

The analysis residual is obtained by making use of equation image and of Eqs (12), (13) and (16):

equation image(A14)

By comparing Eqs (A13) and (A14), a relation is obtained between the analysis residual and the CV analysis residual:

equation image(A15)
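A relation of this kind can be verified numerically. The sketch below checks the standard leave-one-out identity for OI with a diagonal, uniform observation-error covariance: the analysis residual at station m equals the CV residual multiplied by the observation-error variance and by the mth diagonal element of the inverse of S + R (the scalar z of Eq. (A2)). We assume that this is the content of Eq. (A15), which is not reproduced here. The one-dimensional network, the Gaussian correlation function and all numerical values are hypothetical.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

# Hypothetical network: station positions (km) and error statistics.
positions = [0.0, 10.0, 25.0, 60.0]
SIGMA_B2, SIGMA_O2, LENGTH = 4.0, 1.0, 20.0
n = len(positions)
S = [[SIGMA_B2 * math.exp(-0.5 * ((p - q) / LENGTH) ** 2) for q in positions]
     for p in positions]
M = [[S[i][j] + (SIGMA_O2 if i == j else 0.0) for j in range(n)]
     for i in range(n)]                         # S + R with R = sigma_o2 * I

yb = [5.0, 5.5, 6.0, 4.0]                       # background at the stations
yo = [6.2, 5.1, 9.0, 4.4]                       # observations
d = [o - b for o, b in zip(yo, yb)]             # innovation

weights = solve(M, d)
ya = [yb[i] + sum(S[i][j] * weights[j] for j in range(n)) for i in range(n)]
analysis_residual = [ya[i] - yo[i] for i in range(n)]

def cv_residual(m):
    """CV analysis at station m (all other stations used) minus observation."""
    keep = [i for i in range(n) if i != m]
    sub = [[M[i][j] for j in keep] for i in keep]
    w = solve(sub, [d[i] for i in keep])
    cv_analysis = yb[m] + sum(S[m][i] * wi for i, wi in zip(keep, w))
    return cv_analysis - yo[m]

def inverse_diagonal(m):
    """mth diagonal element of the inverse of S + R."""
    unit = [1.0 if i == m else 0.0 for i in range(n)]
    return solve(M, unit)[m]

for m in range(n):
    lhs = analysis_residual[m]
    rhs = SIGMA_O2 * inverse_diagonal(m) * cv_residual(m)
    print(m, round(lhs, 6), round(rhs, 6))
```

The factor multiplying the CV residual is never larger than one, so the analysis residual is always the smaller of the two in magnitude, which agrees with the values in Table I.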

Appendix B. Variance of the CV residual

The square CV residual, by Eq. (A11), can be written as

equation image(B1)

The variance of the CV residual, equation image, is obtained by applying the expectation operator equation image to Eq. (B1). The covariance components exit the expectation operator. Then, by assuming that observational and background error are independent and substituting Eqs (14), (15) and (A1) in Eq. (B1),

equation image(B2)

Then, by Eqs (A4) and (A5), and since observation errors and CV analysis errors are uncorrelated, Eq. (19) is obtained.


Maria Ranci, main developer of the automatic quality-control system at ARPA Lombardia, helped in balancing the role of the SCT with the other DQC checks and participated with constructive criticism in the discussion of our results.