SEARCH

SEARCH BY CITATION

Keywords:

  • Air quality indices;
  • Ambient exposure and risk;
  • Multivariate space–time models;
  • Unbalanced networks

Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

Summary.  The paper is devoted to the development of a statistical framework for air quality assessment at the country level and for the evaluation of the ambient population exposure and risk with respect to airborne pollutants. The framework is based on a multivariate space–time model and on aggregated indices defined at different levels of aggregation in space and time. The indices are evaluated, uncertainty included, by considering both the model outputs and the information on the population spatial distribution. The framework is applied to the analysis of air quality data for Scotland for 2009 referring to European and Scottish air quality legislation.


1. Introduction

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

European legislation on air quality (Directive 2008/50/EC) identifies the needs for improved monitoring and assessment of air quality, including ‘to provide information to the public’.

The aim of this paper is to provide a model-based statistical framework for air quality assessment and the evaluation of population exposure and risk in a national context and applied to Scotland. The air quality strategy for Scotland is based on the European Commission directives, so a further aim of this paper is to develop and use a national Scottish air quality index as a scientific tool and as a source of public information.

The framework is applied to observed air quality data for Scotland for 2009 to provide a retrospective analysis of air quality and its expected effect on population at the country level. Indeed, the ultimate role of air quality assessment should be, on the one hand, to evaluate whether any actions that are undertaken to improve air quality have been successful or not (see Scott (2007)) and, on the other, to provide information about population risk and exposure. In particular, the main focus is on the definition of air quality, exposure and risk indices and on their evaluation at different levels of spatial and temporal aggregation. Each index represents a different aspect of pollution. The role of the air quality index is to provide concise information about the air quality level per se without any reference to the population exposure. However, exposure and risk indices consider also anthropological information and, hence, identify dangerous situations for the population's health.

The problem of assessing air quality over large regions and that of deriving indices is complex in several respects. The main complication arises from the way that airborne pollutants are measured in the field. The high economic costs of installation and maintenance of monitoring networks usually prevent pollutants from being measured with a spatial resolution that is adequate to assess exposure and risk with homogeneous accuracy all over the region.

The problem becomes more prominent when unbalanced monitoring networks are considered, i.e. when not all the pollutants are measured at each monitoring station. In such a case, it is not always clear how to define aggregated indices for the whole region and how to evaluate their uncertainty. Moreover, it is not straightforward to compare across years when, in each year, the structure of the monitoring network (sites included) and the quantity of missing data differ.

The above-mentioned problems are addressed by adopting, as the basis of the statistical framework, the dynamic coregionalization model (DCM) that has been introduced by Fassò and Finazzi (2011) and which can handle both unbalanced monitoring networks and missing data automatically. The DCM is used to evaluate the space–time correlation of the pollutants and their cross-correlation whereas the model output is used to define the indices and their uncertainty. In particular, the model output, in terms of the estimated pollutant concentrations, and the information about population distribution are combined to derive the exposure and risk indices.

In this work, the focus is on so-called ambient exposure rather than on personal exposure. A review of ambient exposure estimation methods can be found in Jerret et al. (2005), though limited to the intraurban case. A good example of personal exposure estimation, in contrast, can be found in Zidek et al. (2007), though it would be impractical to extend the personal exposure approach from city-size to country-size regions.

On the contrary, we aim to provide high resolution ambient exposure maps at the country level. Aware that this may introduce an ecological bias (according to Freedman (2001), ‘The ecological fallacy consists in thinking that relationships observed for groups necessarily hold for individuals’), we point out that our approach is an improvement with respect to the current air quality legislation which is based only on temporal averages of the measured pollutant concentration at the monitoring stations.

The rest of the paper is organized as follows. Section 2 is dedicated to the DCM. Parameter estimation and space–time pollutant concentration mapping are discussed for multivariate data observed in a heterotopic configuration. In Section 3, the problem of defining aggregated air quality indices for state-size regions is introduced and a model derived from the DCM is considered. Exposure and risk assessment indices based on coupling population spatial distribution and model outputs are defined in Section 4. Air quality data for 2009 that were collected over Scotland are considered in Section 5 and analysed within the statistical framework that is developed in this work.

2. The dynamic coregionalization model

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

Hierarchical models represent a useful statistical approach for the analysis of environmental data and they have been applied profitably in both frequentist and Bayesian contexts. Examples can be found in Banerjee et al. (2004) and Cressie and Wikle (2011) for the univariate and multivariate case. A comparison of different space–time hierarchical models in terms of prediction error is provided by Cameletti et al. (2011) though the comparison is limited to a particular data set. Since no general optimality results are available, we opt for the DCM which is flexible and easy to interpret. Moreover, as detailed in Finazzi and Fassò (2012), a software implementation of the DCM is readily available to download from http://code.google.com/p/d-stem/.

The DCM is a hierarchical multivariate space–time model based on latent variables which can handle both heterotopic data (non-colocated data) and missing data in a natural way. As a consequence, model estimation is based on the original data set as it has been acquired in the field without the need for any preliminary interpolation or missing data imputation.

Let inline image be the q-dimensional data response vector at the spatial location inline image and at time inline image. The general form of the model is

  • image(1)

where X(s,t) is a matrix of known covariates, inline image is a vector of global coefficients and ‘⊙’ is the Hadamard product. The p-dimensional latent temporal state inline image has the Markovian dynamics

  • image(2)

with G a stable transition matrix and Gaussian innovation ηN(0,Σ). The q×p matrix K is the loading matrix of known coefficients. The latent spatial component is modelled by both inline image and inline image which are independent and identically distributed over time. For each fixed t, ui(s,t), 1leqslant R: less-than-or-eq, slantileqslant R: less-than-or-eq, slantq, are independent latent zero-mean and unit variance Gaussian processes with spatial correlation function inline image, where ρi is a valid correlation function parameterized by θi and h=‖ss‖ is the Euclidean distance between s and s. In contrast, w(s,t) is described by a q-dimensional linear coregionalization model (LCM) of c components

  • image(3)

where wj(s,t), 1leqslant R: less-than-or-eq, slantjleqslant R: less-than-or-eq, slantc, are independent latent zero-mean and unit variance Gaussian processes with correlation matrix function inline image, 1leqslant R: less-than-or-eq, slanti,ileqslant R: less-than-or-eq, slantq, 1leqslant R: less-than-or-eq, slantjleqslant R: less-than-or-eq, slantc. Each Vj is a correlation matrix and each inline image is, again, a valid correlation function (see Wackernagel (2003) for an introduction to the LCM).

The vectors inline image and inline image contain the scale parameters. Finally, inline image is the measurement error which is assumed to be white noise in space and time. In particular, inline image, 1leqslant R: less-than-or-eq, slantileqslant R: less-than-or-eq, slantq. The parameter set to be estimated is

  • image(4)

where inline image, inline image, inline image and inline image.

The DCM defined in equation (1) is flexible in the sense that the covariates X(s,t) and the latent variables z(t), u(s,t) and w(s,t) describe different aspects of the phenomenon under study and they can be included and excluded from the model depending on what is the focus of the data analysis. The covariates (if available) and the latent spatial variables u(s,t) and w(s,t) are to be included when the model is used for mapping whereas the latent temporal variable z(t) should be included when the focus is on the temporal dynamics.

The main difference between u(s,t) and w(s,t) is that each ui(s,t), 1leqslant R: less-than-or-eq, slantileqslant R: less-than-or-eq, slantq, is characterized by its own correlation function inline image whereas the LCM imposes, for each of the c components, a unique correlation function across variables. It can be said that u(s,t) is the direct component whereas w(s,t) is the interaction component. Although the model can be estimated with both the components included, preliminary studies suggest that, when real data are considered, it is important to choose one component or the other depending on the spatial correlation structure of the data. Indeed, the LCM should be included solely when data are known to be spatially cross-correlated whereas the direct component should be considered solely when the correlation functions describing the q-variables are expected to be different. In this latter case, it is still worthwhile to consider the model in its multivariate form since the q-variables may be temporally cross-correlated. Finally, if the flexibility of model (1) must be increased, a version of the DCM with spatiotemporal varying coefficients can be considered as detailed in Finazzi and Fassò (2011).

The matrix inline image is the N×T matrix of all the observations collected at locations inline image and time inline image. Here, inline image is the collection of ni locations where the variable yi is observed and inline image. The ni are not constrained to be equal; nor are the inline image. Thus, the fully heterotopic case is admitted, i.e. the variables can be observed over disjoint sets inline image of spatial locations. This includes the above-mentioned case of unbalanced monitoring networks. Also note that the matrix Y may include missing data.

The maximum likelihood estimate of Ψ is obtained by the expectation–maximization (EM) algorithm as described in Fassò and Finazzi (2011). The whole estimation procedure has been proven to be stable even when large data sets or large parameter sets are considered. This is largely due to the quasi-closed form property of the EM steps for this model. Note that the standard deviations of the estimated model parameters are directly obtained from an approximated Fisher information matrix inline image computed as in Fassòet al. (2009).

Let inline image be the maximum likelihood estimate of Ψ; then the concentration of the ith pollutant at a new set of sites inline image and time inline image is evaluated by means of a plug-in approach as

  • image(5)

where inline image, inline image is the Kalman smoother output, inline image and inline image are the estimated latent spatial variables, inline image is the matrix of covariates and Ki is again the loading matrix. Note that the Kalman smoother provides a fast algorithm for the evaluation of inline image using the state space representation of model (1) (see for instance Shumway and Stoffer (2006)). In contrast, the conditional expectations of the latent spatial variables with respect to the observed data are evaluated through the usual formulae of the multivariate normal distribution adapted for the missing data case as detailed in Fassòet al. (2009).

The spatial prediction variance–covariance matrix of inline image is given by

  • image(6)

If the sites in inline image cover the whole region inline image as a fine regular grid, we call inline image a map and the ordered collection

  • image(7)

a dynamic map for the ith pollutant. If, instead of the set of sites inline image, a tessellation inline image of the region inline image is considered, the change-of-support problem (see Gotway and Young (2002)) must be addressed and

  • image(8)

must be evaluated for each block inline image. However, if the blocks in inline image are not too large with respect to the spatial variability scale of yi(s,t), then inline image can be replaced by inline image, with s* the centre of the pixel B. In what follows, the dependence on the estimated parameter set inline image will be dropped to simplify the notation.

It should be noted that the dynamic map carries all the information about the temporal and the spatial dynamics of the ground level pollutant concentration. However, the amount of information is huge and it is rarely useful to decision makers. The following sections describe how aggregate information (uncertainty included) can be derived by taking advantage of the flexibility of the DCM and by considering both the estimated model latent variables and the estimated pollutant concentrations.

3. Global air quality indices

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

When environmental space–time data are considered, aggregation is often useful over either space or time. Obtaining aggregated results over time is usually straightforward as the data are usually collected at regular time steps. Spatial aggregation is more complex as the spatial sampling locations are irregularly sparse over the region. For these reasons, the focus of this section is on aggregation over space; in particular, the problem of defining global air quality indices with measures of uncertainty is addressed.

With a global air quality index, we refer here to a single number that can describe, for a region inline image at time t, the air quality in terms of the monitored pollutants. Hence, a global air quality index represents, on the one hand, a simple and concise reporting measure for the public and, on the other, a measure to compare different temporal periods with respect to air quality easily.

From a statistical point of view, the global air quality index is considered here as a latent variable which manifests itself through the pollutants’ concentration measurements collected at the sampling sites. Although in a different context, the same idea has been developed by Chiu et al. (2011) in the definition of health factor indices.

To define the latent global air quality index, the following multivariate model is proposed:

  • image(9)

which is a reduced version of model (1). This kind of model has been used in various environmental applications. For example, for large data sets, using the so-called fixed rank smoothing approach of Cressie and Johannesson (2008), the matrix K(s) is defined by a set of fixed spatial basis functions. In ecological trend analysis, Zuur et al. (2007) used the so-called dynamic factor model to estimate the common trend of a non-large number of time series. To do this, the matrix K(s) is estimated by using suitable constraints.

In our case, the global air quality index should represent a common trend at the monitoring stations but the number of times series can be high and the matrix K(s) is poorly identifiable. Hence, we avoid identifiability problems by fixing K(s) to agree with the station averages. The dimensionality of z(t) is equal to 1 or to the total number of pollutants q. The first case may be considered when the pollutants are highly positively correlated. The z(t) is hence unidimensional and the global air quality index can be defined as

  • image(10)

where zT(t) is the estimated latent state output of the Kalman smoother. In the second case, when the pollutants are not positively correlated, it is better to rely on a q-variate z(t) and each pollutant retains its own temporal trend. Following the aggregation approaches of Bruno and Cocchi (2002) and Lee et al. (2011), two possible global air quality indices are

  • image(11)
  • image(12)

where inline image is the ith component of the estimated latent state zT(t) output of the Kalman smoother.

With regard to the uncertainty that is related to the above indices, this can be evaluated by considering the variance–covariance matrix PT(t) related to z(t). In particular, the variance of I2(t) can be evaluated as

  • image

where pij(t) is the (i,j)th element of the matrix PT(t). From inline image, a 95% confidence interval for I2(t) can be evaluated as inline image. Confidence intervals for I3(t) do not have, in general, a simple closed form but they can be easily evaluated by considering the quantiles of inline image.

When plotted against time, the indices I1, I2 and I3 provide an immediate view of the air quality trend over the region considered. This allows comparison across days and years and to test whether air quality is either improving or worsening over time. From an epidemiological point of view, however, this kind of information is not sufficiently rich to derive conclusions about the potential effect of pollution on population health, which is the object of the next section.

4. Population exposure and risk assessment

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

Mapping pollutant concentration over space and time is important to identify critical areas with respect to air quality. To evaluate the potential effect of airborne pollution on population health, however, the spatial distribution of the population must also be considered. Hence, population exposure and population risk are evaluated by analysing the interaction between the spatial distributions of the pollutants and population.

4.1. Exposure index

Exposure and risk are related concepts and the respective indices may carry similar information. However, in particular contexts, exposure and risk might differ substantially. In a way, the risk index should be related to the chance of extreme events whereas the exposure index should be related to the number of people who are exposed to a given pollution level.

The exposure index for the ith pollutant, the block inline image and the temporal frame inline image are defined here by

  • image(13)

where

  • image

is the estimated temporal average concentration of the ith pollutant and d(B) is the (time invariant) population count of block B. In this case, we prefer to evaluate a temporally averaged index since, as said before, the exposure index should reflect the long-term effect. For instance, the set inline image can represent a month, a whole season or a year. If needed, the exposure index inline image can be aggregated over space to define the following average exposure index for region inline image:

  • image(14)

where D is the total population count of region inline image.

To evaluate how the spatial distributions of population and pollutant concentration interact, an interesting picture is provided by the cumulative exposure distribution, which is given by

  • image(15)

4.2. Risk index

The risk index is defined here by considering a concentration threshold L for a pollutant above which the effect on human health is known to be significant. The threshold L can be, for example, the concentration level which causes respiratory hospital admissions to increase with respect to a baseline rate. Given this threshold, two central quantities are the probability that the pollutant concentration yi exceeds the L-threshold in block B and time t, namely

  • image(16)

and the probability that the number of days for which L is exceeded in block B exceeds M, i.e.

  • image(17)

with inline image the set of days for which the exceedance occurs.

The following risk indices are considered:

  • image(18)
  • image(19)

In particular, the risk index that is defined in equation (19) reflects the current air quality norm, which usually prescribes a maximum number of days that the pollutant concentration can exceed a threshold L. The risk index of equation (19) can be evaluated by noting that

  • image(20)

with Be(p) the Bernoulli random variable with parameter p. Since the sum of independent Bernoulli random variables with varying parameter p has no simple closed formula, expression (20) can be evaluated numerically. For instance, the distribution of the number of days inline image can be evaluated by Monte Carlo simulation. In practice, the probabilities in equations (18) and (19) are computed by using the estimated parameter set inline image.

As a final remark, it is worth noting that the exposure and risk indices that were defined above are conditioned on the observed covariates and the estimated latent variables u, w and z. In other words, both indices must be applied only in retrospective analysis and they cannot be considered as characteristics of a particular spatial site inline image independent of time.

4.3. Exceedance probability evaluation

A key aspect in assessing risk is the evaluation of the exceedance probability inline image. Although, for fixed inline image, inline image can be easily evaluated, it gives a conservative estimate of πi(B,t) as it does not take into account any model misspecification error and inline image is smoother than the real pollutant concentration yi(B,t). Moreover, we are interested in evaluating confidence intervals for πi(B,t), which are not provided by the dynamic kriging. For these reasons, the following procedure is considered.

  • (a)
     The leave one site out cross-validation technique is applied and the cross-validation residuals e(s,t), s ∈ Si, are considered. In particular, e(s,t) is computed by using data Ys which is the data matrix omitting all the data from site s.
  • (b)
     Residuals are Studentized with respect to the dynamic kriging variance inline image, namely
    • image(21)
  • (c)
     Considering all the Studentized residuals inline image, their cumulative distribution function inline image is obtained by kernel smoothing.
  • (d)
     For each block B and time t, the exceedance probability is evaluated as
    • image(22)
    with inline image the kriged pollutant concentration under the estimated model with parameter set inline image.

The cross-validation residuals are presumed to take into account the model misspecification error and they are characterized by a higher variance with respect to classical residuals. The transformation at step (b) of this procedure is not a real Studentization (Cook, 1982) since σ2(s,t) is not an estimate of the residual standard deviation inline image. However, inline image and the Studentization procedure is applied here to homogenize the model residuals which, on their own, are not homoscedastic with respect to space. Indeed, the cumulative distribution function inline image can be evaluated by considering all the Studentized residuals provided that they are white noise in both space and time. Finally, in step (d) the approximations inline image andinline imageinline image are considered negligible since it is assumed that inline image is a fine tessellation of the region inline image.

To evaluate whether the probabilities given by expression (22) are reliable, we provide confidence intervals as follows. We sample from the asymptotic distribution of the estimated parameter set, inline image, where inline image is the approximated Fisher information matrix of Section 2. As a consequence, we obtain a collection of R parameter sets inline image. For each Ψ(j) we evaluate bootstrap replications of inline image and inline image and we compute π(j)(B,t) by using equation (22). The confidence intervals are based on the sample quantiles of inline image.

5. Analysis of the Scottish air quality data for year 2009

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

The methodology that was discussed in the previous sections is applied here to Scottish air quality data for the year 2009. The aims are to obtain better insight into the spatiotemporal dynamics of the principal airborne pollutants, to evaluate population exposure and risk and to understand whether the air quality monitoring network is appropriate to answer the above questions or whether it should be strengthened in terms of number of monitoring stations and spatial distribution. The air pollution standards and the air quality objectives that are considered in the analysis are based on the ‘Air quality standards (Scotland) regulations 2007 for the purpose of local air quality management’. A summary of the current standards and objectives can be found in Department for Environment, Food and Rural Affairs (2009).

This section is organized as follows. Section 5.1 describes the data considered in terms of pollutants, population distribution and covariates. The global air quality index for Scotland is evaluated in Section 5.2 whereas population exposure and risk indices are developed in Section 5.3.

5.1. Description of the data

The sources of data that are considered in this work are essentially three: the ground level concentration of airborne pollutants, measured by the Scottish automatic urban network, the population spatial distribution downloaded from the Oak Ridge National Laboratory and the meteorological covariates downloaded from the Nasa Global Modeling and Assimilation Office. Each source of data is characterized by a different spatial and temporal resolution, as described hereafter.

5.1.1. Pollutant concentrations

The Scottish automatic urban network provides hourly mean data on six main airborne pollutants, namely nitrogen dioxide (NO2), ozone (O3), carbon monoxide (CO), sulphur dioxide (SO2) and particulate matters PM10 and PM2.5. In this work, only the NO2, O3 and PM10 concentration data are considered since they are measured at sufficient monitoring stations to justify a space–time analysis. Moreover, the hourly mean data are averaged to work with daily data.

For the year 2009, the number of monitoring stations is 81. Each station measures only a subset of the three pollutants considered and missing data are possible, due to temporary breakdowns of either the station or the single measuring instrument. Days with less than 75% hourly data (18 h) are considered as days with missing data. The exact number of monitoring stations for each pollutant is reported in Table 1, showing that the network is unbalanced in the sense of Bodnar et al. (2008). The respective spatial distributions are reported in Figs 1(a)–1(c), from which it is clear that the monitoring stations are not evenly distributed over Scotland as they are mainly in the most populated areas.

Table 1.   Summary statistics of the pollutant concentration data for 2009
Pollutant Number of stations Mean (μgm−3) Standard deviation (μgm−3) Missing (%)
NO26632.1923.3512.7
O31055.8018.9912.1
PM106016.608.5816.1
image

Figure 1.  Spatial distributions of the Scottish automatic urban network monitoring sites: (a) NO2 (66 sites); (b) O3 (10 sites); (c) PM10 (60 sites); (d) population spatial distribution

Download figure to PowerPoint

5.1.2. Population distribution

The population distribution has a twofold role here. It is considered as a time invariant covariate and it is used to evaluate the exposure and risk indices that were discussed in Section 4. The Oak Ridge National Laboratory manages the LandScanTM ambient population count database, currently updated to the year 2008 (see Bhaduri et al. (2007)). The database provides estimates of 24-h average population counts over the entire world with inline image resolution (approximately 1 km×1 km). The population spatial distribution for Scotland is reported in Fig. 1(d), from which it is clear that most of the population is in the central belt along the Glasgow–Edinburgh parallel.

5.1.3. Morphological and meteorological covariates

Pollutant concentrations are known to be related to some anthropological and meteorological covariates owing to the physical processes that drive the pollutant diffusion and advection. In this work, we consider population count pop, sea level pressure slp, temperature t, specific humidity sh, wind speed ws and boundary layer height blh. In particular, the boundary layer height is the height of the lowest part of the troposphere that is directly influenced by the ground and it determines the volume that is available for pollutants to disperse. Note that the population count is a good proxy of both the pollutant emissions and the site type (urban, suburban and rural). The meteorological covariates are downloaded from the Nasa Global Modeling and Assimilation Office. In particular, the ‘Modern era retrospective analysis for research and applications’ product (see Rienecker et al. (2011)) is considered, which is characterized by a temporal resolution of 1 h and a spatial resolution of inline image longitude by inline image latitude. Since the pollutant concentrations are daily averages, the meteorological covariates are also averaged over 24 h and they are interpolated at inline image resolution for mapping purposes.

As a final remark we point out that, though the concentration data might be preferentially sampled (see Diggle et al. (2010)), the problem is largely mitigated by the covariates and in particular by the population distribution which acts as a proxy for the pollutant emissions. Net of the covariates, the ‘residual’ data inline image can be assumed to be not preferentially sampled with respect to the residual random field inline image, ∀i,t. Moreover, since the monitoring stations are not very high in number, we believe that the population distribution, which is available at high spatial resolution, is more informative than the network itself on the preferential sampling.

5.2. Global air quality index estimation

The methodology that was discussed in Section 3 is here considered to evaluate a global air quality index for Scotland for 2009. Since NO2, O3 and PM10 are known to have different temporal dynamics over the year, the global air quality index (12) is considered. With regard to the estimation of the latent temporal state z(t), model (9) is considered with

  • image(23)

where

  • image

is the temporal average scaled pollutant concentration at the sampling sites inline image for the ith pollutant, inline image whereas inline image is the network average scaled pollutant concentration. The scaling factors Fi have a different role from that of the concentration thresh olds L. The latter are usually provided by the air quality legislation as thresholds that should not be exceeded whereas the former are used to homogenize the pollutants in the case that they are provided by using different statistics (average, running average, maximum etc.).

The UK air quality index and banding system (IBS) approved by the UK Committee on Medical Effects of Air Pollution Episodes is characterized by a 1–10-index divided into four bands, namely low, moderate, high and very high. Each index value corresponds to a range of concentration where the pollutant concentration can fall when measured over a period of time ΔT. The limits of each range depend on the particular pollutant as well as ΔT. For PM10, an index value equal to 10 corresponds to a running 24-h mean concentration inline image equal to or higher than 128 μg m−3. Although the data that are considered in this work are daily average concentrations rather than running 24-h means, it makes sense to consider inline image. The division by 10 is introduced to keep the index I3 comparable with respect to the index of the IBS. The scaling factors for NO2 and O3 are not immediately available since the IBS prescribes ranges for the running 8-h mean inline image for O3 and the hourly mean inline image for NO2. Preliminary analysis not reported here suggests that

  • image

and

  • image

The scaling factors are then chosen to be inline image and inline image, where 764μg m−3 and 360μg m−3 are the concentrations corresponding to an index value equal to 10 in the IBS for NO2 and O3 respectively.

Using the resulting loading matrix inline image, model (9) is estimated by the EM algorithm and the related temporal component zT(t) is depicted in Fig. 2, where error bounds are defined as inline image, with pi(t) the ith diagonal element of var{z(t)|Y}. Note that each pollutant is characterized by a different temporal dynamic as expected. O3 peaks in March or April whereas NO2 peaks in winter. The particulate matter PM10 does not show a clear trend and the peaks are a consequence of unfavourable meteorological conditions. Fig. 3 shows the evaluated air quality index I3(t) which is representative of Scotland as a whole. By analysing the temporal series of I3(t), it can be concluded that, during the year 2009, air pollution over Scotland remained low with the exception of three events spread over 7 days during March and April. All the events can be associated with moderate concentration levels of PM10 due to adverse meteorological conditions. Note, moreover, that the decisive pollutants are O3 and PM10 whereas NO2 is identified in Fig. 3 only three times.

image

Figure 2.  Kalman smoother output zT(t) of model (9) (inline image, error bounds inline image): (a) NO2 component; (b) O3 component; (c) PM10 component

Download figure to PowerPoint

image

Figure 3.  Scotland daily air quality assessment through the air quality index I3 for the year 2009 (the pollutant that gives rise to the maximum indicated by the marker): |, 95% confidence interval; x, NO2; O, O3; Δ, PM10

Download figure to PowerPoint

5.3. Population exposure and risk evaluation

As discussed in Section 4, the evaluation of population exposure and risk is based on coupling population spatial distribution and estimated pollutant concentrations obtained as output of the DCM. The DCM defined in equation (1) allows joint modelling of the space–time correlation of all the pollutants considered. However, to define the parametric structure of the multivariate model better, it is useful first to estimate one univariate model for each pollutant. In fact, the dimension p of the latent temporal state z(t), the inclusion of either u(s,t) or w(s,t) (or both) and the number c of coregionalization components must be decided before estimating the model. Although multivariate models can be compared by means of cross-validation techniques, the number of possible model parameterizations may be large and it is useful to consider the univariate models as a guide to define the parameterization of the multivariate model. As far as the spatial correlation structures is concerned, the exponential correlation function has been considered, namely inline image, inline image, inline image.

Table 2 reports the value of inline image computed by means of the EM algorithm for each of the univariate models. All the variables and the covariates have been log-transformed and standardized. Standardization helps numerical stability and allows a direct comparison of the parameter values across pollutants.

Table 2.   Estimated parameters for the univariate DCMs and respective cross validation mean-squared error cmse
Pollutant inline image inline image inline image inline image inline image inline image
NO20.464−0.0400.321−0.473−0.197−0.220
Standard deviation0.0050.0210.0650.0550.0150.017
O3−0.090−0.2290.392−0.2970.2160.166
Standard deviation0.0160.0260.0820.0690.0180.021
PM100.1210.2240.289−0.294−0.084−0.222
Standard deviation0.0060.0380.0780.0680.0200.021
  inline image inline image inline image inline image inline image cmse
NO20.3670.9010.0100.46362.300.483
Standard deviation0.0030.0380.0010.0104.28 
O30.2740.9510.0270.427136.470.561
Standard deviation0.0140.0190.0020.02021.96 
PM100.2860.6740.1370.51688.360.366
Standard deviation0.0020.0540.0190.0178.91 

By comparing the values of the estimated inline image-parameters with respect to their standard deviations, it can be seen that all the covariates are significant except inline image for NO2. The estimated parameters inline image are expressed in kilometres and they describe the strength of the spatial correl ation of the latent variables ui(s,t). In particular, in the case of the exponential correlation function, the spatial correlation between any two points in space is around 0.05 when their distance is inline image. Note that inline image has the highest value but also the highest standard deviation. This is because the O3 monitoring network is very sparse and cannot capture a possible high spatial frequency of inline image. This consideration suggests that an LCM may be appropriate for modelling a spatial latent component that is common to all the pollutants, even if there is not enough evidence to conclude that the inline image are equal. In contrast, the inline image-values are significantly different, suggesting that each pollutant should retain its own temporal dynamics; hence p=3 for the multivariate model. The optimal number c of coregionalization components has been assessed through cross-validation, suggesting c=1. The cross-validation mean-squared errors cmse that are obtained by applying the leave one site out technique are reported in Table 2.

Considering now the multivariate model (1), the estimated inline image, inline image and inline image are reported in Table 3 whereas the remaining parameters are

  • image(24)
  • image(25)
  • image(26)
  • image(27)

with the standard deviations in parentheses. The inline image-coefficients related to the covariates are all significantly different from 0 and their sign is in line with the physics of the pollution phenomenon. For instance, as expected, inline image is positive for NO2 and PM10 whereas it is negative for O3, implying, on average, a lower concentration of O3 in low populated areas. The analysis of the matrix inline image, and in particular the variances on its diagonal, suggests a low contribution of the latent temporal variable z(t) in explaining the NO2 and O3 variability whereas the contribution is more relevant for PM10. In contrast, the analysis of inline image reveals the slow temporal dynamics of NO2 and O3 (diagonal elements close to 1) which can be related to the temporal persistence of the pollutants not accounted for by the covariates. The parameter inline image of the spatial correlation function is common for all the pollutants considered and it is also expressed in kilometres. The matrix inline image shows that the component of w(s,t) that is related to O3 is negatively correlated with the components that are related to the other pollutants. Note that inline image for each pollutant, namely w(s,t) accounts for about half of the data variability. By comparing the cmse-values reported in Tables 2 and 3, the gain in terms of prediction capability for the multivariate model can be appreciated. The reduction in cmse is particularly evident for O3 which has the spar sest monitoring network and benefits more from the spatiotemporal correlation with the other pollutants. Note that this is a major benefit of using a multivariate model and in particular the DCM.

Table 3.   Subset of the estimated parameters for the multivariate DCM
Pollutant inline image inline image inline image inline image inline image inline image inline image inline image cmse
NO20.447 0.309−0.415−0.192−0.2110.3170.5670.439
Standard deviation0.005 0.0560.0150.0180.0170.0040.010 
O3−0.166−0.1960.368−0.2510.2080.1880.2430.4510.478
Standard deviation0.0160.0250.0790.0660.0170.0200.0140.019 
PM100.0890.2660.382−0.285−0.064−0.2270.2440.5710.339
Standard deviation0.0050.0340.0770.0680.0200.0220.0030.011 

The estimated multivariate model is then used to evaluate population exposure and risk with respect to each pollutant. In particular, the dynamic maps inline image, inline image and inline image are evaluated over the regular grid inline image with spatial resolution inline image within the Scottish boundaries.

As a first result, Fig. 4 shows the monthly and the yearly average population exposure evaluated by considering the exposure index (14), which can be related to an average Scottish person. Moreover, Fig. 5 displays, for each pollutant, the yearly average exposure distribution based on the cumulative distribution (15) and evaluated by considering a Gaussian kernel smoother with bandwidth 0.5 μg m−3. By looking at the results of Fig. 5, it can be noted that the exposure distributions differ greatly across pollutants. In particular, most of the Scottish people are exposed to the same yearly average PM10-concentration ±5 μg m−3, whereas this is not true for NO2 which is more spread out. Moreover, the exposure distribution of O3 is characterized by a prominent right tail representing people living in rural areas, where the concentration of O3 is higher. Although the graphs are reported on the same axis, they are not directly comparable in terms of health effects.

image

Figure 4.  Monthly and yearly (inline image) average population exposure: (a) NO2; (b) O3; (c) PM10

Download figure to PowerPoint

image

Figure 5.  Yearly average exposure distribution: inline image, NO2; inline image, O3; inline image, PM10

Download figure to PowerPoint

The daily exposure time series given by inline image are reported in Fig. 6, where D is the total population count and inline image is given by equation (15). The thresholds L are 105, 87 and 50 μg m−3 for NO2, O3 and PM10 respectively and they have been derived from the ‘Air quality standards (Scotland) regulations 2007’ following arguments similar to those of the previous paragraph. The time series disappear between day 190 and day 315 since the thresholds are never exceeded from July to October.

image

Figure 6.  Daily exposure time series (number of people exposed to a pollutant concentration exceeding the threshold): inline image, total population of Scotland; O, O3>87 μg m−3; Δ, PM10>50 μg m−3; inline image, NO2>105 μg m−3

Download figure to PowerPoint

To compute the risk indices (18) and (19), the exceedance probabilities and their respective confidence intervals have been evaluated for each day and each pollutant at inline image of resolution. In particular, the confidence intervals have been obtained by considering R=100 bootstrap replications as detailed in Section 4.3 and by analysing, for each map pixel and each day, the respective empirical cumulative distribution function.

The daily exceedance probabilities can be used for evaluating daily risk maps or aggregated risk indices. As an example, Fig. 7 shows the daily time series and the respective 95% confidence interval of the risk index for O3 and L=87 μg m−3 aggregated at the country level.

image

Figure 7.  Aggregated O3 risk index and 95% confidence intervals (|) (with respect to L=87 μg m−3)

Download figure to PowerPoint

With regard to the number of days of exceedance, Fig. 8(a) reports the map of the average number of days of exceedance for PM10 whereas Fig. 8(b) reports the probability map that the threshold L has been exceeded for more than 7 days, which is required to evaluate the risk indicator that is defined in equation (19). Note that the 7-days limit represents one of the objectives of the Scotland national air quality strategy to be achieved by December 31st, 2010. Both the average number of days of exceedance and the probability of exceedance of the 7-days limit have been evaluated by considering the daily exceedance probability maps and by simulating from the distribution that is defined in equation (20). In particular, the results that are reported in Fig. 8(a) are based on 500 Monte Carlo simulation runs (see Section 4.2). The measures of uncertainty that are reported in Figs 8(c) and 8(d) have been obtained by repeating the same simulation approach for the R=100 bootstrap replications of daily exceedance probability maps.

image

Figure 8.  (a) Map of the estimated average number of days of exceedances for PM10, (b) map of the probability of exceedance of the 7-days limit, (c) standard deviation of map (a) and (d) range of the 95% confidence interval on map (b)

Download figure to PowerPoint

The probability that the threshold of 50 μg m−3 has been exceeded for more than 7 days is higher in the Grampian region (north-east) and along the southern border of Scotland rather than in cities such as Glasgow or Edinburgh. However, this is a consequence of the fact that those regions are poorly covered by monitoring stations and the uncertainty in the estimated concentration of pollutant is high. The north-west regions are not covered as well but they are characterized by a very low PM10-concentration and the exceedance probability is not so high despite the uncertainty. These last considerations suggest that the Scottish air quality network should be improved by installing additional monitoring stations in those areas which are uncovered and are characterized by a high risk and/or a high uncertainty. This is particularly true for O3 that is currently monitored at only 10 locations.

6. Conclusions

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

Assessing air quality at the country level and evaluating the related population exposure are challenging tasks. In this paper, both problems have been addressed by developing a statistical framework based on a flexible multivariate space–time model, the DCM, and on a set of aggregated indices built by considering both the model output and the information of the population spatial distribution.

The DCM is sufficiently flexible to accommodate the data complexity related to ground level unbalanced networks and it naturally copes with the inevitable missing data problem. The indices are accompanied by measures of uncertainty and they can be provided at different levels of temporal and spatial aggregation to study different aspects of the pollution phenomenon. The global air quality indices allow us to compare different time periods with respect to the air quality at the country level easily and readily. In contrast, the exposure and risk indices provide an effective way to identify critical areas with respect to air quality and useful information to improve the ground level monitoring network.

As a whole, the statistical framework has been proven able to assimilate the current air quality legislation, both at the European and at the national levels, and to provide easily interpretable results for decision makers. The statistical framework has been successfully applied to the analysis of the Scottish air quality data for the year 2009, shedding light on aspects related to population exposure and risk that have never been investigated before.

Acknowledgements

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References

The analysis was done utilizing the LandScan 2008TM high resolution global population data set copyrighted by UT-Battelle, LLC, operator of Oak Ridge National Laboratory under contract DE-AC05-00OR22725 with the US Department of Energy. The US Government has certain rights in this data set. Neither UT-Battelle, LLC, nor the US Department of Energy, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of the data set.

This research is part of project EN17 ‘Methods for the integration of different renewable energy sources and impact monitoring with satellite data’, funded by Lombardia Region.

References

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. The dynamic coregionalization model
  5. 3. Global air quality indices
  6. 4. Population exposure and risk assessment
  7. 5. Analysis of the Scottish air quality data for year 2009
  8. 6. Conclusions
  9. Acknowledgements
  10. References
  • Banerjee, S., Carlin, B. and Gelfand, A. (2004) Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman and Hall–CRC.
  • Bhaduri, B., Bright, E., Coleman, P. and Urban, M. (2007) LandScan usa: a high resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal, 69, 103117.
  • Bodnar, O., Cameletti, M., Fassò, A. and Schmid, W. (2008) Comparing air quality in Italy, Germany and Poland using bc indexes. Atmosph. Environ., 42, 84128421.
  • Bruno, F. and Cocchi, D. (2002) A unified strategy for building simple air quality indices. Environmetrics, 13, 243261.
  • Cameletti, M., Ignaccolo, R. and Bande, S. (2011) Comparing spatio-temporal models for particulate matter in Piemonte. Environmetrics, 22, 985996.
  • Chiu, G. S., Guttorp, P., Westveld, A. H., Khan, S. A. and Liang, J. (2011) Latent health factor index: a statistical modeling approach for ecological health assessment. Environmetrics, 22, 243255.
  • Cook, R. (1982) Residuals and Influence in Regression. New York: Chapman and Hall.
  • Cressie, N. and Johannesson, G. (2008) Fixed rank kriging for very large spatial data sets. J. R. Statist. Soc. B, 70, 209226.
  • Cressie, N. and Wikle, C. (2011) Statistics for Spatio-temporal Data. Hoboken: Wiley.
  • Department for the Environment, Food and Rural Affairs (2009) Scottish Government local air quality management policy guidance 2009. Department for the Environment, Food and Rural Affairs. (Available from http://www.scotland.gov.uk/topics/environment/waste-and-pollution/pollution-1 /16215/pg09.)
  • Diggle, P. J., Menezes, R. and Su, T. (2010) Geostatistical inference under preferential sampling (with discussion). Appl. Statist., 59, 191232.
  • Fassò, A. and Finazzi, F. (2011) Maximum likelihood estimation of the dynamic coregionalization model with heterotopic data. Environmetrics, 22, 735748.
  • Fassò, A., Finazzi, F. and D'Ariano, C. (2009) Integrating satellite and ground level data for air quality monitoring and dynamical mapping. Technical Report Graspa WP 34. Gruppo di Ricerca per le Applicazioni della Statistica ai Problemi Ambientali, Bergamo. (Available from www.graspa.org.)
  • Finazzi, F. and Fassò, A. (2011) EM estimation of the dynamic coregionalization model with varying coeffcients. In Proc. Spatial 2: Spatial Data Methods for Environmental and Ecological Processes, Foggia, Sept. 1st–2nd, 2nd edn (ed. B. Cafarelli). Bergamo: Gruppo di Ricerca per le Applicazioni della Statistica ai Problemi Ambientali. (Available from www.graspa.org.)
  • Finazzi, F. and Fassò, A. (2012) D-STEM—a statistical software for multivariate space-time environmental data modeling. In METMA 6 : Proc. Int. Wrkshp Spatio-temporal Modelling (eds A. M. Goncalves, I. Sousa, L. Machado, P. Pereira, R. Menezes and S. Faria).
  • Freedman, D. (2001) Ecological inference and the ecological fallacy. In International Encyclopedia of the Social and Behavioral Sciences, pp. 40274030. Amsterdam: Elsevier.
  • Gotway, C. and Young, L. (2002) Combining incompatible spatial data. J. Am. Statist. Ass., 97, 632648.
  • Jerret, M., Arain, A., Kanaroglou, P., Beckerman, B., Potoglou, D., Sahsuvaroglu, T., Morrison, J. and Giovis, C. (2005) A review and evaluation of intraurban air pollution exposure models. J. Expos. Anal. Environ. Epidem., 15, 185204.
  • Lee, D., Ferguson, C. and Scott, E. M. (2011) Constructing representative air quality indicators with measures of uncertainty. J. R. Statist. Soc. A, 174, 109126.
  • Rienecker, M. M., Suarez, M. J., Gelaro, R., Todling, R., Bacmeister, J., Liu, E., Bosilovich, M. G., Schubert, S. D., Takacs, L., Kim, G.-K., Bloom, S., Chen, J., Collins, D., Conaty, A., da Silva, A., Gu, W., Joiner, J., Koster, R. D., Lucchesi, R., Molod, A., Owens, T., Pawson, S., Pegion, P., Redder, C. R., Reichle, R., Robertson, F. R., Ruddick, A. G., Sienkiewicz, M. and Woollen, J. (2011) Merra—NASA's modern-era retrospective analysis for research and applications. J. Clim., 24, 36243648.
  • Scott, E. M. (2007) Setting and evaluating the effectiveness of environmental policy. Environmetrics, 18, 333343.
  • Shumway, R. and Stoffer, D. (2006) Time Series Analysis and Its Applications, with R Examples. New York: Springer.
  • Wackernagel, H. (2003) Multivariate Geostatistics: an Introduction with Applications, 2nd edn. New York: Springer.
  • Zidek, J., Shaddick, G., Meloche, J., Chatfield, C. and White, R. (2007) A framework for predicting personal exposures to environmental hazards. Environ. Ecol. Statist., 14, 411431.
  • Zuur, A., Ieno, E. N. and Smith, G. M. (2007) Analysing Ecological Data. New York: Springer.