A Bayesian coregionalization approach for multivariate pollutant data



[1] Spatial data collection increasingly turns to vector-valued measurements at spatial locations. An example is the observation of pollutant measurements. Typically, several different pollutants are observed at the same sampled location, referred to as a monitoring station or gauged site. Usually, interest lies in the modeling of the joint process for the levels of the different pollutants and in the prediction of pollutant levels at ungauged sites. In this case, it is important to take into account not only the spatial correlation but also the correlation among the different variables at each gauged site. Since, conceptually, there is a potentially observable measurement vector at every location in the study region, a multivariate spatial process becomes a natural modeling choice. In using a Gaussian process, the main challenge is the specification of a valid and flexible cross-covariance function. This paper proposes a rich class of covariance functions developed through the so-called linear coregionalization model [see, e.g., Wackernagel, 1998] for multivariate spatial observations. Following the ideas in the work of, for example, Royle and Berliner [1999], we can reparameterize a multivariate spatial model using suitable univariate conditional spatial processes, facilitating the computation. We provide explicit details, including the computation of the range associated with the different component processes. As an example, we fit our model to daily averages of CO, NO, and NO2, for a particular day, at a set of monitoring stations in California, USA.

1. Introduction

[2] Multivariate spatial problems are frequently encountered in environmental applications where we usually have a set of, say, n monitoring stations or gauged sites, and at each of those sites we have measurements of p pollutants. Therefore, for a given time period, concatenating the measurements into a single column vector, the observations comprise an np × 1 vector. (In fact, in practice, often not all pollutants are observed at all locations, creating misalignment or missing data issues.) Here we will concentrate on the spatial aspect of these observations. Usually, on the basis of the information we have from the gauged sites, the interest lies in predicting the p different processes at locations where they are not measured.

[3] Multivariate spatial modeling can be implemented in various ways. For instance, one could employ a Markov random field approach, such as that applied by Mardia [1988] and, more recently, by Gelfand and Vounatsou [2003, and references therein]. This approach is most commonly used for areal data and for regular grid or lattice data. It results in a joint multivariate distribution for the data, which is typically assumed to be normal.

[4] A different formulation is to envision the data as arising from a multivariate spatial process. This approach presumes that a realization of the process consists of p dependent surfaces over the study region. We only observe the values of the surface at a set of n locations. Again, a multivariate distribution results for the observed data. However, specification of a valid multivariate process is more demanding in that it is determined through its associated finite dimensional distributions. Consistency of such determinations with regard to the entire uncountable dimensional joint distribution is a nontrivial matter. However, with Gaussian processes, consistency only requires the use of a valid cross-covariance function [see, e.g., Wackernagel, 1998, and references therein]. With interest in prediction at arbitrary new locations, multivariate processes seem to be the more attractive distributional framework.

[5] In what follows, we work with a multivariate Gaussian process employing a stationary cross-covariance specification. In a series of papers by N. Le, J. Zidek, and colleagues [see, e.g., Brown et al., 1994], this restriction is removed by viewing the joint covariance matrix for the data as varying through an inverse Wishart distribution centered at the covariance matrix which would arise using a stationary cross-covariance function. While the restriction is removed, so is the process notion; we merely achieve a multivariate distribution for the data. In fact, the randomly realized covariance matrix has no connection to a covariance function. Connection with spatial separation is sacrificed; e.g., we expect to see many pairs of entries where, say, dij, the distance between si and sj, is less than di′j′ and yet cov[Y(si), Y(sj)] < cov[Y(si′), Y(sj′)]. Whether this consequence is desirable or not is likely application dependent. Alternative approaches to removing stationarity are mentioned by Gelfand et al. [2003, section 4]. Currently, approaches to nonstationarity through deformation [see Sampson and Guttorp, 1992; Schmidt and O'Hagan, 2003, and references therein] have only been implemented for univariate data with independent replications of the process.

[6] To formalize the definition of a valid cross-covariance function, we need to model not only the dependence of measurements across locations (the usual spatial setting) but also the dependence at each location. That is, if YT(s) = [Y1(s), Y2(s), ⋯, Yp(s)] and we assume stationarity, we need to specify a p × p matrix function C(s − s′), where [C(s − s′)]ll′ = cov[Yl(s), Yl′(s′)], such that for any n and locations s1, s2, ⋯, sn, the resulting np × np covariance matrix for YT = [YT(s1), YT(s2), ⋯, YT(sn)] is positive definite. Note that C need not be symmetric. Moreover, the goal is to enable flexible and computationally tractable covariance structures. Since environmental data are rarely found to follow a Gaussian distribution, we assume that a suitable transformation is carried out in order to achieve approximate normality. Our approach can be extended to accommodate heavier-tailed distributions through scale mixing of Gaussian processes.

[7] With regard to the problem of prediction using multivariate spatial processes, let {Y(s): sD} represent a multivariate spatial random field, where Y(s) ∈ ℜp and D ⊂ ℜd, with, usually, d = 1, 2 or 3. A well-known approach for prediction at an ungauged location is cokriging, where the predictor of the process at, say, s0 is described by a linear combination of all the available data values of all the p variables [Cressie, 1993]. In cokriging the prediction is based on a known cross-covariance matrix, which, in practice, is not realistic. Usually, the matrix is estimated from the data so that the prediction does not take into account any uncertainty associated with estimating the covariance structure. In this regard, Cressie [1993] discusses potential problems with cross variograms with respect to the scaling of the different variables in the vector Y(.).

[8] We adopt a Bayesian approach here in order to more accurately capture the uncertainty in our model specification. In particular, all unknowns in the model are assumed to arise at random from some (prior) distribution. Using Bayes's theorem, this uncertainty is propagated to (posterior) inference about process unknowns and predictions under the process. Wider interval estimates will usually result, but we would argue that they more realistically express the variability associated with the inference. We choose fairly vague specifications for the prior distributions of our unknowns. Thus our inference becomes fairly insensitive to these choices. Still, a thorough analysis of the data and associated prediction requires some examination of prior sensitivity.

[9] As a result, we are left with the selection of a specific multivariate Gaussian spatial process model, i.e., with the selection of a valid cross-covariance function. We adopt an approach based on the linear coregionalization model first introduced by Matheron [1982] with cross covariograms of the form Σm=1r Tmgm(∥s − s′∥), where the gms are known variograms, the Tms are unknown nonnegative definite matrices, and r (<p) gives the number of structures. Each matrix Tm may be thought of as a scalar product matrix between variables. Therefore covariation between variables is captured in a lower-dimensional space (r < p), defined by principal components analysis of Tm [Goulard and Voltz, 1992]. Goulard and Voltz [1992] describe a least squares technique to fit such models. Their aim is to estimate the unknown matrices Tm. Therefore they plug in empirical variogram estimates for the functions gm, ignoring the uncertainty in the estimation of these variogram functions. For more details on coregionalization models with associated classical estimation procedures, see Wackernagel [1998]. Our use for the linear coregionalization model is not dimension reduction. Rather, we set r = p to obtain a rich constructive class of valid cross-covariance functions, as we describe in section 3.1.

[10] Mardia and Goodall [1993] describe modeling of multivariate spatiotemporal data. They propose a separable covariance structure; in other words, the covariance structure of the multivariate process factors into the product of the correlation across locations and the covariance among variables. That is, for the data vector Y, ΣY = R ⊗ T, where Rij = ρ(si − sj; ϕ), with ρ being a valid correlation function in two dimensions, and where T is the covariance matrix for Y(s). The linear coregionalization model includes this form as the special case when all of the gm are identical. One implication of this separable form is that the cross covariances are symmetric, i.e., cov[Yk(si), Yl(sj)] = cov[Yl(si), Yk(sj)], which can be restrictive. Also, cov[Yl(s), Yl′(s′)]/cov[Yl(s), Yl(s′)] = Tll′/Tll does not depend on s − s′, i.e., on the spatial scale. Another limitation is that the spatial range for each component of the process Y(s) is the same. That is, if ρ is isotropic, say, ρ(si − sj) = ρ(dij; ϕ), where ϕ is a decay parameter (equivalently, a range parameter), then cov[Yl(si), Yl(sj)] = Tllρ(dij; ϕ). As l varies, we have a different process variability but a common process range. Finally, Mardia and Goodall [1993] suggest inference based on the maximum likelihood estimator, which will be problematic if the likelihood is multimodal.

[11] This paper is organized as follows. Section 2 briefly describes the source for our data set. Then, in section 3 we present the Bayesian model based on the linear combination of independent univariate spatial processes, the above linear coregionalization model. The main advantage of this model is that each of the p component processes is allowed to have a different spatial range. We then discuss prior distributions which might be assigned to the parameters of this model. We can think of the multivariate distribution of our data as arising from a suitable sequence of conditioning, following Royle and Berliner [1999]. In fact, we identify univariate spatial conditional models to do this. Section 3.3 presents the reparameterization of our multivariate model in terms of univariate conditional processes and discusses its implementation in the latter form. In section 4 we apply the proposed model to a data set comprising daily average measurements of CO, NO, and NO2 from a set of gauged sites in California, USA. Finally, section 5 offers some concluding discussion.

2. Environmental Data

[12] Researchers involved in environmental problems usually face the problem of dealing with data sets of unsatisfactory quality. It is quite common that not all of the monitoring networks have observations for the same period of time, or they do not measure the same pollutants and/or covariates which might help in explaining levels of pollutants of interest. In section 4 we will make use of data from the California Air Resources Board, which is a part of the California Environmental Protection Agency. Among the board's aims is the effective and efficient reduction of air pollutants in the state of California.

[13] The Air Resources Board has cosponsored several special field measurement studies for the purpose of collecting air quality, meteorological, and emission data for data analysis and modeling. Descriptions and data are available from the Central California Air Quality Studies.

[14] The board has available 21 years of air quality data. In particular, the data comprise 21 years of criteria pollutant air quality data (1980–2000), 11 years of toxics air quality data (1990–2000), 13 years of dichotomous sampler data (1988–2000), and 7 years of nonmethane organic compound data (1994–2000). (These data can be ordered from the board itself or downloaded directly from http://www.arb.ca.gov/aqd/aqdcd/aqdcddld.htm. On this web site, one can also find files which provide detailed information about the data.)

3. A Bayesian Linear Coregionalization Model

[15] In section 3.1 we propose a multivariate spatial model based on a linear combination of independent spatial processes. Section 3.2 discusses how to perform inference on this model under a Bayesian framework. Then, in section 3.3 we discuss the reparameterization of this model in terms of univariate conditional spatial models. It will be seen that inference based on the conditional univariate models is computationally easier to implement. Finally, in section 3.4, under the proposed model, we show how to obtain the posterior distribution of the spatial range for each component of Y(s).

3.1. Model

[16] We start by developing the separable (or intrinsic) correlation specification of Y(s) (from section 1) through a linear function of random processes. In particular, let wj(s), j = 1, ⋯, p, be p independent and identically distributed univariate spatial processes, each with unit variance and parametric correlation function ρ(s − s′; ϕ). When ρ is isotropic and strictly decreasing for a given ϕ, the distance at which ρ becomes negligible (conventionally, equal to 0.05) is referred to as the range. Now, let Yl(s) = Σj=1p aljwj(s). It is straightforward to see that cov[Y(s), Y(s′)] = ρ(s − s′; ϕ)T, where T = AAT, with A being p × p with elements Alj = alj. The matrix T represents the covariance matrix of the process at any location s. It is referred to as a coregionalization matrix, and the model for Y(s) is referred to as a linear coregionalization model to suggest that the components of Y(s) covary over the region. T is of full rank if and only if A is of full rank. In modeling Y(s) we are only interested in a fully p-dimensional process. We are not interested in the case j = 1, ⋯, k < p, i.e., in extracting a lower-dimensional representation for the p variables.
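As a concrete illustration of this construction, the sketch below simulates the separable model Y(s) = Aw(s) at a set of sites and checks that the stacked covariance R ⊗ T is a valid (positive definite) matrix. The locations, the matrix A, and the decay value are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# p i.i.d. unit-variance processes w_j(s) sharing one exponential correlation
# rho(h; phi) = exp(-phi * h); all names and values here are illustrative.
n, p, phi = 40, 3, 1.5
s = rng.uniform(0, 10, size=(n, 2))                      # n random 2-D sites
h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
R = np.exp(-phi * h)                                     # n x n correlation

A = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.3, 0.2, 0.9]])                          # lower triangular, full rank
T = A @ A.T                                              # coregionalization matrix

# The np x np covariance of the stacked vector Y is separable: R kron T.
Sigma_Y = np.kron(R, T)
print(bool(np.linalg.eigvalsh(Sigma_Y).min() > 0))       # positive definite => valid

# One draw from the process: columns of w are independent N(0, R) fields.
Lr = np.linalg.cholesky(R + 1e-9 * np.eye(n))
w = Lr @ rng.standard_normal((n, p))
Y = w @ A.T                                              # row i is Y(s_i)^T
```

Note that every component shares the single correlation function ρ; the extension below relaxes exactly this.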

[17] Assuming a common parameter ϕ for each of the spatial processes wj(s) results in a common range for all the component processes in Y(s). A natural extension is to assume that the wj(s)s are still independent with unit variance but now with correlation function ρ(s − s′; ϕj). Again, let Yl(s) = Σj=1p aljwj(s). Now we have that cov[Yl(s), Yl′(s′)] = cov[Σj=1p aljwj(s), Σj=1p al′jwj(s′)] = Σj=1p aljal′jρ(s − s′; ϕj). The latter equality can be written in matrix notation as ΣY(s),Y(s′) = C(s − s′) = ADs,s′AT = Σj=1p ρ(s − s′; ϕj)Tj with Tj = ajajT, where aj represents the jth column vector of A and where Ds,s′ is a p × p diagonal matrix with jth diagonal element ρ(s − s′; ϕj), j = 1, 2, ⋯, p. By construction, C(s − s′) is a valid cross-covariance function, and it is nonseparable. For the data vector Y at n spatial locations, the covariance structure of the multivariate spatial process is given by

ΣY = Σj=1p R(ϕj) ⊗ Tj,    (1)

where ⊗ denotes the Kronecker product and R(ϕj) is n × n with [R(ϕj)]ii′ = ρ(si − si′; ϕj). Writing Y(s) = Aw(s), it is clear that the wl(s) are latent processes which generate the Y(s) given A. Notice that the model has parameters a1, a2, ⋯, ap, ϕ1, ϕ2, ⋯, ϕp. There are p × p parameters in A when only p(p + 1)/2 parameters are required since AAT = T. A convenient reduction is to make A lower triangular. Then the number of parameters in the model reduces to p(p + 1)/2 + p. In this case the model is given by Yj(s) = Σl=1j ajlwl(s), j = 1, ⋯, p:

Y1(s) = a11w1(s)
Y2(s) = a21w1(s) + a22w2(s)
⋮
Yp(s) = ap1w1(s) + ap2w2(s) + ⋯ + appwp(s).    (2)

[18] In summary, the Y(s) process is still stationary, has a symmetric cross-covariance matrix, has a different variance for each component, and, when the correlation function ρ is isotropic and monotonic, has a different range for each component. Note that a measurement or nonspatial error term can be added to equation (2). We do not elaborate details here since for our example in section 4 we do not anticipate nugget effects. However, see Gelfand et al. [2003] in this regard.

3.2. Prior Specification and Posterior Distribution

[19] If, for the moment, we assume that Y(s) has mean 0 and let Θ = (A, Φ), where Φ = (ϕ1, ⋯, ϕp), then

Y | Θ ∼ N(0, Σj=1p R(ϕj) ⊗ Tj).    (3)

Under a Bayesian framework we need to assign prior distributions to all parameters in Θ, say p(Θ). Then the posterior distribution p(Θ|Y) ∝ p(Y|Θ)p(Θ).

[20] First, notice that there is a one-to-one transformation between the elements of the matrices T and A. For example, if p = 2, it is easy to show that

a11 = √T11,  a21 = T12/√T11,  a22 = √(T22 − T12²/T11).

The transformation is messy to write explicitly for larger p but can be developed recursively. For future reference, we record the additional relationships for p = 3:

a31 = T13/√T11,  a32 = (T23 − T12T13/T11)/a22,  a33 = √(T33 − a31² − a32²).

Hence, rather than assign a prior to A, we can assign a prior to T. Since T is a covariance matrix, a natural choice is an inverse Wishart prior distribution for T [see Box and Tiao, 1992] with low precision and mean, say, T̄ = diag(σ1², ⋯, σp²), where σl² is a prior expected variance for Yl(s).
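The one-to-one map between T and a lower-triangular A is simply the Cholesky factorization T = AAT. The following minimal check, with arbitrary numbers, confirms the closed-form p = 2 entries against a numerical factorization:

```python
import numpy as np

# Arbitrary 2 x 2 covariance matrix for illustration.
T = np.array([[4.0, 1.2],
              [1.2, 2.5]])

A = np.linalg.cholesky(T)                 # lower-triangular factor, T = A A^T

# Closed-form entries for p = 2, as recorded in the text:
a11 = np.sqrt(T[0, 0])
a21 = T[0, 1] / np.sqrt(T[0, 0])
a22 = np.sqrt(T[1, 1] - T[0, 1] ** 2 / T[0, 0])
print(np.allclose(A, [[a11, 0.0], [a21, a22]]))   # True
```

For larger p the same recursion is exactly what a Cholesky routine performs column by column.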

[21] As for the choice of correlation function ρ(.; ϕj), we illustrate with an exponential, exp(−ϕj∥s − s′∥). We assign an independent Gamma prior distribution to each ϕj, which has a large variance and mean based on a crude estimated range, 3/ϕj, taken to be half of the maximum interlocation distance. More informative priors for the ϕj could be employed if prior knowledge regarding the associated ranges was available. Also, more flexible correlation functions, e.g., the powered exponential or Matérn classes, are available.
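This elicitation can be sketched in a few lines; the maximum interlocation distance and the gamma shape below are made-up illustrations:

```python
import math

# With rho(d) = exp(-phi d), the range solves exp(-phi d) = 0.05,
# i.e. d = -log(0.05)/phi, and -log(0.05) is approximately 3.
print(round(-math.log(0.05), 2))     # close to 3, hence range ~ 3/phi

max_dist = 600.0                     # max interlocation distance, km (made up)
prior_range = max_dist / 2           # prior guess for the range
prior_mean_phi = 3.0 / prior_range   # decay implied by that range

# A Gamma(shape, rate) prior has mean shape/rate; a small shape gives
# the large prior variance mentioned in the text.
shape = 0.1
rate = shape / prior_mean_phi
```

This keeps the prior centered on a plausible decay while remaining diffuse.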

[22] Then, the posterior distribution for Θ becomes

p(Θ|Y) ∝ |ΣY|−1/2 exp(−½YTΣY−1Y) p(T) ∏j=1p p(ϕj),  where ΣY = Σj=1p R(ϕj) ⊗ Tj.

Inference can be made using Markov chain Monte Carlo (MCMC) methods, and samples from the posterior distributions of the parameters (or any function of them) can be obtained. See Gelfand et al. [2003] for more details on implementing an MCMC algorithm for this model. Unlike previous approaches which use linear coregionalization models [e.g., Wackernagel, 1998], uncertainty regarding all the quantities in the model is taken into account, and inference is made simultaneously through the posterior. Prediction at an ungauged location snew is based upon the posterior predictive distribution p(Y(snew)|Y). This distribution is a posterior mixture of multivariate normals and is routinely estimated or sampled (see section 3.3 below).

3.3. Hierarchical Spatial Conditional Modeling

[23] From basic probability theory a joint probability density function can be written as a product of conditional densities. For example, if p = 3, the full joint distribution for Y(s), p[Y1(s), Y2(s), Y3(s)], can be written as, for example,

p[Y1(s), Y2(s), Y3(s)] = p[Y1(s)] p[Y2(s) | Y1(s)] p[Y3(s) | Y1(s), Y2(s)].    (4)

The idea of writing a multivariate spatial process as the product of conditional distributions and the potential computational advantage is described, for example, in the work of Royle and Berliner [1999]. They call this approach hierarchical because of the parallel with hierarchical modeling in the Bayesian literature and to differentiate it from approaches which condition on auxiliary variables but do not model them spatially. Further, they suggest that in some cases the sequence of “conditioning” in the conditional model be made in a “cause/effect” fashion. They do note that the conditional parameterization is not restricted to this “cause/effect” setting.

[24] We now write the linear coregionalization model in terms of conditional models and show how inference can be simplified. For simplicity, we continue to assume that p = 3, though similar results can be obtained, after some algebra, for any value of p. Following equation (4), one can write the joint distribution of Y(s), for example, as

Y1(s) = μ1 + σ1w̃1(s),
Y2(s) | Y1(s) = μ2 + αY1(s) + σ2w̃2(s),
Y3(s) | Y1(s), Y2(s) = μ3 + γY1(s) + βY2(s) + σ3w̃3(s),    (5)

where w̃j(s) is a mean 0, unit variance Gaussian random field with correlation function ρ(s − s′; ϕj) and where σj is a scale parameter. Here μj denotes a constant intercept in the jth conditional specification. More general mean structure μj(s), capturing, say, a trend surface, could be introduced. The connection between the conditional specification in equation (5) and the associated unconditional one is given by

Y1(s) = μ1 + σ1w̃1(s),
Y2(s) = μ2 + αμ1 + ασ1w̃1(s) + σ2w̃2(s),
Y3(s) = μ3 + γμ1 + β(μ2 + αμ1) + (γ + αβ)σ1w̃1(s) + βσ2w̃2(s) + σ3w̃3(s),    (6)

or, in matrix notation, Y(s) = μ + Aw̃(s), revealing the reparameterization from one to the other. This is clearly a version of the model of equation (2). In fact, a11 = σ1, a21 = ασ1, a22 = σ2, a31 = (γ + αβ)σ1, a32 = βσ2, and a33 = σ3. Since T = AAT, the coregionalization matrix can therefore be obtained in terms of α, β, γ, σ1, σ2, and σ3.
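A short sketch of this reparameterization for p = 3, with arbitrary parameter values, confirming that the stated entries of A reproduce T via a Cholesky round trip:

```python
import numpy as np

# Map from the conditional parameters (alpha, beta, gamma, sigma_j) to the
# coregionalization matrix T = A A^T; the numeric values are illustrative.
alpha, beta, gamma = 0.8, 0.6, 0.3
s1, s2, s3 = 1.0, 0.7, 0.5           # sigma_1, sigma_2, sigma_3

A = np.array([[s1,                          0.0,       0.0],
              [alpha * s1,                  s2,        0.0],
              [(gamma + alpha * beta) * s1, beta * s2, s3]])
T = A @ A.T

# Round trip: the Cholesky factor of T recovers A exactly, since A is
# lower triangular with positive diagonal.
print(np.allclose(np.linalg.cholesky(T), A))   # True
```

Each posterior draw of (α, β, γ, σ1, σ2, σ3) can be pushed through this map to give a posterior draw of T.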

[25] Bayesian inference under equation (6) requires the likelihood, which will be written in product form using equations (4) and (5). Under the reparameterization a prior is required for the μs, for the ϕs, and for α, β, γ, σ12, σ22, and σ32. For the μs we use normal priors with a large variance, which could be centered at 0 or the sample mean of the corresponding Ys with little sensitivity. α, γ, and β are assumed to be independent, again, normal, with mean 0 and a large variance. With an exponential correlation function a gamma prior distribution is assigned to each ϕj, and inverse gamma prior distributions are assigned to the scale parameters σj2. Notice that we can obtain the posterior distribution for each of these three models separately since the likelihood for Y factors and the prior distributions are assumed to be independent.

[26] An MCMC algorithm can be used to make inference regarding the posterior distribution. More specifically, Gibbs sampling [Gelfand and Smith, 1990] can be applied. It is straightforward to see that the full conditionals for α, β, and γ are normal and that the full conditionals for the σj2 are inverse gamma. The ϕjs have nonstandard, messy full conditional distributions since ϕj enters through the correlation matrix associated with the jth conditional model. A slice sampler [Neal, 2003] can be used to sample from such distributions. Computationally, fitting the three univariate conditional models is much simpler than fitting the multivariate one, as we avoid, at each iteration of the Gibbs sampler, the calculation of a likelihood involving a 3n × 3n covariance matrix. Instead, we have three likelihoods, each involving an n × n matrix.
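A minimal stepping-out slice sampler of the kind suggested for the ϕjs might look as follows; the target below is a stand-in Gamma(3, 2) log density, not an actual full conditional from the model:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(phi):
    # Stand-in target: Gamma(shape=3, rate=2) up to an additive constant.
    if phi <= 0:
        return -np.inf
    return 2.0 * np.log(phi) - 2.0 * phi

def slice_sample(x0, logf, width=1.0, max_steps=50):
    """One stepping-out/shrinkage slice-sampling update (Neal, 2003)."""
    logy = logf(x0) + np.log(rng.uniform())       # auxiliary slice level
    left = x0 - width * rng.uniform()             # randomly placed interval
    right = left + width
    for _ in range(max_steps):                    # step out until outside slice
        if logf(left) <= logy:
            break
        left -= width
    for _ in range(max_steps):
        if logf(right) <= logy:
            break
        right += width
    while True:                                   # shrink until acceptance
        x1 = rng.uniform(left, right)
        if logf(x1) > logy:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

x, draws = 1.0, []
for _ in range(5000):
    x = slice_sample(x, log_target)
    draws.append(x)
# The Gamma(3, 2) mean is 1.5; the sample mean should be close to it.
```

In the Gibbs sampler, `log_target` would instead evaluate the jth conditional log likelihood plus the gamma log prior at a proposed ϕj.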

[27] Notice that prediction of the processes at ungauged locations of interest follows results from the multivariate normal distribution [see, e.g., Mardia et al., 1979]. For instance, for a new location snew, p(Y(snew)|Y) = ∫ p(Y(snew)|Y, Θ) p(Θ|Y) dΘ. Each sample Θ* from p(Θ|Y) yields a realization from the predictive distribution by drawing from the multivariate normal distribution p(Y(snew)|Y, Θ*).
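The conditional-normal computation behind each such predictive draw can be sketched as follows for a single univariate process with exponential correlation; the sites, data, and parameter draw are all invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# One component process with exponential correlation, given a parameter
# draw Theta* = (mu, sigma2, phi); everything here is illustrative.
n, phi, sigma2, mu = 25, 1.0, 1.0, 0.0
s = rng.uniform(0, 5, size=(n, 2))
y = rng.standard_normal(n)                         # stand-in observed data
s_new = np.array([2.5, 2.5])

h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
Sigma = sigma2 * np.exp(-phi * h)                  # covariance among gauged sites
c = sigma2 * np.exp(-phi * np.linalg.norm(s - s_new, axis=1))  # cross-covariances

sol = np.linalg.solve(Sigma, c)
cond_mean = mu + sol @ (y - mu)                    # E[Y(s_new) | y, Theta*]
cond_var = sigma2 - c @ sol                        # Var[Y(s_new) | y, Theta*]
draw = cond_mean + np.sqrt(cond_var) * rng.standard_normal()
print(bool(0.0 < cond_var < sigma2))               # shrinkage toward the data
```

Repeating this for each posterior draw Θ* and collecting the `draw` values gives a sample from the posterior predictive distribution.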

3.4. Ranges of Each of the p Spatial Processes

[28] For any isotropic, monotonic ρ, there is a range associated with ρ(.; ϕj), which is the range of Yj(s) in the corresponding conditional model. More specifically, ϕ1 denotes the decay of the process Y1(s) unconditionally. However, for Y2(s), ϕ2 represents the decay of the correlation function of the conditional process Y2(s)|Y1(s) and not of Y2(s) marginally. The same idea applies to ϕ3 and Y3(s)|Y1(s), Y2(s). However, we can obtain the posterior distributions of the ranges of the unconditional processes in equation (6). We know already that the range of Y1(s) is the range of w̃1(s). The range of Y2(s) solves 0.05 = corr[w2(s), w2(s′)], where w2(s) is the spatial process associated with the unconditional variable Y2(s); that is, from equation (6), w2(s) = ασ1w̃1(s) + σ2w̃2(s) and therefore

0.05 = [α²σ1²ρ(s − s′; ϕ1) + σ2²ρ(s − s′; ϕ2)]/(α²σ1² + σ2²).    (7)

The same idea is applied to obtain the range of Y3(s). In this case, the range solves 0.05 = corr[w3(s), w3(s′)], where w3(s) = (γ + αβ)σ1w̃1(s) + βσ2w̃2(s) + σ3w̃3(s) and

0.05 = [(γ + αβ)²σ1²ρ(s − s′; ϕ1) + β²σ2²ρ(s − s′; ϕ2) + σ3²ρ(s − s′; ϕ3)]/[(γ + αβ)²σ1² + β²σ2² + σ3²].    (8)

Notice that, given α, β, γ, σ12, σ22, σ32, ϕ1, ϕ2, and ϕ3, the right-hand sides of equations (7) and (8) are monotonic functions of the separation distance, so one can use a standard root-finding algorithm to solve equations (7) and (8) for the range of each of the marginal processes. Hence each posterior realization of Θ produces a posterior realization for the range of Y2(s) and for the range of Y3(s).
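A sketch of this root-finding step for the range of Y2(s), using simple bisection in place of a specific root finder and arbitrary parameter values:

```python
import math

# Illustrative parameter draw; with rho(d; phi) = exp(-phi d), the
# conditional ranges 3/phi below are 60 km and 150 km.
alpha, sigma1, sigma2 = 0.8, 1.0, 0.7
phi1, phi2 = 0.05, 0.02

v1, v2 = (alpha * sigma1) ** 2, sigma2 ** 2

def corr_w2(d):
    # Correlation of w2 = alpha*sigma1*w1~ + sigma2*w2~ at separation d:
    # a variance-weighted mix of the two exponential correlations.
    return (v1 * math.exp(-phi1 * d) + v2 * math.exp(-phi2 * d)) / (v1 + v2)

lo, hi = 0.0, 1e4                 # bracket: corr_w2(lo) > 0.05 > corr_w2(hi)
for _ in range(100):              # bisection; corr_w2 is monotone decreasing
    mid = 0.5 * (lo + hi)
    if corr_w2(mid) > 0.05:
        lo = mid
    else:
        hi = mid
range_y2 = 0.5 * (lo + hi)
print(60 < range_y2 < 150)        # marginal range lies between the two
```

Running this for every posterior draw of (α, σ1, σ2, ϕ1, ϕ2) yields the posterior distribution of the marginal range.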

4. Analysis of the California Data

[29] In this section we apply the model proposed in section 3 to a limited data set obtained for a collection of monitoring stations in California. The data were obtained from the California Air Resources Board. We chose to analyze the daily averages of carbon monoxide (CO), nitric oxide (NO), and nitrogen dioxide (NO2) based on hourly measurements on 16 July 1999. During this year there were nearly 700 monitoring stations over California, but not all of them measured the above pollutants. After removing missing data and considering only the sites which have measurements for all three of these pollutants, we wound up with 68 monitoring stations. Figure 1 shows the locations of these 68 monitoring sites on a degrees latitude by degrees longitude scale. Since, in the sampling area, 1° of latitude ≈110 km while 1° of longitude ≈90 km, we projected the locations using the Lambert two-parallel projection method to produce accurate interlocation distances. Figure 1 also shows five numbered sites which were held out and used for prediction of NO2. The observed correlations between these pollutants were 0.46 (CO and NO), 0.56 (CO and NO2), and 0.77 (NO and NO2). In order to achieve approximate normality, we use the logarithm of the daily average of each of these variables.

Figure 1.

Locations of the 63 monitoring stations used to fit the model and of the 5 stations chosen for prediction of NO2 (labeled as points 1–5).

[30] There was no information on potential covariates, such as temperature or wind direction, at these gauged sites. Therefore we fit a model with a constant mean structure, as in equation (5). Owing to the observed correlations, we decided to order the conditioning in the following way: Y1(s) = CO(s), Y2(s) = NO(s), and Y3(s) = NO2(s); however, any other order could have been used, leading to the same joint distribution. Following section 3.3, we assumed that α, β, γ, μ1, μ2, and μ3 are all independent and normally distributed a priori. They are centered at 0 with a variance 10 times the variance of the coefficient estimate obtained under an ordinary least squares (OLS) fit, i.e., under equation (5) with ρ(s − s′; ϕj) = 0 if s ≠ s′. For each σj2 we use an inverse gamma prior having infinite variance, with a mean equal to the OLS estimate. For the ϕj parameters we use gamma priors arising from a mean for the associated range of one half the maximum interlocation distance, with a large variance.

[31] We made use of the software GeoBugs, available through Bugs [Spiegelhalter et al., 1996], in order to obtain samples from the posterior distributions of all the parameters involved in the model. The ranges of the processes of CO, NO, and NO2 were obtained following equations (7) and (8). These equations were rapidly solved using the "false position method" as described by Press et al. [1999]. Table 1 shows the posterior summaries for all parameters in the model and also for the ranges of the processes. The benefit of the added flexibility of the coregionalization model is clear, as is the Bayesian learning. Starting from a common prior, we observe that the posterior distributions of the ranges for the three pollutants are all different, with CO having the smallest range among the three. Figure 2 also shows the unconditional process means, i.e., EY1(s) = μCO, EY2(s) = μNO, and EY3(s) = μNO2, where the forms for these means are given in equation (6). Figure 3 presents a grayscale plot showing the mean-adjusted estimated spatial surface for each unconditional process. In particular, these are developed from the posterior means of the wj(s)s over a fine grid, obtained as functions of the w̃j(s)s, again using equation (6). Resulting from the use of log Yj(s) as the response, these spatial effects are generally between −1.5 and 1.0. The spatial patterns are different; CO tends to be elevated along the Pacific coast, while NO is more depressed there.

Figure 2.

Posterior median with the associated 95% credible interval (in parentheses) of the elements of the coregionalization matrix and the correlation matrix for each location s.

Figure 3.

Adjusted posterior mean surfaces for the unconditional spatial processes of (a) CO, (b) NO, and (c) NO2. See text for details.

Table 1. Posterior Summaries of All the Parameters in the Joint Model for CO (Y1), NO (Y2), and NO2 (Y3)
Parameter        Mean    2.5%   Median   97.5%
CO range, km    43.38   16.89    41.77   76.69
NO range, km   126.86   55.59   118.09  255.01
NO2 range, km  113.02   50.29   102.46  237.63

[32] Using the one-to-one transformation between the univariate conditional parameterization and the multivariate one (equation (6) shows the matrix A), we can obtain the posterior distribution of the elements of T, the within-location covariance matrix of the pollutants. The posterior median entries and 95% credible intervals are shown in Figure 2. Evidently, the NO measurements are more variable than those for CO and NO2. Also shown in Figure 2 are the entries in the posterior correlation matrix, R, based on T. Figure 2 confirms the information from the data that NO2 and NO tend to be more correlated, with the posterior median being 0.52. Additionally, a posteriori, CO is more correlated with NO2 than with NO, although the interval estimates overlap.

[33] To perform some validation of the model, we return to the five held-out sites (see Figure 1) and predict the levels of NO2 at these locations under three different models: (1) NO2(s) is modelled unconditionally as NO2(s) = μ1 + σ1w1(s); (2) NO2(s) is modelled conditionally given the levels of CO(s), i.e., NO2(s)|CO(s) = μ2 + αCO(s) + σ2w2(s); and (3) NO2(s) is modelled conditionally given the levels of both CO(s) and NO(s) as NO2(s)|CO(s), NO(s) = μ3 + γCO(s) + βNO(s) + σ3w3(s). It is expected that models 2 and 3 will provide successively better predictions than model 1. We fit each of the models with the same priors as discussed in section 3.3, and then we perform the prediction for the five held-out sites. Again, this was done using GeoBugs. The posterior summaries of the predictions, under each of the models, are shown in Table 2. The models all fail to predict well at site 5. Site 5 may be anomalous, or this failure might suggest that the use of a stationary (in fact, isotropic) model for spatial association is inappropriate. However, the predictive intervals become increasingly tighter, as anticipated.

Table 2. Prediction of NO2 Based on the Three Different Models
(1) Unconditional Model for NO2
(2) Model for NO2 Conditioned on CO
(3) Model for NO2 Conditioned on CO and NO

5. Discussion and Future Work

[34] The foregoing methodology can still be used in the presence of missing components in the Y(s)s. The fully Bayesian approach would introduce these missing Yj(si)s as unknowns in the model. In the simulation-based model fitting they would be simulated along with all of the other model unknowns. This would result in posterior predictive distributions for the missing components. Viewed as a Gibbs sampler, updating all of the parameters given values of the missing Yj(si)s and the observed Yj(si)s is precisely the updating, given a full data set, that we have proposed in section 3.3. However, updating the missing Yj(si)s will then be very awkward. Obtaining the full conditional distributions for these Ys will require a considerable amount of additional matrix computation. The simulation will run very slowly and will become ill behaved if too many data are missing. An inexpensive approximation is to employ multiple imputation [Schafer, 1997]. Such imputation can be implemented in various ways; we omit the details.

[35] Also of interest would be to build spatiotemporal versions of equations (5) and (6). There are many possibilities here. A simple and computationally convenient choice would be to use a cross-covariance function which is separable in space and time (see, e.g., Kyriakidis and Journel [1999] for a review in the univariate case). Again, we omit the details. Of course, if the measurements obtained over time are viewed as independent replications, then the likelihood under conditional or joint modeling just becomes a product over the replications. With an increasing number of components, locations, and time points, the computational demand to fit such models will exceed current capabilities. Suitable approximations to handle such high-dimensional situations are being explored.

[36] The modeling in equations (2) and (5) yields a stationary multivariate process. In few practical situations will stationarity be a reasonable assumption; nonstationary specifications would be sought. Two straightforward approaches are possible here. First, the wj(s)s could be taken to be convenient univariate nonstationary processes. Alternatively, we could allow A to vary spatially, i.e., define Y(s) = A(s) w(s), where the specification of A(s) can be made in various ways. We omit the details here.

[37] Finally, in some settings, there might be a need to introduce a microscale or nugget effect into equations (2) or (5). In this case, there is evidently still an equivalence between the conditional and the unconditional specifications. However, given a residual in the joint model which consists of a spatial component plus a white noise component, the conditional version no longer provides a residual which is composed of a spatial term plus a white noise term. Moreover, the nonspatial components of these residuals are no longer independent across the conditional pieces (see Gelfand et al. [2003] for details).


Acknowledgments

[38] This research was conducted while the first author was a Postdoctoral Researcher at the Department of Statistics of the University of Connecticut. Both authors were supported in part by NIH grant R01 ES07750-06. The authors thank Bob Weller from the California Air Resource Board for making the data set available. The authors also thank three reviewers, whose extensive comments have led to a greatly improved exposition.