Controlling for unmeasured confounding and spatial misalignment in long-term air pollution and health studies

The health impact of long-term exposure to air pollution is now routinely estimated using spatial ecological studies, due to the recent widespread availability of spatial referenced pollution and disease data. However, this areal unit study design presents a number of statistical challenges, which if ignored have the potential to bias the estimated pollution-health relationship. One such challenge is how to control for the spatial autocorrelation present in the data after accounting for the known covariates, which is caused by unmeasured confounding. A second challenge is how to adjust the functional form of the model to account for the spatial misalignment between the pollution and disease data, which causes within-area variation in the pollution data. These challenges have largely been ignored in existing long-term spatial air pollution and health studies, so here we propose a novel Bayesian hierarchical model that addresses both challenges, and provide software to allow others to apply our model to their own data. The effectiveness of the proposed model is compared by simulation against a number of state of the art alternatives proposed in the literature, and is then used to estimate the impact of nitrogen dioxide and particulate matter concentrations on respiratory hospital admissions in a new epidemiological study in England in 2010 at the Local Authority level.


INTRODUCTION
The health effects of air pollution came to prominence in the mid-1900s, as a result of high-pollution episodes in the Meuse Valley in Belgium; Donora, Pennsylvania; and London, England. Air pollution concentrations are now greatly reduced in much of the world, but in April 2014, the World Health Organisation still estimated that outdoor air pollution was responsible for the deaths of 3.7 million people under the age of 60 in 2012. The impact of long-term exposure to air pollution is typically estimated using cohort studies (Cesaroni et al., 2014), but the follow-up period required for the cohort makes them time-consuming and expensive to implement. Therefore, spatial ecological study designs are now being used, and examples include the works of Elliott et al. (2007) and Lee et al. (2009). These studies are inexpensive and quick to implement, owing to the now routine availability of the required data. Thus, while they cannot provide individual-level evidence on cause and effect, they independently corroborate the body of evidence provided by cohort studies.
Spatial ecological studies utilise geographical contrasts in air pollution concentrations and population-level disease risks over a set of nonoverlapping areal units, and the analysis of these data typically uses Poisson log-linear models. The spatial pattern in disease risk is explained by known covariates and a set of spatially autocorrelated random effects, the former including air pollution concentrations and measures of socio-economic deprivation and demography. The random effects account for any spatial autocorrelation remaining in the disease data after the covariate effects have been accounted for, which could be caused by unmeasured confounding, neighbourhood effects (where subjects' behaviour is influenced by that of neighbouring subjects) and grouping effects (where subjects choose to be close to similar subjects). This is achieved by modelling the random effects with a conditional autoregressive (CAR, Besag et al. 1991) prior distribution, as part of a hierarchical Bayesian model. A relatively small number of these studies have been published to date (e.g. Jerrett et al., 2005;Maheswaran et al., 2005;Elliott et al.;Janes et al., 2007;2009, Greven et al.;2011;Lawson et al., 2012), and the majority suffer from potential statistical limitations that could bias the estimated health risks.
The first limitation is the spatially smoothed random effects, which have been shown by Clayton et al. (1993) and Paciorek (2010) to be potentially collinear to spatially smooth covariates such as air pollution. Paciorek (2010) shows this potential collinearity depends on

DESCRIPTION OF THE STUDY
The study region is mainland England in 2010, and records of emergency hospital admissions due to respiratory disease were obtained from the Health and Social Care Information Centre and analysed by the UK Met Office. The resulting disease data are aggregated counts of the numbers of hospital admissions for each of the n D 323 local and unitary authorities (LUAs) in England, and the expected numbers of admissions based on national age-specific and sex-specific disease rates were also computed to adjust for differing population sizes and demographies across the LUAs. The simplest measure of disease risk is the standardised morbidity ratio (SMR), which is the ratio of the observed to the expected numbers of disease cases and is mapped in Figure 1 and summarised in Table 1. The figure shows the highest-risk areas are cities in the north such as Liverpool and Manchester, while the lowest-risk areas are typically rural such as West Somerset in the far south-west of the country. The SMR map exhibits localised spatial smoothness, with some pairs of neighbouring LUAs having similar risks while other adjacent pairs are very different.
The network of air pollution monitors is sparse relative to the 323 LUAs, so modelled yearly average concentrations on a 1-km 2 grid available from http://uk-air.defra.gov.uk/data/pcm-data are used to characterise exposure. We consider concentrations of nitrogen dioxide (NO 2 ) and particulate matter, the latter including both particles less than 2.5 m (PM 2:5 ) and 10 m (PM 10 ). The top right panel of Figure 1 displays the PM 2:5 concentrations across England, from which the major cities and motorway network can be clearly seen. These modelled concentrations are spatially misaligned to the disease data, and the bottom left panel of Figure 1 shows that on average there are 215 pollution concentrations within a LUA, which ranges between 11 and 4889. The mean concentrations and coefficients of variation within each LUA are displayed in Table 1 for each pollutant, which shows that NO 2 has the greatest relative levels of within-LUA variation. The mean and variance of each pollutant within a LUA have positive and linear relationships, with correlation coefficients of 0.61 (NO 2 ), 0.48 (PM 2:5 ) and 0.16 (PM 10 ).
Socio-economic deprivation (poverty) has a major impact on disease risk (Mackenbach et al., 1997), as poorer populations typically exhibit greater tendencies for risk-inducing behaviours, such as smoking or poor diet, than more affluent populations. However, data on these quantities are not available, so two proxy measures of poverty are used. The first is the percentage of the population in each LUA that are in receipt of Job Seekers Allowance, which Table 1 shows ranges between 5.77% and 27.57%. The second is the average property price in each LUA, which ranges between £67 507 and £751 630. A natural log transformation is used because exploratory analyses showed it exhibited a stronger relationship with disease risk. These covariates were included in a Poisson generalised linear model along with PM 2:5 concentrations, and the residuals are displayed in the bottom right panel of Figure 1. They exhibit strong spatial autocorrelation, with a Moran I statistic of 0.282 (p-value 0.00001). However, the autocorrelation is visually localised, with some pairs of neighbouring regions having very different residual values.

MODELLING
This section outlines our proposed Bayesian hierarchical model, which is the first to simultaneously account for localised residual spatial autocorrelation and spatial misalignment between the pollution and disease data. The model is implemented in the R software environment, via the freely available CARBayes package.

Level 1-likelihood model
The vectors of observed and expected numbers of disease cases are denoted by Y D .Y 1 ; : : : ; Y n / and E D .E 1 ; : : : ; E n /, respectively, where the latter is computed as E k D P r N kr r for area k. Here, N kr is the size of the population in the rth age and sex stratum in areal unit k, while r is the national strata-specific disease rate. Let w D fw 1 ; : : : ; w n g denote the air pollution data, where w k D .w k1 ; : : : ; w kq k / is the vector of q k values within the kth area unit. These data are used to estimate W D .W 1 ; : : : ; W n /, where W k is a random variable representing the distribution of mean air pollution concentrations at different spatial locations within the kth areal unit. Finally, X D .x T 1 ; : : : ; x T n / T is a matrix of p known covariates, where x T k D .1; x k2 ; : : : ; x kp / are the values for area k. We propose modelling Y with a Poisson log-linear model of the form  The pollution summaries relate to the distribution of means and coefficients of variation of the 1-km concentrations within each local and unitary authority. Y k jE k ; R k Poisson.E k R k / for k D 1; : : : ; n Here, R k is the risk of disease in area k relative to E k , and R k D 1:2 corresponds to a 20% increased risk of disease. The covariate regression parametersˇare assigned a multivariate Gaussian prior with weakly informative hyperparameters . ˇ; †ˇ/. Section 3.2 describes our proposed localised spatial autocorrelation model for the random effects D . 1 ; : : : ; n /, while Section 3.3 presents our modelling mechanism for g.W k I w k ;˛/ that relates the air pollution concentrations w k to the disease count Y k while accounting for the spatial misalignment of these data. Figure 1 shows the localised nature of the residual spatial autocorrelation in the disease data, and hence, we allow adjacent random effects . k ; i / to be spatially autocorrelated or exhibit very different values, with the choice being determined by the data. We achieve this by partitioning k into a globally smooth component Â k and a piecewise constant intercept component Z k , which means that . k ; i / will be similar if they have the same intercept (same Z k value) but very different if they have different intercepts. The piecewise constant intercept surface has at most G distinct values D . 1 ; : : : ; G /, and the model we propose is given by

Level 2-spatial autocorrelation model
Inverse-gamma.a D 0:001; b D 0:001/ The intercepts are ordered as 1 < 2 < < G to mitigate against label switching (with 0 D 1 and GC1 D 1), and area k is allocated to one of the G intercepts by Z k 2 f1; : : : ; Gg. The maximum number of intercepts G is fixed in the model, but a shrinkage prior is assigned to each Z k to penalise it towards the middle intercept term. This prior has the penalty term ı.Z k G / 2 , where G D .G C1/=2. If G is odd, then the prior shrinks each data point towards a single intercept term G , while if G is even, it shrinks equally to . G 0:5 ; G C0:5 /. The latter will likely result in two different intercept terms being used to represent the data even if the residual structure is spatially smooth; thus, we recommend setting G to be an odd number. The size of G need only be small, as the intercept surface is designed to allow neighbouring areas to have very different random effects, and areas on different sides of the study region can have the same intercept value. Smooth spatial variation in the residual surface is modelled by Â D .Â 1 ; : : : ; Â n /, and thus, models discrepancies from this smooth structure. We recommend setting G to be a small odd number such as 3 or 5 and test the sensitivity of this modelling assumption by simulation. A weakly informative uniform prior is specified for the penalty parameter ı, so that the data play the dominant role in estimating its value.
The spatially smooth variation in is represented by Â, which is modelled by the CAR prior proposed by Leroux et al. (1999). Spatial autocorrelation is induced into these random effects via a binary n n neighbourhood matrix W, where w ki D 1 if areal units .k; i / share a common border and w ki D 0 otherwise. The CAR prior given in (2) is defined by its full conditional distribution f .Â k jÂ k / for k D 1; : : : ; n, where Â k D .Â 1 ; : : : ; Â k 1 ; Â kC1 ; : : : ; Â n /. It is a special case of a Gaussian Markov random field and can also be written as Â N.0; 2 Q.W; / 1 /, where the precision matrix is given by Q.W; / D OEdiag.W1/ W C .1 /I with .1; I/ being a vector of ones and an identity matrix, respectively. This model induces a single level of spatial smoothness into Â, with D 1 corresponding to the intrinsic CAR prior for strong spatial smoothing proposed by Besag et al. (1991), while D 0, corresponds to independent random effects. This can be better seen from the partial autocorrelation between .Â k ; Â i / implied by this prior, which is given by Here, for all pairs of adjacent areal units, .k; i /; w ki D 1, and hence, they are partially autocorrelated with the level of that autocorrelation controlled globally for all adjacent pairs by .
3.3. Level 3-the pollution-health relationship g.W k I w k ;˛/ As previously discussed, the pollution concentrations are spatially misaligned with the disease data, and the kth LUA has q k concentrations w k D .w k1 ; : : : ; w kq k /. The standard model for relating air pollution to health in the existing literature has the functional form Here, k D EOEW k and is estimated by O k D .1=q k / P q k iD1 w ki , the mean of the observed data w k . However, Wakefield and Shaddick (2006) have shown it is inappropriate to average the pollution concentrations via (4), as this ignores the spatial variation in the concentrations within an areal unit. To see this, let Y ki , for i D 1; : : : ; q k , denote the unknown number of disease cases from the population living in grid square i of areal unit k that experienced pollution concentration w ki . Then an appropriate model for the unobserved Y ki would be Y ki Poisson.E ki R ki /, where E ki D P r N kir r , the expected number of cases of disease in grid square i . Here, N kir is the population size of strata r in grid square i in areal unit k, and clearly, E k D P q k iD1 E kir . The risk model again follows a log-linear form as Y ki is a count and is given by Here, x T kˇC k are measured and unmeasured covariates that have the same effect on disease risk across all q k sub-populations within areal unit k. The observed disease data are Y k D P q k iD1 Y ki , and assuming conditional independence between Y ki jE ki ; x k ; w ki across grid squares i yields Y k Poisson.
Simplifying this expression yields the aggregate likelihood model (1), which is the same as equation (3.3) in Wakefield and Shaddick (2006) except that they use the total population size in each grid square as weights in the preceding sum rather than the expected number of cases E ki . Thus, the aggregate model (5) differs from the naïve ecological model (4) as in the left-hand side the averaging over pollution is performed on the exponentiated risk scale while on the right-hand side it is performed on the raw pollution scale (which is then exponentiated). In this paper, we compare the bias from using (4) rather than (5) by simulation in the next section but show in Section 2 of the supporting information that the potential bias can in part be predicted under a few simplifying assumptions. Finally, we note that as the pollution data are modelled estimates, they are subject to measurement error, but only a single concentration is available at each grid square without a corresponding measure of uncertainty. Therefore, the measurement error likely to be present in these data cannot be quantified, for example using a measurement error model. We also note that a classical measurement error model that treats the q k observations as error-prone measurements of an true unknown concentration in areal unit k is not appropriate, because this assumes there is a single and constant pollution concentration across each areal unit.

Alternative approaches
A number of approaches to dealing with residual spatial autocorrelation have been proposed in the literature, and we provide a brief review here and compare a number of them by simulation in Section 4. The simplest ignores the correlation and sets k D 0, while the most common approach uses a CAR model similar to that proposed by Leroux et al. (1999). Reich et al. (2006) and Hughes and Haran (2013) replace the random effects with a set of basis functions that are orthogonal to the covariates, with those used by the latter also being spatially smooth. We compare the approach of Hughes and Haran (2013) with that proposed here in the simulation study, and a detailed description of their model is given in Section 3 of the supporting information accompanying this paper. The model proposed by Hughes and Haran (2013) does not capture localised spatial autocorrelation, and Lu et al. (2007) and Lee and Mitchell (2013) have proposed extensions to CAR models that capture such localised autocorrelation. For adjacent areal units, their approach models w ki 2 W as a binary random variable, and if w ki D 1, the corresponding random effects are partially autocorrelated, while if w ki D 0, they are conditionally independent (Equation (3)). Lee and Mitchell (2013) propose an iterative algorithm for estimation of W, which is compared in the simulation study in Section 4 and described fully in the supporting information (Section 3). Finally, Lawson et al. (2012) propose a two-stage approach to modelling autocorrelation, where they first model the data with only the known covariates. They then model the residuals from this initial model with a space-time mixture structure in the second stage, allowing them to estimate the unmeasured spatio-temporal autocorrelation structure in the data. Finally, they treat this estimated autocorrelation structure as an offset in a model including the known covariates.

SIMULATION STUDY
This section presents two simulation studies, which respectively assess the impact that different types of residual spatial confounding and within-area variation in the pollution concentrations have on health effects estimation.

Data generation and study design
Simulated disease counts Y k are generated from a Poisson model similar to (1) for the n D 323 LUAs comprising mainland England, and the expected counts E k are generated from a uniform distribution on the range OE70; 130 to give a moderate disease prevalence in terms of Environmetrics D. LEE AND C. SARRAN the existing literature. The log-risk surface is a linear combination of a spatially smooth covariate acting as air pollution and residual spatial autocorrelation. For this first study, the pollution covariate is assumed to have no within-area variation and is generated from a Gaussian spatial processes with a mean of 20 and a spatially smooth variance matrix defined by the Matérn family of autocorrelation functions. For the latter, the smoothness parameter equals 1.5, and the range parameter equals 60. A linear relationship is assumed between air pollution and health, and the regression parameter corresponds to a 5% increase in disease risk for a 2-g m 3 increase in pollution concentrations, which is similar to that estimated in Section 5.
Twelve different scenarios are considered for the spatial confounding component , which cover the range of scenarios likely to be seen in real data. Firstly, the magnitude of this confounding is altered by fixing its standard deviation (denoted SD ) at either 0.1 or 0.01. Six different spatial structures are generated for under both values of SD , where scenarios A-C correspond to a global level of spatial smoothness while scenarios D-F are locally smooth with step changes in between some pairs of adjacent areas. Here, scenario A corresponds to independence in space, scenario B is spatially autocorrelated but less smooth than the pollution covariate, while in scenario C is as smooth as the pollution covariate. Scenarios D-F mirror these global patterns but additionally have step changes in to represent localised smoothness. Example realisations of these spatial surfaces are displayed in Section 4 of the supporting information accompanying this paper, together with complete details of the data-generating mechanism. Scenarios A-C correspond to setting G D 1 in Equation (2), while Scenarios D-F correspond to G D 3. We apply five models to each simulated data set, which include the localised smoothing model (2)

Results
Five hundred data sets are generated under each scenario, and the percentage bias and percentage root mean square error (RMSE) of the estimated air pollution effects are computed as Bias.˛/ D 100 . 1 500 P 500 iD1 . Ǫ i ˛//=˛and RMSE.˛/ D 100 .
q 1 500 P 500 iD1 . Ǫ i ˛/ 2 /=˛, respectively, where Ǫ i is the estimate (posterior median) for the i th simulated data set. Also computed are the coverage probabilities of the 95% uncertainty intervals from each model, and all the results are displayed in Table 2. The results presented for Model-Local relate to G D 5 so as not to equal the true values (G D 1 and G D 3) that generated the data, but a sensitivity analysis to this choice is presented in Section 5 of the supporting information, which shows the results are largely insensitive to this choice.
The top panel in Table 2 shows negligible bias in almost all cases, with percentage biases being lower than 2% in all but four cases and at most 3% overall. Scenario A corresponds to no spatial confounding as is independent in space, and all models perform similarly with less than 7% RMSE and coverages close to 95%. The only exception to this is for Model-HH when SD D 0:1, whose coverage is less than 85%. This relatively poor coverage for Model-HH is consistently observed for the other scenarios and is caused in part by its relatively poor point estimation (as measured by RMSE) compared with most of the other models. The other possible reason for its poor coverage is that it includes a much lower-dimensional (hence more parsimonious) set of random effects compared with most of the other models, which may thus lead to less variation being propagated through the model. The poor point estimation of Model-HH occurs because its random effects are orthogonal to the fixed effects, whereas in Scenarios B and C, the simulated data are generated so that the residual spatial autocorrelation is potentially collinear to the fixed effects. The random effects in Model-HH are also globally smooth, and thus, it cannot capture the localised step changes present in Scenarios D-F.
Scenarios B and C correspond to increasing spatial confounding, and all models exhibit dramatic rises in RMSE and falls in coverage when SD D 0:1. Model-CAR, Model-Local and Model-LM perform best in this regard and have similar results, in terms of both RMSE and coverage. Model-HH, which is designed to overcome this spatial confounding, does not outperform the other models that are prone to suffer from this confounding, for the reasons discussed earlier. When the standard deviation of drops to 0.01, the results are similar to those under Scenario A, because the level of spatial confounding is very small and hardly affects fixed-effects estimation.
Finally, under the localised spatial confounding Scenarios D-F, Model-Local performs best followed by Model-LM in terms of RMSE, having lower RMSEs than the remaining models by up to 10 times. This is because they are the only models that allow for a non-constant level of spatial smoothing across the study region and thus are able to accurately represent the localised spatial autocorrelation. However, the price for this is that Model-Local performs worse in terms of coverage than the globally smooth CAR model (Model-CAR) in Scenarios E and F when SD D 0:1. This is because the piecewise constant intercept term in Model-Local accurately captures the step changes in the localised residual structure, leaving the random effects Â to represent the remaining smooth spatial structure. However, this remaining smooth structure is collinear to the fixed effect, resulting in reduced coverage as observed in Scenarios B and C. This does not occur for scenario D, where this remaining structure is independent in space and thus not collinear to the covariate. In contrast, this drop in coverage does not happen to the same extent for Model-CAR, because its globally smooth random effects are trying to model the localised structure, which they are not designed to do. The result is a largely inflated random-effects variance 2 , as the amount of smoothing to the spatially smooth mean function in the CAR prior is reduced. This inflated level of variation causes a similar increase in variation in the posterior distribution of the fixed effect, leading to greater coverage. However, this increase in coverage comes at the cost of a large (up to 10 times) decrease in the accuracy of its point estimation.

Data generation and study design
Simulated data are generated using the same approach as in the first study, except that the pollution concentrations vary within each LUA. The number of concentrations observed for each LUA is the same as in the real data, and we examine the impact that this variation has on the The top panel displays the bias (as a percentage of the true value) for the pollution-health relationship estimated by each of the five models, and the middle panel displays the root mean square error (as a percentage of the true value), while the bottom panel displays the coverage probabilities (as a percentage) of the 95% uncertainty intervals.
estimated health effects under a number of scenarios. We vary the size of the estimated health effects and the type of within-LUA variation in pollution, as the discussion in Section 3 and the supporting information suggests that both will affect the results. The pollution-health relationships considered here have relative risks of 1.05 and 1.5 for a 2-g m 3 increase in concentrations, and the latter is chosen to be overly large. The standard deviation of pollution within each LUA is fixed at 1 or 10 and is also allowed to be independent or positively linearly related to the mean pollution level in an LUA. In all cases, the distribution of pollution concentrations within an LUA is assumed to be Gaussian. Finally, we also investigate the impact of changing the scaled expected counts .E k1 ; : : : ; E kq k / and first assume they are all equal, that is, E ki D 1=q k . We then relax this assumption and generate them from a uniform distribution on the unit interval (with appropriate rescaling). In all scenarios, we compare the aggregate model (5) proposed here with the naïve ecological risk model ((4), with a Poisson data likelihood) commonly used in these studies, and both are completed by (2).

Results
Five hundred data sets are generated under each of the 16 scenarios described earlier, and the results presented in Table 3 include the percentage bias, percentage RMSE and coverage probabilities for the estimated pollution-health relationships. The top panel of the table displays results for constant E ki D 1=q k , while the bottom panel corresponds to them varying within an LUA. The table shows broadly similar bias, RMSE and coverage results under the two specifications of .E k1 ; : : : ; E kq k /, suggesting that a non-constant set of within-LUA expected counts does not impact health effects estimation. The table also shows that, when the pollution-health effect has a similar size to that observed in the literature, neither the ecological model nor the aggregate model exhibits any systematic bias, regardless of the level of within-area variation in the pollution concentrations. This is because the bias term 0:5b˛2 (based on the Gaussian and linearity assumptions, see the supporting information) in the naïve ecological model is small as the true value of˛D 0:0244. The RMSE values and coverage probabilities for both models are similar, with the latter being close to their nominal 95% levels. These results thus suggest that while the ecological model is inappropriate mathematically, the small effect sizes seen in air pollution and health studies render its bias negligible in practice in this context. Conversely, for the larger effect size of a relative risk of 1.5 for a 2-g m 3 increase in pollution .˛D 0:203/, the ecological model shows large bias, large RMSE and low coverage if the within-area variation in the pollution concentrations increases with the mean in a linear fashion. This result conforms to our theoretical expectations discussed in the supporting information, while the aggregate model does not suffer from these problems. Finally, as expected, if the within-area variation in the pollution concentrations is independent of the mean pollution level, then the ecological model is once more unaffected.

Modelling
All models compared in the simulation studies were applied to the England data described in Section 2, which include the full aggregate model comprising (2) and (5) (Model-Local-Agg) proposed here (with G D 5) and the naïve ecological model comprising (2) and (4) with a Poisson data likelihood (Model-Local, also with G D 5) that ignores within-area variation in the pollution concentrations. The different models for residual spatial confounding are also applied to these data, which include a simple generalised linear model (Model-GLM), a globally smooth CAR model (Model-CAR, (1) and (2) with k D Â k ) and the recent approaches of Hughes and Haran (2013) (Model-HH) for orthogonal autocorrelation and Lee and Mitchell (2013) (Model-LM) for localised autocorrelation.
The covariates included in each model are one of the three pollutants considered here (NO 2 , PM 2:5 , and PM 10 ) and the proxy measures of socio-economic deprivation, the latter including the percentage of the working age population in receipt of Job Seekers Allowance and the natural log of the median property price in each LUA. A single pollutant was included in each model because of the positive pairwise correlations between pollutants, which ranged between 0.79 and 0.94 at the LUA level. The expected numbers of respiratory hospital admissions were computed for the 7665 wards in England, which allows the pollution concentrations within each LUA to be weighted by .E k1 ; : : : ; E kq k / as described in Equation (5) for Model-Local-Agg . The remaining models use the mean concentration in each LUA as Table 4. Estimated relative risks and 95% uncertainty intervals (confidence intervals for Model-GLM and credible intervals for the remaining models) for 5-g m 3 (NO 2 ) and 1-g m 3 (PM 2:5 and PM 10 ) increases in pollution concentrations from the models considered in this paper  (4). Inference for all the Bayesian models (i.e. not Model-GLM) was based on 100 000 Markov chain Monte Carlo samples, which were generated from five parallel Markov chains, and convergence was visually checked by examining trace plots of sample parameters including the pollution-health relationship. The main study results are presented in the next subsection, while the following subsection presents some additional sensitivity analyses.

Main results
The estimated relationships between each pollutant and respiratory hospital admissions from each model are displayed in the top panel of Table 4 and are presented as relative risks for a realistic increase in pollution concentrations. These increases are 5 g m 3 for NO 2 and 1 g m 3 for PM 2:5 and PM 10 , because NO 2 has larger average concentrations and larger variation across England. Overall, the table shows evidence of substantial relationships between air pollution and respiratory disease risk, as 17 of the 18 estimated relative risks have 95% uncertainty intervals that do not include the null risk of 1. The most definitive effects are observed for NO 2 and PM 2:5 , while weaker effects are observed for PM 10 .
The table also shows clear evidence that the model used to allow for residual spatial autocorrelation can have a large impact on health effects estimation, although the relative risks for NO 2 are relatively stable ranging between 8.5% and 9.4% increases in disease risk for a 5-g m 3 increase in concentrations. However, the results for PM 2:5 and PM 10 show large variation in the estimated effect sizes, ranging between 3.2% and 5.5% increases for PM 2:5 and between 0.8% and 3.7% for PM 10 . The largest estimated effect sizes are obtained from Model-CAR for each pollutant, while those from Model-Local are always smaller. Of course, one is unable to say which estimate is 'correct', but the results from Scenarios D-F in the simulation study in Section 4 suggest that as the spatial autocorrelation after accounting for the measured covariates is localised for these data (Figure 1), then Model-Local is likely to produce the most accurate effect estimates. Further supporting evidence comes in the form of overall model fit, as the deviance information criterion (Spiegelhalter et al., 2002) values are 3607.9 .p:d D 311:0/ and 3535.7 .p:d D 234:6/ for Model-CAR and Model-Local, respectively, suggesting a better model fit for the latter. An interesting side note is that the effective number of parameters p:d is lower for Model-Local compared with that for Model-CAR, even though the former has additional intercept and allocation parameters . i ; Z k /. This is because the inclusion of the piecewise constant intercept terms in Model-Local means the random effects Â are smoothed more than in Model-CAR (posterior medians for the variance 2 are 0.0131 and 0.1809, respectively), as they do not have to account for the localised residual spatial structure. Finally, a comparison of Model-Local and Model-Local-Agg shows that ignoring within-area variation in the pollution concentrations has little effect on the results, which was suggested by the simulation study as the estimated effect sizes . Ǫ/ are relatively small.

Sensitivity analyses
We then undertook a sensitivity analysis to assess the impact of changing the model assumptions of Model-Local-Agg, and the results are displayed in the bottom panel of Table 4. This analysis included the following: (i) changing the maximum number of risk classes G from five to three; (ii) changing the hyperparameters .a; b/ of the inverse-gamma distribution for 2 from .a D 0:001; b D 0:001/ to .a D 0:1; b D 0:1/ and .a D 0:5; b D 0:0005/; and (iii) removing the weighting by .E k1 ; : : : ; E kq k / in (5) to give equal weight to all pollution concentrations. The table shows that changing G and .a; b/ had no impact on the results, as the estimated relative risks for all pollutants remained almost identical. Removing the population weighting also had little effect, although the results for NO 2 and PM 2:5 changed by around 0.5%.

DISCUSSION
This paper is the first to propose an integrated modelling framework for estimating the long-term effects of air pollution on human health, accounting for localised spatial autocorrelation in the disease data and the inherent spatial misalignment between the exposure and the response. The model proposed here is widely applicable to geographical association studies beyond the air pollution arena and is available for other researchers to use via the R package CARBayes available free from http://www.R-project.org/. This paper also provides an in-depth simulation study into the impact of spatial autocorrelation and spatial misalignment on fixed-effects estimation and presents a new study into the long-term effects of air pollution on respiratory disease in England in 2010.
One of our main findings is that inappropriate control for residual (i.e. after the effects of known covariates have been removed) spatial autocorrelation in the disease data can result in incorrect fixed-effects estimation. This problem encompasses both point estimation and uncertainty quantification, and the first simulation study presented here shows some interesting results. Firstly, if, as is the case with the respiratory admissions data presented here, the residual spatial autocorrelation is not globally smooth, then wrongly assuming it is leads to substantially poorer estimation of covariate effects (in terms of RMSE) compared with using a localised smoothing model. Differences in fixed-effects estimates were also seen empirically in the real data in Section 5, although one is of course unable to say which estimate is 'correct' in this case. The other main finding from the simulation study is that, as expected, the greater the level of confounding between the fixed effects and the residual spatial structure, the poorer all models do in terms of fixed-effects estimation. This poor performance again encompasses point estimation and uncertainty quantification. However, what is surprising is that the commonly used CAR model, which has been subject to recent criticisms by Reich et al. (2006) and others, does no worse than other more sophisticated models and in fact outperforms the orthogonal smoothing model proposed by Hughes and Haran (2013).
The other main finding of this paper is that for air pollution and health studies, where effect sizes are typically small, ignoring the withinarea variation in the exposure caused by the spatial misalignment of the data does not lead to systematic bias. This result is illustrated in the second simulation study presented in Section 4 and is corroborated by the real-data results presented in Section 5. The magnitude of any such bias depends on a number of factors, including the effect size being estimated, the distributional shape of the within-area variation in the exposure, and the relationship between the mean and higher-order moments of the within-area exposure distribution. However, although no bias was observed here, in general, within-area variation in an exposure should not be averaged away by computing a mean, as bias can result if the aforementioned conditions are right as shown in the simulation study in Section 4.
The spatial air pollution and disease data used in this study are routinely available for multiple consecutive time periods, and in future work, we will extend the model proposed here to the spatio-temporal domain. Furthermore, the modelled concentrations used here have complete spatial coverage but are modelled estimates rather than measured concentrations and as such are prone to biases and uncertainties. However, no information on these are available, and thus, we plan to develop a two-stage modelling approach for these data, where the first stage provides better estimates of pollution by fusing the modelled concentrations with observed monitor data using techniques similar to that of Berrocal et al. (2009). This approach would thus allow for the measurement error in the pollution data to be correctly propagated into the disease model, which was not possible for the data used in this paper.