Keywords:

  • Disease mapping;
  • Integrated nested Laplace approximations;
  • Leave-one-out cross-validation;
  • Spatiotemporal models

Abstract

  1. Top of page
  2. Abstract
  3. 1. Introduction
  4. 2. Spatiotemporal models
  5. 3. Integrated nested Laplace approximations—a new approach for approximate Bayesian inference
  6. 4. Model comparison and calibration
  7. 5. Results by using integrated nested Laplace approximations
  8. 6. A comparison of integrated nested Laplace approximations and Markov chain Monte Carlo methods
  9. 7. Discussion
  10. Acknowledgements
  11. References

Summary.  Spatiotemporal disease mapping models have been used extensively to describe the pattern of surveillance data. They are usually formulated in a hierarchical Bayesian framework and posterior marginals are not available in closed form. Hence, the standard method for parameter estimation is Markov chain Monte Carlo algorithms. A new method for approximate Bayesian inference in latent Gaussian models using integrated nested Laplace approximations has recently been proposed as an alternative. This approach promises very precise results in short computational time. The aim of the paper is to show how integrated nested Laplace approximations can be used as an inferential tool for a variety of spatiotemporal models for the analysis of reported cases of bovine viral diarrhoea in cattle from Switzerland. Conclusions concerning the problem of under-reporting in the data are drawn via a multilevel modelling strategy. Furthermore, a comparison with Markov chain Monte Carlo methods with regard to the accuracy of the parameter estimates and the usability of both approaches in practice is conducted. Approaches to model choice using integrated nested Laplace approximations are also presented.


1. Introduction

Spatiotemporal disease mapping models have been used extensively to describe the spatial and temporal pattern of registry data. Various specifications of the spatial and temporal trends and the space–time interaction term have been proposed in the literature (Bernardinelli et al., 1995b; Knorr-Held, 2000; Lagazio et al., 2003). From an inferential point of view, this class of models is formulated within a hierarchical Bayesian framework (Besag et al., 1991; Banerjee et al., 2004). As, in general, posterior marginals are not available in closed form, Markov chain Monte Carlo (MCMC) algorithms have been used for parameter estimation so far. But the often complex dependence structure in spatiotemporal models requires specific algorithms to obtain reliable estimates (Knorr-Held and Rue, 2002; Schmid and Held, 2004). Furthermore, MCMC methods may lead to a large Monte Carlo error and the computation time can be long.

Recently, an approximate method for parameter estimation in specific Bayesian hierarchical models, so-called latent Gaussian models, has been proposed in Rue et al. (2009). This method uses integrated nested Laplace approximations (INLAs) to approximate the posterior marginals of interest. Since spatiotemporal disease mapping models incorporate a latent Gaussian field, the INLA approach can be used for inference here. A major advantage of INLAs is that computational time is short and they can easily be used via the R library (R Development Core Team, 2005) INLA (Martino and Rue, 2009). Quantities for model criticism and comparison, e.g. the well-known deviance information criterion (DIC), are provided as standard. Furthermore, cross-validated diagnostic tools based on the predictive distribution can be obtained. These tools, namely the probability integral transform (PIT) and the logarithmic score (Gneiting and Raftery, 2007), have recently been applied to count data by Czado et al. (2009). A detailed comparison of these criteria with results from MCMC methods is given in Held et al. (2010).

In this paper, the scope of INLAs concerning spatiotemporal disease mapping is assessed by means of a case-study. Parameter estimates for a data set containing reported cases of bovine viral diarrhoea (BVD) in cows from Switzerland were obtained by using INLAs. Their accuracy is assessed via a comparison with results from MCMC techniques. Posterior samples were drawn by using auxiliary mixture sampling (Frühwirth-Schnatter et al., 2009) and a second-order Taylor series expansion of the log-likelihood to obtain a suitable proposal for a Metropolis–Hastings algorithm (Rue and Held, 2005). Furthermore, the usability of INLAs and MCMC methods in terms of available software and computational time is discussed briefly.

BVD is a viral diarrhoea infection in cattle. It is one of the most widespread cattle diseases in Switzerland and causes damage of several million Swiss francs every year (Swiss Federal Veterinary Office, 2006). The reported cases, which were collected by the Swiss Federal Veterinary Office as part of routine surveillance from 2003 to 2007, show an increasing trend (Table 1). However, it is well documented that case reporting data can potentially be biased owing to limited case detection or low reporting motivation (Doherr and Audige, 2001). This suspicion is confirmed by the fact that the number of reported BVD cases varies heavily throughout the country. Nevertheless, there is no obvious reason for such large variability in these data as BVD is a slowly spreading viral disease. Several of the 184 Swiss regions reported no cases of BVD during the time period whereas some regions had a stable, high number of reported cases. Additionally, the strong rise in reported cases gives reason to doubt that the temporal heterogeneity is due to a real increase in prevalence of disease.

Table 1.   Number of reported BVD cases, 2003–2007
Year      2003   2004   2005   2006   2007
Number     141    172    239    406    712

Switzerland is a confederation of 26 cantons. Each canton consists of one or more regions. The cantonal veterinary authorities are responsible for the realization of federal veterinary legislation. Hence, cantons build a second, coarser spatial grid. It is suspected that the system of case registration is highly influenced by the affiliation of a region to a certain canton. This heterogeneity could be caused by a cantonal difference in incentives for a farmer to report a case, e.g. financial compensation in the case of a diseased animal, or a different practice in conveying disease information to farmers. Hence, multilevel models similar to those of Langford et al. (1998, 1999) addressing this issue are formulated and evaluated by using model choice.

This paper is organized as follows: Section 2 introduces variations of spatiotemporal models which are appropriate to investigate the spatial and temporal pattern in the data and to assess cantonal heterogeneity. In Section 3 the INLA approach is described; Section 4 discusses tools for model comparison that are returned by INLAs. All results with regard to case reporting for BVD are presented in Section 5 in detail. Various aspects of the comparison of INLAs and MCMC methods are discussed in Section 6. We close with some general results in Section 7.

2. Spatiotemporal models

In what follows we outline five models to describe the spatiotemporal pattern of the BVD data by using a disease mapping approach. As described in Section 1, Switzerland is a confederation of 26 cantons which consist of one or more regions: 184 in total. As the cantonal veterinary authorities are responsible for the implementation of federal veterinary legislation, it is of interest to investigate whether the cantonal affiliation of a region has a pronounced influence on case reporting. Hence, a cantonal effect is included in three of the models. This is done by using a multilevel approach such that variability in the response is attributed to different hierarchical levels (Langford et al., 1998, 1999). Furthermore, models incorporating a linear as well as a non-parametric time trend are proposed. Space–time interactions that adjust for the fact that spatial units can behave differently from the main time trend when observed over a long time span are additionally included in each type of model. Models assuming a linear time trend have been proposed in Bernardinelli et al. (1995b) and developed further in Assunção et al. (2001). Non-parametric space–time interaction models have been introduced by Knorr-Held (2000) and used by several researchers and in different applications, e.g. for age–period–cohort models (Lagazio et al., 2003; Schmid and Held, 2004) and the joint analysis of two or more diseases (Richardson et al., 2006). The characteristics of all models that will be introduced in Sections 2.1 and 2.2 are summarized in Table 2.

Table 2.   Characteristics of all models from Sections 2.1 and 2.2
Model   Cantonal heterogeneity α   Time trend β   Space–time interaction δ (Knorr-Held, 2000)
M1      Yes                        Linear
M2      Yes                        RW1            Type II
M3      Yes                        RW2            Type II
M4      No                         RW1            Type IV
M5      No                         RW2            Type IV

2.1. Linear time trend

Since BVD cases are available for each Swiss region separately, this fine grid can be used as a basis for a spatial analysis. Additionally, data from the Principality of Liechtenstein are included (which counts as both a region and a canton). A case means that at least one diseased animal within one herd was detected. Under the rare disease assumption, it is assumed that the number of cases of disease yit in region i=1,…,I at time t=1,…,T is Poisson distributed with parameter λit, which can be interpreted as the relative risk of the disease in region i at time t. Additionally, the number of herds mit is included as an offset to adjust for the different number of herds at risk. Following the standard generalized linear mixed model formulation (Breslow and Clayton, 1993) with Poisson response, a logarithmic link is used.

To account for a linear time trend, Bernardinelli et al. (1995b) proposed a Bayesian spatiotemporal model. It can be seen as an extension of the standard model for disease mapping that was introduced by Besag et al. (1991). In the standard setting defined for one spatial level, a main linear time trend and a so-called differential trend for each area i are incorporated in the model as well as spatially structured and unstructured effects. In this application, where cantonal heterogeneity is considered as well, we assume that a cantonal effect αj interacts with the linear time trend. Hence, the rate of cases over time can have a different slope for each canton. Reasons for this heterogeneity could be cantonal differences in incentives for a farmer to report a case, e.g. financial compensation in the case of a diseased animal, or a different practice in conveying disease information to farmers. The linear predictor of this model (M1) can be written as

  $$ \eta_{it} = \log(m_i) + \mu + \nu_i + \alpha_{j(i)} + (\varphi + \phi_{j(i)})\, t \tag{1} $$

with i=1,…,185, t=1,…,5 and j=1,…,27. The index j(i) denotes the canton j to which region i belongs. The offset mi was provided by the Swiss Federal Veterinary Office and is assumed to be constant in time. The components of the vector ν=(ν1,…,νI)T are independent and identically normally distributed with mean 0 and variance $\sigma_\nu^2$. The νis account for differences between regions whereas the αjs model cantonal heterogeneity. In this model the similarity of the incidence between neighbouring regions is described via the cantonal term. Hence, it incorporates a two-level structure. The parameter φ represents the overall linear time trend. The term ϕj describes the interaction between the linear time trend and the cantonal intercept αj and is modelled as a random slope. Thus, φ+ϕj represents the individual time trend for canton j. Each ϕj can be interpreted as the amount by which the time trend of canton j differs from the overall trend φ. Prior distributions for α=(α1,…,αJ)T and ϕ=(ϕ1,…,ϕJ)T must be defined as well. Since it is assumed that the cantonal effects α are independent for each canton, the differential trends ϕ are modelled in the same way (Bernardinelli et al., 1995b). Furthermore, it is necessary to allow for correlation between the intercept and slope in a random-slope model (Hedeker and Gibbons (2006), section 4.4.2). A standard assumption is that (αj,ϕj)T follows a bivariate normal distribution with mean 0 and some unknown precision matrix P, to which a Wishart prior is assigned. Bernardinelli et al. (1995b) also proposed that the time variable t should be centred at 0 to avoid high correlation between the intercept and slope. We have followed this advice in our application. The specification of hyperpriors is discussed in Section 2.3.
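As an illustration of how the components of model M1 combine, the following minimal sketch evaluates a linear predictor of the form log(m_i) + μ + ν_i + α_j(i) + (φ + ϕ_j(i)) t for a single region, with the time variable centred at 0. The additive form is read off from the description above, and all parameter values are hypothetical and chosen only for illustration; they are not estimates from the BVD data.

```python
import math

def linear_predictor_m1(m_i, mu, nu_i, alpha_j, phi, phi_j, t):
    """Assumed form: log(m_i) + mu + nu_i + alpha_j + (phi + phi_j) * t."""
    return math.log(m_i) + mu + nu_i + alpha_j + (phi + phi_j) * t

# Observation years, centred at 0 to reduce intercept-slope correlation.
years = [2003, 2004, 2005, 2006, 2007]
mean_year = sum(years) / len(years)
t_centred = [year - mean_year for year in years]

# Hypothetical values (not taken from the paper).
m_i = 250       # number of herds in region i (offset)
mu = -6.0       # overall intercept
nu_i = 0.1      # unstructured regional effect
alpha_j = 0.5   # cantonal intercept
phi = 0.3       # overall linear time trend
phi_j = 0.1     # differential trend of canton j

# Expected counts under a log link, assuming the offset enters as log(m_i).
expected = [
    math.exp(linear_predictor_m1(m_i, mu, nu_i, alpha_j, phi, phi_j, t))
    for t in t_centred
]
canton_slope = phi + phi_j  # individual time trend of canton j

for year, value in zip(years, expected):
    print(year, round(value, 3))
```

With these values the expected count is multiplied by exp(φ + ϕj) from one year to the next, which is how a canton-specific slope translates into a canton-specific growth rate of reported cases.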

2.2. Non-parametric time trend

In model M1, the time trend in log-incidence is taken as linear. This assumption can be relaxed by adopting a non-parametric setting as proposed in Knorr-Held (2000). Custom-made modifications of this general setting are formulated for the BVD data in what follows.

The second model M2 is the non-parametric analogue of model M1. In contrast with model M1 it includes a main time trend β=(β1,…,βT)T and an interaction δ=(δ11,…,δ1T,δ21,…,δ2T,…,δJT)T between canton and time to which specific prior distributions must be assigned. The linear predictor is

  $$ \eta_{it} = \log(m_i) + \mu + \nu_i + \alpha_{j(i)} + \beta_t + \delta_{j(i)t} \tag{2} $$

Here, the αjs are modelled as independent and identically normally distributed with mean 0 and variance $\sigma_\alpha^2$. For β and δ we use intrinsic Gaussian Markov random-field priors of the general form

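  $$ \pi(x \mid \sigma^2) \propto \exp\left( -\frac{1}{2\sigma^2}\, x^T R\, x \right), \qquad x \in \{\beta, \delta\}, \tag{3} $$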

including a so-called structure matrix R (Held and Rue, 2010). The main time trend is specified as a random walk (RW) of first order with structure matrix

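  $$ R = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix} \tag{4} $$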

The assumption of temporal structure is plausible as the number of reported cases is constantly increasing over time. The joint prior density of β can be written as (Rue and Held, 2005)

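  $$ \pi(\beta \mid \sigma_\beta^2) \propto \exp\left( -\frac{1}{2\sigma_\beta^2} \sum_{t=2}^{T} (\beta_t - \beta_{t-1})^2 \right) \tag{5} $$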
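The link between the structure matrix of a first-order RW and the sum of squared increments in the joint prior density can be checked numerically. The sketch below is only an illustration: it builds the standard tridiagonal first-order RW structure matrix for T=5 and verifies that the quadratic form βTRβ equals the sum of squared first differences. The vector beta holds hypothetical values, not estimates.

```python
def rw1_structure(T):
    """Structure matrix of a first-order random walk of length T."""
    R = [[0] * T for _ in range(T)]
    for t in range(T - 1):
        # Each squared increment (x[t+1] - x[t])^2 adds the block [[1, -1], [-1, 1]].
        R[t][t] += 1
        R[t + 1][t + 1] += 1
        R[t][t + 1] -= 1
        R[t + 1][t] -= 1
    return R

def quad_form(R, x):
    """Quadratic form x^T R x."""
    n = len(x)
    return sum(x[i] * R[i][j] * x[j] for i in range(n) for j in range(n))

T = 5
R = rw1_structure(T)

beta = [0.2, 0.5, 0.4, 1.0, 1.6]  # hypothetical time trend
penalty = quad_form(R, beta)
sum_sq = sum((beta[t] - beta[t - 1]) ** 2 for t in range(1, T))

# A constant trend is not penalized: the prior is improper in that direction.
constant_penalty = quad_form(R, [3] * T)

for row in R:
    print(row)
print(penalty, sum_sq, constant_penalty)
```

The zero penalty for a constant vector reflects the rank deficiency of R (rank T−1), which is why the overall level is identified through the intercept μ rather than through β.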

To specify the prior on δ we consider the interacting spatial (α) and temporal (β) main effects: since the cantonal effects α are modelled as spatially unstructured, a so-called type II interaction prior (Knorr-Held, 2000) is used for δ, i.e. the interactions δjt in the different cantons follow independent RWs in time. Hence, the form of the resulting joint distribution for δ is similar to expression (5), including an additional sum over all cantons:

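  $$ \pi(\delta \mid \sigma_\delta^2) \propto \exp\left( -\frac{1}{2\sigma_\delta^2} \sum_{j=1}^{J} \sum_{t=2}^{T} (\delta_{jt} - \delta_{j,t-1})^2 \right) \tag{6} $$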

Following Clayton (1996) and Knorr-Held (2000), its structure matrix can be obtained as the Kronecker product of the structure matrices of the interacting main effects and has rank J(T−1). To ensure identifiability of the main time trend β, the δjts must sum to 0 over t for each j=1,…,J.
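The rank statement can be verified on a small example. The following sketch assumes that the unstructured cantonal effect contributes an identity matrix and that δ is ordered canton by canton, so that the type II structure matrix is the Kronecker product of a J×J identity matrix and the first-order RW structure matrix. The dimensions J=3 and T=5 are hypothetical and kept small for illustration; the rank is computed exactly with rational arithmetic.

```python
from fractions import Fraction

def rw1_structure(T):
    """Structure matrix of a first-order random walk of length T."""
    R = [[0] * T for _ in range(T)]
    for t in range(T - 1):
        R[t][t] += 1
        R[t + 1][t + 1] += 1
        R[t][t + 1] -= 1
        R[t + 1][t] -= 1
    return R

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

def rank(M):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, n_rows):
            factor = A[i][c] / A[r][c]
            if factor != 0:
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == n_rows:
            break
    return r

J, T = 3, 5  # hypothetical numbers of cantons and time points
R_delta = kron(identity(J), rw1_structure(T))
rank_delta = rank(R_delta)

# A vector that is constant within canton 1 and zero elsewhere lies in the null space.
const_canton = [1] * T + [0] * (J * T - T)
image = [sum(r * v for r, v in zip(row, const_canton)) for row in R_delta]

print(rank_delta, J * (T - 1))
```

Each canton contributes one unpenalized direction (a constant shift over time), which is exactly what the J sum-to-zero constraints remove.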

Instead of a first-order RW prior for β an RW of second order can be used. This assumption might be appropriate for the BVD data which exhibit an increasing number of counts over the observed time period. A first-order RW trend smooths towards a constant whereas the second-order RW penalizes deviations from a linear trend. The structure matrices of β and δ and the linear constraints must be adapted appropriately; see Schmid and Held (2004) and Rue and Held (2005) for details. This new model, which includes a second-order RW main time trend and the respective interaction, is called model M3 in this application.
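The different smoothing behaviour of the two RW priors can be made concrete with difference matrices. In the sketch below, which is an illustration rather than the implementation used for the analyses, the structure matrix is formed as DTD, where D takes first or second differences, and the penalty xTRx is evaluated for a constant, a linear and a quadratic sequence (all hypothetical).

```python
def difference_matrix(T, order):
    """(T - order) x T matrix that applies differences of the given order."""
    D = [[1 if i == j else 0 for j in range(T)] for i in range(T)]
    for _ in range(order):
        D = [
            [D[i + 1][j] - D[i][j] for j in range(T)]
            for i in range(len(D) - 1)
        ]
    return D

def structure_matrix(T, order):
    """Structure matrix R = D^T D of a random walk of the given order."""
    D = difference_matrix(T, order)
    return [
        [sum(D[k][i] * D[k][j] for k in range(len(D))) for j in range(T)]
        for i in range(T)
    ]

def penalty(x, order):
    """x^T R x, computed as the sum of squared differences of the given order."""
    D = difference_matrix(len(x), order)
    return sum(sum(d * v for d, v in zip(row, x)) ** 2 for row in D)

constant = [2, 2, 2, 2, 2]
linear = [0, 1, 2, 3, 4]
quadratic = [0, 1, 4, 9, 16]

R2 = structure_matrix(5, order=2)
for row in R2:
    print(row)

print(penalty(constant, 1), penalty(linear, 1))   # RW1: only the constant is free
print(penalty(constant, 2), penalty(linear, 2))   # RW2: constant and linear are free
print(penalty(quadratic, 2))                      # RW2 penalizes curvature
```

The first-order RW leaves only constants unpenalized, whereas the second-order RW leaves both constants and linear trends unpenalized, so an additional linear constraint is needed for identifiability in the second-order case.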

So far, all models proposed explicitly include cantonal heterogeneity. To investigate whether a cantonal component is necessary, models with regional effects only are considered as well. Similarities between neighbouring regions are now modelled by using an intrinsic Gaussian Markov random field for ψ=(ψ1,…,ψI)T with prior density

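  $$ \pi(\psi \mid \sigma_\psi^2) \propto \exp\left( -\frac{1}{2\sigma_\psi^2} \sum_{i \sim i'} (\psi_i - \psi_{i'})^2 \right) \tag{7} $$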

The sum in expression (7) includes all pairs of adjacent regions i and i′. The linear predictor of the resulting model M4 is given as

  $$ \eta_{it} = \log(m_i) + \mu + \nu_i + \psi_i + \beta_t + \delta_{it} \tag{8} $$

In equation (8), the time trend β is modelled as a first-order RW. Since the Swiss regions build a fine spatial grid we assume (in contrast with the preceding models) that the interaction effects δ are also spatially structured. This means that both the temporal and the spatial neighbours as well as the temporal neighbours of the spatial neighbours enter the conditional distribution of the Gaussian Markov random field. This assumption is appropriate if temporal trends are different from region to region but are more likely to be similar for adjacent regions. This can be incorporated in the model with a type IV interaction prior (Knorr-Held, 2000) of the form

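  $$ \pi(\delta \mid \sigma_\delta^2) \propto \exp\left( -\frac{1}{2\sigma_\delta^2} \sum_{t=2}^{T} \sum_{i \sim i'} (\delta_{it} - \delta_{i't} - \delta_{i,t-1} + \delta_{i',t-1})^2 \right) \tag{9} $$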

The appropriate structure matrix can be obtained by the Kronecker product of the structure matrices (4) of the first-order RW term β and the structure matrix of the intrinsic Gaussian Markov random-field prior on ψ. This model induces full dependence over time and space. The rank of the structure matrix is now (I−1)(T−1). To avoid problems of identifiability, the δits need to sum to 0 for each i and each t, i.e.

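  $$ \sum_{t=1}^{T} \delta_{it} = 0 \quad \text{for } i = 1, \ldots, I, $$
  $$ \sum_{i=1}^{I} \delta_{it} = 0 \quad \text{for } t = 1, \ldots, T. $$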

One of these I+T constraints is redundant.
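Both the rank of the type IV structure matrix and the redundancy of one constraint can be checked on a toy example. The sketch below uses a hypothetical neighbourhood graph with I=4 regions and T=3 time points (not the Swiss regions), builds the Kronecker product of the spatial and first-order RW structure matrices, and stacks the I+T sum-to-zero constraints into one matrix. Ranks are computed exactly with rational arithmetic.

```python
from fractions import Fraction

def rw1_structure(T):
    """Structure matrix of a first-order random walk of length T."""
    R = [[0] * T for _ in range(T)]
    for t in range(T - 1):
        R[t][t] += 1
        R[t + 1][t + 1] += 1
        R[t][t + 1] -= 1
        R[t + 1][t] -= 1
    return R

def icar_structure(n_regions, pairs):
    """Structure matrix of an intrinsic GMRF on a neighbourhood graph."""
    R = [[0] * n_regions for _ in range(n_regions)]
    for i, k in pairs:
        R[i][i] += 1
        R[k][k] += 1
        R[i][k] -= 1
        R[k][i] -= 1
    return R

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def rank(M):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][c] / A[r][c]
            if factor != 0:
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r

I, T = 4, 3
pairs = [(0, 1), (1, 2), (2, 3), (0, 2)]  # hypothetical connected adjacency graph

# delta is ordered region by region: position of delta_it is i * T + t.
R_delta = kron(icar_structure(I, pairs), rw1_structure(T))

# Sum over time for each region, then sum over regions for each time point.
constraints = [
    [1 if pos // T == i else 0 for pos in range(I * T)] for i in range(I)
] + [
    [1 if pos % T == t else 0 for pos in range(I * T)] for t in range(T)
]

print(rank(R_delta), (I - 1) * (T - 1))
print(len(constraints), rank(constraints))
```

The constraint matrix has I+T rows but rank I+T−1, because summing all region-wise constraints gives the same total as summing all time-wise constraints.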

By analogy with the non-parametric models including cantonal heterogeneity, a fifth model M5 is fitted. In this model a second-order RW prior is assigned to β and the structure matrix of the interaction term δ is obtained as the Kronecker product of the structure matrices of ψ and β (second-order RW).

In this application, a herd is the unit of analysis (see Section 2.1). Therefore, a large herd may be more likely to be a case than a small herd, as there are more animals at risk. In most Swiss regions the mean number of cows per herd is between 30 and 40. An ecological regression including the logarithm of the mean herd size as explanatory variable is conducted in Section 5.2 to investigate this issue.

2.3. Priors

Since the models are formulated in a Bayesian way, prior distributions must be assigned to all variance and precision components. In the parametric setting (1) a Wishart prior is assigned to the precision matrix P of the bivariate normal distribution for (αj,ϕj)T. The Wishart distribution Wi2(l,L) has two components, namely the degrees of freedom l and the matrix L. Here, they were chosen as l=4 and

  • image

a priori. For $\sigma_\nu^2$ an inverse gamma prior IG(1,0.01) was used. The parameterization of the inverse gamma distribution is as in Natario and Knorr-Held (2003) and Rue et al. (2009).

In the non-parametric settings independent IG(1,0.01) priors were used for $\sigma_\nu^2$, $\sigma_\alpha^2$, $\sigma_\beta^2$ (first-order RW) and $\sigma_\delta^2$ (as specified in expression (6)). In models M4 and M5 the prior of $\sigma_\psi^2$ was adjusted for the fact that it represents conditional variability on the same spatial level as $\sigma_\nu^2$ and chosen as IG(1,0.018) (Bernardinelli et al., 1995a). For the models including a second-order RW specification of β (model M3 or M5), an IG(1,0.00005) prior was used for $\sigma_\beta^2$ and $\sigma_\delta^2$. For a discussion of the sensitivity with respect to this prior see Natario and Knorr-Held (2003).

3. Integrated nested Laplace approximations—a new approach for approximate Bayesian inference

INLAs are a recently proposed method for approximate Bayesian inference in structured additive regression models with latent Gaussian fields (Rue et al., 2009).

Spatiotemporal models such as those introduced in Section 2 fit into this framework and are built in a hierarchical fashion with three stages. The first stage is the observational model π(y|x). The second stage is the latent Gaussian field π(x|θ) with precision matrix Q, e.g. x=(μ,νT,αT,βT,δT)T for expression (2). It is typically controlled by a few hyperparameters θ which are not necessarily Gaussian (third stage). All unknown variance parameters (e.g. $\sigma_\nu^2$) as specified in Section 2 enter θ.

For such models it is not possible to compute the posterior distributions analytically. Hence, MCMC methods have been used to obtain estimates so far, but they have some drawbacks: the computational time may be long if samples are highly correlated. Especially for models with a complex dependence structure within the Gaussian field such as proposed in Section 2, advanced MCMC algorithms are required to provide a reasonable sampler for the posterior marginals. This issue is discussed in more detail in Section 6. In contrast, INLAs provide accurate approximations to the posterior marginals in short computational time. In what follows we present the inference strategy briefly; for details refer to Rue et al. (2009).

The main goal is to estimate the marginal posterior distribution

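  $$ \pi(x_i \mid y) = \int \underbrace{\pi(\theta \mid y)}_{\text{part 1}} \; \underbrace{\pi(x_i \mid \theta, y)}_{\text{part 2}} \, d\theta \tag{10} $$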

given the data for each component xi of the latent Gaussian field x. Parts 1 and 2 of equation (10) are processed in an elaborate way. From π(x,θ,y)=π(x|θ,y)π(θ|y)π(y) it follows that part 1 of integral (10) can be approximated by

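  $$ \tilde{\pi}(\theta \mid y) \propto \left. \frac{\pi(x, \theta, y)}{\tilde{\pi}_G(x \mid \theta, y)} \right|_{x = x^*(\theta)} \tag{11} $$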

which is the Laplace approximation of a marginal posterior distribution (Tierney and Kadane, 1986). In expression (11), $\tilde{\pi}_G(x \mid \theta, y)$ denotes the Gaussian approximation (Rue and Held, 2005) to π(x|θ,y) and x*(θ) is the mode of the full conditional of x for a given θ. To integrate out the uncertainty with respect to θ, it is essential to explore the properties of expression (11) and to find good evaluation points θk for a numerical integration of equation (10). This is done by an iterative algorithm (Rue et al., 2009). Additionally, an appropriate area weight Δk must be assigned to each θk (see equation (12)).
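The idea of a Gaussian approximation that matches the mode and the curvature at the mode can be illustrated in one dimension. The sketch below is a toy example and not the INLA implementation: it assumes a single latent value x with a N(0, 1/κ) prior and one Poisson count y with mean E·exp(x), finds the mode of the log full conditional by Newton iterations, and uses the negative second derivative at the mode as the precision. The values of y, E and κ are hypothetical.

```python
import math

def gaussian_approximation(y, E, kappa, x0=0.0, tol=1e-12, max_iter=100):
    """Mode and variance of a Gaussian approximation to pi(x | y).

    Log full conditional (up to a constant):
        y * x - E * exp(x) - 0.5 * kappa * x**2
    """
    x = x0
    for _ in range(max_iter):
        grad = y - E * math.exp(x) - kappa * x   # first derivative
        hess = -E * math.exp(x) - kappa          # second derivative (always negative)
        step = grad / hess
        x -= step                                # Newton update
        if abs(step) < tol:
            break
    precision = E * math.exp(x) + kappa          # curvature at the mode
    return x, 1.0 / precision

# Hypothetical values chosen for illustration only.
y, E, kappa = 7, 4.0, 1.0
mode, variance = gaussian_approximation(y, E, kappa)
gradient_at_mode = y - E * math.exp(mode) - kappa * mode

print(round(mode, 4), round(variance, 4))
```

In the multivariate setting of this paper the same construction applies to the whole latent field x, with the scalar precision replaced by a sparse precision matrix.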

For the approximation of part 2 in equation (10), three alternatives were proposed in Rue et al. (2009): a Laplace approximation, a simplified Laplace approximation and the simplest of these: the Gaussian approximation. Here, the distribution of a non-normal variable is approximated with a Gaussian distribution by matching the mode and the curvature at the mode (Rue and Held (2005), section 4.4.1). According to Rue and Martino (2007), this method often gives reasonable results but there can be errors in the location or due to the lack of skewness or both. Therefore, the approximations can be improved by applying the Laplace approximation also to π(xi|θ,y). This so-called full Laplace approximation is very precise. Rue et al. (2009) also proposed an alternative method, the simplified Laplace approximation, which is based on a series expansion of the full Laplace approximation. This method takes less computation time than the full Laplace approximation and is equally accurate in many applications. Putting things together, we obtain

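  $$ \tilde{\pi}(x_i \mid y) = \sum_k \tilde{\pi}(x_i \mid \theta_k, y)\, \tilde{\pi}(\theta_k \mid y)\, \Delta_k \tag{12} $$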

as an approximation of the posterior marginal density (10).
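The final numerical integration step is a weighted sum over the evaluation points θk, with the approximate conditional density of xi at each θk weighted by the approximate posterior of θk and its area weight Δk. The sketch below illustrates only this summation step: the grid, the unnormalized posterior values and the Gaussian conditionals are hypothetical stand-ins for quantities that INLA computes internally.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical evaluation points theta_k with equal area weights Delta_k.
theta_grid = [0.5, 1.0, 1.5, 2.0, 2.5]
delta = [0.5] * len(theta_grid)

# Hypothetical unnormalized values of the approximate posterior of theta.
unnormalized = [0.2, 0.9, 1.0, 0.5, 0.1]
norm = sum(u * d for u, d in zip(unnormalized, delta))
post_theta = [u / norm for u in unnormalized]  # now sum(post * delta) == 1

def conditional(x, theta):
    """Hypothetical conditional of x_i given theta: Gaussian with precision theta."""
    return normal_pdf(x, mean=0.3, sd=1.0 / math.sqrt(theta))

def marginal(x):
    """Weighted sum over the evaluation points theta_k."""
    return sum(
        conditional(x, th) * p * d
        for th, p, d in zip(theta_grid, post_theta, delta)
    )

# Check that the approximate marginal integrates to one on a fine grid.
step = 0.01
xs = [-8.0 + step * k for k in range(1601)]
total = sum(marginal(x) for x in xs) * step

print(round(total, 6))
```

The result is a finite mixture of the conditional approximations, so skewness or heavy tails in the marginal can arise from the mixing over θ even when each conditional is Gaussian.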

As noted in Section 2.2, the incorporation of linear constraints on x is required in models M2–M5. This is possible by using INLAs but will increase the computational time for the simplified and the full Laplace approximation if the number of constraints is large.

From a user's point of view INLAs can be used in a modular way. The program inla which is written in C is bundled within an R library (R Development Core Team, 2005) called INLA which permits model specification and processing of the results directly in R. It can be downloaded freely from http://www.r-inla.org and is available for LINUX, Macintosh and Windows environments. All analyses within this paper were run by using the INLA package built on June 9th, 2010, on inla version 1.2.

A detailed comparison of INLA approximations and MCMC histograms for model M2 is given in Section 6.

4. Model comparison and calibration

An important feature of the INLA approach is that criteria for model choice and assessment of model calibration can be obtained directly from the INLA output (Rue et al. (2009), section 6.4). Even cross-validated quantities that are needed for computation of the logarithmic score and the PIT that are discussed below can be computed by INLAs without rerunning the model. Their accuracy in comparison with quantities that are obtained by MCMC methods is discussed in Held et al. (2010).

4.1. Deviance information criterion

The DIC is a popular criterion for Bayesian model selection. According to Spiegelhalter et al. (2002) it is the sum of the posterior mean of the deviance $\bar{D}$ and the effective number of parameters pD. A low mean deviance indicates a good model fit, but the mean deviance decreases as the number of parameters increases. Hence, the effective number of parameters is added to penalize model complexity. The model with the lowest DIC therefore provides the best trade-off between model fit and complexity.

4.2. Logarithmic score

One approach for assessing the predictive performance of a model is to use cross-validated scoring rules which assign each model a numerical score based on the predictive distribution. Cross-validation means that one observation yit is left out in each step of the validation process and the predictive distribution Pyit=Prob(Yit ≤ yit | y−it) based on the remaining observations y−it is computed. For discrete Yit the logarithmic score that is considered in Section 5.1 is defined as

  $$ \mathrm{LS} = -\log(\pi_{y_{it}}) \tag{13} $$

where πyit=Prob(Yit=yit | y−it) denotes the cross-validated predictive probability mass at the observed count. Both Pyit and πyit are available in INLA. According to Stone (1977), the cross-validated mean logarithmic score is asymptotically equivalent to the Akaike information criterion if the observations are independent. Here, scoring rules are negatively oriented, which means that, the smaller the score, the better the predictive power of the model. An attractive feature of this measure is that it can be applied to parametric and non-parametric settings and does not require models to be nested, nor to be related in any way (Gneiting and Raftery, 2007).
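As a small illustration of the mean logarithmic score, the sketch below compares two sets of predictive distributions on the same counts. It assumes, purely for illustration, that each cross-validated predictive distribution is Poisson with a given mean; in the analyses of this paper the predictive probabilities are those returned by INLA. The counts and predictive means are hypothetical.

```python
import math

def poisson_pmf(y, mean):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))

def mean_log_score(observed, predictive_means):
    """Average of -log(predictive probability at the observed count)."""
    scores = [
        -math.log(poisson_pmf(y, m))
        for y, m in zip(observed, predictive_means)
    ]
    return sum(scores) / len(scores)

# Hypothetical counts and two hypothetical sets of predictive means.
observed = [0, 2, 1, 5, 3]
means_a = [0.5, 1.8, 1.2, 4.0, 3.1]   # predictions that track the data
means_b = [2.0, 2.0, 2.0, 2.0, 2.0]   # same prediction for every observation

score_a = mean_log_score(observed, means_a)
score_b = mean_log_score(observed, means_b)

print(round(score_a, 3), round(score_b, 3))
```

Because the score is negatively oriented, the set of predictions with the smaller mean score (here the first) would be preferred.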

4.3. Probability integral transform histogram for count data

A PIT histogram assesses the predictive quality of a model with respect to calibration. The PIT for a certain region is the value of the predictive cumulative distribution function at the observed count. If the observation was drawn from the predictive distribution—which would be the ideal case—and the predictive distribution is continuous, the PIT values have a standard uniform distribution. As a diagnostic tool, a histogram of the obtained PIT values is plotted and checked for uniformity. If there are deviations from uniformity, forecast failures and model deficiencies might be present. U-shaped histograms indicate underdispersed predictive distributions; hump or inverse U-shaped histograms point to overdispersion (Czado et al., 2009).

In the case of count data, as in the present paper, the predictive distribution is discrete, and the PITs are no longer uniform under the hypothesis of an ideal forecast. Hence, an adjustment is necessary as, for example, suggested by Czado et al. (2009). The resulting histogram can be interpreted in the same way as a PIT histogram derived for continuous data.
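One such adjustment is the non-randomized PIT, which to our understanding is the construction proposed by Czado et al. (2009): for each observation the step of the predictive distribution function at the observed count is spread uniformly between P(y−1) and P(y), the resulting functions are averaged over observations, and the histogram heights are differences of this average at equally spaced points. The sketch below assumes Poisson predictive distributions with hypothetical means; it illustrates the construction only and is not the code used for Figs 1 and 2.

```python
import math

def poisson_cdf(y, mean):
    """P(Y <= y) for a Poisson variable; 0 for y < 0."""
    if y < 0:
        return 0.0
    total = sum(
        math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
        for k in range(y + 1)
    )
    return min(1.0, total)

def pit_cdf(u, y, mean):
    """Conditional distribution function of the non-randomized PIT at u."""
    lower = poisson_cdf(y - 1, mean)
    upper = poisson_cdf(y, mean)
    if u <= lower:
        return 0.0
    if u >= upper:
        return 1.0
    return (u - lower) / (upper - lower)

def pit_histogram(observed, means, n_bins=10):
    """Relative bin heights of the adjusted PIT histogram (they sum to 1)."""
    def mean_cdf(u):
        values = [pit_cdf(u, y, m) for y, m in zip(observed, means)]
        return sum(values) / len(values)
    return [
        mean_cdf((j + 1) / n_bins) - mean_cdf(j / n_bins)
        for j in range(n_bins)
    ]

# Hypothetical counts and predictive means.
observed = [0, 2, 1, 5, 3, 0, 4]
means = [0.5, 1.8, 1.2, 4.0, 3.1, 1.0, 2.5]

heights = pit_histogram(observed, means, n_bins=5)
print([round(h, 3) for h in heights])
```

Under an ideal forecast each bin would have a height close to 1/n_bins; higher outer bins would point to underdispersed predictions and a hump in the middle to overdispersed predictions.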

5. Results by using integrated nested Laplace approximations

All models from Section 2 were fitted to the BVD data by using INLAs (full Laplace approximation). Model choice is conducted in Section 5.1 to find the best model and to determine whether cantonal heterogeneity is present in the data. Some interesting results with regard to under-reporting in the data are presented in Section 5.2.

5.1. Model choice and calibration

The DIC and its components are shown in Table 3 as well as the mean logarithmic scores. For model M4 the fit is best, but it has a large pD because of the complex dependence structure within the Gaussian field. The best trade-off between model complexity and fit is found for model M2; it has the lowest DIC value. In general, models including a first-order RW time trend are preferred to the analogous model with a second-order RW trend. The mean logarithmic score is also lowest for model M2. Hence, cantonal heterogeneity is present in the data and a first-order RW formulation is most appropriate for the time trend; see Table 2.

Table 3.   DIC and mean logarithmic score LS
Model   Mean deviance   pD      DIC      LS
M1      1712.1           91.3   1803.5   1.011
M2      1622.7          117.8   1740.5   0.979
M3      1674.7          107.0   1781.6   1.001
M4      1516.9          231.6   1748.5   1.002
M5      1645.5          177.5   1823.0   1.052
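The ranking in Table 3 can be reproduced directly from its entries, since the DIC is the sum of the mean deviance and pD. The short sketch below recomputes the DIC from the tabulated components and picks the model with the lowest DIC and the lowest mean logarithmic score; small differences of 0.1 from the reported DIC values are to be expected because the components are rounded.

```python
# Entries of Table 3: (mean deviance, pD, reported DIC, mean logarithmic score).
table3 = {
    "M1": (1712.1, 91.3, 1803.5, 1.011),
    "M2": (1622.7, 117.8, 1740.5, 0.979),
    "M3": (1674.7, 107.0, 1781.6, 1.001),
    "M4": (1516.9, 231.6, 1748.5, 1.002),
    "M5": (1645.5, 177.5, 1823.0, 1.052),
}

# DIC recomputed as mean deviance + effective number of parameters.
dic = {model: round(dbar + p_d, 1) for model, (dbar, p_d, _, _) in table3.items()}

best_fit = min(table3, key=lambda m: table3[m][0])   # lowest mean deviance
best_dic = min(dic, key=dic.get)                     # lowest DIC
best_ls = min(table3, key=lambda m: table3[m][3])    # lowest mean log score

for model in table3:
    print(model, dic[model], table3[model][2])
print(best_fit, best_dic, best_ls)
```

The model with the best fit (lowest mean deviance) is not the one with the lowest DIC, because its large pD outweighs the gain in fit.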

To investigate model calibration, PIT histograms for all models are shown in Fig. 1. The histograms for models M1, M2, M3 and M5 are close to uniform except for higher columns at the left-hand and right-hand end of the histograms. This indicates underdispersion of the predictive distribution. The PIT histogram for model M4 is very close to uniformity and, hence, calibration is best for this model.

Figure 1.  PIT histograms for models (a) M1, (b) M2, (c) M3, (d) M4 and (e) M5

Since model M2 was considered as the best model by the model choice criteria, its calibration should be examined in further detail. Fig. 2 shows a PIT histogram for model M2 separately for each year. Underdispersed predictions are present particularly for the years 2003, 2005 and 2006. Hence, the underdispersed predictions can be attributed to a poor predictive performance in some years.

Figure 2.  PIT histograms for model M2, separately for each year: (a) 2003; (b) 2004; (c) 2005; (d) 2006; (e) 2007

5.2. Results for the bovine viral diarrhoea data

The fitted relative spatial incidence for each region (νi+αj(i)) for the best model (M2) is shown in Fig. 3(a) on an exponential scale; lakes are indicated as striped areas. The large range of relative incidence (0.1–16) indicates biased case reporting since such large differences cannot be explained from the nature of the disease. We now consider the effects of cantons and regions separately to investigate which one is more pronounced. Plots of the cantonal (αj) and regional effects (νi) on an exponential scale are shown in Figs 4 and 5(a) respectively. The cantonal effect has a larger influence on the total relative incidence than the effect on regional level, which indicates that there is a strong heterogeneity in reporting between cantons. The cantons Berne, St Gallen, Appenzell-Innerrhoden and Appenzell-Ausserrhoden show an increased relative incidence for BVD which is elevated by a factor of 10. The incidence of a reported BVD case is lowest in the cantons Valais, Aargau and Thurgau. This is clear evidence for under-reporting in the data due to different policies of the cantonal authorities. An unstructured spatial heterogeneity between regions is also present; see Fig. 5(a). This map might represent regional differences in disease prevalence. It is also suspected that there are regions with single stockholders who are aware of the disease or have faced financial damage caused by BVD in the past. To investigate what happens if only regional terms are included in the model the incidence fitted by model M4 is plotted in Fig. 3(b). Clearly, cantonal borders are not taken as much into account as in model M2. Cantonal borders are shown in Fig. 4.

Figure 3.  Fitted relative incidence estimated by (a) model M2 ( exp (νi+αj(i))) and (b) model M4 ( exp (νi+ψi))

Figure 4.  Cantonal effects αj on an exponential scale (model M2)

Figure 5.  Regional effects νi on an exponential scale (model M2) and relative risk of BVD in 2008

With model M1, the main linear time effect φ was estimated as 0.28 with 2.5%- and 97.5%-quantiles of −0.03 and 0.55 respectively, indicating a positive trend. The estimated log-rate for time (including μ) for all models is shown in Fig. 6(b). Models M2 and M4 also show an increasing time trend, but a more pronounced increase in reported cases for the years 2006 and 2007 compared with the three preceding years. This large rise in reported cases can be explained by the increasing amount of information on BVD which was given to stockholders by the Swiss Federal Veterinary Office from the end of 2005.

Figure 6.  (a) Linear time trend for each canton (model M1) on a log-scale, with Fribourg, Liechtenstein and Vaud distinguished from the other cantons, and (b) estimated main time trend for models M1 to M5 on a log-scale (including μ)

Estimates of the cantonal time trend (μ+αj+(φ+ϕj)t) that were obtained by model M1 are shown in Fig. 6(a). A strong positive differential trend ϕj can be observed for the canton Fribourg and Liechtenstein. In these two areas the rise in reported cases was steeper than on average. The canton Vaud shows a strong negative differential trend. The posterior mean of the correlation is 0.39 with a 95% credible interval of [−0.07;0.75]. This means that a positive correlation between cantonal effect and differential trend is present; cantons with similar relative risk estimates behave fairly similarly over time. Evaluating the estimated interaction effects for the best non-parametric model, M2, a strongly positive trend is found for Fribourg and Liechtenstein in the year 2007. This shows that in these two areas an immediate rise in reported cases in the year 2007 took place, which was even stronger than the mean trend. This is also true for the canton Lucerne in the years 2006 and 2007. In contrast, the number of reported cases in the canton Vaud decreased in 2006 and 2007.
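The cantonal trend μ+αj+(φ+ϕj)t of model M1 can be made concrete with a small numerical sketch. Only the value φ=0.28 is taken from the text. The intercept μ, the cantonal effect αj, the differential trends ϕj and the coding of time as t=0,…,4 for the years 2003 to 2007 are hypothetical choices made for illustration.

```python
PHI = 0.28  # main linear time effect reported for model M1

def cantonal_log_rate(mu, alpha_j, phi_diff_j, t, phi=PHI):
    """Log-rate mu + alpha_j + (phi + phi_diff_j) * t for one canton at time t."""
    return mu + alpha_j + (phi + phi_diff_j) * t

# Hypothetical values: mu, alpha_j and the differential trends are not from the paper.
MU, ALPHA = -6.0, 0.5
times = range(5)  # assumed coding: 0 = 2003, ..., 4 = 2007

steeper = [cantonal_log_rate(MU, ALPHA, 0.30, t) for t in times]   # positive differential trend
flatter = [cantonal_log_rate(MU, ALPHA, -0.40, t) for t in times]  # negative differential trend

for t, (a, b) in enumerate(zip(steeper, flatter)):
    print(f"t={t}: positive differential trend {a:.2f}, negative differential trend {b:.2f}")
```

A canton with a positive differential trend rises faster than the main trend, and a sufficiently negative differential trend turns the cantonal trend into a decrease even though the main trend is positive.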

To investigate whether the mean herd size per region has an influence on incidence of the disease, an ecological regression was conducted. The logarithm of the mean herd size was included in the linear predictor (2) of model M2 as an explanatory variable. The resulting point estimate is −0.45 with a 95% credible interval of [−1.26;0.38]. Hence, for BVD no clear association between log(mean herd size) per region and incidence of disease could be found.

Swiss cantons vary greatly with respect to size and number of regions. There are cantons which consist of only one region, whereas, for example, the canton Berne is split into 26 regions. This fact has an influence on the fit of the models proposed. Fig. 7 shows the logarithm of the observed proportion and the fitted values of models M1, M2 and M4 for four regions: Fig. 7(a) for the canton Appenzell-Innerrhoden, which is both a region and a canton, Figs 7(b) and 7(c) for two of five regions of the canton Lucerne and Fig. 7(d) for one region in the canton Valais, where not a single case of BVD was reported during the whole time period. Here, the observed proportion on the logarithmic scale is equal to −∞. As the behaviour of models M3 and M5 is very similar to that of model M1 in all cases, the results are not included in these figures. In Fig. 7(a) models M2 and M4 behave quite similarly, adjusting very well to the non-linear time trend in Appenzell-Innerrhoden. The situation is different in Figs 7(b) and 7(c): the fit of models M2 and M4 is similar in Fig. 7(b), but very different in Fig. 7(c). Model M4 is, in contrast with the models including a cantonal effect, sensitive to regional departures from a cantonal time trend which is present in the region shown in Fig. 7(c). In models including interactions between time and canton, the shape of the fitted time effect is equal for each region within one canton; just the level of the fitted values can change. This smoothing effect of models M1 and M2 can be noted for larger cantons. In contrast, for some regions models M1 and M2 are more sensible: Fig. 7(d) shows the fit for a region in the canton Valais where not a single case of BVD was reported during the whole time period. As model M4 does not take into account cantonal borders, the incidence of BVD is estimated too high in those regions that are close to the borders of cantons with reported cases.
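The statement that regions within one canton share the shape of the fitted time effect and differ only in level can be checked with a short sketch. It assumes a linear predictor of the form μ+νi+αj(i)+βt+δj(i),t, with a main time effect βt and a canton-by-time interaction δj,t; this form is an assumption made for illustration and all numbers are hypothetical.

```python
def region_log_rates(mu, nu_i, alpha_j, beta, delta_j):
    """Log-rates over time for one region under the assumed predictor
    mu + nu_i + alpha_j + beta_t + delta_jt."""
    return [mu + nu_i + alpha_j + b_t + d_jt for b_t, d_jt in zip(beta, delta_j)]

# All numbers below are hypothetical.
MU, ALPHA_J = -6.0, 0.8
BETA = [-0.3, -0.2, -0.1, 0.2, 0.4]    # main time effect, 2003 to 2007
DELTA_J = [0.1, -0.1, 0.0, 0.3, -0.3]  # canton-by-time interaction for canton j

region_a = region_log_rates(MU, 0.5, ALPHA_J, BETA, DELTA_J)   # regional effect 0.5
region_b = region_log_rates(MU, -0.2, ALPHA_J, BETA, DELTA_J)  # regional effect -0.2

# The gap between two regions of the same canton does not depend on time.
gaps = [a - b for a, b in zip(region_a, region_b)]
print([round(g, 6) for g in gaps])
```

The gap between the two curves equals the difference of the regional effects in every year, so the curves are parallel. A model with region-specific interaction terms, such as model M4, does not impose this restriction.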

Figure 7.  Logarithm of the observed proportion and fitted values for models M1, M2 and M4 for (a) the canton Appenzell-Innerrhoden, (b), (c) two regions in the canton Lucerne and (d) one region in the canton Valais

6. A comparison of integrated nested Laplace approximations and Markov chain Monte Carlo methods

To assess the accuracy of the estimates that were obtained by INLAs, the best model (M2) was analysed by using MCMC methods. Unfortunately, freely available MCMC software like WinBUGS (Lunn et al., 2000) and BayesX (Brezger et al., 2005) does not incorporate linear constraints properly. Instead, so-called ‘centring on the fly’ is used as an ad hoc approach to incorporate sum-to-zero constraints. However, it is unclear whether the resulting algorithm has the correct equilibrium distribution. Hence, advanced MCMC routines were implemented by the third author using a low level programming language (C) with correct incorporation of all linear constraints as discussed in Rue and Held (2005). Two different approaches were used to obtain samples of the posterior marginals, namely auxiliary mixture sampling (Frühwirth-Schnatter et al., 2009) and a Metropolis–Hastings algorithm with a proposal constructed by using a second-order Taylor expansion of the log-likelihood (Rue and Held, 2005).

To obtain very precise estimates that are suitable for a comparison of MCMC methods and INLAs, ns=3 030 000 samples were drawn by using a thinning of 100 and a burn-in of 30 000. Hence, 30 000 samples were left for the estimation of the posterior quantities. Negligible auto-correlation was left in these samples. The Monte Carlo standard error of the estimates for each component of the latent field and the variances was estimated by using the method of consistent batch means (Jones et al., 2006). We used √ns to determine the batch size. The resulting Monte Carlo error estimates were smaller than 0.01 in each case, except for inline image (seinline image). Nevertheless, an application of the stopping criterion that was described in Jones et al. (2006) confirmed that the length of our MCMC chain is sufficient for this parameter. The results for auxiliary mixture sampling and the Taylor approximation were virtually identical and only results by using the Taylor approximation are shown.
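The chain bookkeeping and the batch means estimate of the Monte Carlo standard error can be sketched as follows. This is a minimal illustration and not the C implementation that was used for the analysis. It assumes non-overlapping batches whose size is the square root of the number of retained samples; whether √ns refers to the retained or to the total number of samples is an assumption here. The simulated chain of independent draws is a hypothetical stand-in for one stored parameter.

```python
import math
import random

def batch_means_se(samples, batch_size=None):
    """Monte Carlo standard error of the sample mean by non-overlapping batch means."""
    n = len(samples)
    if batch_size is None:
        batch_size = int(math.sqrt(n))  # batch size of about sqrt(n)
    n_batches = n // batch_size         # an incomplete trailing batch is dropped
    means = [
        sum(samples[k * batch_size:(k + 1) * batch_size]) / batch_size
        for k in range(n_batches)
    ]
    grand_mean = sum(means) / n_batches
    var_of_batch_means = sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_batch_means / n_batches)

# Chain bookkeeping from the text: total draws, burn-in and thinning interval.
TOTAL, BURN_IN, THIN = 3_030_000, 30_000, 100
n_kept = (TOTAL - BURN_IN) // THIN

# Hypothetical stand-in for one stored parameter: independent N(0, 1) draws.
rng = random.Random(42)
chain = [rng.gauss(0.0, 1.0) for _ in range(n_kept)]
mcse = batch_means_se(chain)

print(f"retained samples: {n_kept}, batch size: {int(math.sqrt(n_kept))}, MCSE: {mcse:.4f}")
```

For independent draws the estimate should be close to the standard deviation divided by the square root of the number of retained samples. Auto-correlated chains give larger values, which is why the batch means estimate is preferred to the naive standard error.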

To compare INLA and MCMC methods, histograms of the MCMC samples are shown together with the INLA approximations of the posterior marginals in Figs 8 and 9. For the variance components of model M2 (see equation (2)) these plots are shown in Fig. 8, with variances shown on the log-scale. The MCMC histograms and INLA approximations are virtually identical. For the latent Gaussian field x, MCMC histograms and INLA approximations look virtually identical for all components (μ,ν,α,β,δ) when the simplified or full Laplace approximation is used for the approximation of part 2 in expression (10). Small shifts can be observed for the cantonal components when the Gaussian approximation is used; see Fig. 9. Hence, for an improved approximation the simplified or full Laplace approximation must be used.

Figure 8.  MCMC histograms and INLA approximations of the posterior marginals for all variances within model M2 (on a log-scale), one variance per panel, (a) to (d)

Figure 9.  MCMC histograms and INLA approximations of the posterior marginals of four Swiss cantons (αj, on a log-scale; the INLA approximations were obtained by using the Gaussian (– – –) and the full Laplace approximation): (a) Zurich; (b) Berne; (c) Lucerne; (d) Uri

Unfortunately, the incorporation of linear constraints slows down the computation by INLAs for the simplified and full Laplace approximation. The computer times for model M2 are 26.79 s (Gaussian approximation), 105.32 s (simplified Laplace approximation) and 233.23 s (full Laplace approximation). Depending on the number of linear constraints and the data, the difference in computer time between the Gaussian and the simplified or full Laplace approximation can be very high. The MCMC sampler produced 246 iterations per second.
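The reported figures allow a rough comparison of run times. The sketch below assumes that the rate of 246 iterations per second applies to the full run of 3 030 000 iterations described above and that both methods were timed on comparable hardware; neither assumption is confirmed by the text.

```python
TOTAL_ITERATIONS = 3_030_000      # length of the MCMC run, including burn-in
MCMC_ITERATIONS_PER_SECOND = 246  # reported speed of the MCMC sampler

# Reported INLA computing times for model M2, in seconds.
INLA_SECONDS = {
    "Gaussian": 26.79,
    "simplified Laplace": 105.32,
    "full Laplace": 233.23,
}

mcmc_seconds = TOTAL_ITERATIONS / MCMC_ITERATIONS_PER_SECOND
print(f"MCMC run: {mcmc_seconds:.0f} s ({mcmc_seconds / 3600:.1f} h)")

ratios = {name: mcmc_seconds / seconds for name, seconds in INLA_SECONDS.items()}
for name, ratio in ratios.items():
    print(f"{name}: the MCMC run takes about {ratio:.0f} times as long")
```

Under these assumptions the MCMC run takes roughly 3.4 h, so even the full Laplace approximation is faster by a factor of about 50.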

The DIC that was obtained with MCMC sampling is 1622.2+119.2=1741.4. The DIC that was computed by INLAs (full Laplace approximation) is very close to this value; see Table 3. We also computed the logarithmic score and the PIT histogram from the MCMC samples by using importance sampling (Stern and Cressie, 2000). However, the distribution of the importance weights was heavily skewed and dominated by a small number of extreme values. In such circumstances the estimates are known to be unreliable (Marshall and Spiegelhalter, 2003). In INLAs, the computation of these quantities fails for a few observations, which are indicated by the inla program (Martino and Rue, 2009), but they can easily be obtained by rerunning the model, leaving out each of these observations in turn. For a detailed comparison of PIT and logarithmic score from INLA and MCMC methods see Held et al. (2010).
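Why skewed importance weights make such estimates unreliable can be seen in a minimal sketch. It uses a common scheme in which each posterior sample is weighted by the reciprocal of the likelihood of the left-out observation; whether this matches the scheme of Stern and Cressie (2000) in every detail is not claimed. The Poisson likelihood, the function names and all numbers are assumptions made for illustration.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of the count y for mean lam."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def loo_importance_summary(y, lambda_samples):
    """Leave-one-out predictive probability of y and effective sample size.

    Each posterior sample of the Poisson mean is weighted by 1 / p(y | lambda),
    which removes the contribution of y from the posterior.
    """
    weights = [1.0 / poisson_pmf(y, lam) for lam in lambda_samples]
    total = sum(weights)
    predictive = len(weights) / total             # harmonic mean of the likelihoods
    ess = total ** 2 / sum(w * w for w in weights)
    return predictive, ess

# Hypothetical posterior samples of the Poisson mean for one observation.
well_behaved = [4.0] * 50          # identical samples give equal weights
skewed = [10.0] * 99 + [1.0]       # a single sample fits y = 10 very badly

p_ok, ess_ok = loo_importance_summary(3, well_behaved)
p_bad, ess_bad = loo_importance_summary(10, skewed)
print(f"equal weights:  ESS = {ess_ok:.1f} of {len(well_behaved)}")
print(f"skewed weights: ESS = {ess_bad:.2f} of {len(skewed)}")
```

In the second case one sample carries almost all of the weight, so the estimate effectively rests on a single draw although 100 samples are available.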

Another issue which must be addressed is the usability of both approaches. INLAs can easily be run by using R and all output can be processed directly. This is even true for the complex class of spatiotemporal disease mapping models that was introduced in Section 2. In contrast, to use MCMC techniques, complex algorithms must be implemented by hand and care must be taken concerning the samples obtained.

7. Discussion

Regarding the BVD data, model M2 is chosen as the best model by the model choice criteria that were considered. Hence, cantonal heterogeneity is present in the data and the affiliation of a Swiss region to a certain canton highly influences the number of reported BVD cases. Furthermore, a non-parametric formulation of the time trend which points out an immediate rise in reported cases for the years 2006 and 2007 is adequate. This finding gives rise to the hypothesis that the disease awareness regarding BVD has been rising since 2006 when information on the disease was given to stockholders by the Swiss Federal Veterinary Office. In 2008, a large-scale programme which included testing every cow in Switzerland started, to eradicate this disease by the end of 2011. The estimated relative risk for the disease in 2008 is shown in Fig. 5(b) (Besag et al., 1991). The pattern obtained differs considerably from the pattern that was found in Fig. 3(a). Hence, pronounced under-reporting is present in the analysed case reporting data for the years 2003–2007. Reasons for the cantonal differences in case reporting must be found and the policy makers should think of strategies to prevent them.

Our analysis shows that INLAs are a flexible and useful tool that can be used to fit spatiotemporal models. Furthermore, the results provided can easily be used for data analysis. However, some experience in choosing the most appropriate approximation technique and appropriate settings for the approximation routines is needed. A comparison with results from an MCMC analysis in Section 6 showed that INLA approximations and MCMC histograms are virtually identical for hyperparameters and the components of the latent Gaussian field, if the simplified or the full Laplace approximation is used.

Acknowledgements

Financial support by the Swiss Federal Veterinary Office is gratefully acknowledged. Many thanks go to Sarah Haile for checking the manuscript and to Håvard Rue for the INLA support. The revision has benefited from very helpful comments and suggestions by two reviewers.

References
