In what follows we outline five models that describe the spatiotemporal pattern of the BVD data by using a disease mapping approach. As described in Section 1, Switzerland is a confederation of 26 cantons, each consisting of one or more regions, with 184 regions in total. As the cantonal veterinary authorities are responsible for implementing federal veterinary legislation, it is of interest to investigate whether the cantonal affiliation of a region has a pronounced influence on case reporting. Hence, a cantonal effect is included in three of the models. This is done by using a multilevel approach such that variability in the response is attributed to different hierarchical levels (Langford *et al.*, 1998, 1999). Furthermore, models incorporating either a linear or a non-parametric time trend are proposed. Each type of model additionally includes space–time interactions, which allow spatial units to deviate from the main time trend when they are observed over a long time span. Models assuming a linear time trend were proposed by Bernardinelli *et al.* (1995b) and developed further by Assunção *et al.* (2001). Non-parametric space–time interaction models were introduced by Knorr-Held (2000) and have been used by several researchers in different applications, e.g. for age–period–cohort models (Lagazio *et al.*, 2003; Schmid and Held, 2004) and the joint analysis of two or more diseases (Richardson *et al.*, 2006). The characteristics of all models introduced in Sections 2.1 and 2.2 are summarized in Table 2.

#### 2.1. Linear time trend

Since BVD cases are available for each Swiss region separately, this fine grid can be used as a basis for a spatial analysis. Additionally, data from the Principality of Liechtenstein are included (which counts as both a region and a canton). A case means that at least one diseased animal was detected within one herd. Under the rare disease assumption, the number of cases of disease *y*_{it} in region *i*=1,…,*I* at time *t*=1,…,*T* is taken to be Poisson distributed with parameter *λ*_{it}, which can be interpreted as the relative risk of the disease in region *i* at time *t*. Additionally, the number of herds *m*_{it} is included as an offset to adjust for the different numbers of herds at risk. Following the standard generalized linear mixed model formulation (Breslow and Clayton, 1993) with Poisson response, a logarithmic link is used.

To account for a linear time trend, Bernardinelli *et al.* (1995b) proposed a Bayesian spatiotemporal model. It can be seen as an extension of the standard model for disease mapping that was introduced by Besag *et al.* (1991). In the standard setting defined for one spatial level, a main linear time trend and a so-called differential trend for each area *i* are incorporated in the model as well as spatially structured and unstructured effects. In this application, where cantonal heterogeneity is considered as well, we assume that a cantonal effect *α*_{j} interacts with the linear time trend. Hence, the rate of cases over time can have a different slope for each canton. Reasons for this heterogeneity could be cantonal differences in incentives for a farmer to report a case, e.g. financial compensation in the case of a diseased animal, or a different practice in conveying disease information to farmers. The linear predictor of this model (M1) can be written as

- (1)

with *i*=1,…,185, *t*=1,…,5 and *j*=1,…,27. The index *j*(*i*) denotes the canton *j* to which region *i* belongs. The offset *m*_{i} was provided by the Swiss Federal Veterinary Office and is assumed to be constant in time. The components of the vector *ν*=(*ν*_{1},…,*ν*_{I})^{T} are independent and identically normally distributed with mean 0 and a common variance. The *ν*_{i}s account for differences between regions whereas the *α*_{j}s model cantonal heterogeneity. In this model the similarity of the incidence between neighbouring regions is described via the cantonal term. Hence, it incorporates a two-level structure. The parameter *φ* represents the overall linear time trend. The term *ϕ*_{j} depicts the interaction between the linear time trend and the cantonal intercept *α*_{j} and is modelled as a random slope. Thus, *φ*+*ϕ*_{j} represents the individual time trend for canton *j*, and each *ϕ*_{j} can be interpreted as the amount by which the time trend of canton *j* differs from the overall trend *φ*. A prior distribution for *α*=(*α*_{1},…,*α*_{J})^{T} and *ϕ*=(*ϕ*_{1},…,*ϕ*_{J})^{T} must be defined as well. Since the cantonal effects *α* are assumed to be independent across cantons, the differential trends *ϕ* are modelled in the same way (Bernardinelli *et al.*, 1995b). Furthermore, it is necessary to allow for correlation between the intercept and slope in a random-slope model (Hedeker and Gibbons (2006), section 4.4.2). A standard assumption is that (*α*_{j},*ϕ*_{j})^{T} follows a bivariate normal distribution with mean 0 and some unknown precision matrix **P**, to which a Wishart prior is assigned. Bernardinelli *et al.* (1995b) also proposed that the time variable *t* should be centred at 0 to avoid high correlation between the intercept and slope. We have followed this advice in our application. The specification of hyperpriors is discussed in Section 2.3.
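For orientation, the terms defined above are consistent with a linear predictor of the following shape. This is a sketch under stated assumptions rather than a restatement of equation (1): the symbol *η*_{it}, the overall intercept *μ* and the way that the offset enters log(*λ*_{it}) are assumptions introduced here for illustration.

```latex
\begin{aligned}
y_{it} \mid \lambda_{it} &\sim \mathrm{Po}(\lambda_{it}), \\
\eta_{it} = \log(\lambda_{it}) &= \log(m_i) + \mu + \alpha_{j(i)}
    + (\varphi + \phi_{j(i)})\,t + \nu_i
    && \text{($\mu$: assumed intercept)}, \\
(\alpha_j, \phi_j)^{\mathrm{T}} &\sim \mathrm{N}_2(\mathbf{0}, \mathbf{P}^{-1}),
    \qquad j = 1,\dots,J .
\end{aligned}
```

With *t* centred at 0, the cantonal intercept *α*_{j} describes the level of canton *j* at the middle of the observation period, which is what reduces the correlation between intercept and slope.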

#### 2.2. Non-parametric time trend

In model M1, the time trend in log-incidence is taken as linear. This assumption can be relaxed by adopting a non-parametric setting as proposed in Knorr-Held (2000). Custom-made modifications of this general setting are formulated for the BVD data in what follows.

The second model M2 is the non-parametric analogue of model M1. In contrast with model M1 it includes a main time trend *β*=(*β*_{1},…,*β*_{T})^{T} and an interaction *δ*=(*δ*_{11},…,*δ*_{1T},*δ*_{21},…,*δ*_{2T},…,*δ*_{JT})^{T} between canton and time to which specific prior distributions must be assigned. The linear predictor is

- (2)

Here, the *α*_{j}s are modelled as independent and identically normally distributed with mean 0 and a common variance. For *β* and *δ* we use intrinsic Gaussian Markov random-field priors of the general form

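$$
\pi(\mathbf{x} \mid \sigma^2) \propto \exp\left(-\frac{1}{2\sigma^2}\,\mathbf{x}^{\mathrm{T}}\mathbf{R}\,\mathbf{x}\right), \qquad (3)
$$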

including a so-called structure matrix **R** (Held and Rue, 2010). The main time trend is specified as a random walk (RW) of first order with structure matrix

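$$
\mathbf{R} = \begin{pmatrix}
1 & -1 & & & & \\
-1 & 2 & -1 & & & \\
 & -1 & 2 & -1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & -1 & 2 & -1 \\
 & & & & -1 & 1
\end{pmatrix}. \qquad (4)
$$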

The assumption of temporal structure is plausible as the number of reported cases is constantly increasing over time. The joint prior density of *β* can be written as (Rue and Held, 2005)

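$$
\pi(\beta \mid \sigma_\beta^2) \propto \exp\left\{-\frac{1}{2\sigma_\beta^2}\sum_{t=2}^{T}(\beta_t-\beta_{t-1})^2\right\}. \qquad (5)
$$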
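The link between the structure matrix of a first-order RW and the sum of squared increments can be checked numerically. The following is a minimal sketch in Python with NumPy; the function name `rw1_structure` and the values in `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def rw1_structure(T):
    """Structure matrix of a first-order random walk on T time points."""
    # (T-1) x T first-difference operator: row k is e_{k+1} - e_k
    D = np.diff(np.eye(T), axis=0)
    return D.T @ D

T = 5                                   # five time points, as in the application
R = rw1_structure(T)

# Arbitrary illustrative values for the time trend (not estimates)
beta = np.array([0.3, -0.1, 0.4, 0.9, 1.2])

# The quadratic form beta' R beta equals the sum of squared increments
quad_form = beta @ R @ beta
sum_sq_increments = np.sum(np.diff(beta) ** 2)

# R has rank T - 1: adding a constant to beta leaves the prior unchanged
rank_R = np.linalg.matrix_rank(R)
```

The rank deficiency of 1 reflects that the prior carries no information on the overall level of *β*, which is why the prior is called intrinsic.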

To specify the prior on *δ* we consider the interacting spatial (*α*) and temporal (*β*) main effects: since the cantonal effects *α* are modelled as spatially unstructured, a so-called type II interaction prior (Knorr-Held, 2000) is used for *δ*, i.e. the interactions *δ*_{jt} in the different cantons follow independent RWs in time. Hence, the form of the resulting joint distribution for *δ* is similar to expression (5), including an additional sum over all cantons:

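$$
\pi(\delta \mid \sigma_\delta^2) \propto \exp\left\{-\frac{1}{2\sigma_\delta^2}\sum_{j=1}^{J}\sum_{t=2}^{T}(\delta_{jt}-\delta_{j,t-1})^2\right\}. \qquad (6)
$$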

Following Clayton (1996) and Knorr-Held (2000), its structure matrix can be obtained as the Kronecker product of the structure matrices of the interacting main effects and has rank *J*(*T*−1). To ensure identifiability of the main time trend *β*, the *δ*_{jt}s must sum to 0 over *t* for each *j*=1,…,*J*.
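The rank statement can be illustrated numerically. The sketch below, in Python with NumPy, assumes that the structure matrix of the unstructured cantonal effects is the identity matrix and that *δ* is ordered canton by canton with time running fastest; all variable names are hypothetical.

```python
import numpy as np

J, T = 27, 5                             # cantons and time points in the application

# First-order RW structure matrix in time
D = np.diff(np.eye(T), axis=0)
R_time = D.T @ D

# Assumption: independent cantonal effects have the identity as structure matrix
R_canton = np.eye(J)

# Type II interaction: independent RWs in time, one per canton.
# With delta ordered as (delta_11..delta_1T, delta_21, ...) this is block diagonal.
R_delta = np.kron(R_canton, R_time)
rank_delta = np.linalg.matrix_rank(R_delta)

# One sum-to-zero constraint per canton: row j sums delta_jt over t
A = np.kron(np.eye(J), np.ones((1, T)))

# The constraint directions are exactly those left unpenalized by R_delta
unpenalized = np.allclose(A @ R_delta, 0.0)
```

The rank deficiency of *J* matches the *J* sum-to-zero constraints, one for each canton.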

Instead of a first-order RW prior for *β* an RW of second order can be used. This assumption might be appropriate for the BVD data which exhibit an increasing number of counts over the observed time period. A first-order RW trend smooths towards a constant whereas the second-order RW penalizes deviations from a linear trend. The structure matrices of *β* and *δ* and the linear constraints must be adapted appropriately; see Schmid and Held (2004) and Rue and Held (2005) for details. This new model, which includes a second-order RW main time trend and the respective interaction, is called model M3 in this application.
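The different smoothing behaviour of the two RW priors can be seen from what their structure matrices leave unpenalized. The following is a minimal sketch in Python with NumPy; the variable names are hypothetical.

```python
import numpy as np

T = 5
D1 = np.diff(np.eye(T), n=1, axis=0)     # first differences,  (T-1) x T
D2 = np.diff(np.eye(T), n=2, axis=0)     # second differences, (T-2) x T
R1 = D1.T @ D1                           # first-order RW structure matrix
R2 = D2.T @ D2                           # second-order RW structure matrix

const = np.ones(T)                       # constant trend
t = np.arange(T) - (T - 1) / 2           # linear trend, centred at 0

# Penalty x' R x for each combination of prior and trend
pen_rw1_const = const @ R1 @ const       # zero: constants are free under RW1
pen_rw1_lin = t @ R1 @ t                 # positive: RW1 penalizes a linear trend
pen_rw2_const = const @ R2 @ const       # zero
pen_rw2_lin = t @ R2 @ t                 # zero: linear trends are free under RW2

rank_R1 = np.linalg.matrix_rank(R1)      # T - 1
rank_R2 = np.linalg.matrix_rank(R2)      # T - 2
```

Because the second-order RW leaves both the level and the slope unpenalized, its structure matrix loses two ranks instead of one, which is why the linear constraints must be adapted.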

So far, all models proposed explicitly include cantonal heterogeneity. To investigate whether a cantonal component is necessary, models with regional effects only are considered as well. Similarities between neighbouring regions are now modelled by using an intrinsic Gaussian Markov random field for *ψ*=(*ψ*_{1},…,*ψ*_{I})^{T} with prior density

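$$
\pi(\psi \mid \sigma_\psi^2) \propto \exp\left\{-\frac{1}{2\sigma_\psi^2}\sum_{i\sim i'}(\psi_i-\psi_{i'})^2\right\}. \qquad (7)
$$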

The sum in expression (7) includes all pairs of adjacent regions *i* and *i*^{′}. The linear predictor of the resulting model M4 is given as

- (8)

In equation (8), the time trend *β* is modelled as a first-order RW. Since the Swiss regions form a fine spatial grid we assume (in contrast with the preceding models) that the interaction effects *δ* are also spatially structured. This means that the temporal neighbours, the spatial neighbours and the temporal neighbours of the spatial neighbours all enter the conditional distribution of the Gaussian Markov random field. This assumption is appropriate if temporal trends differ from region to region but are more likely to be similar for adjacent regions. It can be incorporated in the model with a type IV interaction prior (Knorr-Held, 2000) of the form

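$$
\pi(\delta \mid \sigma_\delta^2) \propto \exp\left\{-\frac{1}{2\sigma_\delta^2}\sum_{t=2}^{T}\sum_{i\sim i'}(\delta_{it}-\delta_{i't}-\delta_{i,t-1}+\delta_{i',t-1})^2\right\}. \qquad (9)
$$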

The appropriate structure matrix can be obtained as the Kronecker product of the structure matrix (4) of the first-order RW term *β* and the structure matrix of the intrinsic Gaussian Markov random-field prior on *ψ*. This model induces full dependence over time and space. The rank of the structure matrix is now (*I*−1)(*T*−1). To avoid problems of identifiability, the *δ*_{it}s need to sum to 0 for each *i* and each *t*, i.e.

$$
\sum_{t=1}^{T}\delta_{it}=0 \quad \text{for } i=1,\dots,I, \qquad \sum_{i=1}^{I}\delta_{it}=0 \quad \text{for } t=1,\dots,T.
$$

One of these *I*+*T* constraints is redundant.
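Both the rank of the type IV structure matrix and the redundancy of one constraint can be verified on a small example. The sketch below, in Python with NumPy, uses a hypothetical ring of six regions instead of the Swiss adjacency structure and orders *δ* region by region with time running fastest; all names are illustrative assumptions.

```python
import numpy as np

I, T = 6, 5                              # toy example: 6 regions, 5 time points

# Hypothetical adjacency: regions arranged in a ring (a connected graph)
W = np.zeros((I, I))
for i in range(I):
    W[i, (i + 1) % I] = 1.0
    W[(i + 1) % I, i] = 1.0

# Spatial structure matrix: number of neighbours on the diagonal,
# -1 for each pair of adjacent regions
R_space = np.diag(W.sum(axis=1)) - W

# First-order RW structure matrix in time
D = np.diff(np.eye(T), axis=0)
R_time = D.T @ D

# Type IV interaction (ordering: region-major, time fastest)
R_delta = np.kron(R_space, R_time)
rank_delta = np.linalg.matrix_rank(R_delta)

# Sum-to-zero constraints: over t for each region, over i for each time point
A_regions = np.kron(np.eye(I), np.ones((1, T)))    # I rows
A_times = np.kron(np.ones((1, I)), np.eye(T))      # T rows
A = np.vstack([A_regions, A_times])                # I + T rows in total

# Only I + T - 1 of the constraints are linearly independent
rank_constraints = np.linalg.matrix_rank(A)
```

The rank deficiency of the structure matrix, *IT*−(*I*−1)(*T*−1)=*I*+*T*−1, equals the number of linearly independent constraints, so the constraints remove exactly the directions that the prior leaves unpenalized.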

By analogy with the non-parametric models including cantonal heterogeneity, a fifth model M5 is fitted. In this model a second-order RW prior is assigned to *β* and the structure matrix of the interaction term *δ* is obtained as the Kronecker product of the structure matrices of *ψ* and *β* (second-order RW).

In this application, a herd is the unit of analysis (see Section 2.1). Therefore, a large herd may be more likely to be a case than a small herd, as there are more animals at risk. In most Swiss regions the mean number of cows per herd is between 30 and 40. An ecological regression including the logarithm of the mean herd size as explanatory variable is conducted in Section 5.2 to investigate this issue.

#### 2.3. Priors

Since the models are formulated in a Bayesian way, prior distributions must be assigned to all variance and precision components. In the parametric setting (1) a Wishart prior is assigned to the precision matrix **P** of the bivariate normal distribution for (*α*_{j},*ϕ*_{j})^{T}. The Wishart distribution Wi_{2}(*l*,**L**) has two components, namely the degrees of freedom *l* and the matrix **L**. Here, they were chosen as *l*=4 and

*a priori*. For the remaining variance parameters, an inverse gamma prior IG(1,0.01) was used. The parameterization of the inverse gamma distribution is as in Natario and Knorr-Held (2003) and Rue *et al.* (2009).