## Introduction

The class of site-occupancy models developed independently by MacKenzie *et al.* (2002) and Tyre *et al.* (2003) is widely used in the analysis of presence-absence data collected in surveys of natural populations. These models extend conventional binary-regression models to account for errors in the detection of individuals, which are common in surveys of animal or plant populations. Site-occupancy models use repeated surveys within sample locations, or other measures of survey effort, to resolve the ambiguity of an observed zero, which can occur either because a species is absent from a sample location or because it is present but undetected. Therefore, the probabilities of species presence (occurrence) and of species detection given presence are estimated jointly when site-occupancy models are fitted to presence-absence data (more correctly, detection/non-detection data).
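To make the ambiguity of an observed zero concrete, the sketch below writes out the likelihood of the simplest site-occupancy model for a single site. It assumes, for illustration only, a constant occurrence probability `psi`, a constant per-survey detection probability `p`, `J` independent surveys of a site whose occupancy state does not change, and no false-positive detections. The function names are hypothetical, and Python is used here although the paper's own code is written in R.

```python
import math


def site_likelihood(y, J, psi, p):
    """Probability of observing y detections in J surveys of one site.

    Assumptions (illustrative): the site is occupied with probability psi;
    if occupied, each survey detects the species independently with
    probability p (binomial); an unoccupied site always yields y = 0.
    """
    detected_if_present = math.comb(J, y) * p**y * (1 - p) ** (J - y)
    absent_term = 1.0 if y == 0 else 0.0  # true absence can only produce y = 0
    return psi * detected_if_present + (1 - psi) * absent_term


def log_likelihood(detections, J, psi, p):
    """Log-likelihood summed over independent sites."""
    return sum(math.log(site_likelihood(y, J, psi, p)) for y in detections)


print(site_likelihood(0, 3, 0.5, 0.5))  # 0.5625
```

With `psi = p = 0.5` and `J = 3`, the probability 0.5625 of observing no detections combines true absence (0.5) with presence that was missed in all three surveys (0.5 × 0.125 = 0.0625). Because a zero can arise either way, `psi` and `p` have to be estimated together.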

The collection and analysis of site-occupancy data may be used to address a variety of ecological inference problems that require accurate predictions of species occurrence. For example, metapopulation models (Hanski & Gilpin 1997) are often specified in terms of patch occupancy (site occupancy). In this context, the proportion of area occupied (PAO) by a species in a collection of sites may be relevant. Similarly, species distribution models (Scott *et al.* 2002; Elith & Leathwick 2009) are used to predict the spatial pattern of species occurrences over a species’ geographic range or over a subset of that range that has scientific or operational relevance. In both examples, a quantitative (functional) relationship between species occurrence probability and one or more aspects of the species’ environment must be estimated accurately (i.e. free of bias from detection errors). Given sufficient data, site-occupancy models can be used to estimate this relationship accurately (MacKenzie *et al.* 2006) and to predict species occurrence probability at sampled or unsampled locations (Kéry *et al.* 2010). Other species distribution models that do not account for the effects of detection errors (e.g. binary-regression models) generally produce biased predictions of species occurrence probability, because they estimate the probability that a species is both present and detected rather than the probability that it is present.
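The direction of this bias can be illustrated with a short calculation, assuming constant occurrence probability `psi`, constant per-survey detection probability `p`, `J` independent surveys and no false positives. Treating "detected at least once" as "present" targets `psi * (1 - (1 - p)**J)` rather than `psi`. The function name is hypothetical.

```python
def naive_occupancy(psi, p, J):
    """Expected fraction of sites where the species is detected at least once.

    This is the quantity a binary regression on 'ever detected' estimates:
    occurrence probability times the probability of >= 1 detection in J surveys.
    """
    return psi * (1 - (1 - p) ** J)


# True occurrence probability is 0.6 in every case.
for J in (1, 3, 5):
    print(J, round(naive_occupancy(0.6, 0.3, J), 4))
# 1 0.18
# 3 0.3942
# 5 0.4992
```

The naive estimand stays below the true value of 0.6 and approaches it only as surveys are added. If detection probability also varies with covariates that affect occurrence, this gap differs among sites and distorts the estimated relationship between occurrence and environment.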

Classical methods, such as maximum likelihood, can be used to estimate the parameters of site-occupancy models, and software exists for calculating these estimates [see the programs presence (http://www.mbr-pwrc.usgs.gov/software/presence.html) and unmarked (Fiske & Chandler 2011)]. Once computed, the maximum likelihood estimates (MLEs) of the parameters can be used to predict species occurrence probability at sampled or unsampled locations, although it may be challenging to obtain accurate estimates of the uncertainty of these predictions. For example, parametric bootstrapping can be used to estimate this uncertainty (Laird & Louis 1987), but this approach generally requires substantial computational effort because the model must be refitted to each of many simulated data sets.
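A minimal sketch of both steps for the constant-`psi`, constant-`p` model follows. It assumes `J` surveys per site and data summarized as counts of sites with 0, 1, ..., `J` detections. For fixed `p`, setting the derivative of the log-likelihood with respect to `psi` to zero gives `psi = n_detected / (n * (1 - (1 - p)**J))`, truncated at 1. A one-dimensional grid search over `p` therefore yields the MLEs. The bootstrap then refits the model to data simulated from those estimates. The grid size, number of replicates and function names are illustrative assumptions and do not reflect the interface of presence or unmarked.

```python
import math
import random


def site_lik(y, J, psi, p):
    """Zero-inflated binomial probability of y detections in J surveys."""
    present = math.comb(J, y) * p**y * (1 - p) ** (J - y)
    return psi * present + (1 - psi) * (1.0 if y == 0 else 0.0)


def fit_mle(counts, J, grid=999):
    """Profile-likelihood MLE of (psi, p); counts[y] = number of sites with y detections."""
    n = sum(counts)
    n_det = n - counts[0]
    best_ll, best_psi, best_p = -math.inf, None, None
    for k in range(1, grid + 1):
        p = k / (grid + 1)
        # Closed-form maximizer of psi for this p (concave in psi, so clip at 1).
        psi = min(1.0, n_det / (n * (1 - (1 - p) ** J)))
        ll = sum(c * math.log(site_lik(y, J, psi, p)) for y, c in enumerate(counts) if c)
        if ll > best_ll:
            best_ll, best_psi, best_p = ll, psi, p
    return best_psi, best_p


def simulate_counts(n, J, psi, p, rng):
    """Simulate n sites from the model and tabulate detections per site."""
    counts = [0] * (J + 1)
    for _ in range(n):
        occupied = rng.random() < psi
        y = sum(rng.random() < p for _ in range(J)) if occupied else 0
        counts[y] += 1
    return counts


def bootstrap_se(counts, J, B, rng):
    """MLEs plus a parametric-bootstrap standard error of psi_hat."""
    n = sum(counts)
    psi_hat, p_hat = fit_mle(counts, J)
    reps = [fit_mle(simulate_counts(n, J, psi_hat, p_hat, rng), J)[0] for _ in range(B)]
    mean = sum(reps) / B
    se = math.sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))
    return psi_hat, p_hat, se


rng = random.Random(2012)
counts = simulate_counts(500, 3, 0.6, 0.4, rng)  # hypothetical data, not the dragonfly survey
psi_hat, p_hat, se_psi = bootstrap_se(counts, 3, 200, rng)
```

Each bootstrap replicate requires a complete refit, which is the computational burden described above. For models with covariates, each refit would be a multivariate numerical optimization rather than this inexpensive profile search.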

In a Bayesian analysis, a model's parameters and its predictions are treated identically, in the sense that all inferences are based on the posterior distribution of the model's parameters (Gelman *et al.* 2004). Inferences about predictions account for uncertainty in the model's parameters because the distribution of these predictions is obtained by averaging (marginalizing) over the posterior distribution of the parameters. Furthermore, unlike classical (non-Bayesian) methods, these inferences do not rely on asymptotic (large-sample) approximations, so they remain valid regardless of sample size. For these reasons, Bayesian methods of estimation and inference provide an attractive and useful alternative for ecological problems that require predictions of species occurrence.
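As an example of such a prediction, the sketch below computes the posterior predictive probability that a site is occupied even though the species was never detected there. It assumes constant `psi` and `p` and `J` surveys. Conditional on the parameters, Bayes' rule gives Pr(z = 1 | y = 0) = psi q / (psi q + 1 − psi), where q = (1 − p)^J. Averaging this quantity over posterior draws carries parameter uncertainty into the prediction. The Beta-distributed draws below are a hypothetical stand-in for MCMC output, not results from any data set, and the function names are illustrative.

```python
import random
import statistics


def prob_occupied_given_no_detection(psi, p, J):
    """Pr(z = 1 | y = 0): site occupied although missed in all J surveys."""
    q = (1 - p) ** J
    return psi * q / (psi * q + 1 - psi)


def predictive_summary(draws, J):
    """Posterior mean and 95% interval of Pr(z = 1 | y = 0), marginalizing over draws."""
    vals = [prob_occupied_given_no_detection(psi, p, J) for psi, p in draws]
    cuts = statistics.quantiles(vals, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return statistics.fmean(vals), cuts[0], cuts[-1]


# Hypothetical posterior draws of (psi, p); in practice these come from the sampler.
rng = random.Random(42)
draws = [(rng.betavariate(60, 40), rng.betavariate(40, 60)) for _ in range(4000)]
mean, lo, hi = predictive_summary(draws, J=3)
```

The interval `(lo, hi)` reflects uncertainty in both `psi` and `p`. A plug-in calculation at point estimates would report only a single value.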

The probability density function of the posterior distribution of a site-occupancy model's parameters cannot be expressed in closed form owing to analytically intractable integrals. Therefore, stochastic simulation methods, such as Markov chain Monte Carlo (MCMC), are typically used to estimate summaries of the posterior distribution (Geyer 2011). Software to implement these methods is available and includes the programs winbugs (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml), openbugs (http://www.openbugs.info) and jags (http://mcmc-jags.sourceforge.net), all of which have been used to conduct Bayesian analyses of site-occupancy data (MacKenzie *et al.* 2006; Royle & Dorazio 2008; Kéry 2010; Link & Barker 2010; Dorazio *et al.* 2011; Kéry & Schaub 2012). These programs are popular largely because they require users only to specify the underlying assumptions of a model. The technical details of constructing and implementing an MCMC algorithm are handled by the software, with limited or no control by the user. While this division of labour may seem desirable, considerable experience is often required to specify a model and to initialize the Markov chain so that the software constructs an appropriate algorithm. Model specification involves several choices: the parameterization (hierarchically centred or not), the priors and hyperparameter values, and the functions that link the probabilities of species occurrence and detection to the effects of covariates on these probabilities. Initializing the Markov chain is not difficult, and the software can even assign initial values to some parameters without user input; however, users must be careful not to assign initial values that have low (or zero) posterior probability.
For example, a site-specific parameter for species occurrence must not be initialized at zero (absence) if the species was detected during one or more surveys of the site. Doing so generates an error message that can be difficult to interpret without experience. Model specification and initialization are particularly challenging when attempting to analyse site-occupancy data for multiple species with existing software (Dorazio *et al.* 2010, 2011). Appendix S1 of Kéry & Schaub (2012) describes some commonly encountered problems, and workarounds, when using winbugs.
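The initialization constraint can be made explicit with a small check. This sketch assumes that the data for each site are summarized by its number of detections. Setting `z = 1` wherever the species was detected, and either 0 or 1 elsewhere, always produces a starting state with positive posterior probability. The helper names are hypothetical and do not correspond to options in winbugs, openbugs or jags.

```python
def initial_z(detections, all_occupied=False):
    """Initial latent occupancy states consistent with the detections.

    Sites with y > 0 must start at z = 1; undetected sites may start at 0 or 1.
    all_occupied=True starts every site at 1, which is always consistent.
    """
    if all_occupied:
        return [1] * len(detections)
    return [1 if y > 0 else 0 for y in detections]


def check_initial_z(z, detections):
    """Raise ValueError if a site with detections is initialized as unoccupied."""
    bad = [i for i, (zi, yi) in enumerate(zip(z, detections)) if yi > 0 and zi == 0]
    if bad:
        raise ValueError(f"z = 0 at sites with detections (zero posterior probability): {bad}")


y = [0, 2, 0, 1]
z0 = initial_z(y)        # [0, 1, 0, 1]
check_initial_z(z0, y)   # passes silently
```

Running such a check before sampling produces an explicit message that names the offending sites, instead of an opaque error from the sampler.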

Given the potential for difficulties with existing software, it would seem useful to have an MCMC algorithm developed specifically for the analysis of site-occupancy data. Gibbs sampling algorithms are available for relatively simple site-occupancy models in which species occurrence probability is constant and species detection probability is constant within surveys (Royle & Dorazio 2008, p. 107; Link & Barker 2010, pp. 177–178). However, MCMC algorithms have not been developed for more complex models that contain the effects of site-specific covariates of occurrence and site- or survey-specific covariates of detection. For these models, a common choice of prior distribution and parameterization (multivariate normal priors on logit-scale parameters) leads to full conditional posterior distributions that do not have familiar forms. These distributions must be sampled using specialized algorithms that require tuning (e.g. Metropolis–Hastings). Such algorithms are inherently less efficient than Gibbs sampling because only a fraction of the proposed samples is accepted, and tuning is usually needed to obtain desirable acceptance rates.
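The sketch below shows a Gibbs sampler of this simple kind. It assumes constant `psi` and `p`, uniform Beta(1, 1) priors and `J` surveys per site. Augmenting the data with the latent occupancy states `z` makes every full conditional a standard distribution. `z` equals 1 where the species was detected and is Bernoulli elsewhere, and `psi` and `p` have conjugate Beta updates, so no tuning or rejection is involved. This illustrates the general approach and is not a transcription of the cited algorithms. The simulated data and function names are hypothetical.

```python
import random


def simulate_detections(n, J, psi, p, rng):
    """Detection counts y_i for n sites under constant psi and p."""
    y = []
    for _ in range(n):
        occupied = rng.random() < psi
        y.append(sum(rng.random() < p for _ in range(J)) if occupied else 0)
    return y


def gibbs_occupancy(y, J, n_iter, rng, psi=0.5, p=0.5):
    """Gibbs sampler for constant psi and p with Beta(1, 1) priors; returns (psi, p) draws."""
    n = len(y)
    hits = sum(y)  # total detections; unoccupied sites contribute 0
    draws = []
    for _ in range(n_iter):
        # 1. Latent occupancy: certain where detected, Bernoulli(w) where y_i = 0.
        q = (1 - p) ** J
        w = psi * q / (psi * q + 1 - psi)
        z = [1 if yi > 0 else int(rng.random() < w) for yi in y]
        # 2. psi | z ~ Beta(1 + #occupied, 1 + #unoccupied).
        n_occ = sum(z)
        psi = rng.betavariate(1 + n_occ, 1 + n - n_occ)
        # 3. p | y, z ~ Beta(1 + detections, 1 + misses), using occupied sites only.
        p = rng.betavariate(1 + hits, 1 + J * n_occ - hits)
        draws.append((psi, p))
    return draws


rng = random.Random(7)
y = simulate_detections(500, 3, 0.6, 0.4, rng)
draws = gibbs_occupancy(y, 3, 3000, rng)[500:]  # discard burn-in
psi_mean = sum(d[0] for d in draws) / len(draws)
p_mean = sum(d[1] for d in draws) / len(draws)
```

Every proposal is accepted because each draw comes from its exact full conditional. This conjugacy is lost once logit-scale covariate effects are introduced.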

In this paper, we show that a Bayesian analysis of site-occupancy data can be carried out accurately and efficiently using Gibbs sampling when the model is specified using probit-scale parameters and uniform or multivariate normal priors. To illustrate this Gibbs sampling algorithm, we analyse site-occupancy data of the blue hawker, *Aeshna cyanea* (Odonata, Aeshnidae), a common dragonfly species in Switzerland. These data were analysed by Kéry *et al.* (2010) using the method of maximum likelihood and a logit-scale parameterization of the site-occupancy model. Here, we compare the results of Bayesian and classical (non-Bayesian) methods of inference. We also provide the code used in this analysis, which was written in R (R Development Core Team 2012).
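This section does not spell out the sampler itself. One standard way that a probit link produces conjugate updates is the latent-variable data augmentation of Albert and Chib. Each binary outcome z_i is represented as z_i = 1 exactly when a latent v_i ~ N(x_i'β, 1) is positive. As a result, v_i given z_i is a truncated normal, and β given v is normal under a normal prior. The sketch below shows only this occurrence step, for an intercept-only model with known `z` and an assumed N(0, 1) prior on the intercept. Function names and data are hypothetical. With covariates, the β update becomes a multivariate normal draw.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf and inv_cdf


def truncnorm_draw(mean, positive, rng):
    """Draw v ~ N(mean, 1) truncated to v > 0 (positive=True) or v <= 0, by inverse CDF."""
    cut = STD.cdf(-mean)  # P(v <= 0) when v ~ N(mean, 1)
    lo, hi = (cut, 1.0) if positive else (0.0, cut)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf argument strictly inside (0, 1)
    return mean + STD.inv_cdf(u)


def sample_probit_intercept(z, n_iter, rng, prior_var=1.0):
    """Gibbs draws of beta0 in Pr(z_i = 1) = Phi(beta0), prior beta0 ~ N(0, prior_var)."""
    n = len(z)
    prec = n + 1.0 / prior_var  # posterior precision of beta0 given v
    beta0 = 0.0
    draws = []
    for _ in range(n_iter):
        v = [truncnorm_draw(beta0, zi == 1, rng) for zi in z]  # augmentation step
        beta0 = rng.gauss(sum(v) / prec, (1.0 / prec) ** 0.5)  # conjugate normal step
        draws.append(beta0)
    return draws


z = [1] * 120 + [0] * 80  # hypothetical occupancy states
draws = sample_probit_intercept(z, 1000, random.Random(3))
psi_draws = [STD.cdf(b) for b in draws[200:]]  # occurrence probability on the natural scale
psi_mean = sum(psi_draws) / len(psi_draws)
```

Both steps draw from exact conditional distributions, so this probit formulation avoids the tuning that the logit-scale Metropolis–Hastings updates require.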