## Introduction

A distinguishing characteristic of many ecological data sets, whether they comprise binary presence/absence data, counts of abundance, proportional occupancy rates or continuous population densities, is their tendency to contain a large proportion of zero values (Clarke & Green 1988, Fig. 1). When the number of zeros is so large that the data do not readily fit standard distributions (e.g. normal, Poisson, binomial, negative-binomial and beta), the data set is referred to as ‘zero inflated’ (Heilbron 1994; Tu 2002). Zero inflation is often the result of a large number of ‘true zero’ observations caused by the real ecological effect of interest. For example, the study of rare organisms or events will often lead to the collection and analysis of data with a high frequency of zeros (Welsh *et al.* 1996). However, the term can also be applied to data sets with excess zeros caused by ‘false-zero’ observations arising from sampling or observer errors during data collection. Failure to account for either source of zero inflation will bias parameter estimates and their associated measures of uncertainty (Lambert 1992; MacKenzie *et al.* 2002).

Zero inflation due to excess true zeros, a special case of overdispersion (McCullagh & Nelder 1989; Hinde & Demétrio 1998; Poortema 1999), undermines sound statistical inference by violating basic assumptions implicit in the use of standard distributions (Mullahy 1986; Cameron & Trivedi 1998). One common violation is a misrepresentation of the variance–mean relationship of the error structure (Barry & Welsh 2002). In ecology, transformations are often used to overcome such problems. However, the difficulty with this approach for zero-inflated data sets is that, while a transformation may normalize the distribution of the non-zero values, no transformation will spread out the zero values: the high frequency of zero values is simply replaced by an equally high frequency of the value to which zero is transformed (Hall 2000).
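As a quick numerical check of this point, the sketch below applies a log(y + 1) transformation (one common choice, assumed here for illustration) to a small set of hypothetical counts. The non-zero values are compressed, but every zero maps to log(1) = 0, so the proportion of observations piled up at a single value is unchanged.

```python
import math

# Hypothetical zero-inflated counts (illustrative only, not from any study).
counts = [0, 0, 0, 0, 0, 0, 0, 1, 3, 12]

# log(y + 1) compresses large counts but maps every zero to log(1) = 0.
transformed = [math.log(y + 1) for y in counts]

zero_share_raw = sum(1 for y in counts if y == 0) / len(counts)
zero_share_log = sum(1 for t in transformed if t == 0.0) / len(transformed)

print(zero_share_raw, zero_share_log)  # 0.7 0.7: the spike at zero survives
```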

Zero inflation resulting from false zeros may or may not violate the distributional assumptions, but it will lead to uncertainty in parameter estimates, because it is no longer possible to determine whether a difference in the number of individuals surveyed over time and space reflects a change in the size of the population or a change in the detection probability of the individuals (MacKenzie *et al.* 2002).

Zero-inflated count data, and models that cope with zero inflation, are found in a wide range of disciplines, including epidemiology (Böhning *et al.* 1999; Lewsey & Thomson 2004), medicine (Campbell *et al.* 1991; Ghahramani *et al.* 2001; Cheung 2002), occupational health (Lee *et al.* 2002; Carrivick *et al.* 2003; Wang *et al.* 2003; Yau *et al.* 2004) and econometrics (Freund *et al.* 1999).

The ecological literature has seen a recent upsurge of interest in techniques for dealing with excess zero values. Zero-inflated models have been applied in a range of ecological scenarios, including data sets with zero inflation caused by true zeros (Welsh *et al.* 1996, 2000; Barry & Welsh 2002; Podlich *et al.* 2002; Kuhnert *et al.* 2005; Martin *et al.* 2005) and by false-zero observations (Kery 2002; MacKenzie *et al.* 2002, 2003, 2004; Tyre *et al.* 2003; Wintle *et al.* 2004).

In this paper, we propose a framework for understanding how zero-inflated data sets originate and for deciding which of the many available models to apply in any given case. In doing so, we aim to bring these models to the attention of a broader ecological readership and to help ecologists navigate the growing number of zero-inflated modelling approaches at their disposal. First, we define the different kinds of zeros that occur in ecological data and describe how they arise. We then describe the approaches used to model the two types of data typically collected in ecological studies: presence/absence and count data. The use of a selection of these models is then illustrated through two detailed examples in which the data are subject to different kinds of zero inflation. Finally, we discuss the potential gains in ecological understanding made by applying such models.

### Sources of zeros in ecological data

Zero values occur in one of four ways, two of which can be defined as ‘true zero counts’ and two as ‘false-zero counts’ (Table 1). The first kind of true zero arises from a low frequency of occurrence, which can be the result of a range of ecological processes and life-history strategies (Gaston 1994) or of a strong ecological effect that leads to sites having no organisms present. For example, a species may be absent because of demographic processes, competition or poor habitat quality (e.g. because of disturbances or unsuitable vegetation structure). These zeros are true zeros resulting from the real ecological effect that we are trying to determine. Second, a zero may occur simply by chance, because the species does not saturate its entire suitable habitat (e.g. because of local extinctions caused by demographic stochasticity).

| Type of zero | Definition |
|---|---|
| True zero | Species does not occur at a site because of the ecological process or effect under study (e.g. habitat unsuitable) |
| | Species does not saturate its entire suitable habitat, by chance |
| False zero | Species occurs at a site, but is not present during the survey period |
| | Species occurs at a site and is present during the survey period, but the observer fails to detect it (particularly common for rare or cryptic species) |

The first kind of false zero is caused by failing to record a species that inhabits a site because, although it occupies the site, it was not present at the time of the survey. This can be caused by using a sampling area that is small and/or a visit that is short relative to the spatial and temporal scale of the species' movements (Tyre *et al.* 2003). The second kind occurs when the species occupies a site and is present at the time of sampling, but the observer does not detect it. These errors are common for cryptic or secretive species (MacKenzie *et al.* 2002).

It is worth noting that the type of zero represented by a particular observation depends on the study objective. For example, in the case where a species may be temporarily absent from a study site, if the aim is to quantify where the species is instantaneously, its absence would not constitute a false zero (i.e. the species really was not there when surveyed). However, if we were interested in what areas were being used by the species over a longer time frame, then its absence would constitute a false zero.

Aside from the categories defined above, a large number of zeros can arise in ecological data in other ways, for example when observations are obtained from outside the environmental range of a species; Austin & Meyers (1996) refer to these as ‘naughty naughts’. The solution to this problem involves reducing or filtering data sets to exclude the ‘naughty naughts’ from outside the species' range (Austin & Meyers 1996; Elith & Burgman 2002), or simply avoiding their collection through thoughtful sampling design.

### Choosing an appropriate zero-inflated model

When considering how to model zero-inflated data sets, it is important to take into account which kinds of zeros are present (Table 1). In this section, we outline the recommended modelling approaches for presence/absence and count data when the data set is dominated by true zeros, false zeros or a combination of the two (Table 2).

| Zero inflation | Modelling approach | Key references |
|---|---|---|
| None | Single distribution models (e.g. binomial) | McCullagh & Nelder (1989) |
| True zeros | Zero-inflated mixture models (ZIB or ZIP with a point mass at zero) or hurdle models | Lambert (1992), Welsh *et al.* (1996) and Hall (2000) |
| False zeros | Zero-inflated mixture models (e.g. ZIB or ZIP) | MacKenzie *et al.* (2002, 2003) and Tyre *et al.* (2003) |
| Both | Mixture of two or more distributions | None found |

The zero-inflated models are based on the binomial distribution for presence/absence data and on the Poisson or negative-binomial distribution for count data. ZIP, zero-inflated Poisson; ZIB, zero-inflated binomial.

Zero-inflated Poisson (ZIP) and binomial (ZIB) models fitted to data without covariates have a long history (Johnson & Kotz 1969). Lambert (1992) provides the general form of ZIP regression with covariates, used to model defects in a manufacturing process. Models specifically for zero-inflated count data have been developed by Heilbron (1994), Welsh *et al.* (1996, 2000), Faddy (1998), Hall (2000), Dobbie & Welsh (2001), Barry & Welsh (2002) and Wang (2003), and applied within a Bayesian framework for statistical inference by Angers & Biswas (2003), Martin *et al.* (2005) and Kuhnert *et al.* (2005). Zero-inflated models for continuous data, such as those used in fish stock assessment, have also received attention (e.g. log-normal, delta log-normal and delta-Gamma models; Aitchison 1955; Stefansson 1996; Syrjala 2000) and have been developed further by Fletcher *et al.* (2005).

#### No zero-inflation

In the absence of zero-inflation, a standard single distribution model such as the binomial or Poisson is used. McCullagh & Nelder (1989) provide a full discussion of the sampling distributions and models for this type of data (Table 2).

#### Zero-inflation due to true zeros

When true zeros lead to an excess of zeros, zero-inflated models such as two-part models (also known as conditional or hurdle models) or mixture models are recommended (Lambert 1992; Welsh *et al.* 1996). The negative binomial has also been advocated for modelling data sets with many zeros because of its ability to account for overdispersion (Warton 2005). However, Welsh *et al.* (1996) and Hall (2000) demonstrated that the number of zeros often exceeds that expected under a negative-binomial distribution.
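A small worked comparison may help here. The sketch below uses hypothetical counts (70 empty sites and 30 sites with five individuals each) and compares the observed proportion of zeros with the zero probability implied by a Poisson and by a negative binomial with the same mean. The negative binomial is matched by the method of moments under the parameterization variance = μ + μ²/φ; both the data and the moment-matching step are assumptions for illustration, not the fitting procedure used by the cited authors.

```python
import math
import statistics

# Hypothetical counts: 70 empty sites and 30 sites with 5 individuals each.
counts = [0] * 70 + [5] * 30

mean = statistics.fmean(counts)        # 1.5
var = statistics.pvariance(counts)     # 5.25, so the data are overdispersed

# Poisson with the same mean: Pr(0) = exp(-mean).
p0_poisson = math.exp(-mean)

# Negative binomial matched by moments, using var = mean + mean**2 / phi,
# so phi = mean**2 / (var - mean) and Pr(0) = (phi / (phi + mean))**phi.
phi = mean ** 2 / (var - mean)
p0_negbin = (phi / (phi + mean)) ** phi

p0_observed = counts.count(0) / len(counts)

print(f"observed {p0_observed:.3f}, NB {p0_negbin:.3f}, Poisson {p0_poisson:.3f}")
# observed 0.700, NB 0.472, Poisson 0.223
```

For these data the negative binomial predicts roughly twice as many zeros as the Poisson, yet still well short of the 70% observed.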

For count data, a two-part modelling approach has appeared in the ecological literature, in which the first part is a binary outcome model (i.e. Bernoulli) and the second part is a truncated count model (e.g. Poisson or negative binomial) (Cameron & Trivedi 1998). This approach assumes that zeros arise from a single process, driven by one set of covariates. One of its computational benefits is that the two parts can be fitted separately, for example by fitting the zero/non-zero outcome using a logistic regression and the non-zero counts using a truncated Poisson (e.g. Welsh *et al.* 1996; Dobbie & Welsh 2001). Using this approach, one can estimate the probability that a species is present and then, given that it is present, estimate the relative mean number of individuals.
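In the simplest case, with no covariates, both parts of the two-part model have simple maximum-likelihood solutions. The sketch below (hypothetical data; `fit_hurdle_poisson` is an illustrative name) estimates the Bernoulli probability of a non-zero count as the observed proportion of non-zero counts, and the zero-truncated Poisson mean λ by solving λ / (1 − exp(−λ)) = ȳ₊ by bisection, where ȳ₊ is the mean of the non-zero counts. It assumes at least one non-zero count; with covariates, the two parts would instead be fitted as a logistic regression and a truncated Poisson regression.

```python
import math

def fit_hurdle_poisson(counts, tol=1e-10):
    """Two-part (hurdle) fit with no covariates.

    Part 1 (Bernoulli): probability that a count is non-zero.
    Part 2 (zero-truncated Poisson): mean of the untruncated Poisson,
    estimated from the non-zero counts only.
    """
    positives = [y for y in counts if y > 0]
    pi_hat = len(positives) / len(counts)   # MLE of Pr(y > 0)
    ybar = sum(positives) / len(positives)  # mean of the non-zero counts

    # The truncated-Poisson MLE solves lam / (1 - exp(-lam)) = ybar. The left
    # side increases from 1 (as lam -> 0) and exceeds lam, so the root is in (0, ybar].
    lo, hi = 1e-12, ybar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < ybar:
            lo = mid
        else:
            hi = mid
    return pi_hat, 0.5 * (lo + hi)

counts = [0, 0, 0, 0, 0, 0, 2, 3, 1, 4]  # hypothetical counts
pi_hat, lam_hat = fit_hurdle_poisson(counts)
print(pi_hat, round(lam_hat, 3))
```

The fitted λ is the mean of the untruncated Poisson; the expected count given that it is non-zero is λ / (1 − exp(−λ)), which here equals the mean of the non-zero counts (2.5).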

Mixture models are combinations of probability distributions chosen for their ability to represent two or more real ecological processes. The ZIP mixture model used for count data is a mixture of a point mass at zero and a Poisson distribution. With this approach, zeros may arise from one of two processes, each with its own covariates: a zero process, from which only zero values are observed, and a Poisson process, from which the non-zero values and the proportion of zero values expected under the Poisson distribution are observed (Lambert 1992). The interpretation of mixture model parameters is less straightforward than for the two-part model. For example, to obtain an estimate of relative mean abundance from the ZIP, one must multiply the estimated mean of the Poisson component by the probability that the count at a site is generated by that Poisson distribution.
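The ZIP probabilities, and the multiplication step just described, can be checked numerically. The sketch below (hypothetical parameter values; `zip_pmf` is an illustrative name) uses this paper's parameterization, in which *p* is the probability that a count comes from the Poisson component, and confirms that the expected count equals *p* × *λ*.

```python
import math

def zip_pmf(y, p, lam):
    """ZIP probability of count y: p is the probability that the count comes
    from the Poisson component, lam is the Poisson mean."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero comes either from the point mass (1 - p) or from the Poisson part.
        return (1.0 - p) + p * poisson
    return p * poisson

p, lam = 0.34, 3.0  # hypothetical values

# Expected count by direct summation (terms beyond y = 99 are negligible).
mean_by_sum = sum(y * zip_pmf(y, p, lam) for y in range(100))
print(round(mean_by_sum, 6), round(p * lam, 6))  # both equal p * lam
```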

Where there is zero inflation and overdispersion caused by large counts of individuals (e.g. flocking birds), the use of a zero-inflated negative binomial (ZINB) mixture model has been shown to be appropriate (Welsh *et al.* 2000).

#### Zero inflation due to false zeros

If false zeros are present in the data, a zero-inflated mixture modelling approach is required (MacKenzie *et al.* 2002; Tyre *et al.* 2003) because we are interested in modelling two processes: one leading to true zeros and one leading to false zeros.

A recent set of articles highlights the problem of false zeros in ecological data sets that are collected for the purpose of assessing site occupancy (Kery 2002; MacKenzie *et al.* 2002) and making inferences about species–habitat relationships, or the effects of anthropogenic activities, on species distributions (e.g. Tyre *et al.* 2003; Gu & Swihart 2004). Failing to take account of false-zero observations in analyses may substantially impair the ability to infer relationships between site occupancy and habitat attributes or management actions (MacKenzie *et al.* 2003; Field *et al.* 2005; Rhodes *et al.* 2005). The zero-inflated binomial (ZIB) model and its extensions provide an appropriate framework for analysing data that are collected for these purposes and are likely to contain false-zero observation errors (MacKenzie *et al.* 2002; Tyre *et al.* 2003; Wintle *et al.* 2004, 2005).

#### Zero inflation due to both excess true zeros and false zeros

In the literature, there has been no formal discussion of how to model data sets that contain both excess true zeros and false zeros. Within a Bayesian framework, one approach would be to incorporate information on the contribution of false zeros to the data (e.g. detection probability) as an informative prior in a zero-inflated model.

#### Uncertainty regarding the source of zero inflation

In some cases it is not possible to determine the source of zero observations. One way of dealing with this uncertainty is to use a truncated distribution, whereby the zeros are eliminated completely and only the non-zero observations are modelled. For example, Baum & Myers (2004) were unable to determine whether the absence of sharks from bycatch data sets was a result of true zeros (i.e. there were no sharks in the bycatch) or of fishers' failure to record the sharks in the bycatch. They dealt with this uncertainty by using a truncated negative-binomial model to estimate trends in shark numbers from the non-zero shark catches only.
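A zero-truncated distribution simply renormalizes the untruncated one over the positive counts. The sketch below assumes a negative binomial with mean μ and variance μ + μ²/φ (an assumed parameterization; the exact specification used by Baum & Myers (2004), including their covariates, is not reproduced here) and evaluates the log-likelihood of a few hypothetical non-zero catches.

```python
import math

def negbin_pmf(y, mu, phi):
    """Negative-binomial pmf with mean mu and variance mu + mu**2 / phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu)) + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

def truncated_negbin_pmf(y, mu, phi):
    """Zero-truncated negative binomial: Pr(Y = y | Y > 0)."""
    if y < 1:
        return 0.0
    return negbin_pmf(y, mu, phi) / (1.0 - negbin_pmf(0, mu, phi))

# Hypothetical non-zero catches; zero records are excluded before fitting.
catches = [1, 3, 2, 7, 1, 4]
loglik = sum(math.log(truncated_negbin_pmf(y, 3.0, 1.2)) for y in catches)
print(round(loglik, 3))
```

Maximizing this log-likelihood over μ and φ (or placing priors on them) would fit the truncated model; the zeros never enter the calculation.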

### Modelling zero inflation in ecological data

In this section, we present two examples that deal with zero inflation: (i) generated by excessive numbers of true zeros in count data and (ii) arising from false zeros in presence/absence data. Both examples are illustrated through Bayesian inference using simulation-based Markov chain Monte Carlo (Ellison 2004).

In the first example, we illustrate the use of the ZIP and ZINB mixture models and compare their performance with standard Poisson and negative-binomial models, in an examination of the impact of livestock grazing on the relative mean abundance of four Australian woodland birds, where zero inflation is a result of an ecological process leading to an excess of true zeros. The second example demonstrates the use of the ZIB mixture model in making inferences about the suitability of habitat in a highly fragmented landscape for four woodland bird species. It specifically accounts for zero inflation resulting from false zeros generated through the sampling process.

### Modelling the impact of grazing on bird assemblages with zero-inflated count data caused by excess true zeros

#### Zero inflated mixture models

Using the mixture modelling approach, $p(x_i)$ represents the probability that observation $i$ is generated through the count distribution (the Poisson for the ZIP, the negative binomial for the ZINB), irrespective of whether the observation is a zero or a non-zero value. Equations 1 and 2 give the two models under investigation.

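#### ZIP mixture model

$$
\Pr(Y_i = y_i) =
\begin{cases}
\left[1 - p(x_i)\right] + p(x_i)\, e^{-\lambda(z_i)}, & y_i = 0,\\[4pt]
p(x_i)\, \dfrac{e^{-\lambda(z_i)}\, \lambda(z_i)^{y_i}}{y_i!}, & y_i = 1, 2, \ldots
\end{cases}
\qquad (1)
$$

where $\log \lambda(z_i) = \alpha_0 + \boldsymbol{\beta}_0^{\top} z_i$ and $\operatorname{logit} p(x_i) = \alpha_1 + \boldsymbol{\beta}_1^{\top} x_i$.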

In both equations, $\lambda(z_i)$ represents the mean number of individuals at site $i$, given that the count arises from the Poisson (or negative-binomial) component, and it can be expressed as a function of the explanatory variables $z$ through a log transformation. Similarly, $p(x_i)$ can be expressed as a function of the explanatory variables $x$ using a logit transformation, where $x$ need not be the same set of covariates as $z$. Here, the parameters $\alpha_0$ and $\alpha_1$ are the constant terms in each regression component, and $\beta_0$ and $\beta_1$ are vectors of the coefficients estimated for each explanatory variable fitted in the model.

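#### ZINB model

$$
\Pr(Y_i = y_i) =
\begin{cases}
\left[1 - p(x_i)\right] + p(x_i) \left(\dfrac{\phi}{\phi + \lambda(z_i)}\right)^{\phi}, & y_i = 0,\\[4pt]
p(x_i)\, \dfrac{\Gamma(y_i + \phi)}{\Gamma(\phi)\, y_i!} \left(\dfrac{\phi}{\phi + \lambda(z_i)}\right)^{\phi} \left(\dfrac{\lambda(z_i)}{\phi + \lambda(z_i)}\right)^{y_i}, & y_i = 1, 2, \ldots
\end{cases}
\qquad (2)
$$

where $\lambda(z_i)$ and $p(x_i)$ are the same log- and logit-linked functions of the covariates as in the ZIP model, and the negative-binomial component has mean $\lambda(z_i)$ and variance $\lambda(z_i) + \lambda(z_i)^2/\phi$.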

Equation 2 has an additional parameter, $\phi$, which allows estimation of overdispersion in situations where large counts have been recorded or, alternatively, where a large number of zeros have been observed. In both models, if $p(x_i) = 1$, the model reduces to the usual Poisson or negative-binomial model for count data. See Lambert (1992), Welsh *et al.* (1996) and Dalrymple *et al.* (2003) for more details.
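Two properties of the ZINB can be checked numerically: with *p* = 1 it reduces to an ordinary negative binomial, and (a standard property of the negative binomial) as φ grows large the ZINB approaches the ZIP. The sketch assumes the negative-binomial component has mean λ and variance λ + λ²/φ; the parameter values and the name `zinb_pmf` are hypothetical.

```python
import math

def zinb_pmf(y, p, lam, phi):
    """ZINB probability of count y. With probability p the count is negative
    binomial (mean lam, variance lam + lam**2 / phi); otherwise it is a
    structural zero."""
    log_nb = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
              + phi * math.log(phi / (phi + lam)) + y * math.log(lam / (phi + lam)))
    nb = math.exp(log_nb)
    return (1.0 - p) + p * nb if y == 0 else p * nb

p, lam = 0.5, 2.0                          # hypothetical values
zip_zero = (1.0 - p) + p * math.exp(-lam)  # ZIP probability of a zero

# As phi grows, the NB component approaches a Poisson and the ZINB the ZIP.
for phi in (1.0, 10.0, 1e6):
    print(phi, round(zinb_pmf(0, p, lam, phi), 4), round(zip_zero, 4))

# With p = 1 there are no structural zeros, so the probabilities are plain NB.
nb_total = sum(zinb_pmf(y, 1.0, 3.0, 2.0) for y in range(400))
```

Small values of φ put more mass at zero than the ZIP with the same *p* and λ, which is how the ZINB can absorb extra zeros as well as large counts.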

#### The species, study site and data collection

Martin *et al.* (2005) and Kuhnert *et al.* (2005) examined the impact of livestock grazing on the relative abundance of 31 woodland birds in subtropical Australia. Bird count data were collected across three broad levels of grazing (low, moderate and high) in eucalypt woodland habitat, with eight replicate sites for each grazing regime. Each site was visited on two separate days in each of two seasons, giving a total of 24 sites and 96 site visits.

For comparisons of relative mean abundance estimates to be valid, detection or capture probabilities of individuals are assumed to be equal (e.g. across different sites). In this study this assumption was justified by the open vegetation structure of the sites and conspicuous behaviour of the birds examined (Martin *et al.* 2005).

Using data from four of the bird species investigated by Martin *et al.* (2005) and Kuhnert *et al.* (2005), we compared the relative mean abundance estimates and credible intervals from fitting Poisson, negative-binomial, ZINB mixture and ZIP mixture models. To obtain an estimate of relative mean abundance from the ZIP mixture that could be compared with that from the Poisson model, the ZIP mixing probability *p*(*x*) (the probability that the number of individuals at a site has a Poisson distribution) was multiplied by *λ*(*z*), the mean count given that it was generated from a Poisson distribution.

Models were fitted using the Bayesian statistical modelling freeware package WinBUGS (Spiegelhalter *et al.* 2003). The deviance information criterion (DIC) was calculated to compare the fit of the four models (Spiegelhalter *et al.* 2002). The DIC is the Bayesian analogue of Akaike's information criterion (Akaike 1973), in that its intent is to assess models in terms of both their fit and their complexity (Burnham & Anderson 2002). The DICs computed by WinBUGS were checked using the formula recommended by Celeux *et al.* (2003).
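The DIC combines the posterior mean deviance, D̄, with an effective number of parameters, p_D = D̄ − D(θ̄), where θ̄ is the posterior mean of the parameters, giving DIC = D̄ + p_D (Spiegelhalter *et al.* 2002). The sketch below computes it from posterior draws for a one-parameter Poisson model. To stay self-contained it draws from a conjugate Gamma posterior rather than from MCMC output; the counts and the Gamma(0.001, 0.001) prior are assumptions for illustration, not part of the study. For a model with one well-identified parameter, p_D should come out close to 1.

```python
import math
import random

# Hypothetical counts at 20 sites (illustrative only).
y = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
n, s = len(y), sum(y)

def deviance(lam):
    """D(lambda) = -2 x log-likelihood of an i.i.d. Poisson(lambda) model."""
    return -2.0 * sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)

# Stand-in for MCMC output: draws from the conjugate Gamma posterior under an
# assumed vague Gamma(shape 0.001, rate 0.001) prior. random.gammavariate takes
# (shape, scale), so scale = 1 / rate.
rng = random.Random(2024)
draws = [rng.gammavariate(0.001 + s, 1.0 / (0.001 + n)) for _ in range(5000)]

d_bar = sum(deviance(lam) for lam in draws) / len(draws)  # posterior mean deviance
d_hat = deviance(sum(draws) / len(draws))                  # deviance at posterior mean
p_d = d_bar - d_hat                                        # effective number of parameters
dic = d_bar + p_d
print(round(p_d, 2), round(dic, 1))
```

Comparing models then amounts to comparing their DIC values, with lower values preferred.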

We modelled the four bird species separately, treating grazing as a fixed effect. Convergence was achieved after a burn-in of 10 000 iterations, and estimates were obtained from a further 30 000 iterations. Convergence of the Markov chains was examined using the coda package (Best *et al.* 1995).

An examination of the frequency of counts for the four bird species under investigation revealed that data for three of the species (brown thornbill *Acanthiza pusilla*, noisy miner *Manorina melanocephala* and superb fairy-wren *Malurus cyaneus*) were zero inflated (Fig. 2). This zero-inflation was a result of species showing strong preferences for particular grazing levels and an avoidance of others.

In contrast, the data for the rufous whistler *Pachycephala rufiventris* were more consistent with a Poisson distribution.

For the purpose of illustration, counts for each species were pooled across visits and modelled across grazing levels. Season was not a significant contributor to the model and was not included as a factor. Although informative priors were used in the full study (Martin *et al.* 2005), here we used non-informative normal priors with a mean of zero and a precision of 0.0001 (i.e. a variance of 10 000). In this example, the mixing probability *p*(*x*) was fixed across grazing levels; however, one could allow *p*(*x*) to vary by modelling grazing as a covariate (see Appendix S1 for code). Full details of the study design, data collection, analyses using both mixture and two-part zero-inflated models, and results for all species are described in Martin *et al.* (2005) and Kuhnert *et al.* (2005).
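The fitting scheme described above (normal priors with mean zero and precision 0.0001, a burn-in of 10 000 iterations followed by 30 000 retained iterations, and a mixing probability fixed across groups) can be sketched without WinBUGS. The code below uses a swapped-in random-walk Metropolis sampler, not the WinBUGS samplers or the Appendix S1 code; it fits a ZIP with no covariates (a single grazing level) to simulated counts, with priors placed on logit *p* and log *λ*. The simulated data, step size and function names are hypothetical. The posterior of *p* × *λ* gives the relative mean abundance described earlier.

```python
import math
import random

def zip_log_post(a, b, n0, n_pos, s, tau=1e-4):
    """Log posterior (up to a constant) of a ZIP model with no covariates.

    a = logit(p), p = probability a count comes from the Poisson component
    b = log(lambda), lambda = Poisson mean
    Priors: independent Normal(mean 0, precision tau) on a and b.
    Sufficient statistics: n0 zeros, n_pos non-zero counts summing to s.
    """
    p = 1.0 / (1.0 + math.exp(-a))
    lam = math.exp(b)
    loglik = n0 * math.log((1.0 - p) + p * math.exp(-lam))
    loglik += n_pos * (math.log(p) - lam) + s * b  # log(y!) terms are constant
    return loglik - 0.5 * tau * (a * a + b * b)

def sample_zip(counts, n_burn=10_000, n_keep=30_000, step=0.15, seed=1):
    """Random-walk Metropolis draws of (p, lambda), discarding the burn-in."""
    rng = random.Random(seed)
    n0 = counts.count(0)
    positives = [y for y in counts if y > 0]
    n_pos, s = len(positives), sum(positives)
    a, b = 0.0, 0.0  # start at p = 0.5, lambda = 1
    lp = zip_log_post(a, b, n0, n_pos, s)
    draws = []
    for it in range(n_burn + n_keep):
        a_new = a + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step)
        lp_new = zip_log_post(a_new, b_new, n0, n_pos, s)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            a, b, lp = a_new, b_new, lp_new
        if it >= n_burn:
            draws.append((1.0 / (1.0 + math.exp(-a)), math.exp(b)))
    return draws

def rpois(lam, rng):
    """Poisson draw by multiplying uniforms (Knuth's method)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Hypothetical data: 400 counts, 40% from a Poisson(3), the rest structural zeros.
sim_rng = random.Random(42)
counts = [rpois(3.0, sim_rng) if sim_rng.random() < 0.4 else 0 for _ in range(400)]

draws = sample_zip(counts)
p_mean = sum(p for p, _ in draws) / len(draws)
lam_mean = sum(lam for _, lam in draws) / len(draws)
abundance_mean = sum(p * lam for p, lam in draws) / len(draws)  # mean of p * lambda
print(round(p_mean, 2), round(lam_mean, 2), round(abundance_mean, 2))
```

Convergence diagnostics, which the study carried out with coda, are omitted here for brevity.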

#### Results

Comparison of model fit, as determined by the DIC, showed that the ZIP performed best for the brown thornbill, which was also the most zero-inflated species (Fig. 2). The negative binomial performed best for the noisy miner, and the ZINB performed best for the superb fairy-wren and rufous whistler (Table 3). The standard Poisson had the poorest fit for all four bird species. The DICs for the rufous whistler differed only marginally among the four models. This is because the data for this species are more consistent with a Poisson distribution: the ZIP mixing probability *p*(*x*) that an observation came from a Poisson distribution was closer to 1 than for the other species (Table 3).

| Model | Brown thornbill | Noisy miner | Superb fairy-wren | Rufous whistler |
|---|---|---|---|---|
| Poisson | 123.5 | 245.1 | 267.1 | 195.4 |
| Negative binomial | 67.7 | 137.9 | 121.0 | 180.8 |
| ZINB mixture | – | 141.1 | 105.9 | 177.0 |
| ZIP mixture | 60.8 | 167.3 | 120.9 | 189.6 |
| ZIP *p*(*x*) | 0.341 | 0.479 | 0.337 | 0.822 |
| 95% CI | (0.132–0.586) | (0.249–0.774) | (0.185–0.513) | (0.649–0.983) |

Values in the first four rows are DIC. The last two rows give the ZIP estimate of the mixing probability *p*(*x*), the probability that an observation is generated through the Poisson distribution, and its 95% credible interval. A dash (–) denotes that the model could not be fitted.

Comparing the estimates from the negative-binomial, ZINB and ZIP mixture models with those from the Poisson model revealed that their 95% credible intervals were much broader than those from the standard Poisson model for the three species whose counts were most zero inflated, as illustrated for the ZIP and Poisson in Fig. 3. Under the Poisson model, the superb fairy-wren was predicted to be significantly less abundant under high grazing than under either low or moderate grazing, whereas under the negative-binomial, ZINB and ZIP models there was no substantial difference in relative mean abundance estimates across the three grazing levels. Conversely, estimates from the four models did not vary substantially for the rufous whistler, the species whose counts were least zero inflated (Fig. 2). In general, the Poisson model was overconfident about the uncertainty (its credible intervals were narrower) and, in the case of the superb fairy-wren, led to a different conclusion regarding the impact of high grazing on relative mean abundance.

### Modelling influences on woodland bird patch occupancy when patch occupancy observations are subject to false-zero errors

To illustrate the use of the ZIB model, we analyse site occupancy data and investigate the influence of habitat type and landscape metrics (patch area and connectivity) on site occupancy rates for four woodland bird species in the Mt Lofty Ranges (MLR) in south-eastern Australia. We compare the inferences resulting from a standard logistic regression model with those resulting from a generalized ZIB model.

#### The ZIB model

Under imperfect detection, site occupancy data are best thought of as realizations of two binomial processes acting simultaneously at two different time scales (MacKenzie *et al.* 2002; Tyre *et al.* 2003). The first process governs *p*, the probability of a site being occupied over a relatively long time period. The second governs the detectability *q*, the probability of observing the species on a particular visit (or survey) to a site, given that it is present over the longer time period. The survey period may comprise *v* visits (*v* = 1, 2, …). The outcome of the two processes is a finite mixture distribution known as the ZIB mixture model (Hall 2000). Failure to detect the species can occur because the species is absent (with probability 1 − *p*) or because it is present but remains undetected over all *v* visits [with probability $p(1 - q)^{v}$]. When the species is present at the site, the number of detections is drawn from a binomial distribution with *v* trials and success probability *q*. Thus, ignoring the influence of covariates, the ZIB model is:

$$
\Pr(Y = y) =
\begin{cases}
(1 - p) + p(1 - q)^{v}, & y = 0,\\[4pt]
p \binom{v}{y} q^{y} (1 - q)^{v - y}, & y = 1, \ldots, v
\end{cases}
\qquad (3)
$$

where *y* is the number of detections in *v* visits to a site, and *p* and *q* are defined as above. The model may easily be generalized to allow covariates to influence *p* and *q*, as in a logistic regression. Tyre *et al.* (2003) present a maximum likelihood implementation of that extension in R (R Development Core Team 2005), MacKenzie *et al.* (2002) do so in PRESENCE, and Wintle *et al.* (2005) present a Bayesian version using WinBUGS (Spiegelhalter *et al.* 2003). Note that the maximum likelihood version of the ZIB model cannot be estimated unless two or more visits are made to at least some of the survey sites.
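Without covariates, the ZIB likelihood is simple enough to maximize directly. For a fixed *q*, setting the derivative of the log-likelihood with respect to *p* to zero gives `p = n_pos / (N * (1 - (1 - q)**v))` (capped at 1), where `n_pos` is the number of sites with at least one detection and `N` the number of sites. The sketch below uses this to profile out *p* and grid-searches *q*; it is an illustrative maximum likelihood sketch on simulated data, not the R, PRESENCE or WinBUGS implementations cited above. With *v* = 1 the profile likelihood is flat in *q*, which is why repeat visits are needed.

```python
import math
import random
from collections import Counter

def zib_mle(detections, v, grid=2000):
    """Maximum-likelihood (p, q) for the ZIB model without covariates.

    detections: number of visits (out of v) on which the species was detected
    at each site. For each q on a grid, p is profiled out in closed form.
    """
    N = len(detections)
    tally = Counter(detections)
    n_zero = tally[0]
    n_pos = N - n_zero
    best_ll, best_p, best_q = -math.inf, None, None
    for i in range(1, grid):
        q = i / grid
        miss = (1.0 - q) ** v  # Pr(no detections in v visits | occupied)
        p = min(1.0, n_pos / (N * (1.0 - miss)))
        ll = n_zero * math.log(1.0 - p + p * miss) if n_zero else 0.0
        for k, n_k in tally.items():
            if k > 0:
                ll += n_k * (math.log(p) + math.log(math.comb(v, k))
                             + k * math.log(q) + (v - k) * math.log(1.0 - q))
        if ll > best_ll:
            best_ll, best_p, best_q = ll, p, q
    return best_p, best_q

# Hypothetical simulation: 2000 sites, 3 visits, p = 0.6, q = 0.4.
rng = random.Random(7)
v, p_true, q_true = 3, 0.6, 0.4
detections = []
for _ in range(2000):
    occupied = rng.random() < p_true
    detections.append(sum(rng.random() < q_true for _ in range(v)) if occupied else 0)

p_hat, q_hat = zib_mle(detections, v)
naive = sum(1 for d in detections if d > 0) / len(detections)  # ignores false zeros
print(round(p_hat, 3), round(q_hat, 3), round(naive, 3))
```

The naive proportion of sites with at least one detection estimates p × (1 − (1 − q)^v) rather than *p*, so it is biased low.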

#### The species, study site and data collection

The MLR of South Australia is a highly fragmented landscape with only 14% of its original 686 000 ha area now covered by native vegetation. The MLR is an area of national conservation significance with numerous bird species threatened by loss and fragmentation of habitat (Paton *et al.* 1994; Garnett & Crowley 2000). The bird community is the subject of a multispecies recovery plan and planning for large-scale reinstatement of habitat is a high research priority for the region (Westphal *et al.* 2003). In order to target management and restoration efforts most effectively, it would be useful to investigate how occupancy rates of various species depend on local habitat and landscape characteristics.

To this end, we modelled the effect of habitat type, patch area and landscape connectivity on occupancy levels of four MLR bird species of conservation concern: the scarlet robin *Petroica multicolor*, buff-rumped thornbill *Acanthiza reguloides*, white-throated treecreeper *Cormobates leucophaeus* and rufous whistler *Pachycephala rufiventris*. Three repeat surveys (20-min, 2-ha active timed searches; Loyn 1986; Field *et al.* 2002) were conducted at each of 155 forest and woodland sites during the main breeding season (September to December) in 2003. To model the effect of habitat, sites were classified by major habitat type as either ‘stringybark’ (canopy dominated by *Eucalyptus obliqua* and *Eucalyptus baxteri*) or ‘gum’ (*Eucalyptus leucoxylon*, *Eucalyptus viminalis*, *Eucalyptus fasciculosa* and *Eucalyptus goniocalyx*). To model landscape characteristics, the area of each patch containing a survey site was obtained from a GIS, and connectivity was calculated according to Moilanen & Nieminen (2002). A subset of possible combinations of habitat, area and connectivity variables yielded five candidate models (Table 4).

| Model | Structure |
|---|---|
| Model 1 | logit[Pr(*Y* = 1)] = *β*₀ |
| Model 2 | logit[Pr(*Y* = 1)] = *β*₀ + *β*₁ × Habitat |
| Model 3 | logit[Pr(*Y* = 1)] = *β*₀ + *β*₂ × Habitat + *β*₃ × Area |
| Model 4 | logit[Pr(*Y* = 1)] = *β*₀ + *β*₂ × Habitat + *β*₃ × Connectivity |
| Model 5 | logit[Pr(*Y* = 1)] = *β*₀ + *β*₂ × Habitat + *β*₃ × Area + *β*₄ × Connectivity |

The variable ‘Habitat’ is binary: a value of 0 indicates stringybark eucalyptus woodland vegetation and a value of 1 indicates gum-bark eucalyptus woodland vegetation.

#### A generalized ZIB model for woodland bird occupancy data

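The standard approach to modelling the influence of landscape and habitat attributes on the probability of occupancy (*p*) at a given site is to use a logistic regression (McCullagh & Nelder 1989) such that:

$$
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1 - p}\right) = \alpha + \boldsymbol{\beta}^{\top} \mathbf{X}
\qquad (4)
$$
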
where *α* and the vector *β* are the regression coefficients and the vector *X* represents the values of the independent environmental variables influencing *p*. This model assumes that the observations, *Y*, are realizations of independent Bernoulli trials with event probabilities *p*. However, because our data contain multiple (3) visits to 155 sites in the model fitting data set, it is possible to embed eqn 4 in eqn 3, allowing simultaneous estimation of regression coefficients *β* and the detection probability parameter *q*. The combination of eqns 3 and 4 may be thought of as a generalization of the ZIB model that allows unbiased estimation of habitat model coefficients *β*.
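Embedding eqn 4 in eqn 3 amounts to giving each site its own occupancy probability inside the ZIB likelihood. The sketch below writes one site's log-likelihood contribution under that generalized model, alongside the standard logistic contribution; the function names, covariate values and coefficients are hypothetical. Setting *q* = 1 (perfect detection) makes the two agree, which is a useful sanity check. Summing over sites and maximizing over *α*, *β* and *q* (or sampling them, as in WinBUGS) would give the generalized ZIB fit.

```python
import math

def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def occupancy_prob(x, alpha, beta):
    """Logistic occupancy model: logit(p) = alpha + sum_j beta_j * x_j."""
    return expit(alpha + sum(b * xj for b, xj in zip(beta, x)))

def zib_site_loglik(y, v, x, alpha, beta, q):
    """Log-likelihood of one site under the generalized ZIB model.

    y: number of detections in v visits; x: the site's covariate values.
    """
    p = occupancy_prob(x, alpha, beta)
    if y == 0:
        # Unoccupied, or occupied but missed on all v visits (a false zero).
        return math.log((1.0 - p) + p * (1.0 - q) ** v)
    return math.log(p * math.comb(v, y) * q ** y * (1.0 - q) ** (v - y))

def logistic_site_loglik(z, x, alpha, beta):
    """Standard logistic log-likelihood for a presence indicator z (0 or 1)."""
    p = occupancy_prob(x, alpha, beta)
    return math.log(p if z == 1 else 1.0 - p)

# Hypothetical sites: (detections out of 3, [Habitat, Area]) with made-up values.
sites = [(0, [0.0, 1.2]), (2, [1.0, 3.4]), (1, [1.0, 0.8]), (0, [0.0, 2.1])]
alpha, beta, q = -0.5, [0.9, 0.3], 0.35
total_zib = sum(zib_site_loglik(y, 3, x, alpha, beta, q) for y, x in sites)
total_logistic = sum(logistic_site_loglik(int(y > 0), x, alpha, beta) for y, x in sites)
print(round(total_zib, 3), round(total_logistic, 3))
```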

The generalized ZIB model and the standard logistic regression model were fitted to each of the five candidate models (Table 4) using WinBUGS. Non-informative normal priors with a mean of zero and a precision of 0.0001 were used (see Appendix S2 for code). DIC statistics were calculated for each model and used to compare the five competing models (Spiegelhalter *et al.* 2002). Convergence was achieved after a burn-in of 10 000 iterations, and estimates were obtained from a further 30 000 iterations. Convergence of the Markov chains was examined using the coda package (Best *et al.* 1995). For the purpose of this paper, we were primarily interested in the difference in inference obtained under the two types of model.

#### Results

The four bird species showed varying responses to woodland vegetation types and landscape attributes. The best models, determined on the basis of DIC, all included the variable ‘Habitat’, with white-throated treecreepers strongly preferring stringybark woodland, scarlet robins showing a similar but weaker preference for stringybark, and both buff-rumped thornbills and rufous whistlers displaying a moderate preference for gum woodland (Table 5). Only one species, the scarlet robin, was strongly influenced by patch area (Table 5 and Fig. 4a), and only one species, the white-throated treecreeper, was strongly influenced by patch connectivity (Table 5). Single-visit detection probabilities (*q*) ranged from *c.* 0.24 (rufous whistler) to 0.61 (white-throated treecreeper) (Table 5).
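The detection probabilities in Table 5 translate directly into the chance of a false zero. Assuming, as in the fitted models, that *q* is constant across visits, an occupied site yields no detections in all three surveys with probability (1 − *q*)³. The short calculation below uses the *q* values reported in Table 5.

```python
# Single-visit detection probabilities q for the favoured ZIB models (Table 5).
q = {
    "Scarlet robin": 0.336,
    "Rufous whistler": 0.243,
    "White-throated treecreeper": 0.611,
    "Buff-rumped thornbill": 0.311,
}
v = 3  # repeat surveys per site

# Probability that an occupied site gives no detections in v visits: (1 - q)**v.
miss = {species: (1 - qs) ** v for species, qs in q.items()}
for species, m in miss.items():
    print(f"{species}: {m:.3f}")
```

This gives roughly 0.29 (scarlet robin), 0.43 (rufous whistler), 0.06 (white-throated treecreeper) and 0.33 (buff-rumped thornbill): for the rufous whistler, more than two in five occupied sites would be recorded as absent after three visits.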

| Species (*q*) | Model | Variable | Standard logistic: posterior estimate (95% CI) | Generalized ZIB: posterior estimate (95% CI) |
|---|---|---|---|---|
| Scarlet robin (*q* = 0.336) | 3 | Habitat | −1.146 (−1.909 to 0.424) | −1.61 (−3.218 to −0.061) |
| | | Area | 0.180 (0.022 to 0.344) | 0.258 (0.047 to 0.568) |
| Rufous whistler (*q* = 0.243) | 2 | Habitat | 0.909 (0.058 to 1.817) | 1.133 (0.079 to 2.796) |
| White-throated treecreeper (*q* = 0.611) | 4 | Connectivity | 0.167 (−0.018 to 0.354) | 0.189 (−0.059 to 0.449) |
| | | Habitat | −2.932 (−3.903 to −2.066) | −3.674 (−6.339 to −2.374) |
| Buff-rumped thornbill (*q* = 0.311) | 2 | Habitat | 1.438 (0.694 to 2.233) | 1.876 (0.863 to 5.77) |

The favoured model presented for each species is the best of the five competing models (Table 4) on the basis of deviance information criterion values; *q* is the detection probability.

According to model DICs, the best standard logistic model always contained the same variables as the best generalized ZIB model. This may be the result of assuming that *q* was equal across covariates, so that the ZIB model likelihood was proportional to the logistic regression likelihood. An alternative approach is to model *q* as a function of covariates, allowing factors that affect occupancy to be teased apart from those that affect detectability (MacKenzie 2005).

Regardless, both the magnitudes of the effects and the widths of their credible intervals were always greater in the ZIB model (Table 5). In other words, the logistic regression failed to account for the zeros generated by false absences, resulting in a consistent underestimation of both the magnitude and the uncertainty of model effects. This result corroborates the findings of Tyre *et al.*'s (2003) simulation study. Inference based on standard analyses could therefore be erroneous and, if used for conservation planning purposes, could lead to misdirected management actions. For example, if a set of occupancy models were used to underpin multispecies habitat reconstruction planning (e.g. Westphal *et al.* 2003), misspecification of the type, amount and connectivity of habitat required for each species could result in suboptimal allocation of reconstruction effort across the landscape.
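The attenuation described above can be reproduced in a small simulation in the spirit of Tyre *et al.* (2003), though not a reproduction of their study. The sketch below simulates occupancy from a logistic model with slope 1, records a species as present only if it is detected on at least one of three visits with *q* = 0.3, and fits a logistic regression by Newton-Raphson to both the true occupancy states and the naive detection records. All parameter values and function names are hypothetical.

```python
import math
import random

def fit_logistic(x, z, iters=25):
    """Newton-Raphson maximum likelihood for logit Pr(z = 1) = a + b * x."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += zi - p          # score for the intercept
            g1 += (zi - p) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add H^{-1} g, where H is the 2 x 2 information matrix.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical simulation: slope 1 on a standardized covariate, q = 0.3, v = 3.
rng = random.Random(11)
v, q, slope = 3, 0.3, 1.0
x = [rng.gauss(0.0, 1.0) for _ in range(3000)]
occupied = [rng.random() < 1.0 / (1.0 + math.exp(-slope * xi)) for xi in x]
# Naive presence record: 1 only if detected on at least one of the v visits.
observed = [int(o and any(rng.random() < q for _ in range(v))) for o in occupied]

_, b_full = fit_logistic(x, [int(o) for o in occupied])  # perfect detection
_, b_naive = fit_logistic(x, observed)                    # false zeros ignored
print(round(b_full, 2), round(b_naive, 2))
```

Under these settings the naive slope falls noticeably below the true value of 1, illustrating how ignoring false zeros understates effect sizes.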