1. Species richness is often used as a tool for prioritizing conservation action. One method for predicting richness and other summaries of community structure is to develop species-specific models of occurrence probability based on habitat or landscape characteristics. However, this approach can be challenging for rare or elusive species for which survey data are often sparse.
2. Recent developments have allowed for improved inference about community structure based on species-specific models of occurrence probability, integrated within a hierarchical modelling framework. This framework offers advantages to inference about species richness over typical approaches by accounting for both species-level effects and the aggregated effects of landscape composition on a community as a whole, thus leading to increased precision in estimates of species richness by improving occupancy estimates for all species, including those that were observed infrequently.
3. We developed a hierarchical model to assess the community response of breeding birds in the Hudson River Valley, New York, to habitat fragmentation and analysed the model using a Bayesian approach.
4. The model was designed to estimate species-specific occurrence and the effects of fragment area and edge (as measured through the perimeter and the perimeter/area ratio, P/A), while accounting for imperfect detection of species.
5. We used the fitted model to make predictions of species richness within forest fragments of variable morphology. The model revealed that species richness of the observed bird community was maximized in small forest fragments with a high P/A. However, the number of forest interior species, a subset of the community with high conservation value, was maximized in large fragments with low P/A.
6. Synthesis and applications. Our results demonstrate the importance of understanding the responses of both individual species and groups of species to environmental heterogeneity, while illustrating the utility of hierarchical models for inference about species richness for conservation. This framework can be used to investigate the impacts of land-use change and fragmentation on species or assemblage richness, and to further understand trade-offs in species-specific occupancy probabilities associated with landscape variability.
There are two challenges in using community-level summaries such as species richness in conservation and management applications. First, species identity is not preserved in many standard analyses used for inference about richness, which are based on simple aggregate species numbers (species–accumulation curves, Gotelli & Colwell 2001; Cam et al. 2002) or encounter frequencies (capture–recapture methods, Boulinier et al. 1998). However, species-specific patterns of occurrence should be accounted for in modelling approaches (Fischer et al. 2004) because the response of species richness to features that can be manipulated (landscape, habitat) is necessarily species specific. A second issue is that in most practical situations species are detected imperfectly. The importance of addressing the biasing effects of imperfect detection on community assessments is widely acknowledged (Boulinier et al. 1998; Nichols et al. 1998; O’Dea et al. 2006; Kéry, Royle & Schmid 2008). Moreover, because detectability naturally varies by species (Boulinier et al. 1998), we expect that observed summaries of community structure (e.g. based on species lists) are biased towards abundant and widespread species, which are likely to show diminished response to ecological gradients.
One method for examining species richness in heterogeneous landscapes is to model species occurrence probabilities, or occupancy, based on localized habitat characteristics (Dorazio & Royle 2005). Occupancy can be an effective assessment method (Manley et al. 2005), generally requires less effort and expense than estimating total abundance of all species (MacKenzie et al. 2006), and allows for imperfect detection of species (MacKenzie et al. 2002). Multi-species occupancy models have been used for inference in community studies in a number of situations, including estimation of richness and community overlap (Dorazio & Royle 2005), construction of individual-based species accumulation curves (Dorazio et al. 2006), and in determining the influence of habitat and landscape variation on richness (Kéry & Royle 2008, 2009; Russell et al. 2009).
In addition to understanding total species richness, inferences on the number of rare, endangered or functionally important species are frequently of interest in conservation planning and monitoring programmes (Samu, Csontos & Szinetar 2008). Occupancy estimates for rare species and guild or assemblage richness (number of species in a subset of the population) can be more informative about areas of high conservation priority than assessments on only species that are common. Unfortunately, it can be difficult to get reliable estimates of occupancy for rare and/or elusive species because traditional sampling efforts often do not generate enough data for standard analyses (Queheillalt et al. 2002; Stockwell & Peterson 2002). Some approaches to mitigating this problem combine data on rare, but functionally similar, species (e.g. by genus) or use indicator species to deduce occupancy of those species with limited data (O’Connell, Jackson & Brooks 2000; Fleishman, Blair & Murphy 2001; Sergio et al. 2006). Such approaches discard valuable information about species-specific responses, and could be misleading or erroneous if rarely observed species respond differently than indicator species (Andelman & Fagan 2000; Kéry, Royle & Schmid 2008; Lawler & White 2008). The question remains regarding the most efficient and cost-effective method for estimating the occurrence and distribution of uncommon and elusive species (Thompson 2004; MacKenzie et al. 2005).
Our research is motivated by a desire to develop a community-level quantitative framework for predicting areas of conservation value, and to provide high-quality baseline data for vertebrate monitoring programmes in urbanizing landscapes. To this end, we present a recently developed approach for assessing community composition based on species-specific occupancy and detection (Dorazio & Royle 2005) in which individual species occurrence models are linked together within a hierarchical (or multi-level) model (Gelman & Hill 2007; Royle & Dorazio 2008). Many multi-species field studies and monitoring programmes have limited data on a large portion of observed species; as such, typical species-by-species analyses are simply unable to provide occurrence estimates or information about the effects of environmental factors on occurrence probabilities. An advantage of the hierarchical modelling framework over typical species richness analyses is that it accounts for both species-level effects as well as aggregated effects of landscape/habitat on the community as a whole (Kéry & Royle 2008, 2009), leading to a more efficient use of available data and increased precision in occupancy estimates, especially for infrequently observed species. We demonstrate the strengths of this approach by applying the hierarchical modelling framework to a bird community in forest fragments across the Hudson River Valley (HRV), New York (DeWan et al. 2009), a biologically diverse and ecologically significant region that is under intense development pressure, in the north-eastern United States (Finton et al. 2000; Smith et al. 2001). Efforts are underway to prioritize the landscape for conservation actions, yet little is known about many of the species in the region (DeWan et al. 2009). 
We focused our analyses on the community response to habitat fragmentation by modelling species-level changes in occupancy in response to two factors with well-established effects on the success of breeding birds: forest fragment area and edge effects, the latter measured by perimeter and the perimeter/area ratio (P/A) (Rafe, Usher & Jefferson 1985; Helzer & Jelinski 1999).
Materials and Methods
We used a hierarchical model that links species-specific detection and occupancy, which are then related (across species, at the community level) through an additional component of the hierarchical model (Dorazio & Royle 2005; Dorazio et al. 2006). A hierarchical (sometimes referred to as multi-level or state-space) model is one in which various biological and sampling components are formally specified and related to one another (Gelman & Hill 2007; Royle & Dorazio 2008). For example, in the context of estimating occupancy, hierarchical models can help distinguish absence from non-detection by explicitly incorporating models that specify presence vs. absence as one process and detection vs. non-detection as another process that is dependent upon whether or not the species is in fact present. Hierarchical models posit weak, stochastic relations rather than deterministic relations among parameters and processes (Link 1999; Link et al. 2002), resulting in improved estimation of individual parameters by considering them in the context of a group of related variables (Bayesian shrinkage: ‘borrowing strength from the ensemble’; Link & Sauer 1996). In the context of our community model, this allows for increased precision of occurrence estimates for rare or elusive species through utilization of collective community data (Russell et al. 2009) and improved ‘composite’ analyses of species groups (Sauer & Link 2002). With limited resources and budgets, many multi-species data collection efforts have very small sample sizes – to such an extent that it is not possible to carry out formal inference on a species-by-species basis. The hierarchical modelling approach allows for the most effective use of available data while not requiring a priori assumptions on group structure or relatedness among species.
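The idea of ‘borrowing strength from the ensemble’ can be illustrated with a textbook normal-normal example. This is a minimal sketch and not the occupancy model used in this study: it assumes a species-level estimate with known sampling variance and a community-level normal distribution with known mean and variance. The function name and all numerical values are hypothetical.

```python
def shrunken_estimate(species_mean, n_obs, sigma2, community_mean, tau2):
    """Posterior mean and variance of a species-level parameter under a
    normal-normal model (known sampling variance sigma2, community-level
    variance tau2). Illustrative only; not the model fitted in the study."""
    data_precision = n_obs / sigma2      # information from the species' own data
    prior_precision = 1.0 / tau2         # information borrowed from the community
    weight = data_precision / (data_precision + prior_precision)
    post_mean = weight * species_mean + (1.0 - weight) * community_mean
    post_var = 1.0 / (data_precision + prior_precision)
    return post_mean, post_var


# Two hypothetical species with the same raw estimate but different sample sizes.
rare_mean, rare_var = shrunken_estimate(3.0, 2, 4.0, 0.0, 1.0)
common_mean, common_var = shrunken_estimate(3.0, 40, 4.0, 0.0, 1.0)
```

In this sketch the sparsely observed species is pulled further towards the community mean, and its posterior variance is smaller than the variance of its raw estimate (sigma2 / n_obs), which is the sense in which precision improves.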
The data come from a breeding bird survey conducted over a 2-year period (15 May to 1 July 2006 and 15 May to 1 July 2007) at 72 randomly selected independent points in deciduous and mixed-deciduous forest fragments across the HRV, New York. The sampling locations ranged over the entire 9546 km² region which includes all or part of nine counties that border the Hudson River, north of New York City. Points were located at least 500 m apart using Hawth’s stratified random sampling tool (Beyer 2004), and then mapped and field-checked, eliminating those that: (i) had recent disturbance that altered the cover classification (n = 1); (ii) were too dangerous to access (e.g. steep ravine) (n = 4); or (iii) did not receive private landowner permission to access the site (n = 21). Forest fragments ranged in size from 0·14 to 8677·4 ha (μ = 533·7 ha), while P/A ranged from 0·08 to 1·5 km ha⁻¹ (μ = 0·2). Two trained observers recorded the presence of all species seen or heard during the 10-min, 250 m fixed-radius point counts at each sampling station (Hutto, Pletschet & Hendricks 1986). Sites were visited on three separate occasions during the breeding season (once each per 2-week period), although not all sites were surveyed both years. The perimeter and area of the fragment in which the point occurred were recorded. A total of 78 species were observed in this study. Of these, the data for 32 species were particularly sparse, with fewer than 20 detections each over the entirety of the sampling season (see Appendix S1 in Supporting Information for a complete species list). Because of the small size of the data set, typical single species approaches for estimating occupancy were inadequate for the majority of observed species. For more details on the sampling design and region, see DeWan et al. (2009).
The repeated sampling protocol allows for non-detection to be discerned from point-level absence at each location (MacKenzie et al. 2002). We developed a hierarchical model in which site-specific occupancy (i.e. ‘true’ presence/absence) for species i = 1,2,…,N at site j = 1,2,…,J is denoted z(i,j), where z(i,j) = 1 if species i occurs at site j and is zero otherwise. The model for occurrence is specified as $z(i,j) \sim \mathrm{Bern}(\psi_{i,j})$, where $\psi_{i,j}$ is the probability that species i occurs at site j. The state variable z(i,j) is usually not known with certainty. Instead, we observe data x(i,j,k) for species i at site j during sampling period k, which are also assumed to be Bernoulli random variables if species i is present (i.e. if z(i,j) = 1); otherwise, if z(i,j) = 0, then x(i,j,k) = 0 with probability 1. The observation model is represented by $x(i,j,k) \sim \mathrm{Bern}(\theta_{i,j,k} \cdot z(i,j))$, where $\theta_{i,j,k}$ is the detection probability of species i for the kth sampling period at site j, if species i is present at site j. Note that the model satisfies the condition that detection is a fixed zero when a species does not occur (because z(i,j) = 0).
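The two-level structure described above (a latent occurrence state and detections conditional on it) can be sketched with a small simulation for a single species. This is an illustrative sketch only: the occupancy and detection probabilities are hypothetical constants rather than estimates from the study, and the function names are not from the original analysis. The 72 sites and three visits mirror the survey design.

```python
import random


def simulate_detections(psi, theta, n_sites, n_visits, rng):
    """Simulate z(j) ~ Bern(psi) and x(j,k) ~ Bern(theta * z(j)) for one species."""
    z = [1 if rng.random() < psi else 0 for _ in range(n_sites)]
    # When z_j = 0 the success probability is 0, so no detection can occur.
    x = [[1 if rng.random() < theta * z_j else 0 for _ in range(n_visits)]
         for z_j in z]
    return z, x


def naive_occupancy(x):
    """Fraction of sites with at least one detection (ignores imperfect detection)."""
    return sum(1 for visits in x if any(visits)) / len(x)


rng = random.Random(1)
# psi and theta are hypothetical values chosen for illustration.
z, x = simulate_detections(psi=0.6, theta=0.3, n_sites=72, n_visits=3, rng=rng)
true_occ = sum(z) / len(z)     # proportion of sites truly occupied
naive_occ = naive_occupancy(x) # proportion of sites with any detection
```

Because detections can only occur at occupied sites, the naive proportion can never exceed the true proportion of occupied sites, which is why ignoring detection tends to understate occupancy.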
In the simplest specification of the model, the occurrence and detection probabilities, ψ and θ, are determined by unspecified species and site-level effects (Dorazio et al. 2006). These effects are incorporated into the model linearly on the logit-probability scale: $\mathrm{logit}(\psi_{i,j}) = u_i + \alpha_j$ and $\mathrm{logit}(\theta_{i,j,k}) = v_i + \beta_j$, where $u_i$ and $v_i$ are species-level effects and $\alpha_j$ and $\beta_j$ are site-level effects on occurrence and detection, respectively. Because high-abundance species are likely to be both easier to detect and more prevalent across the landscape, we modelled a correlation between occurrence and detection by allowing $u_i$ and $v_i$ to be jointly normally distributed with covariance matrix $\Sigma = \begin{pmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix}$, where $\sigma_u^2$ and $\sigma_v^2$ are the variance components among species for occurrence and detection, respectively, and $\sigma_{uv}$ is the covariance of the 2 × 2 matrix Σ (Dorazio & Royle 2005; Kéry & Royle 2008).
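The correlated species-level effects can be sketched as follows. This is a minimal illustration that assumes a zero-mean bivariate normal distribution for the pair (u, v) and parameterizes the covariance through a correlation rho, so that the covariance equals rho times the two standard deviations. The zero mean, the standard deviations, the correlation and the site effects are all hypothetical choices, not values from the study.

```python
import math
import random


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_species_effects(sigma_u, sigma_v, rho, rng):
    """Draw (u_i, v_i) from a zero-mean bivariate normal with correlation rho.

    Uses the Cholesky factor of the 2 x 2 covariance matrix, written out by hand.
    """
    e1 = rng.gauss(0.0, 1.0)
    e2 = rng.gauss(0.0, 1.0)
    u_i = sigma_u * e1
    v_i = sigma_v * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
    return u_i, v_i


def sample_correlation(pairs):
    """Pearson correlation of a list of (u, v) pairs."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    return cov / math.sqrt(var_u * var_v)


rng = random.Random(42)
effects = [draw_species_effects(1.5, 1.0, 0.7, rng) for _ in range(5000)]
r = sample_correlation(effects)

# Occurrence and detection probabilities for the first simulated species at a
# site with hypothetical site-level effects alpha_j = 0.2 and beta_j = -0.5.
u_0, v_0 = effects[0]
psi = inv_logit(u_0 + 0.2)
theta = inv_logit(v_0 - 0.5)
```

With a positive correlation, species that tend to occur widely also tend to be easier to detect, which is the pattern the covariance term is meant to capture.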
Extensions of this basic model have explicitly incorporated landscape and survey characteristics into the probabilities of occupancy and detection (Kéry & Royle 2009; Russell et al. 2009). We followed this approach, and modelled the occurrence probability for species i at site j by incorporating site-specific habitat characteristics. In this case we used the size and relative shape of the forest fragment in which the point count occurred. As counts were conducted in a 250 m radius, occupancy and detection estimates for individual species are provided at the point (not fragment) level. Thus we are considering how occupancy at a random point is affected by the area and shape of the forest fragment in which it occurs. We incorporated fragment area, perimeter and P/A in the occupancy estimates by assuming that the logit transform of the occurrence probability was a linear combination of a species effect and the site-specific habitat characteristics as follows: $\mathrm{logit}(\psi_{i,j}) = u_i + \alpha 1_i\,\mathrm{perimeter}_j + \alpha 2_i\,\mathrm{area}_j + \alpha 3_i\,(P/A)_j$.
We standardized the covariates so that the means of the perimeter, area and P/A data were zero. Thus, the inverse-logit of ui is the occurrence probability for species i in sites with ‘average’ habitat characteristics. The coefficients α1i, α2i and α3i are the effects of perimeter, area and P/A, for species i respectively. The detection probability for species i was assumed to vary based on the date of the survey (linear and squared effects) and the year of the survey. We assumed that the community was closed (i.e. the species pool remained constant) over the two years during which the survey was conducted, but added a year effect (constant across species) to account for shifting detection between the two years as a result of annual fluctuations in seasonality: $\mathrm{logit}(\theta_{i,j,k}) = v_i + \beta 1_i\,\mathrm{date}_{j,k} + \beta 2_i\,\mathrm{date}_{j,k}^2 + \beta 3\,\mathrm{year}_{j,k}$, where β1i and β2i are the species-specific linear and squared date effects and β3 is the year effect shared across species.
Our model contains seven parameters for each species in the community, and one (year effect) that is estimated across species. As observations were sparse for many species in the sample (Appendix S1), estimating all of these parameters would not be possible if the data were analysed on a species-by-species basis. As such, we added an additional hierarchical component of the model by assuming that the species-level parameters were random effects, each governed by community-level ‘hyper-parameters’. For example, we assumed that $\alpha 1_i \sim \mathrm{N}(\mu_{\alpha 1}, \sigma_{\alpha 1}^2)$, where μα1 is the community response (mean across species) to perimeter and σα1 is the standard deviation (among species); thus the hyper-parameters are simply the mean and variance for each covariate as measured across species (Kéry & Royle 2009).
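The way community-level hyper-parameters generate species-level coefficients, which in turn determine occupancy at a site, can be sketched as follows. This is an illustration under stated assumptions: covariates are centred to mean zero as described above and additionally scaled to unit standard deviation (the scaling is an assumption), species-level coefficients are drawn from normal distributions, and every numerical value other than the minimum and maximum fragment areas is hypothetical. Function names are illustrative.

```python
import math
import random


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def standardize(values):
    """Centre to mean zero and scale to unit SD (the scaling is an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def draw_species_coefficients(mu, sigma, n_species, rng):
    """Species-level random effects: coefficient_i ~ Normal(mu, sigma^2)."""
    return [rng.gauss(mu, sigma) for _ in range(n_species)]


def occupancy_probability(u_i, alphas_i, covariates_j):
    """Inverse-logit of u_i plus the sum of coefficient * covariate terms."""
    eta = u_i + sum(a * c for a, c in zip(alphas_i, covariates_j))
    return inv_logit(eta)


rng = random.Random(3)

# Fragment areas in ha: the extremes match the surveyed range; the
# intermediate values are hypothetical.
areas = [0.14, 12.0, 150.0, 533.7, 8677.4]
areas_std = standardize(areas)

# Hypothetical hyper-parameters for the area effect across 78 species.
area_effects = draw_species_coefficients(mu=-0.3, sigma=1.0, n_species=78, rng=rng)

# Occupancy for one species at a site with 'average' habitat (all covariates 0)
# reduces to the inverse-logit of its species effect u_i.
psi_average = occupancy_probability(0.3, (0.5, area_effects[0], -0.2), (0.0, 0.0, 0.0))
```

In a fitted model the hyper-parameters would be estimated from the data rather than fixed, and each species-level draw would be informed by both its own detections and the community-level distribution.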
We estimated model parameters and community summaries using a Bayesian analysis of the model with vague priors for the hyper-parameters (e.g. uniform distribution from 0 to 1 for community-level occupancy and detection covariates; normal distributions with mean zero and variance 1000 for community-level habitat and sampling covariates). Hierarchical models are naturally analysed using Bayesian methods (Gelman & Hill 2007). We carried out our analysis with WinBUGS (Spiegelhalter et al. 2003), general purpose software for Bayesian analysis that uses Markov chain Monte Carlo (MCMC). The advantage of WinBUGS is that it only requires specification of the model, and not a technical development of the MCMC algorithm. The complete model specification and additional details, including an assessment of model fit using a Bayesian P-value approach, are presented in Appendix S2.
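To give a sense of the computation that MCMC software automates, the following is a minimal sketch of a random-walk Metropolis sampler for a single-species occupancy model with constant ψ and θ and uniform priors on both. It is not the community model specified in Appendix S2 and not the sampling algorithm used by WinBUGS; the data are simulated, and all values and function names are hypothetical.

```python
import math
import random


def log_likelihood(psi, theta, detections, n_visits):
    """Log-likelihood of a constant-psi, constant-theta occupancy model.

    detections[j] is the number of visits (out of n_visits) on which the
    species was detected at site j. Binomial coefficients are omitted
    because they do not depend on the parameters.
    """
    total = 0.0
    for d in detections:
        if d > 0:
            # At least one detection: the site must be occupied.
            total += (math.log(psi) + d * math.log(theta)
                      + (n_visits - d) * math.log(1.0 - theta))
        else:
            # Occupied but never detected, or truly unoccupied.
            total += math.log(psi * (1.0 - theta) ** n_visits + (1.0 - psi))
    return total


def metropolis(detections, n_visits, n_iter, step, rng):
    """Random-walk Metropolis with Uniform(0, 1) priors on psi and theta."""
    psi, theta = 0.5, 0.5
    current = log_likelihood(psi, theta, detections, n_visits)
    samples = []
    for _ in range(n_iter):
        psi_new = psi + rng.uniform(-step, step)
        theta_new = theta + rng.uniform(-step, step)
        # Proposals outside (0, 1) have zero prior density and are rejected.
        if 0.0 < psi_new < 1.0 and 0.0 < theta_new < 1.0:
            proposed = log_likelihood(psi_new, theta_new, detections, n_visits)
            if rng.random() < math.exp(min(0.0, proposed - current)):
                psi, theta, current = psi_new, theta_new, proposed
        samples.append((psi, theta))
    return samples


# Simulated data (hypothetical): 200 sites, 3 visits, psi = 0.6, theta = 0.5.
rng = random.Random(7)
n_sites, n_visits = 200, 3
detections = []
for _site in range(n_sites):
    present = rng.random() < 0.6
    detections.append(sum(1 for _visit in range(n_visits)
                          if present and rng.random() < 0.5))

samples = metropolis(detections, n_visits, n_iter=6000, step=0.1, rng=rng)
kept = samples[1000:]  # discard the first 1000 iterations as burn-in
post_mean_psi = sum(s[0] for s in kept) / len(kept)
post_mean_theta = sum(s[1] for s in kept) / len(kept)
```

Writing even this simple sampler by hand requires choosing a proposal distribution, step size and burn-in, which is the technical development that general-purpose software spares the analyst.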
Species richness and community-level summaries
The mean estimates for the community response to fragment perimeter and area were negative, while the response to P/A was positive (Table 1). This suggests that, in general, the mean probability of occupancy across species in this community was higher at points in smaller, more irregularly shaped fragments than in larger fragments with less edge. The posterior intervals for each of the community hyper-parameters contain both positive and negative values (Table 1), which is a manifestation of the variability in the community (Appendix S1). In our study, which encompasses a diverse bird community, we would naturally expect the response of individual species to vary with landscape fragmentation. Thus, diffuse posterior distributions for the community-level habitat covariates are as expected and simply reflect the diversity within the community.
Table 1. Community-level summaries of the hyper-parameters for the detection and occupancy covariates, reported with 95% posterior intervals. Only the row labels for the date effect (linear term) and the date effect (squared term) survive here; the numerical entries are not preserved. P/A, perimeter/area ratio.
We used the model to make predictions of species richness at localized points across a landscape with heterogeneous forest fragments that varied by area and P/A (Fig. 1; for details on how richness was calculated and estimates on the precision of the predictions, see Appendix S3). Species richness was maximized in small areas with high perimeter to area ratios (large amounts of edge habitat) (Fig. 1 left panel). However, assemblage richness of forest interior breeding birds (17 species), a subset of the population with high conservation value, was maximized in large fragments with less edge (Fig. 1 right panel).
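One simple way to turn species-level occupancy probabilities into a richness prediction is to sum them across species, which gives the expected number of species present at a point. The sketch below assumes this summation; the calculation actually used is described in Appendix S3 and may differ. The species coefficients are hypothetical and chosen only to reproduce the qualitative trade-off between total and forest interior richness, and the perimeter term is omitted for brevity.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_richness(species, area, pa):
    """Sum of occupancy probabilities across species at one point.

    area and pa are standardized covariates; each species is a dict holding
    a species effect 'u' and coefficients 'a_area' and 'a_pa'.
    """
    return sum(inv_logit(s["u"] + s["a_area"] * area + s["a_pa"] * pa)
               for s in species)


# Hypothetical community: three edge-tolerant and two forest interior species.
edge = [{"u": 0.0, "a_area": -1.0, "a_pa": 1.0} for _ in range(3)]
interior = [{"u": 0.0, "a_area": 1.0, "a_pa": -1.0} for _ in range(2)]
community = edge + interior

# Two contrasting fragment types on the standardized scale.
total_small = expected_richness(community, area=-1.0, pa=1.0)
total_large = expected_richness(community, area=1.0, pa=-1.0)
interior_small = expected_richness(interior, area=-1.0, pa=1.0)
interior_large = expected_richness(interior, area=1.0, pa=-1.0)
```

In a Bayesian analysis the same sum would typically be evaluated for each posterior draw of the coefficients, which yields a posterior distribution for richness rather than a single value.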
Mean probabilities of occurrence varied widely among species, ranging from 6·5% to 98·5%. Detection was low for many species and also varied widely (7·1–75·9%). There was a strong correlation between occupancy and detection (posterior mean for ρ was 0·73, 95% posterior interval: 0·52–0·88; Fig. 2), a phenomenon that is likely due to heterogeneity in abundance among species (Dorazio & Royle 2005). Posterior summaries of occupancy and detection for each species, as well as species-specific responses to the habitat covariates, are presented in Appendix S1.
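Assuming that ρ denotes the correlation implied by the covariance matrix Σ of the species-level occurrence and detection effects (the text does not define ρ explicitly), it would relate to the variance components as:

```latex
\rho = \frac{\sigma_{uv}}{\sigma_u \, \sigma_v}
```

Under this reading, a value of ρ near 1 means that species with high occurrence effects also tend to have high detection effects.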
Fragment area, compared with perimeter or P/A, had a large impact on mean estimates of occupancy for many species within the community (Appendix S1). Over the range of surveyed fragments, 24 species showed (on average) an increase in occurrence probability as area increased (greater than 10% change in mean estimates of occupancy from minimum to maximum fragment size in the survey), 31 species showed a decrease in occurrence probability (>10%) with increasing area and 23 species showed no change in occurrence probability with area (<10% change).
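The classification rule described above can be sketched as follows. This is an illustration that interprets the 10% threshold as an absolute difference of 0·10 in occupancy probability between the smallest and largest fragments, with other covariates held at their means; that interpretation, the function name and the example coefficients are assumptions.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def classify_area_response(u_i, area_coef, area_min, area_max, threshold=0.10):
    """Label a species by the change in occupancy from the smallest to the
    largest fragment (standardized area), other covariates at their means."""
    psi_min = inv_logit(u_i + area_coef * area_min)
    psi_max = inv_logit(u_i + area_coef * area_max)
    change = psi_max - psi_min
    if change > threshold:
        return "increase"
    if change < -threshold:
        return "decrease"
    return "no change"


# Hypothetical species effects and area coefficients on the logit scale.
label_pos = classify_area_response(0.0, 1.0, area_min=-1.0, area_max=2.0)
label_neg = classify_area_response(0.0, -1.0, area_min=-1.0, area_max=2.0)
label_flat = classify_area_response(0.0, 0.05, area_min=-1.0, area_max=2.0)
```

A species with very high baseline occupancy can have a clearly positive coefficient and still fall in the "no change" class, because probabilities near 1 leave little room to increase.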
Many species whose mean occurrence probabilities increased in response to increased area were forest-dependent species of high conservation concern. On average, nine forest interior breeders (Acadian flycatcher Empidonax virescens, black-and-white warbler Mniotilta varia, blackburnian warbler Dendroica fusca, black-throated blue warbler Dendroica caerulescens, black-throated green warbler Dendroica virens, cerulean warbler Dendroica cerulea, hooded warbler Wilsonia citrina, worm-eating warbler Helmitheros vermivorum and winter wren Troglodytes troglodytes) showed substantial increases in occupancy probabilities as fragment area increased, but less response to changes in perimeter or P/A ratio (Fig. 3; Appendix S1). Although the number of observations for these species was fairly low (6–36 for each), the community approach allowed us to obtain estimates of the response of each species to fragment area and regularity of shape. The precision on species-level estimates of occupancy and effects of fragmentation increased for most species in the community model compared with standard species-specific models (for selected results comparing the community model to a single species modelling approach, see Appendix S4). When modelling each species separately, occupancy estimates for species with sparse data could not be obtained without exhibiting extreme sensitivity to the prior. For the above nine forest interior species, the standard deviations on the estimated species-specific effects of area were generally lower using the hierarchical community model (range 1·24–1·83) than a standard species-level model (range 1·40–2·03; Appendix S4). Three species (ovenbird Seiurus aurocapilla, scarlet tanager Piranga olivacea and veery Catharus fuscescens) also had a positive response to area, but the effects were less discernible on estimates of occupancy because they were widely observed (e.g. occupancy was universally high). 
A few forest-dependent species (brown creeper Certhia americana, Canada warbler Wilsonia canadensis, northern parula Parula americana, red-breasted nuthatch Sitta canadensis and wood thrush Hylocichla mustelina) responded more closely to the community-level response with decreasing occupancy probabilities as fragment area increased (Fig. 4).
Although reliable summaries of species occurrences and distributions are required for effective conservation, analysis of multi-species data can be challenging because sampling techniques often identify numerous species with few detections. One way to address this issue is to utilize models that integrate data across species, allowing for composite analyses of communities or groups of species. Hierarchical models are particularly valuable in this context, in part because they do not require a priori assumptions about community structure; any composite analysis will improve estimates on metrics of interest, regardless of relationships among species (Sauer & Link 2002). For conservation purposes, it is generally useful to consider species from one community or related communities; otherwise community-level summaries may not be meaningful. In some situations it may be possible to incorporate additional group structure into the model when relationships among species have been well established. Estimates for rarely observed species will naturally be drawn to group averages (‘Bayesian shrinkage’ toward the mean; Link 1999), but the precision of estimates can be improved with even a minimal number of observations (Appendix S4). Accuracy of species-specific estimates will always be limited by the amount of available data, which is reflected in the diffuse posterior distributions for many habitat covariates (Appendix S1). Such estimates can only be objectively improved through additional data collection efforts. However, as with meta-analysis in classical statistics (Osenberg et al. 1999), many ‘weak’ inferences can be combined to make a stronger collective response. Thus, by accounting for both species-level effects as well as the aggregated effects of landscape covariates on the community as a whole, hierarchical models provide a valuable alternative to single species analyses of community data.
Our model produced a number of key findings relevant to prioritizing conservation actions and was capable of making predictions of bird species richness based on fragment area and edge effects (Fig. 1), which should be verified through additional sampling. Understanding the relationship between environmental factors and species richness will improve the efficacy of conservation efforts in the protection of biodiversity in urbanizing landscapes. For example, our estimates of the community and species-level relationships between occupancy probabilities and habitat characteristics allow a direct valuation of forest fragments in terms of either total species richness (Fig. 1 left panel) or assemblage richness (Fig. 1 right panel), and illustrate an explicit trade-off between these two competing objectives. Overall, the community-level response to area and P/A suggests that many species increased in occupancy in response to fragmentation, inducing a concomitant increase in species richness. These results are consistent with the intermediate disturbance hypothesis (Grime 1973; Horn 1975; Connell 1978), which suggests that diversity is maximized in areas of moderate disturbance. Similar to Lepczyk et al. (2008), we found that extremely large fragments with extensive forest interior may be less common (DeWan et al. 2009) and estimates of species richness would be expected to decline if sites were dominated by edge-tolerant or generalist species. In a conservation context, our overall estimates of species richness may not be particularly valuable; however, the hierarchical framework offered a means to acquire improved precision in estimates of occupancy for rarer species, which we used to determine assemblage richness for a subset of the community with high conservation value.
Many of the forest-breeding species responded to increased fragmentation with decreased probabilities of occupancy (Fig. 3; Appendix S4). However, occupancy for some forest-breeding species responded negatively to fragment area (Fig. 4). Although this may not be surprising for more urban-tolerant species (e.g. red-breasted nuthatch), these results were not typical for others that are sometimes considered sensitive to fragmentation (e.g. Canada warbler, wood thrush). In addition, some area-sensitive species were so common that their relationship to area would not have been discernible through typical occupancy approaches. Scarlet tanager, ovenbird and veery were observed frequently during sampling and had high occupancy estimates. If we had a priori grouped these species together as an indicator of sensitivity to fragmentation, without testing the assumptions, we would have been unable to discern differences among species in their response to fragment area and P/A. We therefore suggest that researchers use caution in a priori grouping if the purpose is to understand how a species or assemblage may be responding to landscape variability.
Our approach allows for estimation of occupancy and detection probabilities of all observed species, even if they are poorly represented in the sample data. Detection probabilities were very low for many species (Fig. 2; Appendix S1), further supporting a number of studies that have demonstrated the importance of accounting for detection in occupancy and abundance modelling (Bailey, Simons & Pollock 2004; MacKenzie et al. 2006; Kéry, Royle & Schmid 2008). Detection probability can also be significantly affected by abundance (Royle & Nichols 2003), which is evidenced in our analysis by the high correlation between detection and occupancy. Variance around species-specific estimates of occupancy, detection and the covariates will inevitably be high for species with limited data. However, the community-level approach typically provides more precise estimates for rare species than traditional species-level analyses (Appendix S4) and was especially valuable for the nine forest interior species that were sensitive to habitat fragmentation, yet would not have yielded reliable estimates of occupancy due to low sample size. Our analysis framework should be particularly effective in reducing cost and increasing efficiency for organizations where funding for field-based data collection is limited.
Many conservation and management decisions rely on estimates of species richness to prioritize areas for protection and monitoring. For example, DeWan et al. (2009) developed a map of high priority conservation areas in the HRV region based on indices of richness for a subset of forest interior bird species. Their analyses were limited to species that were neither too common nor too rare. The results from our community-level approach can be used to improve such maps and more accurately determine areas of high conservation value to protect from urban development. We demonstrated, using a diverse bird community, the applicability and relevance of our hierarchical modelling approach to: (i) assess species richness while accounting for individual species; (ii) improve the precision on estimates of occupancy and detection for many species, even species with relatively sparse data; and (iii) investigate the impacts of fragmentation on breeding birds at the community and species levels. Our hierarchical framework offers an exciting tool for wildlife agencies and conservation organizations who struggle to effectively monitor and protect biological diversity. Monitoring the status and distribution of biodiversity and rare species is a priority at local, national and international scales (Oberbillig 2008). Because of challenges in sampling and cost, lack of quality data has been identified as a serious challenge for biodiversity conservation, particularly for rarer species (The Heinz Center 2002). Many sampling designs already include data collection on multiple species (Heyer 1994; Wilson et al. 1996) and multi-species inventory techniques can reduce sampling costs and effort (Manley et al. 2005; Vesely et al. 2006). The community approach allows researchers to use data from all sampled species to improve estimates of species richness and generate previously unavailable estimates of occupancy for rare or elusive species. 
The flexibility of hierarchical modelling can provide greater insight into how a particular taxonomic community responds to environmental changes, while also accounting for species-specific differences. If incorporated into monitoring and assessment programmes, this framework could improve estimation of species richness and inferences for rare species, and provide scientifically sound information to support conservation planning and action.
The authors thank Sarah Converse, Beth Gardner, Gonçalo Ferraz, Marc Kéry and Julien Martin for discussions on the topic and for providing comments on earlier versions of the manuscript. We also thank three anonymous reviewers and Chris Elphick for many useful suggestions. This research would not have been possible without the funding and support of the following: The Hudson River Estuary Program, Biodiversity Research Institute, and New York Cooperative Fish and Wildlife Research Unit at Cornell University.