Spatial models to account for variation in observer effort in bird atlases

Abstract To assess the importance of variation in observer effort between and within bird atlas projects and demonstrate the use of relatively simple conditional autoregressive (CAR) models for analyzing grid‐based atlas data with varying effort. Pennsylvania and West Virginia, United States of America. We used varying proportions of randomly selected training data to assess whether variations in observer effort can be accounted for using CAR models and whether such models would still be useful for atlases with incomplete data. We then evaluated whether the application of these models influenced our assessment of distribution change between two atlas projects separated by twenty years (Pennsylvania), and tested our modeling methodology on a state bird atlas with incomplete coverage (West Virginia). Conditional Autoregressive models which included observer effort and landscape covariates were able to make robust predictions of species distributions in cases of sparse data coverage. Further, we found that CAR models without landscape covariates performed favorably. These models also account for variation in observer effort between atlas projects and can have a profound effect on the overall assessment of distribution change. Accounting for variation in observer effort in atlas projects is critically important. CAR models provide a useful modeling framework for accounting for variation in observer effort in bird atlas data because they are relatively simple to apply, and quick to run.


| Variation in effort and false negatives
Because most bird atlas projects rely on citizen scientists to complete the majority of the field surveys, field methods are designed to promote mass participation with the aim of achieving comprehensive spatial coverage (Greenwood, 2007). Inevitably, this leads to trade-offs between data quality and coverage (Robertson, Cumming, & Erasmus, 2010;Szabo, Butchart, Possingham, & Garnett, 2012), such as the adoption of somewhat flexible field protocols which often do not impose standardization of survey effort. This lack of structure results in spatial variation in observer effort, with the highest effort often expended in areas with habitats or bird communities of most interest to amateur surveyors (e.g., Szabo, Davy, Hooper, & Astheimer, 2007;Tulloch, Mustin, Possingham, Szabo, & Wilson, 2013). Further, accessibility may constrain spatial coverage, such that areas at a greater distance from centers of human populations often receive the lowest effort, especially if there is a lack of accessible roads or paths (McCarthy, Fletcher, Rota, & Hutto, 2012;Syfert, Smith, & Coomes, 2013). Hence, spatial bias can result in significant taxonomic bias, that is, under-representation of certain species or species groups (Robertson et al., 2010).
When repeat atlases projects are used to assess shifts in species' distributions, changes in survey effort can result in biased measures of changes in range margins (Kujala, Vepsäläinen, Zuckerberg, & Brommer, 2013). It is especially important, therefore, that estimates of changes in distribution between atlas periods include an assessment of changes in survey effort. Many published bird atlases have not accounted for variation in survey effort when producing estimated changes in range size, and indeed, not all bird atlases have adequately collected data to measure survey effort. The application of species distributions models (SDM) is one way of accounting for biases due to variable survey effort. Species distribution models are algorithms that "identify a mathematical or logical function linking species' occurrences and a set of predictors" (Kamino et al., 2012).
A multitude of different SDMs have been developed and applied to ecological data (e.g., Aizpurua, Paquet, Brotons, & Titeux, 2015;Comte & Grenouillet, 2013;Elith, Kearney, & Phillips, 2010;Rocchini et al., 2011), but accounting for false negatives is a persistent challenge (Chefaoui & Lobo, 2008;Kéry, 2011;Rocchini et al., 2011). A failure to detect a species in a given area could be because the habitat is not suitable, or it could merely be due to insufficient survey effort or inappropriate survey protocols. Developing survey protocols to maximize the likelihood of detection across multiple species is challenging, and the probability of detecting all species-given their presence-is almost always less than one (Kéry, 2011). Occupancy models (MacKenzie et al., 2006) are increasingly used to account for nonperfect detection in biological atlas data (e.g., Broms, Johnson, Altwegg, & Conquest, 2014). These models are based on capture-recapture theory and hence rely on repeated site visits to model both species occurrence and detectability (MacKenzie et al., 2006). Unfortunately, data captured by bird atlas projects are often highly unstructured and may include opportunistic data, and highly variable sequences of survey effort between blocks, and over multiple years (e.g., Wilson, Brauning, & Mulvihill, 2012). Further, data capture during first generation atlases was often not sophisticated enough to retain a permanent record of the visit history, or even visit years, to each block.
Hence, application of occupancy models is not always feasible, and such models can be computationally challenging (Broms et al., 2014;Kéry, Gardner, & Monnerat, 2010), which is a major hurdle if practitioners with limited resources are to apply them to large numbers of species.

| Aims
In this study, we use data from the Atlas of Breeding Birds in Pennsylvania, henceforth the "PBBA I" (Brauning, 1992;fieldwork 1983-1989, and the Second Atlas of Breeding Birds in Pennsylvania, henceforth the "PBBA II" (Wilson et al., 2012;fieldwork 2004-2009, to explore the use of a SDM approach to account for variation in observer effort, both within and between repeated atlas projects. We apply the methods to data from the West Virginia Breeding Bird Atlas II project, henceforth WVBBA II (unpublished;fieldwork 2009, to demonstrate that relatively simple conditional autoregressive (CAR) models offer a useful framework for modeling species distributions using atlas data, by incorporating information on broad-scale landscape features and observer effort.
Throughout this study, we use the term "occupancy probability" as shorthand for "occupancy and detection probability." Our models predict species occurrences based on hypothetical but realistic levels of survey effort per block and hence we do not explicitly model detection probabilities.
Specifically, we test the following hypotheses: 1. Spatial (CAR) models are effective at predicting species' occupancy probabilities for bird atlases with incomplete data 2. Spatial (CAR) models are superior to nonspatial models at predicting species' occupancy probabilities 3. Accounting for variation in observer effort improves model fit 2 | METHODS

| Data
The PBBA I established a survey grid based on U.S. Geological Survey (USGS) 7.5-min topographic quadrangle maps; dividing each USGS quadrangle into six atlas "blocks," the bottom-right of which being designated as a "priority" block. This resulted in 4,937 atlas blocks, including 787 priority blocks, of 3.75' longitude and 2.5' latitude (approximately 5.2 × 4.62 km) across the state of Pennsylvania (Brauning, 1992). Priority blocks were targeted for more thorough coverage than other blocks. Fieldwork was completed by approximately 2,000 volunteers in the years 1983-1989. The PBBA II (2004-2009) was a repeat effort with very minor changes to field survey protocols. Although the median change in block effort between atlas projects was an increase of 5 hr (mean = 8.68, SD = 33.67), there was not a uniform increase in effort across all blocks. Not only were the changes in effort not evenly distributed across the state (Wilson et al., 2012), but effort was lower than PBBA I in 37% of blocks. It should be noted that details of survey effort in each block (times and duration of individual visits) in PBBA I are longer available, but the database does have a record of total effort hours.
Blocks for the WVBBA II (2009)(2010)(2011)(2012)(2013)(2014), developed for the WVBBA I (1984-1989Buckelew & Hall, 1994), were delineated in the same way as the PBBA blocks. This resulted in 2,653 atlas blocks across the state of West Virginia, 469 of which were designated as priority blocks. Due to a lower population density, and consequently smaller volunteer pool, block coverage-in 2009-2014-for the WVBBA II was incomplete in comparison with the PBBA II. No bird records were submitted for 580 atlas blocks (21.9%), and observer effort exceeded 1 hr in less than half of blocks (48.2%). However, coverage of the 469 priority blocks in WVBBA II was comprehensive, with a mean of 19.6 hr (SE = 0.65) of survey effort expended per block. Like the Pennsylvania atlas, spatial distributions of observer effort in the WVBBA II were not uniform, consisting of fairly comprehensive coverage in the Allegheny Mountains and lower elevation "panhandle" (eastern half of the state), but very patchy coverage across much of the Allegheny Plateau (western half of the state; Figure 2).

| Study species
We tested our hypotheses on Pennsylvania atlas data for six species:   T A B L E 2 Spatial autocorrelation of landscape covariates among atlas blocks (Global Moran's I). p < .0001 for all z-tests We used the models developed for the PBBA II to produce predicted probabilities of block occupancy for 136 species (i.e., those that were found in at least 20 blocks) in WVBBA II.

| Data analysis
The predicted probability of occurrence by block was modeled using Hierarchical Logistic Regression in WinBUGS for each species (Lunn, Thomas, Best, & Spiegelhalter, 2000). The model took the form: where p i is the predicted probability of occurrence in block i, α is the intercept, β k is the parameter estimate for s landscape covariates χ, γ i is a correction factor to account for observer effort (i.e., deviation from a specified "standard"), δ i is the spatial effect, and ε is random error. Effort effects were modeled after Link and Sauer (2007), using the formula: where B is the standardized number of hours (e.g., mean or median), and C and D are estimated parameters that determine the shape of the relationship between hours and probability of detection. This formulation enables estimation of effort effects that range from linear (C = 0, D = 1), to diminishing returns (C = 0, 0 < D < 1), and diminishing returns with an asymptote (C < 0, D > 0). This function provides a multiplier; such that if B = 30, the multiplier would be >1 for blocks that receive <30 hr of survey effort-increasing the predicted probability of occurrence in blocks with low effort. In the above example, the multiplier would equal 1 for survey effort of 30 hr.
Spatial effects were included using a Gaussian CAR model, which accounts for spatial autocorrelation in lattice data, such as gridded atlas blocks. Spatially explicit models are "expected to yield better predictions" (Bahn et al., 2006) than nonspatial models. In ecological terms, such models incorporate important spatial information that may relate to unknown effects of bird population dynamics and underlying environmental variation. Spatial autocorrelation was assessed using global Moran's I test statistic using ESRI's Spatial Statistics Tool.
A Moran's I value close to zero indicates spatial randomness while a positive value (up to 1) indicates positive spatial autocorrelation.
Statistical significance of Moran's I was tested using z-tests (Z score is based on the Randomization Null Hypothesis computation). The computation of spatial autocorrelation was based on Queen contiguity and Euclidean distance (Anselin, 2005).
Models were implemented using Markov chain Monte Carlo (MCMC) simulation in WinBUGS. Vague normal prior distributions (0, 0.0001) were used to begin the MCMC sampling. Models were fitted with 5,000 iterations following a minimum 1,000 sample burn-in.
Model convergence was checked by examining trace plots for all parameters (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2012). Most models showed convergence after a few hundred iterations, but models applied to the sparser WVBBA II species data required up to 6,000 iterations to reach convergence.
To cross-validate our models and test our hypothesis that spatial models are effective at providing predictions for bird atlases with incomplete data, we ran five models for each of the six test species (PBBA II data): 1. all-using data from 100% of atlas blocks 2. 25% random-models trained using 25% of blocks, and tested on the remaining 75% 3. 50% random-models trained using 50% of blocks, and tested on the remaining 50% 4. 75% random-models trained using 75% of blocks, and tested on the remaining 25%

5.
Priority blocks-models trained using only priority blocks, which comprise 16.5% of all blocks, and tested on the remaining 83.5% Model accuracy was evaluated for test data using the area under the receiver operating characteristic (ROC) curve, commonly denoted as area under the curve (AUC). Area under the curve values of 0.5 imply that model accuracy is no better than random, while AUCs of 0.8 or more are considered good, and values of 0.9 or more are considered excellent (Brotons, Herrando, Estrada, Pedrocchi, & Martin, 2008 Our third hypothesis, that incorporating observer effort effects improves model fit, was implicitly tested by comparing models 4a with 4c, and 4b with 4d. We also investigated the relationship between effort and predicted occupancy by running models with To assess changes in block occupancy between atlas efforts, we used a relative change measures: and:

| RESULTS
Data from the two Pennsylvania atlases show that there is a strong relationship between changes in observer effort (hours of effort) and changes in the number of species detected within each block ( Figure 3). For blocks where there was a reduction in effort, there was typically a corollary reduction in the number of species observed; while, conversely, the number of species detected usually increased in blocks where effort increased between atlas periods. However, the relationship between changes in effort and changes in observed species richness was not linear, but shows saturation such that increases in effort of more than 40 hr do not continue to accrue these effort effects.
Underlying the relationship between changes in effort and number of species detected is the fact that the probability of detection for each species is a function of hours of effort expended in each block.
Models that corrected PBBA II data for effort show that increased effort hours would significantly increase block detections for all species tested ( Figure 4). None of the models reached an asymptote within 50 hr of survey effort, and there were notable differences among species. For Ovenbird, the predicted number of occupied blocks increased slowly with increased survey effort, suggesting that this species was likely to be detected-where present-even with a limited amount of observer effort. In contrast, the predicted number of occupied blocks species, recorded occupancy rates in priority blocks were on average 4.6 times higher than in nonpriority blocks ( Figure 8). Further, the relationship between priority and all block (both priority and nonpriority) detection rates was not linear, with under-detection especially pronounced in moderately widespread species (those found in approx. 25%-75% of priority blocks), as opposed to localized and ubiquitous species (Figure 8). Our models produced predicted block occupancy rates that were very close to those from priority blocks for all species, with the relationship between actual and predicted very close to the identity line (Figure 8).
A map of predicted change in block occupancy of the Carolina Wren in PA between the PABBA I and PABBA II (Figure 9) suggests that most of the isolated instances of apparent loss of block occupancy were likely the result of decreased observer effort in some blocks. The predicted probability of changes in block occupancy provides a clearer map of likely range expansion of this species than the recorded occupancy, even though close to 100% block coverage was achieved in both PABBAI and PABBA II.

| DISCUSSION
Our results suggest that CAR models incorporating coarse landscape and effort effects are successful at predicting species' occupancy probabilities in bird atlas blocks with little to no observer effort.
Further, as the landscape covariates added rather little (if any) predictive power for our test species, models incorporating only spatial and effort effects may provide adequate models that circumvent a considerable amount of GIS-based analysis required to extract landscape covariates, and the subsequent model selection required to identify the best predictors. However, because our model testing was limited to only six species in a relatively homogenous state (all of Pennsylvania is within the Temperate and Broadleaf Mixed Forest Biome; Olsen et al., 2001), we caution against assuming that our findings would apply to all species and regions. The reason that our CAR models had high predictive power, even when landscape covariates were not included, was likely due to the fact that landscape covariates were highly spatially autocorrelated between adjacent atlas blocks; hence, the spatial component of the model accounted for large-scale patterns in land cover.
The model testing based on various percentages of training data suggests that our CAR models would be applicable to bird atlas projects with incomplete coverage. Even for very sparse data (e.g., 25% training data for Henslow's Sparrow, see Fig. S1.5), our predicted probability of occupancy map provided a good approximation of actual species' distributions. The main failing of our models was an under-prediction of isolated block occurrences that were outside of the species' core range within the state (e.g., Fig. S1.5). However, it is likely that for many species, isolated block occurrences away from the species' core ranges represented small and temporally erratic populations. Hence, if bird atlas data are to be utilized for conservation planning, correctly demarcating core species' ranges is critical (Rondinini, Wilson, Boitani, Grantham, & Possingham, 2006).
By correcting for survey effort, the number of species assessed that expanded their range (block occupancy) rather than show a range contraction between PBBA I and PBBA II changed sufficiently to put an entirely different complexion on atlas findings. Recorded data suggested that species showing increased block occupancy outnumbered those showing decreased block occupancy by more than two to one (2.59:1), but after correcting for effort, the ratio was much closer to parity (1.25:1). The potential effects of not correcting for survey effort to evaluate range shifts have been documented by others (Kujala et al., 2013). Our analysis supports the need for SDMs that incorporate variation in observer effort (MacKenzie et al., 2006) to correctly measure range shifts.
While our methods show that spatial models can account for variation in observer effort, there are some limitations to our analysis.
While the number of effort hours is correlated with the number of species detected in an atlas block, there are several other factors that could influence the probability that any given species is detected, including the efficiency and level of prior experience of observers, the number of individual visits within and between years, the diel distribution of survey effort, and the spatial distribution of effort within a given block. Observer effort may also be influenced by habitat diversity, with more effort required to survey blocks with diverse habitats.
Another potential limitation to our spatial models is the likely presence of anisotropy-that is, directional dependent spatial relation-

ships. The Valley and Ridge Physiographic Province of south-central
Pennsylvania and much of West Virginia has a pronounced southwest to northeast topography, a result of the weathering of belts of rocks from repeated by folding and faulting (Fenneman, 1938). This topography has a direct impact on land use, with farmland and human development F I G U R E 5 Comparison of predictive performance on test data, as measured by the Point Biserial Correlation coefficient (CORR) of models for six species based on 2nd Pennsylvania Breeding Bird Atlas data (2004)(2005)(2006)(2007)(2008)(2009) dominating the valleys, and forests on the ridges, and hence valleys and ridges have markedly different habitats (Wilson et al., 2012).
Although we included some model selection in our analysis, the approaches that account for spatial variation in survey effort, it is critical that effort is comprehensively and accurately quantified.
Volunteer/surveyor effort hours is now documented by most bird atlas projects, but other ways of measuring effort, such as distance travelled through the sampling unit (Robertson et al., 2010), or species accumulation lists (Moreno & Halffter, 2000). With the increasing use of online data capture for atlas projects (Robertson et al., 2010), the requirement to include a measure of survey effort with each data submission is a simple addition to online data capture forms.
Our analysis of the PABBA II and WVBBA II data revealed that total effort hours may not be sufficient for producing effortcorrected SDMs for species that are active at specific times of the days, most notably nocturnal species. While the time of day of observations was required, along with overall effort hours in the online data submission portal for PABBA II, we suspect that nocturnal effort hours were under-reported (Wilson et al., 2012). We therefore suggest that the importance of parsing daytime and nocturnal hours is emphasized in the future atlas efforts, through communication with surveyors and through careful development of recording forms/online portals to document effort hours accordingly. This would also allow for the application of occupancy-detection models, which may be especially useful for scarce or difficult to detect species (Guillera-Arroita, 2017).
It is not possible to state a broadly applicable minimum requirement for survey effort and block coverage from our analysis. The minimum requirement would depend to some extent of habitat heterogeneity, species richness, and species' densities. However, the CAR models that we have employed work best when unsurveyed blocks are adjacent to blocks with data-hence, large tracts of unsurveyed blocks should be avoided. We suggest that our methods be applied to different regions, and atlases with a variety of grid sizes and coverage, to assess their general applicability. Finally, we encourage data analysts to report the CPU time required to run SDMs as a matter of course, thereby enabling managers of bird atlases to adequately budget for data analysis following data collection.

ACKNOWLEDGMENTS
Both the Pennsylvania and West Virginia Atlas projects were completed by large numbers of volunteer fieldworkers and coordinators, who we thank for their tireless dedication. PBBA II was funded principally by the Pennsylvania Game Commission with support from grants administered by the U.S. Fish and Wildlife Service, Wildlife and Sport Fish Restoration Program. WVBBA II was sponsored by the West Virginia Division of Natural Resources, Wildlife Resources Section. We are most grateful to two anonymous referees who provided very useful and insightful comments, which greatly improved our manuscript.