Predictive multi‐scale occupancy models at range‐wide extents: Effects of habitat and human disturbance on distributions of wetland birds

Predicting distributions is fundamental to ecology, yet hindered by spatially restricted sampling, scale‐dependent relationships and detection error associated with field surveys. Predictive species distribution models (SDMs) are nonetheless vital for conservation of many species. We developed a framework for building predictive SDMs with multi‐scale data and used it to develop range‐wide breeding‐season SDMs for 14 marsh bird species of concern.


| INTRODUC TI ON
Predicting the spatial distribution of species is a fundamental task in ecology and biogeography (Guisan & Thuiller, 2005;Guisan et al., 2013). Yet, accurate prediction from species distribution models (SDMs) is a difficult problem that can be derailed by challenges common to ecological studies. First, many SDMs are based on spatially restricted sampling that achieves limited coverage of environmental gradients, and thus, data do not represent the range of conditions inhabited by a species, resulting in poor transferability (i.e. models may extrapolate poorly in space and time; Wenger & Olden, 2012;Yates et al., 2018). Second, modelled relationships are often sensitive to the measurement scales of occurrence and environmental covariate data (Wheatley & Johnson, 2009;Wiens, 1989) and tools for identifying optimal scales are still being developed (Chandler & Hepinstall-Cymerman, 2016;McGarigal, Wan, Zeller, Timm, & Cushman, 2016;Miguet, Fahrig, & Lavigne, 2017;Stevens & Conway, 2019). Third, field observations of animals are often corrupted by measurement error due to the failed detection of cryptic organisms (Kéry, 2002;MacKenzie et al., 2002), resulting in biased predictions if heterogeneity in detectability is not accounted for (Gu & Swihart, 2004;Guillera-Arroita et al., 2015;. Despite these challenges, predictive SDMs remain a vital component of many conservation programs (Guisan et al., 2013;Scott et al., 2002) and tools for addressing these problems are thus necessary.
Prediction is often a fundamental objective of SDM applications (Guisan et al., 2013), yet there is no consensus methodology for building and selecting predictive SDMs from multi-scale environmental covariate data (Elith & Leathwick, 2009;McGarigal et al., 2016). Studies of species-habitat relationships are commonly local or regional in nature and assess the effects of environmental attributes at relatively fine scales, and thus, the interplay of fine-scale and broadscale habitat and human disturbance features affecting species distributions is often poorly understood and not accounted for in predictive SDMs (Hobbs, 2003;Jackson & Fahrig, 2015;Wheatley & Johnson, 2009). Multi-scale analyses that seek to identify the best spatial scale for modelling species-environment relationships are desirable and thus commonly advocated (McGarigal et al., 2016;Wiens, 1989). The process of identifying the most appropriate scales for measuring species-environment relationships is made even more challenging, however, by the large number of hypothesized covariates that are commonly available to researchers as a result of remote sensing and geographic information system technologies (Guisan & Thuiller, 2005;Rushton, Ormerod, & Kerby, 2004). Selection of SDM structures (both covariates and their scales) for prediction must therefore balance the fit of models to existing data with the generality needed to predict to unsampled locations (Heikkenen, Marmion, & Luoto, 2012;Wenger & Olden, 2012;Yates et al., 2018). Modelling approaches that optimize model complexity for the explicit purpose of spatial prediction are becoming more common (e.g. Wenger & Olden, 2012), especially for SDMs that predict the apparent distribution of organisms (i.e. the mathematical product of occupancy and detection probabilities; Kéry, 2011;Lahoz-Monfort et al., 2014). Yet employment of model selection techniques that explicitly optimize SDM complexity for predicting new data while also accounting for failed detection remains rare (but see Broms, Hooten, & Fitzpatrick, 2016;Stevens & Conway, 2019).
Despite their technical challenges, predictive SDMs aide conservation decision-making for many species of management and conservation concern, which often includes wetland and riparian species. Wetlands and riparian areas are among the most biologically productive and diverse ecosystems, yet wetlands are highly degraded and threatened globally (Bedford, Leopold, & Gibbs, 2001;Brinson & Malvárez, 2002;Tockner & Stanford, 2002). Many species of wetland-dependent birds have suffered population declines and range contractions as a consequence of habitat modification and degradation (Conway & Sulzman, 2007;Naugle, Johnson, Estey, & Higgins, 2001;Quesnelle, Fahrig, & Lindsay, 2013). Secretive marsh birds (marsh birds) are a cryptic group of species that are emergentvegetative wetland obligates, and for which data-driven SDMs (i.e. built using analysis of field data and not simply expert opinion) to predict their distributions over broad extents are lacking. Substantial concern over marsh bird conservation exists within the USA (Conway & Timmermans, 2005; U.S. Fish & Wildlife Service, 2005;U.S. Fish & Wildlife Service, 2008), and multiple species are listed as threatened or endangered at the state, national and international levels (COSEWIC, 2002;Diario Oficial de la Federacion, 2002). As such, predictive SDMs built using field records of marsh bird occupancy are needed to identify existing threats and inform strategic conservation of these birds over broad spatial extents.
Aside from the obvious reliance of marsh birds on emergent wetland vegetation at fine scales, biologists know surprisingly little about the key habitat components necessary for marsh bird persistence, or how their distributions are shaped by anthropogenic modification of landscapes and watersheds. Like many other species, studies of marsh bird habitat relationships have been mostly local or regional in scope and focused on the effects of fine-scale vegetation (i.e. within 100 m of sampling points) on space use (e.g. Budd & Krementz, 2010;Conway, Eddleman, Anderson, & Hanebury, 1993;Darrah & Krementz, 2008;Darrah & Krementz, 2010;Darrah & Krementz, 2011;Lor & Malecki, 2006;Winstead & King, 2006).
These studies generally lack the contrast of data collected over broadscale disturbance gradients needed to predict distributions across a species range. As an example, studies have often been conducted at wetlands containing the necessary emergent vegetation, Bayesian model selection, habitat model, hierarchical occupancy model, marsh bird, scale dependence, species distribution model but also within watersheds that were already heavily modified by agriculture, human development, modification of hydrologic regimes or all of these factors (e.g. Alexander & Hepp, 2014;Baschuk, Koper, Wrubleski, & Goldsborough, 2012;Bolenbaugh, Cooper, Brady, Willard, & Krementz, 2012;Darrah & Krementz, 2008;Darrah & Krementz, 2010;Darrah & Krementz, 2011;Glisson, Brady, Paulios, Jacobi, & Larkin, 2015;Krementz, Willard, Carroll, & Dugger, 2016;Valente, King, & Wilson, 2011;Winstead & King, 2006). Wetland degradation over broad spatial scales has been hypothesized as a driver of population declines for marsh birds (Eddleman, Knopf, Meanley, Reid, & Zembal, 1988), but little work has quantified these effects. For example, we are unaware of studies that assessed impacts of hydrologic modification on distributions of marsh birds over broad spatial extents (e.g. across a species range), despite the wellestablished influence of hydrologic modification on the structure and function of wetland ecosystems (Bedford et al., 2001;Tockner & Stanford, 2002) and evidence that water levels can impact finescale use of habitat by marsh birds Roach & Barrett, 2015).
The interplay of fine-scale and broadscale habitat and disturbance features that shape breeding distributions of marsh birds is also poorly understood. Studies have attempted to tease apart effects of habitat over a range of spatial scales, but they typically measured different attributes at each scale (e.g. Glisson, Brady, et al., 2015;Harms & Dinsmore, 2013;Naugle et al., 2001;Pickens & King, 2012;Roach & Barrett, 2015;Valente et al., 2011), thus confounding scalar effects with effects of the habitat features themselves (McGarigal et al., 2016;Wheatley & Johnson, 2009). As a consequence, prioritizing wetlands for marsh bird conservation over broad extents via extrapolation of site-specific and fine-scale results assumes that results are transferable to other regions and scale up linearly to broader spatial extents; these assumptions may not be valid and, hence, are unlikely to result in reliable prediction (Miller, Turner, Smithwick, Dent, & Stanley, 2004;Roach, Hunter, Nibblelink, & Barrett, 2017;Schneider, 2001).
Current information gaps mean that biologists are ill-equipped to identify the best remaining habitat for marsh birds across their ranges. Predictive SDMs built using multi-scale covariate data collected over existing environmental and disturbance gradients are therefore needed to predict marsh bird occupancy across large landscapes. Moreover, while marsh bird populations are of significant conservation concern within the USA, the status of many species is uncertain (i.e. range contracting or expanding). To address these needs, our objectives were to (a) develop predictive SDMs for each of 14 priority marsh bird species using data collected from across their breeding ranges within the continental USA, (b) assess the impacts of wetland features and anthropogenic disturbances measured over a variety of spatial extents on marsh bird distributions and (c) assess empirical evidence for contraction of range-wide breeding distributions for each marsh bird species during our 14-year study. To accomplish these objectives, we built optimally predictive multi-scale SDMs for 14 species of marsh birds and selected covariates and their spatial extents to optimize out-of-sample prediction across the U.S. breeding range of each species. We also accounted for spatial and temporal heterogeneity in detectability during field sampling through use of hierarchical Bayesian occupancy models. Thus, we took a holistic, multispecies and multi-scale approach to building predictive SDMs and identifying threats to marsh birds. Moreover, our approach for developing optimally predictive occupancy models using multi-scale environmental covariate data is generalizable and can be applied to any species.

| Study area
We included survey data from 8,457 sites throughout the continental USA that were contained within the geographic range of 14 marsh bird species of conservation concern: pied-billed grebe and yellow rail (Coturnicops noveboracensis). We used ranges from the Gap Analysis Program (GAP) to identify the geographic range extent for each species (Gergely & McKerrow, 2013) and included survey points (sampling described below) that fell within the range identified by the GAP model for that species. Thus, study areas were species-specific and were effectively the entire U.S. portion of the breeding range for each of the 14 species. We also visually inspected survey sites where a species was detected outside of its GAP range in ArcGIS (ArcMap 10.4.1, ESRI, Redlands, CA), and included these extralimital sites if they were near the range boundary or fell within plausible gaps of the range for a given species (Figures S1.1-S1.14).
Our approach for including extralimital sites kept us from excluding plausible breeding-season detections, given the imperfect accuracy of GAP range models. However, the number of extralimital sites relative to sample sizes used to build SDMs for each species was miniscule (<1% of data points for each species), and thus, inferences were primarily determined by surveys at sites within the GAP range for each species.
Our focus was on predicting the distribution of a species within their breeding range as a function of environmental covariates, which is distinctly different from identifying the factors that determine range boundaries (Brown, Stevens, & Kaufman, 1996).
Conservation investments for marsh birds, like many other species, are limited. This reality provided the impetus for developing models using samples located within the breeding range for each species, as opposed to using all samples within the continental USA for developing species-specific SDMs (as would be appropriate if studying range boundaries). We focused on predicting occupancy within the breeding range because necessary components of strategic conservation for these birds include identifying important areas within each species' range and restoration of breeding habitat via spatial targeting of management efforts over broad extents.

| Data collection
Detection-non-detection data from marsh bird surveys were collected by >50 government agencies and non-governmental organizations from 1999 to 2012 at 8,457 sites throughout the USA (see Appendix S1). Field surveys were completed during the breeding season (1 March-15 July) following the North American standardized marsh bird monitoring protocol (Conway, 2011). Most surveys included both a passive and a call-broadcast segment. Broadcasting marsh bird calls through a speaker increases detection probabilities by eliciting vocal responses from birds occupying the area that otherwise are likely to go undetected (Conway & Gibbs, 2005;. Field surveyors typically visited sites repeatedly over a breeding season. While field sampling was spatially extensive and included a large number of locations for each species, most of these sites were sampled for ≤2 years during the study (Table S1.1) and the frequency of sampling varied because of the large number of organizations involved in data collection. Not all of the 8,457 sites were located inside the range of each species, and thus, the number of sites included in analyses varied among the 14 species (903-8,457; Table S1.1; Figures S1.1-S1.14). Also, analyses for six species used data collected over a shorter temporal extent (10-13 years) because some years did not include sampling locations within all 14 species' ranges (Table S1.1).
For each visit at a given site within each year, surveyors recorded detection for each species if it was detected during the visit and recorded non-detection if it was not detected during the visit. Replicate visits within each year allowed estimation of occupancy-habitat relationships while also explicitly accounting for failed detections that commonly occur during field surveys (Kéry, 2002;MacKenzie et al., 2006). Because marsh bird detection probabilities are influenced by a number of factors, both within and beyond the control of investigators, we assessed four survey-level attributes recorded during field sampling as hypothesized covariates in species-specific detection models (Conway & Gibbs, 2011;Conway, Sulzman, & Raulston, 2004;Nadeau, Conway, Smith, & Lewis, 2008) We obtained spatial data to build SDMs for marsh birds as a function of wetland habitat and anthropogenic disturbances measured over a variety of spatial extents (using the focal-patch-landscape design described by Brennan, Bender, Contreras, & Fahrig, 2002; covariate data and spatial analyses described in Appendix S2). We used data from the National Wetland Inventory (NWI) to characterize attributes of wetlands, and data from the National GAP Land Cover Dataset, version 2 (GAP) to describe anthropogenic disturbances surrounding field sites. We considered 21 wetland covariates representing combinations of NWI wetland system and subsystem (wetland system-subsystem), wetland classes, water regimes and special modifiers as variables to predict the distribution of marsh birds (Tables S2.1-S2.3). We also considered nine GAP land cover variables related to anthropogenic disturbance ( and one variable that described cover of non-native vegetation (non-native cover). Importantly, all of the NWI and GAP variables (Appendix S2) measured attributes of wetlands (e.g. water levels and flooding regimes, vegetation coverage) and anthropogenic disturbance (e.g. human development and agriculture) that are known to directly affect the structure and function of wetland ecosystems, and hence the distribution of organisms within wetlands (Bedford et al., 2001;Tockner & Stanford, 2002;Zelder & Kercher, 2005). Thus, all covariates considered were ecologically plausible factors affecting the distribution of breeding marsh birds, despite the fact that the impacts of many of these variables on distribution of marsh birds over broad extents has received limited study.
Relationships between an animal's use of an area and the attributes of that area are often sensitive to the observational scales at which data are collected (Hobbs, 2003;Levin, 1992;Mayor, Schneider, Schaefer, & Mahoney, 2009;Wiens, 1989). Thus, we evaluated occupancy-covariate relationships at multiple spatial extents surrounding each site. We measured the amount of cover for each wetland and disturbance variable at three spatial extents (100-m, 224-m and 500-m radii buffers) using the focal-patch-landscape design (Brennan et al., 2002) and circular moving-window analyses.
Attributes of each site were described at a spatial grain of 30 m and assigned proportional coverage values for each covariate measured over each of the three extents. Multi-scale analyses such as these can result in overlap of spatial-extent buffers, resulting in broadextent covariate measurements around neighbouring sites that are similar in value (Zuckerberg et al., 2012). This similarity may cause a reduction in covariate variance as the measurement scales increase, which some have suggested may result in analyses favouring smallscale covariate measurements as optimal (Lipsey, Naugle, Nowak, & Lukacs, 2017). We did observe small reductions in covariate variance with increased measurement extent (i.e. from 100-m to 500-m buffers; Table S1.2), but this did not result in favouring of fine-scale habitat measurements for predictor variables in our analyses (see Section 2.3 and Section 3 below).
In addition to the local variables, we also measured 10 broadscale disturbance variables at multiple levels within watersheds surrounding each site (Table S2.5). We used one variable describing the number of current and former mines and mineral processing plants to characterize the effects of mining on water quality, one variable describing the number of pollutant release and transfer reporting facilities to characterize the possibility for contamination, and six hydrologic-modification variables related to the storage capacity of water by upstream dams, as well as the magnitude of restriction and interruption of natural flow regimes by dams for each site (see Appendix S2). We also measured the two composite GAP variables for development and agriculture (described above) to characterize anthropogenic disturbance within a watershed. Moreover, each of these variables was measured at two spatial levels: catchments and networks. Each catchment contained the drainage area for a single surface water reach. However, catchments receive water from upstream sources, and thus, wetlands within a catchment may be influenced by conditions upstream within a watershed. Sampling points were thus also contained within a broader watershed network that also included upstream wetlands (hereafter the network) to account for upstream dynamics. Disturbance variables measured at the catchment and network levels were described at a spatial grain of 30 m (i.e. similar to local variables but calculated over broader extents).

| Statistical analysis
We used a hierarchical formulation of the multiseason occupancy model (Mackenzie et al., 2006) to predict the probability of occupancy as a function of wetland and disturbance variables. We built models to predict net occupancy probability (ψ) as a function of covariates (p. 312, Royle & Dorazio, 2008) and were not interested in modelling local extinction and colonization dynamics explicitly.
Hence, we modelled occupancy and detection processes as: This parameterization assumed the true occupancy state (z) for each site (i) and year (t) was a random variable arising from a stochastic Bernoulli process, where y i,j,t represents the detection (1) or non-detection (0) event at visit j, whose distribution is also Bernoulli but governed by the occupancy state and probability of detection on that survey occasion (p i,j,t ). As such, the temporal grain of occupancy was a single breeding season, whereas the temporal extent was the entire study duration for that particular species (10-14 years). We did not consider temporally dynamic occupancy models where z i,t+1 was dependent in a Markovian fashion to z i,t because sites were sampled inconsistently across years, and most sites were sampled for only 1-2 years (Table S1.1). Thus, we did not believe our spatially extensive data were adequate for temporally dynamic models of marsh bird occupancy.
This statistical model treats the true occupancy at a site (i.e. present or absent) as closed and unchanging within a single breeding season, but assumes the status can change among seasons.
In addition to modelling the relationship between occupancy and habitat, occupancy models explicitly model the process of detection during field surveys. Ignoring failed detection during presence-absence surveys results in mathematical confounding of detection and occupancy probabilities, and the resulting models predict the joint probability of where a species both occurs and is likely to be found during sampling (i.e. apparent distribution; Guillera-Arroita, Lahoz-Monfort, MacKenzie, Wintle, & McCarthy, 2014;Kéry, 2011;Lahoz-Monfort et al., 2014). This problem of confounding detection with occupancy is particularly serious for marsh birds because they have notoriously low detection probability (Conway & Gibbs, 2011). Thus, we built hierarchical occupancy models that allow for mathematical separation of occupancy and detection probabilities (Guillera-Arroita et al., 2015) to predict the true distribution of marsh birds without making two restrictive but commonly made assumptions that are not valid for marsh birds and many other animals: (a) that occupancy status did not change over the study duration, and (b) that detection probability did not change among surveys.
We used a Bayesian statistical paradigm for all model fitting and inferences (model fitting details provided in Appendix S3) and used model selection to optimize the model structure and spatial extent of covariates for prediction. We randomly split sites into training and testing data for each species, where training data were used to fit models and estimate posterior distributions of parameters, and testing data were held out to assess predictive abilities for each model. We used 60% and 40% of sites for the training (542-5074 sites, depending on species) and testing (361-3385 sites, depending on species) data, respectively, where all data at a site were included in the set to which a given site was assigned (Table S1.1). All model ranking was conducted using the logarithmic scoring rule (hereafter log scoring rule ;Broms et al., 2016;Gelman, Hwang, & Vehtari, 2014;, where log scores for each model were calculated from the posterior predictive density using the testing data set (Stevens & Conway, 2019;Appendix S3). This was equivalent to selecting a model based on its ability to maximize the joint probability of observing the testing data (i.e. log-likelihood evaluated at the testing data for every combination of model parameters contained within the posterior distributions; Gelman et al., 2014;, which consisted of samples from across the entire U.S. breeding range of each species. Selecting models based on fit of their predictions to out-ofsample data is a key component of predictive model selection Houlahan, McKinney, Anderson, & McGill, 2017), and our approach to model selection optimized spatial prediction across the breeding range of each species. This approach regulated model complexity explicitly using out-of-sample predictive performance, thus naturally alleviating the problem of degraded predictive performance through overfitting of training data that is common to many SDMs (Heikkenen et al., 2012;Wenger & Olden, 2012). As such, this approach to model selection is conceptually different than selecting the appropriate level of complexity based only on within-sample fit to training data (i.e. selection via a model's explanatory ability), or by attempting to balance fit and complexity using information criteria, which can result in overparameterized models with degraded out-of-sample (Broms et al., 2016;Stevens & Conway, 2019). Nonetheless, we also sought to make ecological inferences about effects of covariates identified for their ability to predict marsh bird occupancy, as is common in ecological studies when prediction is the goal (Broms et al., 2016).
Such inferences are reasonable given the occupancy modelling framework (that separates detection and occupancy probabilities) and the suite of ecologically relevant covariates that we used.
Moreover, prior to including occupancy covariates in models, we selected a detection model for each species based on its log scores from the intercept-only occupancy model (see Appendix S4) and then used the best detection model for examining covariates that affected occupancy.
We identified optimal multi-scale SDMs for predicting rangewide occupancy for each marsh bird species. We first conducted scale optimizations for each covariate by fitting univariate occupancy models for each local variable at all three extents (100, 224 and 500 m) and then selecting the optimal extent for each variable (see Appendix S5). Next, we identified the best scale-optimized habitat covariate within each of the four groups of variables representing wetland types and disturbance (wetland systemsubsystem, class, water regime and special modifiers and GAP disturbance variables). This resulted in four scale-optimized covariates that were used to create a local-habitat model set for each species. Specifically, the model set consisted of all additive combinations of each scale-optimized covariate, for a total of 15 local-habitat models, and the best habitat model was selected from this set for each species. In addition, we identified the best broadscale disturbance variable by fitting univariate models for each covariate at both the catchment and network levels and identifying the best variable-level combination for each species.
We then created a final model set that included the best localhabitat model from the prior step plus the best catchment or network level disturbance variable, as well as hypothesized temporal trends in occupancy over the study duration that considered linear, quadratic and cubic trends (Appendix S5), and selected the best model for each species. Importantly, habitat and disturbance covariate data sets were not changing over the study duration (i.e.

Species Top model Bayes-P
Pied-billed grebe − riv.lp (224)  TA B L E 1 The most supported hierarchical occupancy model predicting range-wide breeding distribution within the continental USA for each of the 14 marsh bird species they were temporally static), and thus serve to aide spatial prediction of occupancy rather than temporal assessment of species persistence at a site. This limitation is due to the lack of available time-specific measurements of wetland and disturbance conditions over the study duration at the extent of the continental USA, which is a common limitation of SDMs developed over very broad spatial extents (Elith & Franklin, 2013). In summary, we used a multistage Bayesian model selection procedure to identify which scale-optimized covariates were included in the final model set for each species, and selected the top model from this set as the model that was best at predicting out-of-sample data (see Figure   S1.15 for visual depiction of process). Lastly, we refit top models to the entire data set for each species (combining training and testing data;  to generate final parameter estimates and to evaluate goodness-of-fit by generating Bayesian P-values based on the sum of absolute residuals (pgs. 246-249 in Kéry, 2010).

| RE SULTS
Optimally predictive models included wetland and human disturbance variables measured over all spatial extents (100, 224, 500 m) and levels (catchment, network), and the variables differed among species ( effects for ≥1 of these were found for 10 species (Table S4.2). In addition, models for eight species included interactions between Julian date and time of day, and thus, joint, nonlinear effects of covariates on detection were common but species-specific. Top models fit the data well for 13 species (all except sora; Table 1).
Watershed-level hydrologic disturbance variables also typically had negative effects (Figure 3a), and agriculture within a watershed had either negative (common gallinule and king rail) or neutral effects (pied-billed grebe and yellow rail; Figure 3b). However, some exceptions to these patterns did emerge (Figures 1-3). F I G U R E 1 Effects of natural wetland vegetation (a), agriculture (b) and human development (c) on breeding-season occupancy (ψ) within the continental USA for marsh birds. Solid lines represent expected occupancy probabilities when all additional covariates were held at their median values, and colours represent individual species. Species affected by natural vegetation (scrub-shrub wetland; a) were pied-billed grebe (light brown), American coot (grey), king rail (light green), Ridgway's rail (light blue), sora (dark blue) and black rail (black). Species affected by agriculture (b) were least bittern (red), common gallinule (orange), limpkin (magenta) and yellow rail (yellow). Species affected by human development (c) were pied-billed grebe (light brown, linear decreasing), American bittern (dark green), American coot (grey), king rail (light green), clapper rail (dark brown, nonlinear decreasing), Virginia rail (dark red) and black rail (black). Specific covariates and scales of effect for each (100, 224 or 500 m) are provided in Table 1 In Distributions of marsh birds were also affected by wetland and disturbance variables at all five spatial extents (Tables 1 and F I G U R E 2 Effects of natural water regimes (a) and human-modified water regimes (b) on breeding-season occupancy (ψ) within the continental USA for marsh birds. Solid lines represent expected occupancy probabilities when all additional covariates were held at their median values, and colours represent individual species. Species affected by natural water regimes (a) were American bittern (dark green), American coot (grey), purple gallinule (purple) and king rail (light green). Species affected by human-modified water regimes (b) were least bittern (red, linear decreasing), common gallinule (orange), limpkin (magenta), clapper rail (dark brown), sora (dark blue), Virginia rail (dark red, linear increasing), black rail (black) and yellow rail (yellow). Specific covariates and scales of effect for each (100, 224 or 500 m) are provided in Table 1 F I G U R E 3 Effects of watershed-level disturbance associated with modification of hydrology (a), agriculture (b) and human development (c) on breeding-season occupancy (ψ) within the continental USA for marsh birds. Solid lines represent expected occupancy probabilities given all additional covariates were held at their median values, and colours represent individual species. Species affected by modification of hydrology (a) were American bittern (dark green), least bittern (red, linear decreasing), Ridgway's rail (light blue, nonlinear decreasing), sora (dark blue, linear decreasing) and Virginia rail (dark red, nonlinear decreasing). Species affected by agriculture (b) were pied-billed grebe (light brown), common gallinule (orange), king rail (light green) and yellow rail (yellow). Species affected by human development (c) were American coot (grey), limpkin (magenta) and purple gallinule (purple). Values along the x-axis represent standardized units for each hydrologicmodification variable (a). Specific covariates and level of effects for each (catchment or network) are provided in Table 1 S5.1-S5.56), and all species were affected by features at >1 extent.
Occupancy of pied-billed grebe, American bittern, American coot, Virginia rail, black rail and yellow rail was most affected by localhabitat features at the broadest spatial extent (500 m), whereas occupancy of least bittern, limpkin and purple gallinule was most often affected by habitat features at fine extents (100 m).
However, most species responded to one or more attributes at all three local scales, and the patterns varied among species (Table 1).
Importantly, temporal trends in occupancy were included in top models for nine species, and often included nonlinear trends over the study duration (Table 1)

| D ISCUSS I ON
We built SDMs for 14 species of marsh birds that incorporate wetland features and anthropogenic disturbances while also addressing challenges that hinder predictive performance of many SDMs: (1) regulation of model complexity based on out-of-sample predictive ability, (2) range-wide sampling over a broad range of conditions, (3) multi-scale analyses with spatial scales optimized for prediction and (4) direct modelling of heterogeneity in species-specific detectability.
These four issues have often been addressed in isolation or in pairs (e.g. Broms et al., 2016, integrated #1 and #4 above in a multispecies model, but not #2 or #3), yet we are unaware of another study that integrated all of these elements for a wide-ranging animal at a continental scale, in particular using occupancy models that include detection error. Our process can be used with Bayesian SDMs for any species to optimize model complexity and the ecological neighbourhood size (Addicott et al., 1987) of species-environment relationships for prediction. Moreover, this approach is conceptually and empirically different from the common tactic of selecting models via within-sample data followed by post hoc predictive validation Houlahan et al., 2017;Stevens & Conway, 2019).
Our models are also among the first SDMs built for marsh birds at the extent of their entire breeding ranges using data collected in the field (as opposed to expert opinion; see also Glisson, Conway, Nadeau, Borgmann, & Laxson, 2015). We provide evidence that marsh bird occupancy is affected by landscape modification, both locally and within watersheds. Not all localities within a geographic range are equally suitable for a given species, and our models can be used to deduce predictions of occupancy probability as a function of habitat and disturbance variables that improves upon binary range models that currently exist to support conservation of these birds over large extents (e.g. GAP models). As such, our models can assist decision-makers in focusing limited resources to inform conservation and monitoring efforts for marsh birds. We also demonstrate the complexity of species-specific detection error that occurs during field sampling, which included nonlinear effects and covariate interactions. Accounting for failed detection and its contributing factors is thus essential to avoid biased predictions when building SDMs for wetland-dependent birds.
We created SDMs to maximize spatial predictive ability across the range of each marsh bird species. Inadequate spatial transferability is a common problem for SDMs that limits a model's predictive ability in new areas where data were not collected; a problem F I G U R E 4 Time trends for breeding-season occupancy (ψ) within the continental USA for each of the nine species of marsh birds whose optimal model contained a time trend. Lines represent expected occupancy probabilities given all additional covariates were held at their median values, and colours indicate species with increased (a) or decreased (b) ψ over the study duration (1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012). Species with increased ψ were pied-billed grebe (light brown), least bittern (red), common gallinule (orange) and sora (dark blue). Species with decreased ψ were American bittern (dark green), American coot (grey), purple gallinule (purple), Ridgway's rail (light blue) and black rail (black) that occurs when models are generated from sampling a subset of the environmental conditions over which a species is found (Elith & Leathwick, 2009;Elith & Franklin, 2013). When this occurs, prediction involves extrapolation in both geographic and environmental space and commonly results in poor performance despite metrics that may imply good fit of the model to training data (Heikkenen et al., 2012;Sequeira et al., 2016;Yates et al., 2018). We did not test spatial transferability by using regionally distinct training and testing data (Wenger & Olden, 2012), as our goal was not to use data from one subregion to predict occupancy in another. Instead, we selected models based explicitly on their predictive performance using out-of-sample data collected from across each species' range within the USA. The range-wide extent of field sampling meant that these analyses were conducted, by design, to minimize problems of spatial transferability within the breeding range by selecting models to optimize fit of predictions to out-of-sample data collected from across that range. Consequently, resulting models have the best spatial predictive capacity within the constraints of the covariates available over such broad extents, and predictions that are expected to result in interpolation when applied broadly, rather than extrapolation to novel conditions.
In addition, we identified the optimal spatial extents for a suite of species-environment relationships used to predict marsh bird occupancy. Ecological theory and empirical studies demonstrate that animal responses to environmental conditions are sensitive to scales of observation (Wheatley & Johnson, 2009;Wiens, 1989) and that local space use is often affected by conditions operating over broader spatial extents (Addicott et al., 1987;McGarigal et al., 2016).
We selected optimally predictive spatial extents for wetland and disturbance covariates, and then multi-scale models that included covariates at their optimal extents. Species responded individualistically to wetland conditions and disturbances across numerous scales, including conditions within focal patches (100, 224 or 500 m) and entire watersheds. Each species also responded to conditions at more than one scale, and thus, arbitrary selection of a single scale would have resulted in suboptimal prediction. These results highlight the benefits of a multi-scale approach to characterize habitat suitability, as opposed to the common approaches of measuring habitat relationships only at fine spatial scales or confounding changes in measurement extent with changes to the variables measured. More specifically, our results imply that the emphasis on effects of finescale vegetation on breeding marsh birds has likely missed other important drivers of occupancy across broad spatial domains.
Our models provide an important step towards prediction, and ultimately conservation, of areas with suitable breeding habitat for marsh birds across the continental USA. Existing predictions of marsh bird habitat are primarily based on expert opinion and collections of fine-scale and regionally focused studies (Bolenbaugh et al., 2012;Darrah & Krementz, 2008;Roach & Barrett, 2015;Roach et al., 2017). Prediction of suitable habitat over broad extents has thus commonly relied on spatial-temporal extrapolation and assumed that fine-scale habitat relationships scale up linearly across a landscape and that region-specific models are transferable. Not surprisingly, such predictions of suitable habitat have proven difficult to validate with field data (Bolenbaugh et al., 2012;Roach et al., 2017). In contrast, we used detectionnon-detection data collected throughout each species' breeding range to build predictive SDMs with their structure explicitly selected to optimize the range-wide spatial predictive capabilities for each species. These models provide a rigorous foundation for predicting habitat suitability using existing spatial covariate data measured at their appropriate extents, avoiding cross-scale extrapolation and providing more reliable predictions than models based solely on wetland attributes. Moreover, the covariates we used for modelling occupancy represent attributes known to affect the structure and function of wetland ecosystems. Thus, the interpretation of ecological effects on breeding-season occupancy for marsh birds is appropriate even though our primary emphasis was spatial prediction.
We provide evidence that anthropogenic disturbance, in a variety of forms and over a variety of extents, affects habitat suitability for marsh birds. Agriculture and urban development are among the biggest threats to wetland ecosystems globally, with direct effects from conversion of wetlands for other uses, but also degrading effects on remaining wetlands via fragmentation, changes to hydrology, increased nutrient loads and changes to water chemistry (Bedford et al., 2001;Tockner & Stanford, 2002;Zelder & Kercher, 2005). These changes have resulted in substantial effects on the distribution of plant and animal species that reside within wetlands. Agriculture may benefit wetland birds in some regions or circumstances (e.g. flooded rice fields in the southeastern USA; Czech & Parsons, 2002;Pierluissi & King, 2008), yet loss of wetlands to agriculture and development have been implicated as a threat to marsh birds (Glisson, Brady, et al., 2015;Naugle et al., 2001). Although our results imply the effects of individual disturbances were species-specific, they also provide direct evidence of anthropogenic impacts on the breeding distributions of marsh birds that have been speculated for 30 years (Eddleman et al., 1988).
Modification of hydrology, both locally and within the larger watersheds where birds reside, affects the breeding distributions of marsh birds. Dams, diking, excavation and other physical modifications alter water flows, water depths and seasonality of fluctuating water levels within a watershed and, in turn, affect soil saturation, water chemistry, water temperatures and sediment dynamics that drive distributions of plants and animals (Bednarek, 2001;Tockner & Stanford, 2002;Zelder & Kercher, 2005). Water depth and its fluctuation can directly affect wetland birds in a variety of ways, including through accessibility to food, nesting habitat and escape cover (Ma, Cai, Li, & Chen, 2010;, and consequently, water depth is a reported driver of marsh bird habitat use (Flores & Eddleman, 1995;Harms & Dinsmore, 2013;Krementz et al., 2016;Lor & Malecki, 2006). Lack of natural water fluctuations can facilitate wetland succession by eliminating seasonal disturbances (e.g. high flows), which in turn can affect the composition and structure of wetlands to the detriment of marsh birds (Conway, Nadeau, & Piest, 2010;Winstead & King, 2006).
Thus, the extent of effects caused by hydrologic modifications may reach beyond those sampled by many field studies, extending far downstream of the source(s). Our findings are novel because we demonstrate that hydrologic changes can affect marsh birds via processes that operate at a variety of spatial scales, including local effects as well as effects that extend to the level of water- sheds. As such, we demonstrate that hydrologic modification has multi-scale, watershed-level and species-specific effects on marsh bird occupancy.
Most of our results also corroborate those of previous studies on the distribution and abundance of marsh birds. For example, we expected natural vegetation and water regimes to be beneficial because vegetation characteristics, water depth and flow affect the behaviour and distribution of breeding marsh birds (Alexander & Hepp, 2014;Baschuk et al., 2012;Conway & Sulzman, 2007;Darrah & Krementz, 2008. For example, the interplay of vegetation and water depth can affect the distribution and abundance of prey species (Baschuk et al., 2012), and consequently marsh bird foraging behaviour. Conway and Sulzman (2007) reported that black rails avoided deeper and open water but were commonly located in shallow water (≤2 cm) near transitional plant communities between wetlands and uplands. Natural flow regimes can maintain early successional wetlands beneficial to marsh birds, whereas artificial stabilization of flows can change the structure and composition of important vegetation communities Winstead & King, 2006). Eddleman et al. (1988) suggested that marsh bird conservation should be prioritized at natural wetlands with a high degree of elevation diversity (i.e. a variety of water depths). Glisson, Brady, et al. (2015) provided direct evidence that marsh bird occupancy in Wisconsin may be greater in natural than restored wetlands, and our results suggest that both natural vegetation and flow regimes are often beneficial for breeding marsh birds.
Many of the effects estimated from our models are congruent with existing knowledge of marsh bird ecology, yet some results were not anticipated and warrant further investigation. For example, we found positive effects of scrub-shrub wetlands measured over a variety of spatial extents on five species, with strong effects on occupancy for three of those species (pied-billed grebe, king rail and black rail). Effects of scrub-shrub wetlands on marsh birds reported by previous studies are highly inconsistent. Many studies have aggregated scrub-shrub and forested wetland into one variable representing woody vegetation, making it impossible to isolate the effects of scrub-shrub wetlands (forest is poor quality habitat for marsh birds; Budd & Krementz, 2010). Studies have variously reported negative effects (Darrah & Krementz, 2010), no effects (Budd & Krementz, 2010) and positive effects (Krementz et al., 2016) of this composite woody vegetation variable on marsh birds. Moreover, Conway and Sulzman (2007) reported black rail occurrence was positively correlated with mixed shrub vegetation, and Eddleman et al. (1988) reported that multiple rail species utilize shrubby wetlands during the breeding season. At a minimum, our results call into question the practice of aggregating scrub-shrub and forested wetlands when assessing multi-scale habitat relationships for marsh birds. Our findings also suggest a more refined understanding of the relationships between marsh bird use of wetlands and shrub composition measured over a range of spatial scales is needed.
Perhaps more surprising are the positive relationships we documented between human modification of habitats and occupancy for several marsh bird species. For example, areas with modified water regimes, such as diked and excavated wetlands, were beneficial for four species (clapper rail, limpkin, sora, Virginia rail). Others have suggested these practices may be detrimental to marsh birds, particularly in coastal areas (Ma et al., 2010). We interpret these relationships as providing evidence that man-made and managed wetlands may provide important habitat for these species, and many wetlands managed specifically for wetland birds can be categorized as diked and excavated (e.g. wetlands on National Wildlife Refuge System lands). We also suspect these variables may be acting as proxies for wetland attributes related to the depth and distribution of water during the breeding season in created wetlands, and its subsequent effects on foraging behaviour, nest site selection and habitat use (Eddleman et al., 1988;Legare & Eddleman, 2001;Ma et al., 2010). Less intuitive are the positive fine-scale relationships between occupancy and agriculture (limpkin, common gallinule), and the fine-scale (American bittern) and broadscale (purple gallinule) relationships with human development. Other studies have also reported positive associations between marsh bird occupancy and agriculture (Quesnelle et al., 2013;Valente et al., 2011 (Bolenbaugh, Lehnen, & Krementz, 2011;Darrah & Krementz, 2008 We also assessed empirical evidence for recent changes to breeding ranges for marsh birds and, to our knowledge, are the first to do so at the extent of the entire range of each species within the continental USA. Concern over population declines and range contractions for marsh birds has been prevalent for at least three decades, yet rangewide status assessments for this group are rare. Our results indicate recent range contraction for five species (American bittern, American coot, black rail, purple gallinule, Ridgway's rail). However, we found little evidence of changes to breeding occupancy for several species, and evidence of increased occupancy for four species (common gallinule, least bittern, pied-billed grebe and sora). Population declines and range contraction likely occurred prior to our study, and temporal changes in occupancy may fail to detect abundance declines if the area occupied remains unchanged. Nonetheless, five species had declining breeding-season occupancy over the 14-year study, and those are species of conservation concern. Importantly, these results provide a baseline for which future studies of the status and distributions of marsh birds can be compared, and thus fill a vital information gap for monitoring, conservation and recovery efforts for these birds.
Lastly, our analyses reduced a large set of plausible predictor variables to a smaller set of variables that were used to construct final model sets for predictive comparison. This approach produced models with few covariates relative to size of the initial covariate set, yet it remains possible that other tactics to reduce the number of covariates (or their individual contributions towards prediction) would result in different models, possibly with better predictive abilities. Alternative approaches to variable selection that still focus on out-of-sample prediction are also worthy of consideration for future studies. In particular, formal Bayesian variable selection methods can be employed to reduce large sets of predictor variables, or to use all variables simultaneously while optimizing the shrinkage of uninformative parameters for prediction. For example, continuous model selection techniques like the Bayesian Lasso or ridge regression can include all variables simultaneously and optimize the prior distribution of regression coefficients for prediction in order to regulate the magnitude of each coefficient and therefore its contribution towards prediction O'Hara & Sillanpää, 2009).
These techniques have been successfully demonstrated in ecological applications for developing predictive models of the distribution and abundance of organisms (Gerber, Kendall, Hooten, Dubovsky, & Drewien, 2015;Stevens & Conway, 2019).
While our models provide the first data-driven SDMs built at the extent of the breeding range for most of our 14 species, they also provide a foundation for further refinement to improve understanding of multi-scale habitat relationships and predictions of habitat suitability for marsh birds. Our spatial covariate data sets were limited in their temporal resolution and were only available as time-invariant snapshots to aide spatial prediction. These data are updated periodically over time, however, as are the field-sampling databases that store marsh bird detection-non-detection records.
Our models provide predictions that can be refined as additional data become available, and their Bayesian implementation provides a rigorous foundation for adaptation over time using Bayesian learning (i.e. using posterior distributions from our models as priors to regulate updated parameter estimates; . This approach could also be used for spatial adaptation, effectively enabling regional heterogeneity in relationships (Nice et al., 2019) to be modelled directly by updating our models with localized data.
Further refinement could also include spatially hierarchical analyses (e.g. Lipsey et al., 2017) that define occupancy at multiple nested levels (e.g. pixel, catchment, watershed) and model conditional occupancy processes directly. This could facilitate modelling of crossscale interactions among variables, reduce overlap of buffers used to measure covariate data, and enable study of range boundaries, furthering our understanding of why breeding marsh birds occur in some areas but not in others. Nonetheless, our models provide rigorous predictions of occupancy that were optimized for spatial prediction to take advantage of existing broadscale data on wetland attributes and human disturbances, and consequently will facilitate spatial prioritization of habitat conservation for this sensitive group of wetland birds.

S U PP O RTI N G I N FO R M ATI O N
Additional supporting information may be found online in the Supporting Information section at the end of the article.