Predicting spatiotemporal abundance of breeding waterfowl across Canada: A Bayesian hierarchical modelling approach

Our aim was to develop predictive statistical models for mapping the abundance of 18 waterfowl species at a pan‐Canadian level. We refined the previous generation of national waterfowl models by (a) developing new, more interpretable statistical models that (b) explicitly account for spatiotemporal variations in waterfowl abundance, while (c) testing for associations with an updated suite of habitat covariates.

the direction and strength of species-habitat associations with covariates (Elith & Leathwick, 2009b). GLMs assume specific forms of functional response, but additional flexibility can be provided by adding smoothing terms and/or hierarchical structures (Merow et al., 2014). Although ML and statistical methods have traditionally been opposed, recent SDMs studies have shown that they could be complementarily combined with integrated approaches, such as boosted GLMs (Smith et al., 2019) or sequential ML application for covariate selection and GLM for final fitting (Brownscombe et al., 2019).
Because waterfowl are migratory birds with species-specific habitat preferences, their breeding distribution and abundance patterns are expected to vary not only in space (Barker et al., 2014), but also in time (Afton & Anderson, 2001;Drever et al., 2012). For species distribution modelling, the joint consideration of these two dimensions through dynamic habitat covariates and/or spatiotemporal hierarchical structures is expected to lead to more accurate predictions (Fink et al., 2010;Runge, Martin, Possingham, Willis, & Fuller, 2014). In addition, as the waterfowl count data derived from surveys, such as the WBPHS, are sampled according to a spatially structured, repeated-measures design, it is probable that the data exhibit a spatial and temporal autocorrelation (Ross, Hooten, & Koons, 2012). Explicitly accounting for these dependencies is necessary for reliable inference (Dormann, 2007;Hefley et al., 2017;Holloway & Miller, 2015). Spatiotemporal hierarchical models estimated in a Bayesian framework are proven methods for meeting these challenges (Hefley & Hooten, 2016; Martínez-Minaya, Cameletti, Conesa, & Pennino, 2018;Wikle & Hooten, 2010). These methods have been applied to modelling waterfowl abundance in North America (Humphreys, Murrow, Sullivan, & Prosser, 2019;Ross, Hooten, DeVink, & Koons, 2015;Ross et al., 2012).
In the spirit of continuous improvement, our study aimed to revisit the Canadian breeding waterfowl models constructed by Barker et al. (2014) by (a) developing new, more interpretable statistical models that (b) explicitly account for spatiotemporal variations in waterfowl abundance, while (c) testing for associations with an updated suite of habitat covariates. Our objective was to implement hierarchical GLMs (HGLMs) in a Bayesian framework to explicitly predict and map the spatiotemporal pan-Canadian abundance of 18 breeding waterfowl species over a 25-year period, using a suite of 232 candidate habitat covariates. This high dimensionality required defining an innovative, robust and flexible modelling strategy to cope with computational requirements and ensure both parsimony and high predictive power in the final models.

| Study area
Our study area included all of Canada, excluding the Northern Arctic ecozone (CEC, 1997) and other adjacent Arctic islands, with a total area of ≈8.5 million km 2 (Figure 1).

K E Y W O R D S
Bayesian hierarchical models, breeding waterfowl, Canada, covariate selection, habitat, INLA-SPDE, spatiotemporal, species distribution modelling

| Waterfowl data
Each spring, aerial observers of the WBPHS count adults of common waterfowl species seen within 200 m transects subdivided into segments of ≈28.8 km, using fixed-wing aircraft (Smith, 1995).
We retrieved segment-level annual counts of waterfowl species conducted over a period of 25 years  in the Canadian portion of the WBPHS. This period was defined to match the 1990 expansion of the survey area into eastern Canada and the availability of habitat covariates. The data from 2007 were excluded because of a deviation from the usual survey design (Barker et al., 2014). To identify a temporal signal and ensure representativeness, only the segments sampled for ≥5 years (n = 2,227 out of the 2,287 available segments; Figure 1) were considered in our analyses.
We extracted data for 14 waterfowl species and four species groups, each group including two or three species difficult to distinguish by the survey protocol (Table 1). These four groups are referred to as "species" hereafter. Based on the work of Barker et al. (2014), Johnsgard (2010) and Poole (2005), we classified the studied species into three nesting guilds: cavity, ground and overwater (Table 1). For a better characterisation of the breeding population and to avoid male-biased sex ratios, we transformed the raw waterfowl counts to total indicated pairs (TIP) based on species-specific life histories (Barker, 2015;Dzubin, 1969;Smith, 1995). To account for incomplete detection in estimating continental population sizes, visibility correction factors (VCFs) exist to adjust the WBPHS counts (Smith, 1995). However, these visibility adjustments are generated by the use of widely different methods at various frequencies and spatial scales across Canada (Barker, 2015; C. Roy, Environment and Climate Change Canada, 2020, personal communication). As explained by Barker et al. (2014), these differences preclude the use of VCFs to model detections effort at the segment scale.

| Habitat covariates
Given the ecological diversity of North American waterfowl species and the wide range of habitat covariates that are potentially relevant for modelling their abundance and distribution, we adopted an exploratory multistage variable selection approach (see Section 2.5).
As we had no information on the exact location of individual waterfowl occurrences within the 28.8 km segments, we had to characterize the waterfowl-environment associations operating at the scale of the area surveyed within each segment. Although the survey protocol records birds detected within 200 m of either side of the flight line, the flight line itself may not be precisely as mapped and may vary slightly among years, due to navigational error. We sampled segment-scale covariates in a 1 km buffer surrounding the mapped segment centre lines. Preliminary correlation tests between waterfowl abundance and habitat covariates measured at seven different radii ranging from 500 m to 10 km provided similar results, suggesting relative scale insensitivity over the range of buffer widths evaluated. The probable underlying reason is that the environmental F I G U R E 1 Location of the 2,227 segments of the Waterfowl Breeding Population and Habitat Survey (WBPHS) used in our study, with different colours indicating the number of years each segment was sampled. The beige colour denotes the area used for model predictions gradients at the measured scale rarely change abruptly. All covariates were centred and scaled to a mean value of zero and unit variance.

| Hierarchical spatiotemporal modelling
To predict the spatiotemporal abundance of 18 waterfowl species at a pan-Canadian level, we fit HGLMs in a Bayesian framework using the integrated nested Laplace approximations (INLA; Rue, Martino, & Chopin, 2009), a paradigm for Bayesian inference in latent Gaussian models. The main benefit of using INLA instead of the classical Markov Chain Monte Carlo techniques is computational. Being deterministic, the INLA approach needs no sampling, thus allowing faster estimation, even for complex spatiotemporal models with high dimensionality. We fit models using the R package "inlabru" (Bachl, Lindgren, Borchers, & Illian, 2019) that is built on the widely used "INLA" package (Lindgren & Rue, 2015).
For each individual species, our response variable TIP(s, t) was the TIP recorded for segment s in the year t. Segment centroids were used as model-point locations. The TIP counts exhibited overdispersion and excess zeros. Accordingly, we modelled TIP(s, t) using a zero-inflated negative binomial (ZINB) model, which can be described as a mixture of a point mass at zero with probability p and a negative binomial distribution with probability 1−p; see Greene (1994) for details. Adopting the ZINB R-INLA type 2 likelihood (Breivik, Storvik, & Nedreaas, 2017; Cosandey-Godin, Krainski, Worm, & Flemming, 2014;Hu et al., 2016), we modelled TIP(s, t) as follows: where p(s, t) is an additional probability for zero; I k=0 is an indicator function taking the value one, if true, and zero otherwise; and NB n, k, (s, t) is the negative binomial distribution with the dispersion parameter k. where with X(s, t) a design matrix with fixed covariates, = { 0 , …, k } is the vector of regression coefficients, and u(s, t) is a spatiotemporal random effect. Under a ZINB R-INLA type 2 likelihood, the additional probability for zero p(s, t) in Equation (1) is where the parameter adjusts for a zero-probability change. This probability is inversely proportional to the linear predictor (s, t).
The spatiotemporal random effect u(s, t) represents other processes not explained by the covariates, including spatial autocorrelation. It was modelled as a spatially continuous Gaussian random field with yearly realisations (dependent replicates with year as a grouping structure), using the stochastic partial differential equation (SPDE) approach (Lindgren, Rue, & Lindström, 2011 Note: n, number of covariates per class; y, period or year of data acquisition; r, spatial resolution; SD, standard deviation. The metrics "y-mean," "mon-mean" and "cu.mon-mean" referred to the three temporal schemes used for spatiotemporal covariate extraction, including (a) annual means; (b) monthly means for the January-May period; and (c) cumulative 1-month interval means for the January-May period (i.e., past climate from May to April, May to March, May to February and May to January), respectively. a For details on the classes/variables, see Table S1.1.
by modelling the spatial variation as a Gaussian Markov random field (GMRF) with zero mean and covariance defined by a Matérn correlation function. The GMRF was modelled and projected across our entire study area based on a spatial mesh created by a constrained refined Delaunay triangulation. The field values at model-point locations were interpolated using the weighted average of the three closest mesh nodes. The triangle sizes were small in the areas covered by the WBPHS but grew larger as the density of observations decreased ( Figure S1.1). To avoid edge effects, the mesh was extended outside the domain of interest. The mesh design was exploratory, where we sought a trade-off between the computational costs and precision of the spatial field. The priors for model hyperparameters were set to the R-INLA "non-informative" defaults (see Text S1.1 for details). As in a frequentist approach, this strategy allows the data to predominate in calculating posterior distributions.

| Four-step habitat covariate selection
To optimize the predictive abilities, interpretability and parsimony of the final model, as well as to handle the computational challenges related to the high dimensionality of our dataset, we defined a fourstep habitat covariate selection strategy based on (1) correlation tests and collinearity diagnosis; (2) component-wise gradient boosting with a stability selection procedure; (3) forward covariate selection; and (4) structured HGLM. This strategy was applied individually to each waterfowl species.

| Step 1: Correlation tests
As a pre-processing step, we performed bivariate Pearson's correlation tests between TIP and each of the 232 candidate habitat covariates. The covariates were evaluated sequentially in decreasing order of the absolute values of the correlation coefficients. The covariates were retained if they were "non-collinear" with any previously added covariate. The collinearity threshold was defined as a Pearson correlation coefficient of 0.70 in absolute value. A maximum of 50 covariates were retained.
In short, the first step of the process involves computing the negative gradient of a pre-selected loss function; we used a negative binomial log-likelihood as the loss function. In the following step, the covariates were fitted separately to a negative gradient vector using their corresponding base learners (i.e., GLMs). Once all potential base learners had been tested, the best-fitting one was added to the current model, and the negative gradient was reevaluated at the updated model fit. This procedure was repeated for a species-specific number of iterations, which was determined by an initial stability selection procedure to identify the covariates that were most commonly selected after applying the CGDB algorithm on 100 random subsamples of half of the available TIP data. For details on this procedure, see Hofner et al. (2015). We set the number of selected base learners (i.e., the number of covariates) per boosting run to 15, and the upper bound of the perfamily error rate to 2 with unimodality assumption. Given these specifications, the covariates selected in at least 90 of the 100 subsamples were considered stable and were retained for further analysis in Step 3.

| Step 3: Forward covariate selection
A forward covariate selection (FCS) in conjunction with a modified version of the HGLM described in Section 2.4 was conducted to search for the best combination of the remaining covariates. To reduce the computational costs of FCS, we used a spatially and temporally unstructured version of the HGLM, with random intercepts for segment and year. The combinations were ranked by the deviance information criteria (DIC), which measure the compromise between the fit and parsimony of the model (Rue et al., 2009;Spiegelhalter, Best, Carlin, & Linde, 2002). In FCS, all covariate pairs were tested, and the best-performing pair with the lowest DIC was retained. The number of covariates was iteratively increased, maximizing the decrease in DIC until no remaining covariate decreased the model DIC. The covariates included in the model with the lowest DIC were retained for Step 4.

| Step 4: Structured HGML
The structured HGLM of Section 2.4 was fit using the covariates from Step 3. We examined the resulting 95% credible intervals of the covariate coefficients and identified those in which the coverage included 0. If there were any, a final model was fit without any of the "non-significant" covariates.

| Habitat covariate selection summary
We summarized the representation of the four habitat covariate classes ("Geoclimatic," "Hydrologic," "Land cover" and "Forest attributes") in the final models by calculating the percentage of the total covariates that belonged to a given class weighted by the number of non-collinear candidate covariates in that class. We set the collinearity threshold to an absolute Pearson correlation coefficient of 0.70, as before. This statistic was calculated for the 18 species individually as well as by nesting guild (Table 1). Because covariates were classified into four classes, the weighted percentage for each covariate class should have been approximately 25%, if covariates were selected in proportion to their availability. We graphically reported individual associations included in the final models, with information on the magnitude and sign of the associations based on the estimated regression coefficients.

| Model checking
We interpreted the model inferential properties using the means, standard deviations (SD) and 95% credible intervals of the posterior distributions of model parameters. For all 18 species, we provided a model summary table of these statistics for each model parameter.
We also reported the nominal range, or the distance at which the spatial correlation modelled by the GMRF declines to 0.1 (Krainski et al., 2018); high nominal ranges would indicate that there is important residual spatial autocorrelation.
Model goodness of fit was assessed by computing the coefficient of determination (R 2 ), that is the proportion of variance in recorded TIP explained by the model, using the median of the posterior predictive distribution (i.e. the distribution of the predicted data). The who used this goodness-of-fit metric. A two-sample t test was used to compare the differences in ρ between the two studies.
INLA provides leave-one-out cross-validation model checks as a part of the fitting process (Held, Schrödle, & Rue, 2010), including the computation of conditional predictive ordinates (CPO; Pettit, 1990).
CPO analysis, which computes a leave-one-out predictive distribution for each observation, is a computationally less-demanding alternative to traditional K-fold cross-validation, which requires refitting each model K times to obtain a complete set of out-of-sample predictions. Details on the CPO analysis and its results are available in Supplementary material S5.1.

| Predictive mapping
A regular grid of 300-arcsecond resolution (cell area ranging from 26.6 to 63.9 km 2 ) covering the study area (Figure 1) was used for the predictions. Estimated predictions from fitted models were computed via the "predict" function of the "inlabru" R library (Bachl et al., 2019) that works by repeatedly drawing samples from the posterior distributions of the model parameters. Mapped predictions were obtained as function of both environmental covariates and the estimated GMRF. We provided maps of the yearly median in the units of TIP/km 2 and coefficient of variation (CV; SD/mean) of the predictions for each species in each grid cell. We also provided the annual maps of the GMRF.

| Model building and parameter estimates
Our models were successfully fit for all 18 species (see Tables S2.1 On average, 50 ± 0.0 habitat covariates were retained after Step 1, 11.6 ± 1.4 after Step 2 and 6.9 ± 2.0 after Step 3. The mean number of habitat covariates included in the final models (after Step 4) was 5.3 ± 2.2, ranging from eight for the goldeneye, northern pintail and ring-necked duck to one for the ruddy duck (Tables S2.1-S2.18).
We illustrate with a model of average performance obtained for bufflehead (Table 3)

TA B L E 4
Goodness-of-fit model evaluation metrics for the 18 species and comparison with the study of Barker et al. (2014), who modelled all 17 of our 18 species, excepting only Canada goose Furthermore, we calculated Spearman's correlation (ρ) between the recorded TIPs and TAS predictions for comparing our models with those of Barker et al. (2014), who reported this statistic (Table 4).
The correlation between our two sets of Spearman's correlations was ρ = 0.96, p < .01, indicating that the rank orders of model performance among species was very nearly identical in the two studies.

| Habitat covariate selection
Using guild-level stratification, Figure 2 shows the representation of the four covariate classes in the 18 models compared to the number of non-collinear candidate covariates in each class. Only "Forest attribute" covariates (37.90%) were selected more frequently than expected (>25%) given their availability. The "Hydrologic" (22.29%), "Geoclimatic" (21.23%) and "Land cover" (18.58%) covariates were selected less frequently than expected. "Forest attribute" was the single most highly represented covariate class for ground (40.96%) nesters, but not for overwater-(31.93%) and cavity-nesting (36.90%) species.
The covariates from the "Hydrologic" class were the most frequently selected in cavity nester models (42.18%). For overwater nesters, the "Geoclimatic" class was the most represented one (32.58%).
The 18 models included a total of 94 significant waterfowl-habitat associations involving 42 distinct habitat covariates (Figure 3).
The "Forest attribute" class accounted for 34/94 associations, of which 23 were positive. The proportional biomass of Populus tremuloides was the most frequently selected covariate with 10/94 positive associations in 10/18 species (Figure 3). The second most selected "Forest attribute" covariate was the proportional biomass of Picea mariana accounting for 5/32 associations, all of which were negative (Figure 3).
Thirteen covariates of the "Land cover" class accounted for 24/94 associations, of which 17 were negative. Among the 13 selected "Land cover" covariates, only three were reported as associations for more than one species (Figure 3). These three covariates included "edge density of cropland" (4/5 positive associations and 1/5 negative association with merganser), "patch density of sub-polar taiga needle-leaf forest" (4/4 negative associations) and "proportion of treed land cover" (3/3 negative associations).
The covariates of the "Geoclimatic" class accounted for 16/94 associations (Figure 3), of which nine were negative and seven were positive ( Figure 3). All 16 of these associations were either NDVI-(9/16) or elevation-related (7/16) covariates. The two most commonly selected "Geoclimatic" covariates included the standard deviation of elevation, which was negatively associated with four species, and the standard deviation of monthly NDVI within years, which was positively associated with four species. No direct climatic covariates, such as temperature or precipitation, were selected in any model.

| Predictive mapping
For each species, the maps of the predicted TIP and GMRF are available in Appendix S5.1. In addition, the spatial data objects (TIF files) containing predictions with technical information to import them into an R workspace are provided in Appendix S5.2. We illustrate with temporally averaged sample maps for American black duck, bufflehead and canvasback ( Figure 4). The recorded TIP (Figure 4, first row) visibly matches with the spatial patterns of the predicted TIP (Figure 4, second row).
The CV values (Figure 4, third row) were high in British Columbia, Yukon, Nunavut and northern Quebec, where the survey data were absent.
Within the surveyed distribution area of the species, the CV values were low and homogeneous. The general spatial patterns of GMRF (Figure 4, fourth row) were close to the predicted TIP, but with a coarser level of details. The annual fluctuations in the spatial patterns of the predicted TIP were low (see spatiotemporal sequences provided in Appendix S5.1).

| Modelling approach
We demonstrated the capacity of our Bayesian hierarchical modelling approach to predict spatiotemporal variations in the abundance F I G U R E 2 Representation of the habitat covariate classes in the 18 final models in comparison with the number of non-collinear candidate covariates in each class. If covariates are selected in proportion to their availability, then the weighted percentage for each class should be approximately 25% of 18 waterfowl species across Canada. On average, the model variance explained was 47% for spatiotemporal predictions and 74% for TAS predictions. We took advantage of the strengths of both ML and GLM paradigms to construct our models. We exploited the rapidity and power of the ML techniques, in particular of gradient-descent boosting (Hothorn et al., 2010), to select highly informative subsets of an initial set of 232 candidate covariates before fitting HGLMs.
Considering the high dimensionality of our dataset, using a HGLM approach only would have been computationally infeasible. This strategy led to robust, parsimonious and interpretable models that exceeded the predictive power of the pure ML models of Barker et al. (2014). We recognized the need for caution when comparing results among different modelling techniques. Our sample size was 40% larger than theirs (2,227 vs. 1,591), and it has been shown that GLMs improve their predictive accuracy as the sample size increases and might even surpass ML models for sufficiently large samples (Aguirre- Gutiérrez et al., 2013). The improved performance of our models might also be related to our richer suite of candidate covariates and/or to the explicit consideration of latent spatiotemporal processes via random fields included in the models.
Our waterfowl dataset was statistically challenging in being overdispersed and zero-inflated count data with discontinuous temporal and spatial domains. Taking advantage of the powerful INLA-SPDE approach (Lindgren & Rue, 2015), our analysis allowed the estimation of waterfowl associations with habitat covariates and spatiotemporal random effects in the form of a GMRF. The Gaussian random fields were able to capture the much otherwise-unexplained heterogeneity. These structures presumably represent large-scale processes that were independent of the model covariates. The random effects revealed a marked spatial autocorrelation in the distributions of many species, especially those with restricted core distribution areas such as the American black duck.
Ignoring spatial and temporal correlations could have resulted in incorrect estimates of covariates' effects and erroneous predictions. In our study, we used three indirect ways for disentangling the GMRF processes from the effects of environmental covariates. First,

F I G U R E 3
Waterfowl-habitat associations identified for each species along with the information on the sign (positive or negative) of the association as shown in Figure 4, the visual inspection of the GMRF revealed spatial structures indicating the presence of residual autocorrelation that could not be explained by the environmental covariates alone.
Second, to evaluate the specific contribution of the GMRF in model predictions, we re-computed model the R 2 values, our goodness-offit metric, using only the spatial-temporal random effects. With covariates excluded, the mean R 2 value dropped from .47 to .17. Third, in the fourth step of our covariate selection process, we evaluated the possibility that covariates selected within an unstructured version (neither spatially nor temporally explicit) of the HGLM were "wrongfully" identified as significant because autocorrelation was not explicitly modelled. This was done by confronting the covariates with an alternative hypothesis via a spatiotemporally explicit version of the HGLM that includes the GMRF. On average, for the 18 waterfowl species, 1.6 covariates were dropped in this step, as they were no longer significant when autocorrelation was accounted for.

| Waterfowl-habitat associations
Synthesis of duck-habitat associations extracted from the 18 models provided new insights into the environmental factors driving the distribution and abundance of breeding waterfowl in Canada. Searching for causal explanations for each of the 94 waterfowl-habitat covariate associations identified in the final models would be infeasible.
However, thanks to our explicit consideration of spatial autocorrelation, it is possible to state that the identified habitat covariates were not simply proxies for biome-level characteristics; had that been the case, the relationships would have been subsumed within the random effect. We believe it is reasonable to conclude that all covariates retained in the final models do indicate habitat selection of species within their respective ranges. This must reflect, in at least some cases, underlying processes related to feeding and/or nesting behaviour. For example, the positive association we found between cropland abundance and mallard TIP has previously been related to their foraging preferences (Forcey, Linz, Thogmartin, & Bleier, 2007;Lieske et al., 2018;Newbold & Eadie, 2004). Similarly, the positive association between deadwood biomass and goldeneye TIPs is easily interpreted in light of their cavity-nesting behaviour, although not, to our knowledge, previously demonstrated.
Forest-related covariates were the group most frequently selected in the final models, relative to their availability in the initial set. In contrast, no direct climatic covariates (e.g. covariates derived from measured precipitation or temperature) were selected in any model. Vegetation and forest patterns are driven by multi-scale factors, including regional climate, local topography, soil characteristics, hydrology and disturbance regime (Grime, 2006;Perry, Oren, & Hart, 2008). Thus, at the spatial resolution of our analysis (11.2 km 2 segments), the forest-related covariates derived from 6.25 ha imagery (Beaudoin, Bernier, Villemaire, Guindon, & Guo, 2017)) were "super-proxies" for possibly many underlying, interacting environmental conditions in which the waterfowl species select their breeding sites. This is well illustrated by the covariates related to the aboveground tree biomass of Populus species, which were included in 14/18 models. As with many waterfowl species, the trees of this genus are abundant along the prairie-boreal ecotone in Canada (Beaudoin et al., 2017) and in riparian habitats within the continuous boreal forest (Harper & Macdonald, 2001 1997). This new product enabled us to go beyond the traditional finding that "Ducks like water" (Pimm, 1994) and to reveal the details of species-specific wetland-habitat preferences. For example, we found that two cavity-nesting species (bufflehead and merganser) were positively associated with the patch density of bogs, whereas northern pintail was positively associated with the patch density of fens. We hope our demonstration of this new wetland map will encourage its broader use and continued development.

| Limitations
Considerable efforts have been made over recent decades in developing national-or continental-level environmental layers relevant to species distribution modelling in general and to waterfowl ecology in particular (topography, climate, land cover including wetlands and forest cover attributes such as tree species biomass). Many of these products are derived from remote-sensed imagery by means of ML or other statistical approaches. They cannot be validated over their whole extent. In addition, many of these datasets are based on the imagery captured in a single year, which forced us to assume that several habitat covariates were static over the study years. Although we took great pains to acquire the best available data, we recognize that our ability to model spatially varying temporal trends remains limited.
SDMs depend on the assumption that species distribution and abundance patterns are, at least, partially determined by measurable environmental conditions and, thus, may be modelled using habitat covariates (Austin, 2007). Because the studied waterfowl species are migratory birds, they are presumably affected by environmental conditions outside as well as on their breeding grounds.
By focusing on the breeding habitats only, our models missed a part of the interannual variability explained by varying conditions on migratory routes and wintering grounds (Franklin, 2010;McPherson & Jetz, 2007). The covariates indirectly related to the conditions outside the breeding season might have accounted for a part of this variability. We also acknowledge the importance of biotic factors, such as dispersal, competition and predation, as well as historical factors (Stralberg et al., 2017). Such factors are rarely incorporated into SDMs (Austin, 2007).
Despite the standardized WBPHS protocol that has been followed for more than 60 years, the data remain subject to imperfect detection due to spatial-temporal variation among observers, in environmental conditions and in species' behaviours. This is generally the case with species count datasets (Kéry & Schmidt, 2008). This could affect our findings in terms of both predicted abundances and habitat associations identified. To account for incomplete detection, VCFs are estimated as part of the survey protocol to adjust the WBPHS counts (Smith, 1995). As VCFs are calculated in widely different ways throughout the survey, we believe that using them will not correct the predictions but rather will introduce a different source of bias in the results (Barker et al., 2014;C. Roy, Environment and Climate Change Canada, 2020, personal communication).
Finally, we acknowledge issues related to the segment-level measurement of our response variable. Segments are organized sequentially along transects within strata. Accordingly, treating segments as independent samples would have resulted in pseudo-replication.
However, we believe that our spatial autocorrelation analysis substantially accounts for this spatial structure as well as for the repeated measurements. In addition, we recognized that representing segment locations by their centroids led to a loss in spatial accuracy in the sense that waterfowl abundances vary spatially within segments. However, the locations of individual observations within segments are not available. We believe that the assumption of internal homogeneity was appropriate to our modelling approach. Hughes, 2008) information on breeding waterfowl exist for these areas. These datasets could be combined with the WBPHS through the use of integrated SDMs (Fletcher et al., 2019;Isaac et al., 2020;Miller, Pacifici, Sanderlin, & Reich, 2019). In addition, the recent incorporation of waterfowl data from citizen science within advanced spatiotemporal modelling framework has shown promising results (Humphreys et al., 2019).

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

PEER R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13129.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data employed in the present work are public and can be down-