Using modelled prey to predict the distribution of a highly mobile marine mammal

Species distribution models (SDMs) are a widely used tool to estimate and map habitat suitability for wildlife populations. Most studies that model marine mammal density or distributions use oceanographic proxies for marine mammal prey. However, proxies could be a problem for forecasting because the relationships between the proxies and prey may change in a changing climate. We examined the use of model‐derived prey estimates in SDMs using an iconic species, the western Arctic bowhead whale (Balaena mysticetus).


| INTRODUC TI ON
Species distribution models (SDMs) are a widely used tool to estimate and map habitat suitability for wildlife populations. Correlative SDMs associate species presence or presence-absence data with environmental predictor variables, fit a relationship between predictor and response, and make predictions for an area of interest (Redfern et al., 2006). These types of SDMs have been used to conduct near real-time monitoring (Abrahms et al., 2019;Hazen et al., 2017) and address a number of conservation challenges, including evaluating and mitigating ship-strike risk (Redfern, Becker, & Moore, 2020;Redfern et al., 2019), and they have shown potential in forecasting species distributions (Becker et al., 2018). Many studies that model marine animal density or distributions use oceanographic proxies for prey (e.g., Becker et al., 2016;Redfern et al., 2006;Sequeira, Mellin, Rowat, Meekan, & Bradshaw, 2012), which is expected to be the true driver of species' migratory phenology and their distribution on foraging grounds. Oceanographic proxies are used because prey data are often unavailable at the spatial and temporal extents, and resolutions, needed to develop species distribution models. This lack of availability can be attributed to the logistical difficulties and costs associated with comprehensive sampling in marine environments. Oceanographic proxies may include static geographic features (e.g. depth, slope and distance from the coast), climatologically averaged environmental data (e.g. World Ocean Atlas (Locarnini et al., 2013)) and dynamic variables (e.g. sea surface temperature, salinity and chlorophyll) from in situ sampling, remote-sensing platforms or ocean models. However, the use of oceanographic proxies could be a problem for forecasting species distributions because the relationships between the proxies and prey distribution or density may change in a changing climate (Gregr, Baumgartner, Laidre, & Palacios, 2013;Silber et al., 2017). Further complications may arise from climate-induced shifts in the distribution and phenology of predators and their prey (Carroll et al., 2019).
Finally, the use of proxies obtained from limited in situ and modelled oceanographic data may not represent the spatial and temporal scales of processes important to foraging, such as processes that aggregate prey into concentrations sufficient to trigger predator feeding activity (e.g. Carroll et al., 2017;Mayo & Marx, 1990).
In the present study, we examined the use of model-derived prey estimates in SDMs of an iconic marine species, the western Arctic bowhead whale (Balaena mysticetus). Western Arctic bowhead whales engage in seasonal migrations between the Bering, Chukchi and Beaufort seas (Moore & Reeves, 1993) and, as such, are also widely known as the Bering-Chukchi-Beaufort population.
They spend winter in the Bering Sea. In April-May they migrate to the Chukchi Sea. Most whales then migrate eastward to feeding grounds in the Canadian Beaufort Sea, arriving in May-June (Moore & Reeves, 1993;. From mid-July to October, whales migrate westward and return to the Bering Sea (Mate, Krutzikowsky, & Winsor, 2000;Quakenbush, Small, & Citta, 2013). Our study focuses on this late summer/early fall period when bowhead whales are migrating westward, offshore of the north coast of Alaska, USA, where they largely feed on copepods and euphausiids (Lowry, 1993;Lowry, Sheffield, & George, 2004).
We used SDMs to model the bowhead whale distribution in this region during the late summer/early fall season from 1988-2012.
Extensive aerial surveys have focused effort in this area during this season because it is an important bowhead whale migration and foraging area. We paired bowhead whale presence-absence data from aerial surveys with bathymetry data and ocean conditions derived from the Biology Ice Ocean Modeling and Assimilation System (BIOMAS) Arctic Ocean model (Zhang et al., 2010(Zhang et al., , 2015. BIOMAS is a pan-Arctic biophysical model which estimates many physical and biological variables that are important to bowhead whales, including sea temperature, sea ice thickness, primary productivity and zooplankton groups known to be consumed to bowhead whales, namely copepods and predator zooplankton (e.g. euphausiids). We trained and tested Maxent and boosted regression tree (BRT) SDMs with data from mutually exclusive training and testing years and used the models to predict bowhead whale habitat suitability for each 8-day period from late August through late October of each test year.
The objectives of our study were to: (a) Assess the relative utility of bathymetry and model-derived predictor variables, including prey availability (from BIOMAS), to drive SDMs that correctly categorize bowhead whale habitat use in space and time, and (b) Assess the value of modelled zooplankton prey versus physical variables for predicting bowhead whale habitat use. These two objectives provide the foundational research needed to successfully use SDMs to forecast bowhead whale habitat under different climate change scenarios. More generally, these objectives contribute to our ability to predict how marine mammal distributions will change in response to climate-driven changes in oceanographic conditions.

K E Y W O R D S
Arctic, boosted regression tree, bowhead whale, forecast, hindcast, maxent, ocean model, species distribution model, zooplankton 2 | ME THODS

| Study area
Our study area is the western Beaufort Sea, from 140-160° W and from the Alaska coast to 72°N (Figure 1a). Our study period begins in 1988 and ends in 2012. For each year, we analysed data from 21 August-31 October. The region and dates of our study were determined by the dates and area covered by the bowhead whale survey programs used in this study.

| Presence-absence data
Observations of bowhead whale presence and absence were collected during aerial surveys of the Alaskan Beaufort and Chukchi seas. Our study used data from both areas and data collection programs. These surveys were originally intended to provide real-time information and data synthesis products to the U.S. Department of the Interior and other agencies regarding the fall migration of bowhead whales. Several U.S. government entities and their contractors have managed the aerial surveys since inception in 1979 (Clarke, Brower, Ferguson, & Willoughby, 2017). Since 2008 the surveys have been conducted by the U.S. Department of Commerce National Marine Fisheries Service (NMFS) and co-managed by NMFS and the Bureau of Ocean Energy Management. In aggregate, data collected during the surveys described here constitute one of the longest continuously running marine mammal monitoring programs in existence.
During 1988-2012, the years used in our study, the broader survey area spanned a large region which was partitioned into survey blocks of various sizes between 140-169°W, and between the Bering Strait (65°40ʹN) and 73°N. Surveys in most years were conducted between early September and late October, with surveys beginning earlier and ending later in some years. Additional details on the history of these surveys and their methodologies are provided in Appendix S1 and Figures S1.1 and 1.2.
Prior to 2006, the role and position of observers, primary versus secondary and port versus starboard, was not well documented for all surveys. However, each flight carried a minimum of two observers, a data recorder and two pilots executing the survey design described in Appendix S1. Improved documentation of in-flight observation protocols began in 2006, with a primary observer stationed at port and starboard bubble windows, and a dedicated data recorder that served as a secondary observer . A few recent studies have analysed data from surveys after 2008, when survey protocols were more homogeneous (Brower, Ferguson, Schonberg, Jewett, & Clarke, 2017;Clarke, Ferguson, Willoughby, & Brower, 2018). However, because our study included survey data from 1988-2012, a period of time when data were collected with different aerial survey platforms and without comprehensive documentation of observer configurations and data collection protocols, we treated the entire dataset as a presence-absence dataset.

| Environmental predictor variables
We derived a suite of environmental predictor variables from BIOMAS (Zhang et al., 2010). Output from ocean models such as BIOMAS are contiguous in space and time and do not contain data gaps due to cloud cover, common in observations from earthorbiting satellites. This feature of ocean models makes their output attractive for use as predictors in SDMs. BIOMAS couples three sub-models for sea ice, ocean circulation, and the pelagic ecosystem, respectively. The ecosystem model is adapted to the Arctic Ocean  (Kishi et al., 2007). With nitrogen and silicon as model currencies (Zhang et al., 2010, table 1), this model includes two phytoplankton components (diatom and flagellate), three zooplankton components (microzooplankton, copepods and predator zooplankton), dissolved organic nitrogen, detrital particulate organic nitrogen, particulate organic silica, nitrate, ammonium and silicate. Additional details on BIOMAS are provided in the Appendix S1. Comparisons of BIOMAS with satellite-observed variables demonstrate that it provides a good representation of the interannual variation sea ice extent, primary productivity, and surface-air temperatures. Comparisons of the model with in situ data show that it captures the timing and magnitude of the spring bloom in the Chukchi and Beaufort seas.
From the suite of BIOMAS outputs, we selected variables we deemed to have greatest potential influence on bowhead whale distribution: sea ice thickness, sea temperature, diatoms, flagellates, copepods and predator zooplankton. We included sea ice thickness and sea temperature because seasonal fluctuations in temperature and ice are dominant features of the Arctic environment, and bowhead whales are an ice-adapted animal whose migrations track seasonal changes in the environment. We used sea ice and sea temperature variables from BIOMAS rather than from satellite observations because BIOMAS assimilates satellite observations of sea ice concentration and sea surface temperature, and modelled values for these parameters reflect the combined effects of satellite observations and model dynamics. We included diatoms and flagellates because they represent the phytoplankton, and chlorophyll is often used as a proxy for phytoplankton and primary productivity in SDMs. We included copepod and predator zooplankton groups because bowhead whales are known to feed on organisms in both of these general zooplankton categories within our study area during the season under study (Lowry, 1993). We included bathymetry because previous studies have identified it as an important feature of bowhead whale habitat in the Beaufort Sea (e.g. Clarke et al., 2018).
We acquired bathymetry data from the International Bathymetric Chart of the Arctic Ocean version 3.0 (Jakobsson et al., 2012).

| Data selection and pre-processing
Aerial survey data and environmental predictor variables were temporally binned into 8-day periods and were spatially gridded to 4.6 km to conform to the native resolution of BIOMAS in the Beaufort and Chukchi seas (Zhang et al., 2010), NOAA Pathfinder v5.2 (Casey, Brandon, Cornillon, & Evans, 2010) and to other datasets available at these common spatial and temporal resolutions.
Depth resolved BIOMAS outputs (sea temperature, diatoms, flagellates, copepods and predator zooplankton) were summarized by calculating the mean value within the upper 100 m of the water column (summarized in Figure S1.3). A depth resolved representation of environmental variables is appropriate because bowhead whales exploit resources throughout the water column and are known to feed on prey located at various depths (Heide-Jørgensen, Laidre, Nielsen, Hansen, & Røstad, 2013), with feeding at the ocean floor being inferred from visual observations of mud on the heads of surfacing whales (Moore, George, Sheffield, Bacon, & Ashjian, 2010).
Our study used survey data from transects, crosslegs, circling and search legs to and from primary areas that were surveyed. To produce the bowhead whale presence dataset for this study, we assigned a 1 to each date (8-day period) and location (grid cell) at which one or more bowhead whales were sighted. We compiled an observed absence dataset, representing where whales were not sighted, by assigning a 0 to every date and location surveyed, after removing bowhead whale presence records.
Using the temporally binned and spatially gridded presence-absence data, we estimated survey effort and bowhead whale sighting rate. We estimated effort by summing the number of visits to each cell by the survey aircraft. We estimated sighting rate by dividing the number of presences at each grid cell by the number of visits to that grid cell. After temporal binning, there was a maximum of one visit and one bowhead whale presence for each grid cell within each 8-day period and year. Thus, the maximum number of presences or visits (i.e. effort) attributed to each grid cell over the whole study period was 225 (nine 8-day periods per year multiplied by 25 years).
This representation of survey effort and sighting rate is consistent with the data used in our species distribution models. After temporally and spatially binning survey data as described above, there was an average of 7,753 (std = 1934) grid cells surveyed each year, and an average of 135 (std = 89) grid cells at which a bowhead whale presence was recorded each year (Table S1.1).
Highly correlated variables may hinder interpretation of variable importance. To address this issue, we calculated Pearson's correlation coefficient between each pair of variables on a cell-by-cell basis.
Due to the large number of grid cells in our temporal and geographic domain (>4,000,000), we used a randomly selected subset of 100,000 grid cells for our correlation analysis. This sample size provided stable correlation coefficients, as determined by performing correlations with smaller and larger numbers of grid cells. Following the guidance of Dormann et al. (2013), we removed or modified variables with a correlation coefficient of |r| ≥ .70.

| Species distribution models
Species distribution models use species occurrence (response) and environmental data (predictors) to characterize the relationship between a species and its environment. Once derived, the fitted relationship is typically applied to environmental predictor data from a more expansive region than the species occurrence data alone, producing a continuous map of habitat suitability over some region of interest. Values in the final prediction map are usually scaled from 0 to 1 to represent habitat suitability ranging from poor to excellent, and to facilitate comparisons between prediction maps. There are many algorithms that may be used to derive the fitted relationship between species and environment (Elith et al., 2006). We used two SDM algorithms: Maxent and boosted regression trees. We selected these two modelling algorithms because multi-model comparisons have highlighted each as having high accuracy (e.g. Elith et al., 2006).
Each algorithm is commonly used in applied SDM studies, yet each operates on different principles. Both Maxent and BRT algorithms are capable of fitting complex nonlinear relationships. Maxent (Phillips, Anderson, Dudík, Schapire, & Blair, 2017;Phillips, Anderson, & Schapire, 2006) operates by selecting a distribution that maximizes entropy subject to a set of constraints drawn from environmental data across the study area or at the location of absences. See Elith et al., (2011) for a detailed yet accessible explanation of this algorithm. We used Maxent (v 3.4.0) with default settings. Boosted regression trees combine regression trees with boosting, a technique used to combine individual tree models in an adaptive fashion (Elith, Leathwick, & Hastie, 2008). We implemented BRTs with the "gbm" package in R (v 3.5.2, R Development Core Team, 2018) with code adapted from Elith et al., (2008). In fitting BRTs, there are three important settings: learning rate, tree complexity and number of trees.
To determine the optimal settings for our dataset, we ran a 10-fold cross-validation procedure and selected settings that minimized predictive deviance. Based upon this analysis we selected a learning rate of 0.01 and tree complexity of 9 (Table S1.2). The optimal number of trees determined by cross-validation was 2,200; however, we found little difference in deviance scores and habitat maps produced with 2,200 versus 1,000 trees. Therefore, to reduce computation time we used 1,000 trees, suggested as minimum by Elith et al. (2008). The bag fraction was set to 0.5. We fitted all SDMs using environmental predictor variable information from the same 8-day period and year of each presence-absence record.

| Predictive performance tests
We evaluated Maxent and BRT models built with (a) BIOMAS variables and bathymetry and (b) only BIOMAS variables (no bathymetry).
For each set of variables, we withheld 1 to 24 of the 25 available years for model evaluation. Test years withheld for model evaluation were selected randomly without replacement over all cross-validation folds. For example, models that were tested with two years were trained with 23 years, and there were 12 cross-validation folds. Test years for the first cross-validation fold were selected randomly from 1988-2012. Test years for the second cross-validation fold were selected randomly from 1988-2012, excluding the two test years from the first fold. This selection procedure was repeated to produce additional cross-validation folds until there were no more sets of two unused test years. Training data for each cross-validation fold consisted of the non-intersecting set of 1988-2012 with the randomly selected test years (Table S1.3 and Appendix S1). This type of crossvalidation uses the heterogeneity within the dataset as a surrogate for heterogeneity among datasets (Wenger & Olden, 2012), and it is suited to testing model performance in an unsampled time period (e.g. the past or the future) (Vaughan & Ormerod, 2005). Absence data used to train each SDM was a random selection of 10,000 grid cells drawn from the set of grid cells surveyed during the training years at which no bowhead whale was recorded. After discarding absences that did not overlap the spatial domain of predictor variables, we had approximately 9,870 absences to use in each SDM (Table   S1.3). Identical presence and absence datasets were used in corresponding Maxent and BRT SDMs.
We used each fitted SDM to make predictions for all nine 8-day periods (21 August-31 October) for each test year. We calculated one area under the receiver operator characteristic (ROC) curve (AUC) (Fielding & Bell, 1997;Hanley & McNeil, 1982) for each set of test years and model (Maxent and BRT) by concatenating predictions from all 8-day periods in the test years and evaluating them as if they were one prediction (Appendix S1). We also computed another model performance metric, the true skill statistic (TSS) (Allouche, Tsoar, & Kadmon, 2006). It is necessary to select a threshold to compute TSS, and we used the sensitivity-specificity sum maximization approach (Liu, Berry, Dawson, & Pearson, 2005) to select thresholds.
We assessed the relative contribution of each variable using output from Maxent software and the R "gbm" package.
We also fit models using an additional 15 variable combinations.
These model configurations consisted of 14 additional possible combinations of BIOMAS variables and the case of using only bathymetry. We fit and evaluated these models using the procedures described above for all other models.

| Habitat maps
To synthesize and present a concise representation of predicted habitat suitability, we generated maps of mean predicted habitat suitability in our study area for each 8-day period. We created the maps using results from the set of models having 5 test years, 20 training years and 5 cross-validation folds. From these models we generated one prediction map for each of the nine 8-day periods, each of the 25 years, and both Maxent and BRT models, resulting in 50 predictions of habitat suitability for each 8-day period. To generate the mean predicted habitat suitability for each 8-day period, we calculated the mean predicted habitat suitability value for corresponding grid cells from all 50 predictions.

| Data filtering
After subsetting the full bowhead whale dataset with our data and locations filters, there were 3,378 bowhead whale presence records and 527,021 absence records. Due to high correlation between copepod and predator zooplankton, we combined those variables by computing their average to represent bowhead whale prey (hereafter zooplankton). Diatoms were highly correlated with both copepods and zooplankton and were therefore excluded from our analysis (Table S1.4). The final BIOMAS variables used in our study were zooplankton, sea temperature, sea ice and flagelletes. These four variables BIOMAS variables were used for all experiments presented in our study, except where we fit models using the 15 combinations of BIOMAS variables and bathymetry.

| SDM performance
Using 1-17 test years, mean performance scores for the models that included bathymetry and BIOMAS variables remained above 0.8 and 0.5 for AUC and TSS, respectively (Figure 2a, Table S1.3). Overall, mean scores for the models built with only BIOMAS variables were lower than for models that included bathymetry and BIOMAS variables ( Figure 2b, Table S1.3). A notable decline in AUC and TSS scores for both model configurations (with and without bathymetry) occurred when the number of test years was greater than 21.
Since model performance scores were relatively insensitive to the number of test years less than or equal to 17, we elected to fit models using 5 test years and 20 training years to compare performance of models with different variable combinations. Using 5 test years allowed for prediction onto each of the 25 test years, allowed for 5 cross-validation folds, and allowed for an 80/20 training/testing data split (Table S1.5). Comparison of performance scores from models built with various variable combinations revealed a clear pattern ( Figure 3, Table S1.6). Models that included bathymetry scored the highest (mean AUC = 0.810, mean TSS = 0.512). Models that included zooplankton but not bathymetry scored second highest (mean AUC = 0.764, mean TSS = 0.437). Models that included temperature and excluded bathymetry and zooplankton scored third highest (mean AUC = 0.714, mean TSS = 0.366). Models that included flagellates and excluded bathymetry, zooplankton and temperature scored fourth highest (mean AUC = 0.670, mean TSS = 0.264), and the sea ice only models had the lowest scores (mean AUC = 0.632, mean TSS = 0.256).

| Environmental drivers
Overall, bathymetry was the most important variable in models that used that variable, while zooplankton was the most important variable in models that did not include bathymetry (Figure 4).
Variable importance measures for the Maxent SDMs that included bathymetry and BIOMAS variables suggested that bathymetry was the most important predictor variable (67.44%) followed by sea ice (19.64%), zooplankton (4.91%), flagellates (4.86%) and sea temperature (3.15%). Importance of zooplankton, flagellates and sea temperature were all within a standard deviation of one another. For BRT SDMs, bathymetry was again the most important variable (38.08%), followed by sea ice (17.22%), sea temperature (16.13%), zooplankton (15.12%) and flagellates (13.45%). With the exception of sea ice F I G U R E 2 Area under the curve (AUC) scores from Maxent (circle) and BRT (triangle) models as the number of independent test years varies from 1 to 24. Mean AUC score across all cross-validation folds is shown by the large marker. AUC scores for each fold are shown above and below the mean. The test years were mutually exclusive between folds for each test year bin. Model built with bathymetry and BIOMAS variables(a). Model built with only BIOMAS variables (b) Variable importance for BRT models was ordered as follows: zooplankton (30.78%), sea temperature (28.14%), sea ice (20.63%) and flagellates (20.45%). Sea ice and flagellates were within a standard deviation of one another, as were zooplankton and sea temperature.
Differences in the contribution of BIOMAS variables after bathymetry was removed from the models was greatest for zooplankton, whose contribution increased by 46.19% for Maxent models and 15.66% for BRT models.
Response curves for both Maxent and BRT SDMs ( Figure 5) provide a heuristic measure of bowhead whale habitat preferences.
For the models that included bathymetry and BIOMAS variables (Figure 5a), bowhead whale presence was associated with zooplankton most strongly between 0.5-1.0 mmol-N/m 3 . In contrast, for models that included only BIOMAS variables (Figure 5b), the response of zooplankton was more well defined, with the region between 0-0.5 mmol-N/m 3 being associated with low predicted habitat suitability, and the region between 0.5-1.0 mmol-N/m 3 associated with higher suitability. All other BIOMAS variable responses curves were similar to their counterparts in models with versus without bathymetry (Figure 5a vs. Figure 5b). The most highly preferred sea temperatures (averaged over 0-100 m depth) were between 4 and 12°C, with relatively high preference for temperatures in that range. There was a strong preference for low sea ice thickness. High predictions of bowhead whale habitat suitability were associated with low levels of flagellates, between 0 and 0.3 mmol-N/m 3 . The response to bathymetry was weak for bathymetry values less than approximately −500 m, and the response increased rapidly for bathymetry values of approximately −500 m to near 0 m.

| Habitat maps
We summarized habitat suitability predictions by averaging each 8-day prediction from Maxent and BRT models, and we used the models fit with 5 test years and 20 training years for this calculation. Results are shown for models that used BIOMAS variables and bathymetry (Figure 6a, Table S1.5) and only BIOMAS variables  Figure 6b). However, there are a few notable differences. Models with bathymetry predicted low habitat suitability in the narrow band of very shallow coastal water, which corresponds with lower sighting rates in this region during the surveys. Models that excluded bathymetry did not identify these nearshore waters as an area of low habitat suitability. Predicted habitat suitability from models with bathymetry resolved habitat suitability at a finer-grain, owing to differences in bottom topography. In comparison, predicted habitat suitability from models without bathymetry showed a smoother prediction surface. Finally, predicted habitat suitability in the final three 8-day periods (8-31 October) was noticeably higher for models that included bathymetry than for models that did not include bathymetry.
Mean habitat suitability for each 8-day period (Figure 7a Predictions of habitat suitability for corresponding grid cells, 8-day periods, and years, from Maxent and BRT were very similar ( Figure S1.4). The overall Pearson correlation coefficient between predictions from Maxent and BRT models that included BIOMAS predictor variables and bathymetry was r = .93, and for models that included only BIOMAS predictor variables it was r = .89.

| Sighting rates
Calculation of presence-absence sighting rate from the raw survey data (Figure 7b), consistent with the presence-absence data used in SDMs, revealewaters received more survey effort than deeper waters to the northeastd a generally unimodal pattern with the maximum presence-absence sighting rate in 14-21 September, and the minimum in 24-31 October. Sighting rate for the first 8-day period was within 1% of the rate for 24-31 October.

| D ISCUSS I ON
We found that including physical and biological modelled environmental data, especially modelled prey, is useful for building SDMs that accurately represent seasonal dynamics of species occurrence.
Our best performing models included static and modelled dynamic environmental variables whose values were contemporaneous with species' presence-absence records. Our results suggest that static variables may carry additional weight for species whose distributions that have not changed with respect to those static features and that geographic stability in a species' distribution, as in the case of the western Arctic bowhead whale in the western Beaufort Sea, may allow for SDMs trained on a small percentage of the data to achieve a high degree of predictive performance.

| The importance of bathymetry in bowhead whale distribution models
Predictions from models built with BIOMAS variables and bathymetry, versus those built with only BIOMAS variables, are more realistic in the narrow band of very shallow inshore waters (Figure 6a vs. Figure 6b), where bowhead whales are not typically seen.
Comparisons among variable combinations used to build models ( Figure 3, Table S1.6) also emphasized the importance of the bathymetry variable. These comparisons and the variable importance plots  showed that the two best models were (1) BIOMAS variables and bathymetry, and (2) bathymetry alone. The importance of bathymetry highlights the spatial stability in bowhead whale distributions and suggests there is likely a strong relationship between bathymetry and biophysical processes that determine the distribution of bowhead whales. When bathymetry was removed from the models, the importance of zooplankton increased more than each of the other variables (+46.19% for Maxent and +15.66% for BRT SDMs).
Modelled zooplankton declined seasonally by over 50% on average during our late summer/early fall within-year study period. Although bathymetry is a static variable whose correlation with zooplankton changes on a seasonal basis, it may be a reasonable proxy for prey  Table S1.3). We found that models with as many as 17 test years, and only 8 training years, maintained scores in general agreement with models that were tested with 1 year and trained with 24 years (Figure 2).

| The importance of modelled prey data in bowhead whale distribution models
Bathymetry has been found to be an important variable in other marine species distribution modelling studies (Ainley et al., 2011;Burgos et al., 2020;Gomez et al., 2017;Sequeira et al., 2012).
Although our model with bathymetry alone performed well, it is not dynamic and produces the same prediction for every 8-day period ( Figure S1.5). Consequently, it misses important temporal variability in the distribution of bowhead whales. Specifically, sighting rates are lower at the beginning and the end (Figure 7b) of the within-year study period (21 August-31 October), likely reflecting the seasonal dynamics of bowhead whale migration through the region. The models fit with only BIOMAS variables, and those with BIOMAS variables and bathymetry correctly predicted lower habitat suitability at the beginning and end of the study period ( Figure 6). There were only small differences in the predictive capacity between the models that were trained with only bathymetry versus those trained with BIOMAS variables and bathymetry ( Figure 3, Table S1.6). This was surprising because models trained with only bathymetry could not reproduce the within-season temporal dynamics seen in models that included BIOMAS predictor variables (Figures 6 and 7, S1.5). This may be an artefact of the temporal scale at which we measure AUC and TSS. Because AUC and TSS were calculated using test year(s) across 8-day periods, rather than evaluating 8-day periods across test year(s), the model with bathymetry alone was not penalized for its inability to capture the within-year changes that are seen in the sighting rates (Figure 7b). However, the within-year temporal variability in bowhead distribution shows that it is critical to include dynamic variables in the model (compare Figure 6 with Figure S1.5 and Figure 7). Scores from all models that included zooplankton were second in rank to those that included bathymetry ( Figure 3, Table   S1.6). The stability of scores across the combinations of variables that included zooplankton provides strong support for the positive utility of our particular zooplankton product (a combination of copepods and predator zooplankton) for predicting the distribution of zooplanktivores in the western Beaufort Sea.
The Canadian Beaufort Sea, east of our study area, and the Point Barrow, Alaska area at the western end of our study area are important feeding areas. The coastal shelf waters between these habitats are often thought of as a migratory corridor, yet feeding is known to take place in these waters (Clarke et al., 2015(Clarke et al., , 2018Lowry et al., 2004). Western Arctic bowhead whales feed on zooplankton, primarily euphausiids and copepods, and to a lesser extent amphipods (Lowry, 1993;Lowry et al., 2004;Moore et al., 2010). Energetic calculations (Walkusz et al., 2012), data from time-depth recorders (Laidre, Heide-Jørgensen, & Nielsen, 2007), and opportunistic observations (Moore et al., 2000) suggest that bowhead whales need to locate and feed on dense patches of prey in order to meet their caloric demands. We did not analyse behavioural data that could have identified feeding activity for individual bowhead whale presence records. Therefore, we are making the implicit assumption that feeding takes place and is important in the area and time of year of our study. This assumption has support, as feeding is known to take place in our study area (Clarke et al., 2015;Lowry et al., 2004).  Figure 5).
Our results suggested moderate importance of the sea ice variable, and the response curve suggests that bowhead whales are more likely to be found where sea ice thickness is low. It is important to note that BIOMAS models the thickness of ice, rather than the ice extent which is more commonly reported by remote observing platforms, such as satellites.

| Future research and conclusions
It is important to acknowledge that the generalized migration of western Arctic bowhead whales includes the Bering, Beaufort and Chukchi seas (Moore & Reeves, 1993). We intentionally limited our analysis to the areas in the western Beaufort Sea because it had the longest and most consistent data collection (Clarke et al., 2013 (Torres et al., 2015;Yates et al., 2018) and extrapolation (Mannocci, Roberts, Miller, & Halpin, 2017) will be critical to understanding how confidently SDMs can be used to make future predictions. Further study is required to understand how the models and variables used in this study would perform in other bowhead whale habitats and at other times of year.
The research presented here demonstrates the applicability of presence-absence datasets (e.g. observer bias not estimable, multiple or under-documented survey protocols) to generate predictions of habitat suitability using modelled environmental conditions. Due to the high cost of implementing advanced survey protocols, there is a growing need to understand the nuances of Maxent, BRT, and other presence-absence modelling algorithms when used to make out-of-sample hindcasted or forecasted predictions. Many monitoring programs and opportunistic platforms collect data in a fashion that is not suitable for state-of-the-art density modelling (e.g. Roberts et al., 2016). Therefore, research about our ability to incorporate these data into non-extrapolative SDMs is needed to lay the groundwork for future studies on how extrapolation, transferability and forecasting can be done with confidence (Derville, Torres, Iovan, & Garrigue, 2018;Fiedler et al., 2018;Mannocci et al., 2017) using presence-absence data.
Our models show the possibility of improving SDM performance using modelled dynamic predictor variables that estimate expected prey availability. Using more direct estimates of prey availability, rather than oceanographic proxies for prey, could be important for forecasting species distributions because the relationships between the proxies and prey may change in a changing climate (Gregr et al., 2013;Silber et al., 2017). Our results complement the understanding of bowhead whale ecology and show how presence-absence data can be used in combination with both static and spatio-temporally varying modelled environmental data to predict bowhead whale distributions.
The BIOMAS predictor variables were derived from a dynamic data assimilative three-dimensional ocean model with the demonstrated ability to project future ocean conditions (Banas et al., 2016), and a natural next step for study of this system is to use the BIOMAS variables to explore the potential effects of

PEER R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13149.