Static species distribution models in the marine realm: The case of baleen whales in the Southern Ocean

Information on the spatio-temporal distribution of marine species is essential for developing proactive management strategies. However, sufficient information is seldom available at large spatial scales, particularly in polar areas. The Southern Ocean (SO) represents a critical habitat for various species, particularly migratory baleen whales. Yet the SO's remoteness and sea ice cover hinder the collection of sufficient information on baleen whale distribution and niche preferences. Here, we used presence-only species distribution models to predict the circumantarctic habitat suitability of baleen whales and to identify important predictors affecting their distribution.


| INTRODUCTION
Information on marine species' spatio-temporal distribution and their relationship to the environment is pivotal for well-informed, proactive management strategies and conservation actions (Becker et al., 2016; Guisan et al., 2013). However, obtaining sufficient data on marine mammal distribution across large spatial scales is challenging due to financial and logistic constraints, particularly in remote oceans (Kaschner et al., 2012; Robinson et al., 2011). Marine mammal occurrence data are frequently biased towards coastal areas and shallow waters (Robinson et al., 2011) or, in polar regions, towards easily accessible areas during summer months.
Species distribution models (SDMs) are empirical methods that relate information on species occurrence to environmental variables to predict potential species distribution and identify potential ecological factors governing their distribution (Phillips et al., 2006).
SDMs are a promising tool to extend our limited knowledge of marine mammal distributions and to support marine conservation prioritization, for example, by identifying biologically important areas (Guisan et al., 2013; Redfern et al., 2006; Smith et al., 2020). Although SDMs are less common in marine environments than in the terrestrial realm, their use for marine habitats has increased markedly in recent years (Marshall et al., 2014; Melo-Merino et al., 2020; Redfern et al., 2006; Robinson et al., 2011). The main challenge in modelling the distribution of marine species is the availability of sufficient, reliable species data (Dambach & Rödder, 2011; Robinson et al., 2011).
Two types of species information are commonly used in SDMs: presence-absence and presence-only data. Presence-absence models (e.g. generalized additive models, GAMs) require carefully designed surveys and are thus more common in small-scale SDM studies (e.g. Esteban et al., 2013; de Stephanis et al., 2008). Absence data are hard to estimate correctly (Lobo et al., 2010), especially for highly mobile species and in remote areas (Smith et al., 2020).
Marine mammals spend much of their time submerged and can be visually detected only when at or near the water surface. Their detection is sensitive to species behaviour and to oceanographic and meteorological conditions (Barlow et al., 2001). This imperfect detection can lead to false absences, which affect SDM evaluation and bias inferences about species distribution (Guillera-Arroita, 2017; Lobo et al., 2010). The problem is aggravated because the detectability of marine mammals varies in time and space (Guillera-Arroita, 2017).
Furthermore, even dedicated surveys typically provide only a snapshot of species distribution and cover a limited temporal and spatial range. Hence, not surprisingly, most SDMs use presence-only data. Presence-only models contrast species occurrences with a large sample of background locations that characterize the environment throughout the study area. Recent literature demonstrates the statistical validity of only a few presence-only SDM algorithms, including point process models and Maxent (Renner et al., 2015). Robust presence-only SDMs are particularly advantageous in the marine realm because systematic presence-absence data are difficult to obtain efficiently (Smith et al., 2020).
The Southern Ocean (SO) is a biodiversity hotspot with distinctive biogeographic features and high environmental variability (Convey et al., 2014; De Broyer et al., 2014; Fabri-Ruiz et al., 2019; Guillaumot et al., 2020). The SO's sea ice environment represents a critical habitat for many threatened migratory and resident species, particularly baleen whales (Filun et al., 2020; Thomisch et al., 2016; Van Opzeeland et al., 2013). Nevertheless, research efforts in the SO have been limited because of its remoteness, vastness and sea ice cover, which pose considerable financial and logistical constraints (Bombosch et al., 2014; Scheidat et al., 2011).
Our knowledge of the biodiversity in most SO areas seems to reflect sampling effort rather than the actual biodiversity status (Convey et al., 2014) and thus improving sampling effort deserves a high priority for Antarctic science (Guillaumot et al., 2018).
Spatio-temporal information on species distributions in the SO, necessary for conservation planning and management, is particularly patchy. Research efforts are generally biased towards relatively small areas of the SO (e.g. the West Antarctic Peninsula) and repetitive ship tracks (e.g. to and from Antarctic stations), and are mainly limited to the summer months. Meanwhile, deep-sea and remote regions (e.g. the Bellingshausen and Amundsen Seas) remain largely underinvestigated (De Broyer et al., 2014). Most research vessels operating in the SO are restricted to operationally safe, ice-free waters and do not take on the risks and costs of going deep into the sea ice (Herr et al., 2019; Williams et al., 2014), which makes modelling species distributions in the SO challenging (Guillaumot et al., 2018, 2020).

Nevertheless, carefully implemented and evaluated presence-only SDMs can be a cost-effective tool for studying species' potential distributions and habitats and for planning future surveys in the SO.
In the SO, several baleen whale species were hunted to near extinction during 20th-century commercial whaling, particularly Antarctic blue and fin whales (Kennicutt et al., 2016; Tulloch et al., 2018). Population recovery is generally incomplete, and recovery rates vary between species and SO regions: some species show high recovery rates (e.g. humpback whales, Megaptera novaeangliae; Friedlaender et al., 2011; Tulloch et al., 2018), while others remain highly threatened (e.g. Antarctic blue whales, Balaenoptera musculus intermedia; Branch et al., 2004; Tulloch et al., 2018). Information on the ecology and distribution of baleen whales in the SO is pivotal for the International Whaling Commission's conservation efforts and for measures addressing potential climate change impacts on polar ecosystems. However, such information is limited (Leaper & Miller, 2011), and thus relatively few studies have modelled the distribution of baleen whales in the SO. Some species receive more attention (e.g. humpback whales) than others (e.g. fin and Antarctic blue whales; Širović & Hildebrand, 2011).
This paper models the circumantarctic distribution of four baleen whale species for which sufficient sighting data are available: Antarctic minke whale (AMW, Balaenoptera bonaerensis), Antarctic blue whale (ABW), fin whale (FW, B. physalus) and humpback whale (HW). We performed a rigorous screening of circumpolar baleen whale distribution data in the SO. We used Maxent (Phillips et al., 2006), as it is appropriate for the available presence-only data, with two ways of handling spatial sampling bias (no correction versus rarefaction). We used spatial-block cross-validation to evaluate the models independently and to optimize model complexity for better predictions.
For each species, we predicted its circumantarctic habitat suitability and identified the most important predictors affecting its distribution, as well as the species' suitability response to environmental change. We compared our results with previous studies on these species in the SO and discuss reasons for the observed differences. Finally, we evaluate the potential limitations of implementing static SDMs in the highly dynamic SO.

| Species data
Cetacean sightings south of 45°S were compiled from different sources. Only sightings after 1980 were considered, to maintain a reasonable temporal match between environmental predictors and sightings. Data from three biodiversity repositories were quality controlled: the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/), the Ocean Biodiversity Information System (OBIS, 2018) and OBIS-SEAMAP (Halpin et al., 2009). We also used Polarstern sighting data (…awi.de/expedition/schiffe/polarstern.html) and data published in PANGAEA (https://www.pangaea.de/; details in Appendix S1-S5).
Data on baleen whales with sufficient sightings (AMW, ABW, FW and HW) were subjected to further quality control. We excluded erroneous occurrences and those with high uncertainty, for example, GBIF occurrences flagged with "known geospatial issues" and Polarstern sightings with a "possible" certainty level. As biodiversity data repositories compile data from various sources, the same sighting can be duplicated within or between repositories. To avoid spuriously high relative occurrence rates, we excluded occurrences explicitly duplicated within and between data sources: only one instance of sightings with identical coordinates and date was retained. We excluded telemetry and catch data to avoid highly correlated occurrences, spatially or temporally. The final dataset consists of ~32,000 sightings. The temporal distribution of species-specific sightings is shown in Figures S2, S7, S12 and S17. Note that figures in the Supporting Information are grouped by species (Figures S1-S20).
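The duplicate rule above (one record per identical coordinates and date) can be sketched with pandas. This is a minimal illustration, not the authors' pipeline: the helper name, the column names (`species`, `lon`, `lat`, `date`, `record_type`) and the record-type labels are hypothetical, and treating two timestamps on the same calendar day as the same date is an assumption.

```python
import pandas as pd

# Hypothetical labels for records that would add strongly autocorrelated occurrences.
EXCLUDED_TYPES = ["telemetry", "catch"]


def clean_sightings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop telemetry/catch records and keep one sighting
    per species, coordinate pair and calendar date."""
    out = df[~df["record_type"].isin(EXCLUDED_TYPES)].copy()
    # Reduce timestamps to calendar dates (assumption: "same date" means same day).
    out["date"] = pd.to_datetime(out["date"]).dt.normalize()
    # Keep the first instance of each species / lon / lat / date combination.
    out = out.drop_duplicates(subset=["species", "lon", "lat", "date"], keep="first")
    return out.reset_index(drop=True)


# Toy example: two reports of the same minke whale sighting, one later sighting,
# one humpback sighting at the same place and day, and one telemetry fix.
raw = pd.DataFrame({
    "species": ["AMW", "AMW", "AMW", "HW", "AMW"],
    "lon": [-60.5, -60.5, -60.5, -60.5, -61.0],
    "lat": [-65.0, -65.0, -65.0, -65.0, -64.0],
    "date": ["2005-01-03 10:00", "2005-01-03 14:30", "2005-01-04 09:00",
             "2005-01-03 11:00", "2005-01-05 08:00"],
    "record_type": ["sighting", "sighting", "sighting", "sighting", "telemetry"],
})
cleaned = clean_sightings(raw)
print(len(cleaned))  # 3
```

Note that the humpback sighting survives even though it shares location and day with a minke whale sighting, because duplicates are identified per species.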

| Environmental predictors
Potential predictors were obtained at the highest available spatial and temporal resolution (Table S1). We prepared ecologically relevant predictors that summarize environmental conditions in the SO and act as proxies for prey availability (Redfern et al., 2006). For each dynamic predictor, we calculated monthly and seasonal means and standard deviations to explore temporal trends and intra-seasonal variability, respectively. Seasons were defined as three-month intervals starting in January, except for metrics representing sea ice (see below).
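The monthly-to-seasonal summaries can be illustrated with NumPy. This sketch assumes a stack of gridded values with shape (time, y, x) and a month number for each time step; both season mappings mirror the definitions in the text (three-month seasons from January, and the customized sea ice seasons described below), while the function name is hypothetical.

```python
import numpy as np

# Three-month seasons starting in January: Jan-Mar = 1, ..., Oct-Dec = 4.
STANDARD_SEASONS = {m: (m - 1) // 3 + 1 for m in range(1, 13)}
# Customized sea ice seasons: Jan-Mar = 1, Apr = 2, May-Nov = 3, Dec = 4.
SEA_ICE_SEASONS = {1: 1, 2: 1, 3: 1, 4: 2, **{m: 3 for m in range(5, 12)}, 12: 4}


def seasonal_summary(cube, months, season_of_month):
    """Hypothetical helper: {season: (mean, sd)} per grid cell over all time steps
    falling in that season.

    cube: array (time, y, x); months: month number (1-12) of each time step.
    The SD is the sample SD across time steps, i.e. intra-seasonal variability.
    """
    seasons = np.array([season_of_month[int(m)] for m in months])
    summary = {}
    for s in np.unique(seasons):
        values = cube[seasons == s]
        summary[int(s)] = (np.nanmean(values, axis=0),
                           np.nanstd(values, axis=0, ddof=1))
    return summary


# Toy data: two years of monthly fields whose value equals the month number.
months = np.tile(np.arange(1, 13), 2)
cube = months[:, None, None] * np.ones((24, 2, 3))
ice_summary = seasonal_summary(cube, months, SEA_ICE_SEASONS)
std_summary = seasonal_summary(cube, months, STANDARD_SEASONS)
```

Keeping the season mapping as a plain dictionary makes it easy to switch between the calendar seasons and the sea ice phases without changing the aggregation code.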
Bathymetry data were downloaded from GEBCO (Weatherall et al., 2015). From bathymetry, we derived slope, aspect and the closest distances to the coast and to the 500 m and 1,000 m isobaths. The Antarctic coast was defined as the ice shelf edge, that is, excluding any cavities under the ice shelves. The 1,000 m isobath was used to represent the location of the continental shelf break.
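Slope and aspect can be derived from a bathymetry grid by finite differences. The sketch below assumes a projected grid with square cells of known size, rows increasing towards grid north and columns towards grid east (on a polar equal-area grid, grid north is not geographic north); it shows one common convention and is not necessarily the exact GIS routine used. The function name is hypothetical.

```python
import numpy as np


def slope_aspect(elevation, cell_size):
    """Hypothetical helper: slope (degrees) and aspect (degrees clockwise from
    grid north, pointing downslope) from an elevation grid (negative below sea level).

    Assumes row index increases towards grid north and column index towards
    grid east; cell_size is in the same unit as elevation.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(elevation, cell_size, cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Steepest descent points along (-dz/dx, -dz/dy); convert to a compass bearing.
    # Aspect is undefined on flat cells.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Toy seafloor: elevation rises by 100 m per 1 km cell eastwards,
# so the slope faces west (aspect 270 degrees).
x = np.arange(5) * 1000.0
elev = np.tile(-4000.0 + 0.1 * x, (4, 1))
slope_deg, aspect_deg = slope_aspect(elev, 1000.0)
```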
For chlorophyll-a (Chl-a), we only considered the summer mean and standard deviation, as spatial coverage in other seasons was poor, prohibiting the calculation of meaningful circumpolar averages. Daily sea ice concentration (SIC) was obtained from Spreen et al. (2008). We used SIC data for complete years (2003-2010 and 2013-2017), with seasons customized according to the major phases of annual sea ice extent (https://seaice.uni-bremen.de/sea-ice-concentration/time-series/): season 1 (January-March, summer, lowest extent); season 2 (April, onset of sea ice formation); season 3 (May-November, high extent); and season 4 (December, rapid sea ice melt). We determined the closest distance to the seasonally averaged sea ice edge (SIE), where the SIE was identified as the largest polygon with mean SIC >15% (Parkinson, 2002). We assigned a value of zero to cells intersecting the SIE, positive values to cells north of the SIE (open water; SIC <15%) and negative values to cells south of the SIE (SIC >15%) (following Ainley et al., 2004).
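The signed distance to the SIE can be sketched with SciPy's image-processing tools on an equal-area grid. Assumptions: SIC is a 2-D array in percent, cells are square with a known size in km, the edge is taken as the outermost ring of cells of the largest >15% patch (4-connectivity), and distances are Euclidean distances between cell centres. This illustrates the rule described above; it is not the authors' code, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def signed_distance_to_sie(sic, cell_km, threshold=15.0):
    """Hypothetical helper: signed distance (km) to the edge of the largest
    SIC > threshold patch.

    0 on edge cells, positive outside the main pack (open water side),
    negative inside it.
    """
    ice = sic > threshold
    labels, n_patches = ndimage.label(ice)  # 4-connected patches
    if n_patches == 0:
        raise ValueError("no cells exceed the SIC threshold")
    sizes = np.bincount(labels.ravel())[1:]  # cells per patch (label 0 = no ice)
    pack = labels == (np.argmax(sizes) + 1)
    # Edge cells: pack cells with at least one non-pack 4-neighbour
    # (pack cells on the grid border also count as edge cells).
    edge = pack & ~ndimage.binary_erosion(pack)
    # Euclidean distance from every cell to the nearest edge cell.
    dist = ndimage.distance_transform_edt(~edge) * cell_km
    return np.where(pack, -dist, dist)


# Toy grid: a 5 x 5 ice pack plus one isolated ice cell, 25 km cells.
sic = np.zeros((11, 11))
sic[3:8, 3:8] = 80.0
sic[0, 10] = 50.0
d_sie = signed_distance_to_sie(sic, cell_km=25.0)
```

The isolated ice cell is treated as open water relative to the main pack, which matches the "largest polygon" definition in the text.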
All predictors were projected onto an equal-area projection at […]. The study area was bounded by the climatological location of the Polar Front as defined by Orsi et al. (1995), which was chosen as a natural boundary of the SO with rather homogeneous hydrographic conditions south of it. Where necessary, spatial gaps were interpolated using ordinary kriging (Wackernagel, 1995). After rejecting less informative predictors based on their temporal trends and expert judgement, the initial list comprised 32 predictors (Table S1b).
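Ordinary kriging predicts a gap value as a weighted sum of nearby observations, with weights obtained from a variogram and constrained to sum to one. The sketch below assumes an exponential variogram with fixed, illustrative parameters (in practice these would be fitted to the empirical variogram of each predictor) and planar coordinates; the function names are hypothetical and it does not reproduce any particular software.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_km=100.0, nugget=0.0):
    """Hypothetical exponential semivariance; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + sill * (1.0 - np.exp(-h / range_km)), 0.0)


def ordinary_kriging(xy_obs, z_obs, xy_new, variogram=exponential_variogram):
    """Hypothetical helper: ordinary kriging at xy_new; returns (predictions, weights)."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    xy_new = np.asarray(xy_new, dtype=float)
    n = len(xy_obs)
    # Left-hand side: semivariances between observations, bordered by ones
    # (Lagrange multiplier row/column enforcing sum(weights) == 1).
    lhs = np.ones((n + 1, n + 1))
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    lhs[:n, :n] = variogram(d_obs)
    lhs[n, n] = 0.0
    # Right-hand side: semivariances between observations and each target.
    d_new = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=-1)
    rhs = np.ones((n + 1, len(xy_new)))
    rhs[:n, :] = variogram(d_new).T
    solution = np.linalg.solve(lhs, rhs)
    weights = solution[:n]  # shape (n_obs, n_targets)
    return weights.T @ np.asarray(z_obs, dtype=float), weights


# Toy example: four observations (km); predict at a data point and in a gap.
obs = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (80.0, 80.0)]
vals = [1.0, 2.0, 3.0, 4.0]
pred, w = ordinary_kriging(obs, vals, [(0.0, 0.0), (25.0, 25.0)])
```

With a zero nugget, ordinary kriging is an exact interpolator: predicting at an observed location returns the observed value.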
Where necessary, we transformed predictors (e.g. square root) to limit the influence of a few extreme values on model stability (Dormann & Kaschner, 2010). We excluded highly correlated predictors by keeping the maximum variance inflation factor (VIF) at a moderate value of 4.5 (Zuur et al., 2010). This resulted in 15 predictors used in the models (Figures S21-S22 and Table 1). Figure S23 shows environmental conditions at species-specific sightings against their full range in the study area.
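One common way to enforce a VIF threshold is the sequential procedure described, for example, by Zuur et al. (2010): repeatedly drop the predictor with the highest VIF until all remaining values are at or below the threshold. The sketch below assumes a plain NumPy matrix of predictor values at sampled cells; the function names are hypothetical, and the order in which the authors actually removed predictors is not implied.

```python
import numpy as np


def vif(X):
    """Hypothetical helper: VIF of each column, 1 / (1 - R^2) from an OLS
    regression (with intercept) of that column on all other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, max_vif=4.5):
    """Hypothetical helper: remove the predictor with the largest VIF until max VIF <= max_vif."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= max_vif:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Toy example: c is almost a linear combination of a and b; d is independent.
rng = np.random.default_rng(0)
a, b, d = rng.normal(size=(3, 500))
c = a + b + rng.normal(scale=0.05, size=500)
X_kept, kept = drop_collinear(np.column_stack([a, b, c, d]), ["a", "b", "c", "d"])
```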

| Species distribution models
We used Maxent v3.4.1 to train two model sets: (1) using all occurrences to estimate habitat suitability under the point process modelling framework (following Renner et al., 2015; Model All) and (2) using only one occurrence per cell (Model Unique).
The latter is a special case of rarefaction, a commonly used method to correct for sampling bias and diminish the effect of spatial autocorrelation (Aiello-Lammens et al., 2015). Bias correction is expected to lead to broader areas of suitable habitat (El-Gabbas & Dormann, 2018a; Phillips et al., 2009). Here, we used both model sets not to quantify the effect of sampling bias correction, but to investigate whether and how it would affect our conclusions, under the assumption that differences between the results are informative about model stability.
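Keeping a single occurrence per grid cell (the Model Unique data) amounts to cell-based thinning. A minimal sketch, assuming projected coordinates in the same units as the cell size and a grid origin at (x0, y0); the function name is hypothetical, and which occurrence is kept per cell (here the first in input order) is an arbitrary choice.

```python
import numpy as np


def one_per_cell(x, y, cell_size, x0=0.0, y0=0.0):
    """Hypothetical helper: indices of occurrences retained when keeping the
    first occurrence per grid cell."""
    col = np.floor((np.asarray(x, dtype=float) - x0) / cell_size).astype(int)
    row = np.floor((np.asarray(y, dtype=float) - y0) / cell_size).astype(int)
    # np.unique(..., return_index=True) gives the first index of each unique (row, col).
    _, first = np.unique(np.column_stack([row, col]), axis=0, return_index=True)
    return np.sort(first)


# Five sightings (km); the first two share a 10 km cell, as do the third and fourth.
x = [1.0, 2.0, 15.0, 16.0, 31.0]
y = [1.0, 3.0, 1.0, 2.0, 1.0]
kept_idx = one_per_cell(x, y, cell_size=10.0)
print(kept_idx)  # [0 2 4]
```

In practice this would be applied per species, using the same grid as the predictors.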
We used 5-fold spatial-block cross-validation to evaluate model performance, maintaining spatial independence between training and testing data and reducing the effect of spatial autocorrelation. We determined block size and the assignment of blocks to cross-validation folds using the blockCV R package (Valavi et al., 2019): block size was set to the median spatial autocorrelation range of environmental conditions at sighting locations, and blocks were distributed into folds so as to balance the number of occurrences (Figure S24).
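The idea of assigning spatial blocks to folds while balancing occurrence counts can be illustrated with a simple greedy heuristic. This is not blockCV's own algorithm (blockCV, an R package, has its own allocation options); the square-grid blocks, the greedy rule and the function name are assumptions for illustration, and the block size would come from the autocorrelation analysis described above.

```python
import numpy as np


def assign_blocks_to_folds(x, y, block_size, k=5):
    """Hypothetical helper: assign square spatial blocks to k folds,
    balancing occurrences per fold.

    Greedy rule: visit blocks from most to fewest occurrences and put each
    into the fold that currently holds the fewest occurrences.
    Returns (fold index per occurrence, occurrences per fold).
    """
    bx = np.floor(np.asarray(x, dtype=float) / block_size).astype(int)
    by = np.floor(np.asarray(y, dtype=float) / block_size).astype(int)
    blocks, block_of_point, counts = np.unique(
        np.column_stack([bx, by]), axis=0, return_inverse=True, return_counts=True
    )
    block_of_point = block_of_point.reshape(-1)  # 1-D across NumPy versions
    fold_of_block = np.empty(len(blocks), dtype=int)
    fold_totals = np.zeros(k, dtype=int)
    for b in np.argsort(-counts, kind="stable"):
        f = int(np.argmin(fold_totals))
        fold_of_block[b] = f
        fold_totals[f] += counts[b]
    # All occurrences in one block share a fold, keeping test data spatially separate.
    return fold_of_block[block_of_point], fold_totals


rng = np.random.default_rng(1)
px, py = rng.uniform(0, 2000, size=(2, 400))  # toy occurrences (km)
folds, totals = assign_blocks_to_folds(px, py, block_size=250.0, k=5)
```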
To improve model performance, we tuned Maxent's parameters using cross-validation (Merow et al., 2013). We used the ENMeval R package (Muscarella et al., 2014) to find the best combination of feature classes (transformations of predictors) and regularization multiplier (model complexity). For each model type and species, we evaluated 40 combinations: five feature classes (L/LQ/H/LQH/LQHP, where "L" is linear, "Q" quadratic, "H" hinge and "P" product) and eight regularization multiplier values (0.5 to 4 in steps of 0.5). The combination with the highest cross-validated testing AUC (area under the ROC curve) was used in the final models (Table S2). We present the mean habitat suitability along with the coefficient of variation (ratio of the standard deviation to the mean prediction) as a measure of predictive uncertainty. In addition to the cross-validated models, we ran full models using all occurrences. For each model, we estimated predictor importance using permutation importance and jackknifing. We show the results of the full models in the main text and those of the cross-validated models in the Supporting Information.

Note (Table 1): Statistics: type of statistic used to calculate each predictor (SD = standard deviation); Season: season or month range used; Transformation: transformation applied to maximize uniformity of the data; Abbreviation: abbreviation used in the figures; VIF: variance inflation factor. Summer was defined as January to March. See Table S1 for more information on the predictors used.
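The tuning grid, the AUC-based selection rule and the uncertainty measure described in the methods can be sketched as follows. This does not call ENMeval or Maxent: the AUC is the rank-based (Mann-Whitney) estimate for presence versus background scores, `mean_test_auc` is a hypothetical stand-in for the per-combination cross-validation results that ENMeval would report, and all function names are hypothetical.

```python
import itertools
import numpy as np

FEATURE_CLASSES = ["L", "LQ", "H", "LQH", "LQHP"]
REG_MULTIPLIERS = [0.5 * i for i in range(1, 9)]  # 0.5, 1.0, ..., 4.0
GRID = list(itertools.product(FEATURE_CLASSES, REG_MULTIPLIERS))  # 5 x 8 = 40


def auc_presence_background(pres_scores, bg_scores):
    """Hypothetical helper: probability that a random presence scores higher
    than a random background point (ties count one half)."""
    p = np.asarray(pres_scores, dtype=float)[:, None]
    b = np.asarray(bg_scores, dtype=float)[None, :]
    return float(np.mean((p > b) + 0.5 * (p == b)))


def select_best(mean_test_auc):
    """Hypothetical helper; mean_test_auc maps (feature_class, reg_multiplier)
    to the mean testing AUC over folds."""
    return max(mean_test_auc, key=mean_test_auc.get)


def coefficient_of_variation(fold_predictions):
    """Hypothetical helper: per-cell SD / mean across cross-validated predictions
    (array: folds x cells). Assumes strictly positive mean predictions."""
    preds = np.asarray(fold_predictions, dtype=float)
    mean = preds.mean(axis=0)
    return preds.std(axis=0, ddof=1) / mean


print(len(GRID))  # 40
```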

| RESULTS
In general, both model types (Model All and Model Unique) give similar results, with Model Unique yielding a broader range of suitable habitats and slightly lower testing AUC, as expected after bias correction (Figure 1 and Table S2). Generally, the most important predictors were sea ice related (Figure 2). The uncertainty of cross-validated predictions was generally low and showed no pronounced spatial pattern, reflecting the stability of these sub-models.
FIGURE 1 Predicted habitat suitability of four baleen whale species in the Southern Ocean using Model All (all occurrences, left) and Model Unique (removed duplicate sightings, right). These maps represent predictions from the respective "full model," calibrated without cross-validation. Mean predictions from cross-validated models and their coefficients of variation are shown in the Supporting Information. Map colours range from blue (low suitability) to red (high suitability). All maps are on Maxent's cloglog scale.

| Antarctic minke whale
Models predicted a circumantarctic habitat for AMW, with a general preference for areas closer to the Antarctic coast, except for a small patch southwest of the Balleny Islands and the Amundsen Sea coast towards the Ross Sea.¹ Most of the southern Weddell Sea was predicted to be less suitable (Figure 1 and Figure S1). The most important predictors were distance to summer SIE, mean summer SIC and SIC variability (Figure 2 and Figure S3). AMW was shown to prefer locations close to the SIE with moderate SIC (<50%; Figures S4 and S5).

| Antarctic blue whale
Suitable areas for ABW were near the Antarctic coast (yet 50-300 km offshore), ranging from 30°W eastwards to 170°W (Figure 1 and Figure S6), that is, along the East Antarctic coast and notably rather sparsely off West Antarctica. Other suitable areas include small patches in the Bellingshausen and Amundsen Seas and between Elephant Island and the South Sandwich Islands. The most important predictors were SIC variability, mean summer SIC and distance to the 1,000 m isobath (Figure 2 and Figure S8). Other relatively important predictors were bathymetry, temperature at 200 m and distance to summer SIE. Suitable habitats were predicted in areas with high SIC variability in December (ca. 35%-45%) and low mean summer SIC (<40%) or low-to-moderate distance to the 1,000 m isobath (<250 km; Figures S9 and S10). ABW habitat is more suitable close to the SIE (with lower suitability south of it), at relatively high temperatures at 200 m (3-5°C) and at moderate depths (3,500-4,500 m; Figure S9).

| Fin whale
The most suitable areas for FW extend eastwards from Elephant Island to South Georgia Island and include waters near Bouvet Island, small patches close to the Antarctic coast from 30°E eastwards to 180°E, and offshore areas of the Ross Sea (Figure 1 and Figure S11). Important predictors were distance to summer SIE, mean summer SIC, SIC variability, distance to coast and SSH variability (Figure 2 and Figure S13). Highest suitability was shown north of the SIE (<200 km and at ~1,500 km from it, only <100 km from the coast) or at locations with low SIC (<50%), low temperature at 200 m (<-1.5°C) or low SSH variability (Figures S14 and S15).

FIGURE 2 Permutation importance of environmental predictors used to train the models of (a) Antarctic minke whale; (b) Antarctic blue whale; (c) fin whale; and (d) humpback whale. Results of Model All (all occurrences) are shown as dark grey bars and results of Model Unique (removed duplicates) as light grey bars. Bars and error bars represent the mean and standard deviation of the permutation importance of cross-validated models. Blue dots represent the permutation importance of full models calibrated without cross-validation. The horizontal dashed line marks 5% permutation importance, above which environmental predictors were considered potentially important for the distribution of the species (light green dashed area). Plots for the jackknife test are shown in the Supporting Information. For more information on the predictors used, see Table 1.

¹ The locations of geographic features mentioned in this study are shown in Figure 3.

| Humpback whale
The effect of sampling bias correction on the predicted distribution was most evident for HW, owing to intensive sampling west of the Antarctic Peninsula and in East Antarctica. Generally, suitable areas extend from the Western Antarctic Peninsula eastwards to the South Orkney Islands, around the South Sandwich and Bouvet Islands, and along a strip close to the coast from 15°W eastwards to 170°W (Figure 1 and Figure S16). The most important predictors were distance to summer SIE, SIC variability from April to November and summer SIC. Other important predictors include distance to coast, distance to the 1,000 m isobath and SSH variability (Figure 2 and Figure S18). HW suitability was higher at locations close to the SIE with summer SIC <60%. On the open water side of the SIE, high suitability was found only at locations with high SIC variability (Figures S19 and S20). Moderate suitability is predicted within 300 km of the 1,000 m isobath and at locations close to (<100 km) or far from (>1,000 km) the coast (Figure S19).

| Baleen whale habitats in the Southern Ocean
Overall, the most important predictors affecting baleen whales' habitat suitability in the SO are those derived from SIC. Sea ice cover varies within and between years, and this variability plays an integral role in whale distribution. Our seasonal SIC variability predictor can be considered a proxy for site accessibility for whales: the higher the SIC standard deviation, the more accessible the site (Wege et al., 2020). SIC variability affects prey (krill) survival, population dynamics and abundance (Fraser & Hofmann, 2003), with the highest abundances observed close to the SIE (Brierley et al., 2002; Murase et al., 2002). Reliable data on the distribution and abundance of prey, particularly krill, are currently not obtainable at the circumantarctic scale (Robinson et al., 2011), so most studies depend on remotely sensed predictors as proxies for prey availability (Herr et al., 2019).
Transition zones, for example the SIE and the continental shelf break, are known high-productivity areas (Beekmans et al., 2010), so predictors describing the distance to them can serve as proxies for prey availability. The majority of visual observation data available to us were recorded from vessels unsuited for penetrating the ice, except for a few sightings from icebreakers (e.g. Polarstern) and icebreaker-supported helicopter surveys (e.g. Herr et al., 2019). This explains why few sightings came from south of the SIE.
Nevertheless, distance to SIE was one of the most important predictors in the models of all four study species. In the following, we briefly compare our species-specific results with those of other studies (summarized in Tables 2 and S3).

| Antarctic blue whale
ABW was once an abundant species in the SO but is currently extremely rare after its intensive exploitation during the whaling era from 1904 until 1978 (Branch et al., 2007; Double et al., 2015; Kasamatsu, 1988; Miller et al., 2015). After whaling ceased, circumpolar ABW abundance was reported to be less than 1% of its pre-whaling level (Branch et al., 2004, 2007), making the ABW one of the most endangered baleen whale species in the SO (Leaper & Miller, 2011). Little is known about the distribution and migration patterns of ABW in the SO and its relationship with krill (Branch et al., 2007; Double et al., 2015; Thomisch et al., 2016).
We found that the most important predictors were SIC-derived predictors and distance to the 1,000 m isobath. Other SDM studies provide limited information on the effect of sea ice on ABW habitat suitability. High ABW habitat suitability was predicted at low SIC (<40%) and close to the summer SIE (Figure S9). Similarly, Širović et al. (2004) and Thomisch et al. (2016) reported a negative correlation between sea ice coverage and the number of detected ABW calls off the Western Antarctic Peninsula and in the Weddell Sea, respectively.
Nevertheless, ABW was also acoustically present in areas with high winter SIC (90%) in the Weddell Sea (Thomisch et al., 2016) and under non-navigable ice conditions in the Ross Sea. This suggests that ABW overwinter in heavily ice-covered areas, potentially in local recurring polynyas (Thomisch et al., 2016). A high encounter rate of ABW near the SIE was also reported by other studies (Branch et al., 2007; Kasamatsu, 1988) […] Širović and Hildebrand (2011).
In contrast, Kasamatsu, Matsuoka, et al. (2000) reported a high encounter rate at lower temperatures, and Shabangu et al. (2017) showed high suitability for calling whales at ~0°C sea surface temperature (SST). We found moderately low importance of distance to coast, SSH (positive relationship) and Chl-a (positive relationship).
Similarly, Širović and Hildebrand (2011) found a non-significant relationship between Chl-a and calling ABW off the Western Antarctic Peninsula. In contrast, Shabangu et al. (2017) showed that these predictors were among the most important for call detections: suitability peaked close to the coast and declined sharply up to ~1,000 km offshore, was low at SSH around −1.5 m and high elsewhere, and was high at low Chl-a.

| Fin whale
Although FW was the most caught species in the SO during 20th-century commercial whaling (>718,000 whales taken), there is limited information on its distribution, abundance, demographics and the environmental variables affecting its ecology (Herr et al., 2016; Santora et al., 2014). A relatively recent estimate of the FW population in the SO suggests that it is currently at only 2% of its presumed pre-whaling abundance (Leaper & Miller, 2011).
We found that the most important predictors are SIC-derived predictors, distance to coast, SSH variability and temperature at 200 m. We found the highest (although moderate) suitability both close to and far (~1,500 km) from the coast. In contrast, Williams et al. (2006) found that abundance increases with distance from the coast, with the lowest intensity close to it off the northern Antarctic Peninsula. Santora et al. (2014) reported a FW preference for more complex bathymetry off the northern Antarctic Peninsula. Murase […] SST and FW abundance off the Western Antarctic Peninsula, and similarly, Kasamatsu (1988) and Kasamatsu, Matsuoka, et al. (2000) reported a higher encounter rate at warmer temperatures (>1°C).
We found low importance of Chl-a with no clear relationship, which conforms with Murase (2014) […] found a high encounter rate far from the SIE, and, similarly, Scheidat et al. (2011) reported that the majority of FW were observed >140 km from the SIE.

| Humpback whale
Although HWs were heavily exploited during 20th-century whaling, with >150,000 whales caught between 1904 and 1966 (Nowacek et al., 2011), the population has been increasing since the cessation of whaling. HWs are the most common whale species in the Western Antarctic Peninsula area in summer (Scheidat et al., 2011) and seem to be absent from the Ross Sea (Branch, 2011; Leaper & Miller, 2011). This agrees with the areas predicted as suitable habitats by our models (Figure 1 and Figure S16). Important predictors were SIC-derived predictors, distance to coast and to the 1,000 m isobath, and SSH variability. Highest suitability is predicted at locations with low SIC or at locations either close to the SIE or far from it on the ice-free side […] foraging behaviour in East Antarctica. In contrast, we found no clear relationship with salinity, Chl-a and temperature at 200 m, and a negative relationship with SSH, but none of them was an important predictor. In concordance with our results, Riekkola et al. (2019) found a negative relationship between HW foraging behaviour in the Pacific sector of the SO and SSH, and low importance of speed, while Kasamatsu, Matsuoka, et al. (2000) found no relationship between HW density and SST.

| Reasons for discrepancies between studies
Unambiguously identifying the reasons for the discrepancies discussed above is challenging, as we do not know the true preferred niche of these species. Generally, inconsistencies can be attributed to data and methodological reasons. Most studies used occurrences from a limited time frame (e.g. summer months of 1-2 years) or covered only a small section of the SO, for example, the northern Antarctic Peninsula (Santora et al., 2014; Williams et al., 2006), the Western Antarctic Peninsula (Kasamatsu, 1988; Murase et al., 2013; Širović & Hildebrand, 2011), East Antarctica […] Seas (Kasamatsu, Ensor, et al., 2000). Using spatially or temporally limited sightings and environmental data makes it difficult for these models to capture the full range of a species' niche (e.g. causing truncated or biased response curves; Barbet-Massin et al., 2010; Thuiller et al., 2004). Although these models can technically predict potential distributions at the circumantarctic scale, the necessary extrapolation to novel conditions or new combinations of conditions increases prediction uncertainty (Zurell et al., 2012).
In contrast, this study used circumantarctic visual observation data covering a wide range of environmental conditions suitable for baleen whales (and their combinations) in the SO. To date, only a few studies have investigated the distribution and niche characteristics of baleen whales at the circumantarctic scale (e.g. Bombosch et al., 2014; Branch, 2011; Branch et al., 2007), possibly due to the challenges of obtaining sufficient data. SDM studies at large scales such as the SO assume stationary species-environment relationships through space and time, that is, the same niche characteristics across smaller areas of the SO and between seasons (Dormann et al., 2012; El-Gabbas & Dormann, 2018b; Osborne et al., 2007). However, the distribution of baleen whales varies between seasons and spatial divisions of the SO (Riekkola et al., 2019). For example, Beekmans et al. (2010) found inconsistent relationships between environmental predictors and AMW density at circumantarctic and regional scales, suggesting that relationships between AMW and environmental conditions are best studied at a regional rather than circumantarctic scale.
The vast majority of our sightings were made between the end of December and the end of February (Figure S25). This evident temporal bias towards summer months seems inevitable when using only visual observation data. Passive Acoustic Monitoring (PAM), however, has provided ample evidence for the (near-)year-round presence of several species in this area (Filun et al., 2020; Schall et al., 2020; Thomisch et al., 2016; Van Opzeeland & Hillebrand, 2020; Van Opzeeland et al., 2013). Although we attempted to correct for spatial sampling bias using rarefaction, the absence of visual observations from the Weddell Sea has affected model predictions in this area (Figure 1). The integration of other data types in SDMs, for example, from tagged animals (e.g. Hindell et al., 2020) […] to vary together (e.g. Figure S5). However, Maxent quantifies permutation importance based on the drop in training AUC after permutation (Phillips, 2017). Thus, spatio-temporal biases in species data can strongly affect this estimate.
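Because permutation importance is driven by the training AUC, its dependence on the training data is easiest to see in code. The sketch follows the general description of Maxent's measure (permute one predictor over the training presence and background points, record the drop in training AUC, normalize to percentages); the `predict` callable, pooling presence and background rows before permuting, clipping negative drops to zero and the function names are assumptions rather than Maxent's exact implementation.

```python
import numpy as np


def training_auc(scores, n_presence):
    """Hypothetical helper: rank-based AUC; the first n_presence scores are
    presences, the rest background."""
    p = scores[:n_presence, None]
    b = scores[None, n_presence:]
    return float(np.mean((p > b) + 0.5 * (p == b)))


def permutation_importance(predict, X_pres, X_bg, seed=None):
    """Hypothetical helper: percent contribution of each predictor from the drop
    in training AUC after permuting it across presence and background rows."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_pres, X_bg])
    n_p = len(X_pres)
    base = training_auc(predict(X_all), n_p)
    drops = np.zeros(X_all.shape[1])
    for j in range(X_all.shape[1]):
        X_perm = X_all.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops[j] = max(base - training_auc(predict(X_perm), n_p), 0.0)
    total = drops.sum()
    return 100.0 * drops / total if total > 0 else drops


# Toy model that only uses predictor 0; predictor 1 is noise.
rng = np.random.default_rng(2)
X_pres = np.column_stack([rng.normal(2.0, 1.0, 50), rng.normal(size=50)])
X_bg = np.column_stack([rng.normal(0.0, 1.0, 500), rng.normal(size=500)])
importance = permutation_importance(lambda X: X[:, 0], X_pres, X_bg, seed=3)
```

If the presence points cover only part of the environmental range (for example, summer sightings north of the SIE), the AUC drop, and hence the importance, reflects contrasts within that biased sample only.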

| Static SDMs in highly dynamic marine environments
The majority of SDMs, particularly those covering large spatial scales, including this study, are static. Static models use predictors summarizing environmental conditions over long periods (e.g. seasonal or annual averages over roughly 10-50 years or longer; Sbrocco & Barber, 2013), irrespective of the exact time of species sightings (Bateman et al., 2012). They assume that species-environment relationships are fixed in space and time and that locations with species detections represent suitable year-round habitats, which is likely a poor assumption, especially for migratory species (Bateman et al., 2012; Reside et al., 2010).
Static models are more appropriate in relatively stable environments (as is the case for many terrestrial settings) and for modelling less mobile resident species (e.g. plants and lizards). However, the marine environment is highly dynamic and undergoes significant changes over short periods, which likely affects the distribution of highly mobile species (Fernandez et al., 2017).
Static models can neither capture environmental dynamics nor predict the near-real-time species distributions needed for dynamic ocean management. In a dynamic setting, static models can only provide a temporally fictitious representation of species habitat suitability, averaged over the period for which the model is calibrated. To obtain robust SDMs, species occurrences and environmental conditions must be matched in space and time (dynamic SDMs; Fernandez et al., 2017; Reside et al., 2010). This is particularly important for highly mobile marine species, whose distribution is shaped by both short- and long-term variations in ocean conditions (Mannocci et al., 2017). In contrast to conventional static models, dynamic SDMs capture species-environment relationships year-round and allow habitat suitability to be predicted at a finer temporal resolution (daily, weekly or monthly).
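The difference between static and dynamic predictors can be sketched for a single grid cell. The example assumes a hypothetical, idealized daily sea ice concentration (SIC) series with a cosine-shaped seasonal cycle (minimum in February, maximum in late August); the values are synthetic and not derived from any product used in this study. A static predictor takes a long-term annual or summer (December-February) mean regardless of the sighting date, whereas a dynamic predictor takes the value on the sighting date itself.

```python
import numpy as np

# Hypothetical daily SIC (0-1) for one grid cell over two years, with an idealized
# seasonal cycle: minimum near day-of-year 50 (February), maximum about half a
# year later (late August). Synthetic values only.
dates = np.arange("2010-01-01", "2012-01-01", dtype="datetime64[D]")
day_of_year = (dates - dates.astype("datetime64[Y]")).astype(int)
sic = 0.5 - 0.5 * np.cos(2 * np.pi * (day_of_year - 50) / 365.25)
month = dates.astype("datetime64[M]").astype(int) % 12 + 1


def static_annual(values):
    """Static predictor: long-term mean, independent of the sighting date."""
    return float(values.mean())


def static_summer(values, months):
    """Static seasonal predictor: mean over all December-February days."""
    return float(values[np.isin(months, [12, 1, 2])].mean())


def dynamic(values, all_dates, sighting_date):
    """Dynamic predictor: value on the day of the sighting."""
    idx = np.flatnonzero(all_dates == np.datetime64(sighting_date))[0]
    return float(values[idx])


sighting = "2011-01-15"  # hypothetical summer sighting
annual = static_annual(sic)
summer = static_summer(sic, month)
on_day = dynamic(sic, dates, sighting)
print(f"annual mean: {annual:.2f}, DJF mean: {summer:.2f}, on {sighting}: {on_day:.2f}")
```

For a summer sighting, the annual mean describes ice conditions the whale never experienced. In practice, the dynamic lookup has to be repeated for every sighting against gridded daily or weekly products, which is where the data limitations discussed below become binding.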
The environment in polar regions, particularly the SO, is highly dynamic due to the seasonal waxing and waning of sea ice (Dayton et al., 1994). It therefore seems intuitive to use dynamic, rather than static, SDMs to study the habitat preferences of migratory whales in the SO. However, obtaining many circumantarctic oceanographic variables at fine spatial and temporal resolution is challenging, which compromises the feasibility of dynamic models. Many variables are limited to the sea surface and are not available at high temporal resolution (e.g. daily or weekly) (Fernandez et al., 2017). For example, daily or weekly salinity and productivity data are not available for the SO, and daily ocean temperatures are limited to the sea surface.
Other variables show inconsistent and incomplete spatial coverage throughout the year. For example, Chl-a data are highly patchy and limited to the summer months, which constrains their use in year-round dynamic models.
The lack of sufficient sightings with less temporal and spatial bias hinders the efficient use of dynamic models and may partly explain modellers' preference for static over dynamic models (Milanesi et al., 2020). High spatio-temporal resolution of some en[…] (Figure 3 and Figures S26-S27). Although SIC and distance to the SIE were highly important in summer, relating species observations to their concurrent environmental conditions should be given higher priority in SDMs (Figure S28).

| CONCLUSION
In this study, we used presence-only SDMs (Maxent) to model the circumantarctic habitat of four baleen whale species and identified important predictors affecting their distribution in the Southern Ocean. Model performance was high (Table S2), with generally low cross-validated prediction uncertainty. Unsurprisingly, the models identified sea ice-derived predictors and distance to the continental shelf break as the main predictors. The indispensable role of sea ice in the lives of many Antarctic species, particularly krill-dependent predators, makes whale species sensitive to future changes in the distribution and dynamics of sea ice (Herr et al., 2019; Leaper & Miller, 2011). Such signals of environmental change have already been reported from polar regions, for example, the warming of the West Antarctic Peninsula area (Gutt et al., 2015; Vaughan et al., 2003) and the predicted shrinkage of Antarctic sea ice under all future climate change scenarios (Gutt et al., 2015; Leaper & Miller, 2011; Solomon et al., 2007). This emphasizes the need for more studies on the spatio-temporal distribution of baleen whales in the SO to understand the potential impact of climate change on these species. We compared our species-specific results with those of other studies in the SO and discussed reasons for the discrepancies, which are generally attributable to differences in the quality and quantity of species data, in the extent of the study area, and in methodology.
Maxent is known for its high predictive accuracy and is considered one of the most frequently used techniques in marine SDM studies (Melo-Merino et al., 2020). Our models support the usefulness of presence-only SDMs such as Maxent as a cost-effective tool for studying the distribution of migratory whales (e.g. Smith et al., 2020).
The current work further supports the pivotal role of crowdsourced data from biodiversity repositories (e.g. GBIF and OBIS) and of dedicated circumantarctic surveys (e.g. SO GLOBEC and SOWER) in strengthening our knowledge of the distribution and niche of migratory whales in less-surveyed oceans (Beekmans et al., 2010).
Nevertheless, future surveys should prioritize less-studied areas and the pack ice region, especially beyond the summer months. Alternative data sources, such as PAM and data from tagged animals, are a useful addition for studying the habitat preferences of marine mammals year-round, but further work is still required before these data can be integrated. PAM is particularly useful in the SO for detecting species that are rarely sighted visually, such as ABW, and for covering difficult-to-access areas (e.g. the ice-covered Weddell Sea). PAM data have already been used in SDMs for odontocete species, whose clicks propagate over short distances, so that environmental data from the recording sites can be used (e.g. Gallus et al., 2012; Soldevilla et al., 2011). To date, only a few applications have included baleen whales, whose calls propagate over long distances; this potential mismatch in scales creates uncertainty in interpreting the relationship between whales and their environment (e.g. Širović & Hildebrand, 2011; Stafford et al., 2009). Nevertheless, the use of PAM data in SDMs, particularly for species in polar waters, holds great potential and warrants further exploration.

ACKNOWLEDGEMENTS
This work was financially supported by the German Federal Ministry of Food and Agriculture (BMEL) through the Federal Office for Agriculture and Food (BLE), grant number 2817HS004. We thank Andy Traumüller for helping to prepare the environmental variables and all data providers for making their observation data publicly available, especially the captains and nautical officers on board Polarstern.
An earlier version of this manuscript was improved by comments of Alaaeldin Soultan. This work was partially performed on the HPC computer facility of the Alfred Wegener Institute. We thank Natalja Rakowsky for technical support on the HPC computers. Open Access funding enabled and organized by Projekt DEAL.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13300.

DATA AVAILABILITY STATEMENT
The dataset analysed during the current study is already publicly available (details in Appendix S1-S5). The interactive distribution of cleaned occurrences is available at this web application: https://awioza.shinyapps.io/BaleenWhales/.