Distribution of cetacean species at a large scale ‐ Connecting continents with the Macaronesian archipelagos in the eastern North Atlantic

To describe distribution patterns and species richness of cetaceans along a wide geographical range using occurrence data coupled with survey effort, from poorly studied oceanic areas. Specific objectives were to compare species richness and relative abundances among sub‐regions and to describe the distribution of each species.


| INTRODUCTION
Knowledge on distribution patterns of marine species is essential for efficient marine management and biodiversity conservation. While some areas are well-surveyed, the vast majority of the ocean is still lacking baseline data or is insufficiently surveyed to permit a good level of understanding of species diversity and distribution patterns.
Observation platforms of opportunity (OPOs) have been widely used to monitor cetacean presence, allowing the sampling of remote areas, such as the high seas, over long periods of time. This methodology has limitations, for example heterogeneous effort conditioned by the routes, schedules and logistics of the platform of opportunity, usually with a restricted spatial coverage of the study area. However, it is frequently the most cost-effective method to generate baseline data, allowing the collection of valuable data that would otherwise be difficult or impossible to obtain (Aïssi et al., 2015; Alves, Ferreira, et al., 2018; Correia et al., 2015; Evans & Hammond, 2004; Kiszka et al., 2007; Morgado, Martins, Rosso, Moulins, & Tepsich, 2017; Moura et al., 2012; Tobeña et al., 2016; Viddi et al., 2010).
The CETUS Project is a monitoring programme that records cetacean species occurrence in the eastern North Atlantic (ENA). Since 2012, cargo ships from a Portuguese maritime transport company, TRANSINSULAR, have been used as OPOs. On-board observers are trained in cetacean detection and identification, and do not have other duties. Moreover, the data collected are effort-based because survey effort is also recorded, which is fundamental to providing reliable information on distribution and relative abundance, especially when effort is highly heterogeneous and survey activity is conditioned by the weather (Correia et al., 2015; Evans & Hammond, 2004). The project has resulted in a large dataset of cetacean occurrence records across the ENA (Correia, Gandra, et al., 2019).
We provide a descriptive analysis of spatial and temporal patterns in cetacean distribution and species richness, using effort-based data collected within the ENA, from 2012 to 2017, with high survey effort in the open ocean. We identified areas with the highest relative abundance and species richness, which may be priority areas for future research and conservation efforts, and compared species richness and relative abundances among sub-regions (Iberian Peninsula, Azores, Madeira, Canaries, Cape Verde, north-western Africa and international waters).

… important knowledge on cetacean distribution at a large scale in the eastern North Atlantic, relevant to future conservation management.

KEYWORDS
cetaceans, CETUS Project, distribution patterns, distribution range, effort-based data, high seas, relative abundances

| Study area
The Canary Basin is characterized by a complex geography, including the existence of several archipelagos (Azores, Madeira, Canaries and Cape Verde) that emerge from deep waters, structures such as seamounts and a rugged coastline along the continents of Europe and Africa. It is a very dynamic region, affected by several oceanographic features, including the North Atlantic subtropical gyre, and is bounded by the Azores Front (separating the anticyclonic eastern subtropical gyre from the northern cyclonic subpolar gyre), and the Cape Verde Frontal Zone (separating the nutrient-rich South Atlantic Central Waters from cooler North Atlantic Central Waters) (Zenk, Klein, & Schroder, 1991). North-easterly trade winds help maintain the strong upwelling system in north-west Africa, one of the major Eastern Boundary Upwelling Systems (EBUS) of the world (Mason, 2009). EBUS are biologically productive marine regions covering less than 1% of the world's ocean but supporting up to 20% of the world's capture fisheries (Pauly & Christensen, 1995).
The transects sampled cross a broad range of ocean habitats, including different topographic systems (continental platform, abyssal plains, steep slope, seamounts and canyons) and a diversity of oceanographic features, including four major currents (Portugal, Azores, Canary and Mauritania currents) and several mesoscale eddies (Mason, 2009).
To analyse cetacean occurrence by sub-regions within the area, we defined the spatial limits for each sub-region of analysis based on the Exclusive Economic Zones (EEZs) (Iberian Peninsula, Azores, Madeira, Canaries, Cape Verde, north-western Africa), thus also delimiting international waters (Figure 1).

| Data collection
Dedicated trained observers followed a standardized protocol for cetacean monitoring along line-transect surveys, aboard cargo ships from TRANSINSULAR (Correia, Gandra, et al., 2019; Correia et al., 2015), which were used as OPOs. The company operates routes for cargo transport between Continental Portugal and the Macaronesian archipelagos, with stopovers in north-west Africa.
Between 2012 and 2017, three routes were monitored: Continental Portugal to Madeira (starting in 2012, hereafter Madeira route), Azores (starting in 2014, hereafter Azores route) and Cape Verde (with stopovers in the Canary Islands, Mauritania and Senegal, starting in 2015, hereafter Cape Verde route). On two occasions in 2016, the Cape Verde route included a transect to north-west Spain, although the track was crossed on-effort only once, due to weather conditions. Each trip followed one of these routes and accommodated two marine mammal observers (MMOs). Observers stood on the wings of the navigation bridge (at an approximate height of 15 m, measured from sea level, considering maximum draught) looking for cetacean presence, from sunrise to sunset. Normally, the two MMOs each covered 90° (from 0° to ±90° relative to the heading), from opposite sides of the vessel. When one MMO was resting, as detailed below, the lone MMO covered 180°. MMOs switched sides every hour to reduce fatigue. Monitoring was performed mainly by naked eye; binoculars (7 × 50 mm, fitted with a scale and compass) were used for occasional scans (approximately every 5 min) and to support the collection of the data (e.g. to detect vessels and for species identification).
Survey effort stopped at sea state or wind state higher than 4 (on the Douglas or Beaufort scales, respectively), when visibility was lower than 1 km, during heavy rain, and whenever observers were not allowed on the navigation bridge (e.g. during manoeuvres, safety drills or cleaning of the deck). MMOs rested in turns for an hour each at mealtimes (lunch and dinner), and optionally for additional periods of approximately 40 min (in the morning and in the afternoon).
Sightings collected off-effort (i.e. when survey effort had to stop for any of the aforementioned reasons) were considered to be opportunistic and were not included in the present analysis. Weather state was assessed at the beginning and end of each survey leg (defined as a continuous period of sampling, usually a day from sunrise to sunset), or whenever it changed significantly. The number of vessels, by size category (small, medium or large), visible over 360 degrees around the observation stand, was registered at the beginning and end of the survey leg, every hour and following each cetacean sighting. Whenever a cetacean was spotted, if possible, observers recorded species identity, the distance and angle of the position of the animal(s) in relation to the ship (using the scale and compass of the binoculars), the number of animals within the group, their reaction (if any) to the ship and direction of travel. Due to occasional difficulties in determining the exact number of animals present, the minimum and maximum group sizes as well as a best estimate (from the observers' perspective) were recorded. Sightings of other top predators (e.g. turtles, sharks and sunfishes) were also registered.

FIGURE 1 Sub-regions for the analysis, considering the limits of the Exclusive Economic Zones in the study area. EEZ, Exclusive Economic Zone; IP, Iberian Peninsula; Az, Azores archipelago; Mad, Madeira archipelago; CI, Canary Islands archipelago; CV, Cape Verde archipelago.
The route was recorded using a tablet with an inbuilt GPS (points along the track were automatically added, every 10 s or every 50 m), and all the waypoints were marked. In the data analysis, the GPS position of the ship at the moment of the sighting was used, as well as the best estimate for the group size.

| Data analysis
Encounter rates were calculated as the number of cetacean sightings recorded on-effort per 100 km of survey effort, both in total across all species and for each species separately.
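The encounter rate defined above can be illustrated with a minimal Python sketch. The function names, species labels and numbers below are hypothetical and chosen only for illustration; they are not values from the CETUS dataset, and this is not the authors' own code.

```python
from typing import Dict


def encounter_rate(n_sightings: int, effort_km: float) -> float:
    """Return the number of on-effort sightings per 100 km of survey effort."""
    if effort_km <= 0:
        raise ValueError("effort_km must be positive")
    return 100.0 * n_sightings / effort_km


def encounter_rates_by_species(counts: Dict[str, int], effort_km: float) -> Dict[str, float]:
    """Apply the same rate to each species' on-effort sighting count."""
    return {species: encounter_rate(n, effort_km) for species, n in counts.items()}


# Hypothetical counts and effort, for illustration only (not CETUS values).
counts = {"species_a": 12, "species_b": 3}
rates = encounter_rates_by_species(counts, effort_km=800.0)
overall = encounter_rate(sum(counts.values()), effort_km=800.0)
```

Because every rate shares the same effort denominator, rates for different species within one cell or sub-region can be compared directly, and cells with zero effort are excluded rather than assigned a rate of zero.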
Yearly and monthly information on total effort, number of sightings, overall encounter rates and number of species, as well as encounter rates for each species by year and by each of the defined sub-regions, are provided in Table S1 and Figure S1.
For all cells with non-zero effort in a grid of 100 × 100 km, total effort (total distance surveyed within the cells), overall cetacean encounter rate (total sightings of cetaceans on-effort per 100 km) and the total number of species identified (at least to the genus level) were calculated. This was done for the whole study period over the surveyed calendar months (February and March, May to December).
The 100 km grid was chosen after testing different spatial resolutions. It provided a suitable sample size for statistical analysis, allowed identification of broad-scale patterns and was suitable for data visualization while also avoiding zero inflation. Distance surveyed on-effort was calculated based on the tracks recorded by the GPS, by transforming the set of on-effort points along the track into lines (the effort tracks) and measuring the distance covered by those lines.
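As a rough illustration of how distance surveyed on-effort can be derived from GPS points, the sketch below sums great-circle (haversine) distances between consecutive on-effort points of one survey leg. This is an assumed stand-in for the GIS line-length measurement described above, not the authors' workflow; the function names and the mean Earth radius are assumptions of the sketch.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def leg_effort_km(track: List[Tuple[float, float]]) -> float:
    """Length of one continuous on-effort leg, given ordered (lat, lon) points."""
    return sum(haversine_km(*p, *q) for p, q in zip(track, track[1:]))


# Hypothetical leg of three points along a meridian, for illustration only.
example_leg = [(38.0, -10.0), (37.5, -10.0), (37.0, -10.0)]
example_km = leg_effort_km(example_leg)
```

Off-effort gaps should not be bridged: each continuous on-effort leg is measured separately and the leg lengths are then summed per grid cell.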
To provide an indication of the adequacy of the current level of search effort, we first checked the relationship between encounter rate and survey effort (Figure S2) and then used generalized additive models (GAMs) (Hastie & Tibshirani, 1990) to model number of sightings and number of species in relation to effort. As we expect density and diversity of cetaceans to depend on sea depth and distance to coast (e.g. because shelf species will be replaced by oceanic species as one moves offshore), we also included these variables as covariates. Depth was obtained from bathymetry data in GEBCO (GEBCO, 2017), and distance to coast was calculated using ArcGIS 10.5 (ESRI, 2016). These two environmental variables were extracted for the position of the centroid of each cell. They had strong effects on the distribution of the eight most sighted species as revealed by the principal component analysis (PCA) described below. Therefore, the following models were fitted: number of sightings ~ s(effort) + s(distance to coast) + s(depth), and number of species ~ s(effort) + s(distance to coast) + s(depth). Considering that the response variables were counts, we first tested a Poisson distribution (with a log link function). We then checked for overdispersion. Dispersion was adequate for the "number of species" model (0.92) but there was overdispersion for the "number of sightings" model (2.56). As such, for the latter, we fitted a negative binomial distribution (with a log link function). The smoothers obtained essentially depict rarefaction curves.
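The overdispersion check can be sketched as follows. The text does not state which dispersion statistic was used; the Pearson chi-square divided by the residual degrees of freedom is one common choice and is assumed here. The function name and the example numbers are hypothetical.

```python
from typing import Sequence


def pearson_dispersion(observed: Sequence[float], fitted: Sequence[float], n_params: float) -> float:
    """Pearson chi-square statistic divided by residual degrees of freedom.

    For a Poisson model the variance equals the mean, so values near 1
    indicate adequate dispersion and values well above 1 overdispersion.
    n_params may be the effective degrees of freedom of a fitted GAM.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / resid_df


# Hypothetical counts and fitted means, for illustration only.
phi = pearson_dispersion([0, 2, 4, 6], [1.0, 2.0, 3.0, 6.0], n_params=1)
```

A value well above 1, as reported for the "number of sightings" model, is the usual motivation for moving from a Poisson to a negative binomial error distribution.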
Model fitting started by including the three explanatory variables, considering only main effects, followed by backward selection (Quian, 2009). Best models were chosen using the Akaike information criterion (AIC) as a measure of goodness-of-fit: at each step of model selection, we compared two models that differed in one explanatory variable (i.e. with or without the least significant variable). We retained the model with the lowest AIC value, or the simpler model when AIC values differed by less than 2 (following the principle of parsimony, e.g. Burnham & Anderson, 2002).
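The decision rule applied at each backward-selection step can be written as a short Python sketch. The function names are hypothetical and the AIC values in the example are invented for illustration; the rule itself (lowest AIC, or the simpler model when the difference is below 2) is the one stated above.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def choose_model(aic_full: float, aic_reduced: float, threshold: float = 2.0) -> str:
    """Compare a model with and without its least significant term.

    The reduced (simpler) model is retained when its AIC is lower, or when
    it is higher by less than `threshold`; otherwise the full model is kept.
    """
    if aic_reduced - aic_full < threshold:
        return "reduced"
    return "full"


# Hypothetical AIC values, for illustration only.
step_1 = choose_model(aic_full=100.0, aic_reduced=101.5)  # simpler model kept
step_2 = choose_model(aic_full=100.0, aic_reduced=103.0)  # full model kept
```

Backward selection repeats this comparison, dropping one term at a time, until removing the least significant remaining term raises the AIC by 2 or more.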
We verified that there were no influential data points or relation-
Considering these taxa, the cetacean community composition (in terms of relative abundances and percentage relative contribution), as well as the monthly presences, was represented for each sub-region. Maps of sightings distribution along tracks were created for these eight taxa and are presented in Figure S3.
To describe and compare species according to their geographical distribution and coastal or oceanic occurrence, we considered four factors (henceforth "species distribution factors" (SDFs)): depth, distance to coast, latitude and longitude. To delimit and characterize the surveyed area, a set of points was created, with a point generated every 5 km within effort tracks (Correia et al., 2015). The SDFs were extracted for this set of points.
Summary statistics were calculated for the group size of each species, as well as for the SDFs at the position of the sightings. Values of the quantiles of the distributions for each SDF are presented for each species (see Table S2) and then illustrated with boxplots. To compare the extent and the location of species distributions in the study area (conceptually equivalent to deriving the niche width and niche centre), we applied PCA to the data on the four SDFs (see Fernández et al., 2013). PCA projects data into a lower-dimensional subspace and is therefore commonly used to search for the linear combinations of variables that describe most of the variability in the original data. Moreover, it provides a measure of the influence of each factor on the principal components (the loadings, i.e. the eigenvector coefficients), which in this case allowed for a better understanding of the most determinant factors in the distribution of cetacean species.
Prior to PCA, we standardized the data by subtracting the mean value of each variable and dividing by its standard deviation, for all data points. We ran the Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity to assess sampling adequacy.
Although the overall KMO test result was 0.51, very close to the 0.5 threshold considered for eligible data to run a factor analysis, Bartlett's test result was significant (p < .001), indicating that PCA was an adequate and useful test for the dataset. Then, for the most important principal components (PC) (those that together account for more than 75% of the total accumulated variation explained), we used boxplot graphs to represent the quartiles of the PC scores
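The standardization and PCA steps described in this section can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the sample standard deviation (ddof = 1) is assumed, and the example data are random numbers standing in for the four SDFs.

```python
import numpy as np


def standardize(X):
    """Subtract each column's mean and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(X):
    """PCA of standardized data via singular value decomposition.

    Returns (scores, loadings, explained), where the columns of `loadings`
    are the principal axes and `explained` is the proportion of variance
    accounted for by each component, in decreasing order.
    """
    Z = standardize(X)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ Vt.T
    return scores, Vt.T, explained


def n_components_above(explained, threshold=0.75):
    """Smallest number of leading components whose cumulative share exceeds threshold."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return min(k, len(cumulative))


# Random stand-in data (rows = points, columns = depth, distance to coast,
# latitude, longitude); hypothetical values, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
scores, loadings, explained = pca(X)
```

The PC scores of the sightings of each species can then be summarized with boxplots, and the loadings indicate which SDFs drive each retained component.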

| Overall distribution of effort, encounter rates and species richness
Survey effort was concentrated in summer and early autumn (July  (Table 1 and Table S1).
In general, the areas with the highest survey effort were in offshore waters between Continental Portugal and Madeira and Azores, where a high diversity of species (up to 11 species per 100 km²) was observed (Figure 2). The highest encounter rates were registered elsewhere: for example, close to Continental Portugal and West Africa, and near the Macaronesian archipelagos (Figure 2). Encounter rate was independent of survey effort, consistent with a sufficient amount of effort for reliable estimates of relative abundance (Figure S2). As for the GAMs, depth was dropped from the final model for
Confidence intervals for the smoothers were wide at high values of effort and distance to coast due to the low number of sampled grid cells with high survey effort or very distant from the coast (Figure 3).

| Analysis of the cetacean community composition by sub-region
In all sub-regions, the sightings of the eight most frequently sighted species made up 40% to 50% of total sightings, except in the Cape
The encounter rate for pilot whales was highest in the Cape Verde EEZ, where they were the most frequently sighted species. The second highest encounter rate for this genus was recorded in the EEZs of north-western Africa. They were rarely seen in the remaining sub-regions and never sighted in international waters (Figure 4).
The north-western Africa sub-region had the highest number of species registered (21) and the highest encounter rates for 11 out of the 21 species. The highest overall encounter rate was registered in the Azores EEZ. In international waters, 16 species were recorded, and the overall encounter rate was approximately 1.12 sightings/100 km. Almost 20% of the survey effort was undertaken within these waters (Figure 4 and Table S1).
Regarding temporal patterns, six of the eight most frequently sighted species were seen in international waters every month from July to October but were not seen outside this period. Of the two exceptions, sperm whales were absent in September, while pilot whales were never seen in international waters. In the Canary Islands and Cape Verde, survey effort was low, and the presence of most species was restricted to a few months, although the Atlantic spotted dolphin was seen from June to November in the Canary Islands (Figure 5).

| Distributions of the most frequently sighted species
Surveys covered a wide range of habitats in the study area. The most frequently surveyed areas were in deeper waters, at distances up to  Table S2).
In the PCA, the first two PCs together explained 78.3% of the variation. The SDFs that contributed the most to PC1 were depth and distance to coast, while PC2 was mainly related to the geographical SDFs (latitude and longitude). Species with higher PC1 scores are found in deeper waters and further from the coast, and species with higher PC2 scores occur more in northern and eastern regions of the study area (Table 2).
Common and bottlenose dolphins had similar PC1 scores but differed significantly on PC2, essentially confirming the distribution described above and evident in Figure 6. The two species of the genus Stenella had similar scores on both PCs, significantly different from those of bottlenose and common dolphins.
The PCA results also highlight the similarity of pilot whale and sperm whale distributions (Figures 6 and 7 and Tables S2 and S3).

| DISCUSSION
The number of species reached a plateau at a high amount of effort (approximately 3,000 km per 100 km²), while, as expected, number of sightings increased with effort. Even though the overall survey effort in this study was high, it was spatially heterogeneous. Consequently, over much of the study area (i.e. the parts with less survey effort), the confidence intervals around estimates of relative local abundance and local cetacean species diversity are wide. In less surveyed areas around the globe, such as offshore waters, cetacean abundance and species richness are likely to be underestimated.
The lower availability of nutrients may limit pelagic community productivity and biodiversity further offshore, while the increasing separation of seabed and photic zone limits the productivity of demersal and benthic communities in deeper waters (Mason, 2009).

FIGURE 4 Cetacean community composition in each sub-region defined, highlighting encounter rates and percentage relative contribution for the eight most frequently sighted species. Pie charts illustrate the encounter rates and percentage of contribution of the most frequently sighted species (identified, at least, to the genus level) for each sub-region (defined in Figure 3). Occurrences with associated species were used to calculate the encounter rate of both taxa only if at least one of the taxa sighted was among the eight most frequently sighted species over the whole study area. ER, encounter rate (sightings per 100 km); sp, species; MFS, most frequently sighted; EEZ, Exclusive Economic Zone; IP, Iberian Peninsula; Az, Azores archipelago; Mad, Madeira archipelago; NWA, north-west Africa; CI, Canary Islands archipelago; CV, Cape Verde archipelago; IW, international waters

recorded offshore and in very deep waters. Previous analysis using the CETUS dataset from 2012 to 2016 showed that the species presents clear core areas of occurrence, related to specific environmental conditions (e.g. coastal colder waters related to strong coastal upwelling systems). Within its range in the ENA, northern Continental Portugal remains a poorly studied area.
Bottlenose dolphins preferred shallower waters in areas closer to the coast, but also extended over a very wide range of depths, being frequently recorded in the high seas. Genetic studies have shown that resident populations in Galicia and the Sado Estuary are likely to have a strong degree of genetic isolation from the populations in the archipelagos and non-resident individuals. On the other hand, there is high gene flow among the Iberian archipelagos. Transient individuals have been identified in the archipelagos of Madeira and Azores (Dinis, Carvalho, et al., 2016; Silva et al., 2014), and some individuals from resident populations in the Iberian Peninsula were found to undertake long-distance movements.

(Alves, Ferreira, et al., 2018; Silva et al., 2014) suggests that international waters are even more important during this season.

(Baines & Reichelt, 2014; Camphuysen, van Spanje, & Verdaat, 2012). North-west Africa is a hotspot area for the species, where it has an important role in ecosystem functioning (Morissette, Kaschner, & Gerber, 2010). Several marine management issues, mostly related to inefficient management of fisheries, exist in the EEZs of north-western Africa (Nagel & Gray, 2012). As, according to our results, sperm whales seem to occupy areas closer to the coast, it is likely that their area of occupancy overlaps with areas of intensive fishing, which can have negative consequences for both the animals and the economic activity (Karpouzli & Leaper, 2004; Richard, Guinet, Bonnel, Gasco, & Tixier, 2017; Tixier et al., 2019).

The development of cost-effective monitoring programmes in high seas areas, for example based on the use of OPOs, would help ensure continuity of monitoring to underpin long-term management (Aïssi et al., 2015; Alves, Ferreira, et al., 2018; Correia et al., 2015; Evans & Hammond, 2004; Kiszka et al., 2007; Morgado et al., 2017; Moura et al., 2012; Tobeña et al., 2016; Viddi et al., 2010).
Nevertheless, it is important to acknowledge the limitations of non-dedicated surveys. Thus, in the present study, monitoring was limited by the company's schedule and routines.
Surveyed routes are thin lines crossing a very wide area, with survey effort covering only a subset of the habitats in the region.
Moreover, as in all marine campaigns, survey effort was also con-
Another challenge is dealing with the dynamism of cetacean distribution related to their life history, migration and movements, which may call for dynamic marine-protected areas. This in turn requires adaptive marine management (Hooker et al., 2011) and is probably not yet feasible in EU waters. Ultimately, to ensure the conservation of species, it would be desirable to define year-round protected areas for all the core habitats of those species (even if they are only used/preferred during a specific season). Moreover, besides knowledge on occurrence, abundance and habitat use, the assessment of threats (i.e. by-catch, entanglement, collision), at least in core areas of occurrence, is also essential to design specific conservation measures for effective marine management.
We have to recognize the gap between monitoring and mitigation, and specifically that we cannot solve or provide solutions for all the challenges of marine management and conservation in the high seas. Effective measures in offshore waters, and specifically in areas beyond national jurisdiction, are limited by logistic and political factors (Bohorquez et al., 2019). Nevertheless, the present work may be useful for the design of future dedicated campaigns, to efficiently construct a monitoring programme including both areas within the EEZs and in international waters and to support conservation and management efforts in the area. The CETUS Project is ongoing and aims to continue providing updated and reliable data, such as effort-based relative abundances, that could be used as indicators for management purposes (e.g. Marine Strategy Framework Directive), and to construct a long-term dataset. Moreover, this effort-related dataset is key to understanding the distribution of cetaceans in the area and should permit the development of ecological niche models and enable prediction of the consequences of future climate change scenarios for these species, in support of the European agenda for the conservation of marine ecosystems.

ACKNOWLEDGEMENTS
We thank all the volunteers for their contribution and dedication during the monitoring campaigns. We are extremely grateful to TRANSINSULAR, the cargo ship company that provided all the logistic support, and to the ship's crews for their hospitality. We also

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in VLIZ, and distributed by OBIS and EMODnet, at https://doi.org/10.14284/350; see also the associated data paper (Correia, Gandra, et al., 2019). Moreover, supplementary material is provided for a complete and detailed description of the dataset.