Using fisheries observation data to develop a predictive species distribution model for endangered sea turtles

The Eastern Pacific leatherback turtle population (Dermochelys coriacea) has declined precipitously in recent years. One of the major causes is bycatch from coastal and pelagic fisheries. Fisheries observations are often underutilized, despite strong potential for this data to affect policy. In this study, we created a spatiotemporal species distribution model that synthesizes fisheries observations with remotely sensed environmental data. The model will be developed into a dynamic management tool for the Eastern Pacific leatherback population. We obtained leatherback observation data from multiple fisheries that have operated in the Southeast Pacific (2001–2018). A dynamic Poisson point process model was applied to predict leatherback intensity (observation per unit area) as a function of dynamic environmental covariates. This model serves as a tool for application by managers and stakeholders toward the reduction of leatherback turtle bycatch and provides a modeling framework for analyzing fisheries observations from other vulnerable populations and species.


Abstract
The Eastern Pacific leatherback turtle population (Dermochelys coriacea) has declined precipitously in recent years. One of the major causes is bycatch from coastal and pelagic fisheries. Fisheries observations are often underutilized, despite strong potential for this data to affect policy. In this study, we created a spatiotemporal species distribution model that synthesizes fisheries observations with remotely sensed environmental data. The model will be developed into a dynamic management tool for the Eastern Pacific leatherback population. We obtained leatherback observation data from multiple fisheries that have operated in the Southeast Pacific (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018). A dynamic Poisson point process model was applied to predict leatherback intensity (observation per unit area) as a function of dynamic environmental covariates. This model serves as a tool for [Correction added on 16 January 2021, after first online publication: additional affiliation added to the author Joanna Alfaro-Shigue; affiliation has been changed to the authors Javier Quinones Davila and David Sarmiento Barturen. affiliation order has been changed from the 10th author.] application by managers and stakeholders toward the reduction of leatherback turtle bycatch and provides a modeling framework for analyzing fisheries observations from other vulnerable populations and species.

K E Y W O R D S
dynamic Poisson process model, habitat-based model, leatherback turtle, Southeast Pacific Ocean
Establishing effective conservation policy to manage sea turtles and other wide ranging species is challenging. EP leatherback turtles inhabit the coastal waters of numerous countries, with varying conservation goals, strategies, levels of enforcement, and fisheries types (e.g., Alfaro-Shigueto, Dutton, van Bressem, & Mangel, 2007;Harrison et al., 2018;Shillinger et al., 2008). Currently, examining areas of bycatch risk for this population has been limited to smaller areas within their range (Alfaro-Shigueto et al., 2007Donoso & Dutton, 2010;Lewison et al., 2014) or to only one type of fishery, such as commercial longlines (Roe et al., 2014).
Given the wide and inherently variable habitat of EP leatherbacks, any statistical model aiming to capture this movement must be able to predict their shifting distributions (Howell et al., 2015;Lewison et al., 2015). One such class of model is the Species Distribution Model, which describes the relationship between a species' presence or abundance and environmental conditions, and can predict how environmental variability may affect species distribution and habitat choice (Elith & Leathwick, 2009). A frequent problem when analyzing a species' distribution is the sole availability of presenceonly data, without any record of species absence (Renner et al., 2015). This can occur with fisheries and opportunistic records when measures of effort are not provided or available (Brodie et al., 2015).
Organism presence, then, must be represented as a realization of a stochastic spatiotemporal process dependent on the relationships between species presence and environmental conditions (Diggle, 2007). Specifically, the in-homogenous Poisson point process model (PPM) defines a general class of models useful for predicting species distribution relative to a dynamic environmental field. These processes model presence-only data as a predictor of species "intensity," or the number of individuals per unit area (leatherbacks km −2 ) per unit time (Brodie et al., 2015;Renner et al., 2015).
In this study, we developed a spatiotemporal PPM that predicts the intensity of fishery-dependent leatherback observations in the southeastern Pacific as a function of environmental covariates. The specific objectives were to: (a) identify environmental factors that were associated with the presence of leatherback observations and (b) extrapolate the spatial variation of leatherback distribution in time to identify areas of bycatch risk. The model presented provides a means of dynamically understanding, predicting, and consequently managing the reduction of leatherback bycatch throughout international waters of the southeastern Pacific. This model framework could be applied to other populations and species vulnerable to bycatch.

| Fisheries data
Leatherback observation data from fisheries operating or having operated in the southeast Pacific were provided by several countries in the East Pacific: from Colombia JUSTSEA Foundation (www.justsea.org); from Peru Areas Costeras y Recursos Marinos (ACOREMA, www.acorema. org.pe), Instituto del Mar del Peru (IMARPE, www. imarpe.gob.pe), and Pro Delphinus (www. prodelphinusperu.org) and finally from Chile Instituto de Fomento Pesquero (IFOP, www.ifop.cl), with records spanning from 1990 to 2018 ( Figure 1, Table 1). We operated under the assumption that the locations and dates of the observations were accurate and data from multiple organizations were equally weighted in the PPM (St. John Glew, Graham, McGill, & Trueman, 2019). The final resolution of the model was 56 km, which corresponds to 0.5 at latitude 17 S. Due to the large geographic scope of the study, as well as the final resolution of the model, minor inaccuracies in the observation records would likely have little, if any, effect on the predicted species distribution.
An important component of point process modeling is the underlying spatial area; extrapolation of the study area can lead to biases in the estimates of resource selection functions, artificial clustering in the prediction and inflated error (Waller & Gotway, 2004). A rectangular region large enough to encompass all point observations was chosen, resulting in a focus area from −90 to −70 East, and −40 to 5 North ( Figure 1). These observations do not necessarily indicate bycatch, only that turtles were observed in areas where fishing was taking place. Further, the fisheries data only represent leatherback presence in areas where fisheries operate, not the entire leatherback distribution for the region.

| Environmental covariate data selection and processing
A number of environmental covariates were used as indicators of leatherback intensity. Based on previous studies (Bailey, Benson, et al., 2012;Benson et al., 2011;Shillinger et al., 2011;Willis-Norton et al., 2015), bathymetry (bathy), sea surface temperature (SST), sea surface height (SSH), Ekman upwelling, and chlorophyll-a concentration (Chl-a) were included as predictors for leatherback presence ( Figure 2). Data for these variables were obtained from the National Oceanic and Atmospheric Administration CoastWatch ERDDAP server (Simons, 2016), with the exception of SSH data, which came from the Copernicus Marine Environment Monitoring Service (Table 2). Environmental data were not available for the full range of years in which leatherback observations were recorded. After removing the years that were missing environmental data, 20% of the data (n = 158) was removed in the truncation, all of the remaining fisheries reported observations collected between July 2003 and November 2017 were used in the analyses (n = 624). Monthly composite environmental data were downloaded for each year from 2003 to 2017. All data were processed in the R software environment using the "raster" package, with each covariate stored as a stack of monthly rasters (Hijmans, 2016;R Core Team, 2020).
High spatial resolution data over a large area can make it computationally intensive and difficult to process  and model resource selection. To keep the processing consistent among covariates, the data for each variable were aggregated over an n × n array of higher-resolution cells, with the array size n estimated to match the raw resolution to a final 56 km resolution, an estimated distance that transiting leatherbacks can cover in a day (Bailey, Fossette, et al., 2012). The aggregated data were resampled to the same grid over the study area using nearest neighbor interpolation (Hijmans, 2016).
Environmental covariates were projected to the Lambertre Azimuthal equal area coordinate system to ensure cells in the final prediction covered the same area across the latitudinal range of the study. In the same manner as previous megafauna movement studies (Hazen et al., 2018;Hoover et al., 2019), temporal resolution of predictions was monthly due to availability of environmental covariates and minor differences in reporting dates and times.

| Model fitting and generation of pseudo-absence data
In order to compensate for the presence-only fisheries data and approximate the likelihood function, pseudoabsence points were generated. The pseudo-absence points were not used as a control mechanism, but in the discretization of the domain to estimate the spatiotemporal intensity of leatherbacks. Thus, they serve as quadrature points, which enable the numerical approximation of the log-likelihood function (Renner et al., 2015). We chose the locations of these quadrature points from a grid based on degrees (WGS 84) that covers the study area. A common difficulty with quadrature points is the unknown association between environmental covariates and species intensity. We included all of the environmental covariates in order to select the resolution of quadrature points, under the assumption that the set of covariates provides a spatial representation of the intensity surface. The number of points was chosen to be large enough to accurately approximate the likelihood function while minimizing computation time. Specifically, in order to determine the optimal number of quadrature points, we generated an approximation to the loglikelihood of model (1) with an increasing number of points, or successively higher resolutions until the loglikelihood converged (Renner et al., 2015). To test the convergence of the log-likelihoods, quadrature points were generated at 16 resolutions-from 0.1 to 6.5using data from 2003 to 2017. Higher resolutions were not generated because of the spatial aggregation of the environmental data to 0.5 .

| Model selection
In order to predict the distribution of leatherback observations from fisheries as a function of the dynamic environmental covariates, a spatiotemporal PPM was built with the response variable (intensity) composed of areaweighted presence data and quadrature points (Brodie et al., 2015). The PPM was implemented using the Berman and Turner (1992) method with a quasi-Poisson distribution and a log link. Nonlinear association was modeled using generalized additive modeling in the "mgcv" package in the R statistical software (Wood, 2011). The likelihood function was approximated using the Berman and Turner (1992) method.
We applied an information criterion approach to model selection. We started with a model with all smooth terms, and systematically created less general, less complex model (Appendix Table S1) by dropping covariates or changing them to linear terms. The model fits were compared using the Akaike information criterion (AIC). Three units' difference in AIC (ΔAIC) were considered statistically significant (Burnham & Anderson, 1998). When two models were not significantly different (ΔAIC <3), the least complex model was selected. This backward model selection process resulted in the following model.

Intensitys bathy
where s is a cubic regression spline term with a 10-dimensional basis. We also applied a forward model selection process by starting with models with one or two smooth terms (Appendix Table S1). None of these simple models had significantly smaller AIC than model (1).

| Model evaluation
Upon final model selection, Pearson residuals were calculated as the scaled difference between the monthly total of observations and the total prediction to evaluate its overall predictive capacity (Baddeley, Turner, Møller, & Hazelton, 2005). Under the assumed PPM, Pearson residuals should be in the pointwise confidence intervals based on their standard errors. While the full range of data (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017) was used for the model selection process, the model was evaluated using cross-validation. The model was initially trained using the first half of the data (July 2003-August 2010), monthly prediction was then conducted iteratively for September 2010-November 2017, and Pearson residuals between held-out total observations and monthly predictions were calculated along with the predictive standard errors. Residual outside the pointwise limits indicates poor out-of-sample predictive power.

| Model visualization and interpolation
Intensity predictions were generated for each month from December 2016 to November 2017 using all data prior to December 2016 and based on the selected model. For the purposes of prediction, quadrature points were generated at the chosen resolution for each month prior to December 2016. The predictions were interpolated to a resolution of 3 km for plotting. The Multilevel B-Spline Approximation algorithm (Finley, Banerjee, & Hjelle, 2017) was implemented to provide a finergranularity visualization of the model's predictions.

| Quadrature points
Convergence of the log-likelihoods based upon the resolution of quadrature points was observed at the original resolution (Appendix Figure S1). Computational limitations and the initial aggregation of spatial data for consistency prevented the generation of a larger number of quadrature points, so for the remainder of the modelbuilding process, a spatial resolution of 0.5 was used.

| Model evaluation and predictions
The final model included smooth functions of bathymetry, SST, SSH, and Chl-a. All terms were significant (p-value <.01), and the relationships between leatherback intensity and the environment were nonlinear (Table 3). Leatherback populations exhibit broad preferences of selected habitat indices such as temperature, SSH. Figures 3 and S3 depict these nonlinear partial effects found between intensity and each of the covariates conditional on the remaining variables. Pearson residuals were calculated for each month from September 2010 to November 2017 (n = 87 months). The Pearson residuals for 17% of those months (n = 15 months) fell outside the 95% confidence interval (Appendix Figure S2), indicating that these are "outliers" and poorly explained by the fitted model. Of these 15 outliers, 10 occurred in March through May of various years (Appendix Table S2). Leatherback intensity was predicted to be high along coastal neritic areas of southern Ecuador and Peru (02 -16 S) and in offshore oceanic waters of Chile (24 -34 S) throughout the year and lowest intensities at mid-latitudes in offshore areas of Ecuador and Peru during the austral summer ( Figure 4). Seasonality of intensity was strong in the northern part of the region (lower intensity in late summer/early autumn).

| DISCUSSION
In this study, higher leatherback intensity was predicted in the northern ranges of the study area (closer to the equator) during the austral winter months (June-August) and at higher latitudes during the austral summer months (December-February). There was also relatively high intensity close to the coast from Ecuador to south Peru through most months of the year. A key finding from the observations and predictions was that the spatial distribution of EP leatherbacks was much more coastal than previously found from satellite telemetry tracking studies (Morreale, Standora, Spotila, & Paladino, 1996;Schick et al., 2013;Shillinger et al., 2008;Shillinger et al., 2011). This coastal distribution could be due to prey availability, represented by the Scyohomedusae Chrysaora plocamia (Quinones et al., 2018). This jellyfish species is one of their main prey in Peruvian coasts, according to several necropsies in fresh leatherbacks carried out in Peru (Quiñones, unpublished data). In addition, these findings could be influenced by the coastal distribution of fishing effort, which is more inshore in the north of the region. We hope to address gaps in tracking data through future efforts to equip foraging leatherbacks in the southern hemisphere with satellite tags. Bycatch of leatherbacks had been reported in coastal areas of Central and South America (Alfaro-Shigueto et al., 2011;Roe et al., 2014), but the tracking data of post-nesting EP leatherback females indicated they mainly moved offshore into the South Pacific Gyre (Hoover et al., 2019;Shillinger et al., 2011), where higher levels of pelagic longline leatherback bycatch further supported this offshore movement pattern (Roe et al., 2014). This demonstrates the importance of both fisheries observations and tracking data for informing the development of bycatch reduction measures by management and other stakeholders.
The fisheries observations showed that high EP leatherbacks intensity was predicted within the Peruvian coastal upwelling region along the South American coast, at the edge of the South Pacific subtropical gyre, and in the South Pacific subtropical convergence zone . The upwelling region is highly productive, and there are sharp gradients in productivity at the boundaries of the South Pacific subtropical gyre that could aggregate gelatinous zooplankton prey (Quinones et al., 2018;Saba et al., 2008). Bathymetry, SST, SSH, and Chl-a were all found to be significant nonlinear T A B L E 3 Approximate significance of smooth terms in the model where "edf" denotes estimated degrees of freedom and "F" denotes the F-tests for each term. Note that this is using the original quadrature points, in which EKU was included as a covariate predictors of leatherback intensity. Incorporating subsurface environmental variables might improve the explanatory power of the model (Schick et al., 2013). The leatherback turtle intensity was found to be highest in association with relatively high Chl-a (approximately 1-11 mg/m 3 ) close to the coast where there is strong upwelling . This is in contrast to a previous study using satellite telemetry data that found leatherback preferred habitat was associated with low Chl-a concentration, mainly caused by their movement into the oligotrophic waters within the South Pacific subtropical gyre (Willis-Norton et al., 2015). This difference may result from the limited demographics represented by the satellite telemetry data. As with other sea turtle species, satellite-tagged leatherbacks have mainly been nesting adult females, and the tag duration is generally several months (Godley et al., 2008), skewing toward higher resolution during the internesting and initial postnesting dispersal (migration) periods. The resulting telemetry movement patterns have been restricted to discrete small coastal internesting zones (Shillinger et al., 2010), which some turtles occupied for several months, as well as vast pelagic areas of the South Pacific Gyre, occupied during periods of dispersal (migration and foraging) (Shillinger et al., , 2011. These areas encompassed regions that were deficient in fisheries observations. Further, there was variation in the level of demographic information included with the fisheries reports of individual observations, but the curved carapace length data available (n = 120, Xc = 127 cm, S = 24 cm, range = 58 cm-210 cm) indicated multiple size and age classes and potentially both sexes. The fisheries reported leatherback observations highlight the importance of considering life stages to accurately assess their distribution and potential threats. The observation data used in this study were unable to distinguish between individual turtles and only in occasional cases recorded their sex, size, or movement behavior. The observations were also unable to separate simple sight contact by fishing vessels from direct interactions (e.g., bycatch); instead, each observation point was assumed to be at high risk of bycatch due to the overlapping spatial use of the turtle and the reporting fishing vessel. Therefore, the predicted areas of high leatherback intensity are considered regions with potentially high, dynamic, seasonal risk.
Our approach to use data from multiple fisheries and jurisdictions enabled the creation of a model that could predict areas of high leatherback bycatch potential using environmental covariates, making such data vital for continued conservation efforts. However, region-specific bias could have been introduced if opportunistic reporting processes were also geographically varying. Patchiness of the multiple studies could also introduce confounding factors between dynamic conditions and geographic locations. For example, the complexity of the relationship between fishery observations and bathymetry could result from high intensity of IFOP observations. Removing these data, however, will reduce observer coverage and limit data availability within the equatorial EP and the northern extent of the EP leatherback range. Our model was unable to fully utilize geographical locations to detect regional structure of spatial distribution or resource selections. We applied a log Gaussian Cox Process model to incorporate locations in PPM, but the resulting prediction was biased toward observation "hotspots" (data not shown). To fully describe the population's movement dynamics and supplement these fisheries observations, our model should be used alongside multiple data streams for management. When presence-absence data collected via systematic probability surveys are available, geographic locations could be modeled to account for latent spatially structured variables, such as observer biases or affinity for breeding locations Illian et al., 2013). For leatherbacks, telemetry studies provide more common alternative sources to incorporate geographic locations and correct for observer biases in fishery observations.
There was significant temporal variability in the leatherback distribution associated with dynamic ocean conditions, we used cross-validation to evaluate how well the model could nowcast or forecast. While the model is viable, 17% of the months for which residuals were calculated were outliers. Of these, approximately 66% of them occurred in the period between March and May, potentially indicating seasonal differences in the predictive power of the model. Alternatively, since the observation data contain no record of fishing effort, these outliers could arise from abnormal fisheries observations during these months. Additional information on fishing effort, F I G U R E 4 Monthly predictions of leatherback turtle intensity from December 2016 to November 2017. Intensity is reported is per 3 km raster cell per month. The deep red areas are associated with areas of highest intensity, while paler areas indicate lower leatherback intensity per unit area along with more systematic reporting of turtle size and sex would aid in understanding ontogenetic changes and gender differences in habitat use that would further inform our model.
The current EP leatherback species distribution estimate used by managers is extremely broad (Wallace et al., 2013), in accordance with the highly migratory nature of this species. Our model indicates leatherback distributions vary throughout the study region, which enables managers to focus on their particular region or area of interest and influence. Near real-time monthly prediction maps can be generated through this modeling approach, which will provide important information for Regional Management Plans, and these up-to-date maps can serve as a decision support tool that can be implemented in a dynamic ocean management approach to support leatherback conservation (e.g., Howell et al., 2015).
Dynamic ocean management is an emerging approach to conserving highly migratory species whose movements may be influenced by nonstatic environmental conditions, such as SST, chlorophyll levels, fronts and upwelling zones (Dunn, Maxwell, Boustany, & Halpin, 2016;Hobday et al., 2014;Lewison et al., 2015). Statistical models can help define areas of seasonal high abundance, as well as incorporating other sources of information, to help effectively inform bycatch risk assessments and reduce bycatch without impeding fisheries operations (Dunn et al., 2016;Howell, Kobayashi, Parker, Balazs, & Polovina, 2008). Areas of risk can further be divided into "zones," with high risk areas having high intensity of an endangered marine animal, such as the EP leatherback turtle (Hazen et al., 2018). Our predictions of leatherback turtle hotspots as a function of changing ocean conditions will be a complementary model to a "South Pacific TurtleWatch" tool that was based on movements of leatherback turtles collected from satellite telemetry (Hoover et al., 2019). Both predictions were based on similar environmental covariates and spatiotemporal resolutions. These monthly model predictions based on the environmental conditions will help inform fisheries and other stakeholders of the distribution and high risk areas for this critically endangered population of leatherback turtles to aid in the development of dynamic approaches for reducing fisheries interactions.
Our model results highlight the importance of coastal regions, areas that may previously have been underestimated based on existing telemetry data. The high risk of injury or death from bycatch for these turtles means strategies to reduce fisheries interactions should be prioritized for this endangered population (Alfaro-Shigueto et al., 2007Ayala, Ortiz, & Gelcich, 2019;Shillinger et al., 2008). Such efforts will require collaborative efforts amongst stakeholders, including local communities, fisheries, managers, scientists, and conservationists (Lewison, Crowder, Read, & Freeman, 2004), as well as trade-offs that allow sustainable fisheries harvest while minimizing interactions with leatherback turtles and other marine species (Lewison et al., 2015). The statistical technique we used could be readily applied to fisheries observations of other bycatch-vulnerable marine species to support an ecosystem approach. Liang, Aimee L. Hoover, and Helen Bailey: Conducted data processing and analyses and wrote the first manuscript draft with contributions from George L. Shillinger and Nicole Barbour. All authors revised the manuscript after peer review and approved the final version.

DATA AVAILABILITY STATEMENT
The environmental data that support the findings of this study are available on request from the corresponding author (D. L.). Restrictions apply to data provided by participating organizations that are requested to remain confidential. Due to the nature of this research, the fisheries observation data are not publicly available due to restrictions that could compromise the privacy of the participating organizations.

ETHICS STATEMENT
There was no interaction with human or lab animal.